| 00:02:55 | | genofire joins |
| 00:10:58 | | Zopolis4 (Zopolis4) joins |
| 00:47:53 | | tzt quits [Ping timeout: 258 seconds] |
| 00:52:40 | | Mateon2 joins |
| 00:54:24 | | Mateon1 quits [Ping timeout: 258 seconds] |
| 00:54:25 | | Mateon2 is now known as Mateon1 |
| 00:54:38 | | Mineroboter joins |
| 00:56:26 | | Mineroboter_ quits [Ping timeout: 250 seconds] |
| 01:02:51 | | dm4v quits [Client Quit] |
| 01:04:00 | | dm4v joins |
| 01:04:02 | | dm4v is now authenticated as dm4v |
| 01:04:02 | | dm4v quits [Changing host] |
| 01:04:02 | | dm4v (dm4v) joins |
| 01:07:27 | | genofire is now authenticated as genofire |
| 01:17:02 | | Zopolis4 quits [Remote host closed the connection] |
| 01:24:48 | | Arcorann (Arcorann) joins |
| 03:17:14 | <etnguyen03> | just making sure, bintray has been paused for some time right? just curious is there something being worked on? considering scaling down my workers |
| 03:18:56 | | DogsRNice quits [Client Quit] |
| 03:19:23 | <jodizzle> | etnguyen03: Join #binnedtray |
| 03:26:01 | | pcr leaves |
| 03:47:59 | | qw3rty_ joins |
| 03:50:16 | | tzt joins |
| 03:51:53 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 04:01:42 | | etnguyen03 quits [Client Quit] |
| 04:08:31 | | blankie joins |
| 04:08:31 | | blankie is now authenticated as blankie |
| 04:08:31 | | blankie quits [Changing host] |
| 04:08:31 | | blankie (blankie) joins |
| 05:21:10 | | thuban quits [Read error: Connection reset by peer] |
| 05:21:23 | | thuban joins |
| 05:28:34 | | blankie quits [Read error: Connection reset by peer] |
| 06:17:33 | | mary quits [Ping timeout: 258 seconds] |
| 06:44:57 | | blankie joins |
| 06:44:57 | | blankie is now authenticated as blankie |
| 06:44:57 | | blankie quits [Changing host] |
| 06:44:57 | | blankie (blankie) joins |
| 07:04:03 | | LeighR (LeighR) joins |
| 07:33:49 | <thuban> | outlinks from my ah.com crawl: archivebot or the urls project? |
| 07:34:30 | <thuban> | (about 300k from the first, smaller forum; second forum not finished yet) |
| 07:59:50 | | Daloader joins |
| 08:19:10 | | NF885 (NF885) joins |
| 08:24:23 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 08:56:26 | <Daloader> | Just wondering if anyone is using / has experimented with an AWS / GCP / Azure "Spot" Style fleet of micro instances to get lots of IPs and burst download projects |
| 09:06:30 | | Zopolis4 (Zopolis4) joins |
| 09:08:02 | <Zopolis4> | are the WARRIOR SUPPORTED messages accurate? they say that yahoo answers is not warrior compatible, but it clearly is |
| 09:11:22 | <Jake> | Those are kind of old, back when the warrior was not supported by a lot of the projects. I think every project should work on the new warrior now? |
| 09:12:46 | | NF885 quits [Ping timeout: 244 seconds] |
| 09:15:17 | <Zopolis4> | thought so, also some of those projects seem to have been completed, like google sites, reddit and periscopes? are they still running? the trackers seem empty |
| 09:16:44 | <masterX244> | reddit is a continuous project |
| 09:17:02 | <masterX244> | it regularly gets new tasks since it's effectively a "tail -f" on new content |
| 09:17:46 | <Zopolis4> | got it, and the others? |
| 09:21:26 | | mls quits [Remote host closed the connection] |
| 09:47:58 | | hooway joins |
| 09:48:46 | | blankie quits [Ping timeout: 258 seconds] |
| 09:49:05 | | blankie (blankie) joins |
| 10:00:39 | | Matthww quits [Client Quit] |
| 10:10:15 | | Matthww joins |
| 10:11:06 | | blankie quits [Client Quit] |
| 10:11:27 | | blankie joins |
| 10:11:27 | | blankie is now authenticated as blankie |
| 10:11:27 | | blankie quits [Changing host] |
| 10:11:27 | | blankie (blankie) joins |
| 10:14:22 | <masterX244> | JAA: TM-exchange trackpages are all uploaded to archive. Gotta write me another quick tool to extract the userpage-urls from that crawl |
| 10:32:10 | | Vukky quits [Read error: Connection reset by peer] |
| 10:32:20 | | shoghicp quits [Ping timeout: 250 seconds] |
| 10:33:41 | <Sanqui> | https://archiveteam.org/ is missing a favicon. I've designed two potential ones. |
| 10:34:21 | <Sanqui> | https://etc.sanqui.net/at_favicon.png - a simple favicon inspired by the 7 inch floppy logo and using EGA colors. |
| 10:34:46 | <Sanqui> | https://etc.sanqui.net/atyahoo_favicon.png - modeled after the old Yahoo! favicon |
| 10:35:14 | <Sanqui> | I personally like the second one, even though it doesn't match any logo we currently use; any wiki admin down to implement it? |
| 10:35:47 | <@OrIdow6> | That would presumably be J R W R |
| 10:36:29 | <@OrIdow6> | First one looks almost solid black to me, hard to make out details |
| 10:36:38 | <@OrIdow6> | Except for the corner |
| 10:36:53 | <Sanqui> | yeah, it's difficult to depict a solid black floppy |
| 10:37:09 | <Sanqui> | so i don't think it would make for a good favicon even if I refined it more |
| 10:39:21 | <@OrIdow6> | Yahoo thing is clever - with the AB dashboard one (which I am going to guess at like 40% that you made as well) it establishes a sort of theme |
| 10:41:18 | <Sanqui> | nope, I didn't make that one! but I do like it and it inspired me to mimic the Yahoo one, yeah |
| 10:45:24 | <Sanqui> | https://etc.sanqui.net/at2_favicon.png |
| 10:45:37 | <Sanqui> | here's one that's a bit more closely modeled after the current logo |
| 10:45:49 | <Sanqui> | all are also available as .icos at the same address for ease of use |
| 10:47:28 | | NF885 (NF885) joins |
| 10:48:26 | <@OrIdow6> | I like the Yahoo one more, but it seems to me that from a "marketing" angle the floppy one (and the new one is a lot better) makes more sense |
| 10:49:18 | <@OrIdow6> | But anyhow, it's not like I'm in charge of this |
| 10:49:59 | <Sanqui> | yeah, it's alright, it's not like this is a priority in any way, but I kind of, just made a good old bookmark bar and the AT wiki is sticking out for not having a favicon haha |
| 10:50:09 | <Sanqui> | so I thought I'd put my lackluster pixel art skills to ues |
| 10:50:11 | <Sanqui> | use* |
| 11:18:44 | | x9fff00 quits [Quit: leaving] |
| 11:31:42 | | blankie quits [Ping timeout: 250 seconds] |
| 11:31:49 | | blankie joins |
| 11:31:50 | | blankie is now authenticated as blankie |
| 11:31:50 | | blankie quits [Changing host] |
| 11:31:50 | | blankie (blankie) joins |
| 11:41:24 | | azureuser (x9fff00) joins |
| 11:43:13 | | azureuser is now known as x9fff00 |
| 12:02:30 | | pcr joins |
| 12:13:17 | | blankie quits [Ping timeout: 258 seconds] |
| 12:14:02 | | blankie joins |
| 12:14:02 | | blankie is now authenticated as blankie |
| 12:14:02 | | blankie quits [Changing host] |
| 12:14:02 | | blankie (blankie) joins |
| 12:29:06 | | NF885 quits [Ping timeout: 244 seconds] |
| 12:36:59 | | hooway_ joins |
| 12:36:59 | | hooway quits [Read error: Connection reset by peer] |
| 12:53:09 | | blankie quits [Ping timeout: 258 seconds] |
| 12:54:12 | | blankie joins |
| 12:54:12 | | blankie is now authenticated as blankie |
| 12:54:12 | | blankie quits [Changing host] |
| 12:54:12 | | blankie (blankie) joins |
| 13:02:07 | | blankie quits [Remote host closed the connection] |
| 13:02:50 | | blankie joins |
| 13:02:50 | | blankie is now authenticated as blankie |
| 13:02:50 | | blankie quits [Changing host] |
| 13:02:50 | | blankie (blankie) joins |
| 13:20:41 | | etnguyen03 (etnguyen03) joins |
| 13:49:29 | | shoghicp (shoghicp) joins |
| 14:22:46 | | LeighR quits [Ping timeout: 244 seconds] |
| 14:41:10 | | onetruth joins |
| 14:43:10 | | x9fff00 quits [Client Quit] |
| 14:50:35 | | x9fff00 (x9fff00) joins |
| 15:09:04 | | NF885 (NF885) joins |
| 15:09:54 | | spirit joins |
| 15:14:48 | <masterX244> | Switched to torrent at the Stackexchange dump, that was much faster for some odd reason. Extracting now and after that outlinks should be extracted soon |
| 15:37:07 | | Doran is now known as Doranwen |
| 15:47:58 | | pcr leaves |
| 15:50:38 | | blankie quits [Ping timeout: 258 seconds] |
| 15:54:44 | | Arcorann quits [Ping timeout: 250 seconds] |
| 15:59:24 | | pcr joins |
| 16:15:41 | | webdownload joins |
| 16:22:58 | | webdownload quits [Remote host closed the connection] |
| 16:27:17 | | NF885 quits [Ping timeout: 244 seconds] |
| 16:32:24 | | hooway joins |
| 16:32:24 | | hooway_ quits [Read error: Connection reset by peer] |
| 17:35:18 | | LeighR (LeighR) joins |
| 18:09:48 | | IKI joins |
| 18:21:16 | | IKI quits [Remote host closed the connection] |
| 18:29:00 | | rewby quits [Ping timeout: 250 seconds] |
| 18:29:12 | | rewby (rewby) joins |
| 18:49:16 | | PlsNoJava quits [Ping timeout: 258 seconds] |
| 18:54:38 | | PlsNoJava (ROpdebee) joins |
| 19:35:48 | | DogsRNice (Webuser299) joins |
| 19:59:41 | | spirit quits [Client Quit] |
| 20:00:23 | | NF885 (NF885) joins |
| 20:49:49 | <thuban> | does grab-site have a secret concurrency limit like seesaw-kit? |
| 20:53:42 | <masterX244> | how do you mean? hardcoded value that can't be exceeded by config? |
| 20:54:02 | <masterX244> | otherwise: the server and its response duration affects effective rate, too |
| 20:58:01 | <thuban> | i mean 'limit beyond which it starts to get flaky' |
| 20:58:57 | <thuban> | i'm currently at 20 and trying to decide whether i should go higher (seeing as i'm technically past the deadline already and there are a lot of pages left to do) |
| 21:00:00 | <@HCross> | DO NOT CHANGE THE CODE |
| 21:00:09 | <@HCross> | DO NOT FIDDLE WITH THE WARRIOR/PROJECT CODE |
| 21:00:26 | <thuban> | dude chill this is grab-site |
| 21:00:27 | <@HCross> | If you're using grab-site then that is fine |
| 21:00:45 | <@HCross> | but I would stick to 20 otherwise it does do strange things afaik |
| 21:00:57 | <thuban> | ah, oof |
| 21:01:20 | <jodizzle> | Doesn't wpull have a built-in per-domain concurrency limit? |
| 21:04:13 | | NF885 quits [Ping timeout: 244 seconds] |
| 21:04:59 | <jodizzle> | Not an expert, but according to some logs I have, wpull maxes out at 6 connections per (host, port, use_ssl) tuple. Since grab-site uses wpull, you might be limited by that. |
| 21:08:10 | <LeighR> | If there is a site that is increasingly fragile and I'm afraid will die due to neglect, is it better for me to use grab-site on my own to make sure it gets archived properly, and then upload to archive.org, or to ask one of you to have ArchiveBot do it? |
| 21:10:15 | <jodizzle> | The nice thing about AB other than convenience is that the data ends up in the WBM. Uploading to archive.org on your own doesn't do that. |
| 21:10:22 | <LeighR> | ok |
| 21:11:16 | <jodizzle> | But you can also do both if you're particularly concerned (just be mindful of IA resources) |
| 21:11:35 | <LeighR> | I seriously doubt it would be over 1GB total |
| 21:11:42 | <LeighR> | probably not even 100MB |
| 21:12:01 | <jodizzle> | What is the site? |
| 21:12:24 | <LeighR> | pemberley.com |
| 21:12:52 | <LeighR> | note its very, very slow load time, despite not having anything wild going on |
| 21:14:37 | <jodizzle> | Okay, sure. Join #archivebot |
| 21:14:40 | <LeighR> | it should probably only be archived at 1 or 2 concurrency, and slowly |
| 21:15:31 | <nyany> | oof, those loading times though |
| 21:17:13 | <LeighR> | yeah - that's why I'm afraid for it |
| 21:17:41 | <LeighR> | Early modern literature sites run old-school |
| 21:18:13 | <nyany> | that seems to be a static site though? |
| 21:18:16 | <nyany> | it's godaddy too |
| 21:19:06 | <LeighR> | it should be blazing fast, and super cheap to host |
| 21:19:13 | <nyany> | oh no, that's wordpress |
| 21:19:15 | <nyany> | huh |
| 21:19:42 | | Doran (Doranwen) joins |
| 21:19:55 | | Doranwen quits [Ping timeout: 258 seconds] |
| 21:20:12 | <LeighR> | it'd probably be cheaper for them to move to wordpress.com |
| 21:21:29 | <LeighR> | but anyway - in general, if I find other precarious-looking sites, what's the best way to get them done by ArchiveBot? Make the request in that channel, and wait for an op to ask the bot? |
| 21:21:53 | <AK> | Pretty much |
| 21:21:56 | <AK> | That's what I do |
| 21:21:57 | <nyany> | basically, yup |
| 21:22:11 | <AK> | (Or if chat scrolls too quickly, throw it in here if it doesn't get seen in ab) |
| 21:22:50 | <LeighR> | so many mid-sized, deeply linked literary sites, done by hand starting in the late 90s |
| 21:25:31 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
| 21:26:16 | | VerifiedJ (VerifiedJ) joins |
| 21:34:25 | | pcr leaves |
| 21:34:26 | | pcr joins |
| 21:34:35 | | benjins quits [Remote host closed the connection] |
| 21:36:04 | | benjins joins |
| 21:36:49 | | LeighR quits [Client Quit] |
| 21:39:42 | | benjins is now authenticated as benjins |
| 21:57:11 | | onetruth quits [Read error: Connection reset by peer] |
| 22:00:08 | | Doran is now known as Doranwen |
| 22:07:26 | | fjjffjkf joins |
| 22:08:06 | | fjjffjkf quits [Remote host closed the connection] |
| 22:51:11 | | BlueMaxima joins |
| 23:37:52 | <@JAA> | thuban: wpull is fine at very high concurrencies in principle; it doesn't have the race condition issues like seesaw. But jodizzle's right, there's a hardcoded limit of 6 connections per host/port/use_ssl. Further, all processing is single-threaded. You'll quickly run into bottlenecks on SQLite, HTML parsing, WARC writing, Python's cookie jar, etc. I find that concurrencies above around a dozen are |
| 23:37:58 | <@JAA> | rarely useful. (Exceptions confirm the rule.) |
| 23:40:09 | | Stilett0 joins |
| 23:44:02 | | Stiletto quits [Ping timeout: 250 seconds] |
| 23:52:12 | | hooway quits [Client Quit] |
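
The per-host cap that jodizzle and JAA describe (at most 6 connections per host/port/use_ssl) can be illustrated with a small sketch. This is not wpull's actual code: `PerHostLimiter`, `run_workers`, and `peak_connections` are hypothetical names, the host `example.com` is a placeholder, and the network request is simulated with a short sleep. It only shows why raising the worker count past the cap does not open more simultaneous connections to a single site.

```python
import asyncio
from collections import defaultdict


class PerHostLimiter:
    """Hypothetical limiter: caps simultaneous connections per
    (host, port, use_ssl) key. The default of 6 mirrors the figure
    quoted in the log; the class itself is illustrative only."""

    def __init__(self, max_per_host=6):
        self.max_per_host = max_per_host
        self._semaphores = {}
        self._active = defaultdict(int)
        self.peak = defaultdict(int)  # highest concurrency seen per key

    def _semaphore_for(self, key):
        # Create one semaphore per key, lazily, inside the running loop.
        if key not in self._semaphores:
            self._semaphores[key] = asyncio.Semaphore(self.max_per_host)
        return self._semaphores[key]

    async def fetch(self, host, port, use_ssl, delay=0.01):
        key = (host, port, use_ssl)
        async with self._semaphore_for(key):
            self._active[key] += 1
            self.peak[key] = max(self.peak[key], self._active[key])
            try:
                await asyncio.sleep(delay)  # stand-in for a real request
            finally:
                self._active[key] -= 1


async def run_workers(concurrency, jobs=60,
                      host="example.com", port=443, use_ssl=True):
    """Run `concurrency` workers against a single host and return the
    peak number of simultaneous simulated connections."""
    limiter = PerHostLimiter()
    queue = asyncio.Queue()
    for _ in range(jobs):
        queue.put_nowait(None)

    async def worker():
        while True:
            try:
                queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await limiter.fetch(host, port, use_ssl)

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return limiter.peak[(host, port, use_ssl)]


def peak_connections(concurrency):
    return asyncio.run(run_workers(concurrency))


if __name__ == "__main__":
    for n in (4, 20):
        print(n, "workers, peak connections:", peak_connections(n))
```

In this sketch, 20 workers against one host never hold more than 6 simulated connections at once, while 4 workers are not affected by the cap. The other bottlenecks JAA lists (SQLite, HTML parsing, WARC writing, the cookie jar) are not modelled here.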