00:02:55genofire joins
00:10:58Zopolis4 (Zopolis4) joins
00:47:53tzt quits [Ping timeout: 258 seconds]
00:52:40Mateon2 joins
00:54:24Mateon1 quits [Ping timeout: 258 seconds]
00:54:25Mateon2 is now known as Mateon1
00:54:38Mineroboter joins
00:56:26Mineroboter_ quits [Ping timeout: 250 seconds]
01:02:51dm4v quits [Client Quit]
01:04:00dm4v joins
01:04:02dm4v quits [Changing host]
01:04:02dm4v (dm4v) joins
01:17:02Zopolis4 quits [Remote host closed the connection]
01:24:48Arcorann (Arcorann) joins
03:17:14<etnguyen03>just making sure, bintray has been paused for some time right? just curious is there something being worked on? considering scaling down my workers
03:18:56DogsRNice quits [Client Quit]
03:19:23<jodizzle>etnguyen03: Join #binnedtray
03:26:01pcr leaves
03:47:59qw3rty_ joins
03:50:16tzt joins
03:51:53qw3rty__ quits [Ping timeout: 258 seconds]
04:01:42etnguyen03 quits [Client Quit]
04:08:31blankie joins
04:08:31blankie quits [Changing host]
04:08:31blankie (blankie) joins
05:21:10thuban quits [Read error: Connection reset by peer]
05:21:23thuban joins
05:28:34blankie quits [Read error: Connection reset by peer]
06:17:33mary quits [Ping timeout: 258 seconds]
06:44:57blankie joins
06:44:57blankie quits [Changing host]
06:44:57blankie (blankie) joins
07:04:03LeighR (LeighR) joins
07:33:49<thuban>outlinks from my ah.com crawl: archivebot or the urls project?
07:34:30<thuban>(about 300k from the first, smaller forum; second forum not finished yet)
07:59:50Daloader joins
08:19:10NF885 (NF885) joins
08:24:23BlueMaxima quits [Read error: Connection reset by peer]
08:56:26<Daloader>Just wondering if anyone is using / has experimented with a AWS / GCP / Azure "Spot" Style fleet of micro instances to get lots of IPs and burst download projects
09:06:30Zopolis4 (Zopolis4) joins
09:08:02<Zopolis4>are the WARRIOR SUPPORTED messages accurate? they say that yahoo answers is not warrior compatible, but it clearly is
09:11:22<Jake>Those are kind of old, back when the warrior was not supported by a lot of the projects. I think every project should work on the new warrior now?
09:12:46NF885 quits [Ping timeout: 244 seconds]
09:15:17<Zopolis4>thought so, also some of those projects seem to have been completed, like google sites, reddit and periscopes? are they still running? the trackers seem empty
09:16:44<masterX244>reddit is a continuous project
09:17:02<masterX244>it regularly gets new tasks since it's effectively a "tail -f" on new content
09:17:46<Zopolis4>got it, and the others?
09:21:26mls quits [Remote host closed the connection]
09:47:58hooway joins
09:48:46blankie quits [Ping timeout: 258 seconds]
09:49:05blankie (blankie) joins
10:00:39Matthww quits [Client Quit]
10:10:15Matthww joins
10:11:06blankie quits [Client Quit]
10:11:27blankie joins
10:11:27blankie quits [Changing host]
10:11:27blankie (blankie) joins
10:14:22<masterX244>JAA: TM-exchange trackpages are all uploaded to archive. Gotta write me another quick tool to extract the userpage-urls from that crawl
10:32:10Vukky quits [Read error: Connection reset by peer]
10:32:20shoghicp quits [Ping timeout: 250 seconds]
10:33:41<Sanqui>https://archiveteam.org/ is missing a favicon. I've designed two potential ones.
10:34:21<Sanqui>https://etc.sanqui.net/at_favicon.png - a simple favicon inspired by the 7 inch floppy logo and using EGA colors.
10:34:46<Sanqui>https://etc.sanqui.net/atyahoo_favicon.png - modeled after the old Yahoo! favicon
10:35:14<Sanqui>I personally like the second one, even though it doesn't match any logo we currently use; any wiki admin down to implement it?
10:35:47<@OrIdow6>That would presumably be J R W R
10:36:29<@OrIdow6>First one looks almost solid black to me, hard to make out details
10:36:38<@OrIdow6>Except for the corner
10:36:53<Sanqui>yeah, it's difficult to depict a solid black floppy
10:37:09<Sanqui>so i don't think it would make for a good favicon even if I refined it more
10:39:21<@OrIdow6>Yahoo thing is clever - with the AB dashboard one (which I am going to guess at like 40% that you made as well) it establishes a sort of theme
10:41:18<Sanqui>nope, I didn't make that one! but I do like it and it inspired me to mimic the Yahoo one, yeah
10:45:24<Sanqui>https://etc.sanqui.net/at2_favicon.png
10:45:37<Sanqui>here's one that's a bit more closely modeled after the current logo
10:45:49<Sanqui>all are also available as .icos at the same address for ease of use
10:47:28NF885 (NF885) joins
10:48:26<@OrIdow6>I like the Yahoo one more, but it seems to me that from a "marketing" angle the floppy one (and the new one is a lot better) makes more sense
10:49:18<@OrIdow6>But anyhow, it's not like I'm in charge of this
10:49:59<Sanqui>yeah, it's alright, it's not like this is a priority in any way, but I kind of, just made a good old bookmark bar and the AT wiki is sticking out for not having a favicon haha
10:50:09<Sanqui>so I thought I'd put my lackluster pixel art skills to use
11:18:44x9fff00 quits [Quit: leaving]
11:31:42blankie quits [Ping timeout: 250 seconds]
11:31:49blankie joins
11:31:50blankie quits [Changing host]
11:31:50blankie (blankie) joins
11:41:24azureuser (x9fff00) joins
11:43:13azureuser is now known as x9fff00
12:02:30pcr joins
12:13:17blankie quits [Ping timeout: 258 seconds]
12:14:02blankie joins
12:14:02blankie quits [Changing host]
12:14:02blankie (blankie) joins
12:29:06NF885 quits [Ping timeout: 244 seconds]
12:36:59hooway_ joins
12:36:59hooway quits [Read error: Connection reset by peer]
12:53:09blankie quits [Ping timeout: 258 seconds]
12:54:12blankie joins
12:54:12blankie quits [Changing host]
12:54:12blankie (blankie) joins
13:02:07blankie quits [Remote host closed the connection]
13:02:50blankie joins
13:02:50blankie quits [Changing host]
13:02:50blankie (blankie) joins
13:20:41etnguyen03 (etnguyen03) joins
13:49:29shoghicp (shoghicp) joins
14:22:46LeighR quits [Ping timeout: 244 seconds]
14:41:10onetruth joins
14:43:10x9fff00 quits [Client Quit]
14:50:35x9fff00 (x9fff00) joins
15:09:04NF885 (NF885) joins
15:09:54spirit joins
15:14:48<masterX244>Switched to torrent at the Stackexchange dump, that was much faster for some odd reason. Extracting now and after that outlinks should be extracted soon
15:37:07Doran is now known as Doranwen
15:47:58pcr leaves
15:50:38blankie quits [Ping timeout: 258 seconds]
15:54:44Arcorann quits [Ping timeout: 250 seconds]
15:59:24pcr joins
16:15:41webdownload joins
16:22:58webdownload quits [Remote host closed the connection]
16:27:17NF885 quits [Ping timeout: 244 seconds]
16:32:24hooway joins
16:32:24hooway_ quits [Read error: Connection reset by peer]
17:35:18LeighR (LeighR) joins
18:09:48IKI joins
18:21:16IKI quits [Remote host closed the connection]
18:29:00rewby quits [Ping timeout: 250 seconds]
18:29:12rewby (rewby) joins
18:49:16PlsNoJava quits [Ping timeout: 258 seconds]
18:54:38PlsNoJava (ROpdebee) joins
19:35:48DogsRNice (Webuser299) joins
19:59:41spirit quits [Client Quit]
20:00:23NF885 (NF885) joins
20:49:49<thuban>does grab-site have a secret concurrency limit like seesaw-kit?
20:53:42<masterX244>how do you mean? hardcoded value that can't be exceeded by config?
20:54:02<masterX244>otherwise: the server and its response duration affects effective rate, too
20:58:01<thuban>i mean 'limit beyond which it starts to get flaky'
20:58:57<thuban>i'm currently at 20 and trying to decide whether i should go higher (seeing as i'm technically past the deadline already and there are a lot of pages left to do)
21:00:00<@HCross>DO NOT CHANGE THE CODE
21:00:09<@HCross>DO NOT FIDDLE WITH THE WARRIOR/PROJECT CODE
21:00:26<thuban>dude chill this is grab-site
21:00:27<@HCross>If you're using grab-site then that is fine
21:00:45<@HCross>but I would stick to 20 otherwise it does do strange things afaik
21:00:57<thuban>ah, oof
21:01:20<jodizzle>Doesn't wpull have a built-in per-domain concurrency limit?
21:04:13NF885 quits [Ping timeout: 244 seconds]
21:04:59<jodizzle>Not an expert, but according to some logs I have, wpull maxes out at 6 connections per (host, port, use_ssl) tuple. Since grab-site uses wpull, you might be limited by that.
21:08:10<LeighR>If there is a site that is increasingly fragile and I'm afraid will die due to neglect, is it better for me to use grab-site on my own to make sure it gets archived properly, and then upload to archive.org, or to ask one of you to have ArchiveBot do it?
21:10:15<jodizzle>The nice thing about AB other than convenience is that the data ends up in the WBM. Uploading to archive.org on your own doesn't do that.
21:10:22<LeighR>ok
21:11:16<jodizzle>But you can also do both if you're particularly concerned (just be mindful of IA resources)
21:11:35<LeighR>I seriously doubt it would be over 1GB total
21:11:42<LeighR>probably not even 100MB
21:12:01<jodizzle>What is the site?
21:12:24<LeighR>pemberley.com
21:12:52<LeighR>note its very, very slow load time, despite not having anything wild going on
21:14:37<jodizzle>Okay, sure. Join #archivebot
21:14:40<LeighR>it should probably only be archived at 1 or 2 concurrency, and slowly
21:15:31<nyany>oof, those loading times though
21:17:13<LeighR>yeah - that's why I'm afraid for it
21:17:41<LeighR>Early modern literature sites run old-school
21:18:13<nyany>that seems to be a static site though?
21:18:16<nyany>it's GoDaddy too
21:19:06<LeighR>it should be blazing fast, and super cheap to host
21:19:13<nyany>oh no, that's wordpress
21:19:15<nyany>huh
21:19:42Doran (Doranwen) joins
21:19:55Doranwen quits [Ping timeout: 258 seconds]
21:20:12<LeighR>it'd probably be cheaper for them to move to wordpress.com
21:21:29<LeighR>but anyway - in general, if I find other precarious-looking sites, what's the best way to get them done by ArchiveBot? Make the request in that channel, and wait for an op to ask the bot?
21:21:53<AK>Pretty much
21:21:56<AK>That's what I do
21:21:57<nyany>basically, yup
21:22:11<AK>(Or if chat scrolls too quickly, throw it in here if it doesn't get seen in ab)
21:22:50<LeighR>so many mid-sized, deeply linked literary sites, done by hand starting in the late 90s
21:25:31VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
21:26:16VerifiedJ (VerifiedJ) joins
21:34:25pcr leaves
21:34:26pcr joins
21:34:35benjins quits [Remote host closed the connection]
21:36:04benjins joins
21:36:49LeighR quits [Client Quit]
21:57:11onetruth quits [Read error: Connection reset by peer]
22:00:08Doran is now known as Doranwen
22:07:26fjjffjkf joins
22:08:06fjjffjkf quits [Remote host closed the connection]
22:51:11BlueMaxima joins
23:37:52<@JAA>thuban: wpull is fine at very high concurrencies in principle; it doesn't have the race condition issues like seesaw. But jodizzle's right, there's a hardcoded limit of 6 connections per host/port/use_ssl. Further, all processing is single-threaded. You'll quickly run into bottlenecks on SQLite, HTML parsing, WARC writing, Python's cookie jar, etc. I find that concurrencies above around a dozen are
23:37:58<@JAA>rarely useful. (Exceptions confirm the rule.)
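The following is a minimal Python sketch of the interaction JAA and jodizzle describe. It assumes an asyncio-style crawler with a global concurrency setting plus a fixed cap of 6 connections per (host, port, use_ssl) key, the figure quoted in the discussion. The HostLimitedPool class and run_crawl helper are hypothetical illustrations, not wpull's or grab-site's actual code, and the sleep stands in for a real HTTP request. The sketch shows why raising concurrency from 20 to something higher does not speed up a crawl of a single site: only 6 requests to that host are ever in flight.

```python
import asyncio
from collections import defaultdict

# Assumed value, taken from the discussion above: at most 6 connections
# per (host, port, use_ssl) key, regardless of the global concurrency.
PER_HOST_LIMIT = 6


class HostLimitedPool:
    """Hypothetical fetcher pool with a global and a per-host connection cap."""

    def __init__(self, concurrency, per_host_limit=PER_HOST_LIMIT):
        # Constructed inside a running event loop (see run_crawl) so the
        # semaphores bind to the right loop on older Python versions.
        self._global = asyncio.Semaphore(concurrency)
        self._per_host = defaultdict(lambda: asyncio.Semaphore(per_host_limit))
        self.active = defaultdict(int)  # in-flight requests per key
        self.peak = defaultdict(int)    # highest in-flight count seen per key
        self.total_active = 0
        self.total_peak = 0

    async def fetch(self, host, port, use_ssl, delay=0.01):
        key = (host, port, use_ssl)
        # Take the per-host slot first, so a request waiting on a busy host
        # does not tie up one of the global slots while it waits.
        async with self._per_host[key]:
            async with self._global:
                self.active[key] += 1
                self.total_active += 1
                self.peak[key] = max(self.peak[key], self.active[key])
                self.total_peak = max(self.total_peak, self.total_active)
                await asyncio.sleep(delay)  # stand-in for the HTTP request
                self.active[key] -= 1
                self.total_active -= 1


async def run_crawl(concurrency, requests):
    """Simulate fetching `requests`, a list of (host, port, use_ssl) tuples."""
    pool = HostLimitedPool(concurrency)
    await asyncio.gather(*(pool.fetch(*r) for r in requests))
    return pool


if __name__ == "__main__":
    # One site at concurrency 20: the per-host cap is the real limit.
    one_host = [("example.org", 443, True)] * 60
    print(asyncio.run(run_crawl(20, one_host)).total_peak)  # 6

    # The same concurrency spread over four hosts can fill all 20 slots.
    four_hosts = [(f"host{i}.example", 443, True) for i in range(4) for _ in range(10)]
    print(asyncio.run(run_crawl(20, four_hosts)).total_peak)  # 20
```

The sketch deliberately leaves out the other ceiling JAA mentions: wpull does its SQLite bookkeeping, HTML parsing, WARC writing, and cookie handling on a single thread, so even a multi-host crawl stops scaling well before the nominal concurrency is reached.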
23:40:09Stilett0 joins
23:44:02Stiletto quits [Ping timeout: 250 seconds]
23:52:12hooway quits [Client Quit]