00:04:11elomatreb joins
00:27:25<elomatreb>Hi, I'm looking to upload a WARC crawl of a small site I did to the Internet Archive, and I came across the FAQ at https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions#halp_pls_halp
00:27:44<elomatreb>My upload form currently looks like this: https://files.elomatreb.eu/f/c72afcd85fd7bd8a5026428d596288d7.png - is this fine?
00:47:01<Iki1>Someone will probably suggest adding additional metadata of some sort or another, but 1) you probably have the minimal info to upload and 2) you can add metadata after your warc gets uploaded
00:47:13<Iki1>So go ahead, imo
00:58:35etnguyen03 quits [Client Quit]
00:58:50etnguyen03 (etnguyen03) joins
01:02:27dm4v_ joins
01:04:16dm4v quits [Ping timeout: 250 seconds]
01:04:16dm4v_ is now known as dm4v
01:04:16dm4v quits [Changing host]
01:04:16dm4v (dm4v) joins
01:05:54<elomatreb>Iki1: What sort of additional metadata do you suggest?
01:06:05<elomatreb>Also, thanks!
01:21:41minari73 joins
01:49:08TheTechRobo joins
01:49:19<TheTechRobo>Yahoo!知恵袋 (Yahoo! Chiebukuro, Yahoo Japan's Q&A site) seems to be still open
01:49:51<TheTechRobo>Or would a better channel for this be the yahoo answers one?
01:50:09<TheTechRobo>Yeah, I'm moving to #noanswers.
01:50:18TheTechRobo leaves
02:21:26Iki1 quits [Read error: Connection reset by peer]
02:21:43Iki joins
02:25:26HackMii_ quits [Remote host closed the connection]
02:25:57HackMii_ (hacktheplanet) joins
02:46:32<thuban>youtube's rss feeds (eg https://www.youtube.com/feeds/videos.xml?channel_id=UCrTNhL_yO3tPTdQ5XgmmWjA) all seem to be 404ing for me, even though they're still linked in the page source. anyone else?
02:48:42<Jake>seems to be broken.
02:49:17<thuban>an ill omen
03:01:40Iki quits [Read error: Connection reset by peer]
03:10:29minari73 quits [Remote host closed the connection]
03:15:38BlueMaxima joins
03:35:19qw3rty_ joins
03:39:08qw3rty__ quits [Ping timeout: 258 seconds]
03:40:59Wayward (wayward) joins
03:44:51HackMii_ quits [Remote host closed the connection]
03:55:33HackMii_ (hacktheplanet) joins
04:01:43elomatreb quits [Client Quit]
04:08:27etnguyen03 quits [Client Quit]
04:15:35@Fusl quits [Excess Flood]
04:15:52Fusl (Fusl) joins
04:15:52@ChanServ sets mode: +o Fusl
05:48:58nertzy quits [Ping timeout: 250 seconds]
06:32:39LeGoupil joins
06:43:58LeGoupil quits [Client Quit]
07:14:31Arcorann__ joins
07:32:24duce1337 (duce1337) joins
07:50:47BlueMaxima_ joins
07:54:49BlueMaxima quits [Ping timeout: 258 seconds]
08:17:10BlueMaxima_ quits [Client Quit]
08:29:31roxfan joins
08:30:39<roxfan>hi, how can I find a specific group in the yahoo groups archive? there's a bunch of different files in the collection
08:32:30<thuban>roxfan: we're still organizing that data; of you tell us in #yahoosucks which group it is, someone should be able to help you find it
08:32:34<thuban>*if
08:33:30<roxfan>thx
08:36:39nertzy (nertzy) joins
08:38:46nertzy_ joins
08:41:35nertzy quits [Ping timeout: 258 seconds]
09:11:51themadpro (themadpro) joins
09:36:45nuroten quits [Remote host closed the connection]
10:04:33notak joins
10:32:55duce1337 quits [Read error: Connection reset by peer]
10:32:55duce1337_ (duce1337) joins
10:59:24notak quits [Client Quit]
11:20:52themadpro quits [Client Quit]
12:29:10Daloader joins
12:42:04roxfan quits [Remote host closed the connection]
12:43:42themadpro (themadpro) joins
12:54:00<yano>as far as sci-hub/libgen (see post in #archiveteam) most of it is available via bittorrent, https://phillm.net/libgen-stats-table-raw.php
13:12:50Doran is now known as Doranwen
13:19:33Daloader_ joins
13:22:40Daloader quits [Ping timeout: 250 seconds]
13:25:39<VerifiedJ>I guess they are talking about https://torrentfreak.com/fbi-has-gained-access-to-sci-hub-founders-apple-account-email-claims-210513/
13:26:52<russss>also note the actual legal request there was dated Feb 2019, just Apple was unable to reveal it until just now
13:48:20duce1337 (duce1337) joins
13:48:20duce1337_ quits [Read error: Connection reset by peer]
14:02:49nerdguy1138 quits [Ping timeout: 258 seconds]
14:04:11nuroten joins
14:04:47Iki joins
14:17:07nerdguy1138 (nerdguy1138) joins
14:19:56etnguyen03 (etnguyen03) joins
14:50:52themadpro quits [Client Quit]
14:54:23britmob25 joins
15:22:42Arcorann__ quits [Ping timeout: 250 seconds]
16:13:11spirit joins
16:24:19spirit quits [Client Quit]
17:11:05Sylirana quits [Remote host closed the connection]
17:12:19Sylirana (Sylirana) joins
17:47:45roxfan joins
18:03:09pcr leaves
18:03:11pcr joins
18:03:46jonboy3452 quits [Read error: Connection reset by peer]
18:19:59Jonboy345 joins
18:45:48sec^nd quits [Remote host closed the connection]
18:46:06sec^nd (second) joins
19:13:40Daloader_ quits [Ping timeout: 250 seconds]
20:36:53Jonboy345 quits [Ping timeout: 258 seconds]
20:37:47<marked>https://aaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.com/ from https://news.ycombinator.com/item?id=27156106
20:57:39onetruth joins
21:15:30Sylirana quits [Read error: Connection reset by peer]
21:15:50Sylirana (Sylirana) joins
21:53:45pcr leaves
22:04:57<betamax>JAA: do you have any idea how rate-limiting twitter is currently? I've just added in the next two twitter lists (part 3 and 4 of 17), and am wondering if I can up the concurrency / reduce the delay on the job for part 4 since it's on just twitter.com URLs now
22:06:56<@JAA>betamax: I haven't seen issues with it, but it's been a while since a job ran faster than default settings because it's usually mixed with outlinks that often can't be run as quickly. In the past, there were no rate limiting issues on twitter.com at all.
22:08:06<betamax>I'll give it a try and see how it goes.
22:09:45Jonboy345 joins
22:10:23<betamax>I've set it to 9 workers and [0 200] delay. If you (or others) think that's excessive, feel free to reduce. (Whether or not I use similar settings for later parts of the list will depend upon if the parts are running on separate pipelines)
22:11:51<@JAA>So that's actually 6 with 0-200 because there's a hard limit of 6 connections per host. But yeah, we'll see. :-)
22:20:54pcr joins
22:21:40rsn joins
22:23:28rsn_ quits [Ping timeout: 250 seconds]
22:59:10<@JAA>betamax: By the way, 70 % done of the websites.
23:05:47cmlow (cmlow) joins
23:13:48duce1337 quits [Client Quit]
23:18:01pcr leaves
23:29:31pcr joins
23:43:06lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)]