| 00:04:11 | | elomatreb joins |
| 00:27:25 | <elomatreb> | Hi, I'm looking to upload a WARC crawl of a small site I did to the Internet Archive, and I came across the FAQ at https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions#halp_pls_halp |
| 00:27:44 | <elomatreb> | My upload form currently looks like this: https://files.elomatreb.eu/f/c72afcd85fd7bd8a5026428d596288d7.png - is this fine? |
| 00:47:01 | <Iki1> | Someone will probably suggest adding additional metadata of some sort or another, but 1) you probably have the minimal info to upload and 2) you can add metadata after your warc gets uploaded |
| 00:47:13 | <Iki1> | So go ahead, imo |
| 00:58:35 | | etnguyen03 quits [Client Quit] |
| 00:58:50 | | etnguyen03 (etnguyen03) joins |
| 01:02:27 | | dm4v_ joins |
| 01:04:16 | | dm4v quits [Ping timeout: 250 seconds] |
| 01:04:16 | | dm4v_ is now known as dm4v |
| 01:04:16 | | dm4v is now authenticated as dm4v |
| 01:04:16 | | dm4v quits [Changing host] |
| 01:04:16 | | dm4v (dm4v) joins |
| 01:05:54 | <elomatreb> | Iki1: What sort of additional metadata do you suggest? |
| 01:06:05 | <elomatreb> | Also, thanks! |
| 01:21:41 | | minari73 joins |
| 01:49:08 | | TheTechRobo joins |
| 01:49:19 | <TheTechRobo> | Yahoo!知恵袋 seems to be still open |
| 01:49:51 | <TheTechRobo> | Or would a better channel for this be the yahoo answers one? |
| 01:50:09 | <TheTechRobo> | Yeah, I'm moving to #noanswers. |
| 01:50:18 | | TheTechRobo leaves |
| 02:21:26 | | Iki1 quits [Read error: Connection reset by peer] |
| 02:21:43 | | Iki joins |
| 02:25:26 | | HackMii_ quits [Remote host closed the connection] |
| 02:25:57 | | HackMii_ (hacktheplanet) joins |
| 02:46:32 | <thuban> | youtube's rss feeds (eg https://www.youtube.com/feeds/videos.xml?channel_id=UCrTNhL_yO3tPTdQ5XgmmWjA) all seem to be 404ing for me, even though they're still linked in the page source. anyone else? |
| 02:48:42 | <Jake> | seems to be broken. |
| 02:49:17 | <thuban> | an ill omen |
| 03:01:40 | | Iki quits [Read error: Connection reset by peer] |
| 03:10:29 | | minari73 quits [Remote host closed the connection] |
| 03:15:38 | | BlueMaxima joins |
| 03:35:19 | | qw3rty_ joins |
| 03:39:08 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 03:40:59 | | Wayward (wayward) joins |
| 03:44:51 | | HackMii_ quits [Remote host closed the connection] |
| 03:55:33 | | HackMii_ (hacktheplanet) joins |
| 04:01:43 | | elomatreb quits [Client Quit] |
| 04:08:27 | | etnguyen03 quits [Client Quit] |
| 04:15:35 | | @Fusl quits [Excess Flood] |
| 04:15:52 | | Fusl (Fusl) joins |
| 04:15:52 | | @ChanServ sets mode: +o Fusl |
| 05:48:58 | | nertzy quits [Ping timeout: 250 seconds] |
| 06:32:39 | | LeGoupil joins |
| 06:43:58 | | LeGoupil quits [Client Quit] |
| 07:14:31 | | Arcorann__ joins |
| 07:32:24 | | duce1337 (duce1337) joins |
| 07:50:47 | | BlueMaxima_ joins |
| 07:54:49 | | BlueMaxima quits [Ping timeout: 258 seconds] |
| 08:17:10 | | BlueMaxima_ quits [Client Quit] |
| 08:29:31 | | roxfan joins |
| 08:30:39 | <roxfan> | hi, how can I find a specific group in the yahoo groups archive? there's a bunch of different files in the collection |
| 08:32:30 | <thuban> | roxfan: we're still organizing that data; if you tell us in #yahoosucks which group it is, someone should be able to help you find it |
| 08:33:30 | <roxfan> | thx |
| 08:36:39 | | nertzy (nertzy) joins |
| 08:38:46 | | nertzy_ joins |
| 08:41:35 | | nertzy quits [Ping timeout: 258 seconds] |
| 09:11:51 | | themadpro (themadpro) joins |
| 09:36:45 | | nuroten quits [Remote host closed the connection] |
| 10:04:33 | | notak joins |
| 10:32:55 | | duce1337 quits [Read error: Connection reset by peer] |
| 10:32:55 | | duce1337_ (duce1337) joins |
| 10:59:24 | | notak quits [Client Quit] |
| 11:20:52 | | themadpro quits [Client Quit] |
| 12:29:10 | | Daloader joins |
| 12:42:04 | | roxfan quits [Remote host closed the connection] |
| 12:43:42 | | themadpro (themadpro) joins |
| 12:54:00 | <yano> | as far as sci-hub/libgen (see post in #archiveteam) most of it is available via bittorrent, https://phillm.net/libgen-stats-table-raw.php |
| 13:12:50 | | Doran is now known as Doranwen |
| 13:19:33 | | Daloader_ joins |
| 13:22:40 | | Daloader quits [Ping timeout: 250 seconds] |
| 13:25:39 | <VerifiedJ> | I guess they are talking about https://torrentfreak.com/fbi-has-gained-access-to-sci-hub-founders-apple-account-email-claims-210513/ |
| 13:26:52 | <russss> | also note the actual legal request there was dated Feb 2019, just Apple was unable to reveal it until just now |
| 13:48:20 | | duce1337 (duce1337) joins |
| 13:48:20 | | duce1337_ quits [Read error: Connection reset by peer] |
| 14:02:49 | | nerdguy1138 quits [Ping timeout: 258 seconds] |
| 14:04:11 | | nuroten joins |
| 14:04:47 | | Iki joins |
| 14:17:07 | | nerdguy1138 (nerdguy1138) joins |
| 14:19:56 | | etnguyen03 (etnguyen03) joins |
| 14:50:52 | | themadpro quits [Client Quit] |
| 14:54:23 | | britmob25 joins |
| 15:22:42 | | Arcorann__ quits [Ping timeout: 250 seconds] |
| 16:13:11 | | spirit joins |
| 16:24:19 | | spirit quits [Client Quit] |
| 17:11:05 | | Sylirana quits [Remote host closed the connection] |
| 17:12:19 | | Sylirana (Sylirana) joins |
| 17:47:45 | | roxfan joins |
| 18:03:09 | | pcr leaves |
| 18:03:11 | | pcr joins |
| 18:03:46 | | jonboy3452 quits [Read error: Connection reset by peer] |
| 18:19:59 | | Jonboy345 joins |
| 18:45:48 | | sec^nd quits [Remote host closed the connection] |
| 18:46:06 | | sec^nd (second) joins |
| 19:13:40 | | Daloader_ quits [Ping timeout: 250 seconds] |
| 20:36:53 | | Jonboy345 quits [Ping timeout: 258 seconds] |
| 20:37:47 | <marked> | https://aaa.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.com/ from https://news.ycombinator.com/item?id=27156106 |
| 20:57:39 | | onetruth joins |
| 21:15:30 | | Sylirana quits [Read error: Connection reset by peer] |
| 21:15:50 | | Sylirana (Sylirana) joins |
| 21:53:45 | | pcr leaves |
| 22:04:57 | <betamax> | JAA: do you have any idea how rate-limiting twitter is currently? I've just added in the next two twitter lists (part 3 and 4 of 17), and am wondering if I can up the concurrency / reduce the delay on the job for part 4 since it's on just twitter.com URLs now |
| 22:06:56 | <@JAA> | betamax: I haven't seen issues with it, but it's been a while since a job ran faster than default settings because it's usually mixed with outlinks that often can't be run as quickly. In the past, there were no rate limiting issues in twitter.com at all. |
| 22:08:06 | <betamax> | I'll give it a try and see how it goes. |
| 22:09:45 | | Jonboy345 joins |
| 22:10:23 | <betamax> | I've set it to 9 workers and [0 200] delay. If you (or others) think that's excessive, feel free to reduce. (Whether or not I use similar settings for later parts of the list will depend upon if the parts are running on separate pipelines) |
| 22:11:51 | <@JAA> | So that's actually 6 with 0-200 because there's a hard limit of 6 connections per host. But yeah, we'll see. :-) |
| 22:20:54 | | pcr joins |
| 22:21:40 | | rsn joins |
| 22:23:28 | | rsn_ quits [Ping timeout: 250 seconds] |
| 22:59:10 | <@JAA> | betamax: By the way, 70 % done of the websites. |
| 23:05:47 | | cmlow (cmlow) joins |
| 23:13:48 | | duce1337 quits [Client Quit] |
| 23:18:01 | | pcr leaves |
| 23:29:31 | | pcr joins |
| 23:43:06 | | lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
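
The 22:10-22:11 exchange between betamax and JAA (9 workers requested, but only 6 effective because of a hard limit of 6 connections per host) can be sketched roughly as below. This is only an illustration of the clamping they describe: the function name `effective_connections` and its parameters are hypothetical and are not taken from the actual crawler, and the `[0 200]` delay setting is left out because the log does not state its units.

```python
def effective_connections(workers: int, per_host_limit: int = 6) -> int:
    """Return how many connections can actually hit a single host at once.

    `per_host_limit` defaults to 6, the hard per-host limit mentioned in the log.
    Both names are illustrative assumptions, not the crawler's real API.
    """
    if workers < 1 or per_host_limit < 1:
        raise ValueError("workers and per_host_limit must be positive")
    # Requesting more workers than the per-host cap has no extra effect
    # when every URL in the job lives on the same host.
    return min(workers, per_host_limit)


# betamax asked for 9 workers on a job that only targets twitter.com
print(effective_connections(9))
```

Under these assumptions, raising the worker count above the cap only helps when a job spans several hosts; for a single-host job the per-host limit is the binding constraint.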