00:02:42lennier2 joins
00:05:24lennier2__ quits [Ping timeout: 250 seconds]
00:21:00etnguyen03 quits [Client Quit]
00:21:26cascode quits [Ping timeout: 250 seconds]
00:22:27cascode joins
00:38:06Ifan joins
00:43:58<Ifan>Hey. Not sure if this is the right place for this. A whole bunch of US government websites are going down (e.g. USAID.gov just now). Various data sources are at risk of being lost, some of which are essential to an education project I work on, e.g. https://civilrightsdata.ed.gov/data. I heard this is a good place to mention something like this. I
00:43:58<Ifan>couldn't find any rules page so please let me know if I'm going about this wrong
00:44:28<pokechu22>Ifan: We're currently working on it; #UncleSamsArchive is the channel for that specifically
00:44:57<Ifan>Awesome. Thanks. I barely have any idea how this site works.
00:56:03cascode quits [Ping timeout: 260 seconds]
00:57:37cascode joins
00:59:23Naruyoko5 quits [Quit: Leaving]
01:14:29BlueMaxima_ joins
01:16:42Naruyoko joins
01:16:54BlueMaxima quits [Ping timeout: 250 seconds]
01:17:03cascode quits [Read error: Connection reset by peer]
01:17:16cascode joins
01:29:35etnguyen03 (etnguyen03) joins
01:33:56Webuser178098 joins
01:48:33Ifan quits [Client Quit]
02:00:39Webuser178098 quits [Client Quit]
02:00:48cascode quits [Ping timeout: 260 seconds]
02:01:11cascode joins
02:04:58<pabs>pokechu22: are there known ignores for fandom?
02:05:53<pokechu22>Not really beyond -i mediawiki
02:21:54scurvy_duck quits [Ping timeout: 250 seconds]
02:24:08cascode quits [Ping timeout: 260 seconds]
02:24:40cascode joins
02:30:54sec^nd quits [Remote host closed the connection]
02:31:12sec^nd (second) joins
03:07:21hackbug quits [Remote host closed the connection]
03:10:07hackbug (hackbug) joins
03:13:04scurvy_duck joins
03:16:38pabs quits [Ping timeout: 260 seconds]
03:16:54<nicolas17>/phonenixdown pabs
03:17:21pabs (pabs) joins
03:17:50<nulldata>nicolas17 - phoenix has risen again
03:35:53cascode quits [Ping timeout: 260 seconds]
03:38:21cascode joins
03:48:31<utulien_>Ifan - there's a torrent of most of the CDC data up online already.
03:48:53<utulien_>(not from archiveteam, but some guy on reddit. it's on the internet archive)
03:50:53etnguyen03 quits [Client Quit]
03:54:05etnguyen03 (etnguyen03) joins
03:54:13Webuser303121 joins
03:54:41Webuser303121 quits [Client Quit]
04:01:00scurvy_duck quits [Remote host closed the connection]
04:19:49etnguyen03 quits [Read error: Connection reset by peer]
05:00:00benjins3 quits [Read error: Connection reset by peer]
05:08:41<tech234a>DSLReports updated again with weird restriction: "NEWS: The full site corpus is only available (in readonly form) for 5 minutes past each hour, for members and guests."
05:09:14<tech234a>do we have ArchiveBot tooling to only crawl a website for the first 5 minutes past each hour?
05:11:23<nicolas17>what the fuck
05:11:49<nicolas17>tech234a: we could manually pause and resume the AB job at the right moments ig
05:12:27<@JAA>Is it hosted on DSL with an hourly traffic limit? lol
05:13:18<tech234a>it gives a 503 error on the error pages, is it possible to have it retry those pages at the right time if it encounters one?
05:14:11<tech234a>also the error page says the 5 minutes per hour thing "may change at any time"
05:15:46<nicolas17>so we need someone to stare at the logs and pause/resume, wonderful
05:15:53<nicolas17>is there an AB job running already?
05:22:12<@JAA>No point in starting outside of those 5 minutes.
05:24:41<@JAA>But also, we tried to run it through Archivebot a couple years ago. It had to run extremely slowly. Like 5 requests per minute slowly.
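The pause/resume idea discussed above can be sketched as a small scheduling helper. This is a minimal sketch and not existing ArchiveBot tooling: the names `in_window` and `seconds_until_window` are hypothetical, and the window (minutes 0 to 5 of each hour) is taken from the site's notice, which also says it "may change at any time".

```python
from datetime import datetime, timedelta

# Assumption: the site is readable from hh:00:00 up to (not including) hh:05:00.
WINDOW_LENGTH = timedelta(minutes=5)


def in_window(now: datetime) -> bool:
    """Return True if `now` falls in the first five minutes of its hour."""
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    return now < hour_start + WINDOW_LENGTH


def seconds_until_window(now: datetime) -> float:
    """Return how long to stay paused before the next window opens.

    Returns 0.0 when `now` is already inside a window.
    """
    if in_window(now):
        return 0.0
    next_start = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_start - now).total_seconds()
```

A job controller could call `seconds_until_window(datetime.now())` in a loop, resume the crawl when it returns 0.0, and pause again once `in_window` turns False. A 503 received inside the window would still need separate handling, since it may indicate rate limiting rather than the closed period.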
05:27:44BlueMaxima_ quits [Read error: Connection reset by peer]
05:47:22Island quits [Read error: Connection reset by peer]
06:01:26<tech234a>doesn't seem to be working as advertised
06:04:22<@JAA>I am not surprised.
06:09:19<@JAA>It's clearly still being worked on, but if it doesn't get better soon, we could try to contact them.
06:49:33qwertyasdfuiopghjkl2 quits [Ping timeout: 260 seconds]
06:50:08utulien_ quits [Ping timeout: 260 seconds]
06:56:41niemasd (niemasd) joins
06:58:16<niemasd>I have a manual dump of the cdc.gov website from 1/25, and it has some pages that Wayback Machine doesn't. The files seem to have their original timestamps. Is there any way to bulk-add them to Wayback Machine? I can share them with the Archive team if so
06:59:02<niemasd>I imagine no since there's no way to confirm their validity, but I figured I'd check just in case
07:01:29<@JAA>→ #UncleSamsArchive
07:02:53<niemasd>Ah, thank you!
07:03:25niemasd leaves
07:09:04qwertyasdfuiopghjkl2 (qwertyasdfuiopghjkl2) joins
07:09:32qwertyasdfuiopghjkl2 quits [Max SendQ exceeded]
07:30:30earl joins
07:45:36cascode quits [Ping timeout: 250 seconds]
07:45:43cascode joins
07:50:13cascode quits [Ping timeout: 260 seconds]
07:53:16cascode joins
08:05:20<@JAA>So DSLReports works now, kind of.
08:05:31<@JAA>But the rate limit situation isn't looking great.
08:06:22<@JAA>AB got 503s almost immediately. They're actually the same 'mostly closed' message, but it's obviously rate limiting.
08:06:32<@JAA>... and it's down again.
08:08:28<@JAA>This five minute window thing could only be made to work if we can go *hard* in those five minutes. But as I suspected, doesn't look like we can.
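The reason the window only helps at a high request rate can be shown with rough arithmetic. This is a sketch under stated assumptions: the 5 requests per minute figure is the rate mentioned earlier for the previous ArchiveBot attempt, the 5 minutes per hour comes from the site's notice, and the total of 1,000,000 URLs in the usage note is purely hypothetical (the real size of the site is not stated here).

```python
import math


def requests_per_day(requests_per_minute: float, window_minutes_per_hour: float = 5) -> float:
    """Requests that fit into one day if crawling only inside the hourly window."""
    return requests_per_minute * window_minutes_per_hour * 24


def days_to_crawl(total_urls: int, requests_per_minute: float,
                  window_minutes_per_hour: float = 5) -> int:
    """Whole days needed to fetch `total_urls` at the given rate (rounded up)."""
    per_day = requests_per_day(requests_per_minute, window_minutes_per_hour)
    return math.ceil(total_urls / per_day)
```

At 5 requests per minute, `requests_per_day(5)` gives 600 requests per day, so a hypothetical 1,000,000 URLs would take `days_to_crawl(1_000_000, 5)`, or 1667 days. The estimate ignores retries and any 503 responses received inside the window.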
08:12:08cascode quits [Read error: Connection reset by peer]
08:12:21cascode joins
09:05:49<steering>>NEWS: The full site corpus is only available (in readonly form) for 5 minutes past each hour, for members and guests
09:05:51<steering>wat
09:08:05<steering>out of control AI scraping? xP
09:23:17<steering>it's still up right now though?
09:23:41<steering>oh only the homepage is up
09:28:20meisnick quits [Quit: Ooops, wrong browser tab.]
10:02:45<Chewie9999>TheTechRobo: Thanks! I'll try that. I love ntfy.sh :)
10:11:00earl quits [Client Quit]
11:11:31neggles quits [Quit: bye friends - ZNC - https://znc.in]
11:13:56T31M quits [Quit: ZNC - https://znc.in]
11:15:16T31M joins
11:20:39Stagnant_ quits [Remote host closed the connection]
11:39:23earl joins
12:00:04Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
12:02:47Bleo18260072271962345 joins
12:08:38Webuser654763 joins
12:34:19SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962]
12:34:50SkilledAlpaca418962 joins
12:46:01pixel leaves [Error from remote client]
12:48:13<Webuser654763>```Failed to submit discovered URLs.wantreadnil
12:48:13<Webuser654763>nil``` getting this runner docker for the government grab
12:49:19pixel (pixel) joins
12:52:18Stagnant_ (Stagnant) joins
13:49:40etnguyen03 (etnguyen03) joins
13:50:31_Dango360 (Dango360) joins
13:53:56Dango360_ quits [Ping timeout: 250 seconds]
14:02:35SootBector quits [Remote host closed the connection]
14:02:53SootBector (SootBector) joins
14:05:47etnguyen03 quits [Client Quit]
14:21:07etnguyen03 (etnguyen03) joins
14:40:17etnguyen03 quits [Client Quit]
14:43:28etnguyen03 (etnguyen03) joins
14:50:20Matthww joins
15:11:18<TheTechRobo>Webuser654763: Should be fixed.
15:30:51Dango360_ (Dango360) joins
15:34:02_Dango360 quits [Ping timeout: 250 seconds]
15:44:24etnguyen03 quits [Client Quit]
15:45:34Webuser781938 joins
15:45:43Webuser781938 quits [Client Quit]
15:46:07etnguyen03 (etnguyen03) joins
15:53:02Webuser365683 joins
16:02:59etnguyen03 quits [Client Quit]
16:03:15Webuser365683 quits [Client Quit]
17:00:54icedice (icedice) joins
17:05:25i_have_n0_idea quits [Quit: The Lounge - https://thelounge.chat]
17:05:43i_have_n0_idea (i_have_n0_idea) joins
17:09:28PredatorIWD25 quits [Read error: Connection reset by peer]
17:12:57PredatorIWD25 joins
17:22:43utulien joins
17:22:50etnguyen03 (etnguyen03) joins
17:35:52pseudorizer quits [Quit: ZNC 1.9.1 - https://znc.in]
17:37:46pseudorizer (pseudorizer) joins
17:46:19etnguyen03 quits [Client Quit]
17:50:59etnguyen03 (etnguyen03) joins
17:53:25<h2ibot>TheTechRobo edited YouTube (+53, Rewrite Wayback Machine section): https://wiki.archiveteam.org/?diff=54319&oldid=53952
18:32:21etnguyen03 quits [Client Quit]
18:37:42etnguyen03 (etnguyen03) joins
18:46:57etnguyen03 quits [Client Quit]
19:00:54scurvy_duck joins
19:10:23<eggdrop>[remind] pokechu22: https://sleepnomoreauction.com/ auctions close shortly cc TheTechRobo
19:15:03katocala quits [Ping timeout: 260 seconds]
19:15:16katocala joins
19:28:28katocala quits [Ping timeout: 260 seconds]
19:28:43<pokechu22>TheTechRobo: https://transfer.archivete.am/X5AeX/sleepnomoreauction.com_urls_redo.txt - it seems like there are more URLs now (though I'm not sure why?)
19:28:43<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/X5AeX/sleepnomoreauction.com_urls_redo.txt
19:29:16katocala joins
19:39:32Webuser654763 quits [Quit: Ooops, wrong browser tab.]
19:45:10HP_Archivist (HP_Archivist) joins
19:52:23HP_Archivist quits [Ping timeout: 260 seconds]
19:56:06<tech234a>DSLReports seems to generally be made available a few minutes before each hour starts as well, though the exact time might be inconsistent
19:57:10Miki_57 quits [Quit: Leaving]
20:05:51BornOn420 quits [Remote host closed the connection]
20:06:24BornOn420 (BornOn420) joins
20:24:08HP_Archivist (HP_Archivist) joins
20:24:18Shyy quits [Quit: The Lounge - https://thelounge.chat]
20:28:34BlueMaxima joins
20:32:28etnguyen03 (etnguyen03) joins
20:38:37lennier2_ joins
20:41:23lennier2 quits [Ping timeout: 260 seconds]
20:42:08cascode quits [Ping timeout: 250 seconds]
20:42:43cascode joins
20:43:28Shyy joins
21:02:34Shyy quits [Client Quit]
21:03:19HP_Archivist quits [Read error: Connection reset by peer]
21:03:59Shyy joins
21:05:44HP_Archivist (HP_Archivist) joins
21:08:49HP_Archivist quits [Read error: Connection reset by peer]
21:13:41HP_Archivist (HP_Archivist) joins
21:16:49HP_Archivist quits [Read error: Connection reset by peer]
21:17:14cascode quits [Ping timeout: 250 seconds]
21:18:20cascode joins
21:20:05HP_Archivist (HP_Archivist) joins
21:31:32cascode quits [Ping timeout: 250 seconds]
21:31:41cascode joins
21:54:30BlueMaxima quits [Ping timeout: 250 seconds]
22:14:48etnguyen03 quits [Client Quit]
22:22:24<szczot3k>How do we grab yt videos once again?
22:22:29<szczot3k>Preferably to get them into WBM
22:24:33lunik11 quits [Quit: :x]
22:24:44Island joins
22:25:10<nstrom|m>#down-the-tube:hackint.org
22:25:53<nstrom|m>there's a bot in there if the videos meet the criteria for the project
22:29:19lunik11 joins
22:31:50loug8318142 quits [Quit: The Lounge - https://thelounge.chat]
22:32:20etnguyen03 (etnguyen03) joins
22:46:31loug8318142 joins
22:49:11loug8318142 quits [Client Quit]
23:01:13etnguyen03 quits [Client Quit]
23:05:00<TheTechRobo>pokechu22: Running now
23:19:52cascode quits [Ping timeout: 250 seconds]
23:20:38scurvy_duck quits [Ping timeout: 260 seconds]
23:23:53cascode joins