00:02:57fishingforsoup_ joins
00:06:49fishingforsoup quits [Ping timeout: 252 seconds]
00:46:56BlueMaxima joins
00:47:20benjinsm is now known as benjins
00:58:31tzt quits [Ping timeout: 252 seconds]
01:24:22yawkat quits [Ping timeout: 252 seconds]
01:26:37ThreeHM quits [Ping timeout: 265 seconds]
01:28:27ThreeHM (ThreeHeadedMonkey) joins
01:33:02yawkat (yawkat) joins
01:42:56<myself>Is there a best-practices guide somewhere for archiving crawler-resistant (heavily rate-limited) sites? I'm thinking specifically of https://wiki.seeedstudio.com/ and https://www.waveshare.com/wiki/Main_Page which have a ton of electronics resources that would suck to lose, and I'd like to either get AT chewing on them, or refine my own httrack-fu.
01:48:02monoxane quits [Quit: Ping timeout (120 seconds)]
01:48:21monoxane (monoxane) joins
01:50:48<myself>(IA seems to have archived a whole lot of 500 errors in the above, so that's not terribly helpful.)
01:52:46<pokechu22>The latter one is cloudflare, not sure about the first one. But the latter is also mediawiki so I can try saving it with wikiteam tools
02:01:48<pokechu22>Request-rate: 5/60 is pretty bad, though (5 pages every minute, so a 12-second delay). The mediawiki API lets you export 50 revisions/API call and there are 55,948 revisions, so that's about 1,119 calls, i.e. 224 minutes or 3.7 hours to save everything (not counting images)
02:05:35<pokechu22>(the number of revisions is from https://www.waveshare.com/wiki/Special:Statistics)
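[Editor's note: the estimate above can be reproduced with a short sketch. The function name and parameters are hypothetical and only for illustration; it assumes one API call per rate-limit slot and ignores images and retries, as the message does.]

```python
import math

def export_time_seconds(total_revisions, revisions_per_call,
                        requests_per_window, window_seconds):
    """Rough time to export every revision under a fixed request rate."""
    # "Request-rate: 5/60" means 5 requests per 60 seconds, i.e. one every 12 s.
    delay = window_seconds / requests_per_window
    # Each API call returns up to `revisions_per_call` revisions.
    calls = math.ceil(total_revisions / revisions_per_call)
    return calls * delay

# Figures quoted in the chat: 55,948 revisions, 50 per call, 5 requests per 60 s.
seconds = export_time_seconds(55_948, 50, 5, 60)
print(f"{seconds / 60:.0f} minutes, {seconds / 3600:.1f} hours")
```

[With the numbers from the chat this gives roughly 224 minutes, or about 3.7 hours, matching the figure quoted.]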
02:33:15<@hook54321>Thibaultmol: it looks like they want the data deleted completely after it's shut down, so it's not likely
03:20:26lennier1 quits [Client Quit]
03:20:48lennier1 (lennier1) joins
03:47:27aismallard quits [Quit: see ya]
03:47:59aismallard joins
03:54:38Ketchup902 (Ketchup901) joins
03:56:37Ketchup901 quits [Client Quit]
04:17:58BlueMaxima quits [Read error: Connection reset by peer]
04:33:29fishingforsoup_ quits [Client Quit]
04:33:40fishingforsoup joins
05:01:34Tomo|m is now known as tomodachi94
05:05:05tomodachi94 quits [Client Quit]
05:05:10tomodachi94 joins
05:21:16<Jake>https://github.com/facebook/zstd/releases/tag/v1.5.4
05:28:33<@JAA>Oh, the dict compression improvements look nice.
05:35:54<@JAA>Ah, the largest performance increases are for levels 1-4. Not sure what we use actually. Can't find it in the code, only for the skippable dict frame.
05:36:25<@JAA>arkiver: ^
05:38:27user_ joins
05:42:19user__ quits [Ping timeout: 252 seconds]
06:00:51tzt (tzt) joins
06:04:08hackbug quits [Ping timeout: 252 seconds]
06:17:18<pokechu22>Do warc files record the certificate in any way? e.g. to establish a link from https://ip-104-243-80-232.user.start.ca/ to waterfallsofontario.com
06:18:17<@JAA>No
07:17:05mutantm0nkey quits [Ping timeout: 276 seconds]
07:17:43mutantm0nkey (mutantmonkey) joins
07:29:18Icyelut quits [Quit: bye]
07:34:55<pabs>do we have a way to archive Thingiverse pages? they are JS-heavy
07:34:58Island quits [Read error: Connection reset by peer]
08:02:46hitgrr8 joins
09:14:52monoxane quits [Client Quit]
09:16:42monoxane (monoxane) joins
09:46:00<h2ibot>Bzc6p edited Kepfeltoltes.eu (-5, /* Archiving */ 2022 images saved): https://wiki.archiveteam.org/?diff=49447&oldid=49401
09:47:00<h2ibot>Bzc6p edited Template:Hungarian websites (+7, eOldal closed): https://wiki.archiveteam.org/?diff=49448&oldid=48056
09:55:48HackMii_ quits [Remote host closed the connection]
09:56:11second (second) joins
09:56:23mut4ntmonkey (mutantmonkey) joins
09:57:19abirkill quits [Remote host closed the connection]
09:59:35mutantm0nkey quits [Ping timeout: 276 seconds]
10:00:15sec^nd quits [Ping timeout: 276 seconds]
10:00:16second is now known as sec^nd
10:01:41abirkill (abirkill) joins
10:02:12HackMii_ (hacktheplanet) joins
10:05:01user_ quits [Remote host closed the connection]
10:07:41Ketchup902 quits [Remote host closed the connection]
10:08:12Ketchup901 (Ketchup901) joins
10:22:06<h2ibot>Bzc6p edited Network.hu (+9, /* Archiving */ let's continue archiving with…): https://wiki.archiveteam.org/?diff=49449&oldid=46294
12:00:21<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49450&oldid=49388
12:57:30hackbug (hackbug) joins
13:12:35sonick (sonick) joins
13:34:52adia quits [Client Quit]
13:36:07adia (adia) joins
13:54:06Megame (Megame) joins
14:19:30Arcorann_ quits [Ping timeout: 252 seconds]
14:20:05kurkosdr joins
14:20:44<kurkosdr>JAA
14:23:25kalma joins
14:23:31kalma quits [Remote host closed the connection]
14:24:56<kurkosdr>@JAA Promised upload: https://archive.org/details/ftp.pctvsystems.com https://ia804706.us.archive.org/view_archive.php?archive=/31/items/ftp.pctvsystems.com/ftp.pctvsystems.com.tar Had to drop off and wait until I got home to do it, instead of doing it from the office, because I didn't want to attract the ire of the office's ITSUP, so it took a little
14:24:57<kurkosdr>longer.
14:25:21<kurkosdr>Reposting links:
14:25:23<kurkosdr>https://archive.org/details/ftp.pctvsystems.com
14:25:45<kurkosdr>https://ia804706.us.archive.org/view_archive.php?archive=/31/items/ftp.pctvsystems.com/ftp.pctvsystems.com.tar
14:25:57<kurkosdr>(for some reason multi-line text isn't supported)
14:28:40katocala quits [Ping timeout: 252 seconds]
14:29:18katocala joins
14:36:20Megame quits [Client Quit]
14:40:44kurkosdr quits [Remote host closed the connection]
15:07:57<@arkiver>posting in -bs since this concerns us now https://torrentfreak.com/sony-vs-quad9-court-hears-landmark-dns-piracy-blocking-case-230209/
15:11:11jacksonchen666 quits [Ping timeout: 265 seconds]
15:51:52<yano>interesting
15:55:47HackMii_ quits [Ping timeout: 276 seconds]
15:57:48HackMii_ (hacktheplanet) joins
16:27:25Island joins
16:43:18<xkey>arkiver: may I ask in what way the Quad9 case concerns you as archivists? honest question
16:48:05<@arkiver>xkey: we've recently started using it as resolver in various projects
16:48:16<xkey>ah got it, thanks
16:48:18<@arkiver>they have an unsecured (uncensored) server
16:48:28<@arkiver>and it's great they're fighting against censoring
16:48:33<xkey>for sure
17:11:41jacksonchen666 (jacksonchen666) joins
17:20:51girst quits [Quit: ZNC 1.8.2 - https://znc.in]
17:25:07girst (girst) joins
17:34:22spirit quits [Client Quit]
18:25:04jacksonchen666 quits [Client Quit]
19:00:24<myself>pabs: someone was working on archiving thingiverse: https://www.reddit.com/r/DataHoarder/comments/ihvrwc/update_on_the_thingiverse_archive/ and I don't know if that's the same person as https://thingiverse.nikow.pl/
19:02:52DLoader_ joins
19:06:25DLoader quits [Ping timeout: 252 seconds]
19:06:35DLoader_ is now known as DLoader
19:12:05sec^nd quits [Ping timeout: 276 seconds]
19:13:23mut4ntmonkey quits [Ping timeout: 276 seconds]
19:16:19sec^nd (second) joins
19:27:53mut4ntmonkey (mutantmonkey) joins
19:35:09umgr036 joins
19:45:08T31M quits [Quit: ZNC - https://znc.in]
19:45:10xkey quits [Quit: xkey]
19:45:26T31M joins
19:49:32xkey (xkey) joins
19:50:45<h2ibot>Bzc6p edited Indafotó (+18, Updated numbers and started archiving.): https://wiki.archiveteam.org/?diff=49451&oldid=49361
19:58:21<fishingforsoup>I'm so confused.
19:58:53<fishingforsoup>According to Tech Robo's site, this video is archived. But I click the link and it says the page isn't there.
19:58:54<fishingforsoup>https://www.youtube.com/watch?v=_-LmHJ6ya2Q
20:01:20<pokechu22>If it was only archived very recently, it probably hasn't been processed fully - check again in a few days maybe? (I'm not super familiar with the process for videos specifically; this is general advice)
20:01:55<fishingforsoup>Ah.
20:02:09<fishingforsoup>I submitted it here, but it was still premiering.
20:46:04<sidpatchy>https://thewiiu.com/ this forum seems to have been almost completely forgotten. Each webpage takes seconds (or more) to load. Archival is going to be a long process...
21:26:39Stiletto quits [Remote host closed the connection]
21:46:06<tomodachi94>joepie91 🏳️‍🌈: They're not extremely hostile, but they allow mod authors to block automated downloads from unofficial clients.
21:46:41<joepie91|m>is that an actual block from a technical perspective, or rather an exemption from some API that 'unofficial clients' use?
21:48:11<pokechu22>I remember seeing 403s on curseforge from unrelated jobs via archivebot (I think)
21:48:29<joepie91|m>hm
21:48:39<pokechu22>I think they use cloudflare though
21:49:32<tomodachi94>joepie91|m: I'm not entirely sure. Their docs aren't very clear about it; I just remember having to manually download some mods when I was installing a modpack.
21:49:54<tomodachi94>pokechu22: Yep, they use Cloudflare
21:50:58Stiletto joins
21:51:58eroc1990 quits [Ping timeout: 252 seconds]
21:52:19eroc1990 (eroc1990) joins
21:58:28<pokechu22>Unrelated to that... I was looking at the magportal.com archivebot job (https://archive.fart.website/archivebot/viewer/job/9mcwd) and noticed that some things are broken: https://web.archive.org/web/*/http://magportal.com/nr/rdir.php?w=464470 doesn't show anything (but https://web.archive.org/web/2/http://magportal.com/nr/rdir.php?w=464470 works fine), and the same applies
21:58:30<pokechu22>for some redirect destinations: https://transfer.archivete.am/inline/Yr4YO/magportal.com_broken_wbm_notes.txt - it looks like no data's actually missing, but something very weird is happening here
23:03:16hitgrr8 quits [Client Quit]
23:05:49fishingforsoup quits [Read error: Connection reset by peer]
23:26:22<h2ibot>TheTechRobo edited Twitch.tv (+11, clarity about #burnthetwitch and link to github): https://wiki.archiveteam.org/?diff=49452&oldid=49128
23:35:28Arcorann_ joins