| 00:02:57 | | fishingforsoup_ joins |
| 00:06:49 | | fishingforsoup quits [Ping timeout: 252 seconds] |
| 00:46:56 | | BlueMaxima joins |
| 00:47:20 | | benjinsm is now known as benjins |
| 00:47:21 | | benjins is now authenticated as benjins |
| 00:58:31 | | tzt quits [Ping timeout: 252 seconds] |
| 01:24:22 | | yawkat quits [Ping timeout: 252 seconds] |
| 01:26:37 | | ThreeHM quits [Ping timeout: 265 seconds] |
| 01:28:27 | | ThreeHM (ThreeHeadedMonkey) joins |
| 01:33:02 | | yawkat (yawkat) joins |
| 01:42:56 | <myself> | Is there a best-practices guide somewhere for archiving crawler-resistant (heavily rate-limited) sites? I'm thinking specifically of https://wiki.seeedstudio.com/ and https://www.waveshare.com/wiki/Main_Page which have a ton of electronics resources that would suck to lose, and I'd like to either get AT chewing on them, or refine my own httrack-fu. |
| 01:48:02 | | monoxane quits [Quit: Ping timeout (120 seconds)] |
| 01:48:21 | | monoxane (monoxane) joins |
| 01:50:48 | <myself> | (IA seems to have archived a whole lot of 500 errors in the above, so that's not terribly helpful.) |
| 01:52:46 | <pokechu22> | The latter one is cloudflare, not sure about the first one. But the latter is also mediawiki so I can try saving it with wikiteam tools |
| 02:01:48 | <pokechu22> | Request-rate: 5/60 is pretty bad, though (5 pages every minute, so a delay of 12 seconds). The mediawiki API lets you export 50 revisions/API call and there are 55,948 revisions, so that's 224 minutes or 3.7 hours to save everything (not counting images) |
| 02:05:35 | <pokechu22> | (the number of revisions is from https://www.waveshare.com/wiki/Special:Statistics) |
| 02:33:15 | <@hook54321> | Thibaultmol: it looks like they want the data deleted completely after it's shut down, so it's not likely |
| 03:20:26 | | lennier1 quits [Client Quit] |
| 03:20:48 | | lennier1 (lennier1) joins |
| 03:47:27 | | aismallard quits [Quit: see ya] |
| 03:47:59 | | aismallard joins |
| 03:54:38 | | Ketchup902 (Ketchup901) joins |
| 03:56:37 | | Ketchup901 quits [Client Quit] |
| 04:17:58 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 04:33:29 | | fishingforsoup_ quits [Client Quit] |
| 04:33:40 | | fishingforsoup joins |
| 04:34:10 | | fishingforsoup is now authenticated as fishingforsoup |
| 05:01:34 | | Tomo|m is now known as tomodachi94 |
| 05:05:05 | | tomodachi94 quits [Client Quit] |
| 05:05:10 | | tomodachi94 joins |
| 05:21:16 | <Jake> | https://github.com/facebook/zstd/releases/tag/v1.5.4 |
| 05:28:33 | <@JAA> | Oh, the dict compression improvements look nice. |
| 05:35:54 | <@JAA> | Ah, the largest performance increases are for levels 1-4. Not sure what we use actually. Can't find it in the code, only for the skippable dict frame. |
| 05:36:25 | <@JAA> | arkiver: ^ |
| 05:38:27 | | user_ joins |
| 05:42:19 | | user__ quits [Ping timeout: 252 seconds] |
| 06:00:51 | | tzt (tzt) joins |
| 06:04:08 | | hackbug quits [Ping timeout: 252 seconds] |
| 06:17:18 | <pokechu22> | Do warc files record the certificate in any way? e.g. to establish a link from https://ip-104-243-80-232.user.start.ca/ to waterfallsofontario.com |
| 06:18:17 | <@JAA> | No |
| 07:17:05 | | mutantm0nkey quits [Ping timeout: 276 seconds] |
| 07:17:43 | | mutantm0nkey (mutantmonkey) joins |
| 07:29:18 | | Icyelut quits [Quit: bye] |
| 07:34:55 | <pabs> | do we have a way to archive Thingiverse pages? they are JS-heavy |
| 07:34:58 | | Island quits [Read error: Connection reset by peer] |
| 08:02:46 | | hitgrr8 joins |
| 09:14:52 | | monoxane quits [Client Quit] |
| 09:16:42 | | monoxane (monoxane) joins |
| 09:46:00 | <h2ibot> | Bzc6p edited Kepfeltoltes.eu (-5, /* Archiving */ 2022 images saved): https://wiki.archiveteam.org/?diff=49447&oldid=49401 |
| 09:47:00 | <h2ibot> | Bzc6p edited Template:Hungarian websites (+7, eOldal closed): https://wiki.archiveteam.org/?diff=49448&oldid=48056 |
| 09:55:48 | | HackMii_ quits [Remote host closed the connection] |
| 09:56:11 | | second (second) joins |
| 09:56:23 | | mut4ntmonkey (mutantmonkey) joins |
| 09:57:19 | | abirkill quits [Remote host closed the connection] |
| 09:59:35 | | mutantm0nkey quits [Ping timeout: 276 seconds] |
| 10:00:15 | | sec^nd quits [Ping timeout: 276 seconds] |
| 10:00:16 | | second is now known as sec^nd |
| 10:01:41 | | abirkill (abirkill) joins |
| 10:02:12 | | HackMii_ (hacktheplanet) joins |
| 10:05:01 | | user_ quits [Remote host closed the connection] |
| 10:07:41 | | Ketchup902 quits [Remote host closed the connection] |
| 10:08:12 | | Ketchup901 (Ketchup901) joins |
| 10:22:06 | <h2ibot> | Bzc6p edited Network.hu (+9, /* Archiving */ let's continue archiving with…): https://wiki.archiveteam.org/?diff=49449&oldid=46294 |
| 12:00:21 | <h2ibot> | JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49450&oldid=49388 |
| 12:57:30 | | hackbug (hackbug) joins |
| 13:12:35 | | sonick (sonick) joins |
| 13:34:52 | | adia quits [Client Quit] |
| 13:36:07 | | adia (adia) joins |
| 13:54:06 | | Megame (Megame) joins |
| 14:19:30 | | Arcorann_ quits [Ping timeout: 252 seconds] |
| 14:20:05 | | kurkosdr joins |
| 14:20:44 | <kurkosdr> | JAA |
| 14:23:25 | | kalma joins |
| 14:23:31 | | kalma quits [Remote host closed the connection] |
| 14:24:56 | <kurkosdr> | @JAA Promised upload: https://archive.org/details/ftp.pctvsystems.com https://ia804706.us.archive.org/view_archive.php?archive=/31/items/ftp.pctvsystems.com/ftp.pctvsystems.com.tar Had to drop off and wait until I get home to do it, instead of doing it from the office, because I didn't want to attract the ire of the office's ITSUP, so it took a little |
| 14:24:57 | <kurkosdr> | longer. |
| 14:25:21 | <kurkosdr> | Reposting links: |
| 14:25:23 | <kurkosdr> | https://archive.org/details/ftp.pctvsystems.com |
| 14:25:45 | <kurkosdr> | https://ia804706.us.archive.org/view_archive.php?archive=/31/items/ftp.pctvsystems.com/ftp.pctvsystems.com.tar |
| 14:25:57 | <kurkosdr> | (for some reason multi-line text isn't supported) |
| 14:28:40 | | katocala quits [Ping timeout: 252 seconds] |
| 14:29:18 | | katocala joins |
| 14:36:20 | | Megame quits [Client Quit] |
| 14:40:44 | | kurkosdr quits [Remote host closed the connection] |
| 15:07:57 | <@arkiver> | posting in -bs since this concerns us now https://torrentfreak.com/sony-vs-quad9-court-hears-landmark-dns-piracy-blocking-case-230209/ |
| 15:11:11 | | jacksonchen666 quits [Ping timeout: 265 seconds] |
| 15:48:01 | | katocala is now authenticated as katocala |
| 15:51:52 | <yano> | interesting |
| 15:55:47 | | HackMii_ quits [Ping timeout: 276 seconds] |
| 15:57:48 | | HackMii_ (hacktheplanet) joins |
| 16:27:25 | | Island joins |
| 16:43:18 | <xkey> | arkiver: may I ask in what way the Quad9 case concerns you as archivists? honest question |
| 16:48:05 | <@arkiver> | xkey: we've recently started using it as resolver in various projects |
| 16:48:16 | <xkey> | ah got it, thanks |
| 16:48:18 | <@arkiver> | they have an unsecured (uncensored) server |
| 16:48:28 | <@arkiver> | and it's great they're fighting against censoring |
| 16:48:33 | <xkey> | for sure |
| 17:11:41 | | jacksonchen666 (jacksonchen666) joins |
| 17:20:51 | | girst quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 17:25:07 | | girst (girst) joins |
| 17:34:22 | | spirit quits [Client Quit] |
| 18:25:04 | | jacksonchen666 quits [Client Quit] |
| 18:30:39 | | tomodachi94 is now authenticated as tomodachi94 |
| 19:00:24 | <myself> | pabs: someone was working on archiving thingiverse: https://www.reddit.com/r/DataHoarder/comments/ihvrwc/update_on_the_thingiverse_archive/ and I don't know if that's the same person as https://thingiverse.nikow.pl/ |
| 19:02:52 | | DLoader_ joins |
| 19:06:25 | | DLoader quits [Ping timeout: 252 seconds] |
| 19:06:35 | | DLoader_ is now known as DLoader |
| 19:12:05 | | sec^nd quits [Ping timeout: 276 seconds] |
| 19:13:23 | | mut4ntmonkey quits [Ping timeout: 276 seconds] |
| 19:16:19 | | sec^nd (second) joins |
| 19:27:53 | | mut4ntmonkey (mutantmonkey) joins |
| 19:35:09 | | umgr036 joins |
| 19:45:08 | | T31M quits [Quit: ZNC - https://znc.in] |
| 19:45:10 | | xkey quits [Quit: xkey] |
| 19:45:26 | | T31M joins |
| 19:45:28 | | T31M is now authenticated as T31M |
| 19:49:32 | | xkey (xkey) joins |
| 19:50:45 | <h2ibot> | Bzc6p edited Indafotó (+18, Updated numbers and started archiving.): https://wiki.archiveteam.org/?diff=49451&oldid=49361 |
| 19:58:21 | <fishingforsoup> | I'm so confused. |
| 19:58:53 | <fishingforsoup> | According to Tech Robo's site, this video is archived. But I click the link and it says the page isn't there. |
| 19:58:54 | <fishingforsoup> | https://www.youtube.com/watch?v=_-LmHJ6ya2Q |
| 20:01:20 | <pokechu22> | If it was only archived very recently, it probably hasn't been processed fully - check again in a few days maybe? (I'm not super familiar with the process for videos specifically; this is general advice) |
| 20:01:55 | <fishingforsoup> | Ah. |
| 20:02:09 | <fishingforsoup> | I submitted it here, but it was still premiering. |
| 20:46:04 | <sidpatchy> | https://thewiiu.com/ this forum seems to have been almost completely forgotten. Each webpage takes seconds (or more) to load. Archival is going to be a long process... |
| 21:26:39 | | Stiletto quits [Remote host closed the connection] |
| 21:46:06 | <tomodachi94> | joepie91 🏳️🌈: They're not extremely hostile, but they allow mod authors to block automated downloads from unofficial clients. |
| 21:46:41 | <joepie91|m> | is that an actual block from a technical perspective, or rather an exemption from some API that 'unofficial clients' use? |
| 21:48:11 | <pokechu22> | I remember seeing 403s on curseforge from unrelated jobs via archivebot (I think) |
| 21:48:29 | <joepie91|m> | hm |
| 21:48:39 | <pokechu22> | I think they use cloudflare though |
| 21:49:32 | <tomodachi94> | joepie91|m: I'm not entirely sure. Their docs aren't very clear about it, I just remember having to manually download some mods when I was installing a modpack. |
| 21:49:54 | <tomodachi94> | pokechu22: Yep, they use Cloudflare |
| 21:50:58 | | Stiletto joins |
| 21:51:58 | | eroc1990 quits [Ping timeout: 252 seconds] |
| 21:52:19 | | eroc1990 (eroc1990) joins |
| 21:58:28 | <pokechu22> | Unrelated to that... I was looking at the magportal.com archivebot job (https://archive.fart.website/archivebot/viewer/job/9mcwd) and noticed that some things are broken: https://web.archive.org/web/*/http://magportal.com/nr/rdir.php?w=464470 doesn't show anything (but https://web.archive.org/web/2/http://magportal.com/nr/rdir.php?w=464470 works fine), and the same applies |
| 21:58:30 | <pokechu22> | for some redirect destinations: https://transfer.archivete.am/inline/Yr4YO/magportal.com_broken_wbm_notes.txt - it looks like no data's actually missing, but something very weird is happening here |
| 23:03:16 | | hitgrr8 quits [Client Quit] |
| 23:05:49 | | fishingforsoup quits [Read error: Connection reset by peer] |
| 23:26:22 | <h2ibot> | TheTechRobo edited Twitch.tv (+11, clarity about #burnthetwitch and link to github): https://wiki.archiveteam.org/?diff=49452&oldid=49128 |
| 23:35:28 | | Arcorann_ joins |
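
The crawl-time estimate pokechu22 gives at 02:01:48 (a 5/60 request rate, 50 revisions per API call, 55,948 revisions) can be reproduced with a short calculation. This is only a sketch of the arithmetic: the function name `export_time_seconds` is hypothetical, and it assumes one API call per rate-limit slot with every call returning a full batch, ignoring images and retries.

```python
import math


def export_time_seconds(total_revisions: int, revisions_per_call: int,
                        requests: int, per_seconds: int) -> float:
    """Estimate how long a rate-limited export takes (hypothetical helper)."""
    # Number of API calls needed, rounding up for the final partial batch.
    calls = math.ceil(total_revisions / revisions_per_call)
    # A rate of `requests` per `per_seconds` means one call every `delay` seconds.
    delay = per_seconds / requests
    return calls * delay


# Figures quoted in the log: 5 requests per 60 s, 50 revisions per call.
seconds = export_time_seconds(55_948, 50, 5, 60)
print(f"{seconds / 60:.0f} minutes, {seconds / 3600:.1f} hours")
# prints "224 minutes, 3.7 hours"
```

The same function can be reused for other sites by swapping in their own revision count and rate limit.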