00:46:29 | <pabs> | I'm guessing https://hiddenpalace.org/ is already done? :) |
00:48:44 | <pokechu22> | I haven't run it or tcrf.net |
00:49:18 | <pokechu22> | but... take a look at https://hiddenpalace.org/Special:MediaStatistics |
00:49:52 | <pokechu22> | pretty sure most content is also mirrored on archive.org, but I haven't looked into it in detail and don't plan on doing so myself |
00:55:06 | <pokechu22> | see also https://tcrf.net/Special:MediaStatistics which is large but at least in the realm of _someone_ doing it (not me) |
01:01:52 | <pabs> | egads |
01:46:56 | <balrog> | is there a way to submit a wiki for archival? |
01:47:07 | <balrog> | we need https://www.theiphonewiki.com since it's likely to go down soon |
01:47:27 | <balrog> | see: https://twitter.com/zhuowei/status/1638356249262583810 |
02:01:37 | <michaelblob_> | pokechu22: i can run tcrf.net and theiphonewiki.com but hiddenpalace.org is kinda out of my (easily doable) bandwidth |
02:01:54 | <michaelblob_> | not sure if IA will appreciate a 4TB wiki dump lol |
02:07:20 | <@JAA> | Not even possible, you'd have to split it into (at least) 4 items. |
02:09:03 | <michaelblob_> | would be interesting to engineer a way to dump it but rip disk space |
02:24:18 | <pokechu22> | balrog: I'll run that soon, though the tweet you linked was deleted |
02:24:33 | <pokechu22> | oh, michaelblob_ already is doing it |
02:25:05 | <pokechu22> | I'm pretty sure the files uploaded to hiddenpalace.org are already on archive.org so it's not too much of a concern (plus one item per file probably makes more sense in that case) |
02:25:40 | <pabs> | balrog: should the iphone wiki get an AB too? |
02:48:51 | <qwertyasdfuiopghjkl> | pokechu22: Screenshot of the tweet: https://transfer.archivete.am/inline/Ewp2A/Screenshot_2023-03-22.png |
03:08:06 | <balrog> | pabs: probably |
03:09:48 | <pokechu22> | I like how they mention that but http://iphonedev.wiki/ doesn't exist (nor does http://iphonedevwiki.net/ as linked on theiphonewiki.com) |
03:23:52 | <qwertyasdfuiopghjkl> | It did exist a week ago: https://web.archive.org/web/20230314233239/https://iphonedev.wiki/index.php/Main_Page |
03:46:37 | | systwi__ (systwi) joins |
03:46:40 | | systwi quits [Ping timeout: 252 seconds] |
03:51:00 | | systwi__ is now known as systwi |
06:17:07 | | hitgrr8 joins |
09:47:35 | | hitgrr8 quits [Client Quit] |
09:47:35 | | qwertyasdfuiopghjkl quits [Client Quit] |
09:48:36 | | hitgrr8 joins |
09:59:22 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
13:31:12 | | Sir_Bedivere joins |
13:35:10 | | Bedivere quits [Ping timeout: 252 seconds] |
14:50:53 | | rewby|backup (rewby) joins |
14:50:53 | | @ChanServ sets mode: +o rewby|backup |
16:27:04 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
17:58:55 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
20:19:09 | | balrog quits [Client Quit] |
20:19:50 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
20:24:03 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
20:24:19 | <michaelblob_> | pokechu22: i'm running into issues with theiphonewiki.com and wiki.ippk.ru, maybe you could give them a try |
20:24:40 | <michaelblob_> | wiki.ippk.ru i get to /index.php?title=Special%3AExport&pages=%D0%A6%D0%B8%D1%84%D1%80%D0%BE%D0%B2%D1%8B%D0%B5_%D0%BA%D0%B0%D0%BD%D0%B8%D0%BA%D1%83%D0%BB%D1%8B_2009._%D0%97%D0%B0%D0%BE%D1%87%D0%BD%D1%8B%D0%B9_%D1%82%D1%83%D1%80&action=submit&offset=1&limit=1000 |
20:25:08 | <michaelblob_> | which resolves when i copy it into a browser but times out in cli |
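The `pages` value in that request is a percent-encoded UTF-8 page title. Decoding it with Python's standard `urllib.parse` shows which page the export was on: Цифровые_каникулы_2009._Заочный_тур, roughly "Digital Holidays 2009. Correspondence Round". This is a small sketch that only decodes the URL. It does not diagnose the CLI timeout.

```python
from urllib.parse import urlsplit, parse_qs

# Path and query string copied verbatim from the request quoted above.
url = ("/index.php?title=Special%3AExport&pages=%D0%A6%D0%B8%D1%84%D1%80%D0%BE%D0%B2"
       "%D1%8B%D0%B5_%D0%BA%D0%B0%D0%BD%D0%B8%D0%BA%D1%83%D0%BB%D1%8B_2009._%D0%97"
       "%D0%B0%D0%BE%D1%87%D0%BD%D1%8B%D0%B9_%D1%82%D1%83%D1%80"
       "&action=submit&offset=1&limit=1000")

# parse_qs percent-decodes each value (as UTF-8) and returns lists of values.
params = parse_qs(urlsplit(url).query)
for key, values in params.items():
    print(key, "=", values[0])
```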
20:25:28 | <michaelblob_> | theiphonewiki just borks after a couple cycles with xmlrevisions :/ |
20:26:16 | <pokechu22> | based on the 403s I'm seeing on archivebot there might be rate-limiting? |
20:26:24 | | balrog (balrog) joins |
20:28:34 | <michaelblob_> | for theiphonewiki? that would make sense |
20:29:56 | <pokechu22> | You can try something like --delay 1 (which I think is 1 second). For 117,232 revisions that'd be 2344.64 seconds of delay (in practice more because of transfer time and because of how xmlrevisions interacts with namespaces) |
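The 2344.64 s figure works out to 117,232 / 50 × 1 s, which suggests the estimate assumes one delay per API request with about 50 revisions fetched per request. The sketch below reproduces that back-of-envelope number. The batch size of 50 is an assumption inferred from the arithmetic, and transfer time and namespace overhead are ignored, so real runs take longer.

```python
def xmlrevisions_delay_seconds(revisions, delay_s, revisions_per_request=50):
    """Estimate total sleep time for an xmlrevisions run.

    Assumes one delay per API request and `revisions_per_request` revisions
    per request (50 is an assumption inferred from the log's arithmetic).
    Transfer time and per-namespace overhead are ignored, so this is a
    lower bound.
    """
    return revisions / revisions_per_request * delay_s


# Delays mentioned in the log: 1 s suggested, 2 s tried, 4 s suggested, 5 s running.
for delay in (1, 2, 4, 5):
    secs = xmlrevisions_delay_seconds(117_232, delay)
    print(f"--delay {delay}: ~{secs:,.0f} s (~{secs / 3600:.1f} h)")
```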
20:30:56 | <michaelblob_> | i started it back up with --delay 2 |
20:31:01 | <michaelblob_> | hopefully it goes smoothly this time |
20:33:50 | <pokechu22> | The other thing to note is that the delay also affects downloading images, where you'd end up getting 2 seconds for metadata then 2 seconds for the images themselves; at 1,161 images and 2s delay that's 4644 seconds as well (which seems very close to the delay for revisions themselves). Not much you can do about that though, just know that it'll end up being pretty slow there |
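The image arithmetic follows a simpler pattern: two delayed requests per image (metadata, then the file itself). Here is a sketch of that estimate using the counts from the message. Transfer time is again left out.

```python
def image_delay_seconds(images, delay_s, requests_per_image=2):
    """Sleep time for the image phase: one delay for the metadata request
    and one for the file download, per image, as described above.
    Transfer time is not included."""
    return images * requests_per_image * delay_s


total = image_delay_seconds(1_161, 2)
print(f"{total} s (~{total / 3600:.1f} h) of delay for 1,161 images at --delay 2")
```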
20:34:52 | <michaelblob_> | yeah i got blocked even with delay of 2s |
20:42:13 | <pokechu22> | You can try increasing it further, possibly 4s? |
20:42:19 | <pokechu22> | They don't seem to be using cloudflare |
20:45:41 | <pokechu22> | Oh, also, the process where it gets the page list *isn't* affected properly by delay, so rate-limiting can get pretty upset at that |
20:46:12 | <pokechu22> | I have a workaround for that but it's a bit of a mess |
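The page-list problem amounts to some requests going out without the configured delay. One generic way to make every request pay the delay is to route all of them through a single throttle that enforces a minimum gap between calls. `Throttle` is a hypothetical helper, not pokechu22's workaround and not part of the dump tool, and the actual HTTP request is left out.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive wait() calls (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the logic can be tested without sleeping
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call wait() before *every* request, page-list requests included.
throttle = Throttle(0.1)
for title in ["Main_Page", "Special:AllPages"]:
    throttle.wait()
    print("would request", title)  # the real request would go here
```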
20:52:19 | <michaelblob_> | i get blocked by cloudfront :/ |
21:10:38 | <pokechu22> | I also note the rate-limiting on archivebot was only affecting https://www.theiphonewiki.com/w/index.php and not https://www.theiphonewiki.com/wiki/Special:WhatLinksHere so it might be possible to skip it by using https://www.theiphonewiki.com/wiki/Special:Export instead of https://www.theiphonewiki.com/w/index.php?title=Special:Export - I'll experiment |
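Here is a sketch of the two URL forms being compared, built with `urllib.parse.urlencode`. The `pages` (newline-separated titles) and `action=submit` parameters mirror the Special:Export request quoted earlier in the log. Whether the `/wiki/Special:Export` short-URL form accepts the same query parameters on this wiki is an assumption that still needs testing. `export_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://www.theiphonewiki.com"


def export_urls(pages):
    """Build the same Special:Export request via index.php and via the /wiki/ path."""
    params = {"pages": "\n".join(pages), "action": "submit"}
    # Form that appeared to be rate-limited on archivebot.
    via_index = f"{BASE}/w/index.php?" + urlencode({"title": "Special:Export", **params})
    # Short-URL form; assumed (not confirmed) to accept the same parameters.
    via_short = f"{BASE}/wiki/Special:Export?" + urlencode(params)
    return via_index, via_short


for u in export_urls(["Main_Page"]):
    print(u)
```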
21:12:11 | <pokechu22> | looks like it got mad after 143 instances of 1-2s, hmm |
21:53:10 | <michaelblob_> | alright delay 5s seems to still be running |
21:53:19 | <michaelblob_> | i'll probably just leave it running overnight and check tomorrow |
21:53:26 | <michaelblob_> | hopefully this is slow enough to not get booted |
21:58:49 | <michaelblob_> | that being said, tcrf has 400k images :( |
21:58:52 | <pokechu22> | Alright, good luck |
21:59:11 | <pokechu22> | And, yeah, TCRF is big - you did see https://tcrf.net/Special:MediaStatistics beforehand, right? |
21:59:27 | <michaelblob_> | yep i have enough space to hold it all but even 0.5s delay for 400k images is gonna take days |
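For a rough sense of "days", apply the two-delays-per-image model from earlier in the log. That model is an assumption for TCRF and may not match it. Under it, 400k images at 0.5 s already means about 4.6 days of sleeping before any transfer time.

```python
SECONDS_PER_DAY = 86_400


def image_dump_days(images, delay_s, requests_per_image=2):
    """Days spent sleeping between requests; transfer time is not included.

    requests_per_image=2 (metadata + file) is carried over from the earlier
    estimate and is an assumption for this wiki.
    """
    return images * requests_per_image * delay_s / SECONDS_PER_DAY


for delay in (0.5, 1, 2):
    print(f"{delay}s delay: {image_dump_days(400_000, delay):.1f} days of delay alone")
```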
22:00:56 | <pokechu22> | https://ru.rodovid.org/ took about a week for me to download (though that also had other jank associated with it) |
22:10:12 | | hitgrr8 quits [Client Quit] |
23:08:21 | | Bedivere joins |
23:10:28 | | Sir_Bedivere quits [Ping timeout: 252 seconds] |
23:42:14 | | HackMii quits [Ping timeout: 276 seconds] |
23:42:23 | | HackMii (hacktheplanet) joins |