00:14:54<pokechu22>I'm pretty sure I ran that a bit ago, and when I did run it there hadn't been any new content in a long time, hmm
00:17:27<pokechu22>https://archive.org/details/wiki-wikimaemoorg_202301 - downloaded near the end of January, and it looks like there were 242 edits in all of 2022 and 9 in January. Not sure if it's worth redownloading now or not (especially if the only new changes are vandalism)
00:18:17<pabs>probably not
00:18:20<pabs>https://wiki.linuxquestions.org/
00:35:34<pokechu22>also done a few months ago: https://archive.org/details/wiki-wikilinuxquestionsorg
00:45:17<pabs>did the bootstrappable miraheze wiki get done ?
00:51:35<pokechu22>I haven't done https://wiki.bootstrapping.world yet - not sure if that's what you're referring to or not
00:52:17<pabs>bootstrapping.miraheze.org
01:37:14dave (dave) joins
02:20:26TheTechRobo quits [Client Quit]
03:03:10TheTechRobo (TheTechRobo) joins
03:32:32<Bedivere>so miraheze is going offline?
04:09:07<pabs>yes
04:22:17hitgrr8 joins
04:32:10<pokechu22>Pabs: I've downloaded all of the cppreference languages (including https://upload.cppreference.com/ and https://sq.cppreference.com/ which aren't listed on the english main page). I haven't uploaded them to archive.org yet though
04:46:51systwi quits [Read error: Connection reset by peer]
04:46:52systwi_ (systwi) joins
04:59:10<pabs>HN thread about miraheze https://news.ycombinator.com/item?id=36362547
04:59:59<pabs>wow, some bad sysadmin described there...
05:01:48<pabs>also "As the former CFO and a former SRE for Miraheze, I'm going to try to save the farm"
05:05:45<pabs>a few wikis linked in that thread too
05:09:42<pabs>and there's a new thing similar to miraheze called WikiTide
05:34:38<Nemo_bis>Miraheze has always been about taking on much more work than they could possibly sustain with volunteers in the long term
05:35:14<Nemo_bis>pokechu22: can you try dumping https://fiction.miraheze.org/wiki/Accueil ? towards the end (?) of the dump I'm getting ERROR: HTTPSConnectionPool(host='fiction.miraheze.org', port=443): Read timed out. (read timeout=30)
05:35:45<pabs>https://dokuwiki.tachtler.net
05:35:53<Nemo_bis>arkiver: please be careful with the warriors; HTML download produces much more load than our XML exports and we don't want to bring the farm down
05:36:32<pokechu22>Sure, I'll try it
05:38:09<Nemo_bis>From my testing, 20 concurrent launcher.py work fine now but at some point above that I got HTTP 429 on the API. I'm not sure we *always* handle retries correctly.
05:38:23<Nemo_bis>Thanks
05:39:19<pokechu22>IIRC we don't retry 429s or 403s
05:40:21<pokechu22>ah, no, handleStatusCode doesn't do anything special on 403s, but 404, 429, and 5xx all result in it aborting
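A minimal sketch of what retrying on 429 (instead of aborting) could look like; fetch_with_backoff, the backoff parameters, and the example siteinfo call are assumptions for illustration, not the actual handleStatusCode logic in dumpgenerator.py.

```python
# Hypothetical sketch, not the real dumpgenerator.py / handleStatusCode code:
# retry on HTTP 429 and 5xx with backoff instead of aborting, while still
# treating 404 and other client errors as fatal, as described above.
import time
import requests

def fetch_with_backoff(session, url, params=None, max_retries=5, timeout=30):
    delay = 10
    for attempt in range(max_retries):
        r = session.get(url, params=params, timeout=timeout)
        if r.status_code == 200:
            return r
        if r.status_code == 429 or 500 <= r.status_code < 600:
            # honour Retry-After if the server sends a numeric value,
            # otherwise back off exponentially
            retry_after = r.headers.get("Retry-After", "")
            wait = int(retry_after) if retry_after.isdigit() else delay
            print("HTTP %d on %s, sleeping %ds (attempt %d/%d)"
                  % (r.status_code, url, wait, attempt + 1, max_retries))
            time.sleep(wait)
            delay *= 2
            continue
        r.raise_for_status()  # 404 and other client errors stay fatal
    raise RuntimeError("giving up on %s after %d retries" % (url, max_retries))

if __name__ == "__main__":
    s = requests.Session()
    s.headers["User-Agent"] = "retry-sketch/0.1"
    r = fetch_with_backoff(s, "https://fiction.miraheze.org/w/api.php",
                           params={"action": "query", "meta": "siteinfo",
                                   "format": "json"})
    print(r.json()["query"]["general"]["sitename"])
```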
06:14:45<Nemo_bis>What is this https://soshirenmei.miraheze.org/wiki/Bienvenue_au_FNHWiki :D
06:16:38<Nemo_bis>For some reason, checkalive.py thinks it's HTTP. The wiki is fine though.
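A quick way to sanity-check a result like that by hand; this is not how checkalive.py works, just a hedged stand-alone check that follows redirects from the plain-HTTP URL to see what the wiki actually serves.

```python
# Hedged sketch (not checkalive.py itself): confirm what scheme a wiki really
# serves by following redirects from the plain-HTTP URL.
import requests

def effective_scheme(host):
    r = requests.get("http://%s/wiki/" % host, timeout=30, allow_redirects=True)
    return r.url.split(":", 1)[0], r.status_code

if __name__ == "__main__":
    scheme, status = effective_scheme("soshirenmei.miraheze.org")
    # a wiki reported as "HTTP" may simply redirect to https and be fine
    print(scheme, status)
```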
07:13:02<Exorcism|m><pabs> "https://dokuwiki.tachtler.net" <- https://archive.org/details/wiki-dokuwiki.tachtler.net-20230617 _I haven't finished uploading because IA's bandwidth is a bit crap_
07:13:17<Exorcism|m>s/'s//
08:16:46sepro quits [Ping timeout: 265 seconds]
08:24:21<Nemo_bis>pokechu22: could this be an unhandled HTTP 429? https://github.com/WikiTeam/wikiteam/issues/467
08:51:32mattx433 (mattx433) joins
09:29:26<Nemo_bis>An easy way to help is to run checkalive.py or checkalive.pl and manually open a random sample of wikis to see what's there https://github.com/WikiTeam/wikiteam/issues/465#issuecomment-1595679475
09:30:16<Nemo_bis>A lowly `shuf -n 20` can do wonders https://github.com/WikiTeam/wikiteam/commit/1b02cee1d52f481a9cb419a3c3628f12f9c61df4
09:30:49<Nemo_bis>(For those who aren't on GNU/Linux and don't know what I'm talking about: https://www.gnu.org/software/coreutils/manual/html_node/shuf-invocation.html )
09:31:45<Nemo_bis>(Yes, there's also some OpenBSD shuf it seems. No idea how widely available.)
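For anyone without GNU shuf at hand, roughly the same sampling step can be done in a few lines of Python; the list-file name below is a placeholder, not a file from the wikiteam repo.

```python
# Rough Python stand-in for `shuf -n 20 wikilist.txt`: pick a random sample
# of wiki URLs from a plain-text list (one URL per line). The filename is a
# placeholder for whatever list checkalive.py produced.
import random
import sys

def sample_wikis(path, n=20):
    with open(path, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    return random.sample(urls, min(n, len(urls)))

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "wikilist.txt"
    for url in sample_wikis(path):
        print(url)
```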
09:43:15systwi_ quits [Read error: Connection reset by peer]
09:44:25systwi (systwi) joins
10:16:38Gereon quits [Ping timeout: 265 seconds]
10:28:23Gereon (Gereon) joins
10:44:43<@arkiver>Nemo_bis: yes, so we'll be careful
10:45:02<@arkiver>preserving the URLs in the Wayback Machine is important as well
10:45:10<@arkiver>but i agree it would be good to ensure an XML dump is made first
10:45:16<@arkiver>any idea when XML dumps might be done?
11:30:02TastyWiener95 quits [Ping timeout: 252 seconds]
11:30:56TastyWiener95 (TastyWiener95) joins
13:05:30that_lurker quits [Client Quit]
13:05:56that_lurker (that_lurker) joins
17:47:12<pokechu22>Nemo_bis: Possibly, in https://github.com/WikiTeam/wikiteam/blob/bf45fdec928aaa6dd579b602c67cb8165796c7fc/dumpgenerator.py#L826-L832 if site.api raises HTTPError and the code is not 405 then execution falls through and that would occur. Maybe the continue should be deindented by 1? Something looks wrong there
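To make the suspected control flow easier to see, here is a hypothetical reconstruction (not the actual dumpgenerator.py code): in the first variant only a 405 reaches the continue and any other HTTPError falls straight through, while the second shows the effect of deindenting the continue by one level so every HTTPError triggers a retry.

```python
# Hypothetical reconstruction of the control flow discussed above; HTTPError
# and fetch() are placeholders for site.api() and the surrounding retry loop,
# not the real wikiteam objects.

class HTTPError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def get_xml_suspected(fetch, max_retries=5):
    for attempt in range(max_retries):
        try:
            return fetch(use_api=True)
        except HTTPError as e:
            if e.code == 405:
                continue  # only 405 retries; a 429 "falls through" below
        return None  # reached after any non-405 HTTPError: error swallowed

def get_xml_deindented(fetch, max_retries=5):
    for attempt in range(max_retries):
        try:
            return fetch(use_api=True)
        except HTTPError as e:
            if e.code == 405:
                pass  # e.g. switch request style before retrying
            continue  # deindented by one level: every HTTPError retries,
                      # including 429, instead of dropping out of the loop
    return None
```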
20:36:14sepro (sepro) joins
21:17:05hitgrr8 quits [Client Quit]
21:51:44Sir_Bedivere joins
21:54:17Bedivere quits [Ping timeout: 252 seconds]
22:57:50Naruyoko joins
23:41:28mikolaj|m joins