00:14:54 | <pokechu22> | I'm pretty sure I ran that a bit ago, and when I did run it there hadn't been any new content in a long time, hmm |
00:17:27 | <pokechu22> | https://archive.org/details/wiki-wikimaemoorg_202301 - downloaded near the end of January, and it looks like there were 242 edits in all of 2022 and 9 in January. Not sure if it's worth redownloading now or not (especially if the only new changes are vandalism) |
00:18:17 | <pabs> | probably not |
00:18:20 | <pabs> | https://wiki.linuxquestions.org/ |
00:35:34 | <pokechu22> | also done a few months ago: https://archive.org/details/wiki-wikilinuxquestionsorg |
00:45:17 | <pabs> | did the bootstrappable miraheze wiki get done ? |
00:51:35 | <pokechu22> | I haven't done https://wiki.bootstrapping.world yet - not sure if that's what you're referring to or not |
00:52:17 | <pabs> | bootstrapping.miraheze.org |
01:37:14 | | dave (dave) joins |
02:20:26 | | TheTechRobo quits [Client Quit] |
03:03:10 | | TheTechRobo (TheTechRobo) joins |
03:32:32 | <Bedivere> | so miraheze is going offline? |
04:09:07 | <pabs> | yes |
04:22:17 | | hitgrr8 joins |
04:32:10 | <pokechu22> | Pabs: I've downloaded all of the cppreference languages (including https://upload.cppreference.com/ and https://sq.cppreference.com/ which aren't listed on the english main page). I haven't uploaded them to archive.org yet though |
04:46:51 | | systwi quits [Read error: Connection reset by peer] |
04:46:52 | | systwi_ (systwi) joins |
04:59:10 | <pabs> | HN thread about miraheze https://news.ycombinator.com/item?id=36362547 |
04:59:59 | <pabs> | wow, some bad sysadmin described there... |
05:01:48 | <pabs> | also "As the former CFO and a former SRE for Miraheze, I'm going to try to save the farm" |
05:05:45 | <pabs> | a few wikis linked in that thread too |
05:09:42 | <pabs> | and there's a new thing similar to Miraheze called WikiTide |
05:34:38 | <Nemo_bis> | Miraheze has always been about taking on much more work than they could possibly sustain with volunteers in the long term |
05:35:14 | <Nemo_bis> | pokechu22: can you try dumping https://fiction.miraheze.org/wiki/Accueil ? towards the end (?) of the dump I'm getting ERROR: HTTPSConnectionPool(host='fiction.miraheze.org', port=443): Read timed out. (read timeout=30) |
05:35:45 | <pabs> | https://dokuwiki.tachtler.net |
05:35:53 | <Nemo_bis> | arkiver: please be careful with the warriors; HTML download produces much more load than our XML exports and we don't want to bring the farm down |
05:36:32 | <pokechu22> | Sure, I'll try it |
05:38:09 | <Nemo_bis> | From my testing, 20 concurrent launcher.py processes work fine now but at some point above that I got HTTP 429 on the API. I'm not sure we *always* handle retries correctly. |
05:38:23 | <Nemo_bis> | Thanks |
05:39:19 | <pokechu22> | IIRC we don't retry 429s or 403s |
05:40:21 | <pokechu22> | ah, no, handleStatusCode doesn't do anything special on 403s, but 404, 429, and 5xx all result in it aborting |
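[Editor's note: a minimal sketch of the behaviour pokechu22 describes above, not the actual wikiteam `handleStatusCode` code. The function name and return values here are illustrative only: 403 gets no special treatment, while 404, 429, and 5xx all cause the dump to abort rather than retry.]

```python
def handle_status_code(status):
    """Illustrative model: return 'abort' for codes the dumper gives
    up on, 'continue' for everything else (including 403)."""
    if status == 404 or status == 429 or status >= 500:
        return "abort"
    # 403 (and any other code) gets no special handling
    return "continue"
```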
06:14:45 | <Nemo_bis> | What is this https://soshirenmei.miraheze.org/wiki/Bienvenue_au_FNHWiki :D |
06:16:38 | <Nemo_bis> | For some reason, checkalive.py thinks it's HTTP. The wiki is fine though. |
07:13:02 | <Exorcism|m> | <pabs> "https://dokuwiki.tachtler.net" <- https://archive.org/details/wiki-dokuwiki.tachtler.net-20230617 _I haven't finished uploading because IA's bandwidth is a bit crap_ |
07:13:17 | <Exorcism|m> | s/'s// |
08:16:46 | | sepro quits [Ping timeout: 265 seconds] |
08:24:21 | <Nemo_bis> | pokechu22: could this be an unhandled HTTP 429? https://github.com/WikiTeam/wikiteam/issues/467 |
08:51:32 | | mattx433 (mattx433) joins |
09:29:26 | <Nemo_bis> | An easy way to help is to run checkalive.py or checkalive.pl and manually open a random sample of wikis to see what's there https://github.com/WikiTeam/wikiteam/issues/465#issuecomment-1595679475 |
09:30:16 | <Nemo_bis> | A lowly `shuf -n 20` can do wonders https://github.com/WikiTeam/wikiteam/commit/1b02cee1d52f481a9cb419a3c3628f12f9c61df4 |
09:30:49 | <Nemo_bis> | (For those who aren't on GNU/Linux and don't know what I'm talking about: https://www.gnu.org/software/coreutils/manual/html_node/shuf-invocation.html ) |
09:31:45 | <Nemo_bis> | (Yes, there's also some OpenBSD shuf it seems. No idea how widely available.) |
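[Editor's note: for readers without GNU coreutils, the `shuf -n 20` spot-check above can be approximated in Python with `random.sample`. The function name and file path here are illustrative; the wiki list file is whatever `checkalive.py` consumes.]

```python
import random

def sample_lines(path, n=20):
    """Pick up to n random non-empty lines from a file,
    roughly what `shuf -n 20 path` does."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return random.sample(lines, min(n, len(lines)))
```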
09:43:15 | | systwi_ quits [Read error: Connection reset by peer] |
09:44:25 | | systwi (systwi) joins |
10:16:38 | | Gereon quits [Ping timeout: 265 seconds] |
10:28:23 | | Gereon (Gereon) joins |
10:44:43 | <@arkiver> | Nemo_bis: yes, so we'll be careful |
10:45:02 | <@arkiver> | preserving the URLs in the Wayback Machine is important as well |
10:45:10 | <@arkiver> | but i agree it would be good to ensure an XML dump is made first |
10:45:16 | <@arkiver> | any idea when XML dumps might be done? |
11:30:02 | | TastyWiener95 quits [Ping timeout: 252 seconds] |
11:30:56 | | TastyWiener95 (TastyWiener95) joins |
13:05:30 | | that_lurker quits [Client Quit] |
13:05:56 | | that_lurker (that_lurker) joins |
17:47:12 | <pokechu22> | Nemo_bis: Possibly, in https://github.com/WikiTeam/wikiteam/blob/bf45fdec928aaa6dd579b602c67cb8165796c7fc/dumpgenerator.py#L826-L832 if site.api raises HTTPError and the code is not 405 then execution falls through and that would occur. Maybe the continue should be deindented by 1? Something looks wrong there |
20:36:14 | | sepro (sepro) joins |
21:17:05 | | hitgrr8 quits [Client Quit] |
21:51:44 | | Sir_Bedivere joins |
21:54:17 | | Bedivere quits [Ping timeout: 252 seconds] |
22:57:50 | | Naruyoko joins |
23:41:28 | | mikolaj|m joins |