02:33:20 | | Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] |
02:33:52 | | Terbium joins |
03:54:37 | <pabs> | https://wiki.calafou.org |
04:29:37 | <pokechu22> | ah, that's another one where it detects api.php as http://wiki.calafou.org/api.php and index.php as http://wiki.calafou.org/index.php, but those redirect to https, which does something to POST requests resulting in "ATTENTION: This wiki does not allow some parameters in Special:Export, therefore pages with large histories may be truncated". I really need to look into fixing that |
04:29:39 | <pokechu22> | (but `--index https://wiki.calafou.org/index.php --api https://wiki.calafou.org/api.php` should work in the meantime) |
04:30:58 | <pokechu22> | hmm, no, that seems to break in a different way in this case... dunno what's going on... |
04:57:04 | <pokechu22> | ... huh, the site gives invalid XML? That doesn't make any sense and doesn't match what I'm seeing in-browser... |
05:01:17 | <pokechu22> | ok, yeah, the site is scuffed - it yields </revision></page></mediawiki> for a normal POST to https://wiki.calafou.org/index.php/Special:Export but doesn't like a POST to https://wiki.calafou.org/index.php?action=submit&offset=1&limit=1000&pages=%27%27%27_mapas_de_flujos%27%27%27&title=Special%3AExport (why do we POST like that instead of including data in the request body?) |
05:02:05 | <pokechu22> | (and yields a </revision></mediawiki> in that case) |
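For illustration, the question above about putting the Special:Export parameters in the request body instead of the query string could look roughly like the sketch below. The parameter names and values are taken from the URL quoted above; `build_export_request` is a hypothetical helper, not a function from the dump tool, and whether this particular wiki handles a body-encoded POST any better is not established in this log.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(index_url, page_title, offset="1", limit=1000):
    """Build (but do not send) a Special:Export POST with form data in the body.

    Hypothetical helper for illustration only.
    """
    params = {
        "title": "Special:Export",
        "action": "submit",
        "pages": page_title,
        "offset": offset,
        "limit": limit,
    }
    # Encode the parameters as a form body instead of appending them to the URL.
    body = urlencode(params).encode("utf-8")
    return Request(
        index_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_export_request(
    "https://wiki.calafou.org/index.php", "'''_mapas_de_flujos'''"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Nothing is sent here; passing `req` to `urllib.request.urlopen` would perform the actual request.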
05:04:03 | <pokechu22> | Looks like this is just a regression that happened at some point in the past and was later fixed - I'm surprised I never ran into it before now though: https://phabricator.wikimedia.org/T207974 |
05:18:26 | <yzqzss|m> | pokechu22: Same as https://github.com/saveweb/wikiteam3/issues/6 |
05:19:00 | <pokechu22> | Yeah, --xmlrevisions seems to be working fine here |
05:23:48 | <pokechu22> | hmm, grep -c '<revision>' *.xml gives 3744 while Special:Statistics says there should be 4,981 revisions - that's worrying |
05:25:34 | <yzqzss|m> | grep -E '<revision(.*?)>' *.xml |
05:25:41 | | hitgrr8 joins |
05:32:11 | <pokechu22> | `grep -cE '<revision(.*?)>' *.xml` gives 3744 too |
05:32:33 | <pokechu22> | as does `grep -c '<revision' *.xml` |
05:32:57 | <pokechu22> | This isn't the only wiki where the number seems to be a bit lower, but I assumed for others it might be lower due to deleted articles or something, which probably isn't the case here? |
05:59:11 | <yzqzss|m> | I also got 3744 by using https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/python3/tools/xml2titles.py . |
05:59:23 | <yzqzss|m> | Perhaps the webmaster forgot to run initSiteStats.php after some maintenance operations. |
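A rough Python equivalent of the grep counts above, as a minimal sketch: it counts opening `<revision` tags across the `*.xml` files in a directory. `count_revisions` and `count_revisions_in_dir` are hypothetical names, not part of any tool mentioned here. Unlike `grep -c`, which counts matching lines, this counts every occurrence; the two agree as long as each `<revision>` tag sits on its own line in the dump.

```python
import re
from pathlib import Path

# Matches "<revision>" or "<revision ...>", but not a longer tag name.
REVISION_OPEN = re.compile(r"<revision[\s>]")


def count_revisions(text):
    """Count opening <revision> tags in a chunk of dump XML."""
    return len(REVISION_OPEN.findall(text))


def count_revisions_in_dir(directory="."):
    """Sum the revision counts over every *.xml file in a directory."""
    total = 0
    for path in sorted(Path(directory).glob("*.xml")):
        # Tolerate stray bytes; only the tag count matters here.
        total += count_revisions(path.read_text(encoding="utf-8", errors="replace"))
    return total


if __name__ == "__main__":
    print(count_revisions_in_dir("."))
```

The result can then be compared by hand against the figure shown on Special:Statistics.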
13:31:05 | | Ryz quits [Ping timeout: 252 seconds] |
13:39:49 | | Ryz (Ryz) joins |
17:23:35 | <pokechu22> | If I change getXMLPageCore to add `xml = xml.replace('</revision>\n</mediawiki>', '</revision>\n </page>\n</mediawiki>')` then I can dump using Special:Export and end up with 3745 revisions (so, 1 extra revision, probably because someone edited). So this is probably just a case of incorrect stats. |
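The workaround described above can be tried in isolation. This is only a sketch: `repair_truncated_export` is a hypothetical standalone function, not the actual change to getXMLPageCore, and it assumes the broken response ends exactly as quoted, with `</revision>` followed directly by `</mediawiki>`.

```python
import xml.etree.ElementTree as ET

# Strings taken from the replace() call quoted in the message above.
BROKEN_TAIL = "</revision>\n</mediawiki>"
FIXED_TAIL = "</revision>\n </page>\n</mediawiki>"


def repair_truncated_export(xml):
    """Re-insert the </page> close tag that the broken export omits.

    Hypothetical helper; well-formed exports pass through unchanged
    because they never contain BROKEN_TAIL.
    """
    return xml.replace(BROKEN_TAIL, FIXED_TAIL)


broken = (
    "<mediawiki>\n<page>\n<title>Example</title>\n"
    "<revision>\n<id>1</id>\n</revision>\n</mediawiki>"
)
fixed = repair_truncated_export(broken)
# Parsing raises xml.etree.ElementTree.ParseError if the result is still malformed.
root = ET.fromstring(fixed)
print(len(root.findall("./page/revision")))
```

A plain string replace keeps the change small, but it silently does nothing if the server's whitespace differs from the quoted pattern.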
19:58:52 | | Bedivere joins |
20:00:29 | | Sir_Bedivere quits [Ping timeout: 252 seconds] |
21:39:34 | | hitgrr8 quits [Client Quit] |
22:59:14 | | TheTechRobo quits [Read error: Connection reset by peer] |
23:01:13 | | TheTechRobo (TheTechRobo) joins |