02:33:20Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
02:33:52Terbium joins
03:54:37<pabs>https://wiki.calafou.org
04:29:37<pokechu22>ah, that's another one where it detects api.php as http://wiki.calafou.org/api.php and index.php as http://wiki.calafou.org/index.php, but those redirect to https, which does something to POST requests resulting in "ATTENTION: This wiki does not allow some parameters in Special:Export, therefore pages with large histories may be truncated". I really need to look into fixing that
04:29:39<pokechu22>(but `--index https://wiki.calafou.org/index.php --api https://wiki.calafou.org/api.php` should work in the meantime)
04:30:58<pokechu22>hmm, no, that seems to break in a different way in this case... dunno what's going on...
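One plausible reading of "does something to POST requests" above: many HTTP clients, python-requests included, re-issue a POST as a GET after a 301 or 302 redirect and drop the form body. The log does not say which status code the wiki returns, so this is only a sketch of that general client behaviour; `method_after_redirect` is a hypothetical helper, not part of any dumper.

```python
def method_after_redirect(method: str, status: int) -> str:
    """Return the HTTP method a typical client uses for the follow-up request.

    Hypothetical helper modelling common client behaviour; it makes no
    network calls and is not taken from the dumper's code.
    """
    method = method.upper()
    # 303 See Other: everything except HEAD becomes GET.
    if status == 303 and method != "HEAD":
        return "GET"
    # 302 Found: browsers and most libraries also switch to GET.
    if status == 302 and method != "HEAD":
        return "GET"
    # 301 Moved Permanently: a POST is commonly rewritten to GET.
    if status == 301 and method == "POST":
        return "GET"
    # 307/308 (and anything else) keep the original method and body.
    return method


if __name__ == "__main__":
    for status in (301, 302, 303, 307, 308):
        print(status, method_after_redirect("POST", status))
```

If the http-to-https redirect is a 301 or 302, a Special:Export POST would arrive as a GET without its parameters, which would fit the "does not allow some parameters" warning. Passing the https URLs directly avoids the redirect altogether.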
04:57:04<pokechu22>... huh, the site gives invalid XML? That doesn't make any sense and doesn't match what I'm seeing in-browser...
05:01:17<pokechu22>ok, yeah, the site is scuffed - it yields </revision></page></mediawiki> for a normal POST to https://wiki.calafou.org/index.php/Special:Export but doesn't like a POST to https://wiki.calafou.org/index.php?action=submit&offset=1&limit=1000&pages=%27%27%27_mapas_de_flujos%27%27%27&title=Special%3AExport (why do we POST like that instead of including data in the request body?)
05:02:05<pokechu22>(and yields a </revision></mediawiki> in that case)
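The two POST styles being compared can be sketched with the standard library alone. The parameter names and values are copied from the URL quoted above; the helper names `as_query_string` and `as_form_body` are hypothetical, and nothing here sends a request.

```python
from urllib.parse import urlencode

INDEX = "https://wiki.calafou.org/index.php"

# Parameters taken from the URL quoted in the log, in the same order.
EXPORT_PARAMS = {
    "action": "submit",
    "offset": "1",
    "limit": "1000",
    "pages": "'''_mapas_de_flujos'''",
    "title": "Special:Export",
}


def as_query_string(index: str, params: dict) -> str:
    """Style seen in the log: everything is encoded into the URL."""
    return index + "?" + urlencode(params)


def as_form_body(index: str, params: dict) -> tuple:
    """Alternative: a bare URL plus an application/x-www-form-urlencoded body."""
    return index, urlencode(params).encode("ascii")


if __name__ == "__main__":
    print(as_query_string(INDEX, EXPORT_PARAMS))
    url, body = as_form_body(INDEX, EXPORT_PARAMS)
    print(url)
    print(body.decode("ascii"))
```

The encoded parameters are identical in both cases; only their location differs. Whether this wiki would answer correctly with the parameters in the body is not established in the log.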
05:04:03<pokechu22>Looks like this is just a regression that happened at some point in the past and was later fixed - I'm surprised I never ran into it before now though: https://phabricator.wikimedia.org/T207974
05:18:26<yzqzss|m>pokechu22: Same as https://github.com/saveweb/wikiteam3/issues/6
05:19:00<pokechu22>Yeah, --xmlrevisions seems to be working fine here
05:23:48<pokechu22>hmm, grep -c '<revision>' *.xml gives 3744 while Special:Statistics says there should be 4,981 revisions - that's worrying
05:25:34<yzqzss|m>grep -E '<revision(.*?)>' *.xml
05:25:41hitgrr8 joins
05:32:11<pokechu22>`grep -cE '<revision(.*?)>' *.xml` gives 3744 too
05:32:33<pokechu22>as does `grep -c '<revision' *.xml`
05:32:57<pokechu22>This isn't the only wiki where the number seems to be a bit lower, but I assumed for others it might be lower due to deleted articles or something, which probably isn't the case here?
05:59:11<yzqzss|m>I also got 3744 by using https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/python3/tools/xml2titles.py .
05:59:23<yzqzss|m>Perhaps the webmaster forgot to run initSiteStats.php after some maintenance operations.
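As a cross-check on the grep counts above: `grep -c` counts matching lines rather than matches, so it only equals the revision count when each `<revision>` opens on its own line. A rough alternative is to count elements with a streaming parser. This is a sketch under the assumption that the dump is well-formed XML; `count_revisions` is a hypothetical name and this is not the linked xml2titles.py.

```python
import io
import xml.etree.ElementTree as ET


def count_revisions(source) -> int:
    """Count <revision> elements in a MediaWiki XML export.

    `source` may be a filename or a binary file object. The dump is read
    incrementally, so large files do not have to fit in memory.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Export tags are namespaced ("{...}revision"); compare the local name.
        if elem.tag.rsplit("}", 1)[-1] == "revision":
            count += 1
        # Drop the element's text and children once it has been seen.
        elem.clear()
    return count


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>A</title>"
        b"<revision><id>1</id></revision>"
        b"<revision><id>2</id></revision>"
        b"</page></mediawiki>"
    )
    print(count_revisions(io.BytesIO(sample)))
```

A parser-based count that agrees with grep would point at the statistics page, not the dump, as the source of the mismatch.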
13:31:05Ryz quits [Ping timeout: 252 seconds]
13:39:49Ryz (Ryz) joins
17:23:35<pokechu22>If I change getXMLPageCore to add `xml = xml.replace('</revision>\n</mediawiki>', '</revision>\n </page>\n</mediawiki>')` then I can dump using Special:Export, and I end up with 3745 revisions (so, 1 extra revision, probably because someone edited in the meantime). So this is probably just a case of incorrect stats.
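The workaround above can be tried in isolation. The replacement string is copied from the message; the wrapper `repair_truncated_export`, the `is_well_formed` check and the sample document are hypothetical, and this is not the actual getXMLPageCore code.

```python
import xml.etree.ElementTree as ET


def repair_truncated_export(xml: str) -> str:
    """Re-insert the </page> tag the buggy export leaves out.

    Uses the same string replacement as quoted in the log.
    """
    return xml.replace(
        "</revision>\n</mediawiki>",
        "</revision>\n </page>\n</mediawiki>",
    )


def is_well_formed(xml: str) -> bool:
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Made-up minimal export with the closing </page> missing.
    broken = (
        "<mediawiki>\n<page>\n<title>T</title>\n"
        "<revision>\n<id>1</id>\n</revision>\n</mediawiki>"
    )
    print(is_well_formed(broken))
    print(is_well_formed(repair_truncated_export(broken)))
```

The replacement is purely textual, so it only fires when the truncated ending matches exactly, newline included; a correct export passes through unchanged.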
19:58:52Bedivere joins
20:00:29Sir_Bedivere quits [Ping timeout: 252 seconds]
21:39:34hitgrr8 quits [Client Quit]
22:59:14TheTechRobo quits [Read error: Connection reset by peer]
23:01:13TheTechRobo (TheTechRobo) joins