03:51:28TheTechRobo quits [Remote host closed the connection]
03:51:50TheTechRobo (TheTechRobo) joins
03:52:29Craigle quits [Quit: The Lounge - https://thelounge.chat]
03:54:13Craigle (Craigle) joins
04:07:58TheTechRobo quits [Remote host closed the connection]
04:08:20TheTechRobo (TheTechRobo) joins
04:32:24TheTechRobo quits [Remote host closed the connection]
04:32:50TheTechRobo (TheTechRobo) joins
09:39:05igloo22225 quits [Client Quit]
09:39:40igloo22225 (igloo22225) joins
09:44:56tech_exorcist (tech_exorcist) joins
12:07:59tech_exorcist quits [Remote host closed the connection]
12:08:24tech_exorcist (tech_exorcist) joins
12:38:50Iki1 joins
12:41:46Iki quits [Ping timeout: 240 seconds]
13:14:12Matthww1 quits [Ping timeout: 265 seconds]
13:29:48Matthww1 joins
16:04:16kdqep quits [Ping timeout: 240 seconds]
21:30:30qwertyasdfuiopghjkl joins
22:00:20kdqep (kdqep) joins
22:10:51<@JAA>Could someone dump Rodovid please? It consists of 25 wikis. Seven of them have been dumped before, but only once, a decade ago (Apr 2012)... Links are at the bottom of https://www.rodovid.org/
22:11:43<@JAA>It's an ancient version of MW, so that might cause some troubles.
22:12:02tech_exorcist quits [Client Quit]
22:12:04<@JAA>1.9.3 from 2007
22:14:41<@JAA>Rate limiting is in place as well, it appears.
22:39:40<pokechu22>"Crawl-delay:9" according to robots.txt
22:41:27<pokechu22>https://en.rodovid.org/wk/Special:Statistics says 186,265 total pages... and also "a total of 284 page views" which is good. Also https://en.rodovid.org/wk/Special:Version gives MediaWiki 1.9.3 from circa 2008...
22:44:21<pokechu22>Special:Export is restricted, https://en.rodovid.org/api.php seems to exist. Though https://en.rodovid.org/wk/Special:Recentchanges indicates that there's also a https://en.rodovid.org/wk/Special:Changedrecords which seems to be specialized... example https://en.rodovid.org/wk/Person:1445866
22:51:49<@JAA>I believe those are just specially formatted pages. The API seems to indicate they use XML.
22:52:05<@JAA>See e.g. https://en.rodovid.org/api.php?action=query&prop=revisions&pageids=81887&rvprop=timestamp|user|comment|content
22:52:18<@JAA>But yeah, I wonder whether the existing tooling can handle that.
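For reference, the revisions query linked above can be rebuilt programmatically. This is only a sketch: the helper name `revisions_url` is made up for illustration, and the parameters are exactly those in the URL quoted in the log (no `format` or continuation handling, which a real dump would need).

```python
from urllib.parse import urlencode

API_URL = "https://en.rodovid.org/api.php"  # endpoint quoted in the log


def revisions_url(page_id, api_url=API_URL):
    """Build the action=query&prop=revisions URL for one page ID.

    Hypothetical helper; it only assembles the URL and sends nothing.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "pageids": page_id,
        "rvprop": "timestamp|user|comment|content",
    }
    # safe="|" keeps the pipe separators literal, as in the URL from the log.
    return api_url + "?" + urlencode(params, safe="|")


if __name__ == "__main__":
    print(revisions_url(81887))
```

Whether the existing tooling can parse what this returns on MediaWiki 1.9.3 is the open question raised above; the sketch does not answer it.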
22:57:46<pokechu22>https://www.mediawiki.org/wiki/API:Userinfo is from 1.11, seems like mwclient depends on that (and also wants 1.16?)
23:00:15<pokechu22>https://github.com/mwclient/mwclient/blame/6e664d98c24ce783d67a40efebda5e6c9e3379e8/README.md makes me think that getting it to work with wikiteam tools isn't going to be possible; there doesn't seem to be *any* version that supports mediawiki that old
23:01:19<pokechu22>(see also https://github.com/mwclient/mwclient/commit/74624b19597cec73f3196ba6e484d832d7243b5a)
23:04:27igloo22225 quits [Read error: Connection reset by peer]
23:04:50igloo22225 joins
23:05:54<pokechu22>Nothing in https://archive.org/search.php?query=rodovid%20wikiteam seems to be a good archive - they only contain a titles.txt file
23:12:22<@JAA>Ah :-(
23:17:51<pokechu22>If you want to try to do an !a < list, https://en.rodovid.org/wk?title=Special%3AAllpages&from=&namespace=0 with the other namespaces should probably capture everything... but not the page source, since e.g. https://en.rodovid.org/edit/Family:10000 is restricted
23:19:22<pokechu22>I'll build a list of all namespaces, one sec
23:20:14<@JAA>Yeah, page source would need to be fetched via the API.
23:21:50<@JAA>I think I'll put this on my list of todos. I'd want to do a proper complete archive.
23:22:05<@JAA>An !a < job for decent coverage would still be nice though.
23:23:11<pokechu22>https://en.rodovid.org/api.php nicely gives a list of valid namespaces near list=allpages
23:24:11<@JAA>Right, so does https://en.rodovid.org/wk/Special:Allpages with a bit of grepping. :-P
23:24:58<pokechu22>The first time I did it I just manually copied URLs after changing the dropdown... it's nice to have a clean comma-separated list
23:25:26<pokechu22>https://transfer.archivete.am/jCnmi/en.rodovid.org_all_pages_seed.txt
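A seed list like this could be generated roughly as follows. This is a sketch, not how the linked file was actually produced: the function name and the namespace IDs in the usage line are placeholders, and the real IDs would come from api.php or Special:Allpages as discussed above.

```python
from urllib.parse import urlencode


def allpages_seed_urls(base_url, namespaces):
    """Return one Special:Allpages URL per namespace ID.

    base_url is the wiki root, e.g. "https://en.rodovid.org".
    namespaces is any iterable of namespace IDs (assumed, not verified here).
    """
    urls = []
    for ns in namespaces:
        query = urlencode({"title": "Special:Allpages", "from": "", "namespace": ns})
        urls.append(f"{base_url}/wk?{query}")
    return urls


if __name__ == "__main__":
    # Placeholder namespace IDs for illustration only.
    for url in allpages_seed_urls("https://en.rodovid.org", [0, 1, 2]):
        print(url)
```

For namespace 0 this reproduces the URL form given earlier in the log (`title=Special%3AAllpages&from=&namespace=0`).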
23:27:07<pokechu22>You'd want to ignore /edit/ but not /history/ probably...
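An ignore rule of that shape might look like the regex below. It is a sketch under assumptions: the pattern is illustrative rather than a tested ArchiveBot ignore, and the /history/ URL in the usage lines is an assumed form, since the log only names the path prefix.

```python
import re

# Matches /edit/ URLs on any rodovid.org subdomain; /history/ is left alone.
EDIT_IGNORE = re.compile(r"^https?://[^/]+\.rodovid\.org/edit/")


def should_ignore(url):
    """Return True if the URL is an edit page that should be skipped."""
    return EDIT_IGNORE.match(url) is not None


if __name__ == "__main__":
    print(should_ignore("https://en.rodovid.org/edit/Family:10000"))     # edit page
    print(should_ignore("https://en.rodovid.org/history/Family:10000"))  # assumed URL form
```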
23:28:30<pokechu22>Hmm, based on https://en.rodovid.org/wk/Special:Changedrecords vs https://en.rodovid.org/wk?title=Special:Changedrecords&onlylocal=1 doing the English one should cover most of the content, but maybe not in a super useful manner
23:29:11<@JAA>Some pages don't have /history/. :-(
23:29:53<@JAA>Ah, those are pages associated with another wiki.
23:30:00<pokechu22>Hmm, I think ones that aren't local don't have /history/ - interesting.
23:30:04<@JAA>E.g. https://en.rodovid.org/wk/Person:1390527 = https://ru.rodovid.org/wk/%D0%97%D0%B0%D0%BF%D0%B8%D1%81%D1%8C:1390527
23:31:44<@JAA>'Its data is periodically archived by the WikiTeam project at the Internet Archive.' per English Wikipedia... lol
23:31:47<pokechu22>The "In other languages" sidebar link there is both useful and probably problematic since it'd result in lots of duplication for those records that aren't native to a specific wiki
23:32:42<@JAA>Also stored on servers in Kyiv per enwiki. If that's true... Well, let's just say this just got a bit more interesting and important than I thought a few minutes ago.
23:34:43<pokechu22>https://en.rodovid.org/wk/About_Rodovid does say the maintainers are from Ukraine
23:37:17<@JAA>Seems to be hosted at AWS, unless they just proxy it through.
23:37:23<pokechu22>https://engine.rodovid.org/wk/Special:Statistics is probably a good starting point. It also looks like Special:AllPages only lists entries that exist locally... as https://engine.rodovid.org/wk?title=Special:Allpages&namespace=1000 has 9 pages of 1000 entries each, which seems close to 12,555 total pages
23:38:17<@JAA>Yeah, the Russian one is by far the largest, 777k pages.
23:38:56<@JAA>AB would still retrieve the pages in other languages as outlinks unless ignored, but when ignored, the browsability will suffer significantly.
23:39:20<pokechu22>Hmm, I'm less sure about that actually, https://es.rodovid.org/wk/Persona:326677 and https://en.rodovid.org/wk/Person:326677 both exist but https://engine.rodovid.org/wk/Person:326677 doesn't
23:40:57<pokechu22>https://en.rodovid.org/wk/Person:1419167 / https://ru.rodovid.org/wk/%D0%97%D0%B0%D0%BF%D0%B8%D1%81%D1%8C:1419167 - the ru Special:AllPages does have a longer list than the en Special:AllPages. Maybe engine.rodovid.org uses a separate database since it acts like a test site?
23:41:49<pokechu22>I should try and get wikiteam tools to work - the place where mwclient fails is for user info which is probably irrelevant for us since we don't log in to make edits. Maybe things'll work if I just rip out that check? No way that can end poorly :P
23:42:18<@JAA>Also, it implies at the top that mwclient is only required for --xmlrevisions.
23:43:25<pokechu22> --help says --xmlrevisions is MediaWiki 1.27+ only (now I'm confused as to what makes that different from --xml...)
23:43:46Sluggs quits [Ping timeout: 240 seconds]
23:47:51<pokechu22>Oh boy, https://www.mediawiki.org/wiki/API:Siteinfo is MW 1.8, even more stuff to rip out...
23:48:42<@JAA>But this has 1.9.3...?
23:49:38<@JAA>And meta=siteinfo exists on its API.
23:49:41<pokechu22>... hm... oh, my attempt at editing it locally failed
23:49:59<pokechu22>I do like how https://engine.rodovid.org/api.php?version=1 shows "ApiQueryRevisions: $Id$"
23:50:49<@JAA>:-)
23:51:02<pokechu22>(it = mwclient, and my attempt was copying /home/pokechu22/python2-env/lib/python2.7/site-packages/mwclient/ so that there was a mwclient/ directory next to dumpgenerator.py... which seemed to work at first, but now isn't working?)
23:52:00<pokechu22>No, wait, that worked, I'm just confused now
23:52:18<@JAA>That should work, yeah.
23:52:46<@JAA>Python prefers the cwd over the site and dist package dirs on imports.
23:53:23<@JAA>The details are a bit complicated though as it depends on how you invoke Python. And I have no idea how it works exactly in Python 2. I closed that chapter a long time ago and erased any information about it from my brain. :-)
23:54:48<@JAA>But yeah, it might depend on whether you do `python script.py`, `python -c 'import mwclient'`, or just `python` for the REPL.
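The general mechanism is that imports are resolved by walking `sys.path` in order, so whichever directory comes first wins. With `python script.py` the first entry is the script's directory; with `python -c` or the REPL it is the current directory. The sketch below shows only the ordering rule, in Python 3 (the log is about Python 2.7, where the details may differ), and the package name `localcopy_demo` is made up.

```python
import importlib
import os
import sys
import tempfile

NAME = "localcopy_demo"  # hypothetical package name, used only for this demo


def make_package(parent_dir, label):
    """Create parent_dir/NAME/__init__.py that records which copy it is."""
    pkg_dir = os.path.join(parent_dir, NAME)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(f"WHERE = {label!r}\n")


def demo():
    """Put two copies of the same package on sys.path and import it."""
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        make_package(first, "first on sys.path")
        make_package(second, "later on sys.path")
        sys.path[:0] = [first, second]  # prepend both, `first` ahead of `second`
        try:
            importlib.invalidate_caches()
            sys.modules.pop(NAME, None)
            return importlib.import_module(NAME).WHERE
        finally:
            sys.path.remove(first)
            sys.path.remove(second)
            sys.modules.pop(NAME, None)


if __name__ == "__main__":
    print(demo())         # the copy from the earlier sys.path entry
    print(sys.path[0])    # script directory when run as `python script.py`
```

This is why a `mwclient/` directory placed next to dumpgenerator.py is picked up ahead of the site-packages copy when the script is run directly.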
23:55:28<pokechu22>https://en.rodovid.org/api.php?action=query&meta=siteinfo&siprop=general|namespaces works in my browser and *should* be what I'm sending, but I'm getting mwclient.errors.APIError: (u'unknown_meta', u"Unrecognised value for parameter 'meta'", u'incredibly long info message skipped')