00:11:13<pabs>arkiver: btw, is the DPoS you are working on here a continuous process, or one-shot for each wiki? i.e. do you use something like RecentChanges to update changed pages?
01:12:53davispuh quits [Ping timeout: 258 seconds]
04:35:24pabs quits [Ping timeout: 260 seconds]
04:51:06pabs (pabs) joins
05:58:33<@arkiver>pabs: continuous, but we may not use RecentChanges and just recapture the HTML
08:12:52<@arkiver>powdertoy.co.uk is fully supported
08:12:58<@arkiver>moving on to imslp.org
08:40:49<pabs>arkiver: and are the wikis to be captured ones that failed in #wikibot? or ones not yet put in #wikibot? and do you exclude large ones, like the 2TB https://retrocdn.net/ ?
08:41:32<pabs>I think RecentChanges is useful to re-capture btw, for navigation purposes
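For reference, a minimal sketch of how a RecentChanges-driven recapture could find changed pages through the MediaWiki API (`list=recentchanges`). The `api.php` location and the timestamp are hypothetical placeholders, and continuation (`rccontinue`) is not handled; this is an illustration, not how the Warrior project actually works.

```python
from urllib.parse import urlencode


def recentchanges_url(api_php: str, since_iso: str, limit: int = 500) -> str:
    """Build an API query for edits and page creations since since_iso.

    With rcdir=newer the list runs oldest-first, so rcstart is the earliest
    timestamp of interest. Paging via rccontinue is left out of this sketch.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|timestamp",
        "rctype": "edit|new",
        "rcdir": "newer",
        "rcstart": since_iso,
        "rclimit": str(limit),
        "format": "json",
    }
    return f"{api_php}?{urlencode(params)}"


def changed_titles(response: dict) -> list:
    """Unique, sorted page titles from a decoded recentchanges JSON response."""
    changes = response.get("query", {}).get("recentchanges", [])
    return sorted({c["title"] for c in changes})
```

Each returned title would then be a candidate for re-fetching its page HTML (and history, for navigation) rather than recrawling the whole wiki.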
08:49:36<@arkiver>pabs: this would run in addition to #wikibot , simply to get the wikis into the Wayback Machine as well, and that would include the 2 TB retrocdn.net
08:50:04<pabs>IIRC retrocdn.net failed there due to the size
08:51:18<@arkiver>we can get it here
08:51:23<@arkiver>well, with the Warrior project
08:51:48<@arkiver>i do want to be clear that this is not an effort to be run instead of #wikibot - #wikibot is very important to create dumps that can be used to put wikis back online
08:52:13<@arkiver>this Warrior project would be more of a "fill in the holes in the web archive" thing, since the Wayback Machine is easier to use to quickly find information in than the wiki dumps
09:04:54<pabs>sounds good to me
09:05:20<pabs>I usually only AB wikis when #wikibot fails and/or it's an ancient wiki
10:02:31nulldata-alt3 (nulldata) joins
10:03:49nulldata-alt quits [Ping timeout: 260 seconds]
10:03:49nulldata-alt3 is now known as nulldata-alt
10:05:21<@arkiver>pabs: do you have more examples of failing wikis?
10:08:53<pabs>many dokuwikis fail, https://mrakopedia.net is one I am doing in AB right now, failed in #wikibot
10:09:02<pabs>(it's a big one)
10:09:19<pabs>I'm sure I'll find more next time I do a big dump
10:10:37<pabs>this failed and it looks like I gave up on it https://tolkiengateway.net/wiki/Main_Page
10:11:25<pabs>a pukiwiki that failed http://plamo.linet.gr.jp/
10:11:50<pabs>failed and given up https://atl.wiki/
10:40:12<@arkiver>pabs: you say "dokuwiki", is that something other than mediawiki? mrakopedia.net still seems to be a mediawiki
10:41:36<@arkiver>interestingly the API seems not to be enabled for that one https://mrakopedia.net/w/api.php?action=query&meta=siteinfo
10:42:38<@arkiver>but looks like the wiki info is again available in the modules=startup script
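When `api.php` is disabled, a sketch of pulling basic wiki settings out of the ResourceLoader startup script instead. It assumes the settings appear as JSON key/value pairs inside an `mw.config.set({...})` call and that keys such as `wgServer` and `wgScriptPath` are present; which keys a given MediaWiki version actually emits may vary, and `example.org` is a placeholder.

```python
import json
import re

# A "wgName": <JSON scalar> pair, as assumed to appear inside mw.config.set({...}).
_PAIR = re.compile(r'"(wg[A-Za-z]+)"\s*:\s*("(?:[^"\\]|\\.)*"|-?\d+|true|false|null)')

WANTED = ("wgServer", "wgScriptPath", "wgArticlePath", "wgVersion")


def startup_config(script: str, wanted=WANTED) -> dict:
    """Pull selected wg* settings out of the startup module's JavaScript."""
    found = {}
    for name, raw in _PAIR.findall(script):
        if name in wanted and name not in found:
            found[name] = json.loads(raw)  # decode JSON escapes such as \/ or \u00e9
    return found


def script_url(cfg: dict, entry: str = "api.php", scheme: str = "https") -> str:
    """Build e.g. https://host/w/api.php from wgServer + wgScriptPath."""
    server = cfg["wgServer"]
    if server.startswith("//"):  # wgServer is often protocol-relative
        server = f"{scheme}:{server}"
    return f"{server}{cfg['wgScriptPath']}/{entry}"
```

Knowing `wgScriptPath` this way also gives the `index.php` location, which matters for wikis where the pretty `/wiki/` article URLs do not work.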
10:46:21ericgallager quits [Quit: This computer has gone to sleep]
11:45:48davispuh joins
12:22:50ericgallager joins
12:52:44<pabs>arkiver: yeah, dokuwiki is another wiki engine. there are many. wikibot only supports dokuwiki/mediawiki/pukiwiki. moinmoin is another notable but unsupported one https://wiki.archiveteam.org/index.php/MoinMoin
13:17:56<@arkiver>pabs: what is the difference between dokuwiki and mediawiki? dokuwiki seems to function roughly the same and has references to mediawiki
13:18:51<pabs>different software with different but similar features/API/etc
13:19:19<@arkiver>pabs: are the differences documented somewhere, or do you have some examples so i know what to look for or where to look?
13:22:00<pabs>they have different markup formats, URL structure etc.
13:22:04<pabs>some examples https://lacie-nas.org/doku.php?id=start https://sebsauvage.net/wiki/doku.php?id=accueil https://lunch.org.uk/start https://crashcourse.ca/
13:22:29<pabs>https://www.wikimatrix.org/compare/dokuwiki+mediawiki
13:22:35<pabs>https://news.ycombinator.com/item?id=40853759
13:22:39<pabs>^ couple of comparisons
13:23:41<pabs>mediawiki is definitely more popular
13:24:08<pabs>pukiwiki (the other one supported by wikibot) tends to be used by Japanese wikis
13:24:24<pabs>all three of them are written in PHP
13:24:42<pabs>https://www.wikimatrix.org/compare/dokuwiki+mediawiki+pukiwiki
13:26:14<pabs>there are lots of others, Wiki.js, MoinMoin, PmWiki, WikkaWiki come to mind. some more on https://www.wikimatrix.org/
13:26:45<pabs>also gitit
13:27:23<pabs>wow, 82 different types on https://www.wikimatrix.org/compare
13:41:51<@arkiver>damn :P
13:41:57<@arkiver>well we'll go with mediawiki initially
13:42:13<@arkiver>just the different versions of mediawiki already cause some annoyances
16:36:26<pokechu22>Note that there are some dokuwiki sites that use a vector skin that looks similar to mediawiki, but still internally function completely differently
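Since the skin can mislead, a rough sketch of guessing the engine from internals instead. It assumes the page carries a `<meta name="generator" ...>` tag with `name` before `content` (MediaWiki and DokuWiki normally emit one), and falls back to the `doku.php` URL structure seen in the examples above. Other engines such as PukiWiki are deliberately left as `"unknown"` here.

```python
import re

# <meta name="generator" content="..."> (assumes name comes before content)
_GENERATOR = re.compile(r'<meta[^>]+name="generator"[^>]+content="([^"]*)"', re.IGNORECASE)


def guess_engine(url: str, html: str) -> str:
    """Rough engine guess from internals rather than from how the skin looks."""
    m = _GENERATOR.search(html)
    generator = m.group(1).lower() if m else ""
    if generator.startswith("mediawiki"):
        return "mediawiki"
    if generator.startswith("dokuwiki") or "doku.php" in url:
        return "dokuwiki"
    return "unknown"
```

A vector-skinned DokuWiki would still report `DokuWiki` as its generator, so it would not be fed into MediaWiki-specific handling.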
16:36:38<pokechu22>Oh, right, another edge case, let's see if it's still up...
16:37:03<pokechu22>ah, no, it's fully dead by now: https://wiki.education.minecraft.net/
16:37:21<pokechu22>this used to have a skin that didn't link to history or edit (but the pages still worked). it also had a bad certificate
16:45:09tzt quits [Ping timeout: 260 seconds]
16:51:31<@arkiver>pokechu22: we do search for the action=* page in order to replace it with action=raw, etc.
16:51:57<pokechu22>Yeah, I don't think it linked any of those (unless you forced another skin with useskin=vector), checking...
16:52:05<pokechu22>I don't think it had the keyboard shortcuts enabled either
16:52:06<@arkiver>basically we try to reuse the context as found in the HTML and some API endpoints as much as possible, without relying on static stuff across the wikis
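A hypothetical sketch of the "find an `action=*` link in the HTML and swap in `action=raw`" step. Which parameters to keep (`title`, `oldid`) is an assumption; links without a `title` parameter, such as the `api.php?action=rsd` EditURI, are skipped because they do not point at a page.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed to still identify the page/revision once action=raw is used;
# everything else (section=, redlink=, offset=, ...) is dropped.
KEEP = ("title", "oldid")


def to_raw(url: str) -> Optional[str]:
    """Turn an index.php?title=...&action=<something> link into its action=raw form."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    keys = {k for k, _ in query}
    if "action" not in keys or "title" not in keys:
        return None  # e.g. api.php?action=rsd has no title, so it is not a page link
    kept = [(k, v) for k, v in query if k in KEEP]
    kept.append(("action", "raw"))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

As pokechu22 points out, this only works when the skin actually exposes such links; wikis that hide edit/history links need another source for the URL.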
16:57:10tzt (tzt) joins
17:10:24<pokechu22>ah right, this one was also problematic because /wiki/ URLs redirected, and only index.php worked. But https://web.archive.org/web/20220826024154/https://wiki.education.minecraft.net/index.php?title=3D_Printing_and_Minecraft is an example and didn't have any action= URLs other than <link rel="EditURI" type="application/rsd+xml" href="https://wiki.education.minecraft.net/api.php?action=rsd" />
17:26:36<pokechu22>there's also https://wiki.activeworlds.com/index.php?title=Main_Page where most of the pages are restricted, including history and the API, but https://wiki.activeworlds.com/index.php?title=Special:RecentChanges still works
17:26:38<pokechu22>(https://web.archive.org/web/20230223054907/https://wiki.activeworlds.com/index.php?hidebots=1&limit=50&days=7&enhanced=1&title=Special:RecentChanges&urlversion=2 had content), leading to e.g. https://wiki.activeworlds.com/index.php?title=8.1&curid=7287&diff=33999&oldid=33998, and you also have https://wiki.activeworlds.com/index.php?title=8.1&oldid=33998&action=raw. But these are pretty hard to discover.
17:27:09<pokechu22>looks like the page source provides wgCurRevisionId though
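Building on that, a sketch of constructing the `index.php?title=...&oldid=...&action=raw` URL straight from the page source when history and the API are restricted. It assumes `wgCurRevisionId` and `wgPageName` appear as JSON pairs in the inline config (`RLCONF={...}` in newer MediaWiki, `mw.config.set({...})` in older ones); the `index.php` location is passed in by the caller.

```python
import json
import re
from typing import Optional
from urllib.parse import urlencode

_REV = re.compile(r'"wgCurRevisionId"\s*:\s*(\d+)')
_PAGE = re.compile(r'"wgPageName"\s*:\s*("(?:[^"\\]|\\.)*")')


def raw_revision_url(index_php: str, html: str) -> Optional[str]:
    """Build index.php?title=...&oldid=...&action=raw from the page's inline JS config."""
    rev = _REV.search(html)
    page = _PAGE.search(html)
    if not (rev and page):
        return None
    title = json.loads(page.group(1))  # wgPageName uses underscores, e.g. Main_Page
    query = urlencode({"title": title, "oldid": rev.group(1), "action": "raw"})
    return f"{index_php}?{query}"
```

This yields the same URL shape as the `action=raw` example above without needing RecentChanges or diff links to discover the revision ID.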
17:36:15<@arkiver>this is very very good to know pokechu22
17:36:26<@arkiver>thanks a lot, the more edge cases we know about and i can test against, the better
17:40:33<Nemo_bis>we have a bunch in bug reports for wikiteam dumpgenerator ^^
17:45:11<@arkiver>Nemo_bis: do you have a link to that?
23:52:28<pabs>guess that is https://github.com/saveweb/wikiteam3/issues