00:11:13 | <pabs> | arkiver: btw, is the DPoS you are working on here a continuous process, or one-shot for each wiki? ie do you use something like RecentChanges to update changed pages? |
01:12:53 | | davispuh quits [Ping timeout: 258 seconds] |
04:35:24 | | pabs quits [Ping timeout: 260 seconds] |
04:51:06 | | pabs (pabs) joins |
05:58:33 | <@arkiver> | pabs: continuous, but we may not use RecentChanges and just recapture the HTML |
08:12:52 | <@arkiver> | powdertoy.co.uk is fully supported |
08:12:58 | <@arkiver> | moving on to imslp.org |
08:40:49 | <pabs> | arkiver: and are wikis that fail in #wikibot to be captured? or ones not yet put in #wikibot? and do you exclude large ones, like the 2TB https://retrocdn.net/ ? |
08:41:32 | <pabs> | I think RecentChanges is useful to re-capture btw, for navigation purposes |
08:49:36 | <@arkiver> | pabs: this would run in addition to #wikibot , simply to get the wikis into the Wayback Machine as well, and that would include the 2 TB retrocdn.net |
08:50:04 | <pabs> | IIRC retrocdn.net failed there due to the size |
08:51:18 | <@arkiver> | we can get it here |
08:51:23 | <@arkiver> | well, with the Warrior project |
08:51:48 | <@arkiver> | i do want to be clear that this is not an effort to be run instead of #wikibot - #wikibot is very important to create dumps that can be used to put wikis back online |
08:52:13 | <@arkiver> | this Warrior project would be more of a "fill in the holes in the web archive" thing, since the Wayback Machine is easier to use to quickly find information in than the wiki dumps |
09:04:54 | <pabs> | sounds good to me |
09:05:20 | <pabs> | I usually only AB wikis when #wikibot fails and/or it's an ancient wiki |
10:02:31 | | nulldata-alt3 (nulldata) joins |
10:03:49 | | nulldata-alt quits [Ping timeout: 260 seconds] |
10:03:49 | | nulldata-alt3 is now known as nulldata-alt |
10:05:21 | <@arkiver> | pabs: do you have more examples of failing wikis? |
10:08:53 | <pabs> | many dokuwikis fail, https://mrakopedia.net is one I am doing in AB right now, failed in #wikibot |
10:09:02 | <pabs> | (it's a big one) |
10:09:19 | <pabs> | I'm sure I'll find more next time I do a big dump |
10:10:37 | <pabs> | this failed and it looks like I gave up on it https://tolkiengateway.net/wiki/Main_Page |
10:11:25 | <pabs> | a pukiwiki that failed http://plamo.linet.gr.jp/ |
10:11:50 | <pabs> | failed and given up https://atl.wiki/ |
10:40:12 | <@arkiver> | pabs: you say "dokuwiki", is that something else than mediawiki? mrakopedia.net still seems like a mediawiki |
10:41:36 | <@arkiver> | interestingly the API seems not to be enabled for that one https://mrakopedia.net/w/api.php?action=query&meta=siteinfo |
10:42:38 | <@arkiver> | but looks like the wiki info is again available in the modules=startup script |
10:46:21 | | ericgallager quits [Quit: This computer has gone to sleep] |
11:45:48 | | davispuh joins |
12:22:50 | | ericgallager joins |
12:52:44 | <pabs> | arkiver: yeah, dokuwiki is another wiki engine. there are many. wikibot only supports dokuwiki/mediawiki/pukiwiki. moinmoin is another notable but unsupported one https://wiki.archiveteam.org/index.php/MoinMoin |
13:17:56 | <@arkiver> | pabs: what is the difference between dokuwiki and mediawiki? dokuwiki seems to function roughly the same and has references to mediawiki |
13:18:51 | <pabs> | different software with different but similar features/API/etc |
13:19:19 | <@arkiver> | pabs: are the differences documented somewhere, or do you have some examples so i know what to look for or where to look? |
13:22:00 | <pabs> | they have different markup formats, URL structure etc. |
13:22:04 | <pabs> | some examples https://lacie-nas.org/doku.php?id=start https://sebsauvage.net/wiki/doku.php?id=accueil https://lunch.org.uk/start https://crashcourse.ca/ |
13:22:29 | <pabs> | https://www.wikimatrix.org/compare/dokuwiki+mediawiki |
13:22:35 | <pabs> | https://news.ycombinator.com/item?id=40853759 |
13:22:39 | <pabs> | ^ couple of comparisons |
13:23:41 | <pabs> | mediawiki is definitely more popular |
13:24:08 | <pabs> | pukiwiki (the other one supported by wikibot) tends to be used by Japanese wikis |
13:24:24 | <pabs> | all three of them are written in PHP |
13:24:42 | <pabs> | https://www.wikimatrix.org/compare/dokuwiki+mediawiki+pukiwiki |
13:26:14 | <pabs> | there are lots of others, Wiki.js, MoinMoin, PmWiki, WikkaWiki come to mind. some more on https://www.wikimatrix.org/ |
13:26:45 | <pabs> | also gitit |
13:27:23 | <pabs> | wow, 82 different types on https://www.wikimatrix.org/compare |
13:41:51 | <@arkiver> | damn :P |
13:41:57 | <@arkiver> | well we'll go with mediawiki initially |
13:42:13 | <@arkiver> | just different versions of mediawiki already gives some annoyances |
16:36:26 | <pokechu22> | Note that there are some dokuwiki sites that use a vector skin that looks similar to mediawiki, but still internally function completely differently |
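[editor's sketch] Since a skin can make one engine look like another, a crawler probably should not judge by appearance. One possible approach, not confirmed as what either project does, is to read the generator meta tag that MediaWiki and DokuWiki normally emit. The function name detect_engine and the regular expression are hypothetical, and the sketch assumes the tag is present with name before content.

```python
import re

# Assumption: the page carries <meta name="generator" content="...">,
# with the name attribute appearing before the content attribute.
GENERATOR_RE = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def detect_engine(html: str) -> str:
    """Guess the wiki engine from the generator meta tag (hypothetical helper)."""
    match = GENERATOR_RE.search(html)
    if not match:
        return "unknown"
    generator = match.group(1).lower()
    # Only the two engines whose generator strings are assumed here are checked.
    for engine in ("mediawiki", "dokuwiki"):
        if engine in generator:
            return engine
    return "unknown"
```

A site that strips or rewrites the tag would come back as "unknown", so this would only be a first hint next to URL structure and API probing.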
16:36:38 | <pokechu22> | Oh, right, another edge case, let's see if it's still up... |
16:37:03 | <pokechu22> | ah, no, it's fully dead by now: https://wiki.education.minecraft.net/ |
16:37:21 | <pokechu22> | this used to have a skin that didn't link to history or edit (but the pages still worked). it also had a bad certificate |
16:45:09 | | tzt quits [Ping timeout: 260 seconds] |
16:51:31 | <@arkiver> | pokechu22: we do search for the action=* page in order to replace it with action=raw, etc. |
16:51:57 | <pokechu22> | Yeah, I don't think it linked any of those (unless you forced another skin with useskin=vector), checking... |
16:52:05 | <pokechu22> | I don't think it had the keyboard shortcuts enabled either |
16:52:06 | <@arkiver> | basically we try to reuse the context as found in the HTML and some API endpoints as much as possible, without relying on static stuff across the wikis |
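[editor's sketch] The action=* to action=raw replacement described above could look roughly like this. It is a minimal illustration under assumptions, not the Warrior project's actual code: the function name to_raw_url is hypothetical, and wiki.example.org is a placeholder host.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_raw_url(url: str) -> str:
    """Return the URL with any action= parameter replaced by action=raw."""
    parts = urlsplit(url)
    # Keep every query parameter found in the page except action.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "action"
    ]
    query.append(("action", "raw"))
    # urlencode percent-encodes values, so "Special:X" becomes "Special%3AX".
    return urlunsplit(parts._replace(query=urlencode(query)))


print(to_raw_url("https://wiki.example.org/index.php?title=Main_Page&action=history"))
```

Because the rewrite starts from a link found in the HTML, it carries over whatever script path and parameters that particular wiki uses instead of assuming a fixed layout.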
16:57:10 | | tzt (tzt) joins |
17:10:24 | <pokechu22> | ah right, this one was also problematic because /wiki/ URLs redirected, and only index.php worked. But https://web.archive.org/web/20220826024154/https://wiki.education.minecraft.net/index.php?title=3D_Printing_and_Minecraft is an example and didn't have any action= URLs other than <link rel="EditURI" type="application/rsd+xml" |
17:10:27 | <pokechu22> | href="https://wiki.education.minecraft.net/api.php?action=rsd" /> |
17:26:36 | <pokechu22> | there's also https://wiki.activeworlds.com/index.php?title=Main_Page where most of the pages are restricted, including history and the API, but https://wiki.activeworlds.com/index.php?title=Special:RecentChanges still works |
17:26:38 | <pokechu22> | (https://web.archive.org/web/20230223054907/https://wiki.activeworlds.com/index.php?hidebots=1&limit=50&days=7&enhanced=1&title=Special:RecentChanges&urlversion=2 had content), leading to e.g. https://wiki.activeworlds.com/index.php?title=8.1&curid=7287&diff=33999&oldid=33998, and you also have https://wiki.activeworlds.com/index.php?title=8.1&oldid=33998&action=raw. But these are |
17:26:40 | <pokechu22> | pretty hard to discover. |
17:27:09 | <pokechu22> | looks like the page source provides wgCurRevisionId though |
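[editor's sketch] Pulling wgCurRevisionId out of the page source and turning it into an action=raw URL might be done as below. This assumes the value appears in the JSON-style form "wgCurRevisionId":123, which other MediaWiki versions may not use. The helper names and the sample HTML string are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumption: the config is embedded as JSON-style "wgCurRevisionId":<digits>.
REVISION_RE = re.compile(r'"wgCurRevisionId"\s*:\s*(\d+)')


def current_revision_id(html: str) -> Optional[int]:
    """Return the current revision id found in the page source, if any."""
    match = REVISION_RE.search(html)
    return int(match.group(1)) if match else None


def raw_revision_url(index_php: str, title: str, rev_id: int) -> str:
    """Build an index.php URL requesting the raw wikitext of one revision."""
    params = {"title": title, "oldid": rev_id, "action": "raw"}
    return f"{index_php}?{urlencode(params)}"


sample = '<script>RLCONF={"wgPageName":"8.1","wgCurRevisionId":33998};</script>'
rev = current_revision_id(sample)
if rev is not None:
    print(raw_revision_url("https://wiki.activeworlds.com/index.php", "8.1", rev))
```

With the sample values this yields the same shape of URL as the oldid=33998&action=raw link quoted in the log.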
17:36:15 | <@arkiver> | this is very very good to know pokechu22 |
17:36:26 | <@arkiver> | thanks a lot, the more edge cases we know about and i can test against, the better |
17:40:33 | <Nemo_bis> | we have a bunch in bug reports for wikiteam dumpgenerator ^^ |
17:45:11 | <@arkiver> | Nemo_bis: do you have a link to that? |
23:52:28 | <pabs> | guess that is https://github.com/saveweb/wikiteam3/issues |