00:25:52 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
00:32:03 | | pabs (pabs) joins |
01:11:54 | | Megame quits [Read error: Connection reset by peer] |
02:08:03 | | balrog quits [Quit: Bye] |
02:09:02 | | balrog (balrog) joins |
03:14:17 | <DigitalDragons> | list of mediawikis found in some web crawling i did/am doing: https://pad.notkiska.pw/p/lx0jPi-GiQE9fWzVXBOO |
03:14:49 | <DigitalDragons> | thought it might be interesting to the people here |
03:19:53 | <DigitalDragons> | (wiki.archiveteam.org is in there too :P) |
03:45:47 | <fireonlive> | ;) |
06:33:26 | | pokechu22 quits [Ping timeout: 252 seconds] |
06:54:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
07:00:08 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
07:46:12 | <mako> | pabs: maybe you've already solved this but there are existing tools for parsing dumps and extracting outlinks from the xml dumps (it requires parsing the wikitext a bit)
07:47:09 | <pabs> | oh, hi mako :) |
07:47:12 | <pabs> | I haven't looked at it yet, maybe the WikiTeam page has something though |
07:48:08 | <pabs> | I did think that wikiteam/wikibot dumps should systematically have outlinks sent to the URLs project #// though |
07:49:39 | <pabs> | and I want to get the URLs/ArchiveBot projects to send all links to some filtering and from there to other projects |
07:51:09 | <pabs> | ah, the wiki page mentions a dormant wikis-grab Warrior project for grabbing outlinks |
07:52:47 | <pabs> | DigitalDragons: are you putting those through wikibot? |
07:56:13 | <mako> | maybe i'm confused. what do you mean by outlinks? i assumed you meant any of the external links in the text of the pages in the wikis being archived? |
07:56:38 | <pabs> | correct |
07:59:38 | <mako> | ok right. the dumps will (if done right) just be the source text (unrendered wikitext) of every revision of every page. i like the idea of restoring those in a way that makes those links go through some filter to wayback links or whatever, but i don't know of a project like that. that's not typically my use case though, so maybe others know of this
08:00:57 | <mako> | since the revision text will also be right next to a timestamp, it would be really easy to do with wayback or anything else that does the memento standard |
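
A minimal sketch of the timestamp-to-Wayback mapping mako describes, assuming revision timestamps in the ISO 8601 form MediaWiki dumps use (e.g. 2016-01-01T12:00:00Z); the helper name wayback_url is made up here. The Wayback Machine resolves a 14-digit YYYYMMDDhhmmss timestamp in its URLs to the capture closest to that moment:

    # Map a revision timestamp plus an external link to the Wayback
    # Machine capture nearest that moment. Assumes dump-style ISO 8601
    # timestamps ("2016-01-01T12:00:00Z"); wayback_url is a made-up name.
    from datetime import datetime

    def wayback_url(revision_timestamp: str, link: str) -> str:
        ts = datetime.strptime(revision_timestamp, "%Y-%m-%dT%H:%M:%SZ")
        return f"https://web.archive.org/web/{ts:%Y%m%d%H%M%S}/{link}"

    # e.g. wayback_url("2016-01-01T12:00:00Z", "http://example.com/")
    # -> https://web.archive.org/web/20160101120000/http://example.com/
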
08:03:07 | <pabs> | there is apparently a dormant wikis-grab project for it. usually what people do though is to archive wikis both here and through ArchiveBot, which saves outlinks too |
08:03:30 | <pabs> | but indeed the filtering idea isn't anywhere near started yet |
08:09:15 | <pabs> | aha, there are two perl modules for this: libmediawiki-dumpfile-perl and libparse-mediawikidump-perl
08:11:23 | <pabs> | and python3-mwparserfromhell, ruby-wikicloth, libsweble-wikitext-java
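
A rough sketch of the outlink extraction discussed above, using python3-mwparserfromhell from pabs's list; it streams a MediaWiki XML export with the standard library and pulls the external links out of each revision's wikitext. The export schema namespace varies between MediaWiki releases, so this matches on the local tag name rather than a fixed namespace; external_links is a hypothetical helper, not part of any of these packages.

    # Stream a MediaWiki XML dump and print the external links found in
    # each revision's wikitext, using mwparserfromhell (the Debian
    # package python3-mwparserfromhell mentioned above).
    import sys
    import xml.etree.ElementTree as ET
    import mwparserfromhell

    def external_links(dump_path):
        # iterparse keeps memory flat on multi-gigabyte dumps; matching
        # on the local tag name means the export-schema version
        # (export-0.10, export-0.11, ...) does not matter.
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
                wikicode = mwparserfromhell.parse(elem.text)
                for link in wikicode.filter_external_links():
                    yield str(link.url)
            elem.clear()  # free each element once processed

    if __name__ == "__main__":
        for url in external_links(sys.argv[1]):
            print(url)

mwparserfromhell parses bare free links as external links too, so filter_external_links() catches both bracketed and unbracketed URLs.
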
14:43:26 | | that_lurker quits [Client Quit] |
14:46:39 | | that_lurker (that_lurker) joins |
17:55:20 | | Bedivere joins |
17:56:32 | | Sir_Bedivere quits [Ping timeout: 252 seconds] |
19:04:36 | | pokechu22 (pokechu22) joins |
19:15:32 | | that_lurker quits [Client Quit] |
19:15:59 | | that_lurker7 (that_lurker) joins |
19:21:28 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
20:14:19 | | that_lurker7 is now known as that_lurker |
21:01:29 | | PredatorIWD quits [Remote host closed the connection] |
21:02:08 | | PredatorIWD joins |
21:35:10 | | Jake quits [Ping timeout: 258 seconds] |
23:19:16 | | Megame (Megame) joins |