00:25:52pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
00:32:03pabs (pabs) joins
01:11:54Megame quits [Read error: Connection reset by peer]
02:08:03balrog quits [Quit: Bye]
02:09:02balrog (balrog) joins
03:14:17<DigitalDragons>list of mediawikis found in some web crawling i did/am doing: https://pad.notkiska.pw/p/lx0jPi-GiQE9fWzVXBOO
03:14:49<DigitalDragons>thought it might be interesting to the people here
03:19:53<DigitalDragons>(wiki.archiveteam.org is in there too :P)
03:45:47<fireonlive>;)
06:33:26pokechu22 quits [Ping timeout: 252 seconds]
06:54:38qwertyasdfuiopghjkl quits [Remote host closed the connection]
07:00:08qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:46:12<mako>pabs: maybe you've already solved this but there are existing tools for parsing dumps and extracting outlinks from the xml dumps (it requires parsing the wikitext a bit)
07:47:09<pabs>oh, hi mako :)
07:47:12<pabs>I haven't looked at it yet, maybe the WikiTeam page has something though
07:48:08<pabs>I did think that wikiteam/wikibot dumps should systematically have outlinks sent to the URLs project #// though
07:49:39<pabs>and I want to get the URLs/ArchiveBot projects to send all links to some filtering and from there to other projects
07:51:09<pabs>ah, the wiki page mentions a dormant wikis-grab Warrior project for grabbing outlinks
07:52:47<pabs>DigitalDragons: are you putting those through wikibot?
07:56:13<mako>maybe i'm confused. what do you mean by outlinks? i assumed you meant any of the external links in the text of the pages in the wikis being archived?
07:56:38<pabs>correct
07:59:38<mako>ok right. the dumps will (if done right) just be the source text (unrendered wikitext) of every revision of every page. i like the idea of restoring those in a way that makes those links go through some filter to wayback links or whatever but i don't know of a project like that. that's not typically my use case though so maybe others know of this
08:00:57<mako>since the revision text will also be right next to a timestamp, it would be really easy to do with wayback or anything else that does the memento standard
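A minimal sketch of what mako describes here: in the dump, each revision's timestamp sits right next to its text, so an external link can be rewritten to a Wayback Machine snapshot near that date. The timestamp format below matches MediaWiki XML exports; the example URL and function name are illustrative only.

from datetime import datetime

def wayback_url(link, revision_timestamp):
    # MediaWiki exports timestamps like "2016-03-09T18:04:21Z";
    # the Wayback Machine takes a 14-digit YYYYMMDDhhmmss prefix.
    ts = datetime.strptime(revision_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return "https://web.archive.org/web/%s/%s" % (ts.strftime("%Y%m%d%H%M%S"), link)

print(wayback_url("http://example.com/page", "2016-03-09T18:04:21Z"))
# https://web.archive.org/web/20160309180421/http://example.com/page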
08:03:07<pabs>there is apparently a dormant wikis-grab project for it. usually what people do though is to archive wikis both here and through ArchiveBot, which saves outlinks too
08:03:30<pabs>but indeed the filtering idea isn't anywhere near started yet
08:09:15<pabs>aha, there are two perl modules for this: libmediawiki-dumpfile-perl libparse-mediawikidump-perl
08:11:23<pabs>and python3-mwparserfromhell ruby-wikicloth libsweble-wikitext-java
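For the outlink extraction itself, a rough sketch combining a streaming parse of the XML dump with python3-mwparserfromhell (one of the packages listed above). It assumes an already-decompressed .xml export file; the dump filename and the deduplication at the end are illustrative only.

import sys
import xml.etree.ElementTree as ET
import mwparserfromhell

def local_name(elem):
    # MediaWiki export XML is namespaced; match on the local tag name only.
    return elem.tag.rsplit('}', 1)[-1]

def outlinks(dump_path):
    # Stream through the dump and yield every external link found in any revision.
    for _, elem in ET.iterparse(dump_path, events=('end',)):
        if local_name(elem) == 'text' and elem.text:
            for link in mwparserfromhell.parse(elem.text).filter_external_links():
                yield str(link.url)
        elif local_name(elem) == 'page':
            elem.clear()  # drop finished pages so large dumps don't exhaust memory

if __name__ == '__main__':
    seen = set()
    for url in outlinks(sys.argv[1]):  # e.g. somewiki-history.xml (hypothetical)
        if url not in seen:
            seen.add(url)
            print(url)

A deduplicated URL list like this could then be handed to #// or to whatever filtering step pabs describes above.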
14:43:26that_lurker quits [Client Quit]
14:46:39that_lurker (that_lurker) joins
17:55:20Bedivere joins
17:56:32Sir_Bedivere quits [Ping timeout: 252 seconds]
19:04:36pokechu22 (pokechu22) joins
19:15:32that_lurker quits [Client Quit]
19:15:59that_lurker7 (that_lurker) joins
19:21:28qwertyasdfuiopghjkl quits [Remote host closed the connection]
20:14:19that_lurker7 is now known as that_lurker
21:01:29PredatorIWD quits [Remote host closed the connection]
21:02:08PredatorIWD joins
21:35:10Jake quits [Ping timeout: 258 seconds]
23:19:16Megame (Megame) joins