00:25:52 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
00:32:03 | | pabs (pabs) joins |
01:11:54 | | Megame quits [Read error: Connection reset by peer] |
02:08:03 | | balrog quits [Quit: Bye] |
02:09:02 | | balrog (balrog) joins |
03:14:17 | <DigitalDragons> | list of mediawikis found in some web crawling i did/am doing: https://pad.notkiska.pw/p/lx0jPi-GiQE9fWzVXBOO |
03:14:49 | <DigitalDragons> | thought it might be interesting to the people here |
03:19:53 | <DigitalDragons> | (wiki.archiveteam.org is in there too :P) |
03:45:47 | <fireonlive> | ;) |
06:33:26 | | pokechu22 quits [Ping timeout: 252 seconds] |
06:54:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
07:00:08 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
07:46:12 | <mako> | pabs: maybe you've already solved this but there are existing tools for parsing dumps and extracting outlinks from the xml dumps (it requires parsing the wikitext a bit)
07:47:09 | <pabs> | oh, hi mako :) |
07:47:12 | <pabs> | I haven't looked at it yet, maybe the WikiTeam page has something though |
07:48:08 | <pabs> | I did think that wikiteam/wikibot dumps should systematically have outlinks sent to the URLs project #// though |
07:49:39 | <pabs> | and I want to get the URLs/ArchiveBot projects to send all links to some filtering and from there to other projects |
07:51:09 | <pabs> | ah, the wiki page mentions a dormant wikis-grab Warrior project for grabbing outlinks |
07:52:47 | <pabs> | DigitalDragons: are you putting those through wikibot? |
07:56:13 | <mako> | maybe i'm confused. what do you mean by outlinks? i assumed you meant any of the external links in the text of the pages in the wikis being archived? |
07:56:38 | <pabs> | correct |
07:59:38 | <mako> | ok right. the dumps will (if done right) just be the source text (unrendered wikitext) of every revision of every page. i like the idea of restoring those in a way that makes those links go through some filter to wayback links or whatever, but i don't know of a project like that. that's not typically my use case though, so maybe others know of this
08:00:57 | <mako> | since the revision text will also be right next to a timestamp, it would be really easy to do with wayback or anything else that does the memento standard |
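
A minimal sketch of the timestamp-to-Wayback mapping mako describes, assuming revision timestamps in the ISO 8601 form MediaWiki dumps use (e.g. 2016-01-01T12:00:00Z); the helper name wayback_url is made up here. The Wayback Machine resolves a 14-digit YYYYMMDDhhmmss timestamp in its URLs to the capture closest to that moment:

    # Map a revision timestamp plus an external link to the Wayback
    # Machine capture nearest that moment. Assumes dump-style ISO 8601
    # timestamps ("2016-01-01T12:00:00Z"); wayback_url is a made-up name.
    from datetime import datetime

    def wayback_url(revision_timestamp: str, link: str) -> str:
        ts = datetime.strptime(revision_timestamp, "%Y-%m-%dT%H:%M:%SZ")
        return f"https://web.archive.org/web/{ts:%Y%m%d%H%M%S}/{link}"

    # e.g. wayback_url("2016-01-01T12:00:00Z", "http://example.com/")
    # -> https://web.archive.org/web/20160101120000/http://example.com/
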
08:03:07 | <pabs> | there is apparently a dormant wikis-grab project for it. usually what people do though is to archive wikis both here and through ArchiveBot, which saves outlinks too |
08:03:30 | <pabs> | but indeed the filtering idea isn't anywhere near started yet |
08:09:15 | <pabs> | aha, there are two perl modules for this: libmediawiki-dumpfile-perl and libparse-mediawikidump-perl
08:11:23 | <pabs> | and python3-mwparserfromhell, ruby-wikicloth, libsweble-wikitext-java
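
A rough sketch of the outlink extraction discussed above, using python3-mwparserfromhell from pabs's list; it streams a MediaWiki XML export with the standard library and pulls the external links out of each revision's wikitext. The export schema namespace varies between MediaWiki releases, so this matches on the local tag name rather than a fixed namespace; external_links is a hypothetical helper, not part of any of these packages.

    # Stream a MediaWiki XML dump and print the external links found in
    # each revision's wikitext, using mwparserfromhell (the Debian
    # package python3-mwparserfromhell mentioned above).
    import sys
    import xml.etree.ElementTree as ET
    import mwparserfromhell

    def external_links(dump_path):
        # iterparse keeps memory flat on multi-gigabyte dumps; matching
        # on the local tag name means the export-schema version
        # (export-0.10, export-0.11, ...) does not matter.
        for _, elem in ET.iterparse(dump_path, events=("end",)):
            if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
                wikicode = mwparserfromhell.parse(elem.text)
                for link in wikicode.filter_external_links():
                    yield str(link.url)
            elem.clear()  # free each element once processed

    if __name__ == "__main__":
        for url in external_links(sys.argv[1]):
            print(url)

mwparserfromhell parses bare free links as external links too, so filter_external_links() catches both bracketed and unbracketed URLs.
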
14:43:26 | | that_lurker quits [Client Quit] |
14:46:39 | | that_lurker (that_lurker) joins |
17:55:20 | | Bedivere joins |
17:56:32 | | Sir_Bedivere quits [Ping timeout: 252 seconds] |
19:04:36 | | pokechu22 (pokechu22) joins |
19:15:32 | | that_lurker quits [Client Quit] |
19:15:59 | | that_lurker7 (that_lurker) joins |
19:21:28 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
20:14:19 | | that_lurker7 is now known as that_lurker |
21:01:29 | | PredatorIWD quits [Remote host closed the connection] |
21:02:08 | | PredatorIWD joins |
21:35:10 | | Jake quits [Ping timeout: 258 seconds] |
23:19:16 | | Megame (Megame) joins |