00:11:55nepeat quits [Ping timeout: 272 seconds]
00:15:31nepeat (nepeat) joins
00:30:15DogsRNice joins
01:04:29@Sanqui quits [Ping timeout: 272 seconds]
01:13:12Cobalt|m1 joins
01:15:23Sanqui joins
01:15:27Sanqui quits [Changing host]
01:15:27Sanqui (Sanqui) joins
01:15:27@ChanServ sets mode: +o Sanqui
04:46:55DogsRNice quits [Read error: Connection reset by peer]
06:04:31atphoenix_ is now known as atphoenix
13:13:27ThreeHM quits [Ping timeout: 272 seconds]
13:15:04ThreeHM (ThreeHeadedMonkey) joins
14:05:10<pabs>https://www.arrse.co.uk/wiki/ARRSEPedia_Intro 403 in #wikibot, seems to be IP reputation
18:18:43Matthww quits [Ping timeout: 272 seconds]
18:20:31Matthww joins
18:43:17DogsRNice joins
20:27:06hexagonwin (hexagonwin) joins
20:33:58<hexagonwin>hello. i'm trying to preserve a large (1,200,000+ documents) wiki at namu.wiki (not urgent, not mediawiki). i'm aiming to get every revision of all documents in their RAW form. i'm curious how you folks handle different revisions: are different revisions just saved as individual documents, so most of the text gets duplicated, or is there some efficient format that mostly stores them as diffs?
20:37:17<pokechu22>For wikiteam tools/mediawiki, IIRC the XML export contains the full text of each revision in one giant file. That file gets compressed after the dump finishes (without any special compression to make individual records seekable, just normal zstd or 7-zip)
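For illustration, a minimal sketch of reading back such a dump, assuming Python with the `zstandard` package and a zstd-compressed file named dump.xml.zst (both names are placeholders, and the tag handling is simplified; this is not wikiteam's actual code):

```python
import xml.etree.ElementTree as ET
import zstandard  # pip install zstandard

# Stream-read a zstd-compressed MediaWiki-style XML dump and visit the
# full text of each revision without decompressing the file to disk.
with open("dump.xml.zst", "rb") as fh:
    reader = zstandard.ZstdDecompressor().stream_reader(fh)
    for _event, elem in ET.iterparse(reader, events=("end",)):
        if elem.tag == "revision" or elem.tag.endswith("}revision"):
            text = elem.findtext("{*}text", default="")
            print(len(text))  # stand-in for whatever you do with the text
            elem.clear()      # release the element so memory stays bounded
```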
20:37:33<hexagonwin>thanks a lot!
20:37:54<klea>If you wanted to make it better, you could make a zstd dictionary of all revisions
20:38:13<klea>then concatenate that dictionary with the compressed records and keep per-record offsets, I suppose, but that's probably a bit annoying to do.
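klea's idea can be sketched roughly as below, again assuming the Python `zstandard` package; the sample revisions, dictionary size, and file name are all made up, and real dictionary training wants many reasonably varied samples:

```python
import zstandard  # pip install zstandard

# Placeholder revisions, just to give the trainer something to chew on.
revisions = [
    ("shared boilerplate markup, revision %d of some article\n" % i).encode()
    for i in range(1000)
]

# Train a shared dictionary, then compress each revision as its own frame
# so any single revision can be read back without touching the others.
dict_data = zstandard.train_dictionary(4096, revisions)  # 4 KiB, illustrative
cctx = zstandard.ZstdCompressor(dict_data=dict_data)

offsets = []
with open("revisions.zst", "wb") as out:
    out.write(dict_data.as_bytes())  # dictionary up front, records after it
    for rev in revisions:
        offsets.append(out.tell())   # byte offset of this record
        out.write(cctx.compress(rev))

# `offsets` is the index you'd store alongside the file (each record's
# length is the gap to the next offset); reading back needs the same
# dictionary via zstandard.ZstdDecompressor(dict_data=...).
```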
20:39:21<hexagonwin>i guess for now i'll just grab the data into a sqlite db or something and craft an XML dump from that
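A rough sketch of that plan, with a made-up SQLite schema and a deliberately simplified XML shape (a real MediaWiki export carries more required elements, e.g. siteinfo and per-revision metadata):

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical schema: one row per (title, revision id), raw text as fetched.
db = sqlite3.connect("namuwiki.db")
db.execute("""CREATE TABLE IF NOT EXISTS revisions (
                  title  TEXT,
                  rev_id INTEGER,
                  raw    TEXT,
                  PRIMARY KEY (title, rev_id))""")

def write_dump(path):
    """Walk the rows in page order and emit a simplified XML dump."""
    rows = db.execute(
        "SELECT title, rev_id, raw FROM revisions ORDER BY title, rev_id")
    with open(path, "w", encoding="utf-8") as out:
        out.write("<mediawiki>\n")
        last_title = None
        for title, rev_id, raw in rows:
            if title != last_title:
                if last_title is not None:
                    out.write("  </page>\n")
                out.write("  <page>\n    <title>%s</title>\n" % escape(title))
                last_title = title
            out.write("    <revision>\n      <id>%d</id>\n"
                      "      <text>%s</text>\n    </revision>\n"
                      % (rev_id, escape(raw)))
        if last_title is not None:
            out.write("  </page>\n")
        out.write("</mediawiki>\n")
```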
20:51:31pokechu22 quits [Quit: System maintenance]
21:03:29pokechu22 (pokechu22) joins
21:52:38Stagnant quits [Remote host closed the connection]