02:39:27<pokechu22>OK, I've tried a few different things, and I can't figure out how to create a 7z file that works properly with IA's viewer: https://archive.org/download/wiki-ar.rodovid.org (I tried renaming Special:Version.html to something else, and I tried using 7zip from the command line in WSL; both failed). Any ideas as to what I'm doing wrong? Again, the files seem to work when downloaded,
02:39:29<pokechu22>it's just IA's viewer that's disliking them
03:07:15<pokechu22>On further investigation, it's not my fault; https://archive.org/download/wiki-enrodovidorg is also affected (https://ia600208.us.archive.org/view_archive.php?archive=/31/items/wiki-enrodovidorg/enrodovidorg-20120410-wikidump.7z should also have an errors.log file);
03:07:17<pokechu22>https://ia801605.us.archive.org/view_archive.php?archive=/25/items/wiki-opensourceecology.org/opensourceecologyorg-20120531-history.xml.7z from https://archive.org/download/wiki-opensourceecology.org is the same
03:12:25<pokechu22>!a https://wiki.openmpt.org/ -i mediawiki
03:12:31<pokechu22>oops
04:19:16Kuatrero quits [Ping timeout: 240 seconds]
15:48:25Connection closed.
15:48:38atirclog (atirclog) joins
15:48:38Topic: https://archiveteam.org/index.php?title=WikiTeam
15:48:38Topic set by JAA at 2020-10-15 00:06:28Z
15:48:46Current users: atirclog (atirclog), AnotherIki, igloo22225 (igloo22225), tech_exorcist_ (tech_exorcist), HackMii (hacktheplanet), @chfoo (chfoo), @OrIdow6 (OrIdow6), pokechu22 (pokechu22), @arkiver (arkiver), Soulflare, @AlsoJAA (JAA), @JAA (JAA), luckcolors (luckcolors), @ChanServ, @Sanqui (Sanqui), @hook54321 (hook54321), fo0bar_, phuzion (phuzion), qxtal (qxtal), Mayk78, Nemo_bis (Nemo_bis), masterX244 (masterX244), DiscantX, user_ (gazorpazorp), tech234a (tech234a), Ryz (Ryz), @rewby (rewby), qw3rty, ThreeHM (ThreeHeadedMonkey), TheTechRobo (TheTechRobo), Craigle (Craigle), systwi (systwi), monika (boom), nepeat_ (nepeat), Terbium_, duce1337 (duce1337), sepro (sepro), eroc19902 (eroc1990), Matthww1, mrfooooo, atphoenix_ (atphoenix), michaelblob (michaelblob), qwertyasdfuiopghjkl, Iki1, Jake (Jake), Sanqui|m, mind_combatant, britmob|m, DigitalDragon
15:48:49Iki1 quits [Ping timeout: 265 seconds]
15:51:13tech_exorcist_ quits [Remote host closed the connection]
16:16:26Kuatrero joins
16:23:38Bedivere joins
16:24:52Kuatrero quits [Ping timeout: 240 seconds]
17:30:44<pokechu22>The ultimate fix would be to switch to python 3, of course, but that's probably not going to happen for a while
17:44:42<michaelblob>there's a python3 rewrite of the project but i'm not sure how stable it is
18:02:34<pokechu22>Theoretically reverse_readline() could still use byte counting since \n is still a single byte in utf-8... but in practice that seems a bit dubious
18:02:48<pokechu22>(could still use byte counting with unicode)
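A minimal Python 3 sketch of the byte-counting idea follows. It is a hypothetical standalone `reverse_readline()`, not wikiteam's own function. It reads the file backwards in binary mode, splits on `b"\n"`, and decodes a line only once it is complete. This works because every byte of a multi-byte UTF-8 character has its high bit set (>= 0x80), so the byte 0x0A can only ever be a newline. The `chunk_size` parameter and the trailing-newline handling are illustrative assumptions.

```python
import os
from typing import Iterator


def reverse_readline(path: str, chunk_size: int = 8192) -> Iterator[str]:
    """Yield the lines of a UTF-8 text file, last line first.

    Hypothetical standalone version, not wikiteam's own reverse_readline().
    """
    with open(path, "rb") as f:
        pos = f.seek(0, os.SEEK_END)  # seek() returns the new offset = file size
        if pos == 0:
            return  # empty file: no lines
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1  # a single trailing newline does not start another line
        remainder = b""  # start of a line whose beginning we have not read yet
        while pos > 0:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            # Plain byte offsets are fine: b"\n" (0x0A) never occurs inside a
            # multi-byte UTF-8 sequence, whose bytes are all >= 0x80.
            pieces = (f.read(read_size) + remainder).split(b"\n")
            # pieces[0] may continue further back in the file (and may start in
            # the middle of a multi-byte character), so keep it as raw bytes.
            remainder = pieces[0]
            for line in reversed(pieces[1:]):
                yield line.decode("utf-8")  # complete line: safe to decode
        yield remainder.decode("utf-8")  # the file's first line
```

For example, `next(reverse_readline(path))` returns just the last line without reading the whole file. Small chunk sizes deliberately split multi-byte characters across reads; that still works because only complete lines are ever decoded.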
18:17:26Bedivere quits [Remote host closed the connection]
18:17:48Bedivere joins
19:04:13michaelblob quits [Read error: Connection reset by peer]
19:05:34michaelblob (michaelblob) joins
19:06:33<pokechu22>One more example of IA's thing being broken: https://archive.org/download/wiki-wikiteamorainorg_w - I'll just upload the 7z files I already made (via right-click -> 7-zip -> add to archive in Windows) under the assumption that this is an IA bug that'll be fixed
20:18:39<pokechu22>OK, all of them have been uploaded (other than ru.rodovid.org, which I'm still downloading; it's over halfway done, but due to the resumes I don't have an exact progress bar): https://archive.org/download/@pokechu22
21:02:35<pokechu22>I've emailed info@archive.org about that file listing issue
21:06:36<Nemo_bis>pokechu22: looks like an issue with solid archives
21:07:32<pokechu22>Definitely possible - I think I tried a non-solid archive, but maybe not?
21:08:23<Nemo_bis>pokechu22: that one is maybe partly solid and partly not https://paste.debian.net/plain/1253654
21:08:46<Nemo_bis>no wait it's solid but those .desc files are broken (empty files)
21:09:13<Nemo_bis>Anyway I wouldn't worry too much about the 7z view, for the longest time we didn't have it at all
21:09:39<pokechu22>Ah, actually, I think the first time I did a fully-solid one, and the second time it was the default for ultra, which is solid in 4GB blocks (and I assume that's effectively the same as solid because the whole file is less than 4GB, though the file did end up being larger)
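Rather than guessing from the GUI settings whether a given 7z ended up solid, one option is 7-Zip's technical listing (`7z l -slt archive.7z`). Its archive header includes lines such as `Solid = +` and `Blocks = N`. The parser below is a sketch. It assumes the 7-Zip/p7zip 16.x layout, where archive properties come first as `Key = Value` lines and per-file entries follow a `----------` line. Other versions may format this differently.

```python
import subprocess
from typing import Dict, Optional


def archive_properties(listing: str) -> Dict[str, str]:
    """Collect 'Key = Value' pairs from the archive header of `7z l -slt` output.

    Stops at the '----------' line that (assumed) starts the per-file entries.
    """
    props: Dict[str, str] = {}
    for line in listing.splitlines():
        if line.startswith("----------"):
            break
        key, sep, value = line.partition(" = ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_solid(props: Dict[str, str]) -> Optional[bool]:
    """True/False from the 'Solid' property, or None if it is missing."""
    value = props.get("Solid")
    if value is None:
        return None
    return value == "+"


def technical_listing(path: str) -> str:
    """Run 7z's technical listing (requires a 7z binary on PATH)."""
    return subprocess.run(
        ["7z", "l", "-slt", path], capture_output=True, text=True, check=True
    ).stdout
```

Typical use would be `is_solid(archive_properties(technical_listing("dump.7z")))`. If the output is read the same way, the `Blocks` value also distinguishes one solid block from several.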
21:09:57<Nemo_bis>No idea, I've not used 7z for Windows in like 15 years
21:10:18<Nemo_bis>doesn't launcher.py work for you?
21:10:25<pokechu22>One other interesting thing is that https://archive.org/download/wiki-wikiteamorainorg_w/wikiteamorainorg_w-20200208-wikidump.7z/images/2017 Survey Phabricator.png does work, so extracting is fine, it's just previewing
21:10:48<pokechu22>I haven't tried it, but my python 2 is in WSL so it'd probably be a mess
21:10:54<Nemo_bis>Ok, that's good to hear. URL encoding was also an issue sometimes.
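The working link above follows the pattern `https://archive.org/download/<item>/<archive>/<path inside archive>`, and the member name there contains spaces. A small sketch of building such links with each component percent-encoded (keeping `/` as the separator) follows. The helper name is hypothetical, and it is an assumption that IA resolves every encoded path the same way.

```python
from urllib.parse import quote


def archive_member_url(item: str, archive: str, member: str) -> str:
    """Build an archive.org download URL for one file inside an archive.

    Hypothetical helper: follows the .../download/<item>/<archive>/<member>
    pattern and percent-encodes each part, leaving "/" separators intact.
    """
    parts = (item, archive, member)
    return "https://archive.org/download/" + "/".join(quote(p, safe="/") for p in parts)
```

`quote()` encodes spaces as `%20`, and also encodes `%`, `?`, `#` and non-ASCII characters as UTF-8. Those characters are all legal in MediaWiki file names but would otherwise break or truncate the URL.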
21:11:02<Nemo_bis>Hmm right.
21:11:48<pokechu22>(and, these ones need manual specification of both the API and the index page, so I don't think using dumpgenerator.py adds much more complexity)
21:12:51<pokechu22>Looks like launcher.py does use non-solid archives though: https://github.com/WikiTeam/wikiteam/blob/master/launcher.py#L114-L115
21:17:21<Nemo_bis>Yes but I often change the settings manually
21:17:39<Nemo_bis>Or used to
21:18:34<Nemo_bis>Usually there's little point making a solid archive as most of the compression is in the history XML, but there are exceptions. Also, we used to compress things twice, whereas more recently the history 7z is copied and then updated with the images.
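Below is a rough sketch of that "copy, then update" workflow, which compresses the history XML only once. The file names follow the `<wiki>-<date>-history.xml.7z` / `<wiki>-<date>-wikidump.7z` pattern of the items linked above. The prefix, the list of extra paths, and the use of `-mx=9` (ultra) with `-ms=off` (solid mode off) are assumptions for illustration, not launcher.py's actual commands.

```python
import shutil
import subprocess
from typing import List

# Assumed 7-Zip switches: -mx=9 is the "ultra" level, -ms=off disables solid mode.
SEVEN_ZIP_OPTS = ["-mx=9", "-ms=off"]


def history_command(prefix: str) -> List[str]:
    """Command that compresses only the history XML (the small, research-friendly file)."""
    return ["7z", "a", *SEVEN_ZIP_OPTS, f"{prefix}-history.xml.7z", f"{prefix}-history.xml"]


def images_command(prefix: str, paths: List[str]) -> List[str]:
    """Command that adds images and other files to the full dump archive."""
    return ["7z", "a", *SEVEN_ZIP_OPTS, f"{prefix}-wikidump.7z", *paths]


def make_archives(prefix: str, paths: List[str]) -> None:
    # 1. Compress the history XML once.
    subprocess.run(history_command(prefix), check=True)
    # 2. Copy that archive instead of compressing the XML a second time.
    shutil.copyfile(f"{prefix}-history.xml.7z", f"{prefix}-wikidump.7z")
    # 3. "7z a" on an existing archive keeps its entries and adds the new paths.
    subprocess.run(images_command(prefix, paths), check=True)
```

With this layout, someone who only wants the XML downloads the history archive, while the wikidump archive still contains everything.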
21:19:20<Nemo_bis>So I have no idea how many of the wikiteam dumps are solid archives. It shouldn't matter.
21:21:47<pokechu22>Oh yeah, I haven't been doing a separate history 7z from the images one (because I've been making them manually). Is that something that's actually useful?
22:00:32<Nemo_bis>pokechu22: it's useful for people who want to download less data and only analyse the XML, e.g. researchers; but it's not particularly important if the images directory is smallish
22:03:16<pokechu22>It seems like the images directory is generally about the same size as the history one (or smaller); for the largest one I have downloaded (fr.rodovid.org) images is 1GB and history is 2.5GB. For uk.rodovid.org it's a bit bigger (images 700MB, history 500MB). But eh, I think that's probably fine
22:08:54<Nemo_bis>but then there are crazy cases like https://archive.org/download/wiki-iskwikiupdeduph where the XML is 300 kB and the images are 300 GB
22:09:35<pokechu22>Yeah, definitely
22:10:30jodizzle (jodizzle) joins
23:16:52masterX244 quits [Ping timeout: 240 seconds]
23:23:50masterX244 (masterX244) joins