00:30:05BlackWinnerYoshi quits [Client Quit]
00:36:48<klea>btw, does 7z have an index so it can do compression like we do with WARCs, where you can do random reads of records without decompressing everything?
00:48:45etnguyen03 quits [Client Quit]
00:52:51abirkill (abirkill) joins
01:03:32Shard7959154 (Shard) joins
01:04:49Shard795915 quits [Ping timeout: 272 seconds]
01:04:49Shard7959154 is now known as Shard795915
01:26:12etnguyen03 (etnguyen03) joins
01:51:02lennier2_ joins
01:54:13lennier2 quits [Ping timeout: 272 seconds]
02:07:21Sluggs quits [Excess Flood]
02:07:40abirkill quits [Ping timeout: 256 seconds]
02:08:31Kotomind joins
02:14:56Sluggs (Sluggs) joins
02:24:40linuxgemini (linuxgemini) joins
02:31:10<h2ibot>Cruller edited CTGP-R (+21, Add [[Category:Gaming]]): https://wiki.archiveteam.org/?diff=59478&oldid=53564
02:34:25linuxgemini quits [Client Quit]
02:35:06linuxgemini (linuxgemini) joins
02:47:13<pabs>https://onemileatatime.com/news/spirit-airlines-canceling-flights/
03:33:01midou quits [Ping timeout: 272 seconds]
03:51:23Mateon1 quits [Ping timeout: 272 seconds]
03:52:01Mateon1 joins
04:03:22midou joins
04:12:20midou quits [Ping timeout: 256 seconds]
04:25:28Mateon1 quits [Remote host closed the connection]
04:28:53Mateon1 joins
04:47:23gosc joins
05:05:43mrminemeet joins
05:06:07mrminemeet_ quits [Ping timeout: 272 seconds]
05:11:30DogsRNice quits [Read error: Connection reset by peer]
05:23:36etnguyen03 quits [Remote host closed the connection]
05:23:51gosc quits [Ping timeout: 272 seconds]
05:27:49gosc joins
05:42:05nexussfan quits [Quit: Konversation terminated!]
05:55:31Mateon1 quits [Ping timeout: 272 seconds]
05:57:05Mateon1 joins
05:59:58midou joins
06:12:37midou quits [Ping timeout: 272 seconds]
06:16:05<gosc>how would I go about doing that sims thing? I tried using curl to check available urls but it was too slow I guess
06:17:01<pokechu22>I feel like you should be able to find a list of all released versions somewhere (google play?)
06:18:19Mateon1 quits [Ping timeout: 272 seconds]
06:18:48<gosc>pokechu22, the url didn't use the version number
06:18:53Mateon1 joins
06:19:03<gosc>not sure if old versions still run
06:19:11<pokechu22>oh
06:19:15<gosc>if they did, there would be like over 50 versions to individually check
06:19:39<gosc>164239 is version 52.0.0 and 161660 is version 50.0.0
06:19:43<gosc>copied from yesterday
06:20:07<pokechu22>Is 50.0.0 the oldest version?
06:20:14<gosc>nope
06:20:18<pokechu22>Do these URLs expire in some way?
06:20:26<gosc>they die on January 20
06:20:33<gosc>because shutdown
06:20:40<gosc>52.0.0 is the newest version
06:21:06<pokechu22>Right, but couldn't I just do an !ao < list in archivebot for like 160000 through 164239 to determine what gives 200s vs 404s?
06:21:11<gosc>version 50.0.0's files were saved in wayback which is how I could check
06:21:21<gosc>oh you can do that?
06:21:39<gosc>but wouldn't that archive the 404s though
06:21:58<gosc>also the list goes lower than 160000
06:21:59<pokechu22>Yeah, the meta-warc lists the statuses for each result (and you can also download the full WARC and process the request body instead of redownloading)
06:22:09<gosc>I see
06:22:12<pokechu22>It would, those are pretty small so I don't feel like that's a problem
06:22:47<pokechu22>that's the approach I've been using for refsheet.net
06:22:56<gosc>I see
06:23:03<gosc>145292 is 39.0.1 btw
06:23:40<gosc>well yeah, you'd also have to parse the 200 urls; each one is just a manifest listing the actual files
06:24:10<pokechu22>Yeah, that could either be done by redownloading it or by parsing them from the WARC
06:24:38midou joins
06:24:51ymgve_ joins
06:25:35<klea>i thought google exposed an api to get version numbers?
06:27:22<pokechu22>So, what do you think is a reasonable starting point? 100000? Doing it from 0 is also possible but I'm not sure how dense it is
06:28:04<klea>also potentially will 429
06:28:41<pokechu22>It's s3; I haven't seen 429s on it
06:29:01<klea>oh
06:29:05ymgve__ quits [Ping timeout: 272 seconds]
06:30:17<klea>i thought you both were talking about archiving google play files
06:30:29<@JAA>AWS will be happy to serve you more data/requests so they can charge the bucket owner more. :-)
06:31:12<pokechu22>I'm not sure whether https://eaassets-a.akamaihd.net/ behaves the same way or not, but if it's the same data on both it should be fine
06:31:56<pokechu22>I guess I can bruteforce in reverse starting at 165000 down to 0 and if it seems like it's slow and there hasn't been anything I can stop it early
06:32:17<klea>yea
06:32:27<klea>stopping early is done with !abort?
06:32:37<pokechu22>Or an ignore if things need to be retried
06:32:55<klea>oh
06:34:04midou quits [Read error: Connection reset by peer]
06:34:44<pokechu22>https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/161660/assets/packages.windows.manifest also exists, hmm
06:35:10midou joins
06:35:27<gosc>pokechu22, sorry did something; started at 100000 and finished at 101000
06:35:30<gosc>those are done
06:35:32<gosc>it's all 403
06:36:08<gosc>klea, no, some guy posted a link to it and it ended up on google lol
06:36:19<pokechu22>Alright, I'll do 100000-165000 then; probably excessive but archivebot can run at con=6 so it should be fine
06:36:33<gosc>101001*
06:36:51<gosc>the first 1000 urls were checked by me already so yeah
06:37:13<pokechu22>eh, might as well redo it just to record it to WARC
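A minimal sketch of building such a candidate list (one URL per line, ready to feed to archivebot as a list job), assuming the S3 path pattern from the manifest URLs quoted in this log and the three platform suffixes (windows, android, ios) seen here; the range is the 100000-165000 one settled on above:

    # make_list.py - emit candidate Sims manifest URLs for a brute-force range.
    # The bucket path and platform names come from URLs quoted in this
    # discussion; the range bounds are the ones chosen above.
    BASE = "https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999"
    PLATFORMS = ("windows", "android", "ios")

    for build in range(100000, 165001):
        for platform in PLATFORMS:
            print(f"{BASE}/{build}/assets/packages.{platform}.manifest")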
06:37:19<gosc>from there how do we handle the 2xx files? since we still need to get the actual game contents from them
06:37:25<gosc>alright
06:38:29<pokechu22>Either use the meta-warc to determine which ones are 200s and redownload and parse those, or download the WARC and parse them from there (using e.g. https://pypi.org/project/warcio/ which I've already got set up)
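A sketch of the WARC-parsing half using warcio (the library linked above), assuming the job's WARC has been downloaded locally; it prints the target URI and body size of every HTTP 200 response, and the bodies could just as well be saved instead of measured:

    # find_200s.py - list URLs that returned HTTP 200 in a (gzipped) WARC.
    import sys
    from warcio.archiveiterator import ArchiveIterator

    with open(sys.argv[1], "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response" or record.http_headers is None:
                continue
            if record.http_headers.get_statuscode() == "200":
                url = record.rec_headers.get_header("WARC-Target-URI")
                body = record.content_stream().read()  # the manifest bytes
                print(url, len(body))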
06:38:30<klea>gosc: can you give me a example url from one of those 200 files?
06:39:08<gosc>https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/145292/assets/packages.android.manifest
06:39:10<klea>oh ok, if you'll do it i won't interrupt
06:39:13Mateon1 quits [Ping timeout: 272 seconds]
06:39:51midou quits [Ping timeout: 272 seconds]
06:40:01<gosc>"file": ".*" gives the name to put in https://eaassets-a.akamaihd.net/sims-campfire-prod/dev/DL999/*/assets/.*
06:40:14Mateon1 joins
06:40:31<gosc>it's json so I imagine it can be parsed and added to a url template automatically?
06:40:34<gosc>or something
06:40:56<pokechu22>Yeah, should be pretty simple
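On that parsing step, a hedged sketch: the manifest schema isn't shown in this log beyond the "file" keys, so this walks the whole parsed JSON tree for "file" values and drops each one into the eaassets URL template gosc describes; the build number here is assumed to come from the manifest's own URL and is passed on the command line:

    # manifest_to_urls.py - turn a downloaded manifest into asset URLs.
    # Only the "file" keys are known from this discussion, so the whole
    # JSON tree is walked rather than assuming any particular layout.
    import json
    import sys

    ASSET_BASE = "https://eaassets-a.akamaihd.net/sims-campfire-prod/dev/DL999"

    def find_files(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "file" and isinstance(value, str):
                    yield value
                else:
                    yield from find_files(value)
        elif isinstance(node, list):
            for item in node:
                yield from find_files(item)

    build = sys.argv[1]                      # e.g. 145292
    with open(sys.argv[2]) as f:             # downloaded packages.*.manifest
        manifest = json.load(f)
    for name in find_files(manifest):
        print(f"{ASSET_BASE}/{build}/assets/{name}")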
06:41:09<klea>sba
06:41:19<gosc>alright thanks
06:46:31<pokechu22>It's running, got a 200 at https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/102326/assets/packages.ios.manifest plus a few before then that I didn't catch on the dashboard
06:46:51<pokechu22>that's 23.0.0, so hmm, there might be some before that point too
06:47:47<gosc>nice!
06:56:41midou joins
06:56:48<h2ibot>Hans5958 edited EyeEm (+28, Add category): https://wiki.archiveteam.org/?diff=59479&oldid=59472
07:04:19gosc quits [Client Quit]
07:08:49<h2ibot>Hans5958 edited Template:Partiallysaved (+1060, Add date on Partially saved): https://wiki.archiveteam.org/?diff=59480&oldid=58931
07:19:45Mateon1 quits [Ping timeout: 272 seconds]
07:22:40Mateon1 joins
07:43:10Webuser510528 joins
07:43:32Webuser510528 quits [Client Quit]
07:46:04APOLLO03 quits [Read error: Connection reset by peer]
07:46:21Mateon1 quits [Ping timeout: 272 seconds]
07:47:28Mateon1 joins
07:47:49APOLLO03 joins
08:11:03Mateon1 quits [Ping timeout: 272 seconds]
08:14:10Mateon1 joins
08:41:52Mateon1 quits [Client Quit]
08:47:01Wohlstand (Wohlstand) joins
08:50:05Island quits [Read error: Connection reset by peer]
08:56:05<ericgallager>question about the ".ps" page:
08:56:08<ericgallager>https://wiki.archiveteam.org/index.php/.ps
08:56:26<ericgallager>Are there also pages for other TLDs? If so, should there be a category for them?
09:00:10<h2ibot>Cooljeanius edited Template:Partiallysaved (+9, fix pasto): https://wiki.archiveteam.org/?diff=59481&oldid=59480
09:08:41midou quits [Ping timeout: 272 seconds]
09:28:34midou joins
09:33:04midou quits [Ping timeout: 256 seconds]
09:39:06midou joins
09:42:52<@arkiver>ericgallager: may be nice to have them!
09:42:57<@arkiver>i'm not sure if there are
09:46:06midou quits [Ping timeout: 256 seconds]
09:48:50Kabaya quits [Read error: Connection reset by peer]
09:48:58Kabaya joins
10:00:01midou joins
10:17:43midou quits [Ping timeout: 272 seconds]
10:27:40midou joins
10:32:58Dada joins
10:36:23<h2ibot>Manu edited Distributed recursive crawls (+68, Candidates: Add https://parovoz.com/): https://wiki.archiveteam.org/?diff=59482&oldid=59083
10:42:24<h2ibot>Manu edited Distributed recursive crawls (+76, Candidates: Add https://guyanachronicle.com/): https://wiki.archiveteam.org/?diff=59483&oldid=59482
11:24:03<ericgallager>could start by importing Wikipedia's list, I guess: https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
11:40:41midou quits [Ping timeout: 272 seconds]
11:49:21midou joins
11:59:21Wohlstand quits [Quit: Wohlstand]
12:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:45Bleo182600722719623455222 joins
12:22:20<justauser>klea: 7z is "solid" by default, but you can switch it off.
12:26:38_null quits [Quit: Connection closed]
12:28:05_null (_null) joins
12:35:35<klea>justauser: thanks
12:38:56<justauser>I'm not sure whether that makes it produce a good index, though.
12:39:40<h2ibot>Sanqui edited Deathwatch (+222, Add HUMANITY stage deletion): https://wiki.archiveteam.org/?diff=59484&oldid=59069
12:39:41<h2ibot>Sanqui edited Deathwatch (+124, fix links): https://wiki.archiveteam.org/?diff=59485&oldid=59484
12:39:43<justauser>But at least it turns files into individually compressed streams.
12:40:12<klea>do you happen to have a recommended set of options?
12:40:40<h2ibot>Sanqui edited Deathwatch (+11, I put HUMANITY in the wrong year...): https://wiki.archiveteam.org/?diff=59486&oldid=59485
12:42:50<klea>i've seen the wikiteam project use: 7z a -t7z -m0=lzma2 -mx=9 -scsUTF-8 -md=64m -ms=off archivefile.7z archivefolder
12:51:03Wohlstand (Wohlstand) joins
12:55:30sec^nd quits [Remote host closed the connection]
12:55:53sec^nd (second) joins
13:25:58midou quits [Ping timeout: 256 seconds]
13:45:36<justauser>I personally stick to defaults when using 7z. The WikiTeam set might be a good one; "-ms=off" is what makes it non-solid.
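Tying this back to klea's random-read question at the top of the log: in a non-solid archive (created with -ms=off), each file is its own compressed stream, so a reader can decompress just the member it wants. A sketch using py7zr (assuming that library; the 7z CLI's "7z e archive.7z <name>" does the same from the shell):

    # read_one.py - pull a single member out of a 7z archive.
    # With a non-solid archive, only this member's stream is decompressed;
    # with a solid one, the whole solid block containing it must be
    # decompressed first.
    import py7zr

    with py7zr.SevenZipFile("archive.7z", mode="r") as archive:
        extracted = archive.read(targets=["path/inside/archive.txt"])
        for name, data in extracted.items():  # name -> io.BytesIO
            print(name, len(data.read()))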
13:47:58_null quits [Client Quit]
13:49:13_null (_null) joins
13:49:14_null quits [Remote host closed the connection]
13:50:46_null (_null) joins
13:54:31etnguyen03 (etnguyen03) joins
13:55:40gosc joins
13:56:24_null quits [Client Quit]
13:58:51_null (_null) joins
14:00:28_null quits [Client Quit]
14:01:59_null (_null) joins
14:06:22midou joins
14:12:22_null quits [Client Quit]
14:14:19_null (_null) joins
14:15:49_null quits [Client Quit]
14:17:46_null (_null) joins
14:23:13_null quits [Client Quit]
14:26:09_null (_null) joins
14:29:22etnguyen03 quits [Client Quit]
14:49:38etnguyen03 (etnguyen03) joins
14:56:15<klea>thanks
15:01:44midou quits [Ping timeout: 256 seconds]
15:05:47<cruller>Btw the default command used by PeaZip is as follows: '/path1/res/bin/7z/7z' a -t7z -m0=LZMA2 -mmt=on -mx3 -md=4m -mfb=32 -ms=1g -mqs=on -sccUTF-8 -bb0 -bse0 -bsp2 '-w/path2/' -snl -mtc=on -mta=on '/path2/directory_name.7z' '/path2/directory_name'
15:10:18nexussfan (nexussfan) joins
15:10:55midou joins
15:13:18Dada quits [Remote host closed the connection]
15:13:30Dada joins
15:26:08TunaLobster quits [Quit: So long and thanks for all the fish]
15:30:03TunaLobster joins
15:44:27cyanbox joins
15:45:34Czechball joins
15:56:30szczot3k quits [Remote host closed the connection]
15:56:44szczot3k (szczot3k) joins
16:11:52szczot3k quits [Remote host closed the connection]
16:13:36SootBector quits [Remote host closed the connection]
16:14:43SootBector (SootBector) joins
16:18:38<klea>well, i chose mx=9 for the command i gave
16:18:47szczot3k (szczot3k) joins
16:27:59ThreeHM quits [Quit: WeeChat 4.7.2]
16:44:02etnguyen03 quits [Client Quit]
16:44:36ThreeHM (ThreeHeadedMonkey) joins
16:50:32midou quits [Ping timeout: 256 seconds]
16:59:49midou joins
17:05:07_null quits [Client Quit]
17:06:22_null (_null) joins
17:28:32<h2ibot>KleaBot edited Main Page/Current Warrior Project (-6, Obtained data from WarriorHQ file): https://wiki.archiveteam.org/?diff=59487&oldid=58260
17:40:34Dada quits [Remote host closed the connection]
17:40:47Dada joins
17:43:01_null quits [Client Quit]
17:44:58_null (_null) joins
17:46:55_null quits [Client Quit]
17:50:19etnguyen03 (etnguyen03) joins
17:51:17_null (_null) joins
17:52:38_null quits [Client Quit]
17:55:40_null (_null) joins
17:56:37_null quits [Client Quit]
17:57:52_null (_null) joins
18:16:22etnguyen03 quits [Client Quit]
18:26:33Kotomind quits [Read error: Connection reset by peer]
18:26:45Kotomind joins
18:29:05<justauser>f_: No, AFAIK.
18:29:44<justauser>I think Kiwix still creates ZIMs for their projects, but that's it.
18:30:36<justauser>They are supposed to still be providing dumps, but now you have to log in and click through a promise not to use the data for such-and-such.
18:40:36gosc quits [Quit: Leaving]
18:45:01Basti joins
18:45:05<Basti>!stats eyeem BastiNOH
18:46:44<Basti>!stats telegram BastiNOH
18:47:01<Basti>!stats youtube BastiNOH
18:48:10Basti leaves
18:48:13Basti joins
18:48:26Basti quits [Read error: Connection reset by peer]
18:49:31lumidify quits [Remote host closed the connection]
19:03:23lumidify (lumidify) joins
19:03:43abirkill (abirkill) joins
19:05:18etnguyen03 (etnguyen03) joins
19:07:23HP_Archivist quits [Quit: Leaving]
19:16:05Webuser209288 joins
19:16:49<Webuser209288>Was this link from MS' older download center saved? http://www.microsoft.com/downloads/details.aspx?FamilyId=19957FF9-1CDF-4594-AC32-C9BDDDA4873C&displaylang=en
19:19:04Webuser209288 quits [Client Quit]
19:20:58Webuser349614 joins
19:21:15Webuser349614 quits [Client Quit]
19:27:55rdg leaves [WeeChat 4.4.2]
20:01:13Gadelhas562873784438 joins
20:29:44Dada quits [Remote host closed the connection]
20:29:56Dada joins
20:35:19abirkill quits [Remote host closed the connection]
20:36:00abirkill (abirkill) joins
20:40:42Island joins
20:59:55abirkill quits [Ping timeout: 272 seconds]
21:11:27abirkill (abirkill) joins
21:26:51chrismeller quits [Quit: chrismeller]
21:27:49DogsRNice joins
21:28:54chrismeller3 (chrismeller) joins
21:29:05Wohlstand quits [Remote host closed the connection]
21:47:23chrismeller3 quits [Client Quit]
21:51:16<h2ibot>Klea edited Obsidian (+92, obsidian supports custom domains): https://wiki.archiveteam.org/?diff=59488&oldid=58872
21:55:16<h2ibot>Klea edited Obsidian (+38, Add example obsidian.md): https://wiki.archiveteam.org/?diff=59489&oldid=59488
21:59:08chrismeller3 (chrismeller) joins
22:03:45mrminemeet quits [Read error: Connection reset by peer]
22:07:00mrminemeet joins
22:12:56Webuser212986 joins
22:13:32Webuser212986 quits [Client Quit]
22:25:06chrismeller3 quits [Client Quit]
22:46:51etnguyen03 quits [Quit: Konversation terminated!]
23:15:29abirkill quits [Client Quit]
23:15:45abirkill (abirkill) joins
23:28:46lennier2 joins
23:30:47chrismeller3 (chrismeller) joins
23:31:55lennier2_ quits [Ping timeout: 272 seconds]
23:44:05Dada quits [Remote host closed the connection]
23:44:18Dada joins
23:49:36APOLLO03 quits [Read error: Connection reset by peer]
23:49:39tzt quits [Ping timeout: 272 seconds]
23:50:19APOLLO03 joins
23:54:21<masterx244|m>https://tvmanifest.iac.asp.att.net/Manifests/
23:54:21<masterx244|m>Random open directory that was linked on a reverse engineering discord
23:57:14Gadelhas562873784438 quits [Ping timeout: 256 seconds]