| 00:30:05 | | BlackWinnerYoshi quits [Client Quit] |
| 00:36:48 | <klea> | btw, does 7z have an index so it can do compression like we do with WARCs, where you can do random reads of records without decompressing everything? |
| 00:48:45 | | etnguyen03 quits [Client Quit] |
| 00:52:51 | | abirkill (abirkill) joins |
| 01:03:32 | | Shard7959154 (Shard) joins |
| 01:04:49 | | Shard795915 quits [Ping timeout: 272 seconds] |
| 01:04:49 | | Shard7959154 is now known as Shard795915 |
| 01:26:12 | | etnguyen03 (etnguyen03) joins |
| 01:51:02 | | lennier2_ joins |
| 01:54:13 | | lennier2 quits [Ping timeout: 272 seconds] |
| 02:07:21 | | Sluggs quits [Excess Flood] |
| 02:07:40 | | abirkill quits [Ping timeout: 256 seconds] |
| 02:08:31 | | Kotomind joins |
| 02:14:56 | | Sluggs (Sluggs) joins |
| 02:24:40 | | linuxgemini (linuxgemini) joins |
| 02:31:10 | <h2ibot> | Cruller edited CTGP-R (+21, Add [[Category:Gaming]]): https://wiki.archiveteam.org/?diff=59478&oldid=53564 |
| 02:34:25 | | linuxgemini quits [Client Quit] |
| 02:35:06 | | linuxgemini (linuxgemini) joins |
| 02:47:13 | <pabs> | https://onemileatatime.com/news/spirit-airlines-canceling-flights/ |
| 03:33:01 | | midou quits [Ping timeout: 272 seconds] |
| 03:51:23 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 03:52:01 | | Mateon1 joins |
| 04:03:22 | | midou joins |
| 04:12:20 | | midou quits [Ping timeout: 256 seconds] |
| 04:25:28 | | Mateon1 quits [Remote host closed the connection] |
| 04:28:53 | | Mateon1 joins |
| 04:47:23 | | gosc joins |
| 05:05:43 | | mrminemeet joins |
| 05:06:07 | | mrminemeet_ quits [Ping timeout: 272 seconds] |
| 05:11:30 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:23:36 | | etnguyen03 quits [Remote host closed the connection] |
| 05:23:51 | | gosc quits [Ping timeout: 272 seconds] |
| 05:27:49 | | gosc joins |
| 05:42:05 | | nexussfan quits [Quit: Konversation terminated!] |
| 05:55:31 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 05:57:05 | | Mateon1 joins |
| 05:59:58 | | midou joins |
| 06:12:37 | | midou quits [Ping timeout: 272 seconds] |
| 06:16:05 | <gosc> | how would I go about doing that Sims thing? I tried using curl to check available URLs, but it was too slow, I guess |
| 06:17:01 | <pokechu22> | I feel like you should be able to find a list of all released versions somewhere (google play?) |
| 06:18:19 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 06:18:48 | <gosc> | pokechu22, the url didn't use the version number |
| 06:18:53 | | Mateon1 joins |
| 06:19:03 | <gosc> | not sure if old versions still run |
| 06:19:11 | <pokechu22> | oh |
| 06:19:15 | <gosc> | if they did, there would be like over 50 versions to individually check |
| 06:19:39 | <gosc> | 164239 is version 52.0.0 and 161660 is version 50.0.0 |
| 06:19:43 | <gosc> | copied from yesterday |
| 06:20:07 | <pokechu22> | Is 50.0.0 the oldest version? |
| 06:20:14 | <gosc> | nope |
| 06:20:18 | <pokechu22> | Do these URLs expire in some way? |
| 06:20:26 | <gosc> | they die on January 20 |
| 06:20:33 | <gosc> | because shutdown |
| 06:20:40 | <gosc> | 52.0.0 is the newest version |
| 06:21:06 | <pokechu22> | Right, but couldn't I just do an !ao < list in archivebot for like 160000 through 164239 to determine what gives 200s vs 404s? |
| 06:21:11 | <gosc> | version 50.0.0's files were saved in wayback which is how I could check |
| 06:21:21 | <gosc> | oh you can do that? |
| 06:21:39 | <gosc> | wouldn't that archive the 404s, though? |
| 06:21:58 | <gosc> | also the list goes lower than 160000 |
| 06:21:59 | <pokechu22> | Yeah, the meta-WARC lists the statuses for each result (and you can also download the full WARC and process the response body instead of redownloading) |
| 06:22:09 | <gosc> | I see |
| 06:22:12 | <pokechu22> | It would, those are pretty small so I don't feel like that's a problem |
| 06:22:47 | <pokechu22> | that's the approach I've been using for refsheet.net |
| 06:22:56 | <gosc> | I see |
| 06:23:03 | <gosc> | 145292 is 39.0.1 btw |
| 06:23:40 | <gosc> | well yeah, you'd also have to parse the 200 URLs; they're just manifests listing the actual files |
| 06:24:10 | <pokechu22> | Yeah, that could either be done by redownloading it or by parsing them from the WARC |
| 06:24:38 | | midou joins |
| 06:24:51 | | ymgve_ joins |
| 06:25:35 | <klea> | i thought google exposed an api to get version numbers? |
| 06:27:22 | <pokechu22> | So, what do you think is a reasonable starting point? 100000? Doing it from 0 is also possible but I'm not sure how dense it is |
| 06:28:04 | <klea> | also potentially will 429 |
| 06:28:41 | <pokechu22> | It's S3; I haven't seen 429s on it |
| 06:29:01 | <klea> | oh |
| 06:29:05 | | ymgve__ quits [Ping timeout: 272 seconds] |
| 06:30:17 | <klea> | i thought you both were talking about archiving google play files |
| 06:30:29 | <@JAA> | AWS will be happy to serve you more data/requests so they can charge the bucket owner more. :-) |
| 06:31:12 | <pokechu22> | I'm not sure whether https://eaassets-a.akamaihd.net/ behaves the same way or not, but if it's the same data on both it should be fine |
| 06:31:56 | <pokechu22> | I guess I can bruteforce in reverse starting at 165000 down to 0 and if it seems like it's slow and there hasn't been anything I can stop it early |
| 06:32:17 | <klea> | yea |
| 06:32:27 | <klea> | stopping early is done with !abort? |
| 06:32:37 | <pokechu22> | Or an ignore if things need to be retried |
| 06:32:55 | <klea> | oh |
| 06:34:04 | | midou quits [Read error: Connection reset by peer] |
| 06:34:44 | <pokechu22> | https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/161660/assets/packages.windows.manifest also exists, hmm |
| 06:35:10 | | midou joins |
| 06:35:27 | <gosc> | pokechu22, sorry did something; started at 100000 and finished at 101000 |
| 06:35:30 | <gosc> | those are done |
| 06:35:32 | <gosc> | it's all 403 |
| 06:36:08 | <gosc> | klea, no, some guy posted a link to it and it ended up on google lol |
| 06:36:19 | <pokechu22> | Alright, I'll do 100000-165000 then; probably excessive but archivebot can run at con=6 so it should be fine |
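A minimal sketch of generating the "!ao <" list discussed above: one manifest URL per build ID in the 100000-165000 range, for the three platform manifests seen in this conversation (the output file name is hypothetical):

```python
# Build a URL list for an ArchiveBot "!ao <" job over the sims-campfire
# build-ID range discussed above. BASE and the manifest names come from the
# URLs quoted in this conversation; the output file name is made up.
BASE = "https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999"
PLATFORMS = ("windows", "android", "ios")

with open("sims-manifest-urls.txt", "w") as out:
    for build in range(165000, 100000 - 1, -1):  # newest first, as planned
        for platform in PLATFORMS:
            out.write(f"{BASE}/{build}/assets/packages.{platform}.manifest\n")
```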
| 06:36:33 | <gosc> | 101001* |
| 06:36:51 | <gosc> | the first 1000 urls were checked by me already so yeah |
| 06:37:13 | <pokechu22> | eh, might as well redo it just to record it to WARC |
| 06:37:19 | <gosc> | from there how do we handle the 2xx files? since we still need to get the actual game contents from them |
| 06:37:25 | <gosc> | alright |
| 06:38:29 | <pokechu22> | Either use the meta-warc to determine which ones are 200s and redownload and parse those, or download the WARC and parse them from there (using e.g. https://pypi.org/project/warcio/ which I've already got set up) |
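A minimal sketch of the WARC-parsing route mentioned here, using warcio to pick out the 200 responses and their bodies from a downloaded WARC (the input file name is hypothetical):

```python
# Iterate a (gzipped) WARC and report which manifest requests got a 200,
# reading the response bodies straight from the archive instead of
# re-downloading them.
from warcio.archiveiterator import ArchiveIterator

with open("sims-campfire-manifests.warc.gz", "rb") as fh:
    for record in ArchiveIterator(fh):
        if record.rec_type != "response":
            continue
        url = record.rec_headers.get_header("WARC-Target-URI")
        status = record.http_headers.get_statuscode()
        if status == "200":
            body = record.content_stream().read()  # the manifest payload
            print(url, len(body))
```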
| 06:38:30 | <klea> | gosc: can you give me a example url from one of those 200 files? |
| 06:39:08 | <gosc> | https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/145292/assets/packages.android.manifest |
| 06:39:10 | <klea> | oh ok, if you'll do it i won't interrupt |
| 06:39:13 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 06:39:51 | | midou quits [Ping timeout: 272 seconds] |
| 06:40:01 | <gosc> | "file": ".*" gives the name to put in https://eaassets-a.akamaihd.net/sims-campfire-prod/dev/DL999/*/assets/.* |
| 06:40:14 | | Mateon1 joins |
| 06:40:31 | <gosc> | it's json so I imagine it can be parsed and added to a url template automatically? |
| 06:40:34 | <gosc> | or something |
| 06:40:56 | <pokechu22> | Yeah, should be pretty simple |
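A minimal sketch of that parsing step, assuming each 200 manifest is JSON whose entries carry a "file" field as the pattern quoted above suggests; the exact manifest layout is an assumption and would need checking against a real one:

```python
import json

def asset_urls(manifest_bytes, build_id):
    # Hypothetical structure: a top-level list of entries, each with a
    # "file" field. Adjust once a real manifest has been inspected.
    for entry in json.loads(manifest_bytes):
        yield (f"https://eaassets-a.akamaihd.net/sims-campfire-prod"
               f"/dev/DL999/{build_id}/assets/{entry['file']}")
```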
| 06:41:09 | <klea> | sba |
| 06:41:19 | <gosc> | alright thanks |
| 06:46:31 | <pokechu22> | It's running, got a 200 at https://sims-campfire-prod-content.s3.amazonaws.com/dev/DL999/102326/assets/packages.ios.manifest plus a few before then that I didn't catch on the dashboard |
| 06:46:51 | <pokechu22> | that's 23.0.0, so hmm, there might be some before that point too |
| 06:47:47 | <gosc> | nice! |
| 06:56:41 | | midou joins |
| 06:56:48 | <h2ibot> | Hans5958 edited EyeEm (+28, Add category): https://wiki.archiveteam.org/?diff=59479&oldid=59472 |
| 07:04:19 | | gosc quits [Client Quit] |
| 07:08:49 | <h2ibot> | Hans5958 edited Template:Partiallysaved (+1060, Add date on Partially saved): https://wiki.archiveteam.org/?diff=59480&oldid=58931 |
| 07:19:45 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 07:22:40 | | Mateon1 joins |
| 07:43:10 | | Webuser510528 joins |
| 07:43:32 | | Webuser510528 quits [Client Quit] |
| 07:46:04 | | APOLLO03 quits [Read error: Connection reset by peer] |
| 07:46:21 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 07:47:28 | | Mateon1 joins |
| 07:47:49 | | APOLLO03 joins |
| 08:11:03 | | Mateon1 quits [Ping timeout: 272 seconds] |
| 08:14:10 | | Mateon1 joins |
| 08:41:52 | | Mateon1 quits [Client Quit] |
| 08:47:01 | | Wohlstand (Wohlstand) joins |
| 08:50:05 | | Island quits [Read error: Connection reset by peer] |
| 08:56:05 | <ericgallager> | question about the ".ps" page: |
| 08:56:08 | <ericgallager> | https://wiki.archiveteam.org/index.php/.ps |
| 08:56:26 | <ericgallager> | Are there also pages for other TLDs? If so, should there be a category for them? |
| 09:00:10 | <h2ibot> | Cooljeanius edited Template:Partiallysaved (+9, fix pasto): https://wiki.archiveteam.org/?diff=59481&oldid=59480 |
| 09:08:41 | | midou quits [Ping timeout: 272 seconds] |
| 09:28:34 | | midou joins |
| 09:33:04 | | midou quits [Ping timeout: 256 seconds] |
| 09:39:06 | | midou joins |
| 09:42:52 | <@arkiver> | ericgallager: may be nice to have them! |
| 09:42:57 | <@arkiver> | i'm not sure if there are |
| 09:46:06 | | midou quits [Ping timeout: 256 seconds] |
| 09:48:50 | | Kabaya quits [Read error: Connection reset by peer] |
| 09:48:58 | | Kabaya joins |
| 10:00:01 | | midou joins |
| 10:17:43 | | midou quits [Ping timeout: 272 seconds] |
| 10:27:40 | | midou joins |
| 10:32:58 | | Dada joins |
| 10:36:23 | <h2ibot> | Manu edited Distributed recursive crawls (+68, Candidates: Add https://parovoz.com/): https://wiki.archiveteam.org/?diff=59482&oldid=59083 |
| 10:42:24 | <h2ibot> | Manu edited Distributed recursive crawls (+76, Candidates: Add https://guyanachronicle.com/): https://wiki.archiveteam.org/?diff=59483&oldid=59482 |
| 11:24:03 | <ericgallager> | could start by importing Wikipedia's list, I guess: https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains |
| 11:40:41 | | midou quits [Ping timeout: 272 seconds] |
| 11:49:21 | | midou joins |
| 11:59:21 | | Wohlstand quits [Quit: Wohlstand] |
| 12:00:03 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:45 | | Bleo182600722719623455222 joins |
| 12:22:20 | <justauser> | klea: 7z is "solid" by default, but you can switch it off. |
| 12:26:38 | | _null quits [Quit: Connection closed] |
| 12:28:05 | | _null (_null) joins |
| 12:35:35 | <klea> | justauser: thanks |
| 12:38:56 | <justauser> | I'm not sure whether that gives it a good index, though. |
| 12:39:40 | <h2ibot> | Sanqui edited Deathwatch (+222, Add HUMANITY stage deletion): https://wiki.archiveteam.org/?diff=59484&oldid=59069 |
| 12:39:41 | <h2ibot> | Sanqui edited Deathwatch (+124, fix links): https://wiki.archiveteam.org/?diff=59485&oldid=59484 |
| 12:39:43 | <justauser> | But at least it turns files into individually compressed streams. |
| 12:40:12 | <klea> | do you happen to have a recommended set of options? |
| 12:40:40 | <h2ibot> | Sanqui edited Deathwatch (+11, I put HUMANITY in the wrong year...): https://wiki.archiveteam.org/?diff=59486&oldid=59485 |
| 12:42:50 | <klea> | i've seen the wikiteam project use: 7z a -t7z -m0=lzma2 -mx=9 -scsUTF-8 -md=64m -ms=off archivefile.7z archivefolder |
| 12:51:03 | | Wohlstand (Wohlstand) joins |
| 12:55:30 | | sec^nd quits [Remote host closed the connection] |
| 12:55:53 | | sec^nd (second) joins |
| 13:25:58 | | midou quits [Ping timeout: 256 seconds] |
| 13:45:36 | <justauser> | I personally stick to the defaults when using 7z. The WikiTeam command might be a good one; "-ms=off" is what makes it non-solid. |
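A minimal sketch of a non-solid archive along the lines discussed here, driven from Python via the 7z CLI; the archive and member names are illustrative:

```python
# Create a non-solid 7z archive (-ms=off, as in the WikiTeam command above)
# so that a single member can later be extracted without decompressing the
# rest of the archive.
import subprocess

subprocess.run(
    ["7z", "a", "-t7z", "-m0=lzma2", "-mx=9", "-ms=off",
     "archivefile.7z", "archivefolder"],
    check=True,
)

# Pull out one member by its stored path; with solid mode off, only that
# member's compressed stream needs to be read.
subprocess.run(
    ["7z", "x", "archivefile.7z", "archivefolder/somefile.warc", "-oout"],
    check=True,
)
```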
| 13:47:58 | | _null quits [Client Quit] |
| 13:49:13 | | _null (_null) joins |
| 13:49:14 | | _null quits [Remote host closed the connection] |
| 13:50:46 | | _null (_null) joins |
| 13:54:31 | | etnguyen03 (etnguyen03) joins |
| 13:55:40 | | gosc joins |
| 13:56:24 | | _null quits [Client Quit] |
| 13:58:51 | | _null (_null) joins |
| 14:00:28 | | _null quits [Client Quit] |
| 14:01:59 | | _null (_null) joins |
| 14:06:22 | | midou joins |
| 14:12:22 | | _null quits [Client Quit] |
| 14:14:19 | | _null (_null) joins |
| 14:15:49 | | _null quits [Client Quit] |
| 14:17:46 | | _null (_null) joins |
| 14:23:13 | | _null quits [Client Quit] |
| 14:26:09 | | _null (_null) joins |
| 14:29:22 | | etnguyen03 quits [Client Quit] |
| 14:49:38 | | etnguyen03 (etnguyen03) joins |
| 14:56:15 | <klea> | thanks |
| 15:01:44 | | midou quits [Ping timeout: 256 seconds] |
| 15:05:47 | <cruller> | Btw the default command used by PeaZip is as follows: '/path1/res/bin/7z/7z' a -t7z -m0=LZMA2 -mmt=on -mx3 -md=4m -mfb=32 -ms=1g -mqs=on -sccUTF-8 -bb0 -bse0 -bsp2 '-w/path2/' -snl -mtc=on -mta=on '/path2/directory_name.7z' '/path2/directory_name' |
| 15:10:18 | | nexussfan (nexussfan) joins |
| 15:10:55 | | midou joins |
| 15:13:18 | | Dada quits [Remote host closed the connection] |
| 15:13:30 | | Dada joins |
| 15:26:08 | | TunaLobster quits [Quit: So long and thanks for all the fish] |
| 15:30:03 | | TunaLobster joins |
| 15:44:27 | | cyanbox joins |
| 15:45:34 | | Czechball joins |
| 15:56:30 | | szczot3k quits [Remote host closed the connection] |
| 15:56:44 | | szczot3k (szczot3k) joins |
| 16:11:52 | | szczot3k quits [Remote host closed the connection] |
| 16:13:36 | | SootBector quits [Remote host closed the connection] |
| 16:14:43 | | SootBector (SootBector) joins |
| 16:18:38 | <klea> | well, i chose -mx=9 for the command i gave |
| 16:18:47 | | szczot3k (szczot3k) joins |
| 16:27:59 | | ThreeHM quits [Quit: WeeChat 4.7.2] |
| 16:44:02 | | etnguyen03 quits [Client Quit] |
| 16:44:36 | | ThreeHM (ThreeHeadedMonkey) joins |
| 16:50:32 | | midou quits [Ping timeout: 256 seconds] |
| 16:59:49 | | midou joins |
| 17:05:07 | | _null quits [Client Quit] |
| 17:06:22 | | _null (_null) joins |
| 17:28:32 | <h2ibot> | KleaBot edited Main Page/Current Warrior Project (-6, Obtained data from WarriorHQ file): https://wiki.archiveteam.org/?diff=59487&oldid=58260 |
| 17:40:34 | | Dada quits [Remote host closed the connection] |
| 17:40:47 | | Dada joins |
| 17:43:01 | | _null quits [Client Quit] |
| 17:44:58 | | _null (_null) joins |
| 17:46:55 | | _null quits [Client Quit] |
| 17:50:19 | | etnguyen03 (etnguyen03) joins |
| 17:51:17 | | _null (_null) joins |
| 17:52:38 | | _null quits [Client Quit] |
| 17:55:40 | | _null (_null) joins |
| 17:56:37 | | _null quits [Client Quit] |
| 17:57:52 | | _null (_null) joins |
| 18:16:22 | | etnguyen03 quits [Client Quit] |
| 18:26:33 | | Kotomind quits [Read error: Connection reset by peer] |
| 18:26:45 | | Kotomind joins |
| 18:29:05 | <justauser> | f_: No AFAIK. |
| 18:29:44 | <justauser> | I think Kiwix still creates ZIMs for their projects, but that's it. |
| 18:30:36 | <justauser> | They are supposed to still be providing dumps, but now you have to log in and click through a promise not to use the data for such-and-such. |
| 18:40:36 | | gosc quits [Quit: Leaving] |
| 18:45:01 | | Basti joins |
| 18:45:05 | <Basti> | !stats eyeem BastiNOH |
| 18:46:44 | <Basti> | !stats telegram BastiNOH |
| 18:47:01 | <Basti> | !stats youtube BastiNOH |
| 18:48:10 | | Basti leaves |
| 18:48:13 | | Basti joins |
| 18:48:26 | | Basti quits [Read error: Connection reset by peer] |
| 18:49:31 | | lumidify quits [Remote host closed the connection] |
| 19:03:23 | | lumidify (lumidify) joins |
| 19:03:43 | | abirkill (abirkill) joins |
| 19:05:18 | | etnguyen03 (etnguyen03) joins |
| 19:07:23 | | HP_Archivist quits [Quit: Leaving] |
| 19:16:05 | | Webuser209288 joins |
| 19:16:49 | <Webuser209288> | Was this link from MS's older download center saved? http://www.microsoft.com/downloads/details.aspx?FamilyId=19957FF9-1CDF-4594-AC32-C9BDDDA4873C&displaylang=en |
| 19:19:04 | | Webuser209288 quits [Client Quit] |
| 19:20:58 | | Webuser349614 joins |
| 19:21:15 | | Webuser349614 quits [Client Quit] |
| 19:27:55 | | rdg leaves [WeeChat 4.4.2] |
| 20:01:13 | | Gadelhas562873784438 joins |
| 20:29:44 | | Dada quits [Remote host closed the connection] |
| 20:29:56 | | Dada joins |
| 20:35:19 | | abirkill quits [Remote host closed the connection] |
| 20:36:00 | | abirkill (abirkill) joins |
| 20:40:42 | | Island joins |
| 20:59:55 | | abirkill quits [Ping timeout: 272 seconds] |
| 21:11:27 | | abirkill (abirkill) joins |
| 21:26:51 | | chrismeller quits [Quit: chrismeller] |
| 21:27:49 | | DogsRNice joins |
| 21:28:54 | | chrismeller3 (chrismeller) joins |
| 21:29:05 | | Wohlstand quits [Remote host closed the connection] |
| 21:47:23 | | chrismeller3 quits [Client Quit] |
| 21:51:16 | <h2ibot> | Klea edited Obsidian (+92, obsidian supports custom domains): https://wiki.archiveteam.org/?diff=59488&oldid=58872 |
| 21:55:16 | <h2ibot> | Klea edited Obsidian (+38, Add example obsidian.md): https://wiki.archiveteam.org/?diff=59489&oldid=59488 |
| 21:59:08 | | chrismeller3 (chrismeller) joins |
| 22:03:45 | | mrminemeet quits [Read error: Connection reset by peer] |
| 22:07:00 | | mrminemeet joins |
| 22:12:56 | | Webuser212986 joins |
| 22:13:32 | | Webuser212986 quits [Client Quit] |
| 22:25:06 | | chrismeller3 quits [Client Quit] |
| 22:46:51 | | etnguyen03 quits [Quit: Konversation terminated!] |
| 23:15:29 | | abirkill quits [Client Quit] |
| 23:15:45 | | abirkill (abirkill) joins |
| 23:28:46 | | lennier2 joins |
| 23:30:47 | | chrismeller3 (chrismeller) joins |
| 23:31:55 | | lennier2_ quits [Ping timeout: 272 seconds] |
| 23:44:05 | | Dada quits [Remote host closed the connection] |
| 23:44:18 | | Dada joins |
| 23:49:36 | | APOLLO03 quits [Read error: Connection reset by peer] |
| 23:49:39 | | tzt quits [Ping timeout: 272 seconds] |
| 23:50:19 | | APOLLO03 joins |
| 23:54:21 | <masterx244|m> | https://tvmanifest.iac.asp.att.net/Manifests/ |
| 23:54:21 | <masterx244|m> | Random open directory that was linked on a reverse-engineering Discord |
| 23:57:14 | | Gadelhas562873784438 quits [Ping timeout: 256 seconds] |