00:01:44AmAnd0A quits [Ping timeout: 252 seconds]
00:02:14AmAnd0A joins
00:03:53sonick (sonick) joins
00:04:20AmAnd0A quits [Read error: Connection reset by peer]
00:04:37AmAnd0A joins
00:11:38fullpwnmedia joins
00:11:40AmAnd0A quits [Ping timeout: 265 seconds]
00:13:38AmAnd0A joins
00:14:44AmAnd0A quits [Read error: Connection reset by peer]
00:15:02AmAnd0A joins
00:33:50AmAnd0A quits [Ping timeout: 252 seconds]
00:34:37AmAnd0A joins
01:11:01Mateon2 joins
01:12:41Mateon1 quits [Ping timeout: 252 seconds]
01:12:41Mateon2 is now known as Mateon1
01:25:33Jonimus quits [Quit: WeeChat 3.3]
01:33:31Mateon2 joins
01:34:08Mateon1 quits [Ping timeout: 252 seconds]
01:34:08Mateon2 is now known as Mateon1
01:37:34tbc1887 quits [Client Quit]
01:37:57tbc1887 (tbc1887) joins
01:45:48za3k quits [Client Quit]
01:59:46AmAnd0A quits [Read error: Connection reset by peer]
02:02:08AmAnd0A joins
02:28:04icedice quits [Client Quit]
02:28:28icedice (icedice) joins
03:03:48systwi quits [Ping timeout: 252 seconds]
03:12:52systwi (systwi) joins
04:07:08Naruyoko joins
06:01:59JackThompson3 quits [Ping timeout: 252 seconds]
06:19:03hitgrr8 joins
07:16:41c3manu (c3manu) joins
07:48:10zhongfu quits [Quit: cya losers]
07:52:29zhongfu (zhongfu) joins
07:56:02zhongfu quits [Client Quit]
07:58:02tbc1887 quits [Ping timeout: 252 seconds]
07:59:59tbc1887 (tbc1887) joins
08:00:11zhongfu (zhongfu) joins
08:29:00<Hans5958>Freenom is a shitter
08:46:12lumidify quits [Quit: leaving]
08:48:33spirit quits [Client Quit]
08:59:02lumidify (lumidify) joins
09:01:24Naruyoko5 joins
09:03:08Naruyoko quits [Ping timeout: 252 seconds]
09:19:25BlueMaxima joins
10:13:29BlueMaxima quits [Read error: Connection reset by peer]
10:35:12systwi__ (systwi) joins
10:35:39systwi quits [Ping timeout: 265 seconds]
12:05:04Ruthalas5 quits [Ping timeout: 265 seconds]
12:11:31Ruthalas5 (Ruthalas) joins
12:32:52icedice quits [Client Quit]
12:45:29spirit joins
12:46:27icedice (icedice) joins
12:52:57icedice quits [Client Quit]
12:55:31icedice (icedice) joins
13:28:09HP_Archivist (HP_Archivist) joins
14:35:03<HP_Archivist>How would one go about archiving this page that hosts what appears to be scanned magazine or article pages, but only allows one download per file individually vs all at once?
14:35:10<HP_Archivist>https://krantenbankzeeland.nl/issue/pzc/1988-06-11/edition/0/page/21
14:41:04za3k joins
14:47:03sec^nd quits [Remote host closed the connection]
14:47:25sec^nd (second) joins
15:36:31<joepie91|m>I don't suppose someone here knows a way to stash 500TB of historical package builds for cheap? ref. https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672
15:49:12HP_Archivist quits [Read error: Connection reset by peer]
15:49:51HP_Archivist (HP_Archivist) joins
16:05:53AmAnd0A quits [Read error: Connection reset by peer]
16:07:37AmAnd0A joins
16:16:52<icedice>joepie91|m: 4 x 150TB Hetzner dedis resold by hostingby.design (formerly walkerservers/seedbox.io): https://hostingby.design/dedi-hetz/
16:18:26<joepie91|m>hm, what are they offering beyond what hetzner offers?
16:18:45<icedice>4 x 164,95€/month = 659,80€/month
16:18:48<icedice>Cheaper prices
16:18:54<joepie91|m>how's that work o_O
16:19:06<icedice>Volume discounts, I think
16:19:11<icedice>The owner is solid though
16:19:40<icedice>walkerservers/Seedbox.io has a great reputation
16:20:53<joepie91|m>mm
16:21:01joepie91|m will keep it in mind
16:21:48<icedice>https://www.hetzner.com/dedicated-rootserver/matrix-sx
16:21:52<icedice>^ For comparison
16:22:22<icedice>Hetzner charges 247,28€/month for 160TB
16:22:41<joepie91|m>right
16:23:11<icedice>And 404,36€/month for 224TB
16:23:13<icedice>Hmm
16:24:14<icedice>Seeing as the specs don't match, 150TB vs 160TB, I guess Hostingby.design co-locates with Hetzner and builds their own servers?
16:25:01<icedice>Not sure about volume discounts at Hetzner btw, but I know they at the very least have volume discounts at Leaseweb NL
16:25:30<icedice>And that's why their Leaseweb dedis have that good pricing
16:25:57<icedice>They're also pretty privacy-friendly
16:26:34<joepie91|m>that's less of a concern for this case :)
16:26:45<icedice>Yeah, I know
16:27:00<icedice>I'm just saying
16:27:04<joepie91|m>right
16:27:04<icedice>They seem pretty nice
16:29:32<icedice>Normally with Hetzner I would recommend tunneling it through a reverse proxy at some more chill hosting provider like BuyVM, but I guess NixOS doesn't have any major trolls in their community
16:29:56<icedice>Since Hetzner will respond to bogus complaints made by anyone and take servers down for it
16:30:39<joepie91|m>I wouldn't expect trouble with that in this case yeah
16:31:12<icedice>Yeah, it'll probably be all right
16:35:06<icedice>Do GitHub and GitLab have any limits?
16:36:16<icedice>Since it's open source software it should be possible to host it there
16:36:50<icedice>Would be a pain in the ass to upload 500TB of releases there (assuming they'd allow it), but there might be some way to automate it?
16:38:08<icedice>There's also SourceForge
16:38:08<spirit>google drive *ducks and hides*
16:38:28<icedice>Anything over 100TB will require manual approval from Google
17:14:16<imer>spirit: not anymore unfortunately
17:14:23<imer>5TB/user
17:14:24<spirit>i know, i was kidding
17:14:29<spirit>sorry
17:14:38<imer>you're fine :)
17:14:48<joepie91|m>icedice: anything that might violate ToS is pretty much off the table unfortunately
17:15:00<imer>just dealing with that migration currently lol they send me two emails a day
17:15:06<joepie91|m>this includes things that are reliant on goodwill of interpretation :)
17:15:14<imer>"hey did you know you're over the storage limit?"
17:15:17<joepie91|m>(as that's what I'd like to get away from)
17:22:25<imer>joepie91|m: it really depends on how valuable the data is, i'd say colo a box (or two for redundancy) with a bunch of drives (raw you'd need like 28x18tb, so like 35x18tb with 8+2 raidz2 vdevs? - that's easily doable) if you can tolerate data => gone in the unlikely event of the dc exploding
17:23:37<imer>higher up-front cost, but then you're only paying for colo/the odd replacement drive
17:24:32<imer>of course only if there's sysadmin knowledge, probably a bad idea otherwise
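imer's drive-count estimate works out if you round at the drive level rather than per whole vdev; a quick sanity check (drive size and 8+2 raidz2 layout taken from the message above, everything else arithmetic):

```python
import math

RAW_TB = 500         # data to store
DRIVE_TB = 18        # per-drive capacity
DATA, PARITY = 8, 2  # 8+2 raidz2: 8 data + 2 parity drives per vdev

data_drives = math.ceil(RAW_TB / DRIVE_TB)                      # 28 drives raw
total_drives = math.ceil(data_drives * (DATA + PARITY) / DATA)  # 35 with parity
print(data_drives, total_drives)  # 28 35
```

(Rounding up to whole 10-drive vdevs instead would mean 4 vdevs, i.e. 40 drives.)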
17:26:02<joepie91|m>colo is not really an option for organizational constraint reasons
17:26:07<joepie91|m>(I wouldn't be the one managing it)
17:26:35<imer>yeah, makes sense
17:26:42<joepie91|m>it'll probably be possible to argue for some software-level maintenance but once it requires someone to show up somewhere physically in person it'll very likely not be considered as an option :p
17:28:19<hexa->I have not read nixpkgs on my screen yet, but
17:28:27<hexa->ah, there it is :D
17:30:55<FireFly>hehe
17:31:40<hexa->should we at some point go the self-host route I would hope to scale horizontally, not vertically
17:31:50<hexa->being able to do maintenance without really affecting anyone
17:32:04<hexa->losing a machine, and not having to spend the night to fix shit
17:32:38<hexa->and I remember that when I started using nixpkgs in 2019 it was said that the cache was between 200 and 250 TB
17:32:45<hexa->so growth is another issue that hasn't been talked about much yet
17:32:46<icedice><joepie91|m> icedice: anything that might violate ToS is pretty much off the table unfortunately
17:33:08<icedice>Yeah, it would have to be according to ToS obviously otherwise you put the data at risk
17:33:32<icedice>But I was wondering if GitHub/GitLab/SourceForge actually have any storage limits in place
17:33:48<icedice>Most projects probably don't take up that much storage space
17:34:14<hexa->we only host the nixpkgs.git with github, and that is much smaller
17:34:52<hexa->the 450TB is source tarballs that we likely can't get back anymore, and build results from the last 10y or so
17:41:57AlsoHP_Archivist joins
17:41:57HP_Archivist quits [Ping timeout: 265 seconds]
17:48:34AmAnd0A quits [Ping timeout: 252 seconds]
17:49:03AmAnd0A joins
17:55:45spirit quits [Client Quit]
17:56:00<nicolas17>hexa- joepie91|m: how do I download some files from there?
17:56:33<joepie91|m>nicolas17: huh?
17:57:20<nicolas17>I want to get some package builds to see what the data is like, but there's no file listing or anything like that :)
17:57:48<joepie91|m>ah
17:57:56<joepie91|m>that's uh, slightly nontrivial
17:58:18<joepie91|m>easiest way is to pick out something from hydra.nixos.org, look up the narinfo hash which I believe is in there, and go from there
17:59:13<joepie91|m>then cache.nixos.org/<narhash>.nar or .nar.xz I believe
17:59:21<joepie91|m>it contains references to other stuff
17:59:53<joepie91|m>generally speaking each build has a .nar (package metadata), .ls (internal file listing), and then a tarball containing the actual package
18:00:44<hexa->actually .narinfo I think
18:00:58<joepie91|m>oh
18:01:02<joepie91|m>hm, right
18:01:14<joepie91|m>yeah please apply a "from memory" disclaimer to all of the above :p
18:01:48<hexa->http://cache.nixos.org/rri3iysi83ajafbc21qhjxn25hw8xmn3.narinfo
18:01:53<hexa->that's firefox 113.0.2 from nixos-23.05
18:02:35<hexa->it references URL: nar/17v5jakk2aj09njy4w5v5lmwqsnd17hqv6wyvphkwywj16w17b0b.nar.xz
18:02:47<hexa->and that is the url below cache.nixos.org again
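The narinfo hexa- quotes is plain `Key: value` lines; a small sketch of turning one into a download URL (the `URL` field name matches the example above, the rest is illustrative):

```python
from urllib.parse import urljoin

CACHE = "https://cache.nixos.org/"

def nar_url(narinfo_text: str) -> str:
    """Find the URL field in a .narinfo and resolve it against the cache."""
    for line in narinfo_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key == "URL":
            return urljoin(CACHE, value)
    raise ValueError("no URL field in narinfo")

# the line quoted above for firefox 113.0.2:
example = "URL: nar/17v5jakk2aj09njy4w5v5lmwqsnd17hqv6wyvphkwywj16w17b0b.nar.xz"
print(nar_url(example))
# https://cache.nixos.org/nar/17v5jakk2aj09njy4w5v5lmwqsnd17hqv6wyvphkwywj16w17b0b.nar.xz
```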
18:02:52spirit joins
18:05:16thuban quits [Read error: Connection reset by peer]
18:05:48thuban joins
18:23:34<nicolas17>urgh
18:23:48<nicolas17>hexa-: that one is actually firefox debug symbols
18:23:58<nicolas17>which are internally compressed :/
18:24:59nicolas17 hunts for the actual binary
18:26:26<hexa->oh, sorry :D
18:26:48<nicolas17>there we go, 08709hdrbqqba4l9zgsi0kqmlklznym1fl3n82s8i9rwga3bggnm.nar 11a4ymjgl6kpmw1yxggr4mvk3835md3wsdh7yhskxgbafczayhmb.nar
18:26:53<hexa->95w0f7cvwrf195j2d83fplzcyykrnq9i
18:26:58<hexa->is the one for firefox
18:37:28<@JAA>500 TB, wow. All of Debian package history is like 150.
18:38:33<joepie91|m>JAA: sure, but https://repology.org/graph/map_repo_size_fresh.svg
18:38:37<joepie91|m>:p
18:38:52<@JAA>Yeah, fair, lol
18:40:27<@JAA>Are there any similar statistics for data size somewhere?
18:40:32<nicolas17>I think some deltaing or deduplication would be possible, but so far it seems it will need significant amounts of CPU and may not save enough disk :/
18:41:16<ehmry>I'd like to know if there is a way to drop the builds and keep the cached sources
18:41:45<ehmry>I suppose that might involve traversing the entire nixpkgs git history to find the tarballs
18:45:17<nicolas17>I can take the firefox-unwrapped-113.0.2 .nar.xz (60MB), apply a delta to it (15MB xdelta3, or 3.5MB bsdiff), compress it back with xz -6, and get a bit-identical result to the original firefox-unwrapped-113.0.1 .nar.xz
18:45:24<nicolas17>but the recompression takes 2 minutes so if you do it on the fly the user gets a 500KB/s download while the server is fully using a CPU core
18:46:58<nicolas17>missed a step while editing my message :P of course I decompress the .xz to apply the delta
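The recompression step relies on xz being deterministic for fixed settings. A minimal illustration with Python's lzma module (preset 6 mirrors `xz -6`, though whether the real cache files match this encoder's container settings exactly is an assumption):

```python
import lzma

# stand-in for an uncompressed .nar
payload = bytes(range(256)) * 4096

# the compressed artifact as the cache might hold it
original_xz = lzma.compress(payload, preset=6)

# archival scheme: store the payload once (or a binary delta against a
# sibling version) and regenerate the .nar.xz on demand; determinism
# makes the regenerated file bit-identical
regenerated = lzma.compress(lzma.decompress(original_xz), preset=6)
assert regenerated == original_xz
```

The CPU cost nicolas17 measures is exactly this regenerate-on-the-fly step.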
18:51:50<ehmry>I've done some experiments and EROFS images are good at being deduped at a block level. in EROFS runs of compression don't go longer than a fixed block size
18:52:06<@JAA>icedice, joepie91|m: One note on Hetzner, you can usually get cheaper deals through the auction servers. As it happens, I just went over that the other day. At the time, they had a server with 15x10TB at €1.22/TB/mo with a raidz3-type setup.
18:52:26<ehmry>EROFS isn't designed as an archive format though
18:53:14<nicolas17>ehmry: block deduplication would be great if there's a tar or nar with some changed files and some identical ones
18:53:43<nicolas17>but firefox is mostly large amounts of code, and recompiling a binary with small code changes can cause lots of tiny changes all over the file
18:54:49<nicolas17>the two .nar.xz I tried add up to 121MB, decompressing them and storing them in borg (block deduplication + compressing the blocks with lzma) takes 112MB
18:57:48<nicolas17>storing one .nar.xz + the bsdiff takes 64MB, at significant CPU cost :/
18:59:07<nicolas17>ehmry: actually even with erofs it'd eat CPU since you have to recompress the .nar to get the bit-identical .nar.xz back
19:00:30<nicolas17>and *nothing* will be able to dedup/delta (erofs, borg, restic, xdelta3, bsdiff...) if you don't unxz first
19:02:01za3k quits [Client Quit]
19:08:57<nicolas17>hexa-: "we lose the historical data because AWS egress fees are enormous, have we considered using AWS Snowmobile?" lol did this guy not see the specs?
19:11:08<nicolas17>snowmobile is for transferring 100PB
19:11:19<nicolas17>AWS tells you "that will cost you $1M" and you say "wow for this much data that's so much cheaper than my alternatives"
19:11:53<immibis>then you use snowball which is smaller
19:12:06<nicolas17>exactly
19:12:07<immibis>(Snow Family is a stupid name, CMV)
19:12:25<immibis>(also what the fuck is the point of the products that put EC2 instances at your site)
19:12:28<nicolas17>snowmobile is absolutely overkill for this data, it fits in 2 snowballs
19:12:32<fireonlive>when they released the aws snowball i laughed for a solid 5 minutes
19:12:39<fireonlive>no reason of course
19:12:46<nicolas17>fireonlive: did you see the snowmobile announcement video?
19:12:47<immibis>snowball used to be just a big hard drive but now they let you use it for compute and they sell different mixes of compute and storage... why??
19:12:58<fireonlive>oh is that the semi trailer lmao
19:13:09<@JAA>Shipping container of hard drives, yup.
19:13:15<nicolas17>fireonlive: yep
19:13:15<immibis>snowmobile is the semi trailer, snowball is a big briefcase, i think they added a mini one that is just an external hard drive
19:13:29<@JAA>'Snowcone' for the small one.
19:13:29<nicolas17>immibis: the mini one has mini EC2 instances too
19:13:32<fireonlive>ah yes!
19:13:43<immibis>nicolas17: whyyyyyyyy
19:13:47<fireonlive>protip to execs: when naming things, pop it and close variants into urban dictionary
19:13:49<nicolas17>fireonlive: https://www.youtube.com/watch?v=8vQmTZTq7nw when they rolled said semitrailer into the stage
19:14:38<fireonlive>'successful startups, given the way they collect data'
19:14:41<fireonlive><_<
19:15:04<fireonlive>ahhh right this lmao
19:15:04<nicolas17>unproven startups paying for data storage with investor money*
19:15:09<fireonlive>yeee
19:15:36<fireonlive>i do wonder the engineering behind the stage to support a semi
19:15:47<fireonlive>oh it's not on the stage
19:16:09<nicolas17>also I'm sure it's empty
19:16:14<fireonlive>ah ye true
19:16:22<nicolas17>that's an empty container, not full of 100PB disks :P
19:16:31<fireonlive>if there's one line that will always go up no matter what it's AWS' pricing :3
19:18:13<nicolas17>I actually think it's very rare that they raise prices
19:18:25that_lurker quits [Client Quit]
19:18:44that_lurker (that_lurker) joins
19:19:27<immibis>i feel like it would've been more entertaining if they'd had the truck come onto the stage without any fanfare, like just sneaking up behind the presenter while he's talking
19:20:32<@JAA>Or if the container had randomly collapsed while given a pat or something. Oh wait, wrong company.
19:20:50<nicolas17>immibis: https://www.youtube.com/watch?v=8_Xs8Ik0h1w&t=3225s
19:21:29<immibis>nicolas17: do you understand the point of the AWS on-premises stuff?
19:21:54<nicolas17>afaik most of cloud stuff is about moving things from the cap-ex budget line to the op-ex budget line
19:22:24<immibis>so basically corporate bullshit
19:22:40<immibis>can I get rich quick if I just buy servers for companies and have them pay monthly?
19:22:40<nicolas17>"it costs more in the long run" lol imagine caring about the long run
19:23:15<nicolas17>immibis: the number of VPS companies in existence seems to imply yes
19:24:46<joepie91|m>lol
19:25:04<joepie91|m>yeah that's just "being a hosting provider" really
19:25:21<joepie91|m>it's profitable if it's managed
19:25:27<nicolas17>but what do I know
19:25:33<nicolas17>they say running a company is like raising a child
19:25:37<nicolas17>and I want a vasectomy
19:25:38<joepie91|m>margins on unmanaged are razor thin though
19:26:19<immibis>nicolas17: they're providing flexibility, though - I mean literally just buy-now-pay-later for servers. Is the industry really that stupid?
19:26:31<immibis>(later = every month for the depreciation life of the server)
19:26:34<FireFly>the snow- naming doesn't make much sense, but.. I always found it a lil cute tbh
19:26:46<immibis>it's a snowball because it's cold storage, I guess
19:26:59<FireFly>I guess
19:27:11<immibis>or maybe they were thinking of continuously changing data and you are uploading a frozen snapshot
19:27:16<nicolas17>Backblaze's box is called Fireball
19:27:50<immibis>deezballs
19:28:51<immibis>nicolas17: what was the point of this link https://www.youtube.com/watch?v=8_Xs8Ik0h1w&t=3225s
19:32:24<FireFly>that collage of icons just reminds me of the aws logo quiz
19:40:28<fireonlive>*hard drives just spill out of the back of the snowmobile*
19:42:12<fireonlive>hm, so much 'direct to inbox' phishing from "trustwallet" for me lately
19:43:16<fireonlive>cashin' google slippin'
19:46:15<hexa->nicolas17: at this point in time a lot of weird ideas are being floated by random people
19:46:49<hexa->people I have never heard of in the nixpkgs context
19:47:42<hexa->imo the most viable paths are going with b2 or r2 and their solution to kill the egress fees
19:47:53<hexa->b2 would mean we could keep fastly around
19:48:02<hexa->not sure how that would work with cloudflare in the picture
19:48:20<hexa->ultimately we should selfhost that shit for cost reasons
19:48:37<nicolas17>immibis: werner saying "we have a lot of services but it's your fault"
19:49:00<hexa->hydra already sits at hetzner, so why shouldn't the cache as well?
19:49:15<hexa->instead it sits at us-east-1 and we're pushing traffic over the fucking atlantic
19:49:32<hexa->after receiving it from builders who are also located in locations like dallas or washington
19:49:43<nicolas17>I'm working on that deduplication stuff anyway, for Apple archival reasons :P
19:50:18<nicolas17>I know someone with upwards of 50TB of Apple updates stored in a NAS and it should be possible to shrink that number a *lot*
19:50:18<hexa->the primary thing to find out is … who has tons of experience with an s3 compatible object store in that weight class
19:50:38<hexa->https://discourse.nixos.org/t/the-nixos-foundations-call-to-action-s3-costs-require-community-support/28672/53?u=hexa
19:50:51<hexa->this guy is a lot smarter than me and always worth a read tbh
19:51:15<hexa->think he talked about fastcdc
19:51:24<hexa->> FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
19:57:01<nicolas17>yeah but again it depends on having decompressed files
19:57:22<joepie91|m>they definitely know what they're talking about, but they also definitely have a very... CDN sysop perspective :p
19:58:13<joepie91|m>like I mentioned in my reply, it's all good and well that you need specific infrastructure characteristics for optimal performance, but if we can't afford it then we can't afford it
19:59:19<joepie91|m>this is something that I often run into with people who work a lot with AWS-y stuff - their entire way of reasoning about infrastructure is super strongly focused on the exact technical properties provided by those specific systems, with kind of the assumption of unlimited budget and no apparent experience with how to get the most out of a shoestring budget
19:59:34<FireFly>the approach flokli mentioned experimenting with is interesting too (git-style tree/blob storage with deduplication, synthesizing nar files on demand)
19:59:42<nicolas17>and taking Firefox out of the deduplicated data store and compressing it to produce a file identical to the current .nar.xz has a 500KB/s output on my laptop...
19:59:45<joepie91|m>often presenting the 'perfect approach' as if it's the only acceptable approach
19:59:56<joepie91|m>(this is not specific to that user; it's something I encounter a lot)
20:02:44<hexa->nicolas17: good point
20:03:05<hexa->FireFly: tvix when
20:03:06<hexa->sorry :D
20:03:07<nicolas17>for archival that may be good enough
20:03:37<hexa->and they'd likely want something offering an s3 api again
20:03:41<nicolas17>but if you want decent latency then no
20:04:04<hexa->so minio, garage, ceph, etc.
20:04:12<FireFly>hexa-: I mean yeah, that's the problem :p not as a replacement today, but maybe in the medium-longer term
20:04:20<FireFly>(tvix-store-based substituter I mean)
20:04:27<hexa->yeah, I said what I think about what should happen next
20:04:30<hexa->r2 or b2
20:04:34<hexa->we need to leave s3
20:04:55<nicolas17>eg. my friend with the Apple archive often has his NAS turned off, so if he needs to wait for it to boot up and spin up the disks etc, he's already not particularly caring about TTFB :P
20:05:02<hexa->even paying 32k USD once beats staying for 4 months
20:05:12<hexa->hah, yeah :D
20:05:14<joepie91|m>hexa-: I think we can make the egress much lower
20:05:21<joepie91|m>by exploiting the lightsail loophole
20:05:22cdreimanu (c3manu) joins
20:05:36cdreimanu quits [Remote host closed the connection]
20:05:37<joepie91|m>does require a bit of elbow grease
20:05:38<nicolas17>joepie91|m: that's explicitly in the ToS as "don't", but also doesn't it give you like 1TB free?
20:05:45<joepie91|m>oh, it is?
20:05:49<hexa->both r2 and b2 sound like they waive the egress fees some way
20:05:59<joepie91|m>and well I was thinking of the $5/mo plan actually
20:06:14<joepie91|m>times N
20:06:19<hexa->yeah, fcking aws like that would agree with me
20:06:29<nicolas17>"51.3. You may not use Amazon Lightsail in a manner intended to avoid incurring data fees from other Services"
20:06:36<joepie91|m>bleh
20:06:44<hexa->which means they know their data fees are fucked up
20:06:49<joepie91|m>oh of course
20:06:49<hexa->q.e.d.
20:06:52<joepie91|m>it's literally their business
20:07:07<joepie91|m>egress is where AWS/Azure/GCP/etc. turn the juicy profits, by design
20:07:11<immibis>nicolas17: i did an experiment with deduplicating minecraft modpacks and they deduplicate very nicely. The sum total of all modpacks in that collection deduplicated from about 400GB down to about 5GB, just with per-file deduplication and no diffing similar files. (It helps that every Java class is a separate file)
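Per-file deduplication like that modpack experiment is just a content-addressed store; a minimal in-memory sketch (SHA-256 as the address, all names hypothetical):

```python
import hashlib

def dedup_store(packs):
    """packs: {pack_name: {path: file_bytes}}. Store each unique blob
    once; keep per-pack manifests mapping path -> content hash."""
    store, manifests = {}, {}
    for pack, files in packs.items():
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            store.setdefault(digest, data)  # shared files land here once
            manifest[path] = digest
        manifests[pack] = manifest
    return store, manifests

# two hypothetical modpack versions sharing one unchanged class file
packs = {
    "pack-1.0": {"mod/A.class": b"AAAA", "mod/B.class": b"BBBB"},
    "pack-1.1": {"mod/A.class": b"AAAA", "mod/B.class": b"BBB2"},
}
store, manifests = dedup_store(packs)
assert len(store) == 3  # 4 files, but A.class is stored only once
```

The one-file-per-Java-class layout is what makes this naive approach pay off so well there.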
20:07:30<joepie91|m>because it's the one metric that a) almost nobody can accurately estimate, b) overages are a thing for, and c) nobody thinks about until it appears on the invoice
20:07:36<fireonlive>egress fees are... egre....gious ... :D
20:07:49<hexa->thanks :D
20:07:55<immibis>chunking might not be the best strategy. If you have a version order to go by, it may make more sense to diff the same files from adjacent versions
20:07:55c3manu quits [Ping timeout: 265 seconds]
20:08:10<hexa->I agree that conjuring up the .narinfo file would sound very sweet
20:08:14<immibis>(binary diff obviously)
20:09:07<immibis>PSA for small AWS users: as well as the 100GB free egress per month, you have an additional free 1000GB if you use cloudfront, even if you don't actually need cloudfront
20:12:25sonick quits [Client Quit]
20:12:31c3manu (c3manu) joins
20:35:05<nicolas17>immibis: yeah, I could do diffing or I could do content-defined chunking... but that's not the hard part
20:35:18<nicolas17>the hard part is reproducible recompression
20:37:20<nicolas17>need to extract zip files such that I can put the bitwise-identical zip back together later, and do the deltas or cdc on the extracted files
20:41:40<immibis>I did that. My implementation is really horrible and slow, though.
20:43:19<immibis>I noticed that most files in my sample are LZ-compressed the obvious way using the most recent match, but some have "sub-optimal" matches all over the place, which I guess could be from a fancy algorithm designed to optimize the LZ+huffman together instead of each one separately, or a compressor with memory limitations
20:44:11<nicolas17>oh that's awful
20:44:21<nicolas17>in my case I can just use zlib
20:44:30<immibis>deflate is LZ77+huffman
20:44:57<nicolas17>yeah I mean, zlib produces an identical result, vs having to match the exact behavior of a different deflate implementation
20:45:57<immibis>my implementation of reproducible LZ was to see how many of the next bytes could be encoded as one symbol (literal or back-reference), then note all possible ways to encode that symbol, then write which one appears in the compressed data. Usually this just gives you a long string of 0s; sometimes it doesn't.
20:46:09<immibis>(0 because of course you order them so the most optimal one is number 0)
20:46:27<immibis>nicolas17: if you know they were compressed by a certain version of zlib with certain settings, you just embed that
20:46:38<nicolas17>yep
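If the compressor settings aren't known up front, they can often be recovered by brute force: decompress once, then try candidate levels until one reproduces the original bytes (a sketch; real streams may also differ in strategy, window bits, or zlib version, in which case this search returns None):

```python
import zlib

def find_reproducing_level(compressed: bytes):
    """Return a zlib level whose recompression of the payload matches
    the original bytes exactly, or None if none of them do."""
    raw = zlib.decompress(compressed)
    for level in range(10):
        if zlib.compress(raw, level) == compressed:
            return level
    return None

original = zlib.compress(b"some member data " * 2000, 6)
level = find_reproducing_level(original)
assert level is not None
assert zlib.compress(zlib.decompress(original), level) == original
```

Embedding the recovered level alongside the decompressed data is enough to reconstruct the stream bit-for-bit later.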
20:47:38<nicolas17>...and then deal with the dmg inside the zip (which in most versions is stored uncompressed in the zip because it has its own block-wise zlib)
20:48:04<nicolas17>or nowadays lzfse
20:48:30<nicolas17>and make up a file format to store those layers and the instructions on how to recompress them
20:48:38<immibis>zip container is easy, you just extract all the parts that are not the file data, and store them separately, and then recurse into the file data, since you assume the file size is large compared to the metadata size
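That split can be sketched with `zipfile` plus the local-file-header layout (30 fixed bytes, then name and extra field); the invariant to preserve is that concatenating the segments gives back the original archive bit-for-bit:

```python
import io
import zipfile

def split_zip(raw: bytes):
    """Split raw zip bytes into ('meta', ...) and ('data', ...) segments.
    Metadata (headers, central directory) is kept verbatim; the 'data'
    segments are the compressed member payloads you would recurse into."""
    zf = zipfile.ZipFile(io.BytesIO(raw))
    segments, pos = [], 0
    for zi in sorted(zf.infolist(), key=lambda z: z.header_offset):
        off = zi.header_offset
        # local header: 30 fixed bytes, then file name, then extra field
        name_len = int.from_bytes(raw[off + 26:off + 28], "little")
        extra_len = int.from_bytes(raw[off + 28:off + 30], "little")
        data_start = off + 30 + name_len + extra_len
        data_end = data_start + zi.compress_size
        segments.append(("meta", raw[pos:data_start]))
        segments.append(("data", raw[data_start:data_end]))
        pos = data_end
    segments.append(("meta", raw[pos:]))  # central directory + EOCD
    return segments

# round-trip check on a small archive built in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("a.txt", "hello " * 100)
    z.writestr("b.txt", "world " * 100)
raw = buf.getvalue()
assert b"".join(part for _, part in split_zip(raw)) == raw
```

(Archives using data descriptors, zip64, or encryption need more care than this sketch takes.)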
21:11:53<icedice>JAA: Right, I forgot about that
21:11:56<icedice>Yeah, true
21:12:14<icedice>I'd rather deal with hostingby.design than Hetzner though
21:12:20<icedice>But whatever gets the best pricing
21:39:33hitgrr8 quits [Client Quit]
21:39:45sec^nd quits [Remote host closed the connection]
21:47:08<icedice>joepie91|m: I'm pretty sure you don't want to delve into the decentralized hosting rabbit hole, but I figured I'd send this anyway because why not:
21:47:10<icedice>https://pixeldrain.com/about
21:47:23<icedice>https://pixeldrain.com/hosting
21:47:47<icedice>^ File hosting site using Sia and their requirements from hosters
21:48:22<joepie91|m>I'm interested in decentralized storage mechanisms, but not in cryptogrifts :)
21:49:25<nicolas17>was Sia the one where the admin defended proof-of-work and "incompatible hashing algorithm to ensure mining hardware for other blockchains can't be reused"?
21:50:27<nicolas17>"i wonder if Apple is having to pull icons from popular reddit clients from WWDC slides -- apollo has found its way in them frequently in the past years" oh no
21:54:27c3manu quits [Client Quit]
21:55:55sec^nd (second) joins
22:00:15<fireonlive>i miss live apple keynotes
22:03:25<nicolas17>gonna be a busy monday for me
22:03:56<nicolas17>there's already 2 things I found in server-side config files that I won't know the *meaning* of until I dig into the iOS 17 beta
22:04:55<fireonlive>oooh
22:05:21<fireonlive>too bad one of them isn't <iMessageLessShit>true</...
22:05:24<fireonlive>:p
22:05:28<nicolas17>I swear they're using obscure abbreviations on purpose
22:05:48<nicolas17>home-rmvfsumbom=10.3.1
22:05:52<nicolas17>home-rmvfomdmwosu-internal=10.6
22:06:09<immibis>the only good proof-of-work is RandomX
22:06:44<imer>"rmvfomdmwosu" that's just someone's cat walking across the keyboard
22:06:47<nicolas17>one of my theories is "required minimum version for something something something without software update"
22:07:03<immibis>(it's designed so that a CPU *is* essentially a RandomX ASIC, and only small gains are possible by specializing it for RandomX)
22:07:34<fireonlive>hm
22:07:45<immibis>mdm = mobile device management?
22:08:07<nicolas17>unlikely because there's rmvfoodmwosu and rmvfordmwosu too (single letter changed)
22:08:14<nicolas17>bottom of http://init.ess.apple.com/WebObjects/VCInit.woa/wa/getBag?ix=5
22:08:37<fireonlive>my first thought was 'matter' because of home(kit) but eh probably not
22:08:52<nicolas17>home- likely refers to Apple Home stuff yeah
22:09:51<fireonlive>can i zoom forward to when everything uses this already: https://csa-iot.org/all-solutions/matter/ lol
22:10:51<nicolas17>the other is in Apple Maps, there's a config file in protobuf format, and to get names for the protobuf fields I have to decompile the code, new fields already appeared that iOS 16 doesn't have names for
22:11:03<fireonlive>sadly i don't think they're collabing on 'cast <video/audio/screen> to device Y'
22:11:19<nicolas17>95 { 1: 1684690460826; 2: 0; }
22:13:05<nicolas17>and 57 { 1: "https://gsp57-ssl-bcx.ls.apple.com/dispatcher.arpc" }
22:15:08<fireonlive>:o
22:20:02<nicolas17>https://gspe35-ssl.ls.apple.com/geo_manifest/dynamic/config?os=ios&os_version=17.0 who's up for archiving this daily for every x.y version into WARCs? :)
23:08:04Miori quits [Remote host closed the connection]
23:19:31AlsoHP_Archivist quits [Client Quit]
23:20:03HP_Archivist (HP_Archivist) joins
23:38:05sonick (sonick) joins
23:39:07BlueMaxima joins