00:05:27TheTechRobo is now known as OriginalUsername
00:05:33<pabs>arkiver: yeah, I thought so. btw, for snapshot.d.o, the current IA download speeds would be a big issue
00:05:34OriginalUsername is now known as TheTechRobo
00:06:41<nicolas17>as a secondary backup I think IA's speeds are fine tbh
00:16:13<nicolas17>put a cache proxy in front, if a file is requested often it will be cached, if it's not requested often then you should be glad IA is as fast as it is and we didn't have to send someone to find the correct LTO tape :D
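nicolas17's cache-proxy idea can be sketched in a few lines: frequently requested files are served from local disk, and only a cache miss pays for the slow IA fetch. This is a hypothetical illustration; the cache directory, hashing of the URL as the cache key, and lack of any eviction policy are all assumptions, not anything snapshot.d.o or IA actually runs:

```python
import hashlib
import os
import urllib.request

CACHE_DIR = "/tmp/ia-cache"  # hypothetical cache location

def fetch(url: str) -> bytes:
    """Fetch url, serving from a local disk cache when it was requested before."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):
        # Cache hit: popular files never touch the slow origin again.
        with open(path, "rb") as f:
            return f.read()
    # Cache miss: fetch from the origin once, then keep a copy on disk.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return data
```

In practice one would use an off-the-shelf caching reverse proxy (nginx, Varnish, Squid) rather than hand-rolled code, but the principle is the same.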
00:18:09<pabs>as an offline backup, sure. as a live user-facing replica, no.
00:18:20<pabs>Debian people already complain about how unreliable snapshot.d.o is, we need more replicas to fix that
00:19:20<pabs>we definitely need to do more and smarter caching stuff; currently it doesn't redirect to the SHA1, it just serves the file
00:20:09<nicolas17>it's deduplicating across the whole dataset based on whole-file hash, right?
00:20:47BearFortress joins
00:22:48<@arkiver>pabs: you meant IA as an actively used backup users can pull data from?
00:22:53<@arkiver>yeah not sure if that will work out
00:23:03<pabs>yeah
00:23:14<@arkiver>but as 'just an extra backup' from which data is not actively downloaded, it could be used
00:23:28<pabs>nicolas17: right
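The whole-file-hash deduplication being described (each distinct blob stored once, keyed by its hash, with pabs mentioning SHA1 above) can be sketched as a simple index builder. `dedup_index` is a hypothetical helper for illustration, not snapshot.d.o code:

```python
import hashlib

def dedup_index(paths):
    """Group file paths by the SHA-1 of their full contents.

    Files landing in the same group are byte-identical and can be
    stored once, with the other paths kept as references.
    """
    index = {}
    for p in paths:
        h = hashlib.sha1()
        with open(p, "rb") as f:
            # Stream in 1 MiB blocks so huge .debs don't need to fit in RAM.
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        index.setdefault(h.hexdigest(), []).append(p)
    return index
```

Whole-file hashing only deduplicates exact duplicates; two package revisions that differ by one byte share nothing, which is what motivates the chunking and delta discussion below.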
00:25:05<pabs>for SWH, I believe they are aiming for more of the 'extra backup' thing
00:27:04<flashfire42>How many fucking backups of debian do we need? everyone and their grandmother hosts one. if IA is slow there are at least 30 more that aren't, right?
00:27:53<flashfire42>I mean more backups is better, but if they're complaining about debian backups XD
00:28:15<@arkiver>pabs: that is what IA might be used for yeah
00:28:24<@arkiver>for an active debian mirror, IA is not really the place
00:28:46<pabs>flashfire42: this is not backups of debian (there are tons of mirrors), but backups of all of Debian's history, every single upload, a snapshot of the Debian archive 4x daily. immensely useful for bisecting and other development stuff, as well as reproducible builds
00:29:33<flashfire42>Ah ok. I thought you meant just the mirrors. I was like, dude, I can just click randomly on the net and within an hour I'll have found a linux mirror (along with lots of viruses, probably)
00:30:01<pabs>snapshot.d.o is also fragile, because there are only two replicas right now (plus one backup). one of the replicas recently had all of its servers destroyed by water leakage (hopefully the drives are ok)
00:30:23<fireonlive>did they put the pipes above the servers :(
00:30:36<pabs>an AC unit leaked onto the front of the rack
00:30:45<fireonlive>damn
00:46:51<@JAA>Ouch
00:53:55<pabs>arkiver: ^
01:46:24<nicolas17>pabs: this incredibly redundant data brings me back to thinking about deduplication and deltas :P
01:55:22<nicolas17>pabs: just to test, I decompressed the data.tar.xz and made deltas between the .tar files (I haven't tried to reproducibly compress it back or deal with control.tar.xz or the .deb wrapper): https://paste.debian.net/1291569/
01:56:31<pabs>sounds a bit like pristine-tar - adds some things to a git repo, so you can get a bit-exact tarball out of a branch
01:56:36<nicolas17>yep
01:57:07<pabs>I was surprised to read SWH does zero chunking of files, just like git does :(
01:57:08<flashfire42>Would be nice to see a background process hashing all items on IA. Could be useful for deduplication and/or research on hash collisions
01:57:34<nicolas17>in fact git is sometimes nice for this because 'git gc' figures out for you what to delta against what
01:57:47<nicolas17>but it doesn't scale to gigabytes
01:59:59<pabs>the rolling chunk hash stuff of restic/borg/etc need adding in more places
02:00:29<nicolas17>yes those are great
02:01:41<nicolas17>but sometimes files have small changes *all* over the place, and then borg either doesn't find enough duplicated chunks, or you lower the chunk size and RAM usage skyrockets (because too many chunks), then you need proper delta'ing
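The rolling-hash chunking that restic/borg use splits a stream at content-defined boundaries, so an insertion early in a file only disturbs the chunks near the change instead of shifting every fixed-size block. A minimal sketch; the window size, polynomial, mask, and minimum chunk size are illustrative made-up parameters, not restic's or borg's actual values:

```python
import hashlib

WINDOW = 48               # rolling-hash window in bytes (illustrative)
MASK = (1 << 13) - 1      # boundary when low 13 bits are zero: ~8 KiB average chunks
MIN_CHUNK = 2048          # avoid pathologically small chunks
PRIME, MOD = 31, 1 << 32

def chunk(data: bytes):
    """Split data at content-defined boundaries via a Rabin-Karp-style rolling hash."""
    top = pow(PRIME, WINDOW - 1, MOD)
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop byte leaving the window
        h = (h * PRIME + b) % MOD                   # fold in the new byte
        if i - start + 1 >= MIN_CHUNK and (h & MASK) == 0:
            # Boundary depends only on the last WINDOW bytes, not on file offset.
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def dedup(chunks):
    """Store each distinct chunk once, keyed by its SHA-256."""
    store = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).digest(), c)
    return store
```

Two files differing by a small insertion would share most chunk hashes. But as noted above, when small changes are scattered everywhere, few chunks survive intact, and shrinking the chunk size blows up the index; at that point a true delta encoder wins.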
02:02:51nicolas17 tests a -dbg package
02:05:23<nicolas17>libwebkitgtk-1.0-0-dbg has 1.3GiB of data.tar, this will take a while :D
02:07:48<nicolas17>(also means bsdiff would be absolutely impractical to try)
04:59:55<fireonlive>IA dark theme when? :D
06:22:26Dango360 quits [Read error: Connection reset by peer]
06:24:46nicolas17 quits [Client Quit]
07:02:31BigBrain_ quits [Ping timeout: 245 seconds]
07:02:51Arcorann (Arcorann) joins
07:03:18Arcorann quits [Remote host closed the connection]
07:13:56nulldata quits [Ping timeout: 252 seconds]
07:17:19nulldata (nulldata) joins
07:24:30Arcorann (Arcorann) joins
07:32:04BigBrain_ (bigbrain) joins
08:05:48nulldata quits [Ping timeout: 265 seconds]
08:08:49nulldata (nulldata) joins
08:47:51qw3rty joins
09:03:23nulldata quits [Ping timeout: 252 seconds]
09:06:51nulldata (nulldata) joins
09:22:56BigBrain_ quits [Ping timeout: 245 seconds]
09:25:28BigBrain_ (bigbrain) joins
10:02:00igloo22225 quits [Quit: The Lounge - https://thelounge.chat]
10:03:27igloo22225 (igloo22225) joins
13:41:14Arcorann quits [Ping timeout: 265 seconds]
13:44:30andrew quits [Client Quit]
13:47:23andrew (andrew) joins
14:12:45PredatorIWD_ joins
14:16:02PredatorIWD quits [Ping timeout: 265 seconds]
15:03:53HP_Archivist quits [Ping timeout: 265 seconds]
15:59:48Dango360 (Dango360) joins
17:01:21andrew6 (andrew) joins
17:03:16andrew quits [Ping timeout: 265 seconds]
17:03:23andrew6 is now known as andrew
19:24:57andrew quits [Client Quit]
19:28:20andrew (andrew) joins
21:20:01nicolas17 joins
23:38:26nicolas17 quits [Ping timeout: 252 seconds]
23:42:41nicolas17 joins