00:50:07qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
02:37:31<TheTechRobo>Is there more detailed information on how IA stores stuff internally anywhere?
02:37:32<TheTechRobo>I'm interested in knowing how they detect corruption, what filesystem they use, if there is ever uncorrectable corruption, etc
02:43:37<HP_Archivist>I, too, would be interested in learning more (if they make that info publicly available?)
03:27:09<steering>In theory an attacker could mount a collision attack, they'd just have to be the original creator/uploader of the file to IA :P
03:27:50<steering>something something don't download untrusted files
03:37:11<HP_Archivist>TheTechRobo: Actually, now that I think about it, I recall Jason mentioning briefly in one of his podcast episodes that they do occasionally have file integrity fixes.
03:37:50<HP_Archivist>Implying that something, somewhere, at some point errors out and they must have parity or something in effect to come rescue the file.
03:38:22<HP_Archivist>I doubt it's from actual bit rot, though.
03:38:48<TheTechRobo>HP_Archivist: Every item is stored on two disks, AIUI.
03:38:59<TheTechRobo>That's what the bup.php task does - synchronises the two servers.
03:39:24<HP_Archivist>Ah, that makes sense
03:39:45<@JAA>Correct, and in two locations, although it's usually San Francisco and Richmond.
03:40:20<@JAA>Apparently some things are getting served from Canada instead since very recently, as someone mentioned here.
03:40:22<TheTechRobo>JAA: Two disks or one disk per location?
03:40:32<@JAA>One disk per, as far as I know.
03:41:00<TheTechRobo>do they check integrity as the file is downloaded, or periodically (like a zfs scrub)? or a combination of both?
03:41:25<@JAA>No idea, and I've not really seen any details about the inner workings in public.
03:42:11<HP_Archivist>The existence of bup.php task also puts into perspective why they've always wanted item sizes to remain under a certain amount, e.g. it'll break the item if it's very large
03:42:36<TheTechRobo>TIL that IA has multiple datacenter locations. Thought it was just the church and their location in Canada.
03:42:49<@JAA>One thing I do recall being mentioned somewhere is that there are no complicated underlying storage mechanisms or anything. It's just plain disks, with some regular FS (probably ext<something>?), getting synced to the mirror server with rsync via the tasks.
03:42:55<TheTechRobo>HP_Archivist: Yeah, hard to fit a massive item onto two disks I guess.
03:43:20<HP_Archivist>^^ Exactly
03:43:45<@JAA>Yes, they have the church in SF and another in Richmond, plus something further north in SF although that might just be for networking. And Canada.
03:43:58<@JAA>Another location, not another church.
03:44:01<HP_Archivist>JAA I can actually see how that would be advantageous vs a more complex storage situation
03:44:14<TheTechRobo>Yeah, less moving parts that could break
03:44:40<TheTechRobo>Also stands to reason that there wouldn't be very many more complex solutions when IA started. Would probably be easier just to keep their existing architecture
03:44:41<@JAA>Yeah, although there's an argument in favour of btrfs or ZFS for the bit rot checking.
03:45:00<@JAA>Not that that was an option 20+ years ago, and I'm sure things mostly grew organically.
03:45:37<TheTechRobo>Yeah, if they were redoing their storage nowadays, I think it'd be a bad idea not to use btrfs or zfs. But considering they have however many PB of data on their existing infra, I think it's fair they haven't updated it lol
03:45:49<@JAA>Yeah
03:45:55<HP_Archivist>Idk though. Tape backups have been around a long time. They could very easily back everything up on to LTO tape (assuming resources)
03:46:22<@JAA>Always those annoying little asterisks. :-P
03:46:29<TheTechRobo>> The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".
03:46:29<TheTechRobo>from https://wiki.archiveteam.org/index.php/Valhalla
03:46:32<HP_Archivist>:P
03:47:01<HP_Archivist>TheTechRobo: huh, interesting
03:47:06<TheTechRobo>No idea if LTO is still like that, haven't had the chance to use it.
03:47:11<HP_Archivist>So they've considered it or something like it
03:47:28<@JAA>Tape would also be a cold backup, effectively. Currently, they can use both copies to serve traffic.
03:47:36<TheTechRobo>That's also true
03:48:26<HP_Archivist>They could do snapshot backups onto tape, kept strictly cold for doomsday-type situations.
03:48:56<@JAA>I don't think 'nothing reads these anymore' is a very large concern for LTO. There are industry heavyweights behind the standard, and tape drives must be able to read two generations back, so you can easily migrate to a newer generation before they go extinct. Not to mention that it's still easy to find LTO1 drives which are at least claimed to be working.
03:49:41<TheTechRobo>HP_Archivist: It's probably harder to justify the massive cost of backing everything up if they can't use the backups to serve data
03:50:03<@JAA>There is no storage technology that you can just put on a shelf and then not touch for decades. You have to migrate somewhat regularly (at least every several years to one decade) anyway.
03:50:52<HP_Archivist>JAA: https://www.businesswire.com/news/home/20240715730109/en/Cerabyte-Unveils-Transformative-Ceramic-Based-Technology-for-Accessible-Permanent-Data-Storage-Expands-into-the-U.S
03:51:09<@JAA>It would certainly be nice if there were an offline copy that won't easily be affected by, say, the Big One. Although the Canadian mirror mostly solves those concerns.
03:51:10<TheTechRobo>That being said, a concentrated archive of certain "high-value" (whatever that would mean) parts of IA stored underground somewhere would be really cool to see.
03:52:04<TheTechRobo>HP_Archivist: Have any of those high-density physical data things ever made any progress?
03:52:06<@JAA>HP_Archivist: Yeah, *yawn*, let me know when it's out of the labs. :-)
03:52:26<@JAA>'Holographic storage' has been just around the corner for decades.
03:52:29<TheTechRobo>Every few months I see tons of hype about some new startup working on it and then never hear about the company again
03:52:31<HP_Archivist>Lol
03:53:01<@JAA>It's almost as if solving that problem is actually hard.
03:53:05<HP_Archivist>It's all very nice-sounding sure. But they have some real potential here, maybe.
03:53:07<@JAA>Really makes you think. :-)
03:53:27<HP_Archivist>Well, yeah, you're literally fighting entropy (physics)
03:54:02<TheTechRobo>Stupid physics, ruining everything...
03:54:22<HP_Archivist>TheTechRobo: There have been projects that have shown up over the last decade. Nothing that's actually made it to commercial markets. Yet.
03:55:06<HP_Archivist>I mean, the very watered down version of that would be M-Disc technology. Engraving.
03:55:27<HP_Archivist>Microsoft's Project Silica is still in the research phase afaik
03:56:03<@JAA>Arch Mission exists, but they've made what, five disks?
03:57:12<@JAA>M-Disc has nothing to do with these things. It's not even a new storage technology, just a modification of an existing one to make it more durable (DVD) or more profitable (Blu-ray).
03:57:14<HP_Archivist>Oh right, right. I was trying to think of their name a moment ago. Nova Spivack is the guy. iirc, they were charging a ton for what essentially was very small-size data
03:58:32<TheTechRobo>From Arch Mission's website:
03:58:32<TheTechRobo>> The current achievable density is several TB per disc (CD size of 12 cm, 1.2 mm thickness), which is of the same order as magnetic disk data storage. In a couple of years, the team hopes to achieve about 20 TB per disc.
03:58:39<TheTechRobo>> We estimate the ultimate capacity of 360 TB per disc using this technology.
03:58:52<TheTechRobo>Not sure exactly how expensive the discs are to make.
04:02:08<HP_Archivist>Objects that have lasted millennia - legible to this day - are usually stone, rock, or some other natural material that doesn't erode very quickly. That, plus the happenstance of however said objects were kept in terms of being exposed to climate.
04:07:28<@JAA>The data density is also marginally lower.
04:48:59DogsRNice quits [Read error: Connection reset by peer]
06:31:16Dango360 quits [Read error: Connection reset by peer]
06:41:37Dango360 (Dango360) joins
07:08:27BearFortress_ quits [Ping timeout: 256 seconds]
07:09:05BearFortress joins
07:26:17Dango360 quits [Client Quit]
08:00:39qwertyasdfuiopghjkl quits [Remote host closed the connection]
08:13:59SootBector quits [Ping timeout: 260 seconds]
08:15:54SootBector (SootBector) joins
09:15:13nulldata quits [Quit: So long and thanks for all the fish!]
09:16:28nulldata (nulldata) joins
09:38:05driib quits [Client Quit]
09:38:28driib (driib) joins
10:35:06qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
12:32:29MrMcNuggets (MrMcNuggets) joins
13:46:51Arcorann (Arcorann) joins
14:35:03JaffaCakes118 quits [Remote host closed the connection]
14:40:00JaffaCakes118 (JaffaCakes118) joins
14:44:28Arcorann quits [Ping timeout: 255 seconds]
15:56:47nulldata quits [Quit: So long and thanks for all the fish!]
15:58:07nulldata (nulldata) joins
16:08:33MrMcNuggets quits [Quit: WeeChat 4.3.2]
16:26:30DogsRNice joins
16:33:40Matthww9 joins
16:35:10Matthww quits [Ping timeout: 255 seconds]
16:35:10Matthww9 is now known as Matthww
16:51:50fuzzy80211 quits [Read error: Connection reset by peer]
16:52:24fuzzy80211 (fuzzy80211) joins
19:04:45HP_Archivist quits [Remote host closed the connection]
19:06:11HP_Archivist (HP_Archivist) joins
20:01:13Dango360 (Dango360) joins
23:33:34fuzzy80211 quits [Read error: Connection reset by peer]
23:34:09fuzzy80211 (fuzzy80211) joins
23:34:39myself quits [Quit: Ping timeout (120 seconds)]
23:34:54myself (myself) joins