00:50:07 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
02:37:31 | <TheTechRobo> | Is there more detailed information on how IA stores stuff internally anywhere? |
02:37:32 | <TheTechRobo> | I'm interested in knowing how they detect corruption, what filesystem they use, if there is ever uncorrectable corruption, etc |
02:43:37 | <HP_Archivist> | I, too, would be interested in learning more (if they make that info publicly available?) |
03:27:09 | <steering> | In theory an attacker could mount a collision attack, they'd just have to be the original creator/uploader of the file to IA :P |
03:27:50 | <steering> | something something don't download untrusted files |
03:37:11 | <HP_Archivist> | TheTechRobo: Actually, now that I think about it, I recall Jason mentioning briefly in one of his podcast episodes that they do occasionally have file integrity fixes. |
03:37:50 | <HP_Archivist> | Implying that something, somewhere, at some point errors out and they must have parity or something in effect to come rescue the file. |
03:38:22 | <HP_Archivist> | I doubt it's from actual bit rot, though. |
03:38:48 | <TheTechRobo> | HP_Archivist: Every item is stored on two disks, AIUI. |
03:38:59 | <TheTechRobo> | That's what the bup.php task does - synchronises the two servers. |
03:39:24 | <HP_Archivist> | Ah, that makes sense |
03:39:45 | <@JAA> | Correct, and in two locations, although it's usually San Francisco and Richmond. |
03:40:20 | <@JAA> | Apparently some things are getting served from Canada instead since very recently, as someone mentioned here. |
03:40:22 | <TheTechRobo> | JAA: Two disks or one disk per location? |
03:40:32 | <@JAA> | One disk per, as far as I know. |
03:41:00 | <TheTechRobo> | do they check integrity as the file is downloaded, or periodically (like a zfs scrub)? or a combination of both? |
03:41:25 | <@JAA> | No idea, and I've not really seen any details about the inner workings in public. |
03:42:11 | <HP_Archivist> | The existence of bup.php task also puts into perspective why they've always wanted item sizes to remain under a certain amount, e.g. it'll break the item if it's very large |
03:42:36 | <TheTechRobo> | TIL that IA has multiple datacenter locations. Thought it was just the church and their location in Canada. |
03:42:49 | <@JAA> | One thing I do recall being mentioned somewhere is that there are no complicated underlying storage mechanisms or anything. It's just plain disks, with some regular FS (probably ext<something>?), getting synced to the mirror server with rsync via the tasks. |
03:42:55 | <TheTechRobo> | HP_Archivist: Yeah, hard to fit a massive item onto two disks I guess. |
03:43:20 | <HP_Archivist> | ^^ Exactly |
03:43:45 | <@JAA> | Yes, they have the church in SF and another in Richmond, plus something further north in SF although that might just be for networking. And Canada. |
03:43:58 | <@JAA> | Another location, not another church. |
03:44:01 | <HP_Archivist> | JAA I can actually see how that would be advantageous vs a more complex storage situation |
03:44:14 | <TheTechRobo> | Yeah, fewer moving parts that could break |
03:44:40 | <TheTechRobo> | Also stands to reason that there wouldn't be very many more complex solutions when IA started. Would probably be easier just to keep their existing architecture |
03:44:41 | <@JAA> | Yeah, although there's an argument in favour of btrfs or ZFS for the bit rot checking. |
03:45:00 | <@JAA> | Not that that was an option 20+ years ago, and I'm sure things mostly grew organically. |
03:45:37 | <TheTechRobo> | Yeah, if they were redoing their storage nowadays, I think it'd be a bad idea not to use btrfs or zfs. But considering they have however many PB of data on their existing infra, I think it's fair they haven't updated it lol |
03:45:49 | <@JAA> | Yeah |
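The discussion above describes two plain-disk copies kept in sync, with no filesystem-level checksumming. A minimal sketch of what a periodic, scrub-like comparison of two replicas could look like follows. This is an illustration only, not a description of how IA actually checks integrity; the function names (`sha256_of`, `tree_digests`, `compare_replicas`, `demo`) and the directory layout are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_replicas(primary: Path, mirror: Path) -> dict[str, list[str]]:
    """Report files that are missing on one side or whose contents differ."""
    a, b = tree_digests(primary), tree_digests(mirror)
    return {
        "missing_on_mirror": sorted(a.keys() - b.keys()),
        "missing_on_primary": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def demo() -> dict[str, list[str]]:
    """Build two small throwaway replicas (hypothetical layout) and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        primary = Path(tmp) / "primary"
        mirror = Path(tmp) / "mirror"
        primary.mkdir()
        mirror.mkdir()
        (primary / "a.txt").write_bytes(b"hello")
        (mirror / "a.txt").write_bytes(b"hello")
        (primary / "b.bin").write_bytes(b"\x00\x01")
        (mirror / "b.bin").write_bytes(b"\x00\x03")  # simulated flipped bit
        (primary / "c.txt").write_bytes(b"only here")
        return compare_replicas(primary, mirror)


if __name__ == "__main__":
    print(demo())
```

Note that comparing two copies only shows that they diverge; deciding which copy is the good one needs a known-good digest recorded at write time, which is the part btrfs and ZFS handle with per-block checksums.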
03:45:55 | <HP_Archivist> | Idk though. Tape backups have been around a long time. They could very easily back everything up on to LTO tape (assuming resources) |
03:46:22 | <@JAA> | Always those annoying little asterisks. :-P |
03:46:29 | <TheTechRobo> | > The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly". |
03:46:29 | <TheTechRobo> | from https://wiki.archiveteam.org/index.php/Valhalla |
03:46:32 | <HP_Archivist> | :P |
03:47:01 | <HP_Archivist> | TheTechRobo: huh, interesting |
03:47:06 | <TheTechRobo> | No idea if LTO is still like that, haven't had the chance to use it. |
03:47:11 | <HP_Archivist> | So they've considered it or something like it |
03:47:28 | <@JAA> | Tape would also be a cold backup, effectively. Currently, they can use both copies to serve traffic. |
03:47:36 | <TheTechRobo> | That's also true |
03:48:26 | <HP_Archivist> | They could do snapshot backups onto tape, kept strictly cold for doomsday-type situations. |
03:48:56 | <@JAA> | I don't think 'nothing reads these anymore' is a very large concern for LTO. There are industry heavyweights behind the standard, and tape drives must be able to read two generations back, so you can easily migrate to a newer generation before they go extinct. Not to mention that it's still easy to find LTO1 drives which are at least claimed to be working. |
03:49:41 | <TheTechRobo> | HP_Archivist: It's probably harder to justify the massive cost of backing everything up if they can't use the backups to serve data |
03:50:03 | <@JAA> | There is no storage technology that you can just put on a shelf and then not touch for decades. You have to migrate somewhat regularly (at least every several years to one decade) anyway. |
03:50:52 | <HP_Archivist> | JAA: https://www.businesswire.com/news/home/20240715730109/en/Cerabyte-Unveils-Transformative-Ceramic-Based-Technology-for-Accessible-Permanent-Data-Storage-Expands-into-the-U.S |
03:51:09 | <@JAA> | It would certainly be nice if there were an offline copy that won't easily be affected by, say, the Big One. Although the Canadian mirror mostly solves those concerns. |
03:51:10 | <TheTechRobo> | That being said, a concentrated archive of certain "high-value" (whatever that would mean) parts of IA stored underground somewhere would be really cool to see. |
03:52:04 | <TheTechRobo> | HP_Archivist: Have any of those high-density physical data things ever made any progress? |
03:52:06 | <@JAA> | HP_Archivist: Yeah, *yawn*, let me know when it's out of the labs. :-) |
03:52:26 | <@JAA> | 'Holographic storage' has been just around the corner for decades. |
03:52:29 | <TheTechRobo> | Every few months I see tons of hype about some new startup working on it and then never hear about the company again |
03:52:31 | <HP_Archivist> | Lol |
03:53:01 | <@JAA> | It's almost as if solving that problem is actually hard. |
03:53:05 | <HP_Archivist> | It's all very nice-sounding sure. But they have some real potential here, maybe. |
03:53:07 | <@JAA> | Really makes you think. :-) |
03:53:27 | <HP_Archivist> | Well, yeah, you're literally fighting entropy (physics) |
03:54:02 | <TheTechRobo> | Stupid physics, ruining everything... |
03:54:22 | <HP_Archivist> | TheTechRobo: There have been projects that have shown up over the last decade. Nothing that's actually made it to commercial markets. Yet. |
03:55:06 | <HP_Archivist> | I mean, the very watered down version of that would be M-Disc technology. Engraving. |
03:55:27 | <HP_Archivist> | Microsoft's Project Silica is still in the research phase afaik |
03:56:03 | <@JAA> | Arch Mission exists, but they've made what, five disks? |
03:57:12 | <@JAA> | M-Disc has nothing to do with these things. It's not even a new storage technology, just a modification of an existing one to make it more durable (DVD) or more profitable (Blu-ray). |
03:57:14 | <HP_Archivist> | Oh right, right. I was trying to think of their name a moment ago. Nova Spivack is the guy. iirc, they were charging a ton for what essentially was very small-size data |
03:58:32 | <TheTechRobo> | From Arch Mission's website: |
03:58:32 | <TheTechRobo> | > The current achievable density is several TB per disc (CD size of 12 cm, 1.2 mm thickness), which is of the same order as magnetic disk data storage. In a couple of years, the team hopes to achieve about 20 TB per disc. |
03:58:39 | <TheTechRobo> | > We estimate the ultimate capacity of 360 TB per disc using this technology. |
03:58:52 | <TheTechRobo> | Not sure exactly how expensive the discs are to make. |
04:02:08 | <HP_Archivist> | Objects that have lasted millennia - legible to this day - are usually stone, rock, or some other natural material that doesn't erode very quickly. That, plus the happenstance of however said objects were kept in terms of being exposed to climate. |
04:07:28 | <@JAA> | The data density is also marginally lower. |
04:48:59 | | DogsRNice quits [Read error: Connection reset by peer] |
06:31:16 | | Dango360 quits [Read error: Connection reset by peer] |
06:41:37 | | Dango360 (Dango360) joins |
07:08:27 | | BearFortress_ quits [Ping timeout: 256 seconds] |
07:09:05 | | BearFortress joins |
07:26:17 | | Dango360 quits [Client Quit] |
08:00:39 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
08:13:59 | | SootBector quits [Ping timeout: 260 seconds] |
08:15:54 | | SootBector (SootBector) joins |
09:15:13 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:16:28 | | nulldata (nulldata) joins |
09:38:05 | | driib quits [Client Quit] |
09:38:28 | | driib (driib) joins |
10:35:06 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
12:32:29 | | MrMcNuggets (MrMcNuggets) joins |
13:46:51 | | Arcorann (Arcorann) joins |
14:35:03 | | JaffaCakes118 quits [Remote host closed the connection] |
14:40:00 | | JaffaCakes118 (JaffaCakes118) joins |
14:44:28 | | Arcorann quits [Ping timeout: 255 seconds] |
15:56:47 | | nulldata quits [Quit: So long and thanks for all the fish!] |
15:58:07 | | nulldata (nulldata) joins |
16:08:33 | | MrMcNuggets quits [Quit: WeeChat 4.3.2] |
16:26:30 | | DogsRNice joins |
16:33:40 | | Matthww9 joins |
16:35:10 | | Matthww quits [Ping timeout: 255 seconds] |
16:35:10 | | Matthww9 is now known as Matthww |
16:51:50 | | fuzzy80211 quits [Read error: Connection reset by peer] |
16:52:24 | | fuzzy80211 (fuzzy80211) joins |
19:04:45 | | HP_Archivist quits [Remote host closed the connection] |
19:06:11 | | HP_Archivist (HP_Archivist) joins |
20:01:13 | | Dango360 (Dango360) joins |
23:33:34 | | fuzzy80211 quits [Read error: Connection reset by peer] |
23:34:09 | | fuzzy80211 (fuzzy80211) joins |
23:34:39 | | myself quits [Quit: Ping timeout (120 seconds)] |
23:34:54 | | myself (myself) joins |