00:46:31<Pedrosso>Is there a nice web page specifically for answering that? It feels like there would be, considering the context, but I've been looking and couldn't find anything with that purpose.
00:47:05<@JAA>There's the official documentation of the metadata fields.
00:47:58<@JAA>By 'identification', do you mean the item identifier?
00:48:54<@JAA>For files, there are few hard rules. It strongly depends on the dataset. Some things might best be packed as a plain .tar (single file but still browsable online). Others would best be compressed. Others again uploaded separately.
00:49:55<@JAA>The hard rules are item size (hard limit of 1 TiB, best to stay some way beneath that) and file count (no hard limit as far as I know, but I've heard things can get wonky beyond, say, a couple thousand files).
00:55:41<Pedrosso>I do mean the item identifier, yes.
01:03:36<@JAA>No real conventions there. I usually use something that, well, identifies the contents uniquely. A short ID of the archival target, e.g. domain name, and a month or date when it was archived. example.org_20231212 might represent a complete crawl of example.org from today.
01:04:13<Pedrosso>I see
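[Editor's sketch] The naming habit JAA describes (a short ID of the archival target plus the archival date) could be written as a small helper. This is only an illustration under assumptions: the function name make_identifier is hypothetical, and replacing anything outside letters, digits, '.', '_' and '-' with '-' is a conservative choice made here, not a rule stated in the chat.

```python
import re
from datetime import date


def make_identifier(target: str, archived_on: date) -> str:
    """Build an identifier of the form <target>_<YYYYMMDD>.

    Hypothetical helper: the character whitelist below is an assumption
    chosen for safety, not something taken from the conversation.
    """
    # Replace every character outside the conservative whitelist with '-'.
    safe_target = re.sub(r"[^A-Za-z0-9._-]", "-", target)
    # Append the archival date in compact YYYYMMDD form.
    return f"{safe_target}_{archived_on:%Y%m%d}"


if __name__ == "__main__":
    # The example from the chat: a crawl of example.org on 2023-12-12.
    print(make_identifier("example.org", date(2023, 12, 12)))  # example.org_20231212
```

Using a month instead of a full date, as JAA also mentions, would only need the format string changed to %Y%m.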
03:13:40DogsRNice_ joins
03:16:50DogsRNice quits [Ping timeout: 240 seconds]
04:41:37nicolas17 quits [Read error: Connection reset by peer]
04:47:19nicolas17 joins
04:50:07DogsRNice_ quits [Read error: Connection reset by peer]
06:40:12datechnoman quits [Quit: The Lounge - https://thelounge.chat]
06:41:00magmaus3 (magmaus3) joins
06:41:44datechnoman (datechnoman) joins
06:57:14qwertyasdfuiopghjkl quits [Client Quit]
07:03:57qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:18:52magmaus3 quits [Client Quit]
07:25:49@AlsoJAA quits [Ping timeout: 272 seconds]
07:32:42Arcorann (Arcorann) joins
07:36:57AlsoJAA (JAA) joins
07:36:57@ChanServ sets mode: +o AlsoJAA
09:05:28qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
12:41:02magmaus3 (magmaus3) joins
13:11:58Arcorann quits [Ping timeout: 265 seconds]
13:25:56tbc1887 quits [Read error: Connection reset by peer]
13:26:20tbc1887 (tbc1887) joins
15:23:17qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
16:25:02DogsRNice joins
16:31:27qwertyasdfuiopghjkl quits [Remote host closed the connection]
18:33:50qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
18:48:12qwertyasdfuiopghjkl quits [Client Quit]
18:48:33nulldata quits [Ping timeout: 272 seconds]
18:50:27Matthww119 quits [Ping timeout: 272 seconds]
18:52:33qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
18:58:04nulldata (nulldata) joins
19:13:54qwertyasdfuiopghjkl quits [Client Quit]
19:15:26qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:18:26Craigle quits [Quit: The Lounge - https://thelounge.chat]
19:19:39Craigle (Craigle) joins
19:46:28Matthww119 joins
20:41:11h3ndr1k (h3ndr1k) joins
20:44:00<h3ndr1k>Hi, can I PM someone 3 items, which might need their torrents regenerated? The torrents contain fewer files than visible on archive.org.
20:44:40BearFortress_ joins
20:44:41BearFortress_ quits [Max SendQ exceeded]
20:44:50BearFortress_ joins
20:47:50BearFortress quits [Ping timeout: 240 seconds]
20:48:55BearFortress joins
20:49:25<@JAA>(h3ndr1k has crept into my PMs and I'm taking a look.)
20:49:38BearFortress__ joins
20:51:10<@JAA>Answer: the items are too large. IA only generates torrents up to 75 GiB.
20:51:44<@JAA>But an existing torrent doesn't get deleted once you exceed that limit, at which point the torrent is incomplete.
20:52:40<h3ndr1k>Oh ok. So I have to download via archive.org or ia-python (or whatever it's called)?
20:52:50BearFortress_ quits [Ping timeout: 240 seconds]
20:53:12<@JAA>Yeah, most likely.
20:53:20BearFortress quits [Ping timeout: 240 seconds]
20:53:24<nicolas17>ah the *item* has a size limit too, okay
20:53:34<h3ndr1k>Thank you very much.
20:54:00<@JAA>It might be possible to generate torrents that have IA as web seed URLs for an entire item, but you'd probably need to distribute that with a different tracker than IA's since it shouldn't recognise that btih.
20:54:32<nicolas17>you would need to download everything in some other way in order to generate the torrent
20:54:53<@JAA>nicolas17: I'm not aware of a limit per file. IA used to generate torrents for every one of our megawarcs, and that was a significant bottleneck at one point, which is why we disabled it.
20:55:28<@JAA>Yes, I'm referring to an item you uploaded yourself, but I guess I misinterpreted the question in that regard.
21:04:28<@JAA>If there is any file size limit, it's more than 21.8 GiB. So I doubt there is.
21:04:43<@JAA>(I recently uploaded an item with one such file, and it's included in the torrent.)
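[Editor's sketch] The two size figures mentioned in this log (torrents generated only up to 75 GiB, and a 1 TiB hard limit per item) can be checked against a list of file sizes before relying on an item's torrent. The numbers are taken from the chat, not from official documentation; the function name check_item is hypothetical, and whether a total of exactly 75 GiB still gets a torrent is an assumption here.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Figures quoted in the chat; treat them as hearsay, not official limits.
TORRENT_LIMIT_BYTES = 75 * GIB
ITEM_LIMIT_BYTES = 1 * TIB


def check_item(file_sizes):
    """Summarise whether an item of the given file sizes (in bytes) stays
    within the torrent and item size figures mentioned in the chat.

    Assumption: a total exactly at a limit is treated as still within it.
    """
    total = sum(file_sizes)
    return {
        "total_bytes": total,
        # Above this total, an existing torrent may list fewer files
        # than the item actually contains.
        "torrent_complete": total <= TORRENT_LIMIT_BYTES,
        "within_item_limit": total <= ITEM_LIMIT_BYTES,
    }


if __name__ == "__main__":
    # Two files of 50 GiB and 40 GiB: fine as an item, too big for a full torrent.
    print(check_item([50 * GIB, 40 * GIB]))
```

If torrent_complete comes back False, downloading the files directly rather than through the torrent is the fallback suggested in the conversation.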
21:23:43DLoader quits [Ping timeout: 272 seconds]
21:29:03DLoader joins
22:14:02qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:22:43qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
23:03:20<nicolas17>60/3197 [15:14<17:07:54, 19.66s/MiB]
23:03:23<nicolas17>JAA: I'm in your hell now
23:06:39<@JAA>Welcome, take a seat and make yourself comfortable, you'll be here for a while. :-)
23:09:13<nicolas17>switched to my VPS
23:09:16<nicolas17>34/3197 [00:40<2:18:28, 2.63s/MiB]