00:46:31 | <Pedrosso> | Is there a good web page specifically for answering that? Given the context, it feels like there should be one, but I've been looking and couldn't find anything with that purpose. |
00:47:05 | <@JAA> | There's the official documentation of the metadata fields. |
00:47:58 | <@JAA> | By 'identification', do you mean the item identifier? |
00:48:54 | <@JAA> | For files, there are few hard rules. It strongly depends on the dataset. Some things might best be packed as a plain .tar (single file but still browsable online). Others would best be compressed. Others again are best uploaded as separate files. |
00:49:55 | <@JAA> | The hard rules are item size (hard limit of 1 TiB, best to stay some way beneath that) and file count (no hard limit as far as I know, but I've heard things can get wonky beyond, say, a couple thousand files). |
00:55:41 | <Pedrosso> | I do mean the item identifier, yes. |
01:03:36 | <@JAA> | No real conventions there. I usually use something that, well, identifies the contents uniquely. A short ID of the archival target, e.g. domain name, and a month or date when it was archived. example.org_20231212 might represent a complete crawl of example.org from today. |
01:04:13 | <Pedrosso> | I see |
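The naming habit JAA describes (a short ID of the archival target plus the date it was archived) can be sketched roughly as follows. The function name `make_identifier` and the set of characters treated as safe are assumptions made for illustration, not an official rule; the chat itself says there are no real conventions.

```python
import re
from datetime import date


def make_identifier(target: str, archived_on: date) -> str:
    """Build an identifier of the form '<target>_<YYYYMMDD>'.

    Hypothetical helper: it only mirrors the example.org_20231212
    pattern from the chat.
    """
    # Assumption: keep letters, digits, '.', '_' and '-', and replace
    # anything else with '-'. The real set of allowed characters may differ.
    safe_target = re.sub(r"[^A-Za-z0-9._-]", "-", target.strip())
    return f"{safe_target}_{archived_on:%Y%m%d}"


print(make_identifier("example.org", date(2023, 12, 12)))
```

With the date from the chat, this prints example.org_20231212. A month-only variant would use `%Y%m` instead of `%Y%m%d`.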
03:13:40 | | DogsRNice_ joins |
03:16:50 | | DogsRNice quits [Ping timeout: 240 seconds] |
04:41:37 | | nicolas17 quits [Read error: Connection reset by peer] |
04:47:19 | | nicolas17 joins |
04:50:07 | | DogsRNice_ quits [Read error: Connection reset by peer] |
06:40:12 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
06:41:00 | | magmaus3 (magmaus3) joins |
06:41:44 | | datechnoman (datechnoman) joins |
06:57:14 | | qwertyasdfuiopghjkl quits [Client Quit] |
07:03:57 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
07:18:52 | | magmaus3 quits [Client Quit] |
07:25:49 | | @AlsoJAA quits [Ping timeout: 272 seconds] |
07:32:42 | | Arcorann (Arcorann) joins |
07:36:57 | | AlsoJAA (JAA) joins |
07:36:57 | | @ChanServ sets mode: +o AlsoJAA |
09:05:28 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
12:41:02 | | magmaus3 (magmaus3) joins |
13:11:58 | | Arcorann quits [Ping timeout: 265 seconds] |
13:25:56 | | tbc1887 quits [Read error: Connection reset by peer] |
13:26:20 | | tbc1887 (tbc1887) joins |
15:23:17 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
16:25:02 | | DogsRNice joins |
16:31:27 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
18:33:50 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
18:48:12 | | qwertyasdfuiopghjkl quits [Client Quit] |
18:48:33 | | nulldata quits [Ping timeout: 272 seconds] |
18:50:27 | | Matthww119 quits [Ping timeout: 272 seconds] |
18:52:33 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
18:58:04 | | nulldata (nulldata) joins |
19:13:54 | | qwertyasdfuiopghjkl quits [Client Quit] |
19:15:26 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
19:18:26 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
19:19:39 | | Craigle (Craigle) joins |
19:46:28 | | Matthww119 joins |
20:41:11 | | h3ndr1k (h3ndr1k) joins |
20:44:00 | <h3ndr1k> | Hi, can I PM someone 3 items, which might need their torrents regenerated? The torrents contain fewer files than visible on archive.org. |
20:44:40 | | BearFortress_ joins |
20:44:41 | | BearFortress_ quits [Max SendQ exceeded] |
20:44:50 | | BearFortress_ joins |
20:47:50 | | BearFortress quits [Ping timeout: 240 seconds] |
20:48:55 | | BearFortress joins |
20:49:25 | <@JAA> | (h3ndr1k has crept into my PMs and I'm taking a look.) |
20:49:38 | | BearFortress__ joins |
20:51:10 | <@JAA> | Answer: the items are too large. IA only generates torrents up to 75 GiB. |
20:51:44 | <@JAA> | But an existing torrent doesn't get deleted once the item grows past that limit, so from that point on the torrent is incomplete. |
20:52:40 | <h3ndr1k> | Oh OK. So I have to download via archive.org or ia-python (or whatever it's called)? |
20:52:50 | | BearFortress_ quits [Ping timeout: 240 seconds] |
20:53:12 | <@JAA> | Yeah, most likely. |
20:53:20 | | BearFortress quits [Ping timeout: 240 seconds] |
20:53:24 | <nicolas17> | ah the *item* has a size limit too, okay |
20:53:34 | <h3ndr1k> | Thank you very much. |
20:54:00 | <@JAA> | It might be possible to generate torrents that have IA as web seed URLs for an entire item, but you'd probably need to distribute that with a different tracker than IA's since it shouldn't recognise that btih. |
20:54:32 | <nicolas17> | you would need to download everything in some other way in order to generate the torrent |
20:54:53 | <@JAA> | nicolas17: I'm not aware of a limit per file. IA used to generate torrents for every one of our megawarcs, and that was a significant bottleneck at one point, which is why we disabled it. |
20:55:28 | <@JAA> | Yes, I'm referring to an item you uploaded yourself, but I guess I misinterpreted the question in that regard. |
21:04:28 | <@JAA> | If there is any file size limit, it's more than 21.8 GiB. So I doubt there is. |
21:04:43 | <@JAA> | (I recently uploaded an item with one such file, and it's included in the torrent.) |
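As a rough illustration of the two size figures mentioned in this log (torrents only generated up to 75 GiB, items hard-limited to 1 TiB), the sketch below checks a list of file sizes against both. The function name `check_item` and the choice to treat each limit as inclusive are assumptions; the chat does not say how the boundaries are handled.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

# Figures quoted in the chat, not verified against any official documentation.
TORRENT_LIMIT_BYTES = 75 * GIB
ITEM_LIMIT_BYTES = 1 * TIB


def check_item(file_sizes_bytes):
    """Summarise how an item's total size relates to the quoted limits.

    Hypothetical helper; whether the limits are inclusive is an assumption.
    """
    total = sum(file_sizes_bytes)
    return {
        "total_bytes": total,
        "within_item_limit": total <= ITEM_LIMIT_BYTES,
        "torrent_likely_complete": total <= TORRENT_LIMIT_BYTES,
    }


# Two 40 GiB files: fine as an item, but past the quoted torrent limit.
print(check_item([40 * GIB, 40 * GIB]))
```

Note that, per the chat, an item that once fit under the torrent limit may still carry an older torrent, so the presence of a torrent file alone does not mean it covers every file.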
21:23:43 | | DLoader quits [Ping timeout: 272 seconds] |
21:29:03 | | DLoader joins |
22:14:02 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
22:22:43 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
23:03:20 | <nicolas17> | 60/3197 [15:14<17:07:54, 19.66s/MiB] |
23:03:23 | <nicolas17> | JAA: I'm in your hell now |
23:06:39 | <@JAA> | Welcome, take a seat and make yourself comfortable, you'll be here for a while. :-) |
23:09:13 | <nicolas17> | switched to my VPS |
23:09:16 | <nicolas17> | 34/3197 [00:40<2:18:28, 2.63s/MiB] |
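The two progress lines nicolas17 pasted can be sanity-checked with simple arithmetic: remaining units times seconds per unit. The sketch below assumes the fields mean done/total, elapsed<remaining, and seconds per MiB, which is a guess based on how the lines look; the helper names are made up for illustration.

```python
def eta_seconds(done: int, total: int, seconds_per_unit: float) -> float:
    """Estimate remaining time as (total - done) * seconds per unit."""
    return (total - done) * seconds_per_unit


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    whole = int(round(seconds))
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# First line: 60/3197 at 19.66 s/MiB
print(format_hms(eta_seconds(60, 3197, 19.66)))
# Second line (on the VPS): 34/3197 at 2.63 s/MiB
print(format_hms(eta_seconds(34, 3197, 2.63)))
```

The results land within a few seconds of the 17:07:54 and 2:18:28 shown in the log; the small differences are consistent with the displayed rate being rounded to two decimals.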