#internetarchive log for 2025-01-05


00:13:05		BearFortress quits [Read error: Connection reset by peer]
00:13:10		atphoenix quits [Read error: Connection reset by peer]
00:13:14		BearFortress joins
00:13:42		atphoenix (atphoenix) joins
00:16:52		tzt (tzt) joins
00:52:13	<nicolas17>	who do I have to bribe to get access to a darked item /s
01:01:16		rewby (rewby) joins
02:05:46		HP_Archivist (HP_Archivist) joins
03:27:09		HP_Archivist quits [Read error: Connection reset by peer]
03:27:29		HP_Archivist (HP_Archivist) joins
04:02:23	<TheTechRobo>	How much of a speed increase do higher chunk sizes cause in ia-upload-stream? I know that each chunk will have some overhead, but how much overhead is it in practice?
04:03:20	<nicolas17>	I think the main performance problem is when you finish uploading and the server has to re-assemble the chunks, not during the upload
04:04:01	<nicolas17>	JAA is the expert there (?)
04:13:51	<@JAA>	Yeah. Single-part uploads are ideal with respect to IA's processing.
04:14:45	<@JAA>	When you do a multipart upload, each individual part gets written to the item server, hashed, and mirrored. Then the parts get merged and the resulting file is hashed and mirrored again.
04:15:25	<@JAA>	In addition, the multipart completion has to run as its own archive.php task; snowballing only covers the individual part uploads. So that introduces another delay.
04:16:14	<@JAA>	So really, it's about how fast your uploads on a single connection are. When that's very slow, the parallelism from multipart uploads may be helpful.
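
A rough sketch of the server-side cost described above, assuming a single-part upload is written, hashed, and mirrored exactly once. The `ServerWork` type and `server_work` function are hypothetical names for illustration only; they model the explanation in the chat and are not part of IA or ia-upload-stream.

```python
from dataclasses import dataclass


@dataclass
class ServerWork:
    bytes_hashed: int    # total bytes the item server has to hash
    bytes_mirrored: int  # total bytes copied to the mirror
    extra_tasks: int     # separate tasks beyond the part uploads themselves


def server_work(file_size: int, part_size: int) -> ServerWork:
    """Estimate server-side work for one file (illustrative model only)."""
    # Ceiling division: number of parts needed to cover the file.
    parts = max(1, -(-file_size // part_size))
    if parts == 1:
        # Assumption: a single-part upload is hashed and mirrored once.
        return ServerWork(file_size, file_size, 0)
    # Multipart: every part is hashed and mirrored, then the merged file is
    # hashed and mirrored again, and the completion runs as its own task.
    return ServerWork(2 * file_size, 2 * file_size, 1)
```

Under this model, any multipart upload doubles the bytes hashed and mirrored regardless of the part count, which is why part size matters less than whether the upload is multipart at all.
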
04:16:49	<TheTechRobo>	Hm, so in an automated script uploading to IA, should I use single-part or multipart?
04:17:24	<TheTechRobo>	I was leaning towards multipart because of `ia`'s janky retries.
04:17:50	<@JAA>	I'd go with single part, unless your connection to IA is very bad.
04:18:04	<@JAA>	ia-upload-stream can still do single part uploads.
04:18:32	<@JAA>	Just set the part size larger than the file size. :-)
04:18:35	<TheTechRobo>	How does one do that? (Preferably without buffering potentially huge files into memory)
04:18:39	<TheTechRobo>	Ah lol
04:18:55	<@JAA>	Yeah, if you're piping into ia-upload-stream, that buffers everything in memory.
04:19:06	<@JAA>	If you have a local file, use --input-file instead.
04:19:18	<TheTechRobo>	--input-file doesn't buffer?
04:19:36	<@JAA>	No, it reads from disk (and it's on you to make sure the file doesn't change while ia-upload-stream does its thing).
04:19:47	<nicolas17>	TheTechRobo: no because if it needs to retry it can just read from the file again
04:20:11	<@JAA>	That does mean it reads it from disk at least twice, once for the hashing and once for the upload. (More than twice if there are retries.)
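
The read-from-disk behaviour described above can be sketched as follows: one pass over the file for hashing, then one more pass per upload attempt, with only a small chunk held in memory at a time. This is a minimal illustration, not ia-upload-stream's actual code. The names `iter_chunks`, `hash_file`, `upload_with_retries`, and the `send` callback are hypothetical, and MD5 is an assumption chosen only for the example.

```python
import hashlib
from typing import Callable, Iterator

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time instead of the whole file


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size chunks, read fresh from disk."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                return
            yield block


def hash_file(path: str) -> str:
    """First pass over the file: compute a digest without buffering it all."""
    h = hashlib.md5()  # assumption: the algorithm is illustrative only
    for block in iter_chunks(path):
        h.update(block)
    return h.hexdigest()


def upload_with_retries(
    path: str,
    send: Callable[[Iterator[bytes], str], None],
    max_attempts: int = 3,
) -> int:
    """Hash once, then re-read the file from disk for every upload attempt.

    Returns the number of attempts used. The file must not change between
    passes, otherwise the digest no longer matches the uploaded bytes.
    """
    digest = hash_file(path)
    for attempt in range(1, max_attempts + 1):
        try:
            send(iter_chunks(path), digest)  # second (or later) pass
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

Data piped in on stdin cannot be re-read this way, which is why retrying a stream requires keeping the data in memory.
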
04:20:33	<TheTechRobo>	Ahh, dir-to-ia uses stdin. That explains it.
04:20:50	<@JAA>	Yeah, that's still experimental etc. :-P
04:20:57	<TheTechRobo>	Was confused because when I used dir-to-ia, the logs said it was reading each chunk into memory
04:21:02	<TheTechRobo>	JAA: Yeah, that's why I'm not using it here :-)
04:21:24	<TheTechRobo>	I am currently working on my Own Thing™ which will probably turn out worse.
04:22:04	<@JAA>	I'm currently fixing a few kinks in ia-upload-stream, then I'll direct my attention at dir-to-ia because I need it as well.
04:22:43	<@JAA>	Speaking of, I pushed a change tonight which allows out-of-order finishing of parts. That can make resumption a pain because resuming with gaps in parts is not yet supported.
04:23:56	<@JAA>	--parts will probably go away soon anyway once I implement ListParts support; there's no need to keep that state on the client side.
04:24:21	<@JAA>	Not sure yet whether I'll add that sort of resumption though. It doesn't play well with stdin streaming either.
04:25:00	<TheTechRobo>	> I pushed a change tonight
04:25:00	<TheTechRobo>	> 1 week ago
04:25:00	<TheTechRobo>	(unless you're not talking about f8b07ed6a5)
04:25:15	<@JAA>	I made the commit a week ago, yes, but I only pushed it tonight.
04:25:21	<TheTechRobo>	Ah
04:26:33		sonick (sonick) joins
04:27:54		sonick quits [Client Quit]
04:29:53		sonick_ (sonick) joins
04:31:58		sonick_ is now known as sonick
04:32:29	<TheTechRobo>	Is there a way to say 'unlimited part size' or should I just set it to an obscenely large value?
04:32:54	<@JAA>	There isn't, and yes.
04:34:23	<@JAA>	I'll be adding something soon to automatically set the part size to (effectively) unlimited when --input-file is used with --concurrency 1.
04:34:37	<@JAA>	(Where the latter is the default, of course.)
04:38:10	<@JAA>	(2^40 is effectively unlimited as that's the item size limit.)
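
For reference, the arithmetic behind "effectively unlimited" can be checked with a short sketch. The 2^40 figure comes from the message above; `part_count` is a hypothetical helper for illustration, not something ia-upload-stream provides.

```python
# 2^40 bytes is 1 TiB, the item size limit mentioned above.
ITEM_SIZE_LIMIT = 2 ** 40


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts needed to cover file_size bytes (ceiling division)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return max(1, -(-file_size // part_size))


# A 115 GiB file with the part size set to the item size limit is one part.
single = part_count(115 * 2 ** 30, ITEM_SIZE_LIMIT)

# The same file with 1 GiB parts is split into 115 parts.
multi = part_count(115 * 2 ** 30, 2 ** 30)
```

Since no file in an item can exceed the item size limit, a part size of 2^40 bytes always results in a single part.
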
05:02:11	<@JAA>	One thing I'm working on is fixing the progress bars, because it's all a huge mess as soon as parallelism is involved.
05:13:35		DogsRNice quits [Read error: Connection reset by peer]
05:40:19	<@JAA>	Huh, was the torrent size limit bumped? I see a complete torrent on a 115 GiB item.
05:40:53	<@JAA>	I believe it was 75 GiB not too long ago.
07:02:52	<@arkiver>	wasn't it just always at 1 TB?
07:02:59	<@arkiver>	i may be misremembering it though
07:12:24	<@JAA>	arkiver: I mean the _archive.torrent files. They were always cut off at some size, leading to people getting confused why their torrent downloads were incomplete.
07:12:53	<@JAA>	It was something like 20 or 25 GiB several years ago, then 75 for a while. Now it's apparently at least 115.
07:14:53	<@JAA>	It would generate a torrent at those sizes, then when you uploaded more, the task log would just say something to the effect of 'item too big, not regenerating torrent', and the torrent would be outdated and incomplete.
07:15:08		BornOn420 quits [Remote host closed the connection]
07:15:17	<@arkiver>	ah
07:15:52		BornOn420 (BornOn420) joins
09:03:23		nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
09:06:52		nyakase (nyakase) joins
11:33:30		MrMcNuggets (MrMcNuggets) joins
13:12:12		f_ quits [Remote host closed the connection]
13:12:20		f_ (funderscore) joins
14:03:27		nyakase quits [Quit: @ERROR: max connections (-1) reached -- try again later]
15:40:42	<that_lurker>	Why cap the torrent size, as that would be the most efficient way to download (popular) files?
16:10:06		AlsoHP_Archivist joins
16:10:06		HP_Archivist quits [Read error: Connection reset by peer]
16:27:51		DogsRNice joins
16:40:40		MrMcNuggets quits [Quit: WeeChat 4.3.2]
16:50:23		andrew1 (andrew) joins
16:52:13		andrew quits [Ping timeout: 252 seconds]
16:52:13		andrew1 is now known as andrew
18:04:04		AlsoHP_Archivist quits [Client Quit]
19:27:17		geezabiscuit (geezabiscuit) joins
19:27:51		immibis quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:30:05		immibis joins
19:30:05		immibis is now authenticated as immibis
19:32:57		immibis leaves
19:55:57		SootBector quits [Remote host closed the connection]
19:56:20		SootBector (SootBector) joins
20:25:40		driib9 quits [Quit: Ping timeout (120 seconds)]
21:00:03		driib9 (driib) joins
21:00:44		driib9 quits [Client Quit]
21:02:39		driib9 (driib) joins
21:06:47		driib9 quits [Client Quit]
21:09:36		driib9 (driib) joins
21:18:55		driib9 quits [Client Quit]
21:20:56		driib9 (driib) joins
21:35:16		driib9 quits [Client Quit]
21:37:24		driib9 (driib) joins
21:42:44		driib99 (driib) joins
21:44:49		driib9 quits [Ping timeout: 252 seconds]
21:44:49		driib99 is now known as driib9
22:50:04		nukke quits [Quit: nukke]
23:01:56		nukke (nukke) joins
23:02:15		andrew1 (andrew) joins
23:03:43		andrew quits [Ping timeout: 260 seconds]
23:03:43		andrew1 is now known as andrew
