00:00:20 | | nyakase quits [Remote host closed the connection] |
00:01:03 | | nyakase (nyakase) joins |
00:10:21 | | nyakase quits [Remote host closed the connection] |
00:12:45 | | nyakase (nyakase) joins |
02:53:20 | | Grzesiek11 joins |
02:53:40 | <Grzesiek11> | hello, I'm trying to upload via torrent, but it's not working |
02:53:44 | <Grzesiek11> | https://archive.org/download/Notch_Twitch_videos |
02:54:09 | <Grzesiek11> | the torrent is seeded and has a tracker. it's about 400 GiB in size. I don't get IA leeches. |
02:55:22 | <Grzesiek11> | nicolas17 said (on another channel) torrent upload broke during the recentish outage, is this correct? |
02:55:42 | <nicolas17> | https://archive.org/history/Notch_Twitch_videos works for me but the individual log links say I don't have permission |
02:56:36 | <Grzesiek11> | hey seems to work for me. guess it's not the same as the "history" link. |
02:56:40 | <Grzesiek11> | let's see |
02:57:20 | <Grzesiek11> | this would be during derive? |
02:58:40 | <@JAA> | I think it'd be a derive.php task, yeah. |
02:58:52 | <nicolas17> | I think so? when I tried it successfully (like a year ago) I just looked at what task was taking long :D |
02:59:58 | <Grzesiek11> | here none take long since it just seems to do nothing lol. I'm looking through the derive log and it says the torrent is a "NonSource" and that there's "Nothing to do!"
03:00:07 | <Grzesiek11> | gonna send the log gimme a sec |
03:00:59 | <Grzesiek11> | https://grzesiek11.stary.pc.pl/files/archive_org_derive.txt |
03:01:21 | <@JAA> | Someone said end of January that torrent uploads were still broken: https://old.reddit.com/r/internetarchive/comments/1gxs0p9/uploading_using_torrent/ |
03:01:40 | <Grzesiek11> | welp.... |
03:02:14 | <Grzesiek11> | guess nothing comes easy. I used the Python uploader for some tiny files before (not exceeding 5 GiB), *maybe* it will work |
03:02:37 | <Grzesiek11> | it took forever to upload those tiny files tho |
03:03:18 | <Grzesiek11> | thought torrent might be faster, but if it won't work at all, then it's infinitely slower. thanks for the help anyways. |
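(For context: the "Python uploader" here is the internetarchive package's upload interface. A minimal sketch of that route, assuming the item Notch_Twitch_videos already exists and `ia configure` has stored credentials; the file paths are placeholders, not actual filenames from the item.)

    # Minimal sketch: upload local files to an existing item with the
    # internetarchive library (pip install internetarchive).
    # Assumes `ia configure` has already stored S3-style credentials.
    from internetarchive import upload

    files = ["videos/stream_001.mp4", "videos/stream_002.mp4"]  # placeholder paths

    responses = upload(
        "Notch_Twitch_videos",  # existing item identifier
        files,
        verbose=True,           # per-file progress
        retries=5,              # retry transient S3 errors
    )
    for r in responses:
        r.raise_for_status()    # fail loudly if any PUT did not succeed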
03:10:51 | <nicolas17> | how many files are there?
03:57:17 | <Grzesiek11> | a few hundred, I think about 300
04:03:45 | <Grzesiek11> | estimated this is going to take about 17 days |
04:07:17 | <Grzesiek11> | wait, no, wrong, over a month |
04:09:59 | <Grzesiek11> | related question: if over the course of this month my internet dies or something, can I just use the same --spreadsheet file and it'll skip the already uploaded files or not? |
04:12:41 | <Grzesiek11> | as for the precise file count and size - it's exactly 550 files, of which about 1/3rd are larger than 10 MiB for a total of 377 GiB |
04:14:07 | <Grzesiek11> | (actually it's 545 not 550, I was looking at the entire directory rather than just the stuff to upload) |
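(The time estimate above is just total size divided by sustained upload rate; a rough back-of-the-envelope, with the rate below an assumed example rather than a measured figure.)

    # Back-of-the-envelope upload time: total size / sustained rate.
    size_gib = 377
    rate_mib_per_s = 1.5          # assumed example rate, not a measurement
    seconds = size_gib * 1024 / rate_mib_per_s
    print(f"{seconds / 86400:.1f} days")
    # ~3 days at 1.5 MiB/s; a one-month estimate implies roughly 0.15 MiB/s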
04:15:27 | | nukke quits [Quit: nukke] |
04:22:33 | | nukke (nukke) joins |
04:53:12 | <nicolas17> | --checksum skips already uploaded files whose checksum matches the local file |
04:53:24 | <nicolas17> | I never used --spreadsheet but I assume --checksum works with it |
04:53:49 | <nicolas17> | the only caveat is it doesn't work right if you start uploading when there are derive jobs running |
04:54:44 | <nicolas17> | so if your upload fails, you may need to wait a few minutes for running jobs to finish before retrying, otherwise it will start uploading files that are already uploaded |
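(A sketch of that resumable retry with the internetarchive library; checksum=True compares the local file's MD5 against what the item already has and skips matches, which is what the --checksum flag does on the CLI. Paths are placeholders, and per the caveat above, only rerun once the item's tasks have finished.)

    # Sketch: restartable upload that skips files already present with a
    # matching MD5, so an interrupted run only re-sends what is missing.
    from internetarchive import upload

    upload(
        "Notch_Twitch_videos",
        ["videos/stream_001.mp4", "videos/stream_002.mp4"],  # placeholder paths
        checksum=True,   # skip files whose recorded MD5 matches the local file
        verbose=True,
        retries=5,
    )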
04:59:26 | <@JAA> | Oh boy, it's been 6 years now since I filed that issue. |
05:00:45 | <@JAA> | And it's not about derives. It's about archive.php tasks. |
05:01:26 | <@JAA> | A derive could be blocking those, but it could also be something else, including an error that needs to be fixed by IA staff. |
05:02:27 | <@JAA> | Basically, you want to check whether there are any non-finished tasks for the item, and only rerun when there aren't. |
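(A sketch of that pre-flight check with the internetarchive library; get_task_summary() and its keys are my assumption about the library's task API and should be verified against the version in use.)

    # Sketch: only retry once the item has no queued or running catalog tasks.
    # get_task_summary() and its result keys are assumptions; verify against
    # your internetarchive library version.
    from internetarchive import get_item

    item = get_item("Notch_Twitch_videos")
    summary = item.get_task_summary()   # e.g. {"queued": 0, "running": 0, ...}
    outstanding = summary.get("queued", 0) + summary.get("running", 0)

    if outstanding == 0:
        print("No outstanding tasks, safe to rerun the upload.")
    else:
        print(f"{outstanding} task(s) still queued/running, wait before retrying.")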
06:13:16 | | thalia (thalia) joins |
06:36:07 | <Vokun> | If torrent upload worked still, that would save me so much time |
06:37:00 | <Vokun> | Do we know why items have the ~1 TiB limit, or if this is a limitation that is planned to be fixed?
06:54:32 | | datechnoman quits [Quit: Ping timeout (120 seconds)] |
06:55:12 | | datechnoman (datechnoman) joins |
08:42:02 | | datechnoman quits [Client Quit] |
09:40:25 | | datechnoman (datechnoman) joins |
10:10:23 | | TheTechRobo quits [Quit: Ping timeout (120 seconds)] |
10:12:42 | | TheTechRobo (TheTechRobo) joins |
14:26:03 | | SootBector quits [Remote host closed the connection] |
14:26:35 | | SootBector (SootBector) joins |
15:30:38 | | threedeeitguy (threedeeitguy) joins |
16:32:56 | | NatTheCat (NatTheCat) joins |
16:53:55 | | IDK (IDK) joins |
17:03:48 | <nicolas17> | Vokun: afaik items can't be distributed across multiple servers, so there has to be a limit to their size |
17:46:36 | | pokechu22 quits [Quit: WeeChat 4.4.2] |
17:49:09 | | pokechu22 (pokechu22) joins |
17:59:28 | | PredatorIWD255 joins |
18:03:13 | | PredatorIWD25 quits [Ping timeout: 260 seconds] |
18:03:14 | | PredatorIWD255 is now known as PredatorIWD25 |
19:40:19 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
19:41:12 | | Craigle (Craigle) joins |
20:11:07 | <@arkiver> | but nowadays drives are so big... |
20:11:26 | <@arkiver> | but yes there's various reasons for not allowing big items, one of them is handling them when stuff needs to be moved around |
20:34:58 | | BornOn420 quits [Remote host closed the connection] |
20:35:36 | | BornOn420 (BornOn420) joins |
20:39:28 | | Dango360 quits [Quit: Leaving] |
21:19:18 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
21:25:18 | | rewby quits [Ping timeout: 250 seconds] |
21:31:46 | | Lord_Nightmare (Lord_Nightmare) joins |
21:36:20 | <nicolas17> | arkiver: I said there has to be a limit, not that the limit being 1TB is okay, maybe it *could* be increased ;) |
21:43:12 | <imer> | more surprised they don't do automatic sharding of bigger files, but that's probably due to historical reasons (and to keep things simple)
21:43:57 | <nicolas17> | imer: download links redirect to an individual file server, sharding would need a frontend to get them from multiple servers and concatenate them back |
21:44:04 | <imer> | yes :) |
21:45:33 | <imer> | does make the actual handling of data easier since you're dealing with reasonable sizes only |
21:45:41 | <imer> | very much a trade-off
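(To illustrate the single-server point: a download URL resolves via one redirect to the specific storage node holding the file, which is why sharding a file across nodes would need a new front end to reassemble it. The _files.xml metadata file is used below because every item has one; the node hostname in the response is whatever currently serves the item.)

    # Illustration: archive.org/download/<item>/<file> 302-redirects to the
    # single storage node (iaNNNNNN.us.archive.org) that holds the file.
    import requests

    r = requests.head(
        "https://archive.org/download/Notch_Twitch_videos/Notch_Twitch_videos_files.xml",
        allow_redirects=False,
    )
    print(r.status_code, r.headers.get("Location"))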
21:58:18 | | tzt quits [Ping timeout: 260 seconds] |
22:01:05 | | tzt (tzt) joins |
22:02:48 | | thalia quits [Quit: Connection closed for inactivity] |
22:10:39 | <@JAA> | IIRC, there was a blog post or something about this years ago where they answered various such questions and also 'why not Ceph' etc. My takeaway was 'less complexity is better', which I'd generally agree with, as it reduces the number of things that can go wrong. Hence the simple setup with paired servers and plain drives. |
22:19:05 | <imer> | yeah |
22:21:38 | | rewby (rewby) joins |
22:25:02 | | Lord_Nightmare quits [Client Quit] |
22:28:39 | | Lord_Nightmare (Lord_Nightmare) joins |