00:32:56<katia>https://github.com/adsblol/globe_history_2023 to upload this to IA, would i want 1 item per day and set the collection metadata to the identifier of another 'meta' item?
00:35:45<@JAA>The collection field has to be a collection item, and only IA staff can create those.
00:36:24<@JAA>Not sure if one item per day is best, but it'd probably work.
00:36:37<@JAA>I might also consider one item per month.
00:37:11<@JAA>arkiver can set you up with a collection.
00:37:35<katia>one item per year?
00:38:08<@JAA>Oh, is it more than one year? The repo name kind of suggests it's 2023 only.
00:38:24<katia>it's just because of github's limits
00:38:33<katia>they have some hard limits per repo of storage
00:38:38<katia>so i made a new repo for 2024
00:39:08<@JAA>Oh, this is your thing, I see. :-)
00:39:37<@JAA>1077 GiB is too large for one item.
00:42:01<katia>'If the files in an item have separate metadata, the files should probably be in different items' makes me think an item per day
00:42:08<katia>because i'd want date: metadata maybe?
00:42:12<katia>from https://archive.org/developers/items.html
00:42:48<@JAA>The date field can have a YYYY-MM format, but yeah, maybe an item per day works best anyway.
00:43:58<katia>any other metadata i'd want? mediatype:data
00:45:16<katia>licenseurl:, source: with the github url maybe
00:45:17<nicolas17>afaik mediatype and identifier are the important ones that you can't change later
00:45:27<@JAA>And collection
00:45:40<katia>i see
00:46:48<@JAA>I've come to like {previous,next}_item for this kind of sequence of items. Sadly doesn't get linked on the web interface though.
00:46:59<katia>https://help.archive.org/help/collections-a-basic-guide/ seems to suggest IA can change the collection once i upload 50+ items
00:47:21<@JAA>General IA help doesn't apply with AT. :-)
00:47:43<@JAA>As I said, arkiver can create a collection for you, and then you can just directly upload there.
00:48:00<katia>thank you, that's great :D <3
00:48:28<@JAA>Where 'directly upload' means 'set the collection metadata field at upload time', of course.
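The metadata discussed above can be set in one shot with the `ia` CLI. A minimal sketch, assuming an item per day; the identifier, collection name, and tar filename here are placeholders, not the real ones (and remember that `identifier`, `mediatype`, and `collection` can't be changed after creation):

```shell
#!/bin/sh
# Hypothetical identifier and collection name for illustration only.
# The collection item itself must be created by IA staff beforehand.
identifier="adsblol-globe-history-2023-01-15"
collection="adsblol-globe-history"

# Build the `ia upload` command with the metadata fields from the
# discussion: mediatype, collection, date, and source. licenseurl is
# omitted here since no specific license URL was settled on.
build_upload_cmd() {
  printf 'ia upload %s %s --metadata=mediatype:data --metadata=collection:%s --metadata=date:%s --metadata=source:%s\n' \
    "$identifier" "$1" "$collection" "2023-01-15" \
    "https://github.com/adsblol/globe_history_2023"
}

build_upload_cmd v2023.01.15-planes-readsb-prod-0.tar
```

`previous_item`/`next_item` could be added the same way with two more `--metadata=` flags, one pointing at the prior day's identifier and one at the next day's.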
01:44:37Webuser230 quits [Ping timeout: 265 seconds]
02:15:59RealPerson joins
03:11:02angenieux (angenieux) joins
03:17:45qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:15:25DogsRNice quits [Read error: Connection reset by peer]
05:17:43qwertyasdfuiopghjkl quits [Client Quit]
05:21:09qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:34:20nulldata quits [Ping timeout: 240 seconds]
05:45:18nulldata (nulldata) joins
06:04:36qwertyasdfuiopghjkl quits [Client Quit]
06:22:45jtagcat quits [Quit: Bye!]
06:23:08jtagcat (jtagcat) joins
06:38:15qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:30:33<@arkiver>katia: can you upload an example item to IA? so i know what it would look like
07:30:42<@arkiver>and i read 1 item per day - is that still your plan?
07:37:26<@arkiver>note that Save Page Now is not really made for archiving an entire website one page at a time
07:37:33<@arkiver>it's made more for one-off web pages
09:08:39<Nemo_bis>nicolas17: nice work, is your script somewhere? would be nice to add the metadata https://archive.org/download/samsung-opensource-9881/metadata.json to the item metadata
09:10:40<Nemo_bis>nicolas17: if all the items are under 5 GB or so, the size hint is unlikely to matter too much
09:17:21<katia>arkiver, the files i wanna upload are these: https://github.com/adsblol/globe_history_2023/releases/
09:17:32<katia>it's a set of items per day
09:17:41<katia>usually 2
09:19:25<katia>https://archive.org/details/adsblol-globe-history-2023 i uploaded them here but will delete them and reupload one item per day
09:32:29qwertyasdfuiopghjkl quits [Remote host closed the connection]
10:04:25qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
10:48:24f_ (funderscore) joins
11:28:47SootBector (SootBector) joins
11:30:06<katia>and maybe the name of the collection would look something like 'adsblol-globe-history'
11:48:13qwertyasdfuiopghjkl quits [Client Quit]
11:54:16qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
13:16:50Arcorann quits [Ping timeout: 240 seconds]
13:47:34qwertyasdfuiopghjkl quits [Client Quit]
13:49:49qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
16:15:04<nicolas17>what the heck is going on
16:15:28<nicolas17>uploading to IA, I'm now getting 50 s/MiB from home and 20 MiB/s from my VPS
16:16:56<Terbium>you probably have better peering to Cogent or HE from home
16:17:17<nicolas17>last night I was getting like 500KiB/s from both
16:17:31<nicolas17>Terbium: note the units
16:17:58<Terbium>IA's upload is highly unstable from my experience so it looks pretty normal to me lol
16:19:03<Terbium>I never get consistent speeds for anything to/from IA; it varies a lot. Although better peering generally helps out a little
16:35:37simon816 quits [Quit: ZNC 1.8.2 - https://znc.in]
16:40:49simon816 (simon816) joins
16:56:10simon816 quits [Client Quit]
17:00:50simon816 (simon816) joins
17:01:14qwertyasdfuiopghjkl quits [Remote host closed the connection]
17:39:33<@JAA>50 s/MiB is a special kind of hell though.
17:43:27simon816 quits [Client Quit]
17:46:59<nicolas17>JAA: it uploaded a 6MB file, and when it started uploading the 400MB file and I saw the speed, I went "wtf that's absurd"
17:47:23<nicolas17>stopped, rsync'd it to my VPS at 5MB/s and uploaded to IA from there... 20MB/s *what*
17:47:51<@JAA>Nothing unusual there. Stuff's weird.
17:48:19<@JAA>I used to sometimes route my uploads through another server that was further away from IA and got better speeds. :-)
17:49:17<nicolas17>I'll have to write some more code to deal with these samsung uploads... I lose track of what's done and what isn't
17:49:50<nicolas17>if I keep the zip locally, the download script will notice it's already there and skip it, and the ia command will notice hash matches and skip it
17:49:59<nicolas17>but I'll run out of local disk space
17:50:40<nicolas17>if I delete the zip, then I have to track what's already done, if I just run my script again it will see the local file is missing and download it, then ia will see it's already on IA and won't upload it, so the download was pointless
17:50:48simon816 (simon816) joins
17:52:33<@JAA>`touch uploaded; rm $zipfile && ln -s uploaded $zipfile`
17:52:54<@JAA>Not sure the ia CLI will be too happy with that though.
17:53:31<@JAA>But wouldn't be difficult to only pass non-symlinks to it.
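JAA's marker trick above, sketched as a script (filenames and the `*.zip` glob are illustrative; assumes one shared `uploaded` marker file per directory):

```shell
#!/bin/sh
# After a successful upload, replace the zip with a symlink to a tiny
# marker file: the download script still sees a file at that path and
# skips re-downloading, but the disk space is reclaimed.
mark_uploaded() {
  zipfile=$1
  touch uploaded
  rm "$zipfile" && ln -s uploaded "$zipfile"
}

# Only pass real files (not the symlink markers) on to the ia CLI,
# since it may not handle the dangling-content symlinks gracefully.
real_zips() {
  for f in *.zip; do
    [ -L "$f" ] || printf '%s\n' "$f"
  done
}
```

Something like `real_zips | xargs -I{} ia upload "$identifier" {}` would then upload only the zips that haven't been marked yet.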
17:55:23<audrooku|m>Nicolas17: is the samsung data going into the wayback machine?
17:55:56DogsRNice joins
18:05:50<Nemo_bis>The good old perl script kept track of that and might still work :)
18:06:08<Nemo_bis>https://github.com/kngenie/ias3upload
18:06:30<Nemo_bis>audrooku|m: no
18:11:10<audrooku|m>Fair
18:19:52qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
18:33:27<Terbium>I usually dual peer with Cogent and HE to get the best uploads to IA and avoid extra transit hops.
18:34:08<Terbium>With good peering, even a server that's physically further can get better speeds than one with poor peering
18:34:52<Terbium>not that it helps too much, as usually IA's ingress is unstable due to the sheer volume of data
20:09:10qwertyasdfuiopghjkl quits [Client Quit]
20:15:11<fireonlive>need to get IA some terabit links
22:01:16<nicolas17>audrooku|m: they are POST requests with a one-time token, can't do WBM :(
23:00:57Arcorann (Arcorann) joins
23:27:57katia quits [Remote host closed the connection]
23:28:13katia (katia) joins