00:32:56 | <katia> | https://github.com/adsblol/globe_history_2023 to upload this to IA, would i want 1 item per day and set the collection metadata to the identifier of another 'meta' item? |
00:35:45 | <@JAA> | The collection field has to be a collection item, and only IA staff can create those. |
00:36:24 | <@JAA> | Not sure if one item per day is best, but it'd probably work. |
00:36:37 | <@JAA> | I might also consider one item per month. |
00:37:11 | <@JAA> | arkiver can set you up with a collection. |
00:37:35 | <katia> | one item per year? |
00:38:08 | <@JAA> | Oh, is it more than one year? The repo name kind of suggests it's 2023 only. |
00:38:24 | <katia> | it's just because of github's limits |
00:38:33 | <katia> | they have some hard per-repo storage limits |
00:38:38 | <katia> | so i made a new repo for 2024 |
00:39:08 | <@JAA> | Oh, this is your thing, I see. :-) |
00:39:37 | <@JAA> | 1077 GiB is too large for one item. |
00:42:01 | <katia> | 'If the files in an item have separate metadata, the files should probably be in different items' makes me think an item per day |
00:42:08 | <katia> | because i'd want date: metadata maybe? |
00:42:12 | <katia> | from https://archive.org/developers/items.html |
00:42:48 | <@JAA> | The date field can have a YYYY-MM format, but yeah, maybe an item per day works best anyway. |
00:43:58 | <katia> | any other metadata i'd want? mediatype:data |
00:45:16 | <katia> | licenseurl:, source: with the github url maybe |
00:45:17 | <nicolas17> | afaik mediatype and identifier are the important ones that you can't change later |
00:45:27 | <@JAA> | And collection |
00:45:40 | <katia> | i see |
00:46:48 | <@JAA> | I've come to like {previous,next}_item for this kind of sequence of items. Sadly doesn't get linked on the web interface though. |
00:46:59 | <katia> | https://help.archive.org/help/collections-a-basic-guide/ seems to suggest IA can change the collection once i upload 50+ items |
00:47:21 | <@JAA> | General IA help doesn't apply with AT. :-) |
00:47:43 | <@JAA> | As I said, arkiver can create a collection for you, and then you can just directly upload there. |
00:48:00 | <katia> | thank you, that's great :D <3 |
00:48:28 | <@JAA> | Where 'directly upload' means 'set the collection metadata field at upload time', of course. |
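The per-day layout discussed above could be sketched roughly as follows. This is only an illustration: the identifier scheme and the collection name `adsblol-globe-history` are assumptions (the collection would have to be created by IA staff first), and `previous_item` / `next_item` are the custom fields JAA mentions, not fields the web interface links.

```python
from datetime import date, timedelta

# Hypothetical identifier prefix and collection name; neither is confirmed in the log.
PREFIX = "adsblol-globe-history"
COLLECTION = "adsblol-globe-history"
SOURCE = "https://github.com/adsblol/globe_history_2023"


def item_identifier(day: date) -> str:
    """One item per day, e.g. adsblol-globe-history-2023-01-01."""
    return f"{PREFIX}-{day.isoformat()}"


def item_metadata(day: date, first: date, last: date) -> dict:
    """Metadata to set at upload time for the item covering `day`."""
    metadata = {
        "mediatype": "data",       # said to be hard to change after upload
        "collection": COLLECTION,  # must already exist as a collection item
        "date": day.isoformat(),
        "source": SOURCE,
    }
    # Chain neighbouring days together, skipping the ends of the range.
    if day > first:
        metadata["previous_item"] = item_identifier(day - timedelta(days=1))
    if day < last:
        metadata["next_item"] = item_identifier(day + timedelta(days=1))
    return metadata
```

With the `internetarchive` Python package installed, the result could be passed along as `internetarchive.upload(item_identifier(day), files=[...], metadata=item_metadata(day, first, last))`; the file list is left open here because the log does not name the files.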
01:44:37 | | Webuser230 quits [Ping timeout: 265 seconds] |
02:15:59 | | RealPerson joins |
03:11:02 | | angenieux (angenieux) joins |
03:17:45 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:15:25 | | DogsRNice quits [Read error: Connection reset by peer] |
05:17:43 | | qwertyasdfuiopghjkl quits [Client Quit] |
05:21:09 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:34:20 | | nulldata quits [Ping timeout: 240 seconds] |
05:45:18 | | nulldata (nulldata) joins |
06:04:36 | | qwertyasdfuiopghjkl quits [Client Quit] |
06:22:45 | | jtagcat quits [Quit: Bye!] |
06:23:08 | | jtagcat (jtagcat) joins |
06:38:15 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
07:30:33 | <@arkiver> | katia: can you upload an example item to IA? so i know what it would look like |
07:30:42 | <@arkiver> | and i read 1 item per day - is that still your plan? |
07:37:26 | <@arkiver> | note that Save Page Now is not really made for archiving an entire website one page at a time |
07:37:33 | <@arkiver> | it's made more for one-off web pages |
09:08:39 | <Nemo_bis> | nicolas17: nice work, is your script somewhere? would be nice to add the metadata https://archive.org/download/samsung-opensource-9881/metadata.json to the item metadata |
09:10:40 | <Nemo_bis> | nicolas17: if all the items are under 5 GB or so, the size hint is unlikely to matter too much |
09:17:21 | <katia> | arkiver, the files i wanna upload are these: https://github.com/adsblol/globe_history_2023/releases/ |
09:17:32 | <katia> | it's a set of items per day |
09:17:41 | <katia> | usually 2 |
09:19:25 | <katia> | https://archive.org/details/adsblol-globe-history-2023 i uploaded them here but will delete them and reupload one item per day |
09:32:29 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
10:04:25 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
10:48:24 | | f_ (funderscore) joins |
11:28:47 | | SootBector (SootBector) joins |
11:30:06 | <katia> | and maybe the name of the collection would look something like 'adsblol-globe-history' |
11:48:13 | | qwertyasdfuiopghjkl quits [Client Quit] |
11:54:16 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
13:16:50 | | Arcorann quits [Ping timeout: 240 seconds] |
13:47:34 | | qwertyasdfuiopghjkl quits [Client Quit] |
13:49:49 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
16:15:04 | <nicolas17> | what the heck is going on |
16:15:28 | <nicolas17> | uploading to IA, I'm now getting 50s/MiB from home and 20MiB/s from my VPS |
16:16:56 | <Terbium> | you probably have better peering to Cogent or HE from home |
16:17:17 | <nicolas17> | last night I was getting like 500KiB/s from both |
16:17:31 | <nicolas17> | Terbium: note the units |
16:17:58 | <Terbium> | IA's upload is highly unstable from my experience so it looks pretty normal to me lol |
16:19:03 | <Terbium> | I never get consistent speeds for anything to/from IA; they vary highly. Although better peering generally helps out a little |
16:35:37 | | simon816 quits [Quit: ZNC 1.8.2 - https://znc.in] |
16:40:49 | | simon816 (simon816) joins |
16:56:10 | | simon816 quits [Client Quit] |
17:00:50 | | simon816 (simon816) joins |
17:01:14 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
17:39:33 | <@JAA> | 50 s/MiB is a special kind of hell though. |
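To make the unit difference concrete, here is a small conversion sketch. The 400 MiB figure is an assumption based on the "400MB file" mentioned just below; the log does not say whether that means MB or MiB.

```python
def seconds_per_mib_to_kib_per_s(seconds_per_mib: float) -> float:
    """Convert a pace in s/MiB into a rate in KiB/s (1 MiB = 1024 KiB)."""
    return 1024 / seconds_per_mib


def transfer_seconds(size_mib: float, seconds_per_mib: float) -> float:
    """Time needed to move `size_mib` MiB at the given pace."""
    return size_mib * seconds_per_mib


home_rate = seconds_per_mib_to_kib_per_s(50)  # 20.48 KiB/s
vps_rate = 20 * 1024                          # 20 MiB/s expressed in KiB/s
slowdown = vps_rate / home_rate               # 1000x
hours = transfer_seconds(400, 50) / 3600      # roughly 5.6 hours for 400 MiB
```

So 50 s/MiB is about 20 KiB/s, a factor of 1000 slower than 20 MiB/s.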
17:43:27 | | simon816 quits [Client Quit] |
17:46:59 | <nicolas17> | JAA: it uploaded a 6MB file, and when it started uploading the 400MB file and I saw the speed, I went "wtf that's absurd" |
17:47:23 | <nicolas17> | stopped, rsync'd it to my VPS at 5MB/s and uploaded to IA from there... 20MB/s *what* |
17:47:51 | <@JAA> | Nothing unusual there. Stuff's weird. |
17:48:19 | <@JAA> | I used to sometimes route my uploads through another server that was further away from IA and got better speeds. :-) |
17:49:17 | <nicolas17> | I'll have to write some more code to deal with these samsung uploads... I lose track of what's done and what isn't |
17:49:50 | <nicolas17> | if I keep the zip locally, the download script will notice it's already there and skip it, and the ia command will notice hash matches and skip it |
17:49:59 | <nicolas17> | but I'll run out of local disk space |
17:50:40 | <nicolas17> | if I delete the zip, then I have to track what's already done; if I just run my script again, it will see the local file is missing and download it, then ia will see it's already on IA and won't upload it, so the download was pointless |
17:50:48 | | simon816 (simon816) joins |
17:52:33 | <@JAA> | `touch uploaded; rm $zipfile && ln -s uploaded $zipfile` |
17:52:54 | <@JAA> | Not sure the ia CLI will be too happy with that though. |
17:53:31 | <@JAA> | But wouldn't be difficult to only pass non-symlinks to it. |
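A minimal Python sketch of the symlink-placeholder idea above, assuming all zips live in one directory and that the download script only checks whether a path exists. The function names are hypothetical and nothing here calls the `ia` CLI.

```python
from pathlib import Path

MARKER = "uploaded"  # shared empty marker file, as in the `touch uploaded` one-liner


def mark_uploaded(zip_path: Path) -> None:
    """Replace an uploaded zip with a symlink to the marker to free disk space."""
    (zip_path.parent / MARKER).touch()
    zip_path.unlink()
    # Relative target, resolved against the link's own directory.
    zip_path.symlink_to(MARKER)


def pending_uploads(directory: Path) -> list[Path]:
    """Return only real zip files, skipping symlink placeholders."""
    return sorted(p for p in directory.glob("*.zip") if not p.is_symlink())
```

Because the placeholder still exists as a path, a script that skips existing files would not download it again, and only the paths from `pending_uploads()` would be handed to the uploader.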
17:55:23 | <audrooku|m> | Nicolas17: is the samsung data going into the wayback machine? |
17:55:56 | | DogsRNice joins |
18:05:50 | <Nemo_bis> | The good old perl script kept track of that and might still work :) |
18:06:08 | <Nemo_bis> | https://github.com/kngenie/ias3upload |
18:06:30 | <Nemo_bis> | audrooku|m: no |
18:11:10 | <audrooku|m> | Fair |
18:19:52 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
18:33:27 | <Terbium> | I usually dual peer with Cogent and HE to get the best uploads to IA and avoid extra transit hops. |
18:34:08 | <Terbium> | With good peering, even a server that's physically further can get better speeds than one with poor peering |
18:34:52 | <Terbium> | not that it helps too much, as IA's ingress is usually unstable due to the sheer volume of data |
20:09:10 | | qwertyasdfuiopghjkl quits [Client Quit] |
20:15:11 | <fireonlive> | need to get IA some terabit links |
22:01:16 | <nicolas17> | audrooku|m: they are POST requests with a one-time token, can't do WBM :( |
23:00:57 | | Arcorann (Arcorann) joins |
23:27:57 | | katia quits [Remote host closed the connection] |
23:28:13 | | katia (katia) joins |