00:00:20 | | nyakase quits [Remote host closed the connection] |
00:01:03 | | nyakase (nyakase) joins |
00:10:21 | | nyakase quits [Remote host closed the connection] |
00:12:45 | | nyakase (nyakase) joins |
02:53:20 | | Grzesiek11 joins |
02:53:40 | <Grzesiek11> | hello, I'm trying to upload via torrent, but it's not working |
02:53:44 | <Grzesiek11> | https://archive.org/download/Notch_Twitch_videos |
02:54:09 | <Grzesiek11> | the torrent is seeded and has a tracker. it's about 400 GiB in size. I don't get IA leeches. |
02:55:22 | <Grzesiek11> | nicolas17 said (on another channel) torrent upload broke during the recentish outage, is this correct? |
02:55:42 | <nicolas17> | https://archive.org/history/Notch_Twitch_videos works for me but the individual log links say I don't have permission |
02:56:36 | <Grzesiek11> | hey seems to work for me. guess it's not the same as the "history" link. |
02:56:40 | <Grzesiek11> | let's see |
02:57:20 | <Grzesiek11> | this would be during derive? |
02:58:40 | <@JAA> | I think it'd be a derive.php task, yeah. |
02:58:52 | <nicolas17> | I think so? when I tried it successfully (like a year ago) I just looked at what task was taking long :D |
02:59:58 | <Grzesiek11> | here none take long since it just seems to do nothing lol. I'm looking through the derive log and it says the torrent is a "NonSource" and that there's "Nothing to do!"
03:00:07 | <Grzesiek11> | gonna send the log gimme a sec |
03:00:59 | <Grzesiek11> | https://grzesiek11.stary.pc.pl/files/archive_org_derive.txt |
03:01:21 | <@JAA> | Someone said end of January that torrent uploads were still broken: https://old.reddit.com/r/internetarchive/comments/1gxs0p9/uploading_using_torrent/ |
03:01:40 | <Grzesiek11> | welp.... |
03:02:14 | <Grzesiek11> | guess nothing comes easy. I used the Python uploader for some tiny files before (not exceeding 5 GiB), *maybe* it will work |
03:02:37 | <Grzesiek11> | it took forever to upload those tiny files tho |
03:03:18 | <Grzesiek11> | thought torrent might be faster, but if it won't work at all, then it's infinitely slower. thanks for the help anyways. |
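(For context: the "Python uploader" here is the internetarchive package's upload interface. A minimal sketch of that route, assuming the item Notch_Twitch_videos already exists and `ia configure` has stored credentials; the file paths are placeholders, not actual filenames from the item.)

    # Minimal sketch: upload local files to an existing item with the
    # internetarchive library (pip install internetarchive).
    # Assumes `ia configure` has already stored S3-style credentials.
    from internetarchive import upload

    files = ["videos/stream_001.mp4", "videos/stream_002.mp4"]  # placeholder paths

    responses = upload(
        "Notch_Twitch_videos",  # existing item identifier
        files,
        verbose=True,           # per-file progress
        retries=5,              # retry transient S3 errors
    )
    for r in responses:
        r.raise_for_status()    # fail loudly if any PUT did not succeed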
03:10:51 | <nicolas17> | how many files are there?
03:57:17 | <Grzesiek11> | a few hundred, I think about 300
04:03:45 | <Grzesiek11> | estimated this is going to take about 17 days |
04:07:17 | <Grzesiek11> | wait, no, wrong, over a month |
04:09:59 | <Grzesiek11> | related question: if over the course of this month my internet dies or something, can I just use the same --spreadsheet file and it'll skip the already uploaded files or not? |
04:12:41 | <Grzesiek11> | as for the precise file count and size - it's exactly 550 files, of which about 1/3rd are larger than 10 MiB for a total of 377 GiB |
04:14:07 | <Grzesiek11> | (actually it's 545 not 550, I was looking at the entire directory rather than just the stuff to upload) |
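(The time estimate above is just total size divided by sustained upload rate; a rough back-of-the-envelope, with the rate below an assumed example rather than a measured figure.)

    # Back-of-the-envelope upload time: total size / sustained rate.
    size_gib = 377
    rate_mib_per_s = 1.5          # assumed example rate, not a measurement
    seconds = size_gib * 1024 / rate_mib_per_s
    print(f"{seconds / 86400:.1f} days")
    # ~3 days at 1.5 MiB/s; a one-month estimate implies roughly 0.15 MiB/s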
04:15:27 | | nukke quits [Quit: nukke] |
04:22:33 | | nukke (nukke) joins |
04:53:12 | <nicolas17> | --checksum skips already uploaded files whose checksum matches the local file |
04:53:24 | <nicolas17> | I never used --spreadsheet but I assume --checksum works with it |
04:53:49 | <nicolas17> | the only caveat is it doesn't work right if you start uploading when there are derive jobs running |
04:54:44 | <nicolas17> | so if your upload fails, you may need to wait a few minutes for running jobs to finish before retrying, otherwise it will start uploading files that are already uploaded |
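(A sketch of that resumable retry with the internetarchive library; checksum=True compares the local file's MD5 against what the item already has and skips matches, which is what the --checksum flag does on the CLI. Paths are placeholders, and per the caveat above, only rerun once the item's tasks have finished.)

    # Sketch: restartable upload that skips files already present with a
    # matching MD5, so an interrupted run only re-sends what is missing.
    from internetarchive import upload

    upload(
        "Notch_Twitch_videos",
        ["videos/stream_001.mp4", "videos/stream_002.mp4"],  # placeholder paths
        checksum=True,   # skip files whose recorded MD5 matches the local file
        verbose=True,
        retries=5,
    )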
04:59:26 | <@JAA> | Oh boy, it's been 6 years now since I filed that issue. |
05:00:45 | <@JAA> | And it's not about derives. It's about archive.php tasks. |
05:01:26 | <@JAA> | A derive could be blocking those, but it could also be something else, including an error that needs to be fixed by IA staff. |
05:02:27 | <@JAA> | Basically, you want to check whether there are any non-finished tasks for the item, and only rerun when there aren't. |
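(A sketch of that pre-flight check with the internetarchive library; get_task_summary() and its keys are my assumption about the library's task API and should be verified against the version in use.)

    # Sketch: only retry once the item has no queued or running catalog tasks.
    # get_task_summary() and its result keys are assumptions; verify against
    # your internetarchive library version.
    from internetarchive import get_item

    item = get_item("Notch_Twitch_videos")
    summary = item.get_task_summary()   # e.g. {"queued": 0, "running": 0, ...}
    outstanding = summary.get("queued", 0) + summary.get("running", 0)

    if outstanding == 0:
        print("No outstanding tasks, safe to rerun the upload.")
    else:
        print(f"{outstanding} task(s) still queued/running, wait before retrying.")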
06:13:16 | | thalia (thalia) joins |
06:36:07 | <Vokun> | If torrent upload worked still, that would save me so much time |
06:37:00 | <Vokun> | Do we know why items have the ~1 TiB limit, or if this is a limitation that is planned to be fixed?
06:54:32 | | datechnoman quits [Quit: Ping timeout (120 seconds)] |
06:55:12 | | datechnoman (datechnoman) joins |
08:42:02 | | datechnoman quits [Client Quit] |
09:40:25 | | datechnoman (datechnoman) joins |
10:10:23 | | TheTechRobo quits [Quit: Ping timeout (120 seconds)] |
10:12:42 | | TheTechRobo (TheTechRobo) joins |
14:26:03 | | SootBector quits [Remote host closed the connection] |
14:26:35 | | SootBector (SootBector) joins |
15:30:38 | | threedeeitguy (threedeeitguy) joins |
16:32:56 | | NatTheCat (NatTheCat) joins |
16:53:55 | | IDK (IDK) joins |
17:03:48 | <nicolas17> | Vokun: afaik items can't be distributed across multiple servers, so there has to be a limit to their size |
17:46:36 | | pokechu22 quits [Quit: WeeChat 4.4.2] |
17:49:09 | | pokechu22 (pokechu22) joins |
17:59:28 | | PredatorIWD255 joins |
18:03:13 | | PredatorIWD25 quits [Ping timeout: 260 seconds] |
18:03:14 | | PredatorIWD255 is now known as PredatorIWD25 |
19:40:19 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
19:41:12 | | Craigle (Craigle) joins |
20:11:07 | <@arkiver> | but nowadays drives are so big... |
20:11:26 | <@arkiver> | but yes there's various reasons for not allowing big items, one of them is handling them when stuff needs to be moved around |
20:34:58 | | BornOn420 quits [Remote host closed the connection] |
20:35:36 | | BornOn420 (BornOn420) joins |
20:39:28 | | Dango360 quits [Quit: Leaving] |
21:19:18 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
21:25:18 | | rewby quits [Ping timeout: 250 seconds] |
21:31:46 | | Lord_Nightmare (Lord_Nightmare) joins |
21:36:20 | <nicolas17> | arkiver: I said there has to be a limit, not that the limit being 1TB is okay, maybe it *could* be increased ;) |
21:43:12 | <imer> | more surprised they don't do automatic sharding of bigger files, but that's probably due to historical reasons (and to keep things simple)
21:43:57 | <nicolas17> | imer: download links redirect to an individual file server, sharding would need a frontend to get them from multiple servers and concatenate them back |
21:44:04 | <imer> | yes :) |
21:45:33 | <imer> | does make the actual handling of data easier since you're dealing with reasonable sizes only |
21:45:41 | <imer> | very much a trade-off
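(To illustrate the single-server point: a download URL resolves via one redirect to the specific storage node holding the file, which is why sharding a file across nodes would need a new front end to reassemble it. The _files.xml metadata file is used below because every item has one; the node hostname in the response is whatever currently serves the item.)

    # Illustration: archive.org/download/<item>/<file> 302-redirects to the
    # single storage node (iaNNNNNN.us.archive.org) that holds the file.
    import requests

    r = requests.head(
        "https://archive.org/download/Notch_Twitch_videos/Notch_Twitch_videos_files.xml",
        allow_redirects=False,
    )
    print(r.status_code, r.headers.get("Location"))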
21:58:18 | | tzt quits [Ping timeout: 260 seconds] |
22:01:05 | | tzt (tzt) joins |
22:02:48 | | thalia quits [Quit: Connection closed for inactivity] |
22:10:39 | <@JAA> | IIRC, there was a blog post or something about this years ago where they answered various such questions and also 'why not Ceph' etc. My takeaway was 'less complexity is better', which I'd generally agree with, as it reduces the number of things that can go wrong. Hence the simple setup with paired servers and plain drives. |
22:19:05 | <imer> | yeah |
22:21:38 | | rewby (rewby) joins |
22:25:02 | | Lord_Nightmare quits [Client Quit] |
22:28:39 | | Lord_Nightmare (Lord_Nightmare) joins |