00:00:37 | | kaz (Kaz) joins |
00:00:37 | | @ChanServ sets mode: +o kaz |
00:05:30 | | @kaz quits [*.net *.split] |
00:28:58 | | kaz (Kaz) joins |
00:28:58 | | @ChanServ sets mode: +o kaz |
00:29:32 | | igloo22225 (igloo22225) joins |
00:30:22 | | linuxgemini (linuxgemini) joins |
01:10:23 | | kokos- joins |
01:37:26 | | katia_ (katia) joins |
03:39:37 | | PredatorIWD26 joins |
03:42:58 | | PredatorIWD2 quits [Ping timeout: 260 seconds] |
03:42:58 | | PredatorIWD26 is now known as PredatorIWD2 |
03:59:53 | | DogsRNice quits [Read error: Connection reset by peer] |
04:00:11 | | DogsRNice joins |
04:35:58 | | tech234a (tech234a) joins |
04:56:54 | | DogsRNice quits [Read error: Connection reset by peer] |
05:43:08 | | nukke quits [Ping timeout: 260 seconds] |
06:07:17 | | BearFortress_ quits [] |
06:45:04 | | BearFortress joins |
06:46:58 | | nukke (nukke) joins |
06:52:12 | | nukke quits [Ping timeout: 250 seconds] |
07:06:30 | | nukke (nukke) joins |
07:11:42 | | nukke quits [Ping timeout: 250 seconds] |
07:24:16 | | nukke (nukke) joins |
07:29:02 | | nukke quits [Ping timeout: 250 seconds] |
07:42:53 | | nukke (nukke) joins |
08:57:57 | <Nemo_bis> | I've just saved 75% disk space in a staging directory of mine by switching WARCs from gzip to zstd -19 compression (without a custom dictionary)... Is it ok to upload *.warc.zst files to archive.org collections or is there some special procedure to follow compared to gz? |
09:08:19 | | BornOn420 quits [Remote host closed the connection] |
09:15:16 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:16:53 | | nulldata (nulldata) joins |
09:24:19 | | Matthww joins |
09:24:47 | | BornOn420 (BornOn420) joins |
09:55:35 | | myself9 (myself) joins |
09:56:12 | | myself quits [Read error: Connection reset by peer] |
09:56:12 | | myself9 is now known as myself |
10:25:14 | | SootBector quits [Remote host closed the connection] |
10:32:20 | | katia_ quits [Ping timeout: 250 seconds] |
10:33:03 | | kokos- quits [Ping timeout: 260 seconds] |
11:03:36 | | kokos- joins |
11:20:06 | | katia_ (katia) joins |
11:28:51 | <@arkiver> | Nemo_bis: did you compress the records individually? |
11:29:02 | <@arkiver> | or just decompressed the .gz, and compressed the entire WARC as .zst ? |
11:29:55 | <@arkiver> | if your .zst WARC are valid, there is no special procedure, they can just be uploaded as normal and will be handled |
11:50:27 | <Nemo_bis> | arkiver: I just decompressed and recompressed as is. How do I verify if it's still valid? |
11:50:55 | <Nemo_bis> | The entire WARC file I mean. |
11:55:06 | <Nemo_bis> | I see that the WARC-Filename headers inside still reference *.gz but they don't need to match, right? They already didn't after smaller WARC files were megawarc'ed |
12:54:08 | <OrIdow6> | Nemo_bis: I think what arkiver means is that your zstd-compressed version needs to have each record zstd-compressed separately, then the compressed versions concatenated together |
12:54:12 | <OrIdow6> | Not as a single stream |
12:54:23 | <OrIdow6> | Because that lets you seek |
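A minimal sketch of the per-record layout OrIdow6 describes, using Python's stdlib gzip/zlib as a stand-in for zstd because the framing idea is the same: one compressed frame per record, frames concatenated. Each record's start offset can be decompressed on its own. The same walk also doubles as a rough answer to "how do I verify it": every member must decompress to exactly one record, judged by its Content-Length. The record builder and its headers are hypothetical and omit mandatory WARC fields. A real .warc.zst has its own rules, such as an optional dictionary, so check the spec before relying on this.

```python
import gzip
import zlib


def make_record(warc_type, block):
    """Build a minimal, hypothetical WARC/1.1 record (mandatory headers omitted)."""
    head = b"WARC/1.1\r\nWARC-Type: " + warc_type + b"\r\n"
    head += b"Content-Length: %d\r\n" % len(block)
    return head + b"\r\n" + block + b"\r\n\r\n"


def compress_per_record(records):
    """Compress each record as its own gzip member and concatenate the members.

    Returns the compressed bytes plus the offset where each member starts,
    i.e. the kind of offset an index would store for random access.
    """
    out = bytearray()
    offsets = []
    for record in records:
        offsets.append(len(out))
        out += gzip.compress(record)
    return bytes(out), offsets


def read_record_at(data, offset):
    """Decompress exactly one member starting at `offset` (a "seek")."""
    d = zlib.decompressobj(wbits=31)  # wbits=31: expect a gzip header/trailer
    record = d.decompress(data[offset:])  # stops at the end of this member
    if not d.eof:
        raise ValueError("truncated member at offset %d" % offset)
    return record


def is_single_record(member):
    """True if `member` holds exactly one record, judged by its Content-Length."""
    if not member.startswith(b"WARC/"):
        return False
    end = member.find(b"\r\n\r\n")  # end of the WARC header block
    if end < 0:
        return False
    length = None
    for line in member[:end].split(b"\r\n")[1:]:  # skip the version line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # headers + blank line + block + trailing CRLF CRLF must fill the member
    return length is not None and len(member) == end + 4 + length + 4


def validate_per_record(data):
    """Walk every member in order; each must decompress to exactly one record."""
    count = 0
    rest = data
    while rest:
        d = zlib.decompressobj(wbits=31)
        member = d.decompress(rest)
        if not d.eof or not is_single_record(member):
            return False
        count += 1
        rest = d.unused_data  # bytes after this member = the next members
    return count > 0


records = [
    make_record(b"warcinfo", b"software: example\r\n"),
    make_record(b"response", b"HTTP/1.1 200 OK\r\n\r\nhello"),
    make_record(b"response", b"HTTP/1.1 200 OK\r\n\r\nworld"),
]
per_record, offsets = compress_per_record(records)
single_stream = gzip.compress(b"".join(records))  # the "recompress as is" case
```

With this layout, `read_record_at(per_record, offsets[2])` returns only the third record without touching the first two. `validate_per_record(single_stream)` returns False because its one member contains all three records back to back.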
12:56:45 | <@arkiver> | yes what OrIdow6 says |
12:57:01 | <@arkiver> | Nemo_bis: no they don't need to match |
12:58:00 | <@arkiver> | the way you recompressed it creates an invalid WARC, and explains the 75% disk space savings over .gz. if you compress the records individually for a correct WARC, the percentage saved should be much smaller |
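A rough illustration of why per-record compression saves less than one big stream. Each record is compressed without seeing its neighbours, so redundancy shared across records (near-identical headers, repeated page boilerplate) is paid for again in every frame, on top of per-frame overhead. The sketch again uses stdlib gzip as a stand-in for zstd, and the records are hypothetical, deliberately repetitive bytes, not real crawl data, so the actual ratio for any real WARC will differ. As far as I know, the .warc.zst convention allows an optional shared dictionary to win back some of that cross-record redundancy, which is relevant to the "without a custom dictionary" setup above.

```python
import gzip

# Hypothetical records: similar headers plus repeated markup, as in a real crawl.
records = [
    b"WARC/1.1\r\nWARC-Type: response\r\nWARC-Target-URI: https://example.com/page/%d\r\n\r\n"
    % i
    + b"<html><body>same boilerplate markup</body></html>" * 20
    for i in range(200)
]

# One stream over everything: later records can back-reference earlier ones.
single_stream = gzip.compress(b"".join(records))

# One member per record: seekable, but no cross-record context is shared.
per_record = b"".join(gzip.compress(r) for r in records)

print(len(single_stream), len(per_record))
```

Both outputs decompress to the same bytes. The per-record one is larger but is the layout that indexed, seekable WARC access depends on.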
13:06:27 | | funderscore is now known as f_ |
14:48:43 | <Nemo_bis> | Right, makes sense. Thanks! (I wasn't planning to upload these, they're just a local cache of mine.) |
15:48:40 | | katia_ quits [Ping timeout: 250 seconds] |
15:49:13 | | kokos- quits [Ping timeout: 260 seconds] |
15:56:59 | | kokos- joins |
15:59:12 | | th3z0l4_ joins |
16:01:28 | | th3z0l4 quits [Ping timeout: 260 seconds] |
16:25:18 | | katia_ (katia) joins |
16:32:29 | | SootBector (SootBector) joins |
16:36:58 | | SootBector quits [Remote host closed the connection] |
16:39:48 | | katia_ quits [Ping timeout: 250 seconds] |
16:39:58 | | kokos- quits [Ping timeout: 260 seconds] |
18:04:06 | <nicolas17> | turns out the reason I'm uploading at 5s/MiB instead of 5MiB/s is a much more global problem |
18:06:14 | <nicolas17> | I see friends complaining about crazy high packet loss even to AWS us-east-1 |
18:17:06 | <nicolas17> | 602/631 [46:28<03:15, 6.73s/MiB] |
18:44:33 | | kokos- joins |
19:06:16 | | kokos- quits [Ping timeout: 250 seconds] |
19:28:42 | | kokos- joins |
19:29:06 | | katia_ (katia) joins |
19:37:18 | | that_lurker quits [Read error: Connection reset by peer] |
19:37:22 | | that_lurker (that_lurker) joins |
19:38:46 | | kokos- quits [Ping timeout: 250 seconds] |
19:39:03 | | katia_ quits [Ping timeout: 260 seconds] |
19:39:19 | | SootBector (SootBector) joins |
19:52:12 | | DogsRNice joins |
19:57:35 | | kokos- joins |
20:07:41 | | BornOn420 quits [Remote host closed the connection] |
20:08:15 | | katia_ (katia) joins |
20:41:28 | | kokos- quits [Client Quit] |
20:41:29 | | katia_ quits [Client Quit] |
21:47:12 | | Sidpatchy3 (Sidpatchy) joins |
21:47:58 | | Sidpatchy quits [Ping timeout: 260 seconds] |
21:47:58 | | Sidpatchy3 is now known as Sidpatchy |
22:09:18 | | BornOn420 (BornOn420) joins |
22:38:42 | | PredatorIWD2 quits [Read error: Connection reset by peer] |
22:42:00 | | PredatorIWD2 joins |
23:54:20 | | PredatorIWD2 quits [Read error: Connection reset by peer] |