00:14:13BornOn420 quits [Remote host closed the connection]
00:14:48BornOn420 (BornOn420) joins
00:34:13BornOn420 quits [Remote host closed the connection]
00:34:53BornOn420 (BornOn420) joins
00:39:06Exorcism quits [Quit: Ping timeout (120 seconds)]
00:39:37Exorcism (exorcism) joins
01:45:28AlsoHP_Archivist joins
01:49:14HP_Archivist quits [Ping timeout: 276 seconds]
03:40:47DopefishJustin quits [Remote host closed the connection]
03:43:14Sanqui quits [Ping timeout: 260 seconds]
03:48:06DopefishJustin joins
03:52:36<nicolas17>https://archive.org/download/macos-installassistant-24F5053f-warc any tool to download one file from inside this warc?
03:53:26Sanqui joins
03:53:31Sanqui quits [Changing host]
03:53:31Sanqui (Sanqui) joins
03:53:44<nicolas17>I can download the whole warc, get the offset from the cdx, and then use warcpayload from warctools to get the file I need; but that means 2x the disk writes, there has to be a way to use an HTTP range request T_T
03:55:02<nicolas17>nvm warcpayload is not working either
03:59:29<pokechu22>If it were a warc on web.archive.org you could use that, but it looks like it's not
03:59:52<pokechu22>HTTP range requests probably are possible but I don't know exactly how to do it either
04:00:08<nicolas17>yeah
04:00:24<nicolas17>todo: request a wbm-enabled collection to put these items in
04:02:52<nicolas17>the reason I even uploaded those WARCs instead of just using archivebot was to deduplicate the data across different URLs
04:16:46<BlankEclair>> [23/06/2025 13:52] <nicolas17> https://archive.org/download/macos-installassistant-24F5053f-warc any tool to download one file from inside this warc?
04:16:48<BlankEclair>what url?
04:18:58<nicolas17>the InstallAssistant.pkg which is 98% of the warc file size :P
04:19:23<nicolas17>idk why warctools is failing... I think it doesn't like the non-gzipped warc?
04:20:18<BlankEclair>ah
04:29:27<BlankEclair>nicolas17: curl -L https://archive.org/download/macos-installassistant-24F5053f-warc/macos-installassistant-24F5053f.warc -H 'Range: bytes=89381082-15628036354'
04:30:37<BlankEclair>ah wait, there's a discrepancy w/ the content length
04:32:02<BlankEclair>bleh, dunno what it is tbh
04:34:12<BlankEclair>bytes=89381082-15717415220 maybe?
08:36:04Sluggs quits [Ping timeout: 260 seconds]
09:14:34simon816 quits [Quit: ZNC 1.9.1 - https://znc.in]
09:20:51simon816 (simon816) joins
10:09:32Sluggs (Sluggs) joins
13:14:19Sanqui quits [Ping timeout: 260 seconds]
13:24:55Sanqui joins
13:24:59Sanqui quits [Changing host]
13:24:59Sanqui (Sanqui) joins
13:27:29<@JAA>Yes, range requests. Since it's uncompressed, you can adjust it for the WARC header size to get just the HTTP response. And if there's no chunked TE, you can use the same thing to get past the HTTP headers, too.
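A minimal sketch of the offset arithmetic described above, assuming an uncompressed WARC, a response record whose start offset is already known (for example from a CDX), and no chunked transfer encoding. The helper names `range_header`, `parse_headers` and `payload_span` are hypothetical and not taken from any existing tool; `record_head` stands for the first few KiB of the record, fetched with a small range request.

```python
def range_header(offset: int, length: int) -> str:
    """Build a Range header value; the end position is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"


def parse_headers(block: bytes) -> dict[str, str]:
    """Parse 'Name: value' lines, skipping the first (version or status) line."""
    headers = {}
    for line in block.split(b"\r\n")[1:]:
        name, sep, value = line.decode("utf-8", "replace").partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def payload_span(record_offset: int, record_head: bytes) -> tuple[int, int]:
    """Return (absolute offset, length) of the HTTP body inside one record.

    record_head must contain the complete WARC header block and the
    complete HTTP header block; otherwise bytes.index raises ValueError.
    """
    # The WARC header block ends at the first blank line.
    warc_end = record_head.index(b"\r\n\r\n") + 4
    warc_headers = parse_headers(record_head[:warc_end])
    # The WARC Content-Length covers the HTTP headers plus the HTTP body.
    block_length = int(warc_headers["content-length"])

    # The HTTP header block ends at the next blank line.
    http_end = record_head.index(b"\r\n\r\n", warc_end) + 4
    http_headers = parse_headers(record_head[warc_end:http_end])
    if "chunked" in http_headers.get("transfer-encoding", "").lower():
        raise ValueError("chunked body: a plain byte range would include chunk framing")

    body_length = block_length - (http_end - warc_end)
    return record_offset + http_end, body_length
```

The resulting pair can be passed to `range_header` and sent as the `Range` header of a request for the WARC file, so only the payload bytes are written to disk.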
13:49:00<@arkiver>i have some code at https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py that handles both zst and gz WARCs, extracts specific records, and uncompresses them
13:49:09<@arkiver>i should separate that code out into something more usable
13:49:37<@arkiver>nicolas17: how can i make this useful? allow one to go "extract record with this URL from this WARC"? or something else?
13:49:48<@arkiver>or all records matching some regex?
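One possible shape for the "all records matching some regex" idea, as a rough sketch only. It is not the code from the linked repository, it handles only uncompressed WARCs (no gz or zst), and the names `iter_records` and `find_records` are made up for illustration. It reports where each matching record block sits in a local, seekable file instead of reading the block into memory, since a single record can be many gigabytes.

```python
import re
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], int, int]]:
    """Yield (headers, block_offset, block_length) for each WARC record.

    Assumes an uncompressed WARC in a seekable binary stream.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        offset = stream.tell()
        stream.seek(length, 1)  # skip the block without reading it
        yield headers, offset, length


def find_records(stream: BinaryIO, pattern: str) -> Iterator[tuple[str, int, int]]:
    """Yield (uri, block_offset, block_length) where the target URI matches."""
    regex = re.compile(pattern)
    for headers, offset, length in iter_records(stream):
        # Some writers wrap the URI in angle brackets; strip them if present.
        uri = headers.get("warc-target-uri", "").strip("<>")
        if uri and regex.search(uri):
            yield uri, offset, length
```

For a response record, the reported block still begins with the HTTP headers, so a caller would skip past those to reach the file itself.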
14:31:54that_lurker quits [Remote host closed the connection]
14:31:58that_lurker (that_lurker) joins
14:48:13BearFortress quits [Read error: Connection reset by peer]
15:20:32BearFortress joins
17:46:09PredatorIWD25 quits [Read error: Connection reset by peer]
18:01:06PredatorIWD25 joins
18:04:26KoalaBear quits [Read error: Connection reset by peer]
18:07:00KoalaBear joins
22:40:09Sanqui quits [Ping timeout: 260 seconds]
22:51:41Sanqui joins
22:51:45Sanqui quits [Changing host]
22:51:45Sanqui (Sanqui) joins
23:20:40Yakov quits [Changing host]
23:20:40Yakov (Yakov) joins
23:29:43Stagnant_ quits [Remote host closed the connection]
23:30:03Stagnant_ (Stagnant) joins