00:14:13 | | BornOn420 quits [Remote host closed the connection] |
00:14:48 | | BornOn420 (BornOn420) joins |
00:34:13 | | BornOn420 quits [Remote host closed the connection] |
00:34:53 | | BornOn420 (BornOn420) joins |
00:39:06 | | Exorcism quits [Quit: Ping timeout (120 seconds)] |
00:39:37 | | Exorcism (exorcism) joins |
01:45:28 | | AlsoHP_Archivist joins |
01:49:14 | | HP_Archivist quits [Ping timeout: 276 seconds] |
03:40:47 | | DopefishJustin quits [Remote host closed the connection] |
03:43:14 | | Sanqui quits [Ping timeout: 260 seconds] |
03:48:06 | | DopefishJustin joins |
03:48:06 | | DopefishJustin is now authenticated as DopefishJustin |
03:52:36 | <nicolas17> | https://archive.org/download/macos-installassistant-24F5053f-warc any tool to download one file from inside this warc? |
03:53:26 | | Sanqui joins |
03:53:31 | | Sanqui is now authenticated as Sanqui |
03:53:31 | | Sanqui quits [Changing host] |
03:53:31 | | Sanqui (Sanqui) joins |
03:53:44 | <nicolas17> | I can download the whole warc, get the offset from the cdx, and then use warcpayload from warctools to get the file I need; but that means 2x the disk writes, there has to be a way to use an HTTP range request T_T |
03:55:02 | <nicolas17> | nvm warcpayload is not working either |
03:59:29 | <pokechu22> | If it were a warc on web.archive.org you could use that, but it looks like it's not |
03:59:52 | <pokechu22> | HTTP range requests probably are possible but I don't know exactly how to do it either |
04:00:08 | <nicolas17> | yeah |
04:00:24 | <nicolas17> | todo: request a wbm-enabled collection to put these items in |
04:02:52 | <nicolas17> | the reason I even uploaded those WARCs instead of just using archivebot was to deduplicate the data across different URLs |
04:16:46 | <BlankEclair> | > [23/06/2025 13:52] <nicolas17> https://archive.org/download/macos-installassistant-24F5053f-warc any tool to download one file from inside this warc? |
04:16:48 | <BlankEclair> | what url? |
04:18:58 | <nicolas17> | the InstallAssistant.pkg which is 98% of the warc file size :P |
04:19:23 | <nicolas17> | idk why warctools is failing... I think it doesn't like the non-gzipped warc? |
04:20:18 | <BlankEclair> | ah |
04:29:27 | <BlankEclair> | nicolas17: curl -L https://archive.org/download/macos-installassistant-24F5053f-warc/macos-installassistant-24F5053f.warc -H 'Range: bytes=89381082-15628036354' |
04:30:37 | <BlankEclair> | ah wait, there's a discrepancy w/ the content length |
04:32:02 | <BlankEclair> | bleh, dunno what it is tbh |
04:34:12 | <BlankEclair> | bytes=89381082-15717415220 maybe? |
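
A minimal sketch of how a Range value like the ones guessed above could be derived from a CDX line rather than by hand. It assumes the common 11-field CDX layout (record length in field 9, record offset in field 10, filename last); the function name and the sample line are hypothetical, and the real CDX for this item may use a different layout. HTTP byte ranges are inclusive on both ends, so the last byte is offset + length - 1.

```python
def range_from_cdx_line(line: str) -> str:
    """Build an HTTP Range header value from one CDX line.

    Assumption: 11 space-separated fields, with the record length at
    index 8 and the record offset at index 9.
    """
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 CDX fields, got {len(fields)}")
    length = int(fields[8])
    offset = int(fields[9])
    # Byte ranges are inclusive, hence the -1 on the end position.
    return f"bytes={offset}-{offset + length - 1}"


if __name__ == "__main__":
    # Hypothetical CDX line, not taken from the item discussed above.
    sample = (
        "org,example)/a.pkg 20250623000000 https://example.org/a.pkg "
        "application/octet-stream 200 ABCDEF - - 1000 500 example.warc"
    )
    print(range_from_cdx_line(sample))
```

The resulting string can be passed to curl as `-H 'Range: <value>'`, the same way as in the command above.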
08:36:04 | | Sluggs quits [Ping timeout: 260 seconds] |
09:14:34 | | simon816 quits [Quit: ZNC 1.9.1 - https://znc.in] |
09:20:51 | | simon816 (simon816) joins |
10:09:32 | | Sluggs (Sluggs) joins |
13:14:19 | | Sanqui quits [Ping timeout: 260 seconds] |
13:24:55 | | Sanqui joins |
13:24:59 | | Sanqui is now authenticated as Sanqui |
13:24:59 | | Sanqui quits [Changing host] |
13:24:59 | | Sanqui (Sanqui) joins |
13:27:29 | <@JAA> | Yes, range requests. Since it's uncompressed, you can adjust it for the WARC header size to get just the HTTP response. And if there's no chunked TE, you can use the same thing to get past the HTTP headers, too. |
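
The adjustment described above can be sketched roughly as follows, assuming an uncompressed WARC and that the first few KiB of the record have already been fetched with a small range request. The function names are hypothetical. The sketch finds the end of the WARC headers and of the HTTP headers, refuses to continue if the response uses chunked transfer encoding, and uses the WARC record's own Content-Length to find the last payload byte.

```python
def split_record_prefix(prefix: bytes):
    """Return (warc_header_len, http_header_len, warc_content_length, chunked)."""
    warc_end = prefix.find(b"\r\n\r\n")
    if warc_end < 0:
        raise ValueError("WARC headers not complete in prefix")
    warc_header_len = warc_end + 4
    http_end = prefix.find(b"\r\n\r\n", warc_header_len)
    if http_end < 0:
        raise ValueError("HTTP headers not complete in prefix")
    http_header_len = http_end + 4 - warc_header_len

    # Content-Length of the WARC record block (HTTP headers + body).
    content_length = None
    for line in prefix[:warc_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("WARC Content-Length header missing")

    # Skip the HTTP status line, then look for chunked transfer encoding.
    chunked = False
    for line in prefix[warc_header_len:http_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"transfer-encoding" and b"chunked" in value.lower():
            chunked = True
    return warc_header_len, http_header_len, content_length, chunked


def payload_range(record_offset: int, prefix: bytes) -> str:
    """Range header value covering only the HTTP body of the record."""
    warc_len, http_len, content_length, chunked = split_record_prefix(prefix)
    if chunked:
        raise ValueError("chunked body: raw bytes are not the plain payload")
    first = record_offset + warc_len + http_len
    last = record_offset + warc_len + content_length - 1  # inclusive end
    return f"bytes={first}-{last}"
```

Using the WARC Content-Length for the end position avoids having to know whether a CDX length includes the trailing blank lines after the record.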
13:49:00 | <@arkiver> | i have some code at https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py that handles both zst and gz WARCs, extracts specific records, and decompresses them |
13:49:09 | <@arkiver> | i should separate that code out into something more usable |
13:49:37 | <@arkiver> | nicolas17: how can i make this useful? allow one to go "extract record with this URL from this WARC"? or something else? |
13:49:48 | <@arkiver> | or all records matching some regex? |
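
One possible shape for the "all records matching some regex" idea, as a rough sketch only. It is not the code from the repository linked above: it handles only uncompressed WARCs read from a seekable stream, and the function name and returned tuple are hypothetical. It walks the records, matches WARC-Target-URI against the pattern, and reports where each matching record starts, how long its WARC header is, and how long its block is, without reading the block itself.

```python
import io
import re


def find_records(stream, pattern):
    """Yield (uri, record_offset, warc_header_len, block_len) for matches.

    Assumption: `stream` is a seekable binary stream over an
    uncompressed WARC file.
    """
    regex = re.compile(pattern)
    while True:
        start = stream.tell()
        version = stream.readline()
        if not version:
            break  # end of file
        if not version.strip():
            continue  # blank lines separating records
        headers = {}
        for line in iter(stream.readline, b""):
            if line in (b"\r\n", b"\n"):
                break  # end of the WARC headers
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        block_len = int(headers[b"content-length"])
        block_start = stream.tell()
        # Skip the block instead of reading it; it may be many gigabytes.
        stream.seek(block_len, io.SEEK_CUR)
        uri = headers.get(b"warc-target-uri", b"").decode("utf-8", "replace")
        if regex.search(uri):
            yield uri, start, block_start - start, block_len
```

The yielded offsets are enough to build a range request for a single record, so a tool along these lines could print them or fetch the record directly.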
14:31:54 | | that_lurker quits [Remote host closed the connection] |
14:31:58 | | that_lurker (that_lurker) joins |
14:48:13 | | BearFortress quits [Read error: Connection reset by peer] |
15:20:32 | | BearFortress joins |
17:46:09 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
18:01:06 | | PredatorIWD25 joins |
18:04:26 | | KoalaBear quits [Read error: Connection reset by peer] |
18:07:00 | | KoalaBear joins |
20:17:22 | | nicolas17 is now authenticated as nicolas17 |
22:40:09 | | Sanqui quits [Ping timeout: 260 seconds] |
22:51:41 | | Sanqui joins |
22:51:45 | | Sanqui is now authenticated as Sanqui |
22:51:45 | | Sanqui quits [Changing host] |
22:51:45 | | Sanqui (Sanqui) joins |
23:20:40 | | Yakov is now authenticated as Yakov |
23:20:40 | | Yakov quits [Changing host] |
23:20:40 | | Yakov (Yakov) joins |
23:29:43 | | Stagnant_ quits [Remote host closed the connection] |
23:30:03 | | Stagnant_ (Stagnant) joins |