00:03:50fuzzy8021 quits [Read error: Connection reset by peer]
00:03:59fuzzy80211 (fuzzy80211) joins
00:15:17systwi_ joins
02:00:13<nicolas17>hm, 13GB file in WBM, trying to download it gets 15MB and closes connection
02:00:24<nicolas17>that usually suggests the URL was only partially archived
02:00:47<pokechu22>I think there's a header with more information about if a file was truncated
02:00:52<nicolas17>but I remember seeing a header that says "download aborted due to 'time'" or something like that and I'm not seeing it here
02:01:19<nicolas17>https://web.archive.org/web/20240724054156/https://swcdn.apple.com/content/downloads/61/25/062-37076-A_IBQ9B6BPOH/d6qqf4pb3wagb0zgsdbd9kgscpurrjky5x/InstallAssistant.pkg
02:02:56<pokechu22>Hmm, yeah, that has an actual content-length: 13495235447 header too - I'd expect that to be different in this case
04:08:00<nicolas17>https://web.archive.org/web/20240924153134/https://swcdn.apple.com/content/downloads/48/28/062-80466-A_IX9VBQPBBE/8jcyvbnr93ho7qbzvnbw1jv9vtnmo76ifl/InstallAssistant.pkg and this one gives gateway timeout / bad gateway and never starts downloading
04:26:28<nicolas17>pokechu22: I wonder if IA removed that warning header saying the capture was truncated
04:26:46<nicolas17>because I didn't see it in any of these captures, not even those that fail to download after 15MB
04:35:20<@JAA>No, the warning header still exists as of a couple weeks ago: https://web.archive.org/web/20240911063246/http://nbg1-speed.hetzner.com/10GB.bin
04:36:56<@JAA>Or well, I'd expect the warning header to be based on the WARC-Truncated header, and I'm sure they won't change *that*.
04:52:25DogsRNice quits [Read error: Connection reset by peer]
06:09:41IDK (IDK) joins
06:17:27Lord_Nightmare quits [Quit: ZNC - http://znc.in]
06:21:11Lord_Nightmare (Lord_Nightmare) joins
08:15:28<qwertyasdfuiopghjkl>nicolas17: According to the x-archive-src header on your first URL it's in spn-cloudflare-20240724061613/spn-cloudflare-20240723125001-wwwb-front8.us.archive.org-8011.warc.gz, which https://archive.org/download/spn-cloudflare-20240724061613 says is only 953.8MiB, so unless that file somehow compressed very well it's truncated.
08:53:16IDK quits [Client Quit]
09:15:14nulldata quits [Quit: So long and thanks for all the fish!]
09:16:40nulldata (nulldata) joins
09:51:08driib quits [Quit: The Lounge - https://thelounge.chat]
09:51:34driib (driib) joins
09:52:39<@JAA>I'd love to see what that WARC record looks like.
09:54:03<@JAA>But the CDX API confirms the truncation to about 15.5 MB (after compression?): https://web.archive.org/cdx/search/cdx?url=https://swcdn.apple.com/content/downloads/61/25/062-37076-A_IBQ9B6BPOH/d6qqf4pb3wagb0zgsdbd9kgscpurrjky5x/InstallAssistant.pkg
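The check JAA describes can be sketched roughly as follows. This assumes the CDX API's default space-separated output (urlkey, timestamp, original, mimetype, statuscode, digest, length) and treats the length field as the compressed record size. The sample line, the helper names, and the 0.5 ratio threshold are made up for illustration; they are not values returned by the API for the URL above.

```python
# Default field order of the Wayback CDX API's plain-text output.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length")

# Hypothetical row, not real data for the capture discussed above.
SAMPLE_LINE = ("com,example)/big.bin 20240101000000 "
               "https://example.com/big.bin application/octet-stream "
               "200 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 15500000")


def parse_cdx_line(line: str) -> dict:
    """Split one CDX line into a dict keyed by the default field names."""
    parts = line.strip().split(" ")
    if len(parts) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CDX_FIELDS, parts))


def looks_truncated(row: dict, expected_bytes: int, min_ratio: float = 0.5) -> bool:
    """Heuristic: flag a capture whose stored record is far smaller than the
    Content-Length the server advertised. min_ratio is an arbitrary cut-off;
    highly compressible payloads could legitimately fall below it."""
    return int(row["length"]) < expected_bytes * min_ratio


if __name__ == "__main__":
    row = parse_cdx_line(SAMPLE_LINE)
    # 13495235447 is the content-length quoted earlier in the log.
    print(row["length"], looks_truncated(row, 13495235447))
```

An installer package is usually already compressed, so a stored record this much smaller than the advertised size is hard to explain by gzip alone.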
10:52:14<@arkiver>nicolas17: i think it may really be better to rely on #archivebot for this
12:59:29IDK (IDK) joins
13:35:37KoalaBear84 joins
13:39:04KoalaBear quits [Ping timeout: 260 seconds]
13:40:56qw3rty__ joins
13:44:12qw3rty_ quits [Ping timeout: 258 seconds]
13:58:50qw3rty_ joins
14:02:14qw3rty__ quits [Ping timeout: 258 seconds]
14:40:57qw3rty__ joins
14:44:46qw3rty_ quits [Ping timeout: 258 seconds]
15:39:52<nicolas17>arkiver: I am using archivebot for archiving things, these were pre-existing captures (no idea where they came from, maybe someone fed them into SPN or some crawling thing)
15:41:22<nicolas17>from the warc filename, I saw mentions of "SPNOUT", "GDELT" and "WDRP"
15:59:08IDK quits [Client Quit]
16:06:30JaffaCakes118_2 (JaffaCakes118) joins
16:06:46JaffaCakes118 quits [Remote host closed the connection]
18:35:44DogsRNice joins
18:36:22DogsRNice quits [Remote host closed the connection]
19:12:54<nicolas17>JAA: I was downloading the whole thing from WBM but the CDX API already answers that (via the hash) :/
19:13:46<nicolas17>that will be so much faster
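A minimal sketch of the hash comparison nicolas17 mentions, assuming the CDX digest field is the base32-encoded SHA-1 of the response payload. The function names are made up for illustration, and whether the digest of a truncated capture covers only the stored bytes is an assumption worth checking against a known file.

```python
import base64
import hashlib
from typing import BinaryIO


def cdx_style_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the base32-encoded SHA-1 of a binary stream, read in chunks
    so multi-gigabyte files do not need to fit in memory."""
    sha1 = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
    return base64.b32encode(sha1.digest()).decode("ascii")


def file_matches_cdx_digest(path: str, cdx_digest: str) -> bool:
    """Compare a local file against a digest value taken from a CDX row."""
    with open(path, "rb") as f:
        return cdx_style_digest(f) == cdx_digest.strip().upper()
```

With a known-good local copy, comparing digests from the CDX output avoids pulling each capture out of the Wayback Machine just to hash it.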
19:26:06DogsRNice joins
19:53:37<TheTechRobo>JAA: How do you resume an interrupted ia-upload-stream? It requires --parts, right? What data structure is it expecting?
20:19:55<TheTechRobo>Ah, it's printed if the upload fails
23:16:59thalia joins