| 00:00:44 | <pokechu22> | The meta one is the job log and should be uploaded. The .cdx file is normally derived from the WARC by IA itself, though I don't know if that always happens or only happens for items that get indexed by web.archive.org. |
| 00:00:47 | | tekulvw quits [Ping timeout: 272 seconds] |
| 00:04:14 | <klea> | I think it only happens for items that get indexed by web.archive.org? https://archive.org/download/limewire.com_d_7xNKB_NfXjrIqBWo |
| 00:04:32 | <klea> | Though maybe it was me not running the derive task after every file |
| 00:04:35 | <klea> | lemme make it derive. |
| 00:04:43 | <klea> | (if I remember how to) |
| 00:06:24 | <klea> | It seems if you have IA derive (which is the default I believe on the web uploader?), it will make a cdx. <https://archive.org/log/5191197716> claims it will do a CDXIndex. |
| 00:06:54 | <klea> | Huh |
| 00:06:58 | <klea> | [ PST: 2026-02-16 16:05:08 ] Executing: ulimit -v 1048576 && PYTHONPATH=/petabox/sw/lib/python timeout 600 /petabox/sw/bin/cdx_writer.pex 'WARCPROX-20260216205304743-00000-y1i40ow9.warc.gz' --file-prefix='limewire.com_d_7xNKB_NfXjrIqBWo' --exclude-list='/petabox/sw/wayback/web_excludes.txt' --stats-file='/f/_limewire.com_d_7xNKB_NfXjrIqBWo/cdxstats.json'> |
| 00:06:58 | <klea> | '/t/_limewire.com_d_7xNKB_NfXjrIqBWo/cdx.txt' |
| 00:07:09 | <klea> | Wait a second. |
| 00:07:35 | <klea> | Couldn't that be a way to bulk check lots of URLs: make a WARC with records for lots of URLs, then get the CDX and see which ones are apparently missing? |
| 00:07:50 | <klea> | Then you'd request deletion of all that crap, because nobody wants it. |
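
The comparison klea describes could be sketched roughly as below. This is only an illustration: it assumes the derived CDX is plain space-separated text with the original URL in the third column (the common 11-field layout), and `urls_in_cdx` and `missing_urls` are hypothetical helper names, not part of any IA tooling.

```python
def urls_in_cdx(cdx_text):
    """Collect the original-URL column from CDX text.

    Assumption: fields are space-separated and the third field is the
    original URL, as in the common 11-field CDX layout.
    """
    found = set()
    for line in cdx_text.splitlines():
        line = line.strip()
        # Skip blank lines and the header line (" CDX N b a m s k r M S V g").
        if not line or line.startswith("CDX "):
            continue
        fields = line.split(" ")
        if len(fields) >= 3:
            found.add(fields[2])
    return found


def missing_urls(expected, cdx_text):
    """Return the expected URLs that have no row in the CDX, in input order."""
    found = urls_in_cdx(cdx_text)
    return [url for url in expected if url not in found]


# Made-up sample data, not taken from the item discussed above.
SAMPLE_CDX = (
    " CDX N b a m s k r M S V g\n"
    "com,example)/a 20260216205304 https://example.com/a text/html 200 "
    "AAAA - - 512 0 sample.warc.gz\n"
)

if __name__ == "__main__":
    wanted = ["https://example.com/a", "https://example.com/b"]
    for url in missing_urls(wanted, SAMPLE_CDX):
        print(url)
```

A URL can be absent from the CDX for reasons other than the exclude list, so a result from this kind of check would only be a hint.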
| 00:19:07 | | nine quits [Quit: See ya!] |
| 00:19:20 | | nine joins |
| 00:19:20 | | nine is now authenticated as nine |
| 00:19:20 | | nine quits [Changing host] |
| 00:19:20 | | nine (nine) joins |
| 00:23:56 | <cruller> | TheoH7: I uploaded the entire output directory. https://archive.org/details/community.jisc.ac.uk-2026-02-16-35e53623-00000 |
| 00:32:50 | | etnguyen03 quits [Client Quit] |
| 00:40:14 | | SootBector quits [Remote host closed the connection] |
| 00:41:22 | | SootBector (SootBector) joins |
| 00:42:30 | <TheoH7> | cruller: Thanks, have downloaded it. |
| 00:43:16 | <TheoH7> | It looks like I also managed to do one where hard-coded links to https://community.ja.net (old address from years ago) are clickable in the WARC. I will upload that to IA likely in a few hours. |
| 00:44:16 | <TheoH7> | To upload the whole directory, is the best way to zip and upload, or can you select a whole folder for upload? |
| 00:48:36 | <pokechu22> | You can upload multiple files at once within a directory (uploading directories/subdirectories might also be possible but I think is more complicated?) |
| 00:50:57 | <TheoH7> | pokechu22: Great, will do that. |
| 00:51:48 | <TheoH7> | Seems one of my crawls has somehow managed to start crawling old versions of this site stored on the Wayback Machine, which is odd. I've added the pattern to ignores but am just curious how grab-site would've found and started crawling such URLs. |
| 00:52:11 | <TheoH7> | I do already have one crawl without that, though, and will only upload the second one if it contains materially more content. |
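
As a rough illustration of what a regex ignore pattern like the one TheoH7 mentions would do: the pattern below is a guess, since the log does not show the one actually added, and `is_ignored` is a hypothetical helper rather than grab-site's own code.

```python
import re

# Hypothetical pattern: skip anything hosted on the Wayback Machine.
# The real pattern added to the crawl's ignores is not shown in the log.
IGNORE_PATTERNS = [r"^https?://web\.archive\.org/"]


def is_ignored(url, patterns=IGNORE_PATTERNS):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "https://web.archive.org/web/2019/https://community.jisc.ac.uk/",
        "https://community.jisc.ac.uk/",
    ):
        print(candidate, is_ignored(candidate))
```

Anchoring the pattern at the start of the URL keeps it from also matching pages on the target site that merely mention web.archive.org in a query string.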
| 01:02:20 | | tekulvw (tekulvw) joins |
| 01:03:29 | | ducky quits [Ping timeout: 272 seconds] |
| 01:04:19 | | etnguyen03 (etnguyen03) joins |
| 01:07:21 | | tekulvw quits [Ping timeout: 268 seconds] |
| 01:21:59 | | tekulvw (tekulvw) joins |
| 01:25:36 | | Webuser614729 joins |
| 01:26:29 | | Webuser614729 quits [Client Quit] |
| 01:27:43 | | wotd joins |
| 01:41:28 | | pokechu22 quits [Quit: System maintenance] |
| 02:22:10 | | sec^nd quits [Remote host closed the connection] |
| 02:22:35 | | sec^nd (second) joins |
| 02:36:40 | <nexussfan> | There's a site dedicated to archiving Iranian series and films <https://nostalgik-tv.com/> which says they have 4 terabytes of videos. Would it be a good idea to archive it, or not now? |
| 02:44:47 | | APOLLO03 quits [Ping timeout: 268 seconds] |
| 02:47:44 | | ducky (ducky) joins |
| 02:56:13 | | nine quits [Ping timeout: 272 seconds] |
| 02:58:38 | | nine joins |
| 02:58:40 | | nine is now authenticated as nine |
| 02:58:40 | | nine quits [Changing host] |
| 02:58:40 | | nine (nine) joins |
| 03:13:09 | | iPwnedYourIOTSmartdog quits [Ping timeout: 268 seconds] |
| 03:13:46 | | iPwnedYourIOTSmartdog joins |
| 04:01:44 | | etnguyen03 quits [Remote host closed the connection] |
| 04:11:07 | | tekulvw quits [Ping timeout: 268 seconds] |
| 04:14:04 | | tekulvw (tekulvw) joins |
| 04:23:37 | | tekulvw quits [Ping timeout: 272 seconds] |
| 04:26:20 | | Island quits [Read error: Connection reset by peer] |
| 04:28:43 | | tekulvw (tekulvw) joins |
| 04:33:45 | | tekulvw quits [Ping timeout: 272 seconds] |
| 05:04:47 | | n9nes quits [Ping timeout: 272 seconds] |
| 05:08:15 | | n9nes joins |
| 05:14:14 | | tekulvw (tekulvw) joins |
| 05:24:25 | | tekulvw quits [Ping timeout: 272 seconds] |
| 05:32:43 | | sec^nd quits [Remote host closed the connection] |
| 05:33:05 | | sec^nd (second) joins |
| 05:42:42 | <ericgallager> | I forget, did this make it here? https://www.theregister.com/2026/02/12/polyglot_notebooks_deprecation/ |
| 05:51:53 | | tekulvw (tekulvw) joins |
| 05:56:34 | | tekulvw quits [Ping timeout: 268 seconds] |
| 05:56:53 | | tekulvw (tekulvw) joins |
| 06:01:30 | | tekulvw quits [Ping timeout: 268 seconds] |
| 06:16:19 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:17:13 | | ArchivalEfforts quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
| 06:17:22 | | ArchivalEfforts joins |
| 06:22:50 | | tekulvw (tekulvw) joins |
| 06:27:45 | | tekulvw quits [Ping timeout: 272 seconds] |
| 06:57:16 | | tekulvw (tekulvw) joins |
| 07:05:37 | | pokechu22 (pokechu22) joins |
| 08:52:56 | | ducky quits [Ping timeout: 268 seconds] |
| 08:53:09 | | ducky (ducky) joins |
| 08:54:25 | | Dango360 quits [Quit: The Lounge - https://thelounge.chat] |
| 09:29:34 | | TheEnbyperor_ quits [Read error: Connection reset by peer] |
| 09:30:09 | | cipherrot quits [Ping timeout: 272 seconds] |
| 09:30:09 | | TheEnbyperor quits [Ping timeout: 272 seconds] |
| 09:37:48 | | Snivy quits [Quit: The Lounge - https://thelounge.chat] |
| 09:38:00 | | TheEnbyperor joins |
| 09:38:11 | | petrichor (petrichor) joins |
| 09:38:17 | | Snivy (Snivy) joins |
| 09:38:23 | | Snivy quits [Remote host closed the connection] |
| 09:38:36 | | TheEnbyperor_ (TheEnbyperor) joins |
| 09:39:42 | | Snivy (Snivy) joins |
| 09:42:53 | | tekulvw quits [Ping timeout: 268 seconds] |
| 10:03:54 | | rohvani quits [Quit: The Lounge - https://thelounge.chat] |
| 10:09:54 | | @arkiver quits [Remote host closed the connection] |
| 10:10:21 | | arkiver (arkiver) joins |
| 10:10:21 | | @ChanServ sets mode: +o arkiver |
| 10:14:12 | | fireatseaparks quits [Remote host closed the connection] |
| 10:14:48 | | fireatseaparks (fireatseaparks) joins |