| 00:05:16 | | Arcorann (Arcorann) joins |
| 00:36:46 | | march_happy quits [Read error: Connection reset by peer] |
| 00:37:31 | | march_happy (march_happy) joins |
| 01:03:03 | | TheTechRobo quits [Read error: Connection reset by peer] |
| 01:03:26 | | TheTechRobo joins |
| 01:04:37 | | dudebloke joins |
| 01:05:48 | | dudebloke quits [Remote host closed the connection] |
| 01:07:26 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 01:12:54 | | Lord_Nightmare (Lord_Nightmare) joins |
| 01:15:08 | | qwertyasdfuiopghjkl joins |
| 02:04:15 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 02:07:24 | | pabs (pabs) joins |
| 02:15:08 | | HP_Archivist quits [Client Quit] |
| 03:01:26 | | BlueMaxima joins |
| 03:30:28 | | HackMii quits [Remote host closed the connection] |
| 03:36:30 | | HackMii (hacktheplanet) joins |
| 04:09:00 | | HackMii quits [Remote host closed the connection] |
| 04:10:02 | | HackMii (hacktheplanet) joins |
| 06:11:05 | | march_happy quits [Read error: Connection reset by peer] |
| 06:12:05 | | march_happy (march_happy) joins |
| 06:33:06 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:39:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 07:56:16 | | Iki quits [Ping timeout: 240 seconds] |
| 08:01:40 | | geezabiscuit quits [Ping timeout: 240 seconds] |
| 08:10:37 | | knecht420 quits [Client Quit] |
| 08:11:06 | | knecht420 (knecht420) joins |
| 08:15:02 | | geezabiscuit (geezabiscuit) joins |
| 08:15:23 | | adia (adia) joins |
| 08:33:36 | | gazorpazorp quits [Remote host closed the connection] |
| 08:33:45 | | gazorpazorp (gazorpazorp) joins |
| 09:06:03 | | NIC007a83 quits [Read error: Connection reset by peer] |
| 10:23:41 | | mutantmonkey quits [Remote host closed the connection] |
| 10:24:00 | | mutantmonkey (mutantmonkey) joins |
| 10:57:40 | | pie_ quits [Ping timeout: 240 seconds] |
| 11:04:34 | | pie_ joins |
| 11:37:26 | | Iki joins |
| 11:38:04 | | Iki1 joins |
| 11:38:59 | | HackMii quits [Write error: Broken pipe] |
| 11:39:40 | | HackMii (hacktheplanet) joins |
| 11:41:46 | | Iki quits [Ping timeout: 240 seconds] |
| 11:43:08 | | pie_ quits [Client Quit] |
| 11:43:25 | | pie_ joins |
| 11:55:19 | | drexler_ joins |
| 11:56:38 | | drexler quits [Ping timeout: 265 seconds] |
| 12:18:24 | | pie_ quits [Client Quit] |
| 12:18:53 | | pie_ joins |
| 12:49:58 | | HackMii quits [Remote host closed the connection] |
| 12:50:27 | | HackMii (hacktheplanet) joins |
| 13:24:28 | | katocala quits [Ping timeout: 240 seconds] |
| 13:42:48 | | HP_Archivist (HP_Archivist) joins |
| 13:49:55 | | HackMii quits [Remote host closed the connection] |
| 13:50:30 | | HackMii (hacktheplanet) joins |
| 14:28:17 | | Nulo quits [Ping timeout: 265 seconds] |
| 14:30:29 | | Nulo joins |
| 14:45:46 | | Arcorann quits [Ping timeout: 240 seconds] |
| 15:52:39 | | drexler_ is now known as drexler |
| 15:54:41 | | tech_exorcist (tech_exorcist) joins |
| 16:22:10 | | tech_exorcist quits [Remote host closed the connection] |
| 16:22:52 | | tech_exorcist (tech_exorcist) joins |
| 16:28:42 | | Atom-- joins |
| 16:50:04 | | geezabiscuit quits [Ping timeout: 240 seconds] |
| 16:56:04 | | wyatt8740 quits [Ping timeout: 240 seconds] |
| 16:58:04 | | geezabiscuit (geezabiscuit) joins |
| 17:06:04 | | Stilett0 quits [Ping timeout: 240 seconds] |
| 17:11:16 | | tech_exorcist quits [Ping timeout: 240 seconds] |
| 18:14:15 | | Chris5010 quits [Quit: ] |
| 18:49:25 | <systwi_> | JAA: Sorry for the misplacement, there. |
| 18:49:41 | <systwi_> | I assume CDX is some sort of index, IIRC. |
| 18:49:55 | | wyatt8740 joins |
| 18:50:49 | <@JAA> | For context, it's about tech_exorcist's question in -ot the other day: |
| 18:50:49 | <@JAA> | < tech_exorcist> in AT's items on archive.org, what is the difference between <item name>.cdx.gz, <item name>.cdx.idx, .megawarc.json.gz, and .megawarc.os.cdx.gz, as in https://archive.org/download/archiveteam_scratch_20220620234008_608dec9d? apologies for my stupidity |
| 18:50:53 | <@JAA> | < tech_exorcist> and what info do the _meta.{sqlite,xml} files contain? |
| 18:51:16 | <systwi_> | Yes. ^ |
| 18:52:25 | <@JAA> | $identifier.cdx.gz is the CDX for the entire item, .warc.os.cdx.gz is the CDX for an individual WARC. |
| 18:53:25 | <@JAA> | .megawarc.json.gz contains information on the individual files that were merged into the megawarc. .megawarc.warc.{gz,zst} is the megawarc (duh). .megawarc.tar would contain any broken WARCs that couldn't be merged (and should normally be an empty file). |
| 18:54:11 | <systwi_> | Ohh, yeah, yeah, I remember megawarc from earlier wrt 0 byte pomf.se megawarc tarballs. |
| 18:54:23 | <@JAA> | ${identifier}_meta.{sqlite,xml} is item metadata (title, description, upload date, etc.). ${identifier}_files.xml lists all files in the item (with checksums). |
| 18:55:08 | <@JAA> | The files we actually upload are only .megawarc.warc.{gz,zst}, .megawarc.tar, and .megawarc.json.gz. The other files get generated by IA tasks. |
| 18:57:20 | <systwi_> | Yeah, that makes sense. I recall using the ${identifier}_files.xml for verifying file hashes; very handy, especially for massive files. |
| 18:57:57 | <systwi_> | and seeing ${identifier}_meta.sqlite/${identifier}_meta.xml having IA item metadata. :-) |
| 18:58:59 | <systwi_> | Thanks for the info. So, the CDX files are indexes of their respective WARCs, right? |
| 19:02:47 | <@OrIdow6> | Their respective collections of records, since it may not be a single file for some |
| 19:02:52 | <@OrIdow6> | https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2006/ |
| 19:02:56 | <@OrIdow6> | Is the spec |
| 19:03:35 | <systwi_> | Thank you, checking... |
| 19:05:15 | <@JAA> | (Only response records) |
| 19:14:19 | <systwi_> | Understood, thank you both. |
| 19:28:09 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 19:28:33 | | Minkafighter joins |
| 19:40:36 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 19:42:27 | | cajually quits [Read error: Connection reset by peer] |
| 19:42:38 | | cajually joins |
| 19:43:35 | | Minkafighter quits [Client Quit] |
| 19:44:12 | | Minkafighter joins |
| 20:29:23 | | Stiletto joins |
| 21:22:48 | | Stiletto quits [Remote host closed the connection] |
| 21:24:20 | | HP_Archivist quits [Client Quit] |
| 21:30:07 | | mikael joins |
| 21:34:25 | | Stiletto joins |
| 22:09:38 | | hexa- quits [Quit: WeeChat 3.5] |
| 22:10:43 | | hexa- (hexa-) joins |
| 22:12:00 | | lennier1 quits [Client Quit] |
| 22:13:29 | | lennier1 (lennier1) joins |
| 22:21:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 22:23:06 | | sec^nd (second) joins |
| 22:23:27 | | qwertyasdfuiopghjkl joins |
| 22:43:25 | | HackMii quits [Remote host closed the connection] |
| 22:43:54 | | HackMii (hacktheplanet) joins |
| 23:02:32 | | sec^nd quits [Remote host closed the connection] |
| 23:02:53 | | sec^nd (second) joins |
| 23:15:43 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 23:34:31 | | Arcorann (Arcorann) joins |
| 23:35:25 | | BlueMaxima joins |
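Earlier in this log, OrIdow6 links the CDX-2006 spec, and JAA notes that the .cdx.gz files index only response records. The following is a minimal Python sketch of reading such a file, not the tooling IA or ArchiveTeam actually use. It assumes three things: the file begins with a " CDX ..." legend line naming the field letters, fields are separated by single spaces, and empty fields are written as "-". The FIELD_NAMES mapping covers only a subset of the spec's field letters. The names parse_cdx and read_cdx_gz are hypothetical helpers.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# Subset of field letters from the CDX-2006 legend (assumed meanings per the spec);
# letters not listed here are kept under their raw letter as the dict key.
FIELD_NAMES = {
    "N": "massaged_url",
    "a": "original_url",
    "b": "date",
    "m": "mime_type",
    "s": "response_code",
    "k": "checksum",
    "r": "redirect",
    "M": "meta_tags",
    "S": "compressed_record_size",
    "V": "compressed_offset",
    "g": "file_name",
}


def parse_cdx(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CDX line, using the legend line to learn the field order."""
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return  # empty input: nothing to parse
    header = first.split()
    # The legend line looks like " CDX N b a ..."; its first token is the literal "CDX".
    if not header or header[0] != "CDX":
        raise ValueError("missing ' CDX ...' legend line")
    letters: List[str] = header[1:]
    for line in it:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        values = line.split(" ")
        if len(values) != len(letters):
            raise ValueError(f"expected {len(letters)} fields, got {len(values)}")
        yield {FIELD_NAMES.get(letter, letter): v for letter, v in zip(letters, values)}


def read_cdx_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream records from a gzip-compressed CDX file such as $identifier.cdx.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from parse_cdx(f)


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = [
        " CDX N b a m s k r M S V g",
        "com,example)/ 20220620234008 http://example.com/ text/html 200 ABCDEF - - 1234 5678 example.warc.gz",
    ]
    for record in parse_cdx(sample):
        print(record["original_url"], record["file_name"], record["compressed_offset"])
```

Reading the field order from the legend line, rather than hard-coding it, means the parser does not depend on any one column layout. For a .warc.gz, the S (compressed record size) and V (compressed offset) fields should let a reader seek to the record's gzip member in file g and read just that many bytes. This is an assumption about gzip WARCs; .warc.zst megawarcs may need extra handling.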