00:05:16Arcorann (Arcorann) joins
00:36:46march_happy quits [Read error: Connection reset by peer]
00:37:31march_happy (march_happy) joins
01:03:03TheTechRobo quits [Read error: Connection reset by peer]
01:03:26TheTechRobo joins
01:04:37dudebloke joins
01:05:48dudebloke quits [Remote host closed the connection]
01:07:26Lord_Nightmare quits [Quit: ZNC - http://znc.in]
01:12:54Lord_Nightmare (Lord_Nightmare) joins
01:15:08qwertyasdfuiopghjkl joins
02:04:15pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
02:07:24pabs (pabs) joins
02:15:08HP_Archivist quits [Client Quit]
03:01:26BlueMaxima joins
03:30:28HackMii quits [Remote host closed the connection]
03:36:30HackMii (hacktheplanet) joins
04:09:00HackMii quits [Remote host closed the connection]
04:10:02HackMii (hacktheplanet) joins
06:11:05march_happy quits [Read error: Connection reset by peer]
06:12:05march_happy (march_happy) joins
06:33:06BlueMaxima quits [Read error: Connection reset by peer]
06:39:46march_happy quits [Ping timeout: 240 seconds]
07:56:16Iki quits [Ping timeout: 240 seconds]
08:01:40geezabiscuit quits [Ping timeout: 240 seconds]
08:10:37knecht420 quits [Client Quit]
08:11:06knecht420 (knecht420) joins
08:15:02geezabiscuit (geezabiscuit) joins
08:15:23adia (adia) joins
08:33:36gazorpazorp quits [Remote host closed the connection]
08:33:45gazorpazorp (gazorpazorp) joins
09:06:03NIC007a83 quits [Read error: Connection reset by peer]
10:23:41mutantmonkey quits [Remote host closed the connection]
10:24:00mutantmonkey (mutantmonkey) joins
10:57:40pie_ quits [Ping timeout: 240 seconds]
11:04:34pie_ joins
11:37:26Iki joins
11:38:04Iki1 joins
11:38:59HackMii quits [Write error: Broken pipe]
11:39:40HackMii (hacktheplanet) joins
11:41:46Iki quits [Ping timeout: 240 seconds]
11:43:08pie_ quits [Client Quit]
11:43:25pie_ joins
11:55:19drexler_ joins
11:56:38drexler quits [Ping timeout: 265 seconds]
12:18:24pie_ quits [Client Quit]
12:18:53pie_ joins
12:49:58HackMii quits [Remote host closed the connection]
12:50:27HackMii (hacktheplanet) joins
13:24:28katocala quits [Ping timeout: 240 seconds]
13:42:48HP_Archivist (HP_Archivist) joins
13:49:55HackMii quits [Remote host closed the connection]
13:50:30HackMii (hacktheplanet) joins
14:28:17Nulo quits [Ping timeout: 265 seconds]
14:30:29Nulo joins
14:45:46Arcorann quits [Ping timeout: 240 seconds]
15:52:39drexler_ is now known as drexler
15:54:41tech_exorcist (tech_exorcist) joins
16:22:10tech_exorcist quits [Remote host closed the connection]
16:22:52tech_exorcist (tech_exorcist) joins
16:28:42Atom-- joins
16:50:04geezabiscuit quits [Ping timeout: 240 seconds]
16:56:04wyatt8740 quits [Ping timeout: 240 seconds]
16:58:04geezabiscuit (geezabiscuit) joins
17:06:04Stilett0 quits [Ping timeout: 240 seconds]
17:11:16tech_exorcist quits [Ping timeout: 240 seconds]
18:14:15Chris5010 quits [Quit: ]
18:49:25<systwi_>JAA: Sorry for the misplacement, there.
18:49:41<systwi_>I assume CDX is some sort of index, IIRC.
18:49:55wyatt8740 joins
18:50:49<@JAA>For context, it's about tech_exorcist's question in -ot the other day:
18:50:49<@JAA>< tech_exorcist> in AT's items on archive.org, what is the difference between <item name>.cdx.gz, <item name>.cdx.idx, .megawarc.json.gz, and .megawarc.os.cdx.gz, as in https://archive.org/download/archiveteam_scratch_20220620234008_608dec9d? apologies for my stupidity
18:50:53<@JAA>< tech_exorcist> and what info do the _meta.{sqlite,xml} files contain?
18:51:16<systwi_>Yes. ^
18:52:25<@JAA>$identifier.cdx.gz is the CDX for the entire item, .warc.os.cdx.gz is the CDX for an individual WARC.
18:53:25<@JAA>.megawarc.json.gz contains information on the individual files that were merged into the megawarc. .megawarc.warc.{gz,zst} is the megawarc (duh). .megawarc.tar would contain any broken WARCs that couldn't be merged (and should normally be an empty file).
18:54:11<systwi_>Ohh, yeah, yeah, I remember megawarc from earlier wrt 0 byte pomf.se megawarc tarballs.
18:54:23<@JAA>${identifier}_meta.{sqlite,xml} is item metadata (title, description, upload date, etc.). ${identifier}_files.xml lists all files in the item (with checksums).
18:55:08<@JAA>The files we actually upload are only .megawarc.warc.{gz,zst}, .megawarc.tar, and .megawarc.json.gz. The other files get generated by IA tasks.
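The naming scheme JAA describes above can be sketched as a small lookup. This is only an illustration of the chat's explanation, not a tool anyone in the channel uses: the function name `describe` and the returned strings are made up here, and suffixes the chat does not explain (such as `.cdx.idx`) are deliberately reported as unknown.

```python
# Hypothetical helper: maps a file name inside an item to (role, origin),
# using only the suffixes explained in the conversation above.
UPLOADED = "uploaded"
GENERATED = "generated by IA tasks"

# Names that consist of the item identifier plus exactly this suffix.
EXACT = {
    ".cdx.gz": ("CDX for the entire item", GENERATED),
    "_meta.sqlite": ("item metadata", GENERATED),
    "_meta.xml": ("item metadata", GENERATED),
    "_files.xml": ("list of all files in the item, with checksums", GENERATED),
}

# Per-file suffixes; checked in order, most specific first.
SUFFIXES = [
    (".warc.os.cdx.gz", ("CDX for an individual WARC", GENERATED)),
    (".megawarc.json.gz", ("information on the files merged into the megawarc", UPLOADED)),
    (".megawarc.warc.gz", ("the megawarc", UPLOADED)),
    (".megawarc.warc.zst", ("the megawarc", UPLOADED)),
    (".megawarc.tar", ("broken WARCs that could not be merged", UPLOADED)),
]


def describe(filename, identifier):
    """Return (role, origin) for a file in an item, or None if unknown."""
    if filename.startswith(identifier):
        rest = filename[len(identifier):]
        if rest in EXACT:
            return EXACT[rest]
    for suffix, info in SUFFIXES:
        if filename.endswith(suffix):
            return info
    return None


if __name__ == "__main__":
    item = "archiveteam_scratch_20220620234008_608dec9d"
    print(describe(item + ".cdx.gz", item))
    print(describe(item + ".megawarc.warc.os.cdx.gz", item))
```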
18:57:20<systwi_>Yeah, that makes sense. I recall using the ${identifier}_files.xml for verifying file hashes; very handy, especially for massive files.
18:57:57<systwi_>and seeing ${identifier}_meta.sqlite/${identifier}_meta.xml having IA item metadata. :-)
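The hash check systwi_ mentions could look roughly like the sketch below. It assumes that `${identifier}_files.xml` has a `<files>` root with `<file name="...">` elements that each carry an `<md5>` child; that layout is an assumption here, not something stated in the chat, and the function names and sample data are made up. Files are read in chunks so that very large WARCs do not need to fit in memory.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def expected_md5s(files_xml_text):
    """Map file name -> MD5 hex digest, as listed in a _files.xml document."""
    root = ET.fromstring(files_xml_text)
    result = {}
    for entry in root.iter("file"):
        digest = entry.findtext("md5")
        if digest:  # some entries may not carry a checksum
            result[entry.get("name")] = digest.strip().lower()
    return result


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object chunk by chunk."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


def matches(name, stream, expected):
    """True if the stream's MD5 equals the digest listed for `name`."""
    return name in expected and expected[name] == md5_of_stream(stream)


if __name__ == "__main__":
    # Made-up sample: one small "file" and a matching _files.xml entry.
    payload = b"hello"
    sample_xml = (
        '<files><file name="example.megawarc.json.gz" source="original">'
        "<md5>%s</md5></file></files>" % hashlib.md5(payload).hexdigest()
    )
    listed = expected_md5s(sample_xml)
    print(matches("example.megawarc.json.gz", io.BytesIO(payload), listed))
```

For a real download, the stream would be something like `open(path, "rb")` instead of the in-memory `io.BytesIO` used in the demo.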
18:58:59<systwi_>Thanks for the info. So, the CDX files are indexes of their respective WARCs, right?
19:02:47<@OrIdow6>Of their respective collections of records, since for some CDX files that may not be a single WARC
19:02:52<@OrIdow6>https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2006/
19:02:56<@OrIdow6>Is the spec
19:03:35<systwi_>Thank you, checking...
19:05:15<@JAA>(Only response records)
19:14:19<systwi_>Understood, thank you both.
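To make the "CDX is an index" point concrete, here is a minimal sketch of reading a CDX file in the format linked above. It relies on the header convention from that spec: the first character of the header line is the field delimiter, followed by `CDX` and one letter per column. The sample line is invented for illustration, the field order in real files may differ, and the `LABELS` table only names a few letters. As JAA notes, the index only covers response records.

```python
import io

# Human-readable names for a few of the spec's field letters (not exhaustive).
LABELS = {
    "N": "massaged url",
    "b": "date",
    "a": "original url",
    "m": "mime type",
    "s": "response code",
    "V": "compressed arc file offset",
    "g": "file name",
}


def parse_cdx(lines):
    """Yield one dict per index line, keyed by the header's field letters."""
    lines = iter(lines)
    header = next(lines).rstrip("\n")
    delimiter = header[0]  # first character of the header is the delimiter
    parts = header[1:].split(delimiter)
    if parts[0] != "CDX":
        raise ValueError("not a CDX header: %r" % header)
    letters = parts[1:]
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield dict(zip(letters, line.split(delimiter)))


if __name__ == "__main__":
    # Invented sample data; a real file would be opened with gzip.open(path, "rt").
    sample = (
        " CDX N b a m s k r M S V g\n"
        "com,example)/ 20220620234008 http://example.com/ text/html 200 "
        "AAAA - - 1234 5678 example.megawarc.warc.gz\n"
    )
    for record in parse_cdx(io.StringIO(sample)):
        for letter, value in record.items():
            print(LABELS.get(letter, letter), "=", value)
```

The `V` and `g` columns are what make this an index: together they say in which file, and at which offset, a given capture can be found.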
19:28:09Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
19:28:33Minkafighter joins
19:40:36qwertyasdfuiopghjkl quits [Client Quit]
19:42:27cajually quits [Read error: Connection reset by peer]
19:42:38cajually joins
19:43:35Minkafighter quits [Client Quit]
19:44:12Minkafighter joins
20:29:23Stiletto joins
21:22:48Stiletto quits [Remote host closed the connection]
21:24:20HP_Archivist quits [Client Quit]
21:30:07mikael joins
21:34:25Stiletto joins
22:09:38hexa- quits [Quit: WeeChat 3.5]
22:10:43hexa- (hexa-) joins
22:12:00lennier1 quits [Client Quit]
22:13:29lennier1 (lennier1) joins
22:21:46sec^nd quits [Ping timeout: 240 seconds]
22:23:06sec^nd (second) joins
22:23:27qwertyasdfuiopghjkl joins
22:43:25HackMii quits [Remote host closed the connection]
22:43:54HackMii (hacktheplanet) joins
23:02:32sec^nd quits [Remote host closed the connection]
23:02:53sec^nd (second) joins
23:15:43qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
23:34:31Arcorann (Arcorann) joins
23:35:25BlueMaxima joins