| 01:10:46 | | qw3rty__ quits [Read error: Connection reset by peer] |
| 02:42:55 | <@hook54321> | it's the coke URLs that return status 500 for some reason |
| 02:43:04 | | Ryz quits [Remote host closed the connection] |
| 02:43:57 | <@hook54321> | not sure whether to mark it as unavailable or no redirect |
| 02:44:15 | | Ryz (Ryz) joins |
| 14:15:30 | | atphoenix_ (atphoenix) joins |
| 14:17:41 | | atphoenix quits [Ping timeout: 258 seconds] |
| 15:18:05 | | kiska quits [Remote host closed the connection] |
| 15:18:05 | | flashfire42 quits [Remote host closed the connection] |
| 16:40:59 | | acm joins |
| 17:25:12 | | acm quits [Remote host closed the connection] |
| 17:53:24 | | kiskaWeebChat quits [Ping timeout: 250 seconds] |
| 18:55:02 | | Daloader_ joins |
| 20:22:54 | | Daloader_ quits [Ping timeout: 250 seconds] |
| 20:27:24 | | jtagcat quits [Quit: Bye!] |
| 20:53:15 | | jtagcat (jtagcat) joins |
| 21:03:03 | <aarchi> | On the topic of today's discussion in #noanswers about archiving responses as-is, without processing: hook54321 mentioned to me that saving WARCs of URLTeam data rather than just mappings had been considered before I joined. |
| 21:04:44 | <aarchi> | I think that would be very beneficial for data integrity. For example, when it was discovered that go-hawaii-edu had issues, the responses could be re-parsed to see which shortcodes need to be redone. |
| 21:05:57 | <aarchi> | Plus, the WARCs could be ingested by IA, so people not aware of URLTeam (or my URLHero link resolver, once I get that up) can still resolve dead redirects. |
| 21:08:35 | <aarchi> | The only downsides I can see are the increased storage size and it being a new format. Archive Team is already familiar with large storage requirements, so that should be old hat. If we kept releasing the vertical pipe-separated (|) mappings in addition to the WARCs, then clients that only want that data don't need to change their parsers. |
| 21:09:45 | | flashfire42 (flashfire42) joins |
| 21:10:47 | | kiska (kiska) joins |
| 21:21:04 | <@JAA> | I believe the desire for WARC on this project is as old as the project itself, basically. |
| 21:21:27 | <@JAA> | But it'd need a bunch of dev work on tracker and client. |
| 21:29:04 | <@hook54321> | aarchi: it probably won't happen anytime soon; from what I know, it's been talked about for over 5 years. |
| 21:30:46 | <@hook54321> | there was some talk at one point about potentially converting existing ones to WARC, but that's probably a bad idea, and not everything needed to do that is saved. |
| 21:31:32 | <@JAA> | s/probably/definitely/ |
| 21:31:43 | <aarchi> | Yeah don’t convert old ones |
| 21:31:54 | <@JAA> | Unless 'converting' means retrieving the same URLs again as WARCs. |
| 21:32:55 | <@hook54321> | "<luckcolor> we must fake warc records for the dead ones" |
| 21:33:56 | <aarchi> | That would obfuscate any earlier processing errors in the client or tracker |
| 21:34:42 | <aarchi> | That wouldn’t provide the benefit I proposed, in the case of Hawaiʻi |
| 21:34:52 | <@JAA> | Yeah, hell no. |
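
Editor's sketch of aarchi's re-parsing point above: if raw responses were kept alongside the pipe-separated mappings, a later pass could re-derive each redirect target and flag shortcodes whose published mapping disagrees. Everything here is an assumption for illustration only: the function names `parse_redirect` and `mismatched_shortcodes` are hypothetical, the stored responses are modelled as a plain dict of raw HTTP bytes (a real WARC would be read with a WARC library), and the mapping lines are assumed to have the form `shortcode|url`. It is not the URLTeam tracker's or client's actual code.

```python
def parse_redirect(raw: bytes):
    """Return (status_code, location) parsed from a raw HTTP response."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    status = int(lines[0].split(b" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"location":
            location = value.strip().decode("utf-8", "replace")
            break
    return status, location


def mismatched_shortcodes(stored, mapping_lines):
    """List shortcodes whose published mapping differs from the stored response.

    stored: dict of shortcode -> raw HTTP response bytes (assumed layout).
    mapping_lines: iterable of 'shortcode|url' strings (assumed format).
    """
    recorded = dict(
        line.rstrip("\n").split("|", 1) for line in mapping_lines if "|" in line
    )
    redo = []
    for code, raw in stored.items():
        status, location = parse_redirect(raw)
        # Only a 3xx response with a Location header counts as a redirect.
        expected = location if 300 <= status < 400 else None
        if recorded.get(code) != expected:
            redo.append(code)
    return redo
```

Called with the stored responses and the released mapping file, `mismatched_shortcodes` returns only the shortcodes that would need to be fetched again. Non-redirect responses such as a status 500 are treated here as "no mapping expected", which is one possible policy rather than a settled one.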