00:00:52IDK quits [Quit: Connection closed for inactivity]
01:09:04luckcolors (luckcolors) joins
01:13:48luckcolors quits [Client Quit]
01:14:06luckcolors (luckcolors) joins
02:29:32sonick quits [Client Quit]
03:00:07AlsoTheTechRobo is now known as TheTechRobo
06:20:30Arcorann_ joins
06:48:12Arcorann_ quits [Read error: Connection reset by peer]
06:54:23Arcorann_ joins
07:36:35<@JAA>3 years and 4 months after my Picosong archive, the WARC/1.1 revisit record bug is still unfixed. :-(
07:43:31<@JAA>arkiver: Do I need to personally come to San Francisco to write the (probably) two lines of code that fix this?
08:05:23<Nemo_bis>JAA: does that actually work? does the office internet get you access to the internal repositories or something? :) or can one submit patches on paper?
08:07:11<@JAA>Nemo_bis: I was thinking of bringing a wrench. ;-)
08:25:25nothere quits [Quit: Leaving]
08:25:58systwi_ joins
08:27:40nothere joins
08:28:08systwi_ quits [Client Quit]
08:30:46nothere_ joins
08:32:28nothere quits [Ping timeout: 252 seconds]
08:41:10tzt quits [Read error: Connection reset by peer]
08:42:12tzt (tzt) joins
09:15:48<Nemo_bis>digital preservation pentesting
12:21:02<@arkiver>JAA: can you remind me of the details of this bug?
12:21:17<@arkiver>most of IA is remote anyway nowadays :P
13:16:38Arcorann_ quits [Ping timeout: 252 seconds]
13:46:18qw3rty__ joins
13:49:38qw3rty_ quits [Ping timeout: 252 seconds]
16:20:16balrog quits [Client Quit]
16:28:34balrog (balrog) joins
17:21:05qw3rty__ quits [Read error: Connection reset by peer]
17:51:59<@JAA>arkiver: All revisit records from my qwarc archives are broken on playback. They get indexed and appear in the calendar, but on playback, it either redirects to another snapshot or shows the 'not archived yet' message. I believe that the underlying bug is not handling sub-second precision in WARC-Refers-To-Date. I tracked down similar code in OpenWayback at one point, which seems to match exactly in
17:52:05<@JAA>behaviour, although I don't know whether the WBM actually uses it.
17:52:35<@JAA>2022-02-01 18:03:01 UTC <@JAA> Smells like it: https://github.com/iipc/openwayback/blob/c7fac11479ee4447a71ee337f91018f7808757a2/wayback-core/src/main/java/org/archive/wayback/resourcestore/resourcefile/WarcResource.java#L180-L190 directly invokes parse14DigitISODate from org.archive.utils.ArchiveUtils. Not sure where to find that, but the name suggests that it wouldn't handle the trailing
17:52:41<@JAA>fractional seconds.
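To illustrate the suspected failure mode: WARC/1.1 allows WARC-Date values with sub-second precision, and WARC-Refers-To-Date should equal the WARC-Date of the referenced record, so it can carry a fraction too, e.g. `2019-10-05T17:15:31.123456Z` (an illustrative value, not taken from the Picosong WARC). The Python sketch below is not OpenWayback's code. `parse_strict` is a hypothetical stand-in for a parser that only accepts whole seconds, and `parse_tolerant` shows one way to accept an optional fraction before converting to the 14-digit timestamp used in Wayback URLs.

```python
import re
from datetime import datetime, timezone


def parse_strict(value):
    """Hypothetical whole-seconds-only parser, e.g. "2019-10-05T17:15:31Z".

    Raises ValueError as soon as fractional seconds are present.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Optional fraction of 1-9 digits (the digit limit is an assumption of this sketch).
_W3C_UTC_DATE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?Z"
)


def parse_tolerant(value):
    """Parse a UTC W3C ISO 8601 date with or without fractional seconds."""
    m = _W3C_UTC_DATE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a W3C ISO 8601 UTC date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    fraction = m.group(7) or ""
    # datetime only stores microseconds: pad or truncate the fraction to 6 digits.
    micro = int((fraction + "000000")[:6])
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=timezone.utc)


def to_wayback_timestamp(dt):
    """14-digit timestamp as seen in Wayback URLs; sub-second part is dropped."""
    return dt.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    refers_to = "2019-10-05T17:15:31.123456Z"  # illustrative value
    try:
        parse_strict(refers_to)
    except ValueError:
        print("strict parser rejects the fractional seconds")
    print(to_wayback_timestamp(parse_tolerant(refers_to)))  # prints 20191005171531
```

If a replay tool fails to parse the date like `parse_strict` does, it has no usable timestamp for the original capture, which would fit the "redirects to another snapshot or 'not archived yet'" symptoms, though that remains a hypothesis until the actual WBM code is checked.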
18:03:44<Jake>Could you send an example URL for that?
18:07:04<@JAA>Works: https://web.archive.org/web/20191005171531/https://picosong.com/static/widget/index.html?file=/cdn/7cb193ae81d13d79f6e108480e116a1b.mp3&autoplay=true
18:07:22<@JAA>Breaks: https://web.archive.org/web/20191005171531/https://picosong.com/static/widget/index.html?file=/cdn/2c627c968c5071f927c4938d2c9e38c0.mp3&autoplay=true
18:07:37<@JAA>The latter should be a revisit record to the former. Both come from https://archive.org/download/picosong.com_201910_part0/picosong-site-00000.warc.gz
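For reference, a WARC/1.1 revisit record points back at the original capture through `WARC-Refers-To-Target-URI` and `WARC-Refers-To-Date`, where the latter should equal the original record's `WARC-Date`, fractional seconds included. The Python sketch below builds only a subset of a revisit record's headers (it omits `WARC-Record-ID`, `Content-Length` and other mandatory fields). `revisit_headers` and the dict shape of its `original` argument are hypothetical and are not qwarc's API.

```python
from datetime import datetime

# WARC/1.1 profile URI for revisits whose payload digest matches an earlier record.
IDENTICAL_PAYLOAD_PROFILE = (
    "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest"
)


def format_warc_date(dt):
    """Format a UTC datetime as a W3C ISO 8601 date, keeping microseconds if set."""
    if dt.microsecond:
        return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def revisit_headers(target_uri, capture_time, original):
    """Build a subset of the WARC headers for a revisit record.

    `original` is a hypothetical dict holding the headers of the earlier
    response record that this revisit refers to.
    """
    return {
        "WARC-Type": "revisit",
        "WARC-Target-URI": target_uri,
        "WARC-Date": format_warc_date(capture_time),
        "WARC-Profile": IDENTICAL_PAYLOAD_PROFILE,
        "WARC-Payload-Digest": original["WARC-Payload-Digest"],
        # These two fields identify the original capture; a replay tool has
        # to parse WARC-Refers-To-Date (fraction and all) to find it again.
        "WARC-Refers-To-Target-URI": original["WARC-Target-URI"],
        "WARC-Refers-To-Date": original["WARC-Date"],
    }
```

In the Picosong case, the second widget URL above would be the revisit's `target_uri`, and `original` would hold the headers of the record for the first widget URL.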
18:08:55<Jake>Thanks!
18:10:42<@JAA>Plenty more examples in there of course, but the shortest URLs are all broken due to an unrelated issue. I retrieved https://picosong.com/00 which redirected to https://picosong.com/00/ which redirected to the homepage. But since the WBM ignores that trailing slash, you can't access either. The latter is a revisit record of the former in the WARC because the body is empty. (qwarc no longer does this
18:10:48<@JAA>for empty/short responses because it actually increases the WARC size.)
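A hypothetical illustration of the trailing-slash collision: if the lookup key ignores a trailing slash, as described above for the WBM, then `https://picosong.com/00` and `https://picosong.com/00/` map to the same key even though they are separate captures (the first redirecting to the second, the second to the homepage). `lookup_key` is an assumption for illustration only, not the WBM's actual URL canonicalization.

```python
from urllib.parse import urlsplit, urlunsplit


def lookup_key(url):
    """Hypothetical index key that ignores a trailing slash in the path.

    Also lowercases the host and drops the fragment; the root path "/" is kept.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


if __name__ == "__main__":
    print(lookup_key("https://picosong.com/00"))   # https://picosong.com/00
    print(lookup_key("https://picosong.com/00/"))  # https://picosong.com/00
```

With both captures filed under one key, a lookup for either URL cannot single out the capture for that exact URL.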
18:43:56michaelblob quits [Read error: Connection reset by peer]
18:52:40michaelblob (michaelblob) joins
19:40:23qw3rty joins