00:00:43ericgallager quits [Quit: This computer has gone to sleep]
00:04:29ericgallager joins
00:33:18hamouda quits [Quit: Ooops, wrong browser tab.]
00:34:51<pabs>arkiver: https://dustri.org/b/trivial-anti-crawler-with-caddy.html
00:35:56<pokechu22>Yeah, I've seen stuff like that occasionally
00:36:59h|ca2 quits [Ping timeout: 268 seconds]
00:53:17<yzqzss>https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/#profile-server-not-modified
00:53:17<yzqzss>hi guys, I have a question about the WARC revisit record: Can I use the `server-not-modified` profile for a revisit record across multiple URLs (WARC-Target-URI)?
01:03:12h|ca2 (h) joins
01:03:14<@AlsoJAA>yzqzss: I'm struggling to think of a way for that to happen. Do you have an example? I believe servers should only respond with 304 on conditional requests against the same URI.
01:03:19<yzqzss>Here is the case: I perform a GET request for a.com/1.txt and get a response record that includes an ETag response header. Knowing that b.com/mirror/1.txt is a copy of that file, I then send a GET request to b.com/mirror/1.txt with an If-None-Match: $etag header and, as expected, receive a "304 Not Modified" response. In this scenario, can I replace the response from b.com with a "revisit" record with the "server-not-modified" profile
01:03:19<yzqzss>that uses WARC-Refers-To to point to the response from a.com?
01:04:34<@AlsoJAA>Hmm, that seems like an abuse of the ETag to me.
01:04:36<klea>No, you'd be faking a response that never occurred.
01:04:50<@AlsoJAA>klea: Incorrect.
01:04:53<klea>Hmm?
01:05:24<klea>AFAIK if you do a revisit record you have to query the original resource, regardless of whether it returns the same content.
01:05:42kirb joins
01:05:45<@AlsoJAA>There are different types of revisit records, and yzqzss is specifically asking about the server-not-modified profile.
01:06:22<klea>Ah.
01:06:45<klea>The ETag is only guaranteed to be the same if the server software is the same.
01:06:55<klea>Which considering mirrors, may not be the case.
01:07:23<@AlsoJAA>Yeah. An ETag is an opaque identifier that is only supposed to be used for conditional requests against the same URI ('same resource' is the wording in the HTTP specs).
01:07:52kirb quits [Client Quit]
01:11:32<@AlsoJAA>So since this kind of usage isn't HTTP-conformant, I guess it's never been considered with the revisit profile either.
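The same-URI rule for ETags described above can be sketched in a few lines of Python. This is only an illustration: `ETagStore` and its methods are hypothetical names, the `http://` scheme on the example URIs is assumed, and URIs are compared as plain strings without any normalization.

```python
from typing import Dict


class ETagStore:
    """Remembers the last ETag seen for each URI (hypothetical helper)."""

    def __init__(self) -> None:
        self._etags: Dict[str, str] = {}

    def remember(self, uri: str, etag: str) -> None:
        # Store the validator under the exact URI it was received from.
        self._etags[uri] = etag

    def conditional_headers(self, uri: str) -> Dict[str, str]:
        # Only reuse a validator that was obtained from this same URI;
        # an ETag from another URI is never sent.
        etag = self._etags.get(uri)
        return {"If-None-Match": etag} if etag is not None else {}


store = ETagStore()
store.remember("http://a.com/1.txt", '"abc123"')

same = store.conditional_headers("http://a.com/1.txt")
mirror = store.conditional_headers("http://b.com/mirror/1.txt")
```

Here `same` carries the stored validator, while `mirror` is empty, so the request to the mirror goes out unconditionally and gets a full response.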
01:20:25Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
01:20:32<steering>I would say that you are indeed faking a response that never occurred, but the point at which you're faking it is `If-None-Match: $etag` :P
01:21:17<yzqzss>lol
01:21:52<steering>" An entity tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both." (rfc 9110)
01:22:04<@AlsoJAA>Yeah, but it lands you in front of the HTTP Crime Court, not the WARC Crime Court. :-P
01:22:26<steering>key words being "of the same resource"; the server could legally be giving you a response for some other path since you gave it an etag :P
01:23:50<@AlsoJAA>Section 3.1 basically defines a 'resource' as identified by a URI. It carves out a 'not limited to', but that's what is being used in practice.
01:24:29<@AlsoJAA>The server response is legal, sending the value from another URI in the header is not.
01:24:34<klea>AlsoJAA: But if you don't follow HTTP you can't/shouldn't/mustn't be making WARCs?
01:24:49<steering>AlsoJAA: yeah that's what i mean, it's undefined behavior at that point
01:25:35<@AlsoJAA>Agreed
01:25:36gatagoto (gatagoto) joins
01:25:53<yzqzss>In my use case, the servers all run the same software. Anyway, I’m happy to discuss different scenarios, this is an interesting topic xD
01:25:56<@AlsoJAA>yzqzss: My recommendation would be to stick to defined behaviour and therefore the identical-payload-digest revisits in that scenario.
01:26:20<@AlsoJAA>Having to download everything multiple times is annoying but not really avoidable in that case.
01:27:58<@AlsoJAA>arkiver: ^ Curious what you think about this.
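A minimal sketch of the identical-payload-digest approach recommended above, assuming the mirror's payload has been downloaded in full. The function names and the `original` dict are hypothetical, the record IDs, dates and URIs are made up for illustration, and only the WARC named fields are built (the record block, Content-Type and Content-Length are left out). It is not taken from any particular WARC library.

```python
import base64
import hashlib
import uuid
from typing import Dict, Optional

# Profile URI for identical-payload-digest revisits in WARC 1.1.
PROFILE = "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest"


def payload_digest(payload: bytes) -> str:
    """Return a labelled digest in the common 'sha1:<base32>' form."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def revisit_headers(target_uri: str, date: str, payload: bytes,
                    original: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Build revisit named fields, or None if the payloads differ."""
    digest = payload_digest(payload)
    if digest != original["WARC-Payload-Digest"]:
        return None  # not identical: write a normal response record instead
    return {
        "WARC-Type": "revisit",
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Target-URI": target_uri,
        "WARC-Date": date,
        "WARC-Profile": PROFILE,
        "WARC-Payload-Digest": digest,
        "WARC-Refers-To": original["WARC-Record-ID"],
        "WARC-Refers-To-Target-URI": original["WARC-Target-URI"],
        "WARC-Refers-To-Date": original["WARC-Date"],
    }


body = b"example file contents"
original = {
    "WARC-Record-ID": "<urn:uuid:00000000-0000-0000-0000-000000000001>",
    "WARC-Target-URI": "http://a.com/1.txt",
    "WARC-Date": "2026-01-01T00:00:00Z",
    "WARC-Payload-Digest": payload_digest(body),
}
headers = revisit_headers("http://b.com/mirror/1.txt",
                          "2026-01-01T00:05:00Z", body, original)
changed = revisit_headers("http://b.com/mirror/1.txt",
                          "2026-01-01T00:05:00Z", b"different", original)
```

The digest is computed from bytes that were actually fetched from the mirror, so the record does not depend on an ETag from another URI.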
01:29:03<klea>TODO: Make my website have multi terabyte files for archivists to rechecksum time and time again for the hundred or so mirror urls i'd expose.
01:32:42thewinwin85 joins
01:35:19<steering>archivists: just don't mirror any of them
01:35:28<steering>:P
01:36:26Island quits [Read error: Connection reset by peer]
01:36:41<klea>hmm, yeah, I should shove useful content into some of those files at random offsets, where you have to start doing fun things the file formats wiki may like to know about.
01:36:48thewinwin8 quits [Ping timeout: 268 seconds]
01:36:50thewinwin85 is now known as thewinwin8
01:36:58<steering>doubt it :P
01:37:18<steering>but at that point that's just a disk image :P
01:53:06missaustraliana joins
01:53:06@Fusl quits [Quit: K-Lined]
01:53:16Fusl (Fusl) joins
01:53:16@ChanServ sets mode: +o Fusl
01:54:05JAA (JAA) joins
01:54:05@ChanServ sets mode: +o JAA
02:00:41kirb joins
02:39:13nexussfan quits [Read error: Connection reset by peer]
02:39:28nexussfan (nexussfan) joins
02:39:47nexussfan quits [Client Quit]
02:47:17thewinwin81 joins
02:48:20thewinwin8 quits [Ping timeout: 268 seconds]
02:48:23thewinwin81 is now known as thewinwin8
02:54:04Webuser827187 quits [Quit: Ooops, wrong browser tab.]
03:07:30etnguyen03 quits [Remote host closed the connection]
03:44:06HP_Archivist (HP_Archivist) joins
03:57:39Pendonym quits [Quit: interwebs prob went out]
04:03:03eythian quits [Quit: http://quassel-irc.org - Chat comfortabel. Waar dan ook.]
04:04:35eythian joins
04:05:29Guest58 joins
04:06:55NatTheCat3 (NatTheCat) joins
04:09:07NatTheCat quits [Ping timeout: 268 seconds]
04:09:07NatTheCat3 is now known as NatTheCat
04:16:24Pendonym (Pendonym) joins
04:39:17nine quits [Quit: See ya!]
04:39:30nine joins
04:41:23DogsRNice_ quits [Read error: Connection reset by peer]
05:17:32thewinwin83 joins
05:21:16thewinwin8 quits [Ping timeout: 268 seconds]
05:21:18thewinwin83 is now known as thewinwin8
05:39:09gatagoto quits [Ping timeout: 268 seconds]
05:40:51gatagoto (gatagoto) joins
05:48:32unknownsrc quits [Quit: Ping timeout (120 seconds)]
05:48:45unknownsrc (unknownsrc) joins
06:00:42rohvani quits [Quit: The Lounge - https://thelounge.chat]
06:05:30rohvani joins
06:31:28nine quits [Client Quit]
06:31:42nine joins
06:46:55Pendonym quits [Client Quit]
06:47:07chrismeller3790 (chrismeller) joins
06:47:36chrismeller379 quits [Ping timeout: 268 seconds]
06:47:37chrismeller3790 is now known as chrismeller379
07:09:37Pendonym (Pendonym) joins
08:31:11amphitryon_ joins
08:32:07flugga joins
08:34:10nathang218438 quits [Read error: Connection reset by peer]
08:34:54amphitryon quits [Ping timeout: 268 seconds]
08:42:59nathang218438 joins
08:43:34gatagoto quits [Quit: leaving]
08:50:57Paw-chivist joins
08:57:53camrod63629 quits [Quit: no-clipped reality]
09:00:25camrod63629 (camrod) joins
09:01:04traxys5 (traxys) joins
09:03:53traxys quits [Ping timeout: 268 seconds]
09:04:50traxys (traxys) joins
09:07:35traxys5 quits [Ping timeout: 268 seconds]
09:14:48traxys5 (traxys) joins
09:15:46triplecamera|m is now known as triplecamera
09:16:10triplecamera is now known as triplecamera|m
09:16:25hyperreal8 (hyperreal) joins
09:17:27traxys quits [Ping timeout: 268 seconds]
09:17:42chrismeller3791 (chrismeller) joins
09:18:04hyperreal quits [Ping timeout: 268 seconds]
09:18:04hyperreal8 is now known as hyperreal
09:18:42skankhunt424 (skankhunt42) joins
09:19:01traxys (traxys) joins
09:19:11chrismeller379 quits [Ping timeout: 248 seconds]
09:19:14chrismeller3791 is now known as chrismeller379
09:21:19traxys5 quits [Ping timeout: 248 seconds]
09:21:46skankhunt42 quits [Ping timeout: 268 seconds]
09:21:46skankhunt424 is now known as skankhunt42
09:23:09traxys2 (traxys) joins
09:24:14Paw-chivist quits [Client Quit]
09:24:31traxys quits [Ping timeout: 248 seconds]
09:24:32traxys2 is now known as traxys
09:25:51triplecamera|m quits [Quit: Reconnecting]
09:25:57triplecamera|m (triplecamera|m) joins
10:35:12thewinwin81 joins
10:37:00thewinwin8 quits [Ping timeout: 268 seconds]
10:37:04thewinwin81 is now known as thewinwin8
10:56:35VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
10:57:17VerifiedJ (VerifiedJ) joins
11:00:18Bleo18260072271962345522201107 quits [Quit: The Lounge - https://thelounge.chat]
11:03:00Bleo18260072271962345522201107 joins
11:07:27unknownsrc quits [Ping timeout: 248 seconds]
11:37:34<@arkiver>generally agreed with JAA AlsoJAA there, i want to look a bit closer at the described situation though
11:50:40<h2ibot>Qazwsxplm edited 2004 (-6, /* Events */): https://wiki.archiveteam.org/?diff=62809&oldid=54599
11:51:44<h2ibot>Qazwsxplm edited AOL Pictures (+0): https://wiki.archiveteam.org/?diff=62810&oldid=58435
11:51:45<h2ibot>Qazwsxplm edited Dopplr (+9): https://wiki.archiveteam.org/?diff=62811&oldid=61862
11:51:46<h2ibot>Qazwsxplm edited Yahoo! Briefcase (+48): https://wiki.archiveteam.org/?diff=62812&oldid=59140
11:51:47<h2ibot>Qazwsxplm edited Rec Room (+26): https://wiki.archiveteam.org/?diff=62813&oldid=62804
11:52:44<h2ibot>Thewinwin edited Deathwatch (+84, DataLounge Closing): https://wiki.archiveteam.org/?diff=62814&oldid=62714
11:52:45<h2ibot>Pendonym edited Discord (+128, /* Archival services */): https://wiki.archiveteam.org/?diff=62815&oldid=62680
12:02:08<cruller>arkiver: https://github.com/microlinkhq/is-antibot
12:03:07<cruller>IIRC, Zeno and/or Brozzler also have some antibot detectors.
12:05:48oxtyped quits [Ping timeout: 268 seconds]
12:24:59skankhunt42 quits [Quit: Ping timeout (120 seconds)]
12:25:19skankhunt42 (skankhunt42) joins
12:34:10<h2ibot>User edited ArchiveBot/Cuba (+20): https://wiki.archiveteam.org/?diff=62816&oldid=62285
12:34:11<h2ibot>User edited ArchiveBot/Cuba (-1): https://wiki.archiveteam.org/?diff=62817&oldid=62816
12:34:12<h2ibot>User created Category:Cuba (+0, Created blank page): https://wiki.archiveteam.org/?oldid=62818
12:36:10<h2ibot>User edited ArchiveBot/Cuba (+24): https://wiki.archiveteam.org/?diff=62819&oldid=62817
12:38:11<h2ibot>User edited ArchiveBot/Cuba (+1396, BOT - Updating page: {{saved}} (10),…): https://wiki.archiveteam.org/?diff=62820&oldid=62819
12:43:11<h2ibot>User edited Template:ArchiveBot (-109, Don't understand the problem of a navbox that…): https://wiki.archiveteam.org/?diff=62821&oldid=62470
12:43:12<h2ibot>User edited ArchiveBot/Cuba (+0): https://wiki.archiveteam.org/?diff=62822&oldid=62820
12:43:34nine quits [Quit: See ya!]
12:43:47nine joins
12:45:41<klea>Paw-chivist: The problem with a navbox that adds links and is included in a lot of pages is that every single one of those pages now links to every other page the navbox links to, which makes Special:WhatLinksHere less useful.
12:49:01<justauser>https://datalounge.com/ is shutting down 2026-07-31, without becoming readonly.
12:49:02<justauser>As far as I can tell, it's not trivially spiderable.
12:49:02<justauser>Also, a reminder about Blice.
12:50:21<klea>And re the AB category, yeah, we could include it on the pages themselves instead.
12:56:29oxtyped joins
13:00:51kansei- quits [Quit: ZNC 1.10.1 - https://znc.in]
13:01:16kansei (kansei) joins
13:20:54that_lurker2 joins
13:28:06<h2ibot>Cruller edited Deathwatch (+2, /* 2026-07 */ Corrected copy-and-paste errors): https://wiki.archiveteam.org/?diff=62823&oldid=62814
13:28:28<cruller>Also, a reminder about OPENREC.tv
13:30:06<h2ibot>User edited Deathwatch (+432, /* 2026-08 */): https://wiki.archiveteam.org/?diff=62824&oldid=62823
13:30:39Paw-chivist joins
13:31:44<Paw-chivist>klea: I solved the problem with the script; it was the category added by the navbox. Please don't revert that.
13:34:40Paw-chivist quits [Read error: Connection reset by peer]
13:50:34<h2ibot>Cruller edited Deathwatch (-36, /* 2026-07 */): https://wiki.archiveteam.org/?diff=62825&oldid=62824
14:21:48unknownsrc (unknownsrc) joins
14:31:31dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
14:31:57dxrt joins
14:44:35@imer quits [Quit: Oh no]
14:45:31flugga quits [Quit: euphrates closing]
14:47:09imer (imer) joins
14:47:09@ChanServ sets mode: +o imer
14:48:22DogsRNice joins
15:02:20Island joins
15:06:55@JAA quits [Ping timeout: 248 seconds]
15:08:08JAA (JAA) joins
15:08:08@ChanServ sets mode: +o JAA
15:11:07moth3 quits [Remote host closed the connection]
15:12:38moth3 joins
15:28:30nexussfan (nexussfan) joins
15:43:04Mateon2 joins
15:45:20Mateon1 quits [Ping timeout: 268 seconds]
15:45:20Mateon2 is now known as Mateon1
16:00:15Webuser824872 joins
16:08:45<c3manu>cruller: by accident i found the page https://blog.altairjp.co.jp/ still active, while all other ones are either dead or redirecting to Siemens. it's running in #archivebot now
16:09:06<c3manu>i then checked and also found https://store.altair.co.kr/, but that seems pretty small. but still :)
16:12:17<cruller>c3manu++
16:12:18<eggdrop>[karma] 'c3manu' now has 159 karma!
16:27:31grill_ (grill) joins
16:27:53grill quits [Ping timeout: 268 seconds]
17:02:15nimaje quits [Read error: Connection reset by peer]
17:36:24nimaje joins
17:46:17sg-72 quits [Quit: Leaving]
17:56:41<nicolas17>$ rclone size b2:samsung-oss-n17
17:56:43<nicolas17>Total objects: 12 (12)
17:56:44<nicolas17>Total size: 5.298 GiB (5688953078 Byte)
18:09:19moth3 quits [Ping timeout: 248 seconds]
18:27:14nimaje quits [Client Quit]
18:38:06nimaje joins
18:43:44skankhunt42 quits [Quit: Ping timeout (120 seconds)]
18:44:01skankhunt42 (skankhunt42) joins
18:59:53moth3 joins
19:03:50Wohlstand (Wohlstand) joins
19:24:08<h2ibot>Usernam edited List of websites excluded from the Wayback Machine (+23, SingleFile capture of…): https://wiki.archiveteam.org/?diff=62826&oldid=62688
19:35:43nimaje quits [Ping timeout: 248 seconds]
19:36:26nimaje joins
19:47:43nine quits [Quit: See ya!]
19:47:57nine joins
19:48:04arch quits [Remote host closed the connection]
19:48:21arch (arch) joins
19:57:17Doranwen quits [Remote host closed the connection]
19:57:39Doranwen (Doranwen) joins
20:16:44thewinwin86 joins
20:20:59thewinwin8 quits [Ping timeout: 268 seconds]
20:21:07thewinwin86 is now known as thewinwin8
20:23:43nimaje quits [Ping timeout: 248 seconds]
20:24:43nimaje joins
20:29:06grill_ is now known as grill
20:33:19nimaje quits [Ping timeout: 268 seconds]
20:34:41nimaje joins