| 00:00:43 | | ericgallager quits [Quit: This computer has gone to sleep] |
| 00:04:29 | | ericgallager joins |
| 00:33:18 | | hamouda quits [Quit: Ooops, wrong browser tab.] |
| 00:34:51 | <pabs> | arkiver: https://dustri.org/b/trivial-anti-crawler-with-caddy.html |
| 00:35:56 | <pokechu22> | Yeah, I've seen stuff like that occasionally |
| 00:36:59 | | h|ca2 quits [Ping timeout: 268 seconds] |
| 00:53:17 | <yzqzss> | https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/#profile-server-not-modified |
| 00:53:17 | <yzqzss> | hi guys, I have a question about the WARC revisit record: Can I use the `server-not-modified` profile for revisit records across multiple URLs (WARC-Target-URI)? |
| 01:03:12 | | h|ca2 (h) joins |
| 01:03:14 | <@AlsoJAA> | yzqzss: I'm struggling to think of a way for that to happen. Do you have an example? I believe servers should only respond with 304 on conditional requests against the same URI. |
| 01:03:19 | <yzqzss> | Here is the case: I perform a GET request for a.com/1.txt and get a response record that includes an ETag response header. Knowing that b.com/mirror/1.txt is a copy of that file, I then send a GET request to b.com/mirror/1.txt with an If-None-Match: $etag header and, as expected, receive a "304 Not Modified" response. In this scenario, can I replace the response from b.com with a "revisit" record with "server-not-modified" profile |
| 01:03:19 | <yzqzss> | that uses warc-refers-to to point to the response from a.com ? |
| 01:04:34 | <@AlsoJAA> | Hmm, that seems like an abuse of the ETag to me. |
| 01:04:36 | <klea> | No, you'd be faking a response that never occurred. |
| 01:04:50 | <@AlsoJAA> | klea: Incorrect. |
| 01:04:53 | <klea> | Hmm? |
| 01:05:24 | <klea> | AFAIK If you do a revisit record you have to query the original resource, independently of it returning the same content. |
| 01:05:42 | | kirb joins |
| 01:05:45 | <@AlsoJAA> | There are different types of revisit records, and yzqzss is specifically asking about the server-not-modified profile. |
| 01:06:22 | <klea> | Ah. |
| 01:06:45 | <klea> | The ETag is only guaranteed to be the same if the server software is the same. |
| 01:06:55 | <klea> | Which considering mirrors, may not be the case. |
| 01:07:23 | <@AlsoJAA> | Yeah. An ETag is an opaque identifier that is only supposed to be used for conditional requests against the same URI ('same resource' is the wording in the HTTP specs). |
| 01:07:52 | | kirb quits [Client Quit] |
| 01:11:32 | <@AlsoJAA> | So since this kind of usage isn't HTTP-conformant, I guess it's never been considered with the revisit profile either. |
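The "same resource" point above can be sketched as a validator cache that is keyed by URI, so an ETag is only ever replayed against the URI it came from. This is a minimal illustration, not code from any crawler mentioned here; `ValidatorCache` and its method names are hypothetical.

```python
class ValidatorCache:
    """Hypothetical helper: remembers ETags per URI, never across URIs."""

    def __init__(self):
        # Maps a request URI to the ETag last seen in a response for that URI.
        self._etags = {}

    def remember(self, uri, etag):
        """Store the ETag returned for exactly this URI."""
        self._etags[uri] = etag

    def conditional_headers(self, uri):
        """Return If-None-Match only if this same URI was fetched before."""
        etag = self._etags.get(uri)
        return {"If-None-Match": etag} if etag is not None else {}


cache = ValidatorCache()
cache.remember("http://a.com/1.txt", '"abc123"')

# Same URI: a conditional request is well-defined.
same_uri_headers = cache.conditional_headers("http://a.com/1.txt")

# Mirror URI: no validator is sent, even if the content is known to be a copy.
mirror_headers = cache.conditional_headers("http://b.com/mirror/1.txt")
```

With this keying, a 304 from the mirror can never arise from an ETag borrowed from another host, which keeps the capture inside what the HTTP spec defines.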
| 01:20:25 | | Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…] |
| 01:20:32 | <steering> | I would say that you are indeed faking a response that never occurred, but the point at which you're faking it is `If-None-Match: $etag` :P |
| 01:21:17 | <yzqzss> | lol |
| 01:21:52 | <steering> | " An entity tag is an opaque validator for differentiating between multiple representations of the same resource, regardless of whether those multiple representations are due to resource state changes over time, content negotiation resulting in multiple representations being valid at the same time, or both." (rfc 9110) |
| 01:22:04 | <@AlsoJAA> | Yeah, but it lands you in front of the HTTP Crime Court, not the WARC Crime Court. :-P |
| 01:22:26 | <steering> | key words being "of the same resource"; the server could legally be giving you a response for some other path since you gave it an etag :P |
| 01:23:50 | <@AlsoJAA> | Section 3.1 basically defines a 'resource' as identified by a URI. It carves out a 'not limited to', but that's what is being used in practice. |
| 01:24:29 | <@AlsoJAA> | The server response is legal, sending the value from another URI in the header is not. |
| 01:24:34 | <klea> | AlsoJAA: But if you don't follow HTTP you can't/shouldn't/mustn't be making WARCs? |
| 01:24:49 | <steering> | AlsoJAA: yeah that's what i mean, its undefined behavior at that point |
| 01:25:35 | <@AlsoJAA> | Agreed |
| 01:25:36 | | gatagoto (gatagoto) joins |
| 01:25:53 | <yzqzss> | In my use case, the servers all run the same software. Anyway, I'm happy to discuss different scenarios, this is an interesting topic xD |
| 01:25:56 | <@AlsoJAA> | yzqzss: My recommendation would be to stick to defined behaviour and therefore the identical-payload-digest revisits in that scenario. |
| 01:26:20 | <@AlsoJAA> | Having to download everything multiple times is annoying but not really avoidable in that case. |
| 01:27:58 | <@AlsoJAA> | arkiver: ^ Curious what you think about this. |
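A rough sketch of the identical-payload-digest approach recommended above: download the mirror copy in full, hash the payload, and write a revisit record only if the digest matches an earlier capture. It assumes SHA-1 with Base32 encoding for the digest (a common convention, not mandated by the spec) and the WARC 1.1 field names and profile URI; `DigestIndex`, `payload_digest`, and the record IDs are made up for illustration, and the dicts stand in for real WARC record headers.

```python
import base64
import hashlib

# Profile URI as given in the WARC 1.1 specification.
IDENTICAL_PAYLOAD_DIGEST = (
    "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest"
)


def payload_digest(payload: bytes) -> str:
    """Labelled digest of the payload (assumption: SHA-1, Base32-encoded)."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


class DigestIndex:
    """Hypothetical in-memory index of payload digests seen so far."""

    def __init__(self):
        # digest -> (record ID, target URI, capture date) of the first capture
        self._seen = {}

    def headers_for(self, target_uri, payload, record_id, date):
        """Return WARC-style headers: a response the first time, else a revisit."""
        digest = payload_digest(payload)
        original = self._seen.get(digest)
        if original is None:
            self._seen[digest] = (record_id, target_uri, date)
            return {
                "WARC-Type": "response",
                "WARC-Target-URI": target_uri,
                "WARC-Payload-Digest": digest,
            }
        orig_id, orig_uri, orig_date = original
        return {
            "WARC-Type": "revisit",
            "WARC-Target-URI": target_uri,
            "WARC-Profile": IDENTICAL_PAYLOAD_DIGEST,
            "WARC-Refers-To": orig_id,
            "WARC-Refers-To-Target-URI": orig_uri,
            "WARC-Refers-To-Date": orig_date,
            "WARC-Payload-Digest": digest,
        }


index = DigestIndex()
body = b"same file on both hosts\n"
first = index.headers_for(
    "http://a.com/1.txt", body, "<urn:uuid:0001>", "2026-01-01T00:00:00Z"
)
second = index.headers_for(
    "http://b.com/mirror/1.txt", body, "<urn:uuid:0002>", "2026-01-01T00:05:00Z"
)
```

Here `second` is a revisit that refers back to the a.com capture. The bandwidth cost stays, as noted above, but only one copy of the payload needs to be stored.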
| 01:29:03 | <klea> | TODO: Make my website have multi terabyte files for archivists to rechecksum time and time again for the hundred or so mirror urls i'd expose. |
| 01:32:42 | | thewinwin85 joins |
| 01:35:19 | <steering> | archivists: just don't mirror any of them |
| 01:35:28 | <steering> | :P |
| 01:36:26 | | Island quits [Read error: Connection reset by peer] |
| 01:36:41 | <klea> | hmm, yeah, I should shove useful content into some of those files at random offsets, where you have to start doing fun things the file formats wiki may like to know about. |
| 01:36:48 | | thewinwin8 quits [Ping timeout: 268 seconds] |
| 01:36:50 | | thewinwin85 is now known as thewinwin8 |
| 01:36:58 | <steering> | doubt it :P |
| 01:37:18 | <steering> | but at that point thats just a disk image :P |
| 01:53:06 | | missaustraliana joins |
| 01:53:06 | | @Fusl quits [Quit: K-Lined] |
| 01:53:16 | | Fusl (Fusl) joins |
| 01:53:16 | | @ChanServ sets mode: +o Fusl |
| 01:54:05 | | JAA (JAA) joins |
| 01:54:05 | | @ChanServ sets mode: +o JAA |
| 02:00:41 | | kirb joins |
| 02:39:13 | | nexussfan quits [Read error: Connection reset by peer] |
| 02:39:28 | | nexussfan (nexussfan) joins |
| 02:39:47 | | nexussfan quits [Client Quit] |
| 02:47:17 | | thewinwin81 joins |
| 02:48:20 | | thewinwin8 quits [Ping timeout: 268 seconds] |
| 02:48:23 | | thewinwin81 is now known as thewinwin8 |
| 02:54:04 | | Webuser827187 quits [Quit: Ooops, wrong browser tab.] |
| 03:07:30 | | etnguyen03 quits [Remote host closed the connection] |
| 03:44:06 | | HP_Archivist (HP_Archivist) joins |
| 03:57:39 | | Pendonym quits [Quit: interwebs prob went out] |
| 04:03:03 | | eythian quits [Quit: http://quassel-irc.org - Chat comfortabel. Waar dan ook.] |
| 04:04:35 | | eythian joins |
| 04:05:29 | | Guest58 joins |
| 04:06:55 | | NatTheCat3 (NatTheCat) joins |
| 04:09:07 | | NatTheCat quits [Ping timeout: 268 seconds] |
| 04:09:07 | | NatTheCat3 is now known as NatTheCat |
| 04:16:24 | | Pendonym (Pendonym) joins |
| 04:39:17 | | nine quits [Quit: See ya!] |
| 04:39:30 | | nine joins |
| 04:41:23 | | DogsRNice_ quits [Read error: Connection reset by peer] |
| 05:17:32 | | thewinwin83 joins |
| 05:21:16 | | thewinwin8 quits [Ping timeout: 268 seconds] |
| 05:21:18 | | thewinwin83 is now known as thewinwin8 |
| 05:39:09 | | gatagoto quits [Ping timeout: 268 seconds] |
| 05:40:51 | | gatagoto (gatagoto) joins |
| 05:48:32 | | unknownsrc quits [Quit: Ping timeout (120 seconds)] |
| 05:48:45 | | unknownsrc (unknownsrc) joins |
| 06:00:42 | | rohvani quits [Quit: The Lounge - https://thelounge.chat] |
| 06:05:30 | | rohvani joins |
| 06:31:28 | | nine quits [Client Quit] |
| 06:31:42 | | nine joins |
| 06:46:55 | | Pendonym quits [Client Quit] |
| 06:47:07 | | chrismeller3790 (chrismeller) joins |
| 06:47:36 | | chrismeller379 quits [Ping timeout: 268 seconds] |
| 06:47:37 | | chrismeller3790 is now known as chrismeller379 |
| 07:09:37 | | Pendonym (Pendonym) joins |
| 08:31:11 | | amphitryon_ joins |
| 08:32:07 | | flugga joins |
| 08:34:10 | | nathang218438 quits [Read error: Connection reset by peer] |
| 08:34:54 | | amphitryon quits [Ping timeout: 268 seconds] |
| 08:42:59 | | nathang218438 joins |
| 08:43:34 | | gatagoto quits [Quit: leaving] |
| 08:50:57 | | Paw-chivist joins |
| 08:57:53 | | camrod63629 quits [Quit: no-clipped reality] |
| 09:00:25 | | camrod63629 (camrod) joins |
| 09:01:04 | | traxys5 (traxys) joins |
| 09:03:53 | | traxys quits [Ping timeout: 268 seconds] |
| 09:04:50 | | traxys (traxys) joins |
| 09:07:35 | | traxys5 quits [Ping timeout: 268 seconds] |
| 09:14:48 | | traxys5 (traxys) joins |
| 09:15:46 | | triplecamera|m is now known as triplecamera |
| 09:16:10 | | triplecamera is now known as triplecamera|m |
| 09:16:25 | | hyperreal8 (hyperreal) joins |
| 09:17:27 | | traxys quits [Ping timeout: 268 seconds] |
| 09:17:42 | | chrismeller3791 (chrismeller) joins |
| 09:18:04 | | hyperreal quits [Ping timeout: 268 seconds] |
| 09:18:04 | | hyperreal8 is now known as hyperreal |
| 09:18:42 | | skankhunt424 (skankhunt42) joins |
| 09:19:01 | | traxys (traxys) joins |
| 09:19:11 | | chrismeller379 quits [Ping timeout: 248 seconds] |
| 09:19:14 | | chrismeller3791 is now known as chrismeller379 |
| 09:21:19 | | traxys5 quits [Ping timeout: 248 seconds] |
| 09:21:46 | | skankhunt42 quits [Ping timeout: 268 seconds] |
| 09:21:46 | | skankhunt424 is now known as skankhunt42 |
| 09:23:09 | | traxys2 (traxys) joins |
| 09:24:14 | | Paw-chivist quits [Client Quit] |
| 09:24:31 | | traxys quits [Ping timeout: 248 seconds] |
| 09:24:32 | | traxys2 is now known as traxys |
| 09:24:59 | | triplecamera|m is now authenticated as triplecamera|m |
| 09:25:51 | | triplecamera|m quits [Quit: Reconnecting] |
| 09:25:57 | | triplecamera|m (triplecamera|m) joins |
| 10:35:12 | | thewinwin81 joins |
| 10:37:00 | | thewinwin8 quits [Ping timeout: 268 seconds] |
| 10:37:04 | | thewinwin81 is now known as thewinwin8 |
| 10:56:35 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
| 10:57:17 | | VerifiedJ (VerifiedJ) joins |
| 11:00:18 | | Bleo18260072271962345522201107 quits [Quit: The Lounge - https://thelounge.chat] |
| 11:03:00 | | Bleo18260072271962345522201107 joins |
| 11:07:27 | | unknownsrc quits [Ping timeout: 248 seconds] |
| 11:37:34 | <@arkiver> | generally agreed with JAA AlsoJAA there, i want to look a bit closer at the described situation though |
| 11:50:40 | <h2ibot> | Qazwsxplm edited 2004 (-6, /* Events */): https://wiki.archiveteam.org/?diff=62809&oldid=54599 |
| 11:51:44 | <h2ibot> | Qazwsxplm edited AOL Pictures (+0): https://wiki.archiveteam.org/?diff=62810&oldid=58435 |
| 11:51:45 | <h2ibot> | Qazwsxplm edited Dopplr (+9): https://wiki.archiveteam.org/?diff=62811&oldid=61862 |
| 11:51:46 | <h2ibot> | Qazwsxplm edited Yahoo! Briefcase (+48): https://wiki.archiveteam.org/?diff=62812&oldid=59140 |
| 11:51:47 | <h2ibot> | Qazwsxplm edited Rec Room (+26): https://wiki.archiveteam.org/?diff=62813&oldid=62804 |
| 11:52:44 | <h2ibot> | Thewinwin edited Deathwatch (+84, DataLounge Closing): https://wiki.archiveteam.org/?diff=62814&oldid=62714 |
| 11:52:45 | <h2ibot> | Pendonym edited Discord (+128, /* Archival services */): https://wiki.archiveteam.org/?diff=62815&oldid=62680 |
| 12:02:08 | <cruller> | arkiver: https://github.com/microlinkhq/is-antibot |
| 12:03:07 | <cruller> | IIRC, Zeno or/and Brozzler also have some antibot detectors. |
| 12:05:48 | | oxtyped quits [Ping timeout: 268 seconds] |
| 12:24:59 | | skankhunt42 quits [Quit: Ping timeout (120 seconds)] |
| 12:25:19 | | skankhunt42 (skankhunt42) joins |
| 12:34:10 | <h2ibot> | User edited ArchiveBot/Cuba (+20): https://wiki.archiveteam.org/?diff=62816&oldid=62285 |
| 12:34:11 | <h2ibot> | User edited ArchiveBot/Cuba (-1): https://wiki.archiveteam.org/?diff=62817&oldid=62816 |
| 12:34:12 | <h2ibot> | User created Category:Cuba (+0, Created blank page): https://wiki.archiveteam.org/?oldid=62818 |
| 12:36:10 | <h2ibot> | User edited ArchiveBot/Cuba (+24): https://wiki.archiveteam.org/?diff=62819&oldid=62817 |
| 12:38:11 | <h2ibot> | User edited ArchiveBot/Cuba (+1396, BOT - Updating page: {{saved}} (10),…): https://wiki.archiveteam.org/?diff=62820&oldid=62819 |
| 12:43:11 | <h2ibot> | User edited Template:ArchiveBot (-109, Don't understand the problem of a navbox that…): https://wiki.archiveteam.org/?diff=62821&oldid=62470 |
| 12:43:12 | <h2ibot> | User edited ArchiveBot/Cuba (+0): https://wiki.archiveteam.org/?diff=62822&oldid=62820 |
| 12:43:34 | | nine quits [Quit: See ya!] |
| 12:43:47 | | nine joins |
| 12:45:41 | <klea> | Paw-chivist: The problem with a navbox that adds links that is included in a lot of pages is every single one of those pages now links to every other page the navbox links, which makes Special:WhatLinksHere less useful. |
| 12:49:01 | <justauser> | https://datalounge.com/ is shutting down 2026-07-31, without becoming readonly. |
| 12:49:02 | <justauser> | As far as I can tell, it's not trivially spiderable. |
| 12:49:02 | <justauser> | Also, a reminder about Blice. |
| 12:50:21 | <klea> | And re the AB category, yeah, we could include it on the pages themselves instead. |
| 12:56:29 | | oxtyped joins |
| 13:00:51 | | kansei- quits [Quit: ZNC 1.10.1 - https://znc.in] |
| 13:01:16 | | kansei (kansei) joins |
| 13:20:54 | | that_lurker2 joins |
| 13:28:06 | <h2ibot> | Cruller edited Deathwatch (+2, /* 2026-07 */ Corrected copy-and-paste errors): https://wiki.archiveteam.org/?diff=62823&oldid=62814 |
| 13:28:28 | <cruller> | Also, a reminder about OPENREC.tv |
| 13:30:06 | <h2ibot> | User edited Deathwatch (+432, /* 2026-08 */): https://wiki.archiveteam.org/?diff=62824&oldid=62823 |
| 13:30:39 | | Paw-chivist joins |
| 13:31:44 | <Paw-chivist> | to klea : I solved the problem with the script, it was the category added by navbox, please don't revert that |
| 13:34:40 | | Paw-chivist quits [Read error: Connection reset by peer] |
| 13:50:34 | <h2ibot> | Cruller edited Deathwatch (-36, /* 2026-07 */): https://wiki.archiveteam.org/?diff=62825&oldid=62824 |
| 14:21:48 | | unknownsrc (unknownsrc) joins |
| 14:31:31 | | dxrt quits [Quit: ZNC - http://znc.sourceforge.net] |
| 14:31:57 | | dxrt joins |
| 14:44:35 | | @imer quits [Quit: Oh no] |
| 14:45:31 | | flugga quits [Quit: euphrates closing] |
| 14:47:09 | | imer (imer) joins |
| 14:47:09 | | @ChanServ sets mode: +o imer |
| 14:48:22 | | DogsRNice joins |
| 15:02:20 | | Island joins |
| 15:06:55 | | @JAA quits [Ping timeout: 248 seconds] |
| 15:08:08 | | JAA (JAA) joins |
| 15:08:08 | | @ChanServ sets mode: +o JAA |
| 15:11:07 | | moth3 quits [Remote host closed the connection] |
| 15:12:38 | | moth3 joins |
| 15:28:30 | | nexussfan (nexussfan) joins |
| 15:43:04 | | Mateon2 joins |
| 15:45:20 | | Mateon1 quits [Ping timeout: 268 seconds] |
| 15:45:20 | | Mateon2 is now known as Mateon1 |
| 16:00:15 | | Webuser824872 joins |
| 16:08:45 | <c3manu> | cruller: by accident i found the page https://blog.altairjp.co.jp/ still active, while all other ones are either dead or redirecting to Siemens. it's running in #archivebot now |
| 16:09:06 | <c3manu> | i then checked and also found https://store.altair.co.kr/, but that seems pretty small. but still :) |
| 16:12:17 | <cruller> | c3manu++ |
| 16:12:18 | <eggdrop> | [karma] 'c3manu' now has 159 karma! |
| 16:27:31 | | grill_ (grill) joins |
| 16:27:53 | | grill quits [Ping timeout: 268 seconds] |
| 17:02:15 | | nimaje quits [Read error: Connection reset by peer] |
| 17:36:24 | | nimaje joins |
| 17:46:17 | | sg-72 quits [Quit: Leaving] |
| 17:56:41 | <nicolas17> | $ rclone size b2:samsung-oss-n17 |
| 17:56:43 | <nicolas17> | Total objects: 12 (12) |
| 17:56:44 | <nicolas17> | Total size: 5.298 GiB (5688953078 Byte) |
| 18:09:19 | | moth3 quits [Ping timeout: 248 seconds] |
| 18:27:14 | | nimaje quits [Client Quit] |
| 18:38:06 | | nimaje joins |
| 18:43:44 | | skankhunt42 quits [Quit: Ping timeout (120 seconds)] |
| 18:44:01 | | skankhunt42 (skankhunt42) joins |
| 18:59:53 | | moth3 joins |
| 19:03:50 | | Wohlstand (Wohlstand) joins |