00:02:33xkey quits [Quit: WeeChat 4.7.1]
00:02:45xkey (xkey) joins
00:05:20spirit joins
00:30:35etnguyen03 (etnguyen03) joins
02:03:08Webuser389292 joins
02:05:36Webuser389292 quits [Client Quit]
02:12:54ericgallager joins
02:18:22BearFortress joins
02:20:59BearFortress_ quits [Ping timeout: 260 seconds]
02:44:22cyanbox joins
02:44:53etnguyen03 quits [Client Quit]
02:49:39etnguyen03 (etnguyen03) joins
02:53:26Guest58 joins
03:01:58etnguyen03 quits [Remote host closed the connection]
03:04:00Guest58 quits [Client Quit]
03:46:47Guest58 joins
03:46:56Guest58 quits [Client Quit]
03:54:30Wohlstand (Wohlstand) joins
04:08:30Guest58 joins
04:21:16Guest58 quits [Client Quit]
04:24:32Prokonsul_Piotrus joins
04:24:51Guest58 joins
04:25:13<Prokonsul_Piotrus>hello. I would like to request archival of some websites (Polish fanzines) I found. They are not archived, and they are made of multiple pages; archiving them one by one is a PITA. Can you help?
04:27:21<nicolas17>yes, give links
04:27:25<Prokonsul_Piotrus>one is at https://www.letsgoretro.pl/ second is at https://www.valetz.pl/ third is at https://szortal.pl/
04:28:27<nicolas17>also giving hundreds of individual pages one by one to savepagenow is not just "a PITA", it's something you *shouldn't* do; for example images and scripts used in those pages end up saved hundreds of times
04:28:29<Prokonsul_Piotrus>first is a fan archive of several fanzines and other old stuff (90s/00s); the other two are still working pages for some old fanzines from that era with official archives etc. Valuable stuff for historians of the net, I think
04:29:12Guest58 quits [Client Quit]
04:29:34<Prokonsul_Piotrus>good to know, but not explained well in the web materials that come up in Google searches for such queries, including IA's blogs and such
04:32:57Guest58 joins
04:38:04Guest58 quits [Client Quit]
04:43:34pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
04:46:29Guest58 joins
04:48:26Guest58 quits [Client Quit]
04:51:50Guest58 joins
04:53:55Guest58 quits [Client Quit]
05:16:11Webuser323720 joins
05:16:21<Webuser323720>skibidi sigma
05:17:07Webuser323720 quits [Client Quit]
05:21:25Guest58 joins
05:23:59Guest58 quits [Client Quit]
05:33:03Guest58 joins
05:36:37Guest58 quits [Client Quit]
05:38:03<erkinalp>maybe we should also include e-gov projects: https://koreajoongangdaily.joins.com/news/2025-10-01/national/socialAffairs/NIRS-fire-destroys-governments-cloud-storage-system-no-backups-available/2412936
05:53:23<Prokonsul_Piotrus>while checking some stuff I also noticed a major Polish library of open access works does not seem to be fully archived. Their main page is at https://wolnelektury.pl/ - I hope you can archive it now and repeat it every year or so? It is a Polish Project Gutenberg-like initiative, pretty big, with Wikipedia articles about it and so on
05:54:13<Prokonsul_Piotrus>FYI, the specific subpage of the project I just checked and Wayback Machine reported as never archived is https://wolnelektury.pl/katalog/autor/jarek-zawadzki/
05:57:51Guest58 joins
06:06:44ducky quits [Ping timeout: 260 seconds]
06:08:23Guest58 quits [Client Quit]
06:11:09Guest58 joins
06:13:00Guest58 quits [Client Quit]
06:15:06Guest58 joins
06:16:56Guest58 quits [Client Quit]
06:19:06Guest58 joins
06:21:10Guest58 quits [Client Quit]
06:21:39Guest58 joins
06:22:38ducky (ducky) joins
06:28:03Guest58 quits [Client Quit]
06:29:27Guest58 joins
06:30:13notSokar joins
06:31:33valdikss joins
06:31:49Sokar quits [Ping timeout: 260 seconds]
06:32:16Guest58 quits [Client Quit]
06:33:15Guest58 joins
06:37:32Guest58 quits [Client Quit]
06:42:37Guest58 joins
06:44:41Guest58 quits [Client Quit]
06:59:01<h2ibot>Usernam edited List of websites excluded from the Wayback Machine (+127): https://wiki.archiveteam.org/?diff=57577&oldid=57572
07:09:02<h2ibot>Usernam edited List of websites excluded from the Wayback Machine (+46): https://wiki.archiveteam.org/?diff=57578&oldid=57577
07:27:09<twiswist_>http://homepage.eircom.net/~rechargedflash4/imario.swf HTTP redirects to https://eircomnetwebspace.eir.ie/enable which says: Please note this webspace and its associated information will be permanently removed on the 21st of October 2025. If you are an eircom net webspace user and would like to access your webspace, you can have the webspace temporarily reactivated by logging in below. Please note the webspace will still be
07:27:09<twiswist_>permanently removed on the 21st of October 2025.
07:27:40<twiswist_>That appears to be another ISP web host
07:32:29<twiswist_>1006351 pages of homepage.eircom.net.json on the wbm
07:47:42Webuser915438 joins
07:48:57Webuser915438 quits [Client Quit]
07:53:27<twiswist_>(as in items captured under the domain, not actual homepages)
08:08:09notSokar quits [Client Quit]
08:08:18Sokar joins
08:17:56Guest58 joins
08:28:17twiswist_ quits [Quit: twiswist_]
09:30:52tek_dmn- quits [Quit: ZNC - https://znc.in]
10:28:44StarletCharlotte joins
10:28:48<StarletCharlotte>This might've been archived at some point already, but this site has been decaying since 2005. It used to be the home of an IRC network dedicated to fandoms, particularly video games, and was affiliated with various fan sites from the time that no longer exist. It's... Rather sad. https://dorksnet.org/
10:29:07<StarletCharlotte>(not sure if this is actually urgent, whoops)
10:40:24pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
10:43:17pabs (pabs) joins
10:46:49tek_dmn (tek_dmn) joins
10:47:58StarletCharlotte quits [Client Quit]
11:00:03Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:47Bleo182600722719623455222 joins
11:06:34^ quits [Ping timeout: 260 seconds]
11:09:44^ (^) joins
11:14:41<Hans5958><twiswist_> "http://homepage.eircom.net/~..." <- FYI, eir is noted on the Deathwatch
11:42:53jefferderp quits [Quit: Ooops, wrong browser tab.]
12:22:23kdy quits [Remote host closed the connection]
12:22:35kdy (kdy) joins
12:54:29FiTheArchiver joins
12:55:01FiTheArchiver quits [Client Quit]
13:20:45unicron joins
13:23:29Prokonsul_Piotrus quits [Quit: Ooops, wrong browser tab.]
13:36:58Shard (Shard) joins
13:38:35Webuser461905 joins
13:42:45Webuser461905 quits [Client Quit]
14:03:54Shard quits [Ping timeout: 260 seconds]
14:18:41VerifiedJ quits [Remote host closed the connection]
14:19:17VerifiedJ (VerifiedJ) joins
14:29:36Shard (Shard) joins
14:34:04BearFortress_ joins
14:37:44BearFortress quits [Ping timeout: 260 seconds]
14:42:19Wohlstand1 (Wohlstand) joins
14:42:19Wohlstand quits [Read error: Connection reset by peer]
14:42:19Wohlstand1 is now known as Wohlstand
14:55:06Wohlstand quits [Read error: Connection reset by peer]
14:55:30pedantic-darwin joins
14:59:13Wohlstand (Wohlstand) joins
14:59:47<justauser|m>>for example images and scripts used in those pages end up saved hundreds of times
14:59:48<justauser|m>AFAIK, Wayback won't fetch again if there is a recent capture.
15:21:48kansei quits [Quit: ZNC 1.10.1 - https://znc.in]
15:24:54kansei (kansei) joins
15:30:09unicron quits [Client Quit]
15:41:03<TheTechRobo>It has to, because Brozzler needs to load all the page resources to properly function.
15:45:00<TheTechRobo>Actually warcprox's dedupe is a little nicer than I thought it was. I'm not sure if they use the same dedupe DB across their cluster, but if they do, depending on their configuration, it at least won't be written to WARC again. (Didn't know about the blackout period option, will have to start using that in mnbot.)
15:45:31<TheTechRobo>Depends on how they configured everything.
16:07:34Dango360 quits [Ping timeout: 260 seconds]
16:16:59Dango360 (Dango360) joins
16:29:53<fuzzy80211>datechnoman if no one has asked, need you to search your stash for urls related to #sourceforget
16:34:59Wohlstand quits [Client Quit]
16:38:02BearFortress joins
16:41:25BearFortress_ quits [Ping timeout: 258 seconds]
16:50:32Wohlstand (Wohlstand) joins
16:50:37Wohlstand quits [Client Quit]
16:50:49notarobot174 joins
16:51:05<nicolas17>did anyone archivebot Prokonsul_Piotrus's requests?
16:53:17<justauser|m>ABV says: no.
16:54:49notarobot17 quits [Ping timeout: 260 seconds]
16:54:49notarobot174 is now known as notarobot17
16:56:06notarobot170 joins
17:00:04notarobot17 quits [Ping timeout: 260 seconds]
17:00:04notarobot170 is now known as notarobot17
17:02:03notarobot178 joins
17:04:38Webuser581910 joins
17:04:57Webuser581910 quits [Client Quit]
17:06:15notarobot176 joins
17:06:29notarobot17 quits [Ping timeout: 260 seconds]
17:06:29notarobot176 is now known as notarobot17
17:06:56notarobot176 joins
17:09:31Island joins
17:10:10notarobot178 quits [Ping timeout: 258 seconds]
17:10:10notarobot177 joins
17:10:49<nicolas17>1 done 2 running
17:10:56notarobot17 quits [Ping timeout: 258 seconds]
17:10:56notarobot177 is now known as notarobot17
17:14:00notarobot176 quits [Ping timeout: 258 seconds]
17:24:13Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
17:25:02unicron joins
17:25:07Hackerpcs quits [Ping timeout: 258 seconds]
17:26:58<cruller>From 20250921100711 to 20250921140024, 1,046 pages on https://tver.jp/ were processed by SPN, but the logo image was written to WARC(s) only once. https://web.archive.org/cdx/search/cdx?url=https://tver.jp/images/tver_10_anniversary_logo.svg&from=20250920&to=20250922
17:30:19<cruller>But HTTP messages must have been sent 1,046 times, right?
17:31:56<justauser|m>This might be tested by SPNing from your own server.
17:32:28<pokechu22>I think SPN avoids re-saving images it already saved (without sending any request). Individual archivebot jobs don't re-save images that they've already processed either (though separate archivebot jobs are not aware of each other)
17:32:38<justauser|m>It's not too likely, but possible, that the URL in question is served by some local cache, or by Wayback Machine itself.
17:33:13Hackerpcs (Hackerpcs) joins
17:33:24<justauser|m>pokechu22: TheTechRobo says it has to load the image.
17:33:24<pokechu22>There are also revisit records, but I'm not sure if SPN uses them (you can't download SPN WARCs so you can't really check)
17:33:50<justauser|m>Revisit records are exposed in CDX FWIW.
17:34:15<justauser|m>They have mimetype of "warc/revisit".
17:40:32<TheTechRobo>Yeah, it has to be requested every time, but I realized that warcprox (which I believe is what brozzler is used with in prod) can be configured not to write revisit records for captures that occurred too recently.
17:41:04<TheTechRobo>So AFAIK they still have to be requested, but if it turns out to be the same response, it might not have to be rewritten.
17:45:56Shard quits [Quit: Im doing something rq. Il brb]
17:51:47<h2ibot>Justauser edited Main Page/Current Warrior Project (+2, Default back to Telegram): https://wiki.archiveteam.org/?diff=57579&oldid=57493
17:52:48<cruller>It seems a bit odd not to log anything (not even "warc/revisit") about the http communications that actually took place.
17:52:51Shard (Shard) joins
17:53:31nine- joins
17:53:39Shard quits [Client Quit]
17:53:53nine quits [Read error: Connection reset by peer]
17:53:53nine- is now known as nine
17:53:57nine quits [Changing host]
17:53:57nine (nine) joins
17:54:14<justauser|m>Index bloat is the reason, probably.
17:54:26<cruller>It would be better in practical terms, though.
17:54:57ducky_ (ducky) joins
17:56:10ducky quits [Ping timeout: 258 seconds]
17:56:10ducky_ is now known as ducky
17:58:44Shard (Shard) joins
18:05:18BearFortress_ joins
18:08:49BearFortress quits [Ping timeout: 258 seconds]
18:14:51<h2ibot>Justauser edited Site exploration (+192, update links): https://wiki.archiveteam.org/?diff=57580&oldid=54386
18:53:40cyanbox quits [Remote host closed the connection]
18:54:01cyanbox joins
18:54:46<@JAA>justauser|m: The AB viewer isn't reliable in the short term. It lags by at least hours, not rarely days.
18:55:43hexagonwin quits [Remote host closed the connection]
18:55:55hexagonwin joins
18:55:59yasomimi (yasomi) joins
18:59:04yasomi quits [Ping timeout: 260 seconds]
18:59:04yasomimi is now known as yasomi
19:13:24wingding quits []
19:37:42Island quits [Read error: Connection reset by peer]
19:50:16Webuser739565 joins
19:50:26Webuser739565 quits [Client Quit]
20:07:01<spirit>JAA: thanks re serving webarchive with intact urls, i will try to find some time to test those options
20:07:11spirit quits [Quit: Leaving]
20:45:25DogsRNice joins
20:49:40twiswist (twiswist) joins
21:04:29Shard quits [Quit: Im doing something rq. Il brb]
21:31:19justauser|m leaves [User left]
21:42:06Shard (Shard) joins
21:44:04lennier2_ joins
21:47:04lennier2 quits [Ping timeout: 260 seconds]
21:52:17Shard quits [Client Quit]
21:55:39Shard (Shard) joins
21:56:51etnguyen03 (etnguyen03) joins
22:17:51emphie quits [Remote host closed the connection]
22:18:31emphie joins
22:28:54Shard quits [Client Quit]
22:40:14andrewnyr quits [Quit: The Lounge - https://thelounge.chat]
22:40:30andrewnyr joins
22:52:06nathang21 quits [Ping timeout: 258 seconds]
22:54:53unicron quits [Quit: Connection closed for inactivity]
23:22:58unicron joins
23:32:32etnguyen03 quits [Client Quit]
23:36:44nathang21 joins
23:43:18Island joins
23:49:36nathang21 quits [Ping timeout: 258 seconds]
23:56:01<nicolas17>!seen Prokonsul_Piotrus
23:56:01<eggdrop>[seen] Prokonsul_Piotrus (~Prokonsul@114.203.14.164) was last seen quitting from #archiveteam-bs 10 hours 32 minutes 32 seconds ago (2025-10-06T13:23:29Z), stating "Quit: Ooops, wrong browser tab."