00:02:33 | | xkey quits [Quit: WeeChat 4.7.1] |
00:02:45 | | xkey (xkey) joins |
00:05:20 | | spirit joins |
00:30:35 | | etnguyen03 (etnguyen03) joins |
02:03:08 | | Webuser389292 joins |
02:05:36 | | Webuser389292 quits [Client Quit] |
02:12:54 | | ericgallager joins |
02:18:22 | | BearFortress joins |
02:20:59 | | BearFortress_ quits [Ping timeout: 260 seconds] |
02:44:22 | | cyanbox joins |
02:44:53 | | etnguyen03 quits [Client Quit] |
02:49:39 | | etnguyen03 (etnguyen03) joins |
02:53:26 | | Guest58 joins |
03:01:58 | | etnguyen03 quits [Remote host closed the connection] |
03:04:00 | | Guest58 quits [Client Quit] |
03:46:47 | | Guest58 joins |
03:46:56 | | Guest58 quits [Client Quit] |
03:54:30 | | Wohlstand (Wohlstand) joins |
04:08:30 | | Guest58 joins |
04:21:16 | | Guest58 quits [Client Quit] |
04:24:32 | | Prokonsul_Piotrus joins |
04:24:51 | | Guest58 joins |
04:25:13 | <Prokonsul_Piotrus> | hello. I would like to request archival of some websites (Polish fanzines) I found. They are not archived, and they are made of multiple pages; archiving them one by one is a PITA. Can you help? |
04:27:21 | <nicolas17> | yes, give links |
04:27:25 | <Prokonsul_Piotrus> | one is at https://www.letsgoretro.pl/ second is at https://www.valetz.pl/ third is at https://szortal.pl/ |
04:28:27 | <nicolas17> | also giving hundreds of individual pages one by one to savepagenow is not just "a PITA", it's something you *shouldn't* do; for example images and scripts used in those pages end up saved hundreds of times |
04:28:29 | <Prokonsul_Piotrus> | first is a fan archive of several fanzines and other old stuff (90s/00s); the other two are still working pages for some old fanzines from that era with official archives etc. Valuable stuff for historians of the net, I think |
04:29:12 | | Guest58 quits [Client Quit] |
04:29:34 | <Prokonsul_Piotrus> | good to know, but not explained well in the web materials that come up in Google searches on this, including IA's blogs and such |
04:32:57 | | Guest58 joins |
04:38:04 | | Guest58 quits [Client Quit] |
04:43:34 | | pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat] |
04:46:29 | | Guest58 joins |
04:48:26 | | Guest58 quits [Client Quit] |
04:51:50 | | Guest58 joins |
04:53:55 | | Guest58 quits [Client Quit] |
05:16:11 | | Webuser323720 joins |
05:16:21 | <Webuser323720> | skibidi sigma |
05:17:07 | | Webuser323720 quits [Client Quit] |
05:21:25 | | Guest58 joins |
05:23:59 | | Guest58 quits [Client Quit] |
05:33:03 | | Guest58 joins |
05:36:37 | | Guest58 quits [Client Quit] |
05:38:03 | <erkinalp> | maybe we should also include e-gov projects: https://koreajoongangdaily.joins.com/news/2025-10-01/national/socialAffairs/NIRS-fire-destroys-governments-cloud-storage-system-no-backups-available/2412936 |
05:53:23 | <Prokonsul_Piotrus> | while checking some stuff I also noticed a major Polish library of open access works does not seem to be fully archived. Their main page is at https://wolnelektury.pl/ - I hope you can archive it now and repeat it every year or so? It is a Polish Project Gutenberg-like initiative, pretty big, with Wikipedia articles about it and so on |
05:54:13 | <Prokonsul_Piotrus> | FYI, the specific subpage of the project I just checked and Wayback Machine reported as never archived is https://wolnelektury.pl/katalog/autor/jarek-zawadzki/ |
05:57:51 | | Guest58 joins |
06:06:44 | | ducky quits [Ping timeout: 260 seconds] |
06:08:23 | | Guest58 quits [Client Quit] |
06:11:09 | | Guest58 joins |
06:13:00 | | Guest58 quits [Client Quit] |
06:15:06 | | Guest58 joins |
06:16:56 | | Guest58 quits [Client Quit] |
06:19:06 | | Guest58 joins |
06:21:10 | | Guest58 quits [Client Quit] |
06:21:39 | | Guest58 joins |
06:22:38 | | ducky (ducky) joins |
06:28:03 | | Guest58 quits [Client Quit] |
06:29:27 | | Guest58 joins |
06:30:13 | | notSokar joins |
06:31:33 | | valdikss joins |
06:31:49 | | Sokar quits [Ping timeout: 260 seconds] |
06:32:16 | | Guest58 quits [Client Quit] |
06:33:15 | | Guest58 joins |
06:37:32 | | Guest58 quits [Client Quit] |
06:42:37 | | Guest58 joins |
06:44:41 | | Guest58 quits [Client Quit] |
06:59:01 | <h2ibot> | Usernam edited List of websites excluded from the Wayback Machine (+127): https://wiki.archiveteam.org/?diff=57577&oldid=57572 |
07:09:02 | <h2ibot> | Usernam edited List of websites excluded from the Wayback Machine (+46): https://wiki.archiveteam.org/?diff=57578&oldid=57577 |
07:27:09 | <twiswist_> | http://homepage.eircom.net/~rechargedflash4/imario.swf HTTP redirects to https://eircomnetwebspace.eir.ie/enable which says: Please note this webspace and its associated information will be permanently removed on the 21st of October 2025. If you are an eircom net webspace user and would like to access your webspace, you can have the webspace temporarily reactivated by logging in below. Please note the webspace will still be |
07:27:09 | <twiswist_> | permanently removed on the 21st of October 2025. |
07:27:40 | <twiswist_> | That appears to be another ISP web host |
07:32:29 | <twiswist_> | 1006351 pages of homepage.eircom.net.json on the wbm |
07:47:42 | | Webuser915438 joins |
07:48:57 | | Webuser915438 quits [Client Quit] |
07:53:27 | <twiswist_> | (as in items captured under the domain, not actual homepages) |
08:08:09 | | notSokar quits [Client Quit] |
08:08:18 | | Sokar joins |
08:17:56 | | Guest58 joins |
08:28:17 | | twiswist_ quits [Quit: twiswist_] |
09:30:52 | | tek_dmn- quits [Quit: ZNC - https://znc.in] |
10:28:44 | | StarletCharlotte joins |
10:28:48 | <StarletCharlotte> | This might've been archived at some point already, but this site has been decaying since 2005. It used to be the home of an IRC network dedicated to fandoms, particularly video games, and was affiliated with various fan sites from the time that no longer exist. It's... Rather sad. https://dorksnet.org/ |
10:29:07 | <StarletCharlotte> | (not sure if this is actually urgent, whoops) |
10:40:24 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
10:43:17 | | pabs (pabs) joins |
10:46:49 | | tek_dmn (tek_dmn) joins |
10:47:58 | | StarletCharlotte quits [Client Quit] |
11:00:03 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:47 | | Bleo182600722719623455222 joins |
11:06:34 | | ^ quits [Ping timeout: 260 seconds] |
11:09:44 | | ^ (^) joins |
11:14:41 | <Hans5958> | <twiswist_> "http://homepage.eircom.net/~..." <- FYI, eir is noted on the Deathwatch |
11:42:53 | | jefferderp quits [Quit: Ooops, wrong browser tab.] |
12:22:23 | | kdy quits [Remote host closed the connection] |
12:22:35 | | kdy (kdy) joins |
12:54:29 | | FiTheArchiver joins |
12:55:01 | | FiTheArchiver quits [Client Quit] |
13:20:45 | | unicron joins |
13:23:29 | | Prokonsul_Piotrus quits [Quit: Ooops, wrong browser tab.] |
13:36:58 | | Shard (Shard) joins |
13:38:35 | | Webuser461905 joins |
13:42:45 | | Webuser461905 quits [Client Quit] |
14:03:54 | | Shard quits [Ping timeout: 260 seconds] |
14:18:41 | | VerifiedJ quits [Remote host closed the connection] |
14:19:17 | | VerifiedJ (VerifiedJ) joins |
14:29:36 | | Shard (Shard) joins |
14:34:04 | | BearFortress_ joins |
14:37:44 | | BearFortress quits [Ping timeout: 260 seconds] |
14:42:19 | | Wohlstand1 (Wohlstand) joins |
14:42:19 | | Wohlstand quits [Read error: Connection reset by peer] |
14:42:19 | | Wohlstand1 is now known as Wohlstand |
14:55:06 | | Wohlstand quits [Read error: Connection reset by peer] |
14:55:30 | | pedantic-darwin joins |
14:59:13 | | Wohlstand (Wohlstand) joins |
14:59:47 | <justauser|m> | >for example images and scripts used in those pages end up saved hundreds of times |
14:59:48 | <justauser|m> | AFAIK, Wayback won't fetch again if there is a recent capture. |
15:21:48 | | kansei quits [Quit: ZNC 1.10.1 - https://znc.in] |
15:24:54 | | kansei (kansei) joins |
15:30:09 | | unicron quits [Client Quit] |
15:41:03 | <TheTechRobo> | It has to, because Brozzler needs to load all the page resources to properly function. |
15:45:00 | <TheTechRobo> | Actually warcprox's dedupe is a little nicer than I thought it was. I'm not sure if they use the same dedupe DB across their cluster, but if they do, depending on their configuration, it at least won't be written to WARC again. (Didn't know about the blackout period option, will have to start using that in mnbot.) |
15:45:31 | <TheTechRobo> | Depends on how they configured everything. |
16:07:34 | | Dango360 quits [Ping timeout: 260 seconds] |
16:16:59 | | Dango360 (Dango360) joins |
16:29:53 | <fuzzy80211> | datechnoman if no one has asked, need you to search your stash for urls related to #sourceforget |
16:34:59 | | Wohlstand quits [Client Quit] |
16:38:02 | | BearFortress joins |
16:41:25 | | BearFortress_ quits [Ping timeout: 258 seconds] |
16:50:32 | | Wohlstand (Wohlstand) joins |
16:50:37 | | Wohlstand quits [Client Quit] |
16:50:49 | | notarobot174 joins |
16:51:05 | <nicolas17> | did anyone archivebot Prokonsul_Piotrus's requests? |
16:53:17 | <justauser|m> | ABV says: no. |
16:54:49 | | notarobot17 quits [Ping timeout: 260 seconds] |
16:54:49 | | notarobot174 is now known as notarobot17 |
16:56:06 | | notarobot170 joins |
17:00:04 | | notarobot17 quits [Ping timeout: 260 seconds] |
17:00:04 | | notarobot170 is now known as notarobot17 |
17:02:03 | | notarobot178 joins |
17:04:38 | | Webuser581910 joins |
17:04:57 | | Webuser581910 quits [Client Quit] |
17:06:15 | | notarobot176 joins |
17:06:29 | | notarobot17 quits [Ping timeout: 260 seconds] |
17:06:29 | | notarobot176 is now known as notarobot17 |
17:06:56 | | notarobot176 joins |
17:09:31 | | Island joins |
17:10:10 | | notarobot178 quits [Ping timeout: 258 seconds] |
17:10:10 | | notarobot177 joins |
17:10:49 | <nicolas17> | 1 done 2 running |
17:10:56 | | notarobot17 quits [Ping timeout: 258 seconds] |
17:10:56 | | notarobot177 is now known as notarobot17 |
17:14:00 | | notarobot176 quits [Ping timeout: 258 seconds] |
17:24:13 | | Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…] |
17:25:02 | | unicron joins |
17:25:07 | | Hackerpcs quits [Ping timeout: 258 seconds] |
17:26:58 | <cruller> | From 20250921100711 to 20250921140024, 1,046 pages on https://tver.jp/ were processed by SPN, but the logo image was written to WARC(s) only once. https://web.archive.org/cdx/search/cdx?url=https://tver.jp/images/tver_10_anniversary_logo.svg&from=20250920&to=20250922 |
17:30:19 | <cruller> | But HTTP messages must have been sent 1,046 times, right? |
17:31:56 | <justauser|m> | This might be tested by SPNing from your own server. |
17:32:28 | <pokechu22> | I think SPN avoids re-saving images it already saved (without sending any request). Individual archivebot jobs don't re-save images that they've already processed either (though separate archivebot jobs are not aware of each other) |
17:32:38 | <justauser|m> | It's not too likely, but possible, that the URL in question is served by some local cache, or by Wayback Machine itself. |
17:33:13 | | Hackerpcs (Hackerpcs) joins |
17:33:24 | <justauser|m> | pokechu22: TheTechRobo says it has to load the image. |
17:33:24 | <pokechu22> | There are also revisit records, but I'm not sure if SPN uses them (you can't download SPN WARCs so you can't really check) |
17:33:50 | <justauser|m> | Revisit records are exposed in CDX FWIW. |
17:34:15 | <justauser|m> | They have mimetype of "warc/revisit". |
17:40:32 | <TheTechRobo> | Yeah, it has to be requested every time, but I realized that warcprox (which I believe is what brozzler is used with in prod) can be configured not to write revisit records for captures that occurred too recently. |
17:41:04 | <TheTechRobo> | So AFAIK they still have to be requested, but if it turns out to be the same response, it might not have to be rewritten. |
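[Editor's note] The CDX check discussed above (counting how often a resource was written in full versus recorded as a "warc/revisit") can be sketched roughly as follows. This is only an illustration: the endpoint and the `url`/`from`/`to` parameters are taken from the query cruller posted, the field order is the CDX server's default plain-text layout, and the helper names and the `SAMPLE` rows (including the example.com URL and digest) are hypothetical, not real capture data. It shows only what the index exposes; it cannot tell whether an HTTP request was actually sent.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Default field order of the CDX server's space-separated text output.
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length")


def build_cdx_url(url, start, end):
    """Build a CDX query URL for one resource within a timestamp range."""
    return CDX_ENDPOINT + "?" + urlencode({"url": url, "from": start, "to": end})


def parse_cdx(text):
    """Parse plain-text CDX output into a list of dicts, skipping odd lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(CDX_FIELDS):
            rows.append(dict(zip(CDX_FIELDS, parts)))
    return rows


def count_by_kind(rows):
    """Count full captures versus revisit records in parsed CDX rows."""
    revisits = sum(1 for row in rows if row["mimetype"] == "warc/revisit")
    return {"full": len(rows) - revisits, "revisit": revisits}


# Hypothetical sample rows, made up for illustration only.
SAMPLE = (
    "com,example)/logo.svg 20250921100711 https://example.com/logo.svg "
    "image/svg+xml 200 HYPOTHETICALDIGEST 1234\n"
    "com,example)/logo.svg 20250921110000 https://example.com/logo.svg "
    "warc/revisit - HYPOTHETICALDIGEST 456\n"
)

if __name__ == "__main__":
    print(build_cdx_url("https://example.com/logo.svg", "20250920", "20250922"))
    print(count_by_kind(parse_cdx(SAMPLE)))
```

To run this against real data, fetch the URL from `build_cdx_url` with any HTTP client and pass the response body to `parse_cdx`. A single "full" row and no "revisit" rows would match what cruller observed for the tver.jp logo.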
17:45:56 | | Shard quits [Quit: Im doing something rq. Il brb] |
17:51:47 | <h2ibot> | Justauser edited Main Page/Current Warrior Project (+2, Default back to Telegram): https://wiki.archiveteam.org/?diff=57579&oldid=57493 |
17:52:48 | <cruller> | It seems a bit odd not to log anything (not even "warc/revisit") about the http communications that actually took place. |
17:52:51 | | Shard (Shard) joins |
17:53:31 | | nine- joins |
17:53:39 | | Shard quits [Client Quit] |
17:53:53 | | nine quits [Read error: Connection reset by peer] |
17:53:53 | | nine- is now known as nine |
17:53:57 | | nine is now authenticated as nine |
17:53:57 | | nine quits [Changing host] |
17:53:57 | | nine (nine) joins |
17:54:14 | <justauser|m> | Index bloat is the reason, probably. |
17:54:26 | <cruller> | It would be better in practical terms, though. |
17:54:57 | | ducky_ (ducky) joins |
17:56:10 | | ducky quits [Ping timeout: 258 seconds] |
17:56:10 | | ducky_ is now known as ducky |
17:58:44 | | Shard (Shard) joins |
18:05:18 | | BearFortress_ joins |
18:08:49 | | BearFortress quits [Ping timeout: 258 seconds] |
18:14:51 | <h2ibot> | Justauser edited Site exploration (+192, update links): https://wiki.archiveteam.org/?diff=57580&oldid=54386 |
18:53:40 | | cyanbox quits [Remote host closed the connection] |
18:54:01 | | cyanbox joins |
18:54:46 | <@JAA> | justauser|m: The AB viewer isn't reliable in the short term. It lags by at least hours, not rarely days. |
18:55:43 | | hexagonwin quits [Remote host closed the connection] |
18:55:55 | | hexagonwin joins |
18:55:59 | | yasomimi (yasomi) joins |
18:59:04 | | yasomi quits [Ping timeout: 260 seconds] |
18:59:04 | | yasomimi is now known as yasomi |
19:13:24 | | wingding quits [] |
19:37:42 | | Island quits [Read error: Connection reset by peer] |
19:50:16 | | Webuser739565 joins |
19:50:26 | | Webuser739565 quits [Client Quit] |
20:07:01 | <spirit> | JAA: thanks re serving webarchive with intact urls, i will try to find some time to test those options |
20:07:11 | | spirit quits [Quit: Leaving] |
20:45:25 | | DogsRNice joins |
20:49:40 | | twiswist (twiswist) joins |
21:04:29 | | Shard quits [Quit: Im doing something rq. Il brb] |
21:31:19 | | justauser|m leaves [User left] |
21:42:06 | | Shard (Shard) joins |
21:44:04 | | lennier2_ joins |
21:47:04 | | lennier2 quits [Ping timeout: 260 seconds] |
21:52:17 | | Shard quits [Client Quit] |
21:55:39 | | Shard (Shard) joins |
21:56:51 | | etnguyen03 (etnguyen03) joins |
22:17:51 | | emphie quits [Remote host closed the connection] |
22:18:31 | | emphie joins |
22:28:54 | | Shard quits [Client Quit] |
22:40:14 | | andrewnyr quits [Quit: The Lounge - https://thelounge.chat] |
22:40:30 | | andrewnyr joins |
22:52:06 | | nathang21 quits [Ping timeout: 258 seconds] |
22:54:53 | | unicron quits [Quit: Connection closed for inactivity] |
23:22:58 | | unicron joins |
23:32:32 | | etnguyen03 quits [Client Quit] |
23:36:44 | | nathang21 joins |
23:43:18 | | Island joins |
23:49:36 | | nathang21 quits [Ping timeout: 258 seconds] |
23:56:01 | <nicolas17> | !seen Prokonsul_Piotrus |
23:56:01 | <eggdrop> | [seen] Prokonsul_Piotrus (~Prokonsul@114.203.14.164) was last seen quitting from #archiveteam-bs 10 hours 32 minutes 32 seconds ago (2025-10-06T13:23:29Z), stating "Quit: Ooops, wrong browser tab." |