00:12:04hackbug quits [Remote host closed the connection]
00:14:54tekulvw (tekulvw) joins
00:16:27hackbug (hackbug) joins
00:19:14Dada quits [Remote host closed the connection]
00:22:00tekulvw quits [Ping timeout: 268 seconds]
00:27:32tekulvw (tekulvw) joins
00:32:37tekulvw quits [Ping timeout: 272 seconds]
00:36:52tekulvw (tekulvw) joins
00:36:52<klea> https://discourse.nixos.org/t/garbage-collecting-cache-nixos-org/74249/10
00:37:05<klea>Tonight we enabled Bucket Versioning and configured a Lifecycle Rule to delete the non-default object version after 30 days. See enable bucket versioning · NixOS/infra@9cf1919 · GitHub for details.
00:37:05<klea>Then we deleted everything contained in the following datasets. This should amount to roughly 100 TiB, so something like 10% of the total S3 size.
00:37:05<klea> datasets/narinfos-nixos-images-2026-01-06T01-13Z.parquet · brianmcgee/nix-cache-dataset at main
00:37:05<klea> datasets/narinfos-nixos-images-dangling-refs-2026-01-06T01-13Z.parquet · brianmcgee/nix-cache-dataset at main
00:37:05<klea>The result is that the default version served by cache.nixos.org will return a HTTP 404 response. The non-default version is still around and can be restored within the next 30 days should the deletion cause severe issues.
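The quoted announcement describes two settings: bucket versioning, plus a lifecycle rule that expires non-current object versions after 30 days. A minimal sketch of what such a configuration could look like through boto3 follows. The linked NixOS/infra commit is not reproduced here, so the rule ID, the empty prefix filter, and the helper names are assumptions for illustration only; `put_bucket_versioning` and `put_bucket_lifecycle_configuration` are the standard boto3 S3 client calls.

```python
def noncurrent_expiry_rule(days=30, rule_id="expire-noncurrent-versions"):
    """Build a lifecycle configuration that expires non-current versions.

    rule_id and the catch-all prefix filter are hypothetical choices;
    only the 30-day figure comes from the quoted announcement.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every key
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


def apply_to_bucket(bucket):
    """Enable versioning and attach the rule (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=noncurrent_expiry_rule(),
    )
```

With versioning on, a delete only adds a delete marker as the current version; the previous data becomes non-current and is what the rule removes once the 30 days have passed.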
00:40:30<klea>Tho, since it's a cache, shouldn't affect much?
00:41:31@JAA doesn't know enough about how any of that works.
00:41:34<klea>hexa-: for things whose upstream sources have linkrotted but whose source code is still in the cache, is there a way to get that source code archived from the cache, or to have the cache avoid losing it?
00:42:19<@JAA>Does 'we deleted' mean it's already gone from public view and restorable until next month, or is it still publicly accessible until then?
00:43:26<klea>I suppose it means it's not the latest version, and would 404 from cache.nixos.org, but would still work if you do an aws s3 signed request with requester-pays to the bucket asking for the older revision?
00:45:02<nicolas17>I think if a file gets "overwritten" today, the old version will get deleted after 30 days
00:45:09<nicolas17>oh wait
00:45:13<nicolas17>those were different steps
00:46:03tekulvw quits [Ping timeout: 268 seconds]
00:46:17<nicolas17>yeah ok, they "deleted" a bunch of files that they determined through external processes (dangling refs?), but they're still retrievable via versioning
00:46:23<nicolas17>for the next 30 days
00:46:28<nicolas17>maybe not publicly
00:47:37<nicolas17>is the bucket public?
00:48:10<klea>yes, but requires requester-pays.
00:48:38<nicolas17>what's the bucket name?
00:49:44<klea>I suppose cache.nixos.org?
00:50:00<nicolas17>seems it's https://s3.amazonaws.com/nix-cache
00:51:29<nicolas17>listing is disabled anyway
00:51:47<klea>There's also https://s3.amazonaws.com/nix-channels
00:52:05<klea>2026-02-21 00:51:29 <nicolas17> listing is disabled anyway <- The parquet file (database format) should contain the removed data?
00:52:34<nicolas17>if file listing is disabled, I suspect I also can't say "list versions of file X"
00:52:46<nicolas17>unless the parquet file has the version IDs?
00:52:50<klea>https://blog.erethon.com/blog/2025/07/31/how-nixos-is-built/ was interesting.
00:53:11<klea>I don't know, I should try to check the parquet file.
00:53:32<nicolas17>I have no idea how to read parquet :P
00:53:54<nicolas17>lol @ using huggingface for this
00:54:01<klea>I mean, it's a CDN :p
00:56:07<klea>https://transfer.archivete.am/inline/iWYaN/2026-02-21T00:55:40Z--console.txt
00:59:36<klea>nicolas17: there's a python thing called parquet-tools apparently.
01:00:14<klea>otherwise, no it doesn't seem to include version ids?
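For the question of how to read the parquet file and whether it carries version IDs, a rough sketch in Python: the first helper uses only the standard library to confirm a file has the Parquet layout (the 4-byte `PAR1` magic at both ends, with a little-endian footer length just before the trailing magic), and the second lists column names via `pyarrow.parquet.read_schema`, assuming pyarrow is installed. The helper names and the file path in the usage note are hypothetical, and nothing here asserts what columns the dataset actually has.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(path):
    """Check the PAR1 magic at both ends and return the footer length in bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, 2)  # last 8 bytes: 4-byte footer length + 4-byte magic
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    return struct.unpack("<I", tail[:4])[0]


def column_names(path):
    """List the column names in the schema (requires the third-party pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily; not part of the stdlib

    return pq.read_schema(path).names
```

Calling `column_names("narinfos.parquet")` on a downloaded copy (hypothetical local filename) and looking for a version-related column would be one way to settle the question without loading the whole dataset.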
01:09:31tekulvw (tekulvw) joins
01:14:25tekulvw quits [Ping timeout: 268 seconds]
01:16:32Wohlstand quits [Quit: Wohlstand]
01:26:51Cupping1285 quits [Quit: bye]
01:27:45Cupping1285 joins
01:39:03Arcorann_ (Arcorann) joins
01:42:10Arcorann quits [Ping timeout: 268 seconds]
02:25:27<h2ibot>Hans5958 created Roblox Groups (+57, Redirected page to [[Roblox#Group Walls…): https://wiki.archiveteam.org/?oldid=60532
03:22:27tekulvw (tekulvw) joins
03:27:00tekulvw quits [Ping timeout: 268 seconds]
04:02:35<pabs>https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures/
04:02:44<pabs>woops, already posted
04:05:14lennier2 quits [Ping timeout: 268 seconds]
04:06:04lennier2 joins
04:14:36etnguyen03 quits [Remote host closed the connection]
04:18:05nexussfan quits [Read error: Connection reset by peer]
04:35:10Bog joins
04:37:29Bog quits [Client Quit]
05:02:50rover joins
05:04:57roverinexile quits [Ping timeout: 272 seconds]
05:04:57n9nes quits [Ping timeout: 272 seconds]
05:05:12n9nes joins
05:16:25tekulvw (tekulvw) joins
05:18:05<tmg1|michelson>a few hours later, opendiary still full of bad responses
05:21:05tekulvw quits [Ping timeout: 268 seconds]
05:40:34Stvkimension11 (Stvkimension11) joins
05:49:03tekulvw (tekulvw) joins
05:51:02Stvkimension11 quits [Client Quit]
05:53:43tekulvw quits [Ping timeout: 272 seconds]
06:02:18<steering>> roughly 100 TiB ... 10% of the total size.
06:02:29<steering>W. A. T.
06:02:49<BlankEclair>a lil chonker
06:19:03midou quits [Ping timeout: 272 seconds]
06:31:59midou joins
06:36:52aliz joins
06:37:16Island quits [Read error: Connection reset by peer]
06:58:57aliz quits [Client Quit]
07:16:19<hexa->JAA: ig by passing version id for the object
07:18:14<hexa->the version id for all old objects is null fwiw
07:23:13<hexa->https://gist.github.com/Mic92/7bcacea70a8babf327e45dc102489445
07:24:49<hexa->what got deleted is things we really don't need anymore, like images created for nixos tests
07:24:59<hexa->and also old installers iirc
07:30:40<hexa->oh, I think they're not queryable over the fastly cache, likely due to missing permissions
07:33:52<hexa->or maybe delete markers just shadow everything over the s3 web api, dunno
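A sketch of the "pass the version id" idea in boto3 terms, assuming the bucket name guessed earlier in the log (`nix-cache`), the `null` version ID mentioned above for pre-versioning objects, and requester-pays access. The linked gist is not reproduced here, the function names are hypothetical, and whether outside callers hold permission to read non-current versions is not established by this log. `Bucket`, `Key`, `VersionId` and `RequestPayer` are real `get_object` parameters.

```python
def versioned_get_kwargs(key, version_id="null", bucket="nix-cache"):
    """Build get_object arguments for a specific object version.

    bucket defaults to the name guessed in the log; version_id defaults
    to "null", the ID objects have if written before versioning was enabled.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "VersionId": version_id,
        "RequestPayer": "requester",  # caller accepts the transfer charges
    }


def fetch_old_version(key, version_id="null"):
    """Download one object version (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily

    s3 = boto3.client("s3")
    response = s3.get_object(**versioned_get_kwargs(key, version_id))
    return response["Body"].read()
```

Requests without a `VersionId` resolve to the current version, which for a deleted key is the delete marker, so they return 404 even while the older data still exists.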
07:37:11ducky quits [Remote host closed the connection]
07:41:00ducky (ducky) joins
07:41:08SootBector quits [Remote host closed the connection]
07:42:18SootBector (SootBector) joins
07:50:31ducky quits [Remote host closed the connection]
07:54:15ducky (ducky) joins
07:57:13lflare quits [Ping timeout: 272 seconds]
07:58:56lflare (lflare) joins
08:00:38tekulvw (tekulvw) joins
08:04:53ducky_ (ducky) joins
08:05:44tekulvw quits [Ping timeout: 268 seconds]
08:07:21ducky quits [Ping timeout: 272 seconds]
08:09:53ducky_ quits [Ping timeout: 272 seconds]
08:13:41lflare quits [Ping timeout: 272 seconds]
08:15:46lflare (lflare) joins
08:21:45ducky (ducky) joins
08:26:33ducky quits [Remote host closed the connection]
08:28:15lflare quits [Client Quit]
08:38:34lflare (lflare) joins
08:47:44AlsoHP_Archivist joins
08:48:31HP_Archivist quits [Ping timeout: 272 seconds]
09:01:56lflare quits [Client Quit]
09:09:40ducky (ducky) joins
09:27:15Nekroschizofrenetyk joins
09:28:26<Nekroschizofrenetyk>Hi, I'm trying to archive (to IA) some pages off this site but I'm getting 520s and 503s. Could somebody check, if it's me or if it's unarchiveable? https://www.olawsky.de/schlesien/forum.html
09:32:14bilboed08 joins
09:32:32SootBector quits [Remote host closed the connection]
09:32:35sec^nd quits [Remote host closed the connection]
09:33:05sec^nd (second) joins
09:33:43SootBector (SootBector) joins
09:35:07lflare (lflare) joins
09:36:01bilboed0 quits [Ping timeout: 272 seconds]
09:36:10Nekroschizofrenetyk quits [Client Quit]
09:39:11bilboed08 quits [Ping timeout: 272 seconds]
09:50:34lflare quits [Ping timeout: 268 seconds]
09:54:41lflare (lflare) joins
09:56:56SootBector quits [Remote host closed the connection]
09:58:04SootBector (SootBector) joins
10:19:05lflare quits [Ping timeout: 272 seconds]
10:21:43lflare (lflare) joins
10:22:21sec^nd quits [Remote host closed the connection]
10:22:22bilboed0 joins
10:22:47sec^nd (second) joins
10:36:48lennier2_ joins
10:36:49lennier2 quits [Ping timeout: 272 seconds]
10:39:02tekulvw (tekulvw) joins
10:43:47tekulvw quits [Ping timeout: 272 seconds]
10:46:09Dada joins
11:00:52midou quits [Ping timeout: 268 seconds]
11:09:10TheEnbyperor_ quits [Read error: Connection reset by peer]
11:09:29TheEnbyperor_ (TheEnbyperor) joins
11:10:26linuxgemini1 (linuxgemini) joins
11:10:43midou joins
11:11:21linuxgemini quits [Ping timeout: 268 seconds]
11:11:22linuxgemini1 is now known as linuxgemini
11:11:58lflare quits [Ping timeout: 268 seconds]
11:12:07lflare (lflare) joins
11:46:40Aurora joins
11:48:08<Aurora>hi, havent checked on this in a while, just making sure you guys received my 11.3 million jsons, not sure if it went through
11:48:34<Aurora>the tenor ones, i thought i was in that channel, my bad
12:00:00Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat]
12:02:46Bleo1826007227196234552220 joins
12:08:15<justauser>Aurora: We definitely got the list. Not doing anything for Tenor for now.
12:08:15<justauser>Nekroschizofrenetyk: Worksforme. Want an AB run?
12:10:07<Aurora>thank you for the confirmation! good luck with the rest of the work
12:11:32<Aurora>wait i just noticed, you said list, but i also uploaded all of the jsons to IA later on
12:11:52<Aurora>https://archive.org/details/tenor-legacyids-json in case you didnt get those
12:11:57<justauser>I don't think we need them for anything.
12:12:34<Aurora>got it, theyre there if you do
12:13:57<justauser>!tell Nekroschizofrenetyk https://www.olawsky.de/schlesien/forum.html works for me. Want an AB run?
12:13:59<eggdrop>[tell] ok, I'll tell Nekroschizofrenetyk when they join next
12:19:05Aurora quits [Client Quit]
12:42:00Ryz2 quits [Quit: Ping timeout (120 seconds)]
12:42:10Ryz2 (Ryz) joins
12:47:01<hexagonwin>could someone please check if https://namu.wiki/raw/Linux loads (on home internet, not vpn or datacenter)?
12:47:23<hexagonwin>i'm working on scraping that site, different IPs are giving very different results
12:56:09etnguyen03 (etnguyen03) joins
13:01:51midou quits [Ping timeout: 272 seconds]
13:02:28midou joins
13:05:56etnguyen03 quits [Client Quit]
13:10:26etnguyen03 (etnguyen03) joins
13:14:46tekulvw (tekulvw) joins
13:16:42<masterx244|m>hCaptcha, then after solving it a site with japanese text
13:16:42<masterx244|m>germany, Telekom as ISP
13:17:28<kline>works here, also hcaptchas
13:17:37<kline>korean text though
13:17:51<IDK>hcaptcha with korean text
13:18:04<IDK>sweden, Telia as ISP
13:19:37tekulvw quits [Ping timeout: 268 seconds]
13:21:55<cruller>hCaptcha doesn't appear for me
13:22:04<cruller>Japan, IIJ as ISP
13:23:03<hexagonwin>thanks. are you all getting the 'raw' text or the login page?
13:23:22<hexagonwin>i don't have access to any residential internet outside korea so i tried archive.is, i get a login page there
13:24:40<cruller>(In incognito mode, hcaptcha appeared.)
13:24:49<hexagonwin>interesting..
13:25:39<cruller>I can get 'raw' text in default mode.
13:25:42<IDK>hexagonwin: used vpn gate to connect to a KDDI server, got login page, no hcaptcha
13:26:00<hexagonwin>cruller may i ask what device/browser you used?
13:26:58<cruller>Win11, ungoogled-chromium 144.0.7559.167
13:27:08<hexagonwin>thanks
13:30:30<cruller>IIRC, when I first opened it with the default profile, Cloudflare appeared.
13:36:59Arcorann__ (Arcorann) joins
13:38:29DlugasnyPL joins
13:40:29Arcorann_ quits [Ping timeout: 272 seconds]
13:42:17<masterx244|m>was raw text for me, too
13:42:40<@arkiver>hexagonwin: if it is going away, and it's possible to archive with ArchiveBot, please also archive it with ArchiveBot
13:43:03UwU quits [Quit: bye]
13:43:06<hexagonwin>arkiver: not going away anytime soon (hopefully), and it's impossible to download with archivebot
13:43:18<hexagonwin>it'll need a full automated browser (lol)
13:43:29<@arkiver>alright :)
13:43:45UwU joins
13:43:51<hexagonwin>this captcha thing looks unsolvable for now so idk how it'll go though
13:47:40kansei (kansei) joins
13:47:51<justauser>I think it's already listed as a #Y candidate?
13:51:16<hexagonwin>justauser i don't really think that's necessary
13:51:50<hexagonwin>it surely can be split into work items, it's just that it needs an automated web browser (and captcha solving technique for some pages)
13:52:18<hexagonwin>i already successfully got the list of all the (normal?) documents
13:53:29<justauser>If at least some IPs don't present a CAPTCHA, it could be a good candidate for running in distributed way.
13:53:32Arcorann__ quits [Ping timeout: 268 seconds]
13:53:38<justauser>But I didn't add it to the list.
13:54:01<justauser>Manu did.
13:55:17<hexagonwin>the rendered html page and diff between each revision can be scraped without captcha, but the raw page needs it (cf turnstile or hcaptcha)
13:55:50<hexagonwin>since all the raw pages can be recreated later by applying the diff to the very first revision.. the biggest problem is getting the first revision of all documents
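The reconstruction idea (first revision plus successive diffs yields every later raw page) can be sketched as below. The diff format Namuwiki actually serves is not described in this log, so this uses a stand-in line-based delta built with Python's `difflib.SequenceMatcher`; a real scraper would need a parser for the site's own diff output in place of `make_delta`. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Record only the changed spans needed to turn old_lines into new_lines."""
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace old_lines[i1:i2] with these lines (empty list = deletion).
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer revision from the older one plus its delta."""
    result = []
    cursor = 0
    for i1, i2, replacement in delta:
        result.extend(old_lines[cursor:i1])  # unchanged lines before the edit
        result.extend(replacement)
        cursor = i2
    result.extend(old_lines[cursor:])  # unchanged tail
    return result


def rebuild_latest(first_revision, deltas):
    """Apply each delta in order, starting from the first revision."""
    lines = list(first_revision)
    for delta in deltas:
        lines = apply_delta(lines, delta)
    return lines
```

Every intermediate revision falls out of the same loop, which is why only the first revision of each document has to be fetched through the captcha-protected raw endpoint.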
13:57:12<cruller>FWIW, https://namu.moe/, a mirror site for Namuwiki, was restored a few days ago.
13:57:26<hexagonwin>yeah, it's been working for a while now
13:57:28<cruller>See also https://en.namu.wiki/w/%EB%82%98%EB%AC%B4%EB%AA%A8%EC%97%90%20%EB%AF%B8%EB%9F%AC
13:57:39<hexagonwin>btw here's the list of documents https://transfer.archivete.am/1N5s7/namuwiki_doculist_260221.xz
13:57:39<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/1N5s7/namuwiki_doculist_260221.xz
13:58:22<hexagonwin>damn the english machine translation is absolutely terrible lol
14:04:33oxtyped quits [Ping timeout: 272 seconds]
14:09:59UwU quits [Client Quit]
14:10:37UwU joins
14:10:40oxtyped joins
14:29:07lflare is now known as RJHacker79931
14:29:09lflare (lflare) joins
14:29:15RJHacker79931 quits [Ping timeout: 272 seconds]
14:32:05DlugasnyPL quits [Quit: Leaving]
14:39:11Wohlstand (Wohlstand) joins
14:57:33nexussfan (nexussfan) joins
15:12:15etnguyen03 quits [Client Quit]
15:15:09<h2ibot>Bzc6p edited Philips (+0, typo): https://wiki.archiveteam.org/?diff=60533&oldid=60489
15:17:13etnguyen03 (etnguyen03) joins
16:02:21midou quits [Ping timeout: 272 seconds]
16:04:06midou joins