| 00:12:04 | | hackbug quits [Remote host closed the connection] |
| 00:14:54 | | tekulvw (tekulvw) joins |
| 00:16:27 | | hackbug (hackbug) joins |
| 00:19:14 | | Dada quits [Remote host closed the connection] |
| 00:22:00 | | tekulvw quits [Ping timeout: 268 seconds] |
| 00:27:32 | | tekulvw (tekulvw) joins |
| 00:32:37 | | tekulvw quits [Ping timeout: 272 seconds] |
| 00:36:52 | | tekulvw (tekulvw) joins |
| 00:36:52 | <klea> | https://discourse.nixos.org/t/garbage-collecting-cache-nixos-org/74249/10 |
| 00:37:05 | <klea> | Tonight we enabled Bucket Versioning and configured a Lifecycle Rule to delete the non-default object version after 30 days. See "enable bucket versioning" (NixOS/infra@9cf1919 on GitHub) for details. |
| 00:37:05 | <klea> | Then we deleted everything contained in the following datasets. This should amount to roughly 100 TiB, so something like 10% of the total S3 size. |
| 00:37:05 | <klea> | datasets/narinfos-nixos-images-2026-01-06T01-13Z.parquet · brianmcgee/nix-cache-dataset at main |
| 00:37:05 | <klea> | datasets/narinfos-nixos-images-dangling-refs-2026-01-06T01-13Z.parquet · brianmcgee/nix-cache-dataset at main |
| 00:37:05 | <klea> | The result is that the default version served by cache.nixos.org will return a HTTP 404 response. The non-default version is still around and can be restored within the next 30 days should the deletion cause severe issues. |
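The announcement relies on two S3 features. With bucket versioning on, a plain DELETE leaves the old data behind as a noncurrent version. A lifecycle rule then expires noncurrent versions after 30 days. The sketch below builds such a rule in the standard S3 lifecycle JSON schema. The rule ID and function name are hypothetical, and this is not the configuration from NixOS/infra@9cf1919.

```python
import json


def noncurrent_expiry_rule(days: int = 30) -> dict:
    """Hypothetical S3 lifecycle configuration that expires noncurrent versions.

    With versioning enabled, a "deleted" or overwritten object becomes a noncurrent
    version; this rule removes such versions `days` days after they stop being current.
    Not the actual NixOS/infra configuration.
    """
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every key in the bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


if __name__ == "__main__":
    # JSON shape accepted by `aws s3api put-bucket-lifecycle-configuration`.
    print(json.dumps(noncurrent_expiry_rule(), indent=2))
```

You could save the printed JSON to a file and apply it with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`.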
| 00:40:30 | <klea> | Tho, since it's a cache, shouldn't affect much? |
| 00:41:31 | | @JAA doesn't know enough about how any of that works. |
| 00:41:34 | <klea> | hexa-: is there a way to make sure things for which sources have linkrotted, but there's still the source code in the cache, for the source code to be archived from the cache, or to have the cache avoid losing source code? |
| 00:42:19 | <@JAA> | Does 'we deleted' mean it's already gone from public view and restorable until next month, or is it still publicly accessible until then? |
| 00:43:26 | <klea> | I suppose it means it's not the latest version, and would 404 from cache.nixos.org, but would still work if you do an AWS S3 signed request with requester-pays to the bucket asking for the older revision? |
| 00:45:02 | <nicolas17> | I think if a file gets "overwritten" today, the old version will get deleted after 30 days |
| 00:45:09 | <nicolas17> | oh wait |
| 00:45:13 | <nicolas17> | those were different steps |
| 00:46:03 | | tekulvw quits [Ping timeout: 268 seconds] |
| 00:46:17 | <nicolas17> | yeah ok, they "deleted" a bunch of files that they determined through external processes (dangling refs?), but they're still retrievable via versioning |
| 00:46:23 | <nicolas17> | for the next 30 days |
| 00:46:28 | <nicolas17> | maybe not publicly |
| 00:47:37 | <nicolas17> | is the bucket public? |
| 00:48:10 | <klea> | yes, but requires requester-pays. |
| 00:48:38 | <nicolas17> | what's the bucket name? |
| 00:49:44 | <klea> | I suppose cache.nixos.org? |
| 00:50:00 | <nicolas17> | seems it's https://s3.amazonaws.com/nix-cache |
| 00:51:29 | <nicolas17> | listing is disabled anyway |
| 00:51:47 | <klea> | There's also https://s3.amazonaws.com/nix-channels |
| 00:52:05 | <klea> | 2026-02-21 00:51:29 <nicolas17> listing is disabled anyway <- The parquet file (database format) should contain the removed data? |
| 00:52:34 | <nicolas17> | if file listing is disabled, I suspect I also can't say "list versions of file X" |
| 00:52:46 | <nicolas17> | unless the parquet file has the version IDs? |
| 00:52:50 | <klea> | https://blog.erethon.com/blog/2025/07/31/how-nixos-is-built/ was interesting. |
| 00:53:11 | <klea> | I don't know, I should try to check the parquet file. |
| 00:53:32 | <nicolas17> | I have no idea how to read parquet :P |
| 00:53:54 | <nicolas17> | lol @ using huggingface for this |
| 00:54:01 | <klea> | I mean, it's a CDN :p |
| 00:56:07 | <klea> | https://transfer.archivete.am/inline/iWYaN/2026-02-21T00:55:40Z--console.txt |
| 00:59:36 | <klea> | nicolas17: there's a python thing called parquet-tools apparently. |
| 01:00:14 | <klea> | otherwise, no it doesn't seem to include version ids? |
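Whether the parquet dumps include S3 version IDs is a question about their schema. The schema lives in a Thrift-encoded footer that tools such as parquet-tools or pyarrow (`pyarrow.parquet.read_schema`) decode. As a dependency-free first step, the sketch below only checks a file's framing. A Parquet file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic give the footer length as a little-endian uint32. The function name is hypothetical, and decoding the schema itself is left to those tools.

```python
import io
import struct
from typing import BinaryIO

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(f: BinaryIO) -> int:
    """Check Parquet framing and return the footer (file metadata) length in bytes.

    Layout: b"PAR1" | row groups ... | footer | uint32 footer length (LE) | b"PAR1"
    Only the first 4 and last 8 bytes are read, so even very large files are cheap to check.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < 12:  # leading magic + length field + trailing magic
        raise ValueError("file too small to be Parquet")
    f.seek(0)
    head = f.read(4)
    f.seek(size - 8)
    tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length larger than the file")
    return footer_len
```

Usage would be along the lines of `with open("narinfos-nixos-images-2026-01-06T01-13Z.parquet", "rb") as f: print(parquet_footer_length(f))`.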
| 01:09:31 | | tekulvw (tekulvw) joins |
| 01:14:25 | | tekulvw quits [Ping timeout: 268 seconds] |
| 01:16:32 | | Wohlstand quits [Quit: Wohlstand] |
| 01:26:51 | | Cupping1285 quits [Quit: bye] |
| 01:27:45 | | Cupping1285 joins |
| 01:39:03 | | Arcorann_ (Arcorann) joins |
| 01:42:10 | | Arcorann quits [Ping timeout: 268 seconds] |
| 02:25:27 | <h2ibot> | Hans5958 created Roblox Groups (+57, Redirected page to [[Roblox#Group Walls…): https://wiki.archiveteam.org/?oldid=60532 |
| 03:22:27 | | tekulvw (tekulvw) joins |
| 03:27:00 | | tekulvw quits [Ping timeout: 268 seconds] |
| 04:02:35 | <pabs> | https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures/ |
| 04:02:44 | <pabs> | woops, already posted |
| 04:05:14 | | lennier2 quits [Ping timeout: 268 seconds] |
| 04:06:04 | | lennier2 joins |
| 04:14:36 | | etnguyen03 quits [Remote host closed the connection] |
| 04:18:05 | | nexussfan quits [Read error: Connection reset by peer] |
| 04:35:10 | | Bog joins |
| 04:37:29 | | Bog quits [Client Quit] |
| 05:02:50 | | rover joins |
| 05:04:57 | | roverinexile quits [Ping timeout: 272 seconds] |
| 05:04:57 | | n9nes quits [Ping timeout: 272 seconds] |
| 05:05:12 | | n9nes joins |
| 05:16:25 | | tekulvw (tekulvw) joins |
| 05:18:05 | <tmg1|michelson> | a few hours later, opendiary still full of bad responses |
| 05:21:05 | | tekulvw quits [Ping timeout: 268 seconds] |
| 05:40:34 | | Stvkimension11 (Stvkimension11) joins |
| 05:49:03 | | tekulvw (tekulvw) joins |
| 05:51:02 | | Stvkimension11 quits [Client Quit] |
| 05:53:43 | | tekulvw quits [Ping timeout: 272 seconds] |
| 06:02:18 | <steering> | > roughly 100 TiB ... 10% of the total size. |
| 06:02:29 | <steering> | W. A. T. |
| 06:02:49 | <BlankEclair> | a lil chonker |
| 06:19:03 | | midou quits [Ping timeout: 272 seconds] |
| 06:31:59 | | midou joins |
| 06:36:52 | | aliz joins |
| 06:37:16 | | Island quits [Read error: Connection reset by peer] |
| 06:58:57 | | aliz quits [Client Quit] |
| 07:16:19 | <hexa-> | JAA: ig by passing version id for the object |
| 07:18:14 | <hexa-> | the version id for all old objects is null fwiw |
| 07:23:13 | <hexa-> | https://gist.github.com/Mic92/7bcacea70a8babf327e45dc102489445 |
| 07:24:49 | <hexa-> | what got deleted is things we really don't need anymore, like images created for nixos tests |
| 07:24:59 | <hexa-> | and also old installers iirc |
| 07:30:40 | <hexa-> | oh, I think they're not queryable over the fastly cache, likely due to missing permissions |
| 07:33:52 | <hexa-> | or maybe delete markers just shadow everything over the s3 web api, dunno |
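hexa-'s two observations match how S3 versioning behaves. First, objects written before versioning was enabled carry the version ID `null`. Second, a DELETE on a versioned bucket only stacks a delete marker on top. As a result, a GET without a version ID returns 404, while a GET naming the old version ID still reaches the data until a lifecycle rule expires it. The toy in-memory model below illustrates those semantics. `ToyVersionedBucket` is hypothetical and is not an S3 client. On the real bucket, fetching a specific version would additionally need a signed requester-pays request, and may not work through the Fastly-fronted cache at all.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Version:
    version_id: str
    body: Optional[bytes]                   # None means this version is a delete marker
    noncurrent_since: Optional[int] = None  # day it stopped being the current version


class ToyVersionedBucket:
    """Toy in-memory model of S3 versioning semantics; illustrative only, not an S3 client."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Version]] = {}  # per key, oldest first
        self.versioning = False
        self._ids = itertools.count(1)

    def _next_id(self) -> str:
        # Objects written while versioning is off get the literal version ID "null".
        return str(next(self._ids)) if self.versioning else "null"

    def _push(self, key: str, version: Version, day: int) -> None:
        history = self.versions.setdefault(key, [])
        if history:
            history[-1].noncurrent_since = day  # previous current version becomes noncurrent
        history.append(version)

    def put(self, key: str, body: bytes, day: int = 0) -> None:
        if not self.versioning:
            # Unversioned writes replace any existing "null" version instead of stacking.
            self.versions[key] = [v for v in self.versions.get(key, []) if v.version_id != "null"]
        self._push(key, Version(self._next_id(), body), day)

    def delete(self, key: str, day: int = 0) -> None:
        if not self.versioning:
            self.versions.pop(key, None)  # unversioned bucket: the object is really gone
            return
        # Versioned bucket: a plain DELETE only stacks a delete marker on top.
        self._push(key, Version(self._next_id(), None), day)

    def get(self, key: str, version_id: Optional[str] = None) -> Tuple[int, Optional[bytes]]:
        history = self.versions.get(key, [])
        if version_id is None:
            if not history or history[-1].body is None:
                return 404, None  # no object, or the current version is a delete marker
            return 200, history[-1].body
        for v in history:
            if v.version_id == version_id:
                # S3 rejects a GET that names a delete marker with 405 Method Not Allowed.
                return (405, None) if v.body is None else (200, v.body)
        return 404, None

    def expire_noncurrent(self, today: int, days: int = 30) -> None:
        # Rough stand-in for a NoncurrentVersionExpiration rule; real S3 also rounds
        # to the next midnight UTC and applies the rule asynchronously.
        for key in list(self.versions):
            self.versions[key] = [
                v for v in self.versions[key]
                if v.noncurrent_since is None or today - v.noncurrent_since < days
            ]
```

Consider an object uploaded before versioning and then deleted on day 0. A plain GET returns 404, and a GET for version `null` returns the old bytes. Once the 30-day rule has run, the `null` version is gone as well.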
| 07:37:11 | | ducky quits [Remote host closed the connection] |
| 07:41:00 | | ducky (ducky) joins |
| 07:41:08 | | SootBector quits [Remote host closed the connection] |
| 07:42:18 | | SootBector (SootBector) joins |
| 07:50:31 | | ducky quits [Remote host closed the connection] |
| 07:54:15 | | ducky (ducky) joins |
| 07:57:13 | | lflare quits [Ping timeout: 272 seconds] |
| 07:58:56 | | lflare (lflare) joins |
| 08:00:38 | | tekulvw (tekulvw) joins |
| 08:04:53 | | ducky_ (ducky) joins |
| 08:05:44 | | tekulvw quits [Ping timeout: 268 seconds] |
| 08:07:21 | | ducky quits [Ping timeout: 272 seconds] |
| 08:09:53 | | ducky_ quits [Ping timeout: 272 seconds] |
| 08:13:41 | | lflare quits [Ping timeout: 272 seconds] |
| 08:15:46 | | lflare (lflare) joins |
| 08:21:45 | | ducky (ducky) joins |
| 08:26:33 | | ducky quits [Remote host closed the connection] |
| 08:28:15 | | lflare quits [Client Quit] |
| 08:38:34 | | lflare (lflare) joins |
| 08:47:44 | | AlsoHP_Archivist joins |
| 08:48:31 | | HP_Archivist quits [Ping timeout: 272 seconds] |
| 09:01:56 | | lflare quits [Client Quit] |
| 09:09:40 | | ducky (ducky) joins |
| 09:27:15 | | Nekroschizofrenetyk joins |
| 09:28:26 | <Nekroschizofrenetyk> | Hi, I'm trying to archive (to IA) some pages off this site but I'm getting 520s and 503s. Could somebody check, if it's me or if it's unarchiveable? https://www.olawsky.de/schlesien/forum.html |
| 09:32:14 | | bilboed08 joins |
| 09:32:32 | | SootBector quits [Remote host closed the connection] |
| 09:32:35 | | sec^nd quits [Remote host closed the connection] |
| 09:33:05 | | sec^nd (second) joins |
| 09:33:43 | | SootBector (SootBector) joins |
| 09:35:07 | | lflare (lflare) joins |
| 09:36:01 | | bilboed0 quits [Ping timeout: 272 seconds] |
| 09:36:10 | | Nekroschizofrenetyk quits [Client Quit] |
| 09:39:11 | | bilboed08 quits [Ping timeout: 272 seconds] |
| 09:50:34 | | lflare quits [Ping timeout: 268 seconds] |
| 09:54:41 | | lflare (lflare) joins |
| 09:56:56 | | SootBector quits [Remote host closed the connection] |
| 09:58:04 | | SootBector (SootBector) joins |
| 10:19:05 | | lflare quits [Ping timeout: 272 seconds] |
| 10:21:43 | | lflare (lflare) joins |
| 10:22:21 | | sec^nd quits [Remote host closed the connection] |
| 10:22:22 | | bilboed0 joins |
| 10:22:47 | | sec^nd (second) joins |
| 10:36:48 | | lennier2_ joins |
| 10:36:49 | | lennier2 quits [Ping timeout: 272 seconds] |
| 10:39:02 | | tekulvw (tekulvw) joins |
| 10:43:47 | | tekulvw quits [Ping timeout: 272 seconds] |
| 10:46:09 | | Dada joins |
| 11:00:52 | | midou quits [Ping timeout: 268 seconds] |
| 11:09:10 | | TheEnbyperor_ quits [Read error: Connection reset by peer] |
| 11:09:29 | | TheEnbyperor_ (TheEnbyperor) joins |
| 11:10:26 | | linuxgemini1 (linuxgemini) joins |
| 11:10:43 | | midou joins |
| 11:11:21 | | linuxgemini quits [Ping timeout: 268 seconds] |
| 11:11:22 | | linuxgemini1 is now known as linuxgemini |
| 11:11:58 | | lflare quits [Ping timeout: 268 seconds] |
| 11:12:07 | | lflare (lflare) joins |
| 11:46:40 | | Aurora joins |
| 11:48:08 | <Aurora> | hi, havent checked on this in a while, just making sure you guys received my 11.3 million jsons, not sure if it went through |
| 11:48:34 | <Aurora> | the tenor ones, i thought i was in that channel, my bad |
| 12:00:00 | | Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:46 | | Bleo1826007227196234552220 joins |
| 12:08:15 | <justauser> | Aurora: We definitely got the list. Not doing anything for Tenor for now. |
| 12:08:15 | <justauser> | Nekroschizofrenetyk: Worksforme. Want an AB run? |
| 12:10:07 | <Aurora> | thank you for the confirmation! good luck with the rest of the work |
| 12:11:32 | <Aurora> | wait i just noticed, you said list, but i also uploaded all of the jsons to IA later on |
| 12:11:52 | <Aurora> | https://archive.org/details/tenor-legacyids-json in case you didnt get those |
| 12:11:57 | <justauser> | I don't think we need them for anything. |
| 12:12:34 | <Aurora> | got it, theyre there if you do |
| 12:13:57 | <justauser> | !tell Nekroschizofrenetyk https://www.olawsky.de/schlesien/forum.html works for me. Want an AB run? |
| 12:13:59 | <eggdrop> | [tell] ok, I'll tell Nekroschizofrenetyk when they join next |
| 12:19:05 | | Aurora quits [Client Quit] |
| 12:42:00 | | Ryz2 quits [Quit: Ping timeout (120 seconds)] |
| 12:42:10 | | Ryz2 (Ryz) joins |
| 12:47:01 | <hexagonwin> | could someone please check if https://namu.wiki/raw/Linux loads (on home internet, not vpn or datacenter)? |
| 12:47:23 | <hexagonwin> | i'm working on scraping that site, different IPs are giving very different results |
| 12:56:09 | | etnguyen03 (etnguyen03) joins |
| 13:01:51 | | midou quits [Ping timeout: 272 seconds] |
| 13:02:28 | | midou joins |
| 13:05:56 | | etnguyen03 quits [Client Quit] |
| 13:10:26 | | etnguyen03 (etnguyen03) joins |
| 13:14:46 | | tekulvw (tekulvw) joins |
| 13:16:42 | <masterx244|m> | hCaptcha, then after solving it a site with japanese text |
| 13:16:42 | <masterx244|m> | germany, Telekom as ISP |
| 13:17:28 | <kline> | works here, also hcaptchas |
| 13:17:37 | <kline> | korean text though |
| 13:17:51 | <IDK> | hcaptcha with korean text |
| 13:18:04 | <IDK> | sweden, Telia as ISP |
| 13:19:37 | | tekulvw quits [Ping timeout: 268 seconds] |
| 13:21:55 | <cruller> | hCaptcha doesn't appear for me |
| 13:22:04 | <cruller> | Japan, IIJ as ISP |
| 13:23:03 | <hexagonwin> | thanks. are you all getting the 'raw' text or the login page? |
| 13:23:22 | <hexagonwin> | i don't have access to any residential internet outside korea so i tried archive.is, i get a login page there |
| 13:24:40 | <cruller> | (In incognito mode, hcaptcha appeared.) |
| 13:24:49 | <hexagonwin> | interesting.. |
| 13:25:39 | <cruller> | I can get 'raw' text in default mode. |
| 13:25:42 | <IDK> | hexagonwin: used vpn gate to connect to a KDDI server, got login page, no hcaptcha |
| 13:26:00 | <hexagonwin> | cruller may i ask what device/browser you used? |
| 13:26:58 | <cruller> | Win11, ungoogled-chromium 144.0.7559.167 |
| 13:27:08 | <hexagonwin> | thanks |
| 13:30:30 | <cruller> | IIRC, when I first opened it with the default profile, Cloudflare appeared. |
| 13:36:59 | | Arcorann__ (Arcorann) joins |
| 13:38:29 | | DlugasnyPL joins |
| 13:40:29 | | Arcorann_ quits [Ping timeout: 272 seconds] |
| 13:42:17 | <masterx244|m> | was raw text for me, too |
| 13:42:40 | <@arkiver> | hexagonwin: if it is going away, and it's possible to archive with ArchiveBot, please also archive it with ArchiveBot |
| 13:43:03 | | UwU quits [Quit: bye] |
| 13:43:06 | <hexagonwin> | arkiver: not going away anytime soon (hopefully), and it's impossible to download with archivebot |
| 13:43:18 | <hexagonwin> | it'll need a full automated browser (lol) |
| 13:43:29 | <@arkiver> | alright :) |
| 13:43:45 | | UwU joins |
| 13:43:51 | <hexagonwin> | this captcha thing looks unsolvable for now so idk how it'll go though |
| 13:47:40 | | kansei (kansei) joins |
| 13:47:51 | <justauser> | I think it's already listed as a #Y candidate? |
| 13:51:16 | <hexagonwin> | justauser i don't really think that's necessary |
| 13:51:50 | <hexagonwin> | it surely can be split into work items, it's just that it needs an automated web browser (and captcha solving technique for some pages) |
| 13:52:18 | <hexagonwin> | i already successfully got the list of all the (normal?) documents |
| 13:53:29 | <justauser> | If at least some IPs don't present a CAPTCHA, it could be a good candidate for running in distributed way. |
| 13:53:32 | | Arcorann__ quits [Ping timeout: 268 seconds] |
| 13:53:38 | <justauser> | But I didn't add it to the list. |
| 13:54:01 | <justauser> | Manu did. |
| 13:55:17 | <hexagonwin> | the rendered html page and diff between each revision can be scraped without captcha, but the raw page needs it (cf turnstile or hcaptcha) |
| 13:55:50 | <hexagonwin> | since all the raw pages can be recreated later by applying the diff to the very first revision.. the biggest problem is getting the first revision of all documents |
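hexagonwin's plan is to rebuild every raw page from its first revision plus the chain of per-revision diffs, which can be sketched as below. The sketch assumes each diff has already been turned into line-level edit operations. Here those operations are generated with Python's `difflib.SequenceMatcher.get_opcodes()` purely for illustration. namu.wiki serves its diffs as rendered HTML, which would first need parsing into a form like this. All function names are hypothetical.

```python
import difflib
from typing import List, Tuple

# One edit: (tag, i1, i2, replacement_lines) against the previous revision's lines.
Delta = List[Tuple[str, int, int, List[str]]]


def make_delta(old: List[str], new: List[str]) -> Delta:
    """Line-level delta from `old` to `new` (illustrative stand-in for a scraped diff)."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Apply one delta; edits are ordered and non-overlapping, as difflib emits them."""
    out: List[str] = []
    pos = 0
    for _tag, i1, i2, repl in delta:
        out.extend(old[pos:i1])  # copy unchanged lines before this edit
        out.extend(repl)         # "insert"/"replace" add lines; "delete" adds none
        pos = i2                 # skip the lines this edit removed or replaced
    out.extend(old[pos:])
    return out


def rebuild(first: List[str], deltas: List[Delta]) -> List[str]:
    """Replay every delta, in order, on top of the first revision."""
    text = first
    for d in deltas:
        text = apply_delta(text, d)
    return text
```

This design makes the first revision the critical piece. One missing or mis-parsed diff corrupts every later revision of that page, so the whole chain has to be captured in order.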
| 13:57:12 | <cruller> | FWIW, https://namu.moe/, a mirror site for Namuwiki, was restored a few days ago. |
| 13:57:26 | <hexagonwin> | yeah, it's been working for a while now |
| 13:57:28 | <cruller> | See also https://en.namu.wiki/w/%EB%82%98%EB%AC%B4%EB%AA%A8%EC%97%90%20%EB%AF%B8%EB%9F%AC |
| 13:57:39 | <hexagonwin> | btw here's the list of documents https://transfer.archivete.am/1N5s7/namuwiki_doculist_260221.xz |
| 13:57:39 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/1N5s7/namuwiki_doculist_260221.xz |
| 13:58:22 | <hexagonwin> | damn the english machine translation is absolutely terrible lol |
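The document list above could be cut into fixed-size work items for a distributed run, which is what justauser suggests. The sketch below assumes the `.xz` file decompresses to UTF-8 text with one document title per line, since the actual format was not described. The function names and batch size are hypothetical.

```python
import lzma
from typing import Iterator, List


def read_doc_list(xz_bytes: bytes) -> List[str]:
    """Decompress an .xz document list; assumes UTF-8, one title per line."""
    text = lzma.decompress(xz_bytes).decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]  # drop blank lines


def work_items(titles: List[str], size: int) -> Iterator[List[str]]:
    """Yield fixed-size batches so each item can go to a different worker or IP."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]
```

For example, `list(work_items(read_doc_list(open("namuwiki_doculist_260221.xz", "rb").read()), 100))` would give batches of 100 titles each.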
| 14:04:33 | | oxtyped quits [Ping timeout: 272 seconds] |
| 14:09:59 | | UwU quits [Client Quit] |
| 14:10:37 | | UwU joins |
| 14:10:40 | | oxtyped joins |
| 14:29:07 | | lflare is now authenticated as * |
| 14:29:07 | | lflare is now known as RJHacker79931 |
| 14:29:09 | | lflare (lflare) joins |
| 14:29:15 | | RJHacker79931 quits [Ping timeout: 272 seconds] |
| 14:32:05 | | DlugasnyPL quits [Quit: Leaving] |
| 14:39:11 | | Wohlstand (Wohlstand) joins |
| 14:57:33 | | nexussfan (nexussfan) joins |
| 15:12:15 | | etnguyen03 quits [Client Quit] |
| 15:15:09 | <h2ibot> | Bzc6p edited Philips (+0, typo): https://wiki.archiveteam.org/?diff=60533&oldid=60489 |
| 15:17:13 | | etnguyen03 (etnguyen03) joins |
| 16:02:21 | | midou quits [Ping timeout: 272 seconds] |
| 16:04:06 | | midou joins |