00:18:42Wohlstand (Wohlstand) joins
00:23:43Wohlstand quits [Ping timeout: 272 seconds]
00:48:06<klea>btw, it'd be neat to have a system to track planned and unplanned outages in systems that AT uses.
00:54:14<klea>/cc DigitalDragons ^
01:06:23pika joins
01:09:58SootBector quits [Remote host closed the connection]
01:11:04SootBector (SootBector) joins
01:16:55pika quits [Ping timeout: 272 seconds]
01:17:38Suika_ quits [Ping timeout: 256 seconds]
01:24:35pika joins
01:25:41pokechu22 quits [Quit: WeeChat 4.7.1]
01:26:08pokechu22 (pokechu22) joins
01:26:33Suika joins
01:29:35pika quits [Ping timeout: 272 seconds]
01:37:50pika joins
01:42:53pika quits [Ping timeout: 272 seconds]
01:49:15Sk1d quits [Read error: Connection reset by peer]
01:51:30pika joins
01:51:51pika leaves
01:59:47cyan_box joins
02:04:06cyanbox_ quits [Ping timeout: 256 seconds]
02:36:51MrMcNuggets (MrMcNuggets) joins
03:06:39etnguyen03 (etnguyen03) joins
03:18:10etnguyen03 quits [Remote host closed the connection]
03:19:47etnguyen03 (etnguyen03) joins
03:21:38etnguyen03 quits [Remote host closed the connection]
03:22:47etnguyen03 (etnguyen03) joins
03:38:21etnguyen03 quits [Client Quit]
03:46:06CYBERDEV quits [Quit: Leaving]
03:46:16etnguyen03 (etnguyen03) joins
03:54:55CYBERDEV joins
03:59:21etnguyen03 quits [Remote host closed the connection]
04:27:37chrismeller3 quits [Quit: chrismeller3]
05:02:38Webuser290291 joins
05:02:55Webuser290291 quits [Client Quit]
05:04:14wotd quits [Remote host closed the connection]
05:04:18n9nes quits [Ping timeout: 256 seconds]
05:04:47wotd joins
05:07:05n9nes joins
05:14:25chunkynutz60 quits [Ping timeout: 272 seconds]
05:17:20ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
05:18:00ThetaDev joins
05:22:08<HP_Archivist>I just noticed this: https://www.reddit.com/r/Archivists/comments/1q9n5nt/nara_is_shutting_down_history_hub_for_citizen/
05:22:26<HP_Archivist>https://historyhub.history.gov/citizen_archivists/f/discussions
05:22:29<HP_Archivist>Can we grab it?
05:23:12<pokechu22>"On January 15, 2026 the History Hub site will be “frozen in time.” The site will remain available for reference until February 13, 2026."
05:23:57<HP_Archivist>Does that mean it will be frozen in time and online or?
05:24:16Webuser025436 joins
05:24:18<pokechu22>Presumably that means they made it read-only yesterday and it'll be online for a month before they close it fully
05:24:19<nicolas17>sounds like it will be frozen and online between jan 15 and feb 13, and offline after feb 13
05:24:29<pokechu22>I'm seeing incapsula on it
05:24:34<@arkiver>i've been working on a job in AB
05:24:34<nicolas17>pls add to deathwatch
05:24:38<@arkiver>but not sure if it went well
05:24:57<pokechu22>It didn't work
05:26:00<HP_Archivist>Hm. Throw it into AB now?
05:26:13<pokechu22>It seems like the incapsula JS challenge would need to be solved, and I don't know how long that lasts
05:26:48<Webuser025436>Hi. Ming Pao Canada, an HK-based newspaper that has a Canada edition and daily newspaper, announced they are shutting down as of today. [1] Notably, they have an archive of all articles from Ming Pao Hong Kong (the HK edition). All of Ming Pao HK's pre-2021 articles were removed from the internet a long time ago because of the media situation in HK.
05:26:49<Webuser025436>Is it possible to archive? https://www.mingpaocanada.com/
05:26:49<Webuser025436>[1]: https://www.cp24.com/news/canada/2026/01/13/ex-journalists-lament-closure-of-ming-pao-canadas-last-chinese-language-daily-paper/
05:28:13<h2ibot>Pokechu22 edited Deathwatch (+237, /* 2026-02 */ https://historyhub.history.gov/): https://wiki.archiveteam.org/?diff=60210&oldid=60209
05:28:30<pokechu22>Webuser025436: I believe we've already started an archivebot job for that; I'm going to double-check the status of it
05:29:59<pokechu22>Webuser025436: It's currently running, together with http://mingshengbao.com/ - see http://archivebot.com/?initialFilter=mingpaocanada#log-container-6cnonr9bi6enxit9na57wckan
05:30:55<Webuser025436>Many thanks 🙏
05:31:15<pokechu22>Where is the archive of the pre-2021 Hong Kong articles? I can't read Chinese so I'd like to make sure we're saving that too
05:32:54<nicolas17>pokechu22: wonder if we should speed up that job
05:33:28<pokechu22>It closes at the end of January; today was the date the last edition was published
05:33:44<pokechu22>(at least according to https://www.cp24.com/news/canada/2026/01/13/ex-journalists-lament-closure-of-ming-pao-canadas-last-chinese-language-daily-paper/)
05:37:03<Webuser025436>pokechu22 All HK articles for a given day are shown here: https://www.mingpaocanada.com/tor/htm/News/YYYYMMDD/HK-GAindex_r.htm
05:37:03<Webuser025436>So for example: https://www.mingpaocanada.com/tor/htm/News/20140710/HK-GAindex_r.htm
05:37:03<Webuser025436>The earliest AFAICT is 20140710.
05:38:53<pokechu22>Is there a page that lists all of the previous ones? I assume there must be because archivebot has found https://www.mingpaocanada.com/tor/htm/News/20220429/tam1_r.htm but I don't know exactly where it came from
05:40:12<pokechu22>(there's https://www.mingpaocanada.com/tor/htm/Responsive/archiveList.cfm but that seems to only directly show the last week)
05:40:19<nicolas17>pokechu22: also I'm seeing many requests like https://www.mingpaocanada.com/tor/htm/News/20220815/TD/TD/tdc1.txt that redirect to an error page, might be a crawling glitch finding garbage in JS or something?
05:41:58<nicolas17>ah yes, docPath: "HK-GA/gc/gcc1.txt"
05:41:59<pokechu22>Yeah, looks like that comes from https://www.mingpaocanada.com/tor/htm/News/20220815/HK-gaa1_r.htm containing a POST to /Tor/cfc/popular_addone.cfc with HK-GA/ga/gaa1.txt as a parameter
05:42:44<nicolas17>not sure how to avoid this, excluding *.txt feels too broad
05:43:39<pokechu22>It's probably fine to just leave them as-is since there's 1 per article and most articles have several images as well
05:44:25<nicolas17>well it's also following the redirect and saving errorpage.html every single time
05:45:21<pokechu22>Looks like that's not a new issue: https://web.archive.org/web/20260901000000*/https://www.mingpaocanada.com/errorpage.html :)
05:45:36<pokechu22>... ok, though 10660 snapshots on January 16 is probably still excessive
05:46:26<nicolas17>pain
05:50:09<pokechu22>I guess I can check what dates it's already found using ab2f
05:55:16<nicolas17>I was thinking something like tor/htm/News/[0-9]{8}/[A-Z]+/[A-Z]+/[a-z]+[0-9]\.txt
05:55:27<nicolas17>but that's not exhaustive, will need a few more patterns
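A minimal sketch of how the pattern nicolas17 proposes could be checked against URLs seen in this job before using it as an ignore. It assumes the pattern is applied as an unanchored, case-sensitive Python regex search, which the log does not confirm; `is_ignored` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re

# Pattern quoted from the discussion; nicolas17 notes it is not exhaustive.
IGNORE_PATTERN = re.compile(
    r"tor/htm/News/[0-9]{8}/[A-Z]+/[A-Z]+/[a-z]+[0-9]\.txt"
)


def is_ignored(url: str) -> bool:
    """Return True if the URL matches the proposed ignore pattern."""
    # Assumption: the pattern is searched anywhere in the URL (unanchored).
    return IGNORE_PATTERN.search(url) is not None


if __name__ == "__main__":
    samples = [
        # Bogus request reported in the channel; should be ignored.
        "https://www.mingpaocanada.com/tor/htm/News/20220815/TD/TD/tdc1.txt",
        # Real pages mentioned in the channel; should be kept.
        "https://www.mingpaocanada.com/tor/htm/News/20140710/HK-GAindex_r.htm",
        "https://www.mingpaocanada.com/tor/htm/News/20220429/tam1_r.htm",
    ]
    for url in samples:
        print("ignore" if is_ignored(url) else "keep  ", url)
```

Because the pattern requires a `.txt` suffix under two upper-case path segments, it leaves the article and index `.htm` pages alone. As written it is case-sensitive, so an upper-case `TOR` path would not match.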
05:56:12sec^nd quits [Remote host closed the connection]
05:56:38sec^nd (second) joins
05:59:31evergreen56 joins
06:02:06evergreen5 quits [Ping timeout: 256 seconds]
06:02:06evergreen56 is now known as evergreen5
06:04:34<pokechu22>The oldest archivebot has found so far is http://www.mingpaocanada.com/TOR/htm/News/20220319/HK-GAindex_r.htm
06:05:09<pokechu22>JAA: can you trace http://www.mingpaocanada.com/TOR/htm/News/20220319/HK-GAindex_r.htm on 6cnonr9bi6enxit9na57wckan please?
06:06:09<Webuser025436>is the link i provided not good enough above? this link contains outlinks to all hk articles for a given day: https://www.mingpaocanada.com/tor/htm/News/20140710/HK-GAindex_r.htm
06:07:00<pokechu22>It is, but now I'm trying to figure out if archivebot will have already found those or if I need to start the job in a way that will discover those
06:11:25<pokechu22>(there isn't any good way to add urls to an existing archivebot job, but I could start a new one with a list of those pages for all days since 2014 or similar, along with https://www.mingpaocanada.com/van/htm/News/20260116/VAindex_r.htm and https://www.mingpaocanada.com/tor/htm/News/20260116/TAindex_r.htm)
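A sketch of how the per-day list pokechu22 describes could be generated. It assumes the URL template Webuser025436 gave, a start date of 20140710 (the earliest they found), an end date of 20260116 (the date in the index URLs above), and that every day in between uses the same template; none of that is verified here. `daily_index_urls` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Template taken from the channel; YYYYMMDD is replaced with each date.
TEMPLATE = "https://www.mingpaocanada.com/tor/htm/News/{day}/HK-GAindex_r.htm"


def daily_index_urls(start: date, end: date) -> list[str]:
    """Build one HK index URL per calendar day, start and end inclusive."""
    urls = []
    current = start
    while current <= end:
        urls.append(TEMPLATE.format(day=current.strftime("%Y%m%d")))
        current += timedelta(days=1)
    return urls


if __name__ == "__main__":
    # Assumed range: earliest known date through the date of this log.
    for url in daily_index_urls(date(2014, 7, 10), date(2026, 1, 16)):
        print(url)
```

The output is one URL per line, which could be saved to a file and used as the seed list for a new job. Days with no edition would simply produce error responses.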
06:14:47chunkynutz60 joins
06:17:24nexussfan quits [Quit: Konversation terminated!]
07:32:26chrismeller3 (chrismeller) joins
09:00:02midou quits [Ping timeout: 256 seconds]
09:20:45midou joins
09:27:48midou quits [Ping timeout: 256 seconds]
09:41:56midou joins
09:46:45midou quits [Ping timeout: 272 seconds]
09:52:45midou joins
10:23:29midou quits [Ping timeout: 272 seconds]
10:43:05<h2ibot>KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated
10:43:34<klea>mhmm
10:44:14<klea>oh i love that my terminal thinks the url is shorter.
10:44:37<klea>so my browser opened <https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters> instead
10:45:30Dada joins
11:14:24midou joins
11:20:00<alexlehm>the same happens in my irc client, it does not consider [] as a valid url character
11:22:15<klea>time to urlencode it, or wrap it in <>
11:22:25<klea>alexlehm: how does <https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated> open?
11:23:28<alexlehm>"https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters"
11:24:14<klea>alexlehm: and KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters%5B%5D=nsInvert&wpfilters%5B%5D=associated
11:24:42<alexlehm>it would probably work with https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters%5B%5D=nsInvert&wpfilters%5B%5D=associated
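A minimal sketch of the percent-encoding workaround discussed above: replacing the literal square brackets with `%5B` and `%5D` so that clients which stop link detection at `[` still pick up the whole URL. `encode_brackets` is a hypothetical helper name; how h2ibot actually builds its links is not shown in the log.

```python
def encode_brackets(url: str) -> str:
    """Percent-encode square brackets; leave the rest of the URL untouched."""
    return url.replace("[", "%5B").replace("]", "%5D")


if __name__ == "__main__":
    original = (
        "https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot"
        "&offset=20260117104222&limit=2&namespace=2"
        "&wpfilters[]=nsInvert&wpfilters[]=associated"
    )
    print(encode_brackets(original))
```

Only the brackets are rewritten, so the result is the same form of the link that klea and alexlehm posted with `%5B%5D`.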
11:30:46ArchivalEfforts quits [Ping timeout: 256 seconds]
11:30:58ArchivalEfforts joins
11:31:51midou quits [Read error: Connection reset by peer]
11:34:08HP_Archivist quits [Quit: Leaving]
11:35:05<Juest>hexchat processes the url fine
11:38:51Doomaholic quits [Ping timeout: 272 seconds]
11:39:07<alexlehm>: could also be a stop character