| 00:18:42 | | Wohlstand (Wohlstand) joins |
| 00:23:43 | | Wohlstand quits [Ping timeout: 272 seconds] |
| 00:48:06 | <klea> | btw, it'd be neat to have a system to track planned and unplanned outages in systems that AT uses. |
| 00:54:14 | <klea> | /cc DigitalDragons ^ |
| 01:06:23 | | pika joins |
| 01:09:58 | | SootBector quits [Remote host closed the connection] |
| 01:11:04 | | SootBector (SootBector) joins |
| 01:16:55 | | pika quits [Ping timeout: 272 seconds] |
| 01:17:38 | | Suika_ quits [Ping timeout: 256 seconds] |
| 01:24:35 | | pika joins |
| 01:25:41 | | pokechu22 quits [Quit: WeeChat 4.7.1] |
| 01:26:08 | | pokechu22 (pokechu22) joins |
| 01:26:33 | | Suika joins |
| 01:29:35 | | pika quits [Ping timeout: 272 seconds] |
| 01:37:50 | | pika joins |
| 01:42:53 | | pika quits [Ping timeout: 272 seconds] |
| 01:49:15 | | Sk1d quits [Read error: Connection reset by peer] |
| 01:51:30 | | pika joins |
| 01:51:51 | | pika leaves |
| 01:59:47 | | cyan_box joins |
| 02:04:06 | | cyanbox_ quits [Ping timeout: 256 seconds] |
| 02:36:51 | | MrMcNuggets (MrMcNuggets) joins |
| 03:06:39 | | etnguyen03 (etnguyen03) joins |
| 03:18:10 | | etnguyen03 quits [Remote host closed the connection] |
| 03:19:47 | | etnguyen03 (etnguyen03) joins |
| 03:21:38 | | etnguyen03 quits [Remote host closed the connection] |
| 03:22:47 | | etnguyen03 (etnguyen03) joins |
| 03:38:21 | | etnguyen03 quits [Client Quit] |
| 03:46:06 | | CYBERDEV quits [Quit: Leaving] |
| 03:46:16 | | etnguyen03 (etnguyen03) joins |
| 03:54:55 | | CYBERDEV joins |
| 03:59:21 | | etnguyen03 quits [Remote host closed the connection] |
| 04:27:37 | | chrismeller3 quits [Quit: chrismeller3] |
| 05:02:38 | | Webuser290291 joins |
| 05:02:55 | | Webuser290291 quits [Client Quit] |
| 05:04:14 | | wotd quits [Remote host closed the connection] |
| 05:04:18 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:04:47 | | wotd joins |
| 05:07:05 | | n9nes joins |
| 05:14:25 | | chunkynutz60 quits [Ping timeout: 272 seconds] |
| 05:17:20 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
| 05:18:00 | | ThetaDev joins |
| 05:22:08 | <HP_Archivist> | I just noticed this: https://www.reddit.com/r/Archivists/comments/1q9n5nt/nara_is_shutting_down_history_hub_for_citizen/ |
| 05:22:26 | <HP_Archivist> | https://historyhub.history.gov/citizen_archivists/f/discussions |
| 05:22:29 | <HP_Archivist> | Can we grab it? |
| 05:23:12 | <pokechu22> | "On January 15, 2026 the History Hub site will be “frozen in time.” The site will remain available for reference until February 13, 2026." |
| 05:23:57 | <HP_Archivist> | Does that mean it will be frozen in time and online or? |
| 05:24:16 | | Webuser025436 joins |
| 05:24:18 | <pokechu22> | Presumably that means they made it read-only yesterday and it'll be online for a month before they close it fully |
| 05:24:19 | <nicolas17> | sounds like it will be frozen and online between jan 15 and feb 13, and offline after feb 13 |
| 05:24:29 | <pokechu22> | I'm seeing incapsula on it |
| 05:24:34 | <@arkiver> | i've been working on a job in AB |
| 05:24:34 | <nicolas17> | pls add to deathwatch |
| 05:24:38 | <@arkiver> | but not sure if it went well |
| 05:24:57 | <pokechu22> | It didn't work |
| 05:26:00 | <HP_Archivist> | Hm. Throw it into AB now? |
| 05:26:13 | <pokechu22> | It seems like the incapsula JS challenge would need to be solved, and I don't know how long that lasts |
| 05:26:48 | <Webuser025436> | Hi. Ming Pao Canada, an HK-based newspaper that has a Canada edition and daily newspaper, announced they are shutting down as of today. [1] Notably, they have an archive of all articles from Ming Pao Hong Kong (the HK edition). All of Ming Pao HK's pre-2021 articles were removed from the internet a long time ago because of the media situation in HK. |
| 05:26:49 | <Webuser025436> | Is it possible to archive? https://www.mingpaocanada.com/ |
| 05:26:49 | <Webuser025436> | [1]: https://www.cp24.com/news/canada/2026/01/13/ex-journalists-lament-closure-of-ming-pao-canadas-last-chinese-language-daily-paper/ |
| 05:28:13 | <h2ibot> | Pokechu22 edited Deathwatch (+237, /* 2026-02 */ https://historyhub.history.gov/): https://wiki.archiveteam.org/?diff=60210&oldid=60209 |
| 05:28:30 | <pokechu22> | Webuser025436: I believe we've already started an archivebot job for that; I'm going to double-check the status of it |
| 05:29:59 | <pokechu22> | Webuser025436: It's currently running, together with http://mingshengbao.com/ - see http://archivebot.com/?initialFilter=mingpaocanada#log-container-6cnonr9bi6enxit9na57wckan |
| 05:30:55 | <Webuser025436> | Many thanks 🙏 |
| 05:31:15 | <pokechu22> | Where is the archive of the pre-2021 Hong Kong articles? I can't read Chinese so I'd like to make sure we're saving that too |
| 05:32:54 | <nicolas17> | pokechu22: wonder if we should speed up that job |
| 05:33:28 | <pokechu22> | It closes at the end of January; today was the date of the last edition being published |
| 05:33:44 | <pokechu22> | (at least according to https://www.cp24.com/news/canada/2026/01/13/ex-journalists-lament-closure-of-ming-pao-canadas-last-chinese-language-daily-paper/) |
| 05:37:03 | <Webuser025436> | pokechu22 All HK articles for a given day are shown here: https://www.mingpaocanada.com/tor/htm/News/YYYYMMDD/HK-GAindex_r.htm |
| 05:37:03 | <Webuser025436> | So for example: https://www.mingpaocanada.com/tor/htm/News/20140710/HK-GAindex_r.htm |
| 05:37:03 | <Webuser025436> | The earliest AFAICT is 20140710. |
| 05:38:53 | <pokechu22> | Is there a page that lists all of the previous ones? I assume there must be because archivebot has found https://www.mingpaocanada.com/tor/htm/News/20220429/tam1_r.htm but I don't know exactly where it came from |
| 05:40:12 | <pokechu22> | (there's https://www.mingpaocanada.com/tor/htm/Responsive/archiveList.cfm but that seems to only directly show the last week) |
| 05:40:19 | <nicolas17> | pokechu22: also I'm seeing many requests like https://www.mingpaocanada.com/tor/htm/News/20220815/TD/TD/tdc1.txt that redirect to an error page, might be a crawling glitch finding garbage in JS or something? |
| 05:41:58 | <nicolas17> | ah yes, docPath: "HK-GA/gc/gcc1.txt" |
| 05:41:59 | <pokechu22> | Yeah, looks like that comes from https://www.mingpaocanada.com/tor/htm/News/20220815/HK-gaa1_r.htm containing a POST to /Tor/cfc/popular_addone.cfc with HK-GA/ga/gaa1.txt as a parameter |
| 05:42:44 | <nicolas17> | not sure how to avoid this, excluding *.txt feels too broad |
| 05:43:39 | <pokechu22> | It's probably fine to just leave them as-is since there's 1 per article and most articles have several images as well |
| 05:44:25 | <nicolas17> | well it's also following the redirect and saving errorpage.html every single time |
| 05:45:21 | <pokechu22> | Looks like that's not a new issue: https://web.archive.org/web/20260901000000*/https://www.mingpaocanada.com/errorpage.html :) |
| 05:45:36 | <pokechu22> | ... ok, though 10660 snapshots on January 16 is probably still excessive |
| 05:46:26 | <nicolas17> | pain |
| 05:50:09 | <pokechu22> | I guess I can check what dates it's already found using ab2f |
| 05:55:16 | <nicolas17> | I was thinking something like tor/htm/News/[0-9]{8}/[A-Z]+/[A-Z]+/[a-z]+[0-9]\.txt |
| 05:55:27 | <nicolas17> | but that's not exhaustive, will need a few more patterns |
| 05:56:12 | | sec^nd quits [Remote host closed the connection] |
| 05:56:38 | | sec^nd (second) joins |
| 05:59:31 | | evergreen56 joins |
| 06:02:06 | | evergreen5 quits [Ping timeout: 256 seconds] |
| 06:02:06 | | evergreen56 is now known as evergreen5 |
| 06:04:34 | <pokechu22> | The oldest archivebot has found so far is http://www.mingpaocanada.com/TOR/htm/News/20220319/HK-GAindex_r.htm |
| 06:05:09 | <pokechu22> | JAA: can you trace http://www.mingpaocanada.com/TOR/htm/News/20220319/HK-GAindex_r.htm on 6cnonr9bi6enxit9na57wckan please? |
| 06:06:09 | <Webuser025436> | is the link i provided not good enough above? this link contains outlinks to all hk articles for a given day: https://www.mingpaocanada.com/tor/htm/News/20140710/HK-GAindex_r.htm |
| 06:07:00 | <pokechu22> | It is, but now I'm trying to figure out if archivebot will have already found those or if I need to start the job in a way that will discover those |
| 06:11:25 | <pokechu22> | (there isn't any good way to add urls to an existing archivebot job, but I could start a new one with a list of those pages for all days since 2014 or similar, along with https://www.mingpaocanada.com/van/htm/News/20260116/VAindex_r.htm and https://www.mingpaocanada.com/tor/htm/News/20260116/TAindex_r.htm) |
| 06:14:47 | | chunkynutz60 joins |
| 06:17:24 | | nexussfan quits [Quit: Konversation terminated!] |
| 07:32:26 | | chrismeller3 (chrismeller) joins |
| 09:00:02 | | midou quits [Ping timeout: 256 seconds] |
| 09:20:45 | | midou joins |
| 09:27:48 | | midou quits [Ping timeout: 256 seconds] |
| 09:41:56 | | midou joins |
| 09:46:45 | | midou quits [Ping timeout: 272 seconds] |
| 09:52:45 | | midou joins |
| 10:23:29 | | midou quits [Ping timeout: 272 seconds] |
| 10:43:05 | <h2ibot> | KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated |
| 10:43:34 | <klea> | mhmm |
| 10:44:14 | <klea> | oh i love that my terminal thinks the url is shorter. |
| 10:44:37 | <klea> | so my browser opened <https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters> instead |
| 10:45:30 | | Dada joins |
| 11:14:24 | | midou joins |
| 11:20:00 | <alexlehm> | the same happens in my irc client, it does not consider [] as a valid url character |
| 11:22:15 | <klea> | time to urlencode it, or wrap it in <> |
| 11:22:25 | <klea> | alexlehm: how does <https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated> open? |
| 11:23:28 | <alexlehm> | "https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters" |
| 11:24:14 | <klea> | alexlehm: and KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters%5B%5D=nsInvert&wpfilters%5B%5D=associated |
| 11:24:42 | <alexlehm> | it would probably work with https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260117104222&limit=2&namespace=2&wpfilters%5B%5D=nsInvert&wpfilters%5B%5D=associated |
| 11:30:46 | | ArchivalEfforts quits [Ping timeout: 256 seconds] |
| 11:30:58 | | ArchivalEfforts joins |
| 11:31:51 | | midou quits [Read error: Connection reset by peer] |
| 11:34:08 | | HP_Archivist quits [Quit: Leaving] |
| 11:35:05 | <Juest> | hexchat processes the url fine |
| 11:38:51 | | Doomaholic quits [Ping timeout: 272 seconds] |
| 11:39:07 | <alexlehm> | : could also be a stop character |
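
The URL truncation discussed above (clients stopping at `[` and treating `wpfilters[]` as the end of the link) can be avoided by percent-encoding the brackets before the link is posted. The following is a minimal sketch in Python, not the code h2ibot or KleaBot actually uses; the function name `encode_query_brackets` and the choice of which characters to leave unescaped are assumptions made for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_query_brackets(url: str) -> str:
    """Percent-encode characters such as '[' and ']' in the query string.

    Hypothetical helper: only the query component is rewritten, so the
    scheme, host, path and fragment are passed through untouched.
    """
    parts = urlsplit(url)
    # '%' stays in the safe set so sequences that are already encoded
    # (e.g. %5B) are not encoded a second time. '=', '&', ':' and '/'
    # stay readable because they are common in MediaWiki query strings.
    safe_query = quote(parts.query, safe="=&%:/+,;@")
    return urlunsplit(parts._replace(query=safe_query))


raw = (
    "https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot"
    "&offset=20260117104222&limit=2&namespace=2"
    "&wpfilters[]=nsInvert&wpfilters[]=associated"
)
print(encode_query_brackets(raw))
```

With the brackets written as `%5B%5D`, the result has the same form as the encoded link klea pasted at 11:24:14. Whether a given IRC client then linkifies the whole URL still depends on that client; the remark that `:` could also be a stop character suggests some clients may need further characters escaped.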