| 00:27:00 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 00:41:58 | | Lord_Nightmare (Lord_Nightmare) joins |
| 00:44:19 | | blankie (blankie) joins |
| 00:53:40 | | Mineroboter_ joins |
| 00:54:35 | | Mineroboter quits [Ping timeout: 258 seconds] |
| 01:10:16 | | HP_Archivist (HP_Archivist) joins |
| 01:14:00 | | HP_Archivist quits [Client Quit] |
| 01:30:38 | | Lord_Nightmare quits [Client Quit] |
| 01:34:21 | | Lord_Nightmare (Lord_Nightmare) joins |
| 01:49:50 | | blankie quits [Remote host closed the connection] |
| 02:04:49 | | Lord_Nightmare quits [Client Quit] |
| 02:06:40 | | blankie joins |
| 02:06:40 | | blankie is now authenticated as blankie |
| 02:06:40 | | blankie quits [Changing host] |
| 02:06:40 | | blankie (blankie) joins |
| 02:07:09 | | Lord_Nightmare (Lord_Nightmare) joins |
| 02:34:20 | | Stilett0 quits [Ping timeout: 250 seconds] |
| 03:04:00 | | HP_Archivist (HP_Archivist) joins |
| 03:20:11 | | DogsRNice quits [Read error: Connection reset by peer] |
| 03:28:37 | | howardad joins |
| 03:43:26 | | blankie quits [Client Quit] |
| 03:44:08 | | blankie (blankie) joins |
| 03:47:06 | | qw3rty__ joins |
| 03:50:55 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 03:52:15 | | howardad is now authenticated as howardad |
| 04:02:06 | | etnguyen03 quits [Client Quit] |
| 04:19:05 | | adam (howardad) joins |
| 04:22:21 | | adam quits [Client Quit] |
| 04:25:04 | | howardad quits [Remote host closed the connection] |
| 04:26:06 | | howardad (howardad) joins |
| 04:28:23 | | BlueMaxima_ joins |
| 04:32:12 | | BlueMaxima quits [Ping timeout: 250 seconds] |
| 04:37:43 | | blankie quits [Remote host closed the connection] |
| 04:38:04 | | Ruthalas quits [Ping timeout: 258 seconds] |
| 04:38:14 | | blankie joins |
| 04:38:14 | | blankie is now authenticated as blankie |
| 04:38:14 | | blankie quits [Changing host] |
| 04:38:14 | | blankie (blankie) joins |
| 04:43:34 | | Krownest1 (Krownest) joins |
| 04:47:16 | | Krownest quits [Ping timeout: 258 seconds] |
| 04:51:31 | | Ruthalas (Ruthalas) joins |
| 05:04:20 | | lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
| 05:07:37 | | lennier1 (lennier1) joins |
| 05:29:01 | | Krownest1 is now known as Krownest |
| 05:58:10 | | ragu joins |
| 06:03:22 | | blank_x joins |
| 06:03:24 | | blank_x quits [Client Quit] |
| 06:03:33 | | blankie quits [Client Quit] |
| 06:04:00 | | blankie (blankie) joins |
| 06:40:25 | | blankie quits [Remote host closed the connection] |
| 06:49:41 | | blankie joins |
| 06:49:41 | | blankie is now authenticated as blankie |
| 06:49:41 | | blankie quits [Changing host] |
| 06:49:41 | | blankie (blankie) joins |
| 07:14:28 | | ragu quits [Ping timeout: 258 seconds] |
| 07:17:19 | | LeighR (LeighR) joins |
| 07:50:48 | | Arcorann (Arcorann) joins |
| 08:00:19 | | sliccricc_ quits [Remote host closed the connection] |
| 08:00:39 | | sliccricc_ (sliccricc) joins |
| 08:12:14 | | nertzy_ joins |
| 08:13:53 | | nertzy__ quits [Ping timeout: 258 seconds] |
| 08:20:43 | <avoozl> | euhm.. is there a simple command line tool to decompress these .warc.zst files with embedded dictionaries? I know I can read it in code, but I miss a simple cli tool to go through the content |
| 08:23:58 | | BlueMaxima__ joins |
| 08:28:04 | | BlueMaxima_ quits [Ping timeout: 258 seconds] |
| 08:37:01 | | hooway joins |
| 08:39:30 | | hooway_ joins |
| 08:39:31 | | hooway quits [Read error: Connection reset by peer] |
| 08:44:41 | | ragu joins |
| 09:05:29 | | sonick quits [Ping timeout: 244 seconds] |
| 09:25:05 | | Ruthalas quits [Client Quit] |
| 09:25:27 | | Ruthalas (Ruthalas) joins |
| 09:29:38 | | sonick joins |
| 09:38:19 | | NF885 (NF885) joins |
| 09:46:02 | | derleth joins |
| 09:46:47 | <derleth> | hello. |
| 09:46:53 | <Sanqui> | hi |
| 09:47:06 | <derleth> | How do I propose a site to get saved? |
| 09:47:14 | <Sanqui> | you propose |
| 09:47:17 | <Sanqui> | go ahead and propose |
| 09:47:26 | <derleth> | OK, just a second. |
| 09:47:37 | <derleth> | https://www.hoax-slayer.net/notice-hoax-slayer-is-closing-down/ |
| 09:47:52 | <derleth> | Hoax-Slayer, a debunking website, is going away at the end of the month. |
| 09:47:53 | <Sanqui> | looks like a blog basically |
| 09:48:07 | <Sanqui> | we can handle it with archivebot |
| 09:49:13 | <derleth> | It's also this site, https://www.hoax-slayer.com/ |
| 09:49:25 | <derleth> | I think that's where a lot of the debunkings still are. |
| 09:50:01 | <Sanqui> | ah ok, we can do both domains |
| 09:50:12 | <derleth> | Great! |
| 09:50:26 | <Sanqui> | it's in progress -- if you're interested you can watch it at http://dashboard.at.ninjawedding.org/3 -- if not stand back and relax, it'll make its way into the wayback machine in a few days |
| 09:50:44 | <derleth> | Wonderful. That was a lot easier than i thought. Hooray for scripts. |
| 09:50:53 | <derleth> | Thank you. |
| 09:52:16 | <Sanqui> | thanks for bringing it to our attention! |
| 09:52:42 | | derleth quits [Client Quit] |
| 10:03:04 | | hilda quits [Read error: Connection reset by peer] |
| 10:06:46 | | hilda joins |
| 10:43:34 | | PlsNoJava quits [Ping timeout: 250 seconds] |
| 10:52:51 | | PlsNoJava (ROpdebee) joins |
| 11:02:07 | | BlueMaxima__ quits [Client Quit] |
| 11:06:39 | | yarrow (yarrow) joins |
| 11:07:10 | <yarrow> | what's the difference between bs and ot? |
| 11:07:27 | | tixma (tixma) joins |
| 11:08:46 | <@OrIdow6> | See topic |
| 11:08:52 | <Sanqui> | #archiveteam is announcements, -bs is on-topic discussion, -ot is off-topic discussion |
| 11:08:59 | <Sanqui> | I agree the nomenclature isn't exactly straightforward |
| 11:09:09 | | VukkyWork (VukkyWork) joins |
| 11:26:02 | <lunik1> | ddg sure does a lot of work to get search results that still aren't very good |
| 11:29:56 | <LeighR> | what's the correct channel for discussing grab-site? |
| 11:34:39 | | VukkyWork quits [Remote host closed the connection] |
| 11:39:28 | | hooway joins |
| 11:39:28 | | hooway_ quits [Read error: Connection reset by peer] |
| 11:40:55 | <@EggplantN> | probably here or -dev LeighR |
| 11:41:02 | <LeighR> | ok |
| 11:46:18 | | sonick72 joins |
| 11:47:12 | | sonick quits [Ping timeout: 244 seconds] |
| 11:51:20 | | Zopolis4 quits [Ping timeout: 244 seconds] |
| 11:52:08 | | nertzy__ joins |
| 11:54:41 | | nertzy_ quits [Ping timeout: 258 seconds] |
| 12:22:40 | <Jake> | https://techcrunch.com/2021/05/03/private-equity-firm-apollo-agrees-to-buy-verizon-media-assets-for-5-billion/ (sorry if this has already been mentioned, but I didn't see it!) |
| 12:29:43 | <billy549> | mentioned it in #noanswers but it should also be mentioned here |
| 12:43:49 | | sonick72 is now known as sonick |
| 12:49:26 | | etnguyen03 (etnguyen03) joins |
| 13:15:02 | | LeighR quits [Ping timeout: 244 seconds] |
| 13:21:30 | | sonick is now authenticated as sonick |
| 13:22:42 | | sonick_ (sonick) joins |
| 13:23:17 | | sonick leaves |
| 13:25:43 | | sonick_ is now known as sonick |
| 13:39:40 | | Mateon1 quits [Remote host closed the connection] |
| 13:39:56 | | Mateon1 joins |
| 14:00:30 | | Jonboy3451 quits [Read error: Connection reset by peer] |
| 14:03:42 | <@arkiver> | Jake: what a mess |
| 14:09:23 | <Jake> | yup. quite the mess. |
| 14:18:00 | | Jonboy345 joins |
| 14:31:23 | | Starholme quits [Remote host closed the connection] |
| 14:33:31 | <thuban> | JAA: thanks, that sure explains why turning it up didn't seem to make it go faster :) |
| 14:34:49 | | endrift quits [Quit: +++CARRIER LOST+++] |
| 14:34:57 | | endrift joins |
| 14:35:36 | <thuban> | fortunately the data's not gone yet and i think it'll finish today, or even this morning |
| 14:42:46 | | pcr leaves |
| 14:44:38 | | pcr joins |
| 14:49:49 | | Webuser639 joins |
| 14:56:49 | | LeighR (LeighR) joins |
| 15:00:36 | | sliccricc_ quits [Ping timeout: 258 seconds] |
| 15:06:14 | | hooway quits [Read error: Connection reset by peer] |
| 15:06:30 | | hooway joins |
| 15:15:10 | | mutantmnky quits [Ping timeout: 258 seconds] |
| 15:19:28 | | Qub3d (Qub3d) joins |
| 15:20:33 | | Qub3d quits [Client Quit] |
| 15:21:11 | | Qub3d (Qub3d) joins |
| 15:24:52 | | mutantmnky (mutantmonkey) joins |
| 15:25:08 | | Arcorann quits [Ping timeout: 258 seconds] |
| 15:27:14 | | rsn joins |
| 15:29:25 | | second (second) joins |
| 15:30:55 | | NF885 quits [Ping timeout: 244 seconds] |
| 15:31:13 | | sec^nd quits [Ping timeout: 255 seconds] |
| 15:31:14 | | second is now known as sec^nd |
| 15:43:17 | | aphitex22 quits [Quit: The Lounge - https://thelounge.chat] |
| 15:43:44 | | aphitex22 joins |
| 15:57:52 | | NF885 (NF885) joins |
| 15:59:04 | | blankie quits [Remote host closed the connection] |
| 16:00:32 | | Vukky (Vukky) joins |
| 16:30:41 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 16:39:51 | | Qub3d quits [Client Quit] |
| 16:44:48 | | tixma quits [Ping timeout: 244 seconds] |
| 16:55:08 | | NF885 quits [Ping timeout: 244 seconds] |
| 17:02:05 | | Krownest1 (Krownest) joins |
| 17:02:07 | | HP_Archivist (HP_Archivist) joins |
| 17:05:11 | | Krownest quits [Ping timeout: 258 seconds] |
| 17:06:44 | | wickedplayer494 quits [Remote host closed the connection] |
| 17:13:14 | | wickedplayer494 joins |
| 17:13:31 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 17:36:34 | | cmlow quits [Quit: Connection closed for inactivity] |
| 17:36:35 | | Krownest1 is now known as Krownest |
| 17:37:54 | | serx joins |
| 18:01:14 | | Ella joins |
| 18:01:35 | | Ruthalas5 (Ruthalas) joins |
| 18:03:50 | | Ruthalas quits [Ping timeout: 258 seconds] |
| 18:03:50 | | Ruthalas5 is now known as Ruthalas |
| 18:33:05 | | marked joins |
| 18:50:21 | | Webuser639 quits [Ping timeout: 244 seconds] |
| 18:58:40 | <jodizzle> | serx: (from #archiveteam) I threw http://meristation.as.com/zonaforo/ into AB to get something at least, but I agree that AB will not be able to get everything by the deadline. |
| 19:00:52 | <jodizzle> | This would need a different solution to get it all |
| 19:05:25 | | Webuser639 joins |
| 19:08:08 | | tixma joins |
| 19:08:19 | | tixma is now authenticated as tixma |
| 19:23:16 | | DogsRNice (Webuser299) joins |
| 19:32:46 | | Qub3d (Qub3d) joins |
| 19:33:03 | | Qub3d quits [Client Quit] |
| 19:39:38 | <@JAA> | Oh boy. |
| 19:47:15 | | Inhonion joins |
| 19:48:22 | <jodizzle> | yep |
| 19:50:10 | <@JAA> | Hold my beer. |
| 19:57:35 | <LeighR> | You do have a bunch of people who would love to be going fuller-throttle on Y!A... |
| 19:59:30 | <@HCross> | if Yahoo would let us |
| 20:03:17 | <@HCross> | JAA: please no |
| 20:03:27 | <@HCross> | please no qwarc on y!a |
| 20:04:09 | <LeighR> | I mean, people who are probably willing to redirect that energy to whatever JAA comes up with |
| 20:04:33 | <LeighR> | for meristation |
| 20:05:48 | <@JAA> | HCross: I'm looking into MeriStation, not Y!A. |
| 20:05:53 | <@HCross> | ahh |
| 20:06:41 | <LeighR> | I am willing to look closely at logs and tell what I see |
| 20:06:43 | <@JAA> | Have it working but seems very slow. Like 5+ second response time per page. I'll look a bit closer later. |
| 20:41:47 | | lennier1 quits [Client Quit] |
| 20:42:25 | | lennier1 (lennier1) joins |
| 21:34:27 | <Jake> | https://twitter.com/olesovhcom/status/1389330380411523076 Hopefully this one won't catch on fire! |
| 21:40:54 | | nuroten joins |
| 21:45:07 | | serx quits [Remote host closed the connection] |
| 21:48:12 | <nuroten> | hi, I would like to suggest a podcast to consider for archival please, if this is the right channel for suggestions. it's called Hong Kong Connection, a weekly video documentary series by Radio Television Hong Kong (RTHK), a government news outlet. https://podcast.rthk.hk/podcast/item.php?pid=280&lang=en-US its archival extends back to 2010, but |
| 21:48:12 | <nuroten> | there was an announcement today that the agency is moving to remove videos over 1 year old from their site, to align with their Youtube presence |
| 21:49:47 | <nuroten> | that podcast is one of several programmes on the site, but it is arguably one of the more iconic series with historical value, as it has footage of political protests from 2019, etc. |
| 21:51:26 | | tixma quits [Remote host closed the connection] |
| 21:52:02 | <nuroten> | news article about potential removal: https://hongkongfp.com/2021/05/03/hong-kong-broadcaster-rthk-to-delete-shows-over-a-year-old-from-internet-as-viewers-scramble-to-save-backups/ |
| 21:56:58 | <nuroten> | the show has also been subject to episodes being cut recently due to its coverage of issues perceived as sensitive by new leadership at the broadcaster |
| 21:59:36 | <nuroten> | the Chinese version of the series has more videos: https://podcast.rthk.hk/podcast/item.php?pid=244&lang=zh-CN xml feeds with mp4 links: https://podcast.rthk.hk/podcast/hkconnection_en_i.xml (English) https://podcast.rthk.hk/podcast/hongkongconnection_i.xml (Chinese) |
| 22:02:01 | <nuroten> | thanks for considering |
| 22:10:42 | <thuban> | sounds doable. i'm thinking download all video + upload to ia as one item for each language. opinions? do we want anything from the web presence? (there seems to be very little content other than the videos themselves--they don't, e.g., have individual descriptions.) |
| 22:13:09 | <LeighR> | thuban: the xml feeds seem to provide more details for each video than the main site itself |
| 22:13:09 | <G4te_Keep3r> | https://podcast.rthk.hk/podcast/item.php?pid=280&eid=180380&year=2021&lang=en-US |
| 22:13:18 | <G4te_Keep3r> | click description |
| 22:13:58 | <thuban> | there only seems to be a single description for the entire podcast |
| 22:14:30 | <nuroten> | thanks very much! yeah, the main content are the videos themselves, the site is mainly for format context (i.e. it once lived on this page and it had a dropdown menu to select year) |
| 22:16:33 | <G4te_Keep3r> | it was loading blank on 1 video and that on other, had only looked at 2 till your reply. Yea that makes it easier to just need video and title |
| 22:16:59 | <LeighR> | I'd say that right now, any old news programs in HK are in danger of disappearing |
| 22:17:29 | <thuban> | LeighR: thanks, for some reason ff's rss viewer doesn't seem to want to print the details |
| 22:18:07 | <thuban> | that said, a lot of them seem to be cut off; if there's an authoritative source they're being drawn from i'm not sure where it is |
| 22:18:11 | <LeighR> | in the English version, some of the details are stuck inside [CDATA] custom tags |
| 22:19:53 | <nuroten> | LeighR: indeed, especially given the new leadership and the recent report about the agency needing "reform" |
| 22:20:07 | <G4te_Keep3r> | wait, clicking "this episode" right under video does give diff descriptions like this short one. Are the descripts in the feed? |
| 22:20:09 | <G4te_Keep3r> | Back |
| 22:20:09 | <G4te_Keep3r> | After the Lockdown |
| 22:20:09 | <G4te_Keep3r> | 2021-04-18 |
| 22:20:09 | <G4te_Keep3r> | The government imposed lockdown in four streets in the Jordan district in late January and conducted mandatory Covid-19 tests on the residents. This episode records this unprecedented operation. |
| 22:21:20 | <thuban> | ooh |
| 22:21:28 | <LeighR> | yes - under itunes:subtitle and itunes:summary, stuck inside [CDATA ] tags |
| 22:21:32 | <thuban> | those are the cut-off descriptions in the feed |
| 22:22:04 | <LeighR> | thuban: you're right |
| 22:22:43 | <G4te_Keep3r> | so need web for full ones or they are all cut off? |
| 22:22:57 | <thuban> | need web |
| 22:23:24 | <thuban> | they seem to be in the source rather than xhr'd from an api, but that's ok |
| 22:23:29 | <thuban> | i can pull those |
| 22:24:22 | <LeighR> | is there anything I can do to help? stuff like trying to grab all of Y!A is fun, but this is actually important. |
| 22:29:30 | | LeighR quits [Client Quit] |
| 22:49:26 | | Webuser639 quits [Remote host closed the connection] |
| 23:01:18 | | Wayward- quits [Ping timeout: 258 seconds] |
| 23:02:49 | <nuroten> | the back episodes of Letter to Hong Kong might be worth saving: https://podcast.rthk.hk/podcast/item.php?pid=162&lang=en-US it's an audio podcast of short audio clips, public figures (politicians from different parties, current affairs critics, academics, etc.) reading a letter addressed to the people. for example, this one is from a former |
| 23:02:49 | <nuroten> | pan-democratic legislator who is now in prison commenting on the events leading up to his trial: https://podcast.rthk.hk/podcast/item.php?pid=162&eid=171170&year=2020&list=1&lang=en-US |
| 23:03:45 | | hooway quits [Client Quit] |
| 23:08:00 | | BlueMaxima joins |
| 23:16:05 | <nuroten> | (background: he was convicted on charges along the lines of obstruction of justice for attempting to reason with a group that rushed into a subway station and started attacking commuters. he got injured while trying to help people on the train. the incident was all over the local news with questions over the official narrative of the incident, and |
| 23:16:06 | <nuroten> | he was later arrested.) |
| 23:18:20 | | Webuser639 joins |
| 23:22:22 | | nuroten quits [Remote host closed the connection] |
| 23:28:17 | | HP_Archivist quits [Client Quit] |
| 23:35:00 | | nuroten joins |
| 23:35:08 | <@OrIdow6> | arkiver: So I had an idea for how to save Aimix-BBS ("shut down" a few days ago, still up), based on the point that it blocks seemingly everyone but typically takes a few hundred requests to do it |
| 23:35:15 | <@OrIdow6> | I thought it might work to have items be single URLs, and then have the warriors queue on-targeted extracted URLs to backfeed (there should be a few million at most, not a big site) instead of running them themselves |
| 23:35:17 | <@OrIdow6> | This would basically turn it into a big, distributed ArchiveBot, where you have to make a project update instead of give a command in IRC to add or remove an ignore |
| 23:36:51 | <thuban> | nuroten: ok, i'm currently pulling thumbnails + videos + descriptions and formatting metadata in a way IA will understand |
| 23:37:10 | <thuban> | downloading will take a while, but everything will be set to upload once it's finished! |
| 23:37:24 | <nuroten> | thuban: thanks so much! :) |
| 23:37:41 | <thuban> | no problem! |
| 23:37:47 | <thuban> | someone remind me to update this every six months or something |
| 23:38:09 | <thuban> | i'll take a look at _letter to hong kong_ later |
| 23:39:12 | <nuroten> | if I had to choose only one thing to be saved from the RTHK website, that would be it (HK Connection). it's one of the most recognised local shows. |
| 23:40:09 | <thuban> | i was going to suggest earlier that if they also delete old material from their youtube--https://www.youtube.com/channel/UC6of7UYhctnYmqABjUqzuxw--that might be worth looking into as well, but it seems like a _lot_ of content. did our news project ever extend to video? |
| 23:41:53 | <nuroten> | thanks! anything you can save from either would be great, the announcement seems to apply to anything in their archive older than 1 year, but if I had to guess, current affairs shows would be on the main chopping board |
| 23:47:51 | | hupool joins |
| 23:55:42 | <@arkiver> | OrIdow6: sounds good to me |
| 23:56:44 | <@arkiver> | but |
| 23:57:06 | <@arkiver> | JAA: didn't we have some setup with a ton of IPs that could be used for those purposes? |
| 23:57:12 | <@arkiver> | OrIdow6: how permanent are the band |