00:27:00Lord_Nightmare quits [Quit: ZNC - http://znc.in]
00:41:58Lord_Nightmare (Lord_Nightmare) joins
00:44:19blankie (blankie) joins
00:53:40Mineroboter_ joins
00:54:35Mineroboter quits [Ping timeout: 258 seconds]
01:10:16HP_Archivist (HP_Archivist) joins
01:14:00HP_Archivist quits [Client Quit]
01:30:38Lord_Nightmare quits [Client Quit]
01:34:21Lord_Nightmare (Lord_Nightmare) joins
01:49:50blankie quits [Remote host closed the connection]
02:04:49Lord_Nightmare quits [Client Quit]
02:06:40blankie joins
02:06:40blankie quits [Changing host]
02:06:40blankie (blankie) joins
02:07:09Lord_Nightmare (Lord_Nightmare) joins
02:34:20Stilett0 quits [Ping timeout: 250 seconds]
03:04:00HP_Archivist (HP_Archivist) joins
03:20:11DogsRNice quits [Read error: Connection reset by peer]
03:28:37howardad joins
03:43:26blankie quits [Client Quit]
03:44:08blankie (blankie) joins
03:47:06qw3rty__ joins
03:50:55qw3rty_ quits [Ping timeout: 258 seconds]
04:02:06etnguyen03 quits [Client Quit]
04:19:05adam (howardad) joins
04:22:21adam quits [Client Quit]
04:25:04howardad quits [Remote host closed the connection]
04:26:06howardad (howardad) joins
04:28:23BlueMaxima_ joins
04:32:12BlueMaxima quits [Ping timeout: 250 seconds]
04:37:43blankie quits [Remote host closed the connection]
04:38:04Ruthalas quits [Ping timeout: 258 seconds]
04:38:14blankie joins
04:38:14blankie quits [Changing host]
04:38:14blankie (blankie) joins
04:43:34Krownest1 (Krownest) joins
04:47:16Krownest quits [Ping timeout: 258 seconds]
04:51:31Ruthalas (Ruthalas) joins
05:04:20lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)]
05:07:37lennier1 (lennier1) joins
05:29:01Krownest1 is now known as Krownest
05:58:10ragu joins
06:03:22blank_x joins
06:03:24blank_x quits [Client Quit]
06:03:33blankie quits [Client Quit]
06:04:00blankie (blankie) joins
06:40:25blankie quits [Remote host closed the connection]
06:49:41blankie joins
06:49:41blankie quits [Changing host]
06:49:41blankie (blankie) joins
07:14:28ragu quits [Ping timeout: 258 seconds]
07:17:19LeighR (LeighR) joins
07:50:48Arcorann (Arcorann) joins
08:00:19sliccricc_ quits [Remote host closed the connection]
08:00:39sliccricc_ (sliccricc) joins
08:12:14nertzy_ joins
08:13:53nertzy__ quits [Ping timeout: 258 seconds]
08:20:43<avoozl>euhm.. is there a simple command line tool to decompress these .warc.zst files with embedded dictionaries? I know I can read it in code, but I miss a simple cli tool to go through the content
08:23:58BlueMaxima__ joins
08:28:04BlueMaxima_ quits [Ping timeout: 258 seconds]
08:37:01hooway joins
08:39:30hooway_ joins
08:39:31hooway quits [Read error: Connection reset by peer]
08:44:41ragu joins
09:05:29sonick quits [Ping timeout: 244 seconds]
09:25:05Ruthalas quits [Client Quit]
09:25:27Ruthalas (Ruthalas) joins
09:29:38sonick joins
09:38:19NF885 (NF885) joins
09:46:02derleth joins
09:46:47<derleth>hello.
09:46:53<Sanqui>hi
09:47:06<derleth>How do I propose a site to get saved?
09:47:14<Sanqui>you propose
09:47:17<Sanqui>go ahead and propose
09:47:26<derleth>OK, just a second.
09:47:37<derleth>https://www.hoax-slayer.net/notice-hoax-slayer-is-closing-down/
09:47:52<derleth>Hoax-Slayer, a debunking website, is going away at the end of the month.
09:47:53<Sanqui>looks like a blog basically
09:48:07<Sanqui>we can handle it with archivebot
09:49:13<derleth>It's also this site, https://www.hoax-slayer.com/
09:49:25<derleth>I think that's where a lot of the debunkings still are.
09:50:01<Sanqui>ah ok, we can do both domains
09:50:12<derleth>Great!
09:50:26<Sanqui>it's in progress -- if you're interested you can watch it at http://dashboard.at.ninjawedding.org/3 -- if not stand back and relax, it'll make its way into the wayback machine in a few days
09:50:44<derleth>Wonderful. That was a lot easier than I thought. Hooray for scripts.
09:50:53<derleth>Thank you.
09:52:16<Sanqui>thanks for bringing it to our attention!
09:52:42derleth quits [Client Quit]
10:03:04hilda quits [Read error: Connection reset by peer]
10:06:46hilda joins
10:43:34PlsNoJava quits [Ping timeout: 250 seconds]
10:52:51PlsNoJava (ROpdebee) joins
11:02:07BlueMaxima__ quits [Client Quit]
11:06:39yarrow (yarrow) joins
11:07:10<yarrow>what's the difference between bs and ot?
11:07:27tixma (tixma) joins
11:08:46<@OrIdow6>See topic
11:08:52<Sanqui>#archiveteam is announcements, -bs is on-topic discussion, -ot is off-topic discussion
11:08:59<Sanqui>I agree the nomenclature isn't exactly straightforward
11:09:09VukkyWork (VukkyWork) joins
11:26:02<lunik1>ddg sure does a lot of work to get search results that still aren't very good
11:29:56<LeighR>what's the correct channel for discussing grab-site?
11:34:39VukkyWork quits [Remote host closed the connection]
11:39:28hooway joins
11:39:28hooway_ quits [Read error: Connection reset by peer]
11:40:55<@EggplantN>probably here or -dev LeighR
11:41:02<LeighR>ok
11:46:18sonick72 joins
11:47:12sonick quits [Ping timeout: 244 seconds]
11:51:20Zopolis4 quits [Ping timeout: 244 seconds]
11:52:08nertzy__ joins
11:54:41nertzy_ quits [Ping timeout: 258 seconds]
12:22:40<Jake>https://techcrunch.com/2021/05/03/private-equity-firm-apollo-agrees-to-buy-verizon-media-assets-for-5-billion/ (sorry if this has already been mentioned, but I didn't see it!)
12:29:43<billy549>mentioned it in #noanswers but it should also be mentioned here
12:43:49sonick72 is now known as sonick
12:49:26etnguyen03 (etnguyen03) joins
13:15:02LeighR quits [Ping timeout: 244 seconds]
13:22:42sonick_ (sonick) joins
13:23:17sonick leaves
13:25:43sonick_ is now known as sonick
13:39:40Mateon1 quits [Remote host closed the connection]
13:39:56Mateon1 joins
14:00:30Jonboy3451 quits [Read error: Connection reset by peer]
14:03:42<@arkiver>Jake: what a mess
14:09:23<Jake>yup. quite the mess.
14:18:00Jonboy345 joins
14:31:23Starholme quits [Remote host closed the connection]
14:33:31<thuban>JAA: thanks, that sure explains why turning it up didn't seem to make it go faster :)
14:34:49endrift quits [Quit: +++CARRIER LOST+++]
14:34:57endrift joins
14:35:36<thuban>fortunately the data's not gone yet and i think it'll finish today, or even this morning
14:42:46pcr leaves
14:44:38pcr joins
14:49:49Webuser639 joins
14:56:49LeighR (LeighR) joins
15:00:36sliccricc_ quits [Ping timeout: 258 seconds]
15:06:14hooway quits [Read error: Connection reset by peer]
15:06:30hooway joins
15:15:10mutantmnky quits [Ping timeout: 258 seconds]
15:19:28Qub3d (Qub3d) joins
15:20:33Qub3d quits [Client Quit]
15:21:11Qub3d (Qub3d) joins
15:24:52mutantmnky (mutantmonkey) joins
15:25:08Arcorann quits [Ping timeout: 258 seconds]
15:27:14rsn joins
15:29:25second (second) joins
15:30:55NF885 quits [Ping timeout: 244 seconds]
15:31:13sec^nd quits [Ping timeout: 255 seconds]
15:31:14second is now known as sec^nd
15:43:17aphitex22 quits [Quit: The Lounge - https://thelounge.chat]
15:43:44aphitex22 joins
15:57:52NF885 (NF885) joins
15:59:04blankie quits [Remote host closed the connection]
16:00:32Vukky (Vukky) joins
16:30:41HP_Archivist quits [Ping timeout: 258 seconds]
16:39:51Qub3d quits [Client Quit]
16:44:48tixma quits [Ping timeout: 244 seconds]
16:55:08NF885 quits [Ping timeout: 244 seconds]
17:02:05Krownest1 (Krownest) joins
17:02:07HP_Archivist (HP_Archivist) joins
17:05:11Krownest quits [Ping timeout: 258 seconds]
17:06:44wickedplayer494 quits [Remote host closed the connection]
17:13:14wickedplayer494 joins
17:36:34cmlow quits [Quit: Connection closed for inactivity]
17:36:35Krownest1 is now known as Krownest
17:37:54serx joins
18:01:14Ella joins
18:01:35Ruthalas5 (Ruthalas) joins
18:03:50Ruthalas quits [Ping timeout: 258 seconds]
18:03:50Ruthalas5 is now known as Ruthalas
18:33:05marked joins
18:50:21Webuser639 quits [Ping timeout: 244 seconds]
18:58:40<jodizzle>serx: (from #archiveteam) I threw http://meristation.as.com/zonaforo/ into AB to get something at least, but I agree that AB will not be able to get everything by the deadline.
19:00:52<jodizzle>This would need a different solution to get it all
19:05:25Webuser639 joins
19:08:08tixma joins
19:23:16DogsRNice (Webuser299) joins
19:32:46Qub3d (Qub3d) joins
19:33:03Qub3d quits [Client Quit]
19:39:38<@JAA>Oh boy.
19:47:15Inhonion joins
19:48:22<jodizzle>yep
19:50:10<@JAA>Hold my beer.
19:57:35<LeighR>You do have a bunch of people who would love to be going fuller-throttle on Y!A...
19:59:30<@HCross>if Yahoo would let us
20:03:17<@HCross>JAA: please no
20:03:27<@HCross>please no qwarc on y!a
20:04:09<LeighR>I mean, people who are probably willing to redirect that energy to whatever JAA comes up with
20:04:33<LeighR>for meristation
20:05:48<@JAA>HCross: I'm looking into MeriStation, not Y!A.
20:05:53<@HCross>ahh
20:06:41<LeighR>I am willing to look closely at logs and tell what I see
20:06:43<@JAA>Have it working but seems very slow. Like 5+ second response time per page. I'll look a bit closer later.
20:41:47lennier1 quits [Client Quit]
20:42:25lennier1 (lennier1) joins
21:34:27<Jake>https://twitter.com/olesovhcom/status/1389330380411523076 Hopefully this one won't catch on fire!
21:40:54nuroten joins
21:45:07serx quits [Remote host closed the connection]
21:48:12<nuroten>hi, I would like to suggest a podcast to consider for archival please, if this is the right channel for suggestions. it's called Hong Kong Connection, a weekly video documentary series by Radio Television Hong Kong (RTHK), a government news outlet. https://podcast.rthk.hk/podcast/item.php?pid=280&lang=en-US its archival extends back to 2010, but
21:48:12<nuroten>there was an announcement today that the agency is moving to remove videos over 1 year old from their site, to align with their Youtube presence
21:49:47<nuroten>that podcast is one of several programmes on the site, but it is arguably one of the more iconic series with historical value, as it has footage of political protests from 2019, etc.
21:51:26tixma quits [Remote host closed the connection]
21:52:02<nuroten>news article about potential removal: https://hongkongfp.com/2021/05/03/hong-kong-broadcaster-rthk-to-delete-shows-over-a-year-old-from-internet-as-viewers-scramble-to-save-backups/
21:56:58<nuroten>the show has also been subject to episodes being cut recently due to its coverage of issues perceived as sensitive by new leadership at the broadcaster
21:59:36<nuroten>the Chinese version of the series has more videos: https://podcast.rthk.hk/podcast/item.php?pid=244&lang=zh-CN xml feeds with mp4 links: https://podcast.rthk.hk/podcast/hkconnection_en_i.xml (English) https://podcast.rthk.hk/podcast/hongkongconnection_i.xml (Chinese)
22:02:01<nuroten>thanks for considering
22:10:42<thuban>sounds doable. i'm thinking download all video + upload to ia as one item for each language. opinions? do we want anything from the web presence? (there seems to be very little content other than the videos themselves--they don't, e.g., have individual descriptions.)
22:13:09<LeighR>thuban: the xml feeds seem to provide more details for each video than the main site itself
22:13:09<G4te_Keep3r>https://podcast.rthk.hk/podcast/item.php?pid=280&eid=180380&year=2021&lang=en-US
22:13:18<G4te_Keep3r>click description
22:13:58<thuban>there only seems to be a single description for the entire podcast
22:14:30<nuroten>thanks very much! yeah, the main content are the videos themselves, the site is mainly for format context (i.e. it once lived on this page and it had a dropdown menu to select year)
22:16:33<G4te_Keep3r>it was loading blank on 1 video and that on other, had only looked at 2 till your reply. Yea that makes it easier to just need video and title
22:16:59<LeighR>I'd say that right now, any old news programs in HK are in danger of disappearing
22:17:29<thuban>LeighR: thanks, for some reason ff's rss viewer doesn't seem to want to print the details
22:18:07<thuban>that said, a lot of them seem to be cut off; if there's an authoritative source they're being drawn from i'm not sure where it is
22:18:11<LeighR>in the English version, some of the details are stuck inside [CDATA] custom tags
22:19:53<nuroten>LeighR: indeed, especially given the new leadership and the recent report about the agency needing "reform"
22:20:07<G4te_Keep3r>wait, clicking "this episode" right under video does give diff descriptions like this short one. Are the descripts in the feed?
22:20:09<G4te_Keep3r>返回
22:20:09<G4te_Keep3r>After the Lockdown
22:20:09<G4te_Keep3r>2021-04-18
22:20:09<G4te_Keep3r>The government imposed lockdown in four streets in the Jordan district in late January and conducted mandatory Covid-19 tests on the residents. This episode records this unprecedented operation.
22:21:20<thuban>ooh
22:21:28<LeighR>yes - under itunes:subtitle and itunes:summary, stuck inside [CDATA ] tags
22:21:32<thuban>those are the cut-off descriptions in the feed
22:22:04<LeighR>thuban: you're right
22:22:43<G4te_Keep3r>so need web for full ones or they are all cut off?
22:22:57<thuban>need web
22:23:24<thuban>they seem to be in the source rather than xhr'd from an api, but that's ok
22:23:29<thuban>i can pull those
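A rough sketch of the feed side of the discussion above: reading each episode's title, date, mp4 enclosure URL, and `itunes:summary` out of a podcast feed. The inline sample feed, the `example.invalid` URL, and the function name `parse_feed` are made up for illustration; the real RTHK feeds may be laid out differently. As noted above, the feed descriptions are cut off, so the full text would still have to come from the web pages.

```python
import xml.etree.ElementTree as ET

# Standard iTunes podcast namespace used for itunes:subtitle / itunes:summary.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical sample feed, NOT copied from the RTHK feeds.
SAMPLE_FEED = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>After the Lockdown</title>
      <pubDate>Sun, 18 Apr 2021 00:00:00 +0800</pubDate>
      <itunes:summary><![CDATA[The government imposed lockdown in four streets...]]></itunes:summary>
      <enclosure url="https://example.invalid/ep1.mp4" type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Return one dict per <item> with title, date, summary and video URL."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # CDATA sections are unwrapped by the parser, so the summary is plain text.
        summary = item.findtext("itunes:summary", default="", namespaces=ITUNES_NS)
        episodes.append({
            "title": (item.findtext("title") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
            "summary": (summary or "").strip(),
            "video_url": enclosure.get("url") if enclosure is not None else None,
        })
    return episodes


if __name__ == "__main__":
    for episode in parse_feed(SAMPLE_FEED):
        print(episode["date"], episode["title"], episode["video_url"])
```

The resulting list of dicts could serve as the basis for a download list and for per-episode metadata, with the summaries replaced by the full descriptions scraped from the site.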
22:24:22<LeighR>is there anything I can do to help? stuff like trying to grab all of Y!A is fun, but this is actually important.
22:29:30LeighR quits [Client Quit]
22:49:26Webuser639 quits [Remote host closed the connection]
23:01:18Wayward- quits [Ping timeout: 258 seconds]
23:02:49<nuroten>the back episodes of Letter to Hong Kong might be worth saving: https://podcast.rthk.hk/podcast/item.php?pid=162&lang=en-US it's an audio podcast of short audio clips, public figures (politicians from different parties, current affairs critics, academics, etc.) reading a letter addressed to the people. for example, this one is from a former
23:02:49<nuroten>pan-democratic legislator who is now in prison commenting on the events leading up to his trial: https://podcast.rthk.hk/podcast/item.php?pid=162&eid=171170&year=2020&list=1&lang=en-US
23:03:45hooway quits [Client Quit]
23:08:00BlueMaxima joins
23:16:05<nuroten>(background: he was convicted on charges along the lines of obstruction of justice for attempting to reason with a group that rushed into a subway station and started attacking commuters. he got injured while trying to help people on the train. the incident was all over the local news with questions over the official narrative of the incident, and
23:16:06<nuroten>he was later arrested.)
23:18:20Webuser639 joins
23:22:22nuroten quits [Remote host closed the connection]
23:28:17HP_Archivist quits [Client Quit]
23:35:00nuroten joins
23:35:08<@OrIdow6>arkiver: So I had an idea for how to save Aimix-BBS ("shut down" a few days ago, still up), based on the point that it blocks seemingly everyone but typically takes a few hundred requests to do it
23:35:15<@OrIdow6>I thought it might work to have items be single URLs, and then have the warriors queue on-target extracted URLs to backfeed (there should be a few million at most, not a big site) instead of running them themselves
23:35:17<@OrIdow6>This would basically turn it into a big, distributed ArchiveBot, where you have to make a project update instead of give a command in IRC to add or remove an ignore
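A toy, single-process sketch of the idea described above, assuming one URL per item: a worker fetches only its item's URL, extracts on-target links, and hands them back to a central queue (the "backfeed") instead of crawling them itself. All names here (`extract_on_target`, `run_project`, the `ignore` patterns, the `bbs.example` host) are hypothetical and are not taken from the actual tracker or warrior code.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

HREF_RE = re.compile(r'href="([^"]+)"')


def extract_on_target(page_url, html, target_host):
    """Return the absolute URLs linked from `html` that stay on `target_host`."""
    found = set()
    for href in HREF_RE.findall(html):
        absolute = urljoin(page_url, href)
        if urlparse(absolute).hostname == target_host:
            found.add(absolute)
    return found


def run_project(seed_url, fetch, target_host, ignore=()):
    """Simulate the tracker loop; `fetch(url)` stands in for a worker's request."""
    queue = deque([seed_url])
    seen = {seed_url}
    done = []
    while queue:
        item = queue.popleft()   # one item == one URL
        html = fetch(item)       # the worker fetches only this URL
        done.append(item)
        for url in sorted(extract_on_target(item, html, target_host)):
            # Ignore patterns live with the project, so changing them
            # means a project update rather than an IRC command.
            if url in seen or any(re.search(p, url) for p in ignore):
                continue
            seen.add(url)
            queue.append(url)    # backfeed: queued as a new single-URL item
    return done
```

Because each item is a single request, a block that only kicks in after a few hundred requests from one client would, under this assumption, be spread across many workers rather than hitting any one of them.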
23:36:51<thuban>nuroten: ok, i'm currently pulling thumbnails + videos + descriptions and formatting metadata in a way IA will understand
23:37:10<thuban>downloading will take a while, but everything will be set to upload once it's finished!
23:37:24<nuroten>thuban: thanks so much! :)
23:37:41<thuban>no problem!
23:37:47<thuban>someone remind me to update this every six months or something
23:38:09<thuban>i'll take a look at _letter to hong kong_ later
23:39:12<nuroten>if I had to choose only one thing to be saved from the RTHK website, that would be it (HK Connection). it's one of the most recognised local shows.
23:40:09<thuban>i was going to suggest earlier that if they also delete old material from their youtube--https://www.youtube.com/channel/UC6of7UYhctnYmqABjUqzuxw--that might be worth looking into as well, but it seems like a _lot_ of content. did our news project ever extend to video?
23:41:53<nuroten>thanks! anything you can save from either would be great, the announcement seems to apply to anything in their archive older than 1 year, but if I had to guess, current affairs shows would be on the main chopping board
23:47:51hupool joins
23:55:42<@arkiver>OrIdow6: sounds good with me
23:56:44<@arkiver>but
23:57:06<@arkiver>JAA: didn't we have some setup with a ton of IPs that could be used for those purposes?
23:57:12<@arkiver>OrIdow6: how permanent are the band