00:00:37lorwp quits [Ping timeout: 258 seconds]
00:00:39dm4v quits [Read error: Connection reset by peer]
00:03:39dm4v joins
00:03:41dm4v quits [Changing host]
00:03:41dm4v (dm4v) joins
00:07:31Arcorann__ joins
00:09:55BlueMaxima joins
00:09:57BlueMaxima_ joins
00:26:27BlueMaxima_ quits [Client Quit]
00:26:27BlueMaxima quits [Client Quit]
00:35:22gazorpazorp quits [Ping timeout: 250 seconds]
00:41:07lorwp (lorwp) joins
00:47:51genofire joins
01:02:48dm4v_ joins
01:04:15dm4v quits [Ping timeout: 258 seconds]
01:04:15dm4v_ is now known as dm4v
01:04:15dm4v quits [Changing host]
01:04:15dm4v (dm4v) joins
01:29:29Arcorann__ quits [Read error: Connection reset by peer]
01:30:13Arcorann__ joins
01:30:23monoxane quits [Quit: Ping timeout (120 seconds)]
01:30:27driib quits [Quit: Ping timeout (120 seconds)]
01:30:34monoxane (monoxane) joins
01:30:39driib (driib) joins
01:31:31<Ryz>I noticed that this is a thing for ArchiveTeam on IA - https://archive.org/details/archiveteam_twitter - is there a channel for scraping/getting Twitter content similar to Reddit content?
01:32:32psy quits [Ping timeout: 244 seconds]
01:44:31somerando3 quits [Remote host closed the connection]
01:47:50psy (psy) joins
02:19:22<thuban>thanks, that's really good to know about the rthk podcasts
02:20:29<thuban>the rthk website's episode browser only shows 1000 episodes as well, it's not just the rss.
02:23:00<thuban>(i might or might not end up trying to see if there's a way to access pages for earlier episodes, just to get the metadata from a canonical source, but episode ids seem like they might be shared across all shows, so would require some brute-forcing)
02:30:10<thuban>listennotes has even more episodes for 自由風自由PHONE: 13662
02:31:26<thuban>their free api access is pretty hoop-jumpy but i think just reverse engineering the website will do
02:32:04<thuban>i have stuff to do atm but i'll get to work on it in a few hours.
02:32:05Krownest1 (Krownest) joins
02:33:32<thuban>nuroten: i'm still downloading 31 This Week; do you/others want my script?
02:34:31<thuban>i didn't want to publish it before because it's still a bit janky, but this is becoming... pressing
02:34:56<nuroten>thuban: I'm not sure if I can run it but would like to try, sure. thanks :)
02:35:40<thuban>https://transfer.archivete.am/tTtsa/getall.py
02:35:52Krownest quits [Ping timeout: 258 seconds]
02:36:56<thuban>requires python and youtube-dl. invoke as `./getall.py <rss xml url>`; it saves the episodes and 'file_md.json' (suitable for ia) to the working directory.
02:39:01<thuban>(note that (some?) audio-only podcasts supply a dummy thumbnail in their metadata; script currently isn't smart about that)
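For readers who can't run getall.py, here is a minimal sketch of the first step such a script has to do: read an RSS 2.0 podcast feed and pull out per-episode metadata. This is not thuban's code. The function name `parse_episodes`, the dict keys, and the sample feed are hypothetical, and it assumes episode artwork is published via the common iTunes `<itunes:image href=...>` tag. The `audio_url` values are what you would hand to a downloader such as youtube-dl.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET

# iTunes podcast namespace; many feeds use <itunes:image href="..."> for episode art (assumption)
ITUNES_NS = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"


def parse_episodes(rss_xml: str) -> list[dict]:
    """Extract basic per-episode metadata from an RSS 2.0 feed (hypothetical helper)."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        image = item.find(f"{ITUNES_NS}image")
        episodes.append({
            "title": item.findtext("title", default=""),
            "guid": item.findtext("guid", default=""),
            "pub_date": item.findtext("pubDate", default=""),
            # Media file URL, to be passed to a downloader
            "audio_url": enclosure.get("url") if enclosure is not None else None,
            # Thumbnail may be absent or a generic placeholder
            "thumbnail": image.get("href") if image is not None else None,
        })
    return episodes


# Tiny made-up feed for demonstration only
SAMPLE = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"><channel>
<title>Example</title>
<item><title>Ep 1</title><guid>ep1</guid><pubDate>Thu, 01 Jul 2021 00:00:00 GMT</pubDate>
<enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
<itunes:image href="https://example.com/ep1.jpg"/></item>
</channel></rss>"""

if __name__ == "__main__":
    # Dump as JSON, loosely in the spirit of a metadata file for upload
    print(json.dumps(parse_episodes(SAMPLE), indent=2))
```

Fetching the feed itself (e.g. with `urllib.request.urlopen`) is left out so the sketch runs offline.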
02:40:16<nuroten>yeah, some of the thumbnail links don't work, no problem
02:41:31Krownest1 quits [Read error: Connection reset by peer]
02:41:57<thuban>it does handle 404ing thumbnails, but for some shows _every_ thumbnail is the same generic url, which 404s, and the script will just make a note that it was missing every time... will fix later
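One possible way to handle the "every thumbnail is the same generic url" case is to group missing thumbnails by URL, so a placeholder shared by the whole feed produces one note instead of one per episode. This is a hypothetical sketch, not the fix thuban planned: it assumes episodes are dicts with a `"thumbnail"` key and that the set of 404ing URLs has already been determined elsewhere.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def shared_thumbnail(episodes: List[Dict]) -> Optional[str]:
    """Return the thumbnail URL if every episode (of 2+) uses the same one, else None."""
    urls = {ep.get("thumbnail") for ep in episodes}
    if len(episodes) > 1 and len(urls) == 1:
        return urls.pop()  # may still be None if no episode has a thumbnail
    return None


def missing_thumbnail_notes(episodes: List[Dict], missing_urls: Set[str]) -> List[str]:
    """Produce one note per distinct missing thumbnail URL, with a count of affected episodes."""
    counts = Counter(
        ep.get("thumbnail") for ep in episodes if ep.get("thumbnail") in missing_urls
    )
    return [f"thumbnail {url} missing for {n} episode(s)" for url, n in counts.items()]


if __name__ == "__main__":
    eps = [{"thumbnail": "https://example.com/generic.jpg"} for _ in range(3)]
    print(shared_thumbnail(eps))
    print(missing_thumbnail_notes(eps, {"https://example.com/generic.jpg"}))
```

Checking `shared_thumbnail` first also lets a script skip requesting the same placeholder URL once per episode.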
02:52:46G4te_Keep3r quits [Quit: The Lounge - https://thelounge.chat]
02:56:19G4te_Keep3r joins
03:01:00G4te_Keep3r quits [Client Quit]
03:06:12G4te_Keep3r joins
03:40:52qw3rty_ joins
03:44:44qw3rty__ quits [Ping timeout: 250 seconds]
03:56:50Krownest (Krownest) joins
04:46:18mutantmonkey quits [Remote host closed the connection]
04:46:39mutantmonkey (mutantmonkey) joins
05:02:15DogsRNice quits [Read error: Connection reset by peer]
05:08:09abcdef joins
05:18:05abcdef quits [Remote host closed the connection]
05:32:12mutantmonkey quits [Ping timeout: 258 seconds]
05:32:35mutantmonkey (mutantmonkey) joins
05:56:42Atom-- quits [Read error: Connection reset by peer]
05:57:15Atom joins
06:37:28abcdefg joins
06:44:05abcdefg quits [Remote host closed the connection]
07:02:17HP_Archivist quits [Ping timeout: 258 seconds]
07:04:08Megame (Megame) joins
07:50:09AK quits [Remote host closed the connection]
08:06:11AK (AK) joins
08:12:26Jonboy3451 quits [Ping timeout: 258 seconds]
09:08:45user1 joins
09:08:51user1 is now known as gazorpazorp
09:11:58HP_Archivist (HP_Archivist) joins
09:27:02HP_Archivist quits [Read error: Connection reset by peer]
10:26:00Megame quits [Ping timeout: 250 seconds]
10:26:07Megame (Megame) joins
11:08:17Megame quits [Client Quit]
11:24:51@EggplantN quits [Quit: Ping timeout (120 seconds)]
11:25:22EggplantN joins
11:41:45EggplantN quits [Changing host]
11:41:45EggplantN (EggplantN) joins
11:41:45@ChanServ sets mode: +o EggplantN
11:58:54nuroten quits [Remote host closed the connection]
12:13:52Jonboy345 joins
14:41:54lunik1 quits [Ping timeout: 258 seconds]
14:58:42nuroten joins
15:25:00Hackerpcs quits [Ping timeout: 250 seconds]
15:32:21lunik1 joins
15:36:00Megame (Megame) joins
15:37:59spirit joins
16:07:02Arcorann__ quits [Ping timeout: 250 seconds]
16:18:25Hackerpcs (Hackerpcs) joins
16:20:59abcd joins
16:49:45qwertyasdfuiopghjkl joins
17:41:56DogsRNice (Webuser299) joins
17:42:39DogsRNice quits [Remote host closed the connection]
17:56:28<mgrandi>New social media site: https://www.politico.com/news/2021/07/01/gettr-trump-social-media-platform-497606
18:04:07HP_Archivist (HP_Archivist) joins
18:40:52Ajay1 quits [Ping timeout: 250 seconds]
19:01:45HP_Archivist quits [Client Quit]
19:43:43nertzy_ joins
19:45:53nertzy__ quits [Ping timeout: 258 seconds]
20:23:46spirit quits [Client Quit]
20:27:06Megame quits [Read error: Connection reset by peer]
20:27:30Megame (Megame) joins
20:48:39Megame quits [Read error: Connection reset by peer]
20:49:03Megame (Megame) joins
21:09:12qwertyasdfuiopghjkl quits [Remote host closed the connection]
21:11:39qwertyasdfuiopghjkl joins
21:16:44Megame quits [Ping timeout: 258 seconds]
21:41:44somerando3 joins
21:42:20<somerando3>nuroten & thuban: I scraped the metadata and URLs for the RTHK podcasts 31, open line open view, the pulse, and backchat: https://www69.zippyshare.com/v/KUUkcy16/file.html.
21:43:00<somerando3>I know you found bigger lists on listennotes, but I'd already created an API account on podchaser and figured it might be able to fill some gaps.
21:43:38<nuroten>thanks somerando3
21:46:56<Jake>(reupload https://transfer.archivete.am/WLT6n/rthk.podchaser.scrape.2020-07-01.zip )
21:47:19<somerando3>np, let me know if you need anything else scraped from there, since I already have the script. I should be available the next couple days.
22:03:07Viniter6 quits [Quit: Ping timeout (120 seconds)]
22:05:57Viniter6 (Viniter) joins
22:22:29somerando3 quits [Remote host closed the connection]
23:11:28Megame (Megame) joins
23:11:31HugsNotDrugs` joins
23:13:26HugsNotDrugs quits [Ping timeout: 250 seconds]