| 00:00:37 | | lorwp quits [Ping timeout: 258 seconds] |
| 00:00:39 | | dm4v quits [Read error: Connection reset by peer] |
| 00:03:39 | | dm4v joins |
| 00:03:41 | | dm4v is now authenticated as dm4v |
| 00:03:41 | | dm4v quits [Changing host] |
| 00:03:41 | | dm4v (dm4v) joins |
| 00:07:31 | | Arcorann__ joins |
| 00:09:55 | | BlueMaxima joins |
| 00:09:57 | | BlueMaxima_ joins |
| 00:26:27 | | BlueMaxima_ quits [Client Quit] |
| 00:26:27 | | BlueMaxima quits [Client Quit] |
| 00:35:22 | | gazorpazorp quits [Ping timeout: 250 seconds] |
| 00:41:07 | | lorwp (lorwp) joins |
| 00:47:51 | | genofire joins |
| 01:02:48 | | dm4v_ joins |
| 01:04:15 | | dm4v quits [Ping timeout: 258 seconds] |
| 01:04:15 | | dm4v_ is now known as dm4v |
| 01:04:15 | | dm4v is now authenticated as dm4v |
| 01:04:15 | | dm4v quits [Changing host] |
| 01:04:15 | | dm4v (dm4v) joins |
| 01:29:29 | | Arcorann__ quits [Read error: Connection reset by peer] |
| 01:30:13 | | Arcorann__ joins |
| 01:30:23 | | monoxane quits [Quit: Ping timeout (120 seconds)] |
| 01:30:27 | | driib quits [Quit: Ping timeout (120 seconds)] |
| 01:30:34 | | monoxane (monoxane) joins |
| 01:30:39 | | driib (driib) joins |
| 01:31:31 | <Ryz> | I noticed that this is a thing for ArchiveTeam on IA - https://archive.org/details/archiveteam_twitter - is there a channel for scraping/getting Twitter content similar to Reddit content? |
| 01:32:32 | | psy quits [Ping timeout: 244 seconds] |
| 01:44:31 | | somerando3 quits [Remote host closed the connection] |
| 01:47:50 | | psy (psy) joins |
| 02:19:22 | <thuban> | thanks, that's really good to know about the rthk podcasts |
| 02:20:29 | <thuban> | the rthk website's episode browser only shows 1000 episodes as well, it's not just the rss. |
| 02:23:00 | <thuban> | (i might or might not end up trying to see if there's a way to access pages for earlier episodes, just to get the metadata from a canonical source, but episode ids seem like they might be shared across all shows, so would require some brute-forcing) |
| 02:30:10 | <thuban> | listennotes has even more episodes for 自由風自由PHONE, 13662 |
| 02:31:26 | <thuban> | their free api access is pretty hoop-jumpy but i think just reverse engineering the website will do |
| 02:32:04 | <thuban> | i have stuff to do atm but i'll get to work on it in a few hours. |
| 02:32:05 | | Krownest1 (Krownest) joins |
| 02:33:32 | <thuban> | nuroten: i'm still downloading 31 This Week; do you/others want my script? |
| 02:34:31 | <thuban> | i didn't want to publish it before because it's still a bit janky, but this is becoming... pressing |
| 02:34:56 | <nuroten> | thuban: I'm not sure if I can run it but would like to try, sure. thanks :) |
| 02:35:40 | <thuban> | https://transfer.archivete.am/tTtsa/getall.py |
| 02:35:52 | | Krownest quits [Ping timeout: 258 seconds] |
| 02:36:56 | <thuban> | requires python, youtube-dl. invoke as `./getall.py <rss xml url>`, saves episodes and 'file_md.json' (suitable for ia) to the working directory. |
| 02:39:01 | <thuban> | (note that (some?) audio-only podcasts supply a dummy thumbnail in their metadata; script currently isn't smart about that) |
| 02:40:16 | <nuroten> | yeah, some of the thumbnail links don't work, no problem |
| 02:41:31 | | Krownest1 quits [Read error: Connection reset by peer] |
| 02:41:57 | <thuban> | it does handle 404ing thumbnails, but for some shows _every_ thumbnail is the same generic url, which 404s, and the script will just make a note that it was missing every time... will fix later |
| 02:52:46 | | G4te_Keep3r quits [Quit: The Lounge - https://thelounge.chat] |
| 02:56:19 | | G4te_Keep3r joins |
| 03:01:00 | | G4te_Keep3r quits [Client Quit] |
| 03:06:12 | | G4te_Keep3r joins |
| 03:40:52 | | qw3rty_ joins |
| 03:44:44 | | qw3rty__ quits [Ping timeout: 250 seconds] |
| 03:56:50 | | Krownest (Krownest) joins |
| 04:46:18 | | mutantmonkey quits [Remote host closed the connection] |
| 04:46:39 | | mutantmonkey (mutantmonkey) joins |
| 05:02:15 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:08:09 | | abcdef joins |
| 05:18:05 | | abcdef quits [Remote host closed the connection] |
| 05:32:12 | | mutantmonkey quits [Ping timeout: 258 seconds] |
| 05:32:35 | | mutantmonkey (mutantmonkey) joins |
| 05:56:42 | | Atom-- quits [Read error: Connection reset by peer] |
| 05:57:15 | | Atom joins |
| 06:37:28 | | abcdefg joins |
| 06:44:05 | | abcdefg quits [Remote host closed the connection] |
| 07:02:17 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 07:04:08 | | Megame (Megame) joins |
| 07:50:09 | | AK quits [Remote host closed the connection] |
| 08:06:11 | | AK (AK) joins |
| 08:12:26 | | Jonboy3451 quits [Ping timeout: 258 seconds] |
| 09:08:45 | | user1 joins |
| 09:08:51 | | user1 is now known as gazorpazorp |
| 09:11:58 | | HP_Archivist (HP_Archivist) joins |
| 09:27:02 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 10:26:00 | | Megame quits [Ping timeout: 250 seconds] |
| 10:26:07 | | Megame (Megame) joins |
| 11:08:17 | | Megame quits [Client Quit] |
| 11:24:51 | | @EggplantN quits [Quit: Ping timeout (120 seconds)] |
| 11:25:22 | | EggplantN joins |
| 11:41:45 | | EggplantN is now authenticated as EggplantN |
| 11:41:45 | | EggplantN quits [Changing host] |
| 11:41:45 | | EggplantN (EggplantN) joins |
| 11:41:45 | | @ChanServ sets mode: +o EggplantN |
| 11:58:54 | | nuroten quits [Remote host closed the connection] |
| 12:13:52 | | Jonboy345 joins |
| 14:41:54 | | lunik1 quits [Ping timeout: 258 seconds] |
| 14:58:42 | | nuroten joins |
| 15:25:00 | | Hackerpcs quits [Ping timeout: 250 seconds] |
| 15:32:21 | | lunik1 joins |
| 15:36:00 | | Megame (Megame) joins |
| 15:37:59 | | spirit joins |
| 16:07:02 | | Arcorann__ quits [Ping timeout: 250 seconds] |
| 16:18:25 | | Hackerpcs (Hackerpcs) joins |
| 16:20:59 | | abcd joins |
| 16:49:45 | | qwertyasdfuiopghjkl joins |
| 17:41:56 | | DogsRNice (Webuser299) joins |
| 17:42:39 | | DogsRNice quits [Remote host closed the connection] |
| 17:56:28 | <mgrandi> | New social media site: https://www.politico.com/news/2021/07/01/gettr-trump-social-media-platform-497606 |
| 18:04:07 | | HP_Archivist (HP_Archivist) joins |
| 18:40:52 | | Ajay1 quits [Ping timeout: 250 seconds] |
| 19:01:45 | | HP_Archivist quits [Client Quit] |
| 19:43:43 | | nertzy_ joins |
| 19:45:53 | | nertzy__ quits [Ping timeout: 258 seconds] |
| 20:23:46 | | spirit quits [Client Quit] |
| 20:27:06 | | Megame quits [Read error: Connection reset by peer] |
| 20:27:30 | | Megame (Megame) joins |
| 20:48:39 | | Megame quits [Read error: Connection reset by peer] |
| 20:49:03 | | Megame (Megame) joins |
| 21:09:12 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 21:11:39 | | qwertyasdfuiopghjkl joins |
| 21:16:44 | | Megame quits [Ping timeout: 258 seconds] |
| 21:41:44 | | somerando3 joins |
| 21:42:20 | <somerando3> | nuroten & thuban: I scraped the metadata and URLs for the RTHK podcasts 31, open line open view, the pulse, and backchat: https://www69.zippyshare.com/v/KUUkcy16/file.html |
| 21:43:00 | <somerando3> | I know you found bigger lists on listennotes, but I'd already created an API account on podchaser and figured it may be able to fill some gaps. |
| 21:43:38 | <nuroten> | thanks somerando3 |
| 21:46:56 | <Jake> | (reupload https://transfer.archivete.am/WLT6n/rthk.podchaser.scrape.2020-07-01.zip ) |
| 21:47:19 | <somerando3> | np, let me know if you need anything else scraped from there, since I already have the script. I should be available the next couple days. |
| 22:03:07 | | Viniter6 quits [Quit: Ping timeout (120 seconds)] |
| 22:05:57 | | Viniter6 (Viniter) joins |
| 22:22:29 | | somerando3 quits [Remote host closed the connection] |
| 23:11:28 | | Megame (Megame) joins |
| 23:11:31 | | HugsNotDrugs` joins |
| 23:13:26 | | HugsNotDrugs quits [Ping timeout: 250 seconds] |