| 00:08:55 | | fishingforpie quits [Remote host closed the connection] |
| 00:09:19 | | fishingforpie joins |
| 00:17:42 | | sec^nd quits [Remote host closed the connection] |
| 00:22:54 | | sec^nd (second) joins |
| 00:24:51 | | pcr (pcr) joins |
| 00:35:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 00:42:26 | | sec^nd (second) joins |
| 00:45:29 | | Arcorann (Arcorann) joins |
| 00:51:58 | | fishingforpie quits [Remote host closed the connection] |
| 01:01:23 | | fishingforpie joins |
| 01:12:19 | | fishingforpie quits [Remote host closed the connection] |
| 01:43:13 | <systwi_> | https://kiwifarms.ru/ might be a mirror of https://kiwifarms.net/ , but I can't access it because of "DDOS-GUARD." |
| 01:43:35 | <systwi_> | I'm assuming AB will have the same problem. |
| 01:53:10 | <systwi_> | Does anyone have any suggestions on scraping a broken, but loaded, Firefox page for hrefs? |
| 01:54:03 | | katocala quits [Remote host closed the connection] |
| 01:54:12 | <systwi_> | I can neither pull up developer tools, view the page source, use Link Gopher to scrape for links, "print" it to a PDF nor save it to an HTML/_files. |
| 01:54:45 | <systwi_> | The last resort I'm thinking of is OCR, which would _not_ be fun. This page is massive. |
| 01:56:11 | <systwi_> | Or, alternatively, is there a way to get a full list of a Tumblr post's "notes?" |
| 01:56:56 | <@JAA> | Probably with a bit of scripting around the pagination. |
| 01:58:51 | <systwi_> | I tried looking into that and found a few questions on Stack Exchange which didn't seem to help. I'm guessing Tumblr changed how pagination works on /notes/ pages (both w/ and w/o JS) since then. |
| 01:59:22 | <systwi_> | Or _maybe_ it was some strange cache thing in Firefox? I didn't verify that. |
| 01:59:43 | <systwi_> | Either way, the "Show more notes" link would always be the full /notes/ URL with # suffixed. |
| 02:01:07 | <systwi_> | URL I was trying, for reference: https://sierns.tumblr.com/notes/661611184805527552/kX3iH4VUO |
| 02:01:28 | <@JAA> | The pagination URLs are in some JS mess. |
| 02:01:36 | <@JAA> | But should be easy enough. |
| 02:02:17 | <systwi_> | Oh yeah, there was this portion of the URL buried in some JS: ?from_c=1656280336 |
| 02:02:27 | <systwi_> | but the number didn't appear exploitable. |
| 02:04:09 | <@JAA> | It's a timestamp IIRC, but yeah, basically you need to extract it to work properly. |
| 02:05:32 | | katocala joins |
| 02:06:02 | | katocala is now authenticated as katocala |
| 02:07:09 | <@JAA> | Second, writing an ugly Python one-liner. :-) |
| 02:07:32 | <systwi_> | I tried that last time but it still suffixed the #. Now, looking in the same spot as last time, I see the next URL, ending with ?from_c=1654705021. |
| 02:08:30 | <systwi_> | (or doing it the less-cool way, trying to have a bash script do the same :-P ) |
| 02:08:45 | <@JAA> | Yeah, but then you don't get connection reuse, so it's slow. |
| 02:09:01 | <@JAA> | Or well, slower than it needs to be. |
| 02:09:54 | <systwi_> | I feel this is going to get quite involved, considering my current Python skill set. x_x |
| 02:13:24 | <@JAA> | `python3 -c 'import itertools, re, requests, sys, urllib.parse; s = requests.Session(); url = sys.argv[1]; {print(f"Fetching {url}", file = sys.stderr) or print((r := s.get(url)).text) or ((m := re.search(r"\x27/notes/\d+/[A-Za-z0-9]+\?from_c=\d+(?=\x27)", r.text)) and (url := urllib.parse.urljoin(url, m.group(0)[1:]))) or 1/0 for _ in itertools.count()}' |
| 02:13:29 | <@JAA> | https://sierns.tumblr.com/notes/661611184805527552/kX3iH4VUO` |
| 02:13:42 | <@JAA> | Blows up with a ZeroDivisionError because I'm too lazy to terminate that properly now. |
| 02:14:07 | <@JAA> | Output of all notes pages goes to stdout, progress/URLs being fetched go to stderr. |
| 02:14:16 | | systwi_ closes freshly-started bash script progress... :-P |
| 02:14:22 | <@JAA> | Requires Python 3.8+ |
| 02:15:01 | <@JAA> | Actually hang on |
| 02:15:32 | <systwi_> | Thank you! This will be extremely useful. My current way of getting the next notes page was...you'll probably laugh...keeping my End key held down and running an auto clicker, with the cursor positioned on "Show more notes." |
| 02:16:05 | <@JAA> | `python3 -c 'import itertools, re, requests, sys, urllib.parse; s = requests.Session(); url = sys.argv[1]; {print(f"Fetching {url}", file = sys.stderr) or print((r := s.get(url)).text) or ((m := re.search(r"\x27/notes/\d+/[A-Za-z0-9]+\?from_c=\d+(?=\x27)", r.text)) and (url := urllib.parse.urljoin(url, m.group(0)[1:])) and True) or 1/0 for _ in itertools.count()}' URL` |
| 02:16:10 | <systwi_> | It worked great...until the page froze. |
| 02:16:29 | <@JAA> | Yeah, that's usually how it goes. It works well until it doesn't. :-) |
| 02:16:34 | <systwi_> | :-) |
| 02:16:45 | <@JAA> | This one should be constant memory, unlike the first version. |
| 02:16:54 | <systwi_> | Thanks. will try this. Much appreciated. |
| 02:16:56 | <@JAA> | Can be started from the post page, too. |
| 02:17:03 | <systwi_> | Ooh perfect. |
| 02:18:12 | | systwi_ blindly runs code from a stranger |
| 02:18:15 | <systwi_> | :-P |
| 02:18:49 | <systwi_> | Joking, I trust it. |
| 02:19:39 | | BlueMaxima quits [Client Quit] |
| 02:20:34 | <@JAA> | It's pretty easy to follow once you figure out the structure. Just a very hacky way to get it into a true one-liner. :-) |
| 02:21:10 | <@JAA> | It's basically `while url: r = requests.get(url); print(r.text); url = extract_next_page(r.text)`. |
| 02:21:30 | <systwi_> | Yeah, seemed pretty readable to me, even with my rudimentary Python skills. |
| 02:21:47 | <systwi_> | It's done. :-o |
| 02:21:53 | <@JAA> | Just written as a set comprehension because I figured out recently how to abuse those to get rid of awful '...'$'\n''...' constructs. |
| 02:37:12 | | Chris5010 quits [Remote host closed the connection] |
| 02:48:57 | <jamesp> | Looks like kiwifarms is bach at kiwifarms.ru |
| 02:49:00 | <jamesp> | *back |
| 02:49:12 | <jamesp> | I was also told about a tor link |
| 02:51:46 | <@JAA> | Yeah, discussed earlier here and in #archivebot, the .ru domain uses DDoS-GUARD, and .onion is .onion. |
| 03:01:01 | | Kinille quits [] |
| 03:14:34 | | Arcorann quits [Ping timeout: 265 seconds] |
| 03:26:52 | <Ryz> | Mono = 1, and rail = rail |
| 03:37:27 | | Arcorann (Arcorann) joins |
| 03:38:03 | | Kinille (Kinille) joins |
| 04:12:53 | <h2ibot> | John123521 edited Yuku.com (+1): https://wiki.archiveteam.org/?diff=48911&oldid=48850 |
| 04:12:55 | <@JAA> | ^ RIP |
| 04:13:52 | <h2ibot> | Pokechu22 edited ISP Hosting (+317, +coqui.net): https://wiki.archiveteam.org/?diff=48912&oldid=47729 |
| 04:13:53 | <h2ibot> | JustAnotherArchivist changed the user rights of User:Pokechu22 |
| 04:14:53 | <h2ibot> | JustAnotherArchivist changed the user rights of User:Nemo bis |
| 04:16:26 | | march_happy quits [Ping timeout: 265 seconds] |
| 04:17:00 | | march_happy (march_happy) joins |
| 04:30:20 | | sec^nd quits [Remote host closed the connection] |
| 04:37:25 | | sec^nd (second) joins |
| 04:38:17 | | tbc1887 (tbc1887) joins |
| 04:43:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 04:50:29 | | sec^nd (second) joins |
| 05:43:44 | | Barto quits [Read error: Connection reset by peer] |
| 05:52:50 | | Barto (Barto) joins |
| 06:18:43 | | march_happy quits [Ping timeout: 265 seconds] |
| 06:18:58 | | march_happy (march_happy) joins |
| 06:36:46 | | sec^nd quits [Ping timeout: 240 seconds] |
| 06:43:24 | | sec^nd (second) joins |
| 07:17:21 | | knecht420 quits [Quit: Ping timeout (120 seconds)] |
| 07:17:27 | | knecht420 (knecht420) joins |
| 07:22:16 | | sec^nd quits [Ping timeout: 240 seconds] |
| 07:33:33 | | sec^nd (second) joins |
| 08:15:30 | | tbc1887 quits [Read error: Connection reset by peer] |
| 09:53:15 | | h joins |
| 09:53:32 | | h quits [Remote host closed the connection] |
| 10:21:32 | | tech_exorcist (tech_exorcist) joins |
| 10:31:16 | | sec^nd quits [Ping timeout: 240 seconds] |
| 10:38:06 | | sec^nd (second) joins |
| 11:15:08 | | march_happy quits [Remote host closed the connection] |
| 11:20:24 | | march_happy (march_happy) joins |
| 11:32:29 | | march_happy quits [Remote host closed the connection] |
| 11:35:28 | | march_happy (march_happy) joins |
| 11:36:21 | | march_happy quits [Remote host closed the connection] |
| 11:38:41 | | march_happy (march_happy) joins |
| 11:45:43 | | march_happy quits [Remote host closed the connection] |
| 11:48:17 | | march_happy (march_happy) joins |
| 11:53:11 | | march_happy quits [Ping timeout: 265 seconds] |
| 11:57:09 | | march_happy (march_happy) joins |
| 12:03:10 | | fishingforpie joins |
| 12:37:22 | | Iki joins |
| 13:18:52 | | tech_exorcist quits [Client Quit] |
| 13:40:26 | | fishingforpie quits [Remote host closed the connection] |
| 13:51:13 | | michaelblob quits [Read error: Connection reset by peer] |
| 13:52:35 | | dm4v_ joins |
| 13:52:37 | | michaelblob (michaelblob) joins |
| 13:54:59 | | dm4v quits [Ping timeout: 265 seconds] |
| 13:54:59 | | dm4v_ is now known as dm4v |
| 14:09:40 | | Arcorann quits [Ping timeout: 240 seconds] |
| 14:44:30 | | flashfire42 quits [Quit: The Lounge - https://thelounge.chat] |
| 14:44:31 | | kiska quits [Quit: The Lounge - https://thelounge.chat] |
| 14:44:31 | | Ryz2 quits [Quit: The Lounge - https://thelounge.chat] |
| 14:44:31 | | s-crypt quits [Quit: The Lounge - https://thelounge.chat] |
| 14:47:43 | | Ryz2 (Ryz) joins |
| 14:47:45 | | s-crypt (s-crypt) joins |
| 14:47:51 | | flashfire42 (flashfire42) joins |
| 14:48:52 | | kiska (kiska) joins |
| 15:58:33 | | fishingforpie joins |
| 16:00:43 | | tech_exorcist (tech_exorcist) joins |
| 16:56:46 | | fishingforpie quits [Remote host closed the connection] |
| 18:24:28 | | le0n quits [Ping timeout: 240 seconds] |
| 18:27:09 | | le0n (le0n) joins |
| 19:59:53 | <h2ibot> | JustAnotherArchivist created TJournal (+16, Redirected page to [[TJ]]): https://wiki.archiveteam.org/?title=TJournal |
| 20:44:16 | | sec^nd quits [Ping timeout: 240 seconds] |
| 20:54:36 | | sec^nd (second) joins |
| 21:20:34 | | datechnoman quits [Quit: The Lounge - https://thelounge.chat] |
| 22:13:42 | | igloo22225 quits [Client Quit] |
| 22:13:56 | | igloo22225 (igloo22225) joins |
| 22:26:16 | | tech_exorcist quits [Client Quit] |
| 22:43:23 | <h2ibot> | Arkiver uploaded File:Tjournal-logo.png: https://wiki.archiveteam.org/?title=File%3ATjournal-logo.png |
| 22:58:21 | | BlueMaxima joins |
| 23:13:14 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 23:13:27 | <h2ibot> | Switchnode edited TJ (+189, update project info/status): https://wiki.archiveteam.org/?diff=48915&oldid=48822 |
| 23:15:16 | | wickedplayer494 joins |
| 23:17:27 | <h2ibot> | Switchnode edited Template:Infobox project (+4, reflect hackint as default irc network): https://wiki.archiveteam.org/?diff=48916&oldid=48856 |
| 23:20:20 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 23:26:55 | | Arcorann (Arcorann) joins |
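
For readers who find the one-liner at 02:16:05 hard to parse, the loop JAA summarises at 02:21:10 (`while url: ...; url = extract_next_page(r.text)`) can be sketched as an ordinary script. This is a minimal sketch, not the code that was actually run: it assumes the notes pages still embed the next page as a single-quoted relative `/notes/...?from_c=...` URL, as described in the log, which Tumblr may have changed since. `extract_next_page` takes its name from JAA's pseudocode; `iter_notes_pages`, the `fetch` parameter, and `main` are hypothetical names introduced here.

```python
import re
import sys
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin

# Same pattern as the one-liner in the log: a single-quoted relative /notes/ URL
# carrying a from_c parameter. Assumption: the page markup still looks like this.
NEXT_PAGE_RE = re.compile(r"'(/notes/\d+/[A-Za-z0-9]+\?from_c=\d+)'")


def extract_next_page(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL of the next notes page, or None if no link is found."""
    match = NEXT_PAGE_RE.search(html)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


def iter_notes_pages(start_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield each page body in turn, stopping when no next-page link is found.

    `fetch` is a hypothetical hook (URL in, page text out) so the loop can be
    exercised without network access.
    """
    url: Optional[str] = start_url
    while url:
        print(f"Fetching {url}", file=sys.stderr)  # progress goes to stderr
        body = fetch(url)
        yield body
        url = extract_next_page(body, url)


def main() -> None:
    import requests  # third-party; imported here so the helpers above work without it

    session = requests.Session()  # one session, so the connection is reused
    for body in iter_notes_pages(sys.argv[1], lambda u: session.get(u).text):
        print(body)  # page output goes to stdout


# Only run when invoked as a script with exactly one URL argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main()
```

Saved under a file name of your choice and run with a notes or post URL as its only argument, it writes pages to stdout and progress to stderr like the one-liner, but ends cleanly when no further `from_c` link is found instead of raising `ZeroDivisionError`.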