00:08:55fishingforpie quits [Remote host closed the connection]
00:09:19fishingforpie joins
00:17:42sec^nd quits [Remote host closed the connection]
00:22:54sec^nd (second) joins
00:24:51pcr (pcr) joins
00:35:46sec^nd quits [Ping timeout: 240 seconds]
00:42:26sec^nd (second) joins
00:45:29Arcorann (Arcorann) joins
00:51:58fishingforpie quits [Remote host closed the connection]
01:01:23fishingforpie joins
01:12:19fishingforpie quits [Remote host closed the connection]
01:43:13<systwi_>https://kiwifarms.ru/ might be a mirror of https://kiwifarms.net/ , but I can't access it because of "DDOS-GUARD."
01:43:35<systwi_>I'm assuming AB will have the same problem.
01:53:10<systwi_>Does anyone have any suggestions on scraping a broken, but loaded, Firefox page for hrefs?
01:54:03katocala quits [Remote host closed the connection]
01:54:12<systwi_>I can neither pull up developer tools, view the page source, use Link Gopher to scrape for links, "print" it to a PDF nor save it to an HTML/_files.
01:54:45<systwi_>The last resort I'm thinking of is OCR, which would _not_ be fun. This page is massive.
01:56:11<systwi_>Or, alternatively, is there a way to get a full list of a Tumblr post's "notes?"
01:56:56<@JAA>Probably with a bit of scripting around the pagination.
01:58:51<systwi_>I tried looking into that and found a few questions on Stack Exchange which didn't seem to help. I'm guessing Tumblr changed how pagination works on /notes/ pages (both w/ and w/o JS) since then.
01:59:22<systwi_>Or _maybe_ it was some strange cache thing in Firefox? I didn't verify that.
01:59:43<systwi_>Either way, the "Show more notes" link would always be the full /notes/ URL with # suffixed.
02:01:07<systwi_>URL I was trying, for reference: https://sierns.tumblr.com/notes/661611184805527552/kX3iH4VUO
02:01:28<@JAA>The pagination URLs are in some JS mess.
02:01:36<@JAA>But should be easy enough.
02:02:17<systwi_>Oh yeah, there was this portion of the URL buried in some JS: ?from_c=1656280336
02:02:27<systwi_>but the number didn't appear exploitable.
02:04:09<@JAA>It's a timestamp IIRC, but yeah, basically you need to extract it to work properly.
02:05:32katocala joins
02:07:09<@JAA>Second, writing an ugly Python one-liner. :-)
02:07:32<systwi_>I tried that last time but it still suffixed the #. Now, looking in the same spot as last time, I see the next URL, ending with ?from_c=1654705021.
02:08:30<systwi_>(or doing it the less-cool way, trying to have a bash script do the same :-P )
02:08:45<@JAA>Yeah, but then you don't get connection reuse, so it's slow.
02:09:01<@JAA>Or well, slower than it needs to be.
02:09:54<systwi_>I feel this is going to get quite involved, considering my current Python skill set. x_x
02:13:24<@JAA>`python3 -c 'import itertools, re, requests, sys, urllib.parse; s = requests.Session(); url = sys.argv[1]; {print(f"Fetching {url}", file = sys.stderr) or print((r := s.get(url)).text) or ((m := re.search(r"\x27/notes/\d+/[A-Za-z0-9]+\?from_c=\d+(?=\x27)", r.text)) and (url := urllib.parse.urljoin(url, m.group(0)[1:]))) or 1/0 for _ in itertools.count()}'
02:13:29<@JAA>https://sierns.tumblr.com/notes/661611184805527552/kX3iH4VUO`
02:13:42<@JAA>Blows up with a ZeroDivisionError because I'm too lazy to terminate that properly now.
02:14:07<@JAA>Output of all notes pages goes to stdout, progress/URLs being fetched go to stderr.
02:14:16systwi_ closes freshly-started bash script progress... :-P
02:14:22<@JAA>Requires Python 3.8+
02:15:01<@JAA>Actually hang on
02:15:32<systwi_>Thank you! This will be extremely useful. My current way of getting the next notes page was...you'll probably laugh...keeping my End key held down and running an auto clicker, with the cursor positioned on "Show more notes."
02:16:05<@JAA>`python3 -c 'import itertools, re, requests, sys, urllib.parse; s = requests.Session(); url = sys.argv[1]; {print(f"Fetching {url}", file = sys.stderr) or print((r := s.get(url)).text) or ((m := re.search(r"\x27/notes/\d+/[A-Za-z0-9]+\?from_c=\d+(?=\x27)", r.text)) and (url := urllib.parse.urljoin(url, m.group(0)[1:])) and True) or 1/0 for _ in itertools.count()}' URL`
02:16:10<systwi_>It worked great...until the page froze.
02:16:29<@JAA>Yeah, that's usually how it goes. It works well until it doesn't. :-)
02:16:34<systwi_>:-)
02:16:45<@JAA>This one should be constant memory, unlike the first version.
02:16:54<systwi_>Thanks. will try this. Much appreciated.
02:16:56<@JAA>Can be started from the post page, too.
02:17:03<systwi_>Ooh perfect.
02:18:12systwi_ blindly runs code from a stranger
02:18:15<systwi_>:-P
02:18:49<systwi_>Joking, I trust it.
02:19:39BlueMaxima quits [Client Quit]
02:20:34<@JAA>It's pretty easy to follow once you figure out the structure. Just a very hacky way to get it into a true one-liner. :-)
02:21:10<@JAA>It's basically `while url: r = requests.get(url); print(r.text); url = extract_next_page(r.text)`.
02:21:30<systwi_>Yeah, seemed pretty readable to me, even with my rudimentary Python skills.
02:21:47<systwi_>It's done. :-o
02:21:53<@JAA>Just written as a set comprehension because I figured out recently how to abuse those to get rid of awful '...'$'\n''...' constructs.
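[Editor's sketch] The `while url: ...` loop JAA describes can be written out as a plain script roughly as follows. This is a minimal sketch, not JAA's code: the regex is taken from the one-liner above, but the names `NEXT_PAGE_RE`, `extract_next_page`, `iter_notes_pages` and `main`, and the `fetch` parameter, are assumptions introduced here for illustration. It also assumes Tumblr still embeds the next-page URL as a single-quoted `/notes/...?from_c=...` string, as it did at the time of this conversation.

```python
import re
import sys
import urllib.parse

# Same pattern as the one-liner: a single-quoted relative /notes/ URL
# carrying a from_c parameter (assumed to still match Tumblr's markup).
NEXT_PAGE_RE = re.compile(r"'(/notes/\d+/[A-Za-z0-9]+\?from_c=\d+)'")


def extract_next_page(page_url, html):
    """Return the absolute URL of the next notes page, or None if none is found."""
    m = NEXT_PAGE_RE.search(html)
    if m is None:
        return None
    return urllib.parse.urljoin(page_url, m.group(1))


def iter_notes_pages(start_url, fetch):
    """Yield (url, html) per page, following pagination until no next link exists.

    `fetch` is any callable that maps a URL to the response body as text.
    """
    url = start_url
    while url:
        html = fetch(url)
        yield url, html
        url = extract_next_page(url, html)


def main(start_url):
    # Third-party dependency; imported here so the helpers above stay stdlib-only.
    import requests

    session = requests.Session()  # a single session allows connection reuse
    for url, html in iter_notes_pages(start_url, lambda u: session.get(u).text):
        print(f"Fetched {url}", file=sys.stderr)  # progress goes to stderr
        print(html)  # page content goes to stdout
```

Calling `main("https://sierns.tumblr.com/notes/661611184805527552/kX3iH4VUO")` would then mirror the one-liner's behaviour, except that the loop ends cleanly instead of raising ZeroDivisionError. Because each page is printed and then discarded, memory use stays flat, much like the second version of the one-liner.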
02:37:12Chris5010 quits [Remote host closed the connection]
02:48:57<jamesp>Looks like kiwifarms is bach at kiwifarms.ru
02:49:00<jamesp>*back
02:49:12<jamesp>I was also told about a tor link
02:51:46<@JAA>Yeah, discussed earlier here and in #archivebot, the .ru domain uses DDoS-GUARD, and .onion is .onion.
03:01:01Kinille quits []
03:14:34Arcorann quits [Ping timeout: 265 seconds]
03:26:52<Ryz>Mono = 1, and rail = rail
03:37:27Arcorann (Arcorann) joins
03:38:03Kinille (Kinille) joins
04:12:53<h2ibot>John123521 edited Yuku.com (+1): https://wiki.archiveteam.org/?diff=48911&oldid=48850
04:12:55<@JAA>^ RIP
04:13:52<h2ibot>Pokechu22 edited ISP Hosting (+317, +coqui.net): https://wiki.archiveteam.org/?diff=48912&oldid=47729
04:13:53<h2ibot>JustAnotherArchivist changed the user rights of User:Pokechu22
04:14:53<h2ibot>JustAnotherArchivist changed the user rights of User:Nemo bis
04:16:26march_happy quits [Ping timeout: 265 seconds]
04:17:00march_happy (march_happy) joins
04:30:20sec^nd quits [Remote host closed the connection]
04:37:25sec^nd (second) joins
04:38:17tbc1887 (tbc1887) joins
04:43:46sec^nd quits [Ping timeout: 240 seconds]
04:50:29sec^nd (second) joins
05:43:44Barto quits [Read error: Connection reset by peer]
05:52:50Barto (Barto) joins
06:18:43march_happy quits [Ping timeout: 265 seconds]
06:18:58march_happy (march_happy) joins
06:36:46sec^nd quits [Ping timeout: 240 seconds]
06:43:24sec^nd (second) joins
07:17:21knecht420 quits [Quit: Ping timeout (120 seconds)]
07:17:27knecht420 (knecht420) joins
07:22:16sec^nd quits [Ping timeout: 240 seconds]
07:33:33sec^nd (second) joins
08:15:30tbc1887 quits [Read error: Connection reset by peer]
09:53:15h joins
09:53:32h quits [Remote host closed the connection]
10:21:32tech_exorcist (tech_exorcist) joins
10:31:16sec^nd quits [Ping timeout: 240 seconds]
10:38:06sec^nd (second) joins
11:15:08march_happy quits [Remote host closed the connection]
11:20:24march_happy (march_happy) joins
11:32:29march_happy quits [Remote host closed the connection]
11:35:28march_happy (march_happy) joins
11:36:21march_happy quits [Remote host closed the connection]
11:38:41march_happy (march_happy) joins
11:45:43march_happy quits [Remote host closed the connection]
11:48:17march_happy (march_happy) joins
11:53:11march_happy quits [Ping timeout: 265 seconds]
11:57:09march_happy (march_happy) joins
12:03:10fishingforpie joins
12:37:22Iki joins
13:18:52tech_exorcist quits [Client Quit]
13:40:26fishingforpie quits [Remote host closed the connection]
13:51:13michaelblob quits [Read error: Connection reset by peer]
13:52:35dm4v_ joins
13:52:37michaelblob (michaelblob) joins
13:54:59dm4v quits [Ping timeout: 265 seconds]
13:54:59dm4v_ is now known as dm4v
14:09:40Arcorann quits [Ping timeout: 240 seconds]
14:44:30flashfire42 quits [Quit: The Lounge - https://thelounge.chat]
14:44:31kiska quits [Quit: The Lounge - https://thelounge.chat]
14:44:31Ryz2 quits [Quit: The Lounge - https://thelounge.chat]
14:44:31s-crypt quits [Quit: The Lounge - https://thelounge.chat]
14:47:43Ryz2 (Ryz) joins
14:47:45s-crypt (s-crypt) joins
14:47:51flashfire42 (flashfire42) joins
14:48:52kiska (kiska) joins
15:58:33fishingforpie joins
16:00:43tech_exorcist (tech_exorcist) joins
16:56:46fishingforpie quits [Remote host closed the connection]
18:24:28le0n quits [Ping timeout: 240 seconds]
18:27:09le0n (le0n) joins
19:59:53<h2ibot>JustAnotherArchivist created TJournal (+16, Redirected page to [[TJ]]): https://wiki.archiveteam.org/?title=TJournal
20:44:16sec^nd quits [Ping timeout: 240 seconds]
20:54:36sec^nd (second) joins
21:20:34datechnoman quits [Quit: The Lounge - https://thelounge.chat]
22:13:42igloo22225 quits [Client Quit]
22:13:56igloo22225 (igloo22225) joins
22:26:16tech_exorcist quits [Client Quit]
22:43:23<h2ibot>Arkiver uploaded File:Tjournal-logo.png: https://wiki.archiveteam.org/?title=File%3ATjournal-logo.png
22:58:21BlueMaxima joins
23:13:14wickedplayer494 quits [Ping timeout: 265 seconds]
23:13:27<h2ibot>Switchnode edited TJ (+189, update project info/status): https://wiki.archiveteam.org/?diff=48915&oldid=48822
23:15:16wickedplayer494 joins
23:17:27<h2ibot>Switchnode edited Template:Infobox project (+4, reflect hackint as default irc network): https://wiki.archiveteam.org/?diff=48916&oldid=48856
23:26:55Arcorann (Arcorann) joins