00:15:09megapro17 joins
00:16:41<megapro17>hi everyone. i want to ask a question: what is the best way to archive a twitter page? i want a human-readable copy with pictures. as far as i can tell, snscrape can only scrape to JSON, without the pictures. is there a friendlier solution?
00:19:04<megapro17>shouldn't nitter be great for this?
00:24:30<TheTechRobo>megapro17: For what it's worth, snscrape does provide image metadata in the JSON.
00:24:42<TheTechRobo>You can download the pictures by parsing the URL out of the JSON.
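A minimal sketch of that parsing step, assuming snscrape's `--jsonl` output (one tweet object per line) and assuming photo entries carry their image URL under a `fullUrl` key. That key name is our assumption about snscrape's photo objects, so check it against your own output. The walk is recursive, so media in quoted tweets is picked up too.

```python
import json
import sys
from typing import Any, Iterable, Iterator, List


def iter_full_urls(obj: Any) -> Iterator[str]:
    """Recursively yield every string stored under a 'fullUrl' key (assumed snscrape field)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "fullUrl" and isinstance(value, str):
                yield value
            else:
                yield from iter_full_urls(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_full_urls(item)


def image_urls_from_jsonl(lines: Iterable[str]) -> List[str]:
    """Collect unique image URLs from JSONL lines, keeping first-seen order."""
    seen = set()
    urls: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for url in iter_full_urls(json.loads(line)):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    # Read JSONL on stdin, print one URL per line on stdout.
    for url in image_urls_from_jsonl(sys.stdin):
        print(url)
```

Usage, roughly: `snscrape --jsonl twitter-user USERNAME > tweets.jsonl`, then `python extract_urls.py < tweets.jsonl > urls.txt` and `wget -i urls.txt` (the script name is hypothetical).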
00:25:00<megapro17>yes, but meh, you need to download them and store them somehow
00:25:31<@JAA>Data tends to have that issue, yeah. :-)
00:25:50<megapro17>well, maybe someone has already invented that wheel
00:28:24<megapro17>so: parse the snscrape JSON as usual, then regex out the twimg URLs and run wget on them
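The regex route can be sketched without parsing the JSON at all. This is a minimal sketch, assuming the media URLs appear as plain `https://pbs.twimg.com/media/...` strings in the snscrape output (unescaped slashes, which is how Python's `json` module writes them). The function names are hypothetical, and it swaps wget for the stdlib `urllib.request` so the script has no external dependency. Because the regex catches every size variant (e.g. `name=small` and `name=large`), the local filename includes the size so both land on disk.

```python
import re
import sys
import urllib.request
from pathlib import Path
from typing import Iterable, List
from urllib.parse import parse_qs, urlsplit

# Matches pbs.twimg.com media URLs inside raw JSON text; the URL stops at a
# quote, backslash or whitespace so the JSON string delimiter is not captured.
TWIMG_RE = re.compile(r'https://pbs\.twimg\.com/media/[^\s"\\]+')


def find_twimg_urls(text: str) -> List[str]:
    """Return unique twimg media URLs in first-seen order."""
    return list(dict.fromkeys(TWIMG_RE.findall(text)))


def local_name(url: str) -> str:
    """Build a filename like 'ID-large.jpg' from ?format=...&name=... parameters."""
    parts = urlsplit(url)
    stem = parts.path.rsplit("/", 1)[-1]
    query = parse_qs(parts.query)
    fmt = query.get("format", [None])[0]
    size = query.get("name", [None])[0]
    name = stem + (f"-{size}" if size else "")
    return f"{name}.{fmt}" if fmt else name


def download_all(urls: Iterable[str], outdir: Path) -> None:
    """Fetch each URL into outdir, skipping files that already exist."""
    outdir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = outdir / local_name(url)
        if not target.exists():
            urllib.request.urlretrieve(url, target)


if __name__ == "__main__":
    # Usage: python fetch_twimg.py tweets.jsonl images/
    text = Path(sys.argv[1]).read_text(encoding="utf-8")
    download_all(find_twimg_urls(text), Path(sys.argv[2]))
```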
00:38:42megapro17 quits [Remote host closed the connection]
00:38:50megapro17 joins
00:55:24RisenRubix_ quits [Read error: Connection reset by peer]
00:55:44RisenRubix_ joins
01:04:04omglolbah quits [Ping timeout: 240 seconds]
01:05:07omglolbah joins
01:12:34megapro17 quits [Remote host closed the connection]
02:12:12michaelblob quits [Client Quit]
02:27:13omglolbah quits [Read error: Connection reset by peer]
02:27:44omglolbah joins
02:45:40omglolbah quits [Ping timeout: 240 seconds]
02:50:47omglolbah joins
02:53:16lennier1 quits [Ping timeout: 240 seconds]
02:54:18lennier1 (lennier1) joins
03:00:28tzt quits [Ping timeout: 265 seconds]
03:01:43tzt (tzt) joins
03:26:31katocala joins
03:36:04omglolbah quits [Ping timeout: 240 seconds]
03:37:12omglolbah joins
03:53:19AnotherIki joins
03:56:52Iki1 quits [Ping timeout: 240 seconds]
04:04:16ThreeHM quits [Ping timeout: 265 seconds]
04:04:38ThreeHM (ThreeHeadedMonkey) joins
04:38:49<lennier1>The Verge is reporting that Elon Musk discussed putting Twitter behind a paywall. Obviously Musk discusses a lot of stuff, but for sure that business seems like a mess right now. https://www.theverge.com/2022/11/7/23446262/elon-musk-twitter-paywall-possible
04:44:52geezabiscuit quits [Ping timeout: 240 seconds]
04:45:07geezabiscuit (geezabiscuit) joins
05:00:00treora quits [Quit: blub blub.]
05:01:24treora joins
05:03:55Iki1 joins
05:04:20RisenRubix__ joins
05:04:20RisenRubix_ quits [Remote host closed the connection]
05:04:20AnotherIki quits [Remote host closed the connection]
05:09:46RisenRubix_ joins
05:10:31RisenRubix__ quits [Remote host closed the connection]
05:19:41omglolbah_ joins
05:22:19omglolbah quits [Ping timeout: 265 seconds]
05:22:19qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
05:22:19Barto quits [Ping timeout: 265 seconds]
05:22:19programmerq quits [Ping timeout: 265 seconds]
05:22:19Somebody2 quits [Ping timeout: 265 seconds]
05:22:19kpcyrd quits [Ping timeout: 265 seconds]
05:22:19Jonimus quits [Ping timeout: 265 seconds]
05:22:34kpcyrd (kpcyrd) joins
05:22:34programmerq (programmerq) joins
05:22:39Barto (Barto) joins
05:22:41Jonimus joins
05:22:42Somebody2 joins
05:26:52tzt quits [Ping timeout: 240 seconds]
05:28:06tzt (tzt) joins
05:49:07BlueMaxima quits [Read error: Connection reset by peer]
05:53:25balrog quits [Quit: Bye]
05:58:11h2ibot quits [Remote host closed the connection]
05:58:24h2ibot (h2ibot) joins
05:59:40qwertyasdfuiopghjkl joins
06:02:04balrog (balrog) joins
06:08:24Arcorann (Arcorann) joins
06:26:50JackThompson quits [Ping timeout: 268 seconds]
07:05:12Czechball joins
07:11:12<SketchCow>Someone please mirror: https://www.youtube.com/watch?v=Wn_WPK-xFoQ
07:18:38<SketchCow>Wikiteam, please mirror http://en.techinfodepot.shoutwiki.com/wiki/Main_Page
07:50:58sknebel quits [Remote host closed the connection]
07:52:28@AlsoJAA quits [Ping timeout: 240 seconds]
07:52:42sknebel (sknebel) joins
07:53:19AlsoJAA (JAA) joins
07:53:19@ChanServ sets mode: +o AlsoJAA
08:13:13sonick quits [Client Quit]
08:24:29Adrmcr (Adrmcr) joins
08:25:45Adrmcr quits [Remote host closed the connection]
08:25:59Adrmcr (Adrmcr) joins
08:32:20JackThompson joins
08:32:37<Adrmcr>Posted a description of this in #down-the-tube already, but one of my youtube accounts has strangely regained access to view comments on every art track channel that isn't a "- Topic" channel, like C418's Minecraft Volume Beta and Lena Raine's Celeste Farewell music.
08:33:16<Adrmcr>Posting comments on those videos doesn't work, though.
08:40:54JackThompson4 joins
08:41:16JackThompson quits [Ping timeout: 268 seconds]
08:41:16JackThompson4 is now known as JackThompson
09:09:53<JTL>Adrmcr: If it's what I think it is, I can see the same thing while not signed in, as long as I'm in my main browser with all the Google cookies, but in an incognito window I get "Comments are turned off"
09:09:53RisenRubix_ quits [Read error: Connection reset by peer]
09:09:57<JTL>what the heck google :P
09:10:10<JTL>exact same video
09:10:14RisenRubix_ joins
09:22:00Sluggs joins
10:21:05Adrmcr quits [Remote host closed the connection]
10:38:18RisenRubix__ joins
10:39:26Czechball quits [Client Quit]
10:39:26RisenRubix_ quits [Remote host closed the connection]
10:39:30Czechball joins
10:52:50sec^nd quits [Remote host closed the connection]
10:53:10sec^nd (second) joins
11:35:55Megame (Megame) joins
11:51:12dm4v quits [Ping timeout: 268 seconds]
11:54:09Megame quits [Client Quit]
12:03:57dm4v joins
12:04:48Straif quits [Quit: Connection closed for inactivity]
12:07:50sonick (sonick) joins
12:42:28Arcorann quits [Ping timeout: 240 seconds]
12:44:33Megame (Megame) joins
12:57:54programmerq quits [Remote host closed the connection]
13:09:54Iki joins
13:10:12borislav joins
13:12:28Iki1 quits [Ping timeout: 240 seconds]
13:15:42Megame quits [Client Quit]
13:30:32Czechball8 joins
13:30:34Czechball quits [Client Quit]
13:30:34borislav quits [Remote host closed the connection]
13:30:34qwertyasdfuiopghjkl quits [Client Quit]
13:30:34Czechball8 is now known as Czechball
13:43:11Adrmcr (Adrmcr) joins
13:48:28qwertyasdfuiopghjkl joins
13:57:14programmerq (programmerq) joins
14:07:41tech_exorcist (tech_exorcist) joins
16:10:38HP_Archivist (HP_Archivist) joins
16:34:45lennier1 quits [Client Quit]
16:36:44lennier1 (lennier1) joins
16:45:08Adrmcr quits [Remote host closed the connection]
16:46:00march_happy quits [Ping timeout: 265 seconds]
16:46:14march_happy (march_happy) joins
17:02:02HP_Archivist quits [Client Quit]
18:03:09michaelblob (michaelblob) joins
18:09:23Lord_Nightmare quits [Quit: ZNC - http://znc.in]
18:14:15Lord_Nightmare (Lord_Nightmare) joins
18:20:07<IDK>am I the only one noticing that the Wayback Machine is getting really slow right now?
18:20:22<IDK>sometimes it does not respond at all for a few minutes
18:21:14<@JAA>#internetarchive
18:45:43sknebel quits [Client Quit]
18:45:43qwertyasdfuiopghjkl quits [Client Quit]
18:45:43Doranwen quits [Remote host closed the connection]
18:45:54Doranwen (Doranwen) joins
18:46:52sknebel (sknebel) joins
18:47:51qwertyasdfuiopghjkl joins
18:50:59qwertyasdfuiopghjkl quits [Client Quit]
18:51:14qwertyasdfuiopghjkl joins
18:53:45Chris5010 quits [Quit: ]
19:00:23mut4ntmonkey quits [Remote host closed the connection]
19:01:14mut4ntmonkey (mutantmonkey) joins
19:08:06mut4ntmonkey quits [Remote host closed the connection]
19:08:36mut4ntmonkey (mutantmonkey) joins
19:13:56Chris5010 (Chris5010) joins
19:16:04Chris5010 quits [Client Quit]
19:35:40tzt quits [Remote host closed the connection]
19:36:01tzt (tzt) joins
19:42:16<@JAA>That Twitter scraping is done, 20.9 million tweets.
19:46:06mwfc (mwfc) joins
19:51:57<mwfc>Hi, I take it a Twitter grab is being prepared or already running?
20:03:26Chris5010 (Chris5010) joins
20:29:48Czechball quits [Client Quit]
20:30:36Czechball joins
21:06:02Chris5010 quits [Ping timeout: 265 seconds]
21:07:31DopefishJustin quits [Remote host closed the connection]
21:10:49DopefishJustin joins
21:11:27tech_exorcist quits [Client Quit]
21:16:13<betamax>JAA: should I be doing anything with the campaign sites? I can't see any docs for #Y, so I'm inclined to just try and do a grab using my residential connection...
21:16:31<betamax>(but if there's a way using #Y that the end results could end up in wayback, that would be much better)
21:26:11<@JAA>betamax: Yeah, there are no docs, and setting projects up is a manual thing currently that only arkiver can do. You might want to run something yourself. If it's WARC, we can always get it into the WBM later.
21:26:52<@JAA>Please use either an old wget version or wget-at though. Current upstream wget produces weird WARCs.
21:29:08<@JAA>I don't remember which wget version first had the bug, but this is the starting point if you want to dig around: https://github.com/webrecorder/warcio/pull/42
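If you are stuck with whatever wget your system ships, you can at least check its output. Our understanding, which you should confirm against that warcio PR, is that the "weird WARCs" come from some wget versions wrapping the `WARC-Target-URI` value in angle brackets (e.g. `<http://example.com/>`), which many WARC readers do not expect. A minimal stdlib-only sketch that scans for that pattern; it is a plain line scan, not a real WARC parser:

```python
import gzip
import sys
from typing import BinaryIO, Iterator

HEADER = b"WARC-Target-URI:"


def bracketed_target_uris(stream: BinaryIO) -> Iterator[str]:
    """Yield WARC-Target-URI values wrapped in <...>.

    Every line is examined, so a payload that happens to contain the
    header text could also match; good enough for a quick sanity check.
    """
    for raw in stream:
        if raw.startswith(HEADER):
            value = raw[len(HEADER):].strip()  # drops spaces and trailing CRLF
            if value.startswith(b"<") and value.endswith(b">"):
                yield value.decode("utf-8", "replace")


def open_warc(path: str) -> BinaryIO:
    # .warc.gz files are concatenated gzip members; gzip.open reads all of them.
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


if __name__ == "__main__":
    with open_warc(sys.argv[1]) as f:
        hits = list(bracketed_target_uris(f))
    print(f"{len(hits)} bracketed WARC-Target-URI header(s)")
    for uri in hits[:10]:
        print(uri)
```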
21:31:22borislav joins
21:33:14<betamax>ah, thanks for the heads up (I would have just used the latest version)
21:36:46igloo22225 quits [Quit: Ping timeout (120 seconds)]
21:37:00igloo22225 (igloo22225) joins
22:15:09mut4ntmonkey quits [Remote host closed the connection]
22:16:22mut4ntmonkey (mutantmonkey) joins
22:18:09mut4ntm0nkey (mutantmonkey) joins
22:18:21mut4ntmonkey quits [Remote host closed the connection]
22:20:53mut4ntm0nkey quits [Remote host closed the connection]
22:21:21mut4ntm0nkey (mutantmonkey) joins
22:32:07RisenRubix__ quits [Read error: Connection reset by peer]
22:34:10katocala quits [Remote host closed the connection]
22:39:25BlueMaxima joins
23:07:12Ketchup901 quits [Ping timeout: 255 seconds]
23:11:12Ketchup901 (Ketchup901) joins
23:12:58<betamax>The command I plan to run is the following:
23:13:01<betamax>wget --mirror --timeout=5 --tries=1 --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36" --page-requisites --warc-file=01 --delete-after -o 1.log http://aaronforrep.com/
23:13:13<betamax>it's not doing 100% what I want
23:14:12<betamax>(the images, which are on an external domain, are not saved, but adding --span-hosts made it start crawling other sites, which would take too long)
23:14:24<betamax>I'll start running it tomorrow (ran out of time now)
23:14:28<@JAA>Yeah, wget doesn't have --span-hosts-allow like wpull does.
23:14:59<@JAA>I don't think it's possible to have it recurse to off-site page requisites but not off-site links, but I'm not entirely certain.
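To make the distinction concrete, here is a hypothetical sketch of the scope rule being asked for (roughly what wpull's `--span-hosts-allow page-requisites` covers): fetch off-site URLs only when they are page requisites such as images, scripts and stylesheets, and follow `<a href>` links only on the seed host. The tag/attribute table is a simplification we chose, not wget's or wpull's actual logic; it ignores `srcset`, CSS `url(...)` references, and treats every `<link href>` as a requisite.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# (tag, attribute) pairs treated as page requisites (a simplification).
REQUISITE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href"),
                   ("source", "src"), ("video", "poster")}


class LinkSorter(HTMLParser):
    """Split a page's URLs into page requisites and ordinary links."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.requisites: List[str] = []
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser hands us lowercased tag and attribute names.
        for name, value in attrs:
            if value is None:
                continue
            url = urljoin(self.base_url, value)
            if (tag, name) in REQUISITE_ATTRS:
                self.requisites.append(url)
            elif tag == "a" and name == "href":
                self.links.append(url)


def urls_to_fetch(html: str, page_url: str) -> List[str]:
    """Requisites from any host, plus links that stay on the page's own host."""
    seed_host = urlsplit(page_url).hostname
    sorter = LinkSorter(page_url)
    sorter.feed(html)
    sorter.close()
    queue = list(sorter.requisites)  # off-site images/CSS/JS are allowed
    queue += [u for u in sorter.links if urlsplit(u).hostname == seed_host]
    return queue
```

In a real crawler, the off-site requisites would be fetched but not parsed for further links, which is what keeps the crawl from wandering across the web.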
23:15:18<betamax>it's not major, the HTML/content is the main thing
23:17:39<betamax>JAA: could I just switch to wpull?
23:18:04<@JAA>betamax: Depends on whether one of your kinks is masochism.
23:18:41<@JAA>But you could give it a try with wpull 1.2.3. 2.0.x is basically unusable standalone.
23:18:51<betamax>ah, hmm, maybe not
23:19:09<betamax>I need to get this running ASAP :)
23:27:27DLoader quits [Ping timeout: 255 seconds]
23:32:56DLoader joins
23:33:38Ketchup901 quits [Remote host closed the connection]
23:34:27<jodizzle>Any reason to not use grab-site for this task?
23:34:49Ketchup901 (Ketchup901) joins
23:37:04Atom-- quits [Read error: Connection reset by peer]
23:48:54<@JAA>Actually, yeah, probably the best option. You can just start N crawls, then `touch stop` in each crawl's directory after a few minutes and wait for the processes to exit, then start the next batch.
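A rough sketch of that batching loop, assuming `grab-site` is on PATH and creates its crawl directory inside its current working directory, so each crawl here gets its own working directory and the `stop` file is touched in whatever crawl directory appears there. The batch size, the wait time and the `crawls/batch-…` layout are our own choices, not grab-site defaults.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Iterable, Iterator, List, Sequence


def batches(items: Iterable[str], n: int) -> Iterator[List[str]]:
    """Split items into consecutive lists of at most n elements."""
    chunk: List[str] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == n:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def touch_stop_files(workdir: Path) -> int:
    """Create a 'stop' file in every subdirectory (crawl dir) of workdir."""
    count = 0
    for crawl_dir in workdir.iterdir():
        if crawl_dir.is_dir():
            (crawl_dir / "stop").touch()
            count += 1
    return count


def run_batches(urls: Sequence[str], n: int = 5, minutes: float = 5.0,
                root: Path = Path("crawls")) -> None:
    """Start n crawls, let them run, ask them to stop, wait, repeat."""
    for b, batch in enumerate(batches(urls, n)):
        workdirs: List[Path] = []
        procs: List[subprocess.Popen] = []
        for i, url in enumerate(batch):
            workdir = root / f"batch-{b:03d}-{i:02d}"
            workdir.mkdir(parents=True, exist_ok=True)
            procs.append(subprocess.Popen(["grab-site", url], cwd=workdir))
            workdirs.append(workdir)
        time.sleep(minutes * 60)
        for workdir in workdirs:
            touch_stop_files(workdir)
        for proc in procs:
            proc.wait()  # crawls finish their in-flight work and exit


if __name__ == "__main__":
    # Usage: python batch_grab.py urls.txt  (one URL per line)
    with open(sys.argv[1], encoding="utf-8") as f:
        run_batches([line.strip() for line in f if line.strip()])
```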
23:54:49Justin[home] joins
23:54:56programmerq quits [Client Quit]
23:55:05BlueMaxima quits [Remote host closed the connection]
23:55:05DopefishJustin quits [Remote host closed the connection]
23:55:05borislav quits [Remote host closed the connection]
23:55:09BlueMaxima joins
23:58:09BlueMaxima quits [Remote host closed the connection]
23:58:12BlueMaxima joins
23:58:48DLoader_ joins