#archiveteam-bs log for 2023-01-05


00:03:07	<Mental>	but the larger blogs are a pain to archive with a web browser, I'm not even getting started on that. For blogs with one or two posts it would be more doable but the bigger blogs tend to have more interesting content that isn't all found on the blog homepage, so I'd rather see the bot process those first
00:03:37	<@JAA>	That's exactly what pokechu22's been doing.
00:04:17	<Mental>	JAA the bigger blogs that are excluded from tweakblogs.net_redo_commands.txt are already done?
00:04:30	<@JAA>	As I understood it, yes.
00:18:01		jacksonchen666 quits [Ping timeout: 245 seconds]
00:19:47		jacksonchen666 (jacksonchen666) joins
00:21:48		sonick (sonick) joins
00:31:33		pokechu22 (pokechu22) joins
00:32:55	<pokechu22>	Mental: Yeah, all of the bigger ones are done already, see http://archivebot.com/finished
00:34:37	<pokechu22>	I also made sure the homepage and pages 2/3 (when applicable) were saved for the smaller blogs yesterday
00:38:43	<pokechu22>	Only the blog pages themselves are being saved in those jobs; things like links to other sites or embedded images (most of which are on imgur it seems) will need to be done later (by extracting a list of them from the log), which shouldn't be too hard
00:41:04	<Mental>	pokechu22, I have all the blog posts locally (https://archive.org/details/tweakblogs), I was planning to extract image links anyway. Many should also be on the Tweakers.net domain as users can upload images to their photo album
00:42:53	<Mental>	I'll just try to hackertyper it right now..
00:43:51		wickedplayer494 is now authenticated as wickedplayer494
00:50:27	<Mental>	pokechu22 I have a list of every url in img src
00:50:31	<Mental>	where do I put it
00:50:44	<pokechu22>	You can upload it to https://transfer.archivete.am/
00:50:50	<Mental>	it's 19450 links
00:51:50	<Mental>	https://transfer.archivete.am/IxfKJ/tweakblogs_imagelinks
00:52:53	<@arkiver>	Mental: i'm putting it in archivebot
00:53:10	<@arkiver>	we should also have gotten a copy through #Y (not in wayback yet though), but good you're getting a second copy
00:53:12	<@arkiver>	it's small anyway
00:53:30	<Mental>	just fyi, this is the result of cat all_blogposts | grep "<img" | sed -e 's/<img/\n<img/' -e 's/.*img src="\([^"]*\)".*/\1/' | sort -u
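Editor's sketch (not part of the log): the one-liner above collects every unique `img src` value from the saved blog posts. The same idea might look roughly like this with Python's standard HTML parser, which also catches several images on one line (the case the `g` switch would have covered). The names `ImgSrcCollector` and `extract_img_sources` are hypothetical, and the input is assumed to be the HTML already loaded as a string.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = set()  # a set removes duplicates, like `sort -u`

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.add(value)


def extract_img_sources(html):
    """Return the sorted, de-duplicated list of image URLs in `html`."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.sources)
```

Calling `extract_img_sources` on the concatenated posts and writing one URL per line would give a list in the same shape as the uploaded file.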
00:54:38	<Mental>	arkiver thx :)
00:54:40	<pokechu22>	Hmm. I thought I saw blogs that directly embedded imgur, but that doesn't seem to be the case
00:54:51	<pokechu22>	(instead it's using those camo URLs)
00:55:45	<Mental>	Tweakers proxies embedded images for security/privacy reasons
00:57:34	<pokechu22>	Yeah, makes sense; I'm just confused because I thought I checked for that and saw otherwise. I must be thinking of a different site, but I'm not sure what I could have confused it with
01:04:20	<Mental>	should have used g switch, but it doesn't change the resulting list
01:07:28		Czechball3 joins
01:07:48		TransonicGravity joins
01:08:20		Czechball quits [Ping timeout: 265 seconds]
01:08:20		Czechball3 is now known as Czechball
01:12:08	<Mental>	pokechu22 the alt and title contain the direct link
01:12:15	<Mental>	maybe you saw those
01:13:10	<pokechu22>	That would explain it; I think I checked with view-source and didn't pay too much attention
01:16:09	<pokechu22>	In any case I can also extract the original URLs from those (just looking for /camo/ and then using the url parameter with URL decoding) to get https://transfer.archivete.am/aA6H2/tweakblogs.net_camo_image_urls_originals.txt which I'm running in archivebot now
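Editor's sketch (not part of the log): the approach described above, finding `/camo/` in a proxied image URL and URL-decoding its `url` parameter, could be written roughly as follows. The exact shape of the camo URLs is an assumption here (a path containing `/camo/` and a query string carrying `url=`), and `original_from_camo` is a hypothetical name, not the script that was actually used.

```python
from urllib.parse import parse_qs, urlsplit


def original_from_camo(url):
    """Return the original image URL hidden in a camo proxy URL.

    Returns None when the URL is not a camo URL or has no `url` parameter,
    for example an image hosted directly in a user's photo album.
    """
    parts = urlsplit(url)
    if "/camo/" not in parts.path:
        return None
    # parse_qs percent-decodes the values, which is the URL decoding step.
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None
```

URLs that return `None` are the directly hosted images, which stay covered only by the list of `img src` values.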
01:18:27		Czechball quits [Ping timeout: 250 seconds]
01:18:50	<Mental>	pokechu22 you won't hit all images that way as some are hosted by users in their photo album on tweakers.net which doesn't get proxied
01:18:54		Czechball joins
01:19:15	<Mental>	you'd hit all externally hosted images though, so if that was your goal never mind
01:20:17	<Mental>	but arkiver already but my list in archivebot, does it need to be there twice?
01:20:23	<Mental>	*put
01:20:42	<pokechu22>	Yeah, my goal was to also save the original images; arkiver's job will save the images as used in the blogs
01:21:14	<pokechu22>	Mostly just for completeness
01:21:44	<Mental>	ah I see now
01:41:56		Megame quits [Client Quit]
01:44:26		Czechball quits [Ping timeout: 264 seconds]
02:21:43		wyatt8740 quits [Ping timeout: 250 seconds]
02:27:01		wyatt8740 joins
04:27:39		vpn-user joins
04:28:27	<vpn-user>	Hello, so I was conducting research on clay party, an online harrasment group
04:28:48	<vpn-user>	however, I found a discord server discussing very sensitive topics https://discord.gg/bFQyDnJTUK
04:29:18	<vpn-user>	please immediately archive this with DHT, as to freeze legal evidence on the server
04:29:25		vpn-user quits [Remote host closed the connection]
04:29:43		vpn-user joins
04:30:53	<vpn-user>	The discord server seems to have the bullies behind clay party’s views on pedophillia, which may come in handy in court
04:31:31	<vpn-user>	The clay party site seems to be harassing users, doxing them, and encouraging LGBTQ and “furry” users to commit suicide
04:31:49	<vpn-user>	police action has already started against the site
04:32:16	<vpn-user>	They purge the channels often, so archival is urgent for legal purposes
04:32:27		vpn-user quits [Remote host closed the connection]
04:33:00		vpn-user joins
04:33:48	<vpn-user>	The server does not contain any images, only a text chat, excluding the suspicious emojis
04:34:29	<vpn-user>	it is really important someone archives the chat logs of the server ASAP
04:34:45		vpn-user quits [Remote host closed the connection]
04:35:04		vpn-user joins
04:35:40	<vpn-user>	I will now leave the server, and using a VPN for obvious reasons.
04:35:42		vpn-user leaves
05:03:45		vpnuserisback joins
05:04:07	<vpnuserisback>	I forgot to mention to use discord URLs scraper also, there is images of self harm in the server
05:04:20	<vpnuserisback>	And suspicious emojis too
05:04:50	<vpnuserisback>	Please archive it ASAP, they might purge any minute now
05:04:52		vpnuserisback leaves
05:38:14		asghari joins
05:40:53		asghari quits [Remote host closed the connection]
06:08:16		Atom-- joins
06:11:49		Atom quits [Ping timeout: 250 seconds]
06:21:26		drin joins
06:21:47		geezabiscuit quits [Ping timeout: 250 seconds]
06:22:15		drin is now known as geezabiscuit
06:49:41		HackMii_ quits [Ping timeout: 245 seconds]
06:51:53		HackMii_ (hacktheplanet) joins
07:01:47		qwertyasdfuiopghjkl quits [Client Quit]
07:06:50		Island quits [Read error: Connection reset by peer]
08:00:21	<pokechu22>	OK, all blogs with 6 or more posts have been saved via AB (and I also did the front page of all blogs as a separate job). Per https://tweakblogs.net/?allWeblogs=1&sort=nrpost&order=DESC that still leaves ~2/3rds of them to be done, and it's at 2-3 minutes per blog already and will only speed up, so queuing really needs to be automated before doing more. But, the biggest stuff
08:00:23	<pokechu22>	is done, so that's good
08:27:43		hitgrr8 joins
08:28:51		HackMii_ quits [Ping timeout: 245 seconds]
08:30:54		HackMii_ (hacktheplanet) joins
09:22:24		michaelblob quits [Read error: Connection reset by peer]
09:26:21		jacksonchen666 quits [Ping timeout: 245 seconds]
09:27:43		michaelblob (michaelblob) joins
09:33:15		jacksonchen666 (jacksonchen666) joins
09:53:57		Megame (Megame) joins
11:17:20		AnotherIki joins
11:21:02		Iki1 quits [Ping timeout: 264 seconds]
11:28:57		qwertyasdfuiopghjkl joins
11:37:11		sec^nd quits [Ping timeout: 245 seconds]
11:38:14		sec^nd (second) joins
12:46:22	<@JAA>	arkiver: How come the Tweakblogs data from #Y isn't in the WBM yet? Just too small to trigger a megawarc or something else?
12:56:08		sec^nd quits [Remote host closed the connection]
12:56:50		sec^nd (second) joins
13:11:37		Stiletto joins
13:45:46		alcapotz joins
13:46:02	<alcapotz>	hi
13:46:17	<alcapotz>	@search government contracts
13:46:26		alcapotz quits [Remote host closed the connection]
13:47:22		Megame quits [Client Quit]
13:58:09		LeGoupil joins
14:23:53		chrismeller (chrismeller) joins
14:48:36		pie_ quits []
14:51:56		pie_ joins
14:52:16		pie_ quits [Client Quit]
15:06:14		pie_ joins
16:25:35		VonGuard joins
16:26:57		Island joins
16:40:54		fl0w_ joins
16:44:03		fl0w quits [Ping timeout: 250 seconds]
16:55:32		fl0w joins
16:57:29		fl0w_ quits [Ping timeout: 250 seconds]
16:58:03		fl0w_ joins
17:00:30		fl0w quits [Ping timeout: 265 seconds]
17:06:51		fl0w joins
17:09:41		fl0w_ quits [Ping timeout: 265 seconds]
17:15:15		thuban joins
17:18:10		fl0w_ joins
17:22:14		fl0w quits [Ping timeout: 264 seconds]
17:25:04		fl0w joins
17:27:38		fl0w_ quits [Ping timeout: 264 seconds]
17:30:59		fl0w_ joins
17:34:14		fl0w quits [Ping timeout: 264 seconds]
18:14:56		chrismeller quits [Read error: Connection reset by peer]
18:16:12		LeGoupil quits [Client Quit]
18:32:10		fl0w joins
18:34:45		fl0w_ quits [Ping timeout: 265 seconds]
18:36:00	<h2ibot>	Usernam edited List of websites excluded from the Wayback Machine (+25): https://wiki.archiveteam.org/?diff=49350&oldid=49331
18:57:55		sonick quits [Client Quit]
19:19:34		Megame (Megame) joins
19:59:17		Megame quits [Client Quit]
20:03:41		wyatt8740 quits [Ping timeout: 265 seconds]
20:22:55		Ketchup901 (Ketchup901) joins
20:33:59		wyatt8740 joins
20:34:32	<fishingforsoup>	Fuddles, this isn't archived anywhere.
20:34:33	<fishingforsoup>	https://youtu.be/04urYK0yVZY
20:36:21	<fishingforsoup>	Or this.
20:36:22	<fishingforsoup>	https://youtu.be/besw5nKbidw
20:37:08	<fishingforsoup>	Or this.
20:37:08	<fishingforsoup>	https://youtu.be/4YIEIuKhuAQ
20:37:46	<fishingforsoup>	https://youtu.be/0u6koEzcRJo
20:37:50	<fishingforsoup>	You get the idea.
20:41:26		wyatt8740 quits [Ping timeout: 264 seconds]
20:42:44		wyatt8740 joins
20:47:09		wyatt8740 quits [Ping timeout: 250 seconds]
20:48:25		wyatt8740 joins
22:26:59		Ketchup901 quits [Remote host closed the connection]
22:27:15		Ketchup901 (Ketchup901) joins
22:37:22		BlueMaxima joins
23:06:48		hitgrr8 quits [Client Quit]
23:31:59		Megame (Megame) joins
23:33:00		AlsoTheTechRobo (TheTechRobo) joins
23:36:09		TheTechRobo quits [Ping timeout: 250 seconds]
23:46:12		fl0w_ joins
23:50:22		fl0w quits [Ping timeout: 265 seconds]
