00:03:07<Mental>but the larger blogs are a pain to archive with a web browser, I'm not even getting started on that. For blogs with one or two posts it would be more doable but the bigger blogs tend to have more interesting content that isn't all found on the blog homepage, so I'd rather see the bot process those first
00:03:37<@JAA>That's exactly what pokechu22's been doing.
00:04:17<Mental>JAA the bigger blogs that are excluded from tweakblogs.net_redo_commands.txt are already done?
00:04:30<@JAA>As I understood it, yes.
00:18:01jacksonchen666 quits [Ping timeout: 245 seconds]
00:19:47jacksonchen666 (jacksonchen666) joins
00:21:48sonick (sonick) joins
00:31:33pokechu22 (pokechu22) joins
00:32:55<pokechu22>Mental: Yeah, all of the bigger ones are done already, see http://archivebot.com/finished
00:34:37<pokechu22>I also made sure the homepage and pages 2/3 (when applicable) were saved for the smaller blogs yesterday
00:38:43<pokechu22>Only the blog pages themselves are being saved in those jobs; things like links to other sites or embedded images (most of which are on imgur it seems) will need to be done later (by extracting a list of them from the log), which shouldn't be *too* hard
00:41:04<Mental>pokechu22, I have all the blog posts locally (https://archive.org/details/tweakblogs), I was planning to extract image links anyway. Many should also be on the Tweakers.net domain as users can upload images to their photo album
00:42:53<Mental>I'll just try to hackertyper it right now..
00:50:27<Mental>pokechu22 I have a list of every url in img src
00:50:31<Mental>where do I put it
00:50:44<pokechu22>You can upload it to https://transfer.archivete.am/
00:50:50<Mental>it's 19450 links
00:51:50<Mental>https://transfer.archivete.am/IxfKJ/tweakblogs_imagelinks
00:52:53<@arkiver>Mental: i'm putting it in archivebot
00:53:10<@arkiver>we should also have gotten a copy through #Y (not in wayback yet though), but good you're getting a second copy
00:53:12<@arkiver>it's small anyway
00:53:30<Mental>just fyi, this is the result of cat all_blogposts | grep "<img" | sed -e 's/<img/\n<img/' -e 's/.*img src="\([^"]*\)".*/\1/' | sort -u
00:54:38<Mental>arkiver thx :)
00:54:40<pokechu22>Hmm. I thought I saw blogs that directly embedded imgur, but that doesn't seem to be the case
00:54:51<pokechu22>(instead it's using those camo URLs)
00:55:45<Mental>Tweakers proxies embedded images for security/privacy reasons
00:57:34<pokechu22>Yeah, makes sense; I'm just confused because I thought I checked for that and saw otherwise. I must be thinking of a different site, but I'm not sure what I could have confused it with
01:04:20<Mental>should have used g switch, but it doesn't change the resulting list
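[Editor's sketch, not part of the log] The img src extraction Mental describes can also be done with an HTML parser, which does not depend on how many <img> tags share a line. This is a minimal illustration, not the command that was actually run. The names ImgSrcCollector and extract_img_sources are hypothetical, and the input is assumed to be the saved blog post HTML as one string.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = set()

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing <img/> also ends up here.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.add(value)


def extract_img_sources(html_text):
    """Return the unique img src values, sorted (comparable to `sort -u`)."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.sources)


if __name__ == "__main__":
    sample = (
        '<p><img src="https://a.example/1.png" alt="x">'
        '<img src="https://b.example/2.png"></p>\n'
        '<img src="https://a.example/1.png"/>'
    )
    for src in extract_img_sources(sample):
        print(src)
```

The sample URLs are placeholders. Deduplicating through a set and sorting at the end plays the same role as the trailing sort -u in the shell pipeline.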
01:07:28Czechball3 joins
01:07:48TransonicGravity joins
01:08:20Czechball quits [Ping timeout: 265 seconds]
01:08:20Czechball3 is now known as Czechball
01:12:08<Mental>pokechu22 the alt and title contain the direct link
01:12:15<Mental>maybe you saw those
01:13:10<pokechu22>That would explain it; I think I checked with view-source and didn't pay too much attention
01:16:09<pokechu22>In any case I can also extract the original URLs from those (just looking for /camo/ and then using the url parameter with URL decoding) to get https://transfer.archivete.am/aA6H2/tweakblogs.net_camo_image_urls_originals.txt which I'm running in archivebot now
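[Editor's sketch, not part of the log] The step pokechu22 describes (find /camo/ URLs, take the url parameter, URL-decode it) might look roughly like the following. The exact shape of the Tweakers camo URLs is an assumption here: the example assumes a path containing /camo/ and a percent-encoded url query parameter. The function name original_from_camo is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def original_from_camo(url):
    """Return the decoded original image URL behind a camo proxy URL.

    Assumes the proxy URL has "/camo/" in its path and carries the
    original address in a percent-encoded "url" query parameter.
    Returns None for anything that does not match that shape.
    """
    parts = urlsplit(url)
    if "/camo/" not in parts.path:
        return None
    # parse_qs percent-decodes the values for us.
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    candidates = [
        "https://tweakers.net/camo/abc123/?url=https%3A%2F%2Fi.imgur.com%2Fexample.png",
        "https://tweakers.net/ext/f/abc/full.png",
    ]
    originals = sorted({o for o in map(original_from_camo, candidates) if o})
    for original in originals:
        print(original)
```

As Mental notes right after, a list built this way covers only proxied (externally hosted) images. Images uploaded to tweakers.net photo albums are not proxied and would be skipped by this filter.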
01:18:27Czechball quits [Ping timeout: 250 seconds]
01:18:50<Mental>pokechu22 you won't hit all images that way as some are hosted by users in their photo album on tweakers.net which doesn't get proxied
01:18:54Czechball joins
01:19:15<Mental>you'd hit all externally hosted images though, so if that was your goal never mind
01:20:17<Mental>but arkiver already but my list in archivebot, does it need to be there twice?
01:20:23<Mental>*put
01:20:42<pokechu22>Yeah, my goal was to also save the original images; arkiver's job will save the images as used in the blogs
01:21:14<pokechu22>Mostly just for completeness
01:21:44<Mental>ah I see now
01:41:56Megame quits [Client Quit]
01:44:26Czechball quits [Ping timeout: 264 seconds]
02:21:43wyatt8740 quits [Ping timeout: 250 seconds]
02:27:01wyatt8740 joins
04:27:39vpn-user joins
04:28:27<vpn-user>Hello, so I was conducting research on clay party, an online harassment group
04:28:48<vpn-user>however, I found a discord server discussing very sensitive topics https://discord.gg/bFQyDnJTUK
04:29:18<vpn-user>please immediately archive this with DHT, as to freeze legal evidence on the server
04:29:25vpn-user quits [Remote host closed the connection]
04:29:43vpn-user joins
04:30:53<vpn-user>The discord server seems to have the bullies behind clay party’s views on pedophilia, which may come in handy in court
04:31:31<vpn-user>The clay party site seems to be harassing users, doxing them, and encouraging LGBTQ and “furry” users to commit suicide
04:31:49<vpn-user>police action has already started against the site
04:32:16<vpn-user>They purge the channels often, so archival is urgent for legal purposes
04:32:27vpn-user quits [Remote host closed the connection]
04:33:00vpn-user joins
04:33:48<vpn-user>The server does not contain any images, only a text chat, excluding the suspicious emojis
04:34:29<vpn-user>it is really important someone archives the chat logs of the server ASAP
04:34:45vpn-user quits [Remote host closed the connection]
04:35:04vpn-user joins
04:35:40<vpn-user>I will now leave the server, and using a VPN for obvious reasons.
04:35:42vpn-user leaves
05:03:45vpnuserisback joins
05:04:07<vpnuserisback>I forgot to mention to use discord URLs scraper also, there is images of self harm in the server
05:04:20<vpnuserisback>And suspicious emojis too
05:04:50<vpnuserisback>Please archive it ASAP, they might purge any minute now
05:04:52vpnuserisback leaves
05:38:14asghari joins
05:40:53asghari quits [Remote host closed the connection]
06:08:16Atom-- joins
06:11:49Atom quits [Ping timeout: 250 seconds]
06:21:26drin joins
06:21:47geezabiscuit quits [Ping timeout: 250 seconds]
06:22:15drin is now known as geezabiscuit
06:49:41HackMii_ quits [Ping timeout: 245 seconds]
06:51:53HackMii_ (hacktheplanet) joins
07:01:47qwertyasdfuiopghjkl quits [Client Quit]
07:06:50Island quits [Read error: Connection reset by peer]
08:00:21<pokechu22>OK, all blogs with 6 or more posts have been saved via AB (and I also did the front page of all blogs as a separate job). Per https://tweakblogs.net/?allWeblogs=1&sort=nrpost&order=DESC that still leaves ~2/3rds of them to be done, and it's at 2-3 minutes per blog already and will only speed up, so queuing really needs to be automated before doing more. But, the biggest stuff
08:00:23<pokechu22>is done, so that's good
08:27:43hitgrr8 joins
08:28:51HackMii_ quits [Ping timeout: 245 seconds]
08:30:54HackMii_ (hacktheplanet) joins
09:22:24michaelblob quits [Read error: Connection reset by peer]
09:26:21jacksonchen666 quits [Ping timeout: 245 seconds]
09:27:43michaelblob (michaelblob) joins
09:33:15jacksonchen666 (jacksonchen666) joins
09:53:57Megame (Megame) joins
11:17:20AnotherIki joins
11:21:02Iki1 quits [Ping timeout: 264 seconds]
11:28:57qwertyasdfuiopghjkl joins
11:37:11sec^nd quits [Ping timeout: 245 seconds]
11:38:14sec^nd (second) joins
12:46:22<@JAA>arkiver: How come the Tweakblogs data from #Y isn't in the WBM yet? Just too small to trigger a megawarc or something else?
12:56:08sec^nd quits [Remote host closed the connection]
12:56:50sec^nd (second) joins
13:11:37Stiletto joins
13:45:46alcapotz joins
13:46:02<alcapotz>hi
13:46:17<alcapotz>@search government contracts
13:46:26alcapotz quits [Remote host closed the connection]
13:47:22Megame quits [Client Quit]
13:58:09LeGoupil joins
14:23:53chrismeller (chrismeller) joins
14:48:36pie_ quits []
14:51:56pie_ joins
14:52:16pie_ quits [Client Quit]
15:06:14pie_ joins
16:25:35VonGuard joins
16:26:57Island joins
16:40:54fl0w_ joins
16:44:03fl0w quits [Ping timeout: 250 seconds]
16:55:32fl0w joins
16:57:29fl0w_ quits [Ping timeout: 250 seconds]
16:58:03fl0w_ joins
17:00:30fl0w quits [Ping timeout: 265 seconds]
17:06:51fl0w joins
17:09:41fl0w_ quits [Ping timeout: 265 seconds]
17:15:15thuban joins
17:18:10fl0w_ joins
17:22:14fl0w quits [Ping timeout: 264 seconds]
17:25:04fl0w joins
17:27:38fl0w_ quits [Ping timeout: 264 seconds]
17:30:59fl0w_ joins
17:34:14fl0w quits [Ping timeout: 264 seconds]
18:14:56chrismeller quits [Read error: Connection reset by peer]
18:16:12LeGoupil quits [Client Quit]
18:32:10fl0w joins
18:34:45fl0w_ quits [Ping timeout: 265 seconds]
18:36:00<h2ibot>Usernam edited List of websites excluded from the Wayback Machine (+25): https://wiki.archiveteam.org/?diff=49350&oldid=49331
18:57:55sonick quits [Client Quit]
19:19:34Megame (Megame) joins
19:59:17Megame quits [Client Quit]
20:03:41wyatt8740 quits [Ping timeout: 265 seconds]
20:22:55Ketchup901 (Ketchup901) joins
20:33:59wyatt8740 joins
20:34:32<fishingforsoup>Fuddles, this isn't archived anywhere.
20:34:33<fishingforsoup>https://youtu.be/04urYK0yVZY
20:36:21<fishingforsoup>Or this.
20:36:22<fishingforsoup>https://youtu.be/besw5nKbidw
20:37:08<fishingforsoup>Or this.
20:37:08<fishingforsoup>https://youtu.be/4YIEIuKhuAQ
20:37:46<fishingforsoup>https://youtu.be/0u6koEzcRJo
20:37:50<fishingforsoup>You get the idea.
20:41:26wyatt8740 quits [Ping timeout: 264 seconds]
20:42:44wyatt8740 joins
20:47:09wyatt8740 quits [Ping timeout: 250 seconds]
20:48:25wyatt8740 joins
22:26:59Ketchup901 quits [Remote host closed the connection]
22:27:15Ketchup901 (Ketchup901) joins
22:37:22BlueMaxima joins
23:06:48hitgrr8 quits [Client Quit]
23:31:59Megame (Megame) joins
23:33:00AlsoTheTechRobo (TheTechRobo) joins
23:36:09TheTechRobo quits [Ping timeout: 250 seconds]
23:46:12fl0w_ joins
23:50:22fl0w quits [Ping timeout: 265 seconds]