| 00:03:07 | <Mental> | but the larger blogs are a pain to archive with a web browser, I'm not even getting started on that. For blogs with one or two posts it would be more doable but the bigger blogs tend to have more interesting content that isn't all found on the blog homepage, so I'd rather see the bot process those first |
| 00:03:37 | <@JAA> | That's exactly what pokechu22's been doing. |
| 00:04:17 | <Mental> | JAA the bigger blogs that are excluded from tweakblogs.net_redo_commands.txt are already done? |
| 00:04:30 | <@JAA> | As I understood it, yes. |
| 00:18:01 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
| 00:19:47 | | jacksonchen666 (jacksonchen666) joins |
| 00:21:48 | | sonick (sonick) joins |
| 00:31:33 | | pokechu22 (pokechu22) joins |
| 00:32:55 | <pokechu22> | Mental: Yeah, all of the bigger ones are done already, see http://archivebot.com/finished |
| 00:34:37 | <pokechu22> | I also made sure the homepage and pages 2/3 (when applicable) were saved for the smaller blogs yesterday |
| 00:38:43 | <pokechu22> | Only the blog pages themselves are being saved in those jobs; things like links to other sites or embedded images (most of which are on imgur it seems) will need to be done later (by extracting a list of them from the log), which shouldn't be *too* hard |
| 00:41:04 | <Mental> | pokechu22, I have all the blog posts locally (https://archive.org/details/tweakblogs), I was planning to extract image links anyway. Many should also be on the Tweakers.net domain as users can upload images to their photo album |
| 00:42:53 | <Mental> | I'll just try to hackertyper it right now.. |
| 00:43:51 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 00:50:27 | <Mental> | pokechu22 I have a list of every url in img src |
| 00:50:31 | <Mental> | where do I put it |
| 00:50:44 | <pokechu22> | You can upload it to https://transfer.archivete.am/ |
| 00:50:50 | <Mental> | it's 19450 links |
| 00:51:50 | <Mental> | https://transfer.archivete.am/IxfKJ/tweakblogs_imagelinks |
| 00:52:53 | <@arkiver> | Mental: i'm putting it in archivebot |
| 00:53:10 | <@arkiver> | we should also have gotten a copy through #Y (not in wayback yet though), but good you're getting a second copy |
| 00:53:12 | <@arkiver> | it's small anyway |
| 00:53:30 | <Mental> | just fyi, this is the result of cat all_blogposts | grep "<img" | sed -e 's/<img/\n<img/' -e 's/.*img src="\([^"]*\)".*/\1/' | sort -u |
| 00:54:38 | <Mental> | arkiver thx :) |
| 00:54:40 | <pokechu22> | Hmm. I thought I saw blogs that directly embedded imgur, but that doesn't seem to be the case |
| 00:54:51 | <pokechu22> | (instead it's using those camo URLs) |
| 00:55:45 | <Mental> | Tweakers proxies embedded images for security/privacy reasons |
| 00:57:34 | <pokechu22> | Yeah, makes sense; I'm just confused because I thought I checked for that and saw otherwise. I must be thinking of a different site, but I'm not sure what I could have confused it with |
| 01:04:20 | <Mental> | should have used g switch, but it doesn't change the resulting list |
| 01:07:28 | | Czechball3 joins |
| 01:07:48 | | TransonicGravity joins |
| 01:08:20 | | Czechball quits [Ping timeout: 265 seconds] |
| 01:08:20 | | Czechball3 is now known as Czechball |
| 01:12:08 | <Mental> | pokechu22 the alt and title contain the direct link |
| 01:12:15 | <Mental> | maybe you saw those |
| 01:13:10 | <pokechu22> | That would explain it; I think I checked with view-source and didn't pay too much attention |
| 01:16:09 | <pokechu22> | In any case I can also extract the original URLs from those (just looking for /camo/ and then using the url parameter with URL decoding) to get https://transfer.archivete.am/aA6H2/tweakblogs.net_camo_image_urls_originals.txt which I'm running in archivebot now |
| 01:18:27 | | Czechball quits [Ping timeout: 250 seconds] |
| 01:18:50 | <Mental> | pokechu22 you won't hit all images that way as some are hosted by users in their photo album on tweakers.net which doesn't get proxied |
| 01:18:54 | | Czechball joins |
| 01:19:15 | <Mental> | you'd hit all externally hosted images though, so if that was your goal never mind |
| 01:20:17 | <Mental> | but arkiver already but my list in archivebot, does it need to be there twice? |
| 01:20:23 | <Mental> | *put |
| 01:20:42 | <pokechu22> | Yeah, my goal was to also save the original images; arkiver's job will save the images as used in the blogs |
| 01:21:14 | <pokechu22> | Mostly just for completeness |
| 01:21:44 | <Mental> | ah I see now |
| 01:41:56 | | Megame quits [Client Quit] |
| 01:44:26 | | Czechball quits [Ping timeout: 264 seconds] |
| 02:21:43 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 02:27:01 | | wyatt8740 joins |
| 04:27:39 | | vpn-user joins |
| 04:28:27 | <vpn-user> | Hello, so I was conducting research on clay party, an online harassment group |
| 04:28:48 | <vpn-user> | however, I found a discord server discussing very sensitive topics https://discord.gg/bFQyDnJTUK |
| 04:29:18 | <vpn-user> | please immediately archive this with DHT, as to freeze legal evidence on the server |
| 04:29:25 | | vpn-user quits [Remote host closed the connection] |
| 04:29:43 | | vpn-user joins |
| 04:30:53 | <vpn-user> | The discord server seems to have the bullies behind clay party’s views on pedophilia, which may come in handy in court |
| 04:31:31 | <vpn-user> | The clay party site seems to be harassing users, doxing them, and encouraging LGBTQ and “furry” users to commit suicide |
| 04:31:49 | <vpn-user> | police action has already started against the site |
| 04:32:16 | <vpn-user> | They purge the channels often, so archival is urgent for legal purposes |
| 04:32:27 | | vpn-user quits [Remote host closed the connection] |
| 04:33:00 | | vpn-user joins |
| 04:33:48 | <vpn-user> | The server does not contain any images, only a text chat, excluding the suspicious emojis |
| 04:34:29 | <vpn-user> | it is really important someone archives the chat logs of the server ASAP |
| 04:34:45 | | vpn-user quits [Remote host closed the connection] |
| 04:35:04 | | vpn-user joins |
| 04:35:40 | <vpn-user> | I will now leave the server, and using a VPN for obvious reasons. |
| 04:35:42 | | vpn-user leaves |
| 05:03:45 | | vpnuserisback joins |
| 05:04:07 | <vpnuserisback> | I forgot to mention to use discord URLs scraper also, there is images of self harm in the server |
| 05:04:20 | <vpnuserisback> | And suspicious emojis too |
| 05:04:50 | <vpnuserisback> | Please archive it ASAP, they might purge any minute now |
| 05:04:52 | | vpnuserisback leaves |
| 05:38:14 | | asghari joins |
| 05:40:53 | | asghari quits [Remote host closed the connection] |
| 06:08:16 | | Atom-- joins |
| 06:11:49 | | Atom quits [Ping timeout: 250 seconds] |
| 06:21:26 | | drin joins |
| 06:21:47 | | geezabiscuit quits [Ping timeout: 250 seconds] |
| 06:22:15 | | drin is now known as geezabiscuit |
| 06:49:41 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 06:51:53 | | HackMii_ (hacktheplanet) joins |
| 07:01:47 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 07:06:50 | | Island quits [Read error: Connection reset by peer] |
| 08:00:21 | <pokechu22> | OK, all blogs with 6 or more posts have been saved via AB (and I also did the front page of all blogs as a separate job). Per https://tweakblogs.net/?allWeblogs=1&sort=nrpost&order=DESC that still leaves ~2/3rds of them to be done, and it's at 2-3 minutes per blog already and will only speed up, so queuing really needs to be automated before doing more. But, the biggest stuff |
| 08:00:23 | <pokechu22> | is done, so that's good |
| 08:27:43 | | hitgrr8 joins |
| 08:28:51 | | HackMii_ quits [Ping timeout: 245 seconds] |
| 08:30:54 | | HackMii_ (hacktheplanet) joins |
| 09:22:24 | | michaelblob quits [Read error: Connection reset by peer] |
| 09:26:21 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
| 09:27:43 | | michaelblob (michaelblob) joins |
| 09:33:15 | | jacksonchen666 (jacksonchen666) joins |
| 09:53:57 | | Megame (Megame) joins |
| 11:17:20 | | AnotherIki joins |
| 11:21:02 | | Iki1 quits [Ping timeout: 264 seconds] |
| 11:28:57 | | qwertyasdfuiopghjkl joins |
| 11:37:11 | | sec^nd quits [Ping timeout: 245 seconds] |
| 11:38:14 | | sec^nd (second) joins |
| 12:46:22 | <@JAA> | arkiver: How come the Tweakblogs data from #Y isn't in the WBM yet? Just too small to trigger a megawarc or something else? |
| 12:56:08 | | sec^nd quits [Remote host closed the connection] |
| 12:56:50 | | sec^nd (second) joins |
| 13:11:37 | | Stiletto joins |
| 13:45:46 | | alcapotz joins |
| 13:46:02 | <alcapotz> | hi |
| 13:46:17 | <alcapotz> | @search government contracts |
| 13:46:26 | | alcapotz quits [Remote host closed the connection] |
| 13:47:22 | | Megame quits [Client Quit] |
| 13:58:09 | | LeGoupil joins |
| 14:23:53 | | chrismeller (chrismeller) joins |
| 14:48:36 | | pie_ quits [] |
| 14:51:56 | | pie_ joins |
| 14:52:16 | | pie_ quits [Client Quit] |
| 15:06:14 | | pie_ joins |
| 16:25:35 | | VonGuard joins |
| 16:26:57 | | Island joins |
| 16:40:54 | | fl0w_ joins |
| 16:44:03 | | fl0w quits [Ping timeout: 250 seconds] |
| 16:55:32 | | fl0w joins |
| 16:57:29 | | fl0w_ quits [Ping timeout: 250 seconds] |
| 16:58:03 | | fl0w_ joins |
| 17:00:30 | | fl0w quits [Ping timeout: 265 seconds] |
| 17:06:51 | | fl0w joins |
| 17:09:41 | | fl0w_ quits [Ping timeout: 265 seconds] |
| 17:15:15 | | thuban joins |
| 17:18:10 | | fl0w_ joins |
| 17:22:14 | | fl0w quits [Ping timeout: 264 seconds] |
| 17:25:04 | | fl0w joins |
| 17:27:38 | | fl0w_ quits [Ping timeout: 264 seconds] |
| 17:30:59 | | fl0w_ joins |
| 17:34:14 | | fl0w quits [Ping timeout: 264 seconds] |
| 18:14:56 | | chrismeller quits [Read error: Connection reset by peer] |
| 18:16:12 | | LeGoupil quits [Client Quit] |
| 18:32:10 | | fl0w joins |
| 18:34:45 | | fl0w_ quits [Ping timeout: 265 seconds] |
| 18:36:00 | <h2ibot> | Usernam edited List of websites excluded from the Wayback Machine (+25): https://wiki.archiveteam.org/?diff=49350&oldid=49331 |
| 18:57:55 | | sonick quits [Client Quit] |
| 19:19:34 | | Megame (Megame) joins |
| 19:59:17 | | Megame quits [Client Quit] |
| 20:03:41 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 20:22:55 | | Ketchup901 (Ketchup901) joins |
| 20:33:59 | | wyatt8740 joins |
| 20:34:32 | <fishingforsoup> | Fuddles, this isn't archived anywhere. |
| 20:34:33 | <fishingforsoup> | https://youtu.be/04urYK0yVZY |
| 20:36:21 | <fishingforsoup> | Or this. |
| 20:36:22 | <fishingforsoup> | https://youtu.be/besw5nKbidw |
| 20:37:08 | <fishingforsoup> | Or this. |
| 20:37:08 | <fishingforsoup> | https://youtu.be/4YIEIuKhuAQ |
| 20:37:46 | <fishingforsoup> | https://youtu.be/0u6koEzcRJo |
| 20:37:50 | <fishingforsoup> | You get the idea. |
| 20:41:26 | | wyatt8740 quits [Ping timeout: 264 seconds] |
| 20:42:44 | | wyatt8740 joins |
| 20:47:09 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 20:48:25 | | wyatt8740 joins |
| 22:26:59 | | Ketchup901 quits [Remote host closed the connection] |
| 22:27:15 | | Ketchup901 (Ketchup901) joins |
| 22:37:22 | | BlueMaxima joins |
| 23:06:48 | | hitgrr8 quits [Client Quit] |
| 23:31:59 | | Megame (Megame) joins |
| 23:33:00 | | AlsoTheTechRobo (TheTechRobo) joins |
| 23:36:09 | | TheTechRobo quits [Ping timeout: 250 seconds] |
| 23:46:12 | | fl0w_ joins |
| 23:50:22 | | fl0w quits [Ping timeout: 265 seconds] |