| 00:04:57 | | britmob quits [Remote host closed the connection] |
| 00:05:24 | | britmob joins |
| 00:05:26 | | Stiletto quits [Read error: Connection reset by peer] |
| 00:05:48 | | Stiletto joins |
| 00:06:31 | | Ryz quits [Client Quit] |
| 00:06:49 | | Ryz (Ryz) joins |
| 00:41:58 | | webdownload quits [Ping timeout: 244 seconds] |
| 00:45:38 | | mary quits [Ping timeout: 258 seconds] |
| 01:03:39 | | dm4v quits [Ping timeout: 258 seconds] |
| 01:04:59 | | dm4v joins |
| 01:05:01 | | dm4v is now authenticated as dm4v |
| 01:05:01 | | dm4v quits [Changing host] |
| 01:05:01 | | dm4v (dm4v) joins |
| 01:07:08 | | mary joins |
| 01:15:32 | | Iki1 quits [Ping timeout: 258 seconds] |
| 01:34:02 | | Iki1 joins |
| 01:46:34 | | Lord_Nightmare quits [Ping timeout: 250 seconds] |
| 01:49:50 | | Lord_Nightmare (Lord_Nightmare) joins |
| 01:50:30 | | webdownload joins |
| 02:24:25 | | lennier1 quits [Client Quit] |
| 02:30:09 | | lennier1 (lennier1) joins |
| 03:09:27 | | DogsRNice quits [Read error: Connection reset by peer] |
| 03:30:10 | | qw3rty__ joins |
| 03:33:55 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 03:38:31 | | Iki1 quits [Ping timeout: 258 seconds] |
| 04:18:42 | | etnguyen03 quits [Client Quit] |
| 05:34:20 | | BlueMaxima__ quits [Read error: Connection reset by peer] |
| 05:34:36 | | BlueMaxima__ joins |
| 05:53:21 | | driib (driib) joins |
| 06:29:41 | | Mateon1 quits [Remote host closed the connection] |
| 06:29:51 | <@JAA> | Wayward: There are two Bethesda forums. One that shut down early last year, which is what you pasted, and one that will be shut down soon. |
| 06:29:56 | | Mateon1 joins |
| 06:31:40 | <Wayward> | ah |
| 06:32:36 | <Wayward> | Why is one of the most valuable gaming companies shuttering? |
| 06:33:05 | <@JAA> | They aren't. |
| 06:33:33 | <Wayward> | Just shutting down community forums? |
| 06:33:38 | <@JAA> | Yeah |
| 06:34:00 | <Wayward> | Can't keep up with European anti-speech laws and regulations? |
| 06:34:01 | <@JAA> | They're reducing costs by moving all community interaction into a third-party walled garden instead of continuing to host their own. |
| 06:35:16 | <@JAA> | You can move that stuff to -ot or, better yet, a non-AT channel. |
| 06:36:23 | | ThreeHeadedMonkey quits [Ping timeout: 258 seconds] |
| 06:39:55 | | ThreeHeadedMonkey (ThreeHeadedMonkey) joins |
| 06:45:12 | | driib quits [Ping timeout: 244 seconds] |
| 06:45:14 | | Aerochrome joins |
| 06:48:54 | | driib (driib) joins |
| 06:51:32 | <@JAA> | Aerochrome: So there's an ArchiveBot job running for it, and I intend to grab the JSON API data once it's read-only. Whether it will work in the Wayback Machine remains to be seen. It's a nightmare of JavaScript, unfortunately. |
| 06:51:54 | <Aerochrome> | Ah |
| 06:52:50 | <Aerochrome> | Well I have my warriors running on other things, spinning away. |
| 06:53:46 | <@JAA> | Yeah, we'll see whether we need any distribution there. The forums are pretty small. My API grab would probably just take a couple hours from a single machine without even trying to go fast. |
| 06:54:12 | <@JAA> | The big issue is getting it to play back in the WBM though. |
| 06:54:36 | <Aerochrome> | I'll admit that I have no idea what complications that entails. |
| 06:55:31 | <@JAA> | Basically, we need to simulate all the JavaScript interactions, i.e. reproduce the HTTP requests that it triggers, etc. |
| 06:55:47 | <@JAA> | There's also some WebSocket stuff though, which is near-impossible to properly capture and play back. |
| 07:01:03 | <Aerochrome> | That first sentence gives me flashbacks to CS100 |
| 07:18:12 | | duce1337 (duce1337) joins |
| 08:56:43 | | Arcorann (Arcorann) joins |
| 08:57:13 | | Arcorann quits [Remote host closed the connection] |
| 08:57:32 | | Arcorann (Arcorann) joins |
| 09:15:30 | | Ctrl-S quits [Read error: Connection reset by peer] |
| 09:17:00 | | deni quits [Read error: Connection reset by peer] |
| 09:20:41 | | justcool393 quits [Read error: Connection reset by peer] |
| 09:23:35 | | @hook54321 quits [Read error: Connection reset by peer] |
| 09:32:05 | | Aerochrome quits [Read error: Connection reset by peer] |
| 09:32:40 | | Arcorann_ joins |
| 09:33:45 | | deni (deni) joins |
| 09:35:55 | | Aerochrome joins |
| 09:36:33 | | Arcorann quits [Ping timeout: 258 seconds] |
| 09:36:48 | | Ctrl-S joins |
| 09:37:19 | | justcool393 (justcool393) joins |
| 09:37:44 | | RJHacker15382 joins |
| 10:13:06 | | Aerochrome quits [Client Quit] |
| 10:18:15 | | duce1337_ (duce1337) joins |
| 10:18:15 | | duce1337 quits [Read error: Connection reset by peer] |
| 10:41:37 | | Iki joins |
| 11:16:40 | | BlueMaxima__ quits [Read error: Connection reset by peer] |
| 12:17:13 | | mutantmonkey quits [Remote host closed the connection] |
| 12:17:31 | | mutantmonkey (mutantmonkey) joins |
| 12:45:21 | | katocala quits [Remote host closed the connection] |
| 12:49:03 | <thuban> | can i !ao in archivebot without voice or is that just for !ao < ? i want to keep an eye on the freenode implosion |
| 12:58:37 | <thuban> | (alternatively, may i be granted chanserv-voice in there?) |
| 13:02:14 | | Arcorann__ joins |
| 13:05:51 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 13:07:17 | <betamax> | thuban: I think(?) you can !io directly without voice |
| 13:07:26 | <betamax> | try it and see! |
| 13:08:27 | <betamax> | also, I just asked kline if anything should be archived, and I've done www.kline.sh and the freenodestaff twitter account, I'm not sure if anything else needs saving |
| 13:09:09 | <thuban> | i've got urls for a bunch of resignation letters & published logs of conversations with andrew lee |
| 13:10:00 | <@OrIdow6> | AFAIK !ao is still disabled for those w/o perms |
| 13:10:10 | <thuban> | can ab do twitter (individual tweets) or does that need to go through socialbot? |
| 13:10:30 | <@OrIdow6> | Could do logs also |
| 13:12:04 | <betamax> | thuban: AB can do individual tweets, all socialbot does is call snscrape (https://github.com/JustAnotherArchivist/snscrape/) to get a list of tweets then feed it into AB with '!ao <' |
| 13:12:20 | <thuban> | gotcha, thanks |
| 13:12:34 | <betamax> | OrIdow6: stupid question, but where are the logs (not a freenode user) |
| 13:14:03 | <russss> | for a single URL, why not just use web.archive.org/save? |
| 13:14:20 | | Viniter7 (Viniter) joins |
| 13:14:29 | <russss> | I've been doing that on most of these anyway |
| 13:14:52 | <thuban> | lots of urls (i was thinking of !ao so i could do them as i found new ones), plus spn is super flaky compared to archivebot |
| 13:15:08 | <russss> | fair enough |
| 13:15:28 | <russss> | I've used SPN on most of these anyway. Reflexive behaviour when I see drama. |
| 13:15:47 | <betamax> | russss: Convenience (no need to go to a web browser) and, as thuban says, it's super flaky (and has recently introduced a limit of a few hundred per day, I believe) |
| 13:15:48 | <thuban> | haha |
| 13:16:27 | <thuban> | yeah, i saw some of them already archived when i went looking for 404d draft versions. glad to see i'm not alone |
| 13:17:44 | | Viniter quits [Ping timeout: 258 seconds] |
| 13:17:44 | | Viniter7 is now known as Viniter |
| 13:21:44 | | katocala joins |
| 13:24:47 | | Vukky quits [Read error: Connection reset by peer] |
| 13:25:36 | | katocala is now authenticated as katocala |
| 13:35:19 | <thuban> | uhh, apparently unvoiced users currently can't !ao OR !ao <. i've definitely been able to do the latter in the past. |
| 13:35:31 | <@jrwr> | has anyone seen the shit going down over at freenode? |
| 13:35:41 | <@jrwr> | https://blog.bofh.it/debian/id_461 |
| 13:35:54 | <thuban> | yeah, we were just talking about it |
| 13:36:38 | <thuban> | i've prepped https://transfer.archivete.am/ruZPi/freenode.txt (resignation letters) and https://transfer.archivete.am/bBNUs/freenode_outlinks.txt (logs and other links) for archivebot, but someone with voice needs to submit |
| 13:40:45 | <thuban> | betamax, would you mind? ^ |
| 13:43:14 | <betamax> | sure |
| 13:43:45 | <thuban> | thanks! :) |
| 13:46:09 | | benjinsmith joins |
| 13:49:10 | | benjins quits [Ping timeout: 258 seconds] |
| 13:58:01 | | benjinsmith is now known as benjins |
| 13:58:03 | | benjins is now authenticated as benjins |
| 14:15:37 | <betamax> | thuban: if you come across any other links feel free to mention me here and I'll add them to AB |
| 14:19:29 | <@EggplantN> | betamax / thuban |
| 14:19:36 | <@EggplantN> | outlinks are best sent to #// |
| 14:19:53 | <@EggplantN> | please just ping me/ HCross / Kaz (?) for that |
| 14:20:18 | <@Kaz> | nod |
| 14:22:09 | <betamax> | I'm being stupid, but what do you mean by "outlinks are best sent to #//"? |
| 14:22:20 | <@EggplantN> | that is our dedicated outlinks project! |
| 14:22:29 | <@EggplantN> | we use it for Reddit/other projects when they're ongoing |
| 14:22:37 | <@EggplantN> | it's tons faster than AB :P |
| 14:23:37 | <betamax> | Oh, I see, it's an IRC channel. I didn't realise that would be a valid channel name :) |
| 14:24:16 | <thuban> | should archivebot then be used only for recursive crawls? |
| 14:25:23 | <@EggplantN> | Ideally? I think so. #// our urls project is like ultra fast, each URL is tried up to 5? times and our queueing scripts ensure its not NXDOMAIN or an invalid domain anyway to help weed out the bad URLs |
| 14:26:00 | <betamax> | That's what I'd be interested to know, too. I've been slowly working my way through 20 million tweets (from the UK 2021 elections), sent into AB in chunks of 1M URLs at a time. |
| 14:26:17 | <@EggplantN> | Ah so we *only* get the URL you send. |
| 14:27:29 | <betamax> | Ah, I see. That makes a difference. Mind if I update the wiki page to note this? |
| 14:27:59 | <@EggplantN> | obviously that list by thuban is quite small, but if it's much larger (like Doranwen, who is scraping the yahoo groups data for outlinks, or rewby, who scrapes any IA collection needed), we have taken millions and we can process them at like 30k/min+ |
| 14:28:05 | <@EggplantN> | More if the tracker can handle it |
| 14:28:33 | <betamax> | I assume, like with AB, that facebook is blocked? |
| 14:33:15 | | Arcorann__ quits [Ping timeout: 258 seconds] |
| 14:35:35 | <@EggplantN> | and IG |
| 14:37:27 | <betamax> | I wonder how far you could get with using cookies from a genuine FB / instagram account... |
| 14:37:30 | | betamax has an account they don't care about, and could try using wget with warc output |
| 14:38:02 | <@EggplantN> | i'd have a looooooooong old chat with JAA |
| 14:38:02 | <@EggplantN> | :P |
| 14:41:39 | <betamax> | JAA: ^^ any thoughts? (I don't particularly care about my FB account, provided it doesn't get the underlying gmail banned) |
| 14:43:05 | <@jrwr> | You should see the chaos that is the libre.chat irc network. poor nerds are getting swarmed |
| 14:46:53 | | Daloader joins |
| 14:49:38 | | etnguyen03 (etnguyen03) joins |
| 14:49:38 | | dm4v quits [Read error: Connection reset by peer] |
| 14:50:23 | | dm4v joins |
| 14:50:25 | | dm4v is now authenticated as dm4v |
| 14:50:25 | | dm4v quits [Changing host] |
| 14:50:25 | | dm4v (dm4v) joins |
| 15:09:40 | | duce1337_ quits [Read error: Connection reset by peer] |
| 15:09:40 | | duce1337 (duce1337) joins |
| 15:38:25 | | Inhonion joins |
| 15:44:35 | | Inhonion quits [Ping timeout: 244 seconds] |
| 15:53:10 | <betamax> | JAA: am I right in thinking that all of the UK election sites have now been scheduled in AB, and it's only the lib dem ones remaining (and those are basically dead due to blocking)? |
| 15:55:02 | <betamax> | Hmm, nvm I'm wrong, there's also job 2zy0vv1vxowss7sqav4my01us for the christian party website (seems to be blocked too), 2sta61ldqwnx2fb7ncyp1w7nh for the socialist party, the conservatives site, etc.... |
| 15:57:47 | | Inhonion joins |
| 16:15:13 | <@JAA> | betamax: I have no experience with the rate limits of Facebook and Instagram when logged in. Archiving while logged in is somewhat problematic though, and generally those wouldn't go into the WBM. Also, snscrape doesn't support authentication. |
| 16:15:52 | <@JAA> | Yes, all UK election sites have been submitted to AB, but there are about 100 in total which had or have problems, including most of the LibDem ones. |
| 16:17:54 | | aleph quits [Quit: WeeChat info:version] |
| 16:18:10 | | aleph joins |
| 16:18:29 | <betamax> | What's the plan for those? I assume they can't be kept in limbo forever - is there a way to "abort" whilst keeping anything that was saved? |
| 16:22:45 | | spirit joins |
| 16:26:47 | | lennier1 quits [Client Quit] |
| 16:27:25 | | lennier1 (lennier1) joins |
| 16:29:09 | | RJHacker15382 is now known as hook54321 |
| 16:29:38 | | hook54321 is now known as RJHacker47119 |
| 16:33:37 | | Iki quits [Ping timeout: 258 seconds] |
| 16:34:01 | | RJHacker47119 is now known as hook54321 |
| 16:34:01 | | hook54321 is now authenticated as hook54321 |
| 16:34:01 | | hook54321 quits [Changing host] |
| 16:34:02 | | hook54321 (hook54321) joins |
| 16:34:02 | | @ChanServ sets mode: +o hook54321 |
| 16:36:38 | | @hook54321 quits [Client Quit] |
| 16:37:49 | | hook54321 (hook54321) joins |
| 16:37:49 | | @ChanServ sets mode: +o hook54321 |
| 16:57:48 | | LeGoupil joins |
| 17:31:14 | | Wayward quits [Ping timeout: 250 seconds] |
| 17:40:31 | <@JAA> | betamax: I'll probably run them into the ground and then figure out how to rerun them. |
| 17:47:53 | | Iki joins |
| 17:51:55 | <AK> | EggplantN "our queueing scripts ensure its not NXDOMAIN or an invalid domain anyway to help weed out the bad URLs" Is this new then? Or is that only for manual queues. I still see a lot of nxdomains that may be from reddit |
| 18:02:36 | <@EggplantN> | Manual queues only AK |
| 18:02:59 | <AK> | Alright |
| 18:15:11 | | duce1337_ (duce1337) joins |
| 18:15:11 | | duce1337 quits [Read error: Connection reset by peer] |
| 18:23:47 | | DogsRNice (Webuser299) joins |
| 18:48:40 | | HackMii_ quits [Remote host closed the connection] |
| 18:49:03 | | HackMii_ (hacktheplanet) joins |
| 18:59:26 | <Iki> | Ohhhh. I see in chat above that SPN instituted a daily page limit. Explains things recently, maybe--I wonder if that includes outlinks |
| 19:03:55 | <@arkiver> | luckily we have archivebot! |
| 19:04:06 | <@arkiver> | and #// for large lists of URLs (no page requisites) |
| 19:10:23 | <Iki> | I recently got access to IA's google sheets tool as well. I wonder if they include the page limit on that as well--we'll see! |
| 19:10:31 | <thuban> | whoa whoa whoa, i thought #// did assets! EggplantN is this true? ^ |
| 19:10:54 | <@EggplantN> | It does not do assets |
| 19:11:37 | <thuban> | ahhh. yeah, that's another reason to prefer ab for moderately-sized lists... |
| 19:12:37 | <thuban> | just out of curiosity, was that an a priori design decision or more of a technical thing? |
| 19:15:31 | <@JAA> | It was a conscious decision due to size, I believe. Scripts and stylesheets compress well, but images and videos would blow the data volume up by 1-2 orders of magnitude. |
| 19:16:32 | | Daloader quits [Ping timeout: 250 seconds] |
| 19:44:22 | | pew joins |
| 19:45:51 | | Gaelan quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 19:46:23 | | Gaelan (Gaelan) joins |
| 19:50:07 | <AK> | I believe arkiver has done some testing occasionally on adding assets, but that got reverted when size ballooned |
| 20:03:25 | <@arkiver> | yeah we'd be looking at many TBs/day |
| 20:10:34 | | hexa- quits [Quit: WeeChat 2.9] |
| 20:12:22 | | hexa- (hexa-) joins |
| 20:13:18 | | spirit quits [Client Quit] |
| 20:25:09 | | Gereon quits [Ping timeout: 258 seconds] |
| 20:29:41 | | Wayward (wayward) joins |
| 21:00:11 | | LeGoupil quits [Client Quit] |
| 21:01:00 | | Aerochrome joins |
| 21:12:52 | | superkuh joins |
| 21:47:01 | | Mikesky (Mikesky) joins |
| 22:18:58 | | nerdguy1138 quits [Ping timeout: 250 seconds] |
| 22:21:39 | | duce1337_ quits [Read error: Connection reset by peer] |
| 22:21:39 | | duce1337 (duce1337) joins |
| 22:23:33 | | Wayward- (wayward) joins |
| 22:25:31 | | Wayward quits [Ping timeout: 258 seconds] |
| 22:29:51 | | duce1337 quits [Client Quit] |
| 22:34:01 | | nerdguy1138 (nerdguy1138) joins |
| 22:58:48 | | Mikesky quits [Client Quit] |
| 23:12:40 | | pew quits [Ping timeout: 258 seconds] |
| 23:24:37 | | pew joins |
| 23:26:28 | | Wayward- quits [Ping timeout: 258 seconds] |
| 23:28:02 | | driib quits [Ping timeout: 244 seconds] |
| 23:29:59 | <@EggplantN> | so any idea if we're gonna move to freenode /s |
| 23:45:20 | <yano> | :p |
| 23:47:40 | | murmur quits [Quit: leaving] |
| 23:51:16 | | murmur joins |
| 23:53:49 | | murmur quits [Client Quit] |
| 23:55:57 | <SCSi> | looool |
| 23:56:00 | <SCSi> | EggplantN with the lulz |
| 23:57:40 | | murmur joins |