00:04:57britmob quits [Remote host closed the connection]
00:05:24britmob joins
00:05:26Stiletto quits [Read error: Connection reset by peer]
00:05:48Stiletto joins
00:06:31Ryz quits [Client Quit]
00:06:49Ryz (Ryz) joins
00:41:58webdownload quits [Ping timeout: 244 seconds]
00:45:38mary quits [Ping timeout: 258 seconds]
01:03:39dm4v quits [Ping timeout: 258 seconds]
01:04:59dm4v joins
01:05:01dm4v quits [Changing host]
01:05:01dm4v (dm4v) joins
01:07:08mary joins
01:15:32Iki1 quits [Ping timeout: 258 seconds]
01:34:02Iki1 joins
01:46:34Lord_Nightmare quits [Ping timeout: 250 seconds]
01:49:50Lord_Nightmare (Lord_Nightmare) joins
01:50:30webdownload joins
02:24:25lennier1 quits [Client Quit]
02:30:09lennier1 (lennier1) joins
03:09:27DogsRNice quits [Read error: Connection reset by peer]
03:30:10qw3rty__ joins
03:33:55qw3rty_ quits [Ping timeout: 258 seconds]
03:38:31Iki1 quits [Ping timeout: 258 seconds]
04:18:42etnguyen03 quits [Client Quit]
05:34:20BlueMaxima__ quits [Read error: Connection reset by peer]
05:34:36BlueMaxima__ joins
05:53:21driib (driib) joins
06:29:41Mateon1 quits [Remote host closed the connection]
06:29:51<@JAA>Wayward: There are two Bethesda forums. One that shut down early last year, which is what you pasted, and one that will be shut down soon.
06:29:56Mateon1 joins
06:31:40<Wayward>ah
06:32:36<Wayward>Why is one of the most valuable gaming companies shuttering?
06:33:05<@JAA>They aren't.
06:33:33<Wayward>Just shutting down community forums?
06:33:38<@JAA>Yeah
06:34:00<Wayward>Can't keep up with European anti-speech laws and regulations?
06:34:01<@JAA>They're reducing costs by moving all community interaction into a third-party walled garden instead of continuing to host their own.
06:35:16<@JAA>You can move that stuff to -ot or, better yet, a non-AT channel.
06:36:23ThreeHeadedMonkey quits [Ping timeout: 258 seconds]
06:39:55ThreeHeadedMonkey (ThreeHeadedMonkey) joins
06:45:12driib quits [Ping timeout: 244 seconds]
06:45:14Aerochrome joins
06:48:54driib (driib) joins
06:51:32<@JAA>Aerochrome: So there's an ArchiveBot job running for it, and I intend to grab the JSON API data once it's read-only. Whether it will work in the Wayback Machine remains to be seen. It's a nightmare of JavaScript, unfortunately.
06:51:54<Aerochrome>Ah
06:52:50<Aerochrome>Well I have my warriors running on other things, spinning away.
06:53:46<@JAA>Yeah, we'll see whether we need any distribution there. The forums are pretty small. My API grab would probably just take a couple hours from a single machine without even trying to go fast.
06:54:12<@JAA>The big issue is getting it to play back in the WBM though.
06:54:36<Aerochrome>I'll admit that I have no idea what complications that entails.
06:55:31<@JAA>Basically, we need to simulate all the JavaScript interactions, i.e. reproduce the HTTP requests they trigger etc.
06:55:47<@JAA>There's also some WebSocket stuff though, which is near-impossible to properly capture and play back.
07:01:03<Aerochrome>That first sentence gives me flashbacks to CS100
07:18:12duce1337 (duce1337) joins
08:56:43Arcorann (Arcorann) joins
08:57:13Arcorann quits [Remote host closed the connection]
08:57:32Arcorann (Arcorann) joins
09:15:30Ctrl-S quits [Read error: Connection reset by peer]
09:17:00deni quits [Read error: Connection reset by peer]
09:20:41justcool393 quits [Read error: Connection reset by peer]
09:23:35@hook54321 quits [Read error: Connection reset by peer]
09:32:05Aerochrome quits [Read error: Connection reset by peer]
09:32:40Arcorann_ joins
09:33:45deni (deni) joins
09:35:55Aerochrome joins
09:36:33Arcorann quits [Ping timeout: 258 seconds]
09:36:48Ctrl-S joins
09:37:19justcool393 (justcool393) joins
09:37:44RJHacker15382 joins
10:13:06Aerochrome quits [Client Quit]
10:18:15duce1337_ (duce1337) joins
10:18:15duce1337 quits [Read error: Connection reset by peer]
10:41:37Iki joins
11:16:40BlueMaxima__ quits [Read error: Connection reset by peer]
12:17:13mutantmonkey quits [Remote host closed the connection]
12:17:31mutantmonkey (mutantmonkey) joins
12:45:21katocala quits [Remote host closed the connection]
12:49:03<thuban>can i !ao in archivebot without voice or is that just for !ao < ? i want to keep an eye on the freenode implosion
12:58:37<thuban>(alternatively, may i be granted chanserv-voice in there?)
13:02:14Arcorann__ joins
13:05:51Arcorann_ quits [Ping timeout: 258 seconds]
13:07:17<betamax>thuban: I think(?) you can !ao directly without voice
13:07:26<betamax>try it and see!
13:08:27<betamax>also, I just asked kline if anything should be archived, and I've done www.kline.sh and the freenodestaff twitter account, I'm not sure if anything else needs saving
13:09:09<thuban>i've got urls for a bunch of resignation letters & published logs of conversations with andrew lee
13:10:00<@OrIdow6>AFAIK !ao is still disabled for those w/o perms
13:10:10<thuban>can ab do twitter (individual tweets) or does that need to go through socialbot?
13:10:30<@OrIdow6>Could do logs also
13:12:04<betamax>thuban: AB can do individual tweets, all socialbot does is call snscrape (https://github.com/JustAnotherArchivist/snscrape/) to get a list of tweets then feed it into AB with '!ao <'
13:12:20<thuban>gotcha, thanks
13:12:34<betamax>OrIdow6: stupid question, but where are the logs (not a freenode user)
13:14:03<russss>for a single URL, why not just use web.archive.org/save?
13:14:20Viniter7 (Viniter) joins
13:14:29<russss>I've been doing that on most of these anyway
13:14:52<thuban>lots of urls (i was thinking of !ao so i could do them as i found new ones), plus spn is super flaky compared to archivebot
13:15:08<russss>fair enough
13:15:28<russss>I've used SPN on most of these anyway. Reflexive behaviour when I see drama.
13:15:47<betamax>russss: Convenience (no need to go to web browser) and as thuban says it's super flakey (and has recently introduced a limit of a few hundred per day I believe)
13:15:48<thuban>haha
13:16:27<thuban>yeah, i saw some of them already archived when i went looking for 404d draft versions. glad to see i'm not alone
13:17:44Viniter quits [Ping timeout: 258 seconds]
13:17:44Viniter7 is now known as Viniter
13:21:44katocala joins
13:24:47Vukky quits [Read error: Connection reset by peer]
13:35:19<thuban>uhh, apparently unvoiced users currently can't !ao OR !ao <. i've definitely been able to do the latter in the past.
13:35:31<@jrwr>has anyone seen the shit going down over at freenode?
13:35:41<@jrwr>https://blog.bofh.it/debian/id_461
13:35:54<thuban>yeah, we were just talking about it
13:36:38<thuban>i've prepped https://transfer.archivete.am/ruZPi/freenode.txt (resignation letters) and https://transfer.archivete.am/bBNUs/freenode_outlinks.txt (logs and other links) for archivebot, but someone with voice needs to submit
13:40:45<thuban>betamax, would you mind? ^
13:43:14<betamax>sure
13:43:45<thuban>thanks! :)
13:46:09benjinsmith joins
13:49:10benjins quits [Ping timeout: 258 seconds]
13:58:01benjinsmith is now known as benjins
14:15:37<betamax>thuban: if you come across any other links feel free to mention me here and I'll add them to AB
14:19:29<@EggplantN>betamax / thuban
14:19:36<@EggplantN>outlinks are best sent to #//
14:19:53<@EggplantN>please just ping me/ HCross / Kaz (?) for that
14:20:18<@Kaz>nod
14:22:09<betamax>I'm being stupid, but what do you mean by "outlinks are best sent to #//"?
14:22:20<@EggplantN>that is our dedicated outlinks project!
14:22:29<@EggplantN>we use it for Reddit/other projects when they're on going
14:22:37<@EggplantN>its tons faster than AB :P
14:23:37<betamax>Oh, I see, it's an IRC channel. I didn't realise that would be a valid channel name :)
14:24:16<thuban>should archivebot then be used only for recursive crawls?
14:25:23<@EggplantN>Ideally? I think so. #// our urls project is like ultra fast, each URL is tried up to 5? times and our queueing scripts ensure its not NXDOMAIN or an invalid domain anyway to help weed out the bad URLs
14:26:00<betamax>That's what I'd be interested to know, too. I'm slowly working my way through 20 million tweets (from the UK 2021 elections), sent into AB in chunks of 1M URLs at a time.
14:26:17<@EggplantN>Ah so we *only* get the URL you send.
14:27:29<betamax>Ah, I see. That makes a difference. Mind if I update the wiki page to note this?
14:27:59<@EggplantN>obviously that list by thuban is quite small, but if it's much larger (like Doranwen, who is scraping the yahoo groups data for outlinks, or rewby, who scrapes any IA collection needed), we have taken millions and we can process them at like 30k/min+
14:28:05<@EggplantN>More if the tracker can handle it
14:28:33<betamax>I assume, like with AB, that facebook is blocked?
14:33:15Arcorann__ quits [Ping timeout: 258 seconds]
14:35:35<@EggplantN>and IG
14:37:27<betamax>I wonder how far you could get with using cookies from a genuine FB / instagram account...
14:37:30betamax has an account they don't care about, and could try using wget with warc output
14:38:02<@EggplantN>i'd have a looooooooong old chat with JAA
14:38:02<@EggplantN>:P
14:41:39<betamax>JAA: ^^ any thoughts? (I don't particularly care about my FB account, provided it doesn't get the underlying gmail banned)
14:43:05<@jrwr>You should see the chaos that is the libera.chat irc network. poor nerds are getting swarmed
14:46:53Daloader joins
14:49:38etnguyen03 (etnguyen03) joins
14:49:38dm4v quits [Read error: Connection reset by peer]
14:50:23dm4v joins
14:50:25dm4v quits [Changing host]
14:50:25dm4v (dm4v) joins
15:09:40duce1337_ quits [Read error: Connection reset by peer]
15:09:40duce1337 (duce1337) joins
15:38:25Inhonion joins
15:44:35Inhonion quits [Ping timeout: 244 seconds]
15:53:10<betamax>JAA: am I right in thinking that all of the UK election sites have now been scheduled in AB, and it's only the lib dem ones remaining (and those are basically dead due to blocking)?
15:55:02<betamax>Hmm, nvm I'm wrong, there's also job 2zy0vv1vxowss7sqav4my01us for the christian party website (seems to be blocked too), 2sta61ldqwnx2fb7ncyp1w7nh for the socialist party, the conservatives site, etc....
15:57:47Inhonion joins
16:15:13<@JAA>betamax: I have no experience with the rate limits of Facebook and Instagram when logged in. Archiving while logged in is somewhat problematic though, and generally those wouldn't go into the WBM. Also, snscrape doesn't support authentication.
16:15:52<@JAA>Yes, all UK election sites have been submitted to AB, but there are about 100 in total which had or have problems, including most of the LibDem ones.
16:17:54aleph quits [Quit: WeeChat info:version]
16:18:10aleph joins
16:18:29<betamax>What's the plan for those? I assume they can't be kept in limbo forever - is there a way to "abort" whilst keeping anything that was saved?
16:22:45spirit joins
16:26:47lennier1 quits [Client Quit]
16:27:25lennier1 (lennier1) joins
16:29:09RJHacker15382 is now known as hook54321
16:29:38hook54321 is now known as RJHacker47119
16:33:37Iki quits [Ping timeout: 258 seconds]
16:34:01RJHacker47119 is now known as hook54321
16:34:01hook54321 quits [Changing host]
16:34:02hook54321 (hook54321) joins
16:34:02@ChanServ sets mode: +o hook54321
16:36:38@hook54321 quits [Client Quit]
16:37:49hook54321 (hook54321) joins
16:37:49@ChanServ sets mode: +o hook54321
16:57:48LeGoupil joins
17:31:14Wayward quits [Ping timeout: 250 seconds]
17:40:31<@JAA>betamax: I'll probably run them into the ground and then figure out how to rerun them.
17:47:53Iki joins
17:51:55<AK>EggplantN "our queueing scripts ensure its not NXDOMAIN or an invalid domain anyway to help weed out the bad URLs" Is this new then? Or is that only for manual queues. I still see a lot of nxdomains that may be from reddit
18:02:36<@EggplantN>Manual queues only AK
18:02:59<AK>Alright
18:15:11duce1337_ (duce1337) joins
18:15:11duce1337 quits [Read error: Connection reset by peer]
18:23:47DogsRNice (Webuser299) joins
18:48:40HackMii_ quits [Remote host closed the connection]
18:49:03HackMii_ (hacktheplanet) joins
18:59:26<Iki>Ohhhh. I see in chat above that SPN instituted a daily page limit. Explains things recently, maybe--I wonder if that includes outlinks
19:03:55<@arkiver>luckily we have archivebot!
19:04:06<@arkiver>and #// for large lists of URLs (no page requisites)
19:10:23<Iki>I recently got access to IA's google sheets tool as well. I wonder if they include the page limit on that as well--we'll see!
19:10:31<thuban>whoa whoa whoa, i thought #// did assets! EggplantN is this true? ^
19:10:54<@EggplantN>It does not do assets
19:11:37<thuban>ahhh. yeah, that's another reason to prefer ab for moderately-sized lists...
19:12:37<thuban>just out of curiosity, was that an a priori design decision or more of a technical thing?
19:15:31<@JAA>It was a conscious decision due to size, I believe. Scripts and stylesheets compress well, but images and videos would blow the data volume up by 1-2 orders of magnitude.
19:16:32Daloader quits [Ping timeout: 250 seconds]
19:44:22pew joins
19:45:51Gaelan quits [Quit: ZNC 1.8.2 - https://znc.in]
19:46:23Gaelan (Gaelan) joins
19:50:07<AK>I believe arkiver has done some testing occasionally on adding assets, but that got reverted when size ballooned
20:03:25<@arkiver>yeah we'd be looking at many TBs/day
20:10:34hexa- quits [Quit: WeeChat 2.9]
20:12:22hexa- (hexa-) joins
20:13:18spirit quits [Client Quit]
20:25:09Gereon quits [Ping timeout: 258 seconds]
20:29:41Wayward (wayward) joins
21:00:11LeGoupil quits [Client Quit]
21:01:00Aerochrome joins
21:12:52superkuh joins
21:47:01Mikesky (Mikesky) joins
22:18:58nerdguy1138 quits [Ping timeout: 250 seconds]
22:21:39duce1337_ quits [Read error: Connection reset by peer]
22:21:39duce1337 (duce1337) joins
22:23:33Wayward- (wayward) joins
22:25:31Wayward quits [Ping timeout: 258 seconds]
22:29:51duce1337 quits [Client Quit]
22:34:01nerdguy1138 (nerdguy1138) joins
22:58:48Mikesky quits [Client Quit]
23:12:40pew quits [Ping timeout: 258 seconds]
23:24:37pew joins
23:26:28Wayward- quits [Ping timeout: 258 seconds]
23:28:02driib quits [Ping timeout: 244 seconds]
23:29:59<@EggplantN>so any idea if we're gonna move to freenode /s
23:45:20<yano>:p
23:47:40murmur quits [Quit: leaving]
23:51:16murmur joins
23:53:49murmur quits [Client Quit]
23:55:57<SCSi>looool
23:56:00<SCSi>EggplantN with the lulz
23:57:40murmur joins