| 00:59:47 | | BlueMaxima joins |
| 01:00:56 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 01:01:05 | | BlueMaxima joins |
| 01:05:08 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 01:08:45 | | us3rrr quits [Ping timeout: 258 seconds] |
| 01:12:20 | | etnguyen03 (etnguyen03) joins |
| 01:16:53 | | asdad joins |
| 01:32:19 | | Lord_Nightmare quits [Client Quit] |
| 01:35:55 | | Lord_Nightmare (Lord_Nightmare) joins |
| 01:41:34 | | asdad quits [Remote host closed the connection] |
| 01:55:17 | | rohvani joins |
| 02:01:39 | | yasomi quits [Ping timeout: 258 seconds] |
| 02:05:38 | | etnguyen03 quits [Ping timeout: 252 seconds] |
| 02:19:32 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 02:19:48 | | AmAnd0A joins |
| 02:22:09 | | etnguyen03 (etnguyen03) joins |
| 02:30:44 | | yasomi (yasomi) joins |
| 02:33:17 | <h2ibot> | Manu edited Discourse (+195, add various active discourse forums): https://wiki.archiveteam.org/?diff=50143&oldid=49768 |
| 02:34:13 | <fireonlive> | i love how it's very tech biased :D |
| 02:37:18 | <h2ibot> | FireonLive edited Discourse (+54, add NTP Pool Community): https://wiki.archiveteam.org/?diff=50144&oldid=50143 |
| 02:48:33 | <fireonlive> | I assume Discourse is in #msgbored as well yeah? |
| 02:50:11 | | dumbgoy quits [Ping timeout: 252 seconds] |
| 02:51:19 | <@JAA> | Yep |
| 02:51:33 | <fireonlive> | :) |
| 02:51:43 | <Video> | in retrospect it was probably not a good idea to name myself Video |
| 02:52:42 | <fireonlive> | lemme kinda just janky wanky that into there |
| 02:52:48 | <fireonlive> | Video: xD |
| 02:52:59 | <fireonlive> | i wonder if myself thinks similar |
| 02:56:04 | <fireonlive> | wiki getting rid of /index.php/ when? :D |
| 03:01:23 | <h2ibot> | FireonLive edited Discourse (+218, slap in a 'place to gather at' for now): https://wiki.archiveteam.org/?diff=50145&oldid=50144 |
| 03:15:27 | <h2ibot> | FireonLive edited Current Projects (+146, let's keep recently finished around a bit longer): https://wiki.archiveteam.org/?diff=50146&oldid=50119 |
| 03:18:27 | <h2ibot> | FireonLive uploaded File:Discourse logo.png (source:…): https://wiki.archiveteam.org/?title=File%3ADiscourse%20logo.png |
| 03:19:28 | <h2ibot> | FireonLive edited Discourse (+28, add logo :)): https://wiki.archiveteam.org/?diff=50148&oldid=50145 |
| 03:19:32 | <fireonlive> | ok no more wiki spam from me for now :P |
| 03:19:57 | <nulldata> | https://news.ycombinator.com/item?id=36660481 |
| 03:20:39 | <fireonlive> | ..... |
| 03:21:04 | <nulldata> | What an awful response to deleting customer data without ensuring they are aware outside of an Email |
| 03:21:39 | <fireonlive> | yes indeed |
| 03:22:01 | <fireonlive> | reminds me of Heroku, just being like well we emailed you, time to shred all the free tier stuff |
| 03:22:05 | <fireonlive> | iirc that's how it went down |
| 03:22:07 | <@JAA> | TIL 'scream test' |
| 03:22:11 | <fireonlive> | https://news.ycombinator.com/item?id=36660869 makes a good 'what could have been done' |
| 03:22:17 | <fireonlive> | ye it's a fun term :D |
| 03:23:58 | <fireonlive> | https://news.ycombinator.com/item?id=36660626 "After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss." |
| 03:25:20 | <fireonlive> | seems to be the only two comments he's made there |
| 03:50:13 | <nicolas17> | yikes |
| 03:55:13 | | nicolas17 quits [Ping timeout: 265 seconds] |
| 03:55:49 | <fireonlive> | oh that nicolas17 and his pings |
| 04:06:22 | <BPCZ> | Damn they really should have scream tested that instead of just pulling the plug. 1 month of no access would have saved anyone worth saving |
| 04:07:54 | | etnguyen03 quits [Client Quit] |
| 04:08:17 | <pabs> | tech234a: pass that domains thing to #// (the URLs project)? |
| 04:13:07 | | Meroje quits [Client Quit] |
| 04:13:19 | | Meroje joins |
| 04:13:19 | | Meroje is now authenticated as Meroje |
| 04:13:19 | | Meroje quits [Changing host] |
| 04:13:19 | | Meroje (Meroje) joins |
| 04:13:24 | | DLoader quits [Quit: DLoader] |
| 04:15:55 | | systwi quits [Read error: Connection reset by peer] |
| 04:15:55 | | Icyelut quits [Read error: Connection reset by peer] |
| 04:16:06 | | Icyelut (Icyelut) joins |
| 04:16:38 | | systwi (systwi) joins |
| 04:18:06 | | dxrt_ is now known as dxrt |
| 04:18:06 | | dxrt is now authenticated as dxrt |
| 04:18:06 | | dxrt quits [Changing host] |
| 04:18:06 | | dxrt (dxrt) joins |
| 04:18:06 | | @ChanServ sets mode: +o dxrt |
| 04:22:53 | | decky quits [Read error: Connection reset by peer] |
| 04:23:21 | | decky joins |
| 04:32:48 | <tech234a> | could be nice to get the homepages of a bunch of domains |
| 04:33:59 | | BigBrain_ quits [Remote host closed the connection] |
| 04:34:22 | | BigBrain_ (bigbrain) joins |
| 05:03:44 | | emberquill quits [Quit: The Lounge - https://thelounge.chat] |
| 05:04:06 | | emberquill (emberquill) joins |
| 05:05:25 | | emberquill quits [Client Quit] |
| 05:05:46 | | emberquill (emberquill) joins |
| 05:10:47 | <@JAA> | So regarding the EPA Archive, they've removed the shutdown date from the homepage at some point in March. |
| 05:11:45 | | Island quits [Read error: Connection reset by peer] |
| 05:17:43 | <fireonlive> | o_O |
| 05:27:40 | | hitgrr8 joins |
| 06:07:51 | | IDK (IDK) joins |
| 06:12:59 | | railen64 joins |
| 06:15:02 | | rubberduckie quits [Ping timeout: 258 seconds] |
| 06:41:56 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:47:47 | | bf_ joins |
| 07:00:08 | | nfriedly quits [Remote host closed the connection] |
| 07:19:28 | | DLoader joins |
| 07:31:04 | | rubberduckie joins |
| 07:54:17 | | Arcorann (Arcorann) joins |
| 07:57:23 | <h2ibot> | PaulWise edited Mailman2 (+24, lists.sucs.org done): https://wiki.archiveteam.org/?diff=50149&oldid=50133 |
| 08:08:03 | | railen64 quits [Remote host closed the connection] |
| 08:18:03 | | Dango360 quits [Read error: Connection reset by peer] |
| 08:37:46 | | yts98 leaves |
| 09:02:23 | | JensRex quits [Client Quit] |
| 09:02:56 | | JensRex (JensRex) joins |
| 09:30:31 | <flashfire42> | https://apnews.com/article/rutte-netherlands-migration-government-90b8716de33aabfad205874f69073b9c time to do some dutch stuff? |
| 09:34:29 | | nfriedly joins |
| 09:48:57 | | balrog quits [Max SendQ exceeded] |
| 09:49:42 | | balrog (balrog) joins |
| 09:49:42 | | imer quits [Client Quit] |
| 09:50:17 | | imer (imer) joins |
| 09:58:08 | | HotSwap quits [Ping timeout: 258 seconds] |
| 09:58:46 | | BigBrain_ quits [Ping timeout: 245 seconds] |
| 10:00:39 | | BigBrain_ (bigbrain) joins |
| 10:08:05 | | lflare quits [Client Quit] |
| 10:12:06 | | BigBrain_ quits [Ping timeout: 245 seconds] |
| 10:15:15 | | BigBrain_ (bigbrain) joins |
| 10:40:33 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:44:26 | | Icyelut|2 (Icyelut) joins |
| 10:45:46 | | Suika quits [Client Quit] |
| 10:45:46 | | iCaotix quits [Client Quit] |
| 10:45:48 | | Meroje quits [Client Quit] |
| 10:45:48 | | Icyelut quits [Write error: Connection reset by peer] |
| 10:45:54 | | iCaotix joins |
| 10:45:58 | | Meroje joins |
| 10:45:58 | | Meroje is now authenticated as Meroje |
| 10:45:58 | | Meroje quits [Changing host] |
| 10:45:58 | | Meroje (Meroje) joins |
| 10:46:22 | | Suika_ joins |
| 10:54:38 | | lflare (lflare) joins |
| 11:11:14 | | Letur quits [Ping timeout: 252 seconds] |
| 11:20:49 | | benjins2 joins |
| 11:38:46 | <nstrom|m> | btw if anyone has any spare IPs to throw at #wuciyuan, project has 2 days left before site shutdown and still lots to go |
| 11:39:10 | <nstrom|m> | with caveats that a) worker IPs will show up in saved webpages and b) site seems to block on anything higher than 1 concurrency |
| 11:39:36 | | DarkCoder15 leaves |
| 11:39:50 | <nstrom|m> | that said we're not even archiving images yet so it's barely any bandwidth usage right now |
| 11:46:32 | <Exorcism|m> | <nstrom|m> "with caveats that a) worker..." <- even with 1 concurrency it's difficult lol 😭 |
| 11:51:56 | | IDK quits [Client Quit] |
| 11:52:23 | | Letur joins |
| 12:14:50 | | rageear joins |
| 12:17:37 | <systwi> | JAA: My script that collects IRC URLs omits #archivebot , #// and #Y . I had considered omitting channels such as #imgone but I thought there was a possibility of the bot generated lists not being explicitly saved through AB. |
| 12:17:53 | <systwi> | Since I've seen you save things manually before. |
| 12:18:15 | <systwi> | I thought, in case you were to happen to miss something, my script would have that covered. |
| 12:19:05 | <@JAA> | systwi: Yeah, could happen, but duplicating most things is not a good approach there. I do try to cover everything. I sometimes process things in batches if there are too many to handle semi-manually. |
| 12:19:33 | <systwi> | If queueh2ibot does this automatically, at least WRT #imgone bot-generated lists, I can omit #imgone as well, or any other channels at your request. |
| 12:19:53 | <@JAA> | Well, 'everything'. I generally omit things for which we have specific projects (no point in trying to !ao a mediafire.com URL when it gets archived by #mediaonfire) or for which AB doesn't work (e.g. YouTube). |
| 12:19:59 | | rageear quits [Ping timeout: 252 seconds] |
| 12:20:23 | <@JAA> | I do this for all AT channels. |
| 12:21:25 | <systwi> | Okay, I think I'm following. So channels such as #mediaonfire and #imgone are likely safe to ignore. |
| 12:23:18 | <@JAA> | Perhaps I should automate it completely, but sometimes there are URLs that need manual treatment. I.e. where simple direct archival causes actual harm. |
| 12:25:13 | <@OrIdow6> | Believe I have found (what should have been a very obvious) way to enumerate Wysp |
| 12:26:24 | <systwi> | JAA: Direct archival as in !ao https://example.com/theActualThing or including the actual thing in the IRC URLs list, too? |
| 12:26:43 | <@JAA> | systwi: Either. The URL needs to be modified first. |
| 12:27:11 | <@JAA> | E.g. i.postimg.cc URLs, when !ao'd directly or as part of a list, don't archive the images because of their hotlink prevention. |
| 12:27:34 | <@JAA> | So you get broken snapshots in the WBM etc. |
| 12:30:35 | <systwi> | I see, okay. So, hmm... My original plan was just to grab everything from that day, en masse, and awesome if it happens to work. If not, yeah that's a shame, but my thought process was the "grab things out of a burning building" approach, considering I've seen URLs die in as little as a couple hours (thankfully with one WBM capture, courtesy of AB). |
| 12:31:51 | <systwi> | I can change the behaviour of what my script does to certain URLs beforehand, if that is what you were thinking. |
| 12:32:18 | <systwi> | I already have it grep Imgur images into their own list for #imgone . |
| 12:32:33 | <systwi> | s/images/URLs/ |
| 12:43:10 | <systwi> | JAA: If you have any specific suggestions, feel free to send them my way. I'm touching up my script a bit so I can share it later. I have a blacklist with key words that would be better kept in a separate file. For the time being/until further instruction, this is the only change I'm making for now. |
| 12:45:33 | <@JAA> | systwi: Well, anything that avoids duplicates. But it's hard to coordinate if those URLs aren't in the #archivebot logs. |
| 12:45:42 | <@JAA> | Part of why I stopped doing list jobs. |
| 12:48:05 | <systwi> | Hmm, yeah, that would be a bit tricky to pull off (also doesn't help that I haven't slept and am thinking about this). |
| 12:49:04 | <systwi> | When I am able to send over a copy of my script maybe you will have some ideas on how to implement that. |
| 12:50:02 | <systwi> | It's nothing too fancy. It's a bash script that started as a oneliner, and could probably do fine as a Python script, too, if I was as proficient in that language. |
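The URL-collection approach systwi describes above (skip certain channels, drop blacklisted keywords, route Imgur URLs into their own list for #imgone) could look roughly like this in Python. This is only a sketch and not systwi's actual script, which is not shown in the log: `collect_urls`, `SKIPPED_CHANNELS`, `BLACKLIST_KEYWORDS`, and the `(channel, text)` input shape are all hypothetical names chosen for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical settings; the real script's lists are not shown in the log.
SKIPPED_CHANNELS = {"#archivebot", "#//", "#Y"}
BLACKLIST_KEYWORDS = ["example-blocked-word"]  # placeholder, ideally loaded from a separate file

# Deliberately simple URL pattern; it will miss some edge cases.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def collect_urls(messages):
    """Split URLs found in (channel, text) pairs into a general list and an Imgur list.

    Duplicates are dropped and first-seen order is preserved.
    """
    general, imgur = [], []
    seen = set()
    for channel, text in messages:
        if channel in SKIPPED_CHANNELS:
            continue
        for url in URL_RE.findall(text):
            # Strip punctuation that usually belongs to the sentence, not the URL.
            url = url.rstrip(".,)")
            if url in seen:
                continue
            if any(word in url for word in BLACKLIST_KEYWORDS):
                continue
            try:
                host = urlsplit(url).hostname or ""
            except ValueError:
                continue  # malformed URL, e.g. a broken IPv6 literal
            seen.add(url)
            if host == "imgur.com" or host.endswith(".imgur.com"):
                imgur.append(url)
            else:
                general.append(url)
    return general, imgur
```

Deduplicating against URLs already present in the #archivebot logs, which JAA raises as the hard part, is not covered here; the `seen` set only catches repeats within one run.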
| 12:58:29 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 12:58:49 | | AmAnd0A joins |
| 13:27:51 | | HotSwap joins |
| 13:27:52 | | HotSwap is now authenticated as HotSwap |
| 13:27:58 | <manu|m> | Not sure if y'all caught this, but Nitter is working again |
| 13:33:36 | | yts98 joins |
| 13:35:26 | <Exorcism|m> | oh really ? |
| 13:43:11 | <manu|m> | I just checked, at least for nitter.net it does. The fix even made it on the front-page of orange site. |
| 13:52:46 | | chrismeller quits [Quit: The Cow is in my pants!] |
| 13:53:32 | | chrismeller3 (chrismeller) joins |
| 13:56:47 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:57:03 | | chrismeller3 quits [Client Quit] |
| 13:57:26 | | chrismeller3 (chrismeller) joins |
| 13:57:40 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 13:58:50 | | chrismeller3 quits [Client Quit] |
| 14:00:03 | | chrismeller3 (chrismeller) joins |
| 14:20:13 | | nostalgebraist joins |
| 14:37:38 | | Island joins |
| 14:44:24 | | nostalgebraist quits [Client Quit] |
| 15:09:01 | | chrismeller3 quits [Ping timeout: 258 seconds] |
| 15:15:17 | <yetanotherarchiver|m> | [working instances](https://github.com/zedeus/nitter/wiki/Instances) |
| 15:15:51 | <yetanotherarchiver|m> | (with Updated tag) |
| 15:47:11 | | dumbgoy joins |
| 16:06:00 | | Jonimus joins |
| 16:58:11 | | benjins2 quits [Remote host closed the connection] |
| 16:59:28 | | benjins2 joins |
| 17:10:44 | | Ryz26 (Ryz) joins |
| 17:10:58 | | imer6 (imer) joins |
| 17:11:09 | | endrift|ZNC joins |
| 17:11:31 | | HugsNotDrugs` joins |
| 17:11:32 | | neggles_ (neggles) joins |
| 17:11:40 | | imer quits [Client Quit] |
| 17:11:40 | | neggles quits [Quit: bye friends - ZNC - https://znc.in] |
| 17:11:40 | | Larsenv quits [Remote host closed the connection] |
| 17:11:40 | | HugsNotDrugs quits [Remote host closed the connection] |
| 17:11:40 | | tartarus quits [Remote host closed the connection] |
| 17:11:40 | | @arkiver quits [Remote host closed the connection] |
| 17:11:40 | | Ryz2 quits [Client Quit] |
| 17:11:40 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 17:11:40 | | hogchips quits [Write error: Broken pipe] |
| 17:11:40 | | BPCZ quits [Remote host closed the connection] |
| 17:11:40 | | endrift quits [Remote host closed the connection] |
| 17:11:40 | | IDK_ quits [Client Quit] |
| 17:11:40 | | iCaotix quits [Client Quit] |
| 17:11:40 | | imer6 is now known as imer |
| 17:11:40 | | Ryz26 is now known as Ryz2 |
| 17:11:44 | | iCaotix_ joins |
| 17:11:45 | | IDK_ joins |
| 17:11:47 | | arkiver2 (arkiver) joins |
| 17:11:47 | | @ChanServ sets mode: +o arkiver2 |
| 17:11:48 | | tartarus joins |
| 17:11:48 | | neggles_ is now known as neggles |
| 17:11:55 | | hogchips (shoghicp) joins |
| 17:12:06 | | BPCZ (BPCZ) joins |
| 17:14:11 | | @arkiver2 is now known as @arkiver |
| 17:17:00 | | Larsenv (Larsenv) joins |
| 17:25:18 | | Larsenv quits [Max SendQ exceeded] |
| 17:25:41 | | Larsenv (Larsenv) joins |
| 17:28:46 | | nicolas17 joins |
| 17:44:20 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 18:00:26 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 19:45:49 | | nicolas17 joins |
| 19:47:31 | | icedice (icedice) joins |
| 20:08:35 | | killsushi joins |
| 20:18:45 | | BPCZ quits [Ping timeout: 258 seconds] |
| 20:19:08 | | sidpatchy quits [Ping timeout: 258 seconds] |
| 20:19:19 | | sidpatchy joins |
| 20:19:26 | | BPCZ (BPCZ) joins |
| 20:19:31 | | ave quits [Ping timeout: 258 seconds] |
| 20:19:45 | | ave (ave) joins |
| 20:21:16 | | Ketchup902 quits [Ping timeout: 245 seconds] |
| 20:22:28 | | Ketchup901 (Ketchup901) joins |
| 20:22:28 | | Craigle9 (Craigle) joins |
| 20:22:34 | | AK9 (AK) joins |
| 20:22:35 | | le0n quits [Ping timeout: 258 seconds] |
| 20:22:53 | | marto_ (marto_) joins |
| 20:24:07 | | marto_9 quits [Ping timeout: 258 seconds] |
| 20:24:07 | | AK quits [Ping timeout: 258 seconds] |
| 20:24:07 | | Craigle quits [Ping timeout: 258 seconds] |
| 20:24:07 | | AK9 is now known as AK |
| 20:24:07 | | Craigle9 is now known as Craigle |
| 20:25:38 | | le0n (le0n) joins |
| 20:30:11 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 20:30:29 | | AmAnd0A joins |
| 20:42:31 | | nicolas17 quits [Ping timeout: 258 seconds] |
| 20:43:12 | <vokunal|m> | On banciyuan, is Bad SSR data kind of like a rate limit or is it expected? |
| 20:43:25 | <@JAA> | → #wuciyuan |
| 20:43:44 | <vokunal|m> | ah ty |
| 20:44:44 | <vokunal|m> | Idk how, but i've had #wuciyuanr this whole time :P |
| 21:01:56 | | bf_ quits [Ping timeout: 252 seconds] |
| 21:08:06 | <cm> | not sure if this is the right channel but would anyone be able to help me get brozzler running? |
| 21:08:30 | <cm> | so far I am stuck running pip3 install brozzler[easy] |
| 21:09:05 | <cm> | it keeps hanging at the markupsafe dependency, when run with and without a venv |
| 21:11:37 | <cm> | this is on debian bullseye |
| 21:25:59 | | hackbug quits [Remote host closed the connection] |
| 21:29:17 | | hackbug (hackbug) joins |
| 21:39:46 | | iCaotix_ quits [Read error: Connection reset by peer] |
| 21:41:08 | | iCaotix joins |
| 21:49:18 | | AnotherIki joins |
| 21:52:40 | | Iki1 quits [Ping timeout: 258 seconds] |
| 21:57:14 | | hitgrr8 quits [Client Quit] |
| 22:06:39 | <pokechu22> | I assume that beyond the contents of futurequest.net, we'd also need to identify what sites are hosted by them and save those |
| 22:07:01 | <tzt> | there's this https://bgp.he.net/net/69.5.0.0/19#_dns but it is truncated |
| 22:07:04 | <Exorcism|m> | anyways, I'm adding this to deathwatch |
| 22:07:15 | <pokechu22> | ... and, that forum is mostly restricted, it seems |
| 22:07:32 | <Exorcism|m> | darn :c |
| 22:10:50 | <Exorcism|m> | Exorcism|m: added |
| 22:13:14 | <pokechu22> | based on https://subdomainfinder.c99.nl/scans/2023-07-04/futurequest.net there's a bunch of internal services that aren't accessible (giving 401s). I've thrown the ones that do something else into AB |
| 22:21:05 | <tzt> | more sites https://urlscan.io/search/#page.asn:%22AS22915%22 |
| 22:23:02 | <flashfire42> | I will get started on these I guess |
| 22:40:34 | <fireonlive> | context for above: https://www.futurequest.net/forums/showthread.php?t=27786 |
| 22:40:47 | <fireonlive> | (cont'd from #archivebot) |
| 22:41:49 | <fireonlive> | https://bgp.tools/prefix/69.5.0.0/19#dns has more entries ('show forward DNS') but it's truncated per host unless you happen to own an AS or otherwise have a login |
| 22:42:36 | <fireonlive> | s/host/IP/ |
| 22:47:21 | <nulldata> | https://news.ycombinator.com/item?id=36672595 |
| 22:48:35 | <fireonlive> | lmao |
| 22:49:23 | <nulldata> | Man if I was an InfluxDB Cloud customer I'd be getting my data out ASAP - even if the data deletion didn't affect my region |
| 22:49:35 | <fireonlive> | yeah, they really fucked the dog with that one |
| 22:49:47 | <nulldata> | InfluxDB: "Sorry we purposely deleted your data with only an Email notice that may or may not have been actually sent. Feel free to sign-up again in a different region!" |
| 22:51:04 | <fireonlive> | they're in the tech space too, so would have probably heard of the heroku shredding free DBs as well |
| 22:51:29 | <nulldata> | That kind of shit is why I steer clear of SaaS for business production stuff if I can |
| 22:52:51 | <@JAA> | -bs and -ot are swapped now. :-| |
| 22:54:19 | <nulldata> | Sorry! |
| 22:54:43 | <fireonlive> | it's opposite day! |
| 22:54:51 | <Doranwen> | LOL |
| 22:55:18 | <@rewby> | arkiver: Anyway, as I was going to say: Looks like they don't entirely mind people doing archivism? Quote: "However, we don't have a policy against responsible data collection — such as those done by academic researchers, fans backing up works to Wayback Machine or Google's search indexing." |
| 22:55:30 | <@rewby> | Notably that part about the WBM |
| 22:55:36 | <@rewby> | Given that our crap usually ends up there |
| 22:55:41 | <Doranwen> | Anyway, the original purpose for the login wall with AO3 was just privacy in general - reduced the chances of someone seeing your explicit fic if they found your nick, for instance, unless they also had an account. |
| 22:56:02 | <Doranwen> | But now more people are locking theirs because of OpenAI, yeah. |
| 22:56:05 | <@rewby> | I think if we figure out a way to dump the whole thing into some darkened items |
| 22:56:09 | <@rewby> | Probably would be fine? |
| 22:56:16 | <@rewby> | Although we might consider emailing them |
| 22:56:39 | <Doranwen> | Though my experience with LOTR books was something like 11k unlocked, 3k locked? Granted, I had a few filters on so it wasn't the raw fandom without exclusions, but… |
| 22:57:14 | <Doranwen> | And the ao3downloader script is really an excellent way to grab stuff as it uses their api and respects requests for breaks. |
| 22:57:15 | <@rewby> | The way I see it: It's small enough that I doubt this will hit the 1T mark after zstd |
| 22:57:27 | <@rewby> | But could very well be worth preserving |
| 22:57:39 | <@rewby> | If nothing else it's a timecapsule of how zeitgeist and popularity changes over time |
| 22:57:43 | <Doranwen> | It'll say "ao3 has requested a 206 second break" and give the timestamps for pausing and resuming. |
| 22:58:01 | <@rewby> | I'm sure there's some fun stats to be made about popularity of fandoms over time |
| 22:58:08 | <Doranwen> | Oh definitely. |
| 22:58:40 | <Doranwen> | The breaks do vary, though, so it's not a static thing, must respond to the general load on the site at the time. |
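The "requested break" behaviour Doranwen describes can be sketched as a client that pauses whenever the server asks it to. The log does not say how ao3downloader detects the request, so this assumes the common HTTP convention of a 429 response carrying a `Retry-After` header in seconds; `requested_break` and `fetch_politely` are hypothetical names, not part of any existing tool.

```python
import time
import urllib.error
import urllib.request


def requested_break(headers, default=60):
    """Return the pause in seconds asked for via Retry-After, or `default`."""
    value = headers.get("Retry-After")
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date rather than seconds.
        return default


def fetch_politely(url, max_attempts=5, sleep=time.sleep):
    """Fetch `url`, sleeping for the requested time whenever the server answers 429."""
    for _ in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            wait = requested_break(err.headers)
            print(f"server has requested a {wait} second break")
            sleep(wait)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

Because the wait comes from the server on each response, the pause length varies with site load, which matches the non-static breaks mentioned above.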
| 22:59:02 | <@rewby> | I mean, if we just ask them for collaboration, maybe they'd be open to it? |
| 22:59:08 | <@rewby> | And we can work out the best way to get as much as possible |
| 22:59:35 | <@arkiver> | rewby: sounds like it's time to rebrand Archive Team as fanclub :P |
| 22:59:46 | <Doranwen> | LOL |
| 22:59:53 | <Doranwen> | Well, fandom is what got me saving Yahoo Groups, after all… |
| 23:00:01 | <@rewby> | arkiver: I don't get it |
| 23:00:27 | <@arkiver> | "fans backing up works to the Wayback Machine" is allowed |
| 23:00:42 | <Doranwen> | There's definitely an intersection of archivists and fandom. |
| 23:01:02 | <@rewby> | I mean, just sending an email going "Hi, we do a lot of archival stuff and we'd like to archive the works on your site. Is this okay with you all and can we collaborate?" |
| 23:01:07 | <@rewby> | Would probably do a lot |
| 23:01:18 | <pokechu22> | After looking at futurequest.net some more it seems like the rest of the forums do still work - https://www.futurequest.net/forums/forumdisplay.php?f=1 is empty, but https://www.futurequest.net/forums/forumdisplay.php?s=&f=1&page=1&pp=25&sort=lastpost&order=desc&daysprune=-1 exists just fine. |
| 23:01:19 | <@rewby> | Especially since this will involve bypassing a login wall |
| 23:01:36 | <@rewby> | But if the admins are okay with it, we could probably just get an account just for this purpose |
| 23:01:38 | <@arkiver> | how do you bypass the login wall? |
| 23:01:41 | <@arkiver> | hmm |
| 23:01:46 | <@arkiver> | not sure about account |
| 23:01:51 | <@JAA> | Or perhaps they could whitelist a UA or similar. |
| 23:01:53 | <@arkiver> | data may not go into the Wayback Machine |
| 23:01:53 | <@rewby> | Yeah |
| 23:01:58 | <@arkiver> | yes they could whitelist |
| 23:02:02 | <@arkiver> | i'm not a fan of an account |
| 23:02:32 | <@rewby> | The reason for the account is as posted in -ot: To prevent OpenAI from grabbing stuff they forced a bunch of stuff behind logins |
| 23:02:50 | <@rewby> | But they're completely open to archiving things as far as I can tell |
| 23:02:56 | <@rewby> | So we could probably do something like a UA whitelist |
| 23:03:00 | <@rewby> | Or something |
| 23:03:27 | <@rewby> | I don't really see the problem with just having an account we use for the archive and then disable after we get stuff |
| 23:03:38 | <@rewby> | Like, yeah good luck |
| 23:03:45 | <@rewby> | * doing anything with that |
| 23:04:39 | <fireonlive> | the official ArchiveTeam AO3 account |
| 23:04:41 | <fireonlive> | x3 |
| 23:07:02 | <Doranwen> | Rather than backing it all up to the WBM - which it already is, to some extent - I'd suggest just grabbing the files they already provide. |
| 23:07:24 | <@rewby> | Could do |
| 23:07:46 | <Doranwen> | There's no user-identifiable data in them, even the locked ones. |
| 23:08:08 | <fireonlive> | user-identifiable being the author? |
| 23:08:10 | <Doranwen> | And then you can use already-made tools like ao3downloader (it's not the only one, but I think it's probably the best out there). |
| 23:08:17 | <Doranwen> | Well, that's identifiable, lol. |
| 23:08:23 | <Doranwen> | The user browsing the files, I meant. |
| 23:08:29 | <fireonlive> | ah ok |
| 23:08:36 | <Doranwen> | There's no way to tell, from looking at a file you downloaded, that *you* were the one who downloaded it. |
| 23:08:42 | <fireonlive> | i thought it strange that would include author/tags/etc :p |
| 23:08:47 | <fireonlive> | s/include/exclude/ |
| 23:08:55 | <Doranwen> | The fics always have author, tags, etc. on them. |
| 23:09:06 | <Doranwen> | As well as the link of the fic they came from. |
| 23:09:22 | <fireonlive> | ah |
| 23:09:49 | <Doranwen> | So they could always be used afterward to grab links for the WBM if so desired. |
| 23:11:39 | <tzt> | anyone know any better way to get reverse IP data to find futurequest sites |
| 23:11:43 | <Doranwen> | For a general file estimate, for one of the fandoms I grabbed fics for, I have 3,899 epubs which total 172.4 MB. |
| 23:12:24 | <fireonlive> | tzt: like IP Y hosts domains example.com, example.net? |
| 23:12:35 | <tzt> | yes |
| 23:12:55 | <fireonlive> | ah, there was some more on bgp.tools, but need a login sadly |
| 23:13:02 | <fireonlive> | (each host is elided) |
| 23:13:20 | <fireonlive> | after about 2-3 |
| 23:13:22 | <@rewby> | bgp.tools mostly uses certificate transparency for the domain -> ip mapping IIRC |
| 23:13:24 | <Doranwen> | Two of the fics were each just under 11 MB, another was 9.6 MB. Only a couple dozen were over 1 MB. Over 1,800 of them were under 10 kB. |
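As a quick check on Doranwen's figures, the average file size for that one fandom works out as below. This assumes decimal units (1 MB = 1000 kB), and the sample is a single filtered fandom, so it may not be representative of the whole site.

```python
# Figures quoted in the log for one fandom's downloads.
epub_count = 3899
total_mb = 172.4

# Average size per epub in kB, assuming 1 MB = 1000 kB.
average_kb = total_mb * 1000 / epub_count
print(f"average epub size: {average_kb:.1f} kB")
```

The mean sits well above the "under 10 kB" that nearly half the files fall into, because the handful of multi-megabyte works pull it up.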
| 23:13:39 | <flashfire42> | I am going to go through them over my day today but I will have a lot of stuff to queue and also a few chores to do so will see how I go |
| 23:13:44 | <fireonlive> | ahh ok |
| 23:13:58 | <fireonlive> | there's companies that offer 'passive dns' lookups... but usually charge, https://securitytrails.com/dns-trails |
| 23:14:28 | <@rewby> | I could look up how bgp.tools does it, but not tonight |
| 23:14:36 | <fireonlive> | though that found nothing for 69.5.3.189 lol |
| 23:15:19 | <fireonlive> | ah it found one |
| 23:16:10 | <@rewby> | fireonlive: It does? |
| 23:16:19 | <@rewby> | I see like 9 sites? |
| 23:16:40 | <fireonlive> | it showed nothing, then now it just shows 'pop.agaveguides.com' |
| 23:16:43 | <fireonlive> | weird |
| 23:16:59 | <@rewby> | I have a full bgp.tools account, shows up fine |
| 23:17:11 | <fireonlive> | oh was talking about securitytrails |
| 23:17:15 | <fireonlive> | bgp.tools is better for that |
| 23:17:15 | <@rewby> | Ah ST |
| 23:17:31 | <@rewby> | I mean, if you care much I can go ask for a quick db extract |
| 23:17:56 | <fireonlive> | tzt/pokechu22 could probably find that useful |
| 23:18:13 | <fireonlive> | i need to sacrifice a goat to benjojo for a login one of these days i guess :p |
| 23:18:33 | <@rewby> | I'll just at ben and get him to dump the db for me |
| 23:18:36 | <pokechu22> | I don't think I have time to do anything more elaborate with futurequest beyond what I've just done with the forums |
| 23:18:54 | <fireonlive> | oh hey i remembered who owned it |
| 23:18:55 | <fireonlive> | :3 |
| 23:19:14 | <tzt> | i'm trying to get a list of all the sites hosted with them so it could be run in #// or #Y since it's shutting down in 4 days |
| 23:19:31 | <fireonlive> | tzt 🤝 rewby |
| 23:19:34 | <fireonlive> | :) |
| 23:19:46 | <@rewby> | tzt: I've asked ben to dump me the data from bgp.tools for their ip range |
| 23:20:40 | <tzt> | rewby: thank you |
| 23:21:38 | <@rewby> | He's asleep as far as I can tell, so probably tomorrow |
| 23:21:55 | <@rewby> | I could go figure out how bgptools gets its data, but I wanna go sleep too |
| 23:22:05 | <fireonlive> | night rewby :) |
| 23:22:23 | <fireonlive> | don't let the targets bite |
| 23:22:26 | <fireonlive> | :p |
| 23:22:39 | <@rewby> | Ben's a decent coder but dear god bgptools' code can be a pain due to just the sheer stupidity of half the things it needs to talk to |
| 23:23:02 | <@rewby> | So I'm not doing that tonight |
| 23:23:17 | <fireonlive> | xD |
| 23:23:27 | <TheTechRobo> | Oh, JAA, here's what happens every so often with `docker logs` on the project containers: |
| 23:23:27 | <TheTechRobo> | found item https://p3-bcy-sign.bcyimg.com/banciyuan/52641711bb894c2ca012e3fb9310074d~tplv-banciyuan-sq90.image?x-expires=1705101090&x-signature=gOL%2Bbdr9vmHDm%2Fxu%2FDcl9%2FX44uA%3D |
| 23:23:27 | <TheTechRobo> | error from daemon in stream: Error grabbing logs: unexpected EOF |
| 23:44:24 | | BlueMaxima joins |
| 23:53:26 | | nicolas17 joins |
| 23:56:52 | | AmAnd0A quits [Ping timeout: 258 seconds] |
| 23:57:39 | | AmAnd0A joins |