00:04:18Dada quits [Remote host closed the connection]
00:27:11wyatt8740 quits [Ping timeout: 272 seconds]
00:28:36wyatt8740 joins
00:35:41<klea>i wonder, how bad of an idea would it be to make a bot to kick banned people, so that AT could run it, have it join every single public channel, and then if someone gets banned on #archiveteam it'd automatically kick them from all other channels?
00:56:32<Vokun>Warframe Android closed beta subforum closing jan 16th https://forums.warframe.com/topic/1487493-psa-android-closed-beta-is-ending-january-9-2026/
00:56:33<Vokun>Could it be put into AB?
00:57:03<Vokun>Guessing it's this https://forums.warframe.com/forum/2158-android-closed-beta/
01:22:16beardicus4 (beardicus) joins
01:24:40beardicus quits [Ping timeout: 256 seconds]
01:24:40beardicus4 is now known as beardicus
01:53:48<@JAA>klea: It's a rare enough occasion that it's not worth the effort, basically.
01:54:50<@JAA>I think I only had to kick someone from more than a couple channels once since we moved to hackint in 2020.
01:55:27<@JAA>So https://xkcd.com/1205/ applies.
01:56:50<@JAA>Vokun: Not directly, but generating a list for !ao < should be easy. Will do that shortly.
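    (A minimal sketch of how such a list for !ao < might be generated from the subforum index; the Invision-style "?page=N" pagination, the topic-link pattern, and the stop condition are assumptions, not JAA's actual script:)

        # Sketch: collect topic URLs from the Android closed beta subforum,
        # one per line, suitable for pasting into a list for !ao <.
        import re
        import requests

        BASE = "https://forums.warframe.com/forum/2158-android-closed-beta/"
        seen = set()
        page = 1
        while True:
            resp = requests.get(BASE, params={"page": page}, timeout=30)
            if resp.status_code != 200:
                break
            links = set(re.findall(
                r'href="(https://forums\.warframe\.com/topic/[^"#?]+)"', resp.text))
            new = links - seen
            if not new:          # stop once a page yields nothing new
                break
            seen |= new
            page += 1

        print("\n".join(sorted(seen)))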
02:09:46<Doranwen>I was recommended to take the picture embed urls we're collecting as we process the downloaded LJs and send them in the direction of projects here. So far the script I've coded will pull out photobucket, livejournal (two different link types), imgur, and flickr pictures. What's the best way to handle these? They're direct links to the images, not regular webpages.
02:10:12<Doranwen>I guess my real question is - do you want them all separated by project (and ignore the ones that don't have a project) or combine them all and dump in the #// channel?
02:10:31<Doranwen>Or some other configuration?
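    (A rough sketch of the kind of extraction Doranwen describes; the host patterns and the file layout are assumptions, not her actual script:)

        # Sketch: pull direct image URLs for a few hosts out of downloaded LJ
        # pages and sort them into per-host lists.
        import re
        from pathlib import Path

        HOSTS = {
            "photobucket": r"https?://[^\s\"'<>]*photobucket\.com/[^\s\"'<>]+",
            "livejournal": r"https?://(?:ic\.pics|pics)\.livejournal\.com/[^\s\"'<>]+",
            "imgur":       r"https?://i\.imgur\.com/[^\s\"'<>]+",
            "flickr":      r"https?://[^\s\"'<>]*staticflickr\.com/[^\s\"'<>]+",
        }

        found = {name: set() for name in HOSTS}
        for page in Path("downloaded_lj").rglob("*.html"):   # hypothetical layout
            text = page.read_text(errors="ignore")
            for name, pattern in HOSTS.items():
                found[name].update(re.findall(pattern, text))

        for name, urls in found.items():
            Path(f"{name}_urls.txt").write_text("\n".join(sorted(urls)) + "\n")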
02:41:24<Vokun>JAA: Thanks!
03:13:14<Doranwen>Once I know the answer, I can start running this script on all the downloaded comms. But I'm going to change how it operates depending on what is desired, so I'm waiting to hear the answer first, lol.
03:22:32TheEnbyperor quits [Ping timeout: 256 seconds]
03:22:37TheEnbyperor_ quits [Ping timeout: 272 seconds]
03:28:55<Doranwen>JAA: Any thoughts on this?
03:40:38PredatorIWD258 joins
03:42:56PredatorIWD25 quits [Ping timeout: 256 seconds]
03:42:56PredatorIWD258 is now known as PredatorIWD25
03:47:52TheEnbyperor joins
03:50:22TheEnbyperor_ (TheEnbyperor) joins
03:56:48<pabs>nicolas17: btw, M​anouchehri on #conservancy on Libera has been sending thousands of source requests to Samsung, so samsung-grab might get a bigger backlog :)
03:57:16<pabs>(to the point that Samsung are blocking various email domains to stop them)
03:57:39<pabs>justauser: the docx/etc URLs just gave 404 in AB :/
03:58:46<pabs>cruller: #googlecrash seems only for Google Drive I thought?
04:00:50<pabs>malcomind: maybe put a writeup on the wiki?
04:06:15<@JAA>Doranwen: I have a script for extracting candidates for the existing long-term projects. Those do get fed from #//, but shorter-term projects, including LiveJournal, don't. And we don't have active projects for Photobucket and Flickr currently, so nothing special happens with those either.
04:07:58<Doranwen>So, no interest in any of the picture links?
04:08:40<@JAA>Not no interest, but there's nothing to make use of them currently.
04:08:45<Doranwen>I can process them either way, just have to know whether I'm keeping the temporary files I'm creating in the process or not.
04:08:53<Doranwen>The links are still there, just not the extracted versions.
04:08:56<@JAA>Imgur can go into #imgone of course.
04:09:08<Doranwen>That was what I was recommended to extract. So I can pull those out.
04:09:15<Doranwen>And will leave the rest alone.
04:09:58<Doranwen>We're downloading the pics for ourselves, but figure any that can go into something else should get there while we're at the extraction.
04:10:08<Doranwen>Ty, got a direction to take this then.
04:10:57<@JAA>Feeding the rest to #// sounds fine to me. Although LiveJournal-hosted images would likely get covered by the project in #recordedjournal as well.
04:11:38<@JAA>As usual with #//, it needs to be decently spread across hosts so we don't overwhelm anything.
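    (One way to eyeball that spread before handing a list to #// — a sketch; "combined_urls.txt" is a hypothetical one-URL-per-line file:)

        # Sketch: count URLs per host so a list can be checked for host
        # concentration before it goes to #//.
        from collections import Counter
        from urllib.parse import urlsplit

        with open("combined_urls.txt") as f:
            hosts = Counter(urlsplit(line.strip()).hostname
                            for line in f if line.strip())

        for host, count in hosts.most_common(20):
            print(f"{count:8d}  {host}")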
04:25:12<h2ibot>PaulWise created Samsung Open Source (+818, initial page /cc nicolas17): https://wiki.archiveteam.org/?title=Samsung%20Open%20Source
04:27:25<nicolas17>pabs: they seem to have added TLS fingerprinting and it's now a pain to scrape the list so I'll fall behind
04:27:42<pabs>oh :(
04:28:29<nicolas17>I may have to automate a browser
04:29:12<h2ibot>PaulWise edited Samsung Open Source (+77, TLS fingerprinting): https://wiki.archiveteam.org/?diff=60041&oldid=60040
04:29:39<Doranwen>JAA: It'd be a strong mix of LJ and Photobucket, if what I'm seeing is any indication. Imgur and Flickr are likely to be very small compared to the rest. But Digital was testing the image grabbing and said the CDN wasn't rate-limiting or banning even with 0s/request or something like that. So that may not be as much of an issue.
04:30:12<h2ibot>PaulWise edited Samsung Open Source (+33, linkify): https://wiki.archiveteam.org/?diff=60042&oldid=60041
04:35:15<@JAA>Doranwen: I guess I'm more talking about duplication, but if it isn't much data, that wouldn't really matter.
04:39:25<Doranwen>It really isn't much data at all.
04:40:25<Doranwen>JAA: In bulk, maybe, so if you want it all de-duped, we can definitely hold onto it and dedupe ourselves. Or we can hand it off periodically to get it deduped against whatever list someone else maintains. Whatever you think best.
04:41:07<@JAA>Doranwen: How many URLs are we talking?
04:42:51<Doranwen>Oof, hard to say without running this. Thousands upon thousands if you stack all the LJs I'm extracting from together. But some links would be duplicated for sure (all the userpics people used over and over everywhere they commented), so… *throws hands in air* I really have no real estimate.
04:43:20<Doranwen>Up to this point in my testing I was literally creating temporary files with the extracted links from different sites, and promptly removing them when their purpose was served.
04:44:06<Doranwen>It'll also vary wildly with the type of LJ I'm pulling them from. Ones focused on graphics creation and sharing are obviously going to have lots more than ones that were focused on text interaction.
04:45:22<Doranwen>I would have to code this to leave all the temp files where I could do something with them later and see what I get when I run it on one of them. And then again, the sizes of LJs varied wildly too. A dozen posts compared to thousands, for instance.
04:52:24<Doranwen>If you just want all the links mixed together (except for imgur), that's easier than anything else, I think.
04:52:38<Doranwen>But I'm going to have to rewrite parts of the script for that, lol.
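    (A sketch of the "mix everything except imgur together" step Doranwen describes, deduplicating along the way; the per-host file names are the hypothetical ones from the extraction sketch above:)

        # Sketch: merge per-host URL lists into one deduplicated file,
        # keeping imgur out since that goes to #imgone separately.
        from pathlib import Path

        combined = set()
        for listing in Path(".").glob("*_urls.txt"):
            if listing.name.startswith("imgur"):
                continue
            combined.update(
                line.strip()
                for line in listing.read_text().splitlines()
                if line.strip()
            )

        Path("combined_urls.txt").write_text("\n".join(sorted(combined)) + "\n")
        print(f"{len(combined)} unique non-imgur URLs")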
05:01:25Island quits [Read error: Connection reset by peer]
05:14:01panopticon quits [Quit: Ping timeout (120 seconds)]
05:14:15panopticon (panopticon) joins
05:30:34DogsRNice quits [Read error: Connection reset by peer]
05:34:21pabs quits [Ping timeout: 272 seconds]
05:43:31nexussfan quits [Quit: Konversation terminated!]
05:46:54<malcomind>pabs: I don't know if it's worth a wiki writeup for an archive method I still need help experimenting with.
05:53:48_wotd_ joins
05:57:48wotd quits [Ping timeout: 256 seconds]
06:07:42pabs (pabs) joins
06:25:31<pabs>nicolas17: curl-impersonate and friends were mentioned above btw
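    (curl-impersonate has Python bindings in curl_cffi; a minimal sketch of the idea, with no claim that this particular impersonation target gets past Samsung's fingerprinting — the URL is just an example:)

        # Sketch: fetch a page with a browser-like TLS/HTTP2 fingerprint via
        # curl_cffi (bindings around curl-impersonate). Untested against
        # Samsung's checks; the impersonation target is an example.
        from curl_cffi import requests

        resp = requests.get(
            "https://opensource.samsung.com/",
            impersonate="chrome110",
            timeout=60,
        )
        print(resp.status_code, len(resp.text))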
06:29:20__wotd__ joins
06:32:56_wotd_ quits [Ping timeout: 256 seconds]
06:50:51joeyo joins
06:53:13<klea>JAA: ack.
06:53:28<joeyo>Hi I'd like to make an archive request
06:56:15gosc joins
06:56:31<h2ibot>Klea edited Archiveteam:IRC/Relay (+100, Add jseater-relay): https://wiki.archiveteam.org/?diff=60043&oldid=58502
06:56:32<h2ibot>Klea edited Archiveteam:IRC/Relay (+0, Fix username, sorry): https://wiki.archiveteam.org/?diff=60044&oldid=60043
06:58:06<pabs>joeyo: which site/reason?
07:00:46<joeyo>The Message board on https://www.slapmagazine.com/
07:01:07<joeyo>There was an outage a while back and a lot of people voiced interest in backing it up.
07:02:32<joeyo>In case of future problems.
07:03:51<pokechu22>I'm getting a cloudflare challenge on that, which probably will prevent archivebot from saving it unless we can get whitelisted by site admins
07:05:40<joeyo>How do we go about getting it whitelisted?
07:06:33<h2ibot>Klea edited Deathwatch (+226, /* 2026-01 */ Add Warframe Android closed beta…): https://wiki.archiveteam.org/?diff=60045&oldid=60030
07:08:10<pokechu22>I'm not entirely sure what it looks like, but it would be something in cloudflare settings. Archivebot uses this user-agent by default: https://github.com/ArchiveTeam/ArchiveBot/blob/050c783b01e31af904f3731b32a331a64df836b8/pipeline/pipeline.py#L124-L127
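    (A quick way to see whether a URL serves a Cloudflare challenge to a plain client — a sketch; the tell-tale signs checked here, a 403/503 from a "cloudflare" server with a "Just a moment..." interstitial, are heuristics, not an API:)

        # Sketch: detect a Cloudflare challenge page heuristically.
        import requests

        resp = requests.get("https://www.slapmagazine.com/", timeout=30)
        challenged = (
            resp.status_code in (403, 503)
            and resp.headers.get("server", "").lower() == "cloudflare"
            and "just a moment" in resp.text.lower()
        )
        print("Cloudflare challenge:", challenged, "| status:", resp.status_code)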
07:08:47<gosc>I got more urls I need help with: a list of versions from a defunct game, all still up, though there are a lot of versions and files
07:09:07<gosc>https://transfer.archivete.am/47E2V/scholastic%20homebase%20info.txt
07:09:07<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/47E2V/scholastic%20homebase%20info.txt
07:10:11<pokechu22>gosc: I'll take a look at that probably tomorrow (maybe later today). The EA stuff was saved successfully though
07:10:30<gosc>thank you for both
07:10:54<gosc>I gotta stop looking into things lmao, I wasn't even sure if The Sims was finished before I hopped onto this one
07:11:40<gosc>the list I sent here does not include some 1gb files I found, because those files rely on hashes and are thus impossible to guess
07:11:52<gosc>I'll comb through wayback later to get all I can from that
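    (Combing the Wayback Machine for hash-named files that can't be guessed can be done via the CDX API — a sketch; the host prefix below is a placeholder to be swapped for the real download host:)

        # Sketch: list captures the Wayback Machine already has under a host,
        # one way to recover hash-named download URLs.
        import requests

        resp = requests.get(
            "https://web.archive.org/cdx/search/cdx",
            params={
                "url": "example.com/*",      # placeholder host/prefix
                "output": "json",
                "fl": "original,timestamp,statuscode",
                "collapse": "urlkey",
                "limit": "1000",
            },
            timeout=60,
        )
        rows = resp.json()
        for original, timestamp, status in rows[1:]:   # first row is the header
            print(status, timestamp, original)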
07:56:16<cruller>pabs: I just joined the channel yesterday myself, so I don't know the details.
07:56:42<cruller>However, GDrive and GDocs are very closely related, so I think it makes sense to discuss them in the same place.
07:58:24SootBector quits [Remote host closed the connection]
07:59:33SootBector (SootBector) joins
08:01:44<cruller>Perhaps Docs was simply out of scope in the crisis that led to the creation of the channel.
08:20:56<cruller>It's funny that [[Archive Team]] and [[ArchiveTeam]] have different redirect destinations.
08:25:28<@JAA>Google Docs discussion has happened in #googlecrash before, and I agree it makes sense to keep them in one place.