00:04:18Dada quits [Remote host closed the connection]
00:27:11wyatt8740 quits [Ping timeout: 272 seconds]
00:28:36wyatt8740 joins
00:35:41<klea>i wonder, how bad of an idea would it be to make a bot to kick banned people, so that AT could run it, have it join every single public channel, and then if someone gets banned on #archiveteam it'd automatically kick them from all other channels?
00:56:32<Vokun>Warframe Android closed beta subforum closing jan 16th https://forums.warframe.com/topic/1487493-psa-android-closed-beta-is-ending-january-9-2026/
00:56:33<Vokun>Could it be put into AB?
00:57:03<Vokun>Guessing it's this https://forums.warframe.com/forum/2158-android-closed-beta/
01:22:16beardicus4 (beardicus) joins
01:24:40beardicus quits [Ping timeout: 256 seconds]
01:24:40beardicus4 is now known as beardicus
01:53:48<@JAA>klea: It's a rare enough occasion that it's not worth the effort, basically.
01:54:50<@JAA>I think I only had to kick someone from more than a couple channels once since we moved to hackint in 2020.
01:55:27<@JAA>So https://xkcd.com/1205/ applies.
01:56:50<@JAA>Vokun: Not directly, but generating a list for !ao < should be easy. Will do that shortly.
02:09:46<Doranwen>I was recommended to take the picture embed urls we're collecting as we process the downloaded LJs and send them in the direction of projects here. So far the script I've coded will pull out photobucket, livejournal (two different link types), imgur, and flickr pictures. What's the best way to handle these? They're direct links to the images, not regular webpages.
02:10:12<Doranwen>I guess my real question is - do you want them all separated by project (and ignore the ones that don't have a project) or combine them all and dump in the #// channel?
02:10:31<Doranwen>Or some other configuration?
02:41:24<Vokun>JAA: Thanks!
03:13:14<Doranwen>Once I know the answer, I can start running this script on all the downloaded comms. But I'm going to change how it operates depending on what is desired, so I'm waiting to hear the answer first, lol.
03:22:32TheEnbyperor quits [Ping timeout: 256 seconds]
03:22:37TheEnbyperor_ quits [Ping timeout: 272 seconds]
03:28:55<Doranwen>JAA: Any thoughts on this?
03:40:38PredatorIWD258 joins
03:42:56PredatorIWD25 quits [Ping timeout: 256 seconds]
03:42:56PredatorIWD258 is now known as PredatorIWD25
03:47:52TheEnbyperor joins
03:50:22TheEnbyperor_ (TheEnbyperor) joins
03:56:48<pabs>nicolas17: btw, M​anouchehri on #conservancy on Libera has been sending thousands of source requests to Samsung, so samsung-grab might get a bigger backlog :)
03:57:16<pabs>(to the point that Samsung are blocking various email domains to stop them)
03:57:39<pabs>justauser: the docx/etc URLs just gave 404 in AB :/
03:58:46<pabs>cruller: #googlecrash seems only for Google Drive I thought?
04:00:50<pabs>malcomind: maybe put a writeup on the wiki?
04:06:15<@JAA>Doranwen: I have a script for extracting candidates for the existing long-term projects. Those do get fed from #//, but shorter-term projects, including LiveJournal, don't. And we don't have active projects for Photobucket and Flickr currently, so nothing special happens with those either.
04:07:58<Doranwen>So, no interest in any of the picture links?
04:08:40<@JAA>Not no interest, but there's nothing to make use of them currently.
04:08:45<Doranwen>I can process them either way, just have to know whether I'm keeping the temporary files I'm creating in the process or not.
04:08:53<Doranwen>The links are still there, just not the extracted versions.
04:08:56<@JAA>Imgur can go into #imgone of course.
04:09:08<Doranwen>That was what I was recommended to extract. So I can pull those out.
04:09:15<Doranwen>And will leave the rest alone.
04:09:58<Doranwen>We're downloading the pics for ourselves, but figure any that can go into something else should get there while we're at the extraction.
04:10:08<Doranwen>Ty, got a direction to take this then.
04:10:57<@JAA>Feeding the rest to #// sounds fine to me. Although LiveJournal-hosted images would likely get covered by the project in #recordedjournal as well.
04:11:38<@JAA>As usual with #//, it needs to be decently spread across hosts so we don't overwhelm anything.
04:25:12<h2ibot>PaulWise created Samsung Open Source (+818, initial page /cc nicolas17): https://wiki.archiveteam.org/?title=Samsung%20Open%20Source
04:27:25<nicolas17>pabs: they seem to have added TLS fingerprinting and it's now a pain to scrape the list so I'll fall behind
04:27:42<pabs>oh :(
04:28:29<nicolas17>I may have to automate a browser
04:29:12<h2ibot>PaulWise edited Samsung Open Source (+77, TLS fingerprinting): https://wiki.archiveteam.org/?diff=60041&oldid=60040
04:29:39<Doranwen>JAA: It'd be a strong mix of LJ and Photobucket, if what I'm seeing is any indication. Imgur and Flickr are likely to be very small compared to the rest. But Digital was testing the image grabbing and said the CDN wasn't rate-limiting or banning even with 0s/request or something like that. So that may not be as much of an issue.
04:30:12<h2ibot>PaulWise edited Samsung Open Source (+33, linkify): https://wiki.archiveteam.org/?diff=60042&oldid=60041
04:35:15<@JAA>Doranwen: I guess I'm more talking about duplication, but if it isn't much data, that wouldn't really matter.
04:39:25<Doranwen>It really isn't much data at all.
04:40:25<Doranwen>JAA: In bulk, maybe, so if you want it all de-duped, we can definitely hold onto it and dedupe ourselves. Or we can hand it off periodically to get it deduped against whatever list someone else maintains. Whatever you think best.
04:41:07<@JAA>Doranwen: How many URLs are we talking?
04:42:51<Doranwen>Oof, hard to say without running this. Thousands upon thousands if you stack all the LJs I'm extracting from together. But some links would be duplicated for sure (all the userpics people used over and over everywhere they commented), so… *throws hands in air* I really have no real estimate.
04:43:20<Doranwen>Up to this point in my testing I was literally creating temporary files with the extracted links from different sites, and promptly removing them when their purpose was served.
04:44:06<Doranwen>It'll also vary wildly with the type of LJ I'm pulling them from. Ones focused on graphics creation and sharing are obviously going to have lots more than ones that were focused on text interaction.
04:45:22<Doranwen>I would have to code this to leave all the temp files where I could do something with them later and see what I get when I run it on one of them. And then again, the sizes of LJs varied wildly too. A dozen posts compared to thousands, for instance.
04:52:24<Doranwen>If you just want all the links mixed together (except for imgur), that's easier than anything else, I think.
04:52:38<Doranwen>But I'm going to have to rewrite parts of the script for that, lol.
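[Editor's sketch, not part of the log] The handling discussed above (drop repeated links, set Imgur aside for #imgone, mix the rest together while keeping it spread across hosts) could look roughly like the following. This is only an illustration under assumptions: the function name, the example URLs, and the rule for recognising Imgur hosts are hypothetical, and nothing here reflects Doranwen's actual script. Round-robin interleaving also cannot fully spread the load when one or two hosts dominate the list.

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlsplit


def split_and_interleave(urls):
    """Deduplicate URLs, separate Imgur links, and interleave the rest by host.

    Returns (imgur_urls, mixed_urls). The Imgur host check is an assumption
    made for this sketch, not a rule taken from the log.
    """
    seen = set()
    imgur = []
    by_host = defaultdict(list)
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blanks and exact duplicates (e.g. reused userpics)
        seen.add(url)
        host = (urlsplit(url).hostname or "").lower()
        if host == "imgur.com" or host.endswith(".imgur.com"):
            imgur.append(url)
        else:
            by_host[host].append(url)
    # Round-robin across hosts so consecutive URLs hit different servers
    # for as long as more than one host still has URLs left.
    mixed = [
        url
        for group in zip_longest(*by_host.values())
        for url in group
        if url is not None
    ]
    return imgur, mixed


if __name__ == "__main__":
    sample = [
        "https://i.imgur.com/a.jpg",
        "http://a.example/1.jpg",
        "http://a.example/2.jpg",
        "http://b.example/1.jpg",
        "http://a.example/1.jpg",  # duplicate, dropped
    ]
    imgur_urls, mixed_urls = split_and_interleave(sample)
    print(imgur_urls)
    print(mixed_urls)
```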
05:01:25Island quits [Read error: Connection reset by peer]
05:14:01panopticon quits [Quit: Ping timeout (120 seconds)]
05:14:15panopticon (panopticon) joins
05:30:34DogsRNice quits [Read error: Connection reset by peer]
05:34:21pabs quits [Ping timeout: 272 seconds]
05:43:31nexussfan quits [Quit: Konversation terminated!]
05:46:54<malcomind>pabs: I don't know if it's worth writing in the wiki for an archive method I need help experimenting with.
05:53:48_wotd_ joins
05:57:48wotd quits [Ping timeout: 256 seconds]
06:07:42pabs (pabs) joins
06:25:31<pabs>nicolas17: curl-impersonate and friends were mentioned above btw
06:29:20__wotd__ joins
06:32:56_wotd_ quits [Ping timeout: 256 seconds]
06:50:51joeyo joins
06:53:13<klea>JAA: ack.
06:53:28<joeyo>Hi I'd like to make an archive request
06:56:15gosc joins
06:56:31<h2ibot>Klea edited Archiveteam:IRC/Relay (+100, Add jseater-relay): https://wiki.archiveteam.org/?diff=60043&oldid=58502
06:56:32<h2ibot>Klea edited Archiveteam:IRC/Relay (+0, Fix username, sorry): https://wiki.archiveteam.org/?diff=60044&oldid=60043
06:58:06<pabs>joeyo: which site/reason?
07:00:46<joeyo>The Message board on https://www.slapmagazine.com/
07:01:07<joeyo>There was an outage a while back and a lot of people voiced interest in backing it up.
07:02:32<joeyo>In case of future problems.
07:03:51<pokechu22>I'm getting a cloudflare challenge on that, which probably will prevent archivebot from saving it unless we can get whitelisted by site admins
07:05:40<joeyo>How do we go about getting it whitelisted?
07:06:33<h2ibot>Klea edited Deathwatch (+226, /* 2026-01 */ Add Warframe Android closed beta…): https://wiki.archiveteam.org/?diff=60045&oldid=60030
07:08:10<pokechu22>I'm not entirely sure what it looks like, but it would be something in cloudflare settings. Archivebot uses this user-agent by default: https://github.com/ArchiveTeam/ArchiveBot/blob/050c783b01e31af904f3731b32a331a64df836b8/pipeline/pipeline.py#L124-L127
07:08:47<gosc>I got more urls I need help with, I've got a list of versions from a defunct game, all still up, though there's a lot of versions and files
07:09:07<gosc>https://transfer.archivete.am/47E2V/scholastic%20homebase%20info.txt
07:09:07<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/47E2V/scholastic%20homebase%20info.txt
07:10:11<pokechu22>gosc: I'll take a look at that probably tomorrow (maybe later today). The EA stuff was saved successfully though
07:10:30<gosc>thank you for both
07:10:54<gosc>I gotta stop looking into things lmao I wasn't even sure if the sims was finished before I hopped onto this one
07:11:40<gosc>the list I sent here does not include some 1gb files I found, because those files rely on hashes and are thus impossible to guess
07:11:52<gosc>I'll comb through wayback later to get all I can from that
07:56:16<cruller>pabs: I just joined the channel yesterday myself, so I don't know the details.
07:56:42<cruller>However, GDrive and GDocs are very closely related, so I think it makes sense to discuss them in the same place.
07:58:24SootBector quits [Remote host closed the connection]
07:59:33SootBector (SootBector) joins
08:01:44<cruller>Perhaps Docs was simply out of scope in the crisis that led to the creation of the channel.
08:20:56<cruller>It's funny that [[Archive Team]] and [[ArchiveTeam]] have different redirect destinations.
08:25:28<@JAA>Google Docs discussion has happened in #googlecrash before, and I agree it makes sense to keep them in one place.
08:30:01SootBector quits [Remote host closed the connection]
08:30:22SootBector (SootBector) joins
08:33:13Wohlstand (Wohlstand) joins
08:54:11Wohlstand quits [Client Quit]
09:07:54<h2ibot>Hans5958 edited Main Page/Current Projects (+114, Bruh no one told me about bibuke (or I just…): https://wiki.archiveteam.org/?diff=60046&oldid=59329
09:07:55<h2ibot>Hans5958 edited Main Page/Current Projects (+0): https://wiki.archiveteam.org/?diff=60047&oldid=60046
09:08:55<h2ibot>Hans5958 edited Main Page/Current Projects (+0, Wrong date): https://wiki.archiveteam.org/?diff=60048&oldid=60047
09:21:56<h2ibot>Hans5958 edited Bitbucket (+423, Put it on separate project since dunno whether…): https://wiki.archiveteam.org/?diff=60049&oldid=59874
09:23:22Juest quits [Read error: Connection reset by peer]
09:24:43Juest (Juest) joins
09:25:34Wohlstand (Wohlstand) joins
09:40:32Webuser792680 joins
09:40:33Webuser792680 quits [Client Quit]
10:07:40nine quits [Quit: See ya!]
10:07:53nine joins
10:07:53nine quits [Changing host]
10:07:53nine (nine) joins
10:09:02<h2ibot>Manu edited Distributed recursive crawls (+122, Candidates: Add namu.wiki and related pages): https://wiki.archiveteam.org/?diff=60050&oldid=60013
10:20:47<BornOn420>Are we ready yet for an UncleAlisArchive? Iran's government is either gonna crack down very hard, or topple
10:23:06<BornOn420>I'm trying to archive the most important Telegram channels, but that's quite hard since Telegrab has limited capacity ATM due to eyewas being the preferred project
10:46:16<BornOn420>Also, Iran is #2 worldwide in Telegram usage so I won't be able to just add everything (as could be done with Venezuela) - that would take a year to archive, even as preferred project
10:48:49<Flashfire42>BornOn420 are you suggesting i start doing Iranian youtube channels too?
11:05:09Dada joins
11:18:50<BornOn420>I think it's time to collect the government channels - if they haven't been banned in the west yet
11:19:27<BornOn420>But do ask ar.kiver before submitting anything large - he calls the shots, I'm just a news junkie
11:29:25<@arkiver>definitely time to start collecting Iran government YouTube channels
11:34:54<cruller>It's a shame that https://www.govdirectory.org/ doesn't have a page for Iran
11:41:31<cruller>I suspect that many Iranian government websites block access from outside the country...
11:52:14Dada quits [Remote host closed the connection]
11:54:53Wohlstand quits [Client Quit]
12:00:02Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:01:30<cruller>https://globalping.io/?measurement=2VZvOGKaKkeXhycV50001zdS1 weird results
12:02:46Bleo182600722719623455222 joins
12:14:44<cruller>Anyway, it's frustrating that I can't even do a whois lookup.
12:27:33<cruller>Btw, isn't there a wiki page or project for whois, rdap, or other basic site metadata collecting/archiving?
12:42:53<triplecamera|m>Yesterday I replied to the wget RfC about WARC 1.1 <https://lists.gnu.org/archive/html/bug-wget/2024-11/msg00010.html>, the maintainer just replied, saying that he will apply the patch this weekend.
12:43:53<triplecamera|m>However, his mail didn't appear in the mailing list. It seems that the "reply" button merely replies to the sender, not the list.
12:53:29<justauser>"Reply" sends a mail to "From:". "Reply all" sends a mail to "From:", "To:", "Cc:" sans yourself.
12:53:29Wohlstand (Wohlstand) joins
12:53:59<justauser>On some MLs, "From:" is the mailing list. On others, "From:" is the real sender.
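[Editor's sketch, not part of the log] The "Reply" versus "Reply all" rule described by justauser can be written down in a few lines. This is a simplified illustration that follows the description above only: the function name and the addresses are hypothetical, and real mail clients also honour headers such as Reply-To, which this sketch ignores.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def reply_recipients(msg, own_address, reply_all=False):
    """Return the addresses a reply would go to.

    "Reply" uses only From:; "Reply all" uses From:, To: and Cc:,
    minus the replying user's own address.
    """
    headers = ["From"]
    if reply_all:
        headers += ["To", "Cc"]
    values = []
    for header in headers:
        values.extend(str(v) for v in msg.get_all(header, []))
    own = own_address.lower()
    recipients = []
    for _name, address in getaddresses(values):
        address = address.lower()
        if address and address != own and address not in recipients:
            recipients.append(address)
    return recipients


if __name__ == "__main__":
    # A list where From: is the real sender and the list is only in Cc:.
    msg = EmailMessage()
    msg["From"] = "maintainer@example.org"
    msg["To"] = "me@example.org"
    msg["Cc"] = "some-list@example.org"
    print(reply_recipients(msg, "me@example.org"))
    print(reply_recipients(msg, "me@example.org", reply_all=True))
```

With a message like the one in the usage example, a plain reply reaches only the sender, which matches the situation triplecamera|m describes: the maintainer's answer never reached the list.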
14:12:47<@Sanqui>Sorry if I missed it, but are we doing anything with LiveJournal? It appears to be moribund, at least for western ("non-Cyrillic") users. https://bsky.app/profile/rahaeli.bsky.social/post/3mbebi2xfxc25
14:16:22<Hans5958>Yes, on #recordedjournal
14:23:29<BornOn420>arkiver Flashfire42 All major airlines have cancelled their upcoming flights to Tehran… that's serious
14:26:50<@Sanqui>Cheers Hans5958
14:43:22<raccoon>is there a specific channel for coordinating iran-related archiving efforts?
14:46:10ArcadianMaggie quits [Read error: Connection reset by peer]
14:48:45<cruller>https://radar.cloudflare.com/ir A very obvious blackout
14:55:25<justauser>raccoon: We can always create one.
14:55:32<justauser>#iranaway?
14:59:00<raccoon>excellent choice, let's
15:45:48nulldata-alt1 (nulldata) joins
15:52:25lennier2 joins
15:53:25<justauser>https://old.reddit.com/r/Archiveteam/comments/1q8bui4/httpsdesmotivacioneses/ looks actionable, even without a source.
15:53:45takamori300 joins
15:54:30<justauser>Might be too large for AB - no sitemap in sight.
15:55:27<justauser>https://desmotivaciones.es/ - "I think is in risk of be completely lost the website by 2027"
15:55:39lennier2_ quits [Ping timeout: 272 seconds]
15:59:24takamori300 quits [Client Quit]
16:03:53<h2ibot>Nintendofan885 edited Distributed recursive crawls (+15, link to [[Namuwiki]] page): https://wiki.archiveteam.org/?diff=60051&oldid=60050
16:05:12<cruller>For some reason that post has now been deleted, but I saved it as mhtml from the cache.
16:06:11<cruller>(just in case)
16:07:15<cruller>ah, they reposted https://old.reddit.com/r/Archiveteam/comments/1q8c84c/desmotivaciones_a_spanish_demotivational_poster/
16:08:48aiueo700 joins
16:09:28aiueo700 is now known as aiueo700|takamori300
16:10:17<cruller>Btw I thought no one was monitoring r/Archiveteam.
16:15:32<TheTechRobo>#archiveteam-reddit forwards it here
16:26:59aiueo700|takamori300 quits [Client Quit]
16:38:57<cruller>I see. I rarely see AT members posting on r/AT, so that's why I thought that.
16:41:27ArcadianMaggie joins
16:53:17bilboed0 quits [Ping timeout: 272 seconds]
16:55:48bilboed0 joins
17:02:04Webuser719998 joins
17:02:09bilboed0 quits [Ping timeout: 272 seconds]
17:02:30Webuser719998 quits [Client Quit]
17:06:10beastbg8 (beastbg8) joins
17:08:29beastbg8__ quits [Ping timeout: 272 seconds]
17:09:49bilboed0 joins
17:12:02<h2ibot>Manu edited Abkhazia (+192, /* Government websites */): https://wiki.archiveteam.org/?diff=60052&oldid=58137
17:22:03<h2ibot>Manu edited Abkhazia (+192, http://ssra.apsny.land/ vanished): https://wiki.archiveteam.org/?diff=60053&oldid=60052
17:31:04<h2ibot>Manu edited Abkhazia (+294, Some apsny.land subdomains disabled/vanished): https://wiki.archiveteam.org/?diff=60054&oldid=60053
17:31:41lucifer_sam joins
17:32:04<h2ibot>Manu edited Abkhazia (-65, Remove dates since they’re part of the listed…): https://wiki.archiveteam.org/?diff=60055&oldid=60054
17:39:13DogsRNice joins
17:40:02gosc quits [Quit: Leaving]
17:46:05<h2ibot>Manu edited Abkhazia (+536, /* Officially recognized de-jure government */): https://wiki.archiveteam.org/?diff=60056&oldid=60055
17:50:06<h2ibot>Manu edited Abkhazia (+178, Queued https://www.portal.abkhazia.gov.ge/): https://wiki.archiveteam.org/?diff=60057&oldid=60056
17:50:06ducky quits [Ping timeout: 256 seconds]
17:50:11ducky (ducky) joins
17:55:21ducky quits [Ping timeout: 272 seconds]
17:56:06<h2ibot>Manu edited Abkhazia (+83, Queued abkhazia-info.com): https://wiki.archiveteam.org/?diff=60058&oldid=60057
17:57:51Dada joins
18:01:20Wohlstand quits [Quit: Wohlstand]
18:07:17ducky (ducky) joins
18:07:23ice quits [Ping timeout: 272 seconds]
18:08:08<h2ibot>Manu edited Abkhazia (+122, /* Officially recognized de-jure government */): https://wiki.archiveteam.org/?diff=60059&oldid=60058
18:08:54ice joins
18:20:41ice quits [Ping timeout: 272 seconds]
18:32:56nine quits [Quit: See ya!]
18:33:09nine joins
18:33:09nine quits [Changing host]
18:33:09nine (nine) joins
18:43:10Wohlstand (Wohlstand) joins
18:47:14<h2ibot>Manu edited Mailing Lists (+29, ml.chaoschemnitz.de uses Mailman 3): https://wiki.archiveteam.org/?diff=60060&oldid=59477
18:51:08Chris5010 quits [Quit: ]
18:51:24Chris5010 (Chris5010) joins
18:58:15<h2ibot>Manu edited Anubis/uncategorized (+23, gitea.c3d2.de deploys anubis): https://wiki.archiveteam.org/?diff=60061&oldid=59076
19:11:31ice joins
19:17:22ice quits [Ping timeout: 256 seconds]
19:17:34ice joins
19:17:41lucifer_sam quits [Ping timeout: 272 seconds]
19:37:12ice quits [Ping timeout: 256 seconds]
19:50:16ATinySpaceMarine quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:50:37ice joins
19:50:50ATinySpaceMarine joins
19:58:10ice quits [Ping timeout: 256 seconds]
20:02:39Wohlstand quits [Client Quit]
20:14:03superkuh quits [Ping timeout: 272 seconds]
20:15:27HP_Archivist quits [Read error: Connection reset by peer]
20:19:27superkuh joins
20:24:25HP_Archivist (HP_Archivist) joins
20:40:06NF885 (NF885) joins
20:53:28superkuh_ joins
20:55:51superkuh quits [Ping timeout: 272 seconds]
21:25:34lucifer_sam joins
21:25:57ice joins
21:26:40<h2ibot>Nintendofan885 edited Archiveteam:Copyrights (+72, mention wiki dumps): https://wiki.archiveteam.org/?diff=60064&oldid=57192
21:32:17lucifer_sam quits [Client Quit]
21:32:29lucifer_sam joins
22:00:19nexussfan (nexussfan) joins
22:09:34sec^nd quits [Remote host closed the connection]
22:10:01sec^nd (second) joins
22:15:27lennier2_ joins
22:16:25ArcadianMaggie quits [Remote host closed the connection]
22:18:08lennier2 quits [Ping timeout: 256 seconds]
22:31:29NF885 quits [Client Quit]
23:20:15ice quits [Ping timeout: 272 seconds]
23:48:15joeyo_ joins
23:52:33joeyo quits [Ping timeout: 272 seconds]
23:57:52yasomi quits [Ping timeout: 256 seconds]
23:59:21ice joins