00:05:13Velocifyer (Velocifyer) joins
00:08:46Velocifyer quits [Client Quit]
00:08:56Velocifyer (Velocifyer) joins
00:19:31HP_Archivist quits [Client Quit]
00:20:34etnguyen03 quits [Client Quit]
00:26:55<myself>there's no bugmenot login for the forums?
01:02:45StarletCharlotte joins
01:05:59etnguyen03 (etnguyen03) joins
01:21:40eightthree quits [Ping timeout: 255 seconds]
01:23:02eightthree joins
01:49:45pixel leaves [Disconnected: Replaced by new connection]
01:49:46pixel (pixel) joins
02:00:55<nulldata>https://www.merklemap.com/
02:07:51<nicolas17>how does that work? certificate transparency?
02:23:15<nulldata>No idea
02:24:20<nulldata>Saw it posted on HN
02:38:45etnguyen03 quits [Client Quit]
02:40:21<@OrIdow6>https://users.rust-lang.org/t/merklemap-ct-subdomain-search-engine/117223 looks like it
02:49:49etnguyen03 (etnguyen03) joins
02:55:23Velocifyer quits [Ping timeout: 256 seconds]
03:00:02<nulldata>So yes basically
03:00:20<nulldata>Someone should suggest they add AB log monitoring lol
03:03:04<pabs>from ##programming <dostoyevsky2> It seems like spotify is killing particle detector :(
03:04:44Nicker8 joins
03:05:32<Nicker8>Imgsrc.ru
03:06:31Nicker8 quits [Client Quit]
03:06:43HP_Archivist (HP_Archivist) joins
03:07:27nicker joins
03:08:07nicker8 joins
03:08:23DogsRNice quits [Read error: Connection reset by peer]
03:09:21nicker8 quits [Client Quit]
03:09:21nicker quits [Client Quit]
03:16:49BlueMaxima quits [Read error: Connection reset by peer]
03:20:42<pabs>here's a ghetto-style client for the merklemap API: curl -s 'https://api.merklemap.com/search?query=*.google.com&stream=true' | sed 's/^data: //;/^$/d' | jq -r '.domain , .subject_common_name' | sed 's/^\*\.//' | sort -u
03:21:12<nicolas17>what does stream=true do?
03:22:03<pabs>prevents the need for pagination, found it in https://github.com/Barre/merklemap-cli/blob/master/src/lib.rs
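A minimal Python sketch of the same post-processing the shell pipeline above performs, for readers who would rather not chain sed and jq. It works on lines that have already been fetched, so it makes no network calls. The "data: " prefix and the field names `domain` and `subject_common_name` are taken from pabs's one-liner; the function name `extract_domains` and the sample records are hypothetical, and the real API response may contain other fields or line types.

```python
import json


def extract_domains(lines):
    """Collect unique hostnames from streamed 'data: {...}' lines.

    Mirrors the shell pipeline: strip the 'data: ' prefix, drop blank
    lines, read .domain and .subject_common_name, strip a leading
    '*.' wildcard, then sort and de-duplicate.
    """
    names = set()
    for raw in lines:
        line = raw.strip()
        if line.startswith("data: "):
            line = line[len("data: "):]
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: anything that is not JSON can be skipped.
            continue
        for key in ("domain", "subject_common_name"):
            value = record.get(key)
            if value:
                names.add(value[2:] if value.startswith("*.") else value)
    return sorted(names)


# Hypothetical sample input, shaped like the stream the one-liner expects.
sample = [
    'data: {"domain": "*.example.com", "subject_common_name": "example.com"}',
    "",
    'data: {"domain": "mail.example.com", "subject_common_name": "*.example.com"}',
]
print("\n".join(extract_domains(sample)))
```

Unlike `jq -r`, which prints the string "null" for a missing field, this sketch silently skips missing or empty values.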
03:23:55pabs not sure how this is different to crt.sh too, since it seems to be based on CT logs indeed
03:25:49<pabs>re Particle Detector, https://docs.google.com/spreadsheets/d/1gyAs28Z-5FDHv3VF18o6SMQzpi0_DnKku8Vop5N6UyM/pubhtml#
03:27:12pabs quits [Remote host closed the connection]
03:28:03pabs (pabs) joins
03:28:41etnguyen03 quits [Remote host closed the connection]
03:28:41Nicker8 joins
03:29:02Nicker joins
03:29:27Nicker10 joins
03:29:44<Nicker8>Imgsrc
03:30:32<@JAA>Hi Nicker8, how can we help you?
03:30:49<Nicker8>How do I get to imgsrc.ru?
03:31:26<@JAA>Probably by opening it in a browser, not an IRC client.
03:31:51<Nicker8>It showed in a wiki that it was supposed to be here
03:32:52<@JAA>This is where we would (at least initially) discuss about potentially archiving it.
03:33:05<Nicker8>You haven't archived it?
03:33:52<@JAA>Not to my knowledge.
03:35:14Nicker quits [Client Quit]
03:35:14Nicker10 quits [Client Quit]
03:35:14Nicker8 quits [Client Quit]
03:35:33<@JAA>(Which the wiki says, too.)
03:40:39Nicker8 joins
03:41:36<Nicker8>Why is there a wiki of imgsrc.ru if it don't work on here
03:41:57<nicolas17>"this seems to be at risk of disappearing and maybe we should archive it" goes to the wiki too
03:42:22<nicolas17>the wiki is for internal collaboration, not a catalog of archived data for people to use
03:43:09<nicolas17>Nicker8: was there an announcement of it shutting down soon or something like that?
03:44:14<Nicker8>https://wiki.archiveteam.org/index.php/Imgsrc.ru and I don't get why it makes me think there was archived data when it was at top search
03:44:35<nicolas17>it says "not saved yet"
03:44:46<Nicker8>Darnit.
03:45:39<nicolas17>JAA: huh looks like imgsrc.ru is excluded from the WBM
03:46:20<Nicker8>Because there is pornography on it and wayback machine refuses certain sites that contain pornography and its also a russian site
03:47:09<nicolas17>I guess if archiveteam had archived it, the data would be on archive.org, but inaccessible because the site is in WBM exclusions :P
03:47:41<Nicker8>Wouldn't there be another way to archive the data?
03:48:04<@JAA>The WBM doesn't have an issue with including pornography. Those exclusions are normally manually triggered by a request from the site owners.
03:48:25<Nicker8>It's a russian site so maybe that's why
03:53:31<Nicker8>I dont know if there is another way to archive it
03:53:57<@JAA>Archiving it isn't necessarily the problem. But making the archive accessible is.
03:54:36<Nicker8>You can't if wbm doesn't let you and there should be another way
03:56:27<Nicker8>Wbm shouldn't be the only archive site
03:56:43<nulldata>Being Russian isn't a reason IA excludes for either lol
03:57:16<Nicker8>The site is russian not a person
03:58:38<@JAA>It's expensive to run something like the IA/WBM. Yes, it'd be nice if all the eggs weren't in one basket, but there simply is no other institution currently. And I don't see that changing either.
03:59:45<Nicker8>There's other types of archives sites I think when you search but I don't know if it would work for imgsrc.ru
04:00:01<@JAA>Nothing is comparable in scale to IA.
04:01:16<Nicker8>What about archive.su?
04:01:37<Nicker8>Archive.fo*
04:03:34<@JAA>Yeah, it's cute. It's over 100 times smaller than IA, IIRC.
04:04:11<Nicker8>are
04:04:26<Nicker8>archive.today could work right?
04:06:01<@JAA>It's slow, not automatable, has no data exports (and never supported WARC), etc. I don't consider it a proper archive.
04:06:16<@JAA>It's also entirely unclear who runs it etc.
04:06:24<Nicker8>What else is nothing?
04:10:34<nicolas17>Nicker8: is imgsrc.ru at risk of shutdown?
04:11:13<Nicker8>And I'm pretty sure it would depend
04:12:03<Nicker8>It's ran since 2006 and surprisingly has not been archived like wow its been around since 2006 like youtube been around since 2005 already was archived and imgsrc.ru hasn't? Wow
04:12:32<nicolas17>lol surely you don't think we have the entire youtube archived
04:13:17<Nicker8>Atleast you have youtube on wbm still
04:14:18<nulldata>Only an extremely small portion compared to what is available lol
04:14:47<Nicker8>I mean so what you expect everything to have everything on it?
04:15:41<nicolas17>we don't know why imgsrc is blocked on WBM, we're not the Internet Archive
04:16:01<Nicker8>Pretty sure because it doesn't accept .ru url
04:17:14<Flashfire42>That's not how any of this works anymore. Since Archive.org started ignoring robots.txt it is only really excluded if it's CSAM or if the website owner requests its removal
04:17:36<Nicker8>Oh maybe both
04:18:00<Nicker8>Well actually there's still CSAM on archive.org possibly depends on what your type is
04:18:11<nicolas17>https://web.archive.org/web/20240907062326/https://cctld.ru/
04:18:18<steering>If it were me, I would not be hosting anything that came from imgsrc.ru
04:18:20<nicolas17>dunno where you get the idea that all of .ru is blocked
04:18:50<Nicker8>I don't know where he gets the idea of how CSAM content is all blocked
04:19:59<Flashfire42>I meant on the wayback machine it's only manual exclusions these days. If the website is requested to be removed by the site owner or if someone emails the archive.org info email address and goes hey t.me/DEFSNOTILLEGAL/8887 has CSAM maybe you should block that from being viewed
04:20:11<Flashfire42>Thats what I meant
04:20:30<Flashfire42>Things dont generally get excluded from wayback
04:20:36<Nicker8>Yeah I get it it's not fully accurate of what it does I know
04:21:48<Flashfire42>There arent blanket blocks on tld tho
04:22:58<Nicker8>What do you mean blanket blocks lol
04:24:03<Flashfire42>I meant they wont just block a whole TLD. the individual site owner has to say hey exclude my website from archive.org or an admin at IA has to go Well fuck keeping this available to the public is not worth it right now lets dark it
04:24:45<Nicker8>Pretty sure the owner probably didn't want it on there
04:25:05<nicolas17>what do you want us to do?
04:25:06<Flashfire42>Then yeah that is what has happened then they would have manually requested exclusion
04:26:04<Nicker8>Find another site like wbm if it's possible or find an archiving accessible data site that either gets created or already is there
04:31:46<nulldata>You're welcome to create your own
04:32:08<Nicker8>I have to do all the work?
04:32:42<nulldata>Build it and they will come - maybe
04:33:00<Nicker8>I just signed up to this and this is what I'm getting right now
04:33:02<nicolas17>we have to do all the work for you?
04:33:15<Nicker8>I thought that's what your job is
04:33:25<TheTechRobo>our job is to archive the data
04:33:27<TheTechRobo>not to store it
04:33:27<Nicker8>The "archiveteam"
04:33:34<TheTechRobo>Storing data is expensive
04:33:54<TheTechRobo>The Mildom project is 240TiB so far. That's roughly ten 24TB hard drives
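A quick check of the drive-count arithmetic above, as a sketch. The 240 TiB and 24 TB figures are from TheTechRobo's message; the function name `drives_needed` is hypothetical, and the calculation assumes drives are sold in decimal terabytes (10^12 bytes) while the project size is in binary tebibytes (2^40 bytes). It ignores filesystem overhead and redundancy.

```python
def drives_needed(size_tib, drive_tb):
    """Return how many drives of drive_tb (decimal TB) hold size_tib (binary TiB)."""
    total_bytes = size_tib * 2**40   # tebibytes to bytes
    drive_bytes = drive_tb * 10**12  # terabytes to bytes
    # Ceiling division using integers only, to avoid float rounding.
    return -(-total_bytes // drive_bytes)


print(drives_needed(240, 24))
```

Because a TiB is about 10% larger than a TB, the result lands slightly above the "roughly ten" quoted in the channel.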
04:34:00<nulldata>There's plenty of existing open source AT projects to bootstrap from
04:34:15<nicolas17>why should we work on archiving imgsrc.ru with so many other sites to archive? (some of which already announced they will shut down soon so they are obviously a priority)
04:34:39<Nicker8>Isn't imgsrc.ru one of them?
04:34:52<nicolas17>did they announce they are shutting down soon?
04:35:18<Nicker8>You don't know when it can shut down sometimes they don't tell you and it gets shut down
04:35:52<nulldata>Funny, I feel like that's been asked a few times in the past hour with no answer lol
04:36:21<nicolas17>sure... meanwhile there's another site that announced it will shut down in September 1st, and it did, but some videos are still in the CDN so we're saving as much as we can
04:36:41<Nicker8>It's September 7th what are you talking about
04:36:43<nicolas17>and we're having trouble with that because our intermediate servers are absolutely saturated with network and disk
04:37:32<nulldata>Do you remember? The 21 first night of September.
04:37:32<nicolas17>exactly, the website shut down 7 days ago and we're saving the data that is still in their file server, if we got the links from the website before the shutdown
04:39:05<Nicker8>This has to be a circus right now this is looking like a stand up comedy show that didn't get many claps right now and apparently this isn't what I expected to sign up
04:39:29<nicolas17>yeah you thought you could come here and give a request and everyone would drop what they are doing and work on your request for free
04:39:54<Nicker8>It wasn't a request it was to see if imgsrc.ru was already archived
04:40:17<nicolas17>maybe it is, maybe it isn't, we can't know because archive.org blocked it, maybe go ask archive.org why it's blocked
04:40:43<Nicker8>We already know apparently you didn't see that
04:40:54<Flashfire42>Nicolas17 is being a bit of a dick but he is saying exactly what we are all thinking. We have to prioritise what we can. It would be great to grab every video ever uploaded to youtube but we dont have the space or time to do that. We only archive we cant control if something is blocked from the archive. It fucking sucks but they are well within their rights to block it
04:41:08<Flashfire42>And I commend him for saying what we are all thinking
04:41:35<nicolas17>Flashfire42: I guess you were still typing when Nicker8 successfully out-dicked me :P
04:41:50<Flashfire42>yeah I was XD
04:42:07<Nicker8>Yeah do you think you can control everything that happens on imgsrc.ru like removed accounts or removed posts hm?
04:42:31<Nicker8>Exactly that's what could've been archived
04:43:58<Nicker8>I don't think nulldata knows his days very well lol
04:44:02<nicolas17>maybe go ask archive.org why it's blocked
04:44:19<Nicker8>Bro we already know did you not see?
04:47:27<Flashfire42>Nicker8 my advice. If you want to help? Run a warrior. Donate to the archive.org fundraisers. Stick around for a bit. recommend a few sites to run in archivebot. Get to know how we work and then you can start to make polite requests
04:47:39<nulldata>Can't you see? Woha can't you see.
04:48:01<Nicker8>I have to pay money now?
04:48:56<Flashfire42>I mean you dont have to donate money directly to the archive but if you come in here making demands of a site that hasnt announced a shutdown then you should be contributing something
04:48:59<Nicker8>This isn't what I thought it was about requests
04:49:36<@OrIdow6>This ain't joquinit, this seems overly antagonistic
04:49:52<nulldata>ArchiveTeam OnlyFans
04:49:56<Flashfire42>I mean assuming archivebot isnt still on fire it might be able to be run in archivebot but we cant do anything about making it Publically accessible if the site is manually excluded from the wayback machine
04:50:13<@OrIdow6>Nicker8: So from my understanding, imgsrc.ru is important to the Russian Internet, but not curreently at immediate heightened risk?
04:50:14<nicolas17>Flashfire42:
04:50:36<nicolas17>Flashfire42: nsfw content seems to have a click-through that sets a cookie, so at least *that* won't work with archivebot
04:50:38<Nicker8>Nope it is
04:50:48<@OrIdow6>Nicker8: How so?
04:51:04steering passed around popcorn
04:51:10<Nicker8>It's very rare?
04:52:05<@OrIdow6>Nicker8: When you say "nope it is", are you replying to "imgsrc.ru is important to the Russian Internet", or to "but not curreently at immediate heightened risk"?
04:52:15<@OrIdow6>*currently
04:52:38Nicker8 quits [Client Quit]
04:52:42<@OrIdow6>:|
04:53:29DigitalDragons comes back from grabbing snacks
04:53:34<DigitalDragons>aw they left
04:53:38<Flashfire42>DigitalDragons well they left but I still want snacks
04:53:50<nicolas17>I remembered this recent conversation https://cdn.discordapp.com/attachments/779844089196576809/1281668489425322184/image0.jpg?ex=66dddfcc&is=66dc8e4c&hm=4e3d1745791353bfa6a487f310c78284161931ed4ac5eb53ee37a9d8a41a29b8&
04:54:12<@OrIdow6>And here I was actually trying to communicate...
04:54:28<steering>nicolas17: lmao
04:54:36steering thinks back to 13 on irc
04:54:38<steering>you're not wrong
04:55:15<steering>OrIdow6: I don't think it's at any more risk than any other semi-legal free image host that allows porn
04:55:30<nicolas17>"it could shut down at any moment you never know" is the default state of every website
04:55:34<steering>^
04:55:40<@OrIdow6>steering: When you say "semi-legal" do you mean it hosts a lot of child porn?
04:55:52<steering>OrIdow6: it certainly used to have a reputation for such.
04:56:26<steering>it has "passworded" galleries that are, or were, basically unmoderated
04:56:49<nicolas17>I browsed around the nudity category and didn't even find actual nudity, only very risque teasing
04:57:25<@OrIdow6>If it's indeed a Russian site I guess that means the Russian political/media environment would be what determines if it's at heightened risk for that?
04:57:43<Flashfire42>CSAM not child porn
04:57:56<@OrIdow6>I'm not familiar with Russian politics
04:58:23<steering>did a search for "young" and yup I didn't want to see any of that, even if it is AI
04:58:37<steering>glad I used tor *washes hands*
04:58:51<nicolas17>oh yeah I didn't try that kind of thing
04:59:30<@OrIdow6>Could still be a nice proactive project to do I guess? Image hosts do tend to die rather dramatically
04:59:42<@OrIdow6>But that makes it priority number... ca 70
05:00:06<nicolas17>what about WBM's block though
05:01:10<@OrIdow6>I can't give you a super well-thought-out answer to that, but in short even if the WBM blocks it now it still may be useful in 50 years
05:01:36<@OrIdow6>Certainly does make it less attractive though
05:05:49<DigitalDragons>WBM excluded sites have been done before (zippyshare)
05:06:52<steering>if it were in the US I'd say it's certain to at least go the way of imgur, but yeah I don't know about RU
05:08:35<steering>I'm guessing it's excluded due to the quantity of "depictions of minors" which are illegal in the US (but maybe not RU?)
05:10:44<@OrIdow6>("And here I was actually trying to communicate" was not meant to be casting shade BTW, implicit "in my laborious manner" at the end)
05:11:25<DigitalDragons>If they're not imminently shutting down, I don't see much of a reason for an organized grab
05:12:25<DigitalDragons>Especially if they're well known for hosting "depictions of minors"
06:06:58flotwig quits [Ping timeout: 255 seconds]
06:08:28flotwig joins
06:24:00Wohlstand quits [Client Quit]
07:04:13sepro8 (sepro) joins
07:04:34sepro quits [Ping timeout: 255 seconds]
07:04:34sepro8 is now known as sepro
07:05:51Unholy236192464537713 (Unholy2361) joins
07:17:02<h2ibot>VoynichCr created Talk:INTERNETARCHIVE.BAK/torrents implementation (+1257, Created page with "== We can simply the…): https://wiki.archiveteam.org/?title=Talk%3AINTERNETARCHIVE.BAK/torrents%20implementation
07:55:57sepro7 (sepro) joins
07:57:13sepro quits [Ping timeout: 255 seconds]
07:57:13sepro7 is now known as sepro
08:01:37sepro3 (sepro) joins
08:03:04sepro quits [Ping timeout: 255 seconds]
08:03:04sepro3 is now known as sepro
09:57:33nicolas17 quits [Ping timeout: 256 seconds]
10:22:32nicolas17 joins
10:24:58loug joins
10:32:06nicolas17 quits [Read error: Connection reset by peer]
10:32:43sepro1 (sepro) joins
10:32:52nicolas17 joins
10:33:49sepro quits [Ping timeout: 255 seconds]
10:33:49sepro1 is now known as sepro
10:53:10sepro quits [Ping timeout: 255 seconds]
11:00:04Bleo1826007227196 quits [Client Quit]
11:00:31sepro (sepro) joins
11:01:17Bleo1826007227196 joins
11:08:42beastbg8 (beastbg8) joins
11:37:33SkilledAlpaca4 quits [Quit: SkilledAlpaca4]
11:39:10SkilledAlpaca4 joins
11:59:57jumpedwolf joins
12:00:25jumpedwolf quits [Client Quit]
12:02:53etnguyen03 (etnguyen03) joins
12:10:14Velocifyer (Velocifyer) joins
12:12:28ixitUIIRX joins
12:21:03<ixitUIIRX>Greetings, can anyone who is an operator (@) or voiced (+) help me archive a link in #archivebot? Thank you!
12:41:19Velocifyer quits [Ping timeout: 256 seconds]
12:44:41sepro7 (sepro) joins
12:45:40sepro quits [Ping timeout: 255 seconds]
12:45:40sepro7 is now known as sepro
12:52:51Velocifyer (Velocifyer) joins
13:05:37Barto quits [Quit: WeeChat 4.4.1]
13:12:19Barto (Barto) joins
13:17:44etnguyen03 quits [Client Quit]
13:35:15etnguyen03 (etnguyen03) joins
13:56:41Velocifyer quits [Ping timeout: 256 seconds]
13:57:14etnguyen03 quits [Client Quit]
14:01:55yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/]
14:01:58etnguyen03 (etnguyen03) joins
14:02:15yano (yano) joins
14:03:15decky quits [Read error: Connection reset by peer]
14:03:40decky joins
14:11:41etnguyen03 quits [Client Quit]
14:16:31etnguyen03 (etnguyen03) joins
14:44:55ixitUIIRX quits [Ping timeout: 255 seconds]
15:10:29Velocifyer (Velocifyer) joins
15:32:13briankrebs quits [Remote host closed the connection]
15:35:10briankrebs joins
16:05:26emily quits [Client Quit]
16:09:54pseudorizer (pseudorizer) joins
16:11:33<@JAA>TIL my job description here.
16:12:07Velocifyer quits [Ping timeout: 256 seconds]
16:31:35dadseeddad joins
16:32:08dadseeddad78 joins
16:33:15dadseeddad78 quits [Client Quit]
16:33:15dadseeddad quits [Client Quit]
17:02:10etnguyen03 quits [Client Quit]
17:23:19aninternettroll quits [Ping timeout: 255 seconds]
17:27:24aninternettroll (aninternettroll) joins
17:40:58shgaqnyrjp_ (shgaqnyrjp) joins
17:43:35shgaqnyrjp quits [Ping timeout: 260 seconds]
17:46:28etnguyen03 (etnguyen03) joins
18:04:36shgaqnyrjp_ is now known as shgaqnyrjp
18:17:53loug quits [Client Quit]
18:18:10loug joins
18:24:58balrog quits [Ping timeout: 255 seconds]
18:32:03balrog (balrog) joins
19:02:09Velocifyer (Velocifyer) joins
19:13:07StarletCharlotte quits [Ping timeout: 255 seconds]
19:13:27Chris5010 quits [Ping timeout: 256 seconds]
19:15:46StarletCharlotte joins
19:22:13etnguyen03 quits [Client Quit]
19:23:57etnguyen03 (etnguyen03) joins
19:36:07Unholy236192464537713 quits [Ping timeout: 256 seconds]
20:10:41Velocifyer quits [Ping timeout: 256 seconds]
20:18:49etnguyen03 quits [Client Quit]
20:20:19lizardexile quits [Ping timeout: 256 seconds]
20:22:41Unholy236192464537713 (Unholy2361) joins
20:57:05BlueMaxima joins
20:59:04parfait quits [Read error: Connection reset by peer]
21:04:46Velocifyer (Velocifyer) joins
21:08:29Velocifyer quits [Client Quit]
21:08:43Velocifyer (Velocifyer) joins
21:13:01Velocifyer quits [Ping timeout: 256 seconds]
21:18:58Velocifyer (Velocifyer) joins
21:23:47Velocifyer quits [Ping timeout: 256 seconds]
21:50:15<nicolas17>I have been privately informed that "toucharcade.com is likely heading for a shutdown"
21:50:49cow_2001 quits [Quit: ✡]
21:52:56cow_2001 joins
21:56:01<thuban>seems like it should be ok to ab, i'll start a job
21:57:18<nicolas17>looks like it's wordpress
22:30:05etnguyen03 (etnguyen03) joins
23:01:32CrimsonCream quits [Quit: Client closed]
23:09:23<nicolas17>thuban: toucharcade AB queue keeps growing a lot (210k URLs), any idea what that could be?
23:10:20<nicolas17>I guess we won't know until it finishes with the current recursion depth level and goes to the next ones
23:13:51etnguyen03 quits [Client Quit]
23:25:18Island joins
23:32:59xarph_ quits [Ping timeout: 256 seconds]
23:33:08<thuban>nicolas17: article pages + related resources (images) + forum threads
23:34:20xarph joins
23:50:13Velocifyer (Velocifyer) joins
23:50:51<thuban>oh, and external links.
23:53:04<thuban>which is not to say that there's nothing that should be ignored, but i did a bunch of ignores during the first few levels which should cover most of it
23:53:15etnguyen03 (etnguyen03) joins
23:58:24loug quits [Client Quit]