00:01:11VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
00:01:21gazorpazorp quits [Client Quit]
00:01:37gazorpazorp (gazorpazorp) joins
00:06:54VerifiedJ (VerifiedJ) joins
00:20:15<@OrIdow6>Does anyone have examples of projects from before ~late 2019 that have data in the WBM but aren't viewable there? If I make a viewer for Google Drive, would like parts to be reusable
00:20:27<@OrIdow6>Date because that's about when I joined
00:35:36<h2ibot>JustAnotherArchivist edited Deathwatch (+167, /* 2022 */ Add Meta, fix order): https://wiki.archiveteam.org/?diff=47997&oldid=47949
00:40:29FourFire quits [Remote host closed the connection]
00:44:21LegitSi quits [Remote host closed the connection]
01:01:37dm4v quits [Read error: Connection reset by peer]
01:02:06dm4v joins
01:02:08dm4v quits [Changing host]
01:02:08dm4v (dm4v) joins
01:19:42<katocala>!a https://heimatkunde.boell.de/ --explain "think tank, Heinrich Boll Foundation (kakapo)" --igset blogs,badvideos --useragent firefox --pipeline jap-kakapo
01:20:06<katocala>blah
01:26:42qwertyasdfuiopghjkl quits [Client Quit]
01:39:33etnguyen03 (etnguyen03) joins
02:02:43dm4v_ joins
02:03:51dm4v quits [Ping timeout: 265 seconds]
02:03:51dm4v_ is now known as dm4v
02:03:52dm4v quits [Changing host]
02:03:52dm4v (dm4v) joins
02:06:12etnguyen03 quits [Ping timeout: 244 seconds]
02:35:05etnguyen03 (etnguyen03) joins
03:07:46AlsoHP_Archivist joins
03:11:35HP_Archivist quits [Ping timeout: 258 seconds]
03:12:43Unverified joins
03:15:26epe07 joins
03:32:29Stiletto quits [Ping timeout: 244 seconds]
03:35:05HP_Archivist (HP_Archivist) joins
03:35:27AlsoHP_Archivist quits [Client Quit]
03:35:53Stiletto joins
03:40:42battlekeeper joins
03:41:10pawbs|2 quits [Quit: My ZNC server died. Probably updating my kernel…]
03:44:40AlsoHP_Archivist joins
03:48:10<h2ibot>Tech234a edited YouTube/Technical details (+388, Add additional domains): https://wiki.archiveteam.org/?diff=47998&oldid=47996
03:51:45battlekeeper leaves
03:57:11<h2ibot>JustAnotherArchivist edited YouTube/Technical details (+780, /* Videos */ Add details on previously working…): https://wiki.archiveteam.org/?diff=47999&oldid=47998
04:00:24BlueMaxima joins
04:03:13<h2ibot>JustAnotherArchivist edited YouTube/Technical details (+1635, /* Playlists */): https://wiki.archiveteam.org/?diff=48000&oldid=47999
04:13:14<h2ibot>Tech234a edited YouTube/Technical details (+1719, Add some information about channel URLs): https://wiki.archiveteam.org/?diff=48001&oldid=48000
04:14:21etnguyen03 quits [Client Quit]
04:14:31etnguyen03 (etnguyen03) joins
04:24:26nostalgebraist joins
04:29:41amazo joins
04:29:50<amazo>are the trackers down?
04:30:17<h2ibot>Tech234a edited YouTube/Technical details (+56, /* Playlists */ Add a little more detail about…): https://wiki.archiveteam.org/?diff=48002&oldid=48001
04:34:55<tech234a>amazo: yes except for URLTeam
04:38:06amazo quits [Remote host closed the connection]
04:47:49qw3rty__ joins
04:51:32qw3rty_ quits [Ping timeout: 244 seconds]
05:00:00treora quits [Quit: blub blub.]
05:01:17treora joins
05:12:21AlsoHP_Archivist quits [Ping timeout: 265 seconds]
05:18:05etnguyen03 quits [Ping timeout: 258 seconds]
05:21:55HP_Archivist quits [Ping timeout: 258 seconds]
05:32:34etnguyen03 (etnguyen03) joins
05:35:45jamesp quits [Client Quit]
05:43:46Nay quits [Ping timeout: 265 seconds]
06:01:06etnguyen03 quits [Client Quit]
06:30:55Arcorann_ quits [Ping timeout: 258 seconds]
07:45:35nepeat quits [Read error: Connection reset by peer]
07:46:39nepeat (nepeat) joins
08:03:31BlueMaxima quits [Read error: Connection reset by peer]
09:00:17britmob25636477 quits [Quit: britmob25636477]
09:36:06Arcorann_ joins
09:56:10Hosseinifard (Hosseinifard) joins
09:57:13Hosseinifard quits [Client Quit]
11:45:15TheTechRobo quits [Ping timeout: 258 seconds]
11:46:51enowaldo joins
11:46:57<enowaldo>Legal/forensics Q for 3rd party re: problematic d/l archive.
11:48:14TheTechRobo joins
12:21:33enowaldo quits [Ping timeout: 265 seconds]
12:38:37enowaldo joins
13:00:49enowaldo quits [Ping timeout: 244 seconds]
13:02:17britmob25636477 joins
13:02:38enowaldo joins
13:10:59qwertyasdfuiopghjkl joins
13:15:17enowaldo quits [Ping timeout: 244 seconds]
13:21:17enowaldo joins
13:26:08enowaldo quits [Ping timeout: 244 seconds]
13:29:54Myself quits [Ping timeout: 258 seconds]
13:31:22enowaldo joins
13:36:28enowaldo quits [Ping timeout: 265 seconds]
13:37:38Myself (myself) joins
13:40:38Arcorann_ quits [Ping timeout: 258 seconds]
13:41:30enowaldo joins
13:44:08vukky (Vukky) joins
13:46:23enowaldo quits [Ping timeout: 258 seconds]
13:51:37enowaldo joins
13:56:21enowaldo quits [Ping timeout: 258 seconds]
14:01:44enowaldo joins
14:06:26enowaldo quits [Ping timeout: 244 seconds]
14:11:52enowaldo joins
14:14:49AlsoHP_Archivist joins
14:16:46enowaldo quits [Ping timeout: 244 seconds]
14:21:07Daloader joins
14:21:58enowaldo joins
14:26:44enowaldo quits [Ping timeout: 265 seconds]
14:32:10enowaldo joins
14:36:59enowaldo quits [Ping timeout: 258 seconds]
14:42:13enowaldo joins
14:47:02enowaldo quits [Ping timeout: 265 seconds]
14:47:16HP_Archivist (HP_Archivist) joins
14:52:21enowaldo joins
14:57:04enowaldo quits [Ping timeout: 244 seconds]
15:02:28enowaldo joins
15:07:16enowaldo quits [Ping timeout: 258 seconds]
15:12:37enowaldo joins
15:17:13enowaldo quits [Ping timeout: 244 seconds]
15:18:28enowaldo joins
15:29:00<Nulo>What is a good tool to download *many* small files in a small amount of time?
15:29:38qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
15:35:06<Zeklyn>wget
15:37:06<@JAA>How many in how little time?
15:38:02<rewby>My usual go-to is curl + gnu parallel
15:39:22<Nulo>JAA, 548 thousand, I guess it's not that much
15:39:42<Nulo>From my testing just now parallel gets CPU bound
15:40:06<Nulo>I was trying aria2 but for some reason it's pretty slow (it tries every few seconds instead of constantly downloading)
15:40:14<rewby>Maybe shard the list into like 16 files and run 16 wget -i instances?
15:40:56<rewby>Although I'm curious as to what you're downloading that's that many files
15:41:18<Nulo>Scraping a subtitle website (subdivx.com)
15:41:25<Nulo>I'll try that, thanks
15:41:39<Nulo>The other problem with curl/wget + parallel is that it creates connections for no reason
15:41:55<rewby>Yeah. I think the wget-i approach should keep the connection open
15:42:14<rewby>And ideally there'll always be one downloading while the others do whatever administration they do
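[Editor's note] A minimal sketch of the sharding step rewby suggests above, assuming the URLs are already in memory as a list. The function names, the `shard_NN.txt` file naming, and the default of 16 shards are illustrative choices, not anything the participants specified.

```python
from pathlib import Path


def shard_urls(urls, shard_count=16):
    """Split urls round-robin into shard_count lists of near-equal size."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards = [[] for _ in range(shard_count)]
    for index, url in enumerate(urls):
        shards[index % shard_count].append(url)
    return shards


def write_shards(urls, out_dir, shard_count=16):
    """Write one URL-per-line file per shard and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, shard in enumerate(shard_urls(urls, shard_count)):
        # Hypothetical naming scheme: shard_00.txt, shard_01.txt, ...
        path = out_dir / f"shard_{number:02d}.txt"
        path.write_text("".join(url + "\n" for url in shard))
        paths.append(path)
    return paths
```

Each resulting file can then be passed to its own `wget -i shard_NN.txt` process. Whether a given instance actually reuses one connection depends on the server allowing keep-alive, which the log does not confirm for this site.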
15:42:32AlsoHP_Archivist quits [Client Quit]
15:44:03AlsoHP_Archivist joins
15:44:29HP_Archivist quits [Client Quit]
15:44:32AlsoHP_Archivist quits [Remote host closed the connection]
15:44:50HP_Archivist (HP_Archivist) joins
15:49:24<enowaldo>Legal/forensics question for 3rd party re: problematic d/l archive. Any docs or resources available?
15:57:55<@OrIdow6>enowaldo: If this is a question not directed at a specific person, I have no idea what you're talking about
16:03:32<@OrIdow6>Nulo: I don't know if you got told, but ArchiveTeam did start its own download of the site when you talked about it
16:03:57<@OrIdow6>Which looks like it is still running in ArchiveBot
16:04:04<Nulo>OrIdow6, I haven't got told, I couldn't find any info on the wiki so I took the matter into my own hands :P
16:04:47<Nulo>Is there a way to see the progress somewhere?
16:06:56<@OrIdow6>http://dashboard.at.ninjawedding.org/3
16:09:28<Nulo>Okay, thanks. I will continue with my archive just to be sure.
16:09:44<Nulo>The site announced later on that they would stay and not close, but it's not clear so I would archive just to be sure
16:17:56Megame (Megame) joins
16:27:23enowaldo quits [Ping timeout: 258 seconds]
16:32:52enowaldo joins
16:37:43enowaldo quits [Ping timeout: 265 seconds]
16:42:59enowaldo joins
16:47:38enowaldo quits [Ping timeout: 244 seconds]
16:53:05enowaldo joins
16:56:57Nay (JeDa) joins
16:58:01enowaldo quits [Ping timeout: 265 seconds]
17:03:12enowaldo joins
17:07:47enowaldo quits [Ping timeout: 244 seconds]
17:20:33godane2 quits [Client Quit]
17:20:45godane (godane) joins
17:38:06ragu_ joins
17:40:24ragu__ joins
17:41:45ragu quits [Ping timeout: 258 seconds]
17:41:48ragu__ quits [Read error: Connection reset by peer]
17:43:40ragu_ quits [Ping timeout: 258 seconds]
17:44:43ragu__ joins
17:58:56ragu__ quits [Ping timeout: 244 seconds]
18:00:57lunik1 quits [Quit: :x]
18:11:28enowaldo joins
18:12:28<enowaldo>A friend is looking for help for someone in hot water over a CP-tainted archive, political leak. Any organisations or references to suggest?
18:12:57<enowaldo>s/help/legal help/
18:15:57<enowaldo>My understanding is that the content was planted or incidental.
18:16:25<Unverified>iirc apple is using a CSAM hash list, not sure if those would be public but they're hashes to known CP archives that you could scan against to remove such content
18:17:07<enowaldo>Unverified: Right. They're sort of beyond the preventive stage, though that's a good suggestion.
18:17:56<Unverified>that's the best thing I can think of that'll help a little at least lol
18:18:35<enowaldo>Unverified: Right. I was hoping there might be some resource or discussion of how to avoid, or respond if the situation arises.
18:19:22<enowaldo>I've not found anything on AT website/wiki or a few other discussions (e.g., /r/datahoarder).
18:19:41<enowaldo>And I've passed on the usual organisations --- EFF/ACLU.
18:19:50Nay quits [Client Quit]
18:20:07<@JAA>I haven't heard of there having been such issues within AT before. I'm sure IA has dealt with it before though, so maybe shooting them an email might be an idea.
18:21:11<enowaldo>@JAA Also on my list, haven't reached out yet. Thanks.
18:22:31Nay (JeDa) joins
18:25:09<@JAA>By the way, I believe those CSAM hashes aren't public. Or at least I couldn't find them when I looked into it a while ago. The details of the fuzzy hashing methods are also not very public.
18:30:45lunik1 joins
18:31:39lunik1 quits [Client Quit]
18:36:43lunik1 joins
18:42:47<enowaldo>JAA: "Free to qualified organisations" apparently: https://www.microsoft.com/en-us/photodna "Must be a qualified organization subject to approval by third-party vetting service." https://www.microsoft.com/en-us/PhotoDNA/CloudService
18:43:39<@JAA>Sounds about right.
18:47:27Iki quits [Read error: Connection reset by peer]
18:52:44<Unverified>so it's quite unlikely that any small archive projects would be accepted then
18:52:45<Unverified>rip
18:53:41<enowaldo>Unverified: Though AT / IA might stand an in. Stand up an org specifically aimed at validating archives, maybe. Asking is free :)
18:53:47<russss>enowaldo: they don't send you the hashes though. You hash the content and send the hash to them, and then there's some fuzzy-matching going on server-side.
18:54:03<enowaldo>russss: Right. That would be sufficient IMO.
18:54:23<Unverified>probably some image upload API that gives a response
18:55:10<russss>you don't have to upload the image, because that in itself could give rise to liability (company I work for uses PhotoDNA)
18:56:24<enowaldo>russss: I'm guessing some processor that ingests images locally, computes a set of hashes based on transforms, and sends those. Apple and MSFT have some whitepapers / docs.
18:57:07<russss>yeah there is some white paper on PhotoDNA somewhere. I think it's a relatively unsophisticated perceptual hashing system by today's standards
18:59:18<russss>if you did have access to the list of hashes it would probably be quite trivial to generate images which happen to match them.
19:00:23<@JAA>And to manipulate matching images so they no longer match them.
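[Editor's note] A rough illustration of the general idea discussed above: hash locally, then compare hashes with a fuzzy threshold. This is NOT PhotoDNA, whose details the participants note are not very public. It substitutes a generic "average hash" over a small grayscale grid, and the function names and the `max_distance` threshold are assumptions made for the example.

```python
def average_hash(pixels):
    """Return an integer with one bit per pixel: 1 if the pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_fuzzy_match(hash_a, hash_b, max_distance=5):
    """Treat two hashes as matching if they differ in at most max_distance bits."""
    # max_distance is an arbitrary illustrative threshold.
    return hamming_distance(hash_a, hash_b) <= max_distance


# An 8x8 grayscale gradient, a uniformly brightened copy, and an inverted copy.
base = [[row * 8 + col for col in range(8)] for row in range(8)]
brighter = [[value + 10 for value in row] for row in base]
inverted = [[63 - value for value in row] for row in base]

print(is_fuzzy_match(average_hash(base), average_hash(brighter)))
print(is_fuzzy_match(average_hash(base), average_hash(inverted)))
```

In this toy scheme a uniform brightness change leaves the hash unchanged, while inverting the image flips every bit. That is the kind of property that lets a simple perceptual hash tolerate minor edits, and also why a deliberately altered image can stop matching, as JAA points out.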
19:16:28<Unverified>alternatively, use blind luck and hope some AI can detect something like that but I doubt there's anything like that online
19:17:08<Unverified>making your own would probably get you in trouble as well so I wouldn't think of it as a smart move lol
20:14:06Iki joins
20:52:13vukky quits [Client Quit]
21:07:12enowaldo quits [Client Quit]
21:25:21Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
21:27:03Minkafighter joins
21:32:13<Ryz>Is there a way to archive https://ufile.io/84mirinu ? It looks like it may be behind a reCAPTCHA - came from https://boards.4channel.org/v/thread/580002471
21:47:35monoxane4 quits [Quit: Ping timeout (120 seconds)]
21:47:49monoxane4 (monoxane) joins
21:47:53BlueMaxima joins
22:27:37Arcorann_ joins
23:14:29qwertyasdfuiopghjkl joins
23:56:10Earendil (Cobalt17) joins
23:58:05Earendil quits [Client Quit]