00:05:28Sluggs quits [Ping timeout: 265 seconds]
00:05:54Sluggs joins
00:12:04Sluggs quits [Ping timeout: 252 seconds]
00:12:54Sluggs joins
00:23:26Sluggs quits [Ping timeout: 252 seconds]
00:23:47Sluggs joins
00:37:25@dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
00:44:55dxrt joins
00:44:57dxrt quits [Changing host]
00:44:57dxrt (dxrt) joins
00:44:57@ChanServ sets mode: +o dxrt
01:22:06fl0w_ joins
01:24:14<datechnoman>What are your thoughts, arkiver, on the above comments? We never discussed load being an issue that causes URLs/page requisites to be dropped. We could look at running a little slower, or just push all found URLs to a separate tracker/queue to be moved into the backfeed slowly until the issue is resolved?
01:24:43<datechnoman>Note that I don't have any understanding of the inner workings of the tracker backend, so that solution may not be workable at all
01:25:42fl0w quits [Ping timeout: 265 seconds]
01:26:38<@JAA>datechnoman: It's not related to load as I understood it.
01:28:17<datechnoman>So it could just be a case of running through items that are in the tracker queue and stashing discovered URLs to be processed by the bloom filter elsewhere? (If tenable)
01:28:59<datechnoman>Remove the automated function temporarily
01:39:21<@JAA>I guess that should be possible in theory. But the bloom filters are massive. As in 'hundreds of GB of RAM' massive, I believe. So it's not exactly easy to do.
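(The 'hundreds of GB' figure is plausible from standard Bloom filter sizing math; a quick back-of-envelope sketch, where the item count and false-positive rate are illustrative assumptions, not the tracker's actual parameters:)

```python
import math

def bloom_size_bytes(n_items: int, fp_rate: float) -> int:
    """Optimal Bloom filter size in bytes for n_items at the given false-positive rate.
    Standard formula: m = -n * ln(p) / (ln 2)^2 bits."""
    m_bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return int(m_bits / 8)

# Illustrative: 100 billion URLs at a 1-in-a-million false-positive rate
# works out to roughly 29 bits per item, i.e. hundreds of GB.
size = bloom_size_bytes(100_000_000_000, 1e-6)
print(f"{size / 1e9:.0f} GB")
```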
01:40:02<datechnoman>sorry, I meant store the URLs elsewhere until the bloom filter issue is properly solved
01:40:10<datechnoman>Don't want to move the bloom filters nooooooooooo
01:40:28<@JAA>Yeah, but then we can only run through the existing backlog, nothing more.
01:40:56<@JAA>And there isn't that much of that, is there?
01:41:20<datechnoman>Correct. Better than nothing. The #// backlog is nearly 300 million items
01:41:24<datechnoman>So pretty large
01:41:27<TheTechRobo>Maybe we could queue URLs both to the backfeed AND a separate location? Then after everything is fixed, we run the separate location through the backfeed again, which will catch duplicates. Or would that be too resource-intensive?
01:41:50<datechnoman>That ain't a bad idea, unless, like you said, compute is a killer later down the track
01:42:35<datechnoman>9.84M out + 291.80M to do
01:43:37<datechnoman>It's also the new month, when we re-archive a metric tonne of websites' sitemaps
01:48:32<@JAA>Not a clue what order of magnitude of data we're talking there.
01:48:46<@JAA>Re keeping a copy of the backfeed stream
01:50:57<datechnoman>If it's zstd-compressed it will be much smaller, but it will still require some storage. A real ballpark figure: the URLTeam project exports 8 million URLs into a compressed file of approximately 400 MB. We would need a few TB of local storage at a bare minimum
01:51:49<datechnoman>Not sure how tenable that would be over time, though, as it will keep growing, unless we can slowly process them and verify they are going through the bloom filter and being queued
01:52:20Mateon2 joins
01:52:21<@JAA>That's not the figure I mean. I have no idea what the rate of URLs thrown at #// for example is or what the dupe rate is.
01:52:51<@JAA>I suppose this temporary dump could be deduped with another separate bloom filter, but that's just asking for trouble. :-)
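(A toy sketch of the 'separate bloom filter' dedup idea being discussed — purely illustrative; the real tracker's filter is vastly larger and is not implemented like this:)

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter for deduplicating a temporary URL dump."""
    def __init__(self, size_bits: int = 1 << 20, hashes: int = 4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 hashes of the item.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # No false negatives; small chance of false positives.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = TinyBloom()
fresh = []
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/a"]:
    if not bf.might_contain(url):
        bf.add(url)
        fresh.append(url)
print(fresh)  # the duplicate URL is dropped
```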
01:53:47<datechnoman>Haha, yeah. Well, it gets hectic when we run the sitemaps at the start of the month. I typically see 700,000 URLs per minute being discovered, which I assume are sent to the bloom filter for processing
01:53:52<@JAA>And that'd basically be what the backfeed server does, anyway, which is broken, so at that point we're reinventing the wheel rather than fixing the bug.
01:54:00Mateon1 quits [Ping timeout: 252 seconds]
01:54:00Mateon2 is now known as Mateon1
01:54:09<datechnoman>Can we squash the bug? :P easy fix
01:54:29<TheTechRobo>"Why are programmers paid so much? They just have to fix bugs and add features, that's easy!" :-)
01:54:46<datechnoman>https://media.tenor.com/ptLJfHc0PV4AAAAC/bug-bash-dt-bug-bash.gif
01:54:58<@JAA>Alternatively: 'Nothing works, why am I even paying you?'
01:55:10<@JAA>(And after it's fixed: 'Everything works, why am I even paying you?')
01:55:16<datechnoman>Let's get ChatGPT to fix the tracker
01:56:14<datechnoman>well the bloom filter issue
02:01:32leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
02:01:55leo60228 (leo60228) joins
02:55:22Mateon2 joins
02:57:04Mateon1 quits [Ping timeout: 252 seconds]
02:57:04Mateon2 is now known as Mateon1
02:58:59@rewby quits [Ping timeout: 265 seconds]
03:01:13rewby (rewby) joins
03:01:13@ChanServ sets mode: +o rewby
03:34:28Stiletto joins
03:44:48katocala quits [Remote host closed the connection]
03:54:51katocala joins
04:42:07thuban quits [Read error: Connection reset by peer]
04:42:25thuban joins
04:43:58sonick quits [Client Quit]
05:25:12<h2ibot>Bear edited List of websites excluded from the Wayback Machine (+761, The first known .plus domain to be excluded is…): https://wiki.archiveteam.org/?diff=49506&oldid=49504
05:25:13<h2ibot>Bear created The Chive (+446, Yes, you heard it right. That's "chive", not…): https://wiki.archiveteam.org/?title=The%20Chive
05:25:14<h2ibot>Bear created Wayback Machine exclusions (+64, A shorter and more memorable variant for the…): https://wiki.archiveteam.org/?title=Wayback%20Machine%20exclusions
05:25:15<h2ibot>Bear edited V Live (+49, category): https://wiki.archiveteam.org/?diff=49509&oldid=49422
05:25:16<h2ibot>Bear edited 4chan (+48, As far as I can remember, 4chan is excluded…): https://wiki.archiveteam.org/?diff=49510&oldid=49392
05:25:17<h2ibot>Bear created Mortis (+731, Created page with "{{Infobox project | title =…): https://wiki.archiveteam.org/?title=Mortis
05:25:18<h2ibot>CreaZyp154 edited List of websites excluded from the Wayback Machine/Partial exclusions (-88, Removed a link because it is available on the…): https://wiki.archiveteam.org/?diff=49512&oldid=49315
05:42:50lun4 quits [Client Quit]
05:42:51lun42 (lun4) joins
05:42:56fl0w_ quits [Remote host closed the connection]
05:42:59fl0w joins
06:00:18<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=49513&oldid=49506
06:07:48eroc1990 quits [Remote host closed the connection]
06:08:03lennier1 quits [Client Quit]
06:08:11eroc1990 (eroc1990) joins
06:08:23lennier1 (lennier1) joins
06:19:32Arcorann (Arcorann) joins
06:24:44BlueMaxima quits [Read error: Connection reset by peer]
07:02:06LegitSi quits [Ping timeout: 265 seconds]
08:14:53hitgrr8 joins
08:25:58Sluggs quits [Ping timeout: 252 seconds]
08:28:55Sluggs joins
08:33:18Sluggs quits [Ping timeout: 252 seconds]
08:34:12Sluggs joins
08:35:26<@JAA>Oh yeah, we didn't do anything significant about keybase.pub, did we?
08:38:26Sluggs quits [Ping timeout: 252 seconds]
08:42:41Sluggs joins
08:47:58Sluggs quits [Ping timeout: 252 seconds]
08:48:26Sluggs joins
08:58:06Sluggs quits [Ping timeout: 265 seconds]
08:58:28Sluggs joins
09:11:23jspiros_ quits [Client Quit]
09:11:24qwertyasdfuiopghjkl quits [Remote host closed the connection]
09:11:24fuzzy8021 quits [Read error: Connection reset by peer]
09:11:35fuzzy8021 (fuzzy8021) joins
09:11:39jspiros (jspiros) joins
09:15:05leo60228- (leo60228) joins
09:15:24leo60228 quits [Client Quit]
09:15:24fangfufu quits [Client Quit]
09:15:29fangfufu joins
09:17:26Arcorann quits [Ping timeout: 253 seconds]
09:17:30Arcorann (Arcorann) joins
09:45:05michaelblob quits [Read error: Connection reset by peer]
10:04:21qwertyasdfuiopghjkl joins
10:05:01umgr036 quits [Remote host closed the connection]
10:05:15umgr036 joins
11:05:02LeGoupil joins
11:31:52eroc1990 quits [Ping timeout: 252 seconds]
11:31:52LeGoupil quits [Ping timeout: 252 seconds]
11:52:06umgr036 quits [Remote host closed the connection]
11:52:55umgr036 joins
11:57:10Sluggs quits [Ping timeout: 252 seconds]
11:57:36Sluggs joins
12:00:38eroc1990 (eroc1990) joins
12:21:41hogchips (shoghicp) joins
12:28:32LeGoupil joins
12:35:13michaelblob (michaelblob) joins
12:46:11drin joins
12:46:18geezabiscuit quits [Ping timeout: 252 seconds]
12:46:53drin is now known as geezabiscuit
13:00:14Arcorann quits [Ping timeout: 252 seconds]
14:52:48knecht420 quits [Ping timeout: 252 seconds]
15:15:35knecht420 (knecht420) joins
15:53:13hitgrr8 quits [Client Quit]
16:38:40LeGoupil quits [Client Quit]
16:49:46hitgrr8 joins
17:03:45gazorpazorp quits [Quit: Leaving]
17:26:55<Jake>:( Don't believe so
17:29:48Ketchup901 quits [Client Quit]
17:31:10Ketchup901 (Ketchup901) joins
18:19:29HP_Archivist (HP_Archivist) joins
18:28:44<@JAA>cm: The idea comes up every now and then, and it's a good one, but it sadly can't work. Non-repudiation just wasn't a design goal. TLS works by first doing a key exchange with asymmetric algorithms, and then the agreed-on key is used for symmetric encryption of the payload. So if you keep the client-side internal state (pre-master secret etc.), you can prove that a specific key was used in the TLS
18:28:50<@JAA>connection to a specific server. But that's where it stops. You can freely manipulate the payload since it's encrypted symmetrically.
18:29:27<@JAA>There were attempts to make this work. TLS Sign is one of those. They didn't get anywhere.
18:31:13<@JAA>(And before someone brings up AES-GCM et al.: no, the payload still isn't authenticated there, only the sequence number, protocol version, packet type, and packet length. See RFC 5246 section 6.2.3.3 for example.)
18:34:08<cm>great thanks for the thorough explanation
18:35:14<@JAA>(Which probably means you can't manipulate the total length of the payload, I guess. Better than nothing, but not that useful overall.)
18:55:04<cm>hm yeah that's something
19:02:13<@JAA>Actually, no, still useless, because AES-GCM is still a symmetric algorithm. The authentication tag is not a signature.
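(The point about symmetric authentication can be demonstrated with a plain HMAC, used here as a stdlib stand-in for AES-GCM's authentication tag: anyone holding the shared key can produce a valid tag over an altered payload, so the tag proves integrity between the two endpoints but is worthless as evidence to a third party — it is not a signature. Key and payloads below are made up for illustration:)

```python
import hmac
import hashlib

# Both client and server hold this key after the TLS handshake.
shared_key = b"negotiated-during-tls-handshake"

def tag(payload: bytes) -> bytes:
    """Symmetric MAC over the payload (stand-in for an AEAD tag)."""
    return hmac.new(shared_key, payload, hashlib.sha256).digest()

original = b"HTTP/1.1 200 OK\r\n\r\nhello"
t1 = tag(original)

# The *client* can equally well produce a valid tag over a modified
# payload, because the key is symmetric. A third-party verifier cannot
# tell which endpoint created either tag.
forged = b"HTTP/1.1 200 OK\r\n\r\nEVIL!"
t2 = tag(forged)

assert hmac.compare_digest(tag(forged), t2)  # "verifies" just fine
```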
19:02:25nothere quits [Quit: Leaving]
19:02:56systwi_ joins
19:05:13systwi_ quits [Client Quit]
19:05:47superkuh_ quits [Remote host closed the connection]
19:06:04superkuh_ joins
19:16:20nothere joins
19:45:32LegitSi joins
20:07:37lennier1 quits [Client Quit]
20:08:20lennier1 (lennier1) joins
20:10:01CreaZyp154 joins
20:27:52<CreaZyp154>Yesterday someone sent an open directory containing .IPAs and APKs; turns out it has a lot of interesting stuff. The link submitted was for http://s1.bitdl.ir/, but I checked and there are http://s2.bitdl.ir/, http://s3.bitdl.ir/, etc., up to at least 28; I'll check for more (s4, s7, s17, s21-s26 time out; s6, s8, s16, s19, s27, s30 error; no
20:27:52<CreaZyp154>s0, s20, s29 (DNS_PROBE_FINISHED_NXDOMAIN); the others work). They all have open directories, so idk if it is worth putting in #archivebot
20:32:22<@JAA>Yeah, that site's been circulating in /r/opendirectories for years.
20:33:18<CreaZyp154>oh... didn't know that
20:34:00<@JAA>And until a couple years ago, it was known as bitdownload.ir.
20:34:19<anarcat>holy crap, wth
20:34:24<@JAA>It's full of *totally* legitimate stuff.
20:35:00<@JAA>Servers went up to at least s33 at one point, although quite a few are dead, yeah.
20:36:02<CreaZyp154>yeah, after s32 there are no more servers (there's also video.bitdl.ir, but I think that's it (thanks, subdomain finder hehe))
20:36:37<CreaZyp154>s33 is timing out and above that it's all DNS_PROBE_FINISHED_NXDOMAIN
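(The manual server sweep described above can be scripted; a simplified sketch with an injectable resolver — the hostnames and range are just the ones mentioned in the chat, and a real sweep would also attempt an HTTP request to separate timeouts from server errors:)

```python
import socket

def classify(host: str, resolver=socket.gethostbyname) -> str:
    """Roughly classify a hostname: 'nxdomain' if it doesn't resolve, else 'resolves'."""
    try:
        resolver(host)
        return "resolves"
    except socket.gaierror:
        return "nxdomain"

# s1 .. s33, per the discussion above.
hosts = [f"s{i}.bitdl.ir" for i in range(1, 34)]
# results = {h: classify(h) for h in hosts}  # requires network access
```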
20:58:00HP_Archivist quits [Client Quit]
21:01:22CreaZyp154 quits [Remote host closed the connection]
21:04:59sec^nd quits [Ping timeout: 276 seconds]
21:06:52HP_Archivist (HP_Archivist) joins
21:08:06Ketchup901 quits [Client Quit]
21:09:40Ketchup901 (Ketchup901) joins
21:10:42sec^nd (second) joins
21:14:23HP_Archivist quits [Client Quit]
21:15:09michaelblob_ (michaelblob) joins
21:15:17umgr036 quits [Remote host closed the connection]
21:15:17michaelblob quits [Remote host closed the connection]
21:15:25umgr036 joins
21:25:45<kpcyrd>sooo, I'm trying to archive InRelease files (what apt-get works with) from various high-profile repositories. Since this data is signed, I developed a p2p network to collect and exchange these files. Channel is ##apt-swarm-p2p on hackint.
21:27:26user_ joins
21:27:51umgr036 quits [Remote host closed the connection]
21:27:57<kpcyrd>the code I have so far is on github: https://github.com/kpcyrd/apt-swarm
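(InRelease files are clearsigned OpenPGP documents; a minimal sketch of splitting one into its payload and armored signature block — structure only and simplified (real clearsigned text may dash-escape lines, and actual verification needs the repository's key and a PGP implementation; the sample below is synthetic):)

```python
def split_inrelease(text: str) -> tuple[str, str]:
    """Split a clearsigned InRelease file into (release data, armored signature)."""
    begin_sig = "-----BEGIN PGP SIGNATURE-----"
    end_sig = "-----END PGP SIGNATURE-----"
    body, _, tail = text.partition(begin_sig)
    sig = begin_sig + tail[: tail.index(end_sig) + len(end_sig)]
    # Clearsigned body: BEGIN line, "Hash:" headers, a blank line, then the payload.
    _, _, payload = body.partition("\n\n")
    return payload.strip(), sig

sample = """-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Origin: Debian
Suite: stable
-----BEGIN PGP SIGNATURE-----

iQEzBAEB...
-----END PGP SIGNATURE-----
"""
data, sig = split_inrelease(sample)
print(data)  # Origin: Debian / Suite: stable
```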
21:42:42user_ quits [Remote host closed the connection]
21:42:55user_ joins
22:00:19<h2ibot>JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=49514&oldid=49450
22:09:21<h2ibot>Ravishshah edited ArchiveBot/Educational institutions/list (+74, /* Unsorted */): https://wiki.archiveteam.org/?diff=49515&oldid=48889
22:20:52treora quits [Ping timeout: 252 seconds]
22:21:49treora joins
22:25:23<h2ibot>JustAnotherArchivist edited Deathwatch (+249, /* 2023 */ Add WirelessAdvisor.com): https://wiki.archiveteam.org/?diff=49516&oldid=49500
22:33:57BlueMaxima joins
22:41:21hitgrr8 quits [Client Quit]
23:39:34Atom-- joins
23:43:55Atom quits [Ping timeout: 252 seconds]