00:00:38nertzy joins
00:02:01Earendil7 quits [Ping timeout: 255 seconds]
00:06:39Arcorann (Arcorann) joins
00:15:07Earendil7 (Earendil7) joins
00:20:23davo joins
00:20:59davo quits [Client Quit]
01:43:51qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
01:50:54BlueMaxima quits [Read error: Connection reset by peer]
02:07:43tmob joins
02:09:35Macsteel (Macsteel) joins
02:11:48<Macsteel>Hello, I'm here for a suggestion.
02:18:04<thuban>fire away
02:18:24<Macsteel>BD25.eu was a rich Usenet index with the largest Bluray ISO catalog.
02:18:44<Macsteel>The site shut down, but someone made a 42gig archive of just NZBs.
02:18:58<Macsteel>These NZBs are for all Bluray ISOs.
02:19:24<Macsteel>While you can consider the site "archived," everything on Usenet is prone to retention/DMCA, like the Blurays...
02:19:53<Macsteel>Would archiving these ISOs be a project of interest?
02:22:11<thuban>theoretically yes; in practice, archiveteam data goes to the internet archive, which is itself subject to dmca and as such frowns on gross piracy.
02:23:48<thuban>(but got a link to the archive?)
02:23:55<Macsteel>Yeah
02:24:11<fireonlive>offhand do you know roughly how big the dataset would be?
02:24:29<Macsteel>42 gigs, beware. All Bluray NZBs inside.
02:24:36<Macsteel>https://nzbstars.com/?page=getnzb&action=display&messageid=bEs4UlM1Z1dmawemuG1DxAmwi0v%40spot.net
02:24:58<fireonlive>ah, could also calculate that from the NZBs themselves
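[Editor's aside: summing the `bytes` attribute of every `<segment>` element in an NZB gives the size estimate fireonlive mentions. A minimal sketch, assuming the standard newzbin NZB XML namespace; `nzb_total_bytes` is a hypothetical helper, not a tool used in this channel:]

```python
import xml.etree.ElementTree as ET

# Standard namespace used by NZB files (newzbin DTD).
NZB_NS = "{http://www.newzbin.com/DTD/2003/nzb}"

def nzb_total_bytes(nzb_xml: str) -> int:
    """Estimate total download size by summing the 'bytes'
    attribute of every <segment> in an NZB document."""
    root = ET.fromstring(nzb_xml)
    return sum(int(seg.get("bytes", "0"))
               for seg in root.iter(NZB_NS + "segment"))
```

[Running this over each NZB in the collection and summing the results would yield the total-size estimate JAA asks for below.]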
02:25:55<Larsenv>Macsteel isn't that on cabal trackers?
02:25:57<Larsenv>also nzbstars sucks
02:27:05<Macsteel>well releases were "bd25", "bd50", "bd100", etc. Numbers implying the disc size(?) in ISO format. So anywhere from 25 to 100 gigs each.
02:27:31<Larsenv>yeah, I'm aware of the scrape, I bet the nzbs are out there
02:27:55<Larsenv>afaik if you download a collection of nzbs on sabnzbd it will download every nzb in the file
02:28:00<Macsteel>All NZBs are within that NZB. lol
02:28:09<Larsenv>yep, so sabnzbd will download everything it sees
02:28:26<Larsenv>I'm sure there are people which have downloaded everything, I use eweka personally
02:28:31<Larsenv>they have 14+ year retention
02:29:12<Larsenv>but remember that most of them probably have the par2 to repair em if articles go down
02:29:30<fireonlive>thuban: do you have a usenet setup?
02:29:50<Macsteel>Full hoard is petabytes for sure.
02:29:54<thuban>not at present
02:29:55<@JAA>Needs a total size estimate, but yeah, very unlikely that IA would take this.
02:30:20<fireonlive>kk i'm (attempting) to pull the nzb's contents
02:30:32<fireonlive>can dump those 43GB on IA i suppose
02:30:49<@JAA>Yeah, that sounds fine.
02:30:58<fireonlive>:)
03:10:08<Larsenv>fireonlive I do
03:10:21<Larsenv>I'm not archiving that though, the 43gb is composed of nzbs
03:22:52<Macsteel>Do you get missing articles on eweka often? I know giganews is practically in bed with california.
03:56:37atphoenix__ (atphoenix) joins
03:57:49atphoenix_ quits [Ping timeout: 255 seconds]
04:07:37katocala joins
04:08:14nertzy quits [Client Quit]
04:53:02Macsteel quits [Changing host]
04:53:02Macsteel (Macsteel) joins
04:57:27<pabs> -rss/#hackernews- Loss of nearly a full decade of information from early days of Chinese internet: https://chinamediaproject.org/2024/05/27/goldfish-memories/ https://news.ycombinator.com/item?id=40546920
05:03:09Macsteel quits [Client Quit]
05:05:24<yzqzss>that's true
05:16:16pixel leaves [Error from remote client]
05:23:15Macsteel (Macsteel) joins
05:28:18<pokechu22>That partially feels like an issue of the metadata used for date filtering not existing back then and things not being smart enough to infer based on page text (probably in addition to actual deletion)
05:36:21<yzqzss>Although the original author is not good at using search engines, the conclusion is still correct
05:47:06<yzqzss>Can be attributed to three reasons (I think): 1. extremely high bandwidth costs 2. Restrictions, censorship, fines and shutdown commands from 🫢 3. Competition from mobile apps
05:52:23<yzqzss>For example, Baidu Tieba (or Baidu Post) mentioned in the article chose to delete all posts before 2017 due to increasingly strict censorship requirements (it is costly to re-review all old posts).
05:54:09<steering>I can't speak to how much worse it is in China, but it's not like that's uncommon in the rest of the world.
05:54:46<steering>It's also costly to maintain those old posts etc.
05:58:50<yzqzss>For reason 1: The general price of most CDNs is currently 200 RMB/TB (30 USD/TB).
06:10:36<yzqzss>Peking University launched the www.infomall.cn web archive project in 2002, but the project was stopped around 2010. (Peking University still keeps these data, about 300TB.)
06:20:33<yzqzss>steering: bad world 😶‍🌫️
06:27:58Macsteel quits [Client Quit]
06:29:07Macsteel (Macsteel) joins
06:40:13shgaqnyrjp quits [Remote host closed the connection]
06:40:57shgaqnyrjp (shgaqnyrjp) joins
06:43:14<h2ibot>Exorcism edited 抽屉新热榜 (-6): https://wiki.archiveteam.org/?diff=52313&oldid=52294
06:49:02tmob_ joins
06:51:31tmob quits [Ping timeout: 255 seconds]
06:52:45tmob_ quits [Read error: Connection reset by peer]
07:05:02Unholy23619246453771 quits [Remote host closed the connection]
07:05:19Macsteel quits [Remote host closed the connection]
07:06:09Unholy23619246453771 (Unholy2361) joins
07:14:45<fireonlive>hopefully the world ends soon
07:20:44Island quits [Read error: Connection reset by peer]
07:34:19pixel (pixel) joins
07:46:43shgaqnyrjp quits [Remote host closed the connection]
07:47:18shgaqnyrjp (shgaqnyrjp) joins
08:00:39pixel leaves
08:00:39pixel (pixel) joins
08:20:37lizardexile quits [Ping timeout: 255 seconds]
08:20:38lizardexile joins
08:30:01shgaqnyrjp quits [Remote host closed the connection]
08:30:48shgaqnyrjp (shgaqnyrjp) joins
08:35:41Pedrosso quits [Remote host closed the connection]
08:35:41TheTechRobo quits [Remote host closed the connection]
08:35:41ScenarioPlanet quits [Remote host closed the connection]
08:36:06Pedrosso joins
08:36:11ScenarioPlanet (ScenarioPlanet) joins
08:36:26TheTechRobo (TheTechRobo) joins
09:00:01Bleo1826007227196 quits [Client Quit]
09:01:23Bleo1826007227196 joins
09:03:08tzt quits [Remote host closed the connection]
09:03:31tzt (tzt) joins
09:23:05Wohlstand quits [Client Quit]
10:23:03loug4 joins
11:07:06immibis quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
11:09:19immibis joins
11:10:21pie_ quits []
11:10:25pie_ joins
11:53:58Megame (Megame) joins
13:38:10sludge quits [Remote host closed the connection]
13:38:27sludge joins
13:38:46magmaus3 quits [Ping timeout: 255 seconds]
13:50:28Arcorann quits [Ping timeout: 255 seconds]
14:20:41GrooveKeeper joins
14:22:01Macsteel (Macsteel) joins
14:25:07<GrooveKeeper>Hi there, are there any plans to archive MixesDB? The website is shutting down at the end of this month: https://www.mixesdb.com/w/MixesDB:Shutdown (there are dumps at the bottom). The important part is the info about each mix, and possibly the audio as well?
14:30:25Unholy23619246453771 quits [Ping timeout: 265 seconds]
14:37:45<that_lurker>Grabbing the wiki. Needed a bit of investigation so doing it locally. Could maybe be good to run that on AB as well so it will end up in WB
14:43:30<GrooveKeeper>i have got a warrior vm. i am not sure if there are audio files in the dumps, having a look now to see
14:46:01<that_lurker>Good on the site maintainer for providing dumps
14:49:08<that_lurker>GrooveKeeper: The audio is most likely on Soundcloud.
14:49:22<GrooveKeeper>im trying to see if i can get jdownloader to grab the 9 part files
14:50:22<that_lurker>oh, they have multiple outside sources for the audio. Some are on soundcloud, mixtube and youtube. Most likely others too
14:53:55<GrooveKeeper>that_lurker there are some pages that have audio directly on them, such as https://www.mixesdb.com/w/2006-08-15_-_Above_%26_Beyond,_Paul_Oakenfold_-_Trance_Around_The_World_126 but looking closely, they appear to be hosted on archive.org
15:25:02Macsteel quits [Ping timeout: 265 seconds]
15:25:18<GrooveKeeper>this site might be quite easy to archive
15:28:39pixel quits [Client Quit]
15:28:39RealPerson quits [Quit: Gateway shutdown]
15:28:39aninternettroll-xmpp quits [Quit: Gateway shutdown]
15:28:54<that_lurker>Yeah, MediaWikis tend to be. I or someone else will run it in archivebot once the pending queue clears up, so the site will be in the wayback machine as well
15:29:22<that_lurker>of course huge thanks also go to the maintainer for releasing that page with all the links and such
15:30:06aninternettroll-xmpp joins
15:38:23<GrooveKeeper>well i have got a warrior running, i do notice it seems to often do telegram. i am not sure if that's because it's the highest priority or because there is so much to archive?
15:38:33<GrooveKeeper>thank you for the pointers.
15:38:52<GrooveKeeper>a society without history, is a society without a future
15:41:44Macsteel (Macsteel) joins
15:44:43qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:46:06<imer>GrooveKeeper: telegram needs tons of workers due to their rate limiting and there's lots of work, that's why it's usually the auto-choice :)
15:48:14Macsteel quits [Ping timeout: 265 seconds]
15:54:21<GrooveKeeper>ah thank you
15:56:18<that_lurker>Hmm. Mixesdb site is down it seems
16:02:40inedia quits [Quit: WeeChat 4.1.2]
16:04:37inedia (inedia) joins
16:56:31atphoenix_ (atphoenix) joins
16:58:58superkuh_ joins
16:58:58sludge_ joins
16:59:46aninternettroll_ (aninternettroll) joins
17:00:22atphoenix_ quits [Read error: Connection reset by peer]
17:00:23sludge quits [Read error: Connection reset by peer]
17:00:23GrooveKeeper quits [Client Quit]
17:00:23qwertyasdfuiopghjkl quits [Client Quit]
17:00:23rktk quits [Read error: Connection reset by peer]
17:00:58atphoenix_ (atphoenix) joins
17:01:14tertu2 (tertu) joins
17:02:40katocala quits [Ping timeout: 260 seconds]
17:02:40atphoenix__ quits [Ping timeout: 260 seconds]
17:02:40aninternettroll quits [Ping timeout: 260 seconds]
17:02:40aninternettroll_ is now known as aninternettroll
17:03:15rktk (rktk) joins
17:03:25katocala joins
17:05:05superkuh quits [Ping timeout: 265 seconds]
17:05:05tertu quits [Ping timeout: 265 seconds]
17:06:22atphoenix__ (atphoenix) joins
17:10:24atphoenix_ quits [Ping timeout: 265 seconds]
17:13:08qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
17:48:28GrooveKeeper joins
17:57:48Island joins
18:23:40shgaqnyrjp_ (shgaqnyrjp) joins
18:25:37shgaqnyrjp quits [Ping timeout: 250 seconds]
18:30:56shgaqnyrjp_ quits [Remote host closed the connection]
18:44:15<GrooveKeeper>i think mixesdb is being hoarded to death, which is why it seems to be showing a 403 forbidden message
18:45:53<that_lurker>I'll check up on it every now and then and start the grab once it becomes stable again.
18:47:13<GrooveKeeper>no worries are you grabbing the dump or are you archiving the pages into wayback?
18:48:22<that_lurker>the page to wb and also if possible the entire wiki with https://github.com/saveweb/wikiteam3 to Internet Archive as well
18:48:24<GrooveKeeper>i think a lot of people are grabbing the dump files, it's funny how a site that had become too complex to maintain, or closes due to lack of use, is then leeched to death as soon as they announce closure
18:49:13<that_lurker>Allowing the download of large files without rate limiting tends to do that
18:49:36<GrooveKeeper>that_lurker so thats saving the web pages onto archive.org?
18:50:37<GrooveKeeper>if that could be added to a warrior with rate limiting, then its something that can be run and if new edits come in, they can get backed up onto archive.org
18:51:17<that_lurker>GrooveKeeper: wikiteam3 is the one that saves the wiki to https://archive.org/details/wikiteam but #archivebot is the one that grabs sites to the Wayback Machine
18:52:17<that_lurker>GrooveKeeper: Archivebot can easily handle that site. Warrior would be too many connections and most likely ddos the site.
18:54:10<GrooveKeeper>ah fair play. so instead of using the warrior, which i thought was the way websites are crawled and uploaded onto archive.org, you use something else, i.e. #archivebot, to save mixesdb, which is something one person can run?
18:56:12<that_lurker>Warrior projects are site-targeted projects. Archivebot is best explained (at least better than I can) here: https://wiki.archiveteam.org/index.php?title=ArchiveBot
18:58:01<GrooveKeeper>wow, thank you, having a read now.
18:58:32<that_lurker>There is a lot of information in that wiki
18:59:06<GrooveKeeper>cheers
19:00:08<yzqzss>chouti_comments done !
19:08:20<GrooveKeeper>now i know why people start homelabs. start with a single file server, then build a low-powered desktop just for running warrior and archivebot
19:14:11<masterx244|m>Forgot to deploy temp warriors at the GPN in karlsruhe. Perfect internet there (fiber to the table) and each device there gets a public ip (yes, you need to firewall your device yourself, the LAN there is a full part of the internet)
19:17:35that_lurker drools
19:22:29Macsteel (Macsteel) joins
19:31:53tmob joins
19:35:40<masterx244|m>It's a sister event of the well-known CCC congress. CCC in germany = expect better internet than elsewhere in the country
19:37:21<that_lurker>I really need to attend ccc some year
19:37:42<masterx244|m>They get a datacenter-grade network setup for a few days pretty quick (and a few years ago, when twitch had a false-positive nipple detection on the Revision demoparty, they had the streams set up as a replacement in 10 minutes; they were quicker than twitch support for a featured event without prior announcement)
19:37:56<that_lurker>Travel to Murican events cost too much, but it's cheap to go to Germany from Finland
19:38:52<Macsteel>https://vc.gg/blog/36c3-staff-assaulted-me-for-political-reasons.html
19:39:01<thuban>"nipple detection" would be a good band name
19:40:39<masterx244|m>that_lurker: And the congress is not as commercial as defcon & friends since the main organizer is a nonprofit (and that's not as easy in germany as in the us)
19:54:43superkuh_ quits [Remote host closed the connection]
20:35:36<yzqzss>arkiver: https://transfer.archivete.am/ErtBA/chouti_links.id.originalUrl.csv.zst
20:37:26<yzqzss>(standard csv format, commas and quotes escaped)
20:38:23<yzqzss>13623632 urls, have a good day :)
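[Editor's aside: a quoted CSV like the id,originalUrl dump above can be read with any RFC 4180-aware parser. A minimal sketch, assuming the two-column layout implied by the filename; zstd decompression is omitted:]

```python
import csv
import io

def read_url_rows(text: str) -> list[tuple[str, str]]:
    """Parse a two-column (id, originalUrl) CSV. csv.reader handles
    the quoting, so commas and quotes escaped inside a field survive."""
    return [(row[0], row[1]) for row in csv.reader(io.StringIO(text))]
```

[URLs containing commas come back intact because they were quoted in the dump, which is the "commas and quotes escaped" point made above.]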
20:43:43<fireonlive>yzqzss++
20:43:44<eggdrop>[karma] 'yzqzss' now has 5 karma!
20:52:36shgaqnyrjp_ (shgaqnyrjp) joins
20:52:39leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
20:53:19shgaqnyrjp_ is now known as shgaqnyrjp
20:53:26leo60228 (leo60228) joins
20:55:17loug4 quits [Client Quit]
21:10:49Notrealname1234 (Notrealname1234) joins
21:16:01Notrealname1234 quits [Client Quit]
21:16:43GrooveKeeper quits [Client Quit]
21:25:20Notrealname1234 (Notrealname1234) joins
21:46:47Notrealname1234 quits [Client Quit]
22:13:09Megame quits [Client Quit]
22:13:19abirkill- (abirkill) joins
22:14:55abirkill quits [Ping timeout: 255 seconds]
22:14:55abirkill- is now known as abirkill
22:18:13etnguyen03 (etnguyen03) joins
22:18:16<@JAA>'Standard' CSV, good one! :-)
22:22:25<Macsteel>followup on bd25? no bouncer
22:23:40<fireonlive>i have the nzb of the nzbs; attempting to upload it to IA but ran into an issue
22:23:56<fireonlive>well, the content of the nzb of the nzbs
22:24:01<Macsteel>cool
22:25:26<Macsteel>if you mean iso's then it was said IA may reject
22:27:12<Macsteel>thank you for the interest
22:27:35<Macsteel>and bandwidth!
22:30:40midou quits [Ping timeout: 255 seconds]
22:32:28elliewebz quits [Ping timeout: 255 seconds]
22:34:34<fireonlive>:)
22:34:47<fireonlive>just the 7z.### and par2s so far
22:39:47midou joins
22:44:17<Macsteel>rar pw's were 0-999, easy crack if there's no list, but don't waste time doing it
22:44:55<Macsteel>*001-999
22:50:46<fireonlive>ah i haven't ventured deeper
22:51:29<fireonlive>do you mean the BD25.part01.rar BD25.part02.rar BD25.part03.rar?
22:51:43<fireonlive>there was g8ted for the initial .7z
22:55:06Notrealname1234 (Notrealname1234) joins
22:55:53Notrealname1234 quits [Client Quit]
22:58:20shgaqnyrjp quits [Remote host closed the connection]
22:59:05shgaqnyrjp (shgaqnyrjp) joins
23:00:41shgaqnyrjp quits [Remote host closed the connection]
23:00:42<Macsteel>no that's after each nzb is fetched individually.
23:00:58<Macsteel>the film itself
23:00:59shgaqnyrjp (shgaqnyrjp) joins
23:02:32elliewebz joins
23:04:13<fireonlive>ahh, they're in passworded rars?
23:04:35<fireonlive>is the password for them in the individual NZBs? or documented somewhere?
23:08:07<fireonlive>going back over my items and updating some metadata i left out...
23:08:08<fireonlive>zzz
23:08:15<fireonlive>🧹
23:09:58<Macsteel>it was in the index's search results
23:10:20BlueMaxima joins
23:10:30<fireonlive>ahh, so potentially different every time?
23:10:39<fireonlive>is there a backup of the passwords anywhere?
23:10:49<Macsteel>001 to 999 consistently
23:16:40<Macsteel>I don't know about a backup if there aint a list in there
23:24:29<fireonlive>oh,
23:24:43<fireonlive>i see what you mean - they were always 3 digits from 001 to 999 for the passwords
23:24:47<fireonlive>gotcha
23:24:48<Macsteel>Correct
23:24:52<fireonlive>:)
23:27:12<Macsteel>1. the big fat NZB with a gorillion NZBs (you are here)
23:27:20<Macsteel>2. each nzb downloads *.rar
23:27:25<Macsteel>3. each rar's pw is 001 to 999
23:30:47<fireonlive>gotcha
23:42:11tmob_ joins
23:45:22tmob quits [Ping timeout: 255 seconds]
23:50:20etnguyen03 quits [Client Quit]
23:51:19shgaqnyrjp_ (shgaqnyrjp) joins
23:51:20shgaqnyrjp quits [Remote host closed the connection]