00:00:38 | | nertzy joins |
00:02:01 | | Earendil7 quits [Ping timeout: 255 seconds] |
00:06:39 | | Arcorann (Arcorann) joins |
00:15:07 | | Earendil7 (Earendil7) joins |
00:20:23 | | davo joins |
00:20:59 | | davo quits [Client Quit] |
01:43:51 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
01:50:54 | | BlueMaxima quits [Read error: Connection reset by peer] |
02:07:43 | | tmob joins |
02:09:35 | | Macsteel (Macsteel) joins |
02:11:48 | <Macsteel> | Hello, I'm here for a suggestion. |
02:18:04 | <thuban> | fire away |
02:18:24 | <Macsteel> | BD25.eu was a rich Usenet index with the largest Bluray ISO catalog. |
02:18:44 | <Macsteel> | The site shut down, but someone made a 42gig archive of just NZBs. |
02:18:58 | <Macsteel> | These NZBs are for all Bluray ISOs. |
02:19:24 | <Macsteel> | While you can consider the site "archived," everything on Usenet is prone to retention/DMCA, like the Blurays... |
02:19:53 | <Macsteel> | Would archiving these ISOs be a project of interest? |
02:22:11 | <thuban> | theoretically yes; in practice, archiveteam data goes to the internet archive, which is itself subject to dmca and as such frowns on gross piracy. |
02:23:48 | <thuban> | (but got a link to the archive?) |
02:23:55 | <Macsteel> | Yeah |
02:24:11 | <fireonlive> | offhand do you know roughly how big the dataset would be? |
02:24:29 | <Macsteel> | 42 gigs beware. All Bluray NZBs inside. |
02:24:36 | <Macsteel> | https://nzbstars.com/?page=getnzb&action=display&messageid=bEs4UlM1Z1dmawemuG1DxAmwi0v%40spot.net |
02:24:58 | <fireonlive> | ah, could also calculate that from the NZBs themselves |
02:25:55 | <Larsenv> | Macsteel isn't that on cabal trackers? |
02:25:57 | <Larsenv> | also nzbstars sucks |
02:27:05 | <Macsteel> | well releases were "bd25", "bd50", "bd100", etc. Numbers implying the disc size(?) in ISO format. So anywhere from 25 to 100 gigs each. |
02:27:31 | <Larsenv> | yeah, I'm aware of the scrape, I bet the nzbs are out there |
02:27:55 | <Larsenv> | afaik if you download a collection of nzbs on sabnzbd it will download every nzb in the file |
02:28:00 | <Macsteel> | All NZBs are within that NZB. lol |
02:28:09 | <Larsenv> | yep, so sabnzbd will download everything it sees |
02:28:26 | <Larsenv> | I'm sure there are people who have downloaded everything, I use eweka personally |
02:28:31 | <Larsenv> | they have 14+ year retention |
02:29:12 | <Larsenv> | but remember that most of them probably have the par2 to repair em if articles go down |
02:29:30 | <fireonlive> | thuban: do you have a usenet setup? |
02:29:50 | <Macsteel> | Full hoard is petabytes for sure. |
02:29:54 | <thuban> | not at present |
02:29:55 | <@JAA> | Needs a total size estimate, but yeah, very unlikely that IA would take this. |
02:30:20 | <fireonlive> | kk i'm (attempting) to pull the nzb's contents |
02:30:32 | <fireonlive> | can dump those 43GB on IA i suppose |
02:30:49 | <@JAA> | Yeah, that sounds fine. |
02:30:58 | <fireonlive> | :) |
03:10:08 | <Larsenv> | fireonlive I do |
03:10:21 | <Larsenv> | I'm not archiving that though, the 43gb is comprised of nzbs |
03:22:52 | <Macsteel> | Do you get missing articles on eweka often? I know giganews is practically in bed with california. |
03:56:37 | | atphoenix__ (atphoenix) joins |
03:57:49 | | atphoenix_ quits [Ping timeout: 255 seconds] |
04:07:37 | | katocala joins |
04:07:37 | | katocala is now authenticated as katocala |
04:08:14 | | nertzy quits [Client Quit] |
04:53:02 | | Macsteel quits [Changing host] |
04:53:02 | | Macsteel (Macsteel) joins |
04:57:27 | <pabs> | -rss/#hackernews- Loss of nearly a full decade of information from early days of Chinese internet: https://chinamediaproject.org/2024/05/27/goldfish-memories/ https://news.ycombinator.com/item?id=40546920 |
05:03:09 | | Macsteel quits [Client Quit] |
05:05:24 | <yzqzss> | that's ture |
05:05:30 | <yzqzss> | true |
05:16:16 | | pixel leaves [Error from remote client] |
05:23:15 | | Macsteel (Macsteel) joins |
05:28:18 | <pokechu22> | That partially feels like an issue of the metadata used for date filtering not existing back then and things not being smart enough to infer based on page text (probably in addition to actual deletion) |
05:36:21 | <yzqzss> | Although the original author is not good at using search engines, the conclusion is still correct |
05:47:06 | <yzqzss> | Can be attributed to three reasons (I think): 1. extremely high bandwidth costs 2. Restrictions, censorship, fines and shutdown commands from 🫢 3. Competition from mobile apps |
05:52:23 | <yzqzss> | For example, Baidu Tieba (or Baidu Post) mentioned in the article chose to delete all posts before 2017 due to increasingly strict censorship requirements (it is costly to re-review all old posts). |
05:54:09 | <steering> | I can't speak to how much worse it is in China, but it's not like that's uncommon in the rest of the world. |
05:54:46 | <steering> | It's also costly to maintain those old posts etc. |
05:58:50 | <yzqzss> | For reason 1: The general price of most CDNs is currently 200 RMB/TB (30 USD/TB). |
06:10:36 | <yzqzss> | Peking University launched the www.infomall.cn web archive project in 2002, but the project was stopped around 2010. (Peking University still keeps these data, about 300TB.) |
06:20:33 | <yzqzss> | steering: bad world 😶‍🌫️ |
06:27:58 | | Macsteel quits [Client Quit] |
06:29:07 | | Macsteel (Macsteel) joins |
06:40:13 | | shgaqnyrjp quits [Remote host closed the connection] |
06:40:57 | | shgaqnyrjp (shgaqnyrjp) joins |
06:43:14 | <h2ibot> | Exorcism edited 抽屉新热榜 (-6): https://wiki.archiveteam.org/?diff=52313&oldid=52294 |
06:49:02 | | tmob_ joins |
06:51:31 | | tmob quits [Ping timeout: 255 seconds] |
06:52:45 | | tmob_ quits [Read error: Connection reset by peer] |
07:05:02 | | Unholy23619246453771 quits [Remote host closed the connection] |
07:05:19 | | Macsteel quits [Remote host closed the connection] |
07:06:09 | | Unholy23619246453771 (Unholy2361) joins |
07:14:45 | <fireonlive> | hopefully the world ends soon |
07:20:44 | | Island quits [Read error: Connection reset by peer] |
07:34:19 | | pixel (pixel) joins |
07:46:43 | | shgaqnyrjp quits [Remote host closed the connection] |
07:47:18 | | shgaqnyrjp (shgaqnyrjp) joins |
08:00:39 | | pixel leaves |
08:00:39 | | pixel (pixel) joins |
08:20:37 | | lizardexile quits [Ping timeout: 255 seconds] |
08:20:38 | | lizardexile joins |
08:30:01 | | shgaqnyrjp quits [Remote host closed the connection] |
08:30:48 | | shgaqnyrjp (shgaqnyrjp) joins |
08:35:41 | | Pedrosso quits [Remote host closed the connection] |
08:35:41 | | TheTechRobo quits [Remote host closed the connection] |
08:35:41 | | ScenarioPlanet quits [Remote host closed the connection] |
08:36:06 | | Pedrosso joins |
08:36:11 | | ScenarioPlanet (ScenarioPlanet) joins |
08:36:26 | | TheTechRobo (TheTechRobo) joins |
09:00:01 | | Bleo1826007227196 quits [Client Quit] |
09:01:23 | | Bleo1826007227196 joins |
09:03:08 | | tzt quits [Remote host closed the connection] |
09:03:31 | | tzt (tzt) joins |
09:23:05 | | Wohlstand quits [Client Quit] |
10:23:03 | | loug4 joins |
11:07:06 | | immibis quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
11:09:19 | | immibis joins |
11:09:19 | | immibis is now authenticated as immibis |
11:10:21 | | pie_ quits [] |
11:10:25 | | pie_ joins |
11:53:58 | | Megame (Megame) joins |
13:38:10 | | sludge quits [Remote host closed the connection] |
13:38:27 | | sludge joins |
13:38:46 | | magmaus3 quits [Ping timeout: 255 seconds] |
13:50:28 | | Arcorann quits [Ping timeout: 255 seconds] |
14:20:41 | | GrooveKeeper joins |
14:22:01 | | Macsteel (Macsteel) joins |
14:25:07 | <GrooveKeeper> | Hi there, are there any plans to archive mixes db? The website is shutting down at the end of this month: https://www.mixesdb.com/w/MixesDB:Shutdown there are dumps at the bottom. The important part is the info about each mix and possibly the audio as well. |
14:30:25 | | Unholy23619246453771 quits [Ping timeout: 265 seconds] |
14:37:45 | <that_lurker> | Grabbing the wiki. Needed a bit of investigation so doing it locally. Could maybe be good to run that on AB as well so it will end up in WB |
14:43:30 | <GrooveKeeper> | i have got a warrior vm. i am not sure if there are audio files in the dumps, having a look now to see |
14:46:01 | <that_lurker> | Good on the site maintainer for providing dumps |
14:49:08 | <that_lurker> | GrooveKeeper: The audio is most likely in Soundcloud. |
14:49:22 | <GrooveKeeper> | im trying to see if i can get jdownloader to grab the 9 part files |
14:50:22 | <that_lurker> | oh they have multiple outside sources on the audios. Some are on soundcloud, mixtube and youtube. Most likely others too |
14:53:55 | <GrooveKeeper> | that_lurker there are some pages that have audio directly on them such as https://www.mixesdb.com/w/2006-08-15_-_Above_%26_Beyond,_Paul_Oakenfold_-_Trance_Around_The_World_126 but looking closely, they appear to be hosted on archive.org |
15:25:02 | | Macsteel quits [Ping timeout: 265 seconds] |
15:25:18 | <GrooveKeeper> | this site might be quite easy to archive |
15:28:39 | | pixel quits [Client Quit] |
15:28:39 | | RealPerson quits [Quit: Gateway shutdown] |
15:28:39 | | aninternettroll-xmpp quits [Quit: Gateway shutdown] |
15:28:54 | <that_lurker> | Yeah Mediawikis tend to be. I or someone else will run it in archivebot once the pending queue clears up so the site will be in the wayback machine as well |
15:29:22 | <that_lurker> | of course huge thanks also go to the maintainer for releasing that page with all the links and such |
15:30:06 | | aninternettroll-xmpp joins |
15:38:23 | <GrooveKeeper> | well i have got a warrior running, i do notice it seems to often do telegram. i am not sure if that's because that's the highest priority or due to there being so much to archive? |
15:38:33 | <GrooveKeeper> | thank you for the pointers. |
15:38:52 | <GrooveKeeper> | a society with history, is a society without a future |
15:39:07 | <GrooveKeeper> | a society without history, is a society without a future |
15:41:44 | | Macsteel (Macsteel) joins |
15:44:43 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:46:06 | <imer> | GrooveKeeper: telegram needs tons of workers due to their rate limiting and there's a lot of work, that's why it's usually the auto-choice :) |
15:48:14 | | Macsteel quits [Ping timeout: 265 seconds] |
15:54:21 | <GrooveKeeper> | ah thank you |
15:56:18 | <that_lurker> | Hmm. Mixesdb site is down it seems |
16:02:40 | | inedia quits [Quit: WeeChat 4.1.2] |
16:04:37 | | inedia (inedia) joins |
16:56:31 | | atphoenix_ (atphoenix) joins |
16:58:58 | | superkuh_ joins |
16:58:58 | | sludge_ joins |
16:59:46 | | aninternettroll_ (aninternettroll) joins |
17:00:22 | | atphoenix_ quits [Read error: Connection reset by peer] |
17:00:23 | | sludge quits [Read error: Connection reset by peer] |
17:00:23 | | GrooveKeeper quits [Client Quit] |
17:00:23 | | qwertyasdfuiopghjkl quits [Client Quit] |
17:00:23 | | rktk quits [Read error: Connection reset by peer] |
17:00:58 | | atphoenix_ (atphoenix) joins |
17:01:14 | | tertu2 (tertu) joins |
17:02:40 | | katocala quits [Ping timeout: 260 seconds] |
17:02:40 | | atphoenix__ quits [Ping timeout: 260 seconds] |
17:02:40 | | aninternettroll quits [Ping timeout: 260 seconds] |
17:02:40 | | aninternettroll_ is now known as aninternettroll |
17:03:15 | | rktk (rktk) joins |
17:03:25 | | katocala joins |
17:03:26 | | katocala is now authenticated as katocala |
17:05:05 | | superkuh quits [Ping timeout: 265 seconds] |
17:05:05 | | tertu quits [Ping timeout: 265 seconds] |
17:06:22 | | atphoenix__ (atphoenix) joins |
17:10:24 | | atphoenix_ quits [Ping timeout: 265 seconds] |
17:13:08 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
17:48:28 | | GrooveKeeper joins |
17:57:48 | | Island joins |
18:23:40 | | shgaqnyrjp_ (shgaqnyrjp) joins |
18:25:37 | | shgaqnyrjp quits [Ping timeout: 250 seconds] |
18:30:56 | | shgaqnyrjp_ quits [Remote host closed the connection] |
18:44:15 | <GrooveKeeper> | i think mixes db is being hoarded to death, which is why it seems to be showing a 403 forbidden message |
18:45:53 | <that_lurker> | I'll check up on it every now and then and start the grab once it becomes stable again. |
18:47:13 | <GrooveKeeper> | no worries. are you grabbing the dump or are you archiving the pages into wayback? |
18:48:22 | <that_lurker> | the page to wb and also if possible the entire wiki with https://github.com/saveweb/wikiteam3 to Internet Archive as well |
18:48:24 | <GrooveKeeper> | i think a lot of people are grabbing the dump files, it's funny how a page that had become too complex to maintain, or closes due to lack of use, is then leeched to death as soon as they announce closure |
18:49:13 | <that_lurker> | Allowing the download of large files without rate limiting tends to do that |
18:49:36 | <GrooveKeeper> | that_lurker so that's saving the web pages onto archive.org? |
18:50:37 | <GrooveKeeper> | if that could be added to a warrior with rate limiting, then its something that can be run and if new edits come in, they can get backed up onto archive.org |
18:51:17 | <that_lurker> | GrooveKeeper: wikiteam3 is the one that saves the wiki to https://archive.org/details/wikiteam But #archivebot is the one that grabs sites to the Wayback Machine |
18:52:17 | <that_lurker> | GrooveKeeper: Archivebot can easily handle that site. Warrior would be too many connections and most likely ddos the site. |
18:54:10 | <GrooveKeeper> | ah fair play. so instead of using warrior, which i thought was the way websites are crawled and uploaded onto archive.org. You use something else ie #archivebot to save mixes db which is something 1 person can run? |
18:56:12 | <that_lurker> | Warrior projects are site-targeted projects. Archivebot is best explained (at least better than I can) in here https://wiki.archiveteam.org/index.php?title=ArchiveBot |
18:58:01 | <GrooveKeeper> | wow, thank you, having a read up. |
18:58:32 | <that_lurker> | There is a lot of information in that wiki |
18:59:06 | <GrooveKeeper> | cheers |
19:00:08 | <yzqzss> | chouti_comments done ! |
19:08:20 | <GrooveKeeper> | now i know why people start homelabs. start with a single file server, then build a low-powered desktop just for running warrior and archivebot |
19:14:11 | <masterx244|m> | Forgot to deploy temp warriors at the GPN in karlsruhe. Perfect internet there (fiber to the table) and each device there gets a public ip (yes, you need to firewall your device yourself, the LAN there is a full part of the internet) |
19:17:35 | | that_lurker drools |
19:22:29 | | Macsteel (Macsteel) joins |
19:31:53 | | tmob joins |
19:35:40 | <masterx244|m> | It's a sister event of the well known CCC congress. CCC in germany = expect better internet than elsewhere in the country |
19:37:21 | <that_lurker> | I really need to attend ccc some year |
19:37:42 | <masterx244|m> | They get datacenter-grade network setup for a few days pretty quick (and a few years ago when twitch had a false-positive nipple detection on the revision demoparty they had the streams set up as replacement in 10 minutes (they were quicker than twitch support for a featured event without prior announcement)) |
19:37:56 | <that_lurker> | Travel to Murican events cost too much, but it's cheap to go to Germany from Finland |
19:38:52 | <Macsteel> | https://vc.gg/blog/36c3-staff-assaulted-me-for-political-reasons.html |
19:39:01 | <thuban> | "nipple detection" would be a good band name |
19:40:39 | <masterx244|m> | that_lurker: And the congress is not as commercial as defcon & friends since the main organizer is a nonprofit (and that's not as easy in germany as in the us) |
19:54:43 | | superkuh_ quits [Remote host closed the connection] |
20:35:36 | <yzqzss> | arkiver: https://transfer.archivete.am/ErtBA/chouti_links.id.originalUrl.csv.zst |
20:37:26 | <yzqzss> | (standard csv format, commas and quotes escaped) |
20:38:23 | <yzqzss> | 13623632 urls, have a good day :) |
20:43:43 | <fireonlive> | yzqzss++ |
20:43:44 | <eggdrop> | [karma] 'yzqzss' now has 5 karma! |
20:52:36 | | shgaqnyrjp_ (shgaqnyrjp) joins |
20:52:39 | | leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in] |
20:53:19 | | shgaqnyrjp_ is now known as shgaqnyrjp |
20:53:26 | | leo60228 (leo60228) joins |
20:55:17 | | loug4 quits [Client Quit] |
21:10:49 | | Notrealname1234 (Notrealname1234) joins |
21:16:01 | | Notrealname1234 quits [Client Quit] |
21:16:43 | | GrooveKeeper quits [Client Quit] |
21:25:20 | | Notrealname1234 (Notrealname1234) joins |
21:46:47 | | Notrealname1234 quits [Client Quit] |
22:13:09 | | Megame quits [Client Quit] |
22:13:19 | | abirkill- (abirkill) joins |
22:14:55 | | abirkill quits [Ping timeout: 255 seconds] |
22:14:55 | | abirkill- is now known as abirkill |
22:18:13 | | etnguyen03 (etnguyen03) joins |
22:18:16 | <@JAA> | 'Standard' CSV, good one! :-) |
22:22:25 | <Macsteel> | followup on bd25? no bouncer |
22:23:40 | <fireonlive> | i have the nzb of the nzbs; attempting to upload it to IA but ran into an issue |
22:23:56 | <fireonlive> | well, the content of the nzb of the nzbs |
22:24:01 | <Macsteel> | cool |
22:25:26 | <Macsteel> | if you mean iso's then it was said IA may reject |
22:27:12 | <Macsteel> | thank you for the interest |
22:27:35 | <Macsteel> | and bandwidth! |
22:30:40 | | midou quits [Ping timeout: 255 seconds] |
22:32:28 | | elliewebz quits [Ping timeout: 255 seconds] |
22:34:34 | <fireonlive> | :) |
22:34:47 | <fireonlive> | just the 7z.### and par2s so far |
22:39:47 | | midou joins |
22:44:17 | <Macsteel> | rar pw's were 0-999 easy crack if there's no list but don't waste time doing it |
22:44:55 | <Macsteel> | *001-999 |
22:50:46 | <fireonlive> | ah i haven't ventured deeper |
22:51:29 | <fireonlive> | do you mean the BD25.part01.rar BD25.part02.rar BD25.part03.rar? |
22:51:43 | <fireonlive> | there was g8ted for the initial .7z |
22:55:06 | | Notrealname1234 (Notrealname1234) joins |
22:55:53 | | Notrealname1234 quits [Client Quit] |
22:58:20 | | shgaqnyrjp quits [Remote host closed the connection] |
22:59:05 | | shgaqnyrjp (shgaqnyrjp) joins |
23:00:41 | | shgaqnyrjp quits [Remote host closed the connection] |
23:00:42 | <Macsteel> | no that's after each nzb is fetched individually. |
23:00:58 | <Macsteel> | the film itself |
23:00:59 | | shgaqnyrjp (shgaqnyrjp) joins |
23:02:32 | | elliewebz joins |
23:04:13 | <fireonlive> | ahh, they're in passworded rars? |
23:04:35 | <fireonlive> | is the password for them in the individual NZBs? or documented somewhere? |
23:08:07 | <fireonlive> | going back over my items and updating some metadata i left out... |
23:08:08 | <fireonlive> | zzz |
23:08:15 | <fireonlive> | 🧹 |
23:09:58 | <Macsteel> | it was in the index's search results |
23:10:20 | | BlueMaxima joins |
23:10:30 | <fireonlive> | ahh, so potentially different every time? |
23:10:39 | <fireonlive> | is there a backup of the passwords anywhere? |
23:10:49 | <Macsteel> | 001 to 999 consistently |
23:16:40 | <Macsteel> | I don't know about a backup if there aint a list in there |
23:24:29 | <fireonlive> | oh, |
23:24:43 | <fireonlive> | i see what you mean - they were always 3 digits from 001 to 999 for the passwords |
23:24:47 | <fireonlive> | gotcha |
23:24:48 | <Macsteel> | Correct |
23:24:52 | <fireonlive> | :) |
23:27:12 | <Macsteel> | 1. the big fat NZB with a gorillion NZBs (you are here) |
23:27:20 | <Macsteel> | 2. each nzb downloads *.rar |
23:27:25 | <Macsteel> | 3. each rar's pw is 001 to 999 |
23:30:47 | <fireonlive> | gotcha |
23:42:11 | | tmob_ joins |
23:45:22 | | tmob quits [Ping timeout: 255 seconds] |
23:50:20 | | etnguyen03 quits [Client Quit] |
23:51:19 | | shgaqnyrjp_ (shgaqnyrjp) joins |
23:51:20 | | shgaqnyrjp quits [Remote host closed the connection] |