00:00:04user_ joins
00:42:09BlueMaxima joins
00:46:35sonick quits [Client Quit]
00:47:30AlsoHP_Archivist joins
00:47:50AlsoHP_Archivist quits [Remote host closed the connection]
01:00:30<h2ibot>JAABot edited CurrentWarriorProject (-6): https://wiki.archiveteam.org/?diff=49708&oldid=49666
01:11:13Arcorann (Arcorann) joins
01:14:32<h2ibot>Pokechu22 edited Enjin (+151, in progress): https://wiki.archiveteam.org/?diff=49709&oldid=49602
01:16:45Guest50 joins
01:31:55michaelblob quits [Read error: Connection reset by peer]
01:35:55michaelblob (michaelblob) joins
01:37:53lukash799 is now known as lukash
01:48:06lukash quits [Quit: The Lounge - https://thelounge.chat]
01:48:28lukash joins
01:54:10HP_Archivist quits [Client Quit]
01:58:28Guest50 quits [Client Quit]
02:11:41user_ quits [Read error: Connection reset by peer]
02:12:02user_ joins
02:14:29za3k quits [Ping timeout: 265 seconds]
02:34:42tbc1887 (tbc1887) joins
03:08:25tbc1887_ joins
03:11:22tbc1887 quits [Ping timeout: 252 seconds]
03:50:40DiscantX quits [Ping timeout: 265 seconds]
03:54:22DiscantX joins
04:10:43eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
04:11:21eroc1990 (eroc1990) joins
04:12:25eroc1990 quits [Client Quit]
04:13:00eroc1990 (eroc1990) joins
04:29:01fishingforsoup_ quits [Read error: Connection reset by peer]
04:29:34tbc1887 (tbc1887) joins
04:31:40tbc1887_ quits [Ping timeout: 252 seconds]
04:32:11Island quits [Read error: Connection reset by peer]
04:33:57pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
04:36:15pabs (pabs) joins
04:46:45Ivan226 joins
04:47:23Ivan226 leaves
04:47:33Ivan226 joins
05:36:18<pabs>company acquired: https://www.sri.com/press/press-release/the-palo-alto-research-center-parc-will-join-sri-international/ https://news.ycombinator.com/item?id=35693306
06:27:24bilboed quits [Quit: The Lounge - https://thelounge.chat]
06:27:47bilboed joins
06:30:10DiscantX quits [Ping timeout: 265 seconds]
06:47:23spirit joins
07:05:57Wolf480pl joins
07:22:55<Wolf480pl>Hello, I know a site - sunsite.icm.edu.pl - that has a lot of software mirrored from other places, mostly source tarballs, including some old versions that are no longer available upstream. They claim they have 140TB of stuff in total. I have no reason to believe it'll go away any time soon, but according to Wikipedia there used to be a lot of sunsites and now there are only 3 left, and the other 2 appear to not have any content anymore. How
07:22:55<Wolf480pl>would one go about backing it up?
07:33:23lexikiq quits [Client Quit]
07:51:21BlueMaxima quits [Read error: Connection reset by peer]
08:32:34TastyWiener95 quits [Ping timeout: 252 seconds]
08:39:25user_ quits [Read error: Connection reset by peer]
08:39:51user_ joins
08:41:00user_ quits [Remote host closed the connection]
08:41:19user_ joins
08:56:42user__ joins
09:00:00user_ quits [Ping timeout: 265 seconds]
09:23:13<@JAA>Wolf480pl: Interesting, thanks. I've been meaning to launch a project for this kind of thing, but it depends on software that isn't completed yet. There's a fair amount of duplication (e.g. http://sunsite.icm.edu.pl/debian/ == http://sunsite.icm.edu.pl/Linux/dist/debian/) and the whole thing is also available under http://ftp.icm.edu.pl/ so deduping is required. I'll put it on my list to run when the
09:23:19<@JAA>software is ready, although I have no idea at this time when that will be.
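Deduplication of the kind JAA mentions (the same tree reachable under several paths and hostnames) can be sketched generically by hashing file contents and grouping paths whose digests match. This is a minimal sketch assuming the files are already on local disk; it is not the unfinished software JAA refers to, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(paths: Iterable[Path]) -> List[List[Path]]:
    """Group paths by identical content; return only groups with 2+ members."""
    # Cheap pre-filter: files of different sizes cannot be identical,
    # so only same-size files are hashed.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in paths:
        by_size[p.stat().st_size].append(p)

    groups: List[List[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of_file(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Usage: `find_duplicate_groups(p for p in Path("mirror").rglob("*") if p.is_file())`. For a crawl, the same idea would more likely be applied to server-reported sizes and checksums before downloading, to avoid fetching 140TB of partly duplicated data twice.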
09:24:37<Wolf480pl>I also saw it on #effteepee 's list of ftp sites to archive, but I have no idea if they got it...
09:29:23<@JAA>Yeah, that project has been dead for years, so I have no idea either.
09:30:34<Wolf480pl>ok, thanks for the answers, and glad to hear the site is on your radar!
09:32:05dan_a quits [Client Quit]
09:35:25dan_a (dan_a) joins
09:42:29Wolf480pl leaves
09:49:52LeGoupil joins
10:12:59JackThompson3 quits [Ping timeout: 265 seconds]
10:20:16JackThompson3 joins
10:35:13Ruthalas5 quits [Ping timeout: 265 seconds]
10:37:12tbc1887 quits [Read error: Connection reset by peer]
10:37:22Ruthalas5 (Ruthalas) joins
12:13:20JackThompson3 quits [Ping timeout: 265 seconds]
12:16:18JackThompson3 joins
12:17:23blackdoomer joins
12:17:38<blackdoomer>Hi everyone. I want to draw your attention to a file hosting site that is apparently dying. Could someone look into Host-A.net? If it has not died already, of course, because right now it does not open for me (I'm from Russia). At one time it was very popular, for example, among the gmc.yoyogames.com community.
12:19:37<blackdoomer>Last Wayback snapshot was December 10: http://web.archive.org/web/20221210104314/https://host-a.net/
12:44:18user__ quits [Read error: Connection reset by peer]
12:44:40user__ joins
12:50:52user__ quits [Read error: Connection reset by peer]
12:51:13user__ joins
12:51:37user__ quits [Read error: Connection reset by peer]
12:51:59user__ joins
12:52:01user__ quits [Read error: Connection reset by peer]
12:52:19user__ joins
12:57:04<pabs>zero ArchiveBot coverage: https://archive.fart.website/archivebot/viewer/?q=host-a.net
12:57:20<pabs>and the site is currently down: No route to host
13:26:56<blackdoomer>whois says the domain is still registered, so let's hope it will reappear soon
13:29:58<pabs>hmm, mtr/ping to the host work, but not wget
13:36:40<masterX244>Noticed another annoying image hoster: bashify.io links get zapped there when they've had no views for a month. I wonder how many of those we've got so far in the outlinks of other projects (spotted that hoster in a Reddit comment)
13:36:43Arcorann quits [Ping timeout: 252 seconds]
13:45:05<@Sanqui>project idea (for somebody with too much free time): archive every kickstarter project as it comes in
14:17:30JackThompson37 joins
14:20:54JackThompson3 quits [Ping timeout: 252 seconds]
14:20:54JackThompson37 is now known as JackThompson3
14:34:28Eder joins
14:37:53<Eder>Hello everyone, where can I suggest a website to be crawled by ArchiveTeam?
14:41:14Ivan226 quits [Ping timeout: 265 seconds]
14:41:43programmerq quits [Ping timeout: 265 seconds]
14:41:51qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
14:52:28<Maakuth|m>Hello, for example here 🙃
14:54:08<@arkiver>Eder: ^
14:55:43hitgrr8 joins
14:58:49<Eder>Thanks! What about archiving mersenne.org (or mersenne.ca)? It's probably the biggest distributed computing project. It searches for Mersenne primes and their factors. There are reports and a large amount of information. It's not in danger of ending, but if someday that happens, it would be a very bad loss
15:20:00icedice quits [Changing host]
15:20:00icedice (icedice) joins
15:23:17icedice quits [Client Quit]
15:23:33icedice joins
15:26:50icedice quits [Client Quit]
15:27:06icedice joins
15:28:42icedice quits [Client Quit]
15:32:32LeGoupil quits [Client Quit]
15:37:43Island joins
15:44:37VerifiedJ quits [Client Quit]
15:45:05VerifiedJ (VerifiedJ) joins
15:51:56icedice joins
15:52:54icedice quits [Changing host]
15:52:54icedice (icedice) joins
15:55:20Eder quits [Remote host closed the connection]
15:56:35lunik173 quits [Remote host closed the connection]
16:06:07lunik173 joins
16:13:44za3k joins
16:30:14<icedice>What's the link to the ArchiveBot v3 dashboard again?
16:30:25<AK>http://archivebot.com/3?showNicks=1
16:30:27<icedice>It's been years since I last used it, so I forgot
16:30:29<icedice>Thanks!
16:30:43<AK>(Or without the 3 for the old school one)
16:31:20<icedice>The new one is objectively better
16:31:42<icedice>It properly shows only my archivation jobs when I search my username
16:31:57<icedice>(So I prefer that one)
16:32:46<AK>I use a mix of both, old has the right click to generate ignores which is cool though
16:33:20<icedice>Ah, that's nice
16:33:43<icedice>I had forgotten about that
16:34:57<@JAA>Old one also has keyboard shortcuts for switching through jobs, and the filter is superior.
16:38:07<@JAA>Those three features are why I use the stable dashboard exclusively.
16:38:45<icedice>I couldn't get the old one to only show my archivation jobs when searching my name though
16:39:11<icedice>But I'm guessing it has better filtering for other things then
16:39:12<@JAA>Yeah, it only filters by URL, but it supports regex while the beta one doesn't IIRC.
16:39:18<icedice>Ok
16:39:34<@JAA>My little-things repo has an archivebot-jobs script which can generate the regex for watching your jobs as a workaround.
16:40:26<@JAA>`archivebot-jobs --filter user^JAA --mode dashboard-regex`
16:49:00<icedice>Nice
16:49:03<icedice>Thanks!
16:49:23spirit quits [Client Quit]
16:49:31<icedice>Can you link to that repo as well as the ignore sets repo so that I can bookmark them?
16:58:43TastyWiener95 (TastyWiener95) joins
16:59:43<pokechu22>https://github.com/JustAnotherArchivist/little-things and https://github.com/ArchiveTeam/ArchiveBot/tree/master/db/ignore_patterns
17:01:45<AK>https://gitea.arpa.li/JustAnotherArchivist/little-things
17:02:03<AK>I would use that for JA_As stuff, it's much more likely to be updated
17:02:41<@JAA>Yep, the GitHub repo is outdated and obsolete (cf. description).
17:04:18<icedice>All right, thanks!
17:13:22myself0 (myself) joins
17:13:41myself quits [Read error: Connection reset by peer]
17:13:41myself0 is now known as myself
17:15:42user__ quits [Read error: Connection reset by peer]
17:15:56user__ joins
17:16:17user__ quits [Remote host closed the connection]
17:16:39user__ joins
17:17:09user__ quits [Read error: Connection reset by peer]
17:21:02umgr036 joins
17:21:43umgr036 quits [Read error: Connection reset by peer]
17:22:02umgr036 joins
17:37:47michaelblob quits [Read error: Connection reset by peer]
17:43:44michaelblob (michaelblob) joins
18:00:51ThreeHM quits [Ping timeout: 265 seconds]
18:10:36ThreeHM (ThreeHeadedMonkey) joins
18:44:21blackdoomer quits [Ping timeout: 265 seconds]
18:54:48umgr036 quits [Read error: Connection reset by peer]
18:55:06umgr036 joins
19:28:18umgr036 quits [Read error: Connection reset by peer]
19:50:25jtagcat quits [Quit: Bye!]
19:51:11jtagcat (jtagcat) joins
20:39:18TastyWiener95 quits [Ping timeout: 252 seconds]
20:58:28umgr036 joins
20:58:36umgr036 quits [Client Quit]
20:58:48umgr036 joins
21:06:57user_ joins
21:07:20user_ quits [Client Quit]
21:07:55user_ joins
21:11:42dumbgoy joins
21:30:49hitgrr8 quits [Client Quit]
21:36:34Apollo joins
21:36:53<Apollo>Hey all
21:37:27<Apollo>Just made a reddit post detailing my struggles but here's a repaste of it:
21:37:54<Apollo>I'm trying to find a flac download of this one soundcloud song, time (if and when) by Todd Siesel that was on TAU, but I'm new to here and I have no god damn clue on how any of these files and whatnot work.
21:48:24<Apollo>Lol I'm praying it's here somewhere
21:48:36<Apollo>I've been trying to find a high quality file of it for years
21:49:21<pokechu22>What's TAU?
21:49:26<Apollo>The Artist Union
21:50:02<pokechu22>ah, https://wiki.archiveteam.org/index.php/The_Artist_Union
21:50:33<Apollo>pokechu, you haven't begun to feel your disappointment for me yet lol
21:51:03<Apollo>clicked on the link mentioning data and found this: https://archive.org/details/archiveteam_theartistunion?tab=collection
21:51:18<Apollo>but Im definitely confused in how to interact with this here
21:51:38<pokechu22>https://web.archive.org/ is the easiest way to interact with it - that collection is the backend WARC data that web.archive.org shows
21:52:16<Apollo>trying to get to here: https://theartistunion.com/tracks/9f3dea
21:52:41<Apollo>just heading to that page on wayback failed me lol
21:53:13<Apollo>wait no nvm maybe I'm dumb here
21:53:23<pokechu22>hmm, yeah, I see a capture from archiveteam_theartistunion at https://web.archive.org/web/20190501000000*/https://theartistunion.com/tracks/9f3dea
21:54:50<pokechu22>It seems like you can at least play it via that, though the download button wants you to log into soundcloud which doesn't seem likely to work
21:54:55<Apollo>Oh god please tell me that I can still download the track and it's not locked behind a login
21:54:57<Apollo>oh god it is
21:55:17<pokechu22>It may have still been downloaded, hmm
21:55:52<pokechu22>Yeah, based on https://github.com/ArchiveTeam/theartistunion-grab/commit/b58019be9a52b62046502ca1a04137a9bafbac18 it looks like the project did try to download stuff...
21:57:37<pokechu22>... but it's POST-based, so it's not going to play back on web.archive.org
21:58:05<Apollo>It's still available on Soundcloud itself but SoundCloud itself won't stream lossless, hence my search for this file from au
21:58:06<Apollo>hm
22:00:15<Apollo>Is it still possible that the file could have been downloaded via the crawler to archive.org or is my search not fruitful at all lol
22:00:22<pokechu22>Hmm, the archive.org capture is on July 13 2019 and it looks like the logic to download was added on September 2 2019. So it may or may not be saved. They could have re-queued stuff to download it afterwards but I don't know if they actually did
22:00:32<pokechu22>It's still possible, but it's probably not easy to access it
22:01:13<@JAA>Not only is it POST-based, that project was before we had the necessary POST support in wget-at, so the association between track pages and files is lost.
22:02:17<Apollo>ah shit
22:02:31<pokechu22>Ah, so the POST request isn't captured/saved in the WARC at all, so if the file itself was downloaded it's just buried somewhere with no indication as to what it is :|
22:02:47<Apollo>jesus
22:02:49<@JAA>Checking
22:03:20<Apollo>any kind of flac/wav/alac would be a likely candidate
22:03:51<Apollo>But considering it's buried somewhere, how buried would it be lol
22:04:16<Apollo>Obviously tried the standard tactic of reaching out to the creator to see if they could provide the flac but this track isn't on their bandcamp
22:06:42<@JAA>Here it is: https://web.archive.org/web/20190713030611/https://d2tml28x3t0b85.cloudfront.net/tracks/original_files/000/509/784/original/time%20(if%20and%20when).wav
22:06:54<Apollo>NO FUCKING WAY
22:07:05<Apollo>lemme check lol
22:07:25<Apollo>def the track but making sure it's actual lossless and shit lol
22:07:47<Apollo>bitrate is 1411 kbps
22:07:48<Apollo>yep
22:07:52<pokechu22>Huh, that's from July - I guess the downloads were being processed in a different, non-POST based way before then?
22:09:02<Apollo>However you were able to find that, it would definitely be great to share this detail on wiki or something but nevertheless, thank you so much
22:09:08<Apollo>:]
22:09:28<@JAA>Bitrate alone doesn't tell you much, but yeah, it's the only lossless file format there is, anyway.
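JAA's point about bitrate follows from simple arithmetic: uncompressed PCM bitrate is sample rate × bit depth × channels, so every 16-bit, 44.1 kHz stereo WAV comes out at 1411.2 kbit/s regardless of its source, including a decoded MP3. Half the sample rate (22.05 kHz, the Nyquist frequency) is the highest frequency such a file can represent. A minimal sketch; the function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM in kbit/s: samples/s x bits/sample x channels."""
    return sample_rate_hz * bits_per_sample * channels / 1000


if __name__ == "__main__":
    print(pcm_bitrate_kbps(44_100, 16, 2))  # 1411.2 (CD-style 16-bit stereo)
    print(44_100 / 2)  # 22050.0, the Nyquist limit in Hz
```

Because the bitrate depends only on the container parameters, it cannot distinguish a true lossless master from a lossy file re-saved as WAV.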
22:09:58<Apollo>Eh
22:09:59<@JAA>I looked at the WARC contents (via the CDX file) around where the track page was fetched.
22:10:09<Apollo>Wav is uncompressed, FLAC or ALAC would be lossless
22:10:15<Apollo>since those both compress
22:10:28<@JAA>WAV is also a lossless audio format, just uncompressed.
22:10:36<Apollo>yeah
22:11:07<@JAA>But if someone converts an MP3 to WAV/FLAC/ALAC/whatever, you can't directly tell. You'd need to do a frequency analysis etc.
22:11:07<Apollo>but if there is something to be lost/not lost, it implies some compression is done
22:11:11<Apollo>yeah
22:11:13<Apollo>running Spek rn
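The frequency analysis JAA mentions can be approximated with a rough heuristic: measure what fraction of a frame's spectral energy lies above a cutoff. Lossy encoders often low-pass the signal, so a supposedly lossless file with almost nothing above roughly 16 kHz is suspicious, while content reaching toward 22.05 kHz is consistent with a genuine 44.1 kHz master. This is a minimal sketch assuming 16-bit PCM WAV input; the 16 kHz cutoff and function names are illustrative assumptions, and quiet passages or naturally band-limited recordings can fool it. A spectrogram tool like Spek shows the same information visually.

```python
import cmath
import math
import struct
import wave
from typing import List, Sequence, Tuple


def read_mono_frame(path: str, start_s: float, n: int) -> Tuple[List[float], int]:
    """Read n samples of the first channel of a 16-bit PCM WAV, starting at start_s."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = w.getframerate(), w.getnchannels()
        w.setpos(int(start_s * rate))
        raw = w.readframes(n)
    # Samples are interleaved little-endian int16; keep every `channels`-th one.
    values = struct.unpack("<" + "h" * (len(raw) // 2), raw)
    return [v / 32768 for v in values[::channels]], rate


def highband_energy_fraction(samples: Sequence[float], sample_rate_hz: int,
                             cutoff_hz: float) -> float:
    """Fraction of one Hann-windowed frame's spectral energy above cutoff_hz.

    Uses a naive O(n^2) DFT to stay dependency-free; fine for short frames.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least 2 samples")
    # Hann window reduces spectral leakage from the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(samples, window)]
    total = high = 0.0
    # Bins 0..n/2 span 0 Hz up to the Nyquist frequency (sample_rate / 2).
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(acc) ** 2
        total += power
        if k * sample_rate_hz / n > cutoff_hz:
            high += power
    return high / total if total else 0.0
```

Checking several frames spread across the track is more informative than a single frame.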
22:11:46<@JAA>The only other audio file we have for that song is this MP3: https://web.archive.org/web/20190713030607/https://d2tml28x3t0b85.cloudfront.net/tracks/stream_files/000/509/784/original/time%20(if%20and%20when).mp3?1492833816
22:12:09<TheTechRobo>MP3 is fairly easy to find, I'll have to check my code
22:12:10<Apollo>oh perfect
22:12:19<Apollo>I can run a DeltaWave on it
22:12:27dumbgoy quits [Read error: Connection reset by peer]
22:13:26dumbgoy joins
22:13:40<TheTechRobo>Looks like my old shit tool finds the mp3 by just getting the data from
22:13:42<TheTechRobo>http://web.archive.org/web/20190810013707if_/https://theartistunion.com/api/v3/tracks/<ID>.json
22:13:54<TheTechRobo>where <ID> is the identifier, ie in this case 9f3dea
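TheTechRobo's lookup can be sketched as below: build the Wayback URL for a track's archived API JSON and fetch it with a few retries, since web.archive.org sometimes refuses connections, as seen later in this log. The timestamp is copied from the example above, https is used instead of http, and the function names are hypothetical; nothing is assumed about the structure of the JSON. As TheTechRobo notes, this route leads to the MP3, not the lossless original.

```python
import json
import time
import urllib.error
import urllib.request

# Timestamp copied from TheTechRobo's example; Wayback redirects to the
# nearest capture, so it acts as a "near this date" hint.
WAYBACK_TIMESTAMP = "20190810013707"
API_TEMPLATE = "https://theartistunion.com/api/v3/tracks/{track_id}.json"


def track_json_url(track_id: str, timestamp: str = WAYBACK_TIMESTAMP) -> str:
    """Wayback URL for a track's archived JSON; 'if_' omits the Wayback toolbar."""
    return f"https://web.archive.org/web/{timestamp}if_/" + API_TEMPLATE.format(track_id=track_id)


def fetch_track_json(track_id: str, attempts: int = 3, backoff_s: float = 5.0) -> dict:
    """Fetch and parse the archived JSON, retrying on connection or HTTP errors."""
    url = track_json_url(track_id)
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp:
                return json.load(resp)
        except urllib.error.URLError:  # also covers HTTPError
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between tries
    raise RuntimeError("unreachable")
```

Usage: `fetch_track_json("9f3dea")` for the track discussed here.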
22:14:15<Apollo>tis based
22:14:26<TheTechRobo>That doesn't give you the lossless one, though
22:14:27<Apollo>also, spek has been run, wav goes up to 22khz as it should
22:14:38<Apollo>oh right lol
22:15:09<Apollo>hopefully there's a simple way to just get the lossless variant so both could be put up on the wiki for future ref or smth like that
22:15:25<TheTechRobo>Peak web design: https://tau.thetechrobo.ca/ :-)
22:15:31<TheTechRobo>Also invalid HTML, but oh well.
22:15:46<Apollo>gasp
22:15:56<@JAA>There isn't a simple way, you have to look at the CDX or WARC I believe.
22:16:21<TheTechRobo>Probably you could index the CDX and search for stuff in the mp3 filename
22:16:22<@JAA>You might be able to transform the MP3 URL into a lookup for the original_files, but I'm not sure that's reliable.
22:16:43<Apollo>doesn't seem to work for me TheTechRobo
22:16:53<Apollo>the identifier
22:16:54<Apollo>oh nvm
22:16:59<Apollo>ya fixed it
22:17:06<@JAA>> Internal Server Error
22:17:07<@JAA>Nice
22:17:17<TheTechRobo>JAA: lol
22:17:26<TheTechRobo>what did you throw at it?
22:17:37<@JAA>Just the ID from above, 9f3dea
22:17:47<@JAA>Worked the second time.
22:17:53<@JAA>¯\_(ツ)_/¯
22:17:54<Apollo>I appreciate y'all, archive.org is one of my favorite projects and I definitely like the work you all do in terms of archiving stuff and sending it to there (yes I know you all are not affiliated)
22:17:57<@JAA>Computers, how do they even work?
22:18:06<TheTechRobo>> requests.exceptions.ConnectionError: HTTPConnectionPool(host='web.archive.org', port=80): Max retries exceeded with url: /web/20190810013707if_/https://theartistunion.com/api/v3/tracks/9f3dea.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3492831a80>: Failed to establish a new connection: [Errno 111] Connection refused'))
22:18:09<TheTechRobo>yeah archive.org hates me
22:18:18<Apollo>the age old USB insertion pun
22:18:19<Apollo>3 times
22:18:20<TheTechRobo>i sometimes get this on my youtube video finder
22:18:21<@JAA>Ah, the random errors, yeah.
22:18:33<pokechu22>Yeah, I guess from https://d2tml28x3t0b85.cloudfront.net/tracks/stream_files/000/509/784/original/time%20%28if%20and%20when%29.mp3?1492833816 you can get https://web.archive.org/web/*/https://d2tml28x3t0b85.cloudfront.net/tracks/original_files/000/509/784/* - it works in this case, who knows if it works in all of them though
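pokechu22's manual transformation can be written down as a small function: swap `stream_files` for `original_files`, drop the `original/<filename>` tail (the original upload may have a different name or extension, such as .wav), and wildcard the rest for a Wayback URL search. A minimal sketch; the function name is hypothetical and, as noted, there is no guarantee this mapping holds for every track.

```python
from urllib.parse import urlsplit


def original_files_lookup(stream_url: str) -> str:
    """Map a TAU stream_files MP3 URL to a Wayback wildcard search over original_files."""
    parts = urlsplit(stream_url)  # the ?timestamp query string is discarded
    segments = parts.path.split("/")
    # Expected shape: /tracks/stream_files/000/509/784/original/<filename>
    if len(segments) < 6 or segments[1:3] != ["tracks", "stream_files"]:
        raise ValueError(f"unexpected path: {parts.path}")
    segments[2] = "original_files"
    prefix = "/".join(segments[:6])  # keep up to the three numeric ID directories
    return f"https://web.archive.org/web/*/{parts.scheme}://{parts.netloc}{prefix}/*"
```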
22:19:28<Apollo>hmm, seems like the mp3 is slightly longer than the wav lol
22:20:19<Apollo>but I'm talking like a few ms
22:20:34<TheTechRobo>odd
22:20:39<@JAA>luckcolors would be the most knowledgeable person here regarding the The Artist Union project.
22:21:09<Apollo>BREAKING NEWS:
22:21:23<Apollo>the lossy file isn't bitperfect to the lossless??????????
22:22:03<Apollo>Good to see though, gives me good hope that the shit wasn't run through an mp3 to wav converter because some musicians hate uploading lossless variants
22:22:12<@JAA>Anyway, my process was this: load the track page, check HTTP headers to get the WARC filename containing the track, download the corresponding CDX, sort it by the file offset, look for nearby audio files that seem to match. The last step could be automated through the JSON file, I think.
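The last two steps of JAA's process (sort the CDX by WARC offset, then look for audio captured near the track page) can be sketched as below. The field positions assume the common 11-field CDX layout (legend `CDX N b a m s k r M S V g`, where `a` is the original URL, `m` the MIME type and `V` the offset into the WARC); check the header of the actual CDX file before relying on it. The window size, extension list and function names are arbitrary illustrative choices, and finding the WARC filename from the HTTP headers is not covered.

```python
from dataclasses import dataclass
from typing import Iterable, List

AUDIO_EXTENSIONS = (".wav", ".flac", ".mp3", ".aif", ".aiff", ".ogg", ".m4a")


@dataclass
class CdxRecord:
    url: str
    mimetype: str
    offset: int


def parse_cdx(lines: Iterable[str]) -> List[CdxRecord]:
    """Parse 11-field CDX lines, skipping the header and malformed lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 11 or fields[0] == "CDX":
            continue
        try:
            offset = int(fields[9])
        except ValueError:
            continue
        records.append(CdxRecord(url=fields[2], mimetype=fields[3], offset=offset))
    return records


def audio_near(records: List[CdxRecord], page_url: str, window: int = 50) -> List[CdxRecord]:
    """Audio records within `window` positions of page_url, ordered by WARC offset."""
    ordered = sorted(records, key=lambda r: r.offset)
    hits = [i for i, r in enumerate(ordered) if r.url == page_url]
    if not hits:
        return []
    i = hits[0]
    nearby = ordered[max(0, i - window): i + window + 1]
    # Match on MIME type or on the file extension (ignoring any query string).
    return [r for r in nearby
            if r.mimetype.startswith("audio/")
            or r.url.split("?")[0].lower().endswith(AUDIO_EXTENSIONS)]
```

Proximity in the WARC only suggests a match, so the candidates still need checking by hand; JAA suggests the track JSON could make that last step automatic.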
22:23:14<Apollo>I'm curious if MusicBrainz Picard will recognize the file lol
22:23:19<Apollo>I have to check
22:23:38<Apollo>see if I can get some organization before it gets thrown into my Roon
22:23:44<Apollo>god I love roon
22:27:50<Apollo>Nope, he doesn't exist
22:27:54<Apollo>on picard
22:27:56<Apollo>gasp how could this be
22:31:30<Apollo>lol the image of the track in soundcloud is a black square
22:31:32<Apollo>lovely
22:42:36<Apollo>Welp, that's one tangent done for now
22:42:39<Apollo>Thanks, all
22:42:46Apollo quits [Remote host closed the connection]
23:21:28<@arkiver>"bitperfect to the lossless"
23:21:29<@arkiver>love it :)
23:51:02BlueMaxima joins