#internetarchive log for 2024-05-02


00:27:56		SootBector quits [Remote host closed the connection]
00:28:21		SootBector (SootBector) joins
00:53:23		DogsRNice joins
01:58:18		DogsRNice_ joins
02:01:43		DogsRNice quits [Ping timeout: 255 seconds]
02:14:09		BenFranske joins
02:16:01	<BenFranske>	Does anyone have any information on what "rights issue" the Internet Archive has in making old copies of the twit.tv podcasts available? All I can get from them is "we don't comment on rights issues" even though these are all released under a CC license... I was working on archiving everything TWiT has published and got to about 7500 episodes (of
02:16:02	<BenFranske>	24,500) but my account got locked and most (but not yet all) of them have been pulled down.
02:16:55	<nicolas17>	you got content taken down that was actually under Creative Commons? :|
02:17:37	<BenFranske>	Yep, I was shocked
02:17:52		tapos joins
02:19:25	<BenFranske>	I sent an email to info@archive.org asking to get some collections created and got back "Please do not upload these. We already have an archive of them and at this time they cannot be made public." I asked for clarification given the CC status and got back "we do not comment on rights issues." Then they locked my account and started pulling them
02:20:39	<BenFranske>	See my account which had 7500+ items earlier today https://archive.org/details/@benfranske and currently has 870 (though I expect that to drop, they seem to be pulling stuff in batches)
02:23:24	<BenFranske>	I'm especially annoyed because I was backing up a bunch of metadata (including but not limited to the human generated transcripts and show notes for Security Now which are also CC licensed) for each episode as well which I doubt is in whatever they "already have".
02:44:09		DogsRNice_ quits [Read error: Connection reset by peer]
03:36:48		BenFranske quits [Client Quit]
05:07:09		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
05:12:18		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:34:13		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
05:41:50		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
06:02:44		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
06:05:31		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
07:14:20		Arcorann_ joins
07:58:59		qwertyasdfuiopghjkl quits [Client Quit]
08:08:17		SootBector quits [Ping timeout: 250 seconds]
08:11:45		SootBector (SootBector) joins
09:18:13		Doran quits [Ping timeout: 255 seconds]
09:19:46		lea (lea_) joins
09:20:29	<lea>	I want to archive a site behind a login wall
09:20:38	<lea>	site in question: https://usdb.animux.de/ (hosts synced song texts for karaoke apps)
09:20:51	<lea>	is there documentation on the preferred format for uploads?
09:21:03	<lea>	since these are individual files, I guess I could just upload tens of thousands of individual files to the archive?
09:22:45		Doran (Doranwen) joins
09:26:45		Doran quits [Remote host closed the connection]
09:44:06		pabs quits [Ping timeout: 265 seconds]
09:47:03		pabs (pabs) joins
09:57:57		Doran (Doranwen) joins
10:00:05		f_ (funderscore) joins
10:14:55		Doran quits [Ping timeout: 255 seconds]
10:57:00		Doran (Doranwen) joins
11:07:43		f_ quits [Client Quit]
11:07:49		SootBector quits [Remote host closed the connection]
11:08:18		SootBector (SootBector) joins
11:44:44		thalia quits [Quit: Connection closed for inactivity]
13:27:04		Arcorann_ quits [Ping timeout: 255 seconds]
13:40:34		sonick quits [Client Quit]
13:53:17		tapos quits [Client Quit]
14:17:31		s-crypt quits [Quit: Ping timeout (120 seconds)]
14:17:47		s-crypt (s-crypt) joins
15:16:28		f_ (funderscore) joins
15:26:30		RealPerson joins
15:28:33		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
17:02:14		f_ quits [Remote host closed the connection]
17:03:00		f_ (funderscore) joins
17:09:01		f_ quits [Remote host closed the connection]
17:10:13		RealPerson leaves
17:10:29		RealPerson joins
17:11:03		f_ (funderscore) joins
17:19:55		f_ quits [Ping timeout: 250 seconds]
17:21:02		f_ (funderscore) joins
19:22:07		f_ quits [Ping timeout: 250 seconds]
21:09:01		Webuser536 joins
21:10:38	<Webuser536>	Moving my question here, is there a way of using wget to download files from the Wayback Machine, my connection gets cut off after around ~9000000 bytes.
21:13:09	<@JAA>	Hmm, do you have an example?
21:17:09	<Webuser536>	Say you go onto a website in the Wayback Machine, and you click on a direct download link on that page, on a normal browser on windows (Chrome, Firefox, etc) it downloads the file normally.
21:17:09	<Webuser536>	But if I use wget, the connection gets cut off and it's unable to fully download.
21:17:27	<Webuser536>	If you mean an example link I can go get one, if needed.
21:17:39	<@JAA>	Yeah, an example URL where you see that behaviour.
21:20:31	<@JAA>	I just tested with a random 166 MB file, and that worked just fine.
21:22:06	<Webuser536>	Ah okay, only just realized it's because I'm downloading a much bigger file, not too huge but I can't find it anywhere else on the internet.
21:28:34	<nicolas17>	Webuser536: are you sure that same file works on the browser and not on wget?
21:28:53	<nicolas17>	it's possible that the wayback machine doesn't have the full file archived
21:33:22	<@JAA>	Yeah, Save Page Now truncates large files, for example.
21:33:32	<@JAA>	Although that limit is much higher than 9 MB.
21:34:37	<Webuser536>	Just checked, it gets to 9.70k in wget and it fails at 9.20MB on browser.
21:35:04	<Webuser536>	Sad, assuming you are right about everything this probably means the file is lost.
21:36:08	<@JAA>	Maybe, maybe not, no way for us to know if you don't share the URL. :-)
21:37:10	<Webuser536>	Yeah, sorry about holding off with not sharing it.
21:37:11	<Webuser536>	https://web.archive.org/web/20220818205148/http://github-enterprise.s3.amazonaws.com/kvm/releases/github-enterprise-3.6.0.qcow2
21:37:11	<Webuser536>	I wanted to see how the contents changed compared to the currently released build.
21:39:02	<katia>	9720206 bytes via wget
21:39:10	<@JAA>	Yeah, same
21:40:02	<@JAA>	< warning: 299 wayback content truncated by "time"
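The warning quoted above can also be detected programmatically. The following is a minimal sketch, assuming the Wayback Machine reports truncation in a `Warning` response header shaped like the single line pasted in the log; the function name `wayback_truncation_reason` and the regular expression are illustrative and only cover that one observed format.

```python
import re

# Pattern modelled on the one header line quoted in the log:
#   warning: 299 wayback content truncated by "time"
# Other wordings may exist; this is an assumption, not a documented format.
_TRUNCATION_RE = re.compile(r'\b299\b.*?truncated by "([^"]*)"')


def wayback_truncation_reason(headers):
    """Return the quoted truncation reason from a Warning header, or None.

    `headers` is any mapping of header name to header value. Header names
    are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() != "warning":
            continue
        match = _TRUNCATION_RE.search(value)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    example = {"Warning": '299 wayback content truncated by "time"'}
    print(wayback_truncation_reason(example))
```

The mapping can come from any HTTP client, for example the response headers of a HEAD or GET request against the capture URL. A result other than `None` suggests the capture is incomplete.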
21:41:32	<@JAA>	Definitely incomplete, even if there might be more than the 9.7 MB.
21:41:52	<Webuser536>	Unfortunate, but semi-expected.
21:41:53	<Webuser536>	Shame that those versions are probably fully lost.
21:42:09	<@JAA>	The WARC is only 955.4 MiB, so the full file couldn't possibly be in there anyway.
21:44:11	<Webuser536>	Yeah, probably should've expected this outcome.
21:53:47	<TheTechRobo>	lea: WARC is preferred as an archival format, but if you want to avoid that whole mess (I wouldn't blame you), probably just upload it in whatever format makes the most sense
21:54:20	<TheTechRobo>	Note that many small files tend to make archive.org's servers sad, so you might want to chunk them up in tarballs or zips
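One way to follow the chunking advice above is to group the small files into a handful of tarballs before uploading. This is a minimal sketch using only the Python standard library; the function name `chunk_into_tarballs`, the default of 1000 files per archive, and the `chunk-00000.tar.gz` naming are assumptions chosen for illustration, not an archive.org requirement.

```python
import tarfile
from pathlib import Path


def chunk_into_tarballs(src_dir, out_dir, files_per_archive=1000, prefix="chunk"):
    """Pack the files under src_dir into numbered .tar.gz archives in out_dir.

    Returns the list of archive paths that were written. out_dir should be
    outside src_dir so that earlier archives are not packed again on a rerun.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Collect the file list up front, sorted so that reruns are deterministic.
    files = sorted(p for p in src.rglob("*") if p.is_file())

    archives = []
    for start in range(0, len(files), files_per_archive):
        batch = files[start:start + files_per_archive]
        archive_path = out / f"{prefix}-{start // files_per_archive:05d}.tar.gz"
        with tarfile.open(archive_path, "w:gz") as tar:
            for path in batch:
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(path, arcname=path.relative_to(src).as_posix())
        archives.append(archive_path)
    return archives
```

For tens of thousands of song text files, a call such as `chunk_into_tarballs("usdb_songs", "usdb_upload")` (both directory names hypothetical) would yield a few dozen archives rather than one file per song.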
21:54:50	<TheTechRobo>	You can also contact them directly (info@archive.org) and discuss with them
21:58:38	<that_lurker>	Webuser536: You could also maybe ask GitHub's customer service if they could give you the file. (generally most likely no)
22:09:00	<Webuser536>	Yeah, I'm assuming seeing as I'm not McDonalds or something they probably won't give it to me.
22:09:16	<Webuser536>	I will try that though.
22:12:08	<Webuser536>	Nevermind, it seems like a hassle to get in contact with specifically who would tell me the info I need.
22:18:21	<Webuser536>	There is some good news though, I just found out the University Of Oklahoma archived some old builds in 2016, meaning I could still reference back what I need to those.
22:22:54	<Webuser536>	Maybe I should've looked more, it seems like a lot of github enterprise builds WERE actually saved with Save Now.
22:23:10	<Webuser536>	I'll attempt downloading one of those to completion.
22:24:11	<Webuser536>	*Might be saved, not too sure yet
22:25:34	<Webuser536>	The Oklahoma ones are absolutely saved though.
22:25:35	<Webuser536>	Sorry for the question that I should've looked into more, I should probably try and upload these to the Internet Archive at some point.
22:26:29	<@JAA>	SPN has long been truncating by size though. So those might be incomplete, too.
22:27:12	<nicolas17>	<Webuser536> Just checked, it gets to 9.70k in wget and it fails at 9.20MB on browser.
22:27:16	<nicolas17>	I think that's you misreading units
22:27:38	<nicolas17>	9720206 bytes is 9.2MiB, not "9.70k"
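The unit mix-up above is easy to check by hand. A minimal sketch, assuming the byte count 9720206 reported earlier in the log; the helper name `describe_size` is illustrative. It shows the same size in binary units (KiB, MiB) and in decimal megabytes.

```python
def describe_size(num_bytes):
    """Express a byte count in KiB, MiB (1024-based) and MB (1000-based)."""
    return {
        "KiB": round(num_bytes / 1024, 2),
        "MiB": round(num_bytes / 1024**2, 2),
        "MB": round(num_bytes / 1000**2, 2),
    }


if __name__ == "__main__":
    # 9720206 bytes is the size both wget downloads in the log ended at.
    print(describe_size(9720206))
    # prints {'KiB': 9492.39, 'MiB': 9.27, 'MB': 9.72}
```

So the download is about 9.27 MiB (9.72 MB in decimal units), roughly 9,492 KiB rather than "9.70k".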
22:29:27	<@JAA>	Where can that University Of Oklahoma data be found?
22:30:21	<Webuser536>	No? wget does seem to be reporting ~9k whenever I try that file.
22:31:19	<Webuser536>	JAA https://archive.org/details/ArchiveIt-Partner-164
22:31:56	<@JAA>	Oh, I thought you meant outside of IA.
22:32:07	<Webuser536>	Oh, sorry for the miscommunication.
22:32:24	<Webuser536>	Nevermind, you were right. It is 9.27M.
22:32:36	<@JAA>	I'm not sure duplicating that data would be worthwhile.
22:32:48	<Webuser536>	Yeah, fair point.
22:39:37		geezabiscuit quits [Remote host closed the connection]
22:43:56		geezabiscuit (geezabiscuit) joins
23:57:06	<lea>	TheTechRobo: how happy is info@archive.org to be contacted over a small unimportant page?
