#archiveteam<efnet> log for 2012-11-27

00:22:00	<SketchCow>	I have even more than you
00:33:00	<Nemo_bis>	6 hours? Peanuts. My book took one month. :p
00:42:00	<godane>	i'm uploading amigahistory.co.uk
00:43:00	<godane>	not many crawls in wayback machine
00:53:00	<godane>	i'm grabbing arstechnica.com index
00:55:00	<godane>	uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror
02:15:00	<dashcloud>	so is there an easy way to ask an FTP site how big it is?
02:15:00	<SketchCow>	no
02:22:00	<godane>	does anyone know how to cat a file and just echo the lines that end with / ?
02:23:00	<godane>	my arstechnica.com index.txt file has a lot of bad urls
02:24:00	<godane>	these urls are going to redirect to other urls in the list anyway
02:42:00	<chronomex>	godane: try grep '/$' whatever.txt
02:43:00	<chronomex>	$ means "here must be end-of-line"
02:43:00	<chronomex>	^ is the same but for beginning-of-line
02:43:00	<dashcloud>	hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process
02:57:00	<SketchCow>	Might need to hide who you are
03:03:00	<godane>	chronomex: that only grabs the last line
03:09:00	<dashcloud>	ah- I got it- apparently I wasn't timed out from my last login using a non-wget client
03:14:00	<dashcloud>	making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though)
03:15:00	<chronomex>	godane: ??? ummmmm not sure what kind of unix you're using
03:18:00	<godane>	i'm doing grep '/$' index.txt
03:18:00	<godane>	the only line that comes up is the last one in list
08:15:00	<chronomex>	hm, are you sure that it's not correct?
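One possible explanation for the symptom above, not confirmed anywhere in the log, is that index.txt uses CRLF line endings: every line except the last then ends in a carriage return, so grep '/$' matches only the final line. A minimal Python sketch of the same filter that tolerates both line-ending styles follows; the function name and the sample URLs are made up for illustration.

```python
def lines_ending_with_slash(lines):
    """Yield lines whose last character, ignoring line terminators, is '/'."""
    for line in lines:
        # Strip both "\n" and "\r" so DOS-style (CRLF) line endings still match.
        stripped = line.rstrip("\r\n")
        if stripped.endswith("/"):
            yield stripped


# Hypothetical sample data: two CRLF-terminated lines and an unterminated last line.
sample = [
    "http://arstechnica.com/gadgets/\r\n",
    "http://arstechnica.com/gadgets\r\n",
    "http://arstechnica.com/science/",
]
kept = list(lines_ending_with_slash(sample))
print(kept)
```

To run it on a real file, pass an open file object, for example lines_ending_with_slash(open("index.txt", newline="")), where newline="" keeps the original line endings intact.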
14:51:00	<SmileyG>	http://www.savewalterwhite.com/
17:03:00	<soultcer>	chronomex: Is the tracker still OOM-ing?
19:39:00	<chronomex>	augh, it is
19:44:00	<chronomex>	hm, seems to have fallen over hard this time
19:49:00	<chronomex>	ok it's back
19:49:00	<ersi>	awesome, thanks man
19:49:00	<chronomex>	alard and I will have to discuss how to make this not happen
20:00:00	<ersi>	chronomex: Well, we're back at HTTP 599
20:00:00	<ersi>	:<
20:00:00	<chronomex>	fuqqq
20:01:00	<ersi>	Cocks. Huge cocks. In a bowl. A Bowl of Cocks.
20:01:00	<ersi>	In other words, cockbowl.
20:02:00	<chronomex>	you sure? the website works
20:02:00	<ersi>	Maybe it's just my seesaw pipeline that has fucked up, let me restart that
20:02:00	<ersi>	but I'm basically getting a lot of connection refuses
20:02:00	<chronomex>	hm
20:03:00	<ersi>	res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2")
20:03:00	<ersi>	tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused
20:03:00	<chronomex>	I just kicked redis and nginx, maybe they started in the wrong order or something
20:04:00	<chronomex>	ah, I guess I need to start another daemon?
20:04:00	<ersi>	seems to be fucked up for me still unfortunately, oh well
20:04:00	<ersi>	mayhapples
20:04:00	<ersi>	seems to be the user discovery stuff
20:04:00	<ersi>	which very well might be separate
20:06:00	<chronomex>	ok, try now
20:08:00	<ersi>	lots better
20:08:00	<ersi>	hugs and kisses etc
20:09:00	<chronomex>	\o/
20:09:00	<chronomex>	it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu
20:10:00	<chronomex>	what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive
20:13:00	<SketchCow>	Weird.
20:13:00	<SketchCow>	I got a slight reprieve on the DEFCON documentary
20:14:00	<SketchCow>	So I can spend a little more time on archiveteam projects and things and stuff.
20:15:00	<ersi>	chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible to your box, I assume everyone else's would do the same. That's a lot of connections.
20:15:00	<chronomex>	no, I think some daemon on my side goes into spinloop
20:16:00	<ersi>	coolers, maybe both
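The client-side half of the problem ersi describes (pipelines retrying refused connections as fast as they can) is commonly softened with exponential backoff. The sketch below is only an illustration of that general technique, not what the seesaw pipeline actually does: fetch_with_backoff, its parameters and the injected sleep function are all hypothetical names. It catches Python's built-in ConnectionError; a tornado-based client like the one quoted above would presumably check for tornado.httpclient.HTTPError with code 599 instead.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each refused connection."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Double the wait on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

With the defaults, a client waits roughly 1, 2, 4 and 8 seconds between its five attempts instead of reconnecting immediately, which keeps the load on a struggling tracker low.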
20:17:00	<chronomex>	oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed
20:17:00	<chronomex>	I should probably enlarge the swapspace
20:21:00	<ersi>	swap sucks, but it's better than none I guess
20:21:00	<ersi>	Or maybe not, maybe it's better for it to go get OOM'd
20:24:00	<chronomex>	I don't know
20:24:00	<chronomex>	next time the box falls over completely I'll take the occasion to rejigger the disk allocation
20:46:00	<SketchCow>	--------------------------------------------------
20:47:00	<SketchCow>	BETA OF THE NEW WAYBACK MACHINE AVAILABLE
20:47:00	<SketchCow>	http://web-beta.archive.org/
20:47:00	<SketchCow>	Please pound on it, per Brewster's invite.
20:47:00	<SketchCow>	Let me know if you run into anything.
20:47:00	<SketchCow>	--------------------------------------------------
20:51:00	<chronomex>	whatall's different?
20:51:00	<SketchCow>	50% more data
20:51:00	<SketchCow>	Right up to the moment.
20:51:00	<Deewiant>	http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/
20:51:00	<DFJustin>	sweet, some of the mess wiki content is there
20:51:00	<chronomex>	spiffy
20:56:00	<SketchCow>	http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray
20:56:00	<swebb>	SketchCow: some links don't map properly on the web-beta.archive.org to other pages. Relative links don't include the base URL from the referrer.
20:56:00	<swebb>	http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links.
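To illustrate what swebb is reporting: a relative link in an archived page has to be resolved against the original page's URL before the capture prefix is put back in front of it. The sketch below shows that resolution step with Python's urllib.parse.urljoin. It is an assumption about how such rewriting could work, not a description of the Wayback Machine's actual code; archived_link and the link target other.html are hypothetical.

```python
from urllib.parse import urljoin

WAYBACK_PREFIX = "http://web-beta.archive.org/web"  # host and path as seen in the log


def archived_link(timestamp, page_url, href):
    """Resolve href against the original page URL, then prefix the capture path."""
    # Turn the relative link into an absolute URL on the original site.
    absolute = urljoin(page_url, href)
    return f"{WAYBACK_PREFIX}/{timestamp}/{absolute}"


# "other.html" is a made-up relative link on the page swebb mentions.
print(archived_link("20120518135633", "http://badcheese.com/all.html", "other.html"))
```

Resolving against the original URL first, rather than against the full archive URL, avoids any dependence on how a URL embedded inside another URL's path gets normalized.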
20:59:00	<ersi>	SketchCow: Is this a new Wayback Machine or a new Liveweb?
21:00:00	<SketchCow>	http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes
21:00:00	<ersi>	What? Cool! I didn't know all of Wayback Machine's data was available to download via archive.org/details/blahblah.arc
21:00:00	<ersi>	available under the crawldata keyword
21:01:00	<DFJustin>	http://wayback-beta.archive.org/web//http://goatse.cx/ throws up an error
21:02:00	<DFJustin>	also the display of urls is a little screwy
21:02:00	<SketchCow>	http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness
21:03:00	<SketchCow>	You can see the insanity of us on May 1-5
21:03:00	<SketchCow>	Followed by sad little crawls of a dead site
21:03:00	<chronomex>	whoa insanity indeed
21:04:00	<chronomex>	and march
21:04:00	<SketchCow>	ha ha, yes
21:06:00	<SketchCow>	Sounds like the MESS wiki info can be transferred back
21:07:00	<SketchCow>	http://wayback-beta.archive.org/web/*/http://www.nytimes.com/
21:10:00	<DFJustin>	parts of it anyway
21:11:00	<DFJustin>	there was a lot of deeply nested stuff unfortunately
21:16:00	<DFJustin>	this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems
21:18:00	<DFJustin>	took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool"
21:27:00	<chronomex>	nice!
21:46:00	<ersi>	SketchCow: Got any changelist? New features? Specific bug fixes? Or is it ""just"" new data available?
21:51:00	<ersi>	http://wayback-beta.archive.org/web//http://www.fortunecity.com/ hung my Firefox Instance >_>
21:51:00	<ersi>	and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination."
22:09:00		SketchCow is on the phone with an archive about donating his stuff to an archive
22:09:00	<SketchCow>	(some of it)
22:17:00	<balrog_>	it would be nice if the new wayback frontend allowed at least URL grep
22:18:00	<balrog_>	since I know fulltext grep would be really, really difficult
22:19:00	<balrog_>	wait, that's there :P
22:19:00	<balrog_>	didn't think I saw it before
22:22:00	<chronomex>	url grep?!?
22:22:00	<chronomex>	neato
22:51:00	<SketchCow>	Uploading downloaded FTP sites
23:46:00	<dashcloud>	so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc