00:22:00<SketchCow>I have even more than you
00:33:00<Nemo_bis>6 hours? Peanuts. My book took one month. :p
00:42:00<godane>i'm uploading amigahistory.co.uk
00:43:00<godane>not many crawls in wayback machine
00:53:00<godane>i'm grabbing arstechnica.com index
00:55:00<godane>uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror
02:15:00<dashcloud>so is there an easy way to ask an FTP site how big it is?
02:15:00<SketchCow>no
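[Editor's sketch] FTP has no single "how big is this site" command, so the usual workaround is to walk the directory tree and add up file sizes. The following is a minimal sketch of that idea, assuming the server supports the MLSD listing command (many older servers do not). The function name `total_size` is hypothetical; `mlsd` is the listing method of Python's `ftplib.FTP`.

```python
def total_size(ftp, path="/"):
    """Recursively sum file sizes under `path`.

    `ftp` is expected to behave like ftplib.FTP: its mlsd() method yields
    (name, facts) pairs, where facts is a dict of lowercase fact names.
    Assumption: the server implements MLSD and reports "type" and "size".
    """
    total = 0
    for name, facts in ftp.mlsd(path, facts=["type", "size"]):
        kind = facts.get("type")
        if kind == "dir":
            # Descend into real subdirectories only; "cdir" and "pdir"
            # (the current and parent entries) are skipped.
            total += total_size(ftp, path.rstrip("/") + "/" + name)
        elif kind == "file":
            total += int(facts.get("size", 0))
    return total

# Usage against a live server (host is a placeholder, not from the log):
#   from ftplib import FTP
#   with FTP("ftp.example.org") as ftp:
#       ftp.login()                      # anonymous login
#       print(total_size(ftp, "/pub"))
```

On a large site this issues one listing per directory, so it can take a long time and should be run politely.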
02:22:00<godane>does any one know how to cat a file and just echo what ends with / at the end of the line?
02:23:00<godane>my arstechnica.com index.txt file has a lot of bad urls
02:24:00<godane>these urls are going to redirect to other urls in the list anyway
02:42:00<chronomex>godane: try grep '/$' whatever.txt
02:43:00<chronomex>$ means "here must be end-of-line"
02:43:00<chronomex>^ is the same but for beginning-of-line
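[Editor's sketch] The `$` and `^` anchors chronomex describes work the same way in Python's `re` module. This is a small illustration, not the command from the log; the function names and the sample URLs are made up for the example.

```python
import re

def lines_ending_with_slash(lines):
    """Keep lines whose last character is '/', like grep '/$'."""
    pattern = re.compile(r"/$")  # $ anchors the match at end-of-line
    return [line for line in lines if pattern.search(line)]

def lines_starting_with(prefix, lines):
    """Keep lines that begin with `prefix`, like grep '^prefix'."""
    pattern = re.compile("^" + re.escape(prefix))  # ^ anchors at start
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample data, not taken from the real index.txt.
sample = [
    "http://arstechnica.com/gadgets/",
    "http://arstechnica.com/gadgets/index.html",
    "ftp://example.org/pub/",
]
```

Here `lines_ending_with_slash(sample)` keeps the first and third entries, and `lines_starting_with("ftp://", sample)` keeps only the third.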
02:43:00<dashcloud>hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process
02:57:00<SketchCow>Might need to hide who you are
03:03:00<godane>chronomex: that only grabs the last line
03:09:00<dashcloud>ah- I got it- apparently I wasn't timed out from my last login using a non-wget client
03:14:00<dashcloud>making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though)
03:15:00<chronomex>godane: ??? ummmmm not sure what kind of unix you're using
03:18:00<godane>i'm doing grep '/$' index.txt
03:18:00<godane>the only line that comes up is the last one in list
08:15:00<chronomex>hm, are you sure that it's not correct?
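[Editor's sketch] The log never says why only the last line matched. One possible cause, offered purely as an assumption, is Windows-style CRLF line endings: every line then ends in a carriage return rather than `/`, except a final line with no trailing newline. The sketch below reproduces that symptom and shows a filter that tolerates it; the function names and URLs are hypothetical.

```python
def naive_filter(raw_text):
    """Split on "\n" only, as a line-oriented tool would, and keep
    lines ending in '/'. A stray "\r" before the newline defeats this."""
    return [line for line in raw_text.split("\n") if line.endswith("/")]

def crlf_tolerant_filter(raw_text):
    """Same filter, but strip a trailing carriage return first."""
    kept = []
    for line in raw_text.split("\n"):
        line = line.rstrip("\r")
        if line.endswith("/"):
            kept.append(line)
    return kept

# Hypothetical index with CRLF endings and no newline after the last line.
crlf_index = (
    "http://a.example/one/\r\n"
    "http://a.example/two.html\r\n"
    "http://a.example/three/"
)
```

With this input, `naive_filter` returns only the last URL, matching the symptom godane reports, while `crlf_tolerant_filter` returns both URLs that end in `/`.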
14:51:00<SmileyG>http://www.savewalterwhite.com/
17:03:00<soultcer>chronomex: Is the tracker still OOM-ing?
19:39:00<chronomex>augh, it is
19:44:00<chronomex>hm, seems to have fallen over hard this time
19:49:00<chronomex>ok it's back
19:49:00<ersi>awesome, thanks man
19:49:00<chronomex>alard and I will have to discuss how to make this not happen
20:00:00<ersi>chronomex: Well, we're back at HTTP 599
20:00:00<ersi>:<
20:00:00<chronomex>fuqqq
20:01:00<ersi>Cocks. Huge cocks. In a bowl. A Bowl of Cocks.
20:01:00<ersi>In other words, cockbowl.
20:02:00<chronomex>you sure? the website works
20:02:00<ersi>Maybe it's just my seesaw pipeline that has fucked up, let me restart that
20:02:00<ersi>but I'm basically getting a lot of connection refuses
20:02:00<chronomex>hm
20:03:00<ersi>res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2")
20:03:00<ersi>tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused
20:03:00<chronomex>I just kicked redis and nginx, maybe they started in the wrong order or something
20:04:00<chronomex>ah, I guess I need to start another daemon?
20:04:00<ersi>seems to be fucked up for me still unfortunately, oh well
20:04:00<ersi>mayhapples
20:04:00<ersi>seems to be the user discovery stuff
20:04:00<ersi>which very well might be separate
20:06:00<chronomex>ok, try now
20:08:00<ersi>lots better
20:08:00<ersi>hugs and kisses etc
20:09:00<chronomex>\o/
20:09:00<chronomex>it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu
20:10:00<chronomex>what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive
20:13:00<SketchCow>Weird.
20:13:00<SketchCow>I got a slight reprieve on the DEFCON documentary
20:14:00<SketchCow>So I can spend a little more time on archiveteam projects and things and stuff.
20:15:00<ersi>chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible to your box, I assume everyone else's would do the same. That's a lot of connections.
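[Editor's sketch] A common way to keep clients from hammering a struggling server with reconnects is exponential backoff with jitter. The following is a generic sketch of that technique, not how the seesaw pipeline actually behaves; all names and default values are hypothetical. It catches Python's built-in `ConnectionError`, so a real client would need to catch whatever exception its HTTP library raises instead.

```python
import random
import time

def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6):
    """Delays that grow geometrically from `base`, never exceeding `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]

def call_with_backoff(request, attempts=6, sleep=time.sleep, jitter=random.random):
    """Call `request()` up to `attempts` times (attempts must be >= 1).

    After each ConnectionError, wait for a jittered delay before retrying.
    `sleep` and `jitter` are injectable so the function can be tested
    without real waiting or randomness.
    """
    last_error = None
    for delay in backoff_delays(attempts=attempts):
        try:
            return request()
        except ConnectionError as error:
            last_error = error
            # Wait between 50% and 100% of the nominal delay so that
            # many clients do not all retry at the same instant.
            sleep(delay * (0.5 + jitter() / 2))
    raise last_error
```

The cap bounds the longest wait, and the jitter spreads retries out once the server comes back.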
20:15:00<chronomex>no, I think some daemon on my side goes into spinloop
20:16:00<ersi>coolers, maybe both
20:17:00<chronomex>oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed
20:17:00<chronomex>I should probably enlarge the swapspace
20:21:00<ersi>swap sucks, but it's better than none I guess
20:21:00<ersi>Or maybe not, maybe it's better for it to go get OOM'd
20:24:00<chronomex>I don't know
20:24:00<chronomex>next time the box falls over completely I'll take the occasion to rejigger the disk allocation
20:46:00<SketchCow>--------------------------------------------------
20:47:00<SketchCow>BETA OF THE NEW WAYBACK MACHINE AVAILABLE
20:47:00<SketchCow>http://web-beta.archive.org/
20:47:00<SketchCow>Please pound on it, per Brewster's invite.
20:47:00<SketchCow>Let me know if you run into anything.
20:47:00<SketchCow>--------------------------------------------------
20:51:00<chronomex>whatall's different?
20:51:00<SketchCow>50% more data
20:51:00<SketchCow>Right up to the moment.
20:51:00<Deewiant>http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/
20:51:00<DFJustin>sweet, some of the mess wiki content is there
20:51:00<chronomex>spiffy
20:56:00<SketchCow>http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray
20:56:00<swebb>SketchCow: some links on web-beta.archive.org don't map properly to other pages. Relative links don't include the base URL from the referrer.
20:56:00<swebb>http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links.
20:59:00<ersi>SketchCow: Is this a new Wayback Machine or a new Liveweb?
21:00:00<SketchCow>http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes
21:00:00<ersi>What? Cool! I didn't know all of Wayback Machine's data was available to download via archive.org/details/blahblah.arc
21:00:00<ersi>available under the crawldata keyword
21:01:00<DFJustin>http://wayback-beta.archive.org/web/*/http://goatse.cx/* throws up an error
21:02:00<DFJustin>also the display of urls is a little screwy
21:02:00<SketchCow>http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness
21:03:00<SketchCow>You can see the insanity of us on May 1-5
21:03:00<SketchCow>Followed by sad little crawls of a dead site
21:03:00<chronomex>whoa insanity indeed
21:04:00<chronomex>and march
21:04:00<SketchCow>ha ha, yes
21:06:00<SketchCow>Sounds like the MESS wiki info can be transferred back
21:07:00<SketchCow>http://wayback-beta.archive.org/web/*/http://www.nytimes.com/
21:10:00<DFJustin>parts of it anyway
21:11:00<DFJustin>there was a lot of deeply nested stuff unfortunately
21:16:00<DFJustin>this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems
21:18:00<DFJustin>took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool"
21:27:00<chronomex>nice!
21:46:00<ersi>SketchCow: Got any changelist? New features? Specific bug fixes? Or is it "just" new data available?
21:51:00<ersi>http://wayback-beta.archive.org/web/*/http://www.fortunecity.com/* hung my Firefox Instance >_>
21:51:00<ersi>and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination."
22:09:00SketchCow is on the phone with an archive about donating his stuff to an archive
22:09:00<SketchCow>(some of it)
22:17:00<balrog_>it would be nice if the new wayback frontend allowed at least URL grep
22:18:00<balrog_>since I know fulltext grep would be really, really difficult
22:19:00<balrog_>wait, that's there :P
22:19:00<balrog_>didn't think I saw it before
22:22:00<chronomex>url grep?!?
22:22:00<chronomex>neato
22:51:00<SketchCow>Uploading downloaded FTP sites
23:46:00<dashcloud>so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc