00:22:00 | <SketchCow> | I have even more than you |
00:33:00 | <Nemo_bis> | 6 hours? Peanatus. My book took one month. :p |
00:42:00 | <godane> | i'm uploading amigahistory.co.uk |
00:43:00 | <godane> | not many crawls in wayback machine |
00:53:00 | <godane> | i'm grabbing arstechnica.com index |
00:55:00 | <godane> | uploaded: http://archive.org/details/amigahistory.co.uk-20121126-mirror |
02:15:00 | <dashcloud> | so is there an easy way to ask an FTP site how big it is? |
02:15:00 | <SketchCow> | no |
02:22:00 | <godane> | does anyone know how to cat a file and just echo the lines that end with / ? |
02:23:00 | <godane> | my arstechnica.com index.txt file has a lot of bad urls |
02:24:00 | <godane> | these urls are going to redirect to other urls in the list anyway |
02:42:00 | <chronomex> | godane: try grep '/$' whatever.txt |
02:43:00 | <chronomex> | $ means "here must be end-of-line" |
02:43:00 | <chronomex> | ^ is the same but for beginning-of-line |
02:43:00 | <dashcloud> | hi, wget isn't able to connect to this ftp site: ftp://ftp.gamers.org/ - any ideas why? it tries logging in as anonymous, says Error in server greeting, and then repeats the process |
02:57:00 | <SketchCow> | Might need to hide who you are |
03:03:00 | <godane> | chronomex: that only grabs the last line |
03:09:00 | <dashcloud> | ah- I got it- apparently I wasn't timed out from my last login using a non-wget client |
03:14:00 | <dashcloud> | making some progress on the list here: http://pastebin.com/NA610GXe (lot of dead sites though) |
03:15:00 | <chronomex> | godane: ??? ummmmm not sure what kind of unix you're using |
03:18:00 | <godane> | i'm doing grep '/$' index.txt |
03:18:00 | <godane> | the only line that comes up is the last one in list |
08:15:00 | <chronomex> | hm, are you sure that it's not correct? |
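A minimal sketch of the filter godane is after, written in Python so that line endings are handled explicitly. One possible reason `grep '/$'` matches only the last line is that index.txt has Windows-style CRLF line endings, so every line but the last really ends in a carriage return rather than `/`. That cause is an assumption and is not confirmed anywhere in the log. The function name and the sample URLs are hypothetical.

```python
def lines_ending_with_slash(text):
    """Return the lines of `text` whose last visible character is '/'."""
    kept = []
    # splitlines() strips \n, \r\n and \r, so a stray carriage return
    # cannot hide the trailing slash the way it can with grep '/$'.
    for line in text.splitlines():
        stripped = line.rstrip()  # also drop trailing spaces or tabs
        if stripped.endswith("/"):
            kept.append(stripped)
    return kept


if __name__ == "__main__":
    # Hypothetical sample: CRLF endings, last line without a newline.
    sample = (
        "http://arstechnica.com/a/\r\n"
        "http://arstechnica.com/b\r\n"
        "http://arstechnica.com/c/"
    )
    for url in lines_ending_with_slash(sample):
        print(url)
```

If the CRLF guess is right, stripping the carriage returns first (for example with `tr -d '\r'`) should make the original grep behave as expected too.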
14:51:00 | <SmileyG> | http://www.savewalterwhite.com/ |
17:03:00 | <soultcer> | chronomex: Is the tracker still OOM-ing? |
19:39:00 | <chronomex> | augh, it is |
19:44:00 | <chronomex> | hm, seems to have fallen over hard this time |
19:49:00 | <chronomex> | ok it's back |
19:49:00 | <ersi> | awesome, thanks man |
19:49:00 | <chronomex> | alard and I will have to discuss how to make this not happen |
20:00:00 | <ersi> | chronomex: Well, we're back at HTTP 599 |
20:00:00 | <ersi> | :< |
20:00:00 | <chronomex> | fuqqq |
20:01:00 | <ersi> | Cocks. Huge cocks. In a bowl. A Bowl of Cocks. |
20:01:00 | <ersi> | In other words, cockbowl. |
20:02:00 | <chronomex> | you sure? the website works |
20:02:00 | <ersi> | Maybe it's just my seesaw pipeline that has fucked up, let me restart that |
20:02:00 | <ersi> | but I'm basically getting a lot of connection refuses |
20:02:00 | <chronomex> | hm |
20:03:00 | <ersi> | res = http_client.fetch("http://tracker.archiveteam.org:8123/request-discover", method="POST", body="n=25&version=2") |
20:03:00 | <ersi> | tornado.httpclient.HTTPError: HTTP 599: [Errno 111] Connection refused |
20:03:00 | <chronomex> | I just kicked redis and nginx, maybe they started in the wrong order or something |
20:04:00 | <chronomex> | ah, I guess I need to start another daemon? |
20:04:00 | <ersi> | seems to be fucked up for me still unfortunately, oh well |
20:04:00 | <ersi> | mayhapples |
20:04:00 | <ersi> | seems to be the user discovery stuff |
20:04:00 | <ersi> | which very well might be separate |
20:06:00 | <chronomex> | ok, try now |
20:08:00 | <ersi> | lots better |
20:08:00 | <ersi> | hugs and kisses etc |
20:09:00 | <chronomex> | \o/ |
20:09:00 | <chronomex> | it seems that the normal failure mode is for redis to die and then something in either the website or the tracker to go tits-up and occupy 100% cpu |
20:10:00 | <chronomex> | what happened the most recent time is not exactly known; something died even more horribly than usual so all 4 cpus were at 100% and the box was entirely unresponsive |
20:13:00 | <SketchCow> | Weird. |
20:13:00 | <SketchCow> | I got a slight reprieve on the DEFCON documentary |
20:14:00 | <SketchCow> | So I can spend a little more time on archiveteam projects and things and stuff. |
20:15:00 | <ersi> | chronomex: not super strange since my pipeline was having a fun time using as much CPU as possible to throw as many connection attempts as possible at your box, I assume everyone else's would do the same. That's a lot of connections. |
20:15:00 | <chronomex> | no, I think some daemon on my side goes into spinloop |
20:16:00 | <ersi> | coolers, maybe both |
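The retry storm ersi describes is the kind of thing exponential backoff is meant to dampen. The following is only a sketch of that idea, not how the seesaw pipeline actually behaves: `fetch_with_backoff` and its parameters are hypothetical, and it catches Python's built-in `ConnectionRefusedError` to stay standard-library only, whereas the tornado client in the log reports the refusal as `tornado.httpclient.HTTPError` with code 599.

```python
import time


def fetch_with_backoff(fetch, attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each refusal.

    `fetch` is any zero-argument callable that performs the request.
    `sleep` is injectable so the waiting can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionRefusedError:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            sleep(delay)
            # Double the wait each time, but never exceed max_delay.
            delay = min(delay * 2, max_delay)
```

With the defaults, a client would wait 1, 2, 4 and 8 seconds between its five attempts instead of reconnecting as fast as the CPU allows.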
20:17:00 | <chronomex> | oh, most recent time it appears that redis didn't get OOMed, so the box was completely stuffed |
20:17:00 | <chronomex> | I should probably enlarge the swapspace |
20:21:00 | <ersi> | swap sucks, but it's better than none I guess |
20:21:00 | <ersi> | Or maybe not, maybe it's better for it to go get OOM'd |
20:24:00 | <chronomex> | I don't know |
20:24:00 | <chronomex> | next time the box falls over completely I'll take the occasion to rejigger the disk allocation |
20:46:00 | <SketchCow> | -------------------------------------------------- |
20:47:00 | <SketchCow> | BETA OF THE NEW WAYBACK MACHINE AVAILABLE |
20:47:00 | <SketchCow> | http://web-beta.archive.org/ |
20:47:00 | <SketchCow> | Please pound on it, per Brewster's invite. |
20:47:00 | <SketchCow> | Let me know if you run into anything. |
20:47:00 | <SketchCow> | -------------------------------------------------- |
20:51:00 | <chronomex> | whatall's different? |
20:51:00 | <SketchCow> | 50% more data |
20:51:00 | <SketchCow> | Right up to the moment. |
20:51:00 | <Deewiant> | http://faq.web.archive.org/whats-the-difference-between-the-classic-wayback-machine-and-the-new-beta-version/ |
20:51:00 | <DFJustin> | sweet, some of the mess wiki content is there |
20:51:00 | <chronomex> | spiffy |
20:56:00 | <SketchCow> | http://web-beta.archive.org/web/20121103192508/http://torrentfreak.com/ hooray |
20:56:00 | <swebb> | SketchCow: some links on web-beta.archive.org don't map properly to other pages. Relative links don't include the base URL from the referrer. |
20:56:00 | <swebb> | http://web-beta.archive.org/web/20120518135633/http://badcheese.com/all.html - Click on any of the blue links. |
20:59:00 | <ersi> | SketchCow: Is this a new Wayback Machine or a new Liveweb? |
21:00:00 | <SketchCow> | http://web-beta.archive.org/web/20121023010539/http://tvtropes.org/pmwiki/pmwiki.php/Main/HomePage ha HA yes |
21:00:00 | <ersi> | What? Cool! I didn't know all of Wayback Machines data was available to download via archive.org/details/blahblah.arc |
21:00:00 | <ersi> | available under the crawldata keyword |
21:01:00 | <DFJustin> | http://wayback-beta.archive.org/web/*/http://goatse.cx/* throws up an error |
21:02:00 | <DFJustin> | also the display of urls is a little screwy |
21:02:00 | <SketchCow> | http://wayback-beta.archive.org/web/*/http://www.fortunecity.com and here is a bit of cuteness |
21:03:00 | <SketchCow> | You can see the insanity of us on May 1-5 |
21:03:00 | <SketchCow> | Followed by sad little crawls of a dead site |
21:03:00 | <chronomex> | whoa insanity indeed |
21:04:00 | <chronomex> | and march |
21:04:00 | <SketchCow> | ha ha, yes |
21:06:00 | <SketchCow> | Sounds like the MESS wiki info can be transferred back |
21:07:00 | <SketchCow> | http://wayback-beta.archive.org/web/*/http://www.nytimes.com/ |
21:10:00 | <DFJustin> | parts of it anyway |
21:11:00 | <DFJustin> | there was a lot of deeply nested stuff unfortunately |
21:16:00 | <DFJustin> | this is the one I was most wanting to get back :D http://web-beta.archive.org/web/20111027173407/http://mess.redump.net/freely_available_systems |
21:18:00 | <DFJustin> | took me a lot of work to hunt those down to have something more concrete than "oh a guy said once it's cool" |
21:27:00 | <chronomex> | nice! |
21:46:00 | <ersi> | SketchCow: Got any changelist? New features? Specific bug fixes? Or is it ""just"" new data available? |
21:51:00 | <ersi> | http://wayback-beta.archive.org/web/*/http://www.fortunecity.com/* hung my Firefox Instance >_> |
21:51:00 | <ersi> | and then I got an error; "DataTables warning: Unexpected number of TD elements. Expected 99156 and got 99152. DataTables does not support rowspan / colspan in the table body, and there must be one cell for each row/column combination." |
22:09:00 | | SketchCow is on the phone with an archive about donating his stuff to an archive |
22:09:00 | <SketchCow> | (some of it) |
22:17:00 | <balrog_> | it would be nice if the new wayback frontend allowed at least URL grep |
22:18:00 | <balrog_> | since I know fulltext grep would be really, really difficult |
22:19:00 | <balrog_> | wait, that's there :P |
22:19:00 | <balrog_> | didn't think I saw it before |
22:22:00 | <chronomex> | url grep?!? |
22:22:00 | <chronomex> | neato |
22:51:00 | <SketchCow> | Uploading downloaded FTP sites |
23:46:00 | <dashcloud> | so I've updated the list from Internet Games Directory (1996's most popular FTP sites) with dead sites, inaccessible, and things that I've done/working on: http://pastebin.com/M9VzgiYc |