00:04:00<abartov>The Warrior is a really inspired feat of packaging! I love it.
00:15:00<godane1>someone uploaded a 2.1gb file of the last x-play episode
00:15:00<godane1>getting that one now
00:28:00<JudgeDead>Ever heard about formspring.me?
00:33:00<JudgeDead>"This Account Has Expired. As of January 2013, accounts that have not been active in over 18 months may be automatically deleted. If this is your account, you may login within the next 24 hours to stop this account from being permanently deleted."
00:33:00<JudgeDead>How nice of them. A whole 24 hours to react.
00:51:00<xk_id_>how bad is it to disrespect the robots.txt rate guidelines for bots?
00:51:00<xk_id_>ethically/legally
00:54:00<balrog_>rate guidelines? are they ridiculously strict here?
00:55:00<balrog_>also, you may violate the "spirit of the law" but who says you can't use multiple bots from multiple hosts
00:55:00<xk_id_>Crawl-delay: 5
00:55:00<xk_id_>for the Speedy spider
00:56:00<xk_id_>and I suppose our advice wouldn't really work for an EC2 cluster then, would it...
00:58:00<xk_id_>in fact... here's what they put in the robots.txt: 5 specific user-agents with crawl-delay restriction. If my crawler is diy, then I can assume it's not restricted?
00:58:00* xk_id_ chuckles
00:58:00<balrog_>you can set your own UA
00:58:00<balrog_>are they blocking all other UAs?
00:58:00<xk_id_>nope
00:58:00<balrog_>lol
00:59:00<xk_id_>ikr
00:59:00<balrog_>I'd still add a delay just to be polite
00:59:00<balrog_>but I'd probably make it lower
00:59:00<xk_id_>under "User-agent: *" they only list a bunch of "Disallow:"
00:59:00<xk_id_>Yes. I shall. say, what would be a polite, yet satisfying delay?
01:00:00<xk_id_>in fact, let's keep it to polite. (satisfying depends on my needs)
01:00:00<balrog_>1 or 2 seconds probably is what I'd do
01:00:00<xk_id_>ic
01:00:00<xk_id_>would that be cumulative across the cluster? or for just one machine?
02:08:00<chronomex>xk_id_: robots is somewhere between a request and a suggestion
02:08:00<chronomex>imo
02:08:00<xk_id_>ic
02:08:00<chronomex>polite delay is 1 second, sneaky is 10 seconds
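
[Editor's note] The crawl-delay exchange above can be sketched in code. This is a minimal illustration, not anything the participants ran. It assumes a robots.txt shaped like the one xk_id_ describes (a named agent with `Crawl-delay: 5`, and only `Disallow:` lines under `User-agent: *`). The file contents, the `diy-crawler` agent name, the `/private/` path, and the `polite_delay` helper are all hypothetical. The question of whether the delay is cumulative across a cluster went unanswered in the chat; the sketch assumes the conservative reading, where the delay is a site-wide budget shared by all workers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt modelled on the description in the chat:
# one named spider with a crawl delay, everyone else only gets Disallow rules.
ROBOTS_TXT = """\
User-agent: Speedy
Crawl-delay: 5

User-agent: *
Disallow: /private/
"""

# The "polite" figure suggested in the chat, used when robots.txt is silent.
POLITE_DEFAULT = 1.0


def polite_delay(robots_txt, user_agent, workers=1, default=POLITE_DEFAULT):
    """Return the seconds each worker should wait between its own requests.

    Assumption: the delay is a site-wide budget, so with N workers each one
    waits N times as long and the aggregate rate stays at one request per delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.crawl_delay(user_agent)  # None if no rule applies
    site_delay = float(declared) if declared is not None else default
    return site_delay * workers


if __name__ == "__main__":
    print(polite_delay(ROBOTS_TXT, "Speedy"))                  # listed agent
    print(polite_delay(ROBOTS_TXT, "diy-crawler"))             # unlisted agent
    print(polite_delay(ROBOTS_TXT, "diy-crawler", workers=4))  # shared budget
```

An unlisted agent gets no declared delay from the parser, which matches xk_id_'s observation; the fallback to a self-imposed delay follows balrog_'s "I'd still add a delay just to be polite".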
04:33:00<godane1>uploaded finally: http://archive.org/details/TechTV_Music_Wars
09:57:00<omf__>http://www.ussc.gov/ got hacked by Anonymous
09:57:00<omf__>I made a copy for the future to see
10:07:00<illunatic>omf__: archiving that?
10:12:00<omf__>I am downloading it now
10:12:00<omf__>many of the content mirrors are down so
10:13:00<omf__>I am really interested in what is in these files, no doubt it is going to piss off the gov
10:13:00<omf__>they are 150mb each
10:15:00<illunatic>:)
10:15:00<illunatic>the gov't is going all out tho
10:16:00<illunatic>trying to recruit netizens to fight in the cyber war
10:16:00<illunatic>http://www.whitehouse.gov/blog/2013/01/22/roll-your-sleeves-get-involved-and-get-civic-hacking
10:17:00<illunatic>http://activepolitic.com:82/external/1785.html?utm_source=dlvr.it&utm_medium=twitter
10:17:00<illunatic>John Kerry: Foreign Hackers Are '21st Century Nuclear Weapons'
10:18:00<ersi>#archiveteam-bs man
10:18:00<ersi>If it ain't about archiving, put it in -bs
10:18:00<omf__>the download is going super slow, I assume because everyone is slamming the shit out of those servers
11:23:00<omf__>ussc.gov dns has been taken out
11:23:00<omf__>am I the only one who backed it up?
14:41:00<db48x>aww, ussc.gov is down
14:41:00<omf__>just the dns is down
14:41:00<db48x>I always sleep at the worst times
14:41:00<db48x>ah, have you an ip address?
14:41:00<omf__>the direct ip works
14:42:00<omf__>it is in the hackernews story
14:42:00<omf__>I am still getting 2 of the files from mirrors
14:42:00<Schbirid>ncdu has new features since september: Added option to dump scanned directory information to a file (-o) & Added option to load scanned directory information from a file (-f)
14:43:00<Schbirid>this makes it the perfect tool if you want to get a nice overview of some directories, their sizes, etc
14:45:00<db48x>I guess the story left the first few pages
15:08:00<omf__>https://news.ycombinator.com/item?id=5119600
15:08:00<omf__>it is still the 2nd story on there
15:08:00<omf__>it looks like for now I am only going to get 2 of the files
15:08:00<omf__>I am keeping an eye out for more mirrors
15:24:00<db48x>omf__: that one doesn't seem to have the ip address in it
15:35:00<omf__>Yes it does but it does not matter anymore since they partially restored the site to normal
16:34:00<ats_>mmm, flashy lights: http://offog.org/stuff/arc-breakout.jpg
16:34:00<ats_>(as it turned out, I didn't need the breakout box to do anything in the end, but at least it gives me something to watch while the hard disk image is copying...)
18:27:00<db48x>omf__: which of those files have you downloaded?
18:27:00<balrog_>http://www.youtube.com/watch?v=myYzfsEOaDw
18:28:00<balrog_>http://www.youtube.com/watch?v=x3Fz1V3LZtw
18:28:00<balrog_>alternate footage of the NYC memorial; official footage of the IA memorial.
18:33:00<omf__>almost done with kennedy and scalia
18:34:00<omf__>I also got the site as screenshots
18:35:00<omf__>I hope others are getting the other files
18:36:00<balrog_>link me
18:36:00<balrog_>oh, the hn one? ok
18:37:00<omf__>http://pastebin.com/d2nvt263
18:37:00<omf__>the new anonymous thing
18:37:00<balrog_>yeah ok
18:38:00<omf__>7 parts left
18:38:00<balrog_>speeds cusk
18:38:00<balrog_>suck*
18:38:00<omf__>from what I have gathered you need all the files to decrypt everything
18:39:00<omf__>I am glad they at least have 4 mirrors since one was already taken down
18:39:00<balrog_>there is a torrent
18:40:00<balrog_>some are giving 503
18:40:00<balrog_>and most are 404
18:42:00<balrog_>eta 14h 44m
18:42:00<balrog_>for one
18:43:00<omf__>I want to add this to interesting things I collect over the years
18:43:00<omf__>like the doom 3 alpha and the half life 2 source code
18:43:00<omf__>amongst other things
18:44:00<balrog_>did you get the half life source code?
18:44:00<balrog_>err, half life alpha
18:44:00<omf__>I might have a copy of the game not the code, just HL2
18:44:00<omf__>I am not sure they ever caught who broke in
18:45:00<balrog_>wow
18:45:00<balrog_>the torrent picked up
18:45:00<balrog_>should be done in a minute
18:45:00<Schbirid>omf__: half life (1) alpha leaked some week ago
18:45:00<Schbirid>alpha as in press release disc
18:45:00<Schbirid>pretty nice early stuff
18:46:00<omf__>hmm maybe I should look around for it
18:46:00<omf__>my big thing is gaming history, way too much of that is long gone.
18:49:00<balrog_>the torrent has a "press release" flv in it
18:49:00<balrog_>and .txt
18:50:00<balrog_>omf__: I have all the files
18:50:00<omf__>you got a fast internet connection
18:50:00<balrog_>yeah, 5MB/s
18:56:00<balrog_>omf__: someone posted the whole thing on mega : https://mega.co.nz/#!V9sH3TIC!P9U_C2udtPdJyt8772o_aEiceHsV7BDxdOmwO9224Qg
18:56:00<omf__>hahah go MEGA
18:56:00<balrog_>ha, they force accepting tos
18:56:00<balrog_>most download sites don't do that for downloaders
18:58:00<omf__>I need fucking flash to use mega
18:58:00<balrog_>:[
18:58:00<balrog_>meh
18:58:00<omf__>what a huge stack of shit
18:58:00<balrog_>hold on
18:59:00<omf__>why cannot flash just die
19:00:00<balrog_>someone grab google cache of ussc.gov
19:02:00<omf__>I already got the main page saved from before. First thing I did
19:02:00<balrog_>ok
19:02:00<omf__>that and the video in case youtube pulled it
19:03:00<balrog_>see pm
19:17:00<omf__>http://no.reddit.com/r/technology/comments/17awqe/ussc_has_been_taken_down_with_an_important_message/
19:17:00<omf__>lists what was in clarity1-3
19:19:00<balrog_>hah
19:19:00<balrog_>weird
19:20:00<omf__>Does anyone else back up twitter feeds? I am only doing a few hundred so far
19:20:00<omf__>I was thinking of setting up an archive warrior so people could help
19:20:00<SketchCow>We have people backing up both twitter feeds of most followed accounts, and a sample (called the drizzle) of the main feed.
19:21:00<SketchCow>No, twitter would not be worth your effort or the effort of the warrior.
19:21:00<omf__>not for everything
19:21:00<omf__>just a few hundred
19:21:00<SketchCow>Yes, but this is literally being done by many others.
19:22:00<SketchCow>Sexy high-profile site, gets all the downloading and the backing up.
19:22:00<omf__>I would like to coordinate with them so as not to duplicate effort
19:22:00<SketchCow>Much more at risk are small communities running vbulletin or sites of people recently dead.
19:22:00<SketchCow>How can I say this?
19:22:00<omf__>I am currently doing the small half life site
19:22:00<SketchCow>Oh never mind, I did 3 times. Have fun.
19:23:00<ersi>Self-hosted content is dying a silent death; in my opinion it's a lot more important. We know US LoC gets data from Twitter
19:24:00<omf__>SketchCow, you missed my point. I want to make sure they are backing up the things I would, so I do not have to do it. Also a few of these twitter archivers that I know of do not share data because of the TOS
19:24:00<SketchCow>I totally got the point.
19:24:00<SketchCow>I have the point.
19:24:00<SketchCow>I see no reason to fight you. You want to do it, goooooooo nuts.
19:24:00<SketchCow>Some people like vanilla.
19:25:00<omf__>Let me repeat: I want to make sure they are backing up the things I would, so I do not have to do it.
19:25:00<omf__>I do not want to do it
19:26:00<ersi>Then why are you talking about it? If you want to ensure something, start an effort - maybe people tag along
19:28:00<Smiley>lol
19:28:00<omf__>To head back to my original point. I was asking if anyone else is doing it so I can stop doing it
19:29:00<Smiley>don't trust others to do what you believe should be done.
19:29:00<omf__>I would like to contact the others first and see if they would upload it to IA
19:30:00<omf__>I have got people to put things up before with a simple email, the normal response is it never dawned on them to back it up
19:30:00<Smiley>if it hadn't, why would they be here, of all places?
19:32:00<omf__>People here know people who are not here who do big data
19:32:00<omf__>finding data is like job hunting, you get more through word of mouth than anything else
19:34:00<omf__>I got a local non-profit to convert all their tapes to dvd and this year they are going to upload them to IA
19:34:00<omf__>they just wanted a place to back them up online and I proposed that solution
19:36:00<omf__>plus they have the dvd backups for their library
19:36:00<omf__>all of it is news shows from 70s-80s
19:37:00<omf__>sorry I went OT
19:44:00<SketchCow>Someone is contributing roughly 500 CD-ROM images and scans to me. That's happening in another window. More than enough good for the world today.
19:44:00<Nemo_bis>aww
19:47:00<omf__>more shareware? I love that stuff
19:47:00<SketchCow>Primarily cover discs.
19:48:00<omf__>aah from mags
19:48:00<SketchCow>And the mags.
19:50:00<omf__>Do the mags go dark until copyright expires?
19:53:00<SketchCow>Ask the question again
19:57:00<omf__>Do the magazines themselves have to go dark until the copyright on them expires?
19:58:00<db48x>given the number of other magazines that aren't dark, I'd guess that they don't
19:58:00<ersi>Unless there's a complaint, etc.
20:01:00<omf__>I am just glad we get this stuff
20:48:00<godane1>so i have about 30gb of videos from g4tv.com
21:27:00* chronomex omw to portland to pick up some of the zillions of cds that turnkit wanted
21:33:00<DFJustin>cd-roms fuck yeah
21:33:00<chronomex>some of, dunno how much I can fit in this vehicle
21:33:00<chronomex>:P
21:33:00<DFJustin>O_O
21:34:00<chronomex>well, penny each
21:34:00<SketchCow>Truck rental
21:35:00<chronomex>heh
21:36:00<DFJustin>I wonder how favourably a carload of CD-ROMs compares with fibre internet
21:36:00<SketchCow>Seriously. Truck rental.
21:53:00<chronomex>lady at goodwill doesn't understand indiscriminate purchasing
21:55:00<chronomex>"give me a shelfload of cdroms, I don't care which"
21:55:00<db48x>:)
21:55:00<chronomex>"no, you have to go pick them out on amazon"
22:00:00<DrainLbry>I've decided to start a no-kill shelter for elderly PCs. My first rescue is a rusty, cosmetically deficient, missing accessories Tandy 1000. He sits nicely on command, is housetrained, needs a little TLC. This guy deserves to live out the rest of his days in a warm and loving home. Does great with cats. Won't you be his forever home?
22:09:00<Schbirid>:)
22:12:00<SketchCow>Already running a no-kill shelter
22:26:00<Famicoman>DrainLbry you basically described my basement
22:38:00<omf__>DrainLbry, I do that too
22:38:00<omf__>I have a whole 10x10 storage unit full of old computers. It is the only thing I collect
22:38:00<omf__>I should get some pics up online
22:38:00<omf__>Does it work?
23:46:00<bsmith094>hello?
23:46:00<no2pencil>hi
23:57:00<SketchCow>whazzzzzuupppp
23:58:00<no2pencil>happy weekend, SketchCow