01:13:00<godane>ivan`: good news
01:14:00<godane>we can archive it just by grabbing the html and looking for mp4
01:16:00<godane>we have to search with google i think
01:16:00<godane>site:color.com/profile
01:17:00<godane>i think jason scott will have to look at this
01:17:00<godane>but this may be some help in getting the scripts written
01:20:00<godane>http://www.color.com/profile/fb-10714086302
01:21:00<godane>snl had a channel
07:23:00<DFJustin>http://www.reddit.com/r/Games/comments/13glp8/sega_starts_quietly_removing_youtube_users_videos/
07:45:00<SketchCow>We need to push this into archive.org then
14:21:00<SketchCow>Grabbing a copy of ftp.lotus.com
14:21:00<SketchCow>Since the reduction of IBM's belief in the brand means that thing could die anytime.
14:21:00<SketchCow>I think we need to think about some FTP downloading.
14:21:00<SketchCow>I mean, you know.
14:22:00<balrog>like, doing it? I could do more FTP downloading myself. Just wondering whether I should continue using lftp mirror, wget, or what :)
14:22:00<balrog>ftp is a little different from http/html
14:22:00<SketchCow>I use wget straight on, through UNIX.
14:22:00<SketchCow>That does a lot of work.
14:22:00<balrog>with WARC?
14:23:00<balrog>oh one positive for lftp: it supports foreign charsets, which nearly no other tools do
14:23:00<SketchCow>No, not with WARC.
14:23:00<balrog>ok.
14:23:00<SketchCow>I mean, I don't for this.
14:23:00<SketchCow>Unless we think it's totally dying.
14:23:00<balrog>WARC doesn't seem strictly needed for ftp, unless you know something I don't :)
14:24:00<balrog>I generally use warc for http sites no matter what
14:25:00<balrog>anyway I have multiple web site archives here I'd like to upload. should I just use that bulk uploader that was linked?
14:28:00<SketchCow>How many?
14:28:00<balrog>around 100gb that I have in front of me
14:28:00<balrog>more that's on other disks
14:29:00<balrog>28 large tar.gzs in this folder
14:29:00<balrog>this was before I started using warc, so it's tar.gz files with index lists
14:30:00<SketchCow>Three choices. Either I can do it, or you can use scripts I wrote for S3, or you can use FTP.
14:31:00<balrog>question though, do .tar.gzs like this need any postprocessing?
14:31:00<SketchCow>Not for this purpose.
14:31:00<SketchCow>Remixes of this stuff could be done later.
14:31:00<balrog>ok
14:32:00<balrog>"Either I can do it" — how? not sure what you mean there
14:32:00<SketchCow>Wow, ftp.lotus.com is full of K-razy Stuff
14:32:00<SketchCow>I mean I give you a place to rsync and I do it.
14:32:00<balrog>yeah that would work
14:32:00<balrog>some of these were preemptive captures; others were sites that have disappeared
14:32:00<SketchCow>Everyone loves the Jason Does Everything option!
14:33:00<SketchCow>It always wins the vote
14:33:00<SketchCow>RESULTS OF VOTE: FOR: 34 AGAINST: 1
14:33:00<balrog>well, I don't mind making a text file explaining what is what
14:33:00<balrog>I just want these in more than one place, mainly.
14:33:00<balrog>some were because of the timeframe — for example, I captured familyradio.com a few days before May 21, 2011, when they predicted the world would end. lol
14:34:00<balrog>I think stuff like that belongs in the archives :)
14:34:00<norbert79>Where I live we are around 30 years behind so I am not worrying
14:39:00<godane>uploaded: https://archive.org/details/G4.E3.10.Live.Day.1.DSR.XviD-SYS
14:40:00<godane>the rest of day 1 live coverage is with the live ea and ubisoft spotlight
14:41:00<godane>over 4 hours of video with that one
14:43:00<godane>SketchCow: i uploaded the slax.org forums today
14:43:00<godane>that was 400+ warc.gz
15:28:00<underscor>(just curious)
15:28:00<underscor>SketchCow: Any reason you use wget instead of lftp mirror?
15:28:00<balrog>both ways work. afaik wget does not work for foreign-charset ftp servers though
15:28:00<balrog>that's something to note
15:29:00<underscor>Yeah
15:29:00<underscor>Also, I like the parallelization of lftp's mirror
15:30:00<balrog>yes, though some servers don't like that
15:31:00<balrog>there are two types of parallelization it can do
15:31:00<balrog>multipart or multifile or both
15:54:00<SketchCow>1. Servers hate parallelization, especially old ones we're pulling massive copies from
15:54:00<SketchCow>2. I've used WGET for half your life.
15:54:00<SmileyG>if a server doesn't like multifile, then it deserves punching
15:55:00<SketchCow>Old ones.
15:55:00<SketchCow>Creaky end-of-life places that might crash or give up when you suck the entire contents out
15:56:00<SketchCow>I'm grabbing some amazing amount of crap out of this ftp site.
16:02:00<underscor>balrog: I always just do multifile
16:02:00<underscor>Too many misbehaving FTP servers for multipart
16:15:00<balrog>underscor: yup
16:22:00<SketchCow>52222222222222222222222222222222
16:23:00<SketchCow>says cat
16:23:00<SmileyG>hey socks
16:37:00<SketchCow>This was Jetta
16:37:00<SketchCow>The Angriest Cat in the world
16:38:00<SketchCow>When there's food available, she literally mugs me
16:38:00<SketchCow>Like, tries to trip me to make me fall
16:38:00<SketchCow>She's the meanest cat on the planet
16:38:00<SketchCow>She's an outdoor cat except when we hit frost point
16:38:00<SketchCow>So normally she just murders every living thing for a mile around the house
16:40:00<balrog>wow...
16:41:00<SmileyG>lol
16:41:00<balrog>oh yeah, the archives I'm uploading include a copy of ftp.funet.fi from 2008
16:41:00<SmileyG>i have a cute cat who loves his food
16:41:00<SmileyG>he'll just grab on to your leg
16:41:00<SmileyG>however he's old and losing the ability to retract his claws
16:41:00<SmileyG>and as he doesn't go out much...... - ow.
16:42:00<DFJustin>balrog: sweet
17:22:00<chazchaz>After all this stuff is uploaded, how does it get accessed? Is it all on the Wayback machine?
17:35:00<DFJustin>https://archive.org/details/archiveteam
17:42:00<balrog>anyone here with IA edit privs?
17:42:00<balrog>https://archive.org/details/archiveteam-mobileme-hero should say .Mac not .me
19:50:00<balrog>http://www.petapixel.com/2012/11/20/photo-sharing-app-color-shutting-down-sold-for-7m-after-raising-41m/
19:50:00<balrog>the amount of buzz when this thing launched was insane
19:53:00<godane>balrog: this can be backed up thanks to videos being hosted directly there
19:54:00<godane>look for a page with video player then search mp4
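The approach godane describes (take a page's HTML, then search it for mp4) could be sketched roughly as below. This is only an illustration, not anyone's actual script: the function name `find_mp4_urls`, the regular expression, the sample HTML, and the base URL are all hypothetical, and the real color.com markup may embed video links differently.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: any quoted string ending in .mp4,
# optionally followed by a query string.
MP4_PATTERN = re.compile(
    r"""["']([^"'\s<>]+\.mp4(?:\?[^"'\s<>]*)?)["']""",
    re.IGNORECASE,
)


def find_mp4_urls(html, base_url):
    """Return absolute .mp4 URLs found in html, in order, without duplicates."""
    found = []
    for match in MP4_PATTERN.finditer(html):
        # Resolve relative links against the page they came from.
        url = urljoin(base_url, match.group(1))
        if url not in found:
            found.append(url)
    return found


# Invented sample markup, standing in for a downloaded profile page.
sample_html = (
    '<video src="/media/clip1.mp4"></video>'
    '<a href="http://cdn.example.com/v/clip2.mp4?x=1">watch</a>'
    '<video src="/media/clip1.mp4"></video>'
)

for url in find_mp4_urls(sample_html, "http://www.color.com/profile/example"):
    print(url)
```

The extracted URLs could then be handed to a downloader such as wget; the profile pages themselves would still have to be discovered separately, for example through the site: search mentioned earlier.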
19:56:00<SketchCow>------------------------------------------
19:57:00<SketchCow>Here's your Delightful Statistic of the Day
19:57:00<SketchCow>size: 319,301,465,472 KB
19:57:00<SketchCow>Archive Team: 320 Terabytes of Data
19:57:00<SketchCow>You're Welcome, Internet
19:57:00<SketchCow>------------------------------------------
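As a rough check on the statistic above, the conversion from the reported KB figure to terabytes can be sketched as follows. Whether the reported "KB" means 1000 or 1024 bytes is not stated in the log, so the sketch computes both as an assumption; the helper names are hypothetical.

```python
# Reported collection size, in KB, from the statistic above.
SIZE_KB = 319_301_465_472


def kb_to_tb(size_kb, bytes_per_kb=1000):
    """Convert a KB count to decimal terabytes (10**12 bytes)."""
    return size_kb * bytes_per_kb / 1000**4


def kb_to_tib(size_kb):
    """Convert a KB count to tebibytes, treating 1 KB as 1024 bytes."""
    return size_kb * 1024 / 1024**4


print(f"decimal KB -> TB : {kb_to_tb(SIZE_KB):.1f}")
print(f"binary KB  -> TB : {kb_to_tb(SIZE_KB, 1024):.1f}")
print(f"binary KB  -> TiB: {kb_to_tib(SIZE_KB):.1f}")
```

Under the decimal reading the figure comes to a little over 319 TB, which is consistent with the rounded "320 Terabytes" quoted.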
19:57:00<godane>i hope save 1/320 of that
19:57:00<godane>*helped
20:02:00<nitro2k01>all_logs_from_all_webservers_2012.tar.xz
20:42:00<SketchCow>https://twitter.com/archiveteam/status/270990234945200128 zing
20:49:00<ersi>niceness
21:05:00<ersi>"Alert: We hope you've enjoyed sharing your stories via real-time video. Regretfully, the app will no longer be available after 12/31/2012." - Color.com