01:13:00 | <godane> | ivan`: good news |
01:14:00 | <godane> | we can archive it just by grabbing the html and looking for mp4 |
01:16:00 | <godane> | we have to search with google i think |
01:16:00 | <godane> | site:color.com/profile |
01:17:00 | <godane> | i think jason scott will have to look at this |
01:17:00 | <godane> | but this may be some help in getting the scripts written |
01:20:00 | <godane> | http://www.color.com/profile/fb-10714086302 |
01:21:00 | <godane> | snl had a channel |
07:23:00 | <DFJustin> | http://www.reddit.com/r/Games/comments/13glp8/sega_starts_quietly_removing_youtube_users_videos/ |
07:45:00 | <SketchCow> | We need to push this into archive.org then |
14:21:00 | <SketchCow> | Grabbing a copy of ftp.lotus.com |
14:21:00 | <SketchCow> | Since the reduction of IBM's belief in the brand means that thing could die anytime. |
14:21:00 | <SketchCow> | I think we need to think about some FTP downloading. |
14:21:00 | <SketchCow> | I mean, you know. |
14:22:00 | <balrog> | like, doing it? I could do more FTP downloading myself. Just wondering whether I should continue using lftp mirror, wget, or what :) |
14:22:00 | <balrog> | ftp is a little different from http/html |
14:22:00 | <SketchCow> | I use wget straight on, through UNIX. |
14:22:00 | <SketchCow> | That does a lot of work. |
14:22:00 | <balrog> | with WARC? |
14:23:00 | <balrog> | oh one positive for lftp: it supports foreign charsets, which nearly no other tools do |
14:23:00 | <SketchCow> | No, not with WARC. |
14:23:00 | <balrog> | ok. |
14:23:00 | <SketchCow> | I mean, I don't for this. |
14:23:00 | <SketchCow> | Unless we think it's totally dying. |
14:23:00 | <balrog> | WARC doesn't seem strictly needed for ftp, unless you know something I don't :) |
14:24:00 | <balrog> | I generally use warc for http sites no matter what |
14:25:00 | <balrog> | anyway I have multiple web site archives here I'd like to upload. should I just use that bulk uploader that was linked? |
14:28:00 | <SketchCow> | How many? |
14:28:00 | <balrog> | around 100gb that I have in front of me |
14:28:00 | <balrog> | more that's on other disks |
14:29:00 | <balrog> | 28 large tar.gzs in this folder |
14:29:00 | <balrog> | this was before I started using warc, so it's tar.gz files with index lists |
14:30:00 | <SketchCow> | Three choices. Either I can do it, or you can use scripts I wrote for S3, or you can use FTP. |
14:31:00 | <balrog> | question though, do .tar.gzs like this need any postprocessing? |
14:31:00 | <SketchCow> | Not for this purpose. |
14:31:00 | <SketchCow> | Remixes of this stuff could be done later. |
14:31:00 | <balrog> | ok |
14:32:00 | <balrog> | "Either I can do it" - how? not sure what you mean there |
14:32:00 | <SketchCow> | Wow, ftp.lotus.com is full of K-razy Stuff |
14:32:00 | <SketchCow> | I mean I give you a place to rsync and I do it. |
14:32:00 | <balrog> | yeah that would work |
14:32:00 | <balrog> | some of these were preemptive captures; others were sites that have disappeared |
14:32:00 | <SketchCow> | Everyone loves the Jason Does Everything option! |
14:33:00 | <SketchCow> | It always wins the vote |
14:33:00 | <SketchCow> | RESULTS OF VOTE: FOR: 34 AGAINST: 1 |
14:33:00 | <balrog> | well, I don't mind making a text file explaining what is what |
14:33:00 | <balrog> | I just want these in more than one place, mainly. |
14:33:00 | <balrog> | some were because of the timeframe - for example, I captured familyradio.com a few days before May 21, 2011, when they predicted the world would end. lol |
14:34:00 | <balrog> | I think stuff like that belongs in the archives :) |
14:34:00 | <norbert79> | Where I live we are around 30 years behind so I am not worrying |
14:39:00 | <godane> | uploaded: https://archive.org/details/G4.E3.10.Live.Day.1.DSR.XviD-SYS |
14:40:00 | <godane> | the rest of day 1 live coverage is with the live ea and ubisoft spotlight |
14:41:00 | <godane> | over 4 hours of video with that one |
14:43:00 | <godane> | SketchCow: i uploaded the slax.org forums today |
14:43:00 | <godane> | that was 400+ warc.gz |
15:28:00 | <underscor> | (just curious) |
15:28:00 | <underscor> | SketchCow: Any reason you use wget instead of lftp mirror? |
15:28:00 | <balrog> | both ways work. afaik wget does not work for foreign-charset ftp servers though |
15:28:00 | <balrog> | that's something to note |
15:29:00 | <underscor> | Yeah |
15:29:00 | <underscor> | Also, I like the parallelization of lftp's miror |
15:29:00 | <underscor> | mirror* |
15:30:00 | <balrog> | yes, though some servers don't like that |
15:31:00 | <balrog> | there are two types of parallelization it can do |
15:31:00 | <balrog> | multipart or multifile or both |
15:54:00 | <SketchCow> | 1. Servers hate parallelization, especially old ones we're pulling massive copies from |
15:54:00 | <SketchCow> | 2. I've used WGET for half your life. |
15:54:00 | <SmileyG> | if a server doesn't like multifile, then it deserves punching |
15:55:00 | <SketchCow> | Old ones. |
15:55:00 | <SketchCow> | Creaky end-of-life places that might crash or give up when you suck the entire contents out |
15:56:00 | <SketchCow> | I'm grabbing some amazing amount of crap out of this ftp site. |
16:02:00 | <underscor> | balrog: I always just do multifile |
16:02:00 | <underscor> | Too many misbehaving FTP servers for multipart |
16:15:00 | <balrog> | underscor: yup |
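The two kinds of parallelization discussed above map onto two options of lftp's `mirror` command: `--parallel=N` fetches several files at once (multifile) and `--use-pget-n=N` splits each file into segments (multipart). A minimal sketch of building such a command line follows; the function name, host, and directories are hypothetical, and the defaults stay sequential on the assumption that old servers may not tolerate parallel connections.

```python
from typing import List


def lftp_mirror_command(
    host: str,
    remote_dir: str,
    local_dir: str,
    parallel_files: int = 1,
    segments_per_file: int = 1,
) -> List[str]:
    """Build an argv list that runs `lftp` with a single `mirror` command.

    parallel_files > 1 enables multifile transfers (--parallel=N).
    segments_per_file > 1 enables multipart transfers (--use-pget-n=N).
    Paths are assumed to contain no spaces; no quoting is attempted.
    """
    mirror = ["mirror"]
    if parallel_files > 1:
        mirror.append(f"--parallel={parallel_files}")
    if segments_per_file > 1:
        mirror.append(f"--use-pget-n={segments_per_file}")
    mirror += [remote_dir, local_dir]
    # `-e` tells lftp to execute the given commands; `quit` ends the session.
    script = " ".join(mirror) + "; quit"
    return ["lftp", "-e", script, f"ftp://{host}"]


if __name__ == "__main__":
    # Multifile only, as underscor describes; the host is a placeholder.
    print(lftp_mirror_command("ftp.example.org", "/pub", "mirror-out", parallel_files=4))
```

The resulting list can be passed to `subprocess.run`. Leaving `segments_per_file` at 1 matches the multifile-only approach mentioned in the log.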
16:22:00 | <SketchCow> | 52222222222222222222222222222222 |
16:23:00 | <SketchCow> | says cat |
16:23:00 | <SmileyG> | hey socks |
16:37:00 | <SketchCow> | This was Jetta |
16:37:00 | <SketchCow> | The Angriest Cat in the world |
16:38:00 | <SketchCow> | When there's food available, she literally mugs me |
16:38:00 | <SketchCow> | Like, tries to trip me to make me fall |
16:38:00 | <SketchCow> | She's the meanest cat on the planet |
16:38:00 | <SketchCow> | She's an outdoor cat except when we hit frost point |
16:38:00 | <SketchCow> | So normally she just murders every living thing for a mile around the house |
16:40:00 | <balrog> | wow... |
16:41:00 | <SmileyG> | lol |
16:41:00 | <balrog> | oh yeah, the archives I'm uploading include a copy of ftp.funet.fi from 2008 |
16:41:00 | <SmileyG> | i have a cute cat who loves his food |
16:41:00 | <SmileyG> | he'll just grab on to your leg |
16:41:00 | <SmileyG> | however he's old and losing the ability to retract his claws |
16:41:00 | <SmileyG> | and as he doesn't go out much...... - ow. |
16:42:00 | <DFJustin> | balrog: sweet |
17:22:00 | <chazchaz> | After all this stuff is getting uploded, how does it get accessed? Is it all on the Wayback machine? |
17:22:00 | <chazchaz> | *is uploaded |
17:35:00 | <DFJustin> | https://archive.org/details/archiveteam |
17:42:00 | <balrog> | anyone here with IA edit privs? |
17:42:00 | <balrog> | https://archive.org/details/archiveteam-mobileme-hero should say .Mac not .me |
19:50:00 | <balrog> | http://www.petapixel.com/2012/11/20/photo-sharing-app-color-shutting-down-sold-for-7m-after-raising-41m/ |
19:50:00 | <balrog> | the amount of buzz when this thing launched was insane |
19:53:00 | <godane> | balrog: this can be backed up thanks to videos being hosted directly there |
19:54:00 | <godane> | look for a page with video player then search mp4 |
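The "grab the html and search for mp4" idea could be sketched roughly as below. This assumes the video URLs appear as quoted strings ending in `.mp4` inside the page source, which the log does not confirm; the function name, regular expression, and sample markup are illustrative only.

```python
import re
from typing import List
from urllib.parse import urljoin

# Matches a quoted string ending in .mp4, optionally followed by a query string.
MP4_URL = re.compile(r"""["']([^"'\s<>]+\.mp4(?:\?[^"'\s<>]*)?)["']""", re.IGNORECASE)


def find_mp4_urls(html: str, base_url: str) -> List[str]:
    """Return absolute .mp4 URLs found in html, in order, without duplicates."""
    found: List[str] = []
    for match in MP4_URL.finditer(html):
        # Resolve relative links against the page the HTML came from.
        url = urljoin(base_url, match.group(1))
        if url not in found:
            found.append(url)
    return found


if __name__ == "__main__":
    # Hypothetical page markup, not taken from the real site.
    sample = '<video src="/media/clip1.mp4"></video><a href="/media/clip1.mp4">dl</a>'
    print(find_mp4_urls(sample, "http://example.com/profile/someone"))
```

Each returned URL could then be handed to a downloader such as wget.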
19:56:00 | <SketchCow> | ------------------------------------------ |
19:57:00 | <SketchCow> | Here's your Delightful Statistic of the Day |
19:57:00 | <SketchCow> | size: 319,301,465,472 KB |
19:57:00 | <SketchCow> | Archive Team: 320 Terabytes of Data |
19:57:00 | <SketchCow> | You're Welcome, Internet |
19:57:00 | <SketchCow> | ------------------------------------------ |
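The reported size can be checked with a short calculation. The log does not say whether "KB" means 1000 or 1024 bytes, so this sketch computes both readings; the function names are illustrative.

```python
REPORTED_KB = 319_301_465_472  # "size" figure quoted in the log


def kb_to_terabytes(kb: int, bytes_per_kb: int = 1000) -> float:
    """Convert a KB count to decimal terabytes (10**12 bytes)."""
    return kb * bytes_per_kb / 1000**4


def kb_to_tebibytes(kb: int, bytes_per_kb: int = 1024) -> float:
    """Convert a KB count to binary tebibytes (2**40 bytes)."""
    return kb * bytes_per_kb / 1024**4


if __name__ == "__main__":
    print(f"decimal reading: {kb_to_terabytes(REPORTED_KB):.1f} TB")
    print(f"binary reading:  {kb_to_tebibytes(REPORTED_KB):.1f} TiB")
```

With 1000-byte kilobytes the figure is about 319.3 TB, which rounds to the quoted 320 terabytes. With 1024-byte kilobytes it is about 297.4 TiB.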
19:57:00 | <godane> | i hope save 1/320 of that |
19:57:00 | <godane> | *helped |
20:02:00 | <nitro2k01> | all_logs_from_all_webservers_2012.tar.xz |
20:42:00 | <SketchCow> | https://twitter.com/archiveteam/status/270990234945200128 zing |
20:49:00 | <ersi> | niceness |
21:05:00 | <ersi> | "Alert: We hope you've enjoyed sharing your stories via real-time video. Regretfully, the app will no longer be available after 12/31/2012." - Color.com |