09:20:00 | <kennethre> | is there any way to upload something to archive.org without creative commons? |
09:21:00 | <DFJustin> | sure |
09:21:00 | <kennethre> | i see, it's optional |
09:27:00 | <kennethre> | there's no generic 'data' category? |
09:27:00 | <kennethre> | has to be audio, movie, or text? |
09:29:00 | <Coderjoe> | you're using the form, aren't you? |
09:29:00 | <kennethre> | yes |
09:29:00 | <kennethre> | is there an api? |
09:29:00 | <kennethre> | sorry, i've never really investigated this before :) |
09:29:00 | <Coderjoe> | there are other categories, just not available through the web form |
09:30:00 | <kennethre> | ah excellent |
09:30:00 | <Coderjoe> | http://archive.org/help/abouts3.txt |
09:30:00 | <kennethre> | oh god, perfect |
09:30:00 | <kennethre> | thank you |
09:32:00 | <kennethre> | i'm building a 'blackbox' system for everything i ever create |
09:32:00 | <kennethre> | and the goal is for it to be as permanent as possible |
09:33:00 | <Coderjoe> | however, unless you are an admin, you can only upload to one of a few collections |
09:33:00 | <kennethre> | Coderjoe: wonder if i can get a collection added for myself |
09:33:00 | <Coderjoe> | (which the web form picked via the category you chose) |
09:34:00 | <kennethre> | that'd be ideal |
09:49:00 | <kennethre> | ideally i'll have a warc for everything too |
09:49:00 | <kennethre> | but we'll see |
10:10:00 | <chronomex> | Coderjoe: you can be added to the approve list for a collection, of course |
10:27:00 | <Nemo_bis> | mediatype can be set to anything by anyone |
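Note: the upload route discussed above (the S3-like API from abouts3.txt, with mediatype and collection set by the uploader) could look roughly like the sketch below. It is a minimal illustration, not official client code. The endpoint and header names are recalled from that help document and should be checked against it; the function name, identifier, and credentials are hypothetical. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint of the S3-like API; verify against abouts3.txt.
S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, data, access_key, secret_key,
                         mediatype="data", collection=None):
    """Build (but do not send) a PUT request for one file in one item.

    build_upload_request is a hypothetical helper, not part of any library.
    """
    url = "{}/{}/{}".format(
        S3_ENDPOINT,
        urllib.parse.quote(identifier, safe=""),
        urllib.parse.quote(filename),
    )
    headers = {
        # Keys come from the account's S3 key page.
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask for the item to be created if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
        # Item metadata travels as x-archive-meta-* headers.
        "x-archive-meta-mediatype": mediatype,
    }
    if collection is not None:
        # Non-admins can only name a collection they are allowed to add to.
        headers["x-archive-meta-collection"] = collection
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Sending would then be `urllib.request.urlopen(build_upload_request(...))`. Leaving the collection header out keeps the sketch neutral about which collection an ordinary account may use.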
10:27:00 | <godane> | i'm starting to hate the speed of ftp |
10:27:00 | <Nemo_bis> | godane: only now? |
10:28:00 | <godane> | it normally works fine |
10:28:00 | <Nemo_bis> | No. It doesn't. |
10:28:00 | <godane> | for me it does |
10:28:00 | <godane> | but every so often the speed becomes very slow |
10:28:00 | <Nemo_bis> | Maybe you're the only user left. https://archive.org/~tracey/mrtg/ftp.html |
10:29:00 | <Nemo_bis> | Every time a single other person tries to use it, you're both ruined. ;) |
10:29:00 | <Famicoman> | I'm using it |
10:30:00 | <godane> | i'm not that good with scripting uploads to s3 |
10:30:00 | <Famicoman> | I kept getting errors that the drive was full earlier |
10:30:00 | <kennethre> | is there anyone here i should bother for a 'kennethreitz' collection, or should i go through the normal process? |
10:30:00 | <kennethre> | /cc @chronomex |
10:30:00 | <chronomex> | hi |
10:31:00 | <chronomex> | I think underscor or SketchCow are the people to ask |
10:31:00 | <kennethre> | /cc underscor :) |
11:40:00 | <godane> | i think s3 is very slow too |
11:41:00 | <godane> | not just ftp |
11:41:00 | <SketchCow> | What does this collection have? |
11:42:00 | <GLaDOS> | WARCs of everything he's done. |
12:14:00 | <Coderjoe> | what the |
12:14:00 | <Coderjoe> | the ia donate page no longer has the 3-to-1 match blurb |
12:15:00 | <ersi> | That's unfortunate, because maybe there's a few holding out to the absolute last day for some reason |
12:15:00 | <Coderjoe> | the amounts reflect it, and the blog post about it says it goes to the 31st |
12:16:00 | <Coderjoe> | but the progress meter is gone |
12:17:00 | <Famicoman> | maybe the goal was reached? |
12:18:00 | <ersi> | It was lacking 17k yesterday |
12:18:00 | <Famicoman> | ah, doubtful then |
12:55:00 | | SmileyG looks in |
13:13:00 | <kennethre> | SketchCow: i'm working on a continual archive of everything i create, including articles, tweets, photos, music, etc |
13:13:00 | <kennethre> | SketchCow: the plan is to have it back itself up to archive.org in case I have an untimely demise :) |
13:18:00 | <kennethre> | it's coming along quite nicely so far |
13:18:00 | <kennethre> | http://blackbox.kennethreitz.org/records/1e7f3c62-96e4-4be4-a26a-f62c61ce939d |
13:18:00 | <kennethre> | http://blackbox.kennethreitz.org/records/1e7f3c62-96e4-4be4-a26a-f62c61ce939d/download |
14:03:00 | | Nemo_bis has 1200 tasks waiting for admin. :/ |
15:45:00 | <push> | i think web archive should open up old 90s versions of sites, it sucks now that some domains seem to be totally gone due to a NEW robots.txt put on the active site? |
15:45:00 | <ersi> | bla bla bla whine old bla bla |
15:45:00 | <ersi> | It's been iterated over a billion times already. |
15:47:00 | <push> | ah sorry, didnt think about that |
15:48:00 | <ersi> | But I agree that it's unfortunate that some new owner of a domain can make the previous owner's data hidden in the Wayback Machine. |
15:49:00 | <ersi> | There's a lot of data public from what I know, look in the crawldata collection @ IA. It's not everything though, I think. And besides, the data will continue to exist - it's just hidden/darkened (until it's public again, if IA undarks or robots.txt goes away) |
15:51:00 | <push> | yeah, theres still a chance to see some of it some time later i guess |
15:51:00 | <push> | it hasnt been a huge thing or anything, only a few sites |
15:51:00 | <ersi> | Yeah, but it comes up so often it makes me almost angry every time it comes up |
15:52:00 | <push> | i have had a similar reaction :P |
15:52:00 | <ersi> | ^_^ |
15:53:00 | <push> | it's hard to solve though i would think, sometimes a legitimate owner wants to block the whole history and i reckon he should be able to |
15:53:00 | <push> | i think other times they dont even know about IA maybe |
15:54:00 | <push> | some have forbidden everything by default and it seems senseless |
15:54:00 | <ersi> | I know that the Wayback Machine does an HTTP GET on the robots.txt when it's going to serve something from a crawled domain - every time |
15:54:00 | <push> | ah |
15:55:00 | <ersi> | Maybe I'm wrong, but I have a faint memory of that from fiddling with the code and trying to set Wayback Machine up (http://github.com/internetarchive/wayback/) |
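Note: the behaviour ersi describes (checking the live robots.txt before serving an archived page) can be illustrated with Python's standard robots.txt parser. This is a rough sketch of the idea only, not the Wayback Machine's actual code, which ersi was unsure about. The helper name is hypothetical, and the "ia_archiver" user agent is an assumption about what the archive identifies as.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, agent="ia_archiver"):
    """Return True if robots_txt disallows `agent` from fetching `url`.

    is_blocked is a hypothetical helper; the default agent name is an
    assumption, not something confirmed in the discussion above.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# A blanket disallow, as a new domain owner might publish.
new_owner_rules = "User-agent: *\nDisallow: /\n"
# An empty robots.txt, i.e. no restrictions at all.
no_rules = ""
```

With these inputs, `is_blocked(new_owner_rules, "http://example.com/old.html")` is true and `is_blocked(no_rules, "http://example.com/old.html")` is false, which is the before/after difference a robots.txt change on a live domain would make under this model.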
15:57:00 | <push> | guess it can also be tested, i have a couple old domains indexed i could set them up again and do before/after robots.txt |
15:57:00 | <push> | but it does feel that way |
15:57:00 | <push> | it was restrictive just earlier, a site is blocked and i was totally excited to see it |
15:57:00 | <push> | some very old site |
15:57:00 | <push> | brb |
15:57:00 | <push> | ehe |
15:59:00 | <ersi> | Yeah, sucks when you run into the problem |
16:43:00 | <SketchCow> | That's an interesting tactic, kennethre |
16:43:00 | <kennethre> | SketchCow: thanks, i like it more the longer i think about it |
17:47:00 | <tef> | push: archive should have old copies of robots.txt ? |
19:25:00 | <balrog_> | anyone here familiar with archiving yahoo groups? |
19:25:00 | <balrog_> | I found this tool: http://grabyahoogroup.sourceforge.net |
20:12:00 | <balrog_> | it's giving me error 500s though |