00:19:00 | <chronomex> | erp |
04:06:00 | <tef> | any attempts to archive aaron's stuff yet |
04:09:00 | <kennethre> | tef: i'll assist in any way, if needed |
04:10:00 | <tef> | i'm firing up my work's crawler atm on news.yc +1 |
04:17:00 | <Famicoman> | jasons been working on some stuff |
04:17:00 | <Famicoman> | godane grabbed a copy of his site |
04:17:00 | <Famicoman> | and I'm sure there are other things |
04:19:00 | <tef> | ah cool, I was wondering if someone would do that |
04:19:00 | <tef> | I did the newsyc for sopa |
04:28:00 | <godane> | Famicoman: I got SpikeTV Xbox 360 2011 coverage |
04:29:00 | <Famicoman> | noice |
04:36:00 | <godane> | its also 720p version |
04:37:00 | <godane> | whats funny is the release group calls themself "Aggressive Archive Force" |
04:38:00 | <Famicoman> | haha wow |
04:38:00 | <GLaDOS> | Heh |
04:47:00 | <SketchCow> | OK, now on a proper laptop. |
04:47:00 | <SketchCow> | We have some rough stuff. |
04:48:00 | <tef> | SketchCow: i'm running a crawl of hn frontpage + all links appearing on it. should have snapshots, and ajax shit too. |
04:48:00 | <tef> | in the warcs. |
04:48:00 | <chronomex> | great |
04:49:00 | <tef> | not sure what to do about twitter |
04:49:00 | <tef> | especially #pdftribute |
05:02:00 | <SketchCow> | My co-workers and I made duplicate pages. |
05:02:00 | <SketchCow> | http://archive.org/details/ark-aaronsw |
05:02:00 | <SketchCow> | and |
05:03:00 | <SketchCow> | http://archive.org/details/aaronsw |
05:06:00 | <tef> | oops |
05:07:00 | <tef> | about halfway with the hackernews +1 link |
05:13:00 | <godane> | https://www.youtube.com/watch?v=AqZNebWoqnc |
05:13:00 | <godane> | that is another video for len sassaman afk |
05:30:00 | <balrog_> | SketchCow: I notice that several interesting sections of Aaron's website were removed and blocked with robots.txt in the past but I'm sure you're all aware |
05:33:00 | <tef> | ppfft who looks at robots.txts. wimpy crawlers. |
05:34:00 | <Cameron_D> | I do, to find bonus things to crawl :3 |
05:35:00 | <GLaDOS> | "Disallow? Must be some saucy stuff in here.." |
05:41:00 | <balrog_> | yeah but some of that was pulled too which means its not in Wayback |
05:51:00 | <godane> | starting the upload of sega visions now |
06:27:00 | <godane> | some of my items are not showing up |
06:31:00 | <godane> | i'm going to bed |
06:31:00 | <godane> | hope the internal error stuff goes away |
08:11:00 | <GLaDOS> | Aaron Swartz is trending worldwide on twitter. |
08:11:00 | <GLaDOS> | Wow. |
08:12:00 | <kennethre> | GLaDOS: incredible |
08:32:00 | <GLaDOS> | ...and he disappears. |
11:06:00 | <SketchCow> | Huuuuug |
11:09:00 | <SketchCow> | I'm adding a pile of material (Atari, Creative Computing, sorting BITSAVERS) |
12:56:00 | <Nemo_bis> | NATO's ftp done: Downloaded: 16099 files, 58G in 11d 12h 4m 0s (61.6 KB/s) |
14:50:00 | <schbiridi> | nice, Nemo_bis |
16:52:00 | <emijrp> | i haz a script to move videos from youtube to internet archive |
16:53:00 | <emijrp> | so, if we are ok with uploading all videos about aaron (including copyright ones) we can proceed... |
17:09:00 | <ersi> | IA has some youtube-grabbing infra as well afaik |
17:18:00 | <adamcaudi> | Can someone that's a bit more familiar with wget / warc files take a look at this and see if I've done anything stupid? https://gist.github.com/4524708 |
17:19:00 | <adamcaudi> | It seems right to me, but I'd rather not collect a few GB of mirrors then realize I missed something |
17:23:00 | <balrog_> | I hope someone's archiving the current #pdftribute |
17:23:00 | <balrog_> | (twitter hashtag) |
17:25:00 | <emijrp> | and his tw account? |
17:29:00 | <balrog_> | hmm. |
17:29:00 | <balrog_> | #pdftribute is various academics posting their papers to be freely available in protest of paywalls |
17:45:00 | <ersi> | emijrp: then again, it's always nice to have a copy if you're able to grab |
17:46:00 | <emijrp> | i will send the script to Nemo_bis |
17:46:00 | <emijrp> | i dont have upload bandwidth for that |
17:46:00 | <emijrp> | go go go http://archiveteam.org/index.php?title=Aaron_Swartz |
17:46:00 | <Nemo_bis> | emijrp: how many are they? |
17:46:00 | <emijrp> | 250 |
17:46:00 | <Nemo_bis> | oh, should be feasible then |
17:47:00 | <Nemo_bis> | I don't have much free upload or disk right now |
17:47:00 | <SketchCow> | Please grab PDFtributes if possible |
17:48:00 | <emijrp> | http://archiveteam.org/index.php?title=Aaron_Swartz/YouTube_videos |
17:49:00 | <emijrp> | add links in wiki to the grabs, so we see what is complete |
17:49:00 | <Nemo_bis> | emijrp: are you talking to me? |
17:49:00 | <emijrp> | no |
17:49:00 | <Nemo_bis> | ah ok |
17:49:00 | | Nemo_bis waiting for the script |
17:49:00 | <Nemo_bis> | if someone else could run it I wouldn't be offended though ^^ |
17:52:00 | <emijrp> | Nemo_bis: http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py |
17:53:00 | <Nemo_bis> | emijrp: have you updated the collections and so on? |
17:53:00 | <emijrp> | no |
17:53:00 | <emijrp> | wait.. |
18:25:00 | <emijrp> | Nemo_bis: try now, read the instructions http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py |
18:26:00 | <emijrp> | save links to videos in download/videostodo.txt |
18:26:00 | <emijrp> | and then python youtube2internetarchive.py english all aaronsw |
18:29:00 | <Nemo_bis> | File "youtube2internetarchive.py", line 59 |
18:29:00 | <Nemo_bis> | 'es': {'01':'january', '02': 'february', '03':'march', '04':'april', '05':'may', '06':'june', '07':'july', '08':'august','09':'september','10':'october', '11':'november', '12':'december'} |
18:29:00 | <Nemo_bis> | ^ |
18:29:00 | <Nemo_bis> | SyntaxError: invalid syntax |
18:29:00 | <godane> | this is cool: http://web.archive.org/web/20120911024204/http://www.underground-gamer.com/forums.php?action=viewforum&forumid=40&page=25 |
18:30:00 | <emijrp> | fixed Nemo_bis |
18:30:00 | <balrog_> | godane: why is that on IA? |
18:30:00 | <godane> | cause i mirrored it |
18:30:00 | <balrog_> | ah. |
18:30:00 | <ersi> | awesome ;D |
18:30:00 | <Smiley> | it was a int, and now you make it a string? |
18:30:00 | <ersi> | godane: Hello there, chris1975 |
18:31:00 | <godane> | thats my username there |
18:31:00 | <godane> | what sucks is i didn't do this to bitgamer forums |
18:32:00 | <balrog_> | :( |
18:32:00 | <Nemo_bis> | emijrp: what keys should I use? |
18:34:00 | <balrog_> | godane: I wish they had imported the forums to the ug forums |
18:34:00 | <balrog_> | would have been nice |
18:34:00 | <emijrp> | Nemo_bis: yours? |
18:34:00 | <Nemo_bis> | emijrp: what collection is it? |
18:35:00 | <emijrp> | aaronsw |
18:38:00 | <Nemo_bis> | and everyone can write to it? |
18:39:00 | <emijrp> | dont know |
18:39:00 | <emijrp> | you can request admin role to SketchCow ? |
18:41:00 | <Nemo_bis> | they all seem to be erroring on download |
18:41:00 | <emijrp> | update youtube-dl .. |
18:42:00 | <godane> | balrog_: i'm getting other crap like spiketv video game awards too |
18:42:00 | <godane> | found some copys going back to 2008 |
18:44:00 | <godane> | so there is going to be a spiketv-specials collection in computer and tech videos collections sometime |
18:45:00 | <Nemo_bis> | Traceback (most recent call last): |
18:45:00 | <Nemo_bis> | File "youtube2internetarchive.py", line 138, in <module> |
18:45:00 | <Nemo_bis> | upload_month = num2month[language][json_['upload_date'][4:6]] |
18:45:00 | <Nemo_bis> | KeyError: 'english' |
18:45:00 | <Nemo_bis> | emijrp: ^ |
18:46:00 | <ersi> | /query |
18:47:00 | <Smiley> | stop trying to convert int to string? |
18:47:00 | <emijrp> | Nemo_bis: fixed |
18:47:00 | <emijrp> | Smiley: not that |
18:48:00 | <Smiley> | D: |
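Note: the KeyError pasted above is consistent with a month table keyed by short language codes (the earlier paste shows an 'es' entry) being indexed with the command-line argument "english". The log does not show the actual fix, so the following is only a minimal sketch of that kind of lookup with an explicit check. Only the expression `num2month[language][json_['upload_date'][4:6]]` comes from the traceback; the 'en' key, the alias table, and the function name are hypothetical.

```python
# Hypothetical reconstruction: only the nested lookup on upload_date[4:6]
# is taken from the pasted traceback. Keys and names here are assumptions.
NUM2MONTH = {
    'en': {
        '01': 'january', '02': 'february', '03': 'march',
        '04': 'april', '05': 'may', '06': 'june',
        '07': 'july', '08': 'august', '09': 'september',
        '10': 'october', '11': 'november', '12': 'december',
    },
}

# Assumed mapping from a command-line word to a table key.
LANGUAGE_ALIASES = {'english': 'en'}


def upload_month(upload_date, language):
    """Return the month name for a YYYYMMDD upload date string."""
    key = LANGUAGE_ALIASES.get(language, language)
    if key not in NUM2MONTH:
        # Fail with a readable message instead of a bare KeyError.
        raise ValueError(
            "unsupported language %r; known: %s" % (language, sorted(NUM2MONTH))
        )
    # Characters 4 and 5 of YYYYMMDD are the two-digit month.
    return NUM2MONTH[key][upload_date[4:6]]


if __name__ == '__main__':
    print(upload_month('20130112', 'english'))
```

The month keys stay as zero-padded strings because they are sliced straight out of the date string, so no int conversion is involved.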
18:52:00 | <Nemo_bis> | emijrp: are you adding a keyword? |
18:52:00 | <emijrp> | yes... look at the code |
18:52:00 | <Nemo_bis> | ok |
18:53:00 | <godane> | there is only ~8000 urls from g4tv.com feed to go |
18:54:00 | <godane> | *thefeed |
18:54:00 | <Famicoman> | godane let me know if you ever find the halo 2 specials done by mtv and spiketv in 2004 |
18:55:00 | <Famicoman> | Also, I think I have the first spiketv video game awards on vhs somewhere around here |
18:55:00 | <godane> | Famicoman: did you get g4 e3 2007 or 2008 |
18:55:00 | <godane> | i'm also looking for g4 ces from 2008 |
18:56:00 | <Famicoman> | nah, I haven't found too many g4 specials |
18:56:00 | <godane> | what do you have? |
18:56:00 | <Famicoman> | I don't know, probably more techtv stuff than anything else |
18:57:00 | <Famicoman> | I don't remember where I put it all |
18:57:00 | <godane> | whats funny is i have most of that upload to archive.org now |
18:57:00 | <Famicoman> | I feel like demonoid had a good amount of g4 stuff before it went down |
18:57:00 | <Famicoman> | I think I had G4 comicon coverage for a few years |
18:58:00 | <godane> | i have 2011 up and 2012 on my drive |
18:58:00 | <emijrp> | Nemo_bis: works fine? |
18:58:00 | <godane> | do you have any attack of the shows from 2010? |
18:59:00 | <godane> | i have nov and dec of 2010 |
18:59:00 | <godane> | the full year of 2011 |
19:00:00 | <Nemo_bis> | emijrp: no |
19:00:00 | <emijrp> | lol |
19:00:00 | <emijrp> | query me |
19:01:00 | <godane> | Famicoman: spiketv halo 2?: http://www.spike.com/full-episodes/blhn9j/gttv-halo-4-season-5-ep-528 |
19:14:00 | <godane> | now this you would not have without my help: http://web.archive.org/web/20120919075719/http://www.underground-gamer.com/forums.php?action=viewtopic&topicid=742&page=841 |
19:15:00 | <godane> | its 1500+ page forums from underground gamer in brasil |
19:17:00 | <godane> | i am surprised how much i got from ug as far as the site looking the right |
19:17:00 | <godane> | *the right way |
20:18:00 | <dashcloud> | hi guys, found this: http://pdftribute.net/ |
20:18:00 | <dashcloud> | someone's getting all the #pdftribute links with papers and collecting them there |
20:19:00 | <dashcloud> | here's a second site doing it as well: http://pdftribute.loc-com.de/ |
20:20:00 | <tef> | nice |
20:20:00 | <dashcloud> | and this person: https://twitter.com/thejbf/statuses/290551198757560320 is archiving all of the #pdftribute tweets |