00:19:00<chronomex>erp
04:06:00<tef>any attempts to archive aaron's stuff yet
04:09:00<kennethre>tef: i'll assist in any way, if needed
04:10:00<tef>i'm firing up my work's crawler atm on news.yc +1
04:17:00<Famicoman>jasons been working on some stuff
04:17:00<Famicoman>godane grabbed a copy of his site
04:17:00<Famicoman>and I'm sure there are other things
04:19:00<tef>ah cool, I was wondering if someone would do that
04:19:00<tef>I did the newsyc for sopa
04:28:00<godane>Famicoman: I got SpikeTV Xbox 360 2011 coverage
04:29:00<Famicoman>noice
04:36:00<godane>its also 720p version
04:37:00<godane>whats funny is the release group calls themself "Aggressive Archive Force"
04:38:00<Famicoman>haha wow
04:38:00<GLaDOS>Heh
04:47:00<SketchCow>OK, now on a proper laptop.
04:47:00<SketchCow>We have some rough stuff.
04:48:00<tef>SketchCow: i'm running a crawl of hn frontpage + all links appearing on it. should have snapshots, and ajax shit too.
04:48:00<tef>in the warcs.
04:48:00<chronomex>great
04:49:00<tef>not sure what to do about twitter
04:49:00<tef>especially #pdftribute
05:02:00<SketchCow>My co-workers and I made duplicate pages.
05:02:00<SketchCow>http://archive.org/details/ark-aaronsw
05:02:00<SketchCow>and
05:03:00<SketchCow>http://archive.org/details/aaronsw
05:06:00<tef>oops
05:07:00<tef>about halfway with the hackernews +1 link
05:13:00<godane>https://www.youtube.com/watch?v=AqZNebWoqnc
05:13:00<godane>that is another video for len sassaman afk
05:30:00<balrog_>SketchCow: I notice that several interesting sections of Aaron's website were removed and blocked with robots.txt in the past but I'm sure you're all aware
05:33:00<tef>ppfft who looks at robots.txts. wimpy crawlers.
05:34:00<Cameron_D>I do, to find bonus things to crawl :3
05:35:00<GLaDOS>"Disallow? Must be some saucy stuff in here.."
05:41:00<balrog_>yeah but some of that was pulled too which means its not in Wayback
05:51:00<godane>starting the upload of sega visions now
06:27:00<godane>some of my items are not showing up
06:31:00<godane>i'm going to bed
06:31:00<godane>hope the internal error stuff goes away
08:11:00<GLaDOS>Aaron Swartz is trending worldwide on twitter.
08:11:00<GLaDOS>Wow.
08:12:00<kennethre>GLaDOS: incredible
08:32:00<GLaDOS>...and he disappears.
11:06:00<SketchCow>Huuuuug
11:09:00<SketchCow>I'm adding a pile of material (Atari, Creative Computing, sorting BITSAVERS)
12:56:00<Nemo_bis>NATO's ftp done: Downloaded: 16099 files, 58G in 11d 12h 4m 0s (61.6 KB/s)
14:50:00<schbiridi>nice, Nemo_bis
16:52:00<emijrp>i haz a script to move videos from youtube to internet archive
16:53:00<emijrp>so, if we are ok with uploading all videos about aaron (including copyright ones) we can proceed...
17:09:00<ersi>IA has some youtube-grabbing infra as well afaik
17:18:00<adamcaudi>Can someone that's a bit more familiar with wget / warc files take a look at this and see if I've done anything stupid? https://gist.github.com/4524708
17:19:00<adamcaudi>It seems right to me, but I'd rather not collect a few GB of mirrors then realize I missed something
17:23:00<balrog_>I hope someone's archiving the current #pdftribute
17:23:00<balrog_>(twitter hashtag)
17:25:00<emijrp>and his tw account?
17:29:00<balrog_>hmm.
17:29:00<balrog_>#pdftribute is various academics posting their papers to be freely available in protest of paywalls
17:45:00<ersi>emijrp: then again, it's always nice to have a copy if you're able to grab
17:46:00<emijrp>i will send the script to Nemo_bis
17:46:00<emijrp>i dont have upload bandwidth for that
17:46:00<emijrp>go go go http://archiveteam.org/index.php?title=Aaron_Swartz
17:46:00<Nemo_bis>emijrp: how many are they?
17:46:00<emijrp>250
17:46:00<Nemo_bis>oh, should be feasible then
17:47:00<Nemo_bis>I don't have much free upload or disk right now
17:47:00<SketchCow>Please grab PDFtributes if possible
17:48:00<emijrp>http://archiveteam.org/index.php?title=Aaron_Swartz/YouTube_videos
17:49:00<emijrp>add links in wiki to the grabs, so we see what is complete
17:49:00<Nemo_bis>emijrp: are you talking to me?
17:49:00<emijrp>no
17:49:00<Nemo_bis>ah ok
17:49:00Nemo_bis waiting for the script
17:49:00<Nemo_bis>if someone else could run it I wouldn't be offended though ^^
17:52:00<emijrp>Nemo_bis: http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py
17:53:00<Nemo_bis>emijrp: have you updated the collections and so on?
17:53:00<emijrp>no
17:53:00<emijrp>wait..
18:25:00<emijrp>Nemo_bis: try now, read the instructions http://code.google.com/p/emijrp/source/browse/trunk/scrapers/youtube2internetarchive.py
18:26:00<emijrp>save links to videos in download/videostodo.txt
18:26:00<emijrp>and then python youtube2internetarchive.py english all aaronsw
18:29:00<Nemo_bis>File "youtube2internetarchive.py", line 59
18:29:00<Nemo_bis>'es': {'01':'january', '02': 'february', '03':'march', '04':'april', '05':'may', '06':'june', '07':'july', '08':'august','09':'september','10':'october', '11':'november', '12':'december'}
18:29:00<Nemo_bis>^
18:29:00<Nemo_bis>SyntaxError: invalid syntax
18:29:00<godane>this is cool: http://web.archive.org/web/20120911024204/http://www.underground-gamer.com/forums.php?action=viewforum&forumid=40&page=25
18:30:00<emijrp>fixed Nemo_bis
18:30:00<balrog_>godane: why is that on IA?
18:30:00<godane>cause i mirrored it
18:30:00<balrog_>ah.
18:30:00<ersi>awesome ;D
18:30:00<Smiley>it was a int, and now you make it a string?
18:30:00<ersi>godane: Hello there, chris1975
18:31:00<godane>thats my username there
18:31:00<godane>what sucks is i didn't do this to bitgamer forums
18:32:00<balrog_>:(
18:32:00<Nemo_bis>emijrp: what keys should I use?
18:34:00<balrog_>godane: I wish they had imported the forums to the ug forums
18:34:00<balrog_>would have been nice
18:34:00<emijrp>Nemo_bis: yours?
18:34:00<Nemo_bis>emijrp: what collection is it?
18:35:00<emijrp>aaronsw
18:38:00<Nemo_bis>and everyone can write to it?
18:39:00<emijrp>dont know
18:39:00<emijrp>you can request admin role to SketchCow ?
18:41:00<Nemo_bis>they all seem to be erroring on download
18:41:00<emijrp>update youtube-dl ..
18:42:00<godane>balrog_: i'm getting other crap like spiketv video game awards too
18:42:00<godane>found some copies going back to 2008
18:44:00<godane>so there is going to be a spiketv-specials collection in computer and tech videos collections sometime
18:45:00<Nemo_bis>Traceback (most recent call last):
18:45:00<Nemo_bis>File "youtube2internetarchive.py", line 138, in <module>
18:45:00<Nemo_bis>upload_month = num2month[language][json_['upload_date'][4:6]]
18:45:00<Nemo_bis>KeyError: 'english'
18:45:00<Nemo_bis>emijrp: ^
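The KeyError above comes from indexing num2month with a language name that is not one of its keys. What follows is a minimal sketch of a more forgiving lookup, not the fix emijrp actually committed. Only the expression num2month[language][json_['upload_date'][4:6]] and the inner dict shape come from the log. The 'en' key, the LANGUAGE_ALIASES map, the upload_month function, and the YYYYMMDD date string are assumptions made for illustration, and the code is written for Python 3.

```python
# Sketch only: the table shape follows the dict pasted in the log;
# the 'en' key and the alias map below are hypothetical.
MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# language code -> {"01": "january", ..., "12": "december"}
num2month = {
    "en": {f"{i:02d}": name for i, name in enumerate(MONTH_NAMES, start=1)},
}

# Hypothetical aliases so a command-line argument like "english" still resolves.
LANGUAGE_ALIASES = {"english": "en"}


def upload_month(language, upload_date):
    """Return the month name for an upload date string such as '20130112'.

    Assumes the month sits at characters 4-5, as the [4:6] slice in the
    traceback implies.
    """
    code = LANGUAGE_ALIASES.get(language, language)
    try:
        table = num2month[code]
    except KeyError:
        # Fail with a message that names the valid choices instead of a bare KeyError.
        raise ValueError(
            f"unsupported language {language!r}; known: {sorted(num2month)}"
        ) from None
    return table[upload_date[4:6]]
```

With this shape, upload_month("english", "20130112") resolves through the alias, and an unknown language raises an error that lists the keys that do exist.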
18:46:00<ersi>/query
18:47:00<Smiley>stop trying to convert int to string?
18:47:00<emijrp>Nemo_bis: fixed
18:47:00<emijrp>Smiley: not that
18:48:00<Smiley>D:
18:52:00<Nemo_bis>emijrp: are you adding a keyword?
18:52:00<emijrp>yes... look at the code
18:52:00<Nemo_bis>ok
18:53:00<godane>there is only ~8000 urls from g4tv.com feed to go
18:54:00<godane>*thefeed
18:54:00<Famicoman>godane let me know if you ever find the halo 2 specials done by mtv and spiketv in 2004
18:55:00<Famicoman>Also, I think I have the first spiketv video game awards on vhs somewhere around here
18:55:00<godane>Famicoman: did you get g4 e3 2007 or 2008
18:55:00<godane>i'm also looking for g4 ces from 2008
18:56:00<Famicoman>nah, I haven't found too many g4 specials
18:56:00<godane>what do you have?
18:56:00<Famicoman>I don't know, probably more techtv stuff than anything else
18:57:00<Famicoman>I don't remember where I put it all
18:57:00<godane>whats funny is i have most of that uploaded to archive.org now
18:57:00<Famicoman>I feel like demonoid had a good amount of g4 stuff before it went down
18:57:00<Famicoman>I think I had G4 comicon coverage for a few years
18:58:00<godane>i have 2011 up and 2012 on my drive
18:58:00<emijrp>Nemo_bis: works fine?
18:58:00<godane>do you have any attack of the shows from 2010?
18:59:00<godane>i have nov and dec of 2010
18:59:00<godane>the full year of 2011
19:00:00<Nemo_bis>emijrp: no
19:00:00<emijrp>lol
19:00:00<emijrp>query me
19:01:00<godane>Famicoman: spiketv halo 2?: http://www.spike.com/full-episodes/blhn9j/gttv-halo-4-season-5-ep-528
19:14:00<godane>now this you would not have without my help: http://web.archive.org/web/20120919075719/http://www.underground-gamer.com/forums.php?action=viewtopic&topicid=742&page=841
19:15:00<godane>its 1500+ page forums from underground gamer in brasil
19:17:00<godane>i am surprised how much i got from ug as far as the site looking the right
19:17:00<godane>*the right way
20:18:00<dashcloud>hi guys, found this: http://pdftribute.net/
20:18:00<dashcloud>someone's getting all the #pdftribute links with papers and collecting them there
20:19:00<dashcloud>here's a second site doing it as well: http://pdftribute.loc-com.de/
20:20:00<tef>nice
20:20:00<dashcloud>and this person: https://twitter.com/thejbf/statuses/290551198757560320 is archiving all of the #pdftribute tweets