00:12:00<filer>[16:11:28.004] GET http://aaronsw.archiveteam.org/next-item?r=0.25202059401081345 [HTTP/1.1 500 Internal Server Error 256ms]
00:13:00<filer>is this related to the DOS attack?
00:13:00<Coderjoe>one per person, iirc
00:13:00<balrog_>see -bs
00:13:00<balrog_>there's something broken :(
00:13:00<SketchCow>He's looking at it
00:13:00<SketchCow>We already blew out file handles. :)
00:13:00<filer>heh
00:14:00<SketchCow>This is a very nice experience test for underscor
00:14:00<SketchCow>He's going to learn a lot tonight
00:15:00<chronomex>haha :)
00:18:00<BlueMaxim>you must be so proud of your star pupil SketchCow
00:19:00<SketchCow>Every damned day
00:20:00<SketchCow>And you, you're like the Voldemort. I expect you to rise against us from an Australian law firm in 2023, having bided your time appropriately
00:21:00<SketchCow>New meaning for the term "Kangaroo Court"
00:22:00<TomRiddle>actually, one difference is that Voldemort knew what he was doing
00:23:00<TomRiddle>comparing your knowledge of computers to mine is like a needle and a haystack
00:23:00<SketchCow>Not initially
00:23:00<SketchCow>Did you really just say that
00:23:00<TomRiddle>...I got it the wrong way around
00:23:00<TomRiddle>you know what I meant >_<
00:25:00<SketchCow>https://twitter.com/textfiles/status/290975346147340288
00:32:00<kanzure>yo
00:33:00<SketchCow>WELCOME
00:34:00<SketchCow>aaaaand now it's down
00:34:00<kanzure>hi ivan, X-Scale, nitro2k01
00:34:00<kanzure>jason sent me here
00:34:00<kanzure>said something about some infrastructure for rapidly archiving a failing site?
00:35:00<nitro2k01>Jason Fucking Scott; Middle name Fucking, hence the capitalization.
00:35:00<SketchCow>Fuuuuuuuuuuuuuuuuuuuuuuuuuuuuucking
00:35:00<SketchCow>Someone point him to the Warrior
00:35:00<nitro2k01>Yeah, isn't the link to it supposed to be in the /topic?
00:36:00<kanzure>also this:
00:36:00<kanzure>https://groups.google.com/group/science-liberation-front
00:36:00<kanzure>i've been working on some mobile app that serves as a proxy for android and iphone that college students run to grab papers
00:38:00<kanzure>dunno if you guys would be into that
00:39:00<BlueMax>wow there's 100 people in this channel
00:39:00<kanzure>nitro2k01: don't i know you from somewhere?
00:40:00<balrog_>kanzure: there an irc channel for that?
00:40:00<balrog_>also I want a proxy that not only grabs papers
00:42:00<kanzure>there's ##hplusroadmap on irc.freenode.net i guess
00:42:00<kanzure>we do do-it-yourself biohacking/genetic engineering/dna synthesis/nootropics and things.
00:42:00<kanzure>and paperbot, our paper-fetching irc bot
00:43:00<kanzure>balrog_: well, we could just deploy a botnet
00:44:00<kanzure>unfortunately i'm not as hooked into the android malware scene these days, i have no idea what software would be a good choice
00:44:00<kanzure>transproxy doesn't look like what i need, and proxydroid is only for redirecting your outgoing requests (not accepting incoming connections)
00:44:00<kanzure>plus proxydroid totally fails to run on android-x86 because it's all armeabi junk
00:44:00<balrog_>I'm thinking more of browser plugins
00:44:00<balrog_>for desktop browsers
00:45:00<kanzure>you're going to run a proxy in a browser plugin?
00:45:00<balrog_>no, a browser plugin that just saves viewed PDFs and metadata
00:45:00<kanzure>zotero does that already
00:45:00<balrog_>or something of that sort
00:45:00<kanzure>paperbot is based on a headless version of zotero translators
00:45:00<kanzure>https://github.com/zotero/translators
00:45:00<kanzure>https://github.com/zotero/translation-server
01:46:00<kanzure>however, you have to click 'save'; this could be enabled by default instead, and it could be switched to HTTP POST to somewhere
00:46:00<kanzure>i think there's also a zotero server for collecting pdfs/bibliographies but i've never used it
00:47:00<kanzure>(like for managing a few institutional users)
00:49:00<kanzure>is that what you had in mind?
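For what it's worth, the "HTTP POST to somewhere" idea above only needs a trivial receiving end. A minimal sketch in Python, assuming clients POST raw PDF bytes and pass the source URL in a custom header; the header name, port, and output directory are invented for illustration and are not part of any existing project:

```python
# Minimal sketch of a collection endpoint for POSTed PDFs.
# Assumptions (not from any existing project): clients POST raw PDF bytes
# to "/" and pass the source URL in an X-Source-Url header.
import hashlib
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

OUT_DIR = "incoming"

class PdfCollector(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        digest = hashlib.md5(body).hexdigest()  # dedupe by content hash
        os.makedirs(OUT_DIR, exist_ok=True)
        with open(os.path.join(OUT_DIR, digest + ".pdf"), "wb") as f:
            f.write(body)
        with open(os.path.join(OUT_DIR, digest + ".url"), "w") as f:
            f.write(self.headers.get("X-Source-Url", "unknown") + "\n")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"stored " + digest.encode())

if __name__ == "__main__":
    HTTPServer(("", 8000), PdfCollector).serve_forever()
```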
00:51:00<balrog_>brb
01:08:00<GLaDOS>Liberator is stuck on uploading.
01:08:00<GLaDOS>Damn you DoS!
01:12:00<GLaDOS>http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/ Article is up, SketchCow
01:13:00<kanzure>neat
01:13:00<BlueMax>nice, hope that brings in some attention
01:14:00<kanzure>"By running the script—which is limited to once per browser" what
01:14:00<kanzure>that must be a misunderstanding
01:15:00<balrog_>that's deliberate
01:15:00<kanzure>that's silly
01:15:00<GLaDOS>Each browser can run it only once.
01:15:00<GLaDOS>It's a memorial more than an archiving effort.
01:15:00<GLaDOS>If it were the latter, we would've fired our warriors up.
01:15:00<kanzure>do your warriors have access?
01:16:00<kanzure>"(they were only dropped this morning)" also a misunderstanding
01:17:00<GLaDOS>No, they were.
01:18:00<kanzure>i thought that's because they can't go after his estate
01:19:00<kanzure>it would be more relevant to report if that /didn't/ happen
01:21:00<balrog_>http://www.huffingtonpost.com/2013/01/14/aaron-swartz-stephen-heymann_n_2473278.html?utm_hp_ref=tw
01:28:00<SketchCow>chronomex: ping
01:28:00<chronomex>pong
01:31:00<SketchCow>alard: redis seems to not be working for underscor now
01:31:00<ex-parrot>someone has probably already spotted this, but http://aaronsw.archiveteam.org/ just seems to have gone down for me
01:32:00<chronomex>we're on it
01:32:00<SketchCow>http://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/
01:32:00<ex-parrot>cool :) I almost managed to run the bookmarklet before it died :)
01:34:00<SketchCow>underscor is here now.
01:34:00<SketchCow>Let's sort this
01:35:00<underscor>pong
01:41:00<SketchCow>underscor: anything else needed?
01:41:00<chronomex>we appear to be handling this in -bs
01:41:00<chronomex>for better or for worse
01:41:00<underscor>No, other than alard's thoughts on what happened
01:41:00<underscor>also that
01:41:00<SketchCow>Thanks
01:45:00<DonnchaC>Hi
01:45:00<ex-parrot>fwiw, the JSTOR liberator seems to be sticking at "Asking for next item..." on my machine still. though I am running iceweasel 18, which is probably not well tested
01:46:00<GLaDOS>ex-parrot: did you run it successfully before?
01:46:00<ex-parrot>GLaDOS: nope, but the site went down at roughly the same instant I tried to run it for the first time, so who knows what state it's in
01:47:00<GLaDOS>Hm
01:47:00<ex-parrot>I also have Ghostery, AdBlock and NoScript installed which have a tendency to break javascript in unusual ways
01:48:00<ex-parrot>disabling them makes no difference
01:49:00<chronomex>please hold
01:50:00<DonnchaC>I created a crappy PoC maybe a week ago for a bug I saw on SpringerLink. Their "LookInside" functionality just loads up a png of the page with JS.
01:51:00<DonnchaC>It turns out the png url is /000.png, /001.png. It is possible to incrementally get each page image, download them in the local browser and upload them back to a server where they are converted to a PDF.
01:52:00<DonnchaC>I had created a shitty greasemonkey script for this last week and it's available at http://0bin.net/paste/29713b9cbf8d1cd60f3cf07e71757ba429196833#SalSJ4E3+RxzQz15KrnaJ9g6gtUpGXj65YFUIH3rBTw=
01:52:00<chronomex>nice
01:53:00<DonnchaC>I had a look through the terms and there doesn't appear to be anything explicitly restricting viewing the "preview" in your browser. Obviously it would be possible to expand a similar script to download the original PDF instead if available.
01:53:00<DonnchaC>I am obviously not condoning the use of this or any similar script by anyone to violate any laws in their respective countries.
01:54:00<chronomex>wink wink nod nod
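The PoC described above boils down to a simple loop; a rough Python equivalent, where the base URL is a placeholder and only the zero-padded /000.png, /001.png naming is taken from the description in the channel:

```python
# Rough sketch of the incremental page-image fetch described above.
# The base URL is a placeholder; only the /000.png, /001.png naming
# convention comes from the description in the channel.
import urllib.error
import urllib.request

def fetch_pages(base_url, max_pages=500):
    pages = []
    for i in range(max_pages):
        url = "%s/%03d.png" % (base_url.rstrip("/"), i)
        try:
            with urllib.request.urlopen(url) as resp:
                pages.append(resp.read())
        except urllib.error.HTTPError:
            break  # ran off the end of the document
    return pages
```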
01:55:00<kanzure>DonnchaC: could you also post that information here? https://groups.google.com/group/science-liberation-front
01:55:00<kanzure>i wonder about dumping zotero translators into a greasemonkey script
01:55:00<kanzure>i think the api is different. i haven't used greasemonkey in, gosh, 4 years at least
01:57:00<kanzure>also, i have some PoC in the works for removing watermarks from pdfs from publishers. not quite ready yet.. but if we can detect malware in pdf, we can certainly detect watermarks.
01:57:00<kanzure>so far i've found that sciencedirect/elsevier/nature publishing group don't seem to add watermarks (confirming via md5sum of the documents from multiple different retrievals on different ezproxy endpoints)
01:57:00<kanzure>ieee definitely adds visible watermarks..
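The md5sum check kanzure mentions is trivial to script; a minimal sketch, assuming you have already saved several retrievals of the same article to disk and pass the filenames as arguments:

```python
# Hash several retrievals of the same article; identical digests suggest
# the publisher is not inserting per-download watermarks.
import hashlib
import sys

digests = set()
for path in sys.argv[1:]:
    with open(path, "rb") as f:
        digests.add(hashlib.md5(f.read()).hexdigest())
print("identical" if len(digests) == 1 else "%d variants" % len(digests))
```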
02:00:00<DonnchaC>RSC journals add visible watermarks around the margins, not sure if they have other watermarks.
02:04:00<kanzure>i keep forgetting who it is that adds that entire first page of watermarking
02:04:00<kanzure>is it wiley?? i want to say wiley. :(
02:04:00<kanzure>anyway the number one problem i am encountering is that i can't pick a reasonable pdf modification library for python
02:04:00<kanzure>maybe there's something in pdf.js that could be used
02:04:00<kanzure>https://github.com/mozilla/pdf.js
02:06:00<balrog_>watermarking is easy to remove from scan-sourced media
02:07:00<balrog_>you just extract the images and use them and that's it
02:07:00<chronomex>yes
02:09:00<kanzure>in pdf it's even easier because they are extra xml attributes in the file (more or less)
02:09:00<kanzure>(please don't murder me; i'm not a pdf spec wizard yet)
02:10:00<kanzure>xml elements, i mean. not attributes.
02:11:00<instence>string him up! PDF wizard mana too low!
02:11:00<balrog_>pdf is a messy standard
02:11:00<balrog_>a lot of bells and whistles
02:11:00<balrog_>I suggest decompressing though if you want to analyze as the first step
02:11:00<chronomex>pdf allows all kinds of scary things like embedded flash
02:11:00<kanzure>and javascript
02:12:00<DonnchaC>Yeah it should be relatively straightforward to remove the copyright strings from a PDF
02:12:00<kanzure>no it's not the copyright strings that matter
02:12:00<DonnchaC>I have done some playing around with the format before.
02:12:00<DonnchaC>(the identifying source strings)
02:12:00<kanzure>"Authorized licensed use limited to: University of Texas at Austin. Downloaded on July 22, 2009 at 15:50 from IEEE Xplore. Restrictions apply."
02:12:00<kanzure>that shit.
02:12:00<kanzure>that shit's gotta go.
02:13:00<DonnchaC>Have you go
02:13:00<kanzure>http://scholar.google.com/scholar?q=%22IEEE+Xplore.+Restrictions+apply.%22
02:36:00<tef>download it twice, from different sources, null out the bits that are different :v
02:37:00<kanzure>well
02:38:00<kanzure>one way is to pipe it into ghostscript and just convert it to another format and then back again
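That ghostscript round trip is a one-liner from Python; a minimal sketch using the pdfwrite device. Note that this normalizes and re-encodes the file and drops document metadata, but anything actually drawn on the page, like the IEEE banner text, survives the trip:

```python
# Re-emit a PDF through Ghostscript's pdfwrite device. This strips document
# metadata and normalizes the file, but text drawn on the page stays.
import subprocess

def gs_rewrite(src, dst):
    subprocess.check_call([
        "gs", "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-o", dst, src,
    ])

gs_rewrite("in.pdf", "out.pdf")
```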
02:38:00<kanzure>the problem with downloading from multiple sources is that it would require keeping track of which ezproxy servers have access to which publishers
02:38:00<kanzure>i mean, that's not a huge problem. it's just annoying.
02:43:00<DonnchaC>It is indeed. How extensively are articles watermarked?
02:43:00<filer>I've stripped that message from IEEE Xplore documents before
02:43:00<filer>it was plain text inside the PDF
02:44:00<filer>so I just replaced it with spaces
02:44:00<DonnchaC>Is it just a couple of the big players or are a lot of publishers doing that?
02:44:00<kanzure>manually or with some script?
02:44:00<chronomex>hah
02:44:00<DonnchaC>Keeping it simple.
02:44:00<kanzure>DonnchaC: it's really random, some publishers do others dont
02:44:00<filer>yeah, that was my experience
02:44:00<kanzure>i really want to write up a quick script to do it though
02:44:00<chronomex>there are lots of sneakier ways you can watermark a pdf, but I haven't heard of them in use yet
02:45:00<filer>once I realized it was a fixed string, I think I just used sed
02:45:00<kanzure>chronomex: yeah, i think we should be out looking for them, but for now we shouldn't assume they are being that sneaky
02:45:00<DonnchaC>Most watermarks will probably just be a plaintext tag in the PDF.
02:45:00<chronomex>yes
02:45:00<filer>but yeah, get a couple copies and cmp
02:45:00<kanzure>sed works if you know the string in advance
02:45:00<kanzure>i think what we need is a simple script that has a list of regexes
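A minimal sketch of that kind of script, generalizing filer's sed trick to a list of regexes. Replacing a match with the same number of spaces keeps every byte offset in the xref table valid; this only works when the banner sits as uncompressed literal text, which is what filer found for IEEE Xplore:

```python
# Blank known watermark patterns with same-length runs of spaces so the
# PDF's cross-reference offsets stay valid. Only works when the text is
# stored uncompressed, as the IEEE Xplore banner apparently is.
import re

WATERMARK_PATTERNS = [
    # Illustrative pattern; the institution/date part varies per download.
    re.compile(rb"Authorized licensed use limited to:.*?Restrictions apply\."),
]

def blank_watermarks(pdf_bytes):
    for pattern in WATERMARK_PATTERNS:
        pdf_bytes = pattern.sub(lambda m: b" " * len(m.group(0)), pdf_bytes)
    return pdf_bytes

with open("in.pdf", "rb") as f:
    data = f.read()
with open("out.pdf", "wb") as f:
    f.write(blank_watermarks(data))
```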
02:45:00<DonnchaC>I suppose it will be an arms race; they will only advance to sneakier techniques when there is mass sharing and watermark removal
02:45:00<chronomex>yes
02:46:00<kanzure>the zotero team has proved that we can win the arms race
02:46:00<kanzure>scrapers break -> they fix within 24 hours
02:46:00<filer>well, if you get two copies through two different netblocks and the diffs are simple, then you have a profile for that particular publisher
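A minimal sketch of that two-copy comparison: fetch the same article through two different routes, then report where the bytes differ, which is where any per-download watermark has to live. This naive version assumes the two copies are the same length and byte-aligned; if the watermarks differ in length you would need a real diff:

```python
# Compare two downloads of the same article byte-for-byte and print the
# differing regions; per-download marks (IPs, dates, institution names)
# have to live somewhere inside them. Assumes byte-aligned copies.
def diff_regions(a, b):
    regions, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i
        elif x == y and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, min(len(a), len(b))))
    return regions

with open("copy1.pdf", "rb") as f:
    a = f.read()
with open("copy2.pdf", "rb") as f:
    b = f.read()
for start, end in diff_regions(a, b):
    print("bytes %d-%d differ: %r" % (start, end, a[start:end][:60]))
```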
02:46:00<chronomex>also who cares about "hey this document was contributed to the public domain by this cool person on tuesday jan 15 2013"
02:46:00<filer>also that
02:46:00<kanzure>well, sometimes it includes an ip address
02:46:00<chronomex>I don't mind
02:47:00<kanzure>you would mind if you are downloading en masse
02:47:00<kanzure>suddenly your professor gets the blame because you were in his lab for whatever reason
02:47:00<chronomex>IA's scans of books include the library's stickers
02:47:00<DonnchaC>You would want something that fails safe: if no matching watermark is found for a site you know watermarks documents, you probably don't want that document shared, in case there is a new form of watermark potentially getting someone in trouble
02:48:00<kanzure>especially if the document is redistributed
02:48:00<chronomex>NOBODY IS GOING TO JAIL FOR PUBLIC DOMAIN WORKS
02:48:00<kanzure>the last thing you want to do is get some poor bastard blamed for a pdf or some shit
02:48:00<chronomex>NOT UNDER MY WATCH
02:52:00<DonnchaC>Unfortunately if there is large scale information liberation and redistribution they will target the small guys and whoever they can get
02:52:00<filer>for distributing public domain materials?
02:52:00<kanzure>haha, no not public domain
02:53:00<adamcaudi>It's less about the distribution than it is about accessing them to distribute
02:57:00<kanzure>well, proxies are very easy to deploy. i should go writeup my mobile proxy idea somewhere.
02:59:00<filer>not sure what sense of "mobile" you're referring to, but I have a stack of $20 TP-Link TL-WR703N OpenWRT-compatible routers with USB ports
02:59:00<filer>easy to velcro to things, heh
02:59:00<adamcaudi>The 703N is great, had lots of fun with those
03:00:00<ex-parrot>I wish they were easier to solar power
03:00:00<kanzure>filer: i mean for students to run on their phone while they are on campus
03:00:00<kanzure>browser extensions are cool but phones are always on
03:00:00<filer>ah
03:00:00<kanzure>just think how much battery life you could potentially be draining!
03:01:00<filer>I wonder how long one of those routers could run on a cheap battery
03:01:00<filer>I think they consume something like 100 mW
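Back-of-the-envelope, taking the 100 mW figure from the channel plus a couple of more pessimistic guesses; the battery capacity is an assumption, roughly a single 2000 mAh cell:

```python
# Rough runtime estimate for a small router on a cheap battery.
# Capacity assumes a single 2000 mAh cell at 3.7 V nominal (~7.4 Wh);
# the 100 mW draw is from the channel, the other draws are guesses.
capacity_wh = 2.0 * 3.7
for draw_w in (0.1, 0.5, 1.0):
    print("%.1f W draw -> about %.0f hours" % (draw_w, capacity_wh / draw_w))
# ~74 h at 100 mW, ~15 h at 0.5 W, ~7 h at 1 W, before conversion losses.
```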
03:01:00<adamcaudi>ex-parrot, I just embedded one in a power strip - hidden and hard-wired for power
03:01:00<filer>nice
03:01:00<ex-parrot>that's genius, assuming you're installing it inside :)
03:02:00<ex-parrot>and assuming the switching PSU small enough to fit inside a power strip is also well made enough not to catch fire after a while :/
03:02:00<kanzure>i keep forgetting the name of that really cheap board that you rop into a powerstrip
03:02:00<kanzure>*drop
03:02:00<filer>someone else I talked to had such an idea, but at the time there weren't routers that were tiny enough
03:02:00<ex-parrot>filer: check dealextreme, there are versions which have a built in battery already. I did some numbers on trying to solar power them but it didn't look too practical
03:02:00<kanzure>it was basically a linux server that was powered by cat5 or something
03:02:00<kanzure>am i making this up?
03:03:00<ex-parrot>shivaplug?
03:03:00<filer>*sheeva
03:04:00<filer>the SheevaPlug is cool, but more powerful and more expensive
03:04:00<filer>I have one router that is actually the size of an iphone charger
03:04:00<filer>unfortunately, I think it must use some RTOS
03:05:00<adamcaudi>ex-parrot, https://twitter.com/adamcaudill/status/227249569765916672
03:05:00<filer>oh, don't those just have tp-links inside?
03:06:00<ex-parrot>very nice adamcaudi, certainly better than the overpriced govt engineered one doing the rounds a few months ago
03:06:00<adamcaudi>That's what inspired it :)
03:06:00<filer>oh yeah http://www.minipwner.com/index.php/minipwner-build
03:07:00<adamcaudi>Actually talked to the guy that designed the $1300 version - I don't think he realized just how close you could get for $50
03:07:00<chronomex>ha
03:09:00<ex-parrot>you could have built it 5+ years ago I guess, a gumstix would fit and they have had low end units which definitely came in at < $1300
03:10:00<filer>having a $20 router with USB definitely helps though
03:10:00<chronomex>filer: speaking of, mind if I swing over in 20?
03:10:00<ex-parrot>they are great. I have a few here for various projects. friend is using them as radio modules for robotics control
03:10:00<chronomex>might want to unload a TPlink from you
03:10:00<filer>no problem
03:11:00<chronomex>coolz
03:13:00<adamcaudi>Have you seen thegrugq's PORTAL project? It's a 703N that routes everything over TOR
03:14:00<chronomex>neat
03:15:00<filer>cool, I've wanted to have something like that
03:15:00<filer>glad to know someone's already made it, saves me work :)
03:18:00<kanzure>blast from the past:
03:18:00<kanzure>https://groups.google.com/forum/?fromgroups=#!topic/diybio/SFuyGIAt74k
03:18:00<kanzure>this was from when aaronsw was starting the getarticles group
03:21:00<execute>why don't you start publishing the aaronsw documents as torrents, and distribute the torrents' magnet links via an RSS feed? I think a lot of people would subscribe their torrent clients to that feed and help store and distribute it
03:22:00<kanzure>because nobody seeds
03:22:00<kanzure>library genesis did that, and nobody fucking seeds it
03:22:00<kanzure>http://libgen.net/
03:22:00<execute>ah, well, that sucks
03:23:00<kanzure>it's probably the greatest dump of ebooks and academic articles ever
03:47:00<kanzure>hmm there's a zotero plugin that is supposed to autosave pdfs when you browse to a page
03:47:00<kanzure>(according to zotero's maintainer)
03:47:00<kanzure>but he left in a cloud of smoke and now i'm not sure what he is talking about. any ideas?
03:56:00<lithiumg>greetings everyone
03:58:00<lithiumg>does a scripted version of jstor liberator exist?
04:00:00<kanzure>there's a springerlink version, https://groups.google.com/group/science-liberation-front/t/d6bb86b96de8c6a6
04:00:00<kanzure>if that's what you mean?
04:01:00<lithiumg>i was hoping to find a bash/perl/python/etc version
04:02:00<lithiumg>I've got a few linux boxes scattered about that I'd like to toss at it
04:02:00<kanzure>they seem to only accept one article per user, it's limited on the server end
04:03:00<lithiumg>I haven't seen that limit
04:03:00<kanzure>oh, maybe it's on the client side, neat
04:03:00<kanzure>you guys all lied to me
04:04:00<lithiumg>someone on the internet lied to you?
04:06:00<kanzure>hmm multiple people have asked me to change the name of that mailing list, any suggestions?
05:01:00<kanzure>auto-save plugin for zotero https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
05:12:00<SketchCow>OK, I've piled all the godane material into collections n' crap
06:02:00<ersi>Whoa.
06:13:00<mjb_b>greetings. sorry if this has been asked 8971236497861 times already, but what's the status of getting the JSTOR liberator bookmarklet working again?
06:14:00<mjb_b>and is there anything I can do to help diagnose?
06:15:00<GLaDOS>mjb_b: http://aaronsw.archiveteam.org/
06:15:00<GLaDOS>It's working, you may only use it once though.
06:18:00<slythfox>It'd be neat if there was a system for people to suggest articles for others to liberate, thus making the most of the one-per-browser limit?
06:20:00<SketchCow>Our admin is either asleep or broken
06:21:00<tsp_>Why is it only one per browser? JSTOR limitation?
06:22:00<chronomex>policy choice
06:24:00<mjb_b>it didn't work even once for me - on win7, with chrome
06:25:00<mjb_b>the next-item GET hangs
06:25:00<mjb_b>I think because something goes wrong with the frameset creation
06:25:00<mjb_b>the jstor document doesn't get put into the lower frame
06:27:00<mjb_b>the part of the script that tries to put it into the lower frame is resulting in an immediately canceled GET, according to the network tab in the developer tools
06:28:00<mjb_b>aaronsw.archiveteam.org homepage keeps showing the same most recently liberated doc...nothing liberated for a while...
06:29:00<chronomex>immediately canceled GET may be symptomatic of needing an Access-Control-Allow-Origin HTTP header
06:30:00<slythfox>Try running it in --disable-web-security (chrome) ?
06:30:00<kanzure>my favorite option
06:30:00<chronomex>--enable-surprise-buttsex
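If chronomex's guess is right, the fix belongs on the server rather than in the browser; a minimal sketch of the header involved, written as a plain WSGI app. This is illustrative only and says nothing about how the actual /next-item backend is built:

```python
# Minimal WSGI sketch of the CORS header a cross-origin XHR needs.
# Purely illustrative; the real /next-item backend may look nothing like this.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        # Without this header a script running on another origin cannot
        # read the response, and the GET shows up as immediately canceled.
        ("Access-Control-Allow-Origin", "*"),
    ])
    return [b'{"status": "ok"}']

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```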
06:30:00<kanzure>chronomex: do you guys mind me linking to science-liberation-front?
06:31:00<kanzure>chronomex: yes that is the correct reading of that option
06:31:00<chronomex>what's SLF, kanzure?
06:31:00<Cameron_D>https://groups.google.com/group/science-liberation-front/ this?
06:31:00<SketchCow>chronomex: Things stopped woking - could you check the box?
06:32:00<kanzure>chronomex: https://groups.google.com/group/science-liberation-front/t/2b3b468fca63a6b2
06:32:00<kanzure>and so on
06:32:00<kanzure>chronomex: just grouping together some peeps who want to work on crawlers and things
06:32:00<chronomex>kanzure: archiveteam welcomes inbound links from all comers
06:33:00<chronomex>SketchCow: I don't see anything wrong.
06:33:00<kanzure>ok. because when people are wondering about why you guys don't want more than 1 document per person, i feel sort of compelled to point out that there are others who would want more by linking to that. heh.
06:33:00<kanzure>and it feels disingenuous to be spamming your excellent channel
06:33:00<chronomex>there are moving parts into which I have no visibility and no insight
06:34:00<mjb_b>no luck with --disable-web-security. Still hangs with empty lower frame, upper frame "Looking for another liberated item"
06:34:00<chronomex>kanzure: if I'm not mistaken, this in particular is about making a statement rather than hoovering JSTOR
06:35:00<kanzure>right, but some people might want to do more
06:35:00<chronomex>I understand
06:35:00<kanzure>actually i think you guys should probably elaborate on the page itself
06:35:00<chronomex>probably, yes
06:36:00<kanzure>although that might shoot yourself in the foot. tough call.
06:36:00<chronomex>he who shoots from the hip sometimes forgets to point away from foot first
06:36:00<chronomex>archiveteam shoots from hip
06:38:00<SketchCow>chronomex: Thanks
06:41:00<SketchCow>Back and running again
06:42:00<SketchCow>He doesn't know how he fixed it
06:42:00<SketchCow>He logged in and it just worked
06:42:00<chronomex>sometimes you just need to kick something
06:42:00<chronomex>that's always the scariest kind of fix
06:43:00<filer>yayyyy
06:43:00filer has successfully contributed
06:46:00<mjb_b>yes! it just worked for me
06:46:00<chronomex>\o/
06:46:00<mjb_b>the homepage is updating like gangbusters too
06:46:00<filer>indeed
06:49:00<filer>chronomex: apropos of nothing, I noticed recently that the north wall of the gov pubs collection at Suzzallo has many, many red boxes of microcards of parliamentary transcripts or something
06:49:00<filer>I wonder if those are online
06:49:00<chronomex>hm
06:52:00<filer>yes, as a ProQuest service... http://parlipapers.chadwyck.co.uk/marketing/index.jsp ... bleaugh
06:54:00<filer>good thing to know that all of those materials dating back to 1688 are safely protected by a paywall
06:55:00<chronomex>hooray
06:56:00<Coderjoe>btw, the "just liberated" list is showing doubles for me
06:56:00<chronomex>looks fine to me, refresh?
06:57:00<Coderjoe>hmm
06:57:00<Coderjoe>full refresh (or perhaps it was just the second refresh) seems to have fixed it
07:01:00<Coderjoe>darn it. my article was just an abstract.
07:01:00<chronomex>I got one that was paywalled for $34
07:01:00<chronomex>so I tried again
07:02:00<kanzure>23:01 <@jblake> scrape liberty from the heels of your oppressors
07:02:00<kanzure>23:01 <@jblake> march against the paywalls of injustice!
07:02:00<chronomex>Coderjoe: it could be sillier ... http://www.jstor.org/stable/3253788
08:06:00<filer>"march against the paywalls of injustice!"
08:06:00<filer>I like this
08:09:00<kanzure>filer: join science-liberation-front
08:09:00<filer>#?
08:10:00<kanzure>filer: it's a mailing list. http://groups.google.com/group/science-liberation-front
08:10:00<kanzure>although we have a bunch of people in ##hplusroadmap
08:10:00<kanzure>.. kind of a happenstance i guess. maybe a different channel should be used. btw that was freenode.
08:12:00<filer>cool, joined
08:12:00<kanzure>i am busy poking at a possible ezproxy exploit
08:16:00<chronomex>chumby is perhaps closing their remaining assets http://forum.chumby.com/viewtopic.php?id=8457
08:16:00<chronomex>I'll fire off a warc
08:18:00<filer>ouch, $4300-$5500/mo
08:18:00<filer>I wonder how many chumbies that is
08:19:00<chronomex>40k chumbies
08:19:00<chronomex>it's 11 cents a month per chumby
08:22:00<Cameron_D>"3) Find someone in the community to host this forum and the wiki. If you can do this, please contact me." anyone want to offer?
08:47:00<chronomex>well I'm sucking down the source code site and the forum
08:48:00<chronomex>might as well get the wiki too while I'm at it
08:50:00<chronomex>relatively small wiki, 197 pages
09:23:00<Nemo_bis>underscor, SketchCow, it's not a "secret" that https://archive.org/details/philosophicaltransactions come from JSTOR, is it?
09:23:00<Nemo_bis>(It could be at most a "segreto di Pulcinella", an open secret, as we'd say in Italian.)
09:36:00<filer>a secret puffin?
09:42:00<SketchCow>It's not a secret.
09:42:00<SketchCow>But it's beside the point as far as the liberator is concerned.
09:45:00<tef>SketchCow: so i have a warc of stuff, which IA collection? yours or brewster's?
09:45:00<tef>(sure i've mentioned the content of said warc enough times)
09:45:00<SketchCow>I've forgotten
09:46:00<tef>ok, I took a crawl of hn front page + articles with a crawler that supports ajax
09:46:00<tef>so I stick it in ark-aaronsw or aaronsw
09:46:00<tef>assuming it's relevant
09:50:00<SketchCow>Yeah
09:50:00<SketchCow>Do it for either
09:51:00<SketchCow>http://archive.org/details/magazine_rack_misc is fun stuff
09:53:00<kanzure>"Hey all, I'm coordinating a series of memorial hackathons for Aaron Swartz. Currently there's going to be one at Noisebridge in SF on Jan. 26 (ish) and another somewhere in Boston, but the more the better."
09:53:00<kanzure>"The idea is to bring together people at hackerspaces around the world to work on projects that in some way continue the work that Aaron did to facilitate the sharing of human knowledge, social/political justice, and free culture."
09:53:00<kanzure>https://groups.google.com/group/science-liberation-front/t/3d17904bef7759b0
09:55:00<tef>ok officially I am too incompetent to use the archive uploader http://archive.org/details/NewsYcFrontpagePlusArticlesThreads
09:55:00<tef>batcave was so much easier for my poor brain
10:46:00<alard>tef: It needs a different media type, but the files are there.
11:20:00<Smiley>https://ia601608.us.archive.org/23/items/NewsYcFrontpagePlusArticlesThreads/ << see :)
13:49:00<maxigas>hi, can somebody tell me where the docs uploaded to http://aaronsw.archiveteam.org/ are available?
13:51:00<maxigas>also, i think the counter is restarted every once in a while.
14:18:00<alard>I think the counter is missing a zero.
14:19:00<maxigas>i saw it go from around 696 to 12 when i refreshed the page after a few minutes.
14:26:00<alard>12 was probably 1002, but there's a bug in the code that removes the 00 when it adds a thin space between 1 and 002.
14:26:00<alard>It does Math.floor(n/1000) + " " + (n%1000).
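The fix alard is describing is just zero-padding the remainder; the same logic in Python for illustration (the live counter is JavaScript, and uses a thin space rather than a plain one):

```python
# The buggy formatting drops leading zeros from the remainder: 1002 -> "1 2".
def counter_buggy(n):
    return "%d %d" % (n // 1000, n % 1000)

# Zero-padding the remainder restores the missing digits: 1002 -> "1 002".
def counter_fixed(n):
    if n < 1000:
        return str(n)
    return "%d %03d" % (n // 1000, n % 1000)

assert counter_buggy(1002) == "1 2"
assert counter_fixed(1002) == "1 002"
```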
14:54:00<maxigas>so where are the downloaded dox?
14:55:00<alard>I don't know. They're probably not available at the moment, but SketchCow will surely find a way to make them available later.
14:59:00<maxigas>hm it undermines the legitimacy of the project a bit... documents should be available shortly after you submit them.
15:00:00<maxigas>i just showed the website to four people and each of them asked about where to find the assembled documents.
15:03:00<alard>That may be true, but it's also easier said than done.
15:11:00<SketchCow>Awwwwwwww.
15:11:00<SketchCow>You know what I love? I mean love?
15:11:00<SketchCow>When someone comes to a project and complains.
15:11:00<SketchCow>Let's see.
15:11:00<SketchCow>The project was launched around 5pm last night.
15:12:00<SketchCow>So that's... hmmm, 16 hours or so.
15:12:00<SketchCow>We immediately started getting swamped.
15:12:00<SketchCow>We dealt with being swamped.
15:12:00<SketchCow>So I guess.... well....
15:12:00<SketchCow>I know. Fuck you.
15:13:00<SketchCow>How long did we spend trying to keep the server up and dealing with DoS attempts and hacking attacks? Probably 8 of those 16 hours.
15:13:00<SketchCow>So.... there we go.
15:13:00<SketchCow>Morning, alard.
15:14:00<alard>Hello. (Afternoon.)
15:17:00<SketchCow>Do you want to send underscor a suggestion to fix the counter thing?
15:18:00<SketchCow>That'll save him when he wakes up in whatever addled state he does this morning
15:18:00<alard>I've done so, before I responded here.
15:19:00<maxigas>SketchCow: good point, sorry about that. :/
15:19:00<balrog_>another thing for underscor: "millions of dollars in fees" -> "millions of dollars in fines"
15:19:00<maxigas>can i help in some way?
15:20:00<SketchCow>Yes, you can shut the hell up.
15:22:00<SketchCow>(fees/fines, wasn't sure of the best term)
15:22:00<balrog_>a fee is something you pay voluntarily, so yeah. it just sounds a bit weird
15:23:00<SketchCow>I will not re-iterate to you the circumstances in which I wrote the verbiage.
15:24:00<SketchCow>http://archive.org/stream/1975PredictionsInAwakeMagazineByJehovahsWitnesses/1975_Predictions_Awake_Magazine#page/n11/mode/2up
15:24:00<SketchCow>Finally, someone speaks the truth
15:25:00<alard>maxigas: Perhaps with other ArchiveTeam projects, later, or think of a project of your own. Are you running a warrior yet?
15:25:00<mistym>That title design is delightful.
15:26:00<SketchCow>It is.
15:26:00<alard>That's a question I've often asked myself, so I'm glad to see it answered.
15:26:00<SketchCow>As I mentioned last night, I put http://archive.org/details/magazine_rack_misc together, grabbing 100+ orphaned magazines and shoving them in, so it's this pretty crazy bin of magazines, superpamphlets, and screeds.
15:32:00<SketchCow>And as many of these items were sitting around in the collection for years, they have ridiculous download stats.
15:40:00<SketchCow>http://archive.org/details/thescreensavers - 87 episodes saved
15:40:00<SketchCow>Oh, so under "oh no, what the fuck", Myspace is starting to begin to get rid of old profiles.
15:40:00<SketchCow>Or, let people voluntarily upgrade to the new format.
15:43:00<Smiley>o_O
15:44:00<Smiley>my old myspace is so amusing, I linked it on facebook recently :<
16:29:00<SketchCow>I'm speaking in Germany first week of February.
16:29:00<SketchCow>I just found out they will have a pneumatic tube system active for the event between locations
16:29:00<SketchCow>And the orientation letter just let us know to expect capsules to slam into the room during our panels
16:30:00<mistym>Maybe the greatest distraction?
16:30:00<SketchCow>That's a good question.
16:30:00<SketchCow>It may be.
16:32:00<Smiley>o_O
16:44:00<kanzure>embedding metadata in pdfs https://groups.google.com/group/science-liberation-front/t/b73592f3606b9420
16:47:00<Smiley>nice.
16:47:00<Smiley>but whats wrong with md5 sums? :D
17:00:00<kanzure>Smiley: nothing, i think md5sum is a good idea
17:00:00<kanzure>but i am also of the opinion that people should be splicing supplemental material into the pdf
17:00:00<kanzure>chances are, if you don't include it in the .pdf itself, it's not going to be distributed when the paper is read/downloaded
17:01:00<Smiley>kanzure: a md5sum IS the pdf.
17:01:00<Smiley>Hence why it works so well :D
18:12:00<kanzure>On Tue, Jan 15, 2013 at 12:10 PM, Piotr Migdal <pmigdal@gmail.com> wrote:
18:12:00<kanzure>> "Zorrotero" ;))
18:12:00<kanzure>> (Silly remark: anyway, for the "guerrilla" Zotero, a good name is
18:14:00<X-Scale>Speaking of "guerrilla" ... http://www.1000manifestos.com/aaron-swartz-the-guerilla-open-access-manifesto/
18:22:00<kanzure>X-Scale: yes that was what the reference was to
18:25:00<kanzure>retroshare/jstor dump https://groups.google.com/group/science-liberation-front/t/9f6c865cfdb43382?hl=en_US
18:36:00<balrog_>kanzure: are you sure that isn't the collection of Philosophical Transactions of the Royal Society papers released by Greg Maxwell?
18:37:00<kanzure>balrog_: it seems to include other things, because it's retroshare
18:37:00<kanzure>balrog_: but yeah this is useless
19:02:00<SketchCow>Journal of Higher Education statement sent.
19:02:00<SketchCow>next: Forbes
19:06:00<kanzure>link?
19:06:00<X-Scale>balrog_: http://h33t.com/torrent/04934029/r-i-p-aaron-swartz-jstor-archive-35gb
19:06:00<X-Scale>vs http://thepiratebay.se/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro
19:07:00<X-Scale>Seems to be the same package
19:07:00<kanzure>man this distribution infrastructure sucks
19:09:00<X-Scale>one comment: "Brilliant upload and thoughtful rationale. Many thanks for this. For those who care, a fair amount of the non-PD portions of these journals can be found on rutracker; search for Royal Society."
19:10:00<balrog_>hah
19:10:00balrog_ takes a look
19:10:00<kanzure>or on libgen
19:17:00<SketchCow>Forbes Guy is Dulllllllllllllllllllllllllllllll
19:26:00<SketchCow>Dulllllll
19:26:00<SketchCow>He's gone now
19:26:00<SketchCow>sooooo dullllllll
19:58:00<SketchCow>http://chronicle.com/blogs/profhacker/civil-disobedience-the-aaron-swartz-memorial-jstor-liberator/45397
20:06:00<chronomex>"fair and balanced"
20:17:00<SketchCow>http://web.archive.org/web/*/http://archive.org is a real thing
20:19:00<ersi>We need to go deeper
20:19:00<SketchCow>http://media.tumblr.com/tumblr_li3y1guDbS1qbdtco.gif
20:20:00<ersi><3
20:23:00<ersi>Hmmmm, the new wayback machine doesn't redirect you to the links you click on. Like clicking on "Webmasters" on the oldest archived version, the address bar is still at archive.org/ then
20:23:00<Ymgve>http://web.archive.org/robots.txt
20:23:00<Ymgve>you can't go deeper
20:23:00<ersi>But at least the new wayback is super duper fast
20:23:00<ersi>Ymgve: Aww :<
20:24:00<Ymgve>but I wonder what /1 matches
20:24:00<Ymgve>oh wait, of course, 199x
20:25:00<ersi>http://web.archive.org/web/19980301000000/http://archive.org/get_archived.html
20:31:00<SketchCow>OK, so, seriously.
20:31:00<SketchCow>archiveteam.org closed wiki. That needs to stop.
20:31:00<SketchCow>Can people please help me find working, installable anti-spam measures?
20:31:00<SketchCow>And then we'll open it again.
20:31:00<Nemo_bis>Yes|
20:31:00<SketchCow>Let's fix this. Today.
20:32:00<Nemo_bis>I just did a lot of research.
20:32:00<Nemo_bis>https://www.mediawiki.org/wiki/Thread:Extension_talk:ConfirmEdit/Wikis_account_registration_tour
20:32:00<SketchCow>I hope your solution isn't "massive burlap sacks"
20:32:00<SketchCow>Because that's your solution to everything
20:32:00<Nemo_bis>My solution is copy Arch Wiki
20:33:00<Nemo_bis>SketchCow: did you receive the third one?
20:33:00<SketchCow>not yet
20:33:00<SketchCow>Dude, mail.
20:33:00<SketchCow>You sent it in a sack
20:33:00<Nemo_bis>That's the only authorised kind of sack btw. I had to fetch it with a bike+train ride 30 km away from home.
20:33:00<SketchCow>It might be on another ship
20:33:00<Nemo_bis>It comes by plane.
20:33:00<chronomex>giant canvas sack?
20:33:00<SketchCow>So basically you live on the set of the Godfather's flashbacks
20:33:00<Nemo_bis>So they said.
20:33:00<chronomex>nice.
20:33:00<Nemo_bis>No, it's plastic.
20:33:00<chronomex>oh
20:33:00<chronomex>:(
20:33:00<S[h]O[r]T>What is the output of "date -u +%V`uname`|sha256sum|sed 's/\W//g'"?
20:34:00<S[h]O[r]T>lol
20:35:00<balrog_>979aa183120fc18c292abab0ab967e5bcf132b375f7f8f3283637e6bb10996bb
20:49:00<DFJustin>The Archive will provide historians, researchers, scholars, and others access to this vast collection of data (reaching ten terabytes), and ensure the longevity of this information.
20:50:00<DFJustin>so that's three orders of magnitude in 16 years
20:52:00DFJustin awaits the 10 exabyte party
20:54:00<Nemo_bis>SketchCow: was the advice enough?
20:55:00<Nemo_bis>In short, just use https://www.mediawiki.org/wiki/Extension:ConfirmEdit#QuestyCaptcha
20:55:00<Nemo_bis>I'm sure you can find all the special witty questions one might ever need
20:57:00<Coderjoe>I would not suggest the uname bit, because that would differ between OSes (obviously)
21:00:00<Nemo_bis>Well one does not really *have to* copy Arch Wiki. :D
21:00:00<Nemo_bis>It was just funny
21:01:00<Nemo_bis>Of course Arch Wiki must be different from all others. ;)
21:01:00<Coderjoe>and stupid
21:01:00<Nemo_bis>Why stupid?
21:02:00<Coderjoe>what if I am a macos or bsd (or windows) user that happens to be trying out arch on a second machine and want to use my primary system for doing stuff on the wiki (or something)?
21:03:00<kanzure>did anyone archive the tweets of @tomjdolan
21:04:00<balrog_>kanzure: pull it from google cache
21:04:00<SketchCow>http://i.imgur.com/o92sl.gif
21:04:00<balrog_>though that's not the whole thing. I've seen it across news sites though
21:05:00<SketchCow>Will do, Nemo_bis
21:05:00<kanzure>buzzfeed might have it?
21:06:00<Nemo_bis>Coderjoe: sure, but most people going there surely have linux.
21:07:00<Nemo_bis>Coderjoe: trust me, it's not worse than SpongeBob questions in German. *That* was impossible and unfair. At least date has a man page.
21:18:00<S[h]O[r]T>what if it gave you the id of a textfiles tweet and then you had to go copy/paste it :P
21:19:00<SketchCow>I really like questycaptcha
21:20:00<SketchCow>http://www.nextlevelofnews.com/2013/01/prosecutors-husband-tomjdolan-aaron-swartz-was-offered-a-6-month-deal-by-buzzfeed.html
21:20:00<SketchCow>http://topsy.com/twitter/tomjdolan <---- grab that immediately.
21:27:00<SketchCow>http://archive.is/ is real
21:28:00<ersi>indeed, and it's great.
21:28:00<ersi>I wonder who's behind it
21:29:00<alard>http://blog.archive.is/post/38139265209/what-will-happen-to-the-data-when-you-shut-the-site
21:30:00<ersi>10TB?! :O
21:30:00<SketchCow>hahaha
21:30:00<SketchCow>Wow, seems kind of sketchy, huh
21:30:00<tef>he saves screenshots of the sites too.
21:30:00<tef>probably as png.
21:34:00<ersi>probably means he's using a browser, right?
21:36:00<alard>http://archive.is/clqhG
21:38:00<ersi>ah, coolio
21:38:00<alard>It doesn't seem to have any plugins. http://archive.is/FfdwK
21:40:00<ersi>Wonder if it'll do flash/Java applets any way though
21:41:00<kanzure>phantomjs used to do flash :/
21:41:00<kanzure>until they removed the plugin
21:53:00<bsmith094>anyone archive aaron's blog yet?
22:00:00<adamc[a]>bsmith094, pretty sure I saw a couple copies on archive.org
22:00:00<SketchCow>https://twitter.com/textfiles/status/291303908205268994
22:01:00<DFJustin>https://archive.org/details/www.aaronsw.com-20130112-mirror
22:20:00<SketchCow>OK, here we go.
22:27:00<SketchCow>Archive Team Wiki is BACK TO NEW USER CREATION
22:27:00<beardicus>should i be worried that i'm downloading lots of wikipedia in the yahooblog-grab ?
22:28:00<beardicus>seems like it's grabbing urls such as "index.php?title=Special:WhatLinksHere&target=File:Jackie+Chan+2002.jpg.html" when maybe it should just be grabbing hotlinked images?
22:29:00<alard>I find the Yahooblogs quite annoying. They're slow, messy, and are they even disappearing?
22:30:00<ersi>There seem to be a few people asking about that though (yahooblogs grab -> wikipedia)
22:31:00<adamcaudi>I just kill the ones that do that - they never seem to make it back out
22:33:00<alard>If someone wants to add wikipedia to the reject-regex, here it is: https://github.com/ArchiveTeam/yahooblog-grab/blob/master/pipeline.py#L87-L88
22:35:00<alard>At the moment it tries to download every url that ends with an image extension, including these Wikipedia urls.
22:37:00<beardicus>alard, these all end in .html though... does the "$" in the regex not mean "end of line"?
22:37:00<beardicus>in the accept-regex, that is ^^^^^
22:37:00<alard>No, the url ends in .jpg, Wget adds the .html later (the --adjust-extension option).
22:38:00<ersi>Ah
22:38:00<ersi>makes perfect sense, or well - it makes sense
22:38:00<chronomex>to someone it makes sense
22:38:00<chronomex>that's what is important
22:38:00<alard>The --adjust-extension option is handy if you have folders that aren't folders, for the sites where you download a page /a and then later find /a/b and discover that /a should have been a folder.
22:39:00<alard>If that makes any sense. :)
22:39:00<SketchCow>Who here: 1. Has been around forever 2. I know you 3. Has time to do a VERY boring Wiki thing.
22:39:00<chronomex>not I
22:40:00<beardicus>aha. thanks alard. i've got some hackerspace time to spend in a few hours... maybe i'll figure out the wizardry needed to blacklist wikipedia images or figure out why it seems to never stop slurping.
22:42:00<alard>beardicus: That would be nice. It's a pity that Wget's --span-hosts option doesn't distinguish between urls from image tags and non-image urls.
22:43:00<beardicus>seems more a shame that an html page would be served up to a .jpg request :)
22:44:00<alard>Perhaps you can convince the mediawiki-people to change their software. :)
22:45:00<beardicus>looks like keeping upload.wikimedia would get us the actual jpgs, whereas vi.wikipedia could get the boot.
22:48:00<ersi>So, either fix/change a regexp (in yahooblag-grab) or fix/change another regexp (mediawiki is a regexp of regexps)
22:48:00<ersi>:D
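A possible tweak along the lines beardicus suggests, sketched as extra reject patterns; the names and regexes here are guesses for illustration, not the project's actual code, so check pipeline.py before copying anything:

```python
# Hypothetical additions to the yahooblog-grab reject list: skip MediaWiki
# page URLs on *.wikipedia.org while leaving upload.wikimedia.org alone so
# hotlinked image files still get fetched. Illustrative names and patterns.
EXTRA_REJECTS = [
    r"[a-z]+\.wikipedia\.org/.*[?&]title=Special:",  # Special: pages
    r"[a-z]+\.wikipedia\.org/w/index\.php",          # any index.php hit
    r"[a-z]+\.wikipedia\.org/wiki/",                 # article pages
]
```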
23:37:00<t4rx>is there a way to run ATW in a non-virtual environment? my distro is too cool to offer stable and reliable virtualization options...
23:42:00<t4rx>oh, found a repo at github, guess i can set up non-virtual environment for it.
23:42:00<t4rx>s/a repo/the repo
23:45:00<balrog_>t4rx: pip install seesaw
23:45:00<balrog_>then clone the repo for the project that you want to participate in
23:45:00<balrog_>then run the get wget lua script which will download and compile wget-lue
23:45:00<balrog_>lua*
23:46:00<balrog_>then run seesaw as follows: run-pipeline ./pipeline.py username
23:46:00<balrog_>run-pipeline --help if you want info on parameters
23:50:00<Nemo_bis>http://aubreymcfato.com/2013/01/15/how-to-exploit-academics/
23:59:00<kanzure>why not just get a mole into elsevier and dump the databases