#archiveteam<efnet> log for 2012-10-17

Home Search Previous day Next day

03:13:00	<dragondon>	Greetings all. Getting a curl error "curl: (22) The requested URL returned error: 500". whole error here http://pastebin.com/scMx6bkf
03:15:00	<dragondon>	seeing a bunch of those but it seems that eventually the upload does happen.
03:16:00	<underscor>	dragondon: one (or more) of the upload endpoints is full
03:16:00	<underscor>	I'll be fixing it asap
03:16:00	<underscor>	Uploads should still work eventualyl
03:16:00	<dragondon>	ok, cool.
03:16:00	<underscor>	as they're roundrobined between boxes in the cluster
03:16:00	<dragondon>	yeah, that's what I did see from others.
03:16:00	<underscor>	eventually*
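underscor's explanation can be illustrated with a small retry loop. A full box answers 500, but uploads are round-robined across the cluster, so they get through eventually. curl reports exit code 22 when it is run with `--fail` and the server returns an HTTP status of 400 or above. This is a hypothetical sketch, not the warrior's actual upload code. The endpoint names and the `upload` callable are invented. The 30-second delay is borrowed from the "Retrying CurlUpload ... after 30 seconds" message that appears later in this log.

```python
import itertools
import time
from typing import Callable, Iterable


def upload_round_robin(
    item: str,
    endpoints: Iterable[str],
    upload: Callable[[str, str], int],
    delay: float = 30.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Offer `item` to each endpoint in turn until one accepts it.

    `upload(endpoint, item)` is a hypothetical callable returning an HTTP
    status code. Any non-2xx answer (e.g. 500 from a full box) moves on to
    the next endpoint after `delay` seconds. Returns the accepting endpoint.
    """
    boxes = list(endpoints)
    if not boxes:
        raise ValueError("need at least one upload endpoint")
    ring = itertools.cycle(boxes)
    for attempt in range(1, max_attempts + 1):
        endpoint = next(ring)
        status = upload(endpoint, item)
        if 200 <= status < 300:
            return endpoint
        if attempt < max_attempts:
            sleep(delay)  # wait before trying the next box in the ring
    raise RuntimeError(f"{item}: no endpoint accepted it after {max_attempts} attempts")


if __name__ == "__main__":
    full = {"box-a"}  # hypothetical: this box is out of disk space

    def fake_upload(endpoint: str, item: str) -> int:
        return 500 if endpoint in full else 200

    print(upload_round_robin("gourmetsexpress", ["box-a", "box-b"], fake_upload,
                             sleep=lambda s: None))  # box-b
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting.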
03:18:00	<nintendud>	bah. VirtualBox won't start up my warrior VM, or a newly downloaded warrior VM.
03:20:00	<Cameron_D>	What error?
03:21:00	<nintendud>	NS_ERROR_FAILURE
03:21:00	<nintendud>	"VBoxManage: error: The virtual machine 'archiveteam-warrior-2_1' has terminated unexpectedly during startup with exit code 0"
03:21:00	<flaushy>	can i play around with the timeouts?
03:22:00	<nintendud>	Doesn't seem to be particularly obvious. I guess this error covers a wide range of possible issues.
03:22:00	<Sue>	nintendud: are you trying to run without X
03:22:00	<nintendud>	Sue: yup
03:22:00	<nintendud>	The warrior worked before.
03:22:00	<Sue>	VBoxHeadless
03:22:00	<nintendud>	Oh. Crpa.
03:23:00	<nintendud>	Crap*
03:23:00	<Sue>	VBoxManage dies without X unless you specify headless or start with VBoxHeadless
03:23:00	<nintendud>	I forgot that was the command.
03:23:00	<Sue>	don't forget to start with &
03:23:00	<Sue>	i had that same problem at first
03:24:00	<nintendud>	yeah, I have it running in screen
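Sue's fix, sketched in Python for anyone scripting the warrior on a machine without X. `VBoxManage startvm <name> --type headless` and `VBoxHeadless --startvm <name>` are real VirtualBox commands. The helper functions below are hypothetical. `VBoxManage startvm` returns once the VM has been launched, while `VBoxHeadless` stays in the foreground. That is why Sue suggests `&`, or `screen` as nintendud used. Launching with `start_new_session=True` is roughly the `&` plus `disown` combination SmileyG describes later in the log (POSIX only).

```python
import subprocess
from typing import List


def headless_start_command(vm_name: str, use_vboxheadless: bool = False) -> List[str]:
    """Build a command that starts a VirtualBox VM without needing X."""
    if use_vboxheadless:
        # VBoxHeadless hosts the VM itself and keeps running in the foreground.
        return ["VBoxHeadless", "--startvm", vm_name]
    # VBoxManage launches a separate headless VM process and then returns.
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def start_detached(cmd: List[str]) -> subprocess.Popen:
    """Launch cmd without waiting for it, in its own session.

    Roughly `cmd &` followed by `disown`: the call returns immediately, and
    because the child is not in the terminal's session, closing the terminal
    does not send it SIGHUP.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

Usage: `start_detached(headless_start_command("archiveteam-warrior-2_1", use_vboxheadless=True))`.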
03:25:00	<underscor>	Why not just run the pipeline outside?
03:25:00	<nintendud>	Sue: thanks for the help. herp derp on my end.
03:26:00	<Sue>	underscor: fun; nintendud: np
05:09:00	<dragondon>	Umm, "No item received. Retrying after 30 seconds..." and "Retrying CurlUpload for Item gourmetsexpress after 30 seconds..." are all I am getting now
05:10:00	<dragondon>	4 workers are getting "No item" and two are "retrying"
05:13:00	<dragondon>	restarted VM, now all are "retrying"
05:16:00	<dragondon>	that was for BT Internet homepages
05:16:00	<dragondon>	switched back to Webshots and downloading data now just fine
05:33:00	<NovaKing>	apparently some servers full
05:33:00	<NovaKing>	and they working on it
07:37:00	<SmileyG>	[04:23:22] < Sue> don't forget to start with &
07:37:00	<SmileyG>	if you forget, ctrl z, then bg, then disown
08:44:00	<Nemo_bis>	anyone able to kill http://www.us.archive.org/log_show.php?task_id=124346847 here?
08:47:00	<alard>	There's a timeout 87457739 in there.
08:47:00	<alard>	(Is quite a long time, probably.)
08:53:00	<Cameron_D>	That is only 1012 days
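Cameron_D's figure checks out. The `timeout` argument is in seconds, and Python's standard `datetime.timedelta` does the conversion directly:

```python
from datetime import timedelta

# Value passed to `timeout` in the derive task quoted above, in seconds.
TIMEOUT_SECONDS = 87457739

limit = timedelta(seconds=TIMEOUT_SECONDS)
print(limit)                          # 1012 days, 5:48:59
print(round(limit.days / 365.25, 2))  # 2.77 (years, roughly)
```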
08:55:00	<SmileyG>	lol
09:27:00	<Nemo_bis>	it wasn't enough last time
09:27:00	<Nemo_bis>	[ PDT: 2012-10-16 16:16:59 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
09:28:00	<Nemo_bis>	sorry [ PDT: 2012-09-28 04:45:24 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
09:29:00	<Nemo_bis>	[...] nice /usr/local/petabox/deriver/derive.php /var/tmp/autoclean/derive/EB1911WMF [...] failed with exit code: 9 [...] TASK FAILED AT UTC: 2012-10-02 18:11:37
09:30:00	<Nemo_bis>	underscor?
09:32:00	<underscor>	Nemo_bis: Why do you need it killed?
09:38:00	<Nemo_bis>	underscor: because it will fail surely
09:39:00	<Nemo_bis>	and I want to update the images split in volumes now, so that it will work
09:39:00	<Nemo_bis>	*upload
09:41:00	<underscor>	ah
09:41:00	<underscor>	okay
09:41:00	<underscor>	well, I'll kill it :P
09:41:00	<Nemo_bis>	underscor: thanks
09:42:00	<underscor>	Interrupting task for task_id: 124346847 1 derive.php SERVER iw600709.us.archive.org USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 5646 82.8 1.1 750384 414324 ? RN Oct16 517:23 python /usr/local/petabox/sw/books/ol_search/solr_post.py EB1911WMF /var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml KILLING 5646
09:42:00	<underscor>	cc Nemo_bis
09:47:00	<Nemo_bis>	underscor: thanks
09:49:00	<Cameron_D>	?tf2
09:49:00	<Cameron_D>	oops, wrong channel
09:52:00	<Nemo_bis>	hmpf only 650 kB/s upload even from a USA server
10:55:00	<godane>	looks like theblaze tv did some bad audio sync for episode 2012-10-15
10:57:00	<godane>	what is funny is that the real news and wilkow on-demand works fine
11:10:00	<godane>	i found a way to sync it
11:10:00	<godane>	it's off by 1.5 seconds
11:48:00	<SmileyG>	ffmpeg will resync stuff...
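One common way to do what SmileyG suggests is to open the same file twice and shift one copy with ffmpeg's `-itsoffset`. That option applies to the input that follows it. Then take video from one input and audio from the other, stream-copying both. This is a sketch under assumptions. The log does not say whether the audio leads or lags, so the sign convention is a choice made here. The file names are hypothetical, and the exact command godane used is not known.

```python
from typing import List


def resync_audio_cmd(src: str, dst: str, audio_delay_s: float) -> List[str]:
    """Build an ffmpeg command that shifts audio relative to video.

    Positive audio_delay_s delays the audio; negative delays the video
    instead, so the offset passed to -itsoffset is never negative.
    Streams are copied (-c copy), so nothing is re-encoded.
    """
    shift = abs(audio_delay_s)
    if audio_delay_s >= 0:
        maps = ["-map", "0:v", "-map", "1:a"]  # input 1 (shifted) supplies audio
    else:
        maps = ["-map", "1:v", "-map", "0:a"]  # input 1 (shifted) supplies video
    return ["ffmpeg", "-i", src, "-itsoffset", f"{shift}", "-i", src, *maps, "-c", "copy", dst]


if __name__ == "__main__":
    print(" ".join(resync_audio_cmd("episode.mp4", "episode-synced.mp4", 1.5)))
```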
12:36:00	<Cameron_D>	Hmm, IGN is up for sale and there is some talk of them possibly closing the boards (85.8 million posts), so that might be something to keep an eye on
12:38:00	<balrog_>	:O
12:39:00	<balrog_>	what's a good setting for number of instances when you have 50mbit/25mbit internet speeds?
12:44:00	<Cameron_D>	I'm assuming that is for Webshots, in which case I'm not really sure, I haven't really been able to see how much bandwidth it uses
12:46:00	<alard>	It also depends on the distance to the webshots servers.
12:46:00	<joepie91>	1mbit per thread generally
12:46:00	<balrog_>	yeah, for webshots
12:46:00	<joepie91>	appro
12:47:00	<joepie91>	aprox *
12:47:00	<joepie91>	..
12:47:00	<joepie91>	approx ***
12:47:00	<joepie91>	at least, in my experience
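joepie91's rule of thumb, applied to balrog_'s 50/25 Mbit line, as a hypothetical helper. One assumption is added that is not stated in the log. Whatever a Webshots instance downloads is uploaded again afterwards, so the slower direction is treated as the bottleneck. As alard notes, distance to the Webshots servers also matters, so treat the result as a starting point to tune from.

```python
import math


def suggested_instances(down_mbit: float, up_mbit: float, per_instance_mbit: float = 1.0) -> int:
    """Estimate concurrency from the ~1 Mbit-per-thread rule of thumb.

    Assumes the slower of the two directions limits throughput, because
    downloaded data is later re-uploaded. Always suggests at least one.
    """
    bottleneck = min(down_mbit, up_mbit)
    return max(1, math.floor(bottleneck / per_instance_mbit))


if __name__ == "__main__":
    print(suggested_instances(50, 25))  # 25
```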
12:49:00	<balrog_>	alard: also how do you stop it again? STOP file in the same dir as the pipeline.py?
12:49:00	<balrog_>	because it seems to be ignoring it
12:49:00	<alard>	It finishes the current jobs first.
12:49:00	<balrog_>	yeah, but it seems to keep doing jobs
12:49:00	<alard>	Can you use the web interface?
12:49:00	<balrog_>	what port does that run on?
12:49:00	<alard>	8001
12:52:00	<Cameron_D>	And I think the STOP file needs to be in the directory you launched it from, not the pipeline directory
12:52:00	<balrog_>	ahh.
12:53:00	<balrog_>	that ... sounds like a possible bug :P
12:56:00	<alard>	That could be.
12:56:00	<alard>	curl -d "" http://127.0.0.1:8001/api/stop works too.
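The two ways of stopping the pipeline mentioned above can be written as Python helpers. One posts an empty body to `/api/stop` on port 8001, like alard's `curl -d ""`. The other creates a `STOP` file in the directory the pipeline was launched from, per Cameron_D's recollection. The function names are hypothetical. Either way, the pipeline is expected to finish its current jobs before exiting.

```python
import urllib.request
from pathlib import Path


def request_stop(base_url: str = "http://127.0.0.1:8001", timeout: float = 10.0) -> int:
    """POST an empty body to <base_url>/api/stop and return the HTTP status.

    Equivalent in spirit to: curl -d "" http://127.0.0.1:8001/api/stop
    urllib raises HTTPError for 4xx/5xx answers.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/api/stop", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def create_stop_file(launch_dir: Path) -> Path:
    """Create an empty STOP file in the directory the pipeline was started from."""
    stop = Path(launch_dir) / "STOP"
    stop.touch()
    return stop
```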
12:56:00	<_case>	if anyone feels like pondering a wget question re: hotlinked page requisites… http://stackoverflow.com/questions/12934528/recursive-wget-with-hotlinked-requisites
12:58:00	<alard>	Have you tried --no-parent? (Just a guess.)
13:03:00	<balrog_>	yeah, and now I'm killing disk IO
13:23:00	<dragondon>	is it safe to go back working on the BT project yet?
13:27:00	<Cameron_D>	Is there anything left to do?
13:40:00	<alard>	No, BT is done, that is: we need more usernames.
13:41:00	<dragondon>	ok, stopped webshots project, started BT :)
13:42:00	<alard>	dragondon: There's nothing to do there. :)
13:42:00	<dragondon>	huh? all done?
13:42:00	<alard>	We've worked through our list of usernames.
13:43:00	<alard>	There might be users that are not on our list, but we'll have to discover those usernames first. That isn't done by the warrior.
13:46:00	<dragondon>	oh, that's what you meant. I thought you meant you needed more usernames completed.
14:00:00	<SmileyG>	no work for happy workers :(
18:39:00	<alard>	Webshots numbers: S[h]O[r]T has uploaded 100,000 items; we've uploaded 20,000 GB. Hurray!
18:51:00	<[1]deathy>	I would give S[h]O[r]T the internet as a prize, but apparently he can download it by himself ...
18:59:00	<joepie91>	haha
21:03:00	<balrog_>	ugh my warrior vm crashed
22:18:00	<SketchCow>	We need more usernames. It can't be so many.
22:25:00	<arkhive>	http://news.cnet.com/8301-1023_3-57533820-93/news-corp-puts-ign-entertainment-up-for-auction/
22:25:00	<arkhive>	Probably been linked here.
22:26:00	<arkhive>	The new IGN network will probably shutter/close multiple sites
22:28:00	<arkhive>	ah. just read above.
22:37:00	<SketchCow>	https://docs.google.com/a/textfiles.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync#gid=0
22:37:00	<SketchCow>	Watch as I do final signoff!
22:37:00	<SketchCow>	Anything with the deep blue on the left is going into wayback!
22:38:00	<joepie91>	SketchCow: what is a MegaWARC?
22:39:00	<SketchCow>	A MegaWARC is a concatenation of warc files, allowing us to put thousands of individual warc grabs as one file.
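SketchCow's definition can be made concrete with a small sketch. It assumes the inputs are gzip-compressed WARCs (`.warc.gz`). A gzip stream may consist of several members back to back, so joining such files byte-for-byte still gives a file that decompresses as one stream. The `(name, offset, size)` index is a hypothetical addition to show how one grab can be found again. It is not a description of the actual MegaWARC tool's output format.

```python
import gzip
import shutil
from pathlib import Path
from typing import List, Tuple


def concat_warcs(parts: List[Path], out_path: Path) -> List[Tuple[str, int, int]]:
    """Concatenate .warc.gz files byte-for-byte into one file.

    Returns (name, byte offset, byte size) for each input, so a single
    original grab can later be sliced out and decompressed on its own.
    """
    index = []
    offset = 0
    with open(out_path, "wb") as out:
        for part in parts:
            size = part.stat().st_size
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # copy the gzip member(s) unchanged
            index.append((part.name, offset, size))
            offset += size
    return index


def read_part(mega_path: Path, offset: int, size: int) -> bytes:
    """Decompress one original .warc.gz back out of the concatenated file."""
    with open(mega_path, "rb") as f:
        f.seek(offset)
        return gzip.decompress(f.read(size))
```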
22:39:00	<joepie91>	I see
22:45:00	<arkhive>	SketchCow: I still have MobileMe files that have not been uploaded yet. Problem with my hard drive. I cloned it to another and will finish recovering the files as soon as I can.
22:47:00	<arkhive>	Just letting you know so you don't put up an incomplete copy on WayBack
22:49:00	<arkhive>	I should be able to recover just about all of the files
22:50:00	<arkhive>	But a few might be impossible to retrieve.
22:52:00	<arkhive>	So I apologize in advance for my screw up. :)
22:59:00	<DFJustin>	https://archive.org/details/archiveteam-qaudio-archive-1 etc. not going in?
23:03:00	<SketchCow>	Sure it is.
23:04:00	<SketchCow>	I'm sure some stuff has escaped my gaze, hence my asking people to look over my shoulder at the google doc.
23:04:00	<DFJustin>	also 2-7
23:07:00	<SketchCow>	Right.
23:07:00	<SketchCow>	No, on it.
23:07:00	<SketchCow>	They're all fine, though, they already were working.
23:07:00	<SketchCow>	Now I'm just bundling them.
23:08:00	<SketchCow>	http://archive.org/details/archiveteam-qaudio-archive will have it soon.
23:15:00	<SketchCow>	http://archive.org/details/archiveteam-qaudio-archive now fixed.
23:55:00	<godane>	thanks for putting my isos in the linux format collection
