03:13:00<dragondon>Greetings all. Getting a curl error "curl: (22) The requested URL returned error: 500". whole error here http://pastebin.com/scMx6bkf
03:15:00<dragondon>seeing a bunch of those but it seems that eventually the upload does happen.
03:16:00<underscor>dragondon: one (or more) of the upload endpoints is full
03:16:00<underscor>I'll be fixing it asap
03:16:00<underscor>Uploads should still work eventualyl
03:16:00<dragondon>ok, cool.
03:16:00<underscor>as they're roundrobined between boxes in the cluster
03:16:00<dragondon>yeah, that's what I did see from others.
03:16:00<underscor>eventually*
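As a rough illustration of why uploads still get through when one box is full, the sketch below rotates through a list of endpoints and retries after a pause. It is not the warrior's actual upload code: the endpoint names, the `try_upload` callable, and the attempt limit are assumptions for illustration only. The 30-second wait mirrors the "Retrying ... after 30 seconds" messages quoted later in this log.

```python
import itertools
import time


def upload_with_round_robin(item, endpoints, try_upload,
                            max_attempts=10, wait_seconds=30,
                            sleep=time.sleep):
    """Try endpoints in rotation until one of them accepts the item.

    try_upload(endpoint, item) is a hypothetical callable that returns
    True on success and False when the endpoint refuses (e.g. HTTP 500).
    Returns (endpoint, attempt_number) for the first successful upload.
    """
    rotation = itertools.cycle(endpoints)
    for attempt in range(1, max_attempts + 1):
        endpoint = next(rotation)
        if try_upload(endpoint, item):
            return endpoint, attempt
        if attempt < max_attempts:
            # Wait before moving on to the next box in the rotation.
            sleep(wait_seconds)
    raise RuntimeError("no endpoint accepted %r after %d attempts"
                       % (item, max_attempts))
```

With three boxes of which two are full, the third attempt lands on the healthy one, which matches the "fails a few times, then succeeds" behaviour described above.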
03:18:00<nintendud>bah. VirtualBox won't start up my warrior VM, or a newly downloaded warrior VM.
03:20:00<Cameron_D>What error?
03:21:00<nintendud>NS_ERROR_FAILURE
03:21:00<nintendud>"VBoxManage: error: The virtual machine 'archiveteam-warrior-2_1' has terminated unexpectedly during startup with exit code 0"
03:21:00<flaushy>can i play around with the timeouts?
03:22:00<nintendud>Doesn't seem to be particularly obvious. I guess this error covers a wide range of possible issues.
03:22:00<Sue>nintendud: are you trying to run without X
03:22:00<nintendud>Sue: yup
03:22:00<nintendud>The warrior worked before.
03:22:00<Sue>VBoxHeadless
03:22:00<nintendud>Oh. Crpa.
03:23:00<nintendud>Crap*
03:23:00<Sue>VBoxManage dies without X unless you specify headless or start with VBoxHeadless
03:23:00<nintendud>I forgot that was the command.
03:23:00<Sue>don't forget to start with &
03:23:00<Sue>i had that same problem at first
03:24:00<nintendud>yeah, I have it running in screen
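For reference, the headless start Sue describes could be scripted roughly as follows. This is a sketch, not an official recipe: the helper names are made up, the VM name is taken from the error message above, and `start_new_session=True` is used here as a stand-in for backgrounding the process with `&` (or running it inside screen).

```python
import subprocess


def headless_command(vm_name, use_vboxmanage=False):
    """Build a command line that starts a VirtualBox VM without X."""
    if use_vboxmanage:
        # VBoxManage needs the headless type spelled out explicitly.
        return ["VBoxManage", "startvm", vm_name, "--type", "headless"]
    return ["VBoxHeadless", "--startvm", vm_name]


def start_headless(vm_name):
    """Launch the VM detached from the current terminal session."""
    return subprocess.Popen(
        headless_command(vm_name),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # keeps running after the shell exits
    )

# Example (requires VirtualBox to be installed):
# start_headless("archiveteam-warrior-2_1")
```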
03:25:00<underscor>Why not just run the pipeline outside?
03:25:00<nintendud>Sue: thanks for the help. herp derp on my end.
03:26:00<Sue>underscor: fun; nintendud: np
05:09:00<dragondon>Umm, "No item received. Retrying after 30 seconds..." and "Retrying CurlUpload for Item gourmetsexpress after 30 seconds..." are all I am getting now
05:10:00<dragondon>4 workers are getting "No item" and two are "retrying"
05:13:00<dragondon>restarted VM, now all are "retrying"
05:16:00<dragondon>that was for BT Internet homepages
05:16:00<dragondon>switched back to Webshots and downloading data now just fine
05:33:00<NovaKing>apparently some servers full
05:33:00<NovaKing>and they're working on it
07:37:00<SmileyG>[04:23:22] < Sue> don't forget to start with &
07:37:00<SmileyG>if you forget, ctrl z, then bg, then disown
08:44:00<Nemo_bis>anyone able to kill http://www.us.archive.org/log_show.php?task_id=124346847 here?
08:47:00<alard>There's a timeout 87457739 in there.
08:47:00<alard>(Is quite a long time, probably.)
08:53:00<Cameron_D>That is only 1012 days
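The 1012-day figure can be checked with a couple of lines, assuming the bare number passed to `timeout` is in seconds (the default unit for GNU `timeout`):

```python
from datetime import timedelta

timeout_seconds = 87457739  # value from the task log above

duration = timedelta(seconds=timeout_seconds)
print(duration.days)  # whole days in the timeout: 1012
print(duration)       # the same duration as days plus h:mm:ss
```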
08:55:00<SmileyG>lol
09:27:00<Nemo_bis>it wasn't enough last time
09:27:00<Nemo_bis>[ PDT: 2012-10-16 16:16:59 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
09:28:00<Nemo_bis>sorry [ PDT: 2012-09-28 04:45:24 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml'
09:29:00<Nemo_bis>[...] nice /usr/local/petabox/deriver/derive.php /var/tmp/autoclean/derive/EB1911WMF [...] failed with exit code: 9 [...] TASK FAILED AT UTC: 2012-10-02 18:11:37
09:30:00<Nemo_bis>underscor?
09:32:00<underscor>Nemo_bis: Why do you need it killed?
09:38:00<Nemo_bis>underscor: because it will fail surely
09:39:00<Nemo_bis>and I want to update the images split in volumes now, so that it will work
09:39:00<Nemo_bis>*upload
09:41:00<underscor>ah
09:41:00<underscor>okay
09:41:00<underscor>well, I'll kill it :P
09:41:00<Nemo_bis>underscor: thanks
09:42:00<underscor>Interrupting task for task_id: 124346847 1 derive.php SERVER iw600709.us.archive.org USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 5646 82.8 1.1 750384 414324 ? RN Oct16 517:23 python /usr/local/petabox/sw/books/ol_search/solr_post.py EB1911WMF /var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml KILLING 5646
09:42:00<underscor>cc Nemo_bis
09:47:00<Nemo_bis>underscor: thanks
09:49:00<Cameron_D>?tf2
09:49:00<Cameron_D>oops, wrong channel
09:52:00<Nemo_bis>hmpf only 650 kB/s upload even from a USA server
10:55:00<godane>looks like theblaze tv did some bad audio sync for episode 2012-10-15
10:57:00<godane>what is funny is that the real news and wilkow on-demand works fine
11:10:00<godane>i found a way to sync it
11:10:00<godane>it's off by 1.5 seconds
11:48:00<SmileyG>ffmpeg will resync stuff...
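One common way to do such a resync with ffmpeg is to read the same file twice and shift the timestamps of the second input with `-itsoffset`. The sketch below only builds the command line. The file names are placeholders, and whether the audio needs to move later (positive value) or earlier (negative value) is not stated in this log, so the default of +1.5 is an assumption.

```python
def resync_command(source, output, audio_delay_seconds=1.5):
    """Build an ffmpeg command that shifts the audio track in time.

    Video is taken from the first input, audio from the second input,
    whose timestamps are offset by audio_delay_seconds. Streams are
    copied without re-encoding.
    """
    return [
        "ffmpeg",
        "-i", source,                              # input 0: video
        "-itsoffset", str(audio_delay_seconds),    # applies to next input
        "-i", source,                              # input 1: audio
        "-map", "0:v", "-map", "1:a",
        "-c", "copy",
        output,
    ]

# Example with hypothetical file names:
# subprocess.run(resync_command("episode.mp4", "episode-synced.mp4"), check=True)
```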
12:36:00<Cameron_D>Hmm, IGN is up for sale and there is some talk of them possibly closing the boards (85.8 million posts), so that might be something to keep an eye on
12:38:00<balrog_>:O
12:39:00<balrog_>what's a good setting for number of instances when you have 50mbit/25mbit internet speeds?
12:44:00<Cameron_D>I'm assuming that is for Webshots, in which case I'm not really sure, I haven't really been able to see how much bandwidth it uses
12:46:00<alard>It also depends on the distance to the webshots servers.
12:46:00<joepie91>1mbit per thread generally
12:46:00<balrog_>yeah, for webshots
12:46:00<joepie91>appro
12:47:00<joepie91>aprox *
12:47:00<joepie91>..
12:47:00<joepie91>approx ***
12:47:00<joepie91>at least, in my experience
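Taking joepie91's rule of thumb at face value, a back-of-the-envelope estimate might look like this. Using the slower of the two directions as the limit is an assumption made here, not something stated in the channel, and as alard notes the real figure also depends on distance to the Webshots servers, so treat the result as an upper bound to experiment from.

```python
def estimate_instances(down_mbit, up_mbit, mbit_per_thread=1.0):
    """Rough upper bound on concurrent instances for a connection.

    Assumes each instance needs about mbit_per_thread in whichever
    direction is slower, since items are downloaded and then uploaded.
    """
    limit = min(down_mbit, up_mbit)
    return max(1, int(limit // mbit_per_thread))


print(estimate_instances(50, 25))  # balrog_'s 50/25 line gives 25
```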
12:49:00<balrog_>alard: also how do you stop it again? STOP file in the same dir as the pipeline.py?
12:49:00<balrog_>because it seems to be ignoring it
12:49:00<alard>It finishes the current jobs first.
12:49:00<balrog_>yeah, but it seems to keep doing jobs
12:49:00<alard>Can you use the web interface?
12:49:00<balrog_>what port does that run on?
12:49:00<alard>8001
12:52:00<Cameron_D>And I think the STOP file needs to be in the directory you launched it from, not the pipeline directory
12:52:00<balrog_>ahh.
12:53:00<balrog_>that ... sounds like a possible bug :P
12:56:00<alard>That could be.
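The confusion is easy to reproduce: a relative path such as "STOP" resolves against the directory the process was launched from, not the directory holding pipeline.py. The sketch below is not the actual seesaw code. It is a hypothetical check that looks in both places, with made-up function names.

```python
import os


def stop_file_candidates(pipeline_path, launch_dir=None):
    """Return the two places a STOP file might reasonably be looked for."""
    launch_dir = launch_dir or os.getcwd()
    pipeline_dir = os.path.dirname(os.path.abspath(pipeline_path))
    return {
        "launch_dir": os.path.join(launch_dir, "STOP"),
        "pipeline_dir": os.path.join(pipeline_dir, "STOP"),
    }


def stop_requested(pipeline_path, launch_dir=None):
    """True if a STOP file exists in either candidate location."""
    candidates = stop_file_candidates(pipeline_path, launch_dir)
    return any(os.path.exists(path) for path in candidates.values())
```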
12:56:00<alard>curl -d "" http://127.0.0.1:8001/api/stop works too.
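The same request can be sent without curl. This sketch uses only the standard library. The host, port, and `/api/stop` path come from alard's command, while the function names and the timeout are assumptions.

```python
import urllib.request


def build_stop_request(host="127.0.0.1", port=8001):
    """Build an empty-body POST to the stop endpoint, like curl -d ""."""
    url = "http://%s:%d/api/stop" % (host, port)
    return urllib.request.Request(url, data=b"", method="POST")


def send_stop(host="127.0.0.1", port=8001, timeout=10):
    """Send the stop request and return the HTTP status code."""
    request = build_stop_request(host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Example (needs a running warrior web interface on port 8001):
# send_stop()
```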
12:56:00<_case>if anyone feels like pondering a wget question re: hotlinked page requisites… http://stackoverflow.com/questions/12934528/recursive-wget-with-hotlinked-requisites
12:58:00<alard>Have you tried --no-parent? (Just a guess.)
13:03:00<balrog_>yeah, and now I'm killing disk IO
13:23:00<dragondon>is it safe to go back working on the BT project yet?
13:27:00<Cameron_D>Is there anything left to do?
13:40:00<alard>No, BT is done, that is: we need more usernames.
13:41:00<dragondon>ok, stopped webshots project, started BT :)
13:42:00<alard>dragondon: There's nothing to do there. :)
13:42:00<dragondon>huh? all done?
13:42:00<alard>We've worked through our list of usernames.
13:43:00<alard>There might be users that are not on our list, but we'll have to discover those usernames first. That isn't done by the warrior.
13:46:00<dragondon>oh, that's what you meant. I thought you meant you needed more usernames completed.
14:00:00<SmileyG>no work for happy workers :(
18:39:00<alard>Webshots numbers: S[h]O[r]T has uploaded 100,000 items; we've uploaded 20,000 GB. Hurray!
18:51:00<[1]deathy>I would give S[h]O[r]T the internet as a prize, but apparently he can download it by himself ...
18:59:00<joepie91>haha
21:03:00<balrog_>ugh my warrior vm crashed
22:18:00<SketchCow>We need more usernames. It can't be so many.
22:25:00<arkhive>http://news.cnet.com/8301-1023_3-57533820-93/news-corp-puts-ign-entertainment-up-for-auction/
22:25:00<arkhive>Probably been linked here.
22:26:00<arkhive>The new IGN network will probably shutter/close multiple sites
22:28:00<arkhive>ah. just read above.
22:37:00<SketchCow>https://docs.google.com/a/textfiles.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync#gid=0
22:37:00<SketchCow>Watch as I do final signoff!
22:37:00<SketchCow>Anything with the deep blue on the left is going into wayback!
22:38:00<joepie91>SketchCow: what is a MegaWARC?
22:39:00<SketchCow>A MegaWARC is a concatenation of warc files, allowing us to put thousands of individual warc grabs as one file.
22:39:00<joepie91>I see
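The idea of concatenation can be sketched in a few lines. This is not the actual megawarc tool, whose file layout may differ. It assumes each grab is an independently gzip-compressed blob (as .warc.gz files typically are) and records byte offsets so an individual grab can still be pulled out of the big file. The function name and index format are made up for illustration.

```python
import gzip


def concatenate_records(chunks):
    """Join gzip-compressed blobs into one blob plus an offset index.

    Returns (combined_bytes, index) where index holds one
    (offset, length) pair per input chunk, in input order.
    """
    combined = bytearray()
    index = []
    for chunk in chunks:
        index.append((len(combined), len(chunk)))
        combined.extend(chunk)
    return bytes(combined), index


first = gzip.compress(b"first grab\n")
second = gzip.compress(b"second grab\n")
mega, index = concatenate_records([first, second])

# The combined blob is still valid gzip (multiple members back to back),
# and the index lets a single grab be sliced out and read on its own.
offset, length = index[1]
single = gzip.decompress(mega[offset:offset + length])
```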
22:45:00<arkhive>SketchCow: I still have MobileMe files that have not been uploaded yet. Problem with my hard drive. I cloned it to another and will finish recovering the files as soon as I can.
22:47:00<arkhive>Just letting you know so you don't put up an incomplete copy on WayBack
22:49:00<arkhive>I should be able to recover just about all of the files
22:50:00<arkhive>But a few might be impossible to retrieve.
22:52:00<arkhive>So I apologize in advance for my screw up. :)
22:59:00<DFJustin>https://archive.org/details/archiveteam-qaudio-archive-1 etc. not going in?
23:03:00<SketchCow>Sure it is.
23:04:00<SketchCow>I'm sure some stuff has escaped my gaze, hence my asking people to look over my shoulder at the google doc.
23:04:00<DFJustin>also 2-7
23:07:00<SketchCow>Right.
23:07:00<SketchCow>No, on it.
23:07:00<SketchCow>They're all fine, though, they already were working.
23:07:00<SketchCow>Now I'm just bundling them.
23:08:00<SketchCow>http://archive.org/details/archiveteam-qaudio-archive will have it soon.
23:15:00<SketchCow>http://archive.org/details/archiveteam-qaudio-archive now fixed.
23:55:00<godane>thanks for putting my isos in the linux format collection