03:13:00 | <dragondon> | Greetings all. Getting a curl error "curl: (22) The requested URL returned error: 500". whole error here http://pastebin.com/scMx6bkf |
03:15:00 | <dragondon> | seeing a bunch of those but it seems that eventually the upload does happen. |
03:16:00 | <underscor> | dragondon: one (or more) of the upload endpoints is full |
03:16:00 | <underscor> | I'll be fixing it asap |
03:16:00 | <underscor> | Uploads should still work eventually
03:16:00 | <dragondon> | ok, cool. |
03:16:00 | <underscor> | as they're round-robined between boxes in the cluster
03:16:00 | <dragondon> | yeah, that's what I did see from others. |
03:18:00 | <nintendud> | bah. VirtualBox won't start up my warrior VM, or a newly downloaded warrior VM. |
03:20:00 | <Cameron_D> | What error?
03:21:00 | <nintendud> | NS_ERROR_FAILURE |
03:21:00 | <nintendud> | "VBoxManage: error: The virtual machine 'archiveteam-warrior-2_1' has terminated unexpectedly during startup with exit code 0" |
03:21:00 | <flaushy> | can i play around with the timeouts? |
03:22:00 | <nintendud> | Doesn't seem to be particularly obvious. I guess this error covers a wide range of possible issues. |
03:22:00 | <Sue> | nintendud: are you trying to run without X |
03:22:00 | <nintendud> | Sue: yup |
03:22:00 | <nintendud> | The warrior worked before. |
03:22:00 | <Sue> | VBoxHeadless |
03:22:00 | <nintendud> | Oh. Crap.
03:23:00 | <Sue> | VBoxManage dies without X unless you specify headless or start with VBoxHeadless |
03:23:00 | <nintendud> | I forgot that was the command. |
03:23:00 | <Sue> | don't forget to start with & |
03:23:00 | <Sue> | i had that same problem at first |
03:24:00 | <nintendud> | yeah, I have it running in screen |
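A minimal sketch of the headless start Sue describes, using the VM name from nintendud's error message; the trailing & backgrounds the process, per Sue's note above:

    VBoxHeadless --startvm "archiveteam-warrior-2_1" &
    # equivalently, via VBoxManage:
    VBoxManage startvm "archiveteam-warrior-2_1" --type headless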
03:25:00 | <underscor> | Why not just run the pipeline outside? |
03:25:00 | <nintendud> | Sue: thanks for the help. herp derp on my end. |
03:26:00 | <Sue> | underscor: fun; nintendud: np |
05:09:00 | <dragondon> | Umm, "No item received. Retrying after 30 seconds..." and "Retrying CurlUpload for Item gourmetsexpress after 30 seconds..." are all I am getting now |
05:10:00 | <dragondon> | 4 workers are getting "No item" and two are "retrying" |
05:13:00 | <dragondon> | restarted VM, now all are "retrying"
05:16:00 | <dragondon> | that was for BT Internet homepages |
05:16:00 | <dragondon> | switched back to Webshots and downloading data now just fine |
05:33:00 | <NovaKing> | apparently some servers are full
05:33:00 | <NovaKing> | and they're working on it
07:37:00 | <SmileyG> | [04:23:22] < Sue> don't forget to start with & |
07:37:00 | <SmileyG> | if you forget, ctrl z, then bg, then disown |
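The recovery sequence SmileyG describes, for a VBoxHeadless started in the foreground without &:

    # press Ctrl+Z to suspend the foreground process
    bg        # resume it in the background
    disown    # detach it from the shell so it survives logout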
08:44:00 | <Nemo_bis> | anyone able to kill http://www.us.archive.org/log_show.php?task_id=124346847 here? |
08:47:00 | <alard> | There's a timeout 87457739 in there. |
08:47:00 | <alard> | (That's quite a long time, probably.)
08:53:00 | <Cameron_D> | That is only 1012 days |
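Worked out: the timeout value is in seconds, and 86,400 seconds make a day:

    87457739 s ÷ 86400 s/day ≈ 1012.2 days (roughly 2.8 years)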
08:55:00 | <SmileyG> | lol |
09:27:00 | <Nemo_bis> | it wasn't enough last time |
09:27:00 | <Nemo_bis> | [ PDT: 2012-10-16 16:16:59 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml' |
09:28:00 | <Nemo_bis> | sorry [ PDT: 2012-09-28 04:45:24 ] Executing: timeout 87457739 python /usr/local/petabox/sw/books/ol_search/solr_post.py 'EB1911WMF' '/var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml' |
09:29:00 | <Nemo_bis> | [...] nice /usr/local/petabox/deriver/derive.php /var/tmp/autoclean/derive/EB1911WMF [...] failed with exit code: 9 [...] TASK FAILED AT UTC: 2012-10-02 18:11:37 |
09:30:00 | <Nemo_bis> | underscor? |
09:32:00 | <underscor> | Nemo_bis: Why do you need it killed? |
09:38:00 | <Nemo_bis> | underscor: because it will surely fail
09:39:00 | <Nemo_bis> | and I want to upload the images split in volumes now, so that it will work
09:41:00 | <underscor> | ah |
09:41:00 | <underscor> | okay |
09:41:00 | <underscor> | well, I'll kill it :P |
09:41:00 | <Nemo_bis> | underscor: thanks |
09:42:00 | <underscor> | Interrupting task for task_id: 124346847 1 derive.php SERVER iw600709.us.archive.org USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 5646 82.8 1.1 750384 414324 ? RN Oct16 517:23 python /usr/local/petabox/sw/books/ol_search/solr_post.py EB1911WMF /var/tmp/autoclean/derive-EB1911WMF-AbbyyXML/EB1911_abbyy.xml KILLING 5646 |
09:42:00 | <underscor> | cc Nemo_bis |
09:47:00 | <Nemo_bis> | underscor: thanks |
09:49:00 | <Cameron_D> | ?tf2 |
09:49:00 | <Cameron_D> | oops, wrong channel
09:52:00 | <Nemo_bis> | hmpf only 650 kB/s upload even from a USA server |
10:55:00 | <godane> | looks like theblaze tv did some bad audio sync for episode 2012-10-15 |
10:57:00 | <godane> | what is funny is that the real news and wilkow on-demand works fine |
11:10:00 | <godane> | i found a way to sync it |
11:10:00 | <godane> | it's off by 1.5 seconds
11:48:00 | <SmileyG> | ffmpeg will resync stuff... |
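One common ffmpeg recipe for a fixed offset like godane's 1.5 seconds (filenames here are hypothetical): read the file twice, delay the audio stream's timestamps with -itsoffset, and remux without re-encoding. Swap which input carries -itsoffset if the audio needs to move the other way:

    ffmpeg -i episode.mp4 -itsoffset 1.5 -i episode.mp4 \
           -map 0:v -map 1:a -c copy episode-synced.mp4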
12:36:00 | <Cameron_D> | Hmm, IGN is up for sale and there is some talk of them possibly closing the boards (85.8 million posts), so that might be something to keep an eye on |
12:38:00 | <balrog_> | :O |
12:39:00 | <balrog_> | what's a good setting for number of instances when you have 50mbit/25mbit internet speeds? |
12:44:00 | <Cameron_D> | I'm assuming that is for Webshots, in which case I'm not really sure, I haven't really been able to see how much bandwidth it uses |
12:46:00 | <alard> | It also depends on the distance to the webshots servers. |
12:46:00 | <joepie91> | 1mbit per thread generally |
12:46:00 | <balrog_> | yeah, for webshots |
12:46:00 | <joepie91> | approx
12:47:00 | <joepie91> | at least, in my experience |
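By that rule of thumb, balrog_'s 50 mbit downlink would support on the order of:

    50 mbit/s ÷ ~1 mbit/s per instance ≈ 50 concurrent instances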
12:49:00 | <balrog_> | alard: also how do you stop it again? STOP file in the same dir as the pipeline.py? |
12:49:00 | <balrog_> | because it seems to be ignoring it |
12:49:00 | <alard> | It finishes the current jobs first. |
12:49:00 | <balrog_> | yeah, but it seems to keep doing jobs |
12:49:00 | <alard> | Can you use the web interface? |
12:49:00 | <balrog_> | what port does that run on? |
12:49:00 | <alard> | 8001 |
12:52:00 | <Cameron_D> | And I think the STOP file needs to be in the directory you launched it from, not the pipeline directory
12:52:00 | <balrog_> | ahh. |
12:53:00 | <balrog_> | that ... sounds like a possible bug :P |
12:56:00 | <alard> | That could be. |
12:56:00 | <alard> | curl -d "" http://127.0.0.1:8001/api/stop works too. |
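Both stop methods side by side; per Cameron_D above, the STOP file belongs in the directory the pipeline was launched from, and either way the current jobs finish first:

    touch STOP                                    # run in the launch directory
    curl -d "" http://127.0.0.1:8001/api/stop     # same effect, via the web interface's API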
12:56:00 | <_case> | if anyone feels like pondering a wget question re: hotlinked page requisites… http://stackoverflow.com/questions/12934528/recursive-wget-with-hotlinked-requisites
12:58:00 | <alard> | Have you tried --no-parent? (Just a guess.) |
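A hedged guess at the invocation in question: page requisites hosted on other domains generally need --span-hosts alongside --page-requisites, and --no-parent (alard's suggestion) confines the recursion; the URL is a placeholder:

    wget --recursive --page-requisites --span-hosts --no-parent \
         http://example.com/some/page/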
13:03:00 | <balrog_> | yeah, and now I'm killing disk IO |
13:23:00 | <dragondon> | is it safe to go back working on the BT project yet? |
13:27:00 | <Cameron_D> | Is there anything left to do? |
13:40:00 | <alard> | No, BT is done, that is: we need more usernames. |
13:41:00 | <dragondon> | ok, stopped webshots project, started BT :)
13:42:00 | <alard> | dragondon: There's nothing to do there. :) |
13:42:00 | <dragondon> | huh? all done? |
13:42:00 | <alard> | We've worked through our list of usernames. |
13:43:00 | <alard> | There might be users that are not on our list, but we'll have to discover those usernames first. That isn't done by the warrior. |
13:46:00 | <dragondon> | oh, that's what you meant. I thought you meant you needed more usernames completed. |
14:00:00 | <SmileyG> | no work for happy workers :( |
18:39:00 | <alard> | Webshots numbers: S[h]O[r]T has uploaded 100,000 items; we've uploaded 20,000 GB. Hurray! |
18:51:00 | <[1]deathy> | I would give S[h]O[r]T the internet as a prize, but apparently he can download it by himself ... |
18:59:00 | <joepie91> | haha |
21:03:00 | <balrog_> | ugh my warrior vm crashed |
22:18:00 | <SketchCow> | We need more usernames. It can't be so many. |
22:25:00 | <arkhive> | http://news.cnet.com/8301-1023_3-57533820-93/news-corp-puts-ign-entertainment-up-for-auction/ |
22:25:00 | <arkhive> | Probably been linked here. |
22:26:00 | <arkhive> | The new IGN network will probably shutter/close multiple sites |
22:28:00 | <arkhive> | ah. just read above. |
22:37:00 | <SketchCow> | https://docs.google.com/a/textfiles.com/spreadsheet/ccc?key=0ApQeH7pQrcBWdDZIUEVjR3d1UmRoU0lPSWZYX0Q1Ync#gid=0 |
22:37:00 | <SketchCow> | Watch as I do final signoff! |
22:37:00 | <SketchCow> | Anything with the deep blue on the left is going into wayback! |
22:38:00 | <joepie91> | SketchCow: what is a MegaWARC? |
22:39:00 | <SketchCow> | A MegaWARC is a concatenation of warc files, allowing us to put thousands of individual warc grabs as one file. |
22:39:00 | <joepie91> | I see |
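The trick that makes the concatenation work: gzip members concatenate into a valid gzip stream, and WARCs are gzipped per record, so in the simplest form the idea reduces to something like the line below (filenames hypothetical; the real megawarc tooling also keeps an index of offsets so individual grabs stay findable):

    cat item-0001.warc.gz item-0002.warc.gz > mega.warc.gz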
22:45:00 | <arkhive> | SketchCow: I still have MobileMe files that have not been uploaded yet. Problem with my hard drive. I cloned it to another and will finish recovering the files as soon as I can. |
22:47:00 | <arkhive> | Just letting you know so you don't put up an incomplete copy on WayBack |
22:49:00 | <arkhive> | I should be able to recover just about all of the files |
22:50:00 | <arkhive> | But a few might be impossible to retrieve. |
22:52:00 | <arkhive> | So I apologize in advance for my screw up. :) |
22:59:00 | <DFJustin> | https://archive.org/details/archiveteam-qaudio-archive-1 etc. not going in? |
23:03:00 | <SketchCow> | Sure it is. |
23:04:00 | <SketchCow> | I'm sure some stuff has escaped my gaze, hence my asking people to look over my shoulder at the google doc. |
23:04:00 | <DFJustin> | also 2-7 |
23:07:00 | <SketchCow> | Right. |
23:07:00 | <SketchCow> | No, on it. |
23:07:00 | <SketchCow> | They're all fine, though, they already were working. |
23:07:00 | <SketchCow> | Now I'm just bundling them. |
23:08:00 | <SketchCow> | http://archive.org/details/archiveteam-qaudio-archive will have it soon. |
23:15:00 | <SketchCow> | http://archive.org/details/archiveteam-qaudio-archive now fixed. |
23:55:00 | <godane> | thanks for putting my isos in the linux format collection |