#archiveteam-bs log for 2021-04-07

00:00:55	<mgrandi>	I see a login page still
00:01:13	<Ryz>	But are there links to account pages with their galleries?
00:01:31	<thuban>	mgrandi: can you rephrase? i don't understand your first message
00:01:48	<@OrIdow6>	https://i6.photobucket.com/albums/y210/Planet-man/NguyensLOTFpart8.jpg still works for me
00:02:21	<Ryz>	OrIdow6, but is the account accessible still? I can browse their galleries?
00:02:37	<mgrandi>	For a DPoS project, if your worker claims a job but then crashes or encounters an error, the job just sits at "out" and someone has to manually requeue it in the interface
00:02:39	<@OrIdow6>	IIRC there is some project that reports back to the tracker (#// or something), don't know exactly what it reports as I'm not privy to tracker stuff
00:02:54	<mgrandi>	Maybe this has improved recently yeah
00:03:40	<mgrandi>	I'm envisioning something like RabbitMQ where it will deliver jobs, and if the client doesn't check in and say "I'm still here and working on it", it gets requeued and sent to someone else
00:03:59	<@JAA>	mgrandi: Jobs get recycled automatically on most projects these days.
00:04:05	<@JAA>	Well, not quite, but close enough.
00:04:15	<mgrandi>	However if it has an error, the client says "finished with error X" and those can get pushed to another queue or retried or whatever
00:04:36	<@JAA>	What happens is that once there is no queue anymore, items that have been out for a while get handed out again.
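The recycling behaviour JAA describes can be sketched roughly as below. This is a minimal illustration, not the real tracker code: the `Tracker` class, its method names, and the `stale_after` threshold are all hypothetical. Items in the queue are handed out first; only once the queue is empty do items that have been "out" for longer than the threshold get handed out again.

```python
import time
from collections import deque


class Tracker:
    """Toy tracker (hypothetical): queued items first, then stale 'out' items."""

    def __init__(self, items, stale_after=3600.0, clock=time.monotonic):
        self.todo = deque(items)        # items nobody has claimed yet
        self.out = {}                   # item -> time it was last handed out
        self.stale_after = stale_after  # assumed threshold, in seconds
        self.clock = clock              # injectable so it can be tested

    def claim(self):
        """Return the next item to work on, or None if nothing is available."""
        now = self.clock()
        if self.todo:
            item = self.todo.popleft()
        else:
            # The queue is empty: recycle items that have been out for a while.
            stale = [i for i, t in self.out.items() if now - t >= self.stale_after]
            if not stale:
                return None
            item = min(stale, key=self.out.get)  # longest-outstanding first
        self.out[item] = now
        return item

    def done(self, item):
        """Mark an item as finished so it is never handed out again."""
        self.out.pop(item, None)
```

Note that a crashed worker's item is not requeued immediately in this model; it only comes back once the queue has drained, which matches "not quite, but close enough".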
00:04:40	<@OrIdow6>	Ryz: Ah, I see
00:04:50	<mgrandi>	Ah
00:05:12	<mgrandi>	I still feel there is use in having an error reporting mechanism that can be shown somewhere to the admins to see certain jobs that always fail
00:05:21	<thuban>	i see, thx. for some reason i thought we'd had automatic retry for years, but maybe that was manual and the old wiki pages just didn't use much detail
00:05:38	<@JAA>	Yeah, until last year or so, it was all manual.
00:06:17	<@JAA>	mgrandi: We have that on URLTeam, and yes, it's very useful.
00:06:38	<Ryz>	OrIdow6, there's still user-made image hosting, like https://i238.photobucket.com/albums/ff257/szubaark/Picture1530_zpsc77001c5.jpg - coming from https://www.amibay.com/showthread.php?50472-Commodore-64-Original-Software-Titles - but again, the accounts and their galleries are not accessible publicly
00:06:40	<mgrandi>	Is that automatic or does it have to sorta be manually done on the project code
00:06:51	<Ryz>	So yeah, archiving Photobucket stuff just became a lot harder...
00:07:14	<@JAA>	URLTeam has a completely different code base. There's basically a giant try-except that sends the exception + traceback to the tracker.
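The "giant try-except" pattern might look something like the sketch below. It is not the URLTeam code: `run_item`, `work`, and `report` are hypothetical names, and `report` stands in for whatever actually sends the payload to the tracker.

```python
import traceback


def run_item(item, work, report):
    """Run work(item); on any exception, send the exception and traceback to report()."""
    try:
        work(item)
    except Exception as exc:
        report({
            "item": item,
            "error": repr(exc),
            "traceback": traceback.format_exc(),  # full traceback as text
        })
        return False
    return True
```

Passing `report` in as a callable keeps the sketch independent of how the tracker is reached; in practice it could be an HTTP POST, but that part is an assumption.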
00:07:18	<atphoenix>	thuban, I think I wrote the about hard-stops on docker containers
00:07:26	<atphoenix>	wrote the note*
00:07:37	<mgrandi>	Is that not using seesaw? Or a newer version of it?
00:07:37	<atphoenix>	I was working on the domains project, which had huge items
00:08:16	<atphoenix>	the point was a clean-stop was the safest answer to avoid the project missing a site or other data
00:08:25	<mgrandi>	For big items, I've noticed that when it encounters an error, it gives up and just kinda deletes the files, these should still be uploaded I think
00:08:48	<atphoenix>	because with a dirty stop, the retry process might not kick into effect until it was too late, i.e. after the domain was killed
00:08:53	<mgrandi>	Cause even if it's accidental, process crash, OS crash, power outage, etc
00:09:26	<@JAA>	mgrandi: Yeah, it uses seesaw, but the error reporting is in the project code.
00:09:39	<thuban>	uh, meanwhile,
00:09:41	<atphoenix>	the potential loss of already saved info could also happen with small items, but obviously it is easier to retry those in a reasonable time
00:09:41	<thuban>	JAA or somebody with equivalent privileges: can we get a toclimit? https://en.wikipedia.org/wiki/Template_talk:TOC_limit#Steps_to_limit_the_TOC_in_your_mediawiki
00:09:41		pcr leaves
00:09:52	<thuban>	i'd do it myself but i don't have permission to edit the css
00:09:57	<@JAA>	Uploading data from crashed items will lead to all kinds of issues.
00:10:43	<@JAA>	thuban: Don't think I can either, but I've wished for that before, yeah. jrwr?
00:10:49	<mgrandi>	I dunno, I noticed it during the bitbucket one where crashing during a huge project dL was a big waste, feel like improving stuff around that would be good
00:11:17	<@JAA>	Yes, fixing the crashes is good. Uploading faulty data that then potentially crashes stuff on the targets etc. is not.
00:12:12	<mgrandi>	Yeah, error reporting potentially being built into seesaw and the tracker infra would help stop jobs dying halfway through probably
00:12:40	<@OrIdow6>	Warriors shutting down like that is still pretty rare
00:12:46	<@OrIdow6>	As opposed to crashes for other reasons
00:13:23	<atphoenix>	so anyhow, my notes are mostly aimed at ensuring clean docker worker shutdowns. I.e. don't hardkill them when a clean stop can be used instead.
00:15:31	<@JAA>	Yeah, clean stops should be preferred, but a hard stop isn't a huge issue if necessary.
00:15:55	<atphoenix>	I suppose there may be some cases where the data saved from a dirty-ended worker might also be of interest to keep. I think we checked a few such cases for Yahoo Groups workers. Maybe that was because of how Yahoo itself was changing things on the fly as we were saving stuff. And it was often erroring.
00:16:35	<atphoenix>	(yahoo was erroring)
00:18:55		pcr joins
00:19:09	<Ryz>	Is there an ArchiveTeam project for Photobucket?
00:19:53	<thuban>	https://wiki.archiveteam.org/index.php/Photobucket looks like no
00:25:26	<purplebot>	ArchiveTeam Warrior edited by Switchnode (-6652, clean up vm/docker coexistence …) just now -- https://www.archiveteam.org/?diff=46504&oldid=46503
00:29:39	<Ryz>	!ignore djqaqsdtvjqpc4a4zjecp0izi ^https?://(www|m|music|au|ca|de|es|gaming|fr|ie|il|it|jp|mx|nl|pl|uk)\.youtube\.com/watch\?
00:29:39	<Ryz>	!ignore djqaqsdtvjqpc4a4zjecp0izi ^https?://youtube\.com/watch\?
00:29:39	<Ryz>	!ig djqaqsdtvjqpc4a4zjecp0izi ^https?://youtu\.be/
00:29:41	<Ryz>	Oops
00:30:18	<thuban>	ok, made the big edit. in the end i removed the loads of detail on managing docker from the intro section, because it seemed like a serious turnoff for the newbs at whom the warrior is primarily aimed
00:31:15	<thuban>	plus (a) we don't include loads of detail on managing virtualbox there and if you deliberately choose to use the cli option you are bright enough to google the docs, (b) redundant section with the projects page presents awkward sync issues (i did link to it in several places), and (c) collapsing sections was a nice thought but they don't work without js
00:31:25	<thuban>	feel free to yell at me, change it back, etc
00:40:58	<thuban>	https://mommacomms.tumblr.com/post/646772288587513856/nug-juggler-demishock-k-vichan-k-vichan-take i hear ffn (which has been rotting for many years) is undergoing some technical changes soon
00:41:30	<thuban>	last AT scrape in 2012, last "nearly complete" scrape in 2015--time for a revisit?
00:45:29		@Fusl quits [Excess Flood]
00:45:49		Fusl (Fusl) joins
00:45:49		@ChanServ sets mode: +o Fusl
00:56:26	<purplebot>	ArchiveTeam Warrior edited by Switchnode (+8, use correct vm name and consistent …) 22 minutes ago -- https://www.archiveteam.org/?diff=46505&oldid=46504
01:02:44		dm4v quits [Read error: Connection reset by peer]
01:03:34		dm4v joins
01:03:36		dm4v is now authenticated as dm4v
01:03:36		dm4v quits [Changing host]
01:03:36		dm4v (dm4v) joins
01:10:26	<purplebot>	Deathwatch edited by JustAnotherArchivist (-20, /* 2019 */ Link to 99.se page) just now -- https://www.archiveteam.org/?diff=46506&oldid=46492
01:19:26	<purplebot>	99.se edited by JustAnotherArchivist (+43, Link to my forums archive, clarify …) just now -- https://www.archiveteam.org/?diff=46507&oldid=46182
01:43:12	<@JAA>	s-crypt: Yes, of course AB goes into the WBM. That's the point really. :-P See also https://wiki.archiveteam.org/index.php/ArchiveBot
01:45:25		Mineroboter_ joins
01:45:25	<s-crypt>	Is that some special permission granted to archivebot or the archiveteam group? or is it all uploaded WARCs
01:47:06		Mineroboter quits [Ping timeout: 250 seconds]
01:49:28	<@JAA>	Only WARCs uploaded by whitelisted accounts get ingested.
01:54:59	<s-crypt>	Thanks for the answers! :)
01:57:29	<s-crypt>	Quick side question. Does the Warrior have the capability to switch desired (Archiveteam's pick) projects without restarting and getting a new docker image?
02:00:27		sliccricc_ quits [Remote host closed the connection]
02:12:53	<Hyenadae>	I've seen it switch as long as you have the "preferred project" task selected (in the VM version )
02:13:22	<Hyenadae>	You can also switch between two tasks I guess and back to the preferred/current project to get it restarted on the latest thing
02:41:23		fuzzy802 joins
02:41:23		fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))]
02:41:24		fuzzy802 is now known as fuzzy8021
02:41:26		fuzzy8021 is now authenticated as fuzzy8021
02:41:26		fuzzy8021 quits [Changing host]
02:41:26		fuzzy8021 (fuzzy8021) joins
02:41:30		fuzzy8021 quits [Excess Flood]
02:41:48		fuzzy8021 joins
02:41:48		fuzzy8021 is now authenticated as fuzzy8021
02:41:48		fuzzy8021 quits [Changing host]
02:41:48		fuzzy8021 (fuzzy8021) joins
03:25:33		DopefishJustin quits [Remote host closed the connection]
03:29:21		DopefishJustin joins
03:29:22		DopefishJustin is now authenticated as DopefishJustin
03:30:03		YazofArc quits [Remote host closed the connection]
03:36:58		qw3rty__ joins
03:40:47		qw3rty_ quits [Ping timeout: 258 seconds]
03:50:35		DogsRNice quits [Read error: Connection reset by peer]
04:05:33		etnguyen03 quits [Client Quit]
04:06:14		Jonboy345 quits [Read error: Connection reset by peer]
04:06:32		Jonboy345 joins
04:19:07		pawbs joins
04:19:07	<tech234a>	Made a few more improvements to the Warrior page
04:19:25	<purplebot>	ArchiveTeam Warrior edited by Tech234a (+158, Shorten the infobox at the top …) just now -- https://www.archiveteam.org/?diff=46508&oldid=46505
04:20:01	<tech234a>	I combined the Docker setup commands into a one-liner which should hopefully simplify the instructions a little bit
04:20:50	<pawbs>	Has anybody gotten the Warrior to work well with BSD userland tools yet? I remember running into problems with that for #tumbledown
04:27:50	<tech234a>	Well there's always the VM or Docker... you might be able to manually run projects without either of those but that is more complicated and requires you to get the needed dependencies yourself
04:29:11	<pawbs>	Yeah, I’ve used the VM before, it’s just that the machine I have available for warrior-ing is on FreeBSD and also like a decade old. VMs make it unhappy :(
04:29:49	<tech234a>	Some instructions for running projects manually are in project READMEs; for example: https://github.com/ArchiveTeam/periscope-grab#readme
04:33:29		pawbs|2 joins
04:33:39		pawbs quits [Client Quit]
04:33:55		pawbs|2 is now known as pawbs
04:35:45	<pawbs>	I’ll give that a shot then, hopefully I can convince it to work well
04:35:50	<pawbs>	Thanks!
04:45:20		@Fusl quits [Excess Flood]
04:45:26	<purplebot>	ArchiveTeam Warrior edited by Tech234a (+0, Correct capitalization of VBoxManage) 23 minutes ago -- https://www.archiveteam.org/?diff=46509&oldid=46508
04:45:37		Fusl (Fusl) joins
04:45:37		@ChanServ sets mode: +o Fusl
05:02:38		Jonboy3451 joins
05:05:53		Jonboy345 quits [Ping timeout: 258 seconds]
05:33:18		pawbs quits [Ping timeout: 250 seconds]
06:29:00		dewdrop quits [Remote host closed the connection]
06:29:44		britm0b joins
06:32:54		britmob quits [Ping timeout: 258 seconds]
06:43:07		dewdrop (dewdrop) joins
06:45:19		@Fusl quits [Excess Flood]
06:45:39		Fusl (Fusl) joins
06:45:39		@ChanServ sets mode: +o Fusl
06:46:03		hooway joins
07:29:22		Wayward (wayward) joins
07:34:33		s-crypt quits [Remote host closed the connection]
07:34:33		flashfire42 quits [Remote host closed the connection]
07:34:33		kiska quits [Remote host closed the connection]
07:37:10		LeighR (LeighR) joins
07:50:03		jonboy3452 joins
07:53:24		Jonboy3451 quits [Ping timeout: 258 seconds]
08:01:36		britmob25 quits [Quit: britmob25]
08:22:49		Arcorann_ joins
08:42:19		themadpro_ is now known as themadpro
08:43:48	<themadpro>	Regarding Yahoo answers, someone raised an interesting point on the IA Discord server:
08:43:56	<themadpro>	> The versions in German, French, Spanish, Italian, etc. are also shutting down (see https://de.answers.yahoo.com/, https://fr.answers.yahoo.com/). These should be archived too, shouldn't they?
08:44:26	<themadpro>	Probably not as big as English obviously, but are we CURRENTLY grabbing from the other localization endpoints as well?
08:44:47	<themadpro>	If not, can we start crawling?
08:45:49		@Fusl quits [Excess Flood]
08:46:10		Fusl (Fusl) joins
08:46:10		@ChanServ sets mode: +o Fusl
08:49:27	<themadpro>	Preliminary analysis suggests that we are? https://github.com/ArchiveTeam/yahooanswers-grab/search?q=answers.yahoo.com https://usercontent.irccloud-cdn.com/file/EZvDUAF1/preliminary%20analysis.png
08:56:10		EggplantN joins
08:56:28		EggplantN is now authenticated as EggplantN
08:56:28		EggplantN quits [Changing host]
08:56:29		EggplantN (EggplantN) joins
08:56:29		@ChanServ sets mode: +o EggplantN
08:57:02		Arcorann_ quits [Ping timeout: 258 seconds]
09:05:22		Arcorann_ joins
09:07:39		BlueMaxima quits [Read error: Connection reset by peer]
09:16:29		s-crypt (s-crypt) joins
09:16:29		flashfire42 (flashfire42) joins
09:17:05		kiska (kiska) joins
09:20:13		LeighR quits [Ping timeout: 244 seconds]
09:35:58		nathan quits [Ping timeout: 250 seconds]
09:36:43		nathan joins
09:50:31	<AK>	#noanswers is the place for the project, but I believe the plan is to grab all languages if possible. Code is just being finished up before we start at the moment (That code is from the 2017 grab and I don't think the new version has been pushed yet)
09:56:27		Arcorann_ quits [Ping timeout: 258 seconds]
10:12:06	<themadpro>	Guess I will ask there as well then
10:14:28		katocala quits [Ping timeout: 258 seconds]
10:42:10		Arcorann_ joins
10:46:09		@Fusl quits [Excess Flood]
10:46:28		Fusl (Fusl) joins
10:46:28		@ChanServ sets mode: +o Fusl
10:48:45		@Fusl quits [Excess Flood]
10:49:02		Fusl (Fusl) joins
10:49:02		@ChanServ sets mode: +o Fusl
11:35:35		Hyenadae quits [Ping timeout: 244 seconds]
11:45:46		@Fusl quits [Excess Flood]
11:46:04		Fusl (Fusl) joins
11:46:04		@ChanServ sets mode: +o Fusl
12:10:40		yanome quits [Quit: The Lounge - https://thelounge.chat]
12:10:48		yanome (yano) joins
12:24:12		ATG64 joins
12:25:28		ATG64 quits [Remote host closed the connection]
12:43:47		Arcorann (Arcorann) joins
12:46:13		@Fusl quits [Excess Flood]
12:46:16		Arcorann_ quits [Ping timeout: 258 seconds]
12:46:33		Fusl (Fusl) joins
12:46:33		@ChanServ sets mode: +o Fusl
12:49:29		katocala joins
12:49:59		katocala is now authenticated as katocala
12:54:52		britmob25 joins
13:26:30		rewby quits [Ping timeout: 250 seconds]
13:26:43		rewby (rewby) joins
13:42:45		Arcorann_ joins
13:45:18		Arcorann quits [Ping timeout: 258 seconds]
13:46:22		@Fusl quits [Excess Flood]
13:46:41		Fusl (Fusl) joins
13:46:41		@ChanServ sets mode: +o Fusl
13:47:25		@Fusl quits [Client Quit]
13:47:32		Fusl (Fusl) joins
13:47:32		@ChanServ sets mode: +o Fusl
13:53:44		katocala quits [Ping timeout: 258 seconds]
13:54:07		katocala joins
14:02:18		LeGoupil joins
14:38:30		kyilani joins
14:43:51		kyilani quits [Remote host closed the connection]
14:56:32		Arcorann (Arcorann) joins
14:59:14		Arcorann_ quits [Ping timeout: 250 seconds]
15:20:56		themadpro quits [Read error: Connection reset by peer]
15:24:21		@HCross quits [Read error: Connection reset by peer]
15:25:00		themadpro (themadpro) joins
15:25:01		HCross (HCross) joins
15:25:01		@ChanServ sets mode: +o HCross
15:29:11		etnguyen03 (etnguyen03) joins
15:41:50		LeighR (LeighR) joins
15:58:03	<thuban>	tech234a: thanks! that was a good idea
16:01:41	<thuban>	couple of concerns: (1) did you intend to delete the 'using the web interface' section? & (2) it might not be obvious from the link in the setup section that the running-projects-with-docker page is about running _individual_ projects, not the warrior (in the sense that the warrior page uses it (we might need some new terminology))
16:03:15	<thuban>	i think i'll probably just add a line of explanation in re the latter; shouldn't be too much
16:09:06	<thuban>	(on linux vboxmanage and VBoxManage are both symlinked to the same bin, lol. i take it that's not the case for windows?)
16:25:32		Arcorann quits [Ping timeout: 258 seconds]
16:39:35		emerald (emerald) joins
16:46:37		LeGoupil quits [Ping timeout: 258 seconds]
16:48:07		Daloader_ joins
17:04:40		LeGoupil joins
17:05:00		brgtt joins
17:08:30		brgtt leaves
17:10:23		LeighR quits [Ping timeout: 244 seconds]
17:13:12		onetruth joins
17:15:28		brgtt joins
17:45:36		forkwhilefork (forkwhilefork) joins
17:49:45	<atphoenix>	I think a table of "ways to assist with AT archiving projects" could contain 3 or so columns to compare the methods:
17:50:24	<atphoenix>	1.) VM Warrior 2.) Docker-warrior 3.) Docker-direct projects
17:52:18		AlsoHP_Archivist joins
17:55:36		HP_Archivist quits [Ping timeout: 250 seconds]
17:56:57		brgtt quits [Client Quit]
18:02:23		AlsoHP_Archivist quits [Read error: Connection reset by peer]
18:02:50		AlsoHP_Archivist joins
18:11:16		spirit joins
19:00:50		LeGoupil quits [Client Quit]
19:02:20		pcr leaves
19:03:01		pcr joins
19:05:09		brgtt joins
19:11:29		brgtt quits [Client Quit]
19:21:02		Jonboy3451 joins
19:24:33		jonboy3452 quits [Ping timeout: 258 seconds]
19:38:21		spirit quits [Client Quit]
19:44:20		brgtt joins
20:01:36		jonboy3452 joins
20:04:48		Jonboy3451 quits [Ping timeout: 258 seconds]
20:10:36	<tech234a>	thuban: Thanks! I liked your edits too. As for (1) I deleted the 'using the web interface' section since it was unnecessary: when you open your browser to the control panel on a new Warrior, you are immediately taken to the screen to set your username, and once you save your username, you are immediately taken to the project list. I figure that we didn't need instructions for that. As for (2) if you are referring to the link
20:10:36	<tech234a>	labelled "here" then yeah, I think that might be a little unclear. (In general, labelling links "here" or "click here" is not a best practice.) Also something else that should be considered: while I agree the recommended method for shutting down the Warrior should be through the web interface, currently the Docker container will automatically restart after being shut down this way because of `--restart=unless-stopped`. Perhaps the
20:10:36	<tech234a>	Docker container could be updated to access the Docker socket from the host to stop itself when the shutdown button is used?
20:11:26	<purplebot>	ArchiveTeam Warrior edited by Tech234a (+6, Un-abbreviate --volume in Docker …) just now -- https://www.archiveteam.org/?diff=46510&oldid=46509
20:12:47		billy549 quits [Remote host closed the connection]
20:13:58	<thuban>	tech234a: gotcha. i didn't know the docker container couldn't shut itself down that way--seems like a good feature to add
20:14:59	<tech234a>	Yeah, alternatively we could consider changing the restart policy, but I don't think there are any other ones that fit what we want
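To make the restart-policy point concrete, here is a small model of Docker's four documented restart policies (`no`, `on-failure`, `always`, `unless-stopped`). It is a sketch, not Docker code: `restarts_after_exit` is a hypothetical helper, and it covers only the case where the container's main process exits on its own. It does not model `docker stop` or daemon restarts.

```python
def restarts_after_exit(policy, exit_code):
    """Would Docker restart a container whose main process exited by itself?"""
    if policy == "no":
        return False
    if policy in ("always", "unless-stopped"):
        # Both restart on any self-initiated exit, clean or not.
        return True
    if policy == "on-failure":
        # Only a non-zero exit code counts as a failure.
        return exit_code != 0
    raise ValueError(f"unknown restart policy: {policy!r}")
```

Under this model `unless-stopped` restarts the container even after a clean exit, which is the behaviour tech234a describes. `on-failure` would leave it stopped only if the web-interface shutdown exits with code 0, and whether the Warrior does that is an assumption not confirmed here.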
20:20:43		billy549 (Billy549) joins
20:22:22	<tech234a>	oh and as for VBoxManage: perhaps the lowercase version does work, but pretty much everywhere online uses that capitalization so I figure that is the standard way to run it
20:23:33	<thuban>	sounds good (i sure don't feel like digging out a windows box to test it!)
20:34:17		pcr leaves
20:38:47		pcr joins
20:58:02		Daloader_ quits [Ping timeout: 250 seconds]
21:35:01	<tech234a>	We got called archive.org again (see last paragraph) https://www.reviewgeek.com/76740/yahoo-answers-no-more-the-qa-platform-shuts-down-may-4th/
22:08:35		hooway quits [Client Quit]
22:22:07		brgtt quits [Client Quit]
22:38:59		brgtt joins
22:40:21		brgtt quits [Client Quit]
22:45:30		katocala is now authenticated as katocala
23:21:13		BlueMaxima joins
23:21:53		notbasetwo (notbasetwo) joins
