00:05:54 | | APOLLO03 joins |
00:16:34 | | Megame quits [Ping timeout: 250 seconds] |
00:28:57 | | etnguyen03 (etnguyen03) joins |
00:44:33 | | etnguyen03 quits [Client Quit] |
00:48:11 | | lennier2 joins |
00:51:03 | | lennier2_ quits [Ping timeout: 260 seconds] |
00:59:09 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
01:05:42 | | Wohlstand (Wohlstand) joins |
01:08:49 | | SkilledAlpaca418962 joins |
01:19:38 | | scurvy_duck quits [Ping timeout: 260 seconds] |
01:28:23 | | etnguyen03 (etnguyen03) joins |
01:30:40 | | rohvani joins |
01:44:44 | | Island joins |
02:28:49 | | biscuitenjoyer12 joins |
02:32:59 | <biscuitenjoyer12> | howdy, are there any performance differences between the archiveteam-warrior and one of the project-specific containers? (the usgov-grab, to be specific)
02:36:58 | | lexikiq quits [Ping timeout: 250 seconds] |
02:37:20 | | lexikiq joins |
02:49:17 | | biscuitenjoyer12 quits [Client Quit] |
02:59:30 | | DopefishJustin quits [Ping timeout: 250 seconds] |
03:02:56 | <TheTechRobo> | !tell biscuitenjoyer12 The project-specific containers don't have a web UI so they will likely use RAM, although in practice the effect is likely minor. The real useful thing is that they put logs on stdout, which allows you to use log centralization tools if you have a lot of containers |
03:02:56 | <eggdrop> | [tell] ok, I'll tell biscuitenjoyer12 when they join next |
03:10:30 | | lunax quits [Remote host closed the connection] |
03:18:44 | <nicolas17> | use less* RAM? |
03:30:54 | | etnguyen03 quits [Remote host closed the connection] |
03:46:28 | | gust quits [Read error: Connection reset by peer] |
04:05:22 | | scurvy_duck joins |
04:11:00 | <TheTechRobo> | er, yeah |
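For context on the log-centralization point above: because the project-specific containers write their logs to stdout, any standard Docker log consumer can collect them. A minimal sketch, assuming the Docker SDK for Python ("pip install docker") and containers whose names share a hypothetical "usgov-grab" prefix:

    import docker

    # Connect to the local Docker daemon.
    client = docker.from_env()

    # Dump the last few stdout/stderr lines of every matching container,
    # prefixed with the container name, roughly what a log shipper would do.
    for container in client.containers.list():
        if container.name.startswith("usgov-grab"):  # hypothetical name prefix
            for line in container.logs(tail=20).splitlines():
                print(f"{container.name}: {line.decode(errors='replace')}")

In practice one would more likely point a logging driver or a dedicated shipper at the containers; this only illustrates that the logs are reachable without the warrior web UI.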
04:12:18 | | lexikiq quits [Ping timeout: 260 seconds] |
04:12:18 | | seab5 joins |
04:12:40 | | lexikiq joins |
04:12:52 | | seab5 quits [Client Quit] |
04:40:41 | | Wohlstand quits [Remote host closed the connection] |
04:45:28 | | caylin quits [Read error: Connection reset by peer] |
04:45:41 | | caylin (caylin) joins |
04:51:14 | | BlueMaxima quits [Quit: Leaving] |
04:51:18 | | scurvy_duck quits [Ping timeout: 250 seconds] |
05:00:29 | | dendory quits [Quit: The Lounge - https://thelounge.chat] |
05:00:49 | | dendory (dendory) joins |
06:15:44 | | SootBector quits [Remote host closed the connection] |
06:16:04 | | SootBector (SootBector) joins |
06:27:17 | | BornOn420 quits [Remote host closed the connection] |
06:27:44 | | BornOn420 (BornOn420) joins |
06:33:20 | | DopefishJustin joins |
06:33:20 | | DopefishJustin is now authenticated as DopefishJustin |
06:40:37 | | Justin[home] joins |
06:40:37 | | Justin[home] is now authenticated as DopefishJustin |
06:44:33 | | DopefishJustin quits [Ping timeout: 260 seconds] |
06:44:45 | | Justin[home] is now known as DopefishJustin |
06:59:17 | | lennier2 quits [Read error: Connection reset by peer] |
06:59:33 | | lennier2 joins |
07:00:05 | | beardicus quits [Quit: Ping timeout (120 seconds)] |
07:02:36 | | beardicus (beardicus) joins |
08:20:52 | | ahm2587 quits [Quit: The Lounge - https://thelounge.chat] |
08:21:08 | | ahm2587 joins |
08:33:38 | | nulldata quits [Ping timeout: 260 seconds] |
09:22:03 | | corentin5 quits [Ping timeout: 260 seconds] |
09:46:06 | | sparky14921 (sparky1492) joins |
09:49:26 | | sparky1492 quits [Ping timeout: 250 seconds] |
09:49:27 | | sparky14921 is now known as sparky1492 |
10:00:25 | | Island quits [Read error: Connection reset by peer] |
10:36:29 | | sparky14926 (sparky1492) joins |
10:40:13 | | sparky1492 quits [Ping timeout: 260 seconds] |
10:40:14 | | sparky14926 is now known as sparky1492 |
10:46:29 | | loug83181422 joins |
10:48:30 | | sec^nd quits [Remote host closed the connection] |
10:48:56 | | sec^nd (second) joins |
10:57:12 | <@arkiver> | some projects have been running a bit slow lately |
10:57:19 | <@arkiver> | it's due to the large amount of content being archived. |
10:57:53 | <@arkiver> | there are long-term projects with a backlog, short-term projects, and a backlog still being uploaded
10:58:15 | <@arkiver> | for example blogger and livestream will soon have their items finished, which is already going to help enormously
11:18:06 | <myself> | Ooo that's exciting. All that stuff has been sitting out on staging disks this whole time? |
11:23:08 | <@arkiver> | yeah there's both crawling backlog and an upload backlog, but they're clearing out |
11:23:15 | <@arkiver> | it may take more weeks though |
12:00:02 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:46 | | Bleo18260072271962345 joins |
12:22:45 | | Wohlstand (Wohlstand) joins |
12:34:19 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:34:50 | | SkilledAlpaca418962 joins |
13:26:06 | | notarobot1 quits [Ping timeout: 250 seconds] |
14:14:18 | | @rewby quits [Ping timeout: 260 seconds] |
14:28:41 | | NeonGlitch (NeonGlitch) joins |
14:37:41 | | rewby (rewby) joins |
14:37:41 | | @ChanServ sets mode: +o rewby |
15:14:38 | | Megame (Megame) joins |
15:31:55 | | Wohlstand quits [Quit: Wohlstand] |
15:41:40 | | sparky14924 (sparky1492) joins |
15:45:18 | | sparky1492 quits [Ping timeout: 260 seconds] |
15:45:19 | | sparky14924 is now known as sparky1492 |
15:55:29 | | BornOn420_ (BornOn420) joins |
15:58:58 | | BornOn420 quits [Ping timeout: 276 seconds] |
16:04:15 | | scurvy_duck joins |
16:09:22 | | BornOn420_ quits [Ping timeout: 276 seconds] |
16:22:54 | | scurvy_duck quits [Ping timeout: 250 seconds] |
16:23:10 | | BornOn420 (BornOn420) joins |
16:52:28 | <sparky1492> | Good day all, I started having an issue with one of my Docker containers last week and have been slowly troubleshooting it, and now I'm stuck on what else to try. It's unable to see the warrior projects: when the Docker warrior is running, the "Available Projects" page only shows "ArchiveTeam's Choice" and below that an error message of
16:52:28 | <sparky1492> | "Phooey… No warrior projects are available for participation yet!" This container was running fine for a week or so; I think the problem started after it restarted sometime last week. I've removed the image and re-downloaded it, and changed the compose.yaml configuration back to mostly vanilla. I have other Docker images and VMs on other machines running just fine on
16:52:28 | <sparky1492> | the same network. I've not been able to find anything online about that error message.
16:54:36 | <sparky1492> | To add to this, the graph for downloads and uploads is constantly active. I also checked the data folder in the Docker container and it only shows the warrior logs and wget items. The logs only show this:
16:54:37 | <sparky1492> | 2025-03-03 16:43:01,952 - seesaw.warrior - DEBUG - Update warrior hq. |
16:54:37 | <sparky1492> | 2025-03-03 16:43:01,952 - seesaw.warrior - DEBUG - Warrior ID ''. |
16:54:37 | <sparky1492> | 2025-03-03 16:53:01,953 - seesaw.warrior - DEBUG - Update warrior hq. |
16:54:37 | <sparky1492> | 2025-03-03 16:53:01,953 - seesaw.warrior - DEBUG - Warrior ID ''. |
17:18:08 | <sparky1492> | Also, my apologies if this needs to be in another channel; please let me know
17:19:21 | <Blueacid> | arkiver: Is the backlog due to the limited ingest speed / slot count of the IA's infrastructure? Roughly how much is staged & in the queue for transferring? |
17:19:40 | <Blueacid> | (It's incredible just how much stuff there is - kudos to all of you for wranglin' all this infrastructure!) |
17:27:33 | | NeonGlitch quits [Client Quit] |
17:42:17 | | Megame quits [Client Quit] |
17:42:37 | | NeonGlitch (NeonGlitch) joins |
18:13:02 | | kansei quits [Quit: ZNC 1.9.1 - https://znc.in] |
18:16:20 | | kansei (kansei) joins |
18:28:13 | | kansei quits [Client Quit] |
18:28:45 | | Barto (Barto) joins |
18:30:03 | | nicolas17 quits [Quit: Konversation terminated!] |
18:30:25 | | kansei (kansei) joins |
18:33:31 | | Barto quits [Client Quit] |
18:38:03 | | Barto (Barto) joins |
18:48:13 | | gust joins |
18:58:33 | | gust quits [Remote host closed the connection] |
18:58:52 | | gust joins |
19:00:08 | | nicolas17 joins |
19:24:02 | | wyatt8740 quits [Ping timeout: 250 seconds] |
19:27:53 | | wyatt8740 joins |
19:34:05 | | wyatt8750 joins |
19:34:33 | | wyatt8740 quits [Ping timeout: 260 seconds] |
19:39:48 | | wyatt8750 quits [Ping timeout: 260 seconds] |
19:43:04 | | wyatt8740 joins |
19:48:06 | | Junie joins |
19:50:13 | <Junie> | Hello! I'm working on my midterm for a digital preservation class, and I have a rather lengthy list of questions regarding operating procedures/policies at archive team, if anyone is feeling up to answering questions? Stuff like policies regarding metadata, funding, organizational structure, that kind of thing. Alternatively, if that kind of |
19:50:13 | <Junie> | information is available elsewhere and someone could point me in the right direction, that would be awesome! |
19:53:55 | <pokechu22> | I don't think much of that is documented anywhere (and most of it is pretty loose/unstructured anyways), but this is probably the right channel to ask in |
19:56:04 | <Junie> | It's cool if there isn't formal documentation, I already prefaced my paper with the fact that the structure is pretty loose, so there might not be concrete answers to everything. I just thought it looked like a cool project, and it seemed more interesting to look into than, like, the Library of Congress or something
19:58:45 | <Junie> | I guess my first question would be: the wiki mentions Warriors and Writers as some of the folks involved with the process, what other roles are associated with the project? Is there any sort of organizational structure involved? |
20:09:31 | | qw3rty quits [Read error: Connection reset by peer] |
20:09:31 | | benjins2_ quits [Read error: Connection reset by peer] |
20:09:31 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
20:09:31 | | khaoohs quits [Read error: Connection reset by peer] |
20:09:31 | | bladem quits [Read error: Connection reset by peer] |
20:09:33 | | qw3rty joins |
20:09:36 | | PredatorIWD25 joins |
20:09:40 | | khaoohs joins |
20:10:49 | | benjins2_ joins |
20:14:33 | <Vokun> | I suppose there are the "core members", I think there are 3 or 4 of them, who do the majority of the management of hardware and coding. Then there are the (I don't have a word for this) "trusted members", who own/rent the target servers and maybe also contribute code, and some of them host smaller, more private projects like #burnthetwitch or #wikibot. The next tier down might be those who host archivebot pipelines, or even have permission to use archivebot at all,
20:14:33 | <Vokun> | which would be people
20:14:41 | | bladem (bladem) joins |
20:14:55 | <Vokun> | who've been here a while and are also quite trusted |
20:15:49 | <Vokun> | Someone correct me if I'm wrong, but that's what I've gathered. That last one may blend into the one above slightly. And all these groups tend to blend together somewhat near the edges |
20:18:06 | <Vokun> | The "Writers" are just anyone who has the free time to edit the wiki. The "Warriors" are anyone who runs the projects, and just about anyone could do any or all of these roles, probably, if they've been here long enough and the core team likes them
20:19:59 | <Junie> | Awesome! That really clarifies things, thank you :D Let me get that into my paper and I can get to my next question |
20:28:41 | <Junie> | Okay, question two, which kind of has a few parts: Where does funding come from? Is there an estimate for the total annual spending to support the project? How is the budget usually split between staffing (volunteers, afaik, so this part might not be relevant), software, hardware, and services? |
20:30:16 | <nicolas17> | long term storage is Internet Archive which has its own funding/donations/etc |
20:31:26 | | NeonGlitch quits [Client Quit] |
20:32:09 | <Junie> | Makes sense! |
20:32:26 | <katia> | Junie, i have a question for you, apologies if you already answered it. where can we find the output of your research? you know, to archive it. :) |
20:32:30 | <nicolas17> | as for our infrastructure, it's often volunteers who already have personal hardware around, who are allowed to use spare infrastructure from their workplaces, or who have money to spend on hobbies :p |
20:33:20 | <pokechu22> | Some funding is at https://opencollective.com/archiveteam but that's not the only way funding works (e.g. I personally host an archivebot pipeline, and pay 26.55 Euro/month for the server for that) |
20:34:14 | <h2ibot> | Petchea edited Niconico (-29, /* General network status */ fix): https://wiki.archiveteam.org/?diff=54516&oldid=54339 |
20:35:14 | <h2ibot> | Petchea edited Niconico (-156, /* Extraction tools */ google cache is obsolete): https://wiki.archiveteam.org/?diff=54517&oldid=54516 |
20:36:13 | <Junie> | This is just a paper I'm writing for a midterm assignment, it probably wasn't going to get put online anywhere. Basically I'm filling out a modified version of this sheet: https://sustainableheritagenetwork.org/digital-heritage/digital-preservation-plan-worksheet |
20:36:48 | <Junie> | I graduate from my library science program in about a month, this digital preservation class is the last one I need to complete for my masters! :D |
20:37:00 | | NeonGlitch (NeonGlitch) joins |
20:42:41 | <Vokun> | I think the majority, if not all, of the software we use is open source, and usually heavily modified for our use case. I don't know much about that side of it, but I know there's been a lot of work put into the software side.
20:44:26 | <TheTechRobo> | It's open source-ish software. The tracker, for example, is only barely universal-tracker still. |
20:45:01 | <TheTechRobo> | My understanding is that there's just a lot of duct tape that was never made public, and it compounded over time. |
20:45:16 | <h2ibot> | Petchea edited Niconico (-15, /* Nico Nico Douga */ fix): https://wiki.archiveteam.org/?diff=54518&oldid=54517 |
20:45:35 | <Vokun> | I was thinking mainly that ours is "based" on open source software. Not that ours is necessarily open
20:45:42 | <Vokun> | Some of it is |
20:46:18 | <Vokun> | Some stuff we use is so old and patched over that only one or two people even understand it... haha |
20:47:36 | <Junie> | Does anyone have a vague idea of how much is spent (through whatever various sources) to keep the project going annually? If it's all just personal spending, donations, and other kinda ethereal funding sources that's fine too, but even a rough estimate would be super helpful?
20:48:23 | <katia> | no idea, impossible to tell |
20:49:17 | <Junie> | That's what I figured, it definitely seems like one of those projects |
20:50:15 | <Vokun> | At least a few thousand I'd think. Possibly 5 figures. Some of the target servers cost ~$100 USD a month, and there are like 70 AB pipelines, which are at least $20+ each
20:50:20 | <Vokun> | still impossible to confirm
20:51:19 | <Junie> | That's still super helpful, thank you! |
20:51:21 | <Vokun> | Just infrastructure wise. Impossible to guess the money that the warriors would have cost |
20:51:31 | <Junie> | Question the next: Is there a formal mission statement or collection development policy? Or is it more of a "we take all the stuff we can and save it" type situation |
20:52:20 | <katia> | https://wiki.archiveteam.org/ says: Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten |
20:52:20 | <katia> | attention, resistance, press and discussion, but most importantly, we've gotten the message out: IT DOESN'T HAVE TO BE THIS WAY. |
20:53:11 | <pokechu22> | There's 73 pipelines, but generally one physical machine does 4-8 pipelines. It's 13 total machines right now I think (but some are bigger and thus more expensive than others) |
20:54:53 | <Vokun> | Anything that's under the TB range is pretty much on a "whoever has time" basis. We do have to take the Internet Archive into account when dealing with sizes at some point. Not sure when that happens, but say for example, if it's more than a few dozen TB it probably should be at risk before we save it. None of this is guaranteed tho, and there've been many exceptions
20:55:39 | <Vokun> | There are also many many TB archived "proactively" which aren't at risk at all, necessarily
20:56:23 | <pokechu22> | For instance we (attempted to) save the webpages for every federal candidate in the 2024 US election (and try to do similar things for other elections), but also I like to save websites for various restaurants and businesses that might not be saved otherwise |
20:56:54 | <pokechu22> | I tend to focus on things that are niche and wouldn't be likely to be saved otherwise |
21:00:17 | <pokechu22> | One benefit is that since our data ends up on web.archive.org, it's relatively easy to discover if you have the URL of the original website |
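As an illustration of that discoverability, the Wayback Machine exposes a public availability endpoint that reports the closest snapshot for a given URL. A small sketch in Python (the example URL is arbitrary):

    import requests

    # Ask the Wayback Machine for the closest snapshot of a URL.
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": "example.com"},  # arbitrary example URL
        timeout=30,
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    if closest:
        print(closest["url"], closest["timestamp"])
    else:
        print("no snapshot recorded")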
21:00:57 | <Junie> | This is super helpful so far. Do y'all mind if I quote you by username in my paper? Like I said, this isn't really going anywhere besides a blackboard submission box, but it would be nice to give credit to the parties responsible for certain bits of information |
21:01:09 | | BlueMaxima joins |
21:02:06 | <katia> | this channel is publicly logged so it's probably fine to mention people by their usernames |
21:02:07 | <pokechu22> | Sure; the logs for this channel are also public so you could link to https://irclogs.archivete.am/archiveteam-bs/2025-03-03 if you want |
21:02:13 | <katia> | \o\ |
21:03:04 | <Junie> | That's super helpful, I'll do that! |
21:03:25 | <myself> | I'm curious how many warriors are out there, given that the "cost" of running one is supposed to be zero or nearly so (theoretically my desktop may burn an extra penny per month of electricity running the warrior compared to not running it?), but there are... quite a lot of us.
21:06:28 | <Vokun> | ~1500 ran URLs alone
21:23:04 | <Junie> | Okay so! Fourth(?) question: Is there a policy in place regarding copyright/intellectual property? I know the Internet Archive in general tends to play it pretty fast and loose, but if there's an internal stance, that's what I'm looking for with this question |
21:38:03 | | HP_Archivist quits [Quit: Leaving] |
21:40:48 | | kansei quits [Quit: ZNC 1.9.1 - https://znc.in] |
21:44:30 | | kansei (kansei) joins |
22:01:30 | | Dango360 quits [Read error: Connection reset by peer] |
22:07:00 | | Dango360 (Dango360) joins |
22:14:46 | | etnguyen03 (etnguyen03) joins |
22:21:34 | <Junie> | Additional question: are there any sorts of materials the Archive Team specifically DOESN'T digitally preserve? Websites that are off-limits/not considered? I assume given the bulk of data y'all are working with, things that aren't at risk are low priority, but is there anything that you specifically avoid?
22:21:35 | <h2ibot> | Petchea edited Niconico (-61, /* Extraction tools */ Nicochart appears to…): https://wiki.archiveteam.org/?diff=54519&oldid=54518 |
22:22:34 | | nine quits [Ping timeout: 250 seconds] |
22:23:16 | | nine joins |
22:23:16 | | nine is now authenticated as nine |
22:23:16 | | nine quits [Changing host] |
22:23:16 | | nine (nine) joins |
22:29:54 | | flotwig quits [Remote host closed the connection] |
22:32:37 | <h2ibot> | Usernam edited List of websites excluded from the Wayback Machine (-27, Why is User:JAABot not sorting and counting this?): https://wiki.archiveteam.org/?diff=54520&oldid=54513 |
22:32:59 | | flotwig joins |
22:38:11 | | NeonGlitch quits [Client Quit] |
22:41:49 | <Vokun> | Mostly just things like these https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine |
22:42:05 | <Vokun> | And things we know archive themselves like wikipedia |
22:43:19 | <Vokun> | Some of us might still archive the excluded ones, but they won't be going into the Wayback Machine because the companies behind these sites have requested/demanded that IA not display them
22:43:39 | <Vokun> | It's just that an official project usually wouldn't be started for something like that
22:47:02 | <Junie> | Makes sense! Thanks! |
22:48:01 | <TheTechRobo> | Not sure what the exact stance is, but our motto is generally "archive first, ask questions later" |
22:56:08 | <Junie> | The last question I have for now (I think) is whether there's a procedure for the creation/QC of metadata for things being uploaded? Who creates it, how is it created, is there a specific workflow, or is this another case-by-case situation?
23:05:43 | | riteo quits [Ping timeout: 260 seconds] |
23:13:50 | <Vokun> | for the smaller projects, it can be case by case, like in #discard for the Discord archive. That one operates on whoever grabs a Discord server first, and however they label it. I try to be as verbose with metadata as I can, without putting in too much extra research per server.
23:14:19 | <Vokun> | Main projects have a standard for metadata, since they are automated
23:16:25 | <Vokun> | But other projects like #burnthetwitch and #wikibot are automated too, so they have separate standards from their respective admins
23:17:08 | <Junie> | What's the format/schema that's generally used for the automated ones? RDA? |
23:17:11 | | Shyy quits [Quit: The Lounge - https://thelounge.chat] |
23:17:38 | <Vokun> | https://archive.org/details/archiveteam_usgovernment_20250131232111_96ad506d here's one example |
23:17:42 | <Junie> | (Also, thank you so much for being so helpful Vokun! You've made a relatively intimidating project much easier for me to complete, and I really appreciate it!) |
23:18:26 | <Junie> | oh thanks! |
23:19:31 | <Vokun> | I don't know the specifics on metadata in the main projects. Just glad to spit out what I know, and I'm glad others corrected some assumptions I made
23:20:05 | | Island joins |
23:23:08 | | Shyy4 joins |
23:25:39 | | riteo (riteo) joins |
23:29:00 | <Junie> | That's totally fine, if it's more of a case-by-case thing that works, and I can dig into the metadata that's available on the Internet Archive to find formatting info, so that example you sent is super helpful!
23:36:55 | | lennier2_ joins |
23:37:16 | <Vokun> | Here's a few examples from #Discard https://archive.org/details/Monolith-Productions-Official-Discord-Archive-635909048184733726 |
23:37:17 | <Vokun> | https://archive.org/details/discord-938116342538706975 |
23:39:23 | <Vokun> | Here's the twitch collection https://archive.org/details/archiveteam_twitch_metadata?tab=collection |
23:40:08 | | lennier2 quits [Ping timeout: 260 seconds] |
23:42:04 | <Vokun> | wikiteam item https://archive.org/details/wiki-wiki.totemarts.games-20250303 |
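For digging into the metadata on items like the ones linked above, archive.org also exposes a read-only metadata endpoint (https://archive.org/metadata/<identifier>). A short sketch using the usgov example item mentioned earlier in the log:

    import requests

    identifier = "archiveteam_usgovernment_20250131232111_96ad506d"
    item = requests.get(f"https://archive.org/metadata/{identifier}", timeout=30).json()

    # Item-level descriptive metadata (title, collection, dates, ...).
    print(item["metadata"].get("title"))
    print(item["metadata"].get("collection"))

    # Per-file entries (WARCs, CDX indexes, etc.) with names and sizes.
    for f in item.get("files", [])[:5]:
        print(f.get("name"), f.get("size"))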
23:49:36 | <Junie> | Awesome, thanks so much y'all! I'll be back if I have more questions, but that should be everything I needed! |