00:05:54 | | APOLLO03 joins |
00:16:34 | | Megame quits [Ping timeout: 250 seconds] |
00:28:57 | | etnguyen03 (etnguyen03) joins |
00:44:33 | | etnguyen03 quits [Client Quit] |
00:48:11 | | lennier2 joins |
00:51:03 | | lennier2_ quits [Ping timeout: 260 seconds] |
00:59:09 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
01:05:42 | | Wohlstand (Wohlstand) joins |
01:08:49 | | SkilledAlpaca418962 joins |
01:19:38 | | scurvy_duck quits [Ping timeout: 260 seconds] |
01:28:23 | | etnguyen03 (etnguyen03) joins |
01:30:40 | | rohvani joins |
01:44:44 | | Island joins |
02:28:49 | | biscuitenjoyer12 joins |
02:32:59 | <biscuitenjoyer12> | howdy, are there any performance differences between the archiveteam-warrior and one of the project-specific containers? (the usgov-grab, to be specific)
02:36:58 | | lexikiq quits [Ping timeout: 250 seconds] |
02:37:20 | | lexikiq joins |
02:49:17 | | biscuitenjoyer12 quits [Client Quit] |
02:59:30 | | DopefishJustin quits [Ping timeout: 250 seconds] |
03:02:56 | <TheTechRobo> | !tell biscuitenjoyer12 The project-specific containers don't have a web UI so they will likely use RAM, although in practice the effect is likely minor. The real useful thing is that they put logs on stdout, which allows you to use log centralization tools if you have a lot of containers |
03:02:56 | <eggdrop> | [tell] ok, I'll tell biscuitenjoyer12 when they join next |
03:10:30 | | lunax quits [Remote host closed the connection] |
03:18:44 | <nicolas17> | use less* RAM? |
03:30:54 | | etnguyen03 quits [Remote host closed the connection] |
03:46:28 | | gust quits [Read error: Connection reset by peer] |
04:05:22 | | scurvy_duck joins |
04:11:00 | <TheTechRobo> | er, yeah |
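For context on the log-centralization point above: because the project-specific containers write their logs to stdout, any standard Docker log consumer can collect them. A minimal sketch, assuming the Docker SDK for Python ("pip install docker") and containers whose names share a hypothetical "usgov-grab" prefix:

    import docker

    # Connect to the local Docker daemon.
    client = docker.from_env()

    # Dump the last few stdout/stderr lines of every matching container,
    # prefixed with the container name, roughly what a log shipper would do.
    for container in client.containers.list():
        if container.name.startswith("usgov-grab"):  # hypothetical name prefix
            for line in container.logs(tail=20).splitlines():
                print(f"{container.name}: {line.decode(errors='replace')}")

In practice one would more likely point a logging driver or a dedicated shipper at the containers; this only illustrates that the logs are reachable without the warrior web UI.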
04:12:18 | | lexikiq quits [Ping timeout: 260 seconds] |
04:12:18 | | seab5 joins |
04:12:40 | | lexikiq joins |
04:12:52 | | seab5 quits [Client Quit] |
04:40:41 | | Wohlstand quits [Remote host closed the connection] |
04:45:28 | | caylin quits [Read error: Connection reset by peer] |
04:45:41 | | caylin (caylin) joins |
04:51:14 | | BlueMaxima quits [Quit: Leaving] |
04:51:18 | | scurvy_duck quits [Ping timeout: 250 seconds] |
05:00:29 | | dendory quits [Quit: The Lounge - https://thelounge.chat] |
05:00:49 | | dendory (dendory) joins |
06:15:44 | | SootBector quits [Remote host closed the connection] |
06:16:04 | | SootBector (SootBector) joins |
06:27:17 | | BornOn420 quits [Remote host closed the connection] |
06:27:44 | | BornOn420 (BornOn420) joins |
06:33:20 | | DopefishJustin joins |
06:33:20 | | DopefishJustin is now authenticated as DopefishJustin |
06:40:37 | | Justin[home] joins |
06:40:37 | | Justin[home] is now authenticated as DopefishJustin |
06:44:33 | | DopefishJustin quits [Ping timeout: 260 seconds] |
06:44:45 | | Justin[home] is now known as DopefishJustin |
06:59:17 | | lennier2 quits [Read error: Connection reset by peer] |
06:59:33 | | lennier2 joins |
07:00:05 | | beardicus quits [Quit: Ping timeout (120 seconds)] |
07:02:36 | | beardicus (beardicus) joins |
08:20:52 | | ahm2587 quits [Quit: The Lounge - https://thelounge.chat] |
08:21:08 | | ahm2587 joins |
08:33:38 | | nulldata quits [Ping timeout: 260 seconds] |
09:22:03 | | corentin5 quits [Ping timeout: 260 seconds] |
09:46:06 | | sparky14921 (sparky1492) joins |
09:49:26 | | sparky1492 quits [Ping timeout: 250 seconds] |
09:49:27 | | sparky14921 is now known as sparky1492 |
10:00:25 | | Island quits [Read error: Connection reset by peer] |
10:36:29 | | sparky14926 (sparky1492) joins |
10:40:13 | | sparky1492 quits [Ping timeout: 260 seconds] |
10:40:14 | | sparky14926 is now known as sparky1492 |
10:46:29 | | loug83181422 joins |
10:48:30 | | sec^nd quits [Remote host closed the connection] |
10:48:56 | | sec^nd (second) joins |
10:57:12 | <@arkiver> | some projects have been running a bit slow lately |
10:57:19 | <@arkiver> | it's due to the large amount of content being archived. |
10:57:53 | <@arkiver> | there are long-term projects with a backlog, short-term projects, and a backlog still being uploaded
10:58:15 | <@arkiver> | for example blogger and livestream will soon have their items finished, which is already going to help enormously
11:18:06 | <myself> | Ooo that's exciting. All that stuff has been sitting out on staging disks this whole time? |
11:23:08 | <@arkiver> | yeah there's both crawling backlog and an upload backlog, but they're clearing out |
11:23:15 | <@arkiver> | it may take more weeks though |
12:00:02 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:46 | | Bleo18260072271962345 joins |
12:22:45 | | Wohlstand (Wohlstand) joins |
12:34:19 | | SkilledAlpaca418962 quits [Quit: SkilledAlpaca418962] |
12:34:50 | | SkilledAlpaca418962 joins |
13:26:06 | | notarobot1 quits [Ping timeout: 250 seconds] |
14:14:18 | | @rewby quits [Ping timeout: 260 seconds] |
14:28:41 | | NeonGlitch (NeonGlitch) joins |
14:37:41 | | rewby (rewby) joins |
14:37:41 | | @ChanServ sets mode: +o rewby |
15:14:38 | | Megame (Megame) joins |
15:31:55 | | Wohlstand quits [Quit: Wohlstand] |
15:41:40 | | sparky14924 (sparky1492) joins |
15:45:18 | | sparky1492 quits [Ping timeout: 260 seconds] |
15:45:19 | | sparky14924 is now known as sparky1492 |
15:55:29 | | BornOn420_ (BornOn420) joins |
15:58:58 | | BornOn420 quits [Ping timeout: 276 seconds] |
16:04:15 | | scurvy_duck joins |
16:09:22 | | BornOn420_ quits [Ping timeout: 276 seconds] |
16:22:54 | | scurvy_duck quits [Ping timeout: 250 seconds] |
16:23:10 | | BornOn420 (BornOn420) joins |
16:52:28 | <sparky1492> | Good day all, I started having an issue with one of my Docker containers last week and have been slowly troubleshooting it, and now I'm stuck on what else to try. It's unable to see the warrior projects: when the Docker warrior is running, the "Available Projects" page only shows "ArchiveTeam's Choice" and below that an error message of
16:52:28 | <sparky1492> | "Phooey… No warrior projects are available for participation yet!" This container was running fine for a week or so; I think the problem started after it restarted sometime last week. I've removed the image and re-downloaded it, and changed the compose.yaml configuration back to mostly vanilla. I have other Docker images and VMs on other machines running just fine on
16:52:28 | <sparky1492> | the same network. I've not been able to find anything online about that error message.
16:54:36 | <sparky1492> | To add to this, the graph for downloads and uploads is constantly active. I also checked the data folder in the Docker container and it only shows the warrior logs and wget items. The logs only show this:
16:54:37 | <sparky1492> | 2025-03-03 16:43:01,952 - seesaw.warrior - DEBUG - Update warrior hq. |
16:54:37 | <sparky1492> | 2025-03-03 16:43:01,952 - seesaw.warrior - DEBUG - Warrior ID ''. |
16:54:37 | <sparky1492> | 2025-03-03 16:53:01,953 - seesaw.warrior - DEBUG - Update warrior hq. |
16:54:37 | <sparky1492> | 2025-03-03 16:53:01,953 - seesaw.warrior - DEBUG - Warrior ID ''. |
17:18:08 | <sparky1492> | Also, my apologies if this needs to be in another channel; please let me know
17:19:21 | <Blueacid> | arkiver: Is the backlog due to the limited ingest speed / slot count of the IA's infrastructure? Roughly how much is staged & in the queue for transferring? |
17:19:40 | <Blueacid> | (It's incredible just how much stuff there is - kudos to all of you for wranglin' all this infrastructure!) |
17:27:33 | | NeonGlitch quits [Client Quit] |
17:42:17 | | Megame quits [Client Quit] |
17:42:37 | | NeonGlitch (NeonGlitch) joins |
18:13:02 | | kansei quits [Quit: ZNC 1.9.1 - https://znc.in] |
18:16:20 | | kansei (kansei) joins |
18:28:13 | | kansei quits [Client Quit] |
18:28:45 | | Barto (Barto) joins |
18:30:03 | | nicolas17 quits [Quit: Konversation terminated!] |
18:30:25 | | kansei (kansei) joins |
18:33:31 | | Barto quits [Client Quit] |
18:38:03 | | Barto (Barto) joins |
18:48:13 | | gust joins |
18:58:33 | | gust quits [Remote host closed the connection] |
18:58:52 | | gust joins |
19:00:08 | | nicolas17 joins |
19:24:02 | | wyatt8740 quits [Ping timeout: 250 seconds] |
19:27:53 | | wyatt8740 joins |
19:34:05 | | wyatt8750 joins |
19:34:33 | | wyatt8740 quits [Ping timeout: 260 seconds] |
19:39:48 | | wyatt8750 quits [Ping timeout: 260 seconds] |
19:43:04 | | wyatt8740 joins |
19:48:06 | | Junie joins |
19:50:13 | <Junie> | Hello! I'm working on my midterm for a digital preservation class, and I have a rather lengthy list of questions regarding operating procedures/policies at archive team, if anyone is feeling up to answering questions? Stuff like policies regarding metadata, funding, organizational structure, that kind of thing. Alternatively, if that kind of |
19:50:13 | <Junie> | information is available elsewhere and someone could point me in the right direction, that would be awesome! |
19:53:55 | <pokechu22> | I don't think much of that is documented anywhere (and most of it is pretty loose/unstructured anyways), but this is probably the right channel to ask in |
19:56:04 | <Junie> | It's cool if there isn't formal documentation, I already prefaced my paper with the fact that the structure is pretty loose, so there might not be concrete answers to everything. I just thought it looked like a cool project, and it seemed more interesting to look into than, like, the Library of Congress or something
19:58:45 | <Junie> | I guess my first question would be: the wiki mentions Warriors and Writers as some of the folks involved with the process, what other roles are associated with the project? Is there any sort of organizational structure involved? |
20:09:31 | | qw3rty quits [Read error: Connection reset by peer] |
20:09:31 | | benjins2_ quits [Read error: Connection reset by peer] |
20:09:31 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
20:09:31 | | khaoohs quits [Read error: Connection reset by peer] |
20:09:31 | | bladem quits [Read error: Connection reset by peer] |
20:09:33 | | qw3rty joins |
20:09:36 | | PredatorIWD25 joins |
20:09:40 | | khaoohs joins |
20:10:49 | | benjins2_ joins |
20:14:33 | <Vokun> | I suppose there are the "core members", I think there are 3 or 4 of them, who do the majority of the management of hardware and coding. Then there are the (I don't have a word for this) "trusted members", who own/rent the target servers and maybe also contribute code, and some of them host smaller, more private projects like #burnthetwitch or #wikibot. The next tier down might be those who host archivebot pipelines, or even have permission to use archivebot at all,
20:14:33 | <Vokun> | which would be people
20:14:41 | | bladem (bladem) joins |
20:14:55 | <Vokun> | who've been here a while and are also quite trusted |
20:15:49 | <Vokun> | Someone correct me if I'm wrong, but that's what I've gathered. That last one may blend into the one above slightly. And all these groups tend to blend together somewhat near the edges |
20:18:06 | <Vokun> | The "Writers" are just anyone who has the free time to edit the wiki. The "Warriors" are anyone who runs the projects, and just about anyone could do any or all of these roles, probably, if they've been here long enough and the core team likes them
20:19:59 | <Junie> | Awesome! That really clarifies things, thank you :D Let me get that into my paper and I can get to my next question |
20:28:41 | <Junie> | Okay, question two, which kind of has a few parts: Where does funding come from? Is there an estimate for the total annual spending to support the project? How is the budget usually split between staffing (volunteers, afaik, so this part might not be relevant), software, hardware, and services? |
20:30:16 | <nicolas17> | long term storage is Internet Archive which has its own funding/donations/etc |
20:31:26 | | NeonGlitch quits [Client Quit] |
20:32:09 | <Junie> | Makes sense! |
20:32:26 | <katia> | Junie, i have a question for you, apologies if you already answered it. where can we find the output of your research? you know, to archive it. :) |
20:32:30 | <nicolas17> | as for our infrastructure, it's often volunteers who already have personal hardware around, who are allowed to use spare infrastructure from their workplaces, or who have money to spend on hobbies :p |
20:33:20 | <pokechu22> | Some funding is at https://opencollective.com/archiveteam but that's not the only way funding works (e.g. I personally host an archivebot pipeline, and pay 26.55 Euro/month for the server for that) |
20:34:14 | <h2ibot> | Petchea edited Niconico (-29, /* General network status */ fix): https://wiki.archiveteam.org/?diff=54516&oldid=54339 |
20:35:14 | <h2ibot> | Petchea edited Niconico (-156, /* Extraction tools */ google cache is obsolete): https://wiki.archiveteam.org/?diff=54517&oldid=54516 |
20:36:13 | <Junie> | This is just a paper I'm writing for a midterm assignment, it probably wasn't going to get put online anywhere. Basically I'm filling out a modified version of this sheet: https://sustainableheritagenetwork.org/digital-heritage/digital-preservation-plan-worksheet |
20:36:48 | <Junie> | I graduate from my library science program in about a month, this digital preservation class is the last one I need to complete for my masters! :D |
20:37:00 | | NeonGlitch (NeonGlitch) joins |
20:42:41 | <Vokun> | I think the majority, if not all, of the software we use is open source, and usually heavily modified for our use case. I don't know much about that side of it, but I know there's been a lot of work put into the software side.
20:44:26 | <TheTechRobo> | It's open source-ish software. The tracker, for example, is only barely universal-tracker still. |
20:45:01 | <TheTechRobo> | My understanding is that there's just a lot of duct tape that was never made public, and it compounded over time. |
20:45:16 | <h2ibot> | Petchea edited Niconico (-15, /* Nico Nico Douga */ fix): https://wiki.archiveteam.org/?diff=54518&oldid=54517 |
20:45:35 | <Vokun> | I was thinking mainly that ours is "based" on open source software. Not that ours is necessarily open
20:45:42 | <Vokun> | Some of it is |
20:46:18 | <Vokun> | Some stuff we use is so old and patched over that only one or two people even understand it... haha |
20:47:36 | <Junie> | Does anyone have a vague idea of how much is spent (through whatever various sources) to keep the project going annually? If it's all just personal spending, donations, and other kinda ethereal funding sources that's fine too, but even a rough estimate would be super helpful?
20:48:23 | <katia> | no idea, impossible to tell |
20:49:17 | <Junie> | That's what I figured, it definitely seems like one of those projects |
20:50:15 | <Vokun> | At least a few thousand I'd think. Possibly 5 figures. Some of the target servers cost ~$100 USD a month, and there are like 70 AB pipelines, which are at least $20+ each
20:50:20 | <Vokun> | still impossible to confirm
20:51:19 | <Junie> | That's still super helpful, thank you! |
20:51:21 | <Vokun> | Just infrastructure wise. Impossible to guess the money that the warriors would have cost |
20:51:31 | <Junie> | Question the next: Is there a formal mission statement or collection development policy? Or is it more of a "we take all the stuff we can and save it" type situation |
20:52:20 | <katia> | https://wiki.archiveteam.org/ says: Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten |
20:52:20 | <katia> | attention, resistance, press and discussion, but most importantly, we've gotten the message out: IT DOESN'T HAVE TO BE THIS WAY. |
20:53:11 | <pokechu22> | There's 73 pipelines, but generally one physical machine does 4-8 pipelines. It's 13 total machines right now I think (but some are bigger and thus more expensive than others) |
20:54:53 | <Vokun> | Anything that's under the TB range is pretty much on a "whoever has time" basis. We do have to take the Internet Archive into account when dealing with sizes at some point. Not sure when that happens, but say for example, if it's more than a few dozen TB it probably should be at risk before we save it. None of this is guaranteed tho, and there've been many exceptions
20:55:39 | <Vokun> | There are also many many TB archived "proactively" which aren't at risk at all, necessarily
20:56:23 | <pokechu22> | For instance we (attempted to) save the webpages for every federal candidate in the 2024 US election (and try to do similar things for other elections), but also I like to save websites for various restaurants and businesses that might not be saved otherwise |
20:56:54 | <pokechu22> | I tend to focus on things that are niche and wouldn't be likely to be saved otherwise |
21:00:17 | <pokechu22> | One benefit is that since our data ends up on web.archive.org, it's relatively easy to discover if you have the URL of the original website |
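As an illustration of that discoverability, the Wayback Machine exposes a public availability endpoint that reports the closest snapshot for a given URL. A small sketch in Python (the example URL is arbitrary):

    import requests

    # Ask the Wayback Machine for the closest snapshot of a URL.
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": "example.com"},  # arbitrary example URL
        timeout=30,
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    if closest:
        print(closest["url"], closest["timestamp"])
    else:
        print("no snapshot recorded")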
21:00:57 | <Junie> | This is super helpful so far. Do y'all mind if I quote you by username in my paper? Like I said, this isn't really going anywhere besides a blackboard submission box, but it would be nice to give credit to the parties responsible for certain bits of information |
21:01:09 | | BlueMaxima joins |
21:02:06 | <katia> | this channel is publicly logged so it's probably fine to mention people by their usernames |
21:02:07 | <pokechu22> | Sure; the logs for this channel are also public so you could link to https://irclogs.archivete.am/archiveteam-bs/2025-03-03 if you want |
21:02:13 | <katia> | \o\ |
21:03:04 | <Junie> | That's super helpful, I'll do that! |
21:03:25 | <myself> | I'm curious how many warriors are out there, given that the "cost" of running one is supposed to be zero or nearly so (theoretically my desktop may burn an extra penny per month of electricity running the warrior compared to not running it?), but there are... quite a lot of us.
21:06:28 | <Vokun> | ~1500 ran URLs alone
21:23:04 | <Junie> | Okay so! Fourth(?) question: Is there a policy in place regarding copyright/intellectual property? I know the Internet Archive in general tends to play it pretty fast and loose, but if there's an internal stance, that's what I'm looking for with this question |
21:38:03 | | HP_Archivist quits [Quit: Leaving] |
21:40:48 | | kansei quits [Quit: ZNC 1.9.1 - https://znc.in] |
21:44:30 | | kansei (kansei) joins |
22:01:30 | | Dango360 quits [Read error: Connection reset by peer] |
22:07:00 | | Dango360 (Dango360) joins |
22:14:46 | | etnguyen03 (etnguyen03) joins |
22:21:34 | <Junie> | Additional question: are there any sorts of materials the Archive Team specifically DOESN'T digitally preserve? Websites that are off-limits/not considered? I assume given the bulk of data y'all are working with, things that aren't at risk are low priority, but is there anything that you specifically avoid?
22:21:35 | <h2ibot> | Petchea edited Niconico (-61, /* Extraction tools */ Nicochart appears to…): https://wiki.archiveteam.org/?diff=54519&oldid=54518 |
22:22:34 | | nine quits [Ping timeout: 250 seconds] |
22:23:16 | | nine joins |
22:23:16 | | nine is now authenticated as nine |
22:23:16 | | nine quits [Changing host] |
22:23:16 | | nine (nine) joins |
22:29:54 | | flotwig quits [Remote host closed the connection] |
22:32:37 | <h2ibot> | Usernam edited List of websites excluded from the Wayback Machine (-27, Why is User:JAABot not sorting and counting this?): https://wiki.archiveteam.org/?diff=54520&oldid=54513 |
22:32:59 | | flotwig joins |
22:38:11 | | NeonGlitch quits [Client Quit] |
22:41:49 | <Vokun> | Mostly just things like these https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine |
22:42:05 | <Vokun> | And things we know archive themselves like wikipedia |
22:43:19 | <Vokun> | Some of us might still archive the excluded ones, but they won't be going into the Wayback Machine because the companies behind these sites have requested/demanded that IA not display them
22:43:39 | <Vokun> | It's just that an official project usually wouldn't be started for something like that
22:47:02 | <Junie> | Makes sense! Thanks! |
22:48:01 | <TheTechRobo> | Not sure what the exact stance is, but our motto is generally "archive first, ask questions later" |
22:56:08 | <Junie> | The last question I have for now (I think) is whether there's a procedure for the creation/QC of metadata for things being uploaded? Who creates it, how is it created, is there a specific workflow, or is this another case-by-case situation?
23:05:43 | | riteo quits [Ping timeout: 260 seconds] |
23:13:50 | <Vokun> | for the smaller projects, it can be case by case, like in #discard for the Discord archive. That one operates on whoever grabs a Discord server first, and however they label it. I try to be as verbose with metadata as I can, without putting in too much extra research per server.
23:14:19 | <Vokun> | Main projects have a standard for metadata, since they are automated
23:16:25 | <Vokun> | But other projects like #burnthetwitch and #wikibot are automated too, so they have separate standards from their respective admins
23:17:08 | <Junie> | What's the format/schema that's generally used for the automated ones? RDA? |
23:17:11 | | Shyy quits [Quit: The Lounge - https://thelounge.chat] |
23:17:38 | <Vokun> | https://archive.org/details/archiveteam_usgovernment_20250131232111_96ad506d here's one example |
23:17:42 | <Junie> | (Also, thank you so much for being so helpful Vokun! You've made a relatively intimidating project much easier for me to complete, and I really appreciate it!) |
23:18:26 | <Junie> | oh thanks! |
23:19:31 | <Vokun> | I don't know the specifics on metadata in the main projects. Just glad to spit out what I know, and I'm glad others corrected some assumptions I made
23:20:05 | | Island joins |
23:23:08 | | Shyy4 joins |
23:25:39 | | riteo (riteo) joins |
23:29:00 | <Junie> | That's totally fine, if it's more of a case-by-case thing that works, and I can dig into the metadata that's available on the Internet Archive to find formatting info, so that example you sent is super helpful!
23:36:55 | | lennier2_ joins |
23:37:16 | <Vokun> | Here's a few examples from #Discard https://archive.org/details/Monolith-Productions-Official-Discord-Archive-635909048184733726 |
23:37:17 | <Vokun> | https://archive.org/details/discord-938116342538706975 |
23:39:23 | <Vokun> | Here's the twitch collection https://archive.org/details/archiveteam_twitch_metadata?tab=collection |
23:40:08 | | lennier2 quits [Ping timeout: 260 seconds] |
23:42:04 | <Vokun> | wikiteam item https://archive.org/details/wiki-wiki.totemarts.games-20250303 |
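For digging into the metadata on items like the ones linked above, archive.org also exposes a read-only metadata endpoint (https://archive.org/metadata/<identifier>). A short sketch using the usgov example item mentioned earlier in the log:

    import requests

    identifier = "archiveteam_usgovernment_20250131232111_96ad506d"
    item = requests.get(f"https://archive.org/metadata/{identifier}", timeout=30).json()

    # Item-level descriptive metadata (title, collection, dates, ...).
    print(item["metadata"].get("title"))
    print(item["metadata"].get("collection"))

    # Per-file entries (WARCs, CDX indexes, etc.) with names and sizes.
    for f in item.get("files", [])[:5]:
        print(f.get("name"), f.get("size"))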
23:49:36 | <Junie> | Awesome, thanks so much y'all! I'll be back if I have more questions, but that should be everything I needed! |