00:10:10tbc1887 (tbc1887) joins
00:24:50googler joins
00:25:01<googler>Hello everyone!
00:25:16<googler>Does anyone here have access to the Google Video archive?
00:33:08tbc1887 quits [Client Quit]
00:49:18googler quits [Remote host closed the connection]
00:49:40<pabs>is there an effort to archive Peru? seems there is widespread chaos there at the moment
00:57:34<pokechu22>https://en.wikipedia.org/wiki/2022%E2%80%932023_Peruvian_political_protests - I'm not aware of any project
00:57:40rocketdive quits [Remote host closed the connection]
00:58:04rocketdive joins
01:03:10<Frogging101>buffer set read
01:07:54<audrooku|m>Frogging101: ?
01:08:01<Frogging101>I made a mistake
01:08:08<Frogging101>disregard
02:05:23rocketdive quits [Ping timeout: 265 seconds]
02:07:26<fuzzy8021>arkiver JAA probably can change default project
02:56:39<@JAA>Yup, I've switched it to Telegram for the time being.
03:00:28<h2ibot>JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=49388&oldid=49386
03:29:55sonick quits [Client Quit]
04:27:01balrog quits [Quit: Bye]
04:27:53balrog (balrog) joins
04:30:27sonick (sonick) joins
04:34:13<sonick>Is there a project here to archive web pages of educational institutions that are not endangered, such as schools?
04:38:05<pokechu22>There's a list at https://wiki.archiveteam.org/index.php/University_Web_Hosting and stuff is run via #archivebot sometimes, but I don't think there's anything organized specifically for that purpose
04:38:45<pabs>JAA: re mailman list archives, did you get https://lists.mplayerhq.hu/mailman/listinfo ?
04:39:52<pabs>and https://lists.ffmpeg.org/mailman/listinfo (but its lists are hidden, a list is on https://ffmpeg.org/contact.html#MailingLists)
04:50:54<@JAA>pabs: I didn't. I never really went after Mailman instances systematically apart from the ICANN & Co. ones.
04:51:17<pabs>ah, I thought you used that long list I sent you :(
04:54:03<@JAA>https://paste.debian.net/hidden/bef92430/ ?
04:54:34<@JAA>Doesn't look like I ever did anything with that, no. It might still be on a todo list somewhere.
04:55:47<pabs>thats the first one, there was a later one that was longer
04:56:37<pabs>looks like the first one misses mplayerhq
04:57:21<@JAA>https://paste.debian.net/plain/1258270 ?
04:58:26<@JAA>I think this was a case of 'I don't have time to deal with this right now due to other more urgent things, and it isn't in immediate danger, so I'll get back to it at some point', and then I forgor. :-/
05:00:31<pabs>yep, ah
05:01:46<pabs>I expect there are many more instances out there too
05:01:59<@JAA>Yeah, certainly.
05:09:57atphoenix_ is now known as atphoenix
05:11:48<h2ibot>JustAnotherArchivist edited Deathwatch (+67, /* 2023 */ Add AltspaceVR): https://wiki.archiveteam.org/?diff=49389&oldid=49387
05:13:21pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
05:15:36pabs (pabs) joins
05:20:42<pabs>I see from #archivebot some of them are starting to go away (lists.pidgin.im)
05:24:08<pokechu22>https://lists.ffmpeg.org/mailman/listinfo/mailman says "The current archive is only available to the list members."
05:25:05<pabs>thats the sysadmin list, see here for public archives https://ffmpeg.org/contact.html#MailingLists
05:25:11<pokechu22>Oh, wait, that's not the list being discussed
05:36:55@Fusl quits [Max SendQ exceeded]
05:37:07Fusl (Fusl) joins
05:37:07@ChanServ sets mode: +o Fusl
05:48:42BlueMaxima quits [Read error: Connection reset by peer]
06:21:07Island quits [Read error: Connection reset by peer]
07:44:56Iki quits [Remote host closed the connection]
07:45:14Iki joins
07:50:00<@JAA>TIL of https://coverartarchive.org/ , a collaboration between MusicBrainz and IA that stores the covers on IA, organised by MB release IDs. Neat.
07:56:37lennier1 (lennier1) joins
08:32:07hitgrr8 joins
09:00:16sonick quits [Client Quit]
09:00:29<h2ibot>JustAnotherArchivist edited Deathwatch (+230, /* 2023 */ Add Issuu free account changes): https://wiki.archiveteam.org/?diff=49390&oldid=49389
11:07:15Megame (Megame) joins
11:27:56jacksonchen666 quits [Remote host closed the connection]
11:30:53jacksonchen666 (jacksonchen666) joins
11:33:57jacksonchen666 quits [Remote host closed the connection]
11:35:12jacksonchen666 (jacksonchen666) joins
12:05:03lunik17 quits [Quit: Ping timeout (120 seconds)]
12:06:13@Sanqui quits [Quit: .]
12:09:00lunik17 joins
12:09:29Sanqui joins
12:09:31Sanqui quits [Changing host]
12:09:31Sanqui (Sanqui) joins
12:09:31@ChanServ sets mode: +o Sanqui
12:16:12Megame quits [Client Quit]
12:46:32sonick (sonick) joins
12:47:45leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in]
12:48:08leo60228 (leo60228) joins
13:14:41pie_ quits []
13:15:15pie_ joins
13:46:39jacksonchen666_ (jacksonchen666) joins
13:47:32jacksonchen666 quits [Ping timeout: 276 seconds]
13:51:29jacksonchen666_ is now known as jacksonchen666
14:25:43lunik17 quits [Ping timeout: 252 seconds]
14:27:48@Sanqui quits [Client Quit]
14:30:43Sanqui joins
14:31:10lunik17 joins
14:31:45Sanqui quits [Changing host]
14:31:45Sanqui (Sanqui) joins
14:31:45@ChanServ sets mode: +o Sanqui
14:46:01rocketdive joins
15:05:09Megame (Megame) joins
15:16:42wessel1512 joins
15:18:23jacksonchen666 quits [Remote host closed the connection]
15:19:32jacksonchen666 (jacksonchen666) joins
15:31:08<rocketdive>i'm trying to run the warrior on python but when i try to do ./get-wget-lua.sh it says permission denied?
15:31:16<rocketdive>https://github.com/ArchiveTeam/telegram-grab#running-without-a-warrior-or-docker
15:31:51<rocketdive>i'm cd into the directory and seesaw downloaded but the wget thing is not letting me go any further
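[A "permission denied" error when invoking a freshly downloaded script directly is usually a missing executable bit. A minimal demonstration using a stand-in script — the real get-wget-lua.sh comes from the telegram-grab repo linked above, and note JAA's caveat below about running grab scripts outside Docker at all:]

```shell
# Reproduce the symptom and show both fixes with a stand-in script.
printf '#!/bin/sh\necho build-ok\n' > demo.sh
./demo.sh 2>/dev/null || echo "permission denied: demo.sh lacks +x"
chmod +x demo.sh      # fix 1: grant the executable bit
./demo.sh             # now runs, prints build-ok
bash demo.sh          # fix 2: invoke via the interpreter (no +x needed)
rm demo.sh
```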
16:09:28<avoozl>TheTechRobo: did some cross testing with warc reading in python and rust, and now my rust reader seems ok. I only need reading so I'll stick with that
16:21:02<ivan>I would also like to read WARCs in Rust later
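[For context on what such a reader has to parse, here is a stdlib-only sketch of WARC record framing. Real readers — e.g. the warcio library in Python, or a Rust crate — also handle per-record gzip members, revisit records, and so on; this shows only the header/payload layout, with a hand-crafted example record:]

```python
# Minimal sketch: parse one uncompressed WARC record's framing.
def parse_warc_record(data: bytes):
    head, _, rest = data.partition(b'\r\n\r\n')      # headers end at blank line
    lines = head.decode('utf-8').split('\r\n')
    version = lines[0]                               # e.g. 'WARC/1.0'
    fields = dict(line.split(': ', 1) for line in lines[1:])
    payload = rest[:int(fields['Content-Length'])]   # exact payload byte count
    return version, fields, payload

record = (b'WARC/1.0\r\n'
          b'WARC-Type: resource\r\n'
          b'WARC-Target-URI: http://example.com/\r\n'
          b'Content-Length: 5\r\n'
          b'\r\n'
          b'hello\r\n\r\n')                          # two CRLFs terminate a record
version, fields, payload = parse_warc_record(record)
```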
16:36:00sonick quits [Client Quit]
16:41:01Megame quits [Ping timeout: 252 seconds]
17:15:50IDK quits [Quit: Connection closed for inactivity]
17:18:33<@JAA>rocketdive: Unfortunately, it's not straightforward to run things correctly without using Docker (or the warrior, which uses Docker internally). I'd recommend against it. We've had various issues with it before, including invalid data being submitted.
17:22:10<rocketdive>thanks JAA, is there any lighter way to run multiple concurrencies at once because docker doesn't half kill my CPU but if that's the only way i'll just have to deal with it
17:24:15<@JAA>I haven't seen Docker use much CPU. The vast, vast majority of it is spent in wget-at anyway in my experience. Might depend on the OS though; I imagine performance could be worse on Windows, in particular, due to having to emulate/bridge a whole lot of things between the two worlds in more complicated ways.
17:24:58<@JAA>It's not impossible to run things outside of Docker, but it's very easy to mess it up and return weird or unusable data.
17:26:42<rocketdive>yeah i'm running it on a windows 11 laptop, do you think it would be easier for me to run docker on a linux VM or should i just deal with windows
17:27:15<rocketdive>sorry for the 500 questions lol
17:28:14<@JAA>The least painful way on Windows is almost certainly the warrior VM. That'll only get so much work done though as its concurrency is much more limited than non-warrior. And you'd need a separate VM for each project you want to run.
17:29:58<@JAA>If you want to go the Docker route, I imagine running Docker on Windows is less pain than doing it through a VM. I abandoned Windows before Docker was a thing though, so I can't comment on that further.
17:31:54<rocketdive>okay thank you for your insight! i'll just stick to how i'm currently doing it. although do you think they will ever up the concurrency limit on the warrior vm?
17:32:12<@JAA>The other question is whether you actually want to run a lot of things on this machine. If you also use it directly personally, that might not be too pleasant. Just the warrior VM in the background should be fine, I suppose.
17:32:25<@JAA>We probably won't change that limit, no.
17:32:31IDK (IDK) joins
17:33:48<rocketdive>got it! thank you so much for your help :)
17:50:16NF885 joins
17:50:58<h2ibot>SaveThEWhatNow created Talk:List of lost Twitter accounts (+410, Created page with "== @elonjet and similar == …): https://wiki.archiveteam.org/?title=Talk%3AList%20of%20lost%20Twitter%20accounts
17:50:59<h2ibot>Arcorann edited 4chan (+4116, we need to update this more often. I think this…): https://wiki.archiveteam.org/?diff=49392&oldid=49048
17:51:00<h2ibot>QUACKITY edited GeoCities (+73, Updated external links, previously dead links…): https://wiki.archiveteam.org/?diff=49393&oldid=49385
17:51:58<h2ibot>QUACKITY edited Usenet (+106, added usenet archive): https://wiki.archiveteam.org/?diff=49394&oldid=47408
17:52:17<NF885>hey can somebody run ArchiveBot on https://projects.propublica.org/politwoops as it's currently not running to do the Twitter API stuff
17:53:11<NF885>there's some links without display text on https://projects.propublica.org/politwoops/users but the bot will see them
17:53:29NF885 quits [Remote host closed the connection]
17:59:59<h2ibot>OrIdow6 edited Revue (+189, Dead): https://wiki.archiveteam.org/?diff=49395&oldid=49380
18:00:00<h2ibot>OrIdow6 edited Revue (+0, .): https://wiki.archiveteam.org/?diff=49396&oldid=49395
18:00:59<h2ibot>OrIdow6 edited Deathwatch (+14, Revue is dead. First entry of 2023!): https://wiki.archiveteam.org/?diff=49397&oldid=49390
18:01:39<@JAA>Hooray :-(
18:01:52kiwec joins
18:02:00<@JAA>Yeah, I guess there are a few more entries that need moving to the dead section.
18:02:23<@JAA>Where do we stand on Zhihu and Webry?
18:04:37<kiwec>Hello, I've seen that GameTrailers videos have been archived, but idk how do search through the archive. Can somebody send me a link to this one? http://www.gametrailers.com/video/br-503-bonus-round/712234
18:05:00<h2ibot>Nintendofan885 edited Template:Twlock (+7, marked for deletion but add alt text for the…): https://wiki.archiveteam.org/?diff=49398&oldid=46738
18:07:33<@OrIdow6>I see some mentions in my logs of Webry being run in #Y
18:07:52<@OrIdow6>Looking into Zhihu Circles now though it looks like arkiver asked some questions about it last month
18:15:37<@JAA>kiwec: Finding that might be tricky, if it even still existed at the time of archival. https://web.archive.org/web/20160211210315/http://embed.gametrailers.com/embed/712234 looks pretty empty.
18:16:58<@JAA>Actually, looks like that doesn't mean anything.
18:19:44<@JAA>Actually, it does. Here's a 'working' example (although it doesn't actually play the video): https://web.archive.org/web/20160211004522/http://embed.gametrailers.com/embed/3003140
18:21:10<@JAA>So yeah, this might be one of the 'many of the videos had already been removed months before' mentioned on our wiki.
18:29:29Island joins
18:33:52<kiwec>oof, thanks
19:05:09<kiwec>For anyone that might have ideas on how to resurrect this video, it was "Bonus Round: Episode 503: The Indie Revolution Part 1 HD". Here's the only frame that exists online https://jetsetnick.files.wordpress.com/2011/04/bonusround.jpg
19:19:26<@JAA>http://media.mtvnservices.com/mgid:moses:video:gametrailers.com:712234 serves an SWF and includes a configuration URI in the redirect location, but I'm not getting any further. Might need to decompile the SWF or similar to figure out what it's doing.
19:23:47<kiwec>I also had another URL for that embed, http://media.mtvnservices.com/embed/mgid:arc:episode:gametrailers.com:96ce015b-88b3-4a20-bcd1-7e929598f9bc
19:54:12NewArchiver joins
20:04:21<Ryz>Unsure if this is talked about, but Megame brought up that Issuu is making changes to the free plan as per https://old.reddit.com/r/DataHoarder/comments/10fg8f3/issuu_making_changes_that_will_make_a_lot_of/ (basically we need a lot of archiving) - and kiska brought up that we can explore the contents via https://issuu.com/categories
20:04:37<Ryz>Basically the change will happen on 23 February 2023
20:14:06kiwec quits [Client Quit]
20:55:50DLoader quits [Ping timeout: 264 seconds]
20:56:47fishingforsoup_ quits [Read error: Connection reset by peer]
21:04:41DLoader joins
21:15:04<NewArchiver>hello! is there an easier way to look for a warc save of a specific webpage within the Archive Team: URLs collection on archive.org? i've been scrolling through multiple cdx and json files for an hour or two now. there has to be a better way right??
21:20:27<pokechu22>There should be a better way, yes... I'm working on an example
21:22:28<pokechu22>OK, https://web.archive.org/web/20230118070015/http://www.portents.com/ is an archivebot capture, and if you run `curl --head https://web.archive.org/web/20230118070015/http://www.portents.com/` in a command prompt window you should see `x-archive-src: archiveteam_archivebot_go_20230118091035_6fdb953e/disktracker.com-inf-20230118-070007-frs1x-00000.warc.gz`. That means it's
21:22:29<pokechu22>in https://archive.org/details/archiveteam_archivebot_go_20230118091035_6fdb953e/ (and specifically is https://archive.org/download/archiveteam_archivebot_go_20230118091035_6fdb953e/disktracker.com-inf-20230118-070007-frs1x-00000.warc.gz).
21:22:43<pokechu22>I think Archive Team: URLs is in a slightly different format, but the same basic procedure should work
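[The lookup pokechu22 walks through above can be scripted: HEAD the Wayback capture, read `x-archive-src`, and build the item/download URLs. A sketch — the header name and URL patterns come from the example above; the HEAD helper is untested against edge cases like redirects:]

```python
import urllib.request

def item_urls(x_archive_src: str):
    # Map an x-archive-src value like 'item_name/file.warc.gz' to the
    # archive.org item page and the direct WARC download URL.
    item = x_archive_src.partition('/')[0]
    return (f'https://archive.org/details/{item}/',
            f'https://archive.org/download/{x_archive_src}')

def warc_source(capture_url: str):
    # Same lookup as the curl --head command above, via a HEAD request.
    req = urllib.request.Request(capture_url, method='HEAD')
    with urllib.request.urlopen(req) as resp:
        src = resp.headers.get('x-archive-src')
    return item_urls(src) if src else None
```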
21:27:20<NewArchiver>Thank you so much! I was beginning to go a bit crazy haha
21:46:13NewArchiver leaves
21:48:26<@JAA>Ryz: It has, it's on Deathwatch.
21:55:49rocketdive quits [Remote host closed the connection]
21:56:23rocketdive joins
22:11:03rocketdive quits [Read error: Connection reset by peer]
22:13:01BlueMaxima joins
22:15:36rocketdive joins
22:29:10<tzt>A-port, a Japanese crowdfunding website, is shutting down February 28 https://a-port.asahi.com/column/detail/280/
22:45:00Ruthalas5 quits [Client Quit]
23:09:28hitgrr8 quits [Client Quit]
23:29:19Nexus joins
23:32:04<Nexus>getting a whole lot of 404s from https://tracker.archiveteam.org:1338/api/get , ik it says its normal but i have like four workers all just waiting to get a job, is this to be expected?
23:33:02<@JAA>Yes
23:35:25<@JAA>The explanation used to be easy, but these days the tracker's item handling's quite complicated. But yeah, 404s are normal.