| 00:10:10 | | tbc1887 (tbc1887) joins |
| 00:24:50 | | googler joins |
| 00:25:01 | <googler> | Hello everyone! |
| 00:25:16 | <googler> | Does anyone here have access to the Google Video archive? |
| 00:33:08 | | tbc1887 quits [Client Quit] |
| 00:49:18 | | googler quits [Remote host closed the connection] |
| 00:49:40 | <pabs> | is there an effort to archive Peru? seems there is widespread chaos there at the moment |
| 00:57:34 | <pokechu22> | https://en.wikipedia.org/wiki/2022%E2%80%932023_Peruvian_political_protests - I'm not aware of any project |
| 00:57:40 | | rocketdive quits [Remote host closed the connection] |
| 00:58:04 | | rocketdive joins |
| 01:03:10 | <Frogging101> | buffer set read |
| 01:07:54 | <audrooku|m> | Frogging101: ? |
| 01:08:01 | <Frogging101> | I made a mistake |
| 01:08:08 | <Frogging101> | disregard |
| 02:05:23 | | rocketdive quits [Ping timeout: 265 seconds] |
| 02:07:26 | <fuzzy8021> | arkiver JAA probably can change default project |
| 02:56:39 | <@JAA> | Yup, I've switched it to Telegram for the time being. |
| 03:00:28 | <h2ibot> | JAABot edited CurrentWarriorProject (-2): https://wiki.archiveteam.org/?diff=49388&oldid=49386 |
| 03:29:55 | | sonick quits [Client Quit] |
| 04:27:01 | | balrog quits [Quit: Bye] |
| 04:27:53 | | balrog (balrog) joins |
| 04:30:27 | | sonick (sonick) joins |
| 04:34:13 | <sonick> | Is there a project here to archive web pages of educational institutions that are not endangered, such as schools? |
| 04:38:05 | <pokechu22> | There's a list at https://wiki.archiveteam.org/index.php/University_Web_Hosting and stuff is run via #archivebot sometimes, but I don't think there's anything organized specifically for that purpose |
| 04:38:45 | <pabs> | JAA: re mailman list archives, did you get https://lists.mplayerhq.hu/mailman/listinfo ? |
| 04:39:52 | <pabs> | and https://lists.ffmpeg.org/mailman/listinfo (but its lists are hidden, a list is on https://ffmpeg.org/contact.html#MailingLists) |
| 04:50:54 | <@JAA> | pabs: I didn't. I never really went after Mailman instances systematically apart from the ICANN & Co. ones. |
| 04:51:17 | <pabs> | ah, I thought you used that long list I sent you :( |
| 04:54:03 | <@JAA> | https://paste.debian.net/hidden/bef92430/ ? |
| 04:54:34 | <@JAA> | Doesn't look like I ever did anything with that, no. It might still be on a todo list somewhere. |
| 04:55:47 | <pabs> | thats the first one, there was a later one that was longer |
| 04:56:37 | <pabs> | looks like the first one misses mplayerhq |
| 04:57:21 | <@JAA> | https://paste.debian.net/plain/1258270 ? |
| 04:58:26 | <@JAA> | I think this was a case of 'I don't have time to deal with this right now due to other more urgent things, and it isn't in immediate danger, so I'll get back to it at some point', and then I forgor. :-/ |
| 05:00:31 | <pabs> | yep, ah |
| 05:01:46 | <pabs> | I expect there are many more instances out there too |
| 05:01:59 | <@JAA> | Yeah, certainly. |
| 05:09:57 | | atphoenix_ is now known as atphoenix |
| 05:11:48 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+67, /* 2023 */ Add AltspaceVR): https://wiki.archiveteam.org/?diff=49389&oldid=49387 |
| 05:13:21 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 05:15:36 | | pabs (pabs) joins |
| 05:20:42 | <pabs> | I see from #archivebot some of them are starting to go away (lists.pidgin.im) |
| 05:24:08 | <pokechu22> | https://lists.ffmpeg.org/mailman/listinfo/mailman says "The current archive is only available to the list members." |
| 05:25:05 | <pabs> | thats the sysadmin list, see here for public archives https://ffmpeg.org/contact.html#MailingLists |
| 05:25:11 | <pokechu22> | Oh, wait, that's not the list being discussed |
| 05:36:55 | | @Fusl quits [Max SendQ exceeded] |
| 05:37:07 | | Fusl (Fusl) joins |
| 05:37:07 | | @ChanServ sets mode: +o Fusl |
| 05:48:42 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:21:07 | | Island quits [Read error: Connection reset by peer] |
| 07:44:56 | | Iki quits [Remote host closed the connection] |
| 07:45:14 | | Iki joins |
| 07:50:00 | <@JAA> | TIL of https://coverartarchive.org/ , a collaboration between MusicBrainz and IA that stores the covers on IA, organised by MB release IDs. Neat. |
| 07:56:37 | | lennier1 (lennier1) joins |
| 08:32:07 | | hitgrr8 joins |
| 09:00:16 | | sonick quits [Client Quit] |
| 09:00:29 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+230, /* 2023 */ Add Issuu free account changes): https://wiki.archiveteam.org/?diff=49390&oldid=49389 |
| 11:07:15 | | Megame (Megame) joins |
| 11:27:56 | | jacksonchen666 quits [Remote host closed the connection] |
| 11:30:53 | | jacksonchen666 (jacksonchen666) joins |
| 11:33:57 | | jacksonchen666 quits [Remote host closed the connection] |
| 11:35:12 | | jacksonchen666 (jacksonchen666) joins |
| 12:05:03 | | lunik17 quits [Quit: Ping timeout (120 seconds)] |
| 12:06:13 | | @Sanqui quits [Quit: .] |
| 12:09:00 | | lunik17 joins |
| 12:09:29 | | Sanqui joins |
| 12:09:31 | | Sanqui is now authenticated as Sanqui |
| 12:09:31 | | Sanqui quits [Changing host] |
| 12:09:31 | | Sanqui (Sanqui) joins |
| 12:09:31 | | @ChanServ sets mode: +o Sanqui |
| 12:16:12 | | Megame quits [Client Quit] |
| 12:46:32 | | sonick (sonick) joins |
| 12:47:45 | | leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 12:48:08 | | leo60228 (leo60228) joins |
| 13:14:41 | | pie_ quits [] |
| 13:15:15 | | pie_ joins |
| 13:46:39 | | jacksonchen666_ (jacksonchen666) joins |
| 13:47:32 | | jacksonchen666 quits [Ping timeout: 276 seconds] |
| 13:51:29 | | jacksonchen666_ is now known as jacksonchen666 |
| 14:25:43 | | lunik17 quits [Ping timeout: 252 seconds] |
| 14:27:48 | | @Sanqui quits [Client Quit] |
| 14:30:43 | | Sanqui joins |
| 14:31:10 | | lunik17 joins |
| 14:31:45 | | Sanqui is now authenticated as Sanqui |
| 14:31:45 | | Sanqui quits [Changing host] |
| 14:31:45 | | Sanqui (Sanqui) joins |
| 14:31:45 | | @ChanServ sets mode: +o Sanqui |
| 14:46:01 | | rocketdive joins |
| 15:05:09 | | Megame (Megame) joins |
| 15:16:42 | | wessel1512 joins |
| 15:18:23 | | jacksonchen666 quits [Remote host closed the connection] |
| 15:19:32 | | jacksonchen666 (jacksonchen666) joins |
| 15:31:08 | <rocketdive> | i'm trying to run the warrior on python but when i try to do ./get-wget-lua.sh it says permission denied? |
| 15:31:16 | <rocketdive> | https://github.com/ArchiveTeam/telegram-grab#running-without-a-warrior-or-docker |
| 15:31:51 | <rocketdive> | i'm cd into the directory and seesaw downloaded but the wget thing is not letting me go any further |
| 16:09:28 | <avoozl> | TheTechRobo: did some cross testing with warc reading in python and rust, and now my rust reader seems ok. I only need reading so I'll stick with that |
| 16:21:02 | <ivan> | I would also like to read WARCs in Rust later |
| 16:36:00 | | sonick quits [Client Quit] |
| 16:41:01 | | Megame quits [Ping timeout: 252 seconds] |
| 17:15:50 | | IDK quits [Quit: Connection closed for inactivity] |
| 17:18:33 | <@JAA> | rocketdive: Unfortunately, it's not straightforward to run things correctly without using Docker (or the warrior, which uses Docker internally). I'd recommend against it. We've had various issues with it before, including invalid data being submitted. |
| 17:22:10 | <rocketdive> | thanks JAA, is there any lighter way to run multiple concurrencies at once because docker doesn't half kill my CPU but if that's the only way i'll just have to deal with it |
| 17:24:15 | <@JAA> | I haven't seen Docker use much CPU. The vast, vast majority of it is spent in wget-at anyway in my experience. Might depend on the OS though; I imagine performance could be worse on Windows, in particular, due to having to emulate/bridge a whole lot of things between the two worlds in more complicated ways. |
| 17:24:58 | <@JAA> | It's not impossible to run things outside of Docker, but it's very easy to mess it up and return weird or unusable data. |
| 17:26:42 | <rocketdive> | yeah i'm running it on a windows 11 laptop, do you think it would be easier for me to run docker on a linux VM or should i just deal with windows |
| 17:27:15 | <rocketdive> | sorry for the 500 questions lol |
| 17:28:14 | <@JAA> | The least painful way on Windows is almost certainly the warrior VM. That'll only get so much work done though as its concurrency is much more limited than non-warrior. And you'd need a separate VM for each project you want to run. |
| 17:29:58 | <@JAA> | If you want to go the Docker route, I imagine running Docker on Windows is less pain than doing it through a VM. I abandoned Windows before Docker was a thing though, so I can't comment on that further. |
| 17:31:54 | <rocketdive> | okay thank you for your insight! i'll just stick to how i'm currently doing it. although do you think they will ever up the concurrency limit on the warrior vm? |
| 17:32:12 | <@JAA> | The other question is whether you actually want to run a lot of things on this machine. If you also use it directly personally, that might not be too pleasant. Just the warrior VM in the background should be fine, I suppose. |
| 17:32:25 | <@JAA> | We probably won't change that limit, no. |
| 17:32:31 | | IDK (IDK) joins |
| 17:33:48 | <rocketdive> | got it! thank you so much for your help :) |
| 17:50:16 | | NF885 joins |
| 17:50:58 | <h2ibot> | SaveThEWhatNow created Talk:List of lost Twitter accounts (+410, Created page with "== @elonjet and similar == …): https://wiki.archiveteam.org/?title=Talk%3AList%20of%20lost%20Twitter%20accounts |
| 17:50:59 | <h2ibot> | Arcorann edited 4chan (+4116, we need to update this more often. I think this…): https://wiki.archiveteam.org/?diff=49392&oldid=49048 |
| 17:51:00 | <h2ibot> | QUACKITY edited GeoCities (+73, Updated external links, previously dead links…): https://wiki.archiveteam.org/?diff=49393&oldid=49385 |
| 17:51:58 | <h2ibot> | QUACKITY edited Usenet (+106, added usenet archive): https://wiki.archiveteam.org/?diff=49394&oldid=47408 |
| 17:52:17 | <NF885> | hey can somebody run ArchiveBot on https://projects.propublica.org/politwoops as it's currently not running to do the Twitter API stuff |
| 17:53:11 | <NF885> | there's some links without display text on https://projects.propublica.org/politwoops/users but the bot will see them |
| 17:53:29 | | NF885 quits [Remote host closed the connection] |
| 17:59:59 | <h2ibot> | OrIdow6 edited Revue (+189, Dead): https://wiki.archiveteam.org/?diff=49395&oldid=49380 |
| 18:00:00 | <h2ibot> | OrIdow6 edited Revue (+0, .): https://wiki.archiveteam.org/?diff=49396&oldid=49395 |
| 18:00:59 | <h2ibot> | OrIdow6 edited Deathwatch (+14, Revue is dead. First entry of 2023!): https://wiki.archiveteam.org/?diff=49397&oldid=49390 |
| 18:01:39 | <@JAA> | Hooray :-( |
| 18:01:52 | | kiwec joins |
| 18:02:00 | <@JAA> | Yeah, I guess there are a few more entries that need moving to the dead section. |
| 18:02:23 | <@JAA> | Where do we stand on Zhihu and Webry? |
| 18:04:37 | <kiwec> | Hello, I've seen that GameTrailers videos have been archived, but idk how to search through the archive. Can somebody send me a link to this one? http://www.gametrailers.com/video/br-503-bonus-round/712234 |
| 18:05:00 | <h2ibot> | Nintendofan885 edited Template:Twlock (+7, marked for deletion but add alt text for the…): https://wiki.archiveteam.org/?diff=49398&oldid=46738 |
| 18:07:33 | <@OrIdow6> | I see some mentions in my logs of Webry being run in #Y |
| 18:07:52 | <@OrIdow6> | Looking into Zhihu Circles now though it looks like arkiver asked some questions about it last month |
| 18:15:37 | <@JAA> | kiwec: Finding that might be tricky, if it even still existed at the time of archival. https://web.archive.org/web/20160211210315/http://embed.gametrailers.com/embed/712234 looks pretty empty. |
| 18:16:58 | <@JAA> | Actually, looks like that doesn't mean anything. |
| 18:19:44 | <@JAA> | Actually, it does. Here's a 'working' example (although it doesn't actually play the video): https://web.archive.org/web/20160211004522/http://embed.gametrailers.com/embed/3003140 |
| 18:21:10 | <@JAA> | So yeah, this might be one of the 'many of the videos had already been removed months before' mentioned on our wiki. |
| 18:29:29 | | Island joins |
| 18:33:52 | <kiwec> | oof, thanks |
| 19:05:09 | <kiwec> | For anyone that might have ideas on how to resurrect this video, it was "Bonus Round: Episode 503: The Indie Revolution Part 1 HD". Here's the only frame that exists online https://jetsetnick.files.wordpress.com/2011/04/bonusround.jpg |
| 19:19:26 | <@JAA> | http://media.mtvnservices.com/mgid:moses:video:gametrailers.com:712234 serves an SWF and includes a configuration URI in the redirect location, but I'm not getting any further. Might need to decompile the SWF or similar to figure out what it's doing. |
| 19:23:47 | <kiwec> | I also had another URL for that embed, http://media.mtvnservices.com/embed/mgid:arc:episode:gametrailers.com:96ce015b-88b3-4a20-bcd1-7e929598f9bc |
| 19:54:12 | | NewArchiver joins |
| 20:04:21 | <Ryz> | Unsure if this is talked about, but Megame brought up that Issuu is making changes to the free plan as per https://old.reddit.com/r/DataHoarder/comments/10fg8f3/issuu_making_changes_that_will_make_a_lot_of/ (basically we need a lot of archiving) - and kiska brought up that we can explore the contents via https://issuu.com/categories |
| 20:04:37 | <Ryz> | Basically the change will happen on 2023 February 23 |
| 20:14:06 | | kiwec quits [Client Quit] |
| 20:55:50 | | DLoader quits [Ping timeout: 264 seconds] |
| 20:56:47 | | fishingforsoup_ quits [Read error: Connection reset by peer] |
| 21:04:41 | | DLoader joins |
| 21:15:04 | <NewArchiver> | hello! is there an easier way to look for a warc save of a specific webpage within the Archive Team: URLs collection on archive.org? i've been scrolling through multiple cdx and json files for an hour or two now. there has to be a better way right?? |
| 21:20:27 | <pokechu22> | There should be a better way, yes... I'm working on an example |
| 21:22:28 | <pokechu22> | OK, https://web.archive.org/web/20230118070015/http://www.portents.com/ is an archivebot capture, and if you run `curl --head https://web.archive.org/web/20230118070015/http://www.portents.com/` in a command prompt window you should see `x-archive-src: archiveteam_archivebot_go_20230118091035_6fdb953e/disktracker.com-inf-20230118-070007-frs1x-00000.warc.gz`. That means it's |
| 21:22:29 | <pokechu22> | in https://archive.org/details/archiveteam_archivebot_go_20230118091035_6fdb953e/ (and specifically is https://archive.org/download/archiveteam_archivebot_go_20230118091035_6fdb953e/disktracker.com-inf-20230118-070007-frs1x-00000.warc.gz). |
| 21:22:43 | <pokechu22> | I think Archive Team: URLs is in a slightly different format, but the same basic procedure should work |
| 21:27:20 | <NewArchiver> | Thank you so much! I was beginning to go a bit crazy haha |
| 21:46:13 | | NewArchiver leaves |
| 21:48:26 | <@JAA> | Ryz: It has, it's on Deathwatch. |
| 21:55:49 | | rocketdive quits [Remote host closed the connection] |
| 21:56:23 | | rocketdive joins |
| 22:11:03 | | rocketdive quits [Read error: Connection reset by peer] |
| 22:13:01 | | BlueMaxima joins |
| 22:15:36 | | rocketdive joins |
| 22:29:10 | <tzt> | A-port, a Japanese crowdfunding website, is shutting down February 28 https://a-port.asahi.com/column/detail/280/ |
| 22:45:00 | | Ruthalas5 quits [Client Quit] |
| 23:09:28 | | hitgrr8 quits [Client Quit] |
| 23:29:19 | | Nexus joins |
| 23:32:04 | <Nexus> | getting a whole lot of 404s from https://tracker.archiveteam.org:1338/api/get , ik it says its normal but i have like four workers all just waiting to get a job, is this to be expected? |
| 23:33:02 | <@JAA> | Yes |
| 23:35:25 | <@JAA> | The explanation used to be easy, but these days the tracker's item handling's quite complicated. But yeah, 404s are normal. |
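
A rough sketch of the lookup pokechu22 describes at 21:22 (take the `x-archive-src` header from a Wayback Machine capture, then map it to an archive.org item and WARC file). The function names `parse_archive_src` and `fetch_archive_src` are made up for illustration, and the header layout `<item>/<file>` is assumed from the single example in the log; as noted there, Archive Team: URLs captures may look slightly different.

```python
from urllib.request import Request, urlopen


def parse_archive_src(value):
    """Split an x-archive-src value into the IA item and the WARC file name.

    Assumes the '<item>/<file>' layout seen in the archivebot example above.
    """
    item, _, filename = value.strip().partition("/")
    if not item or not filename:
        raise ValueError(f"unexpected x-archive-src value: {value!r}")
    return {
        "item": item,
        "file": filename,
        # Item page and direct download link, as in the example from the log.
        "details_url": f"https://archive.org/details/{item}/",
        "download_url": f"https://archive.org/download/{item}/{filename}",
    }


def fetch_archive_src(wayback_url):
    """Request a capture's headers (like `curl --head`) and return x-archive-src.

    Returns None if the header is absent. Performs a network request.
    """
    request = Request(wayback_url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get("x-archive-src")
```

To try it on the capture from the log, something like `parse_archive_src(fetch_archive_src("https://web.archive.org/web/20230118070015/http://www.portents.com/"))` should give the item and file that pokechu22 quotes, assuming the header is still served for that capture.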