| 00:00:36 | <h2ibot> | JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=48819&oldid=48818 |
| 00:44:36 | | Arcorann (Arcorann) joins |
| 00:51:05 | | HP_Archivist (HP_Archivist) joins |
| 00:55:56 | | thelounge27 joins |
| 01:07:28 | | march_happy quits [Ping timeout: 265 seconds] |
| 01:08:06 | | march_happy (march_happy) joins |
| 01:22:52 | | sec^nd quits [Remote host closed the connection] |
| 01:23:33 | | sec^nd (second) joins |
| 01:41:18 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 01:55:48 | | march_happy quits [Ping timeout: 265 seconds] |
| 01:56:54 | | march_happy (march_happy) joins |
| 02:03:32 | | march_happy quits [Ping timeout: 265 seconds] |
| 02:06:27 | | march_happy (march_happy) joins |
| 02:08:53 | | march_happy quits [Read error: Connection reset by peer] |
| 02:09:09 | | march_happy (march_happy) joins |
| 02:11:17 | | march_happy quits [Read error: Connection reset by peer] |
| 02:12:44 | | march_happy (march_happy) joins |
| 03:39:31 | | march_happy quits [Read error: Connection reset by peer] |
| 03:39:41 | | march_happy (march_happy) joins |
| 03:42:29 | | march_happy quits [Remote host closed the connection] |
| 03:42:44 | | march_happy (march_happy) joins |
| 03:46:04 | | tzt quits [Ping timeout: 240 seconds] |
| 04:39:16 | | march_happy quits [Ping timeout: 240 seconds] |
| 04:40:21 | | march_happy (march_happy) joins |
| 05:31:36 | | revi (revi) joins |
| 05:47:17 | | HackMii_ quits [Remote host closed the connection] |
| 05:49:51 | | HackMii_ (hacktheplanet) joins |
| 05:52:16 | | march_happy quits [Ping timeout: 240 seconds] |
| 05:53:03 | | march_happy (march_happy) joins |
| 05:54:30 | | sec^nd quits [Remote host closed the connection] |
| 05:55:57 | | sec^nd (second) joins |
| 06:10:16 | | march_happy quits [Read error: Connection reset by peer] |
| 06:10:51 | | march_happy (march_happy) joins |
| 06:25:01 | | march_happy quits [Ping timeout: 265 seconds] |
| 06:25:33 | | march_happy (march_happy) joins |
| 06:28:51 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:58:41 | | sonick (sonick) joins |
| 08:33:04 | | qwertyasdfuiopghjkl joins |
| 08:33:56 | | adia quits [Quit: The Lounge - https://thelounge.chat] |
| 08:35:08 | | adia (adia) joins |
| 08:45:46 | | march_happy quits [Ping timeout: 240 seconds] |
| 08:45:51 | | march_happy (march_happy) joins |
| 08:50:16 | | march_happy quits [Ping timeout: 240 seconds] |
| 09:21:06 | | dunger (dunger) joins |
| 10:11:46 | | mutantmonkey quits [Ping timeout: 240 seconds] |
| 10:15:41 | | LeGoupil joins |
| 10:29:34 | | mutantmonkey (mutantmonkey) joins |
| 10:50:36 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 11:15:57 | | qwertyasdfuiopghjkl joins |
| 11:28:46 | | mutantmonkey quits [Ping timeout: 240 seconds] |
| 11:30:47 | | mutantmonkey (mutantmonkey) joins |
| 11:31:48 | | russss quits [Read error: Connection reset by peer] |
| 11:31:53 | | russss (russss) joins |
| 12:19:12 | | Sigma85 joins |
| 12:53:23 | <Sigma85> | is this where i can get help with navigating a WARC collection? |
| 12:56:34 | <Maakuth|m> | Sigma85: something like this? https://guides.lib.vt.edu/webarchiving/openwarc |
| 12:59:23 | | Hackerpcs quits [Client Quit] |
| 12:59:31 | <@rewby> | Just spotted in the AB channel: 12:57 <nyuuzyou> Hi all, please put tjournal.ru in archivebot, the site closes on September 10 - https://tjournal.ru/team/714914-istoriya-tj-zavershaetsya |
| 13:00:02 | <@rewby> | Did a quick look and it seems like it might be more of a warrior project |
| 13:01:00 | <@rewby> | cc arkiver |
| 13:01:41 | | Hackerpcs (Hackerpcs) joins |
| 13:03:50 | <@HCross> | based on a rough google translation of one of the comments (And they are already gloating on VC: https://vc.ru/media/484803-tjournal-obyavil-o-zakrytii-s-10-sentyabrya . Don't they understand that it's one company that runs all the sites?), is there more to this hmm |
| 13:07:45 | <@HCross> | looks like it's also vc.ru and dtf.ru |
| 13:48:08 | <michaelblob_> | vkon |
| 13:48:18 | <michaelblob_> | woops ignore that |
| 13:58:45 | | wyatt8740 quits [Remote host closed the connection] |
| 13:59:23 | | Megame (Megame) joins |
| 14:09:16 | | Arcorann quits [Ping timeout: 240 seconds] |
| 14:11:18 | | IDK (IDK) joins |
| 14:18:23 | | wyatt8740 joins |
| 14:46:45 | <@arkiver> | yes, let's do a project! |
| 14:59:50 | <@arkiver> | let's think of a channel name |
| 14:59:51 | <@arkiver> | :) |
| 15:01:19 | | thelounge27 quits [Client Quit] |
| 15:20:45 | | wyatt8740 quits [Remote host closed the connection] |
| 15:31:46 | | wyatt8740 joins |
| 15:57:22 | | alexshpilkin (alexshpilkin) joins |
| 16:00:56 | | WesleyBidsnipes joins |
| 16:02:16 | | mutantmonkey quits [Ping timeout: 240 seconds] |
| 16:02:45 | | mutantmonkey (mutantmonkey) joins |
| 16:02:55 | <WesleyBidsnipes> | Hey, I was working on a method of downloading newspaper pdfs over the weekend, and when I searched for a specific detail of my method, the only results that came back were logs to this channel. Old (2017), but I thought I’d stop by. Is that something you guys still work on? |
| 16:09:42 | <Jake> | Was it related to newsgrabber? |
| 16:10:18 | <Jake> | That project stopped some time ago due to some issues with deduplication. |
| 16:11:20 | <WesleyBidsnipes> | Dunno. Someone found a problem with Pagesuite software that made it trivial to download any newspaper e-replica that used that software. |
| 16:11:54 | <WesleyBidsnipes> | I’ve expanded upon it, and been exploring how far back I can get most newspapers… Los Angeles Times I can get back to about 2016, for instance. |
| 16:12:37 | <WesleyBidsnipes> | I’ve been trying to do similar with the non-Pagesuite newspapers (Pressreader and Newsmemory are the big ones), but not having much luck so far. |
| 16:16:04 | <nyuuzyou> | By the way, the same company also had Coub, but it was sold at the last moment |
| 16:31:23 | | wyatt8740 quits [Remote host closed the connection] |
| 16:46:47 | <@JAA> | nyuuzyou: You're referring to tjournal.ru? Do you know what else they own? Seems to all be in danger now. |
| 16:49:58 | | alexshpilkin quits [Ping timeout: 265 seconds] |
| 17:03:55 | | alexshpilkin (alexshpilkin) joins |
| 17:08:32 | | wyatt8740 joins |
| 17:11:07 | | masterX244 quits [Ping timeout: 265 seconds] |
| 17:11:45 | <nyuuzyou> | Yes, only vc.ru, dtf.ru, and tjournal.ru remain as public projects |
| 17:13:18 | <@JAA> | Thanks |
| 17:17:48 | | masterX244 (masterX244) joins |
| 17:19:39 | | Megame quits [Client Quit] |
| 17:27:40 | | jacobk quits [Ping timeout: 240 seconds] |
| 17:27:46 | | HackMii_ quits [Ping timeout: 240 seconds] |
| 17:29:09 | <nyuuzyou> | I think only TJ is really under threat now. It is now blocked in Russia for "fakes about the military operation in Ukraine". They could have just been ordered to shut it down or the company would simply be taken away/destroyed. This has been quite normal practice in Russia since the 90s, only now it has moved to the internet. This is not the first time since the start of the war |
| 17:30:13 | <@arkiver> | we'll get them |
| 17:31:12 | <nyuuzyou> | In the case of vc and dtf we just have more time until the same thing happens to them |
| 17:35:28 | | HackMii_ (hacktheplanet) joins |
| 17:35:33 | <@arkiver> | IA was also sued by the russian government |
| 17:35:45 | <@arkiver> | is there some website on which these accusations from the government are posted? |
| 17:35:59 | <@arkiver> | it could help us find sites that are accused of 'crimes', and get them archived |
| 17:36:53 | <TheTechRobo> | > <arkiver> IA was also sued by the russian government |
| 17:36:55 | <TheTechRobo> | course it was |
| 17:45:04 | <nyuuzyou> | <arkiver> "is there some website on which..." <- Yes and no, there are prohibitions that exist in fact, and on paper do not |
| 17:45:22 | <@arkiver> | can you please post the information that is at least accessible? |
| 17:45:25 | <@arkiver> | or well, the URLs |
| 17:46:19 | <nyuuzyou> | reestr.rublacklist.net here is an unofficial registry, but there are no captchas and other garbage |
| 17:46:29 | <WesleyBidsnipes> | Like, sued in a Russian court, or in some western country’s court? |
| 17:47:10 | <@arkiver> | WesleyBidsnipes: if you have good western sources, totally welcome! |
| 17:47:58 | <@arkiver> | nyuuzyou: looks good. how is the data on this one collected? |
| 17:48:06 | <@arkiver> | just people pointing out the sites are down? |
| 17:48:13 | <@arkiver> | or official releases? |
| 17:49:34 | <@arkiver> | nyuuzyou: if you know of any news sites, or lists of them, please also let me know - we'll check them daily for new articles |
| 17:50:07 | <@arkiver> | and that is in general - any news/government/related sites in the world are welcome, we'll check them hourly or daily for new URLs with #// |
| 17:50:15 | <nyuuzyou> | I'm not sure, this is the site of an organization to combat Internet censorship in Russia, I think they have their own sources |
| 17:50:22 | <@arkiver> | got it |
| 17:50:32 | <@arkiver> | are you able to confirm these sites are actually blocked? |
| 17:51:24 | <@arkiver> | we'll at least archive the "banned" URLs themselves with #// , need to look into deeper crawling as well |
| 17:52:06 | <nyuuzyou> | Not all, this is a list of all ever-banned sites in Russia, but the unblocked ones are already marked |
| 17:52:40 | <@arkiver> | perfect. we'll get a copy up to certain depth of all of them. (unlimited depth (aka full copy) not sure yet) |
| 17:52:53 | <@arkiver> | if we have these block lists for other countries as well, let me know! |
| 17:55:54 | <@arkiver> | nyuuzyou: i don't see an easy method of exporting all entries from that site. do you see any? if not it'll be scraped, hopefully they don't limit the pages |
| 17:57:57 | <nyuuzyou> | they have an API for this - https://reestr.rublacklist.net/article/api/ |
| 17:59:57 | <nyuuzyou> | or as a last resort https://github.com/zapret-info/z-i |
| 18:03:21 | <@arkiver> | nyuuzyou: is https://github.com/zapret-info/z-i different from rublacklist.net? |
| 18:05:51 | <@arkiver> | looks like they are separate |
| 18:06:24 | <nyuuzyou> | They need to be compared; different organizations are responsible for them |
| 18:06:46 | <@arkiver> | are there more? |
| 18:07:50 | <@arkiver> | the github repo seems somewhat official if the account is from http://zapret-info.gov.ru/ ? |
| 18:08:38 | <nyuuzyou> | I'll try to look for other countries, but for Russia it's all |
| 18:09:50 | <nyuuzyou> | arkiver: No, this is the usher2.club project |
| 18:19:31 | <@arkiver> | got it |
| 18:19:42 | <@arkiver> | but yeah, these lists are great! |
| 18:39:31 | <nyuuzyou> | For China there is https://github.com/gfwlist/gfwlist. I also looked for similar lists for Belarus, but found nothing. The only thing that may be somewhat useful is t.me/u2byckbot to search the registry |
| 19:41:17 | <systwi_> | arkiver: Re: a project on https://tjournal.ru/ , has one begun yet? I saw the message in my scrollback, too. |
| 19:41:46 | <systwi_> | If there isn't a channel for it yet, does #journalthis work? :-) |
| 19:42:26 | <systwi_> | ("journal this," like to log/record/save it in a journal) |
| 19:48:45 | | tzt (tzt) joins |
| 19:57:17 | | Discant joins |
| 19:57:28 | | Discant quits [Client Quit] |
| 20:08:03 | | jacobk joins |
| 20:44:38 | | LeGoupil quits [Client Quit] |
| 20:44:54 | | LeGoupil joins |
| 20:46:59 | | LeGoupil quits [Client Quit] |
| 20:57:26 | | jacobk quits [Ping timeout: 265 seconds] |
| 20:59:36 | <alexshpilkin> | nyuuzyou: if there's real interest (I've no clue what the Belarusian govt blocks), message https://t.me/schors and ask him for a copy / feed of the data? |
| 21:46:14 | <@arkiver> | #journalthis it is! |
| 21:46:17 | <@arkiver> | thanks systwi_ |
| 21:46:27 | <@arkiver> | JAA is not even in the channel yet! |
| 21:47:57 | <systwi> | Cool! :-) |
| 21:48:11 | <systwi> | I'm glad I was able to name a channel. |
| 21:50:14 | <@arkiver> | :) |
| 21:58:15 | | WesleyBidsnipes quits [Client Quit] |
| 22:03:22 | | Craigle quits [Quit: Ping timeout (120 seconds)] |
| 22:03:31 | | coderobe quits [Quit: Ping timeout (120 seconds)] |
| 22:03:42 | | Craigle (Craigle) joins |
| 22:03:54 | | coderobe (coderobe) joins |
| 22:04:03 | | systwi_ quits [Quit: Ping timeout (120 seconds)] |
| 22:04:28 | | systwi_ joins |
| 22:33:03 | <h2ibot> | Gridkr edited List of websites excluded from the Wayback Machine (+32, https://www.khmertimeskh.com/ This URL has been…): https://wiki.archiveteam.org/?diff=48820&oldid=48812 |
| 22:33:04 | <h2ibot> | Entartet edited List of websites excluded from the Wayback Machine (+32, Added literallydarling.com.): https://wiki.archiveteam.org/?diff=48821&oldid=48820 |
| 22:35:58 | | jacobk joins |
| 22:37:03 | <h2ibot> | JustAnotherArchivist created TJ (+368, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=TJ |
| 22:39:04 | <h2ibot> | JustAnotherArchivist edited Current Projects (+135, Add TJ): https://wiki.archiveteam.org/?diff=48823&oldid=48722 |
| 22:58:50 | | Hackerpcs quits [Client Quit] |
| 22:59:26 | | Hackerpcs (Hackerpcs) joins |
| 23:00:07 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=48824&oldid=48821 |
| 23:08:54 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 23:29:07 | | Arcorann (Arcorann) joins |
| 23:39:21 | | alexshpilkin quits [Ping timeout: 265 seconds] |
| 23:47:49 | | BlueMaxima joins |
| 23:52:54 | | alexshpilkin (alexshpilkin) joins |
| 23:56:01 | <@JAA> | Lexico is running through AB, but it has stupid rate limits, so that probably won't grab everything in time. |