00:00:36<h2ibot>JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=48819&oldid=48818
00:44:36Arcorann (Arcorann) joins
00:51:05HP_Archivist (HP_Archivist) joins
00:55:56thelounge27 joins
01:07:28march_happy quits [Ping timeout: 265 seconds]
01:08:06march_happy (march_happy) joins
01:22:52sec^nd quits [Remote host closed the connection]
01:23:33sec^nd (second) joins
01:41:18qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
01:55:48march_happy quits [Ping timeout: 265 seconds]
01:56:54march_happy (march_happy) joins
02:03:32march_happy quits [Ping timeout: 265 seconds]
02:06:27march_happy (march_happy) joins
02:08:53march_happy quits [Read error: Connection reset by peer]
02:09:09march_happy (march_happy) joins
02:11:17march_happy quits [Read error: Connection reset by peer]
02:12:44march_happy (march_happy) joins
03:39:31march_happy quits [Read error: Connection reset by peer]
03:39:41march_happy (march_happy) joins
03:42:29march_happy quits [Remote host closed the connection]
03:42:44march_happy (march_happy) joins
03:46:04tzt quits [Ping timeout: 240 seconds]
04:39:16march_happy quits [Ping timeout: 240 seconds]
04:40:21march_happy (march_happy) joins
05:31:36revi (revi) joins
05:47:17HackMii_ quits [Remote host closed the connection]
05:49:51HackMii_ (hacktheplanet) joins
05:52:16march_happy quits [Ping timeout: 240 seconds]
05:53:03march_happy (march_happy) joins
05:54:30sec^nd quits [Remote host closed the connection]
05:55:57sec^nd (second) joins
06:10:16march_happy quits [Read error: Connection reset by peer]
06:10:51march_happy (march_happy) joins
06:25:01march_happy quits [Ping timeout: 265 seconds]
06:25:33march_happy (march_happy) joins
06:28:51BlueMaxima quits [Read error: Connection reset by peer]
07:58:41sonick (sonick) joins
08:33:04qwertyasdfuiopghjkl joins
08:33:56adia quits [Quit: The Lounge - https://thelounge.chat]
08:35:08adia (adia) joins
08:45:46march_happy quits [Ping timeout: 240 seconds]
08:45:51march_happy (march_happy) joins
08:50:16march_happy quits [Ping timeout: 240 seconds]
09:21:06dunger (dunger) joins
10:11:46mutantmonkey quits [Ping timeout: 240 seconds]
10:15:41LeGoupil joins
10:29:34mutantmonkey (mutantmonkey) joins
10:50:36qwertyasdfuiopghjkl quits [Client Quit]
11:15:57qwertyasdfuiopghjkl joins
11:28:46mutantmonkey quits [Ping timeout: 240 seconds]
11:30:47mutantmonkey (mutantmonkey) joins
11:31:48russss quits [Read error: Connection reset by peer]
11:31:53russss (russss) joins
12:19:12Sigma85 joins
12:53:23<Sigma85>is this where i get help with navigating a warc collection?
12:56:34<Maakuth|m>Sigma85: something like this? https://guides.lib.vt.edu/webarchiving/openwarc
12:59:23Hackerpcs quits [Client Quit]
12:59:31<@rewby>Just spotted in the AB channel: 12:57 <nyuuzyou> Hi all, please put tjournal.ru in archivebot, the site closes on September 10 - https://tjournal.ru/team/714914-istoriya-tj-zavershaetsya
13:00:02<@rewby>Did a quick look and it seems like it might be more of a warrior project
13:01:00<@rewby>cc arkiver
13:01:41Hackerpcs (Hackerpcs) joins
13:03:50<@HCross>based on a rough google translation of one of the comments (And they are already gloating on VC: https://vc.ru/media/484803-tjournal-obyavil-o-zakrytii-s-10-sentyabrya . Don't they understand that it's one company that runs all the sites?), is there more to this hmm
13:07:45<@HCross>looks like it's also vc.ru and dtf.ru
13:48:08<michaelblob_>vkon
13:48:18<michaelblob_>woops ignore that
13:58:45wyatt8740 quits [Remote host closed the connection]
13:59:23Megame (Megame) joins
14:09:16Arcorann quits [Ping timeout: 240 seconds]
14:11:18IDK (IDK) joins
14:18:23wyatt8740 joins
14:46:45<@arkiver>yes, let's do a project!
14:59:50<@arkiver>let's think of a channel name
14:59:51<@arkiver>:)
15:01:19thelounge27 quits [Client Quit]
15:20:45wyatt8740 quits [Remote host closed the connection]
15:31:46wyatt8740 joins
15:57:22alexshpilkin (alexshpilkin) joins
16:00:56WesleyBidsnipes joins
16:02:16mutantmonkey quits [Ping timeout: 240 seconds]
16:02:45mutantmonkey (mutantmonkey) joins
16:02:55<WesleyBidsnipes>Hey, I was working on a method of downloading newspaper pdfs over the weekend, and when I searched for a specific detail of my method, the only results that came back were logs to this channel. Old (2017), but I thought I’d stop by. Is that something you guys still work on?
16:09:42<Jake>Was it related to newsgrabber?
16:10:18<Jake>That project stopped some time ago due to some issues with deduplication.
16:11:20<WesleyBidsnipes>Dunno. Someone found a problem with Pagesuite software that made it trivial to download any newspaper e-replica that used that software.
16:11:54<WesleyBidsnipes>I’ve expanded upon it, and been exploring how far back I can get most newspapers… Los Angeles Times I can get back to about 2016, for instance.
16:12:37<WesleyBidsnipes>I’ve been trying to do similar with the non-Pagesuite newspapers (Pressreader and Newsmemory are the big ones), but not having much luck so far.
16:16:04<nyuuzyou>By the way, the same company also had Coub, but it was sold at the last moment
16:31:23wyatt8740 quits [Remote host closed the connection]
16:46:47<@JAA>nyuuzyou: You're referring to tjournal.ru? Do you know what else they own? Seems to all be in danger now.
16:49:58alexshpilkin quits [Ping timeout: 265 seconds]
17:03:55alexshpilkin (alexshpilkin) joins
17:08:32wyatt8740 joins
17:11:07masterX244 quits [Ping timeout: 265 seconds]
17:11:45<nyuuzyou>Yes, only vc.ru, dtf.ru, and tjournal.ru remain as public projects
17:13:18<@JAA>Thanks
17:17:48masterX244 (masterX244) joins
17:19:39Megame quits [Client Quit]
17:27:40jacobk quits [Ping timeout: 240 seconds]
17:27:46HackMii_ quits [Ping timeout: 240 seconds]
17:29:09<nyuuzyou>I think only TJ is really under threat now. It is now blocked in Russia for "fakes about the military operation in Ukraine". They could have just been ordered to shut it down, or else the company would simply be taken away or destroyed. This has been quite normal practice in Russia since the 90s, only now it has moved to the internet. This is not the first time since the start of the war
17:30:13<@arkiver>we'll get them
17:31:12<nyuuzyou>In the case of vc and dtf we just have more time until the same thing happens to them
17:35:28HackMii_ (hacktheplanet) joins
17:35:33<@arkiver>IA was also sued by the russian government
17:35:45<@arkiver>is there some website on which these accusations from the government are posted?
17:35:59<@arkiver>it could help us find sites that are accused of 'crimes', and get them archived
17:36:53<TheTechRobo>> <arkiver> IA was also sued by the russian government
17:36:55<TheTechRobo>course it was
17:45:04<nyuuzyou><arkiver> "is there some website on which..." <- Yes and no: some bans exist in practice but not on paper
17:45:22<@arkiver>can you please post the information that is at least accessible?
17:45:25<@arkiver>or well, the URLs
17:46:19<nyuuzyou>reestr.rublacklist.net is an unofficial registry, but it has no captchas or other garbage
17:46:29<WesleyBidsnipes>Like, sued in a Russian court, or in some western country’s court?
17:47:10<@arkiver>WesleyBidsnipes: if you have good western sources, totally welcome!
17:47:58<@arkiver>nyuuzyou: looks good. how is the data on this one collected?
17:48:06<@arkiver>just people pointing out the sites are down?
17:48:13<@arkiver>or official releases?
17:49:34<@arkiver>nyuuzyou: if you know of any news sites, or lists of them, please also let me know - we'll check them daily for new articles
17:50:07<@arkiver>and that is in general - any news/government/related sites in the world are welcome, we'll check them hourly or daily for new URLs with #//
17:50:15<nyuuzyou>I'm not sure, this is the site of an organization to combat Internet censorship in Russia, I think they have their own sources
17:50:22<@arkiver>got it
17:50:32<@arkiver>are you able to confirm these sites are actually blocked?
17:51:24<@arkiver>we'll at least archive the "banned" URLs themselves with #// , need to look into deeper crawling as well
17:52:06<nyuuzyou>Not all, this is a list of all ever-banned sites in Russia, but the unblocked ones are already marked
17:52:40<@arkiver>perfect. we'll get a copy up to certain depth of all of them. (unlimited depth (aka full copy) not sure yet)
17:52:53<@arkiver>if we have these block lists for other countries as well, let me know!
17:55:54<@arkiver>nyuuzyou: i don't see an easy method of exporting all entries from that site. do you see any? if not it'll be scraped, hopefully they don't limit the pages
17:57:57<nyuuzyou>they have an API for this - https://reestr.rublacklist.net/article/api/
17:59:57<nyuuzyou>or as a last resort https://github.com/zapret-info/z-i
18:03:21<@arkiver>nyuuzyou: is https://github.com/zapret-info/z-i different from rublacklist.net?
18:05:51<@arkiver>looks like they are separate
18:06:24<nyuuzyou>It is necessary to compare, different organizations are responsible for them
18:06:46<@arkiver>are there more?
18:07:50<@arkiver>the github repo seems somewhat official if the account is from http://zapret-info.gov.ru/ ?
18:08:38<nyuuzyou>I'll try to look for other countries, but for Russia that's all
18:09:50<nyuuzyou>arkiver: No, this is the usher2.club project
18:19:31<@arkiver>got it
18:19:42<@arkiver>but yeah, these lists are great!
18:39:31<nyuuzyou>For China there is https://github.com/gfwlist/gfwlist, I also looked for similar lists for Belarus, but found nothing. The only thing that may be somehow useful is t.me/u2byckbot to search the registry
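[editor's note] The comparison of block lists raised above (rublacklist.net vs. zapret-info/z-i, maintained by different organizations) could be sketched roughly as follows. This is only a minimal illustration, assuming each list has already been exported to a plain sequence of URLs or hostnames; the function names `normalize` and `compare` and the sample entries are hypothetical, and neither source's actual export format is assumed here.

```python
from urllib.parse import urlsplit


def normalize(entry: str) -> str:
    """Reduce a URL or bare hostname to a lowercase hostname without 'www.'."""
    entry = entry.strip().lower()
    if "://" not in entry:
        # Prefix '//' so urlsplit parses a bare hostname as a network location.
        entry = "//" + entry
    host = urlsplit(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def compare(list_a, list_b):
    """Return hostnames unique to each list and those present in both."""
    a = {normalize(e) for e in list_a if e.strip()}
    b = {normalize(e) for e in list_b if e.strip()}
    # Drop entries that could not be parsed into a hostname.
    a.discard("")
    b.discard("")
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
    }


if __name__ == "__main__":
    # Placeholder entries, not taken from either real list.
    first = ["http://www.example.org/page", "example.com"]
    second = ["EXAMPLE.org", "example.net"]
    print(compare(first, second))
```

Comparing on normalized hostnames rather than raw strings avoids counting the same site twice when one list stores full URLs and the other stores bare domains; entries found in only one list would be the ones worth queuing separately.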
19:41:17<systwi_>arkiver: Re: a project on https://tjournal.ru/ , has one begun yet? I saw the message in my scrollback, too.
19:41:46<systwi_>If there isn't a channel for it yet, does #journalthis work? :-)
19:42:26<systwi_>("journal this," like to log/record/save it in a journal)
19:48:45tzt (tzt) joins
19:57:17Discant joins
19:57:28Discant quits [Client Quit]
20:08:03jacobk joins
20:44:38LeGoupil quits [Client Quit]
20:44:54LeGoupil joins
20:46:59LeGoupil quits [Client Quit]
20:57:26jacobk quits [Ping timeout: 265 seconds]
20:59:36<alexshpilkin>nyuuzyou: if there's real interest (I've no clue what the Belarusian govt blocks), message https://t.me/schors and ask him for a copy / feed of the data?
21:46:14<@arkiver>#journalthis it is!
21:46:17<@arkiver>thanks systwi_
21:46:27<@arkiver>JAA is not even in the channel yet!
21:47:57<systwi>Cool! :-)
21:48:11<systwi>I'm glad I was able to name a channel.
21:50:14<@arkiver>:)
21:58:15WesleyBidsnipes quits [Client Quit]
22:03:22Craigle quits [Quit: Ping timeout (120 seconds)]
22:03:31coderobe quits [Quit: Ping timeout (120 seconds)]
22:03:42Craigle (Craigle) joins
22:03:54coderobe (coderobe) joins
22:04:03systwi_ quits [Quit: Ping timeout (120 seconds)]
22:04:28systwi_ joins
22:33:03<h2ibot>Gridkr edited List of websites excluded from the Wayback Machine (+32, https://www.khmertimeskh.com/ This URL has been…): https://wiki.archiveteam.org/?diff=48820&oldid=48812
22:33:04<h2ibot>Entartet edited List of websites excluded from the Wayback Machine (+32, Added literallydarling.com.): https://wiki.archiveteam.org/?diff=48821&oldid=48820
22:35:58jacobk joins
22:37:03<h2ibot>JustAnotherArchivist created TJ (+368, Created page with "{{Infobox project | URL =…): https://wiki.archiveteam.org/?title=TJ
22:39:04<h2ibot>JustAnotherArchivist edited Current Projects (+135, Add TJ): https://wiki.archiveteam.org/?diff=48823&oldid=48722
22:58:50Hackerpcs quits [Client Quit]
22:59:26Hackerpcs (Hackerpcs) joins
23:00:07<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=48824&oldid=48821
23:08:54qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
23:29:07Arcorann (Arcorann) joins
23:39:21alexshpilkin quits [Ping timeout: 265 seconds]
23:47:49BlueMaxima joins
23:52:54alexshpilkin (alexshpilkin) joins
23:56:01<@JAA>Lexico is running through AB, but it has stupid rate limits, so that probably won't grab everything in time.