00:00:51dm4v quits [Read error: Connection reset by peer]
00:04:39dm4v joins
00:04:41dm4v quits [Changing host]
00:04:41dm4v (dm4v) joins
00:08:44dfgffgd joins
00:11:27<dfgffgd>I searched through the wiki and found this: https://wiki.archiveteam.org/index.php/Retrospring It says the site is offline, but it seems to be online, and it is still actively maintained on GitHub. Maybe consider updating it? (The IRC channel as well)
00:12:56<Jake>It seems that a year later they started it up again? The statuses on the wiki are not automated, they are entirely manual. https://twitter.com/Retrospring/status/847934588269854720
00:14:43<dfgffgd>Yeah, that's the reason I wanted to mention it so it could maybe get updated?
00:24:01<Jake>I'll update it!
00:37:16BlueMaxima joins
01:02:41dm4v quits [Read error: Connection reset by peer]
01:03:34dm4v joins
01:03:36dm4v quits [Changing host]
01:03:36dm4v (dm4v) joins
01:22:56duce1337 quits [Client Quit]
01:56:49<systwi>Repeating because I'm afraid it may have been missed from channel noise:
01:57:29<systwi>Any way we can save https://gta5-mods.com/ ? It's likely going to be a huge site, but I'm wondering this because Rockstar (probably Take2 Interactive, specifically) DMCAed two hugely popular mods from the site. No one knows what they'll do next.
01:57:35<systwi>Note: They do throttle IPs after a lot of traffic goes through too quickly, with a 1-2 minute cooldown.
02:01:03<Ryz>systwi, do you have a source or link for info regarding the 2 mods being taken down by Take 2/Rockstar?
02:02:28<systwi>Ryz: https://twitter.com/TezFunz2/status/1413266622823944201
02:03:17<systwi>I'm not sure what the specific mods were on the site.
02:09:16<systwi>I found two that might be the ones, but I can't confirm this.
02:09:18<systwi>https://www.gta5-mods.com/maps/vicecity-in-v
02:09:28<systwi>https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas
02:09:55<HP_Archivist>!a https://markmalkoff.com/ --explain 'Mark Malkoff is a comedian and filmmaker. He has been featured on the "Today Show", "Good Morning America", CNN, Fox News, MSNBC, Mashable, NPR’s “Weekend Edition”, BBC, and "The Tonight Show with Jay Leno".'
02:09:59<systwi>I'm guessing they are, as DDG has a text preview snippet.
02:10:02<HP_Archivist>oops
02:12:28<Ryz>systwi, poking around;
02:12:42<Ryz>Seeing https://web.archive.org/web/20210605002202/https://www.gta5-mods.com/maps/vicecity-in-v - interestingly the download pages are saved
02:13:08<Ryz>It would be https://files.gta5-mods.com/uploads/vicecity-in-v/c32dd7-ViceCryRemastered.zip - but I'm not sure if it was like that the whole time or it got changed
02:13:24<Ryz>It comes with a text file that has this https://www.mediafire.com/file/qv6ta4gbnjrbu92/Vice_Cry_Remastered_1.0.rar/file - which doesn't appear to exist
02:14:08<systwi>I couldn't find much with DDG when searching for "gta5-mods dmca" and "gta5-mods san andreas vice city map dmca", and trying the awful awful Google wanted me to fill out a Recraptcha, which I'm not doing :)
02:14:28<systwi>Thank you for helping look further into this.
02:15:52<Ryz>As for the other one: https://web.archive.org/web/20210418045654/https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas - unfortunately the download page is not archived, so https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas/download/348 is just a wall
02:15:55<systwi>gta5-mods.com also has a lot of subdomains, such as de.gta5-mods.com, gl.gta5-mods.com and zh.gta5-mods.com
02:17:29<Ryz>Yeah, but I'm not sure if it's just translated into their languages or has original content
02:17:59<systwi>Damn. This is the problem with the GTA modding community; everybody says "DON'T REUPLOAD ANYWHERE!!!", which in turn means finding copies of the data online is next to impossible, unless you have people like me who spend hours manually saving mod after mod by hand, and using an auto-clicker to display all mod comments and save those too.
02:18:44<systwi>It's likely translated, but I thought maybe the missing pages would be under one of those domains.
02:18:50<systwi>Probably not.
02:21:05<Ryz>Unless someone uses that website a lot or a lot of energy is spent investigating, we can't really be sure
02:22:40<systwi>I thought maybe there'd be a quick way to view captured data under numerous subdomains on WBM. Either I can't find it or my memory's going :S
02:23:34<Ryz>socialbot: snscrape twitter-user 5mods
02:23:36<Ryz>Oops
02:23:59<@JAA>systwi: Not exposed on the web interface, but can be done through the CDX API.
02:25:10<systwi>Going AFK for a little bit.
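A minimal sketch of what JAA's CDX API suggestion could look like, assuming the public Wayback Machine CDX endpoint and its `matchType=domain` parameter, which matches the given domain and all of its subdomains. The function names, the chosen fields, and the default limit are illustrative and not something stated in the channel.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, limit=50):
    """Build a CDX query URL covering a domain and all of its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains such as de.gta5-mods.com
        "output": "json",        # rows come back as JSON arrays
        "fl": "original,timestamp,statuscode",
        "collapse": "urlkey",    # keep one row per distinct URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def fetch_captures(domain, limit=50):
    """Fetch capture rows as dicts (performs a network request)."""
    with urllib.request.urlopen(build_cdx_query(domain, limit)) as resp:
        rows = json.load(resp)
    if not rows:
        return []
    header, data = rows[0], rows[1:]  # first row holds the field names
    return [dict(zip(header, row)) for row in data]
```

Calling `fetch_captures("gta5-mods.com")` would list captured URLs across the language subdomains in one query; only `build_cdx_query` is free of network access.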
02:40:20duce1337 joins
02:46:32jacobk joins
03:28:41qw3rty_ joins
03:32:19qw3rty__ quits [Ping timeout: 258 seconds]
03:37:09<Ryz>systwi, hmm, https://forums.gta5-mods.com/topic/5379/mod-authors-can-now-immediately-unpublish-their-own-mods - was announced in January 2017
03:37:25<Ryz>I'm not sure if that message on both of those mods is a result of that...
03:54:03abcde quits [Ping timeout: 244 seconds]
03:54:33<@JAA>systwi: You don't happen to know the rate limit, do you?
04:15:49<jodizzle>cm: So if you're not using WARCs to store all the data, then I don't think using the WARC format in particular has much value
04:18:02<jodizzle>If you care about archival quality and want to save to WARCs, I think another way to frame your problem is to try to find tooling that can be used to view or export data from the WARCs.
04:18:42<jodizzle>That way you can have the WARCs but also get your "regular download" files as needed.
04:20:41<cm>doesn't have to be warc, but aiui that is the best supported format to store the context of the download and to be able to archive complex sites
04:24:30<jodizzle>I think WARCs are the best format insofar as trying to store everything (request, response) needed to play back a site later. I don't think they have any special power beyond that. If you don't need that, you could probably store the metadata you're interested in with, e.g., some custom JSON.
04:24:56<jodizzle>Here are some wiki pages that might help you in your search: https://wiki.archiveteam.org/index.php/Software https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem
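As an illustration of the "custom JSON" alternative jodizzle mentions, here is a rough sketch of a sidecar record stored next to a regular download. The field names and both helper functions are hypothetical choices for this example, not an established format, and unlike a WARC this does not keep the full request and response.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_sidecar(url, status, headers, body):
    """Collect basic download context for one fetched file (hypothetical schema)."""
    return {
        "url": url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "response_headers": dict(headers),
        "sha256": hashlib.sha256(body).hexdigest(),  # lets you verify the file later
        "length": len(body),
    }


def write_sidecar(path, record):
    """Write the record as pretty-printed JSON next to the downloaded file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
```

For example, a download saved as `mod.zip` could get a `mod.zip.json` written with `write_sidecar`.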
04:37:15Megame quits [Client Quit]
04:39:04<Ryz>!ignore f41yx5gnf91vsjekt031k8fkf ^https?://www\.jogglerwiki\.com/forum/ucp\.php\?mode=login&
04:39:07<Ryz>Oops
04:57:03<systwi>Ryz: Not sure. I assumed the moderators removed the mods manually, in case the uploader didn't log in fast enough.
04:57:44<systwi>JAA: <systwi > 20:57: Note: They do throttle IPs after a lot of traffic goes through too quickly, with a 1-2 minute cooldown.
04:58:06<systwi>I believe it's 1 minute.
04:58:12<Jake>I assume he was looking for a specific number of requests to get the limit?
04:58:26<systwi>Oh, probably. Not sure about that, sorry.
04:59:51<systwi>I believe it comes down to how much data is sent to you, and not how many requests you send. I've hit that limit before when downloading mods that are ~100MB+. I don't think I've hit it loading comments or downloading smaller mods (which are more common; I'd estimate the majority are between 10 and 50MB).
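If the throttling really is based on transferred data rather than request count, a downloader could pace itself with a byte budget. This is only a sketch under that assumption: the site's actual threshold is unknown, so `max_bytes` is a placeholder to be tuned, the window length is a guess, and the 60-second cooldown simply mirrors systwi's estimate of about 1 minute.

```python
import time


class ByteBudgetThrottle:
    """Pause once a byte budget is used up within a time window."""

    def __init__(self, max_bytes, window_seconds=60.0, cooldown_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_bytes = max_bytes            # hypothetical limit, not a known value
        self.window_seconds = window_seconds  # assumed window length
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.used = 0

    def record(self, nbytes):
        """Account for nbytes just downloaded; return True if a cooldown happened."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window expired without hitting the budget, so start over.
            self.window_start = now
            self.used = 0
        self.used += nbytes
        if self.used >= self.max_bytes:
            self.sleep(self.cooldown_seconds)
            self.window_start = self.clock()
            self.used = 0
            return True
        return False
```

A caller would invoke `record(len(chunk))` after each downloaded chunk; the `clock` and `sleep` parameters exist so the pacing can be tested without waiting.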
05:15:20<cm>jodizzle: thanks for the info. guess i will stick with warcs as the on disk format then
05:35:00nuroten quits [Remote host closed the connection]
05:36:23nuroten joins
05:59:31<nuroten>thuban: politicians and radio shows lists from the spreadsheet are added, livestream/newsworthy list in progress
06:23:57grawity quits [Remote host closed the connection]
06:24:56grawity (grawity) joins
06:45:49<lennier1>Has anyone brought up YouTube changing pre-2017 unlisted videos to private starting July 23? https://www.youtube.com/watch?v=l6UHS1-vDMM
07:07:41abcde joins
07:26:31Matthww86 joins
07:27:32Matthww8 quits [Ping timeout: 250 seconds]
07:27:32Matthww86 is now known as Matthww8
07:38:22Matthww8 quits [Ping timeout: 250 seconds]
07:43:24BlueMaxima quits [Client Quit]
07:52:26Matthww8 joins
07:54:42<@OrIdow6>lennier1: Yes, #down-the-tube
07:55:14<@OrIdow6>Though I think the real archiving may be done in #youtubearchive
07:55:22<@OrIdow6>Which I am not in
08:09:56godane (godane) joins
08:30:43Aerochrome quits [Quit: Connection closed for inactivity]
09:03:44HP_Archivist quits [Ping timeout: 250 seconds]
09:12:19abcde quits [Ping timeout: 244 seconds]
10:00:31duce1337 quits [Changing host]
10:00:31duce1337 (duce1337) joins
10:20:27spirit joins
11:35:09jtagcat quits [Quit: Bye!]
11:40:26jtagcat (jtagcat) joins
12:03:13yano quits [Read error: Connection reset by peer]
12:03:57yano (yano) joins
12:04:00psy quits [Ping timeout: 250 seconds]
12:04:15noteness quits [Read error: Connection reset by peer]
12:04:32noteness (noteness) joins
12:06:46psy (psy) joins
12:44:30Matthww88 joins
12:45:05bsmith093 quits [Ping timeout: 258 seconds]
12:45:28Matthww8 quits [Ping timeout: 258 seconds]
12:45:28Matthww88 is now known as Matthww8
12:45:34bsmith093 joins
13:00:59Iki joins
13:33:13dfgffgd leaves
13:38:02Matthww8 quits [Ping timeout: 250 seconds]
13:42:05Matthww8 joins
13:54:56lunik1 quits [Quit: :x]
13:55:26lunik1 joins
13:59:18lunik1 quits [Client Quit]
13:59:43lunik1 joins
14:21:53lunik1 quits [Client Quit]
14:22:51lunik1 joins
14:24:22Stiletto quits [Ping timeout: 258 seconds]
14:42:34<Jake>(I think it's two different projects, with down-the-tube having all the video metadata in WARCs and few videos being grabbed only by admins.)
15:17:39Arcorann_ quits [Ping timeout: 258 seconds]
15:35:12<@OrIdow6>Yes
15:35:25<@OrIdow6>Though AFAIK that is not operational yet
15:35:38<@OrIdow6>down-the-tube is more broadly used for non-video YouTube projects
15:35:49<@OrIdow6>As they come up
15:53:58TheTechRobo (TheTechRobo) joins
15:54:13<TheTechRobo>What does this mean, does it mean my crawl will be incomplete? `/home/thetechrobo/gs-venv/lib/python3.7/site-packages/wpull/protocol/http/client.py:185: UserWarning: HTTP session did not complete.`
15:55:29<Jake>I believe that one is fine?
16:02:53thuban quits [Ping timeout: 258 seconds]
16:12:56nuroten quits [Remote host closed the connection]
16:14:38<@JAA>TheTechRobo: Yep, that warning is fine. Happens when you get timeouts, disconnects, etc., I believe, but those are retried twice.
16:17:50<TheTechRobo>Ah, thanks.
16:35:53TheTechRobo quits [Remote host closed the connection]
17:15:10Stiletto joins
17:23:33Megame (Megame) joins
18:22:36DogsRNice (Webuser299) joins
18:24:55HP_Archivist (HP_Archivist) joins
18:59:59somerando3 joins
19:01:48<somerando3>Is anyone taking a look at Afghan media/civil society websites? It looks like the Taliban is making big gains as the US is withdrawing: https://www.nytimes.com/2021/07/09/world/asia/taliban-kandahar-afghanistan.html
19:02:47<somerando3>Not sure at all what's out there, but my guess is anything to do with the US-backed government or anything that doesn't comport with the Taliban is at risk.
19:04:55<somerando3>https://en.wikipedia.org/wiki/Mass_media_in_Afghanistan
19:47:24thuban joins
20:07:18spirit quits [Client Quit]
21:04:22Barto quits [Ping timeout: 250 seconds]
21:04:43Barto (Barto) joins
22:21:24BinzyBoi quits [Quit: Leaving]
22:24:18BinzyBoi joins
23:05:08Arcorann_ joins
23:17:24HP_Archivist quits [Ping timeout: 250 seconds]
23:39:46lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)]
23:42:10lennier1 (lennier1) joins