| 00:00:51 | | dm4v quits [Read error: Connection reset by peer] |
| 00:04:39 | | dm4v joins |
| 00:04:41 | | dm4v is now authenticated as dm4v |
| 00:04:41 | | dm4v quits [Changing host] |
| 00:04:41 | | dm4v (dm4v) joins |
| 00:08:44 | | dfgffgd joins |
| 00:11:27 | <dfgffgd> | I searched through the wiki and found this: https://wiki.archiveteam.org/index.php/Retrospring It says the site is offline, but it seems to be online? And it is still actively maintained on GitHub. Maybe consider updating it? (The IRC channel as well) |
| 00:12:56 | <Jake> | It seems that a year later they started it up again? The statuses on the wiki are not automated, they are entirely manual. https://twitter.com/Retrospring/status/847934588269854720 |
| 00:14:43 | <dfgffgd> | Yeah, that's the reason I wanted to mention it so it could maybe get updated? |
| 00:24:01 | <Jake> | I'll update it! |
| 00:37:16 | | BlueMaxima joins |
| 01:02:41 | | dm4v quits [Read error: Connection reset by peer] |
| 01:03:34 | | dm4v joins |
| 01:03:36 | | dm4v is now authenticated as dm4v |
| 01:03:36 | | dm4v quits [Changing host] |
| 01:03:36 | | dm4v (dm4v) joins |
| 01:22:56 | | duce1337 quits [Client Quit] |
| 01:56:49 | <systwi> | Repeating because I'm afraid it may have been missed from channel noise: |
| 01:57:29 | <systwi> | Any way we can save https://gta5-mods.com/ ? It's likely going to be a huge site, but I'm wondering this because Rockstar (probably Take-Two Interactive, specifically) DMCAed two hugely popular mods from the site. No one knows what they'll do next. |
| 01:57:35 | <systwi> | Note: They do throttle IPs after a lot of traffic goes through too quickly, with a 1-2 minute cooldown. |
| 02:01:03 | <Ryz> | systwi, do you have a source or link for info regarding the 2 mods being taken down by Take-Two/Rockstar? |
| 02:02:28 | <systwi> | Ryz: https://twitter.com/TezFunz2/status/1413266622823944201 |
| 02:03:17 | <systwi> | I'm not sure what the specific mods were on the site. |
| 02:09:16 | <systwi> | I found two that might be the ones, but I can't confirm this. |
| 02:09:18 | <systwi> | https://www.gta5-mods.com/maps/vicecity-in-v |
| 02:09:28 | <systwi> | https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas |
| 02:09:55 | <HP_Archivist> | !a https://markmalkoff.com/ --explain 'Mark Malkoff is a comedian and filmmaker. He has been featured on the "Today Show", "Good Morning America", CNN, Fox News, MSNBC, Mashable, NPR’s “Weekend Edition”, BBC, and "The Tonight Show with Jay Leno."' |
| 02:09:59 | <systwi> | I'm guessing they are, as DDG has a text preview snippet. |
| 02:10:02 | <HP_Archivist> | oops |
| 02:12:28 | <Ryz> | systwi, poking around; |
| 02:12:42 | <Ryz> | Seeing https://web.archive.org/web/20210605002202/https://www.gta5-mods.com/maps/vicecity-in-v - interestingly the download pages are saved |
| 02:13:08 | <Ryz> | It would be https://files.gta5-mods.com/uploads/vicecity-in-v/c32dd7-ViceCryRemastered.zip - but I'm not sure if it was like that the whole time or it got changed |
| 02:13:24 | <Ryz> | It comes with a text file that has this https://www.mediafire.com/file/qv6ta4gbnjrbu92/Vice_Cry_Remastered_1.0.rar/file - which doesn't appear to exist |
| 02:14:08 | <systwi> | I couldn't find much with DDG when searching for "gta5-mods dmca" and "gta5-mods san andreas vice city map dmca", and trying the awful awful Google wanted me to fill out a Recraptcha, which I'm not doing :) |
| 02:14:28 | <systwi> | Thank you for helping look further into this. |
| 02:15:52 | <Ryz> | As for the other one: https://web.archive.org/web/20210418045654/https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas - unfortunately the download page is not archived, so https://www.gta5-mods.com/maps/grand-theft-auto-v-san-andreas/download/348 is just a wall |
| 02:15:55 | <systwi> | gta5-mods.com also has a lot of subdomains, such as de.gta5-mods.com, gl.gta5-mods.com and zh.gta5-mods.com |
| 02:17:29 | <Ryz> | Yeah, but I'm not sure if it's just translated into their languages or has original content |
| 02:17:59 | <systwi> | Damn. This is the problem with the GTA modding community; everybody says "DON'T REUPLOAD ANYWHERE!!!", which in turn means finding copies of the data online is next to impossible, unless you have people like me who spend hours manually saving mod after mod by hand, and using an auto-clicker to display all mod comments and save those too. |
| 02:18:44 | <systwi> | It's likely translated, but I thought maybe the missing pages would be under one of those domains. |
| 02:18:50 | <systwi> | Probably not. |
| 02:21:05 | <Ryz> | Unless someone uses that website a lot or a lot of energy is spent investigating, we can't really be sure |
| 02:22:40 | <systwi> | I thought maybe there'd be a quick way to view captured data under numerous subdomains on WBM. Either I can't find it or my memory's going :S |
| 02:23:34 | <Ryz> | socialbot: snscrape twitter-user 5mods |
| 02:23:36 | <Ryz> | Oops |
| 02:23:59 | <@JAA> | systwi: Not exposed on the web interface, but can be done through the CDX API. |
| 02:25:10 | <systwi> | Going AFK for a little bit. |
| 02:40:20 | | duce1337 joins |
| 02:46:32 | | jacobk joins |
| 03:28:41 | | qw3rty_ joins |
| 03:32:19 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 03:37:09 | <Ryz> | systwi, hmm, https://forums.gta5-mods.com/topic/5379/mod-authors-can-now-immediately-unpublish-their-own-mods - was announced in January 2017 |
| 03:37:25 | <Ryz> | I'm not sure if that message on both of those mods is a result of that... |
| 03:54:03 | | abcde quits [Ping timeout: 244 seconds] |
| 03:54:33 | <@JAA> | systwi: You don't happen to know the rate limit, do you? |
| 04:15:49 | <jodizzle> | cm: So if you're not using WARCs to store all the data, then I don't think using the WARC format in particular has much value |
| 04:18:02 | <jodizzle> | If you care about archival quality and want to save to WARCs, I think another way to frame your problem is to try to find tooling that can be used to view or export data from the WARCs. |
| 04:18:42 | <jodizzle> | That way you can have the WARCs but also get your "regular download" files as needed. |
| 04:20:41 | <cm> | doesn't have to be warc, but aiui that is the best supported format to store the context of the download and to be able to archive complex sites |
| 04:24:30 | <jodizzle> | I think WARCs are the best format insofar as trying to store everything (request, response) needed to play back a site later. I don't think they have any special power beyond that. If you don't need that, you could probably store the metadata you're interested in with e.g., some custom JSON. |
| 04:24:56 | <jodizzle> | Here are some wiki pages that might help you in your search: https://wiki.archiveteam.org/index.php/Software https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem |
| 04:37:15 | | Megame quits [Client Quit] |
| 04:39:04 | <Ryz> | !ignore f41yx5gnf91vsjekt031k8fkf ^https?://www\.jogglerwiki\.com/forum/ucp\.php\?mode=login& |
| 04:39:07 | <Ryz> | Oops |
| 04:57:03 | <systwi> | Ryz: Not sure. I assumed the moderators removed the mods manually, in case the uploader didn't log in fast enough. |
| 04:57:44 | <systwi> | JAA: <systwi > 20:57: Note: They do throttle IPs after a lot of traffic goes through too quickly, with a 1-2 minute cooldown. |
| 04:58:06 | <systwi> | I believe it's 1 minute. |
| 04:58:12 | <Jake> | I assume he was looking for a specific number of requests to get the limit? |
| 04:58:26 | <systwi> | Oh, probably. Not sure about that, sorry. |
| 04:59:51 | <systwi> | I believe it comes down to how much data is sent to you, and not how many requests you send. I've hit that limit before when downloading mods that are ~100MB+. I don't think I've hit it loading comments or downloading smaller mods (which are more common; I'd estimate the majority are between 10 and 50MB). |
| 05:15:20 | <cm> | jodizzle: thanks for the info. guess i will stick with warcs as the on-disk format then |
| 05:35:00 | | nuroten quits [Remote host closed the connection] |
| 05:36:23 | | nuroten joins |
| 05:59:31 | <nuroten> | thuban: politicians and radio shows lists from the spreadsheet are added, livestream/newsworthy list in progress |
| 06:23:57 | | grawity quits [Remote host closed the connection] |
| 06:24:56 | | grawity (grawity) joins |
| 06:45:49 | <lennier1> | Has anyone brought up YouTube changing pre-2017 unlisted videos to private starting July 23? https://www.youtube.com/watch?v=l6UHS1-vDMM |
| 07:07:41 | | abcde joins |
| 07:26:31 | | Matthww86 joins |
| 07:27:32 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 07:27:32 | | Matthww86 is now known as Matthww8 |
| 07:38:22 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 07:43:24 | | BlueMaxima quits [Client Quit] |
| 07:52:26 | | Matthww8 joins |
| 07:54:42 | <@OrIdow6> | lennier1: Yes, #down-the-tube |
| 07:55:14 | <@OrIdow6> | Though I think the real archiving may be done in #youtubearchive |
| 07:55:22 | <@OrIdow6> | Which I am not in |
| 08:09:56 | | godane (godane) joins |
| 08:30:43 | | Aerochrome quits [Quit: Connection closed for inactivity] |
| 09:03:44 | | HP_Archivist quits [Ping timeout: 250 seconds] |
| 09:12:19 | | abcde quits [Ping timeout: 244 seconds] |
| 10:00:31 | | duce1337 is now authenticated as duce1337 |
| 10:00:31 | | duce1337 quits [Changing host] |
| 10:00:31 | | duce1337 (duce1337) joins |
| 10:20:27 | | spirit joins |
| 11:35:09 | | jtagcat quits [Quit: Bye!] |
| 11:40:26 | | jtagcat (jtagcat) joins |
| 12:03:13 | | yano quits [Read error: Connection reset by peer] |
| 12:03:57 | | yano (yano) joins |
| 12:04:00 | | psy quits [Ping timeout: 250 seconds] |
| 12:04:15 | | noteness quits [Read error: Connection reset by peer] |
| 12:04:32 | | noteness (noteness) joins |
| 12:06:46 | | psy (psy) joins |
| 12:44:30 | | Matthww88 joins |
| 12:45:05 | | bsmith093 quits [Ping timeout: 258 seconds] |
| 12:45:28 | | Matthww8 quits [Ping timeout: 258 seconds] |
| 12:45:28 | | Matthww88 is now known as Matthww8 |
| 12:45:34 | | bsmith093 joins |
| 12:45:34 | | bsmith093 is now authenticated as bsmith093 |
| 13:00:59 | | Iki joins |
| 13:33:13 | | dfgffgd leaves |
| 13:38:02 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 13:42:05 | | Matthww8 joins |
| 13:54:56 | | lunik1 quits [Quit: :x] |
| 13:55:26 | | lunik1 joins |
| 13:59:18 | | lunik1 quits [Client Quit] |
| 13:59:43 | | lunik1 joins |
| 14:21:53 | | lunik1 quits [Client Quit] |
| 14:22:51 | | lunik1 joins |
| 14:24:22 | | Stiletto quits [Ping timeout: 258 seconds] |
| 14:42:34 | <Jake> | (I think it's two different projects, with down-the-tube having all the video metadata in WARCs and few videos being grabbed only by admins.) |
| 15:17:39 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 15:35:12 | <@OrIdow6> | Yes |
| 15:35:25 | <@OrIdow6> | Though AFAIK that is not operational yet |
| 15:35:38 | <@OrIdow6> | down-the-tube is more broadly used for non-video YouTube projects |
| 15:35:49 | <@OrIdow6> | As they come up |
| 15:53:58 | | TheTechRobo (TheTechRobo) joins |
| 15:54:13 | <TheTechRobo> | What does this mean, does it mean my crawl will be incomplete? `/home/thetechrobo/gs-venv/lib/python3.7/site-packages/wpull/protocol/http/client.py:185: UserWarning: HTTP session did not complete.` |
| 15:55:29 | <Jake> | I believe that one is fine? |
| 16:02:53 | | thuban quits [Ping timeout: 258 seconds] |
| 16:12:56 | | nuroten quits [Remote host closed the connection] |
| 16:14:38 | <@JAA> | TheTechRobo: Yep, that warning is fine. Happens when you get timeouts, disconnects, etc., I believe, but those are retried twice. |
| 16:17:50 | <TheTechRobo> | Ah, thanks. |
| 16:35:53 | | TheTechRobo quits [Remote host closed the connection] |
| 17:15:10 | | Stiletto joins |
| 17:23:33 | | Megame (Megame) joins |
| 18:22:36 | | DogsRNice (Webuser299) joins |
| 18:24:55 | | HP_Archivist (HP_Archivist) joins |
| 18:59:59 | | somerando3 joins |
| 19:01:48 | <somerando3> | Is anyone taking a look at Afghan media/civil society websites? It looks like the Taliban is making big gains as the US is withdrawing: https://www.nytimes.com/2021/07/09/world/asia/taliban-kandahar-afghanistan.html |
| 19:02:47 | <somerando3> | Not sure at all what's out there, but my guess is anything to do with the US-backed government or anything that doesn't comport with the Taliban is at risk. |
| 19:04:55 | <somerando3> | https://en.wikipedia.org/wiki/Mass_media_in_Afghanistan |
| 19:47:24 | | thuban joins |
| 20:07:18 | | spirit quits [Client Quit] |
| 21:04:22 | | Barto quits [Ping timeout: 250 seconds] |
| 21:04:43 | | Barto (Barto) joins |
| 22:21:24 | | BinzyBoi quits [Quit: Leaving] |
| 22:24:18 | | BinzyBoi joins |
| 23:05:08 | | Arcorann_ joins |
| 23:17:24 | | HP_Archivist quits [Ping timeout: 250 seconds] |
| 23:39:46 | | lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
| 23:42:10 | | lennier1 (lennier1) joins |
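
Note on the 02:23:59 remark that captures across subdomains can be listed through the CDX API: a minimal sketch of such a query follows. It assumes the public Wayback Machine CDX endpoint and its documented `matchType=domain`, `fl`, `collapse`, `output` and `limit` parameters. The helper names (`build_cdx_url`, `parse_cdx_json`, `list_captures`) and the chosen field list are illustrative, not something anyone in the channel used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine CDX endpoint (assumed here; not quoted in the log).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, limit=50):
    """Build a CDX query covering a domain and all of its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",   # match the host itself plus every subdomain
        "output": "json",        # JSON array of rows; first row is the header
        "fl": "timestamp,original,statuscode",
        "collapse": "urlkey",    # keep one row per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn the CDX JSON response into a list of dicts keyed by field name."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def list_captures(domain, limit=50):
    """Fetch and parse captures for a domain (performs a network request)."""
    with urlopen(build_cdx_url(domain, limit), timeout=60) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))


# Example (hypothetical usage, requires network access):
# for capture in list_captures("gta5-mods.com", limit=20):
#     print(capture["timestamp"], capture["original"])
```

With `matchType=domain`, hosts such as de.gta5-mods.com or files.gta5-mods.com would be covered by the same query, which is the cross-subdomain view the web interface does not expose. For a large site the result set is big, so a small `limit` is a sensible starting point.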