00:00:02<nuroten>thanks for considering
00:00:42dm4v quits [Read error: Connection reset by peer]
00:01:40<nuroten>also, does anyone know the status of a potential backup of the June 4th Museum exhibit website https://8964museum.com/ ?
00:02:10dm4v joins
00:02:13dm4v quits [Changing host]
00:02:13dm4v (dm4v) joins
00:04:29<nuroten>some pro-Beijing media have opined that online records of the June 4th Museum and vigils should be made illegal ... and the current government has developed a tendency to take instructions from such media
00:11:32<@JAA>neon: It should all be available in the Wayback Machine.
00:14:03<@JAA>nuroten: I'll run https://collection.news/ through ArchiveBot, but it's JS-heavy and likely won't work properly.
00:14:43<@JAA>The same thing applies to https://8964museum.com/ I suppose. Not aware of any prior attempts, but since it's even more useless than collection.news without JS, I highly doubt it'll grab much.
00:15:29<nuroten>JAA: thanks. that's a blow, though I expected that might be the case
00:16:09<neon>JAA: oh so everything is just ingested into the waybackmachine. that makes sense, ty.
00:16:14neon leaves
00:16:47<nuroten>though also hoped someone might have ideas how to get as much as possible out of them :)
00:16:53<nuroten>the group behind the physical museum dissolved on the 23rd, also possibly charged under NSL for foreign collusion
00:17:29<nuroten>so at this time it's a toss-up how long it can stay up
00:17:43<@JAA>Yeah, it'd probably have to be something based on a full browser. brozzler or similar. Not sure if anyone here has a working setup for that.
00:18:25<@JAA>Perhaps SPN is also an option, but depends on how large the site is.
00:20:48<nuroten>okay :) thanks for anything you can save from them
00:40:41lukash7 joins
00:41:05nuroten quits [Remote host closed the connection]
00:46:29archzz_ quits [Quit: leaving]
01:03:14dm4v quits [Read error: Connection reset by peer]
01:03:34dm4v joins
01:03:36dm4v quits [Changing host]
01:03:36dm4v (dm4v) joins
01:09:35Arcorann (Arcorann) joins
02:15:11heart_ quits [Quit: Connection closed for inactivity]
02:16:59<@JAA>Eurogamer apparently handles over a billion page views per year. Means they should easily be able to handle qwarc. I'll look into that soon.
02:17:09<@JAA>Ryz: Please add it to the wiki. Thanks.
02:17:30<Ryz>Onto Deathwatch?
02:17:36<@JAA>Yep
02:17:45<Ryz>...Also, I don't think I'm automoderated <#>;
02:20:56<Ryz>That kinda demotivated me from doing impulse edits from time to time~
02:22:27qwertyasdfuiopghjkl quits [Remote host closed the connection]
02:22:59<@JAA>Well uh, if you never make edits, you don't get automodded either. lol
02:23:57qwertyasdfuiopghjkl joins
02:24:56<Ryz>Uhh, I made an account and made edits onto the wiki before oo;
02:25:28<@JAA>Your last content edit was over a year ago though.
02:26:36<@JAA>We don't preemptively make users automodded. We do it when good edits from the same users end up in the moderation queue over and over.
02:39:49<h2ibot>Ryz edited Deathwatch (+160, /* 2021 */ Added forum of Eurogamer.net entry): https://wiki.archiveteam.org/?diff=47087&oldid=47076
02:44:53hexa- quits [Ping timeout: 612 seconds]
02:44:57<Ryz>Oh, I'm also reminded why I didn't edit as much, having to check in-between what's previewed and what's edited in the wiki text stuff ><;
02:45:02<Ryz>That's actually a bit rough...
02:51:56hexa- (hexa-) joins
03:00:01swebb joins
03:02:52<Ryz>I'm gonna have to edit this entirely in Notepad++ for flexibility reasons~
03:09:35hexa- quits [Ping timeout: 612 seconds]
03:11:46hexa- (hexa-) joins
03:30:10Nay quits [Quit: this quit message is a social construct]
03:35:15<@OrIdow6>Yeah, having to wait for it to be approved really does take motivation away
03:36:53hexa- quits [Ping timeout: 612 seconds]
03:37:32Nay (JeDa) joins
03:38:45Nay quits [Client Quit]
03:39:27Nay (JeDa) joins
03:43:28Nay quits [Client Quit]
03:44:22Nay (JeDa) joins
03:44:32hexa- (hexa-) joins
03:44:48Nay quits [Client Quit]
03:45:13Nay (JeDa) joins
03:47:43qw3rty_ joins
03:51:26qw3rty__ quits [Ping timeout: 250 seconds]
04:47:44hexa- quits [Ping timeout: 612 seconds]
04:47:59DogsRNice quits [Read error: Connection reset by peer]
05:02:25Jonboy345 joins
05:03:28Jonboy3451 quits [Ping timeout: 244 seconds]
05:07:21hexa- (hexa-) joins
06:13:32hexa- quits [Ping timeout: 612 seconds]
06:15:25hexa- (hexa-) joins
06:29:08hexa- quits [Ping timeout: 612 seconds]
06:30:22gazorpazorp quits [Remote host closed the connection]
06:30:34gazorpazorp (gazorpazorp) joins
06:31:42hexa- (hexa-) joins
06:38:12<Ryz>flashfire42, if talking about https://community.eurogamer.net/ - JAA is gonna qwarc it~ >#<;
06:39:08nicolas17 quits [Ping timeout: 250 seconds]
07:40:01benjins quits [Ping timeout: 244 seconds]
07:41:48benjins joins
08:07:58benjins quits [Ping timeout: 250 seconds]
08:15:44benjins joins
08:24:16BlueMaxima quits [Client Quit]
08:34:47benjins quits [Ping timeout: 244 seconds]
08:35:42benjins joins
08:50:53spirit joins
09:41:00<pabs>an electronics startup that just died: https://blog.kobol.io/2021/08/25/we-are-pulling-the-plug/
09:43:10<pabs>(small site, kobol.io and 3 subdomains)
09:51:44<AK>Looks like Ryz and Megame ran it through yesterday. Thanks for letting us know though pabs
10:02:43Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
10:03:08Terbium joins
10:14:13<pabs>ah good
10:23:48qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
10:28:32qwertyasdfuiopghjkl joins
10:43:26<@arkiver>OrIdow6: thanks! i'm back from vacation later today and will then check that
10:56:21qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
11:44:03Megame quits [Client Quit]
12:53:38fuzzy8021 quits [Ping timeout: 244 seconds]
12:59:21fuzzy8021 (fuzzy8021) joins
13:11:33Daloader joins
14:07:02Iki joins
14:37:29Iki quits [Ping timeout: 244 seconds]
14:42:44Daloader quits [Ping timeout: 250 seconds]
15:12:23nicolas17 joins
15:24:56IDK quits [Client Quit]
15:34:35IDK (IDK) joins
15:34:48<IDK>Wait a minute...
15:34:50<IDK>http://archive.today/
15:34:53<IDK>is down?
15:36:15<[42]>redirects me to https://archive.ph/
15:36:29<IDK>All of the domains are down
15:36:38<IDK>DNS_PROBE_FINISHED_NXDOMAIN
15:37:03<IDK>Down for more than a week???
15:37:31<[42]>archive.ph loads fine for me
15:37:44<@OrIdow6>IDK: archive.unreliable blocks Cloudflare DNS because the owner has a grudge against them
15:37:55<IDK>https://archive.li/
15:37:59<IDK>This is up
15:40:39<[42]>supposedly the owner wants to do geo balancing via DNS, which is not reliable through Cloudflare DNS as they don't pass information (or pass just wrong information) about the origin of the query, so they got blocked by archive.today
15:41:19<[42]>probably more to the story, here's some statement from them https://twitter.com/archiveis/status/1018691421182791680
15:51:38Arcorann quits [Ping timeout: 250 seconds]
16:31:10<@JAA>Ryz: It should still be run through AB as well though. As always, I'll only grab the actual thread pages, no images, avatars, etc. with qwarc.
16:59:28sec^nd quits [Remote host closed the connection]
16:59:49sec^nd (second) joins
17:16:45dserve joins
17:17:14<dserve>hey, do we have any status on anyone working on demodrop?
17:17:35<dserve>would love to help, wrote a (small scale and simple) proof of concept that downloading works
17:17:36<dserve>https://paste.ee/p/EsfS1
17:17:51<@JAA>I threw it into ArchiveBot a few days ago, but it got banned almost immediately and was still banned as of a few hours ago.
17:18:23<dserve>why?
17:18:29<dserve>:-/
17:23:40<@JAA>Also, I'm getting various TLS errors from stream.demodrop.com.
17:26:54Daloader joins
17:27:12Daloader quits [Remote host closed the connection]
17:31:48<Ryz>JAA, in that case, need to find a good holding pipeline, might be ananiel <#>;
17:33:09<@JAA>Ryz: It's not that big really. There aren't any per-post links, for example.
17:33:21<Ryz>O#o;
17:33:34<@JAA>Well, there are on the profile pages, but those are not on the community subdomain.
17:33:41<Ryz>Hmm...
17:33:52<@JAA>And only for the most recent few posts from each user.
17:49:04balrog quits [Ping timeout: 250 seconds]
17:57:55nertzy_ joins
18:12:54nertzy_ quits [Client Quit]
18:12:57systwi_ (systwi) joins
18:13:46systwi quits [Ping timeout: 250 seconds]
18:50:59<Ryz>IDK brought up something interesting on whether to archive scam websites in ArchiveBot~
18:51:51<IDK>Like they are 10/11 going to be deleted at some point
18:52:13<IDK>But its intention
18:53:18<Ryz>That's true; in that case, it's more on how to find them, which seems it might be spam websites or something unsavory oo;
18:53:51<IDK>Like they get sent through discord
18:54:08<IDK>And people who check back on 5 days thinking its real
18:54:21<IDK>Go on wayback machine, check the archive
18:54:54<IDK>And found the contact and ended up DMing them about the winning stuff
18:55:06<IDK>So im not sure on that one
18:55:35<Ryz>Also would count archiving new websites that pop up, earliest grab possible, the beginnings o:
18:56:07<IDK>Im not sure
18:58:10<IDK>socialbot: snscrape twitter-user D4T4ENJYR
19:09:09bsmith093 joins
19:13:05bsmith093 quits [Client Quit]
19:16:40Iki joins
19:25:07balrog (balrog) joins
19:35:23<AK>Another option is look at tweets to namecheap+other hosts
19:35:31<AK>They get a lot of "Delete this domain, it's spam" tweets
19:46:27Iki quits [Ping timeout: 244 seconds]
20:06:31Megame (Megame) joins
20:14:46C4K3 joins
20:58:14Iki joins
21:20:36<spirit>https://www.4players.de/ will shut down soon, in case it is not known already
21:20:41<spirit>good old gaming site
21:32:14Lord_Nightmare quits [Ping timeout: 250 seconds]
21:32:57Lord_Nightmare (Lord_Nightmare) joins
21:52:40DogsRNice (Webuser299) joins
21:54:32<dserve>JAA The TLS errors on stream.demodrop.com seem to not matter as it's AWS/Amazon hosting
21:54:50<dserve>If we could decipher the ID:s they use to store the files it would make things much easier (bypass the download.php on demodrop.com)
21:55:05<dserve>But the download.php currently leads to those ID:s/folder paths using the ID
21:55:41<@JAA>dserve: I don't mean the expired certificate. I mean actual TLS alerts.
21:57:58<@JAA>E.g. on https://demodrop.com/download.php?track_id=608866 I get (after the redirect) this with `curl -skvL`: * TLSv1.2 (IN), TLS alert, close notify (256):
21:59:11<dserve>Oh yeah, I saw those while doing my test-crawl (mass-downloading)
21:59:13<@JAA>Also, we can't bypass download.php since it uses signed S3 URLs. You get a 403 without the signature.
21:59:25<dserve>the site is partially broken so I guess it can be explained by force majeure (broken site) lol
21:59:34<dserve>Oh gotcha
22:00:46<@JAA>Yeah, guess so, although it's weird since some tracks work fine and others break like that, but all of them are new.
22:01:02<@JAA>On another note, I get permission errors on most download.php requests.
22:01:03balrog quits [Client Quit]
22:01:16<@JAA>Specifically this: <script>top.alert('Sorry, it seems like you don&#39;t have permission to download this track, make sure you&#39;re logged in on the right account');</script>
22:01:17<dserve>You need to set a cookie in order to download
22:01:39<dserve>Basically, make an account, and copy the cookie's content and set it using curl either a file's content or in-line
22:01:48<dserve>Example:
22:01:59<@JAA>Well, some tracks work fine for me with cookies disabled in uMatrix.
22:02:20<dserve>curl -L -k https://demodrop.com/download.php?track_id=607632 -o 607632.mp3 -b "__stripe_mid=123;__stripe_sid=123;access_code=123;user_name=JAA
22:02:26<dserve>"
22:03:50<@JAA>I like that their help/FAQ link is dead, by the way. :-)
22:03:57<dserve>Those which work without cookies are public downloads, which is like 1/5 of the public songs
22:04:11<@JAA>I see.
22:07:45<@JAA>Looks like streaming works for way more tracks without an account. POST request to the API, then a simple GET to stream.demodrop.com for the MP3.
22:08:01<dserve>Yeah, the site is broken
22:08:06<dserve>It's shutting down <2 months
22:08:11<dserve>Hence why I'm here trying to solve this
22:08:19<@JAA>I saw on the API that it has separate privacy settings for listening and downloading.
22:08:27<dserve>How do you check that?
22:08:37<@JAA>607632 is "listen": "public", "download": "shared", for example.
22:08:43<dserve>oh yeah
22:08:47<@JAA>https://demodrop.docs.apiary.io/
22:08:53<@JAA>The /v1.0/tracks/id endpoint.
22:09:10<@JAA>But the API requires auth as well.
22:11:03<@JAA>Hmm, there's a sample key in the docs, and it works. I wonder what its rate limits are... :-)
22:12:04<dserve>It has no rate limits
22:12:16<dserve>I accidentally crashed the site when I tried downloading using 500 workers earlier
22:12:20<dserve>It went down for 15 min
22:12:40<dserve>It's a nightmare setup-wise so I have to limit my own servers to not overload it
22:12:42<@JAA>Uh, download.php or the API?
22:12:49<dserve>Download.php (the site demodrop.com)
22:13:04<@JAA>Yeah right, I'm talking about the API, to get song metadata.
22:13:05<dserve>it went 503 after it basically overloaded the web server I presume
22:13:07<dserve>ah
22:13:28<@JAA>Which would also reveal which tracks can be downloaded directly and which require an account.
22:13:34<@JAA>Plus which of the latter can be streamed.
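[Editor's sketch, not part of the conversation] The triage JAA describes above could look roughly like the following. It assumes the track metadata carries the "listen" and "download" fields quoted for track 607632, and that the value "public" means no account is needed; both the function name and that interpretation are assumptions, not confirmed by the API documentation.

```python
def classify_track(meta: dict) -> str:
    """Pick a retrieval route from a track's privacy settings.

    Assumption: "public" means the action works without an account;
    any other value (e.g. "shared") is treated as restricted.
    """
    if meta.get("download") == "public":
        return "download"  # fetch via download.php without cookies
    if meta.get("listen") == "public":
        return "stream"    # fall back to the streaming route
    return "account"       # needs a logged-in session, or is unavailable


# Settings quoted in the log for track 607632.
example = {"listen": "public", "download": "shared"}
print(classify_track(example))
```

Checking the download setting first prefers the original file where it is available, and only falls back to the stream when the download is restricted.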
22:13:38<dserve>The API (stream.demodrop.com) presumably doesn't have any limits
22:13:42<dserve>developer pays for amazon s3
22:13:52<dserve>True.
22:14:10<@JAA>I'm talking of api.demodrop.com, and the sample key in the documentation really should have rate limits unless they're very incompetent.
22:14:32<dserve>I'd guess the site has been going down a lot (the frontend can't stream / download songs) because of the lack of rate limits
22:14:52<dserve>it has been completely down in periods the last months too, so I'd unfortunately guess it's a project that never went outside its Docker shell
22:15:03<dserve>I used it very actively, it's a great site and a very big loss if not archived
22:15:04qwertyasdfuiopghjkl joins
22:15:35<dserve>So, api.demodrop.com is probably not limited in any way
22:15:46<dserve>I'd guess it's down atm since it doesn't work to stream / download through frontend but download.php works
22:17:08<dserve>I started archiving from ID 603500 upwards as a test and I'm hmm
22:17:09<@JAA>Looks like the website's on AWS EC2, the API's behind Cloudfront, and stream.demodrop.com is colo'd at TransIP (‽).
22:17:26<dserve>I'm currently up to 13GB of completely downloaded mp3s
22:17:41<dserve>Seems correct JAA
22:18:26<@JAA>Well, I'll throw some load at the API later and see how that goes. I can easily go for hundreds of requests per second, so my side isn't the issue. :-P
22:18:33<dserve>How much storage do you have?
22:18:39<dserve>I'd guess this is in the TB:s
22:18:45<dserve>Lovely btw
22:18:53balrog (balrog) joins
22:19:12<dserve>I own two datacenter dedicated servers which could house the data, they have 8TB combined
22:19:21<dserve>The issue is more the method to scrape the data
22:19:59<@JAA>My storage situation is complicated and messy. In any case, that won't be the limiting factor. And there's still plenty of time at over 2 months, so it can be retrieved slowly as well.
22:20:59<@JAA>There are up to about 600k tracks, so yeah, should be a few TB probably. That's fine.
22:23:31Iki quits [Ping timeout: 244 seconds]
22:27:51<nicolas17>so all these old Xcode versions I uploaded to archive.org, they're under the 'open_source_software' IA collection because it didn't give me any better option; is that a problem, and can I do anything to change it?
22:28:59<dserve>JAA If you want to collab on saving this down, I'm up
23:15:03nertzy_ joins
23:18:13AntiLiberal joins
23:24:44nertzy__ joins
23:27:04nertzy_ quits [Ping timeout: 250 seconds]
23:48:13Arcorann (Arcorann) joins