01:37:44<audrooku|m>`https://web.archive.org/web/20160121141614/https://w.soundcloud.com/player/?visual=true&url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F198974918&%23038;show_artwork=true&%23038;maxwidth=750&%23038;maxheight=1000`
01:37:47<audrooku|m>520 response, that's a new one
01:52:43<pokechu22>I think that's pretty common from cloudflare
02:21:08<audrooku|m>This isn't cloudflare
02:21:21<audrooku|m>To my knowledge
02:23:47<@JAA>&%23038;
02:23:56<@JAA>I guess that's one way of encoding an &.
02:24:13<@JAA>Anyway, I'm getting a 404.
03:01:02<audrooku|m>> &%23038;
03:01:02<audrooku|m>Jesus christ
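Note on the `&%23038;` sequence quoted above: it reads like the HTML entity `&#038;` (a numeric character reference for `&`) whose `#` was then percent-encoded as `%23`. A minimal Python sketch of that reading follows. The log does not confirm how the URL was actually mangled, so this is an assumption, and the `fragment` value is just a piece of the URL from the first message.

```python
from html import unescape
from urllib.parse import unquote

# Fragment of the query string from the URL in the log.
fragment = "&%23038;show_artwork=true"

# Step 1: undo the percent-encoding, turning %23 back into '#'.
entity_form = unquote(fragment)

# Step 2: resolve the HTML numeric character reference &#038; (decimal 38 is '&').
plain_form = unescape(entity_form)

print(entity_form)  # &#038;show_artwork=true
print(plain_form)   # &show_artwork=true
```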
03:27:16<audrooku|m>another "payload error" url `https://web.archive.org/web/20161130001722/https://soundcloud.com/oembed?url=https%3A%2F%2Fsoundcloud.com%2Fatlanticrecords&format=xml`
03:27:54<@JAA>> Secure Connection Failed
03:28:17<@JAA>On the iframe, to be clear.
03:32:05AlsoHP_Archivist joins
03:34:39HP_Archivist quits [Ping timeout: 272 seconds]
04:48:07tzt quits [Ping timeout: 272 seconds]
04:49:43HP_Archivist (HP_Archivist) joins
04:51:17AlsoHP_Archivist quits [Ping timeout: 272 seconds]
05:05:51HP_Archivist quits [Ping timeout: 272 seconds]
05:25:42HP_Archivist (HP_Archivist) joins
05:33:52AlsoHP_Archivist joins
05:36:15HP_Archivist quits [Ping timeout: 272 seconds]
05:36:34DogsRNice quits [Read error: Connection reset by peer]
05:58:23HP_Archivist (HP_Archivist) joins
06:00:19AlsoHP_Archivist quits [Ping timeout: 272 seconds]
06:04:39AlsoHP_Archivist joins
06:05:23HP_Archivist quits [Ping timeout: 272 seconds]
06:10:21pabs quits [Client Quit]
06:11:05AlsoHP_Archivist quits [Ping timeout: 272 seconds]
06:15:45pabs (pabs) joins
06:29:13HP_Archivist (HP_Archivist) joins
06:32:31Doranwen quits [Remote host closed the connection]
06:32:57Doranwen (Doranwen) joins
12:25:53Iki joins
13:01:50Arcorann quits [Ping timeout: 240 seconds]
17:23:35AlsoHP_Archivist joins
17:24:50HP_Archivist quits [Ping timeout: 240 seconds]
19:09:49DogsRNice joins
20:17:25tech234a quits [Quit: Connection closed for inactivity]
21:12:28AlsoHP_Archivist quits [Read error: Connection reset by peer]
22:25:39Arcorann (Arcorann) joins
22:38:23<nicolas17>14.36s/MiB
22:38:25<nicolas17>x_x
22:39:07<nicolas17>I thought I'd be bottlenecked by samsung, not by IA
22:40:35<nicolas17>I'm doing multiple simultaneous uploads and the *total* is 700KB/s
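For scale, the 14.36 s/MiB figure can be converted to a transfer rate. This is only arithmetic on the number quoted above. Whether that figure describes a single upload or something else is not stated in the log, so the sketch makes no claim about how it relates to the 700KB/s total.

```python
SECONDS_PER_MIB = 14.36  # figure quoted in the log
BYTES_PER_MIB = 1024 * 1024

# Invert seconds-per-MiB to get bytes per second, then express in two common units.
bytes_per_second = BYTES_PER_MIB / SECONDS_PER_MIB
kb_per_second = bytes_per_second / 1000    # decimal kilobytes
kib_per_second = bytes_per_second / 1024   # binary kibibytes

print(f"{kb_per_second:.1f} kB/s, {kib_per_second:.1f} KiB/s")  # 73.0 kB/s, 71.3 KiB/s
```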
23:03:41Webuser230 joins
23:09:06<Webuser230>Has anyone else seen the new Wayback Machine limits? "This host has been already captured 80,118.0 times today. Please try again tomorrow. Please email us at "info@archive.org" if you would like to discuss this more."
23:11:45<@JAA>YouTube?
23:12:38<@JAA>If so, that's been happening for at least almost a year.
23:14:27<pokechu22>It resets at midnight UTC (which is in 45 minutes)
23:14:59<Webuser230>YouTube for now. Odd how I only got it now, because I've been archiving YouTube videos for a while and it's never popped up before
23:15:26<pokechu22>Yeah, it generally happens in the last few hours of the (UTC) day
23:17:11<Webuser230>On a side note, I wanted to bring up how they keep on shortening the daily max archive amount. Back in like October the limit was 100k/day/user and now it's as low as 50k/day/user. They also reduced the max archives for a single URL from 10/day to I think 6/day for now
23:35:51<@JAA>I have no idea about the former, but the latter varies all the time. It was even 1 for some time.
23:39:21<Webuser230>I use the Google Sheets archiver to mass archive a large forum over a large timespan (outlinks enabled, each permalink of https://forum.com/post/<postid> generates 50 outlinks, and there's about 16 million posts), maybe they didn't like this
23:44:47<@JAA>Yeah, SPN is probably not the best tool for that anyway.
23:45:03katia (katia) joins
23:46:15<katia>what tool do you use for uploading files to s3ia? is rclone fine? trying to delete stuff via it because i copied it at the wrong path seems to not do anything - or maybe i just haven't waited enough
23:47:52<Webuser230>JAA the thing is that the forum went down once (and a lot of old links to the previous version still don't work), so I want to archive it, but it's very huge
23:48:02<Webuser230>Like I said 16 million posts
23:48:30<Webuser230>The problem is that the forum software redirects /post/postid to /thread/threadid/page-pagenumber#postid
23:48:39<Webuser230>well to summarize it's Xenforo
23:48:42<@JAA>katia: The ia CLI is probably your friend. I use either that (mostly via preexisting automation) or my own uploading script.
23:49:50<katia>JAA, thanks
23:51:20<fireonlive>katia: any fileop takes time while it makes its way through the queue
23:51:22<@JAA>Webuser230: Sounds like a potential candidate for ArchiveBot, although 16 million posts is at the upper end of feasibility there. There are also other tools. If it has to be grabbed fast, I'd do it with qwarc, but I can't recommend that to anyone else really. In any case, SPN doesn't scale to things like this.
23:52:17<@JAA>We've archived a forum or two with many millions of posts before. :-)
23:52:57<Webuser230>It's not an emergency as of right now, since the old forum was custom made and full of security holes, so they took it down in order to migrate it to Xenforo
23:53:09<@JAA>Which forum is it?
23:53:15<Webuser230>https://forums.mangadex.org/
23:53:48<Webuser230>Well there's about 20 million posts now but only about 1.6 million threads
23:54:05<@JAA>Hmm, homepage says 6.5M posts in 530k threads.
23:54:21<@JAA>Post IDs go to 20M though, yeah.
23:54:35<@JAA>Suggesting that over two thirds of all posts got deleted. Fun.
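The "over two thirds" estimate can be checked from the two figures quoted above. A rough sketch, assuming post IDs are handed out sequentially starting near 1, so that the highest ID approximates the number of IDs ever issued (both counts are the rounded values from the log):

```python
visible_posts = 6_500_000     # post count shown on the homepage, per the log
highest_post_id = 20_000_000  # approximate highest post ID, per the log

# Share of issued IDs that no longer correspond to a visible post.
missing_fraction = 1 - visible_posts / highest_post_id

print(f"{missing_fraction:.1%}")  # 67.5%
```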
23:55:53<Webuser230>I was looking at the IDs in the URLs to make my guess
23:55:54qwertyasdfuiopghjkl quits [Remote host closed the connection]
23:55:54<Webuser230>Although that does make sense considering for every URL I properly archived about 3 of them would throw a 404
23:56:02<Webuser230>I think the problem with the old one was that if your post was invalid it would return an error but consume a post ID
23:56:08<Webuser230>because they really don't delete posts
23:56:28<@JAA>Let's take this to #archiveteam-bs since this isn't about IA anymore.
23:57:00<Webuser230>alright