01:37:44 | <audrooku|m> | `https://web.archive.org/web/20160121141614/https://w.soundcloud.com/player/?visual=true&url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F198974918&%23038;show_artwork=true&%23038;maxwidth=750&%23038;maxheight=1000` |
01:37:47 | <audrooku|m> | 520 response, that's a new one |
01:52:43 | <pokechu22> | I think that's pretty common from cloudflare |
02:21:08 | <audrooku|m> | This isn't cloudflare |
02:21:21 | <audrooku|m> | To my knowledge |
02:23:47 | <@JAA> | &%23038; |
02:23:56 | <@JAA> | I guess that's one way of encoding an &. |
02:24:13 | <@JAA> | Anyway, I'm getting a 404. |
03:01:02 | <audrooku|m> | > &%23038; |
03:01:02 | <audrooku|m> | Jesus christ |
03:27:16 | <audrooku|m> | another "payload error" url `https://web.archive.org/web/20161130001722/https://soundcloud.com/oembed?url=https%3A%2F%2Fsoundcloud.com%2Fatlanticrecords&format=xml` |
03:27:54 | <@JAA> | > Secure Connection Failed |
03:28:17 | <@JAA> | On the iframe, to be clear. |
03:32:05 | | AlsoHP_Archivist joins |
03:34:39 | | HP_Archivist quits [Ping timeout: 272 seconds] |
04:48:07 | | tzt quits [Ping timeout: 272 seconds] |
04:49:43 | | HP_Archivist (HP_Archivist) joins |
04:51:17 | | AlsoHP_Archivist quits [Ping timeout: 272 seconds] |
05:05:51 | | HP_Archivist quits [Ping timeout: 272 seconds] |
05:25:42 | | HP_Archivist (HP_Archivist) joins |
05:33:52 | | AlsoHP_Archivist joins |
05:36:15 | | HP_Archivist quits [Ping timeout: 272 seconds] |
05:36:34 | | DogsRNice quits [Read error: Connection reset by peer] |
05:58:23 | | HP_Archivist (HP_Archivist) joins |
06:00:19 | | AlsoHP_Archivist quits [Ping timeout: 272 seconds] |
06:04:39 | | AlsoHP_Archivist joins |
06:05:23 | | HP_Archivist quits [Ping timeout: 272 seconds] |
06:10:21 | | pabs quits [Client Quit] |
06:11:05 | | AlsoHP_Archivist quits [Ping timeout: 272 seconds] |
06:15:45 | | pabs (pabs) joins |
06:29:13 | | HP_Archivist (HP_Archivist) joins |
06:32:31 | | Doranwen quits [Remote host closed the connection] |
06:32:57 | | Doranwen (Doranwen) joins |
12:25:53 | | Iki joins |
13:01:50 | | Arcorann quits [Ping timeout: 240 seconds] |
17:23:35 | | AlsoHP_Archivist joins |
17:24:50 | | HP_Archivist quits [Ping timeout: 240 seconds] |
19:09:49 | | DogsRNice joins |
20:17:25 | | tech234a quits [Quit: Connection closed for inactivity] |
21:12:28 | | AlsoHP_Archivist quits [Read error: Connection reset by peer] |
22:25:39 | | Arcorann (Arcorann) joins |
22:38:23 | <nicolas17> | 14.36s/MiB |
22:38:25 | <nicolas17> | x_x |
22:39:07 | <nicolas17> | I thought I'd be bottlenecked by samsung, not by IA |
22:40:35 | <nicolas17> | I'm doing multiple simultaneous uploads and the *total* is 700KB/s |
23:03:41 | | Webuser230 joins |
23:09:06 | <Webuser230> | Has anyone else seen the new Wayback Machine limits? "This host has been already captured 80,118.0 times today. Please try again tomorrow. Please email us at "info@archive.org" if you would like to discuss this more." |
23:11:45 | <@JAA> | YouTube? |
23:12:38 | <@JAA> | If so, that's been happening for at least almost a year. |
23:14:27 | <pokechu22> | It resets at midnight UTC (which is in 45 minutes) |
23:14:59 | <Webuser230> | YouTube for now. Odd how I only got it now, because I've been archiving YouTube videos for a while and it's never popped up before |
23:15:26 | <pokechu22> | Yeah, it generally happens in the last few hours of the (UTC) day |
23:17:11 | <Webuser230> | On a side note, I wanted to bring up how they keep on lowering the daily max archive amount. Back in like October the limit was 100k/day/user and now it's as low as 50k/day/user. They also reduced the max archives for a single URL from 10/day to I think 6/day for now |
23:35:51 | <@JAA> | I have no idea about the former, but the latter varies all the time. It was even 1 for some time. |
23:39:21 | <Webuser230> | I use the Google Sheets archiver to mass archive a large forum over a large timespan (outlinks enabled, each permalink of https://forum.com/post/<postid> generates 50 outlinks, and there are about 16 million posts), maybe they didn't like this |
23:44:47 | <@JAA> | Yeah, SPN is probably not the best tool for that anyway. |
23:45:03 | | katia (katia) joins |
23:46:15 | <katia> | what tool do you use for uploading files to s3ia? is rclone fine? trying to delete stuff via it because i copied it at the wrong path seems to not do anything - or maybe i just haven't waited enough |
23:47:52 | <Webuser230> | JAA the thing is that the forum went down once (and a lot of old links to the previous version still don't work), so I want to archive it, but it's very huge |
23:48:02 | <Webuser230> | Like I said 16 million posts |
23:48:30 | <Webuser230> | The problem is that the forum software redirects /post/postid to /thread/threadid/page-pagenumber#postid |
23:48:39 | <Webuser230> | well to summarize it's XenForo |
23:48:42 | <@JAA> | katia: The ia CLI is probably your friend. I use either that (mostly via preexisting automation) or my own uploading script. |
23:49:50 | <katia> | JAA, thanks |
23:51:20 | <fireonlive> | katia: any fileop takes time while it makes its way through the queue |
23:51:22 | <@JAA> | Webuser230: Sounds like a potential candidate for ArchiveBot, although 16 million posts is at the upper end of feasibility there. There are also other tools. If it has to be grabbed fast, I'd do it with qwarc, but I can't recommend that to anyone else really. In any case, SPN doesn't scale to things like this. |
23:52:17 | <@JAA> | We've archived a forum or two with many millions of posts before. :-) |
23:52:57 | <Webuser230> | It's not an emergency as of right now, since the old forum was custom made and full of security holes, so they took it down in order to migrate it to XenForo |
23:53:09 | <@JAA> | Which forum is it? |
23:53:15 | <Webuser230> | https://forums.mangadex.org/ |
23:53:48 | <Webuser230> | Well there's about 20 million posts now but only about 1.6 million threads |
23:54:05 | <@JAA> | Hmm, homepage says 6.5M posts in 530k threads. |
23:54:21 | <@JAA> | Post IDs go to 20M though, yeah. |
23:54:35 | <@JAA> | Suggesting that over two thirds of all posts got deleted. Fun. |
23:55:53 | <Webuser230> | I was looking at the IDs in the URLs to make my guess |
23:55:54 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
23:55:54 | <Webuser230> | Although that does make sense considering for every URL I properly archived about 3 of them would throw a 404 |
23:56:02 | <Webuser230> | I think the problem with the old one was that if your post was invalid it would return an error but consume a post ID |
23:56:08 | <Webuser230> | because they really don't delete posts |
23:56:28 | <@JAA> | Let's take this to #archiveteam-bs since this isn't about IA anymore. |
23:57:00 | <Webuser230> | alright |