| 00:09:55 | | tbc1887 quits [Read error: Connection reset by peer] |
| 00:19:10 | | BlueMaxima joins |
| 00:32:23 | | sonick (sonick) joins |
| 01:01:52 | | tzt quits [Remote host closed the connection] |
| 01:02:14 | | tzt (tzt) joins |
| 01:20:06 | | tbc1887 (tbc1887) joins |
| 01:44:25 | | Arcorann (Arcorann) joins |
| 01:46:52 | <@JAA> | My Spinrilla qwarc is running. At the moment, I'm only fetching API data, covers, and pages on songs and mixtapes. Audio comes later. I'm skipping user pages for the artists because they're very slow (but am collecting them in case there's time). |
| 01:47:11 | <@JAA> | The API data includes comments. |
| 01:54:49 | <@JAA> | Hmm, might want to start the audio retrieval right away though. There's only 34 hours left. |
| 02:14:26 | | superusercode joins |
| 02:29:52 | | tbc1887 quits [Read error: Connection reset by peer] |
| 02:40:58 | <@JAA> | There are about 2.9M songs to download, each with two different URLs, which return the same file. Has to be in the terabytes. |
| 02:42:07 | | superusercode is now authenticated as superusercode |
| 02:45:01 | | superusercode quits [Client Quit] |
| 02:45:06 | | superusercode (superusercode) joins |
| 02:46:37 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+198, /* 2023 */ Add Pokémon TCG Online): https://wiki.archiveteam.org/?diff=49742&oldid=49722 |
| 02:47:37 | <h2ibot> | 0KepOnline edited Spore (+8603, Fixed URL for asset view with ATOM): https://wiki.archiveteam.org/?diff=49743&oldid=49046 |
| 03:02:57 | | Hae quits [Ping timeout: 265 seconds] |
| 03:08:40 | | umgr036 joins |
| 03:09:33 | | umgr036 quits [Remote host closed the connection] |
| 03:09:47 | | umgr036 joins |
| 03:15:50 | | whoami (whoami) joins |
| 03:35:30 | | fullpwndotnet joins |
| 03:37:21 | <@JAA> | fullpwndotnet: What's being deleted exactly, and how can the files be found? |
| 03:37:44 | | Max|m1234 joins |
| 03:38:39 | <fullpwndotnet> | drivers. the files can be found by entering your model into the toshiba site |
| 03:38:47 | <fullpwndotnet> | ill grab the url give me a sec |
| 03:39:06 | | Hae (Hae) joins |
| 03:39:25 | <@JAA> | Yeah, I searched around briefly on your first message in #archiveteam and found like a dozen different official driver sites. |
| 03:40:41 | <fullpwndotnet> | https://support.dynabook.com/support/modelHome?freeText=321482&osId=3333637 this is an example |
| 03:40:59 | <fullpwndotnet> | dynabook and toshiba share a driver site |
| 03:41:24 | <pabs> | JAA: should we !abort the Spinrilla AB? or make it ignore the audio or something? |
| 03:43:04 | <fullpwndotnet> | i have an example of the deletion |
| 03:43:05 | <@JAA> | fullpwndotnet: Yeah, about what I expected, a shitty interface that's impossible to work with. How do we discover all files they have? |
| 03:43:26 | <fullpwndotnet> | https://support.dynabook.com/support/modelHome?freeText=1200013246&osId=3333728 on this only like 3? downloads work |
| 03:44:25 | <@JAA> | pabs: Not sure, I'm currently rapidly running out of disk space because IA uploads are sad. |
| 03:45:34 | <fullpwndotnet> | im currently having a look around if i can try and pull all the computer models |
| 03:46:16 | <fullpwndotnet> | I'm just poking around to find out |
| 03:48:02 | <fullpwndotnet> | before I forget, found this weird FTP server https://uk.dynabook.com/generic/general-new-ftp-and-software-guide-sheets/ |
| 03:48:07 | <fullpwndotnet> | might be useful |
| 03:48:51 | <nicolas17> | JAA: how much data you estimate? |
| 03:48:51 | <h2ibot> | Tech234a edited Running Archive Team Projects with Docker (-8, Correct Watchtower interval: five minutes->hour): https://wiki.archiveteam.org/?diff=49744&oldid=49586 |
| 03:49:32 | <fullpwndotnet> | JAA for more concern the ftp server has write access |
| 03:49:44 | <nicolas17> | *what* |
| 03:49:53 | <fullpwndotnet> | i know. |
| 03:49:55 | <@JAA> | nicolas17: I'm struggling to upload a couple gigabytes currently, and it isn't even Spinrilla data. The Spinrilla audio should be in the terabytes, see 02:40. |
| 03:50:30 | <fullpwndotnet> | the ftp server has a fair few computers |
| 03:50:41 | <fullpwndotnet> | not many. ill keep looking around |
| 03:51:04 | <fullpwndotnet> | aha! found it all |
| 03:52:13 | <fullpwndotnet> | if you view source on the page |
| 03:52:15 | | Hae quits [Remote host closed the connection] |
| 03:52:38 | <fullpwndotnet> | ctrl+f chuck in this |
| 03:52:38 | <fullpwndotnet> | var allProducts |
| 03:53:10 | <fullpwndotnet> | JAA hope this helps |
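The discovery trick above (view the page source and Ctrl+F for `var allProducts`) amounts to scraping an inline JavaScript array out of the HTML. A minimal Python sketch of that, run against a synthetic snippet since the real page markup isn't in the log; the regex and the assumption that the array parses as JSON are both guesses:

```python
import json
import re

def extract_all_products(html: str):
    """Pull the inline `var allProducts = [...]` array out of a page's source."""
    m = re.search(r"var\s+allProducts\s*=\s*(\[.*?\]);", html, re.DOTALL)
    if m is None:
        return None
    # Assumes the embedded array is valid JSON; the real site may need
    # loosening (single quotes, trailing commas, etc.).
    return json.loads(m.group(1))

# Synthetic stand-in for the real support-site page source.
sample = '<script>var allProducts = [{"id": 872584, "name": "Satellite 1955"}];</script>'
products = extract_all_products(sample)
```

Against the real site one would fetch the model page first and feed its body to this function.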
| 03:53:23 | <@JAA> | Ah fun |
| 03:53:44 | <fullpwndotnet> | its... a lot. |
| 03:54:09 | <@JAA> | Yeah |
| 03:54:15 | <@JAA> | The FTP is running through ArchiveBot now. |
| 03:54:46 | <andrew> | WBM supports FTP? |
| 03:54:55 | <andrew> | wait, WARC supports FTP!? |
| 03:55:17 | <fullpwndotnet> | sick! and for the json models, will you take care of that? |
| 03:55:27 | <nicolas17> | I'm trying to get the size of this ftp |
| 03:56:09 | <fullpwndotnet> | annoyingly, it does a full page reload anytime you select a machine or OS |
| 03:56:34 | <@JAA> | andrew: Well, technically, no. |
| 03:57:04 | <@JAA> | It only supports HTTP/1.1, not even 1.0 or 2. |
| 03:57:16 | <nicolas17> | Deployment_Files/Archive: 16GB |
| 03:57:21 | <nicolas17> | FTP latency suuuucks |
| 03:57:39 | <fullpwndotnet> | yikes |
| 03:58:15 | <@JAA> | I bet the AB job will crash, but it will grab something at least. |
| 03:58:29 | <fullpwndotnet> | fingers crossed |
| 03:59:08 | <@JAA> | `function filterDriversUpdatesResults() { var driversUpdatesJsonArr = eval([{ ...` |
| 03:59:11 | <@JAA> | This site... lol |
| 03:59:18 | <fullpwndotnet> | oh its awful |
| 03:59:29 | <nicolas17> | okay rclone is smarter at using parallel requests |
| 04:00:07 | <fullpwndotnet> | now toshiba and dynabook are confused on why there is gonna be so much traffic |
| 04:00:27 | <fullpwndotnet> | want a crazy idea? https://support.dynabook.com/support/contentDetail?contentType=DL&contentId=872584&cipherKey=&sor=undefined |
| 04:01:34 | <@JAA> | I don't have time to reverse-engineer all of that right now. |
| 04:01:53 | <nicolas17> | FTP 75GB 1700 files and still counting |
| 04:02:06 | <fullpwndotnet> | JAA fair enough |
| 04:03:41 | <fullpwndotnet> | and the numbering seems very random… like newer machines (Satellite 1955 = Pentium 4 probably around 2003) here even have lower ids than older machines (200CDS = Pentium 100MHz probably around 1995) |
| 04:06:16 | <nicolas17> | done indexing |
| 04:06:21 | <nicolas17> | FTP Total usage: 166.925G, Objects: 4027 |
| 04:06:53 | <fullpwndotnet> | not bad |
| 04:07:09 | <fullpwndotnet> | smaller than my tv archive |
| 04:10:41 | <fullpwndotnet> | i'm going to log off. we've got to our destination. thank you so much! |
| 04:11:27 | | fullpwndotnet quits [Remote host closed the connection] |
| 04:22:58 | | dumbgoy joins |
| 04:28:14 | <nicolas17> | with enough parallel threads, FTP downloads at 100-200Mbps |
| 04:28:38 | <nicolas17> | so far, less duplicate files than I expected |
| 05:04:11 | | sec^nd quits [Ping timeout: 245 seconds] |
| 05:05:35 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:11:19 | | sec^nd (second) joins |
| 05:22:03 | <nicolas17> | huh, we're doing 3Gbps, I missed that milestone :D |
| 05:23:12 | <nicolas17> | wrong channel :D |
| 05:24:59 | <nicolas17> | in more relevant news, I'll be done with the ftp in a few hours... but I'm questioning why I did it since it will take me longer to upload it anywhere than it would take for anyone else to download it from ftp |
| 06:08:34 | | sec^nd quits [Remote host closed the connection] |
| 06:08:54 | | sec^nd (second) joins |
| 06:18:23 | | jwoglom|m joins |
| 06:25:22 | | Island quits [Read error: Connection reset by peer] |
| 07:08:09 | | superkuh quits [Remote host closed the connection] |
| 07:08:09 | | AnotherIki quits [Remote host closed the connection] |
| 07:08:19 | | superkuh joins |
| 07:08:21 | | AnotherIki joins |
| 07:08:40 | | hitgrr8 joins |
| 07:37:54 | <@JAA> | So for Spinrilla, the two audio URLs for each song are https://api.spinrilla.com/tracks/2874608/original.mp3 (stream) and https://api.spinrilla.com/tracks/2874608/download. IDs go up to 2875785 as of a couple hours ago. I'd estimate the total size as very roughly 5 TB (assuming dedupe between the two URLs). The streams come from Cloudfront with expiring URLs. I didn't check for rate |
| 07:38:00 | <@JAA> | limits on these, but I haven't seen any limitations on other URLs; currently pulling the other data at 100+ req/s from a single IP. |
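The two URL patterns and the ID ceiling quoted above are enough to enumerate every candidate audio URL; a small sketch (the example range is arbitrary, only the patterns and the 2875785 ceiling come from the chat):

```python
# Spinrilla audio URL patterns from the chat; IDs ran up to 2875785 at the time.
STREAM = "https://api.spinrilla.com/tracks/{id}/original.mp3"
DOWNLOAD = "https://api.spinrilla.com/tracks/{id}/download"

def track_urls(first_id: int, last_id: int):
    """Yield both audio URLs for every track ID in the inclusive range."""
    for track_id in range(first_id, last_id + 1):
        yield STREAM.format(id=track_id)
        yield DOWNLOAD.format(id=track_id)

urls = list(track_urls(2874608, 2874610))
```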
| 07:39:10 | <@JAA> | qwarc is almost done with the mixtapes, then it'll start with songs. The latter are more IDs but fewer requests. Should easily finish in time, I think. |
| 07:40:42 | <@JAA> | Deadline is 2023-05-08 00:00 UTC. |
| 07:41:00 | <@JAA> | I basically won't be around between now and then, so I can't do anything about the audio. |
| 07:41:33 | | tsblock (tsblock) joins |
| 07:46:08 | <@JAA> | The AB job is grabbing some of it, but it obviously won't get anywhere near completion. There's already a good amount of original.mp3 in the WBM from a previous AB job a couple years ago. Prioritising the non-covered IDs original.mp3 would probably be a good idea. |
| 07:51:22 | | Billy549 quits [Quit: Goodbye!~] |
| 07:53:41 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+224, /* 2023 */ Add Spinrilla): https://wiki.archiveteam.org/?diff=49745&oldid=49742 |
| 07:54:41 | | Billy549 (Billy549) joins |
| 07:58:03 | | lexikiq quits [Client Quit] |
| 07:59:03 | <@JAA> | Song retrieval has started; not sure it will finish 'easily' in time, but it should just about work out assuming no problems occur. |
| 07:59:31 | <@JAA> | (Song metadata/comments retrieval, just to be clear.) |
| 08:03:27 | | Billy549 quits [Client Quit] |
| 08:03:45 | | Billy549 (Billy549) joins |
| 08:16:44 | | Billy549 leaves |
| 08:17:15 | | Billy549 (Billy549) joins |
| 08:24:47 | <h2ibot> | Wickedplayer494 uploaded File:Gfycat - 5-7-23.png: https://wiki.archiveteam.org/?title=File%3AGfycat%20-%205-7-23.png |
| 08:25:47 | <h2ibot> | Wickedplayer494 edited Gfycat (+49, Image and navbox): https://wiki.archiveteam.org/?diff=49747&oldid=47898 |
| 08:26:47 | <h2ibot> | Wickedplayer494 edited Enjin (+20, Navbox): https://wiki.archiveteam.org/?diff=49748&oldid=49734 |
| 08:28:47 | <h2ibot> | Wickedplayer494 edited Docker Hub (+20, Navbox): https://wiki.archiveteam.org/?diff=49749&oldid=49587 |
| 08:31:48 | <h2ibot> | Wickedplayer494 edited Pixiv (+20, Navbox): https://wiki.archiveteam.org/?diff=49750&oldid=49239 |
| 09:05:44 | | tsblock quits [Client Quit] |
| 09:21:07 | | Justin[home] joins |
| 09:21:07 | | Justin[home] is now authenticated as DopefishJustin |
| 09:22:31 | | DopefishJustin quits [Ping timeout: 252 seconds] |
| 10:04:54 | | Ruthalas5 quits [Ping timeout: 265 seconds] |
| 10:05:30 | | Ruthalas5 (Ruthalas) joins |
| 10:17:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:43:05 | | umgr036 quits [Ping timeout: 265 seconds] |
| 10:46:43 | | sec^nd quits [Remote host closed the connection] |
| 10:49:13 | | sec^nd (second) joins |
| 11:00:25 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 11:28:50 | | za3k quits [Ping timeout: 252 seconds] |
| 12:31:14 | | imer joins |
| 12:56:11 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 13:36:48 | | Letur74 joins |
| 13:37:18 | | Letur74 leaves |
| 13:37:45 | | Letur joins |
| 13:52:33 | | Arcorann quits [Ping timeout: 265 seconds] |
| 15:01:30 | | zhongfu quits [Ping timeout: 252 seconds] |
| 15:02:11 | | zhongfu (zhongfu) joins |
| 15:05:03 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 15:56:19 | | zhongfu quits [Ping timeout: 252 seconds] |
| 16:10:33 | | sonick quits [Client Quit] |
| 16:11:14 | | zhongfu (zhongfu) joins |
| 16:21:00 | <@JAA> | Since nobody took the bait, I'll briefly try to construct some original.mp3 URL lists to run through AB. |
| 16:21:09 | <@JAA> | Will at least get us something. |
| 16:27:15 | <imer> | JAA: I've got bandwidth/storage to grab some stuff, just at a loss of how to best go about this. Tried "dumb" wget, but that seemed way too slow |
| 16:28:37 | <imer> | attempted to use with your qwarc thing as well, but couldnt get that to run |
| 16:29:59 | <imer> | I assume you'd want warc archives? otherwise I could probably just grab the files no problem, new to all this though |
| 16:30:40 | <pokechu22> | Anything's better than nothing, but warc is ideal |
| 16:31:09 | <@JAA> | imer: Yeah, qwarc is definitely not user-friendly, especially given the tremendous amount of documentation. Not surprised you couldn't get it to work. |
| 16:31:17 | <@JAA> | I want to improve that, but time... |
| 16:32:06 | <@JAA> | wget unfortunately produces broken WARCs, and you'd definitely have to parallelise heavily anyway. |
| 16:32:14 | <imer> | my strategy would be to just brute force download all the links by id and grab those, not sure about what tool to use for the warc output though |
| 16:32:42 | <@JAA> | The good news is that our 2020 AB grab already archived 925k tracks. |
| 16:32:54 | <@JAA> | The current run probably did some on top of that, too. |
| 16:33:48 | <@JAA> | Same strategy I use almost always, yeah. |
| 16:34:23 | <@JAA> | qwarc could probably do it, but it's not perfect for large files. |
| 16:34:44 | <@JAA> | You could try wpull. |
| 16:34:53 | <@JAA> | Even concurrency should be fine on this one. |
| 16:35:08 | <imer> | Will check that out, thanks |
| 16:35:38 | <@JAA> | Quick maths: we need very roughly 320 MB/s to get it all in time. |
| 16:36:05 | <@JAA> | (2.9 million songs, 3 MB per song, 7.5 hours remaining) |
| 16:40:12 | <@JAA> | Sampling suggests about 37.5% of the IDs don't exist, so that lowers the requirements accordingly. |
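The back-of-the-envelope numbers above (2.9 million songs, ~3 MB per song, 7.5 hours remaining, ~37.5% dead IDs) can be checked directly:

```python
# Reproducing JAA's throughput estimate from the figures in the chat.
songs = 2_900_000
mb_per_song = 3
hours_left = 7.5

required = songs * mb_per_song / (hours_left * 3600)  # MB/s if every ID exists
live_fraction = 1 - 0.375                             # ~37.5% of IDs sampled as dead
required_live = required * live_fraction              # MB/s for the live IDs only
```

This gives roughly 322 MB/s for the full range, matching the "very roughly 320 MB/s" quote, and about 200 MB/s once the dead IDs are discounted.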
| 16:42:39 | <@JAA> | Looks like my metadata retrieval is just marginally too slow. ETA is something like 00:30 currently. |
| 16:46:58 | <imer> | JAA: "wpull only supports Python 3.4 to 3.6" looks like that still applies? |
| 16:47:07 | <@JAA> | Yes, it does, unfortunately. |
| 16:47:15 | <@JAA> | Lots of our software needs some love. :-/ |
| 16:47:30 | <@JAA> | (Time...) |
| 16:48:03 | <@JAA> | The current AB job has grabbed about 200k tracks so far. |
| 16:49:35 | <@JAA> | Roughly 33k of them haven't been grabbed before. |
| 16:51:06 | <@JAA> | These track IDs have been archived by either the 2020 job or the running one as of a few minutes ago: https://transfer.archivete.am/s8Fbq/spinrilla-track-audio-cdn-archivebot |
| 16:52:48 | <@JAA> | About half of the remaining 1.92M do not exist. So that's somewhat promising. |
| 16:56:34 | <@JAA> | imer: Generating/uploading lists of remaining URLs in random order right now. |
| 16:56:43 | <@JAA> | So we can avoid duplicating effort. |
| 16:57:43 | <imer> | sweet, still struggling to get wpull setup, best way of getting an old python version set up isn't obvious to me (tried to fix some errors for newer python but got stuck on sqlalchemy complaining) |
| 16:58:35 | <@JAA> | pyenv is my favourite way, else 'official' Docker images exist. |
| 16:59:02 | <@JAA> | All of ArchiveBot runs on pyenv these days. |
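A sketch of the pyenv route suggested above; it assumes the pyenv-virtualenv plugin is installed, and 3.6.15 is simply the last release of the 3.6 series (wpull only supports Python 3.4 to 3.6), not a version specified in the chat:

```shell
# Environment-setup sketch (assumed pyenv workflow): build an old
# interpreter for wpull, which only supports Python 3.4-3.6.
pyenv install 3.6.15                # last release of the 3.6 series
pyenv virtualenv 3.6.15 wpull-env  # needs the pyenv-virtualenv plugin
pyenv activate wpull-env
pip install wpull
```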
| 17:00:18 | <@JAA> | Here are all the lists, each with ~19k URLs: https://transfer.archivete.am/Ay4VU/spinrilla-track-audio-lists (simply removing the .zst extension from the URL returns the decompressed lists) |
| 17:00:24 | <@JAA> | I'll start feeding them into AB from the top. |
| 17:11:23 | <imer> | pyenv seems to have worked (with some massaging), so, basically I run "wpull -i url-list.txt --warc-file some-filename"? |
| 17:12:57 | <@JAA> | Oh right, I just remembered... You have wpull 2.0.3 I guess. Concurrency via CLI is broken there. :-| |
| 17:14:19 | <imer> | what version do I want? |
| 17:14:58 | <@JAA> | `wpull -i list --warc-file fileprefix --warc-max-size $((5*1024*1024*1024)) --delete-after -o fileprefix.log` should be it I think. |
| 17:15:34 | <@JAA> | Well, version 1.2.3 would fix the concurrency problem but has other issues. If you have the resources, it's probably easier to scale horizontally instead. |
| 17:21:59 | <myself> | is there a shard-and-gather infrastructure short of a full warrior job, for quick one-offs where you just need to hand out a crapton of wpull jobs in a hurry? or is there a skeleton warrior job that could take such assignments with minimal customization? |
| 17:22:49 | | spirit joins |
| 17:26:38 | <imer> | JAA: right, i'm running 70-99 in parallel, giving me ~500mbit/s on average. looks like I should be able to run some more cpu/memory wise |
| 17:31:31 | <imer> | 50-99 running now, website seems to be happy still, > 1gbit peaks depending on 404s from the looks of it |
| 17:34:31 | <@JAA> | imer: Ack, 0-10 are running through AB and approaching the half-way point, I'll throw in more as they finish. |
| 17:38:07 | <@JAA> | myself: No, though we can feed lists through #// (potentially slowly) if we don't care about feedback about success. |
| 17:58:11 | <imer> | rough sampling from logs says I'm about 1k items done in each list, seems a bit too slow if that's the pace: 19k per list, about 1k per half hour, so 8 more hours which is 2 too late |
| 17:58:38 | <imer> | 9 more hours* math is hard |
| 18:01:21 | <imer> | might abort and chop each list into more smaller parts so I can run more in parallel? |
| 18:05:55 | <@JAA> | Yeah, that sounds like a good idea. |
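Chopping each ~19k-line list into smaller parts so more wpull processes can run in parallel, as proposed above, is a simple slicing job; the list contents and chunk size here are toy values:

```python
def split_list(lines, chunk_size):
    """Split a URL list into chunks, one per parallel wpull process."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

# Toy stand-in for one ~19k-line URL list.
urls = [f"https://api.spinrilla.com/tracks/{i}/original.mp3" for i in range(100)]
chunks = split_list(urls, 30)
```

Each chunk would then be written to its own file and fed to a separate `wpull -i` invocation.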
| 18:23:03 | | Ruthalas5 (Ruthalas) joins |
| 18:46:03 | | spirit quits [Client Quit] |
| 18:52:58 | <imer> | rough calculation says I should be done with 50-99 in 5 hours, so *just* |
| 18:53:25 | | Guest50 quits [Client Quit] |
| 18:53:53 | | Guest50 joins |
| 18:55:18 | | Justin[home] is now known as DopefishJustin |
| 18:56:27 | <imer> | managing around 3.2k items/s with cpu pegged (just from peeking at logs, sampling random lines and checking where in the lists the ids are) |
| 18:56:43 | <imer> | a minute* not /s |
| 18:59:06 | <imer> | for reference, going 50 and up, so if 50 under is done can do 99 descending |
| 19:07:01 | <@JAA> | Up to 22 is done or running through AB now. |
| 19:32:16 | <imer> | Started 66 just now |
| 19:33:03 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
| 19:33:31 | | Craigle (Craigle) joins |
| 20:04:44 | <@JAA> | 30 in AB |
| 20:10:19 | <vokunal|m> | I'd like to make a list of local business sites and have them sent through archivebot. Is this allowed? And what would be the least annoying way for me to ask this? |
| 20:17:59 | <imer> | started on 75 |
| 20:19:37 | <@JAA> | vokunal|m: Not just allowed, encouraged. :-) Uploading a list to https://transfer.archivete.am/ and then asking in #archivebot or here (if it gets drowned over there) about it is the easiest route. How many sites are you envisioning? |
| 20:23:33 | <vokunal|m> | I have around 50 sites right now. Could be in the range of 100-200 if I keep digging. I got all the ones I know of off hand, and a bit of browsing google maps snagging the ones I know are small businesses |
| 20:32:33 | | Ivan226 leaves |
| 20:32:38 | | Ivan226 joins |
| 20:32:41 | <Ivan226> | someone get these for me thanks https://transfer.archivete.am/oG6b6/hsrwiki-alllinks.txt https://transfer.archivete.am/14Pod3/hsrwiki-newfiles-w430.txt |
| 20:37:51 | <@JAA> | vokunal|m: Yeah, that sounds fine. Most are probably tiny sites anyway. |
| 20:38:16 | <@JAA> | imer: 41 is running. |
| 20:39:09 | | hitgrr8 quits [Client Quit] |
| 20:40:26 | <@JAA> | All of this has massively slowed down my metadata grab. It probably won't finish in time. |
| 20:40:49 | <@JAA> | Completing under 1000 tracks per minute now, it was over 3k previously. |
| 20:44:47 | <Ryz> | I'm trying to push for more slots for JAA |
| 20:45:55 | <Ryz> | How many hours left? |
| 20:45:56 | <Ryz> | JAA? |
| 20:46:09 | <imer> | 80 here |
| 20:46:16 | <imer> | Ryz: bit over 3 |
| 20:54:40 | <@JAA> | Ryz: Slot freeing isn't needed, it's mostly limited by how fast pipelines can upload. |
| 20:54:59 | | icedice2 joins |
| 20:55:00 | <@JAA> | Or rather, I'm trying to make them almost fill up by the deadline. |
| 20:57:54 | | icedice quits [Ping timeout: 252 seconds] |
| 21:03:18 | | Island joins |
| 21:10:10 | <vokunal|m> | Could something like #Y be used for this? |
| 21:11:05 | | icedice2 quits [Client Quit] |
| 21:12:01 | | Guest50 quits [Ping timeout: 252 seconds] |
| 21:14:42 | <@JAA> | vokunal|m: Eh, kind of, it's not as simple as a recursive crawl in this case. But in any case, only one person knows how to wield that magic wand, and he's been busy this week. |
| 21:19:06 | <imer> | started on 86 |
| 21:19:10 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 21:22:59 | <nicolas17> | ok, finally can get on the computer today |
| 21:23:46 | | vitzli (vitzli) joins |
| 21:23:50 | <nicolas17> | what should I do with the tb2b data I downloaded? |
| 21:26:01 | <@JAA> | nicolas17: By default, AB recurses through the target site and retrieves one level of offsite links (including their page requisites). That's how the worthdoingbadly.com job ran as well. |
| 21:26:57 | <nicolas17> | that should be fine then |
| 21:27:23 | <@JAA> | It definitely grabbed the dump, yeah. |
| 21:27:46 | <nicolas17> | I'm running rclone sync --dry-run to see if I actually got all files off the tb2b ftp |
| 21:29:14 | <@JAA> | The AB job for the Toshiba FTP also completed, surprisingly without crashing. (wpull's FTP code is very unstable.) |
| 21:30:04 | <nicolas17> | latency is the enemy of FTP, but with multiple parallel downloads I managed more than 100mbps |
| 21:30:35 | <@JAA> | Does rclone also check whether you have files locally that aren't on the server anymore? |
| 21:30:40 | <@JAA> | That would be interesting. |
| 21:30:59 | <nicolas17> | yes |
| 21:31:15 | <nicolas17> | "Destination is updated to match source, including deleting files if necessary (except duplicate objects, see below). If you don't want to delete files from destination, use the copy command instead." |
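The rclone workflow described above (mirror the FTP, then use `sync --dry-run` to verify nothing is missing locally or gone upstream) can be sketched as follows; the remote name `toshiba-ftp` and the local path are placeholders, only the `--transfers=6` setting appears in the chat:

```shell
# Sketch of the mirror-and-verify workflow; remote/path names are placeholders.
rclone copy --transfers=6 toshiba-ftp:/ ./mirror            # download; never deletes local files
rclone sync --dry-run --transfers=6 toshiba-ftp:/ ./mirror  # report anything missing locally or extra
rclone size ./mirror                                        # total size and object count of the mirror
```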
| 21:32:03 | <Jake> | Any additional help needed on spinrilla? |
| 21:32:04 | <nicolas17> | "duplicate objects" here means different files with the same name, which is an oddity that I think only happens on Google Drive remotes |
| 21:33:07 | <@JAA> | imer: Can Jake help with your range? I'm up to 46 and will queue the remaining 3 of the lower half shortly. |
| 21:33:21 | <nicolas17> | "rclone sync --dry-run --transfers=6", which concluded "there was nothing to transfer" (so dry-run made no difference), took almost 7 minutes to get the recursive file listing |
| 21:33:40 | <nicolas17> | directory size: 167GB |
| 21:33:52 | <@JAA> | Nice |
| 21:34:03 | <nicolas17> | searching for duplicates now |
| 21:34:16 | <@JAA> | The AB job got 166.9 GiB, so pretty damn close. :-) |
| 21:34:27 | <@JAA> | (Assuming your 'GB' are actually GiB) |
| 21:34:58 | <nicolas17> | I used "du -h", so there's rounding, filesystem block alignment, etc :) |
| 21:35:06 | <imer> | JAA: of course, just let me know which ones to skip |
| 21:35:16 | <@JAA> | Jake: ^ |
| 21:35:30 | <@JAA> | Old-school task management! :-) |
| 21:35:36 | <Jake> | Haha :) |
| 21:35:53 | <@JAA> | This is how AT projects used to be coordinated in the early days from what I've heard. The person who shouts loudest gets the task. :-P |
| 21:36:15 | <imer> | got up to 88 running currently, preferably start at the back with 99 and lower Jake |
| 21:36:22 | <Jake> | 👍 |
| 21:40:58 | <nicolas17> | 1636 duplicate files (in 596 sets), occupying 54469 MB |
| 21:41:41 | <imer> | 90 just started |
| 21:41:52 | | CaldeiraG quits [Ping timeout: 265 seconds] |
| 21:42:36 | <nicolas17> | so this should compress/deduplicate to 110GB or so |
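The dedupe estimate can be reproduced from the figures just quoted, under the assumption that the 54469 MB of duplicates is essentially all reclaimable (i.e. it counts the redundant copies):

```python
# Rough dedupe estimate from the chat figures; treats the duplicate
# occupancy as fully reclaimable, which is an approximation.
total_gb = 167
duplicate_mb = 54469
deduped_gb = total_gb - duplicate_mb / 1000
```

This lands at roughly 112 GB, consistent with the "110GB or so" estimate.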
| 21:52:16 | <imer> | JAA: how am I getting the files to you/AT btw? (once its done, certainly no rush) |
| 21:54:40 | <Jake> | (I started at 99) |
| 22:03:01 | <@JAA> | 0 through 49 is completed or running in AB and should finish in time, I think. |
| 22:07:43 | <@JAA> | Saturating all the pipes and filling the disks, so probably can't do much more, but a part or two can probably still fit in there if needed. |
| 22:08:43 | | Ruthalas5 (Ruthalas) joins |
| 22:09:10 | <imer> | up to 95 started here, last chunks of 88 seem to be slowly finishing up |
| 22:09:19 | <imer> | hows the metadata doing? |
| 22:10:40 | <@JAA> | ETA 5 hours :-/ |
| 22:11:17 | <Jake> | Any speed limit on their end for the track downloads? |
| 22:12:04 | <@JAA> | Not from what I've seen. |
| 22:13:22 | <@JAA> | So 96 through 98 still open now? |
| 22:13:53 | <imer> | got 96/97 running now |
| 22:15:34 | <Jake> | Looks like we'll get the audio done before the time limit |
| 22:15:58 | <Jake> | I assume nothing we can do to speed up metadata? |
| 22:18:38 | <@JAA> | Throughput looks much better than an hour ago, but not sure we can change anything. It's just too slow on their side, yeah. |
| 22:19:22 | | lunik173 quits [Remote host closed the connection] |
| 22:19:34 | <imer> | started 98 now |
| 22:19:46 | <@JAA> | I do have all mixtape metadata, which also includes some track metadata. The only things missing will be single tracks that aren't part of a mixtape and track comments (for the tracks that aren't covered in time). |
| 22:20:11 | | Ivan226 leaves |
| 22:20:19 | | Ivan226 joins |
| 22:20:25 | <Jake> | 🎉 |
| 22:21:19 | <@JAA> | I only need a throughput of 110 req/s to get everything, come ooon... :-) |
| 22:21:51 | <@JAA> | Instead, I get 900 per minute. :-| |
| 22:22:19 | <@JAA> | Er no, 110 items/s, not reqs. |
| 22:23:13 | <@JAA> | Last AB job is projected to finish at 23:40. |
| 22:26:10 | <Jake> | 900 per minute sounds like a weird ratelimit or something |
| 22:26:43 | <@JAA> | It was just a random slowdown at that particular minute. |
| 22:27:19 | <Jake> | Haha, alright :) |
| 22:28:28 | <@JAA> | But even at its fastest, I just managed 11k req/mn, which corresponds to something like 3.5k i/mn, so still a factor two too slow. |
| 22:28:49 | <Jake> | :( |
| 22:29:43 | | tzt quits [Ping timeout: 265 seconds] |
| 22:30:47 | | tzt (tzt) joins |
| 22:37:03 | <@JAA> | AB job ETA is still 23:40, so that should be fine. |
| 22:37:29 | | lunik173 joins |
| 22:37:48 | <@JAA> | Maybe as the load drops from the downloads, I'll get a bit better rates on the metadata, but that won't finish. |
| 22:38:13 | <@JAA> | It'll only miss on the order of a couple hundred thousand tracks (of 2.9 million), so not too bad. |
| 22:38:19 | | Guest50 joins |
| 22:40:12 | | Guest50 quits [Client Quit] |
| 22:41:37 | | Guest50 joins |
| 22:43:16 | <imer> | slowly finishing up here, so should be starting to see an improvement if there will be one |
| 22:43:20 | | Ivan226 quits [Remote host closed the connection] |
| 22:43:37 | <imer> | there's also a chance they'll shut it down a bit later than announced, right? or is the time a given since its a legal thing? |
| 22:44:05 | | Ivan226 joins |
| 22:44:40 | | Guest50 quits [Client Quit] |
| 22:45:30 | | Guest50 joins |
| 22:45:34 | <@JAA> | Always possible of course, but yeah, I assume it's a legal thing. |
| 22:45:58 | <@JAA> | They edited their homepage sometime today to add a link to a countdown to the exact second, too. |
| 22:47:22 | <imer> | yeaah, safe to assume they'll be on top of it then |
| 22:48:09 | <Jake> | damn |
| 22:51:06 | | qw3rty_ joins |
| 22:51:08 | | BearFortress_ joins |
| 22:51:32 | | dumbgoy_ joins |
| 22:51:34 | | sarge (sarge) joins |
| 22:51:41 | | atphoenix_ (atphoenix) joins |
| 22:51:47 | | imer61 joins |
| 22:51:48 | | imer1 joins |
| 22:51:49 | | imer61 quits [Remote host closed the connection] |
| 22:52:23 | | Ivan22666 joins |
| 22:53:51 | <imer1> | uh-oh getting 400 bad request now |
| 22:53:53 | | UserH quits [Ping timeout: 254 seconds] |
| 22:54:14 | <@JAA> | Yeah, happens on some tracks, appears to be normal. |
| 22:54:22 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 22:54:26 | | Letur79 joins |
| 22:54:51 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 22:54:51 | | imer quits [Ping timeout: 265 seconds] |
| 22:54:51 | | Letur quits [Ping timeout: 265 seconds] |
| 22:54:51 | | Emitewiki quits [Ping timeout: 265 seconds] |
| 22:55:08 | | Letur79 is now known as Letur |
| 22:55:28 | <@JAA> | I saw a couple dozen 400s at the end on each of the AB jobs. |
| 22:55:30 | | Ivan22666 is now known as Ivan226 |
| 22:55:50 | <imer1> | pretty much done here and its all i'm seeing now |
| 22:55:55 | <imer1> | will wpull retry those a few times? that'd explain that then |
| 22:56:18 | | qw3rty quits [Ping timeout: 265 seconds] |
| 22:56:18 | | BearFortress quits [Ping timeout: 265 seconds] |
| 22:56:18 | | ]SaRgE[ quits [Ping timeout: 265 seconds] |
| 22:56:18 | | atphoenix quits [Ping timeout: 265 seconds] |
| 22:57:04 | <@JAA> | Yeah, it will. I don't remember what the default is. |
| 22:58:07 | <@JAA> | 20, apparently. AB uses 3. So I guess you'll see a couple hundred of them per chunk instead. |
| 22:59:54 | <imer1> | looks like my total is 2.2 TB then, all wrapped up - we weren't far off with the 5TB guess |
| 23:00:06 | | imer1 is now known as imer |
| 23:01:01 | <@JAA> | Almost exactly what I expected, my estimate for your set was 2.3 TiB. |
| 23:01:37 | <@JAA> | The 5 TB guess was for all tracks though, not only the non-covered ones. The previous AB jobs grabbed another 2.6 TiB or so. |
| 23:02:06 | | Matthww1 quits [Quit: Ping timeout (120 seconds)] |
| 23:02:20 | | Matthww1 joins |
| 23:08:49 | <Jake> | 99 is completed. |
| 23:10:22 | | lexikiq joins |
| 23:20:09 | | BlueMaxima joins |
| 23:24:56 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 23:27:05 | <@JAA> | AB chunks 0-49 are done, now it just needs to upload its 1.1 TiB backlog. |
| 23:27:43 | | nicolas17 joins |
| 23:27:49 | | @JAA makes a note here: H U G E S U C C E S S |
| 23:28:45 | <@JAA> | imer, Jake: Will sort out the data transfer tomorrow. |
| 23:29:10 | | Matthww1 quits [Client Quit] |
| 23:29:26 | | Matthww1 joins |
| 23:30:29 | | vxbinaca joins |
| 23:31:14 | <imer> | sweet, I'm fine holding onto the data for a while so whenever works, I'll make sure to poke my head into irc tomorrow at some point |
| 23:32:56 | | vxbinaca leaves |
| 23:33:34 | <Jake> | Sounds good! Glad we got all the tracks! |
| 23:34:54 | <imer> | I'll be heading off for the day - see ya tomorrow |
| 23:34:58 | | imer quits [Remote host closed the connection] |
| 23:35:34 | <@JAA> | Metadata is still going more brt-t-t-t than brrr. :-( |
| 23:36:12 | <nicolas17> | interesting, that toshiba FTP got deduplicated/compressed down to 64GB |