| 00:09:55 | | tbc1887 quits [Read error: Connection reset by peer] |
| 00:19:10 | | BlueMaxima joins |
| 00:32:23 | | sonick (sonick) joins |
| 01:01:52 | | tzt quits [Remote host closed the connection] |
| 01:02:14 | | tzt (tzt) joins |
| 01:20:06 | | tbc1887 (tbc1887) joins |
| 01:44:25 | | Arcorann (Arcorann) joins |
| 01:46:52 | <@JAA> | My Spinrilla qwarc is running. At the moment, I'm only fetching API data, covers, and pages on songs and mixtapes. Audio comes later. I'm skipping user pages for the artists because they're very slow (but am collecting them in case there's time). |
| 01:47:11 | <@JAA> | The API data includes comments. |
| 01:54:49 | <@JAA> | Hmm, might want to start the audio retrieval right away though. There's only 34 hours left. |
| 02:14:26 | | superusercode joins |
| 02:29:52 | | tbc1887 quits [Read error: Connection reset by peer] |
| 02:40:58 | <@JAA> | There are about 2.9M songs to download, each with two different URLs, which return the same file. Has to be in the terabytes. |
| 02:42:07 | | superusercode is now authenticated as superusercode |
| 02:45:01 | | superusercode quits [Client Quit] |
| 02:45:06 | | superusercode (superusercode) joins |
| 02:46:37 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+198, /* 2023 */ Add Pokémon TCG Online): https://wiki.archiveteam.org/?diff=49742&oldid=49722 |
| 02:47:37 | <h2ibot> | 0KepOnline edited Spore (+8603, Fixed URL for asset view with ATOM): https://wiki.archiveteam.org/?diff=49743&oldid=49046 |
| 03:02:57 | | Hae quits [Ping timeout: 265 seconds] |
| 03:08:40 | | umgr036 joins |
| 03:09:33 | | umgr036 quits [Remote host closed the connection] |
| 03:09:47 | | umgr036 joins |
| 03:15:50 | | whoami (whoami) joins |
| 03:35:30 | | fullpwndotnet joins |
| 03:37:21 | <@JAA> | fullpwndotnet: What's being deleted exactly, and how can the files be found? |
| 03:37:44 | | Max|m1234 joins |
| 03:38:39 | <fullpwndotnet> | drivers. the files can be found by entering your model into the toshiba site |
| 03:38:47 | <fullpwndotnet> | ill grab the url give me a sec |
| 03:39:06 | | Hae (Hae) joins |
| 03:39:25 | <@JAA> | Yeah, I searched around briefly on your first message in #archiveteam and found like a dozen different official driver sites. |
| 03:40:41 | <fullpwndotnet> | https://support.dynabook.com/support/modelHome?freeText=321482&osId=3333637 this is an example |
| 03:40:59 | <fullpwndotnet> | dynabook and toshiba share a driver site |
| 03:41:24 | <pabs> | JAA: should we !abort the Spinrilla AB? or make it ignore the audio or something? |
| 03:43:04 | <fullpwndotnet> | i have an example of the deletion |
| 03:43:05 | <@JAA> | fullpwndotnet: Yeah, about what I expected, a shitty interface that's impossible to work with. How do we discover all files they have? |
| 03:43:26 | <fullpwndotnet> | https://support.dynabook.com/support/modelHome?freeText=1200013246&osId=3333728 on this only like 3? downloads work |
| 03:44:25 | <@JAA> | pabs: Not sure, I'm currently rapidly running out of disk space because IA uploads are sad. |
| 03:45:34 | <fullpwndotnet> | im currently having a look around if i can try and pull all the computer models |
| 03:46:16 | <fullpwndotnet> | I'm just poking around to find out |
| 03:48:02 | <fullpwndotnet> | before I forget, found this weird FTP server https://uk.dynabook.com/generic/general-new-ftp-and-software-guide-sheets/ |
| 03:48:07 | <fullpwndotnet> | might be useful |
| 03:48:51 | <nicolas17> | JAA: how much data you estimate? |
| 03:48:51 | <h2ibot> | Tech234a edited Running Archive Team Projects with Docker (-8, Correct Watchtower interval: five minutes->hour): https://wiki.archiveteam.org/?diff=49744&oldid=49586 |
| 03:49:32 | <fullpwndotnet> | JAA for more concern the ftp server has write access |
| 03:49:44 | <nicolas17> | *what* |
| 03:49:53 | <fullpwndotnet> | i know. |
| 03:49:55 | <@JAA> | nicolas17: I'm struggling to upload a couple gigabytes currently, and it isn't even Spinrilla data. The Spinrilla audio should be in the terabytes, see 02:40. |
| 03:50:30 | <fullpwndotnet> | the ftp server has a fair few computers |
| 03:50:41 | <fullpwndotnet> | not many. ill keep looking around |
| 03:51:04 | <fullpwndotnet> | aha! found it all |
| 03:52:13 | <fullpwndotnet> | if you view source on the page |
| 03:52:15 | | Hae quits [Remote host closed the connection] |
| 03:52:38 | <fullpwndotnet> | ctrl+f chuck in this |
| 03:52:38 | <fullpwndotnet> | var allProducts |
| 03:53:10 | <fullpwndotnet> | JAA hope this helps |
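The discovery trick above (view the page source and Ctrl+F for `var allProducts`) amounts to scraping an inline JavaScript array out of the HTML. A minimal Python sketch of that, run against a synthetic snippet since the real page markup isn't in the log; the regex and the assumption that the array parses as JSON are both guesses:

```python
import json
import re

def extract_all_products(html: str):
    """Pull the inline `var allProducts = [...]` array out of a page's source."""
    m = re.search(r"var\s+allProducts\s*=\s*(\[.*?\]);", html, re.DOTALL)
    if m is None:
        return None
    # Assumes the embedded array is valid JSON; the real site may need
    # loosening (single quotes, trailing commas, etc.).
    return json.loads(m.group(1))

# Synthetic stand-in for the real support-site page source.
sample = '<script>var allProducts = [{"id": 872584, "name": "Satellite 1955"}];</script>'
products = extract_all_products(sample)
```

Against the real site one would fetch the model page first and feed its body to this function.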
| 03:53:23 | <@JAA> | Ah fun |
| 03:53:44 | <fullpwndotnet> | its... a lot. |
| 03:54:09 | <@JAA> | Yeah |
| 03:54:15 | <@JAA> | The FTP is running through ArchiveBot now. |
| 03:54:46 | <andrew> | WBM supports FTP? |
| 03:54:55 | <andrew> | wait, WARC supports FTP!? |
| 03:55:17 | <fullpwndotnet> | sick! and for the json models, will you take care of that? |
| 03:55:27 | <nicolas17> | I'm trying to get the size of this ftp |
| 03:56:09 | <fullpwndotnet> | annoyingly, it does a full page reload anytime you select a machine or OS |
| 03:56:34 | <@JAA> | andrew: Well, technically, no. |
| 03:57:04 | <@JAA> | It only supports HTTP/1.1, not even 1.0 or 2. |
| 03:57:16 | <nicolas17> | Deployment_Files/Archive: 16GB |
| 03:57:21 | <nicolas17> | FTP latency suuuucks |
| 03:57:39 | <fullpwndotnet> | yikes |
| 03:58:15 | <@JAA> | I bet the AB job will crash, but it will grab something at least. |
| 03:58:29 | <fullpwndotnet> | fingers crossed |
| 03:59:08 | <@JAA> | `function filterDriversUpdatesResults() { var driversUpdatesJsonArr = eval([{ ...` |
| 03:59:11 | <@JAA> | This site... lol |
| 03:59:18 | <fullpwndotnet> | oh its awful |
| 03:59:29 | <nicolas17> | okay rclone is smarter at using parallel requests |
| 04:00:07 | <fullpwndotnet> | now toshiba and dynabook are confused on why there is gonna be so much traffic |
| 04:00:27 | <fullpwndotnet> | want a crazy idea? https://support.dynabook.com/support/contentDetail?contentType=DL&contentId=872584&cipherKey=&sor=undefined |
| 04:01:34 | <@JAA> | I don't have time to reverse-engineer all of that right now. |
| 04:01:53 | <nicolas17> | FTP 75GB 1700 files and still counting |
| 04:02:06 | <fullpwndotnet> | JAA fair enough |
| 04:03:41 | <fullpwndotnet> | and the numbering seems very random… like newer machines (Satellite 1955 = Pentium 4 probably around 2003) here even have lower ids than older machines (200CDS = Pentium 100MHz probably around 1995) |
| 04:06:16 | <nicolas17> | done indexing |
| 04:06:21 | <nicolas17> | FTP Total usage: 166.925G, Objects: 4027 |
| 04:06:53 | <fullpwndotnet> | not bad |
| 04:07:09 | <fullpwndotnet> | smaller than my tv archive |
| 04:10:41 | <fullpwndotnet> | i'm going to log off. we've got to our destination. thank you so much! |
| 04:11:27 | | fullpwndotnet quits [Remote host closed the connection] |
| 04:22:58 | | dumbgoy joins |
| 04:28:14 | <nicolas17> | with enough parallel threads, FTP downloads at 100-200Mbps |
| 04:28:38 | <nicolas17> | so far, less duplicate files than I expected |
| 05:04:11 | | sec^nd quits [Ping timeout: 245 seconds] |
| 05:05:35 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:11:19 | | sec^nd (second) joins |
| 05:22:03 | <nicolas17> | huh, we're doing 3Gbps, I missed that milestone :D |
| 05:23:12 | <nicolas17> | wrong channel :D |
| 05:24:59 | <nicolas17> | in more relevant news, I'll be done with the ftp in a few hours... but I'm questioning why I did it since it will take me longer to upload it anywhere than it would take for anyone else to download it from ftp |
| 06:08:34 | | sec^nd quits [Remote host closed the connection] |
| 06:08:54 | | sec^nd (second) joins |
| 06:18:23 | | jwoglom|m joins |
| 06:25:22 | | Island quits [Read error: Connection reset by peer] |
| 07:08:09 | | superkuh quits [Remote host closed the connection] |
| 07:08:09 | | AnotherIki quits [Remote host closed the connection] |
| 07:08:19 | | superkuh joins |
| 07:08:21 | | AnotherIki joins |
| 07:08:40 | | hitgrr8 joins |
| 07:37:54 | <@JAA> | So for Spinrilla, the two audio URLs for each song are https://api.spinrilla.com/tracks/2874608/original.mp3 (stream) and https://api.spinrilla.com/tracks/2874608/download. IDs go up to 2875785 as of a couple hours ago. I'd estimate the total size as very roughly 5 TB (assuming dedupe between the two URLs). The streams come from Cloudfront with expiring URLs. I didn't check for rate |
| 07:38:00 | <@JAA> | limits on these, but I haven't seen any limitations on other URLs; currently pulling the other data at 100+ req/s from a single IP. |
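The two URL patterns and the ID ceiling quoted above are enough to enumerate every candidate audio URL; a small sketch (the example range is arbitrary, only the patterns and the 2875785 ceiling come from the chat):

```python
# Spinrilla audio URL patterns from the chat; IDs ran up to 2875785 at the time.
STREAM = "https://api.spinrilla.com/tracks/{id}/original.mp3"
DOWNLOAD = "https://api.spinrilla.com/tracks/{id}/download"

def track_urls(first_id: int, last_id: int):
    """Yield both audio URLs for every track ID in the inclusive range."""
    for track_id in range(first_id, last_id + 1):
        yield STREAM.format(id=track_id)
        yield DOWNLOAD.format(id=track_id)

urls = list(track_urls(2874608, 2874610))
```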
| 07:39:10 | <@JAA> | qwarc is almost done with the mixtapes, then it'll start with songs. The latter are more IDs but fewer requests. Should easily finish in time, I think. |
| 07:40:42 | <@JAA> | Deadline is 2023-05-08 00:00 UTC. |
| 07:41:00 | <@JAA> | I basically won't be around between now and then, so I can't do anything about the audio. |
| 07:41:33 | | tsblock (tsblock) joins |
| 07:46:08 | <@JAA> | The AB job is grabbing some of it, but it obviously won't get anywhere near completion. There's already a good amount of original.mp3 in the WBM from a previous AB job a couple years ago. Prioritising the non-covered IDs original.mp3 would probably be a good idea. |
| 07:51:22 | | Billy549 quits [Quit: Goodbye!~] |
| 07:53:41 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+224, /* 2023 */ Add Spinrilla): https://wiki.archiveteam.org/?diff=49745&oldid=49742 |
| 07:54:41 | | Billy549 (Billy549) joins |
| 07:58:03 | | lexikiq quits [Client Quit] |
| 07:59:03 | <@JAA> | Song retrieval has started; not sure it will finish 'easily' in time, but it should just about work out assuming no problems occur. |
| 07:59:31 | <@JAA> | (Song metadata/comments retrieval, just to be clear.) |
| 08:03:27 | | Billy549 quits [Client Quit] |
| 08:03:45 | | Billy549 (Billy549) joins |
| 08:16:44 | | Billy549 leaves |
| 08:17:15 | | Billy549 (Billy549) joins |
| 08:24:47 | <h2ibot> | Wickedplayer494 uploaded File:Gfycat - 5-7-23.png: https://wiki.archiveteam.org/?title=File%3AGfycat%20-%205-7-23.png |
| 08:25:47 | <h2ibot> | Wickedplayer494 edited Gfycat (+49, Image and navbox): https://wiki.archiveteam.org/?diff=49747&oldid=47898 |
| 08:26:47 | <h2ibot> | Wickedplayer494 edited Enjin (+20, Navbox): https://wiki.archiveteam.org/?diff=49748&oldid=49734 |
| 08:28:47 | <h2ibot> | Wickedplayer494 edited Docker Hub (+20, Navbox): https://wiki.archiveteam.org/?diff=49749&oldid=49587 |
| 08:31:48 | <h2ibot> | Wickedplayer494 edited Pixiv (+20, Navbox): https://wiki.archiveteam.org/?diff=49750&oldid=49239 |
| 09:05:44 | | tsblock quits [Client Quit] |
| 09:21:07 | | Justin[home] joins |
| 09:21:07 | | Justin[home] is now authenticated as DopefishJustin |
| 09:22:31 | | DopefishJustin quits [Ping timeout: 252 seconds] |
| 10:04:54 | | Ruthalas5 quits [Ping timeout: 265 seconds] |
| 10:05:30 | | Ruthalas5 (Ruthalas) joins |
| 10:17:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:43:05 | | umgr036 quits [Ping timeout: 265 seconds] |
| 10:46:43 | | sec^nd quits [Remote host closed the connection] |
| 10:49:13 | | sec^nd (second) joins |
| 11:00:25 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 11:28:50 | | za3k quits [Ping timeout: 252 seconds] |
| 12:31:14 | | imer joins |
| 12:56:11 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 13:36:48 | | Letur74 joins |
| 13:37:18 | | Letur74 leaves |
| 13:37:45 | | Letur joins |
| 13:52:33 | | Arcorann quits [Ping timeout: 265 seconds] |
| 15:01:30 | | zhongfu quits [Ping timeout: 252 seconds] |
| 15:02:11 | | zhongfu (zhongfu) joins |
| 15:05:03 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 15:56:19 | | zhongfu quits [Ping timeout: 252 seconds] |
| 16:10:33 | | sonick quits [Client Quit] |
| 16:11:14 | | zhongfu (zhongfu) joins |
| 16:21:00 | <@JAA> | Since nobody took the bait, I'll briefly try to construct some original.mp3 URL lists to run through AB. |
| 16:21:09 | <@JAA> | Will at least get us something. |
| 16:27:15 | <imer> | JAA: I've got bandwidth/storage to grab some stuff, just at a loss of how to best go about this. Tried "dumb" wget, but that seemed way too slow |
| 16:28:37 | <imer> | attempted to use with your qwarc thing as well, but couldnt get that to run |
| 16:29:59 | <imer> | I assume you'd want warc archives? otherwise I could probably just grab the files no problem, new to all this though |
| 16:30:40 | <pokechu22> | Anything's better than nothing, but warc is ideal |
| 16:31:09 | <@JAA> | imer: Yeah, qwarc is definitely not user-friendly, especially given the tremendous amount of documentation. Not surprised you couldn't get it to work. |
| 16:31:17 | <@JAA> | I want to improve that, but time... |
| 16:32:06 | <@JAA> | wget unfortunately produces broken WARCs, and you'd definitely have to parallelise heavily anyway. |
| 16:32:14 | <imer> | my strategy would be to just brute force download all the links by id and grab those, not sure about what tool to use for the warc output though |
| 16:32:42 | <@JAA> | The good news is that our 2020 AB grab already archived 925k tracks. |
| 16:32:54 | <@JAA> | The current run probably did some on top of that, too. |
| 16:33:48 | <@JAA> | Same strategy I use almost always, yeah. |
| 16:34:23 | <@JAA> | qwarc could probably do it, but it's not perfect for large files. |
| 16:34:44 | <@JAA> | You could try wpull. |
| 16:34:53 | <@JAA> | Even concurrency should be fine on this one. |
| 16:35:08 | <imer> | Will check that out, thanks |
| 16:35:38 | <@JAA> | Quick maths: we need very roughly 320 MB/s to get it all in time. |
| 16:36:05 | <@JAA> | (2.9 million songs, 3 MB per song, 7.5 hours remaining) |
| 16:40:12 | <@JAA> | Sampling suggests about 37.5% of the IDs don't exist, so that lowers the requirements accordingly. |
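The back-of-the-envelope numbers above (2.9 million songs, ~3 MB per song, 7.5 hours remaining, ~37.5% dead IDs) can be checked directly:

```python
# Reproducing JAA's throughput estimate from the figures in the chat.
songs = 2_900_000
mb_per_song = 3
hours_left = 7.5

required = songs * mb_per_song / (hours_left * 3600)  # MB/s if every ID exists
live_fraction = 1 - 0.375                             # ~37.5% of IDs sampled as dead
required_live = required * live_fraction              # MB/s for the live IDs only
```

This gives roughly 322 MB/s for the full range, matching the "very roughly 320 MB/s" quote, and about 200 MB/s once the dead IDs are discounted.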
| 16:42:39 | <@JAA> | Looks like my metadata retrieval is just marginally too slow. ETA is something like 00:30 currently. |
| 16:46:58 | <imer> | JAA: "wpull only supports Python 3.4 to 3.6" looks like that still applies? |
| 16:47:07 | <@JAA> | Yes, it does, unfortunately. |
| 16:47:15 | <@JAA> | Lots of our software needs some love. :-/ |
| 16:47:30 | <@JAA> | (Time...) |
| 16:48:03 | <@JAA> | The current AB job has grabbed about 200k tracks so far. |
| 16:49:35 | <@JAA> | Roughly 33k of them haven't been grabbed before. |
| 16:51:06 | <@JAA> | These track IDs have been archived by either the 2020 job or the running one as of a few minutes ago: https://transfer.archivete.am/s8Fbq/spinrilla-track-audio-cdn-archivebot |
| 16:52:48 | <@JAA> | About half of the remaining 1.92M do not exist. So that's somewhat promising. |
| 16:56:34 | <@JAA> | imer: Generating/uploading lists of remaining URLs in random order right now. |
| 16:56:43 | <@JAA> | So we can avoid duplicating effort. |
| 16:57:43 | <imer> | sweet, still struggling to get wpull setup, best way of getting an old python version set up isn't obvious to me (tried to fix some errors for newer python but got stuck on sqlalchemy complaining) |
| 16:58:35 | <@JAA> | pyenv is my favourite way, else 'official' Docker images exist. |
| 16:59:02 | <@JAA> | All of ArchiveBot runs on pyenv these days. |
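A sketch of the pyenv route suggested above; it assumes the pyenv-virtualenv plugin is installed, and 3.6.15 is simply the last release of the 3.6 series (wpull only supports Python 3.4 to 3.6), not a version specified in the chat:

```shell
# Environment-setup sketch (assumed pyenv workflow): build an old
# interpreter for wpull, which only supports Python 3.4-3.6.
pyenv install 3.6.15                # last release of the 3.6 series
pyenv virtualenv 3.6.15 wpull-env  # needs the pyenv-virtualenv plugin
pyenv activate wpull-env
pip install wpull
```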
| 17:00:18 | <@JAA> | Here are all the lists, each with ~19k URLs: https://transfer.archivete.am/Ay4VU/spinrilla-track-audio-lists (simply removing the .zst extension from the URL returns the decompressed lists) |
| 17:00:24 | <@JAA> | I'll start feeding them into AB from the top. |
| 17:11:23 | <imer> | pyenv seems to have worked (with some massaging), so, basically I run "wpull -i url-list.txt --warc-file some-filename"? |
| 17:12:57 | <@JAA> | Oh right, I just remembered... You have wpull 2.0.3 I guess. Concurrency via CLI is broken there. :-| |
| 17:14:19 | <imer> | what version do I want? |
| 17:14:58 | <@JAA> | `wpull -i list --warc-file fileprefix --warc-max-size $((5*1024*1024*1024)) --delete-after -o fileprefix.log` should be it I think. |
| 17:15:34 | <@JAA> | Well, version 1.2.3 would fix the concurrency problem but has other issues. If you have the resources, it's probably easier to scale horizontally instead. |
| 17:21:59 | <myself> | is there a shard-and-gather infrastructure short of a full warrior job, for quick one-offs where you just need to hand out a crapton of wpull jobs in a hurry? or is there a skeleton warrior job that could take such assignments with minimal customization? |
| 17:22:49 | | spirit joins |
| 17:26:38 | <imer> | JAA: right, i'm running 70-99 in parallel, giving me ~500mbit/s on average. looks like I should be able to run some more cpu/memory wise |
| 17:31:31 | <imer> | 50-99 running now, website seems to be happy still, > 1gbit peaks depending on 404s from the looks of it |
| 17:34:31 | <@JAA> | imer: Ack, 0-10 are running through AB and approaching the half-way point, I'll throw in more as they finish. |
| 17:38:07 | <@JAA> | myself: No, though we can feed lists through #// (potentially slowly) if we don't care about feedback about success. |
| 17:58:11 | <imer> | rough sampling from logs says I'm about 1k items done in each list, seems a bit too slow if that's the pace: 19k per list, about 1k per half hour, so 8 more hours which is 2 too late |
| 17:58:38 | <imer> | 9 more hours* math is hard |
| 18:01:21 | <imer> | might abort and chop each list into more smaller parts so I can run more in parallel? |
| 18:05:55 | <@JAA> | Yeah, that sounds like a good idea. |
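Chopping each ~19k-line list into smaller parts so more wpull processes can run in parallel, as proposed above, is a simple slicing job; the list contents and chunk size here are toy values:

```python
def split_list(lines, chunk_size):
    """Split a URL list into chunks, one per parallel wpull process."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

# Toy stand-in for one ~19k-line URL list.
urls = [f"https://api.spinrilla.com/tracks/{i}/original.mp3" for i in range(100)]
chunks = split_list(urls, 30)
```

Each chunk would then be written to its own file and fed to a separate `wpull -i` invocation.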
| 18:23:03 | | Ruthalas5 (Ruthalas) joins |
| 18:46:03 | | spirit quits [Client Quit] |
| 18:52:58 | <imer> | rough calculation says I should be done with 50-99 in 5 hours, so *just* |
| 18:53:25 | | Guest50 quits [Client Quit] |
| 18:53:53 | | Guest50 joins |
| 18:55:18 | | Justin[home] is now known as DopefishJustin |
| 18:56:27 | <imer> | managing around 3.2k items/s with cpu pegged (just from peeking at logs, sampling random lines and checking where in the lists the ids are) |
| 18:56:43 | <imer> | a minute* not /s |
| 18:59:06 | <imer> | for reference, going 50 and up, so if 50 under is done can do 99 descending |
| 19:07:01 | <@JAA> | Up to 22 is done or running through AB now. |
| 19:32:16 | <imer> | Started 66 just now |
| 19:33:03 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
| 19:33:31 | | Craigle (Craigle) joins |
| 20:04:44 | <@JAA> | 30 in AB |
| 20:10:19 | <vokunal|m> | I'd like to make a list of local business sites and have them sent through archivebot. Is this allowed? And what would be the least annoying way for me to ask this? |
| 20:17:59 | <imer> | started on 75 |
| 20:19:37 | <@JAA> | vokunal|m: Not just allowed, encouraged. :-) Uploading a list to https://transfer.archivete.am/ and then asking in #archivebot or here (if it gets drowned over there) about it is the easiest route. How many sites are you envisioning? |
| 20:23:33 | <vokunal|m> | I have around 50 sites right now. Could be in the range of 100-200 if I keep digging. I got all the ones I know of off hand, and a bit of browsing google maps snagging the ones I know are small businesses |
| 20:32:33 | | Ivan226 leaves |
| 20:32:38 | | Ivan226 joins |
| 20:32:41 | <Ivan226> | someone get these for me thanks https://transfer.archivete.am/oG6b6/hsrwiki-alllinks.txt https://transfer.archivete.am/14Pod3/hsrwiki-newfiles-w430.txt |
| 20:37:51 | <@JAA> | vokunal|m: Yeah, that sounds fine. Most are probably tiny sites anyway. |
| 20:38:16 | <@JAA> | imer: 41 is running. |
| 20:39:09 | | hitgrr8 quits [Client Quit] |
| 20:40:26 | <@JAA> | All of this has massively slowed down my metadata grab. It probably won't finish in time. |
| 20:40:49 | <@JAA> | Completing under 1000 tracks per minute now, it was over 3k previously. |
| 20:44:47 | <Ryz> | I'm trying to push for more slots for JAA |
| 20:45:55 | <Ryz> | How many hours left? |
| 20:45:56 | <Ryz> | JAA? |
| 20:46:09 | <imer> | 80 here |
| 20:46:16 | <imer> | Ryz: bit over 3 |
| 20:54:40 | <@JAA> | Ryz: Slot freeing isn't needed, it's mostly limited by how fast pipelines can upload. |
| 20:54:59 | | icedice2 joins |
| 20:55:00 | <@JAA> | Or rather, I'm trying to make them almost fill up by the deadline. |
| 20:57:54 | | icedice quits [Ping timeout: 252 seconds] |
| 21:03:18 | | Island joins |
| 21:10:10 | <vokunal|m> | Could something like #Y be used for this? |
| 21:11:05 | | icedice2 quits [Client Quit] |
| 21:12:01 | | Guest50 quits [Ping timeout: 252 seconds] |
| 21:14:42 | <@JAA> | vokunal|m: Eh, kind of, it's not as simple as a recursive crawl in this case. But in any case, only one person knows how to wield that magic wand, and he's been busy this week. |
| 21:19:06 | <imer> | started on 86 |
| 21:19:10 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 21:22:59 | <nicolas17> | ok, finally can get on the computer today |
| 21:23:46 | | vitzli (vitzli) joins |
| 21:23:50 | <nicolas17> | what should I do with the tb2b data I downloaded? |
| 21:26:01 | <@JAA> | nicolas17: By default, AB recurses through the target site and retrieves one level of offsite links (including their page requisites). That's how the worthdoingbadly.com job ran as well. |
| 21:26:57 | <nicolas17> | that should be fine then |
| 21:27:23 | <@JAA> | It definitely grabbed the dump, yeah. |
| 21:27:46 | <nicolas17> | I'm running rclone sync --dry-run to see if I actually got all files off the tb2b ftp |
| 21:29:14 | <@JAA> | The AB job for the Toshiba FTP also completed, surprisingly without crashing. (wpull's FTP code is very unstable.) |
| 21:30:04 | <nicolas17> | latency is the enemy of FTP, but with multiple parallel downloads I managed more than 100mbps |
| 21:30:35 | <@JAA> | Does rclone also check whether you have files locally that aren't on the server anymore? |
| 21:30:40 | <@JAA> | That would be interesting. |
| 21:30:59 | <nicolas17> | yes |
| 21:31:15 | <nicolas17> | "Destination is updated to match source, including deleting files if necessary (except duplicate objects, see below). If you don't want to delete files from destination, use the copy command instead." |
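The rclone workflow described above (mirror the FTP, then use `sync --dry-run` to verify nothing is missing locally or gone upstream) can be sketched as follows; the remote name `toshiba-ftp` and the local path are placeholders, only the `--transfers=6` setting appears in the chat:

```shell
# Sketch of the mirror-and-verify workflow; remote/path names are placeholders.
rclone copy --transfers=6 toshiba-ftp:/ ./mirror            # download; never deletes local files
rclone sync --dry-run --transfers=6 toshiba-ftp:/ ./mirror  # report anything missing locally or extra
rclone size ./mirror                                        # total size and object count of the mirror
```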
| 21:32:03 | <Jake> | Any additional help needed on spinrilla? |
| 21:32:04 | <nicolas17> | "duplicate objects" here means different files with the same name, which is an oddity that I think only happens on Google Drive remotes |
| 21:33:07 | <@JAA> | imer: Can Jake help with your range? I'm up to 46 and will queue the remaining 3 of the lower half shortly. |
| 21:33:21 | <nicolas17> | "rclone sync --dry-run --transfers=6", which concluded "there was nothing to transfer" (so dry-run made no difference), took almost 7 minutes to get the recursive file listing |
| 21:33:40 | <nicolas17> | directory size: 167GB |
| 21:33:52 | <@JAA> | Nice |
| 21:34:03 | <nicolas17> | searching for duplicates now |
| 21:34:16 | <@JAA> | The AB job got 166.9 GiB, so pretty damn close. :-) |
| 21:34:27 | <@JAA> | (Assuming your 'GB' are actually GiB) |
| 21:34:58 | <nicolas17> | I used "du -h", so there's rounding, filesystem block alignment, etc :) |
| 21:35:06 | <imer> | JAA: of course, just let me know which ones to skip |
| 21:35:16 | <@JAA> | Jake: ^ |
| 21:35:30 | <@JAA> | Old-school task management! :-) |
| 21:35:36 | <Jake> | Haha :) |
| 21:35:53 | <@JAA> | This is how AT projects used to be coordinated in the early days from what I've heard. The person who shouts loudest gets the task. :-P |
| 21:36:15 | <imer> | got up to 88 running currently, preferably start at the back with 99 and lower Jake |
| 21:36:22 | <Jake> | 👍 |
| 21:40:58 | <nicolas17> | 1636 duplicate files (in 596 sets), occupying 54469 MB |
| 21:41:41 | <imer> | 90 just started |
| 21:41:52 | | CaldeiraG quits [Ping timeout: 265 seconds] |
| 21:42:36 | <nicolas17> | so this should compress/deduplicate to 110GB or so |
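The dedupe estimate can be reproduced from the figures just quoted, under the assumption that the 54469 MB of duplicates is essentially all reclaimable (i.e. it counts the redundant copies):

```python
# Rough dedupe estimate from the chat figures; treats the duplicate
# occupancy as fully reclaimable, which is an approximation.
total_gb = 167
duplicate_mb = 54469
deduped_gb = total_gb - duplicate_mb / 1000
```

This lands at roughly 112 GB, consistent with the "110GB or so" estimate.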
| 21:52:16 | <imer> | JAA: how am I getting the files to you/AT btw? (once its done, certainly no rush) |
| 21:54:40 | <Jake> | (I started at 99) |
| 22:03:01 | <@JAA> | 0 through 49 is completed or running in AB and should finish in time, I think. |
| 22:07:43 | <@JAA> | Saturating all the pipes and filling the disks, so probably can't do much more, but a part or two can probably still fit in there if needed. |
| 22:08:43 | | Ruthalas5 (Ruthalas) joins |
| 22:09:10 | <imer> | up to 95 started here, last chunks of 88 seem to be slowly finishing up |
| 22:09:19 | <imer> | hows the metadata doing? |
| 22:10:40 | <@JAA> | ETA 5 hours :-/ |
| 22:11:17 | <Jake> | Any speed limit on their end for the track downloads? |
| 22:12:04 | <@JAA> | Not from what I've seen. |
| 22:13:22 | <@JAA> | So 96 through 98 still open now? |
| 22:13:53 | <imer> | got 96/97 running now |
| 22:15:34 | <Jake> | Looks like we'll get the audio done before the time limit |
| 22:15:58 | <Jake> | I assume nothing we can do to speed up metadata? |
| 22:18:38 | <@JAA> | Throughput looks much better than an hour ago, but not sure we can change anything. It's just too slow on their side, yeah. |
| 22:19:22 | | lunik173 quits [Remote host closed the connection] |
| 22:19:34 | <imer> | started 98 now |
| 22:19:46 | <@JAA> | I do have all mixtape metadata, which also includes some track metadata. The only things missing will be single tracks that aren't part of a mixtape and track comments (for the tracks that aren't covered in time). |
| 22:20:11 | | Ivan226 leaves |
| 22:20:19 | | Ivan226 joins |
| 22:20:25 | <Jake> | 🎉 |
| 22:21:19 | <@JAA> | I only need a throughput of 110 req/s to get everything, come ooon... :-) |
| 22:21:51 | <@JAA> | Instead, I get 900 per minute. :-| |
| 22:22:19 | <@JAA> | Er no, 110 items/s, not reqs. |
| 22:23:13 | <@JAA> | Last AB job is projected to finish at 23:40. |
| 22:26:10 | <Jake> | 900 per minute sounds like a weird ratelimit or something |
| 22:26:43 | <@JAA> | It was just a random slowdown at that particular minute. |
| 22:27:19 | <Jake> | Haha, alright :) |
| 22:28:28 | <@JAA> | But even at its fastest, I just managed 11k req/mn, which corresponds to something like 3.5k i/mn, so still a factor two too slow. |
| 22:28:49 | <Jake> | :( |
| 22:29:43 | | tzt quits [Ping timeout: 265 seconds] |
| 22:30:47 | | tzt (tzt) joins |
| 22:37:03 | <@JAA> | AB job ETA is still 23:40, so that should be fine. |
| 22:37:29 | | lunik173 joins |
| 22:37:48 | <@JAA> | Maybe as the load drops from the downloads, I'll get a bit better rates on the metadata, but that won't finish. |
| 22:38:13 | <@JAA> | It'll only miss on the order of a couple hundred thousand tracks (of 2.9 million), so not too bad. |
| 22:38:19 | | Guest50 joins |
| 22:40:12 | | Guest50 quits [Client Quit] |
| 22:41:37 | | Guest50 joins |
| 22:43:16 | <imer> | slowly finishing up here, so should be starting to see an improvement if there will be one |
| 22:43:20 | | Ivan226 quits [Remote host closed the connection] |
| 22:43:37 | <imer> | there's also a chance they'll shut it down a bit later than announced, right? or is the time a given since its a legal thing? |
| 22:44:05 | | Ivan226 joins |
| 22:44:40 | | Guest50 quits [Client Quit] |
| 22:45:30 | | Guest50 joins |
| 22:45:34 | <@JAA> | Always possible of course, but yeah, I assume it's a legal thing. |
| 22:45:58 | <@JAA> | They edited their homepage sometime today to add a link to a countdown to the exact second, too. |
| 22:47:22 | <imer> | yeaah, safe to assume they'll be on top of it then |
| 22:48:09 | <Jake> | damn |
| 22:51:06 | | qw3rty_ joins |
| 22:51:08 | | BearFortress_ joins |
| 22:51:32 | | dumbgoy_ joins |
| 22:51:34 | | sarge (sarge) joins |
| 22:51:41 | | atphoenix_ (atphoenix) joins |
| 22:51:47 | | imer61 joins |
| 22:51:48 | | imer1 joins |
| 22:51:49 | | imer61 quits [Remote host closed the connection] |
| 22:52:23 | | Ivan22666 joins |
| 22:53:51 | <imer1> | uh-oh getting 400 bad request now |
| 22:53:53 | | UserH quits [Ping timeout: 254 seconds] |
| 22:54:14 | <@JAA> | Yeah, happens on some tracks, appears to be normal. |
| 22:54:22 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 22:54:26 | | Letur79 joins |
| 22:54:51 | | Ivan226 quits [Ping timeout: 265 seconds] |
| 22:54:51 | | imer quits [Ping timeout: 265 seconds] |
| 22:54:51 | | Letur quits [Ping timeout: 265 seconds] |
| 22:54:51 | | Emitewiki quits [Ping timeout: 265 seconds] |
| 22:55:08 | | Letur79 is now known as Letur |
| 22:55:28 | <@JAA> | I saw a couple dozen 400s at the end on each of the AB jobs. |
| 22:55:30 | | Ivan22666 is now known as Ivan226 |
| 22:55:50 | <imer1> | pretty much done here and its all i'm seeing now |
| 22:55:55 | <imer1> | will wpull retry those a few times? that'd explain that then |
| 22:56:18 | | qw3rty quits [Ping timeout: 265 seconds] |
| 22:56:18 | | BearFortress quits [Ping timeout: 265 seconds] |
| 22:56:18 | | ]SaRgE[ quits [Ping timeout: 265 seconds] |
| 22:56:18 | | atphoenix quits [Ping timeout: 265 seconds] |
| 22:57:04 | <@JAA> | Yeah, it will. I don't remember what the default is. |
| 22:58:07 | <@JAA> | 20, apparently. AB uses 3. So I guess you'll see a couple hundred of them per chunk instead. |
| 22:59:54 | <imer1> | looks like my total is 2.2 TB then, all wrapped up - we weren't far off with the 5TB guess |
| 23:00:06 | | imer1 is now known as imer |
| 23:01:01 | <@JAA> | Almost exactly what I expected, my estimate for your set was 2.3 TiB. |
| 23:01:37 | <@JAA> | The 5 TB guess was for all tracks though, not only the non-covered ones. The previous AB jobs grabbed another 2.6 TiB or so. |
| 23:02:06 | | Matthww1 quits [Quit: Ping timeout (120 seconds)] |
| 23:02:20 | | Matthww1 joins |
| 23:08:49 | <Jake> | 99 is completed. |
| 23:10:22 | | lexikiq joins |
| 23:20:09 | | BlueMaxima joins |
| 23:24:56 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 23:27:05 | <@JAA> | AB chunks 0-49 are done, now it just needs to upload its 1.1 TiB backlog. |
| 23:27:43 | | nicolas17 joins |
| 23:27:49 | | @JAA makes a note here: H U G E S U C C E S S |
| 23:28:45 | <@JAA> | imer, Jake: Will sort out the data transfer tomorrow. |
| 23:29:10 | | Matthww1 quits [Client Quit] |
| 23:29:26 | | Matthww1 joins |
| 23:30:29 | | vxbinaca joins |
| 23:31:14 | <imer> | sweet, I'm fine holding onto the data for a while so whenever works, I'll make sure to poke my head into irc tomorrow at some point |
| 23:32:56 | | vxbinaca leaves |
| 23:33:34 | <Jake> | Sounds good! Glad we got all the tracks! |
| 23:34:54 | <imer> | I'll be heading off for the day - see ya tomorrow |
| 23:34:58 | | imer quits [Remote host closed the connection] |
| 23:35:34 | <@JAA> | Metadata is still going more brt-t-t-t than brrr. :-( |
| 23:36:12 | <nicolas17> | interesting, that toshiba FTP got deduplicated/compressed down to 64GB |