#archiveteam-bs log for 2022-10-18


00:09:56		LegitSi quits [Remote host closed the connection]
00:39:44		pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
00:41:35		pabs (pabs) joins
00:51:39		Hackerpcs quits [Quit: Hackerpcs]
00:53:48		Hackerpcs (Hackerpcs) joins
01:02:51		yawkat quits [Ping timeout: 255 seconds]
01:03:08		gazorpazorp quits [Read error: Connection reset by peer]
01:03:14		gazorpazorp (gazorpazorp) joins
01:06:52		tzt quits [Client Quit]
01:07:06		tzt (tzt) joins
01:11:54		yawkat (yawkat) joins
02:10:32		lukash79 joins
02:13:32	<tech234a>	Regarding sweb there is a listing of some sites but I doubt it is complete http://rozcestnik.sweb.cz/
02:18:38	<tech234a>	Also here is the last capture of the help site that isn't an error message http://web.archive.org/web/20220427152748/http://napoveda-sweb.sweb.cz/
02:19:05	<tech234a>	Additionally apparently an older URL format was sweb.cz/username
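When compiling a seed list from the rozcestnik listing, it may help to generate both URL forms for each site. The sketch below assumes that current sites live at `<username>.sweb.cz` (inferred from the `rozcestnik.sweb.cz` and `napoveda-sweb.sweb.cz` hosts above) and that the older form is `sweb.cz/<username>`. The function names are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def sweb_url_variants(username: str) -> list[str]:
    """Return both assumed URL forms for a sweb.cz username."""
    name = username.strip()
    return [
        f"http://{name}.sweb.cz/",  # subdomain form (assumed current)
        f"http://sweb.cz/{name}",   # older path form mentioned in the log
    ]


def username_from_url(url: str) -> str | None:
    """Extract the username from either URL form, or None if neither matches."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Check the bare domain first so "www.sweb.cz" is not read as user "www".
    if host in ("sweb.cz", "www.sweb.cz"):
        first = parts.path.strip("/").split("/")[0]
        return first or None
    if host.endswith(".sweb.cz"):
        return host[: -len(".sweb.cz")]
    return None
```

Running `username_from_url` over every link on the listing page and then `sweb_url_variants` over the results gives both forms for each site, so captures under either form can be queued.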
02:26:44		katocala is now authenticated as katocala
03:03:08		march_happy quits [Ping timeout: 265 seconds]
03:03:16		march_happy (march_happy) joins
03:04:06		ThreeHM quits [Ping timeout: 265 seconds]
03:04:30		ThreeHM (ThreeHeadedMonkey) joins
03:09:45		march_happy quits [Ping timeout: 255 seconds]
03:10:07		march_happy (march_happy) joins
04:18:23	<tech234a>	YT ended the 4k experiment but it could reappear in the future https://www.theverge.com/2022/10/17/23410072/youtube-4k-premium-feature-test-ends
04:26:58	<h2ibot>	Wickedplayer494 edited Dota 2 (+141, Dev forums are dead): https://wiki.archiveteam.org/?diff=49096&oldid=48715
04:50:59		michaelblob_ (michaelblob) joins
04:54:36		michaelblob quits [Ping timeout: 255 seconds]
05:12:45		pabs quits [Read error: Connection reset by peer]
05:13:38		pabs (pabs) joins
05:24:45		march_happy quits [Ping timeout: 255 seconds]
05:25:03		march_happy (march_happy) joins
05:27:06	<h2ibot>	Wickedplayer494 edited SteamDB (+17, /* Vital signs */ xPaw has put some…): https://wiki.archiveteam.org/?diff=49097&oldid=28976
05:32:07	<h2ibot>	Wickedplayer494 edited Template:Navigation box (-7, Reflecting Vkontakte page move in navbox): https://wiki.archiveteam.org/?diff=49098&oldid=48892
05:34:40		DLoader quits [Client Quit]
05:36:08	<h2ibot>	Wickedplayer494 edited Heroes of Newerth (+80, Website is dead too): https://wiki.archiveteam.org/?diff=49099&oldid=48714
07:46:51		BlueMaxima quits [Client Quit]
07:59:24		michaelblob (michaelblob) joins
08:03:17		michaelblob_ quits [Ping timeout: 265 seconds]
08:13:13		DLoader joins
08:23:34		DLoader quits [Client Quit]
08:25:45		DLoader joins
09:47:18		gazorpazorp quits [Remote host closed the connection]
09:47:31		gazorpazorp (gazorpazorp) joins
10:09:55		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
10:23:09		qwertyasdfuiopghjkl joins
10:48:35		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
10:50:12		qwertyasdfuiopghjkl joins
11:20:45		Chris5010 (Chris5010) joins
11:24:07		mutantm0nkey quits [Remote host closed the connection]
11:24:45		mutantm0nkey (mutantmonkey) joins
11:25:20		LeGoupil joins
11:30:08		tech_exorcist (tech_exorcist) joins
11:39:53	<betamax>	Has anyone attempted to reverse-engineer Issuu? There are a few tools for downloading the images, but the original text is still searchable through some delightfully encoded .bin files that are fetched from a "layers" server
11:40:06	<betamax>	e.g: look at this document https://issuu.com/filmhouse/docs/fhmar20_online_eecbde5e3f79e9
11:40:54	<betamax>	when you go to another page in the document, it downloads a "page_<n>.bin" file
11:41:02	<betamax>	(here's an example: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/v2/page_4.bin )
11:41:38	<betamax>	and when you click the "Find Text" button, it downloads another .bin file: https://layers.isu.pub/19bf10fbc89b38330de576c9a5d332a8/200221154119/text_v0/text_info.bin
11:41:57	<betamax>	now the text is visible in those files (once decompressed) if you inspect them
11:42:08	<betamax>	but what is not visible are the positions of the text on the document
11:42:34	<betamax>	e.g: if you use the "Find Text" feature of Issuu, it highlights the exact place in the document where the text occurs
11:42:47	<betamax>	so it must be storing the positional information of the text in that .bin file too
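A first step toward inspecting these files could look like the sketch below. The URL pattern is inferred from the single example link above and is not a documented API. The log does not say which compression Issuu uses, so `try_decompress` simply tries a few Python stdlib codecs (gzip, zlib, xz) and returns the data unchanged if none apply. If the real format is something else (for example Brotli, or an uncompressed binary serialization), this will not unpack it. `printable_runs` mimics the Unix `strings` tool to surface the embedded text; it does not recover the positional data.

```python
from __future__ import annotations

import gzip
import lzma
import re
import zlib


def issuu_page_url(doc_hash: str, revision: str, page: int) -> str:
    # Pattern inferred from one example URL in the log; hypothetical helper.
    return f"https://layers.isu.pub/{doc_hash}/{revision}/v2/page_{page}.bin"


def issuu_text_info_url(doc_hash: str, revision: str) -> str:
    # Same caveat: inferred from the single "Find Text" example URL.
    return f"https://layers.isu.pub/{doc_hash}/{revision}/text_v0/text_info.bin"


def try_decompress(blob: bytes) -> bytes:
    """Try a few stdlib codecs; return the input unchanged if none apply."""
    for decode in (gzip.decompress, zlib.decompress, lzma.decompress):
        try:
            return decode(blob)
        except Exception:
            continue  # not this codec, try the next one
    return blob


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """ASCII runs of at least min_len printable bytes, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]
```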
11:43:08	<betamax>	... why couldn't they just allow PDF downloads for all their documents >:(
11:56:19		Megame (Megame) joins
12:11:09		omglolbah quits [Remote host closed the connection]
12:45:07		Megame quits [Client Quit]
12:50:23		qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
12:55:13		march_happy quits [Ping timeout: 265 seconds]
12:55:58		march_happy (march_happy) joins
13:06:04		Arcorann quits [Ping timeout: 240 seconds]
13:09:28	<tech_exorcist>	hello, which items in the https://archive.org/details/archiveteam_youtube collection contain video comments?
13:10:11	<tech_exorcist>	i just noticed i'm in #down-the-tube too, so is that question more appropriate for that channel?
13:10:14	<@arkiver>	tech_exorcist: you can check the CDX files in the items and see if any comment records are included
13:12:03	<tech_exorcist>	there are 31k items in total, and i'm trying to avoid having to check all of them (even though i can do that if necessary)
13:12:46	<tech_exorcist>	for example, "curl -L -o - https://archive.org/download/archiveteam_youtube_20210720180401_7e76ed14/youtube_20210720180401_7e76ed14.megawarc.warc.os.cdx.gz | zcat | grep -i comment" returns no output
13:13:16	<@arkiver>	you don't know what comment URLs look like?
13:13:24	<tech_exorcist>	not really, sorry
13:14:14	<@arkiver>	it's the URLs like https://www.youtube.com/youtubei/v1/next?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8
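Instead of grepping for "comment", the CDX can be filtered on the endpoint arkiver quotes. The sketch below assumes the common CDX layout in which the first line is a header such as ` CDX N b a m s k r M S V g` (`a` = original URL, `S` = compressed record size, `V` = offset in the WARC, `g` = WARC file name). It reads the field letters from that header when present and otherwise falls back to that assumed default. Since the same endpoint also serves non-comment data, matches are only candidates.

```python
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple

# Substring of the comment endpoint URL quoted in the log.
COMMENT_ENDPOINT = "youtube.com/youtubei/v1/next"

# Field letters assumed when a file has no " CDX ..." header line.
DEFAULT_FIELDS = "N b a m s k r M S V g".split()


class CdxHit(NamedTuple):
    url: str       # original URL (field "a")
    length: int    # compressed record size in bytes (field "S")
    offset: int    # byte offset of the record inside the WARC (field "V")
    filename: str  # WARC file name (field "g")


def comment_hits(lines: Iterable[str]) -> Iterator[CdxHit]:
    """Yield CDX entries whose original URL points at the comment endpoint."""
    fields = DEFAULT_FIELDS
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "CDX":  # header line, e.g. " CDX N b a m s k r M S V g"
            fields = parts[1:]
            continue
        row = dict(zip(fields, parts))
        if COMMENT_ENDPOINT not in row.get("a", ""):
            continue
        size, offset = row.get("S", ""), row.get("V", "")
        if not (size.isdigit() and offset.isdigit()):
            continue  # no usable byte range for this entry
        yield CdxHit(row["a"], int(size), int(offset), row.get("g", ""))
```

Usage with a downloaded file: `with gzip.open("youtube_20210720180401_7e76ed14.megawarc.warc.os.cdx.gz", "rt") as f: hits = list(comment_hits(f))`.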
13:14:24	<tech_exorcist>	oh.
13:14:53	<tech_exorcist>	where is the video ID though?
13:15:04	<@arkiver>	yeah not in the URL
13:15:15	<tech_exorcist>	dammit
13:15:25	<@arkiver>	to serve pages of comments, youtube makes POST requests to that same endpoint
13:15:40	<@arkiver>	you'll have to download the actual records to check
13:15:46	<tech_exorcist>	every 50G file?
13:15:58	<@arkiver>	the comment WARC records at least
13:16:06	<@arkiver>	(can do a range request)
13:17:12	<tech_exorcist>	what's a range request? does it mean i can send a request to archive.org to scan through all warcs and see if they contain a specified string?
13:18:24	<tech_exorcist>	sorry for the dumb questions, i'm more familiar with the tinypic collection since i've looked for stuff in it a few times
13:18:44	<@arkiver>	i have some code here that gets a single record, finds the zstd dictionary as well and extracts it https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py
13:18:48	<@arkiver>	especially https://github.com/ArchiveTeam/zstd-dictionary-trainer/blob/master/trainer/urls.py#L18-L61
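The linked `urls.py` is the reference here. As an independent sketch of just the first step, the code below assumes the ArchiveTeam `.warc.zst` layout, in which the zstd dictionary is stored at the start of the file in a skippable frame (little-endian magic `0x184D2A5D`, then a 4-byte little-endian payload size), and the payload may itself be zstd-compressed. `read_dictionary_frame` is a hypothetical name. It only locates the payload; actually decompressing the dictionary and the records needs the third-party `zstandard` package, which is not shown.

```python
from __future__ import annotations

import struct

DICT_FRAME_MAGIC = 0x184D2A5D            # skippable frame holding the dictionary
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"   # start of a regular zstd frame


def read_dictionary_frame(head: bytes) -> tuple[bytes, bool] | None:
    """Parse a leading dictionary frame from the first bytes of a .warc.zst.

    Returns (payload, payload_is_zstd_compressed), or None if the data does
    not start with the dictionary frame magic.
    """
    if len(head) < 8:
        return None
    magic, size = struct.unpack_from("<II", head, 0)
    if magic != DICT_FRAME_MAGIC:
        return None
    payload = head[8 : 8 + size]
    if len(payload) < size:
        raise ValueError(f"need {8 + size} bytes, got {len(head)}; fetch a larger range")
    return payload, payload.startswith(ZSTD_FRAME_MAGIC)
```

In practice, the first 8 bytes can be fetched with a small range request to learn the frame size, followed by a second request for the whole frame.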
13:19:17	<@arkiver>	i don't have time now to explain range requests and zstd, etc., though
13:19:31	<tech_exorcist>	sorry
13:19:35	<@arkiver>	no worries :)
13:19:51	<tech_exorcist>	oh, a range request in the "give me bytes x to y" sense, i know what that is
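In this setting, a range request turns the CDX offset (`V`) and compressed length (`S`) of one record into an HTTP `Range` header. HTTP byte ranges are inclusive at both ends, so the last byte is `offset + length - 1`. A server that honours the header replies with `206 Partial Content`. The sketch below uses only the Python standard library. The helper names are hypothetical, and the error handling is minimal.

```python
from __future__ import annotations

import urllib.request


def range_header(offset: int, length: int) -> str:
    """HTTP Range value for `length` bytes starting at `offset` (end inclusive)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url: str, offset: int, length: int, timeout: float = 60.0) -> bytes:
    """Download one record's bytes; redirects are followed with the header kept."""
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        if resp.status != 206:
            raise RuntimeError(f"server ignored the Range header (HTTP {resp.status})")
        return resp.read()
```

For example, `fetch_range("https://archive.org/download/<item>/<warc file>", offset, length)` would return one compressed record instead of the full multi-gigabyte WARC.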
13:20:08	<@arkiver>	but yeah in short - can't know from the comment URL what video, need to download the records.
13:21:04	<tech_exorcist>	got it
13:24:22	<tech_exorcist>	so: download cdx files -> find all comment urls and the warcs they're in -> look through all those warcs until the desired video id is found (if it's there)
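The middle and last steps of that plan can be sketched as two small helpers. `group_by_warc` collects the candidate (offset, length) pairs per WARC file so each file's records can be fetched in order. `mentions_video` is a crude check that assumes the 11-character video ID appears verbatim in the decompressed request or response record. Arkiver only said the ID is not in the URL, so it may well be encoded elsewhere (for example inside a continuation token), and this check would then miss it. Both names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def group_by_warc(hits: Iterable[tuple[str, int, int]]) -> dict[str, list[tuple[int, int]]]:
    """Map WARC filename -> sorted (offset, length) pairs of candidate records."""
    groups: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for filename, offset, length in hits:
        groups[filename].append((offset, length))
    return {name: sorted(spans) for name, spans in groups.items()}


def mentions_video(record: bytes, video_id: str) -> bool:
    # Assumption: the ID appears as plain ASCII in the decompressed record.
    return video_id.encode("ascii") in record
```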
13:24:34	<tech_exorcist>	i can do that
14:45:41		LeGoupil quits [Client Quit]
14:45:52		LeGoupil joins
14:57:32		qwertyasdfuiopghjkl joins
15:23:21		fangfufu quits [Remote host closed the connection]
15:23:29		fangfufu joins
15:23:30		fangfufu is now authenticated as fangfufu
15:30:00		Stiletto quits [Ping timeout: 255 seconds]
15:56:37		qwertyasdfuiopghjkl quits [Client Quit]
15:57:00		qwertyasdfuiopghjkl joins
16:17:32		katocala quits [Remote host closed the connection]
16:27:25		Hackerpcs quits [Client Quit]
16:28:01		Hackerpcs (Hackerpcs) joins
16:40:39		march_happy quits [Ping timeout: 255 seconds]
17:16:15		miana quits [Quit: Connection closed for inactivity]
17:40:37		tech_exorcist quits [Remote host closed the connection]
17:41:48		tech_exorcist (tech_exorcist) joins
18:37:25		LegitSi joins
18:43:42		dm4v quits [Ping timeout: 265 seconds]
19:15:14		dm4v joins
19:29:16		LeGoupil quits [Ping timeout: 240 seconds]
19:30:17		tech_exorcist quits [Remote host closed the connection]
19:30:32		tech_exorcist (tech_exorcist) joins
20:02:54		Forstyhia joins
20:08:08	<Forstyhia>	Hi. I'm a random stranger here, so please pardon any faux pas etc. I make in how I present this. I was talking on Discord about something that came up with Photobucket and asked to relay it to y'all. The gist of it is that Photobucket may be erasing all unpaid images soon. For a couple months, I've been getting nag emails from them to log into an
20:08:09	<Forstyhia>	account that was supposed to have been deleted, so I finally did. When I did, I discovered that they now insist everyone with an account have a paid plan, or else the account will be deleted. It doesn't specify the timeframe, but hints heavily that it will be soon; if you select to ignore the nag to pick a paid plan, it says that's only a temporary
20:08:09	<Forstyhia>	option. It also has most images over a new limit (100? 200?) in an account blocked, where even the owner can't access them from within the account until they've deleted some of the unblocked images. I unfortunately can't provide more information, because I decided to request account deletion after seeing that. (It's no loss to the internet, it was
20:08:10	<Forstyhia>	just personal family photos.) And essentially I worry that means soon Photobucket images will be gone entirely, not just watermarked. And my friends on Discord wanted me to relay that so the archive team could be aware, just in case you aren't already.
20:15:18		DLoader quits [Ping timeout: 255 seconds]
20:15:25		DLoader joins
20:33:09		Forstyhia quits [Remote host closed the connection]
20:40:08		DLoader_ joins
20:41:38		DLoader quits [Ping timeout: 265 seconds]
20:41:39		DLoader_ is now known as DLoader
21:14:57		tech_exorcist quits [Client Quit]
21:31:48		DLoader quits [Ping timeout: 255 seconds]
21:34:09		DLoader joins
21:39:48		CounterTurns joins
21:45:07		DLoader_ joins
21:46:53	<CounterTurns>	I hope this is an okay place to ask this question, please lmk if not: I have a dead tindeck link (http://tindeck.com/listen/mynz) from a tumblr embed. I can find the listen page on wayback, but it's not clear to me how to access the archived mp3 itself. Any advice would be great
21:47:14		DLoader__ joins
21:47:16		DLoader quits [Ping timeout: 240 seconds]
21:47:25		DLoader__ is now known as DLoader
21:50:15		DLoader_ quits [Ping timeout: 255 seconds]
21:50:18		march_happy (march_happy) joins
21:51:13	<@JAA>	CounterTurns: https://web.archive.org/web/20180729183128/http://tindeck.com/dl/mynz (Click on 'direct link' to get the MP3 immediately rather than waiting for the countdown.)
21:56:14	<CounterTurns>	Oh thanks, that makes sense. I have some other links with listen pages where that method doesn't work (e.g. http://tindeck.com/listen/cggq); should I assume those mp3s aren't archived?
22:02:31	<@JAA>	https://web.archive.org/web/20180731182421/http://tindeck.com/dl/cggq works fine for me.
22:02:44	<@JAA>	That page is linked on https://web.archive.org/web/20180731182420/http://tindeck.com/listen/cggq by the way ('Download Track' on the right).
22:03:42	<@JAA>	The project was in late July and early August 2018, so that's the time range you want to check in the WBM.
22:05:11	<CounterTurns>	Ah, okay I was looking at a November 2018 date on the WBM where I was getting the dead links
22:06:00	<CounterTurns>	But you're saying if I use that time range for the project there should be a date that has the file? (assuming it was still live on tindeck at that time)
22:06:51	<@JAA>	Yeah, probably.
22:07:41	<@JAA>	Look for the archiveteam_tindeck captures in particular. You can see the collection in the calendar view when you hover over a timestamp. It appears above the calendar.
22:08:30	<@JAA>	E.g. on https://web.archive.org/web/*/http://tindeck.com/listen/cggq 'Tue, 31 Jul 2018 18:24:20 GMT (why: archiveteam, archiveteam_tindeck)'
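The lookup JAA describes can also be scripted. The sketch below rewrites a `tindeck.com/listen/<id>` URL to the corresponding `/dl/<id>` page and builds a Wayback Machine CDX API query limited to July and August 2018, the project window mentioned above. It assumes tindeck IDs are alphanumeric, based only on the examples `mynz` and `cggq`. The CDX API output does not show the collection name, so the calendar hover check for `archiveteam_tindeck` is still how to confirm where a capture came from.

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

# Assumed ID format, based on the examples in the log.
LISTEN_RE = re.compile(r"^https?://(?:www\.)?tindeck\.com/listen/([A-Za-z0-9]+)/?$")


def tindeck_dl_url(listen_url: str) -> str:
    """Turn a tindeck.com/listen/<id> URL into the matching /dl/<id> page URL."""
    m = LISTEN_RE.match(listen_url)
    if not m:
        raise ValueError(f"not a tindeck listen URL: {listen_url}")
    return f"http://tindeck.com/dl/{m.group(1)}"


def wayback_cdx_query(url: str, start: str = "201807", end: str = "201808") -> str:
    """Wayback CDX API query for captures of `url` between start and end."""
    params = {"url": url, "from": start, "to": end, "output": "json"}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def wayback_url(timestamp: str, url: str) -> str:
    """Wayback Machine playback URL for one capture timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"
```

Feeding a timestamp from the CDX results into `wayback_url` yields links like the `20180731182421` capture JAA posted.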
22:08:49		lennier1 quits [Client Quit]
22:09:13		lennier1 (lennier1) joins
22:11:34	<CounterTurns>	Thanks, that's very helpful! Really appreciate you taking the time to walk me through it
22:39:00	<lennier1>	G4TV is shutting down: https://g4tv.com/blog/g4update
22:51:42		CounterTurns quits [Remote host closed the connection]
22:56:46		Arcorann (Arcorann) joins
23:05:07		BlueMaxima joins
23:35:51		Stiletto joins
23:56:47		BlueMaxima quits [Read error: Connection reset by peer]
