#archiveteam-bs log for 2025-08-14

00:07:08	<BlankEclair>	they really don't have any moderation for these?
00:07:22	<BlankEclair>	sounds like a bad idea when you're soliciting user input and displaying it out to the world
00:07:43	<BlankEclair>	next up: someone publishes a leak of uk government records through blocked.org.uk
00:07:59		cuphead2527480 quits [Quit: Connection closed for inactivity]
00:30:24		DogsRNice quits [Client Quit]
02:01:18		etnguyen03 (etnguyen03) joins
02:10:51		Webuser492993 joins
02:12:30		Webuser492993 quits [Client Quit]
02:48:51		SootBector quits [Remote host closed the connection]
02:59:41		IDK (IDK) joins
03:06:31		etnguyen03 quits [Remote host closed the connection]
03:22:59		tzt quits [Ping timeout: 260 seconds]
03:23:45		lflare quits [Quit: Bye]
03:24:20		tzt (tzt) joins
03:24:31		lflare (lflare) joins
04:10:31		DogsRNice joins
04:20:12		tmg1\|michelson joins
05:01:07	<ericgallager>	I see WPlace is trending: https://bsky.app/profile/trending.bsky.app/feed/371544258
05:01:26	<ericgallager>	when I try to visit https://wplace.live/ though, it says it's down
05:01:37	<ericgallager>	anyone know if it's been archived at all?
05:03:42	<pokechu22>	We've got an archivebot job for various tiles on the site, but it has to run slowly and will take 2 weeks to finish
05:03:58	<pokechu22>	looks like that's still up, e.g. https://backend.wplace.live/files/s0/tiles/388/874.png
05:04:31	<ericgallager>	maybe it's just my browser, then...
05:05:20	<pokechu22>	the site's still up for me in firefox, yeah
05:06:48	<ericgallager>	ok so maybe it's a particular extension, uBlock Origin or something, then...
05:09:44		IDK quits [Client Quit]
05:14:03		LddPotato quits [Remote host closed the connection]
05:15:50		LddPotato (LddPotato) joins
05:16:48		IDK (IDK) joins
05:26:28		notSokar joins
05:26:34		Sokar quits [Ping timeout: 240 seconds]
05:43:16		BornOn420 (BornOn420) joins
05:48:34		NatTheCat quits [Ping timeout: 240 seconds]
05:54:33		Island quits [Read error: Connection reset by peer]
05:59:29		awauwa (awauwa) joins
06:11:50		NatTheCat (NatTheCat) joins
06:31:39		DogsRNice quits [Read error: Connection reset by peer]
06:35:46		Webuser862649 joins
06:44:08		Webuser862649 quits [Client Quit]
07:19:44		IDK quits [Client Quit]
07:44:34		APOLLO03 quits [Ping timeout: 240 seconds]
07:54:49		notSokar quits [Ping timeout: 260 seconds]
07:57:51		Sokar joins
08:15:52		nine quits [Quit: See ya!]
08:16:05		nine joins
08:16:06		nine is now authenticated as nine
08:16:06		nine quits [Changing host]
08:16:06		nine (nine) joins
08:36:14		midou quits [Ping timeout: 240 seconds]
08:40:15		hexagonwin is now authenticated as hexagonwin
08:45:59		midou joins
09:00:32		IDK (IDK) joins
09:17:40		cyanbox joins
09:20:37		APOLLO03 joins
10:02:00		igloo22225 quits [Quit: The Lounge - https://thelounge.chat]
10:02:27		igloo22225 (igloo22225) joins
10:27:29	<c3manu>	"Eastman Kodak, the 133-year-old photography company, is warning investors that it might not survive much longer." - https://edition.cnn.com/2025/08/12/business/kodak-survival-warning
10:32:26	<cyanbox>	https://www.kodak.com/en/company/blog-post/statement-regarding-misleading-media-reports/
10:33:37	<cyanbox>	film is crazy popular rn, wouldn't make sense for them to be struggling that hard
10:36:30	<h2ibot>	Manu edited Discourse (+348, Add more active Discourses): https://wiki.archiveteam.org/?diff=56903&oldid=56863
10:45:23		FiTheArchiver joins
11:00:03		Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:48		Bleo182600722719623455222 joins
11:03:37		NotGLaDOS quits []
11:17:56		FiTheArchiver1 joins
11:21:19		FiTheArchiver quits [Ping timeout: 260 seconds]
11:31:54		rohvani quits [Ping timeout: 240 seconds]
11:52:14		FiTheArchiver1 quits [Read error: Connection reset by peer]
11:58:08		BornOn420 quits [Read error: Connection reset by peer]
11:58:38		BornOn420 (BornOn420) joins
12:02:28		Wohlstand (Wohlstand) joins
12:19:18		etnguyen03 (etnguyen03) joins
12:29:26		notSokar joins
12:31:19		Sokar quits [Ping timeout: 260 seconds]
12:32:07		etnguyen03 quits [Client Quit]
12:32:49		Barto quits [Quit: WeeChat 4.7.0]
12:40:15		notSokar quits [Client Quit]
12:40:25		Sokar joins
12:50:53		Barto (Barto) joins
13:01:43		etnguyen03 (etnguyen03) joins
13:03:49		Barto quits [Client Quit]
13:15:22		etnguyen03 quits [Client Quit]
13:27:54		nine quits [Ping timeout: 240 seconds]
13:31:04		sepro (sepro) joins
13:34:32		nine joins
13:34:32		nine is now authenticated as nine
13:34:32		nine quits [Changing host]
13:34:32		nine (nine) joins
13:51:14		midou quits [Ping timeout: 240 seconds]
14:00:30		midou joins
14:02:36		ATinySpaceMarine joins
14:08:32		ATinySpaceMarine quits [Client Quit]
14:11:08		ATinySpaceMarine joins
14:18:06		ATSM joins
14:19:26		ATSM quits [Client Quit]
14:20:59		ATinySpaceMarine quits [Ping timeout: 260 seconds]
14:30:20		shuuji3 quits [Quit: Ooops, wrong browser tab.]
14:44:24		MrMcNuggets (MrMcNuggets) joins
14:50:03		Dada joins
15:26:39	<gamer191-1\|m>	Farm Transparency Project (who I mentioned ages ago, but I don’t think anything happened because their site uses Vimeo embeds so it couldn’t be downloaded with a simple AB job) is now publicly encouraging users to archive their website (and the videos on it) https://www.instagram.com/p/DNVGMMeyc5o/
15:35:48	<gamer191-1\|m>	Context: Farm Transparency Project is an Australian website which publishes undercover footage showing (often illegal) animal cruelty and animal welfare violations at farms and slaughterhouses. The Australian federal court recently ruled that they were violating copyright laws by publishing undercover footage (because the slaughterhouse owns the copyright to undercover footage shot there)
15:40:08	<gamer191-1\|m>	I DMed them yesterday suggesting that they should create a BitTorrent of the site, and they left me on “seen”. Not sure if we should try to contact them by email requesting a copy of their site, or if we should run an AB job (I don’t have the skills to run a job involving Vimeo)
15:59:37	<@arkiver>	gamer191-1\|m: do you have their site?
15:59:48	<@arkiver>	we should definitely make a copy and put their youtube (if any) in #down-the-tube
16:03:41		BornOn420 quits [Read error: Connection reset by peer]
16:04:11		BornOn420 (BornOn420) joins
16:20:03	<Vokun>	https://www.farmtransparency.org
16:26:51	<cruller>	They have 1258 videos in their repository (https://www.farmtransparency.org/), but only 233 on their Vimeo channel (https://vimeo.com/farmtransparency) and 99 on their YouTube channel (https://www.youtube.com/c/farmtransparencyproject).
16:35:15	<cruller>	According to https://wiki.archiveteam.org/index.php/vimeo, downloading public videos on Vimeo requires a login. yt-dlp shows a similar message.
16:36:43	<cruller>	However, some cobalt instances generate MP4 links that don't require a login. AB should be able to grab them.
16:38:02	<cruller>	(I don't know the mechanism of link generation.)
16:42:09		pabs quits [Ping timeout: 260 seconds]
16:43:16		pabs (pabs) joins
16:48:32	<cruller>	IIRC, embedded private Vimeo videos can only be viewed from the embedding page. Browser developer tools and yt-dlp --referer https://www.farmtransparency.org/campaigns/eggs-exposed https://player.vimeo.com/video/1075508246 fetch hls, but it is unclear whether MP4 URLs exist.
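
The yt-dlp invocation quoted above can be assembled programmatically so that the same referer is applied to a whole list of embeds. This is a minimal sketch, not a tested download recipe: it only builds the argument list, reuses the `--referer` flag and the two URLs exactly as they appear in the message, and the helper name `ytdlp_command` is hypothetical. Whether the resulting download yields HLS or MP4 is still the open question from the chat.

```python
import shlex

def ytdlp_command(embed_url, referer):
    """Build the argv for a yt-dlp call that sends the embedding page as referer."""
    # Only the flag quoted in the chat is used; no other options are assumed.
    return ["yt-dlp", "--referer", referer, embed_url]

cmd = ytdlp_command(
    "https://player.vimeo.com/video/1075508246",
    "https://www.farmtransparency.org/campaigns/eggs-exposed",
)
# Print a shell-safe version; pass `cmd` to subprocess.run to actually execute it.
print(shlex.join(cmd))
```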
16:49:53	<justauser\|m>	Yeah, encouraging mirrors without providing a good way to do this is so... humanish? bureaucracish?
16:50:34		ThreeHM quits [Ping timeout: 240 seconds]
16:52:40		ThreeHM (ThreeHeadedMonkey) joins
16:53:38	<cruller>	I don't understand why they make so many Vimeo videos "private".
17:04:51		cyanbox quits [Read error: Connection reset by peer]
17:05:42	<cruller>	<cruller> "They have 1258 videos in their..." <- https://www.youtube.com/c/farmtransparencyproject
17:14:43		MrMcNuggets quits [Client Quit]
17:24:37	<h2ibot>	HadeanEon edited Deaths in 2025 (+689, BOT - Updating page: {{saved}} (149),…): https://wiki.archiveteam.org/?diff=56904&oldid=56873
17:24:38	<h2ibot>	HadeanEon edited Deaths in 2025/list (+54, BOT - Updating list): https://wiki.archiveteam.org/?diff=56905&oldid=56874
17:34:33		ducky quits [Remote host closed the connection]
17:36:45		ducky (ducky) joins
17:38:33		Webuser278280 joins
17:42:49		Webuser278280 quits [Client Quit]
17:43:04		ducky quits [Read error: Connection reset by peer]
17:45:11		ducky (ducky) joins
17:47:42		notSokar joins
17:49:49		Sokar quits [Ping timeout: 260 seconds]
17:57:12		ducky quits [Remote host closed the connection]
17:57:45		ducky (ducky) joins
18:22:12		awauwa quits [Quit: awauwa]
18:26:04		dhinakg (dhinakg) joins
18:29:12		ducky_ (ducky) joins
18:31:14		ducky quits [Ping timeout: 260 seconds]
18:31:14		ducky_ is now known as ducky
18:52:48		Barto (Barto) joins
18:59:32		DogsRNice joins
19:00:14		TheEnbyperor_ quits [Ping timeout: 240 seconds]
19:00:24		TheEnbyperor quits [Ping timeout: 260 seconds]
19:06:51		HP_Archivist quits [Quit: Leaving]
19:08:09		IDK quits [Quit: Connection closed for inactivity]
19:22:10		ducky quits [Remote host closed the connection]
19:24:12		ducky (ducky) joins
19:36:11		TheEnbyperor (TheEnbyperor) joins
19:43:34		TheEnbyperor quits [Ping timeout: 260 seconds]
19:58:44	<Ryz>	Heya folks, I have this subdomain https://audiothek.dasdeck.com/ - that I found, and I was about to archive it weeks ago, since it's from the other subdomains under https://dasdeck.com/ - but stopped because it seemed it was getting video files or audio files from somewhere, but the video files are converted into audio files? Can anyone figure out
19:58:44	<Ryz>	where these files came from? Sadly I can't really archive this (even if assuming there's no funky JS stuff blockading from archiving), just needing my curiosity satiated~
20:03:00	<pokechu22>	clicking one I see it loads https://audiothek.dasdeck.com/?url=https://rodlzdf-a.akamaihd.net/none/zdf/22/04/220415_1720_sendung_trs/5/220415_1720_sendung_trs_a3a4_808k_p11v17.mp4&title=%5Bzdf%2014.08.2025%5D%20bali%20(s24_e03)(deu-ad)
20:03:24	<pokechu22>	and it's listed at https://mediathekviewweb.de/api/query?query=%7B%22queries%22%3A%5B%5D%2C%22sortBy%22%3A%22timestamp%22%2C%22sortOrder%22%3A%22desc%22%2C%22future%22%3Afalse%2C%22size%22%3A30%2C%22offset%22%3A0%7D
20:04:02	<pokechu22>	uh, "Results 1 to 30 of 694178." (from "Treffer 1 bis 30 von insgesamt 694178.") though, which seems probably too big?
20:05:42		ducky quits [Remote host closed the connection]
20:07:19		ducky (ducky) joins
20:08:50	<@JAA>	Sounds like it's an alternative interface to the audio parts of the German public broadcasters' Mediatheken, i.e. https://www.ardaudiothek.de/ and whatever the ZDF equivalent is.
20:08:54	<@JAA>	That's bound to be big.
20:11:26		TheEnbyperor joins
20:12:36	<Ryz>	Am a bit confused and tried poking around the files before or shortly after typing it up, and wasn't sure if this is legit or not, since I'm not strongly familiar diving into websites other than English in terms of finding goodies
20:14:52	<@JAA>	Oh yeah, it goes beyond that and takes anything from the Mediatheken and extracts just the audio track. So it's even bigger than just the specific audio releases...
20:15:17	<pokechu22>	Those URLs appeared in the browser console (F12, reload the page after opening it). We'd need to generate a list of URLs and then do an !ao < list; it's too scripty for !a
20:16:02	<pokechu22>	it also seems like the server itself is doing work when it's extracting audio - I don't think they'd be happy with us bulk-requesting that
20:16:22	<@JAA>	The Mediatheken basically contain every TV production by any of the public broadcasters in Germany. You can watch them for free for like a month (often geolocked to Germany).
20:16:41	<@JAA>	Yeah, I don't think we need to archive audio streams via a third-party service anyway.
20:17:38	<pokechu22>	!a https://elitemeetus.org/ -i blogs,badvideos -e Proactive
20:17:57	<pokechu22>	I guess I might as well generate a list of those API URLs though
20:19:22	<pokechu22>	oh, https://mediathekviewweb.de/api/query?query={%22queries%22%3A[]%2C%22sortBy%22%3A%22timestamp%22%2C%22sortOrder%22%3A%22desc%22%2C%22future%22%3Afalse%2C%22size%22%3A30%2C%22offset%22%3A3694170} doesn't work - elasticsearch only wants to expose the first 10000 results
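
A list of API URLs that stays inside that 10000-result window could be generated roughly as follows. This is a sketch under assumptions: the query fields are copied from the URL quoted earlier in the log, the cap is treated as "offset + size must not exceed 10000" (Elasticsearch's default result window, which this API may or may not enforce in exactly that form), and the function name `query_urls` is made up for illustration.

```python
import json
from urllib.parse import quote

API = "https://mediathekviewweb.de/api/query"
MAX_RESULTS = 10000  # window mentioned in the chat; assumed to be a hard cap

def query_urls(page_size=30, limit=MAX_RESULTS):
    """Return one API URL per page, keeping offset + size within the limit."""
    urls = []
    for offset in range(0, limit - page_size + 1, page_size):
        query = {
            "queries": [],
            "sortBy": "timestamp",
            "sortOrder": "desc",
            "future": False,
            "size": page_size,
            "offset": offset,
        }
        # Compact JSON, fully percent-encoded, as in the URL seen in the log.
        encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
        urls.append(API + "?query=" + encoded)
    return urls

urls = query_urls()
```

Writing `urls` one per line to a file would give a list suitable for the `!ao < list` approach mentioned above. Reaching results beyond the window would need a different sort order or narrower queries, which this sketch does not attempt.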
20:21:57	<@JAA>	Yeah, the API would be reasonable.
20:22:10	<@JAA>	That's the official API, too, I think.
20:22:53	<@JAA>	Hmm, no
20:23:28		notarobot17 quits [Quit: Ping timeout (120 seconds)]
20:23:41		notarobot17 joins
20:25:45	<@JAA>	Mixing it up with something else, nevermind.
20:25:53		TheEnbyperor_ (TheEnbyperor) joins
20:29:10	<h2ibot>	Debug32 edited List of lost online videos/list (+929): https://wiki.archiveteam.org/?diff=56906&oldid=53968
20:40:58		Dada quits [Remote host closed the connection]
20:45:47		APOLLO03 quits [Quit: .]
20:47:41		cuphead2527480 (Cuphead2527480) joins
20:50:09		cuphead2527480 is now known as CuppyMan
20:55:09		notSokar quits [Quit: Leaving]
20:55:21		Sokar joins
20:58:57		APOLLO03 joins
21:21:19		dabs joins
21:41:07		abirkill quits [Quit: Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.]
21:58:12		ericgallager quits [Quit: This computer has gone to sleep]
21:59:58		dabs quits [Read error: Connection reset by peer]
22:16:52		atphoenix__ (atphoenix) joins
22:18:54		atphoenix_ quits [Ping timeout: 240 seconds]
23:12:35		ericgallager joins
23:13:56	<gamer191-1\|m>	koichi: The situation with Vimeo (I was discussing this with one of the yt-dlp developers, for unrelated reasons) is that you can no longer generate unauthenticated guest tokens because they’ve hardened their api. However, if you have a cached guest token (I have one, which I’m willing to share if needed) then you can continue using it. Also, Vimeo embeds can be downloaded without an account (subject to heavy rate-limiting), but
23:13:56	<gamer191-1\|m>	the embed url usually isn’t guessable and often requires a referrer for a website it’s embedded on (https://www.farmtransparency.org, I guess)
23:15:54	<pokechu22>	I assume a recursive AB crawl of https://www.farmtransparency.org would generate embed URLs
23:17:00	<pokechu22>	hmm, it looks like https://archive.fart.website/archivebot/viewer/job/202103290041478ojfw had videos ignored?
23:21:43	<pokechu22>	those might have been done in https://archive.fart.website/archivebot/viewer/job/20210329010802akni3 instead
23:26:30	<gamer191-1\|m>	“I assume a recursive AB crawl of https://www.farmtransparency.org would generate embed URLs”
23:26:30	<gamer191-1\|m>	Yeah, although like I said that’s very heavily rate-limited, so we’d need to switch IP addresses every few videos (once an IP address is rate-limited, it will start getting Cloudflare turnstile captchas in Vimeo embeds, and idk how long that will last)
23:26:30	<gamer191-1\|m>	Also idk if it would give us m3u8, dash or https links (I can’t check right now cause I’m on my phone) and it would require the curl_cffi dependency (I assume that’s installed on AB)
23:29:35	<pokechu22>	I was more thinking we ignore the embeds in archivebot, but use the recursive crawl to generate a list of URLs like https://player.vimeo.com/video/1103632636?h=2a9984e565 from https://www.farmtransparency.org/videos?id=l5b22836ku
23:30:20	<pokechu22>	the videos themselves could be downloaded outside of archivebot
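
Pulling the embed URLs out of crawled pages, as proposed above, might look something like this. It is a sketch only: the regular expression is modelled on the single example URL in the log (numeric ID plus an optional hexadecimal `h` parameter), the sample markup is invented, and real pages may embed the player differently.

```python
import re

# Pattern modelled on https://player.vimeo.com/video/1103632636?h=2a9984e565
EMBED_RE = re.compile(r"https://player\.vimeo\.com/video/\d+(?:\?h=[0-9a-f]+)?")

def find_embeds(html):
    """Return the distinct Vimeo player URLs in html, in order of appearance."""
    found = []
    for url in EMBED_RE.findall(html):
        if url not in found:
            found.append(url)
    return found

# Hypothetical page fragment, for illustration only.
sample = (
    '<iframe src="https://player.vimeo.com/video/1103632636?h=2a9984e565"></iframe>'
    '<a href="https://player.vimeo.com/video/1103632636?h=2a9984e565">watch</a>'
    '<iframe src="https://player.vimeo.com/video/1075508246"></iframe>'
)
embeds = find_embeds(sample)
```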
23:31:55	<gamer191-1\|m>	pokechu22: Hang on…can we just use the download button on the video pages
23:32:55	<pokechu22>	hmm, that seems to download it from vimeo actually
23:33:18	<pokechu22>	... but how does it choose which resolution to download?
23:33:36	<pokechu22>	just going to https://www.farmtransparency.org/videos?id=l5b22836ku&action=download fails
23:34:30	<gamer191-1\|m>	I guess it uses a post request (I’m on my phone right now so I can’t check)
23:35:20	<pokechu22>	ah, it's a POST to that with a CSRF token (in firefox, navigate to https://www.farmtransparency.org/videos?id=l5b22836ku, then press alt, then select file -> work offline, then click one of the download links which should open https://www.farmtransparency.org/videos?id=l5b22836ku&action=download in a new tab, then go to that tab, then press alt, then uncheck file -> work
23:35:21	<pokechu22>	offline, then press f12 for dev tools, then refresh the page, then confirm that you're willing to refresh a POST request. it should then appear in devtools.)
23:35:40		Wohlstand quits [Quit: Wohlstand]
23:38:25	<pokechu22>	hmm, the CSRF token doesn't change per pageload but does seem to be tied to a cookie of some sort
23:38:51	<pokechu22>	either way, probably best to just use an archivebot job to enumerate videos and other media without downloading it, and then do something with the videos separately afterwards
23:44:41	<pokechu22>	ugh, pagination requires loading https://www.farmtransparency.org/scripts/asset-display?p=3&asset_types=videos with the header X-Requested-With: XMLHttpRequest
23:48:34	<gamer191-1\|m>	“either way, probably best to just use an archivebot job to enumerate videos and other media without downloading it, and then do something with the videos separately afterwards” Agreed!
23:48:34	<gamer191-1\|m>	Should the AB job also enumerate their photos, documents, and campaign material? (I don’t know if any of those 3 categories are easy to archive)
23:49:22	<@JAA>	'Please mirror our stuff, but also, we'll make it as hard as we can.'
23:49:23	<@JAA>	Lovely
23:50:44	<gamer191-1\|m>	Should we contact them?
23:50:57	<pokechu22>	seems like curl 'https://www.farmtransparency.org/scripts/asset-display?p=3&asset_types=videos' -H 'X-Requested-With: XMLHttpRequest' works so I can enumerate things that way
23:51:33	<pokechu22>	I don't think a recursive AB job would find everything on its own
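
The curl command above translates fairly directly into a small enumeration helper. This is a sketch, assuming the `p` and `asset_types` parameters behave as in the quoted URL; the function names are hypothetical, the number of pages is not known from the log, and the response format would need to be inspected before parsing it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://www.farmtransparency.org/scripts/asset-display"

def page_request(page, asset_types="videos"):
    """Build the request for one listing page, with the header the site expects."""
    url = BASE + "?" + urlencode({"p": page, "asset_types": asset_types})
    return Request(url, headers={"X-Requested-With": "XMLHttpRequest"})

def fetch_page(page, asset_types="videos"):
    """Fetch one listing page and return its body as text (performs a network call)."""
    with urlopen(page_request(page, asset_types)) as resp:
        return resp.read().decode("utf-8", errors="replace")

req = page_request(3)
```

Calling `fetch_page` in a loop with a polite delay, and stopping at the first empty listing, would be one way to walk the pages; the stopping condition is a guess that needs checking against real responses.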
23:57:34		andrew quits [Ping timeout: 240 seconds]