#archiveteam-bs log for 2023-11-12

00:02:44	<wyatt8740>	Well, here I am. :p
00:02:50	<wyatt8740>	https://jp.mercari.com/item/m40357548342 is the page i'm trying to grab
00:03:27	<wyatt8740>	It's a react.js page
00:03:55	<@JAA>	wyatt8740: ArchiveBot doesn't know JS at all. wpull tries to find URLs in <script> blocks, but that's very unreliable. Anything beyond that simply won't be covered by it.
00:04:16	<wyatt8740>	alright. so since it's react, probably a non-starter. Had a bad feeling that'd be the case.
00:04:34	<wyatt8740>	(I love the modern web.)
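To illustrate why finding URLs in <script> blocks is unreliable, here is a minimal sketch of that kind of scan. It is not wpull's actual implementation; the class, the function, the regular expression, and the example.invalid page are all made up for illustration. The point is only that a URL assembled at runtime never appears in the page text, so a text scan can at best find its fixed prefix.

```python
import re
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


# Hypothetical pattern: absolute http(s) URLs up to a quote, bracket or space.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def urls_in_scripts(html):
    """Return absolute URLs that literally appear in script text."""
    parser = ScriptText()
    parser.feed(html)
    parser.close()
    return URL_PATTERN.findall("".join(parser.chunks))


# Hypothetical page: the item URL is built at runtime from base + itemId.
page = (
    '<html><body><div id="root"></div>'
    '<script>var base = "https://example.invalid/api/items/"; '
    "fetch(base + itemId);</script></body></html>"
)
found = urls_in_scripts(page)
print(found)
```

The scan returns only the base string; the request the browser would really make (base plus an item ID) is never discovered, which is roughly the situation with a React page.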
00:05:30	<@JAA>	WARC is capable of many things, but far from everything. HTTP/2 and HTTP/3 are right out. WebSockets, too. If you can achieve the page with HTTP/1.1 requests only, then WARC would work.
00:05:51	<wyatt8740>	any suggestions for how best to do this, then? Just have a screenshot in my article, links to self-hosted copies of images, and let wayback machine/archivebot crawl my site?
00:06:02	<@JAA>	Playback is a whole different topic. POST requests in particular are hard, as is anything involving random variables (JSONP, timestamps as cache busters, etc.).
00:06:07	<wyatt8740>	my blog should be fine for that :)
00:06:08	<wyatt8740>	https://wyatt8740.gitlab.io/site/blog/011_012.html#pc9801-1
00:06:55	<thuban>	and here's a list of 190 blogs extracted from other sources i had lying around (deduped from previous): https://transfer.archivete.am/2urPt/blogspot_blogs_2.txt
00:06:55	<wyatt8740>	yeah i'd looked into WARC format before and quickly got confused because I didn't grok that it was actually transcribing the full HTTP transaction
00:06:58	<thuban>	(these aren't really filtered for significance--a lot of them are from when we were trawling for zippyshare links--but if we end up doing horizontal discovery, more seeds don't hurt, right?)
00:07:37	<wyatt8740>	*transactions
00:08:08	<@JAA>	Getting playback right in the general case is virtually impossible. There are too many things that can influence what exactly is displayed etc., e.g. screen size, browser version, datetime, time zone, you name it.
00:08:35	<wyatt8740>	i do like what i imagine the archive.is approach is as a supplement to WARC
00:08:43	<@JAA>	The WBM does a lot of tricks and manages to work around some of those, but ... yeah.
00:08:55	<pabs>	I think SPN2 is the main thing that does JavaScript. either submit to the form on web.archive.org/save/ or send a mail with links to savepagenow@archive.org
00:08:57	<@JAA>	What you can do is use SPN as a logged-in user and make it also create a screenshot.
00:09:21	<wyatt8740>	yeah i did the /save/ thing and got the near-empty file
00:09:22	<@JAA>	Probably the data itself is captured, but the playback doesn't work. SPN uses a browser under the hood.
00:09:28	<pabs>	ah, you're talking about saving the DOM to HTML?
00:09:44	<wyatt8740>	that would be one method, sure.
00:09:59	<pabs>	archive.is is the only thing I know of that does the DOM2HTML thing
00:10:10	<wyatt8740>	i mostly want things i link in my blog externally to be archived/findable in future
00:10:14	<pabs>	for having a public archive at least
00:10:16	<wyatt8740>	since it drives me nuts when people don't
00:10:51	<@JAA>	I don't see a screenshot on the WBM for https://jp.mercari.com/item/m40357548342
00:11:13	<wyatt8740>	nothing at https://web.archive.org/web/20231111234533id_/https://jp.mercari.com/item/m40357548342 ?
00:11:19	<@JAA>	Screenshot, not snapshot
00:11:22	<wyatt8740>	ahh ok
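For reference, the snapshot URL above follows the usual Wayback Machine pattern: a 14-digit timestamp, an optional `id_` modifier that asks for the capture as stored (without the Wayback toolbar and link rewriting), then the original URL. A small sketch of building such URLs, where the helper name `wayback_url` is made up for illustration:

```python
def wayback_url(timestamp: str, original_url: str, raw: bool = False) -> str:
    """Build a Wayback Machine URL for a capture.

    timestamp is the YYYYMMDDhhmmss capture time; raw=True appends the
    id_ modifier to request the unmodified capture.
    """
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("expected a 14-digit YYYYMMDDhhmmss timestamp")
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original_url}"


snapshot = wayback_url(
    "20231111234533", "https://jp.mercari.com/item/m40357548342", raw=True
)
print(snapshot)
```

This only constructs the address; whether a capture exists at that timestamp, and whether it plays back, is a separate matter.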
00:11:34	<@JAA>	There's an option for logged-in users on SPN to also capture a screenshot.
00:11:48	<@JAA>	That happens from the browser doing the archival, so it should be reasonably usable.
00:12:00	<wyatt8740>	i guess i was using a diff. browser than normal
00:12:06	<wyatt8740>	let me go back to the one i'm logged in on :\
00:12:46	<@JAA>	Can't be resaved currently due to the cooldown timer, but should work again in half an hour or so (not sure what the current limit is).
00:13:00	<wyatt8740>	yeah, discovered that.
00:13:33	<@JAA>	It'll still only be a screenshot, so no Ctrl+F, no copying, etc.
00:13:44	<wyatt8740>	yeah
00:13:47	<wyatt8740>	better than nothing
00:14:11	<@JAA>	DOM dump as a static page would be nice. Then again, that method also has its limitations, as can frequently be seen on archive.$tldoftheday.
00:14:31	<@JAA>	Anything requiring scripting on the page, e.g. expanding sections or whatever, won't work.
00:14:33	<wyatt8740>	https://archive.is/HUTy2
00:14:49	<wyatt8740>	thankfully the side image thumbnails seem to be the full size images shrunk in CSS/JS
00:14:58	<wyatt8740>	so they're actually saved
00:15:50	<@JAA>	They're in the WBM, too, e.g. https://web.archive.org/web/20231111234535/https://static.mercdn.net/item/detail/orig/photos/m40357548342_5.jpg?1695524270
00:16:03	<wyatt8740>	hmm. well, that's good at least.
00:16:43	<@JAA>	As I said, probably captured alright, just doesn't play back, which makes it fairly useless currently. :-/
00:17:03	<wyatt8740>	The state of modern web dev; I love it.
00:17:42	<@JAA>	Aye
00:18:03	<@JAA>	And it'll only get worse. Hooray.
00:18:10	<wyatt8740>	I love the future.
00:18:20	<wyatt8740>	And I especially love facebook
00:18:20	<@JAA>	I've seen a site before that did all content loading with a WebSocket.
00:18:39	<wyatt8740>	That's... like doing rtmp grabs in a SWF or something, as far as archival is concerned
00:18:59	<wyatt8740>	actually what you describe reminds me a lot of flash-based sites
00:19:09	<@JAA>	Yeah, pretty much.
00:22:51	<h2ibot>	Switchnode edited Deathwatch (+4, /* 2023 */ fix syntax): https://wiki.archiveteam.org/?diff=51131&oldid=51126
00:23:13	<@JAA>	Whoops, thanks.
00:31:15	<pabs>	/cc arkiver re having SPN2 get an option to save the DOM to HTML, similar to how it has the screenshot thing
00:40:11		ScenarioPlanet quits [Ping timeout: 272 seconds]
01:06:47		katocala quits [Ping timeout: 272 seconds]
01:07:35		katocala joins
01:13:01		katocala is now authenticated as katocala
01:34:34	<tomodachi94>	@Pedrosso:hackint.org @JAA:hackint.org thank you for grabbing Fextralife's wikis, I appreciate it! ❤️
01:35:13	<Pedrosso>	<3
01:36:38	<Pedrosso>	You were right about it being a gold-mine. So satisfying.
01:37:06		useretail_ joins
01:40:21		useretail__ quits [Ping timeout: 272 seconds]
01:44:08		useretail__ joins
01:46:45		useretail_ quits [Ping timeout: 265 seconds]
01:56:11		lennier2 quits [Ping timeout: 272 seconds]
01:58:40		lennier2 joins
02:21:06		BearFortress joins
02:26:45	<Pedrosso>	I know of a website https://svtplay.se (videos cannot be archived, most of it is locked behind a region specific wall too) that is often the only source to a specific media and they're often deleted on grounds of copyright or other rights. There are -dl scripts for it. I am concerned about archival though. It definitely needs archival since
02:26:46	<Pedrosso>	otherwise a lot of media is continually lost, however I cannot hold it and it's clear it cannot just be submitted publicly. What would be advised here?
02:27:02	<Pedrosso>	(videos cannot be archived via save-page or a web save afaik)*
02:28:09	<Flashfire42>	Maybe tubeup but use it VERY SPARINGLY because there is a lot of garbage people upload using it and it can cause a lot of space usage for IA
02:28:32	<Pedrosso>	tubeup?
02:29:06	<Pedrosso>	I'm not entirely sure if you understand what I'm asking about
02:30:43	<Pedrosso>	to reclarify, there are -dl scripts to get the videos. ( https://github.com/spaam/svtplay-dl ). My problem is more legal and ethical
02:33:11	<Pedrosso>	It's a general question but if specifics are required, it's about storage.
02:57:25	<h2ibot>	Tech234a edited YouTube (+302, /* Stories */ Discontinued): https://wiki.archiveteam.org/?diff=51132&oldid=50877
03:01:26	<h2ibot>	Tech234a edited YouTube (+15, /* Playlist notes (October 2020) */ Add…): https://wiki.archiveteam.org/?diff=51133&oldid=51132
03:43:23	<Pedrosso>	(I feel locked-out from asking any other questions by having this one here lol) Is there no like, go-to process in situations like this?
03:44:50	<pokechu22>	I would say in practice we usually lean towards archiving something if it's useful to have - but it also does depend on the total size
03:46:17	<Pedrosso>	The point of my dilemma is that since when items are removed it's because the rights run out, it's innately and obviously an item not using Creative Commons
03:51:23	<Pedrosso>	For context, videos are up for free and not all videos are deleted
03:53:26	<Pedrosso>	and with "videos" I mean movies/films, series, news, documentaries, tv channels, etc. Which I believe is in what's counted as useful to have
04:02:21		lennier2_ joins
04:04:35		Island_ joins
04:04:40	<@JAA>	Pedrosso: Just so I understand what we're talking about: this is a legitimate site, right? Based on the name, I assume it's the TV broadcaster's digital platform, where they make their and licensed content available for a limited time?
04:04:58		Pedrosso47 joins
04:05:28		lennier2 quits [Ping timeout: 265 seconds]
04:05:32	<@JAA>	I'll assume that you missed that message.
04:05:37	<@JAA>	Pedrosso47: Just so I understand what we're talking about: this is a legitimate site, right? Based on the name, I assume it's the TV broadcaster's digital platform, where they make their and licensed content available for a limited time?
04:06:07	<Pedrosso47>	Oh yes, indeed.
04:07:01	<@JAA>	Virtually everything we archive is copyrighted content. That's not really a factor at play here. It's how intellectual property works, for better or for worse. There are exceptions in many jurisdictions that free you from having to follow copyright restrictions when it's done for preservation purposes, which would probably apply here.
04:07:02		Pedrosso quits [Ping timeout: 243 seconds]
04:08:52		Island quits [Ping timeout: 265 seconds]
04:10:00	<Pedrosso47>	That's very nice to know, however whenever I try to look up information on the internet archive they seem adamant about not posting non-creative commons. Though I may have gotten the wrong impression
04:11:01		Pedrosso47 is now known as Pedrosso
04:11:01	<pokechu22>	Where'd you see that?
04:12:03		BlueMaxima quits [Read error: Connection reset by peer]
04:12:42	<@JAA>	They probably say something along those lines to discourage people from uploading stuff that's already widely shared and won't get lost anytime soon (e.g. latest Hollywood productions). There's likely also a 'we have to say that so we don't get in trouble' angle to it. Nevertheless, IA does have the legal right to store such content. They might not be able to make it publicly available until the
04:12:48	<@JAA>	copyright expires in a few hundred years.
04:13:26	<@JAA>	So for an individual uploader, that's the policy they probably want, more or less.
04:13:35	<Pedrosso>	Great, great.
04:14:04	<thuban>	yeah, in practice ia 'darks' items (makes them inaccessible) in response to dmca claims; while accumulating a lot of reports or flagrantly pirating popular content can get you b&, they're pretty relaxed about good-faith uploads. if it's niche or abandoned enough not to get reported in the first place, it's basically fine
04:14:21	<@JAA>	That doesn't mean they might not be interested in something like this. It'd be all about size and logistics. How much data is it, and do they just need to provide storage for it or does it involve them doing work.
04:15:35	<@JAA>	Talking to them is important for things like this. Either directly or through arkiver, for example. If they want to take the data, and they already know what this is about, future takedowns etc. won't be as problematic.
04:17:11	<@JAA>	Archiving these official platforms by major broadcasters has been on my wishlist for a while. It's a lot of work though, especially at scale (i.e. many countries etc.).
04:18:13	<Pedrosso>	I see ~~But I'm shy~~ As for the major broadcasters though; svt.se is a "parent" website with loads of news articles all over the country. I'd believe it's quite large
04:19:04	<Pedrosso>	as in, https://www.svt.se/
04:20:32	<@JAA>	We probably archive a fair bit of that through #//. These audio and video platforms can virtually never be archived properly like that though and need special stuff.
04:21:26	<Pedrosso>	does #// get that through outlinks or are you saying a lot of it is manually added?
04:22:22		balrog quits [Quit: Bye]
04:22:32	<@JAA>	There are things we grab regularly. At least one of those lists is news outlets sourced from Wikidata. I'd expect svt.se to be there, though I didn't check.
04:22:45	<@JAA>	For those, we regularly grab the homepage and links from it, or something along those lines.
04:23:28	<pokechu22>	Yeah, there's https://www.wikidata.org/wiki/Q215363 (and also https://www.wikidata.org/wiki/Q10686370 for some reason?)
04:24:08	<Pedrosso>	How do you search on IA for URLs in a domain archived by WikiTeam?
04:25:09	<@JAA>	pokechu22: One is the company, the other is their website. But also, yes, naturally it's in Wikidata, but I'm not sure whether it made it into the list of news outlets since that was filtered by probably the 'instance of' value and I don't remember which possible values were accepted there.
04:25:59	<Pedrosso>	would that be an extensive list of outlinks or simply a selection?
04:26:24	<Pedrosso>	assuming it is in the list of news outlets
04:26:47	<@JAA>	It's in 43200_wikidata_Q11033_mass-media.wikidata.txt
04:26:57	<@JAA>	Which should mean it gets grabbed every 12 hours.
04:27:15	<@JAA>	But the GitHub repo is outdated, so...
04:27:32	<@JAA>	https://github.com/ArchiveTeam/urls-sources if you want to poke around.
04:29:48	<@JAA>	archiveteam_urls doesn't show up on https://web.archive.org/web/collections/20230000000000*/https://www.svt.se/ though, odd.
04:30:46		balrog (balrog) joins
04:31:32	<@JAA>	Pedrosso: The idea is that we fetch the homepage every N hours and then queue back any links found on it. If they were already captured, that gets filtered out. New links make it through and get archived.
04:32:01	<Pedrosso>	Ahh, I get the concept
04:32:11	<Pedrosso>	because of frontpage stuff
04:32:14	<@JAA>	(Also, bringing up that missing stuff in #// directly.)
04:32:33	<Pedrosso>	(thx for the note)
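The fetch-and-requeue idea described above can be sketched roughly as follows. This is not the actual #// code: the function names are made up, the example links are hypothetical, and reading the leading number of the list filename as a refetch interval in seconds is an assumption based on 43200 s being 12 hours.

```python
import re


def interval_seconds(list_name: str) -> int:
    """Read the leading integer of a list filename as an interval in seconds.

    Assumed convention: '43200_...' means 'refetch every 43200 seconds'.
    """
    match = re.match(r"(\d+)_", list_name)
    if match is None:
        raise ValueError(f"no leading interval in {list_name!r}")
    return int(match.group(1))


def new_links(found_links, already_captured):
    """Return links not captured before, in first-seen order.

    already_captured is updated in place, so the next round filters
    against everything seen so far.
    """
    fresh = []
    for url in found_links:
        if url not in already_captured:
            already_captured.add(url)
            fresh.append(url)
    return fresh


seen = set()
# Hypothetical links found on the homepage in two consecutive rounds.
first = new_links(["https://www.svt.se/a", "https://www.svt.se/b"], seen)
second = new_links(["https://www.svt.se/b", "https://www.svt.se/c"], seen)
hours = interval_seconds("43200_wikidata_Q11033_mass-media.wikidata.txt") / 3600
print(first, second, hours)
```

In the second round only the link that was not seen before makes it through, which is why regularly refetching a front page mostly picks up new articles.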
04:38:04		DogsRNice quits [Read error: Connection reset by peer]
04:51:20		mossssss90 quits [Remote host closed the connection]
04:54:57	<Pedrosso>	A list of big websites that I have been debating on sharing here. I suppose even if they're too big & not useful enough to archive there's no harm in sharing https://transfer.archivete.am/13l4Ga/list.txt
05:01:57		pabs wonders if legit .tk domains need to get grabbed https://www.technologyreview.com/2023/11/02/1082798/tiny-pacific-island-global-capital-cybercrime/
05:02:12	<pabs>	tcl.tk for eg :)
05:15:33	<pokechu22>	There's an archivebot job for https://www.legislation.gov.uk/ but it turns out the UK has a lot of law (and also that site's banned us as of a bit under a month ago :|)
05:19:01	<pabs>	perhaps needs a distributed project?
05:33:17		nicolas17_ is now known as nicolas17
05:46:26		nick joins
05:46:46		nick quits [Remote host closed the connection]
06:23:38		nicolas17 quits [Client Quit]
06:26:27		Pedrosso quits [Remote host closed the connection]
06:31:33		useretail_ joins
06:31:37		useretail__ quits [Remote host closed the connection]
06:31:37		Arcorann quits [Remote host closed the connection]
06:37:25		Arcorann (Arcorann) joins
06:58:17		dumbgoy quits [Ping timeout: 272 seconds]
07:31:13		itachi1706 quits [Ping timeout: 272 seconds]
07:33:39		itachi1706 (itachi1706) joins
07:33:55		Ruthalas59 quits [Read error: Connection reset by peer]
07:34:12		Ruthalas59 (Ruthalas) joins
07:41:55		hitgrr8 joins
08:09:44		Perk quits [Client Quit]
08:10:01		Perk joins
08:24:13		Wohlstand (Wohlstand) joins
08:54:13		Wohlstand quits [Client Quit]
09:05:55		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
09:40:51		Island_ quits [Read error: Connection reset by peer]
10:00:03		Bleo1 quits [Client Quit]
10:01:23		Bleo1 joins
10:28:32		BearFortress quits [Read error: Connection reset by peer]
10:29:06		BearFortress joins
11:36:28		Arcorann quits [Remote host closed the connection]
11:37:09		icedice quits [Client Quit]
11:42:29		Arcorann (Arcorann) joins
11:47:34		icedice (icedice) joins
12:05:06		icedice quits [Client Quit]
12:15:05		Arcorann quits [Ping timeout: 265 seconds]
12:50:52		Barto quits [Read error: Connection reset by peer]
12:59:23		Barto (Barto) joins
13:34:46	<h2ibot>	Bzc6p edited Fextralife (+0, fix banner link): https://wiki.archiveteam.org/?diff=51134&oldid=51129
13:57:57		mossssss joins
14:07:01	<@arkiver>	hi
14:07:10	<@arkiver>	so google is doing stuff
14:08:33	<@arkiver>	pabs: yeah we perform discovery while archiving of blogger
14:09:53	<h2ibot>	0KepOnline edited Spore (+40, Add OLDEST view type): https://wiki.archiveteam.org/?diff=51135&oldid=51112
14:20:50		mossssss quits [Remote host closed the connection]
14:22:25		mossssss joins
14:22:56	<mossssss>	wait sorry - it disconnected (i think my internet is just bad lol), arkiver what is google doing?
14:34:12		mossssss quits [Remote host closed the connection]
14:35:57		mossssss joins
14:55:02		kiryu quits [Remote host closed the connection]
14:56:31		kiryu joins
14:56:31		kiryu is now authenticated as kiryu
14:56:31		kiryu quits [Changing host]
14:56:31		kiryu (kiryu) joins
14:59:46		mossssss quits [Remote host closed the connection]
15:08:04		etnguyen03 (etnguyen03) joins
15:08:08		RJHacker9147 joins
15:09:01		RJHacker9147 is now known as redlattice
15:12:23		toss (toss) joins
15:13:54		toss quits [Client Quit]
15:21:46		dumbgoy joins
15:32:31		dumbgoy_ joins
15:35:40		dumbgoy quits [Ping timeout: 265 seconds]
16:09:34		redlattice quits [Client Quit]
16:25:51		sec^nd quits [Ping timeout: 245 seconds]
16:26:43		sec^nd (second) joins
16:43:08		mossssss joins
16:45:41		Megame (Megame) joins
16:48:36		mossssss quits [Remote host closed the connection]
16:48:40		dumbgoy__ joins
16:48:46		BearFortress_ joins
16:49:09		mossssss joins
16:50:35		katocala quits [Ping timeout: 240 seconds]
16:50:51		katocala joins
16:51:33		dumbgoy_ quits [Ping timeout: 265 seconds]
16:54:12		BearFortress quits [Ping timeout: 265 seconds]
17:01:14		BearFortress joins
17:04:32		marto_ quits [Quit: zzzzz]
17:04:36		BearFortress_ quits [Ping timeout: 265 seconds]
17:05:22		marto_ (marto_) joins
17:11:21		bilboed quits [Ping timeout: 272 seconds]
17:12:49		etnguyen03 quits [Ping timeout: 265 seconds]
17:13:47		bilboed joins
17:19:17		katocala is now authenticated as katocala
17:21:22		Pedrosso joins
17:25:37	<Pedrosso>	https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions#:~:text=I%20saved/archived,for%20hosting%20archives! as per this, if I use the given tools to create archives of svtplay.se would the process then be to have someone here review the files' integrity? Is there any nice naming scheme the IA items should have (and any other preferred
17:25:37	<Pedrosso>	fields & metadata for IA)?
17:45:45		treora quits [Quit: blub blub.]
17:46:59		treora joins
17:51:12	<fireonlive>	mossssss: https://hackint.logs.kiska.pw/archiveteam-bs just in case you get disconnected :3
17:51:35	<fireonlive>	mossssss: also if you leave webirc in a background tab, browsers suspend the tab which drops the connection
17:52:06	<mossssss>	oohhh that would make sense. ill keep it in another window to leave it up. also thank you!!!!!
17:52:13	<fireonlive>	welcome =]
17:52:27	<fireonlive>	you can also use a desktop IRC client if you wish such as hexchat
17:52:56	<fireonlive>	or quassel, there's a few out there
17:53:23	<mossssss>	ill have to look into that! my partner is a lot more well versed in this stuff haha so ill ask them
17:53:52	<mossssss>	(im the archiving nerd, they are the computer stuff (inc. irc) nerd)
17:55:40	<Pedrosso>	fireonlive: I keep forgetting how to get to those logs lol. Still do
17:55:45	<fireonlive>	:)
17:55:53		fireonlive hands Pedrosso a bookmark
17:56:17	<Pedrosso>	(How to do actions? "* text"?)
17:56:27	<katia>	/me pets a cat
17:56:59		Wohlstand (Wohlstand) joins
17:57:35		Pedrosso gladly receives said bookmark
17:57:38	<fireonlive>	:3
17:57:49		Pedrosso tests /me
18:01:26	<fireonlive>	katia: taking a look under the hood eh
18:01:43	<fireonlive>	:p
18:02:15	<katia>	👀
18:28:53	<Ryz>	arkiver, any updates on Blogger/Google stuff?
18:40:33		apache2 joins
18:40:47		apache2 quits [Client Quit]
18:41:10		apache2 joins
18:47:02		DogsRNice joins
18:53:48		etnguyen03 (etnguyen03) joins
18:53:57	<h2ibot>	JustAnotherArchivist edited List of websites excluded from the Wayback Machine/Partial exclusions (+907, Add Airbnb): https://wiki.archiveteam.org/?diff=51136&oldid=51122
19:28:09		Wohlstand quits [Remote host closed the connection]
19:28:17		Wohlstand (Wohlstand) joins
19:49:54		etnguyen03 quits [Ping timeout: 265 seconds]
19:50:14		etnguyen03 (etnguyen03) joins
19:54:43		Lord_Nightmare quits [Quit: ZNC - http://znc.in]
19:59:29		benjinsm joins
20:02:59		benjins quits [Ping timeout: 272 seconds]
20:04:04		benjinsmi joins
20:04:44		Lord_Nightmare (Lord_Nightmare) joins
20:05:35		benjinsmi is now known as benjins
20:05:37		benjins is now authenticated as benjins
20:08:03		benjinsm quits [Ping timeout: 272 seconds]
20:14:41		BlueMaxima joins
20:15:39		benjins quits [Remote host closed the connection]
20:15:52		benjins joins
20:18:45		redlattice joins
20:20:34		benjinsm joins
20:21:59		benjins quits [Ping timeout: 272 seconds]
20:23:54		redlattice quits [Client Quit]
20:26:16	<h2ibot>	Exorcism uploaded File:Fextralife-screenshot.png: https://wiki.archiveteam.org/?title=File%3AFextralife-screenshot.png
20:27:17	<h2ibot>	Exorcism edited Fextralife (+35): https://wiki.archiveteam.org/?diff=51138&oldid=51134
20:35:54		benjinsmi joins
20:39:05		benjinsm quits [Ping timeout: 272 seconds]
20:40:43		benjinsmi is now known as benjins
20:40:43		benjins is now authenticated as benjins
21:18:40	<vokunal|m>	erai-raws.info missed a payment on their ER-drive service, and lost their subscription.
21:19:04	<vokunal|m>	I have no idea what an ER-Drive is
21:20:41	<vokunal|m>	They've had issues with their paypal being banned before. I'm not sure if this is related
21:45:35		rohvani quits [Ping timeout: 272 seconds]
21:46:35	<thuban>	Pedrosso: your question is a little unclear to me. by "us[ing] the given tools to create archives of svtplay.se", do you mean using -dl scripts to get the video files, or using warc tools to create warcs?
21:46:44	<thuban>	(the latter would be difficult, because most warc tools won't work well with such a js-heavy site without substantial custom scripting. (also, even a perfect capture might or might not play back correctly in the wayback machine))
21:46:56	<thuban>	in either case, no, there isn't a process to "review the files' integrity"; there's no technical mechanism to do that (tls isn't designed that way), so the internet archive basically operates on trust. archiveteam, aiui, no longer adopts third-party data into the archiveteam collection--that faq entry is outdated and should be changed.
21:47:02	<thuban>	what JAA said earlier is right; if you want to do this at scale you should consider talking to ia about it first
21:47:10	<thuban>	that said, for general information about metadata consult https://archive.org/developers/metadata-schema/index.html and/or https://web.archive.org/web/20221001171424/https://archive.org/services/docs/api/metadata-schema/index.html (latter has file-level metadata documentation; i have no idea why it was removed)
21:49:34	<Pedrosso>	thuban: The answer to your first question is archiveteam's "grab-site" tool. As the -dl scripts would require some scripting to get working within a web format I'd imagine
21:50:51	<thuban>	yeah, i would be _very_ surprised if that worked
21:52:20	<Pedrosso>	I didn't mean a technical mechanism specifically, just any mechanism technical or otherwise. Sad to know there are none adopted anymore but I suppose it may be better to go straight to the top ~~still shy about that tho~~. It's a little annoying that the wiki is out of date with such things, but still nice to have the info. Thanks about the
21:52:20	<Pedrosso>	metadata-related links
21:53:40	<thuban>	sorry about that! i'll update the page if an op confirms the current policy.
22:04:56		Wohlstand quits [Remote host closed the connection]
22:05:14		Wohlstand (Wohlstand) joins
22:16:19		benjinsm joins
22:16:43		Perk5 joins
22:17:23		aninternettroll_ (aninternettroll) joins
22:17:27		shreyasminocha quits [Ping timeout: 250 seconds]
22:17:27		thehedgeh0g quits [Ping timeout: 250 seconds]
22:17:27		evan quits [Ping timeout: 250 seconds]
22:17:27		aninternettroll quits [Ping timeout: 250 seconds]
22:17:27		TheTechRobo quits [Client Quit]
22:17:27		Wohlstand quits [Remote host closed the connection]
22:17:27		benjins quits [Remote host closed the connection]
22:17:27		bilboed quits [Client Quit]
22:17:27		Perk quits [Client Quit]
22:17:27		Perk5 is now known as Perk
22:17:30		aninternettroll_ is now known as aninternettroll
22:17:33		Pedrosso quits [Remote host closed the connection]
22:17:33		mossssss quits [Remote host closed the connection]
22:17:34		bilboed joins
22:17:37		Wohlstand (Wohlstand) joins
22:18:00		TheTechRobo (TheTechRobo) joins
22:18:31		evan joins
22:18:33		sepro quits [Quit: Bye!]
22:18:41		shreyasminocha (shreyasminocha) joins
22:19:16		thehedgeh0g (mrHedgehog0) joins
22:19:53		benjinsm is now known as benjins
22:19:54		benjins is now authenticated as benjins
22:20:44		mossssss joins
22:26:00		qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:26:35		sepro (sepro) joins
22:27:43	<h2ibot>	JustAnotherArchivist edited List of websites excluded from the Wayback Machine/Partial exclusions (+875, More Airbnb): https://wiki.archiveteam.org/?diff=51139&oldid=51136
22:28:16		Island joins
22:41:58		etnguyen03 quits [Ping timeout: 265 seconds]
23:03:15		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
23:04:38		useretail__ joins
23:07:55		useretail_ quits [Ping timeout: 272 seconds]
23:14:02		etnguyen03 (etnguyen03) joins
23:17:10	<that_lurker>	https://arstechnica.com/science/2023/11/first-planned-small-nuclear-reactor-plant-in-the-us-has-been-canceled/
23:17:35	<that_lurker>	Could maybe be a good idea to grab https://www.nuscalepower.com/en
23:20:39		abirkill- (abirkill) joins
23:22:55	<vokunal|m>	vokunal: Source for my above message earlier https://www.erai-raws.info/news/er-drive-and-hevc/
23:23:07		abirkill quits [Ping timeout: 272 seconds]
23:23:07		abirkill- is now known as abirkill
23:24:07		Megame quits [Client Quit]
23:28:13		HP_Archivist quits [Read error: Connection reset by peer]
23:28:14		]SaRgE[ quits [Read error: Connection reset by peer]
23:28:17		benjins quits [Read error: Connection reset by peer]
23:28:17		benjins2_ quits [Read error: Connection reset by peer]
23:28:21		Naruyoko5 quits [Read error: Connection reset by peer]
23:28:35		HP_Archivist (HP_Archivist) joins
23:28:37		]SaRgE[ (sarge) joins
23:28:40		fuzzy8021 quits [Read error: Connection reset by peer]
23:28:42		kiryu quits [Read error: Connection reset by peer]
23:28:43		BlueMaxima_ joins
23:28:49		DogsRNice_ joins
23:28:58		benjins joins
23:28:59		kiryu (kiryu) joins
23:29:03		Naruyoko5 joins
23:29:03		useretail_ joins
23:29:04		superkuh_ joins
23:29:13		atphoenix_ (atphoenix) joins
23:29:14		Carnildo quits [Remote host closed the connection]
23:29:22		Miori quits [Quit: Ping timeout (120 seconds)]
23:29:24		marto_ quits [Client Quit]
23:29:24		Jake quits [Quit: Ping timeout (120 seconds)]
23:29:30		marto_ (marto_) joins
23:29:32		sepro quits [Client Quit]
23:29:34		project10 quits [Quit: Ping timeout (120 seconds)]
23:29:34		lflare quits [Quit: Ping timeout (120 seconds)]
23:29:34		CraftByte3 (DragonSec|CraftByte) joins
23:29:35		Ryz28 (Ryz) joins
23:29:35		graham1 joins
23:29:39		fireonlive quits [Quit: Ping timeout (120 seconds)]
23:29:39		s-crypt21 (s-crypt) joins
23:29:40		nic90 quits [Quit: Ping timeout (120 seconds)]
23:29:43		fuzzy8021 (fuzzy8021) joins
23:29:43		sepro (sepro) joins
23:29:47		TastyWiener959 (TastyWiener95) joins
23:29:47		nulldata2 (nulldata) joins
23:29:48		benjins2_ joins
23:29:50		andrew8 (andrew) joins
23:29:50		CandidSparrow9 joins
23:29:52		kiska7 (kiska) joins
23:29:55		lflare (lflare) joins
23:30:03		Flashfire423 joins
23:30:04		project10 (project10) joins
23:30:06		sloop_ joins
23:30:09		endrift|ZNC quits [Remote host closed the connection]
23:30:10		Carnildo joins
23:30:10		kiska54 joins
23:30:12		CraftByte quits [Read error: Connection reset by peer]
23:30:12		CraftByte3 is now known as CraftByte
23:30:13		jasons quits [Quit: Ping timeout (120 seconds)]
23:30:14		Justin[home] joins
23:30:14		Justin[home] is now authenticated as DopefishJustin
23:30:16		Miori joins
23:30:16		nic9 (nic) joins
23:30:17		endrift joins
23:30:18		Perk9 joins
23:30:24		Jake (Jake) joins
23:30:27		Lord_Nightmare quits [Client Quit]
23:30:29		jasons (jasons) joins
23:30:30		graham quits [Quit: Ping timeout (120 seconds)]
23:30:30		null quits [Remote host closed the connection]
23:30:30		graham1 is now known as graham
23:30:32		abirkill- (abirkill) joins
23:30:34		wyatt8740 quits [Remote host closed the connection]
23:30:34		dxrt_ quits [Quit: ZNC - http://znc.sourceforge.net]
23:30:36		BPCZ quits [Quit: eh???]
23:30:38		leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in]
23:30:38		Ryz4 (Ryz) joins
23:30:41		andrew quits [Read error: Connection reset by peer]
23:30:41		andrew8 is now known as andrew
23:30:43		Lord_Nightmare (Lord_Nightmare) joins
23:30:45		BPCZ (BPCZ) joins
23:30:45		sloop quits [Quit: ZNC 1.8.2 - https://znc.in]
23:30:53		dxrt joins
23:30:55		dxrt is now authenticated as dxrt
23:30:55		dxrt quits [Changing host]
23:30:55		dxrt (dxrt) joins
23:30:55		@ChanServ sets mode: +o dxrt
23:30:56		hogchips_ joins
23:30:57		fireonlive (fireonlive) joins
23:31:01		leo60228 (leo60228) joins
23:31:03		CandidSparrow quits [Read error: Connection reset by peer]
23:31:03		Perk quits [Read error: Connection reset by peer]
23:31:03		CandidSparrow9 is now known as CandidSparrow
23:31:04		Perk9 is now known as Perk
23:31:05		Earendil7 quits [Client Quit]
23:31:11		nulldata quits [Read error: Connection reset by peer]
23:31:12		nulldata2 is now known as nulldata
23:31:14		Ryz quits [Read error: Connection reset by peer]
23:31:14		kiska5 quits [Read error: Connection reset by peer]
23:31:14		Ryz4 is now known as Ryz
23:31:14		kiska54 is now known as kiska5
23:31:21		DopefishJustin quits [Ping timeout: 272 seconds]
23:31:21		wickedplayer494 quits [Ping timeout: 272 seconds]
23:31:21		danwellby quits [Ping timeout: 272 seconds]
23:31:22		Earendil7 (Earendil7) joins
23:31:37		wickedplayer494 joins
23:31:39		wickedplayer494 is now authenticated as wickedplayer494
23:31:44		hogchips quits [Read error: Connection reset by peer]
23:31:49		rktk (rktk) joins
23:31:59		useretail__ quits [Ping timeout: 272 seconds]
23:31:59		atphoenix__ quits [Ping timeout: 272 seconds]
23:31:59		TastyWiener95 quits [Ping timeout: 272 seconds]
23:31:59		s-crypt2 quits [Ping timeout: 272 seconds]
23:31:59		Ryz2 quits [Ping timeout: 272 seconds]
23:31:59		superkuh quits [Ping timeout: 272 seconds]
23:31:59		bladem quits [Ping timeout: 272 seconds]
23:31:59		s-crypt21 is now known as s-crypt2
23:32:00		Ryz28 is now known as Ryz2
23:32:00		TastyWiener959 is now known as TastyWiener95
23:32:37		abirkill quits [Ping timeout: 272 seconds]
23:32:37		BlueMaxima quits [Ping timeout: 272 seconds]
23:32:37		DogsRNice quits [Ping timeout: 272 seconds]
23:32:37		Flashfire42 quits [Ping timeout: 272 seconds]
23:32:37		kiska quits [Ping timeout: 272 seconds]
23:32:37		abirkill- is now known as abirkill
23:32:37		Flashfire423 is now known as Flashfire42
23:32:38		kiska7 is now known as kiska
23:32:44		danwellby joins
23:32:50		wyatt8740 joins
23:33:18		bladem (bladem) joins
23:34:08		benjins is now authenticated as benjins
23:46:24		mossssss quits [Client Quit]
23:50:41		Pedrosso joins
23:57:20		Earendil7 quits [Client Quit]
23:58:19		Earendil7 (Earendil7) joins
