00:02:44 | <wyatt8740> | Well, here I am. :p |
00:02:50 | <wyatt8740> | https://jp.mercari.com/item/m40357548342 is the page i'm trying to grab |
00:03:27 | <wyatt8740> | It's a react.js page |
00:03:55 | <@JAA> | wyatt8740: ArchiveBot doesn't know JS at all. wpull tries to find URLs in <script> blocks, but that's very unreliable. Anything beyond that simply won't be covered by it. |
00:04:16 | <wyatt8740> | alright. so since it's react, probably a non-starter. Had a bad feeling that'd be the case. |
00:04:34 | <wyatt8740> | (I love the modern web.) |
00:05:30 | <@JAA> | WARC is capable of many things, but far from everything. HTTP/2 and HTTP/3 are right out. WebSockets, too. If you can achieve the page with HTTP/1.1 requests only, then WARC would work. |
00:05:51 | <wyatt8740> | any suggestions for how best to do this, then? Just have a screenshot in my article, links to self-hosted copies of images, and let wayback machine/archivebot crawl my site? |
00:06:02 | <@JAA> | Playback is a whole different topic. POST requests in particular are hard, as is anything involving random variables (JSONP, timestamps as cache busters, etc.). |
00:06:07 | <wyatt8740> | my blog should be fine for that :) |
00:06:08 | <wyatt8740> | https://wyatt8740.gitlab.io/site/blog/011_012.html#pc9801-1 |
00:06:55 | <thuban> | and here's a list of 190 blogs extracted from other sources i had lying around (deduped from previous): https://transfer.archivete.am/2urPt/blogspot_blogs_2.txt |
00:06:55 | <wyatt8740> | yeah i'd looked into WARC format before and quickly got confused because I didn't grok that it was actually transcribing the full HTTP transaction |
00:06:58 | <thuban> | (these aren't really filtered for significance--a lot of them are from when we were trawling for zippyshare links--but if we end up doing horizontal discovery, more seeds don't hurt, right?) |
00:07:37 | <wyatt8740> | *transactions |
00:08:08 | <@JAA> | Getting playback right in the general case is virtually impossible. There are too many things that can influence what exactly is displayed etc., e.g. screen size, browser version, datetime, time zone, you name it. |
00:08:35 | <wyatt8740> | i do like what i imagine the archive.is approach is as a supplement to WARC |
00:08:43 | <@JAA> | The WBM does a lot of tricks and manages to work around some of those, but ... yeah. |
00:08:55 | <pabs> | I think SPN2 is the main thing that does JavaScript. either submit to the form on web.archive.org/save/ or send a mail with links to savepagenow@archive.org |
00:08:57 | <@JAA> | What you can do is use SPN as a logged-in user and make it also create a screenshot. |
00:09:21 | <wyatt8740> | yeah i did the /save/ thing and got the near-empty file |
00:09:22 | <@JAA> | Probably the data itself is captured, but the playback doesn't work. SPN uses a browser under the hood. |
00:09:28 | <pabs> | ah, you're talking about saving the DOM to HTML? |
00:09:44 | <wyatt8740> | that would be one method, sure. |
00:09:59 | <pabs> | archive.is is the only thing I know of that does the DOM2HTML thing |
00:10:10 | <wyatt8740> | i mostly want things i link in my blog externally to be archived/findable in future |
00:10:14 | <pabs> | for having a public archive at least |
00:10:16 | <wyatt8740> | since it drives me nuts when people don't |
00:10:51 | <@JAA> | I don't see a screenshot on the WBM for https://jp.mercari.com/item/m40357548342 |
00:11:13 | <wyatt8740> | nothing at https://web.archive.org/web/20231111234533id_/https://jp.mercari.com/item/m40357548342 ? |
00:11:19 | <@JAA> | Screenshot, not snapshot |
00:11:22 | <wyatt8740> | ahh ok |
00:11:34 | <@JAA> | There's an option for logged-in users on SPN to also capture a screenshot. |
00:11:48 | <@JAA> | That happens from the browser doing the archival, so it should be reasonably usable. |
00:12:00 | <wyatt8740> | i guess i was using a diff. browser than normal |
00:12:06 | <wyatt8740> | let me go back to the one i'm logged in on :\ |
00:12:46 | <@JAA> | Can't be resaved currently due to the cooldown timer, but should work again in half an hour or so (not sure what the current limit is). |
00:13:00 | <wyatt8740> | yeah, discovered that. |
00:13:33 | <@JAA> | It'll still only be a screenshot, so no Ctrl+F, no copying, etc. |
00:13:44 | <wyatt8740> | yeah |
00:13:47 | <wyatt8740> | better than nothing |
00:14:11 | <@JAA> | DOM dump as a static page would be nice. Then again, that method also has its limitations, as can frequently be seen on archive.$tldoftheday. |
00:14:31 | <@JAA> | Anything requiring scripting on the page, e.g. expanding sections or whatever, won't work. |
00:14:33 | <wyatt8740> | https://archive.is/HUTy2 |
00:14:49 | <wyatt8740> | thankfully the side image thumbnails seem to be the full size images shrunk in CSS/JS |
00:14:58 | <wyatt8740> | so they're actually saved |
00:15:50 | <@JAA> | They're in the WBM, too, e.g. https://web.archive.org/web/20231111234535/https://static.mercdn.net/item/detail/orig/photos/m40357548342_5.jpg?1695524270 |
00:16:03 | <wyatt8740> | hmm. well, that's good at least. |
00:16:43 | <@JAA> | As I said, probably captured alright, just doesn't play back, which makes it fairly useless currently. :-/ |
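The two Wayback Machine links quoted above differ in one detail: the first carries an `id_` suffix after the timestamp, the second does not. As far as is generally documented, `id_` asks for the capture as originally stored, without the playback rewriting. A minimal sketch of how such URLs are put together; the helper name `wayback_url` and its `raw` parameter are made up for illustration and are not part of any official client:

```python
def wayback_url(timestamp, url, raw=False):
    """Build a Wayback Machine playback URL for a given capture.

    timestamp: 14-digit capture time (YYYYMMDDhhmmss), as seen in the log.
    raw: when True, append the `id_` modifier to request the stored
         response without playback rewriting (assumption noted above).
    """
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{url}"


# The item page link from the log, in its raw (id_) form.
page = wayback_url(
    "20231111234533",
    "https://jp.mercari.com/item/m40357548342",
    raw=True,
)
print(page)
```

Having the raw form at hand helps when checking whether the data was captured even though the rewritten playback comes up near-empty.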
00:17:03 | <wyatt8740> | The state of modern web dev; I love it. |
00:17:42 | <@JAA> | Aye |
00:18:03 | <@JAA> | And it'll only get worse. Hooray. |
00:18:10 | <wyatt8740> | I love the future. |
00:18:20 | <wyatt8740> | And I especially love facebook |
00:18:20 | <@JAA> | I've seen a site before that did all content loading with a WebSocket. |
00:18:39 | <wyatt8740> | That's... like doing rtmp grabs in a SWF or something, as far as archival is concerned |
00:18:59 | <wyatt8740> | actually what you describe reminds me a lot of flash-based sites |
00:19:09 | <@JAA> | Yeah, pretty much. |
00:22:51 | <h2ibot> | Switchnode edited Deathwatch (+4, /* 2023 */ fix syntax): https://wiki.archiveteam.org/?diff=51131&oldid=51126 |
00:23:13 | <@JAA> | Whoops, thanks. |
00:31:15 | <pabs> | /cc arkiver re having SPN2 get an option to save the DOM to HTML, similar to how it has the screenshot thing |
00:40:11 | | ScenarioPlanet quits [Ping timeout: 272 seconds] |
01:06:47 | | katocala quits [Ping timeout: 272 seconds] |
01:07:35 | | katocala joins |
01:13:01 | | katocala is now authenticated as katocala |
01:34:34 | <tomodachi94> | @Pedrosso:hackint.org @JAA:hackint.org thank you for grabbing Fextralife's wikis, I appreciate it! ❤️ |
01:35:13 | <Pedrosso> | <3 |
01:36:38 | <Pedrosso> | You were right about it being a gold-mine. So satisfying. |
01:37:06 | | useretail_ joins |
01:40:21 | | useretail__ quits [Ping timeout: 272 seconds] |
01:44:08 | | useretail__ joins |
01:46:45 | | useretail_ quits [Ping timeout: 265 seconds] |
01:56:11 | | lennier2 quits [Ping timeout: 272 seconds] |
01:58:40 | | lennier2 joins |
02:21:06 | | BearFortress joins |
02:26:45 | <Pedrosso> | I know of a website https://svtplay.se (videos cannot be archived, most of it is locked behind a region specific wall too) that is often the only source to a specific media and they're often deleted on grounds of copyright or other rights. There are -dl scripts for it. I am concerned about archival though. It definitely needs archival since |
02:26:46 | <Pedrosso> | otherwise a lot of media is continually lost; however, I cannot hold it and it's clear it cannot just be submitted publicly. What would be advised here? |
02:27:02 | <Pedrosso> | (videos cannot be archived via save-page or a web save afaik)* |
02:28:09 | <Flashfire42> | Maybe tubeup but use it VERY SPARINGLY because there is a lot of garbage people upload using it and it can cause a lot of space usage for IA |
02:28:32 | <Pedrosso> | tubeup? |
02:29:06 | <Pedrosso> | I'm not entirely sure if you understand what I'm asking about |
02:30:43 | <Pedrosso> | to reclarify, there are -dl scripts to get the videos. ( https://github.com/spaam/svtplay-dl ). My problem is more legal and ethical |
02:33:11 | <Pedrosso> | It's a general question but if specifics are required, it's about storage. |
02:57:25 | <h2ibot> | Tech234a edited YouTube (+302, /* Stories */ Discontinued): https://wiki.archiveteam.org/?diff=51132&oldid=50877 |
03:01:26 | <h2ibot> | Tech234a edited YouTube (+15, /* Playlist notes (October 2020) */ Add…): https://wiki.archiveteam.org/?diff=51133&oldid=51132 |
03:43:23 | <Pedrosso> | (I feel locked-out from asking any other questions by having this one here lol) Is there no like, go-to process in situations like this? |
03:44:50 | <pokechu22> | I would say in practice we usually lean towards archiving something if it's useful to have - but it also does depend on the total size |
03:46:17 | <Pedrosso> | The point of my dilemma is that items are removed because the rights run out, so they're inherently and obviously not under Creative Commons |
03:51:23 | <Pedrosso> | For context, videos are up for free and not all videos are deleted |
03:53:26 | <Pedrosso> | and by "videos" I mean movies/films, series, news, documentaries, TV channels, etc., which I believe counts as useful to have |
04:02:21 | | lennier2_ joins |
04:04:35 | | Island_ joins |
04:04:40 | <@JAA> | Pedrosso: Just so I understand what we're talking about: this is a legitimate site, right? Based on the name, I assume it's the TV broadcaster's digital platform, where they make their and licensed content available for a limited time? |
04:04:58 | | Pedrosso47 joins |
04:05:28 | | lennier2 quits [Ping timeout: 265 seconds] |
04:05:32 | <@JAA> | I'll assume that you missed that message. |
04:05:37 | <@JAA> | Pedrosso47: Just so I understand what we're talking about: this is a legitimate site, right? Based on the name, I assume it's the TV broadcaster's digital platform, where they make their and licensed content available for a limited time? |
04:06:07 | <Pedrosso47> | Oh yes, indeed. |
04:07:01 | <@JAA> | Virtually everything we archive is copyrighted content. That's not really a factor at play here. It's how intellectual property works, for better or for worse. There are exceptions in many jurisdictions that free you from having to follow copyright restrictions when it's done for preservation purposes, which would probably apply here. |
04:07:02 | | Pedrosso quits [Ping timeout: 243 seconds] |
04:08:52 | | Island quits [Ping timeout: 265 seconds] |
04:10:00 | <Pedrosso47> | That's very nice to know, however whenever I try to look up information on the Internet Archive they seem adamant about not posting non-Creative Commons content. Though I may have gotten the wrong impression |
04:11:01 | | Pedrosso47 is now known as Pedrosso |
04:11:01 | <pokechu22> | Where'd you see that? |
04:12:03 | | BlueMaxima quits [Read error: Connection reset by peer] |
04:12:42 | <@JAA> | They probably say something along those lines to discourage people from uploading stuff that's already widely shared and won't get lost anytime soon (e.g. latest Hollywood productions). There's likely also a 'we have to say that so we don't get in trouble' angle to it. Nevertheless, IA does have the legal right to store such content. They might not be able to make it publicly available until the |
04:12:48 | <@JAA> | copyright expires in a few hundred years. |
04:13:26 | <@JAA> | So for an individual uploader, that's the policy they probably want, more or less. |
04:13:35 | <Pedrosso> | Great, great. |
04:14:04 | <thuban> | yeah, in practice ia 'darks' items (makes them inaccessible) in response to dmca claims; while accumulating a lot of reports or flagrantly pirating popular content can get you b&, they're pretty relaxed about good-faith uploads. if it's niche or abandoned enough not to get reported in the first place, it's basically fine |
04:14:21 | <@JAA> | That doesn't mean they might not be interested in something like this. It'd be all about size and logistics. How much data is it, and do they just need to provide storage for it or does it involve them doing work. |
04:15:35 | <@JAA> | Talking to them is important for things like this. Either directly or through arkiver, for example. If they want to take the data, and they already know what this is about, future takedowns etc. won't be as problematic. |
04:17:11 | <@JAA> | Archiving these official platforms by major broadcasters has been on my wishlist for a while. It's a lot of work though, especially at scale (i.e. many countries etc.). |
04:18:13 | <Pedrosso> | I see ~~But I'm shy~~ As for the major broadcasters though; svt.se is a "parent" website with loads of news articles all over the country. I'd believe it's quite large |
04:19:04 | <Pedrosso> | as in, https://www.svt.se/ |
04:20:32 | <@JAA> | We probably archive a fair bit of that through #//. These audio and video platforms can virtually never be archived properly like that though and need special stuff. |
04:21:26 | <Pedrosso> | does #// get that through outlinks or are you saying a lot of it is manually added? |
04:22:22 | | balrog quits [Quit: Bye] |
04:22:32 | <@JAA> | There are things we grab regularly. At least one of those lists is news outlets sourced from Wikidata. I'd expect svt.se to be there, though I didn't check. |
04:22:45 | <@JAA> | For those, we regularly grab the homepage and links from it, or something along those lines. |
04:23:28 | <pokechu22> | Yeah, there's https://www.wikidata.org/wiki/Q215363 (and also https://www.wikidata.org/wiki/Q10686370 for some reason?) |
04:24:08 | <Pedrosso> | How do you search on IA for URLs in a domain archived by WikiTeam? |
04:25:09 | <@JAA> | pokechu22: One is the company, the other is their website. But also, yes, naturally it's in Wikidata, but I'm not sure whether it made it into the list of news outlets since that was filtered by probably the 'instance of' value and I don't remember which possible values were accepted there. |
04:25:59 | <Pedrosso> | would that be an extensive list of outlinks or simply a selection? |
04:26:24 | <Pedrosso> | assuming it is in the list of news outlets |
04:26:47 | <@JAA> | It's in 43200_wikidata_Q11033_mass-media.wikidata.txt |
04:26:57 | <@JAA> | Which should mean it gets grabbed every 12 hours. |
04:27:15 | <@JAA> | But the GitHub repo is outdated, so... |
04:27:32 | <@JAA> | https://github.com/ArchiveTeam/urls-sources if you want to poke around. |
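The remark that a file named `43200_...` "should mean it gets grabbed every 12 hours" suggests the leading number is an interval in seconds. That reading is an assumption drawn from this exchange, not from the repository's documentation. A small sketch of the arithmetic, with `interval_from_filename` as a hypothetical helper name:

```python
def interval_from_filename(filename):
    """Split off the numeric prefix and interpret it as seconds.

    Assumption: the part before the first underscore is the re-grab
    interval in seconds, as the 43200 -> 12 hours remark implies.
    """
    prefix, _, _rest = filename.partition("_")
    seconds = int(prefix)
    return seconds, seconds / 3600


seconds, hours = interval_from_filename(
    "43200_wikidata_Q11033_mass-media.wikidata.txt"
)
print(seconds, hours)
```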
04:29:48 | <@JAA> | archiveteam_urls doesn't show up on https://web.archive.org/web/collections/20230000000000*/https://www.svt.se/ though, odd. |
04:30:46 | | balrog (balrog) joins |
04:31:32 | <@JAA> | Pedrosso: The idea is that we fetch the homepage every N hours and then queue back any links found on it. If they were already captured, that gets filtered out. New links make it through and get archived. |
04:32:01 | <Pedrosso> | Ahh, I get the concept |
04:32:11 | <Pedrosso> | because of frontpage stuff |
04:32:14 | <@JAA> | (Also, bringing up that missing stuff in #// directly.) |
04:32:33 | <Pedrosso> | (thx for the note) |
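The "fetch the homepage every N hours, queue back links, filter out what was already captured" idea described above can be sketched roughly as follows. This is only an illustration of the concept using the Python standard library; it is not the actual #// code, and `LinkCollector`, `queue_new_links`, and the in-memory `seen` set are all made-up names standing in for whatever the real pipeline uses:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def queue_new_links(homepage_html, base_url, seen):
    """Return links not seen before and record them in `seen`."""
    collector = LinkCollector(base_url)
    collector.feed(homepage_html)
    fresh = []
    for url in collector.links:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


seen = set()
first = queue_new_links('<a href="/a">A</a><a href="/b#x">B</a>', "https://example.org/", seen)
second = queue_new_links('<a href="/a">A</a><a href="/c">C</a>', "https://example.org/", seen)
print(first)
print(second)
```

On the second pass only the link that was not on the earlier homepage comes back, which is the "new links make it through" behaviour described in the log.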
04:38:04 | | DogsRNice quits [Read error: Connection reset by peer] |
04:51:20 | | mossssss90 quits [Remote host closed the connection] |
04:54:57 | <Pedrosso> | A list of big websites that I have been debating on sharing here. I suppose even if they're too big & not useful enough to archive there's no harm in sharing https://transfer.archivete.am/13l4Ga/list.txt |
05:01:57 | | pabs wonders if legit .tk domains need to get grabbed https://www.technologyreview.com/2023/11/02/1082798/tiny-pacific-island-global-capital-cybercrime/ |
05:02:12 | <pabs> | tcl.tk for eg :) |
05:15:33 | <pokechu22> | There's an archivebot job for https://www.legislation.gov.uk/ but it turns out the UK has a lot of law (and also that site's banned us as of a bit under a month ago :|) |
05:19:01 | <pabs> | perhaps needs a distributed project? |
05:33:17 | | nicolas17_ is now known as nicolas17 |
05:46:26 | | nick joins |
05:46:46 | | nick quits [Remote host closed the connection] |
06:23:38 | | nicolas17 quits [Client Quit] |
06:26:27 | | Pedrosso quits [Remote host closed the connection] |
06:31:33 | | useretail_ joins |
06:31:37 | | useretail__ quits [Remote host closed the connection] |
06:31:37 | | Arcorann quits [Remote host closed the connection] |
06:37:25 | | Arcorann (Arcorann) joins |
06:58:17 | | dumbgoy quits [Ping timeout: 272 seconds] |
07:31:13 | | itachi1706 quits [Ping timeout: 272 seconds] |
07:33:39 | | itachi1706 (itachi1706) joins |
07:33:55 | | Ruthalas59 quits [Read error: Connection reset by peer] |
07:34:12 | | Ruthalas59 (Ruthalas) joins |
07:41:55 | | hitgrr8 joins |
08:09:44 | | Perk quits [Client Quit] |
08:10:01 | | Perk joins |
08:24:13 | | Wohlstand (Wohlstand) joins |
08:54:13 | | Wohlstand quits [Client Quit] |
09:05:55 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
09:40:51 | | Island_ quits [Read error: Connection reset by peer] |
10:00:03 | | Bleo1 quits [Client Quit] |
10:01:23 | | Bleo1 joins |
10:28:32 | | BearFortress quits [Read error: Connection reset by peer] |
10:29:06 | | BearFortress joins |
11:36:28 | | Arcorann quits [Remote host closed the connection] |
11:37:09 | | icedice quits [Client Quit] |
11:42:29 | | Arcorann (Arcorann) joins |
11:47:34 | | icedice (icedice) joins |
12:05:06 | | icedice quits [Client Quit] |
12:15:05 | | Arcorann quits [Ping timeout: 265 seconds] |
12:50:52 | | Barto quits [Read error: Connection reset by peer] |
12:59:23 | | Barto (Barto) joins |
13:34:46 | <h2ibot> | Bzc6p edited Fextralife (+0, fix banner link): https://wiki.archiveteam.org/?diff=51134&oldid=51129 |
13:57:57 | | mossssss joins |
14:07:01 | <@arkiver> | hi |
14:07:10 | <@arkiver> | so google is doing stuff |
14:08:33 | <@arkiver> | pabs: yeah we perform discovery while archiving of blogger |
14:09:53 | <h2ibot> | 0KepOnline edited Spore (+40, Add OLDEST view type): https://wiki.archiveteam.org/?diff=51135&oldid=51112 |
14:20:50 | | mossssss quits [Remote host closed the connection] |
14:22:25 | | mossssss joins |
14:22:56 | <mossssss> | wait sorry - it disconnected (i think my internet is just bad lol), arkiver what is google doing? |
14:34:12 | | mossssss quits [Remote host closed the connection] |
14:35:57 | | mossssss joins |
14:55:02 | | kiryu quits [Remote host closed the connection] |
14:56:31 | | kiryu joins |
14:56:31 | | kiryu is now authenticated as kiryu |
14:56:31 | | kiryu quits [Changing host] |
14:56:31 | | kiryu (kiryu) joins |
14:59:46 | | mossssss quits [Remote host closed the connection] |
15:08:04 | | etnguyen03 (etnguyen03) joins |
15:08:08 | | RJHacker9147 joins |
15:09:01 | | RJHacker9147 is now known as redlattice |
15:12:23 | | toss (toss) joins |
15:13:54 | | toss quits [Client Quit] |
15:21:46 | | dumbgoy joins |
15:32:31 | | dumbgoy_ joins |
15:35:40 | | dumbgoy quits [Ping timeout: 265 seconds] |
16:09:34 | | redlattice quits [Client Quit] |
16:25:51 | | sec^nd quits [Ping timeout: 245 seconds] |
16:26:43 | | sec^nd (second) joins |
16:43:08 | | mossssss joins |
16:45:41 | | Megame (Megame) joins |
16:48:36 | | mossssss quits [Remote host closed the connection] |
16:48:40 | | dumbgoy__ joins |
16:48:46 | | BearFortress_ joins |
16:49:09 | | mossssss joins |
16:50:35 | | katocala quits [Ping timeout: 240 seconds] |
16:50:51 | | katocala joins |
16:51:33 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
16:54:12 | | BearFortress quits [Ping timeout: 265 seconds] |
17:01:14 | | BearFortress joins |
17:04:32 | | marto_ quits [Quit: zzzzz] |
17:04:36 | | BearFortress_ quits [Ping timeout: 265 seconds] |
17:05:22 | | marto_ (marto_) joins |
17:11:21 | | bilboed quits [Ping timeout: 272 seconds] |
17:12:49 | | etnguyen03 quits [Ping timeout: 265 seconds] |
17:13:47 | | bilboed joins |
17:19:17 | | katocala is now authenticated as katocala |
17:21:22 | | Pedrosso joins |
17:25:37 | <Pedrosso> | https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions#:~:text=I%20saved/archived,for%20hosting%20archives! as per this, if I use the given tools to create archives of svtplay.se would the process then be to have someone here review the files' integrity? Is there any nice naming scheme the IA items should have (and any other preferred |
17:25:37 | <Pedrosso> | fields & metadata for IA)? |
17:45:45 | | treora quits [Quit: blub blub.] |
17:46:59 | | treora joins |
17:51:12 | <fireonlive> | mossssss: https://hackint.logs.kiska.pw/archiveteam-bs just in case you get disconnected :3 |
17:51:35 | <fireonlive> | mossssss: also if you leave webirc in a background tab, browsers suspend the tab which drops the connection |
17:52:06 | <mossssss> | oohhh that would make sense. ill keep it in another window to leave it up. also thank you!!!!! |
17:52:13 | <fireonlive> | welcome =] |
17:52:27 | <fireonlive> | you can also use a desktop IRC client if you wish such as hexchat |
17:52:56 | <fireonlive> | or quassel, there's a few out there |
17:53:23 | <mossssss> | ill have to look into that! my partner is a lot more well versed in this stuff haha so ill ask them |
17:53:52 | <mossssss> | (im the archiving nerd, they are the computer stuff (inc. irc) nerd) |
17:55:40 | <Pedrosso> | fireonlive: I keep forgetting how to get to those logs lol. Still do |
17:55:45 | <fireonlive> | :) |
17:55:53 | | fireonlive hands Pedrosso a bookmark |
17:56:17 | <Pedrosso> | (How to do actions? "* text"?) |
17:56:27 | <katia> | /me pets a cat |
17:56:59 | | Wohlstand (Wohlstand) joins |
17:57:35 | | Pedrosso gladly receives said bookmark |
17:57:38 | <fireonlive> | :3 |
17:57:49 | | Pedrosso tests /me |
18:01:26 | <fireonlive> | katia: taking a look under the hood eh |
18:01:43 | <fireonlive> | :p |
18:02:15 | <katia> | 👀 |
18:28:53 | <Ryz> | arkiver, any updates on Blogger/Google stuff? |
18:40:33 | | apache2 joins |
18:40:47 | | apache2 quits [Client Quit] |
18:41:10 | | apache2 joins |
18:47:02 | | DogsRNice joins |
18:53:48 | | etnguyen03 (etnguyen03) joins |
18:53:57 | <h2ibot> | JustAnotherArchivist edited List of websites excluded from the Wayback Machine/Partial exclusions (+907, Add Airbnb): https://wiki.archiveteam.org/?diff=51136&oldid=51122 |
19:28:09 | | Wohlstand quits [Remote host closed the connection] |
19:28:17 | | Wohlstand (Wohlstand) joins |
19:49:54 | | etnguyen03 quits [Ping timeout: 265 seconds] |
19:50:14 | | etnguyen03 (etnguyen03) joins |
19:54:43 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
19:59:29 | | benjinsm joins |
20:02:59 | | benjins quits [Ping timeout: 272 seconds] |
20:04:04 | | benjinsmi joins |
20:04:44 | | Lord_Nightmare (Lord_Nightmare) joins |
20:05:35 | | benjinsmi is now known as benjins |
20:05:37 | | benjins is now authenticated as benjins |
20:08:03 | | benjinsm quits [Ping timeout: 272 seconds] |
20:14:41 | | BlueMaxima joins |
20:15:39 | | benjins quits [Remote host closed the connection] |
20:15:52 | | benjins joins |
20:18:45 | | redlattice joins |
20:20:34 | | benjinsm joins |
20:21:59 | | benjins quits [Ping timeout: 272 seconds] |
20:23:54 | | redlattice quits [Client Quit] |
20:26:16 | <h2ibot> | Exorcism uploaded File:Fextralife-screenshot.png: https://wiki.archiveteam.org/?title=File%3AFextralife-screenshot.png |
20:27:17 | <h2ibot> | Exorcism edited Fextralife (+35): https://wiki.archiveteam.org/?diff=51138&oldid=51134 |
20:35:54 | | benjinsmi joins |
20:39:05 | | benjinsm quits [Ping timeout: 272 seconds] |
20:40:43 | | benjinsmi is now known as benjins |
20:40:43 | | benjins is now authenticated as benjins |
21:18:40 | <vokunal|m> | erai-raws.info missed a payment on their ER-drive service, and lost their subscription. |
21:19:04 | <vokunal|m> | I have no idea what an ER-Drive is |
21:20:41 | <vokunal|m> | They've had issues with their paypal being banned before. I'm not sure if this is related |
21:45:35 | | rohvani quits [Ping timeout: 272 seconds] |
21:46:35 | <thuban> | Pedrosso: your question is a little unclear to me. by "us[ing] the given tools to create archives of svtplay.se", do you mean using -dl scripts to get the video files, or using warc tools to create warcs? |
21:46:44 | <thuban> | (the latter would be difficult, because most warc tools won't work well with such a js-heavy site without substantial custom scripting. (also, even a perfect capture might or might not play back correctly in the wayback machine)) |
21:46:56 | <thuban> | in either case, no, there isn't a process to "review the files' integrity"; there's no technical mechanism to do that (tls isn't designed that way), so the internet archive basically operates on trust. archiveteam, aiui, no longer adopts third-party data into the archiveteam collection--that faq entry is outdated and should be changed. |
21:47:02 | <thuban> | what JAA said earlier is right; if you want to do this at scale you should consider talking to ia about it first |
21:47:10 | <thuban> | that said, for general information about metadata consult https://archive.org/developers/metadata-schema/index.html and/or https://web.archive.org/web/20221001171424/https://archive.org/services/docs/api/metadata-schema/index.html (latter has file-level metadata documentation; i have no idea why it was removed) |
21:49:34 | <Pedrosso> | thuban: The answer to your first question is archiveteam's "grab-site" tool, as the -dl scripts would require some scripting to get working within a web format, I'd imagine |
21:50:51 | <thuban> | yeah, i would be _very_ surprised if that worked |
21:52:20 | <Pedrosso> | I didn't mean a technical mechanism specifically, just any mechanism technical or otherwise. Sad to know there are none adopted anymore but I suppose it may be better to go straight to the top ~~still shy about that tho~~. It's a little annoying that the wiki is out of date with such things, but still nice to have the info. Thanks about the |
21:52:20 | <Pedrosso> | metadata-related links |
21:53:40 | <thuban> | sorry about that! i'll update the page if an op confirms the current policy. |
22:04:56 | | Wohlstand quits [Remote host closed the connection] |
22:05:14 | | Wohlstand (Wohlstand) joins |
22:16:19 | | benjinsm joins |
22:16:43 | | Perk5 joins |
22:17:23 | | aninternettroll_ (aninternettroll) joins |
22:17:27 | | shreyasminocha quits [Ping timeout: 250 seconds] |
22:17:27 | | thehedgeh0g quits [Ping timeout: 250 seconds] |
22:17:27 | | evan quits [Ping timeout: 250 seconds] |
22:17:27 | | aninternettroll quits [Ping timeout: 250 seconds] |
22:17:27 | | TheTechRobo quits [Client Quit] |
22:17:27 | | Wohlstand quits [Remote host closed the connection] |
22:17:27 | | benjins quits [Remote host closed the connection] |
22:17:27 | | bilboed quits [Client Quit] |
22:17:27 | | Perk quits [Client Quit] |
22:17:27 | | Perk5 is now known as Perk |
22:17:30 | | aninternettroll_ is now known as aninternettroll |
22:17:33 | | Pedrosso quits [Remote host closed the connection] |
22:17:33 | | mossssss quits [Remote host closed the connection] |
22:17:34 | | bilboed joins |
22:17:37 | | Wohlstand (Wohlstand) joins |
22:18:00 | | TheTechRobo (TheTechRobo) joins |
22:18:31 | | evan joins |
22:18:33 | | sepro quits [Quit: Bye!] |
22:18:41 | | shreyasminocha (shreyasminocha) joins |
22:19:16 | | thehedgeh0g (mrHedgehog0) joins |
22:19:53 | | benjinsm is now known as benjins |
22:19:54 | | benjins is now authenticated as benjins |
22:20:44 | | mossssss joins |
22:26:00 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
22:26:35 | | sepro (sepro) joins |
22:27:43 | <h2ibot> | JustAnotherArchivist edited List of websites excluded from the Wayback Machine/Partial exclusions (+875, More Airbnb): https://wiki.archiveteam.org/?diff=51139&oldid=51136 |
22:28:16 | | Island joins |
22:41:58 | | etnguyen03 quits [Ping timeout: 265 seconds] |
23:03:15 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
23:04:38 | | useretail__ joins |
23:07:55 | | useretail_ quits [Ping timeout: 272 seconds] |
23:14:02 | | etnguyen03 (etnguyen03) joins |
23:17:10 | <that_lurker> | https://arstechnica.com/science/2023/11/first-planned-small-nuclear-reactor-plant-in-the-us-has-been-canceled/ |
23:17:35 | <that_lurker> | Could maybe be a good idea to grab https://www.nuscalepower.com/en |
23:20:39 | | abirkill- (abirkill) joins |
23:22:55 | <vokunal|m> | vokunal: Source for my above message earlier https://www.erai-raws.info/news/er-drive-and-hevc/ |
23:23:07 | | abirkill quits [Ping timeout: 272 seconds] |
23:23:07 | | abirkill- is now known as abirkill |
23:24:07 | | Megame quits [Client Quit] |
23:28:13 | | HP_Archivist quits [Read error: Connection reset by peer] |
23:28:14 | | ]SaRgE[ quits [Read error: Connection reset by peer] |
23:28:17 | | benjins quits [Read error: Connection reset by peer] |
23:28:17 | | benjins2_ quits [Read error: Connection reset by peer] |
23:28:21 | | Naruyoko5 quits [Read error: Connection reset by peer] |
23:28:35 | | HP_Archivist (HP_Archivist) joins |
23:28:37 | | ]SaRgE[ (sarge) joins |
23:28:40 | | fuzzy8021 quits [Read error: Connection reset by peer] |
23:28:42 | | kiryu quits [Read error: Connection reset by peer] |
23:28:43 | | BlueMaxima_ joins |
23:28:49 | | DogsRNice_ joins |
23:28:58 | | benjins joins |
23:28:59 | | kiryu (kiryu) joins |
23:29:03 | | Naruyoko5 joins |
23:29:03 | | useretail_ joins |
23:29:04 | | superkuh_ joins |
23:29:13 | | atphoenix_ (atphoenix) joins |
23:29:14 | | Carnildo quits [Remote host closed the connection] |
23:29:22 | | Miori quits [Quit: Ping timeout (120 seconds)] |
23:29:24 | | marto_ quits [Client Quit] |
23:29:24 | | Jake quits [Quit: Ping timeout (120 seconds)] |
23:29:30 | | marto_ (marto_) joins |
23:29:32 | | sepro quits [Client Quit] |
23:29:34 | | project10 quits [Quit: Ping timeout (120 seconds)] |
23:29:34 | | lflare quits [Quit: Ping timeout (120 seconds)] |
23:29:34 | | CraftByte3 (DragonSec|CraftByte) joins |
23:29:35 | | Ryz28 (Ryz) joins |
23:29:35 | | graham1 joins |
23:29:39 | | fireonlive quits [Quit: Ping timeout (120 seconds)] |
23:29:39 | | s-crypt21 (s-crypt) joins |
23:29:40 | | nic90 quits [Quit: Ping timeout (120 seconds)] |
23:29:43 | | fuzzy8021 (fuzzy8021) joins |
23:29:43 | | sepro (sepro) joins |
23:29:47 | | TastyWiener959 (TastyWiener95) joins |
23:29:47 | | nulldata2 (nulldata) joins |
23:29:48 | | benjins2_ joins |
23:29:50 | | andrew8 (andrew) joins |
23:29:50 | | CandidSparrow9 joins |
23:29:52 | | kiska7 (kiska) joins |
23:29:55 | | lflare (lflare) joins |
23:30:03 | | Flashfire423 joins |
23:30:04 | | project10 (project10) joins |
23:30:06 | | sloop_ joins |
23:30:09 | | endrift|ZNC quits [Remote host closed the connection] |
23:30:10 | | Carnildo joins |
23:30:10 | | kiska54 joins |
23:30:12 | | CraftByte quits [Read error: Connection reset by peer] |
23:30:12 | | CraftByte3 is now known as CraftByte |
23:30:13 | | jasons quits [Quit: Ping timeout (120 seconds)] |
23:30:14 | | Justin[home] joins |
23:30:14 | | Justin[home] is now authenticated as DopefishJustin |
23:30:16 | | Miori joins |
23:30:16 | | nic9 (nic) joins |
23:30:17 | | endrift joins |
23:30:18 | | Perk9 joins |
23:30:24 | | Jake (Jake) joins |
23:30:27 | | Lord_Nightmare quits [Client Quit] |
23:30:29 | | jasons (jasons) joins |
23:30:30 | | graham quits [Quit: Ping timeout (120 seconds)] |
23:30:30 | | null quits [Remote host closed the connection] |
23:30:30 | | graham1 is now known as graham |
23:30:32 | | abirkill- (abirkill) joins |
23:30:34 | | wyatt8740 quits [Remote host closed the connection] |
23:30:34 | | dxrt_ quits [Quit: ZNC - http://znc.sourceforge.net] |
23:30:36 | | BPCZ quits [Quit: eh???] |
23:30:38 | | leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in] |
23:30:38 | | Ryz4 (Ryz) joins |
23:30:41 | | andrew quits [Read error: Connection reset by peer] |
23:30:41 | | andrew8 is now known as andrew |
23:30:43 | | Lord_Nightmare (Lord_Nightmare) joins |
23:30:45 | | BPCZ (BPCZ) joins |
23:30:45 | | sloop quits [Quit: ZNC 1.8.2 - https://znc.in] |
23:30:53 | | dxrt joins |
23:30:55 | | dxrt is now authenticated as dxrt |
23:30:55 | | dxrt quits [Changing host] |
23:30:55 | | dxrt (dxrt) joins |
23:30:55 | | @ChanServ sets mode: +o dxrt |
23:30:56 | | hogchips_ joins |
23:30:57 | | fireonlive (fireonlive) joins |
23:31:01 | | leo60228 (leo60228) joins |
23:31:03 | | CandidSparrow quits [Read error: Connection reset by peer] |
23:31:03 | | Perk quits [Read error: Connection reset by peer] |
23:31:03 | | CandidSparrow9 is now known as CandidSparrow |
23:31:04 | | Perk9 is now known as Perk |
23:31:05 | | Earendil7 quits [Client Quit] |
23:31:11 | | nulldata quits [Read error: Connection reset by peer] |
23:31:12 | | nulldata2 is now known as nulldata |
23:31:14 | | Ryz quits [Read error: Connection reset by peer] |
23:31:14 | | kiska5 quits [Read error: Connection reset by peer] |
23:31:14 | | Ryz4 is now known as Ryz |
23:31:14 | | kiska54 is now known as kiska5 |
23:31:21 | | DopefishJustin quits [Ping timeout: 272 seconds] |
23:31:21 | | wickedplayer494 quits [Ping timeout: 272 seconds] |
23:31:21 | | danwellby quits [Ping timeout: 272 seconds] |
23:31:22 | | Earendil7 (Earendil7) joins |
23:31:37 | | wickedplayer494 joins |
23:31:39 | | wickedplayer494 is now authenticated as wickedplayer494 |
23:31:44 | | hogchips quits [Read error: Connection reset by peer] |
23:31:49 | | rktk (rktk) joins |
23:31:59 | | useretail__ quits [Ping timeout: 272 seconds] |
23:31:59 | | atphoenix__ quits [Ping timeout: 272 seconds] |
23:31:59 | | TastyWiener95 quits [Ping timeout: 272 seconds] |
23:31:59 | | s-crypt2 quits [Ping timeout: 272 seconds] |
23:31:59 | | Ryz2 quits [Ping timeout: 272 seconds] |
23:31:59 | | superkuh quits [Ping timeout: 272 seconds] |
23:31:59 | | bladem quits [Ping timeout: 272 seconds] |
23:31:59 | | s-crypt21 is now known as s-crypt2 |
23:32:00 | | Ryz28 is now known as Ryz2 |
23:32:00 | | TastyWiener959 is now known as TastyWiener95 |
23:32:37 | | abirkill quits [Ping timeout: 272 seconds] |
23:32:37 | | BlueMaxima quits [Ping timeout: 272 seconds] |
23:32:37 | | DogsRNice quits [Ping timeout: 272 seconds] |
23:32:37 | | Flashfire42 quits [Ping timeout: 272 seconds] |
23:32:37 | | kiska quits [Ping timeout: 272 seconds] |
23:32:37 | | abirkill- is now known as abirkill |
23:32:37 | | Flashfire423 is now known as Flashfire42 |
23:32:38 | | kiska7 is now known as kiska |
23:32:44 | | danwellby joins |
23:32:50 | | wyatt8740 joins |
23:33:18 | | bladem (bladem) joins |
23:34:08 | | benjins is now authenticated as benjins |
23:46:24 | | mossssss quits [Client Quit] |
23:50:41 | | Pedrosso joins |
23:57:20 | | Earendil7 quits [Client Quit] |
23:58:19 | | Earendil7 (Earendil7) joins |