#archiveteam-bs log for 2024-03-21

00:01:27	<Terbium>	I see a bunch of free and paid APIs for M&A feeds
00:10:15	<Terbium>	https://site.financialmodelingprep.com/developer/docs/merger-and-acquisition-api
00:10:51	<fireonlive>	hmm
00:10:59	<fireonlive>	if there’s a good rss feed i could hook it up to rss
00:12:06	<Terbium>	there's this: https://seekingalpha.com/market-news/m-a
00:13:01	<Terbium>	There's an RSS feed
00:15:22	<fireonlive>	this seems to be the url for the feed: https://seekingalpha.com/tag/m-a.xml
00:16:10	<fireonlive>	i’m sitting in a vehicle on my phone so hard to tell for sure haha
00:16:57	<Terbium>	fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed
00:17:05	<fireonlive>	ah awesome :)
00:17:09	<Terbium>	which makes finding companies a lot easier
00:17:30	<Terbium>	it also showed failed or cancelled M&As as well
00:18:10	<fireonlive>	i’ll toss it up in #m&a if that suits everyone when i’m back at a more proper computer later; just out and about with a friend who’s visiting for the first time in a while
00:29:39		wickedplayer494 quits [Remote host closed the connection]
00:43:20		icedice (icedice) joins
00:44:05		icedice quits [Client Quit]
00:45:53		icedice (icedice) joins
00:57:18		wickedplayer494 joins
00:57:45		wickedplayer494 is now authenticated as wickedplayer494
01:03:04		nicolas17 joins
01:05:55		Wohlstand (Wohlstand) joins
01:13:04		Naruyoko5 joins
01:16:16		Naruyoko quits [Ping timeout: 255 seconds]
01:16:43		le0n quits [Ping timeout: 255 seconds]
01:18:20	<qwertyasdfuiopghjkl>	https://www.thewrap.com/gannett-drops-ap-associated-press-usa-today/ "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo"
01:19:06	<qwertyasdfuiopghjkl>	Not sure if this means removal of existing content or just discontinuing new content
01:23:08		le0n (le0n) joins
01:27:17	<qwertyasdfuiopghjkl>	https://apnews.com/article/gannett-associated-press-contract-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service." https://www.nytimes.com/2024/03/19/business/media/gannett-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop
01:27:18	<qwertyasdfuiopghjkl>	using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." apparently there's also another one
02:20:56		hackbug quits [Remote host closed the connection]
02:26:04		hackbug (hackbug) joins
02:29:50	<fireonlive>	#m&a is now setup, we should see if it works within the hour :3
02:32:28	<fireonlive>	Terbium++
02:32:29	<eggdrop>	[karma] 'Terbium' now has 2 karma!
02:34:40		hackbug quits [Remote host closed the connection]
02:37:39		hackbug (hackbug) joins
03:02:34		lennier2 joins
03:03:59		threedeeitguy39 quits [Ping timeout: 272 seconds]
03:04:37		lennier2_ quits [Ping timeout: 272 seconds]
03:11:53		threedeeitguy39 (threedeeitguy) joins
03:16:38		Perk quits [Quit: Ping timeout (120 seconds)]
03:19:06		Perk joins
03:55:34		PredatorIWD joins
04:03:56		Island quits [Read error: Connection reset by peer]
05:11:48		BlueMaxima quits [Read error: Connection reset by peer]
05:47:51		ell quits [Client Quit]
06:08:11		Arcorann (Arcorann) joins
06:23:59		Dango360_ joins
06:25:16		_Dango360 joins
06:27:55		Dango360 quits [Ping timeout: 272 seconds]
06:29:01		Dango360_ quits [Ping timeout: 255 seconds]
06:57:49		G4te_Keep3r34924 quits [Ping timeout: 255 seconds]
06:58:57		G4te_Keep3r34924 joins
07:51:50		groentela joins
07:53:27		groentela quits [Client Quit]
09:00:02		Bleo182600 quits [Client Quit]
09:01:22		Bleo182600 joins
09:18:31		Wohlstand quits [Client Quit]
09:30:23		newbie007 joins
09:31:36	<newbie007>	is it possible to upload locally archived websites to internet archive such that they are searchable using wayback machine?
09:32:30	<pabs>	that isn't possible
09:36:52		newbie007 quits [Ping timeout: 265 seconds]
09:48:49		rohvani quits [Ping timeout: 255 seconds]
09:53:43	<@arkiver>	RIP original redis
10:13:16		newbie007 joins
10:18:27		monika quits [Quit: Zzz]
10:41:48		newbie007 quits [Client Quit]
11:55:53		^ quits [Remote host closed the connection]
11:56:46		^ (^) joins
12:12:14		nicolas17 quits [Read error: Connection reset by peer]
12:12:44		nicolas17 joins
12:16:10		monika (boom) joins
12:21:11		linuxgemini (linuxgemini) joins
12:32:05		Arcorann quits [Ping timeout: 272 seconds]
12:37:56		Darken quits [Remote host closed the connection]
12:38:20		Darken (Darken) joins
12:38:52		^ quits [Remote host closed the connection]
12:39:11		^ (^) joins
12:42:22		PredatorIWD quits [Read error: Connection reset by peer]
12:53:43		PredatorIWD joins
13:27:22		^ quits [Remote host closed the connection]
13:27:46		^ (^) joins
13:31:51		Guest54 joins
14:16:20		Ruthalas59 quits [Quit: Ping timeout (120 seconds)]
14:16:44		Ruthalas59 (Ruthalas) joins
14:24:43		katia quits [Remote host closed the connection]
14:25:41		katia (katia) joins
14:26:42		katia quits [Remote host closed the connection]
14:27:21		katia (katia) joins
14:27:55		katia quits [Remote host closed the connection]
14:28:35		knecht4 quits [Client Quit]
14:28:57		Derpest joins
14:28:59		katia (katia) joins
14:29:36		Derpest quits [Client Quit]
14:30:07		katia quits [Remote host closed the connection]
14:30:36		katia (katia) joins
14:31:21		katia quits [Remote host closed the connection]
14:32:22		katia (katia) joins
14:33:00		katia quits [Remote host closed the connection]
14:33:56		katia (katia) joins
14:34:29		katia quits [Remote host closed the connection]
14:35:40		katia (katia) joins
14:36:14		katia quits [Remote host closed the connection]
14:36:31		ikkoup joins
14:36:52	<ikkoup>	Hi,
14:36:53	<ikkoup>	Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of dying at anytime.
14:36:53	<ikkoup>	https://archive.alsharekh.org
14:37:20		katia (katia) joins
14:37:53		katia quits [Remote host closed the connection]
14:38:58		katia (katia) joins
14:39:31		katia quits [Remote host closed the connection]
14:39:38	<ikkoup>	the site also has a sitemap (https://archive.alsharekh.org/sitemap.xml) which would help ramp things up!
14:39:42	<pokechu22>	Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has <img _ngcontent-sc1 class="slide_image" src="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
14:39:44	<pokechu22>	data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"> + <base href="/">) and archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes
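The backslash handling described above can be sketched as follows. This is a minimal illustration, assuming the crawler normalizes backslashes the way browsers do and then applies the page's `<base href="/">`; the function name `resolve_image_url` is hypothetical and is not how archivebot implements it.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url: str, src: str, base_href: str = "/") -> str:
    """Resolve a backslash-separated src attribute against a page's <base href>."""
    # Browsers treat backslashes in http(s) URLs as forward slashes.
    normalized = src.replace("\\", "/")
    # First resolve the <base href> against the page, then the src against that base.
    base = urljoin(page_url, base_href)
    return urljoin(base, normalized)


page = "https://archive.alsharekh.org/Articles/293/20679/470610"
src = r"MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
print(resolve_image_url(page, src))
```

The log notes that the server also accepts the unmodified backslash form, so normalization here is only for producing a conventional URL.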
14:40:24		knecht4 joins
14:40:32		katia (katia) joins
14:41:06		katia quits [Remote host closed the connection]
14:41:10	<ikkoup>	Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages.
14:41:10	<ikkoup>	Don't know if you encountered that before. sorry for my weak language.
14:42:08		katia (katia) joins
14:42:42		katia quits [Remote host closed the connection]
14:43:48		katia (katia) joins
14:43:49	<pokechu22>	I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months?
14:44:29		katia quits [Remote host closed the connection]
14:45:24		katia (katia) joins
14:45:38		katia quits [Remote host closed the connection]
14:46:16	<pokechu22>	hmm, https://archive.alsharekh.org/contents/293/20679 requires a bunch of API requests to e.g. https://archiveapi.alsharekh.org/Search/IssueIndex?IID=20679 actually; archivebot probably won't follow those
14:47:13	<ikkoup>	Hmm, not sure.
14:47:14	<ikkoup>	The owner was the pioneer of Arabic language support in the early days of computers and he (and his company at the time) added Arabic support for almost every OS/software at the time.
14:47:14	<ikkoup>	The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background?
14:47:36	<pokechu22>	... though https://archive.alsharekh.org/sitemap10.xml links to articles, so it would find all of the articles, but the table of contents would not work unless we did that separately (which would not be too hard)
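A rough sketch of pulling article URLs out of one of those sitemap files, assuming they follow the standard sitemaps.org schema and namespace (not verified against the live site). The helper `extract_locs` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol (assumed to apply to this site).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]


# Hypothetical sample shaped like a sitemap entry for an article page.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://archive.alsharekh.org/Articles/293/20679/470610</loc></url>"
    "</urlset>"
)
print(extract_locs(sample))
```

The same function works on a sitemap index, since index entries also carry their child sitemap URLs in `<loc>` elements.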
14:48:23	<ikkoup>	Not sure if its possible, but can you ignore the API requests?
14:48:24	<ikkoup>	It's for info about individual articles which is not as important as the whole issue/chapter/magazine (https://archive.alsharekh.org/MagazinePages/MagazineBook/~xxx)
14:49:26	<ikkoup>	The important stuff is at the above url structure, the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6 etc)
14:52:43	<pokechu22>	Hmm, http://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... that flipbook won't work well with it
14:53:45	<pokechu22>	it looks like https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime ="201204132846"
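A minimal sketch of reading those two values out of a flipbook config.js, assuming the file consists of simple `bookConfig.key = value;` assignments like the two quoted above. The regex and the `parse_book_config` name are illustrative, and a real config may contain constructs this does not handle.

```python
import re

# Matches assignments such as bookConfig.totalPageCount=337 or
# bookConfig.CreatedTime ="201204132846" (quoted or bare values).
ASSIGNMENT = re.compile(r'bookConfig\.(\w+)\s*=\s*("([^"]*)"|[^;\s]+)')


def parse_book_config(js_text: str) -> dict[str, str]:
    """Collect bookConfig.<key> assignments into a dict of strings."""
    config = {}
    for match in ASSIGNMENT.finditer(js_text):
        key, raw, quoted = match.group(1), match.group(2), match.group(3)
        # Prefer the text inside the quotes when the value was quoted.
        config[key] = quoted if quoted is not None else raw
    return config


sample = 'bookConfig.totalPageCount=337;bookConfig.CreatedTime ="201204132846";'
cfg = parse_book_config(sample)
print(int(cfg["totalPageCount"]), cfg["CreatedTime"])
```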
14:54:47	<ikkoup>	If you check dev inspection (ctrl shift i) then you can see that the flipbook is just a bunch of images and js.
14:54:48	<ikkoup>	I guess it's not possible after all eh?
14:55:36	<pokechu22>	It would be possible, but it would require additional work to make the flipbooks function
14:56:23	<pokechu22>	https://archive.alsharekh.org/Articles/293/20679/470610 links the images directly though so that would work. Do all magazines have both flipbooks and those /Articles/ pages?
14:59:53	<pokechu22>	https://archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" button that opens https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from
15:01:35		katia (katia) joins
15:01:50	<pokechu22>	... and the flipbook uses https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/files/mobile/1.jpg?201204132846 while the /Articles/ page uses https://archive.alsharekh.org/MagazinePages/Magazine_JPG/Al_Shariqa/Al_Shariqa_2017/Issue_3/001.jpg (better quality).
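Given a page count and creation timestamp from config.js, the two image URL patterns above could be enumerated roughly like this. It assumes pages are numbered contiguously from 1, that the Magazine_JPG names are zero-padded to three digits (as in 001.jpg and 014.jpg), and that both sets cover the same pages; none of that is confirmed in the log. The page count of 5 is a made-up example value.

```python
def flipbook_page_urls(issue_base: str, total_pages: int, created_time: str) -> list[str]:
    """Lower-quality flipbook images: <issue>/files/mobile/<n>.jpg?<CreatedTime>."""
    return [
        f"{issue_base}/files/mobile/{n}.jpg?{created_time}"
        for n in range(1, total_pages + 1)
    ]


def article_page_urls(jpg_base: str, total_pages: int) -> list[str]:
    """Better-quality images used by /Articles/ pages: <issue>/<nnn>.jpg."""
    return [f"{jpg_base}/{n:03d}.jpg" for n in range(1, total_pages + 1)]


root = "https://archive.alsharekh.org/MagazinePages"
issue = "Al_Shariqa/Al_Shariqa_2017/Issue_3"
# total_pages=5 is a hypothetical value; the real one would come from config.js.
flip = flipbook_page_urls(f"{root}/MagazineBook/{issue}", 5, "201204132846")
jpgs = article_page_urls(f"{root}/Magazine_JPG/{issue}", 5)
print(flip[0])
print(jpgs[0])
```

A list like this could be fed to a crawler as extra seed URLs, which is one way the flipbook images might be captured even though the index.html contains no links to them.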
15:01:54	<ikkoup>	The whole thing is basically a giant flip book :(
15:01:54	<ikkoup>	And not very sure about articles page, but it exists for most of it (unindexed issues have no articles, only flipbook)
15:02:26		katia quits [Remote host closed the connection]
15:03:20		katia (katia) joins
15:03:35	<pokechu22>	I'll start it in archivebot just to get something, and hopefully a solution for the flipbooks can be found afterwards
15:03:45	<pokechu22>	Thanks for letting us know about the site, we probably wouldn't have found it otherwise :)
15:04:19		katia quits [Remote host closed the connection]
15:04:24		grid joins
15:05:18	<pokechu22>	I assume the rest of alsharekh.org should also be saved?
15:06:25	<@arkiver>	thank you ikkoup!
15:07:33	<@arkiver>	yeah it might be interesting to save everything on that site
15:07:44	<@arkiver>	at least into WARCs, perhaps separate items on IA as well
15:10:01	<ikkoup>	Not really, alsharekh.org is landing page for other services run by the same guy.
15:10:01	<ikkoup>	a Lexicon, Dictionary (acquired by Saudi government), Tashkeel (vowel movement corrector) and a spell checker. I guess they can't be saved.
15:11:57	<ikkoup>	I also tried to setup grab-site (https://github.com/ArchiveTeam/grab-site) on a vps to help crawling the archive, but had some troubles with python 3.8 not being supported.
15:12:50		TheTechRobo quits [Read error: Connection reset by peer]
15:12:50		Pedrosso quits [Read error: Connection reset by peer]
15:12:50		ScenarioPlanet quits [Read error: Connection reset by peer]
15:13:19		Pedrosso joins
15:13:24		ScenarioPlanet (ScenarioPlanet) joins
15:13:36		TheTechRobo (TheTechRobo) joins
15:19:57		orchidcnr (orchidcnr) joins
15:19:57		orchidcnr quits [Remote host closed the connection]
15:22:53	<Terbium>	ikkoup: I would recommend using a container or Python version manager for grab-site in that case to drop back down to Python 3.7
15:28:39	<pokechu22>	That said, archivebot isn't a distributed project - running grab-site locally would mean you grab the entire site yourself, and additional archivebot grabs the entire site by itself. It won't make things run faster.
15:32:12	<ikkoup>	Ah, I thought it was something like the archivewarrior.
15:32:12	<ikkoup>	I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which are not found in other crawling/scraping tools I looked up.
15:50:24		MtN joins
15:50:41		MtN quits [Client Quit]
16:03:06		ikkoup quits [Client Quit]
16:09:40	<@arkiver>	i realise i don't know much about storj
16:10:00	<@arkiver>	is it just private storage only for files to be made available from elsewhere, page requisites and such?
16:15:22		GNU_world quits [Ping timeout: 255 seconds]
16:16:41	<kiska>	I think you can use storj as S3
16:21:24		SootBector quits [Ping timeout: 255 seconds]
16:22:04		SootBector (SootBector) joins
16:22:42		GNU_world joins
16:26:57	<kiska>	Which I guess means you could have some site assets on storj being served
16:27:00	<kiska>	Or something like that
16:28:08	<@arkiver>	right
16:30:07	<kpcyrd>	is there a channel for archiving #web3?
16:32:16		Guest54 quits [Client Quit]
16:33:52	<@arkiver>	archiving web3?
16:34:01		Wohlstand (Wohlstand) joins
16:34:12	<@arkiver>	so like... archiving blockchains?
16:35:01	<FireFly>	I thought part of the point was that it's kind of implicitly so already due to its distributed nature
16:37:40	<@arkiver>	that's not archiving
16:40:26	<FireFly>	..fair
16:48:26		katia (katia) joins
16:49:06		katia quits [Remote host closed the connection]
16:50:08		katia (katia) joins
16:51:34		katia quits [Remote host closed the connection]
16:51:48		katia (katia) joins
16:53:52		katia quits [Remote host closed the connection]
17:04:05		katia (katia) joins
17:09:21		Wohlstand quits [Remote host closed the connection]
17:09:29		GNU_world quits [Ping timeout: 272 seconds]
17:29:45		linuxgemini quits [Ping timeout: 272 seconds]
17:30:21	<kpcyrd>	the question was tongue in cheek, I probably should've made that more obvious :)
17:31:52		GNU_world joins
17:43:32		G4te_Keep3r34924 quits [Client Quit]
17:44:01		G4te_Keep3r34924 joins
17:50:52	<h2ibot>	Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=51913&oldid=26103
17:50:53	<h2ibot>	Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…): https://wiki.archiveteam.org/?diff=51914&oldid=51350
17:50:54	<h2ibot>	Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr \| IMPORTANT */ new…): https://wiki.archiveteam.org/?diff=51915&oldid=45705
17:50:55	<h2ibot>	Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com): https://wiki.archiveteam.org/?diff=51916&oldid=51896
17:50:56	<h2ibot>	Flama12333 edited Deathwatch (+167, added realtek ftp sadly): https://wiki.archiveteam.org/?diff=51917&oldid=51901
17:58:06		katia quits [Remote host closed the connection]
17:58:43		katia (katia) joins
17:59:16		katia quits [Remote host closed the connection]
18:00:53	<h2ibot>	JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51918&oldid=51916
18:04:00		grid quits [Client Quit]
18:17:06		grid joins
18:26:50		linuxgemini (linuxgemini) joins
18:31:28		Island joins
18:33:12		Island quits [Read error: Connection reset by peer]
18:36:07		Island joins
19:13:07	<h2ibot>	JacksonChen666 edited Deathwatch (+3, fix citation errors): https://wiki.archiveteam.org/?diff=51919&oldid=51917
19:20:57		jacksonchen666 (jacksonchen666) joins
19:38:41		systwi_ quits [Quit: systwi_]
19:38:42		nothere_ quits [Quit: Leaving]
19:53:33		Darken2 (Darken) joins
19:55:43	<michaelblob>	how are people doing log agg? looking into grafana loki but getting piss poor performance generating graphs
19:56:03	<michaelblob>	also eyeing influxdb but not sure how/where that fits in
19:56:40	<Barto>	work use an ELK stack
19:57:13		Darken quits [Ping timeout: 255 seconds]
20:03:00		nothere joins
20:17:13		wyatt8750 joins
20:18:13		wyatt8740 quits [Ping timeout: 272 seconds]
20:22:52		wyatt8750 quits [Ping timeout: 255 seconds]
20:22:58	<nstrom\|m>	Just using dozzle on individual servers, no agg
20:23:08		wyatt8740 joins
21:02:26		qwertyasdfuiopghjkl quits [Client Quit]
21:09:01		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:14:54		Darken2 quits [Client Quit]
21:15:10		Darken (Darken) joins
21:22:44	<pabs>	arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly NFTs certainly aren't, lots of them apparently just load stuff off HTTP
21:24:00		grid quits [Client Quit]
21:28:25	<nicolas17>	lmk when there's anything of value worth archiving, too
21:29:55		qwertyasdfuiopghjkl quits [Client Quit]
21:33:18	<AK>	I did ELK, but then it was approaching hundreds of GB of logs per day, now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs
21:34:04	<AK>	dozzle does everything I need for almost all my personal stuff: https://logs.hel1.aktheknight.co.uk/
21:34:04		qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:41:16		Darken quits [Read error: Connection reset by peer]
21:41:41		Darken (Darken) joins
22:14:46		Darken quits [Remote host closed the connection]
22:21:37		nulldata quits [Client Quit]
22:25:03		nulldata (nulldata) joins
22:31:20		Darken (Darken) joins
23:05:46		neggles quits [Remote host closed the connection]
23:05:58		neggles (neggles) joins
23:06:14		NotGLaDOS quits [Remote host closed the connection]
23:07:20		NotGLaDOS joins
23:13:17		jacksonchen666 quits [Client Quit]
23:15:20		Darken quits [Remote host closed the connection]
23:15:45		Darken (Darken) joins
23:25:08		wickedplayer494 quits [Remote host closed the connection]
23:42:17		icedice quits [Client Quit]
23:49:48		icedice (icedice) joins
23:52:41		wickedplayer494 joins
23:52:50		wickedplayer494 is now authenticated as wickedplayer494
23:54:11	<icedice>	JAA if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA: https://twitter.com/RelicCastleCom/status/1770901435867361351
23:54:57	<icedice>	The PokéCommunity probably has one of the largest Pokémon fan game communities out there and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist
23:57:10	<Terbium>	why they gotta do my PokeCommunity like that....
23:57:36	<pokechu22>	I think we last did it 10 months ago: https://archive.fart.website/archivebot/viewer/job/202305131413054huog
23:58:30	<nulldata>	Terbium - because Nintendo loathes its fans.
23:59:27	<Terbium>	Also, they really should have hosted the site in a DMCA ignored location. After so many DMCA's over the decades, it seems like this lesson is never learned