00:01:27<Terbium>I see a bunch of free and paid APIs for M&A feeds
00:10:15<Terbium>https://site.financialmodelingprep.com/developer/docs/merger-and-acquisition-api
00:10:51<fireonlive>hmm
00:10:59<fireonlive>if there’s a good rss feed i could hook it up to rss
00:12:06<Terbium>there's this: https://seekingalpha.com/market-news/m-a
00:13:01<Terbium>There's an RSS feed
00:15:22<fireonlive>this seems to be the url for the feed: https://seekingalpha.com/tag/m-a.xml
00:16:10<fireonlive>i’m sitting in a vehicle on my phone so hard to tell for sure haha
00:16:57<Terbium>fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed
00:17:05<fireonlive>ah awesome :)
00:17:09<Terbium>which makes finding companies a lot easier
00:17:30<Terbium>it also showed failed or cancelled M&As as well
00:18:10<fireonlive>i’ll toss it up in #m&a if that suits everyone when i’m back at a more proper computer later; just out and about with a friend who’s visiting for the first time in a while
00:29:39wickedplayer494 quits [Remote host closed the connection]
00:43:20icedice (icedice) joins
00:44:05icedice quits [Client Quit]
00:45:53icedice (icedice) joins
00:57:18wickedplayer494 joins
01:03:04nicolas17 joins
01:05:55Wohlstand (Wohlstand) joins
01:13:04Naruyoko5 joins
01:16:16Naruyoko quits [Ping timeout: 255 seconds]
01:16:43le0n quits [Ping timeout: 255 seconds]
01:18:20<qwertyasdfuiopghjkl>https://www.thewrap.com/gannett-drops-ap-associated-press-usa-today/ "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo"
01:19:06<qwertyasdfuiopghjkl>Not sure if this means removal of existing content or just discontinuing new content
01:23:08le0n (le0n) joins
01:27:17<qwertyasdfuiopghjkl>https://apnews.com/article/gannett-associated-press-contract-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service." https://www.nytimes.com/2024/03/19/business/media/gannett-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop
01:27:18<qwertyasdfuiopghjkl>using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." apparently there's also another one
02:20:56hackbug quits [Remote host closed the connection]
02:26:04hackbug (hackbug) joins
02:29:50<fireonlive>#m&a is now setup, we should see if it works within the hour :3
02:32:28<fireonlive>Terbium++
02:32:29<eggdrop>[karma] 'Terbium' now has 2 karma!
02:34:40hackbug quits [Remote host closed the connection]
02:37:39hackbug (hackbug) joins
03:02:34lennier2 joins
03:03:59threedeeitguy39 quits [Ping timeout: 272 seconds]
03:04:37lennier2_ quits [Ping timeout: 272 seconds]
03:11:53threedeeitguy39 (threedeeitguy) joins
03:16:38Perk quits [Quit: Ping timeout (120 seconds)]
03:19:06Perk joins
03:55:34PredatorIWD joins
04:03:56Island quits [Read error: Connection reset by peer]
05:11:48BlueMaxima quits [Read error: Connection reset by peer]
05:47:51ell quits [Client Quit]
06:08:11Arcorann (Arcorann) joins
06:23:59Dango360_ joins
06:25:16_Dango360 joins
06:27:55Dango360 quits [Ping timeout: 272 seconds]
06:29:01Dango360_ quits [Ping timeout: 255 seconds]
06:57:49G4te_Keep3r34924 quits [Ping timeout: 255 seconds]
06:58:57G4te_Keep3r34924 joins
07:51:50groentela joins
07:53:27groentela quits [Client Quit]
09:00:02Bleo182600 quits [Client Quit]
09:01:22Bleo182600 joins
09:18:31Wohlstand quits [Client Quit]
09:30:23newbie007 joins
09:31:36<newbie007>is it possible to upload locally archived websites to internet archive such that they are searchable using wayback machine?
09:32:30<pabs>that isn't possible
09:36:52newbie007 quits [Ping timeout: 265 seconds]
09:48:49rohvani quits [Ping timeout: 255 seconds]
09:53:43<@arkiver>RIP original redis
10:13:16newbie007 joins
10:18:27monika quits [Quit: Zzz]
10:41:48newbie007 quits [Client Quit]
11:55:53^ quits [Remote host closed the connection]
11:56:46^ (^) joins
12:12:14nicolas17 quits [Read error: Connection reset by peer]
12:12:44nicolas17 joins
12:16:10monika (boom) joins
12:21:11linuxgemini (linuxgemini) joins
12:32:05Arcorann quits [Ping timeout: 272 seconds]
12:37:56Darken quits [Remote host closed the connection]
12:38:20Darken (Darken) joins
12:38:52^ quits [Remote host closed the connection]
12:39:11^ (^) joins
12:42:22PredatorIWD quits [Read error: Connection reset by peer]
12:53:43PredatorIWD joins
13:27:22^ quits [Remote host closed the connection]
13:27:46^ (^) joins
13:31:51Guest54 joins
14:16:20Ruthalas59 quits [Quit: Ping timeout (120 seconds)]
14:16:44Ruthalas59 (Ruthalas) joins
14:24:43katia quits [Remote host closed the connection]
14:25:41katia (katia) joins
14:26:42katia quits [Remote host closed the connection]
14:27:21katia (katia) joins
14:27:55katia quits [Remote host closed the connection]
14:28:35knecht4 quits [Client Quit]
14:28:57Derpest joins
14:28:59katia (katia) joins
14:29:36Derpest quits [Client Quit]
14:30:07katia quits [Remote host closed the connection]
14:30:36katia (katia) joins
14:31:21katia quits [Remote host closed the connection]
14:32:22katia (katia) joins
14:33:00katia quits [Remote host closed the connection]
14:33:56katia (katia) joins
14:34:29katia quits [Remote host closed the connection]
14:35:40katia (katia) joins
14:36:14katia quits [Remote host closed the connection]
14:36:31ikkoup joins
14:36:52<ikkoup>Hi,
14:36:53<ikkoup>Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of dying at anytime.
14:36:53<ikkoup>https://archive.alsharekh.org
14:37:20katia (katia) joins
14:37:53katia quits [Remote host closed the connection]
14:38:58katia (katia) joins
14:39:31katia quits [Remote host closed the connection]
14:39:38<ikkoup>the site also has a sitemap (https://archive.alsharekh.org/sitemap.xml) which would help ramp things up!
14:39:42<pokechu22>Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has <img _ngcontent-sc1 class="slide_image" src="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
14:39:44<pokechu22>data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"> + <base href="/">) and archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes
14:40:24knecht4 joins
14:40:32katia (katia) joins
14:41:06katia quits [Remote host closed the connection]
14:41:10<ikkoup>Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages.
14:41:10<ikkoup>Don't know if you encountered that before. sorry for my weak language.
14:42:08katia (katia) joins
14:42:42katia quits [Remote host closed the connection]
14:43:48katia (katia) joins
14:43:49<pokechu22>I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months?
14:44:29katia quits [Remote host closed the connection]
14:45:24katia (katia) joins
14:45:38katia quits [Remote host closed the connection]
14:46:16<pokechu22>hmm, https://archive.alsharekh.org/contents/293/20679 requires a bunch of API requests to e.g. https://archiveapi.alsharekh.org/Search/IssueIndex?IID=20679 actually; archivebot probably won't follow those
14:47:13<ikkoup>Hmm, not sure.
14:47:14<ikkoup>The owner was the pioneer or Arabic language in the early days of computers and he (and his company at the time) added Arabic support for almost every OS/software at the time.
14:47:14<ikkoup>The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background?
14:47:36<pokechu22>... though https://archive.alsharekh.org/sitemap10.xml links to articles, so it *would* find all of the articles, but the table of contents would not work unless we did that separately (which would not be *too* hard)
14:48:23<ikkoup>Not sure if its possible, but can you ignore the API requests?
14:48:24<ikkoup>It's for info about individual articles which is not as important as the whole issue/chapter/magazine (https://archive.alsharekh.org/MagazinePages/MagazineBook/~xxx)
14:49:26<ikkoup>The important stuff is at the above url structure, the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6 etc)
14:52:43<pokechu22>Hmm, http://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... that flipbook won't work well with it
14:53:45<pokechu22>it looks like https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime ="201204132846"
14:54:47<ikkoup>If you check dev inspection (ctrl shift i) then you can see that the flipbook is just a bunch of images and js.
14:54:48<ikkoup>I guess it's not possible after all eh?
14:55:36<pokechu22>It would be possible, but it would require additional work to make the flipbooks function
14:56:23<pokechu22>https://archive.alsharekh.org/Articles/293/20679/470610 links the images directly though so that would work. Do all magazines have both flipbooks and those /Articles/ pages?
14:59:53<pokechu22>https://archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" button that opens https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from
15:01:35katia (katia) joins
15:01:50<pokechu22>... and the flipbook uses https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/files/mobile/1.jpg?201204132846 while the /Articles/ page uses https://archive.alsharekh.org/MagazinePages/Magazine_JPG/Al_Shariqa/Al_Shariqa_2017/Issue_3/001.jpg (better quality).
15:01:54<ikkoup>The whole thing is basically a giant flip book :(
15:01:54<ikkoup>And not very sure about articles page, but it exists for most of it (unindexed issues have no articles, only flipbook)
15:02:26katia quits [Remote host closed the connection]
15:03:20katia (katia) joins
15:03:35<pokechu22>I'll start it in archivebot just to get *something*, and hopefully a solution for the flipbooks can be found afterwards
15:03:45<pokechu22>Thanks for letting us know about the site, we probably wouldn't have found it otherwise :)
15:04:19katia quits [Remote host closed the connection]
15:04:24grid joins
15:05:18<pokechu22>I assume the rest of alsharekh.org should also be saved?
15:06:25<@arkiver>thank you ikkoup!
15:07:33<@arkiver>yeah it might be interesting to save everything on that site
15:07:44<@arkiver>at least into WARCs, perhaps separate items on IA as well
15:10:01<ikkoup>Not really, alsharekh.org is landing page for other services run by the same guy.
15:10:01<ikkoup>a Lexicon, Dictionary (acquired by Saudi government), Tashkeel (vowel movement corrector) and a spell checker. I guess they can't be saved.
15:11:57<ikkoup>I also tried to setup grab-site (https://github.com/ArchiveTeam/grab-site) on a vps to help crawling the archive, but had some troubles with python 3.8 not being supported.
15:12:50TheTechRobo quits [Read error: Connection reset by peer]
15:12:50Pedrosso quits [Read error: Connection reset by peer]
15:12:50ScenarioPlanet quits [Read error: Connection reset by peer]
15:13:19Pedrosso joins
15:13:24ScenarioPlanet (ScenarioPlanet) joins
15:13:36TheTechRobo (TheTechRobo) joins
15:19:57orchidcnr (orchidcnr) joins
15:19:57orchidcnr quits [Remote host closed the connection]
15:22:53<Terbium>ikkoup: I would recommend using a container or Python version manager for grab-site in that case to drop back down to Python 3.7
15:28:39<pokechu22>That said, archivebot isn't a distributed project - running grab-site locally would mean you grab the entire site yourself, and additional archivebot grabs the entire site by itself. It won't make things run faster.
15:32:12<ikkoup>Ah, I thought it was something like the archivewarrior.
15:32:12<ikkoup>I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which are not found in other crawling/scarping tools I looked up.
15:50:24MtN joins
15:50:41MtN quits [Client Quit]
16:03:06ikkoup quits [Client Quit]
16:09:40<@arkiver>i realise i don't know much about storj
16:10:00<@arkiver>is it just private storage only for files to be made available from elsewhere, page requisites and such?
16:15:22GNU_world quits [Ping timeout: 255 seconds]
16:16:41<kiska>I think you can use storj as S3
16:21:24SootBector quits [Ping timeout: 255 seconds]
16:22:04SootBector (SootBector) joins
16:22:42GNU_world joins
16:26:57<kiska>Which I guess means you could have some site assets on storj being served
16:27:00<kiska>Or something like that
16:28:08<@arkiver>right
16:30:07<kpcyrd>is there a channel for archiving #web3?
16:32:16Guest54 quits [Client Quit]
16:33:52<@arkiver>archiving web3?
16:34:01Wohlstand (Wohlstand) joins
16:34:12<@arkiver>so like... archiving blockchains?
16:35:01<FireFly>I thought part of the point was that it's kind of implicitly so already due to its distributed nature
16:37:40<@arkiver>that's not archiving
16:40:26<FireFly>..fair
16:48:26katia (katia) joins
16:49:06katia quits [Remote host closed the connection]
16:50:08katia (katia) joins
16:51:34katia quits [Remote host closed the connection]
16:51:48katia (katia) joins
16:53:52katia quits [Remote host closed the connection]
17:04:05katia (katia) joins
17:09:21Wohlstand quits [Remote host closed the connection]
17:09:29GNU_world quits [Ping timeout: 272 seconds]
17:29:45linuxgemini quits [Ping timeout: 272 seconds]
17:30:21<kpcyrd>the question was tongue in cheek, I probably should've made that more obvious :)
17:31:52GNU_world joins
17:43:32G4te_Keep3r34924 quits [Client Quit]
17:44:01G4te_Keep3r34924 joins
17:50:52<h2ibot>Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=51913&oldid=26103
17:50:53<h2ibot>Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…): https://wiki.archiveteam.org/?diff=51914&oldid=51350
17:50:54<h2ibot>Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr | IMPORTANT */ new…): https://wiki.archiveteam.org/?diff=51915&oldid=45705
17:50:55<h2ibot>Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com): https://wiki.archiveteam.org/?diff=51916&oldid=51896
17:50:56<h2ibot>Flama12333 edited Deathwatch (+167, added realtek ftp sadly): https://wiki.archiveteam.org/?diff=51917&oldid=51901
17:58:06katia quits [Remote host closed the connection]
17:58:43katia (katia) joins
17:59:16katia quits [Remote host closed the connection]
18:00:53<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51918&oldid=51916
18:04:00grid quits [Client Quit]
18:17:06grid joins
18:26:50linuxgemini (linuxgemini) joins
18:31:28Island joins
18:33:12Island quits [Read error: Connection reset by peer]
18:36:07Island joins
19:13:07<h2ibot>JacksonChen666 edited Deathwatch (+3, fix citation errors): https://wiki.archiveteam.org/?diff=51919&oldid=51917
19:20:57jacksonchen666 (jacksonchen666) joins
19:38:41systwi_ quits [Quit: systwi_]
19:38:42nothere_ quits [Quit: Leaving]
19:53:33Darken2 (Darken) joins
19:55:43<michaelblob>how are people doing log agg? looking into grafana loki but getting piss poor performance generating graphs
19:56:03<michaelblob>also eyeing influxdb but now sure how/where that fits in
19:56:40<Barto>work use an ELK stack
19:57:13Darken quits [Ping timeout: 255 seconds]
20:03:00nothere joins
20:17:13wyatt8750 joins
20:18:13wyatt8740 quits [Ping timeout: 272 seconds]
20:22:52wyatt8750 quits [Ping timeout: 255 seconds]
20:22:58<nstrom|m>Just using dozzle on individual servers, no agg
20:23:08wyatt8740 joins
21:02:26qwertyasdfuiopghjkl quits [Client Quit]
21:09:01qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:14:54Darken2 quits [Client Quit]
21:15:10Darken (Darken) joins
21:22:44<pabs>arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly NFTs certainly aren't, lots of them apparently just load stuff off HTTP
21:24:00grid quits [Client Quit]
21:28:25<nicolas17>lmk when there's anything of value worth archiving, too
21:29:55qwertyasdfuiopghjkl quits [Client Quit]
21:33:18<AK>I did ELK, but then it was approaching hundreds of GB of logs per day, now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs
21:34:04<AK>dozzle does everything I need for almost all my personal stuff: https://logs.hel1.aktheknight.co.uk/
21:34:04qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:41:16Darken quits [Read error: Connection reset by peer]
21:41:41Darken (Darken) joins
22:14:46Darken quits [Remote host closed the connection]
22:21:37nulldata quits [Client Quit]
22:25:03nulldata (nulldata) joins
22:31:20Darken (Darken) joins
23:05:46neggles quits [Remote host closed the connection]
23:05:58neggles (neggles) joins
23:06:14NotGLaDOS quits [Remote host closed the connection]
23:07:20NotGLaDOS joins
23:13:17jacksonchen666 quits [Client Quit]
23:15:20Darken quits [Remote host closed the connection]
23:15:45Darken (Darken) joins
23:25:08wickedplayer494 quits [Remote host closed the connection]
23:42:17icedice quits [Client Quit]
23:49:48icedice (icedice) joins
23:52:41wickedplayer494 joins
23:54:11<icedice>JAA if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA: https://twitter.com/RelicCastleCom/status/1770901435867361351
23:54:57<icedice>The PokéCommunity has probably the largest Pokémon fan game communities out there and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist
23:57:10<Terbium>why they gotta do my PokeCommunity like that....
23:57:36<pokechu22>I think we last did it 10 months ago: https://archive.fart.website/archivebot/viewer/job/202305131413054huog
23:58:30<nulldata>Terbium - because Nintendo loathes its fans.
23:59:27<Terbium>Also, they really should have hosted the site in a DMCA ignored location. After so many DMCA's over the decades, it seems like this lesson is never learned