00:01:27 | <Terbium> | I see a bunch of free and paid APIs for M&A feeds |
00:10:15 | <Terbium> | https://site.financialmodelingprep.com/developer/docs/merger-and-acquisition-api |
00:10:51 | <fireonlive> | hmm |
00:10:59 | <fireonlive> | if there’s a good rss feed i could hook it up to rss |
00:12:06 | <Terbium> | there's this: https://seekingalpha.com/market-news/m-a |
00:13:01 | <Terbium> | There's an RSS feed |
00:15:22 | <fireonlive> | this seems to be the url for the feed: https://seekingalpha.com/tag/m-a.xml |
00:16:10 | <fireonlive> | i’m sitting in a vehicle on my phone so hard to tell for sure haha |
00:16:57 | <Terbium> | fireonlive: yep that's the one, it has the stock ticker symbols like the other FMP feed |
00:17:05 | <fireonlive> | ah awesome :) |
00:17:09 | <Terbium> | which makes finding companies a lot easier |
00:17:30 | <Terbium> | it also showed failed or cancelled M&As as well |
00:18:10 | <fireonlive> | i’ll toss it up in #m&a if that suits everyone when i’m back at a more proper computer later; just out and about with a friend who’s visiting for the first time in a while |
00:29:39 | | wickedplayer494 quits [Remote host closed the connection] |
00:43:20 | | icedice (icedice) joins |
00:44:05 | | icedice quits [Client Quit] |
00:45:53 | | icedice (icedice) joins |
00:57:18 | | wickedplayer494 joins |
00:57:45 | | wickedplayer494 is now authenticated as wickedplayer494 |
01:03:04 | | nicolas17 joins |
01:05:55 | | Wohlstand (Wohlstand) joins |
01:13:04 | | Naruyoko5 joins |
01:16:16 | | Naruyoko quits [Ping timeout: 255 seconds] |
01:16:43 | | le0n quits [Ping timeout: 255 seconds] |
01:18:20 | <qwertyasdfuiopghjkl> | https://www.thewrap.com/gannett-drops-ap-associated-press-usa-today/ "Gannett, publisher of USA Today and hundreds of local newspapers, will stop using the Associated Press’ content starting next week, [...] will eliminate AP dispatches, photos and video as of March 25, according to an internal memo" |
01:19:06 | <qwertyasdfuiopghjkl> | Not sure if this means removal of existing content or just discontinuing new content |
01:23:08 | | le0n (le0n) joins |
01:27:17 | <qwertyasdfuiopghjkl> | https://apnews.com/article/gannett-associated-press-contract-97405e4715c9a25d21477b992028db2a "Shortly after, AP said it had been informed by McClatchy that it would also drop the service." https://www.nytimes.com/2024/03/19/business/media/gannett-mcclatchy-ap-associated-press.html "McClatchy [...] told its editors this week that it would stop |
01:27:18 | <qwertyasdfuiopghjkl> | using some A.P. services next month." "[McClatchy] said that The A.P.’s feed would end on March 29 and that no A.P. content could be published after March 31." apparently there's also another one |
02:20:56 | | hackbug quits [Remote host closed the connection] |
02:26:04 | | hackbug (hackbug) joins |
02:29:50 | <fireonlive> | #m&a is now setup, we should see if it works within the hour :3 |
02:32:28 | <fireonlive> | Terbium++ |
02:32:29 | <eggdrop> | [karma] 'Terbium' now has 2 karma! |
02:34:40 | | hackbug quits [Remote host closed the connection] |
02:37:39 | | hackbug (hackbug) joins |
03:02:34 | | lennier2 joins |
03:03:59 | | threedeeitguy39 quits [Ping timeout: 272 seconds] |
03:04:37 | | lennier2_ quits [Ping timeout: 272 seconds] |
03:11:53 | | threedeeitguy39 (threedeeitguy) joins |
03:16:38 | | Perk quits [Quit: Ping timeout (120 seconds)] |
03:19:06 | | Perk joins |
03:55:34 | | PredatorIWD joins |
04:03:56 | | Island quits [Read error: Connection reset by peer] |
05:11:48 | | BlueMaxima quits [Read error: Connection reset by peer] |
05:47:51 | | ell quits [Client Quit] |
06:08:11 | | Arcorann (Arcorann) joins |
06:23:59 | | Dango360_ joins |
06:25:16 | | _Dango360 joins |
06:27:55 | | Dango360 quits [Ping timeout: 272 seconds] |
06:29:01 | | Dango360_ quits [Ping timeout: 255 seconds] |
06:57:49 | | G4te_Keep3r34924 quits [Ping timeout: 255 seconds] |
06:58:57 | | G4te_Keep3r34924 joins |
07:51:50 | | groentela joins |
07:53:27 | | groentela quits [Client Quit] |
09:00:02 | | Bleo182600 quits [Client Quit] |
09:01:22 | | Bleo182600 joins |
09:18:31 | | Wohlstand quits [Client Quit] |
09:30:23 | | newbie007 joins |
09:31:36 | <newbie007> | is it possible to upload locally archived websites to internet archive such that they are searchable using wayback machine? |
09:32:30 | <pabs> | that isn't possible |
09:36:52 | | newbie007 quits [Ping timeout: 265 seconds] |
09:48:49 | | rohvani quits [Ping timeout: 255 seconds] |
09:53:43 | <@arkiver> | RIP original redis |
10:13:16 | | newbie007 joins |
10:18:27 | | monika quits [Quit: Zzz] |
10:41:48 | | newbie007 quits [Client Quit] |
11:55:53 | | ^ quits [Remote host closed the connection] |
11:56:46 | | ^ (^) joins |
12:12:14 | | nicolas17 quits [Read error: Connection reset by peer] |
12:12:44 | | nicolas17 joins |
12:16:10 | | monika (boom) joins |
12:21:11 | | linuxgemini (linuxgemini) joins |
12:32:05 | | Arcorann quits [Ping timeout: 272 seconds] |
12:37:56 | | Darken quits [Remote host closed the connection] |
12:38:20 | | Darken (Darken) joins |
12:38:52 | | ^ quits [Remote host closed the connection] |
12:39:11 | | ^ (^) joins |
12:42:22 | | PredatorIWD quits [Read error: Connection reset by peer] |
12:53:43 | | PredatorIWD joins |
13:27:22 | | ^ quits [Remote host closed the connection] |
13:27:46 | | ^ (^) joins |
13:31:51 | | Guest54 joins |
14:16:20 | | Ruthalas59 quits [Quit: Ping timeout (120 seconds)] |
14:16:44 | | Ruthalas59 (Ruthalas) joins |
14:24:43 | | katia quits [Remote host closed the connection] |
14:25:41 | | katia (katia) joins |
14:26:42 | | katia quits [Remote host closed the connection] |
14:27:21 | | katia (katia) joins |
14:27:55 | | katia quits [Remote host closed the connection] |
14:28:35 | | knecht4 quits [Client Quit] |
14:28:57 | | Derpest joins |
14:28:59 | | katia (katia) joins |
14:29:36 | | Derpest quits [Client Quit] |
14:30:07 | | katia quits [Remote host closed the connection] |
14:30:36 | | katia (katia) joins |
14:31:21 | | katia quits [Remote host closed the connection] |
14:32:22 | | katia (katia) joins |
14:33:00 | | katia quits [Remote host closed the connection] |
14:33:56 | | katia (katia) joins |
14:34:29 | | katia quits [Remote host closed the connection] |
14:35:40 | | katia (katia) joins |
14:36:14 | | katia quits [Remote host closed the connection] |
14:36:31 | | ikkoup joins |
14:36:52 | <ikkoup> | Hi, |
14:36:53 | <ikkoup> | Would you be interested in archiving the biggest (and only) Arabic archive of literary magazines? Its owner died last week and it's at risk of dying at anytime. |
14:36:53 | <ikkoup> | https://archive.alsharekh.org |
14:37:20 | | katia (katia) joins |
14:37:53 | | katia quits [Remote host closed the connection] |
14:38:58 | | katia (katia) joins |
14:39:31 | | katia quits [Remote host closed the connection] |
14:39:38 | <ikkoup> | the site also has a sitemap (https://archive.alsharekh.org/sitemap.xml) which would help ramp things up! |
14:39:42 | <pokechu22> | Hmm, the stats are 2 million pages, 326,446 articles, 52,234 writers, 273 magazines, 15,857 issues. It looks like images are directly embedded (view-source:https://archive.alsharekh.org/Articles/293/20679/470610 has <img _ngcontent-sc1 class="slide_image" src="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" |
14:39:44 | <pokechu22> | data-normal="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg" data-full="MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"> + <base href="/">) and archivebot extracts those correctly, and the server doesn't mind the backslashes not being replaced by the browser with forward slashes |
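A minimal sketch of the path handling described above, assuming the page declares `<base href="/">` and that backslashes in `src` are treated like forward slashes, as a browser would. The function name `resolve_image_src` is hypothetical and is not part of archivebot.

```python
from urllib.parse import urljoin

def resolve_image_src(page_url, src, base_href="/"):
    """Resolve an <img src> containing backslashes against the page's <base href>."""
    # Apply the <base href> first; "/" resolves to the site root.
    base = urljoin(page_url, base_href)
    # Browsers normalise backslashes to forward slashes in http(s) URLs.
    return urljoin(base, src.replace("\\", "/"))

page = "https://archive.alsharekh.org/Articles/293/20679/470610"
src = r"MagazinePages\Magazine_JPG\Al_Shariqa\Al_Shariqa_2017\Issue_3\014.jpg"
print(resolve_image_src(page, src))
```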
14:40:24 | | knecht4 joins |
14:40:32 | | katia (katia) joins |
14:41:06 | | katia quits [Remote host closed the connection] |
14:41:10 | <ikkoup> | Yes, it uses "flipbuilder.com" (PDF Page Flipper) to make the reading pages. |
14:41:10 | <ikkoup> | Don't know if you encountered that before. sorry for my weak language. |
14:42:08 | | katia (katia) joins |
14:42:42 | | katia quits [Remote host closed the connection] |
14:43:48 | | katia (katia) joins |
14:43:49 | <pokechu22> | I think archivebot will work here - 2 million URLs is a bit large, but we've done bigger. Do you know if it's at risk of shutting down in a few weeks, or if it'll probably be up for months? |
14:44:29 | | katia quits [Remote host closed the connection] |
14:45:24 | | katia (katia) joins |
14:45:38 | | katia quits [Remote host closed the connection] |
14:46:16 | <pokechu22> | hmm, https://archive.alsharekh.org/contents/293/20679 requires a bunch of API requests to e.g. https://archiveapi.alsharekh.org/Search/IssueIndex?IID=20679 actually; archivebot probably won't follow those |
14:47:13 | <ikkoup> | Hmm, not sure. |
14:47:14 | <ikkoup> | The owner was the pioneer of Arabic language in the early days of computers and he (and his company at the time) added Arabic support for almost every OS/software at the time. |
14:47:14 | <ikkoup> | The company isn't very active these days and he stepped down from it. I guess it'd be up for a few months considering his finances and tech background? |
14:47:36 | <pokechu22> | ... though https://archive.alsharekh.org/sitemap10.xml links to articles, so it *would* find all of the articles, but the table of contents would not work unless we did that separately (which would not be *too* hard) |
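A rough sketch of listing the article URLs from one of the sitemap files mentioned above, assuming they follow the standard sitemaps.org schema (the log does not confirm this). `sitemap_locs` is a hypothetical helper that takes XML text which has already been downloaded.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol; assumed to be what the site uses.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(xml_text):
    """Return every <loc> URL from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://archive.alsharekh.org/Articles/293/20679/470610</loc></url>
</urlset>"""
print(sitemap_locs(sample))
```

The same function works on a sitemap index, since it collects `<loc>` elements regardless of whether the parent is `<url>` or `<sitemap>`.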
14:48:23 | <ikkoup> | Not sure if its possible, but can you ignore the API requests? |
14:48:24 | <ikkoup> | It's for info about individual articles which is not as important as the whole issue/chapter/magazine (https://archive.alsharekh.org/MagazinePages/MagazineBook/~xxx) |
14:49:26 | <ikkoup> | The important stuff is at the above url structure, the API acts like an index for the issue (article 1 is at page 3, article 2 is at page 6 etc) |
14:52:43 | <pokechu22> | Hmm, http://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/index.html doesn't have any URLs archivebot would find in it... that flipbook won't work well with it |
14:53:45 | <pokechu22> | it looks like https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681/mobile/javascript/config.js has bookConfig.totalPageCount=337 and bookConfig.CreatedTime ="201204132846" |
14:54:47 | <ikkoup> | If you check dev inspection (ctrl shift i) then you can see that the flipbook is just a bunch of images and js. |
14:54:48 | <ikkoup> | I guess it's not possible after all eh? |
14:55:36 | <pokechu22> | It would be possible, but it would require additional work to make the flipbooks function |
14:56:23 | <pokechu22> | https://archive.alsharekh.org/Articles/293/20679/470610 links the images directly though so that would work. Do all magazines have both flipbooks and those /Articles/ pages? |
14:59:53 | <pokechu22> | https://archive.alsharekh.org/Articles/293/20679/470610 has a blue "تصفح العدد" button that opens https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/index.html so it seems like flipbooks do exist for everything... but I can't see where that link comes from |
15:01:35 | | katia (katia) joins |
15:01:50 | <pokechu22> | ... and the flipbook uses https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Shariqa/Al_Shariqa_2017/Issue_3/files/mobile/1.jpg?201204132846 while the /Articles/ page uses https://archive.alsharekh.org/MagazinePages/Magazine_JPG/Al_Shariqa/Al_Shariqa_2017/Issue_3/001.jpg (better quality). |
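One possible way to enumerate a flipbook's page images from its `config.js`, sketched under two assumptions drawn only from the examples in this log: pages are numbered 1 through `bookConfig.totalPageCount`, and the query string on each image is the `bookConfig.CreatedTime` value. The function names are hypothetical.

```python
import re

def parse_book_config(js_text):
    """Pull totalPageCount and CreatedTime out of a flipbook config.js body."""
    count = re.search(r'bookConfig\.totalPageCount\s*=\s*(\d+)', js_text)
    created = re.search(r'bookConfig\.CreatedTime\s*=\s*"(\d+)"', js_text)
    if not count or not created:
        raise ValueError("config.js did not contain the expected fields")
    return int(count.group(1)), created.group(1)

def flipbook_page_urls(issue_base, js_text):
    """Build the files/mobile/N.jpg?CreatedTime URL for every page of an issue."""
    total, created = parse_book_config(js_text)
    base = issue_base.rstrip("/")
    # Assumption: pages are numbered from 1 with no gaps.
    return [f"{base}/files/mobile/{n}.jpg?{created}" for n in range(1, total + 1)]

config_js = 'bookConfig.totalPageCount=337;bookConfig.CreatedTime ="201204132846";'
issue = "https://archive.alsharekh.org/MagazinePages/MagazineBook/Al_Maarefa/Al_Maarefa_2020/Issue_681"
urls = flipbook_page_urls(issue, config_js)
print(len(urls), urls[0])
```

A list like this could be fed to a crawler as extra seed URLs, though it would only save the lower-quality flipbook images, not make the flipbook viewer itself work on playback.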
15:01:54 | <ikkoup> | The whole thing is basically a giant flip book :( |
15:01:54 | <ikkoup> | And not very sure about articles page, but it exists for most of it (unindexed issues have no articles, only flipbook) |
15:02:26 | | katia quits [Remote host closed the connection] |
15:03:20 | | katia (katia) joins |
15:03:35 | <pokechu22> | I'll start it in archivebot just to get *something*, and hopefully a solution for the flipbooks can be found afterwards |
15:03:45 | <pokechu22> | Thanks for letting us know about the site, we probably wouldn't have found it otherwise :) |
15:04:19 | | katia quits [Remote host closed the connection] |
15:04:24 | | grid joins |
15:05:18 | <pokechu22> | I assume the rest of alsharekh.org should also be saved? |
15:06:25 | <@arkiver> | thank you ikkoup! |
15:07:33 | <@arkiver> | yeah it might be interesting to save everything on that site |
15:07:44 | <@arkiver> | at least into WARCs, perhaps separate items on IA as well |
15:10:01 | <ikkoup> | Not really, alsharekh.org is landing page for other services run by the same guy. |
15:10:01 | <ikkoup> | a Lexicon, Dictionary (acquired by Saudi government), Tashkeel (vowel movement corrector) and a spell checker. I guess they can't be saved. |
15:11:57 | <ikkoup> | I also tried to setup grab-site (https://github.com/ArchiveTeam/grab-site) on a vps to help crawling the archive, but had some troubles with python 3.8 not being supported. |
15:12:50 | | TheTechRobo quits [Read error: Connection reset by peer] |
15:12:50 | | Pedrosso quits [Read error: Connection reset by peer] |
15:12:50 | | ScenarioPlanet quits [Read error: Connection reset by peer] |
15:13:19 | | Pedrosso joins |
15:13:24 | | ScenarioPlanet (ScenarioPlanet) joins |
15:13:36 | | TheTechRobo (TheTechRobo) joins |
15:19:57 | | orchidcnr (orchidcnr) joins |
15:19:57 | | orchidcnr quits [Remote host closed the connection] |
15:22:53 | <Terbium> | ikkoup: I would recommend using a container or Python version manager for grab-site in that case to drop back down to Python 3.7 |
15:28:39 | <pokechu22> | That said, archivebot isn't a distributed project - running grab-site locally would mean you grab the entire site yourself, and additionally archivebot grabs the entire site by itself. It won't make things run faster. |
15:32:12 | <ikkoup> | Ah, I thought it was something like the archivewarrior. |
15:32:12 | <ikkoup> | I wanted to run grab-site since it has some advanced crawling/scraping capabilities for forums like vBulletin and SMF which are not found in other crawling/scraping tools I looked up. |
15:50:24 | | MtN joins |
15:50:41 | | MtN quits [Client Quit] |
16:03:06 | | ikkoup quits [Client Quit] |
16:09:40 | <@arkiver> | i realise i don't know much about storj |
16:10:00 | <@arkiver> | is it just private storage only for files to be made available from elsewhere, page requisites and such? |
16:15:22 | | GNU_world quits [Ping timeout: 255 seconds] |
16:16:41 | <kiska> | I think you can use storj as S3 |
16:21:24 | | SootBector quits [Ping timeout: 255 seconds] |
16:22:04 | | SootBector (SootBector) joins |
16:22:42 | | GNU_world joins |
16:26:57 | <kiska> | Which I guess means you could have some site assets on storj being served |
16:27:00 | <kiska> | Or something like that |
16:28:08 | <@arkiver> | right |
16:30:07 | <kpcyrd> | is there a channel for archiving #web3? |
16:32:16 | | Guest54 quits [Client Quit] |
16:33:52 | <@arkiver> | archiving web3? |
16:34:01 | | Wohlstand (Wohlstand) joins |
16:34:12 | <@arkiver> | so like... archiving blockchains? |
16:35:01 | <FireFly> | I thought part of the point was that it's kind of implicitly so already due to its distributed nature |
16:37:40 | <@arkiver> | that's not archiving |
16:40:26 | <FireFly> | ..fair |
16:48:26 | | katia (katia) joins |
16:49:06 | | katia quits [Remote host closed the connection] |
16:50:08 | | katia (katia) joins |
16:51:34 | | katia quits [Remote host closed the connection] |
16:51:48 | | katia (katia) joins |
16:53:52 | | katia quits [Remote host closed the connection] |
17:04:05 | | katia (katia) joins |
17:09:21 | | Wohlstand quits [Remote host closed the connection] |
17:09:29 | | GNU_world quits [Ping timeout: 272 seconds] |
17:29:45 | | linuxgemini quits [Ping timeout: 272 seconds] |
17:30:21 | <kpcyrd> | the question was tongue in cheek, I probably should've made that more obvious :) |
17:31:52 | | GNU_world joins |
17:43:32 | | G4te_Keep3r34924 quits [Client Quit] |
17:44:01 | | G4te_Keep3r34924 joins |
17:50:52 | <h2ibot> | Censuro edited Talk:URLTeam (+983, /* Shouldn't archive.today be considered a URL…): https://wiki.archiveteam.org/?diff=51913&oldid=26103 |
17:50:53 | <h2ibot> | Popthebop edited Talk:Deathwatch (+423, /* the Tom Lehrer website containing original…): https://wiki.archiveteam.org/?diff=51914&oldid=51350 |
17:50:54 | <h2ibot> | Popthebop edited Talk:Tumblr (+1278, /* Current state of tumblr | IMPORTANT */ new…): https://wiki.archiveteam.org/?diff=51915&oldid=45705 |
17:50:55 | <h2ibot> | Sepro edited List of websites excluded from the Wayback Machine (+24, Add loom.com): https://wiki.archiveteam.org/?diff=51916&oldid=51896 |
17:50:56 | <h2ibot> | Flama12333 edited Deathwatch (+167, added realtek ftp sadly): https://wiki.archiveteam.org/?diff=51917&oldid=51901 |
17:58:06 | | katia quits [Remote host closed the connection] |
17:58:43 | | katia (katia) joins |
17:59:16 | | katia quits [Remote host closed the connection] |
18:00:53 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51918&oldid=51916 |
18:04:00 | | grid quits [Client Quit] |
18:17:06 | | grid joins |
18:26:50 | | linuxgemini (linuxgemini) joins |
18:31:28 | | Island joins |
18:33:12 | | Island quits [Read error: Connection reset by peer] |
18:36:07 | | Island joins |
19:13:07 | <h2ibot> | JacksonChen666 edited Deathwatch (+3, fix citation errors): https://wiki.archiveteam.org/?diff=51919&oldid=51917 |
19:20:57 | | jacksonchen666 (jacksonchen666) joins |
19:38:41 | | systwi_ quits [Quit: systwi_] |
19:38:42 | | nothere_ quits [Quit: Leaving] |
19:53:33 | | Darken2 (Darken) joins |
19:55:43 | <michaelblob> | how are people doing log agg? looking into grafana loki but getting piss poor performance generating graphs |
19:56:03 | <michaelblob> | also eyeing influxdb but not sure how/where that fits in |
19:56:40 | <Barto> | work use an ELK stack |
19:57:13 | | Darken quits [Ping timeout: 255 seconds] |
20:03:00 | | nothere joins |
20:17:13 | | wyatt8750 joins |
20:18:13 | | wyatt8740 quits [Ping timeout: 272 seconds] |
20:22:52 | | wyatt8750 quits [Ping timeout: 255 seconds] |
20:22:58 | <nstrom|m> | Just using dozzle on individual servers, no agg |
20:23:08 | | wyatt8740 joins |
21:02:26 | | qwertyasdfuiopghjkl quits [Client Quit] |
21:09:01 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
21:14:54 | | Darken2 quits [Client Quit] |
21:15:10 | | Darken (Darken) joins |
21:22:44 | <pabs> | arkiver, kpcyrd: I wonder if Web3 is as distributed as advertised? relatedly NFTs certainly aren't, lots of them apparently just load stuff off HTTP |
21:24:00 | | grid quits [Client Quit] |
21:28:25 | <nicolas17> | lmk when there's anything of value worth archiving, too |
21:29:55 | | qwertyasdfuiopghjkl quits [Client Quit] |
21:33:18 | <AK> | I did ELK, but then it was approaching hundreds of GB of logs per day, now I just use dozzle everywhere 🤷‍♂️ At work we use Azure stuff and grafana if we need graphs |
21:34:04 | <AK> | dozzle does everything I need for almost all my personal stuff: https://logs.hel1.aktheknight.co.uk/ |
21:34:04 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
21:41:16 | | Darken quits [Read error: Connection reset by peer] |
21:41:41 | | Darken (Darken) joins |
22:14:46 | | Darken quits [Remote host closed the connection] |
22:21:37 | | nulldata quits [Client Quit] |
22:25:03 | | nulldata (nulldata) joins |
22:31:20 | | Darken (Darken) joins |
23:05:46 | | neggles quits [Remote host closed the connection] |
23:05:58 | | neggles (neggles) joins |
23:06:14 | | NotGLaDOS quits [Remote host closed the connection] |
23:07:20 | | NotGLaDOS joins |
23:13:17 | | jacksonchen666 quits [Client Quit] |
23:15:20 | | Darken quits [Remote host closed the connection] |
23:15:45 | | Darken (Darken) joins |
23:25:08 | | wickedplayer494 quits [Remote host closed the connection] |
23:42:17 | | icedice quits [Client Quit] |
23:49:48 | | icedice (icedice) joins |
23:52:41 | | wickedplayer494 joins |
23:52:50 | | wickedplayer494 is now authenticated as wickedplayer494 |
23:54:11 | <icedice> | JAA if you haven't gotten The PokéCommunity completely archived by now, you might want to put it high up on the priority list. A Pokémon fan game website was just shut down by DMCA: https://twitter.com/RelicCastleCom/status/1770901435867361351 |
23:54:57 | <icedice> | The PokéCommunity has probably the largest Pokémon fan game community out there and they had four games C&D'd a while ago, so the ninja lawyers are well aware that they exist |
23:57:10 | <Terbium> | why they gotta do my PokeCommunity like that.... |
23:57:36 | <pokechu22> | I think we last did it 10 months ago: https://archive.fart.website/archivebot/viewer/job/202305131413054huog |
23:58:30 | <nulldata> | Terbium - because Nintendo loathes its fans. |
23:59:27 | <Terbium> | Also, they really should have hosted the site in a DMCA ignored location. After so many DMCA's over the decades, it seems like this lesson is never learned |