These days, well-functioning infrastructure and application monitoring plays a key role; that almost goes without saying. For most enterprises that rely on online products, success (I am intentionally setting aside the idea behind the app and the quality of its code) depends on a properly functioning platform.
What does that mean for users?
The multitude of factors can be narrowed down to two key parameters:
- Availability
- Responsiveness (speed of operation)
Fluctuations in either of them can generate immense costs, both direct and indirect, including a loss of clients’ trust that is very difficult to restore.
For business people, the profitability of investing in a proper monitoring system is always a hard nut to crack when compared with, say, a new app feature that will directly affect client satisfaction. Nonetheless, this cost has to be incurred in order to succeed.
Let’s take a look at these two cases:
- A newly-created, developing startup,
- A corporation with a stable market position.
IT monitoring from the perspective of a startup:
We don’t want to assign a large part of our budget to a monitoring system. On the contrary, we want to spend as little as possible on it. A good fit is an open-source monitoring stack based on, for example, Prometheus. Whether our architecture is based on microservices and Kubernetes in the cloud or on an on-premises monolith, Prometheus with an army of exporters should be up to the job, and a well-structured stack will provide both infrastructure and application metrics. This is reinforced by a noticeable trend among vendors of commercial products, who equip their solutions with built-in mechanisms that expose metrics for Prometheus. Data presented in Grafana, anomaly notifications based on Alertmanager, and clever service discovery with the use of, for example, Consul provide excellent value for a reasonable price.
However, what should we do if the publicly available exporters don’t meet our needs and we need something tailored to our requirements?
Not a problem. Creating a bespoke exporter should not be an issue for a developer or an experienced DevOps engineer. Going one step further, it is a good idea to add monitoring requirements at the development stage, so that development teams can implement appropriate metrics while building the app. What benefits does that bring? Clear, consistent, and transparent data based on a single standard. And everything should obviously be built with automation in mind. Automate whenever possible!
So where is the investment? Where are the expenditures? Clearly, in the time spent by developers, DevOps engineers, and admins on implementing this solution. However, remember that thanks to automation the high costs have to be incurred only once.
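To give a feel for how small a bespoke exporter can be, here is a minimal sketch that uses only the Python standard library and serves two disk gauges in the Prometheus text exposition format. The metric names (`demo_disk_total_bytes`, `demo_disk_free_bytes`), the port 9200, and the helper names are assumptions made for illustration; a real exporter would more likely be built on an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import shutil


def collect_metrics(path="/"):
    """Collect hypothetical gauge values for the disk that holds `path`."""
    usage = shutil.disk_usage(path)
    return {
        "demo_disk_total_bytes": usage.total,
        "demo_disk_free_bytes": usage.free,
    }


def render_metrics(metrics):
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metric values on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Start the exporter; the port number is an arbitrary assumption."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and a Prometheus scrape job pointed at `/metrics` on that port would then pick up both gauges. Keeping collection (`collect_metrics`) separate from rendering makes it easy to swap in application-specific values later.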
IT monitoring from the perspective of a corporation:
While reading the paragraph above and analysing the startup’s perspective, one may think: OK, that solution provides everything I need, and it’s free. So why should I invest in an often expensive commercial APM (Application Performance Monitoring) solution? Well, there is of course an open-source APM sector, and although it has been growing dynamically, it is still far behind the commercial, paid competition.
Thus, if an enterprise wants to play a key role within its sector, it should set the bar high and reach for solutions that offer the highest possible level of observability and in-depth analyses of how the platform functions. This can be achieved with the use of APM-class products.
The top products offer an abundance of features, so let’s focus on a few selected ones that I find crucial.
Digital user experience monitoring: simply put, it’s about examining users’ behaviour in our app. When combined with Performance of Web Requests and Transactions, we get a multitude of parameters to analyse: from metrics related to users’ behaviour towards concrete functionalities of the platform, through slowdowns of particular components, to precise identification of the part of the infrastructure, or even the portion of the code, that may be malfunctioning.
Let’s dig deeper, down to the development process. Monitoring can be very useful during the app building stage, so it becomes a tool used by developers as well. Code-Level Performance Profiling can identify slow class, method, or dependency calls that directly impact performance. By taking preventive measures at the building stage, we protect ourselves from the domino effect that, in case of a failure, could generate huge costs. These can primarily include app unavailability, engaging teams in repair work, or the load of tasks aimed solely at “preventing this from happening again”.
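As a rough illustration of what code-level profiling reveals, the sketch below uses Python’s built-in `cProfile` and `pstats` modules rather than any APM product. The functions `slow_lookup`, `fast_lookup`, and `profile_call` are hypothetical names invented for this example; commercial tools collect comparable data continuously and with far more context.

```python
import cProfile
import io
import pstats


def slow_lookup(items, targets):
    """Deliberately inefficient: scans the whole list for every target."""
    return sum(1 for t in targets if t in items)


def fast_lookup(items, targets):
    """Same result, using a set for constant-time membership tests."""
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def profile_call(func, *args):
    """Run `func` under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Running `profile_call(slow_lookup, items, targets)` and `profile_call(fast_lookup, items, targets)` on the same inputs and printing the two reports makes the difference in cumulative time visible per function, which is the kind of signal that points a developer to the call worth fixing.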
The final term I want to mention is Anomaly detection. Nowadays, clouds, containers, and microservices offer huge flexibility (if you want to learn more about microservices, see: Essentials on microservices - 10 features of an effective architecture). A monitoring stack has to offer the same flexibility, and the key here is a proactive approach. Thanks to AI and continuous analysis of the environment’s condition, performance, and topology, we can prevent failures instead of curing them, and thereby reduce the risk of failure to a very low level.
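To make the idea of anomaly detection more concrete, here is a deliberately simple statistical stand-in: a rolling z-score check over a series of latency samples. It is not how AI-based APM engines work internally (they rely on far richer models and on topology data); the function name, window size, and threshold are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        # Only judge a sample once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical response times in milliseconds with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 450, 101, 100]
print(detect_anomalies(latencies_ms))  # prints [8]
```

Note that the spike itself enters the window afterwards and inflates the standard deviation for the next few samples, which is one of the reasons production systems use more robust baselines.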