In the modern IT environment, it is critical to proactively monitor your servers and related infrastructure. Today, there is a wide array of monitoring solutions, and each has its pros and cons: some are platform-specific, some are better suited to on-premise servers, and others work best on cloud platforms. Some are easier to deploy than others, some integrate with a wider range of data sources, and some feature slicker UIs that are easier to understand. Most importantly, costs vary significantly among the solutions.
In this article, we will go through and compare the most popular server-monitoring solutions available today. Is there a solution that is platform-agnostic, cost-effective, and simple to use? The answer is ‘yes’ – MetricFire offers all these and more. You should sign up for a free demo and the free trial here.
Let us start by clarifying what ‘server monitoring’ is and what it covers. At a very basic level, we can define server monitoring as a set of processes and actions geared towards reviewing and analyzing a server for availability, performance, security, and other operations-related concerns.
In this article, we are mostly concerned with the performance of the actual hardware of the server (both physical and virtual hardware). We are excluding application performance, except to the extent that application performance is directly related to and affected by the performance of the underlying server resources.
To illustrate how server monitoring overlaps with application performance monitoring (APM), consider an application that relies on a reserved amount of server memory (such as an in-memory database). The application’s metrics clearly suffer if the server’s memory is regularly full and the reserved memory cannot be allocated. In this case, we can use the application’s reserved memory requirement as a base figure when setting the server’s memory limits or metric thresholds.
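As a rough illustration of using a reserved-memory requirement as the base figure for a threshold, here is a minimal Python sketch. The function names, the optional headroom parameter, and the example sizes are all hypothetical and chosen only for illustration; they are not taken from any particular monitoring tool.

```python
def max_other_usage_fraction(total_mb, reserved_mb, headroom_mb=0.0):
    """Fraction of server memory that everything except the application
    may use before the application's reservation is at risk.

    All three parameters are illustrative assumptions, in megabytes.
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    if reserved_mb < 0 or headroom_mb < 0:
        raise ValueError("reserved_mb and headroom_mb must not be negative")
    available_mb = total_mb - reserved_mb - headroom_mb
    if available_mb < 0:
        raise ValueError("reservation plus headroom exceeds total memory")
    return available_mb / total_mb


def reservation_at_risk(other_used_mb, total_mb, reserved_mb, headroom_mb=0.0):
    """Return True when memory used by other processes crosses the threshold."""
    threshold = max_other_usage_fraction(total_mb, reserved_mb, headroom_mb)
    return other_used_mb / total_mb > threshold


if __name__ == "__main__":
    # Hypothetical server: 16 GB total, 4 GB reserved for an in-memory
    # database, and 1 GB of safety headroom.
    print(max_other_usage_fraction(16384, 4096, 1024))
    print(reservation_at_risk(12000, 16384, 4096, 1024))
```

The resulting fraction could then be used as the alert threshold in whichever monitoring tool you choose, so that the alert fires before the application fails to obtain its reserved memory.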
Managing an increasing number of servers with limited human resources is the central challenge when monitoring your servers and applications. It is simply not smart or feasible to keep adding IT personnel dedicated to monitoring as your server count grows. What is needed is a scalable solution to server monitoring, with bonus points for solutions that let your monitoring staff quickly identify the most crucial pain points in the servers being monitored.
This can be achieved either with a fully in-house solution or by outsourcing to specialized third-party tools and services. We will look at these two methods in the subsequent sections of this article, and analyze how best to scale your server monitoring on a budget.
Throughout this analysis, we'll look at how each method handles monitoring the metrics listed below. This is not an exhaustive list, but it covers the most common and most useful metrics that a majority of IT departments will want to keep an eye on:
As mentioned previously, one approach to server monitoring is to handle it all in-house. This includes the following:
As you may have noticed by reading between the lines in the above points, in-house server monitoring can get really expensive, both in terms of IT human resources and the cost of the software itself. Plus, you may incur both one-off and recurring related costs: a new server to host the server-monitoring solution, training for your IT engineers, extra consulting services, etc.
The knock-on effect of this significant investment is that once set up, you are most likely tied to your in-house solution for at least the next few years. If it turns out that the solution you chose was not the best, you are stuck with that suboptimal choice.
Clearly, in-house server monitoring is not an ideal setup. It is perhaps best left to very large IT departments that can absorb the costs and headaches of an in-house solution, or to organizations that absolutely must keep monitoring in-house. This is usually for security reasons, as with defense contractors or high-security biotech firms.
As an alternative to in-house server monitoring, let’s take an in-depth look at some cloud monitoring solutions:
The first of these is AWS's monitoring solution, CloudWatch. For users hosting servers on the ubiquitous AWS platform, CloudWatch is an obvious solution. However, CloudWatch has 3 main limitations even for users whose server infrastructure is wholly hosted on AWS:
As we shall see shortly, MetricFire’s monitoring solution is designed exactly to overcome these limitations.
MetricFire is a hosted service that combines Prometheus, Graphite, and Grafana. It offers a complete infrastructure and application monitoring platform which helps customers collect, store, and visualize time-series data from any source. MetricFire’s monitoring platform is fully cloud-hosted, and the monitoring agents can be deployed on both on-premise and cloud servers.
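To make "time-series data from any source" more concrete, the sketch below formats a data point in Graphite's plaintext protocol, which is one line of the form `<metric path> <value> <timestamp>`. This is only an illustration of the general Graphite format. The metric path, host, and port are placeholders, and a hosted service will typically require its own endpoint and some form of authentication (for example an API key), so check the provider's documentation before sending anything.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    if timestamp is None:
        timestamp = time.time()
    # Graphite expects whole seconds since the Unix epoch.
    return f"{path} {value} {int(timestamp)}\n"


def send_lines(host, port, lines, timeout=5.0):
    """Send pre-formatted lines over TCP.

    host and port are assumptions: 2003 is the usual plaintext port on a
    self-hosted Graphite (carbon) instance; a hosted service may differ.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path; nothing is sent over the network here.
    print(format_graphite_line("servers.web01.cpu.load", 0.42, 1700000000), end="")
```

Keeping formatting separate from sending makes it easy to unit-test the metric lines without a network connection.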
MetricFire’s support engineers are always available to help with alerting design, analytics, and overall monitoring. The platform also includes a full-featured web UI that allows you to send metrics and visualize your data directly on the platform. You can extend the product functionality using plugins such as GitHub, PagerDuty, Slack, Heroku, CircleCI, and more.
Typical use cases are for monitoring servers, applications, IT networks, or any other infrastructure. MetricFire’s most important USP by far is its cost - it offers a far more affordable alternative to enterprise monitoring solutions. As explained on the pricing page, MetricFire’s monitoring solutions are about half the cost of Datadog’s, and are much more affordable than CloudWatch because of the use of bundled services and features, as opposed to CloudWatch’s itemized pricing that gets very expensive very fast.
Compared to the monitoring platforms above, MetricFire has additional and unique features, such as:
Grafana is an online open-source tool for running analytics and monitoring. Grafana integrates with several data sources and can create excellent dashboards. It is especially useful for comparing and analyzing trends and metrics over longer time periods.
However, Grafana is a complex beast that can be overwhelming for beginners to master and utilize. This is where MetricFire's customer support pulls ahead. Check out the MetricFire hosted Grafana solution, available with all MetricFire packages.
Grafana Labs is the commercial company behind Grafana, and it helps users deploy and use Grafana for their server monitoring needs. It offers two solutions depending on your needs. The first, Grafana Cloud, is targeted at smaller-scale users. It includes a dedicated Grafana instance and is compatible with both Prometheus and Graphite.
Grafana Cloud pricing starts at $49/month for the standard version (with a 30-day free trial), with customized pricing for the Pro version. The other solution is Grafana Enterprise, designed for larger organizations that want to utilize even more of the Grafana stack: not just Grafana itself, but also the Prometheus and Graphite backends. You can read more about these solutions here.
Datadog is a cloud-based infrastructure and application monitoring tool. It is used mostly in environments that need to monitor a wide range of tools and services over the cloud, from network to system to server monitoring. Datadog covers it all with its 200+ integrations for tools and services, making it easier to monitor every component of the tech stack. It also includes a useful recorder for creating your own tests where these cannot easily be defined using APIs or single metrics. As with Grafana, the product’s complexity means a steep learning curve, and it takes some time to get used to.
Datadog began as a simpler cloud infrastructure monitoring service with dashboards, alerting, and visualizations of metrics. As cloud adoption increased, Datadog grew rapidly and expanded its product offering to cover service providers including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, Red Hat OpenShift, and OpenStack. Datadog quite recently launched its application monitoring service as well. It can integrate with apps such as PagerDuty or Slack to deliver notifications.
Datadog is free for up to 5 hosts (but with only a 1-day data retention period) and offers a 14-day free trial. After that, customers are billed at $15 per host per month for infrastructure monitoring, and $5 per host per month for network performance monitoring. Log management is billed at $1.27 per month for every million log events, and security monitoring at $0.20 per month for every GB of analyzed logs. You can check the current Datadog pricing on its website, but it is clear that costs add up quickly when you monitor a wide range of metrics.
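To see how itemized pricing adds up, the following Python sketch totals a monthly bill from the per-unit rates quoted in the paragraph above. The rates are only the figures cited in this article and may be out of date, the function name and the example workload are hypothetical, and the free tier and any volume discounts are deliberately left out.

```python
from decimal import Decimal

# Per-unit monthly rates as quoted in this article (USD). These are
# assumptions for illustration; check the vendor's current pricing.
INFRA_PER_HOST = Decimal("15")
NETWORK_PER_HOST = Decimal("5")
LOGS_PER_MILLION_EVENTS = Decimal("1.27")
SECURITY_PER_GB = Decimal("0.20")


def estimate_monthly_cost(hosts, log_events_millions=0, analyzed_gb=0,
                          network_monitoring=True):
    """Sum the itemized monthly charges for a given workload."""
    total = INFRA_PER_HOST * hosts
    if network_monitoring:
        total += NETWORK_PER_HOST * hosts
    total += LOGS_PER_MILLION_EVENTS * log_events_millions
    total += SECURITY_PER_GB * analyzed_gb
    return total


if __name__ == "__main__":
    # Hypothetical workload: 20 hosts, 50 million log events,
    # and 100 GB of logs analyzed for security monitoring.
    print(estimate_monthly_cost(20, log_events_millions=50, analyzed_gb=100))
```

Using `Decimal` rather than floating-point numbers keeps the currency arithmetic exact, which matters once the per-unit rates are multiplied by large volumes.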
Yet another alternative to Grafana and Datadog, New Relic is especially good at monitoring real-time events, making it useful for IT departments and organizations that host real-time applications like web servers and gaming services. It also provides preconfigured dashboards for a wide array of cloud platforms and their integrations, including the big 3: Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
You can also build custom integrations using New Relic’s integrations SDK. However, its integrations are somewhat clumsily documented and so not easy for everyone to set up; they also require at least an intermediate level of technical understanding. New Relic’s general documentation and front-end UI are also not as polished as those of its main competitors.
Pricing includes a 30-day free trial, and after that starts at $14.40 per month for the Pro version, which comes with 13 months of data retention and up to 2275 integration events. Beyond that, New Relic only states on its pricing page that it offers “flexible pricing options for customers in highly dynamic environments”.
It is clear that MetricFire is a highly affordable and complete monitoring solution, unlike many of its alternatives, which fall short in one or more key areas of customer concern. MetricFire integrates with the big providers, such as AWS and Azure, as well as many other data sources. Use MetricFire to monitor your infrastructure as well as data coming from systems across your stack. Check out the MetricFire demo and also sign up for a free trial here.