The Top 5 Monitorama talks of 2019

METRICFIRE

Mar 20, 2020 ∙ 5 min read

MetricFire Blogger

Table of Contents

Designing Alerts to Direct Attention — Ryan Frantz
What’s Fitness Function Driven Development for Operability — Rosemary Wang
Have we finally reached #monitoringlove — Pete Cheslock
Dashboard Renaissance — How dashboards work and how to improve them — Cory Watson
ElasticSearch Data Exploration in Your Terminal — Brad Lhotsky

In October 2019, one of the Monitorama conferences was held in Baltimore, MD. For those of you who missed it, this article goes through MetricFire's top 5 favorite talks and summarizes the most important points from each.

This year, we also noticed a big rise in the popularity of Prometheus and Grafana. You can check out some of our resources on both in our blog. If you're interested in trying out Hosted Prometheus and Grafana, give our free trial a shot and start monitoring your metrics today.

Designing Alerts to Direct Attention — Ryan Frantz

Ryan talks about the “bouncing feedback loop”, whose purpose is to maintain or correct system behavior. He points out that this loop connects your system to your teammates, and that it is incomplete if your monitoring systems are not getting enough feedback from you. This is why he emphasizes treating computers as teammates rather than as a separate entity.

To give good feedback, your alerts need to be designed in a way that helps your mindset adapt to what the alert is telling you. When an alert reaches you, it’s assumed that the information in it is enough to get you started. Design alerts to carry information such as the urgency level, the location of the problem and details about that location, how often the alert fires, and graphs. Any time you create an alert, include as much detail as possible so that whoever sees it can react appropriately.
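
As a rough illustration of the fields listed above, here is a minimal Python sketch of an alert payload. It is not taken from Ryan's talk: the `Alert` class, its field names, and the example URL are all hypothetical, chosen only to show urgency, location, frequency, and a graph link travelling together with the alert.

```python
from dataclasses import dataclass
from enum import Enum


class Urgency(Enum):
    LOW = "low"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class Alert:
    """Hypothetical alert payload; the field names are illustrative only."""
    title: str
    urgency: Urgency
    host: str             # where the problem is
    component: str        # detail about the location, e.g. a service or a disk
    fired_last_24h: int   # how often this alert has fired recently
    graph_url: str        # link to a graph of the underlying metric
    details: str = ""

    def render(self) -> str:
        """Format the alert as a short plain-text message for a responder."""
        lines = [
            f"[{self.urgency.value.upper()}] {self.title}",
            f"Where: {self.host} ({self.component})",
            f"Fired {self.fired_last_24h}x in the last 24h",
            f"Graph: {self.graph_url}",
        ]
        if self.details:
            lines.append(f"Details: {self.details}")
        return "\n".join(lines)
```

Calling `Alert(...).render()` produces a message whose first line states the urgency and the problem, so a responder can decide how to react before reading the rest.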

What’s Fitness Function Driven Development for Operability — Rosemary Wang

Rosemary starts by explaining how any architectural change to your product affects several different people working on different parts of it. She then moves on to monitoring your unknowns with Fitness Function Driven Development. It borrows a lot from TDD (test-driven development): fitness functions give feedback on architectural conformance and inform the development process in real time.

She recommends cleaning up alerts by working with the people who need them, taking what they know, and modifying the test functions behind the alerts accordingly. Any alert that people don’t respond to should be removed.
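
One way to picture this cleanup is a small fitness function that fails whenever an alert is routinely ignored. The Python sketch below is an assumption-laden illustration, not something shown in the talk: `AlertStats`, the `min_rate` threshold, and the idea of counting acknowledgements are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertStats:
    name: str
    fired: int          # times the alert fired in the review window
    acknowledged: int   # times a person acknowledged or acted on it


def response_rate(stats: AlertStats) -> float:
    """Fraction of firings that someone responded to (1.0 if it never fired)."""
    if stats.fired == 0:
        return 1.0
    return stats.acknowledged / stats.fired


def ignored_alerts(all_stats, min_rate=0.1):
    """Return the names of alerts whose response rate is below min_rate."""
    return [s.name for s in all_stats if response_rate(s) < min_rate]


def check_no_ignored_alerts(all_stats, min_rate=0.1):
    """Fitness function: raise AssertionError if any alert is routinely ignored."""
    ignored = ignored_alerts(all_stats, min_rate)
    assert not ignored, f"Candidates for removal: {ignored}"
```

Running a check like this on a schedule turns "remove alerts nobody responds to" into continuous feedback; the threshold itself is something to agree on with the people who receive the alerts.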

Have we finally reached #monitoringlove — Pete Cheslock

This talk is about observability and logging. Pete says to put all your logs into JSON format so that you can find things easily and load them into a database without much trouble. The common alternative is writing a custom parser for your custom log format, which isn’t efficient. Instead of keeping logs in text or CSV files, put them into some sort of database so that you can query any specific log at any time. Some people say to log everything, and others say to log structured events. Pete says to do both, and to put as much context into the logs as you can. For teams managing large volumes of structured data, Integrate.io provides a low-code data integration platform that can help transform and pipeline your logs and other data sources into queryable databases and data warehouses for better observability.
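
To make the JSON-logging advice concrete, here is a minimal sketch using only Python's standard `logging` and `json` modules. It is not from Pete's talk: the `JsonFormatter` class, the `make_logger` helper, and the convention of passing context under an `extra={"context": ...}` key are assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra context passed as extra={"context": {...}} (our own convention).
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `make_logger("checkout").info("payment failed", extra={"context": {"order_id": 1234}})` then emits a single JSON line that a database or log pipeline can ingest without a custom parser.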

Dashboard Renaissance — How dashboards work and how to improve them — Cory Watson

Before creating a dashboard, write a spec sheet for it. Cory explains that the best charts show two comparable metrics alongside one another, for example request volume and latency, since the two are closely related. He mentions that people can, on average, only keep 3 or 4 items in memory, so it's key to keep the number of dashboards to a maximum of four. Emphasizing the most important information within a visualization also matters. When comparing several data points to a single point, making that single point bold helps it stand out and makes it easier to compare with the rest. Create efficient visuals by slightly modifying important events so that they stand out from the others. Also, make panels wider rather than taller, because lines are easier to discern in a wide panel.
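
A spec sheet can be as simple as a small data structure that is reviewed before anyone builds the panel. The Python sketch below is one possible shape, assuming a hypothetical `PanelSpec` with widths and heights in arbitrary grid units; it encodes three of the points above (pair related metrics, prefer wide panels, emphasize one series) and is not tied to any particular dashboard tool.

```python
from dataclasses import dataclass


@dataclass
class PanelSpec:
    title: str
    metrics: list        # related series shown together
    width: int           # arbitrary grid units
    height: int          # arbitrary grid units
    emphasize: str = ""  # series to draw bolder than the rest, if any


def review_panel(panel: PanelSpec) -> list:
    """Return a list of human-readable problems; empty means the spec passes."""
    problems = []
    if len(panel.metrics) < 2:
        problems.append(f"{panel.title}: pair it with a related metric")
    if panel.width <= panel.height:
        problems.append(f"{panel.title}: make the panel wider than it is tall")
    if panel.emphasize and panel.emphasize not in panel.metrics:
        problems.append(f"{panel.title}: emphasized series is not on the panel")
    return problems
```

For example, a panel pairing `request_volume` with `latency_p99` at width 24 and height 6 passes the review, while a single-metric panel that is taller than it is wide comes back with two problems.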

ElasticSearch Data Exploration in Your Terminal — Brad Lhotsky

Brad talks about bringing ElasticSearch into the terminal to make your data easier to explore. This is not only about monitoring, but also about workflow. He argues for working from the terminal because it is faster and more efficient: the browser is not an IDE, whereas the CLI (command line interface) is a workspace. The CLI is also less distracting than a browser, where a lot of buttons and notifications pop up as you browse around. Brad repeatedly returns to the quote “There are a finite amount of keystrokes in your life, use them wisely”, and adds that “You also have a finite amount of distance you can mouse or gesture”. Explorability is the key reason Brad uses ElasticSearch in his terminal: having output in the terminal lets you explore further by piping it into other search commands. All of this led Brad to create a terminal app called es-utils, a log search utility that is written in Perl and uses ElasticSearch.
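
The value of pipeable output can be shown with a small filter that reads JSON lines on standard input and writes matching records to standard output. This Python sketch is not part of es-utils and does not use its interface; the script name `jsonfilter.py` and the assumption that the upstream command prints one JSON object per line are hypothetical.

```python
import json
import sys


def filter_lines(lines, field, value):
    """Yield parsed JSON records whose `field` equals `value`; skip non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and str(record.get(field)) == value:
            yield record


def main(argv):
    # Hypothetical usage: some-search-command | python jsonfilter.py level ERROR
    if len(argv) != 3:
        print("usage: jsonfilter.py FIELD VALUE", file=sys.stderr)
        return
    for record in filter_lines(sys.stdin, argv[1], argv[2]):
        print(json.dumps(record))


if __name__ == "__main__":
    main(sys.argv)
```

Because the filter writes JSON lines itself, its output can be piped into another instance or into any other line-oriented tool, which is the kind of step-by-step exploration the talk describes.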

Monitorama was a great experience, and there is a long list of other talks available on their website. We're looking forward to heading back and meeting more people next year.

Reach out and talk to the MetricFire team by booking a demo, or sign up for our free trial and start using Prometheus, Graphite, or Grafana today. With our product you can build great dashboards without having to worry about the setup, and we will always remain open source at heart.
