Anomaly Detection Using OSquery and Grafana

Anomaly Detection Using OSquery and Grafana

Table of Contents

  • Introduction
  • Understanding Osquery
  • Using Osquery To Detect Anomalies
    • Using The Processes Table
    • Using The Startup Applications List
    • Monitoring Network Activity 
    • Monitoring Suspicious Resource Utilization
  • Connecting Osquery To Grafana
  • Osquery With MetricFire
  • Conclusion


Detecting unauthorized usage and malicious applications in an instance involves analyzing OS and application logs. Doing this manually is a herculean effort because of the number of logs and the patterns one has to look for. Having a tool that can provide an aggregated view of your instance and the ability to analyze them easily can greatly reduce manual effort. Osquery is an open-source utility that can represent information about devices in table structure and provide a SQL interface to analyze them. Osquery enables querying devices like a database. It is available for Windows, Mac, and Linux. This article discusses how one can use Osquery to detect anomalies in infrastructure. 


Understanding Osquery

Osquery is an open-source instrumentation tool that helps to describe a device in terms of a structured database. It is used for operating system monitoring and analytics. The tables inside the database contain information on active processes, kernel modules, network connections, device events, CPU utilization, and file system information. Osquery can be installed in all popular variations of Windows, Mac, and all popular variations of Linux. Having a tool that can work in all the OSes in the organization goes a long way in streamlining security monitoring. 


On a high level, Osquery enables two functions. It provides an SQL interface that engineers can use to analyze what is happening in a device. The second function is a monitoring daemon that allows one to schedule SQL queries. This daemon can aggregate the query results, log them and alert when there are state changes. The output from this daemon can provide information about security and performance. Organizations can integrate this output into their centralized logging system to maintain a historical database of device states. 


Osquery represents all information about a device inside the tables. Rows in the tables represent a state change in the OS. Every event that happens inside a device like a new printer being added or a network being disconnected is added as rows in various tables. Some of the information that does not conform to the table structure, like the OS version and CPU specification are represented as tables with a single row.  User actions are also logged to tables. For example, if a user tries to list a directory, the user’s command and its output are logged as rows. 



Understanding Osquery SQL

Osquery provides a shell for arbitrary SQL execution and analysis. Osquery SQL is similar to SQLite SQL with some added functions. One can start the shell by typing the command ‘osqueryi’ in the command line interface. Inside the shell, one can get a list of all the tables by using the ‘.tables’ command. To understand the schema of a table, one can type the below command.

.schema <table_name>


Every query execution represents a monitored view of the device. Users can type ‘SELECT’ statements to understand the content of the tables. For example to get the details of users in a device, one can type the below command. 

Select uid,username from users;


Osquery stores the metadata about itself in a table called osquery_info. All standard constructs of SQL like WHERE, USING, ORDER BY, etc are supported by Osquery. One can also use the JOIN query to combine the information in two tables.


Some of the tables inside Osquery can only be used with a ‘WHERE’ clause. This is because the output will be very huge and will not serve any real purpose for the user. An example of such a table is the ‘file’ table. It can only be used with a path or directory as an input parameter. Statements like ‘INSERT’, ‘DELETE’ etc do not have any meaning within Osquery and hence will not provide any output. 


The results from Osquery scheduled runs are written in the below file.



Osquery provides support for logger plugins. The plugins help one to export Osquery logs to a centralized login system and visualization dashboard. It comes with a set of built-in plugins that can be used to log to file systems, AWS Kinesis, Firehose, and Kafka. In case developers want to build custom functionality to integrate with their infrastructure or reporting mechanism, they can use the Osquery extensions. 


Using Osquery To Detect Anomalies

Detecting anomalies in any scenario requires one to have a clear idea of what is existing in the system before an event happens. Osquery helps engineers to establish this baseline for their infrastructure or devices. Once installed, it auto-populates the tables with all the information about the device including but not limited to processes, network connections, files, and configuration. The tables are then updated in real-time whenever a change happens inside the system. All that is required to detect an anomaly then is to run a scheduled query and alert when an unusual event is detected. The below sections detail different variations of this method to detect anomalies in a device.


Using The Processes Table

The ‘processes’ table details all the currently running applications in a device. This table can be used to establish a baseline of current running applications and then to monitor any newly added process. To understand the processes that are currently, one can type the below command.

SELECT pid, name, path FROM processes


Other than monitoring for newly added processes, one can also watch out for typical intrusion evidence that attackers leave in the system. For example, attackers often start a process and delete the binary executable file using which it was started. The ‘path’ column in the ‘process’ table shows the path of the file from which the process started. It can be used to watch out for such patterns. 


Using The Startup Applications List

Most malware configures the exploited systems to start specific applications after booting. Osquery can help to establish a baseline of auto-starting applications and monitor them. For example, one can get the list of applications that get started after booting in a Mac device using the below command.

SELECT * FROM startup_items;


Monitoring Network Activity 

The processes_open_sockets table in Osquery provides insights on the outbound and inbound network connections. It provides information on ports and IP addresses used by the processes to establish network connections. Monitoring these network connections can help unearth malicious applications. For example, malicious applications often communicate with Known evil IP addresses through specific ports. One can monitor the network connections using the below command.

select * from process_open_sockets where local_port=22222 and remote_address !=’′ and remote_address !=’::’;


Monitoring Suspicious Resource Utilization

One way of identifying suspicious activity is by monitoring the usage of system resources like CPU and memory. For example, in crypto-jacking it is quite common to find rogue applications maxing out the device’s CPU and memory. The cpu_time table gives details about the idle time and usage time of the CPU. This table can be used to calculate the applications that used CPU resources. For example, one can get the idle_time of the CPU using the below command.

SELECT  SUM(COALESCE(idle, 0)) + SUM(COALESCE(iowait, 0)) AS idle_time FROM cpu_time;


Connecting Osquery To Grafana

Viewing Osquery metrics in Grafana is a two-step process. Grafana does not directly support Osquery as a data source. Hence the Osquery metrics have to be exported to a database supported by Grafana to add it as a dashboard. 


Here, we will use Prometheus to store data from Osquery. It is a popular time series database and monitoring platform supported by Grafana. It comes with open-source exporters for Osquery. Head to this GitHub link to download the Prometheus exporter. 


Developers can configure the exporter with the Osquery SQL that they need to export to Prometheus. For example, to export the number of applications that a device runs as part of the start-up sequence, one can use the below configuration file. 

  # osqueryi binary. Looked up in PATH if not specified as absolute path
  osquery: "osqueryi"
  # timeout for a single call to osqueryi
  timeout: 10s

   	# a list of counter definitions
    	name: history_lines_count
    	# metric help
    	help: "number of entries in the history"
    	# the query to be executed via osqueryi
    	query: "select count(*) as count from startup_items"
	# reference to the name of the result column
	valueidentifier: count


Once the metrics reach Prometheus, developers can use it to create the required dashboards in Grafana. Grafana dashboards are defined in terms of YAML files. A sample dashboard using Osquery output can be found here


As evident from the above sequence, anomaly detection using Osquery and Grafana is not easy to execute and contains multiple steps. This is where MetricFire can help. 


Osquery With MetricFire

MetricsFire is a hosted infrastructure monitoring solution that comes with everything built in to accept data from your Osquery deployment. It is a combination of hosted Graphite and Grafana. Here Graphite accepts the query output from Osquery and serves as a data source for Grafana. There is no additional effort required to separately set up a data source and then Grafana since everything is hosted as a service. 



Osquery provides a useful tool to represent the state of your infrastructure and devices as a set of tables. Its SQL interface helps one to analyze the state of the device through simple queries. It can be used to establish a baseline for your device and then monitor the changes that happen to it in real-time. Integrating Osquery into a visualization solution like Grafana is a tedious process since it involves setting up data sources and then configuring Grafana to work with them. MetricsFire is a hosted monitoring solution that can integrate with Osquery through a set of easy steps. 

You might also like other posts...
metricfire Apr 10, 2024 · 9 min read

Step-by-Step Guide to Monitoring Your SNMP Devices With Telegraf

Monitoring SNMP devices is crucial for maintaining network health and security, enabling early detection... Continue Reading

metricfire Mar 13, 2024 · 8 min read

Easy Guide to monitoring uWSGI Using Telegraf and MetricFire

It's important to monitor uWSGI instances to ensure their stability, performance, and availability, helping... Continue Reading

metricfire Mar 12, 2024 · 8 min read

How to Monitor ClickHouse With Telegraf and MetricFire

Monitoring your ClickHouse database is a proactive measure that helps maintain its health and... Continue Reading

header image

We strive for
99.999% uptime

Because our system is your system.

14-day trial 14-day trial
No Credit Card Required No Credit Card Required