Linux offers a lot of commands to help users gather information about their host operating system: listing files or directories to check attributes; querying to see what packages are installed, processes are running, and services start at boot; or learning about the system's hardware.
Each command uses its own output format to list this information. You need to use tools like grep, sed, and awk to filter the results to find specific information. Also, a lot of this information changes frequently as the system's state changes.
It would be helpful to view all of this information formatted like the output of an SQL database query. Imagine that you could query the output of the ps and rpm commands as if you were querying an SQL database table with similar names.
Fortunately, there is a tool that does just that and much more: Osquery is an open source "SQL powered operating system instrumentation, monitoring, and analytics framework."
Many applications that handle security, DevOps, compliance, and inventory management (to name a few) rely on the core functionality that Osquery provides.
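To make the idea of querying ps output like a table concrete, here is a toy illustration in Python. It is not how Osquery works internally: it simply reads Linux's /proc filesystem into an in-memory SQLite table using the standard sqlite3 module. The table name processes and its three columns are chosen here to echo Osquery's table and are otherwise hypothetical.

```python
import os
import sqlite3


def load_processes(conn):
    """Fill a toy 'processes' table from Linux's /proc filesystem."""
    conn.execute(
        "CREATE TABLE processes (pid INTEGER PRIMARY KEY, name TEXT, state TEXT)"
    )
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited between listdir() and reading its stat
        # /proc/PID/stat starts "PID (NAME) STATE ..."; NAME may itself contain
        # spaces or parentheses, so locate it with the first "(" and the last ")".
        start, end = stat.index("("), stat.rindex(")")
        conn.execute(
            "INSERT INTO processes VALUES (?, ?, ?)",
            (int(entry), stat[start + 1 : end], stat[end + 2]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_processes(conn)
# Plain SQL instead of piping `ps -ef` through grep and awk.
for pid, name, state in conn.execute(
    "SELECT pid, name, state FROM processes ORDER BY pid LIMIT 5"
):
    print(pid, name, state)
```

The snapshot is taken once, so unlike a live tool it goes stale as processes come and go.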
Install Osquery
Osquery is available for Linux, macOS, Windows, and FreeBSD. Install the latest version for your operating system by following its installation instructions. (I'll use version 4.7.0 in these examples.)
After installation, verify it's working:
$ rpm -qa | grep osquery
osquery-4.7.0-1.linux.x86_64
$
$ osqueryi --version
osqueryi version 4.7.0
$
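If you want a script to confirm the installation instead of eyeballing the output, the Python sketch below checks whether osqueryi is on the PATH and extracts its version number. It assumes the "osqueryi version X.Y.Z" format shown above; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Pull the version number out of `osqueryi --version` output."""
    # Expected shape, as in the session above: "osqueryi version 4.7.0"
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def osqueryi_version():
    """Return the installed osqueryi version, or None if it is not on PATH."""
    if shutil.which("osqueryi") is None:
        return None
    result = subprocess.run(
        ["osqueryi", "--version"], capture_output=True, text=True, check=True
    )
    # Check both streams in case the version is printed to stderr.
    return parse_version(result.stdout + result.stderr)


print(osqueryi_version() or "osqueryi not found on PATH")
```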
Osquery components
Osquery has two main components:
- osqueryi is an interactive SQL query console. It is a standalone utility that does not need superuser privileges (unless you are querying tables that need that level of access).
- osqueryd is a monitoring daemon for the host it is installed on. It can schedule queries to run at regular intervals to gather information from the infrastructure.
You can run the osqueryi utility without the osqueryd daemon running. Another utility, osqueryctl, starts and stops the daemon and checks its status. All three utilities are installed by the osquery package:
$ rpm -ql osquery-4.7.0-1.linux.x86_64 | grep bin
/usr/bin/osqueryctl
/usr/bin/osqueryd
/usr/bin/osqueryi
$
Use the osqueryi interactive prompt
You interact with Osquery much as you would with an SQL database. In fact, osqueryi is a modified version of the SQLite shell. Running the osqueryi command drops you into an interactive shell where you can run commands specific to Osquery, which often start with a dot (.):
$ osqueryi
Using a virtual database. Need help, type '.help'
osquery>
To quit the interactive shell, run the .quit command to get back to the operating system's shell:
osquery>
osquery> .quit
$
Find out what tables are available
As mentioned, Osquery makes data available as the output of SQL queries. Information in databases is often saved in tables. But how can you query these tables if you don't know their names? Well, you can run the .tables command to list all the tables that you can query. If you are a long-time Linux user or a sysadmin, the table names will be familiar, as you have been using operating system commands to get this information:
osquery> .tables
=> acpi_tables
=> apparmor_events
=> apparmor_profiles
=> apt_sources
=> arp_cache
<< snip >>
=> user_ssh_keys
=> users
=> yara
=> yara_events
=> ycloud_instance_metadata
=> yum_sources
osquery>
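If you need the table list inside a script, you can capture the .tables output and keep only the lines that start with "=>". This Python sketch sends the meta-command on stdin, the same way the echo-and-pipe approach works; the function names are hypothetical.

```python
import subprocess


def parse_tables(output):
    """Return the table names from `.tables` output ("  => name" lines)."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("=>"):
            names.append(line[2:].strip())
    return names


def list_tables():
    # Send the meta-command on stdin, as `echo ".tables" | osqueryi` does.
    result = subprocess.run(
        ["osqueryi"], input=".tables\n", capture_output=True, text=True, check=True
    )
    return parse_tables(result.stdout)


sample = """  => acpi_tables
  => apparmor_events
  => users
"""
print(parse_tables(sample))
```

On a machine with Osquery installed, len(list_tables()) gives the number of tables, and a membership test such as "users" in list_tables() tells you whether a table exists on that platform.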
Check the schema for individual tables
Now that you know the table names, you can see what information each table provides. As an example, choose processes, since the ps command is used quite often to get this information. Run the .schema command followed by the table name to see what information is saved in this table. If you want to check the results, you could quickly run ps -ef or ps aux and compare the output with the contents of the table:
osquery> .schema processes
CREATE TABLE processes(`pid` BIGINT, `name` TEXT, `path` TEXT, `cmdline` TEXT, `state` TEXT, `cwd` TEXT, `root` TEXT, `uid` BIGINT, `gid` BIGINT, `euid` BIGINT, `egid` BIGINT, `suid` BIGINT, `sgid` BIGINT, `on_disk` INTEGER, `wired_size` BIGINT, `resident_size` BIGINT, `total_size` BIGINT, `user_time` BIGINT, `system_time` BIGINT, `disk_bytes_read` BIGINT, `disk_bytes_written` BIGINT, `start_time` BIGINT, `parent` BIGINT, `pgroup` BIGINT, `threads` INTEGER, `nice` INTEGER, `is_elevated_token` INTEGER HIDDEN, `elapsed_time` BIGINT HIDDEN, `handle_count` BIGINT HIDDEN, `percent_processor_time` BIGINT HIDDEN, `upid` BIGINT HIDDEN, `uppid` BIGINT HIDDEN, `cpu_type` INTEGER HIDDEN, `cpu_subtype` INTEGER HIDDEN, `phys_footprint` BIGINT HIDDEN, PRIMARY KEY (`pid`)) WITHOUT ROWID;
osquery>
To drive home the point, use the following command to see the schema for the RPM packages and compare the information with the rpm -qa and rpm -qi operating system commands:
osquery>
osquery> .schema rpm_packages
CREATE TABLE rpm_packages(`name` TEXT, `version` TEXT, `release` TEXT, `source` TEXT, `size` BIGINT, `sha1` TEXT, `arch` TEXT, `epoch` INTEGER, `install_time` INTEGER, `vendor` TEXT, `package_group` TEXT, `pid_with_namespace` INTEGER HIDDEN, `mount_namespace_id` TEXT HIDDEN, PRIMARY KEY (`name`, `version`, `release`, `arch`, `epoch`, `pid_with_namespace`)) WITHOUT ROWID;
osquery>
You can learn more in Osquery's tables documentation.
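Some columns in both schemas carry the HIDDEN keyword. Osquery tables are SQLite virtual tables, and SQLite leaves hidden columns out of SELECT *, so you have to name them explicitly to see them. The Python sketch below splits a schema string into visible and hidden columns; its regular expression assumes the backtick-quoted name-then-type layout printed above and is not an official parser.

```python
import re

# Matches "`name` TYPE" with an optional trailing " HIDDEN".
# The PRIMARY KEY (`pid`) clause does not match because no type follows the name.
COLUMN_RE = re.compile(r"`(\w+)` (\w+)( HIDDEN)?")


def split_columns(create_stmt):
    """Return (visible, hidden) column names from a CREATE TABLE statement."""
    visible, hidden = [], []
    for name, _col_type, is_hidden in COLUMN_RE.findall(create_stmt):
        (hidden if is_hidden else visible).append(name)
    return visible, hidden


# Abbreviated copy of the processes schema printed by .schema above.
schema = (
    "CREATE TABLE processes(`pid` BIGINT, `name` TEXT, `nice` INTEGER, "
    "`upid` BIGINT HIDDEN, `uppid` BIGINT HIDDEN, PRIMARY KEY (`pid`)) WITHOUT ROWID;"
)
visible, hidden = split_columns(schema)
print("visible:", visible)
print("hidden:", hidden)
```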
Use the PRAGMA command
In case that schema information is too cryptic for you, there is another way to print the table information in a verbose, tabular format: the PRAGMA command. For example, I'll use PRAGMA to see information for the rpm_packages table in a nice format:
osquery> PRAGMA table_info(rpm_packages);
One benefit of this tabular information is that you can focus on the field you want to query and see the type of information that it provides:
osquery> PRAGMA table_info(users);
+-----+-------------+--------+---------+------------+----+
| cid | name        | type   | notnull | dflt_value | pk |
+-----+-------------+--------+---------+------------+----+
| 0   | uid         | BIGINT | 1       |            | 1  |
| 1   | gid         | BIGINT | 0       |            | 0  |
| 2   | uid_signed  | BIGINT | 0       |            | 0  |
| 3   | gid_signed  | BIGINT | 0       |            | 0  |
| 4   | username    | TEXT   | 1       |            | 2  |
| 5   | description | TEXT   | 0       |            | 0  |
| 6   | directory   | TEXT   | 0       |            | 0  |
| 7   | shell       | TEXT   | 0       |            | 0  |
| 8   | uuid        | TEXT   | 1       |            | 3  |
+-----+-------------+--------+---------+------------+----+
osquery>
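Because osqueryi is built on the SQLite shell, PRAGMA table_info appears to report the same six fields that SQLite itself does. As an illustration, the Python sketch below recreates the users table as an ordinary SQLite table with the standard sqlite3 module and reads those fields back. Osquery's real table is virtual; this CREATE TABLE statement is reconstructed from the output above, with the composite primary key inferred from the pk column, where a nonzero value is the column's position in the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns and primary key reconstructed from the PRAGMA output above;
# a plain SQLite table stands in for Osquery's virtual table.
conn.execute(
    """
    CREATE TABLE users(
        uid BIGINT, gid BIGINT, uid_signed BIGINT, gid_signed BIGINT,
        username TEXT, description TEXT, directory TEXT, shell TEXT, uuid TEXT,
        PRIMARY KEY (uid, username, uuid)
    ) WITHOUT ROWID
    """
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
for cid, name, col_type, notnull, default, pk in conn.execute(
    "PRAGMA table_info(users)"
):
    print(cid, name, col_type, notnull, default, pk)
```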
Run your first query
Now that you have all the required information (the table, its schema, and the columns to query), run your first SQL query to view the information. The query below returns seven user accounts on the system, showing each one's user ID, group ID, home directory, default shell, and UUID. Linux users could get this information by viewing the contents of the /etc/passwd file and doing some grep, sed, and awk magic.
osquery>
osquery> select uid,gid,directory,shell,uuid FROM users LIMIT 7;
+-----+-----+----------------+----------------+------+
| uid | gid | directory      | shell          | uuid |
+-----+-----+----------------+----------------+------+
| 0   | 0   | /root          | /bin/bash      |      |
| 1   | 1   | /bin           | /sbin/nologin  |      |
| 2   | 2   | /sbin          | /sbin/nologin  |      |
| 3   | 4   | /var/adm       | /sbin/nologin  |      |
| 4   | 7   | /var/spool/lpd | /sbin/nologin  |      |
| 5   | 0   | /sbin          | /bin/sync      |      |
| 6   | 0   | /sbin          | /sbin/shutdown |      |
+-----+-----+----------------+----------------+------+
osquery>
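For comparison, here is roughly what the /etc/passwd route involves when you script it yourself instead of using grep, sed, and awk. This Python sketch assumes the standard seven-field, colon-separated passwd format and reads only the local /etc/passwd file; it has no equivalent of the uuid column, and parse_passwd is a hypothetical helper name.

```python
def parse_passwd(lines, limit=7):
    """Return (uid, gid, home, shell) tuples for the first `limit` accounts."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Format: name:password:uid:gid:gecos:home:shell
        fields = line.split(":")
        if len(fields) != 7 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue  # skip malformed lines and NIS-style "+" entries
        rows.append((int(fields[2]), int(fields[3]), fields[5], fields[6]))
        if len(rows) == limit:
            break
    return rows


with open("/etc/passwd") as passwd:
    for uid, gid, home, shell in parse_passwd(passwd):
        print(uid, gid, home, shell)
```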
Run queries without entering interactive mode
What if you want to run a query without entering the osqueryi interactive mode? This could be very useful if you are writing shell scripts around it. In this case, you could echo the SQL query and pipe it to osqueryi right from the Bash shell:
$ echo "select uid,gid,directory,shell,uuid FROM users LIMIT 7;" | osqueryi
+-----+-----+----------------+----------------+------+
| uid | gid | directory      | shell          | uuid |
+-----+-----+----------------+----------------+------+
| 0   | 0   | /root          | /bin/bash      |      |
| 1   | 1   | /bin           | /sbin/nologin  |      |
| 2   | 2   | /sbin          | /sbin/nologin  |      |
| 3   | 4   | /var/adm       | /sbin/nologin  |      |
| 4   | 7   | /var/spool/lpd | /sbin/nologin  |      |
| 5   | 0   | /sbin          | /bin/sync      |      |
| 6   | 0   | /sbin          | /sbin/shutdown |      |
+-----+-----+----------------+----------------+------+
$
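For scripts, the ASCII table is awkward to parse. A minimal sketch in Python, assuming your osqueryi build accepts the query as a command-line argument and supports the --json output flag (check osqueryi --help on your version), might look like this. The run_query name and its runner parameter are hypothetical; runner exists only so the function can be exercised without Osquery installed.

```python
import json
import shutil
import subprocess


def run_query(sql, runner=subprocess.run):
    """Run one SQL statement through osqueryi and return a list of row dicts."""
    # The query is passed as an argument instead of being piped in, and
    # --json requests machine-readable output rather than the ASCII table.
    result = runner(
        ["osqueryi", "--json", sql], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if shutil.which("osqueryi"):
    for row in run_query("SELECT uid, gid, directory, shell FROM users LIMIT 7;"):
        print(row["uid"], row["directory"], row["shell"])
```

Column values may come back as strings, so convert them (for example with int()) before doing arithmetic.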
Learn what services start when booting up
Osquery can also return all the services set to start at boot. For example, to query the startup_items table and get the name, type, status, and path of the first five startup items:
osquery> SELECT name,type,status,path FROM startup_items LIMIT 5;
name = README
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/README

name = anamon
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/anamon

name = functions
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/functions

name = osqueryd
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/osqueryd

name = AT-SPI D-Bus Bus
type = Startup Item
status = enabled
path = /usr/libexec/at-spi-bus-launcher --launch-immediately
osquery>
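The key = value layout above is osqueryi's line output mode (selectable in the shell with .mode line). If you capture that text in a script, a small parser can turn it back into records. This Python sketch is a hypothetical helper, not part of Osquery: it starts a new record at each blank line, or whenever a key repeats, so it also copes with output whose blank separators were lost.

```python
def parse_line_mode(text):
    """Turn osqueryi line-mode output ("key = value") into a list of dicts."""
    rows, current = [], {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(" = ")
        if not sep:
            # A blank line (or any line without " = ") ends the current record.
            if current:
                rows.append(current)
                current = {}
            continue
        key = key.strip()
        if key in current:
            # A repeated key means the next record has started.
            rows.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        rows.append(current)
    return rows


output = """name = README
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/README

name = osqueryd
type = Startup Item
status = enabled
path = /etc/rc.d/init.d/osqueryd
"""
for item in parse_line_mode(output):
    print(item["name"], "->", item["path"])
```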
Look up ELF information for a binary
Imagine you want to find out more details about the ls binary. Usually, you would do it with the readelf -h command followed by the ls command's path. You can query the elf_info table with Osquery and get the same information:
osquery> SELECT * FROM elf_info WHERE path='/bin/ls';
class = 64
abi = sysv
abi_version = 0
type = dyn
machine = 62
version = 1
entry = 24064
flags = 0
path = /bin/ls
osquery>
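The fields above come straight from the ELF file header, the same bytes readelf -h decodes. As a rough illustration of where each value lives, this Python sketch unpacks the header with the standard struct module. It is not Osquery's implementation; the field names and the "sysv" and type labels simply mirror the output above, and only ABI value 0 is given a name.

```python
import struct

ELF_TYPES = {0: "none", 1: "rel", 2: "exec", 3: "dyn", 4: "core"}


def elf_header(path):
    """Read the fields shown by elf_info from an ELF file's header."""
    with open(path, "rb") as f:
        ident = f.read(16)  # e_ident: magic, class, byte order, version, ABI
        if ident[:4] != b"\x7fELF":
            raise ValueError(f"{path} is not an ELF file")
        is_64 = ident[4] == 2  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little-endian
        # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags
        fmt = endian + ("HHIQQQI" if is_64 else "HHIIIII")
        e_type, machine, version, entry, _phoff, _shoff, flags = struct.unpack(
            fmt, f.read(struct.calcsize(fmt))
        )
    return {
        "class": 64 if is_64 else 32,
        "abi": "sysv" if ident[7] == 0 else ident[7],
        "abi_version": ident[8],
        "type": ELF_TYPES.get(e_type, e_type),
        "machine": machine,
        "version": version,
        "entry": entry,
        "flags": flags,
        "path": path,
    }


for key, value in elf_header("/bin/ls").items():
    print(key, "=", value)
```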
Now you have a taste of how to use osqueryi to look for information of interest to you. However, this information is spread across a large number of tables; one system I queried had 156 different tables, which can be overwhelming:
$ echo ".tables" | osqueryi | wc -l
156
$
To make things easier, you can start with these tables to get information about your Linux system:
System information table
osquery> select * from system_info;
System limit information
osquery> select * from ulimit_info;
Files opened by various processes
osquery> select * from process_open_files;
Open ports on a system
osquery> select * from listening_ports;
Running processes information
osquery> select * from processes;
Installed packages information
osquery> select * from rpm_packages;
User login information
osquery> select * from last;
System log information
osquery> select * from syslog_events;
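To survey these starter tables in one pass, a script can loop over them and count rows. This Python sketch assumes osqueryi accepts the query as a command-line argument and supports --json (check osqueryi --help on your version); count_rows and its runner parameter are hypothetical names. Table names are interpolated into the SQL because they come from a fixed list, which would not be safe for untrusted input. Expect some tables to return few or no rows: event tables such as syslog_events may stay empty unless the daemon is configured to collect them, and others may need root.

```python
import json
import shutil
import subprocess

# The starter tables listed above.
STARTER_TABLES = [
    "system_info", "ulimit_info", "process_open_files", "listening_ports",
    "processes", "rpm_packages", "last", "syslog_events",
]


def count_rows(table, runner=subprocess.run):
    """Return how many rows a table yields right now, via osqueryi --json."""
    sql = f"SELECT COUNT(*) AS n FROM {table};"
    result = runner(
        ["osqueryi", "--json", sql], capture_output=True, text=True, check=True
    )
    rows = json.loads(result.stdout)
    # The count may come back as a string, so convert it explicitly.
    return int(rows[0]["n"])


if shutil.which("osqueryi"):
    for table in STARTER_TABLES:
        print(f"{table:20} {count_rows(table)}")
```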
Learn more
Osquery is a powerful tool that provides a lot of host information that can be used to solve various use cases. You can learn more about Osquery by reading its documentation.