(⌐■_■) Utkar5hM
2024/11/22

Performance Schema vs eBPF-based Query Profiling

So, it all started at IRIS NITK, where I had the chance to work with some seriously cracked individuals, and we were testing out different distributed strategies for MariaDB. While I didn’t have a ton of time to dive super deep, I was picking up some gems here and there. The real kicker? A bunch of the guys were working on eBPF-based tools for their own projects/academics, and that’s where I first got exposed to eBPF. Safe to say, the tech rizz was strong.

Once my summer internship wrapped up and I had some breathing room, I decided to channel that extra time into learning eBPF by building my own monitoring tool. I finished building it but didn’t get around to testing or using it for quite some time.

So, I finally decided to revisit it and see if it’s actually any good.

Project Github Link

Before starting the comparison, let’s first look at what the program does and how it works. The MariaDB/MySQL eBPF exporter collects query latencies, as the name suggests, by hooking into dispatch_command(), the function where query execution starts (it exists in both MySQL and MariaDB, though it behaves a bit differently in each), and then normalizes the queries so that similar ones can be grouped together. It exports this data as Prometheus metrics, which a Prometheus server can scrape and Grafana can visualize.
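To give a feel for the normalization step, here’s a simplified sketch of the idea in Go (not the exact code from the repo): stripping literals out of the statement is enough to fold queries that differ only in their values into the same bucket.

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Strip string literals before numbers so quoted digits aren't rewritten twice.
var (
	stringLit = regexp.MustCompile(`'[^']*'|"[^"]*"`)
	numberLit = regexp.MustCompile(`\b\d+\b`)
	spaces    = regexp.MustCompile(`\s+`)
)

// normalize collapses a raw SQL statement into a fingerprint so that
// structurally identical queries land in the same latency bucket.
func normalize(q string) string {
	q = strings.ToLower(strings.TrimSpace(q))
	q = stringLit.ReplaceAllString(q, "?")
	q = numberLit.ReplaceAllString(q, "?")
	q = spaces.ReplaceAllString(q, " ")
	return q
}

func main() {
	fmt.Println(normalize("SELECT c FROM sbtest1 WHERE id = 4242"))
	// prints: select c from sbtest1 where id = ?
}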

I used libbpfgo to build the program, mainly for its CO-RE (Compile Once, Run Everywhere) capability, and also because I wanted to write the userspace part in Go. There was a challenge though: the compiled MariaDB/MySQL binaries have mangled function names, and libbpfgo can’t find and hook into those as easily as bpftrace can (at least, I didn’t know a way). To handle this, I wrote a build script that uses the nm tool to extract the actual (mangled) function name from the DB binary, stores it, and compiles the BPF object to hook into it. That object is then embedded into the Go program using Go’s embed package, so the exporter always runs against the DB version it was built for.
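As a rough illustration of how the pieces fit together, here’s a simplified sketch (not the exact code from the repo; the object and program names are placeholders, and the libbpfgo calls may differ slightly between versions). The standard debug/elf package plays the role of nm here, resolving the mangled dispatch_command symbol to the file offset a uprobe needs:

package main

import (
	"debug/elf"
	_ "embed"
	"fmt"
	"log"
	"strings"

	bpf "github.com/aquasecurity/libbpfgo"
)

// The BPF object is compiled at build time against the target server binary
// and embedded here, so the exporter ships with a probe that matches the DB
// version it was built for. The file name is a placeholder.
//go:embed query_latency.bpf.o
var bpfObj []byte

// symbolOffset looks for a (possibly mangled) symbol containing substr in the
// binary's symbol tables and converts its virtual address into the file offset
// that a uprobe expects.
func symbolOffset(binPath, substr string) (uint32, error) {
	f, err := elf.Open(binPath)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	syms, _ := f.Symbols()
	dyn, _ := f.DynamicSymbols()
	for _, s := range append(syms, dyn...) {
		if !strings.Contains(s.Name, substr) || s.Value == 0 ||
			int(s.Section) >= len(f.Sections) {
			continue
		}
		sec := f.Sections[s.Section]
		return uint32(s.Value - sec.Addr + sec.Offset), nil
	}
	return 0, fmt.Errorf("no symbol matching %q in %s", substr, binPath)
}

func main() {
	const serverBin = "/usr/sbin/mariadbd" // path to the DB binary being traced

	off, err := symbolOffset(serverBin, "dispatch_command")
	if err != nil {
		log.Fatal(err)
	}

	mod, err := bpf.NewModuleFromBuffer(bpfObj, "query_latency")
	if err != nil {
		log.Fatal(err)
	}
	defer mod.Close()

	if err := mod.BPFLoadObject(); err != nil {
		log.Fatal(err)
	}

	prog, err := mod.GetProgram("uprobe_dispatch_command")
	if err != nil {
		log.Fatal(err)
	}

	// pid -1 means "any process that executes this binary".
	if _, err := prog.AttachUprobe(-1, serverBin, off); err != nil {
		log.Fatal(err)
	}

	select {} // keep the probes attached while events stream to userspace
}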

If you’ve ever worked with libbpf, you know how tricky it can be to get everything set up properly. So I also made sure the whole thing could be compiled and run inside Docker. I wrote scripts to build it in a container, assuming the required kernel packages are available in Ubuntu’s default repos. Once it runs in a container, it can also be attached to a database running on the host or in another container, by starting the exporter container in the corresponding process (PID) namespace.
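For example, attaching the exporter container to the database’s PID namespace looks roughly like this (the image name and flags are placeholders, not the exact scripts from the repo):

# DB running in another container named "mariadb"
docker run --rm --privileged --pid=container:mariadb mariadb-ebpf-exporter

# DB running directly on the host
docker run --rm --privileged --pid=host mariadb-ebpf-exporter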

Now that we know what the program can do, the next question is how it affects performance compared to running raw (with no monitoring) and existing monitoring methods like performance_schema and slow query logs. Since there’s already a solid comparison between the latter two (PERFORMANCE_SCHEMA vs Slow Query Log - Percona), I only compared my tool with raw and performance_schema.

Some points to consider before testing:

It also seemed interesting to test how the tool behaves when collecting only slower queries, say those taking longer than 200ms or 1000ms. Filtering like that sounds sensible, but the Percona blog explains why it’s usually not the best idea and suggests a better approach that’s worth trying out later.

Benchmarking Configuration

sysbench scripts

Initially, I wanted to emulate a scenario where scripts would keep CPU and memory usage around 50–60% under raw conditions. This would let me observe how performance drops with the monitoring tool running. But keeping this balance consistently turned out to be complicated.

Instead, I switched to running default sysbench scripts. While this maxes out CPU/memory, it introduces a useful metric: the number of transactions executed, which helps gauge throughput and performance impact.

sysbench oltp_common --db-driver=mysql --table-size=100000 --mysql-user=root --mysql-password=L0CK3D_1N --mysql-port=3306 --mysql-host=10.128.0.3 --mysql-db=sysbenchtest prepare
sysbench oltp_read_write --db-driver=mysql --table-size=100000 --mysql-user=root --mysql-password=L0CK3D_1N --mysql-port=3306 --mysql-host=10.128.0.3 --mysql-db=sysbenchtest --threads=12 --report-interval=1 run

I also turned off indexing so queries would be slower, making it easier to generate a nice Grafana graph:

[Screenshot: Grafana dashboard showing average query latency]

Query for the above graph:

rate(ebpf_exporter_query_latencies_histogram_sum[1m])/rate(ebpf_exporter_query_latencies_histogram_count[1m])
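The _sum and _count series come for free from Prometheus’s histogram type. For reference, here’s a stripped-down sketch of how such a histogram can be exposed with client_golang (the label, buckets, and port are placeholders, not necessarily what the exporter uses):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A Prometheus histogram automatically exposes _bucket, _sum and _count
// series, which is where the _sum/_count in the PromQL above come from.
var queryLatencies = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "ebpf_exporter_query_latencies_histogram",
		Help:    "Query latency in milliseconds, keyed by normalized query.",
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 16), // 0.1 ms .. ~3.3 s
	},
	[]string{"query"},
)

func main() {
	prometheus.MustRegister(queryLatencies)

	// In the real exporter, observations come from the BPF event handler:
	queryLatencies.WithLabelValues("select c from sbtest1 where id = ?").Observe(12.5)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9435", nil)) // port is arbitrary here
}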

Before each test, I ran a cleanup to clear caches:

sysbench oltp_common --db-driver=mysql --table-size=100000 --mysql-user=root --mysql-password=L0CK3D_1N --mysql-port=3306 --mysql-host=10.128.0.3 --mysql-db=sysbenchtest cleanup

performance schema

For configuring performance schema, I followed this guide: Query Profiling Using Performance Schema
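With performance_schema enabled in the server config (performance_schema=ON), the digest summary table gives a quick aggregated view per query shape. These aren’t the exact queries from the guide, just an example of the kind of data involved (timer columns are in picoseconds, hence the division):

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE '%statements%';

SELECT DIGEST_TEXT,
       COUNT_STAR,
       ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_latency_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;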

Example of how query latency data looked:

[Screenshot: query latency data in performance_schema]

Process Exporter

I had Process Exporter logging CPU and memory usage, but unfortunately I missed the Prometheus retention window, so the data was gone before I could save it. No screenshots or backups this time.

System Configuration

GCP instances: 2x e2-standard-2 (2 vCPU, 1 core, 8 GB memory)
OS: Debian

Benchmarking

I started by having the eBPF exporter capture all queries, even 0ms latency ones, and it did use quite a lot of CPU:

[Screenshot: htop showing the exporter’s CPU usage]

This changed significantly when filtering to only collect queries over 200ms—CPU and memory usage dropped, and the benchmarks reflected that.

Results

Each test was run three times and averaged. Full results can be found here.

test (averages) | read_queries | write_queries | total_queries | transactions | total_time (s) | total_events | min_latency (ms) | avg_latency (ms) | max_latency (ms) | 95th_latency (ms)
ebpf | 2108526 | 602436 | 3012180 | 150609 | 300.0397 | 150609 | 8.343333333 | 47.81666667 | 375.13 | 89.27
ebpf >200ms | 2155041 | 615726 | 3078630 | 153931.5 | 300.0355 | 153931.5 | 7.425 | 46.77 | 378.97 | 94.1
performance_schema | 1821535.333 | 520438.6667 | 2602193.333 | 130109.6667 | 300.05895 | 130109.6667 | 8.648333333 | 56.50833333 | 318.805 | 86.785
raw | 2120234.667 | 605781.3333 | 3028906.667 | 151445.3333 | 300.0370667 | 151445.3333 | 8.206666667 | 47.55166667 | 365.9416667 | 90.595

Graphs from Google Sheets:

[Graph: number of transactions]

[Graph: number of queries]

[Graph: average latency]

chill-guy