PromQL - Query Language

https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

Additional reading in the PromQL book

Prometheus provides a functional query language called PromQL (Prometheus Query Language) that allows users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in the Prometheus expression browser, or consumed by external systems via the HTTP API.

Why is it necessary to know this?

Because you're going to use it in Grafana!

Queries

Return all time series for the metric node_memory_MemFree_bytes:

node_memory_MemFree_bytes

Return all time series for the metric node_memory_MemFree_bytes that also have the given instance and job labels:

node_memory_MemFree_bytes{instance="localhost:9100", job="david-pc"}

Return a whole range of time (in this case 5 minutes up to the query time) for the same vector, making it a range vector.

node_memory_MemFree_bytes{instance="localhost:9100", job="david-pc"}[5m]

Time can be specified in:

  • ms milliseconds
  • s seconds
  • m minutes
  • h hours
  • d days - assuming a day always has 24h
  • w weeks - assuming a week always has 7d
  • y years - assuming a year always has 365d
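
To make the unit table concrete, here is a minimal Python sketch that converts a duration string into seconds using the fixed lengths listed above. The helper name duration_to_seconds is hypothetical and not part of any Prometheus client. It also accepts concatenated units such as 1h30m, but it does not check the order of the units.

```python
import re

# Seconds per unit, using the fixed lengths from the list above
# (a day is always 24h, a week 7d, a year 365d).
UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}

# "ms" is listed before "m" and "s" so that "5ms" is not read as minutes.
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")
_WHOLE = re.compile(r"(?:\d+(?:ms|s|m|h|d|w|y))+")


def duration_to_seconds(text):
    """Convert a duration such as '5m' or '1h30m' to seconds (hypothetical helper)."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in _PART.findall(text))


print(duration_to_seconds("5m"))     # the [5m] range used above
print(duration_to_seconds("1h30m"))  # units can be concatenated
```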

Note that an expression resulting in a range vector cannot be graphed directly, but can be viewed in the tabular ("Console") view of the expression browser.

Using regular expressions, you can select time series only for jobs whose name matches a certain pattern, in this case, all jobs whose name ends with pc:

node_cpu_scaling_frequency_hertz{job=~".*pc"}

The label matching operators are:

  • **=** select labels that are exactly equal to the provided string.
  • **!=** select labels that are not equal to the provided string.
  • **=~** select labels that match the regex of the provided string. In practice, this is the one you will use most.
  • **!~** select labels that do not match the regex of the provided string.

All regular expressions in Prometheus use RE2 syntax and are fully anchored: the pattern has to match the whole label value, not just a part of it. To learn regex, I recommend the site https://regexlearn.com/

To select all HTTP status codes except 4xx, you can run:

prometheus_http_requests_total{code!~"4.."}
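
The four matchers can be imitated in a few lines of Python. This is only a sketch: it uses Python's re module instead of RE2 (the two agree for simple patterns like the ones here), the function label_matches is hypothetical, and the sample label sets are made up for illustration.

```python
import re


def label_matches(labels, name, op, pattern):
    """Evaluate one label matcher against the labels of one series (illustrative only)."""
    value = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    # Regex matchers are fully anchored: the pattern must match the whole value.
    matched = re.fullmatch(pattern, value) is not None
    if op == "=~":
        return matched
    if op == "!~":
        return not matched
    raise ValueError(f"unknown matcher operator: {op}")


# Made-up label sets standing in for prometheus_http_requests_total series.
series = [
    {"handler": "/api/v1/query", "code": "200"},
    {"handler": "/api/v1/query", "code": "400"},
    {"handler": "/api/v1/query", "code": "422"},
    {"handler": "/metrics", "code": "503"},
]

# Equivalent of prometheus_http_requests_total{code!~"4.."}
not_4xx = [s for s in series if label_matches(s, "code", "!~", "4..")]
print([s["code"] for s in not_4xx])
```

Because of the anchoring, job=~"pc" would not select a job called david-pc, while job=~".*pc" does.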

Operators

Binary arithmetic operators

  • + (addition)
  • - (subtraction)
  • * (multiplication)
  • / (division)
  • % (modulo)
  • ^ (power/exponentiation)

Binary arithmetic operators are defined between pairs of scalar/scalar, vector/scalar, and vector/vector values.

Between two scalars, the behavior is obvious: they evaluate to another scalar that is the result of the operator applied to the two scalar operands.

Between an instant vector and a scalar, the operator is applied to the value of each data sample in the vector. For example, if an instant vector of time series is multiplied by 2, the result is another vector in which each sample value from the original vector is multiplied by 2. The metric name is dropped.

Between two instant vectors, a binary arithmetic operator is applied to each entry in the left-hand vector and its corresponding element in the right-hand vector. The result is propagated into the result vector with the grouping labels becoming the output label set. The metric name is dropped. Entries for which no matching entry in the right-hand vector can be found are not part of the result.
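
A small Python model may help to picture the vector/scalar and vector/vector cases. It is a sketch under simplifying assumptions: an instant vector is a dict from a label set to a value, matching uses all labels except the metric name, and the helper names (labels, drop_name, vector_scalar, vector_vector) and sample values are invented for this illustration.

```python
import operator


def labels(**kv):
    """Build a hashable label set (hypothetical helper for this sketch)."""
    return frozenset(kv.items())


def drop_name(label_set):
    """Return the label set without the metric name (__name__)."""
    return frozenset((k, v) for k, v in label_set if k != "__name__")


def vector_scalar(vector, scalar, op):
    """Apply op to every sample; the metric name is dropped."""
    return {drop_name(ls): op(value, scalar) for ls, value in vector.items()}


def vector_vector(left, right, op):
    """Pair up samples whose labels (ignoring the metric name) are identical."""
    right_by_labels = {drop_name(ls): value for ls, value in right.items()}
    result = {}
    for ls, value in left.items():
        key = drop_name(ls)
        if key in right_by_labels:  # left entries without a partner are dropped
            result[key] = op(value, right_by_labels[key])
    return result


limit = {
    labels(__name__="instance_memory_limit_bytes", instance="a"): 8 * 2**20,
    labels(__name__="instance_memory_limit_bytes", instance="b"): 4 * 2**20,
}
usage = {
    labels(__name__="instance_memory_usage_bytes", instance="a"): 3 * 2**20,
}

free_bytes = vector_vector(limit, usage, operator.sub)         # limit - usage
free_mib = vector_scalar(free_bytes, 2**20, operator.truediv)  # ... / 1024 / 1024
print(free_mib)
```

Instance b has no usage sample, so it does not appear in the result, and the surviving entry no longer carries a metric name.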

Binary comparison operators

  • == equal
  • != not equal
  • > greater than
  • < less than
  • >= greater or equal
  • <= less or equal

Comparison operators are defined between pairs of scalar/scalar, vector/scalar, and vector/vector values. By default they filter, but their behavior can be modified by providing a bool (boolean) after the operator, which will return 0 or 1 for the value instead of filtering.

Between two scalars, the bool modifier must be provided and these operators result in another scalar that is 0 (false) or 1 (true), depending on the comparison result.

Between an instant vector and a scalar, these operators are applied to the value of each data sample in the vector, and vector elements for which the comparison result is false are dropped from the resulting vector. If the bool modifier is provided, vector elements that would be dropped will have the value 0 and vector elements that would be kept will have the value 1. The metric name will be dropped if the bool modifier is provided.

Between two instant vectors, these operators behave as a filter by default, applied to matching entries. Vector elements for which the expression is not true or which do not find a match on the other side of the expression are dropped from the result, while the others are propagated into a result vector with the grouping labels becoming the output label set. If the bool modifier is provided, vector elements that would be dropped will have the value 0 and vector elements that would be kept will have the value 1, with the grouping labels again becoming the output label set. The metric name will be dropped if the bool modifier is provided.
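
The difference between filtering and the bool modifier is easy to see in a short Python sketch for the vector/scalar case. The function compare and the host hera are assumptions made for this example; only cpu_temp_celsius{host="zeus"} appears elsewhere in this page.

```python
import operator


def labels(**kv):
    """Build a hashable label set (hypothetical helper for this sketch)."""
    return frozenset(kv.items())


def compare(vector, scalar, op, use_bool=False):
    """Vector/scalar comparison: filter by default, return 0/1 with use_bool."""
    result = {}
    for label_set, value in vector.items():
        holds = op(value, scalar)
        if use_bool:
            # With bool, every element is kept, its value becomes 0 or 1,
            # and the metric name is dropped.
            unnamed = frozenset((k, v) for k, v in label_set if k != "__name__")
            result[unnamed] = 1.0 if holds else 0.0
        elif holds:
            # Without bool, elements that pass keep their original value.
            result[label_set] = value
    return result


temps = {
    labels(__name__="cpu_temp_celsius", host="zeus"): 71.0,
    labels(__name__="cpu_temp_celsius", host="hera"): 55.0,
}

filtered = compare(temps, 60, operator.gt)                # cpu_temp_celsius > 60
flagged = compare(temps, 60, operator.gt, use_bool=True)  # cpu_temp_celsius > bool 60
print(filtered)
print(flagged)
```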

Logical/set binary operators

These logical/set binary operators are defined only between instant vectors:

  • and intersection
  • or union
  • unless complement

vector1 and vector2 results in a vector consisting of the elements of vector1 for which there are elements in vector2 with exactly matching label sets. Other elements are dropped. The metric name and values are carried over from the left-hand side vector.

vector1 or vector2 results in a vector that contains all the original elements (label sets + values) of vector1 and additionally all elements of vector2 which do not have matching label sets in vector1.

vector1 unless vector2 results in a vector consisting of the elements of vector1 for which there are no elements in vector2 with exactly matching label sets. All matching elements in both vectors are dropped.
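
The three set operators map quite directly onto dictionary operations. The Python sketch below assumes an instant vector is a dict keyed by label set and leaves the metric name out for brevity. The function names and sample values are made up.

```python
def labels(**kv):
    """Build a hashable label set (hypothetical helper for this sketch)."""
    return frozenset(kv.items())


def set_and(v1, v2):
    """Elements of v1 that have a matching label set in v2; values come from v1."""
    return {ls: value for ls, value in v1.items() if ls in v2}


def set_or(v1, v2):
    """All of v1, plus the elements of v2 whose label set is not in v1."""
    result = dict(v1)
    for ls, value in v2.items():
        result.setdefault(ls, value)
    return result


def set_unless(v1, v2):
    """Elements of v1 that have no matching label set in v2."""
    return {ls: value for ls, value in v1.items() if ls not in v2}


vector1 = {labels(instance="a"): 1.0, labels(instance="b"): 2.0}
vector2 = {labels(instance="b"): 20.0, labels(instance="c"): 30.0}

print(set_and(vector1, vector2))
print(set_or(vector1, vector2))
print(set_unless(vector1, vector2))
```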

Binary operator precedence

The following list shows the precedence of binary operators in Prometheus, from highest to lowest.

  1. ^
  2. *, /, %, atan2
  3. +, -
  4. ==, !=, <=, <, >=, >
  5. and, unless
  6. or

Operators on the same precedence level are left-associative. For example, 2 * 3 % 2 is equivalent to (2 * 3) % 2. However, ^ is right-associative, so 2 ^ 3 ^ 2 is equivalent to 2 ^ (3 ^ 2).
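
Python's arithmetic operators happen to follow the same associativity rules, so the two examples can be checked in any Python shell. Note that exponentiation is written ** in Python, whereas PromQL uses ^.

```python
left_assoc = 2 * 3 % 2     # parsed as (2 * 3) % 2
right_assoc = 2 ** 3 ** 2  # parsed as 2 ** (3 ** 2)
mixed = 1 + 2 * 3 ** 2     # power first, then multiplication, then addition

print(left_assoc, right_assoc, mixed)
```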

Vector matching

Operations between vectors attempt to find a matching element in the right-hand side vector for each entry in the left-hand side. There are two basic types of matching behavior: one-to-one and many-to-one/one-to-many.

Vector matching keywords

These vector matching keywords allow matching between series with different label sets by providing:

  • on
  • ignoring

The label lists provided to matching keywords will determine how vectors are combined. Examples can be found in one-to-one vector matches and in many-to-one and one-to-many vector matches.

Group modifiers

These group modifiers allow many-to-one/one-to-many vector matching:

  • group_left
  • group_right

Label lists can be provided to the group modifier which contain labels from the "one" side to be included in the result metrics.

Many-to-one and one-to-many matching are advanced use cases that should be carefully considered. Often, proper use of ignoring(<labels>) provides the desired result.

Group modifiers can only be used for comparison and arithmetic. Operations like and, unless, and or match all possible entries in the right vector by default.

One-to-one vector matches

One-to-one finds a unique pair of entries from each side of the operation. In the default case, that is an operation following the format vector1 <operator> vector2. Two entries match if they have exactly the same set of labels and corresponding label values. The ignoring keyword allows ignoring certain labels when matching, while the on keyword allows reducing the set of considered labels to a provided list:

method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m

This returns a result vector containing the fraction of HTTP requests with status code 500 for each method, measured over the last 5 minutes. Without ignoring(code), there would be no match as the metrics do not share the same set of labels. Entries with methods put and del have no match and will not show in the result:

{method="get"} 0.04 // 24 / 600
{method="post"} 0.05 // 6 / 120
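
The one-to-one match with ignoring(code) can be reproduced with a small Python sketch. The input values below are an assumption: they are chosen so that they agree with the 24 / 600 and 6 / 120 comments in the result above. The helpers match_key and one_to_one are hypothetical names.

```python
def match_key(labels, ignoring=()):
    """Labels used for matching: everything except the ignored ones."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def one_to_one(left, right, op, ignoring=()):
    """Pair each left entry with the single right entry sharing its match key."""
    right_index = {match_key(labels, ignoring): value for labels, value in right}
    result = {}
    for labels, value in left:
        key = match_key(labels, ignoring)
        if key not in right_index:
            continue  # no partner on the right: dropped from the result
        if key in result:
            raise ValueError("many-to-one match: group_left would be required")
        result[key] = op(value, right_index[key])
    return result


# Assumed sample input for method_code:http_errors:rate5m
errors = [
    ({"method": "get", "code": "500"}, 24),
    ({"method": "get", "code": "404"}, 30),
    ({"method": "put", "code": "501"}, 3),
    ({"method": "post", "code": "500"}, 6),
    ({"method": "post", "code": "404"}, 21),
]
# Assumed sample input for method:http_requests:rate5m
requests = [
    ({"method": "get"}, 600),
    ({"method": "del"}, 34),
    ({"method": "post"}, 120),
]

# method_code:http_errors:rate5m{code="500"} / ignoring(code) method:http_requests:rate5m
errors_500 = [(labels, value) for labels, value in errors if labels["code"] == "500"]
ratio = one_to_one(errors_500, requests, lambda a, b: a / b, ignoring=("code",))
for key, value in ratio.items():
    print(dict(key), value)
```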

Many-to-one and one-to-many vector matches

Many-to-one and one-to-many matching refer to the case where each element on the "one" side can match with multiple elements on the "many" side. This has to be explicitly requested using the group_left or group_right modifiers, where left/right determines which vector has the higher cardinality.

<vector expr> <bin-op> ignoring(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> ignoring(<label list>) group_right(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_left(<label list>) <vector expr>
<vector expr> <bin-op> on(<label list>) group_right(<label list>) <vector expr>

The label list provided with the group modifier contains additional labels from the "one" side to be included in the result metrics. For on, a label can only appear in one of the lists. Every time series of the result vector must be uniquely identifiable.

Example query:

method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m

In this case, the left vector contains more than one entry per method label value. Thus, we indicate this using group_left. The elements from the right side are now matched with multiple elements with the same method label on the left:

{method="get", code="500"} 0.04 // 24 / 600
{method="get", code="404"} 0.05 // 30 / 600
{method="post", code="500"} 0.05 // 6 / 120
{method="post", code="404"} 0.175 // 21 / 120
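
A many-to-one match can be sketched in Python in much the same way: every entry on the "many" side looks up its single partner on the "one" side and keeps its own labels. The sample input is an assumption chosen to agree with the comments in the result above, and group_left here is just a hypothetical function name mirroring the modifier.

```python
def match_key(labels, ignoring=()):
    """Labels used for matching: everything except the ignored ones."""
    return frozenset((k, v) for k, v in labels.items() if k not in ignoring)


def group_left(many, one, op, ignoring=()):
    """Many-to-one match: several left entries may share one right entry."""
    one_index = {match_key(labels, ignoring): value for labels, value in one}
    result = []
    for labels, value in many:
        key = match_key(labels, ignoring)
        if key in one_index:  # left entries without a partner are dropped
            result.append((labels, op(value, one_index[key])))
    return result


# Assumed sample input for method_code:http_errors:rate5m (the "many" side)
errors = [
    ({"method": "get", "code": "500"}, 24),
    ({"method": "get", "code": "404"}, 30),
    ({"method": "put", "code": "501"}, 3),
    ({"method": "post", "code": "500"}, 6),
    ({"method": "post", "code": "404"}, 21),
]
# Assumed sample input for method:http_requests:rate5m (the "one" side)
requests = [
    ({"method": "get"}, 600),
    ({"method": "del"}, 34),
    ({"method": "post"}, 120),
]

# method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
fractions = group_left(errors, requests, lambda a, b: a / b, ignoring=("code",))
for labels, value in fractions:
    print(labels, value)
```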

Aggregation operators

Prometheus supports the following built-in aggregation operators which can be used to aggregate the elements of a single instant vector, resulting in a new vector of fewer elements with aggregated values:

  • sum (calculate sum over dimensions)
  • min (select minimum over dimensions)
  • max (select maximum over dimensions)
  • avg (calculate the average over dimensions)
  • group (all values in the resulting vector are 1)
  • stddev (calculate population standard deviation over dimensions)
  • stdvar (calculate population standard variance over dimensions)
  • count (count number of elements in the vector)
  • count_values (count number of elements with the same value)
  • bottomk (smallest k elements by sample value)
  • topk (largest k elements by sample value)
  • quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)

These operators can be used to aggregate over all label dimensions or preserve distinct dimensions by including a without or by clause. These clauses may be used before or after the expression.

<aggr-op> [without|by (<label list>)] ([parameter,] <vector expression>)

or

<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]

label list is a list of unquoted labels that may include a trailing comma, i.e., both (label1, label2) and (label1, label2,) are valid syntax.

without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.

parameter is only required for count_values, quantile, topk, and bottomk.

count_values outputs one time series per unique sample value. Each series has an additional label. The name of that label is given by the aggregation parameter, and the label value is the unique sample value. The value of each time series is the number of times that sample value was present.

topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, is returned in the result vector. by and without are only used to bucket the input vector.

quantile calculates the φ-quantile, the value ranked at number φ*N among the N metric values of the aggregated dimensions. φ is provided as the aggregation parameter. For example, quantile(0.5, ...) calculates the median, quantile(0.95, ...) the 95th percentile. For φ = NaN, NaN is returned. For φ < 0, -Inf is returned. For φ > 1, +Inf is returned.
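
As a rough illustration of the quantile rules, here is a Python sketch. It assumes linear interpolation between the two neighbouring ranks, which is one common way to define a φ-quantile. Treat the interpolation detail as an assumption of this sketch rather than a statement about Prometheus internals. The edge cases for NaN, φ < 0 and φ > 1 follow the paragraph above.

```python
import math


def quantile(phi, values):
    """φ-quantile of a list of sample values (sketch with linear interpolation)."""
    if not values or math.isnan(phi):
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf
    ordered = sorted(values)
    rank = phi * (len(ordered) - 1)          # fractional position in the sorted list
    lower = int(math.floor(rank))
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower                    # how far we are between the two neighbours
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


print(quantile(0.5, [10, 20, 30]))   # median of an odd number of values
print(quantile(0.5, [1, 2, 3, 4]))   # median of an even number of values
print(quantile(1.5, [1, 2, 3, 4]))   # φ > 1
```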

Example:

If the metric http_requests_total had time series that fan out by application, instance, and group labels, we could calculate the total number of HTTP requests seen per application and group across all instances via:

sum without (instance) (http_requests_total)

Which is equivalent to:

sum by (application, group) (http_requests_total)

If we are only interested in the total of HTTP requests we have seen in all applications, we could simply write:

sum(http_requests_total)

To count the number of binaries running each build version, we could write:

count_values("version", build_version)

To get the 5 largest HTTP request counts across all instances, we could write:

topk(5, http_requests_total)
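
To see why sum without (instance) and sum by (application, group) give the same result for this metric, the aggregation can be mimicked in Python. This is a sketch with made-up sample values. The functions sum_by, sum_without and topk are hypothetical stand-ins for the PromQL operators of the same name.

```python
from collections import defaultdict


def sum_by(vector, by):
    """Keep only the listed labels and add up the values per group."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple(sorted((name, labels.get(name, "")) for name in by))
        totals[group] += value
    return dict(totals)


def sum_without(vector, without):
    """Remove the listed labels, keep all others, and add up the values per group."""
    totals = defaultdict(float)
    for labels, value in vector:
        group = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        totals[group] += value
    return dict(totals)


def topk(k, vector):
    """Return the k samples with the largest values, original labels included."""
    return sorted(vector, key=lambda sample: sample[1], reverse=True)[:k]


# Made-up samples for http_requests_total
http_requests_total = [
    ({"application": "shop", "group": "canary", "instance": "0"}, 10),
    ({"application": "shop", "group": "canary", "instance": "1"}, 20),
    ({"application": "shop", "group": "production", "instance": "0"}, 100),
    ({"application": "blog", "group": "production", "instance": "0"}, 5),
]

per_app_and_group = sum_without(http_requests_total, without=("instance",))
same_result = sum_by(http_requests_total, by=("application", "group"))
print(per_app_and_group == same_result)
print(topk(2, http_requests_total))
```

The two forms only coincide because application, group and instance are the only labels here. With an extra label, without would keep it and by would drop it.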

Operators for native histograms

Native histograms are an experimental feature. Ingestion of native histograms has to be enabled via a feature flag. Once native histograms have been ingested, they can be queried (even after the feature flag has been disabled again). However, operator support for native histograms is still very limited.

The logical/set binary operators work as expected even if histogram samples are involved. They only check for the existence of a vector element and do not change their behavior depending on the sample type of an element (float or histogram).

The binary operator + between two native histograms and the sum aggregation operator to aggregate native histograms are fully supported. Even if the histograms involved have different bucket layouts, the buckets are automatically converted appropriately so that the operation can be performed. (With the currently supported bucket schemas, this is always possible.) If either of the operands has to sum a mix of histogram samples and float samples, the corresponding vector element is removed from the output vector entirely.

All other operators do not behave in a meaningful way. They treat the histogram sample as if it were a float sample of value 0 or (in the case of arithmetic operations between a scalar and a vector) leave the histogram sample unchanged. This behavior will change to a meaningful one before native histograms are a stable feature.

Subquery

Return the 5-minute rate of the http_requests_total metric for the past 30 minutes, with a resolution of 1 minute.

rate(http_requests_total[5m])[30m:1m]

This is an example of a nested subquery. The subquery for the deriv function uses the default resolution. Note that using subqueries where they are not needed is unwise.

max_over_time(deriv(rate(distance_covered_total[5s])[30s:5s])[10m:])

Using functions, operators, etc.

Return the per-second rate for all time series with the http_requests_total metric name, as measured over the last 5 minutes:

rate(http_requests_total[5m])

Assuming that the http_requests_total time series all have the labels job (fanout by job name) and instance (fanout by instance of the job), we might want to sum up the rate over all instances, so we get fewer output time series, but still preserve the job dimension:

sum by (job) (
rate(http_requests_total[5m])
)

If we have two different metrics with the same dimensional labels, we can apply binary operators to them and elements on both sides with the same label set will be matched and propagated to the output. For example, this expression returns the unused memory in MiB for every instance (on a fictional cluster scheduler exposing these metrics about the instances it runs):

(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

The same expression, but summed by application and process type, could be written like this:

sum by (app, proc) (
instance_memory_limit_bytes - instance_memory_usage_bytes
) / 1024 / 1024

If the same fictional cluster scheduler exposed CPU usage metrics like the following for every instance:

instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...

We could get the top 3 CPU users grouped by application (app) and process type (proc) like this:

topk(3, sum by (app, proc) (rate(instance_cpu_time_ns[5m])))

Assuming this metric contains one time series per running instance, you could count the number of running instances per application like this:

count by (app) (instance_cpu_time_ns)

Functions

https://prometheus.io/docs/prometheus/latest/querying/functions/

There's no point covering every function here; the link above has the full list. The ones below are those I consider most relevant in the DevOps world. You don't need to memorize them: know that they exist and consult the documentation as needed. We'll see more when we cover Grafana.

Functions produce a new set of samples from their input. Each entry below shows the function together with the parameters it expects, in the form function(params).

  • abs(v instant-vector): returns the input vector with all sample values converted to their absolute value.
  • absent(v instant-vector): returns an empty vector if the vector passed to it has any elements or the value 1 if the vector passed to it has no elements. This is useful for alerting on when no time series exist for a given metric name and label combination.
    • absent(nonexistent{job="myjob"})
  • absent_over_time(v range-vector): returns an empty vector if the range vector passed to it has any elements or a 1-element vector with the value 1 if the range vector passed to it has no elements.
  • ceil(v instant-vector): rounds the sample values of all elements in v up to the nearest integer.
  • changes(v range-vector): for each input time series, returns the number of times its value has changed within the provided time range as an instant vector.
  • clamp(v instant-vector, min scalar, max scalar): clamps the sample values of all elements in v to have a lower limit of min and an upper limit of max.
  • clamp_max(v instant-vector, max scalar): clamps the sample values of all elements in v to have an upper limit of max.
  • clamp_min(v instant-vector, min scalar): clamps the sample values of all elements in v to have a lower limit of min.
  • day_of_month(v=vector(time()) instant-vector): returns the day of the month for each of the given times in UTC. Returned values are from 1 to 31.
  • day_of_week(v=vector(time()) instant-vector): returns the day of the week for each of the given times in UTC. Returned values are from 0 to 6, where 0 means Sunday, etc.
  • day_of_year(v=vector(time()) instant-vector): returns the day of the year for each of the given times in UTC. Returned values are from 1 to 365 for non-leap years, and 1 to 366 in leap years.
  • days_in_month(v=vector(time()) instant-vector): returns the number of days in the month for each of the given times in UTC.
  • minute(v=vector(time()) instant-vector): returns the minute of the hour for each of the given times in UTC. Returned values are from 0 to 59.
  • hour(v=vector(time()) instant-vector): returns the hour of the day for each of the given times in UTC. Returned values are from 0 to 23.
  • month(v=vector(time()) instant-vector): returns the month of the year for each of the given times in UTC. Returned values are from 1 to 12, where 1 means January, etc.
  • year(v=vector(time()) instant-vector): returns the year for each of the given times in UTC.
  • time(): returns the number of seconds since January 1, 1970 UTC. Note that this does not actually return the current time, but the time at which the expression is to be evaluated.
  • timestamp(v instant-vector): returns the timestamp of each of the samples of the given vector as the number of seconds since January 1, 1970 UTC.
  • delta(v range-vector): calculates the difference between the first and last value of each time series element in a range vector, returning an instant vector with the given deltas and equivalent labels. The following example expression returns the difference in CPU temperature between now and 2 hours ago:
    • delta(cpu_temp_celsius{host="zeus"}[2h])
    • delta should only be used with gauges.
  • idelta(v range-vector): calculates the difference between the last two samples in the range vector v, returning an instant vector with the given deltas and equivalent labels.
  • floor(v instant-vector): rounds the sample values of all elements in v down to the nearest integer.
  • holt_winters(v range-vector, sf scalar, tf scalar): produces a smoothed value for time series based on the range in v. The lower the smoothing factor sf, the more importance is given to old data. The higher the trend factor tf, the more trends in the data are considered. Both sf and tf must be between 0 and 1.
  • increase(v range-vector): calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.
  • label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...): joins all the values of all the src_labels using separator and returns the time series with the label dst_label containing the joined value. There can be any number of src_labels in this function.
    • This example will return a vector with each time series having a foo label with the value a,b,c added to it: label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")
  • label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string): matches the regular expression regex against the value of the label src_label. If it matches, the value of the label dst_label in the returned time series will be the expansion of replacement, together with the original labels in the input. Capturing groups in the regular expression can be referenced with $1, $2, etc. If the regular expression doesn't match, the time series is returned unchanged.
  • rate(v range-vector): calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period.
  • irate(v range-vector): calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
  • resets(v range-vector): returns the number of counter resets within the provided time range as an instant vector. Any decrease in value between two consecutive samples is interpreted as a counter reset.
    • resets should only be used with counters.
  • round(v instant-vector, to_nearest=1 scalar): rounds the sample values of all elements in v to the nearest integer. Ties are resolved by rounding up. The optional to_nearest argument allows specifying the nearest multiple to which the sample values should be rounded. This multiple may also be a fraction.
  • sgn(v instant-vector): returns a vector with all sample values converted to their sign, defined as: 1 if v is positive, -1 if v is negative and 0 if v is equal to zero.
  • sort(v instant-vector): returns vector elements sorted by their sample values, in ascending order.
  • sort_desc(v instant-vector): Same as sort, but sorts in descending order.
  • vector(s scalar): returns the scalar s as a vector with no labels.
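
Since rate, increase and resets are the functions you will meet most often, here is a simplified Python sketch of how counter resets are compensated. It deliberately leaves out the extrapolation to the edges of the time range that rate and increase perform, so real results will differ slightly. The function names and samples are made up for this illustration.

```python
def resets(samples):
    """Count decreases between consecutive samples (each one counts as a reset)."""
    values = [value for _, value in samples]
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


def raw_increase(samples):
    """Increase between first and last sample, compensating for counter resets.

    Simplification: no extrapolation to the boundaries of the range.
    """
    if len(samples) < 2:
        return None  # not enough points to compute anything
    values = [value for _, value in samples]
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        # After a reset the counter restarted from 0, so the new value is the increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Per-second average increase between the first and the last sample."""
    increase = raw_increase(samples)
    if increase is None:
        return None
    return increase / (samples[-1][0] - samples[0][0])


# (timestamp in seconds, counter value); the counter restarts between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]

print(resets(samples))
print(raw_increase(samples))
print(simple_rate(samples))
```

A plain last minus first would give a negative number here (40 - 100). Treating the drop as a restart from zero is what keeps the result meaningful for counters.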

The following functions allow aggregating each series of a given range vector over time and return an instant vector with per-series aggregation results:

  • avg_over_time(range-vector): the average value of all points in the specified interval.
  • min_over_time(range-vector): the minimum value of all points in the specified interval.
  • max_over_time(range-vector): the maximum value of all points in the specified interval.
  • sum_over_time(range-vector): the sum of all values in the specified interval.
  • count_over_time(range-vector): the count of all values in the specified interval.
  • quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.
  • stddev_over_time(range-vector): the population standard deviation of the values in the specified interval.
  • stdvar_over_time(range-vector): the population standard variance of the values in the specified interval.
  • last_over_time(range-vector): the most recent point value in the specified interval.
  • present_over_time(range-vector): the value 1 for any series in the specified interval.

Note that all values in the specified interval have the same weight in the aggregation even if the values are not equally spaced throughout the interval.
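
The over-time aggregations can be pictured as ordinary statistics over the values of one series inside the range. The Python sketch below uses made-up, unevenly spaced samples and shows that timestamps play no role in the result: every value counts the same.

```python
import statistics

# (timestamp in seconds, value) for one series; note the uneven spacing.
samples = [(0, 2.0), (15, 4.0), (30, 4.0), (120, 10.0)]
values = [value for _, value in samples]

over_time = {
    "avg_over_time": sum(values) / len(values),
    "min_over_time": min(values),
    "max_over_time": max(values),
    "sum_over_time": sum(values),
    "count_over_time": len(values),
    "stddev_over_time": statistics.pstdev(values),    # population standard deviation
    "stdvar_over_time": statistics.pvariance(values),  # population variance
    "last_over_time": values[-1],
    "present_over_time": 1,
}

for name, result in over_time.items():
    print(name, result)
```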

There are also logarithmic, trigonometric, and other functions.