Commit c929a5a

authored May 1, 2025
Add topkif (#271)
1 parent cb23190 commit c929a5a

4 files changed, +181 -11 lines changed

‎apl/aggregation-function/statistical-functions.mdx

+2-1
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ The table summarizes the aggregation functions available in APL. Use all these a
3636
| [stdevif](/apl/aggregation-function/stdevif) | Calculates the standard deviation of an expression in records for which the predicate evaluates to true. |
3737
| [sum](/apl/aggregation-function/sum) | Calculates the sum of an expression across the group. |
3838
| [sumif](/apl/aggregation-function/sumif) | Calculates the sum of an expression in records for which the predicate evaluates to true. |
39-
| [topk](/apl/aggregation-function/topk) | calculates the top values of an expression across the group in a dataset. |
39+
| [topk](/apl/aggregation-function/topk) | Calculates the top values of an expression across the group in a dataset. |
40+
| [topkif](/apl/aggregation-function/topkif) | Calculates the top values of an expression in records for which the predicate evaluates to true. |
4041
| [variance](/apl/aggregation-function/variance) | Calculates the variance of an expression across the group. |
4142
| [varianceif](/apl/aggregation-function/varianceif) | Calculates the variance of an expression in records for which the predicate evaluates to true. |

‎apl/aggregation-function/topk.mdx

+11-10
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ title: topk
33
description: 'This page explains how to use the topk aggregation function in APL.'
44
---
55

6-
The `topk` aggregation in Axiom Processing Language (APL) allows you to identify the top *k* results based on a specified field. This is especially useful when you want to quickly analyze large datasets and extract the most significant values, such as the top-performing queries, most frequent errors, or highest latency requests.
6+
The `topk` aggregation in Axiom Processing Language (APL) allows you to identify the top `k` results based on a specified field. This is especially useful when you want to quickly analyze large datasets and extract the most significant values, such as the top-performing queries, most frequent errors, or highest latency requests.
77

88
Use `topk` to find the most common or relevant entries in datasets, especially in log analysis, telemetry data, and monitoring systems. This aggregation helps you focus on the most important data points, filtering out the noise.
99

@@ -38,7 +38,7 @@ The main difference between `top` (supported by both SPL and APL) and `topk` (su
3838
</Accordion>
3939
<Accordion title="ANSI SQL users">
4040

41-
In ANSI SQL, identifying the top *k* rows often involves using the `ORDER BY` and `LIMIT` clauses. While the logic remains similar, APL’s `topk` simplifies this process by directly returning the top *k* values of a field in an aggregation.
41+
In ANSI SQL, identifying the top `k` rows often involves using the `ORDER BY` and `LIMIT` clauses. While the logic remains similar, APL’s `topk` simplifies this process by directly returning the top `k` values of a field in an aggregation.
4242

4343
The main difference between SQL’s solution and APL’s `topk` is that `topk` is estimated. This means that APL’s `topk` is faster and less resource-intensive, but less accurate than SQL’s combination of `ORDER BY` and `LIMIT` clauses.
4444

@@ -65,17 +65,17 @@ LIMIT 5;
6565
### Syntax
6666

6767
```kusto
68-
topk(field, k)
68+
topk(Field, k)
6969
```
7070

7171
### Parameters
7272

73-
- **`field`**: The field or expression to rank the results by.
74-
- **`k`**: The number of top results to return.
73+
- `Field`: The field or expression to rank the results by.
74+
- `k`: The number of top results to return.
7575

7676
### Returns
7777

78-
A subset of the original dataset with the top *k* values based on the specified field.
78+
A subset of the original dataset with the top `k` values based on the specified field.
7979

8080
## Use case examples
8181

@@ -162,7 +162,8 @@ This query returns the top 5 cities based on the number of HTTP requests.
162162

163163
## List of related aggregations
164164

165-
- [**top**](/apl/tabular-operators/top-operator): Returns the top values based on a field without requiring a specific number of results (`k`), making it useful when you're unsure how many top values to retrieve.
166-
- [**sort**](/apl/tabular-operators/sort-operator): Orders the dataset based on one or more fields, which is useful if you need a complete ordered list rather than the top *k* values.
167-
- [**extend**](/apl/tabular-operators/extend-operator): Adds calculated fields to your dataset, which can be useful in combination with `topk` to create custom rankings.
168-
- [**count**](/apl/aggregation-function/count): Aggregates the dataset by counting occurrences, often used in conjunction with `topk` to find the most common values.
165+
- [top](/apl/tabular-operators/top-operator): Returns the top values based on a field without requiring a specific number of results (`k`), making it useful when you're unsure how many top values to retrieve.
166+
- [topkif](/apl/aggregation-function/topkif): Returns the top `k` results among the records for which a predicate evaluates to true. Use `topkif` when you need to restrict your analysis to a subset.
167+
- [sort](/apl/tabular-operators/sort-operator): Orders the dataset based on one or more fields, which is useful if you need a complete ordered list rather than the top `k` values.
168+
- [extend](/apl/tabular-operators/extend-operator): Adds calculated fields to your dataset, which can be useful in combination with `topk` to create custom rankings.
169+
- [count](/apl/aggregation-function/count): Aggregates the dataset by counting occurrences, often used in conjunction with `topk` to find the most common values.

‎apl/aggregation-function/topkif.mdx

+167
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,167 @@
1+
---
2+
title: topkif
3+
description: 'This page explains how to use the topkif aggregation in APL.'
4+
---
5+
6+
The `topkif` aggregation in Axiom Processing Language (APL) allows you to identify the top `k` values based on a specified field, while also applying a filter on another field. Use `topkif` when you want to find the most significant entries that meet specific criteria, such as the top-performing queries from a particular service, the most frequent errors for a specific HTTP method, or the highest latency requests from a specific country.
7+
8+
Use `topkif` when you need to focus on the most important filtered subsets of data, especially in log analysis, telemetry data, and monitoring systems. This aggregation helps you quickly zoom in on significant values without scanning the entire dataset.
9+
10+
<Note>
11+
The `topkif` aggregation in APL is a statistical aggregation that returns estimated results. The estimation provides the benefit of speed at the expense of precision. This means that `topkif` is fast and light on resources even on large or high-cardinality datasets but does not provide completely accurate results.
12+
13+
For completely accurate results, use the [top operator](/apl/tabular-operators/top-operator) together with a filter.
14+
</Note>
15+
16+
## For users of other query languages
17+
18+
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
19+
20+
<AccordionGroup>
21+
<Accordion title="Splunk SPL users">
22+
23+
Splunk SPL does not have a direct equivalent to the `topkif` function. You can achieve similar results by using the top command combined with a where clause, which is closer to using APL’s top operator with a filter. However, APL’s `topkif` provides a more optimized, estimated solution when you want speed and efficiency.
24+
25+
<CodeGroup>
26+
```sql Splunk example
27+
| where method="GET" | top limit=5 status
28+
```
29+
30+
```kusto APL equivalent
31+
['sample-http-logs']
32+
| summarize topkif(status, 5, method == 'GET')
33+
```
34+
</CodeGroup>
35+
36+
</Accordion>
37+
<Accordion title="ANSI SQL users">
38+
39+
In ANSI SQL, identifying the top `k` rows filtered by a condition often involves a WHERE clause followed by ORDER BY and LIMIT. APL’s `topkif` simplifies this by combining the filtering and top-k selection in one function.
40+
41+
<CodeGroup>
42+
```sql SQL example
43+
SELECT status, COUNT(*)
44+
FROM sample_http_logs
45+
WHERE method = 'GET'
46+
GROUP BY status
47+
ORDER BY COUNT(*) DESC
48+
LIMIT 5;
49+
```
50+
51+
```kusto APL equivalent
52+
['sample-http-logs']
53+
| summarize topkif(status, 5, method == 'GET')
54+
```
55+
</CodeGroup>
56+
57+
</Accordion>
58+
</AccordionGroup>
59+
60+
# Usage
61+
62+
## Syntax
63+
64+
```kusto
65+
topkif(Field, k, Condition)
66+
```
67+
68+
## Parameters
69+
70+
- `Field`: The field or expression to rank the results by.
71+
- `k`: The number of top results to return.
72+
- `Condition`: A logical expression that specifies the filtering condition.
73+
74+
## Returns
75+
76+
A subset of the original dataset containing the top `k` values based on the specified field, after applying the filter condition.
77+
78+
# Use case examples
79+
80+
<Tabs>
81+
<Tab title="Log analysis">
82+
83+
Use `topkif` when analyzing HTTP logs to find the top 5 most frequent HTTP status codes for GET requests.
84+
85+
**Query**
86+
87+
```kusto
88+
['sample-http-logs']
89+
| summarize topkif(status, 5, method == 'GET')
90+
```
91+
92+
[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20topkif(status%2C%205%2C%20method%20%3D%3D%20'GET')%22%7D)
93+
94+
**Output**
95+
96+
| status | count_ |
97+
|--------|--------|
98+
| 200 | 900 |
99+
| 404 | 250 |
100+
| 500 | 100 |
101+
| 301 | 90 |
102+
| 302 | 60 |
103+
104+
This query groups GET requests by HTTP status and returns the 5 most frequent statuses.
105+
106+
</Tab>
107+
<Tab title="OpenTelemetry traces">
108+
109+
Use `topkif` in OpenTelemetry traces to find the top five services among spans of kind `server`.
110+
111+
**Query**
112+
113+
```kusto
114+
['otel-demo-traces']
115+
| summarize topkif(['service.name'], 5, kind == 'server')
116+
```
117+
118+
[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'otel-demo-traces'%5D%20%7C%20summarize%20topkif(%5B'service.name'%5D%2C%205%2C%20kind%20%3D%3D%20'server')%22%7D)
119+
120+
**Output**
121+
122+
| service.name | count_ |
123+
|-------------|------------|
124+
| frontend-proxy | 99,573 |
125+
| frontend | 91,800 |
126+
| product-catalog | 29,696 |
127+
| image-provider | 25,223 |
128+
| flagd | 10,336 |
129+
130+
This query returns the top five services, counting only the spans where `kind` is `server`.
131+
132+
</Tab>
133+
<Tab title="Security logs">
134+
135+
Use `topkif` in security log analysis to find the top 5 cities generating GET HTTP requests.
136+
137+
**Query**
138+
139+
```kusto
140+
['sample-http-logs']
141+
| summarize topkif(['geo.city'], 5, method == 'GET')
142+
```
143+
144+
[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B'sample-http-logs'%5D%20%7C%20summarize%20topkif(%5B'geo.city'%5D%2C%205%2C%20method%20%3D%3D%20'GET')%22%7D)
145+
146+
**Output**
147+
148+
| geo.city | count_ |
149+
|----------|--------|
150+
| New York | 300 |
151+
| London | 250 |
152+
| Paris | 200 |
153+
| Tokyo | 180 |
154+
| Berlin | 160 |
155+
156+
This query returns the top 5 cities generating the most GET HTTP requests.
157+
158+
</Tab>
159+
</Tabs>
160+
161+
# List of related aggregations
162+
163+
- [topk](/apl/aggregation-function/topk): Returns the top `k` results without filtering. Use topk when you do not need to restrict your analysis to a subset.
164+
- [top](/apl/tabular-operators/top-operator): Returns the top results based on a field with accurate results. Use top when precision is important.
165+
- [sort](/apl/tabular-operators/sort-operator): Sorts the dataset based on one or more fields. Use sort if you need full ordered results.
166+
- [extend](/apl/tabular-operators/extend-operator): Adds calculated fields to your dataset, useful before applying topkif to create new fields to rank.
167+
- [count](/apl/aggregation-function/count): Counts occurrences in the dataset. Use count when you only need counts without focusing on the top entries.

‎docs.json

+1
Original file line numberDiff line numberDiff line change
@@ -395,6 +395,7 @@
395395
"apl/aggregation-function/sum",
396396
"apl/aggregation-function/sumif",
397397
"apl/aggregation-function/topk",
398+
"apl/aggregation-function/topkif",
398399
"apl/aggregation-function/variance",
399400
"apl/aggregation-function/varianceif"
400401
]
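
The Note added in `topkif.mdx` recommends the `top` operator together with a filter when exact results matter. Below is a minimal sketch of what an exact counterpart to `topkif(status, 5, method == 'GET')` might look like. It assumes the `sample-http-logs` dataset and the `status` and `method` fields used in the examples above. The column name `request_count` is hypothetical and chosen only for illustration.

```kusto
['sample-http-logs']
// Keep only the records that the topkif predicate would match.
| where method == 'GET'
// Count the matching records per status code (exact, not estimated).
// request_count is an illustrative column name, not part of the commit.
| summarize request_count = count() by status
// Keep the 5 status codes with the highest counts.
| top 5 by request_count
```

Unlike `topkif`, this form counts every matching record, so it trades the speed of the estimated aggregation for exact counts.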
