Get count from split metadata on simple time range query #5758

tontinton · 2025-04-19T17:05:17Z

I think I found a nice optimization here while starting to research the search query code. @rdettai

Count queries can return much faster by not downloading splits, and I couldn't think of a good reason to always download split files on time range queries, if the split time range is fully contained in the search request time range.

Split the PR into 2: #5759.

rdettai

Nice optimization. I think it deserves a unit test. E.g the following test scenario:

The mock metastore returns 4 splits (1 outside the range, 1 overlapping the star, 1 overlapping the end, 1 overlapping the whole range and 1 within the range.
The mock search service expects only 1 call.

rdettai · 2025-04-22T08:02:58Z

quickwit/quickwit-search/src/root.rs

+        let start_time = split.time_range.as_ref().map(|x| x.start()).copied();
+        let end_time = split.time_range.as_ref().map(|x| x.end()).copied();
+        if is_metadata_count_request(search_request, start_time, end_time) {


Nit, to make search_partial_hits_phase a bit more readable, I would pass the time_range directly to is_metadata_count_request

rdettai · 2025-04-22T08:09:58Z

quickwit/quickwit-search/src/root.rs

+                jobs_to_leaf_request(search_request, indexes_metas_for_leaf_search, client_jobs)?;
+            leaf_request_tasks.push(cluster_client.leaf_search(leaf_request, client.clone()));
+        }
+        leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?);


Nit, moving it to a separate line would help see a bit more clearly how errors are resolved.

Suggested change

leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?);

let executed_leaf_search_responses = try_join_all(leaf_request_tasks).await?;

leaf_search_responses.extend(executed_leaf_search_responses);

tontinton · 2025-04-22T20:40:48Z

Nice optimization. I think it deserves a unit test. E.g the following test scenario:

* The mock metastore returns 4 splits (1 outside the range, 1 overlapping the star, 1 overlapping the end, 1 overlapping the whole range and 1 within the range.

* The mock search service expects only 1 call.

Will add tests when I have a bit more time.

rdettai

Thanks for adding a test. Unfortunately I think it is flawed (so was my example of test earlier, I should have written "the mock search service expects to be called for 3 splits")

rdettai · 2025-04-30T07:51:23Z

quickwit/quickwit-search/src/root.rs

+        mock_search.expect_leaf_search().times(1).returning(|_req| {
+            Ok(quickwit_proto::search::LeafSearchResponse {
+                num_hits: 1,
+                partial_hits: vec![mock_partial_hit("split_inside", 1, 1)],
+                failed_splits: Vec::new(),
+                num_attempted_splits: 1,
+                ..Default::default()
+            })
+        });


asserting that _req contains the right splits is the important part of this test

the mock response should be all splits except "split_inside" which is the one we resolve at the root level 🙃

rdettai · 2025-04-30T07:53:30Z

quickwit/quickwit-search/src/root.rs

+        assert_eq!(resp.num_hits, 1);
+        assert_eq!(resp.hits.len(), 1);


This is not the response I would have expected here. num_hits should be 10 (num docs in spit inside) + whatever you decide leaf_search returns for the overlapping splits.

rdettai · 2025-04-30T07:56:12Z

quickwit/quickwit-search/src/root.rs

+            end_timestamp: Some(129_000),
+            index_id_patterns: vec!["test-index".to_string()],
+            query_ast: qast_json_helper("test", &["body"]),
+            max_hits: 10,


your change is not expected to run if max_hits!=0

tontinton · 2025-04-30T19:22:19Z

Thanks for adding a test. Unfortunately I think it is flawed (so was my example of test earlier, I should have written "the mock search service expects to be called for 3 splits")

You're totally right, I was supposed to be making a count query, fixed now.

quickwit/quickwit-search/src/root.rs

rdettai · 2025-05-14T14:54:19Z

quickwit/quickwit-search/src/root.rs

+    if let Some(request_start_timestamp) = request.start_timestamp {
+        let Some(split_start_timestamp) = split_start_timestamp else {
+            return false;
+        };
+        if split_start_timestamp < request_start_timestamp {
+            return false;
+        }
+    }


nit, I wonder whether this wouldn't be easier to read:

Suggested change

if let Some(request_start_timestamp) = request.start_timestamp {

let Some(split_start_timestamp) = split_start_timestamp else {

return false;

};

if split_start_timestamp < request_start_timestamp {

return false;

}

}

match (request.start_timestamp, split_start_timestamp) {

(Some(request_start), Some(split_start)) if split_start >= request_start => {}

(Some(_), _) => return false,

(None, _) => {}

}

rdettai · 2025-05-14T15:15:04Z

quickwit/quickwit-search/src/root.rs

+                    num_hits: req.leaf_requests[0]
+                        .split_offsets
+                        .iter()
+                        .map(|s| s.num_docs)
+                        .sum(),


We could make this test more powerful by setting a smaller number here.

rdettai · 2025-05-14T15:24:13Z

quickwit/quickwit-search/src/root.rs

+        assert_eq!(resp.num_hits, 50);
+        assert_eq!(resp.hits.len(), 0);


it is missing from the other tests, but would be nice to also assert resp.num_successful_splits here

rdettai · 2025-05-14T15:42:46Z

last minute thought, you could also add some tests in https://github.com/tontinton/quickwit/blob/926a2f33d7b35a5cee064adc457c4503f96dc725/quickwit/rest-api-tests/scenarii/qw_search_api/0001_ts_range.yaml to double check the limit conditions, e.g:

endpoint: simple/search
params:
  query: "*"
  start_timestamp: 1684993000
  end_timestamp: 1684993004
expected:
  num_hits: 3

should confirm that the upper bound exclusion condition is correct and stays that way.

tontinton mentioned this pull request Apr 19, 2025

Optimize simple time ranged search queries #5759

Open

tontinton changed the title ~~Get count from split metadata on simple query with timerange~~ Get count from split metadata on simple query with time range Apr 19, 2025

tontinton changed the title ~~Get count from split metadata on simple query with time range~~ Get count from split metadata on simple time range query Apr 19, 2025

rdettai reviewed Apr 22, 2025

View reviewed changes

tontinton force-pushed the optimize-timestamp-range-count branch from de8dde6 to f3ab102 Compare April 22, 2025 20:40

tontinton requested a review from rdettai April 29, 2025 17:38

rdettai requested changes Apr 30, 2025

View reviewed changes

tontinton force-pushed the optimize-timestamp-range-count branch from c535af6 to 89f26f1 Compare April 30, 2025 19:21

tontinton requested a review from rdettai April 30, 2025 19:22

tontinton commented Apr 30, 2025

View reviewed changes

quickwit/quickwit-search/src/root.rs Show resolved Hide resolved

tontinton force-pushed the optimize-timestamp-range-count branch from 89f26f1 to 926a2f3 Compare April 30, 2025 20:48

rdettai approved these changes May 14, 2025

View reviewed changes

tontinton mentioned this pull request May 14, 2025

Remove timerange root search #5760

Open

tontinton force-pushed the optimize-timestamp-range-count branch from 926a2f3 to 2a6a590 Compare May 14, 2025 18:49

tontinton requested a review from rdettai May 14, 2025 18:49

Get count from split metadata on simple query with timerange

e42f8d4

tontinton force-pushed the optimize-timestamp-range-count branch 3 times, most recently from 545e9d8 to 4f1812e Compare May 15, 2025 12:07

Add tests to search splits contained in time range

8c4d360

tontinton force-pushed the optimize-timestamp-range-count branch from 4f1812e to 8c4d360 Compare May 15, 2025 12:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get count from split metadata on simple time range query #5758

Get count from split metadata on simple time range query #5758

tontinton commented Apr 19, 2025 •

edited

Loading

rdettai left a comment

rdettai Apr 22, 2025

rdettai Apr 22, 2025

tontinton commented Apr 22, 2025

rdettai left a comment •

edited

Loading

rdettai Apr 30, 2025

rdettai Apr 30, 2025

rdettai Apr 30, 2025

tontinton commented Apr 30, 2025 •

edited

Loading

rdettai May 14, 2025

rdettai May 14, 2025

rdettai May 14, 2025

rdettai commented May 14, 2025 •

edited

Loading

	leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?);
	let executed_leaf_search_responses = try_join_all(leaf_request_tasks).await?;
	leaf_search_responses.extend(executed_leaf_search_responses);

		assert_eq!(resp.num_hits, 1);
		assert_eq!(resp.hits.len(), 1);

		assert_eq!(resp.num_hits, 50);
		assert_eq!(resp.hits.len(), 0);

Get count from split metadata on simple time range query #5758

Are you sure you want to change the base?

Get count from split metadata on simple time range query #5758

Conversation

tontinton commented Apr 19, 2025 • edited Loading

rdettai left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tontinton commented Apr 22, 2025

rdettai left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tontinton commented Apr 30, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rdettai commented May 14, 2025 • edited Loading

tontinton commented Apr 19, 2025 •

edited

Loading

rdettai left a comment •

edited

Loading

tontinton commented Apr 30, 2025 •

edited

Loading

rdettai commented May 14, 2025 •

edited

Loading