-
Notifications
You must be signed in to change notification settings - Fork 454
Get count from split metadata on simple time range query #5758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Get count from split metadata on simple time range query #5758
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization. I think it deserves a unit test. E.g the following test scenario:
- The mock metastore returns 4 splits (1 outside the range, 1 overlapping the star, 1 overlapping the end, 1 overlapping the whole range and 1 within the range.
- The mock search service expects only 1 call.
quickwit/quickwit-search/src/root.rs
Outdated
let start_time = split.time_range.as_ref().map(|x| x.start()).copied(); | ||
let end_time = split.time_range.as_ref().map(|x| x.end()).copied(); | ||
if is_metadata_count_request(search_request, start_time, end_time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, to make search_partial_hits_phase
a bit more readable, I would pass the time_range
directly to is_metadata_count_request
quickwit/quickwit-search/src/root.rs
Outdated
jobs_to_leaf_request(search_request, indexes_metas_for_leaf_search, client_jobs)?; | ||
leaf_request_tasks.push(cluster_client.leaf_search(leaf_request, client.clone())); | ||
} | ||
leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, moving it to a separate line would help see a bit more clearly how errors are resolved.
leaf_search_responses.extend(try_join_all(leaf_request_tasks).await?); | |
let executed_leaf_search_responses = try_join_all(leaf_request_tasks).await?; | |
leaf_search_responses.extend(executed_leaf_search_responses); |
de8dde6
to
f3ab102
Compare
Will add tests when I have a bit more time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding a test. Unfortunately I think it is flawed (so was my example of test earlier, I should have written "the mock search service expects to be called for 3 splits")
quickwit/quickwit-search/src/root.rs
Outdated
mock_search.expect_leaf_search().times(1).returning(|_req| { | ||
Ok(quickwit_proto::search::LeafSearchResponse { | ||
num_hits: 1, | ||
partial_hits: vec![mock_partial_hit("split_inside", 1, 1)], | ||
failed_splits: Vec::new(), | ||
num_attempted_splits: 1, | ||
..Default::default() | ||
}) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- asserting that
_req
contains the right splits is the important part of this test - the mock response should be all splits except "split_inside" which is the one we resolve at the root level 🙃
quickwit/quickwit-search/src/root.rs
Outdated
assert_eq!(resp.num_hits, 1); | ||
assert_eq!(resp.hits.len(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the response I would have expected here. num_hits
should be 10 (num docs in spit inside) + whatever you decide leaf_search
returns for the overlapping splits.
quickwit/quickwit-search/src/root.rs
Outdated
end_timestamp: Some(129_000), | ||
index_id_patterns: vec!["test-index".to_string()], | ||
query_ast: qast_json_helper("test", &["body"]), | ||
max_hits: 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your change is not expected to run if max_hits!=0
c535af6
to
89f26f1
Compare
You're totally right, I was supposed to be making a count query, fixed now. |
89f26f1
to
926a2f3
Compare
quickwit/quickwit-search/src/root.rs
Outdated
if let Some(request_start_timestamp) = request.start_timestamp { | ||
let Some(split_start_timestamp) = split_start_timestamp else { | ||
return false; | ||
}; | ||
if split_start_timestamp < request_start_timestamp { | ||
return false; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, I wonder whether this wouldn't be easier to read:
if let Some(request_start_timestamp) = request.start_timestamp { | |
let Some(split_start_timestamp) = split_start_timestamp else { | |
return false; | |
}; | |
if split_start_timestamp < request_start_timestamp { | |
return false; | |
} | |
} | |
match (request.start_timestamp, split_start_timestamp) { | |
(Some(request_start), Some(split_start)) if split_start >= request_start => {} | |
(Some(_), _) => return false, | |
(None, _) => {} | |
} |
quickwit/quickwit-search/src/root.rs
Outdated
num_hits: req.leaf_requests[0] | ||
.split_offsets | ||
.iter() | ||
.map(|s| s.num_docs) | ||
.sum(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this test more powerful by setting a smaller number here.
quickwit/quickwit-search/src/root.rs
Outdated
assert_eq!(resp.num_hits, 50); | ||
assert_eq!(resp.hits.len(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is missing from the other tests, but would be nice to also assert resp.num_successful_splits
here
last minute thought, you could also add some tests in https://github.com/tontinton/quickwit/blob/926a2f33d7b35a5cee064adc457c4503f96dc725/quickwit/rest-api-tests/scenarii/qw_search_api/0001_ts_range.yaml to double check the limit conditions, e.g:
should confirm that the upper bound exclusion condition is correct and stays that way. |
926a2f3
to
2a6a590
Compare
545e9d8
to
4f1812e
Compare
4f1812e
to
8c4d360
Compare
I think I found a nice optimization here while starting to research the search query code. @rdettai
Count queries can return much faster by not downloading splits, and I couldn't think of a good reason to always download split files on time range queries, if the split time range is fully contained in the search request time range.
Split the PR into 2: #5759.