chore: Add assertion for empty data files for append action #1301
Merged
Xuanwo merged 6 commits into apache:main from dentiny:hjiang/add-assertion-for-empty-data-files on May 10, 2025
3c21958 Add assertion for empty data files for append action (dentiny)
0e1b714 update error message (dentiny)
52b8256 use failed precondition (dentiny)
5b9aaf9 fix unit test (dentiny)
8e1f3b5 Merge branch 'main' into hjiang/add-assertion-for-empty-data-files (dentiny)
9a0b105 rename error kind to confirm naming convension (dentiny)
I think returning a `DataInvalid` error is good enough here, I don't know if invalid argument is needed. cc @Xuanwo
The error description reads to me as "data corruption".
Yes that makes sense, `InvalidArgument` or maybe `InvalidInput` is more intuitive.
It seems safe to just return `Ok(())` when the input iterator is empty. Returning an error here doesn't seem to improve the user experience, since users can't take any action on it. They would have to add something like:
So how about we just check for this and return early?
If the given iterator is empty, we will end up with a manifest file with no manifest entries, with no warning or error raised (exactly the behavior you described). This is something I would like to avoid.
In other words, I think having an empty manifest file doesn't make too much sense.
This is valid and I do agree that we should not generate an empty manifest file.
However, the `add_data_files` function itself should be able to handle empty input safely, as users might use it in loops or pipelines with filters, where encountering an empty iterator is possible. Therefore, I prefer to perform this check when committing the manifest file, rather than within `add_data_files`.
Additionally, I want to note that `add_data_files` can be called multiple times and is not directly mapped to a manifest file.
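To make the proposal in the last few comments concrete, here is a minimal sketch of deferring the emptiness check to commit time. Everything in it is hypothetical and for illustration only: `DataFile`, `AppendAction`, `AppendError::NoDataFiles`, and the `commit` signature are stand-ins, not the actual iceberg-rust types or API, and the real change in this PR may look quite different.

```rust
// Hypothetical stand-in for a data file; NOT the iceberg-rust type.
#[derive(Debug)]
struct DataFile {
    path: String,
}

// Hypothetical error type; the real crate uses its own error kinds.
#[derive(Debug, PartialEq)]
enum AppendError {
    // Returned at commit time when no data files were ever added.
    NoDataFiles,
}

// Hypothetical append action that accumulates files across calls.
#[derive(Default)]
struct AppendAction {
    added: Vec<DataFile>,
}

impl AppendAction {
    // Accepts any iterator, including an empty one, so callers can use it
    // in loops or after filters without special-casing empty input.
    fn add_data_files(&mut self, files: impl IntoIterator<Item = DataFile>) -> &mut Self {
        self.added.extend(files);
        self
    }

    // The emptiness check lives here, at the point where a manifest would
    // be written, so an empty manifest is never produced.
    // Returns the paths that would go into the manifest.
    fn commit(self) -> Result<Vec<String>, AppendError> {
        if self.added.is_empty() {
            return Err(AppendError::NoDataFiles);
        }
        Ok(self.added.into_iter().map(|f| f.path).collect())
    }
}
```

With this shape, calling `add_data_files` with an empty iterator is harmless on its own, and only a commit with nothing accumulated across all calls is rejected.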