Skip to content

feat: support arrow dictionary in schema conversion #1293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jdockerty
Copy link
Contributor

@jdockerty jdockerty commented May 4, 2025

Which issue does this PR close?

What changes are included in this PR?

An expansion to the current schema conversion for arrow to an iceberg schema.

This uses the suggestion provided by @alamb (here) for the schema conversion work.

Are these changes tested?

The current test for schema conversion has been expanded.

I've also taken the liberty of altering the assert_eq! call to use the pretty_assertions version, as it makes viewing the failure much simpler with a large JSON structure - I can revert this though if it is problematic.

@jdockerty jdockerty marked this pull request as ready for review May 4, 2025 16:49
Copy link

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me -- thank you @jdockerty

(though note I am not an expert in this code nor a committer on the iceberg project)

@alamb
Copy link

alamb commented May 5, 2025

Looks like @phillipleblanc made a similar PR here: #1283

@alamb
Copy link

alamb commented May 5, 2025

@liurenjie1024 or @Xuanwo, do you have time to review this PR?

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jdockerty for this pr, LGTM!

@liurenjie1024 liurenjie1024 merged commit 513376c into apache:main May 6, 2025
17 checks passed
@jdockerty jdockerty deleted the feat/arrow-schema-conversion-dict branch May 6, 2025 09:25
Xuanwo pushed a commit that referenced this pull request May 10, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Helps #1277

## What changes are included in this PR?

<!--
Provide a summary of the modifications in this PR. List the main changes
such as new features, bug fixes, refactoring, or any other updates.
-->

This expands the `test_type_conversion` test with the specific
`DataType::Dictionary` type.

Props to @phillipleblanc for this, as I noticed the extra coverage on
his PR which I believe is very useful to capture[^1].

[^1]: I was looking to add similar to
#1293, but it was merged just
before I pushed the changes 😆
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DataType::Dictionary for arrow schema conversion
4 participants