
bug: .drop() causing mismatch in RecordBatchReader schema  #8393

Closed
@judahrand

Description


What happened?

Calling .drop() on a table that contains a column name longer than 59 characters, and then calling .to_pyarrow_batches(), returns a broken RecordBatchReader: the column names in the RecordBatches it yields are truncated, so the batches no longer match the reader's own schema. This breaks RecordBatchReader.read_all().

Surprisingly, however, if .drop() is the first method called after conn.register(), the truncation does not seem to occur.

import ibis
import pyarrow


table = pyarrow.Table.from_pydict(
    {
        "a_short_column_name": [1, 2, 3],
        "a_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_long_column_name": [4, 5, 6],
    },
)

ibis.set_backend("duckdb")
ibis_table = ibis.memtable(table)

result = ibis_table.select(
    *ibis_table.schema().keys(),
).drop(
    "a_short_column_name",
)

record_batch_reader = result.to_pyarrow_batches(chunk_size=1)  # one row per batch
print(record_batch_reader.read_next_batch().schema)  # consumes the first batch
assert record_batch_reader.schema == record_batch_reader.read_next_batch().schema
Output:

a_very_very_very_very_very_very_very_very_very_very_very__1: int64
Traceback (most recent call last):
  File "<REDACTED>/minimal_example.py", line 23, in <module>
    assert record_batch_reader.schema == record_batch_reader.read_next_batch().schema
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

What version of ibis are you using?

❯ pip list | grep -e ibis -e duckdb
duckdb                                   0.10.0
duckdb-engine                            0.9.2
ibis-framework                           8.0.0

What backend(s) are you using, if any?

DuckDB


Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    bug (Incorrect behavior inside of ibis)

    Type

    No type

    Projects

    Status

    done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
