Closed
What happened?
Calling .drop() on a Table which contains a column name longer than 59 characters and then calling .to_pyarrow_batches() results in a broken RecordBatchReader where the contained RecordBatches do not match the schema of the RecordBatchReader due to column name truncations. This breaks RecordBatchReader.read_all().
What is surprising, however, is that if the .drop() is the first method called after conn.register(), the truncation does not seem to occur.
import ibis
import pyarrow

table = pyarrow.Table.from_pydict(
    {
        "a_short_column_name": [1, 2, 3],
        "a_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_very_long_column_name": [4, 5, 6],
    },
)

ibis.set_backend("duckdb")

ibis_table = ibis.memtable(table)

result = ibis_table.select(
    *ibis_table.schema().keys(),
).drop(
    "a_short_column_name",
)

record_batch_reader = result.to_pyarrow_batches(chunk_size=1)
print(record_batch_reader.read_next_batch().schema)
assert record_batch_reader.schema == record_batch_reader.read_next_batch().schema
a_very_very_very_very_very_very_very_very_very_very_very__1: int64
Traceback (most recent call last):
  File "<REDACTED>/minimal_example.py", line 23, in <module>
    assert record_batch_reader.schema == record_batch_reader.read_next_batch().schema
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
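The bare AssertionError above does not say which column differs. A minimal sketch of a diagnostic that lists the differing names position by position is shown here. The helper name diff_column_names and the sample names are hypothetical and only for illustration; they are not part of ibis or pyarrow.

```python
from itertools import zip_longest


def diff_column_names(expected, actual):
    """Return (position, expected_name, actual_name) wherever the names differ.

    A name missing on either side is reported as None.
    """
    return [
        (position, exp, act)
        for position, (exp, act) in enumerate(zip_longest(expected, actual))
        if exp != act
    ]


# Hypothetical names for illustration: the second one is cut short,
# similar in spirit to the truncation described in this report.
expected = ["id", "a_long_column_name"]
actual = ["id", "a_long_colu_1"]
print(diff_column_names(expected, actual))
```

With the reproduction above, the two arguments would presumably be record_batch_reader.schema.names and the schema.names of a batch returned by read_next_batch().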
What version of ibis are you using?
❯ pip list | grep -e ibis -e duckdb
duckdb 0.10.0
duckdb-engine 0.9.2
ibis-framework 8.0.0
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response
Code of Conduct
- I agree to follow this project's Code of Conduct