Skip to content

Access JSON internal (binary) representation in clients (psql). #6256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
vbmithr opened this issue Jun 4, 2025 · 4 comments
Open

Access JSON internal (binary) representation in clients (psql). #6256

vbmithr opened this issue Jun 4, 2025 · 4 comments
Labels
C-enhancement Category Enhancements

Comments

@vbmithr
Copy link

vbmithr commented Jun 4, 2025

What type of enhancement is this?

Performance

What does the enhancement do?

I my use case I have lots of log lines in JSON in a table, using the JSON type. In DB, JSON is serialized in binary using Databend's jsonb representation.

I want to access this data from a program that uses the postgresql interface. However, this program understands jsonb and I want to obtain the jsonb binary data directly.

I suggest that you add a JSON function like json_to_binary or json_to_jsonb that would be take a JSON value as input and returns its jsonb encoding as binary data.

Implementation challenges

No response

@vbmithr vbmithr added the C-enhancement Category Enhancements label Jun 4, 2025
@vbmithr vbmithr mentioned this issue Jun 4, 2025
@vbmithr
Copy link
Author

vbmithr commented Jun 4, 2025

Just noticed something funny, I can leverage the datafusion effect in #6173 to actually get JSON in jsonb representation :D

Or actually just arrow_cast(json, 'Binary') will make psql output bytea data (hex in my case). So I can say it works, though a very weird thing happens. If I use a parametrized query, instead of a simple query, psql will return data formatted in JSON, even when casting. This is really puzzling.

But I think arrow_cast to Binary is the correct answer to this issue. @sunng87 I can provide wireshark traces if you're interested in digging into this.

@killme2008
Copy link
Contributor

Just noticed something funny, I can leverage the datafusion effect in #6173 to actually get JSON in jsonb representation :D

Or actually just arrow_cast(json, 'Binary') will make psql output bytea data (hex in my case). So I can say it works, though a very weird thing happens. If I use a parametrized query, instead of a simple query, psql will return data formatted in JSON, even when casting. This is really puzzling.

I suspect it's caused by the query plan optimizer. We can investigate this.

@WenyXu
Copy link
Member

WenyXu commented Jun 5, 2025

If I use a parametrized query, instead of a simple query, psql will return data formatted in JSON, even when casting.

Could you please provide the minimal steps to reproduce for this case?

@vbmithr
Copy link
Author

vbmithr commented Jun 5, 2025

public=> show create table json;
 Table |            Create Table
-------+-------------------------------------
 json  | CREATE TABLE IF NOT EXISTS "json" (+
       |   "ts" TIMESTAMP(3) NOT NULL,      +
       |   "x" JSON NULL,                   +
       |   TIME INDEX ("ts")                +
       | )                                  +
       |                                    +
       | ENGINE=mito                        +
       |
(1 ligne)

public=> select * from json;
             ts             |          x
----------------------------+---------------------
 2025-06-05 15:54:44.798000 | {"x":"y"}
 2025-06-05 15:56:14.460000 | {"x":"y","y":212.2}
(2 lignes)

public=> select arrow_cast(x, 'Binary') from json;
 arrow_cast(json.x,Utf8("Binary"))
-----------------------------------
 {"x":"y"}
 {"x":"y","y":212.2}
(2 lignes)

public=> select arrow_cast(x, 'Binary') from json limit 10;
                 arrow_cast(json.x,Utf8("Binary"))
--------------------------------------------------------------------
 \x4000000110000001100000017879
 \x400000021000000110000001100000012000000978797960406a866666666666
(2 lignes)

public=> select * from json limit 10;
             ts             |          x
----------------------------+---------------------
 2025-06-05 15:54:44.798000 | {"x":"y"}
 2025-06-05 15:56:14.460000 | {"x":"y","y":212.2}
(2 lignes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

No branches or pull requests

3 participants