Some postgres data fields stored as hex (bytea hex format)

blakey · May 17, 2018, 5:44pm

I was having trouble reading the backend postgres data using psql directly due to the data being encoded. Notably, the following fields appear to be stored in hex bytea format by default when switching to a postgres backend: dataset.meta and example.content. It looks like the print-dataset recipe handles the decoding, but I find querying the raw data still useful.

When querying directly, this query handles the decoding (CONVERT_FROM expects the first argumenst to be a string bytea, and you can specify the “to” type):

prodigy=#  
   SELECT CONVERT_FROM(content, 'utf8')
     FROM example;

You can find more information about the hex format used by postgres here: https://www.postgresql.org/docs/9.1/static/datatype-binary.html#AEN5296
And more information about type conversions here: https://www.postgresql.org/docs/9.1/static/functions-string.html

Describe `dataset`

prodigy=# \d+ dataset
 Column  |          Type          |                      Modifiers                       | Storage  | Stats target | Description
---------+------------------------+------------------------------------------------------+----------+--------------+------------
 id      | integer                | not null default nextval('dataset_id_seq'::regclass) | plain    |              | 
 name    | character varying(255) | not null                                             | extended |              | 
 created | integer                | not null                                             | plain    |              | 
 meta    | bytea                  | not null                                             | extended |              | 
 session | boolean                | not null                                             | plain    |              |

Describe `example`

prodigy=# \d+ example
   Column   |  Type   |                      Modifiers                       | Storage  | Stats target | Description 
------------+---------+------------------------------------------------------+----------+--------------+-------------
 id         | integer | not null default nextval('example_id_seq'::regclass) | plain    |              | 
 input_hash | bigint  | not null                                             | plain    |              | 
 task_hash  | bigint  | not null                                             | plain    |              | 
 content    | bytea   | not null                                             | extended |              |

honnibal · May 18, 2018, 6:43pm

Hmm! This is news to me actually. We use an ORM, peewee, to manage the database connectivity. Perhaps this is something it’s doing? Good to know, anyway!

sdarmofal · July 10, 2019, 9:21am

Any news in this topic? It’s really annoying when you work with postgresql because you can’t use JSON operators in queries. peewee supports postgresql but it needs additional import for postgresql features. You can read more about it in peewee documentation.

honnibal · July 11, 2019, 11:07am

@sdarmofal We’re open to suggestions, but it’s not immediately obvious to me what we would need to do. Prodigy does provide the source for the database driver – we made sure to keep this in pure Python, as we figured some people might want to edit it directly. You should be able to find it in prodigy/components/db.py. So, you can probably make the edits you require directly.

If there’s a change we can make that improves things without causing backwards incompatibilities, we’d consider that. But if we have to change a value type in the table, I think that would force users to migrate their data, right? We’d be reluctant to do that.

Topic		Replies	Views
Migrating from sqlite to postgresql database , solved	1	1198	April 4, 2018
Error in loading a new dataset in PostgreSQL done , database	4	1532	September 15, 2017
db-out equivalent on postgres	3	43	September 18, 2024
Output format of annotations in Mysql. usage , database	2	396	March 31, 2020
Slow database operations on PostgreSQL database	1	390	March 11, 2023

Some postgres data fields stored as hex (bytea hex format)

Describe dataset

Describe example

Related topics

Describe `dataset`

Describe `example`