I was having trouble reading the backend postgres data directly with psql
because the data is encoded. Notably, the following fields appear to be stored in hex bytea
format by default when switching to a postgres backend: dataset.meta and example.content.
It looks like the print-dataset recipe handles the decoding, but I still find querying the raw data useful.
When querying directly, this query handles the decoding (CONVERT_FROM
expects the first argument to be a bytea value, and the second argument names
the encoding the bytes are stored in, here utf8; the result is text):
prodigy=# SELECT CONVERT_FROM(content, 'utf8') FROM example;
You can find more information about the hex format used by postgres here: https://www.postgresql.org/docs/9.1/static/datatype-binary.html#AEN5296
And more information about type conversions here: https://www.postgresql.org/docs/9.1/static/functions-string.html
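If you have already copied a raw hex value out of psql, the same decoding can be done client-side. Here is a minimal Python sketch, assuming the value is in the hex bytea text form (a leading \x followed by hex digits) and that the decoded bytes are UTF-8 JSON. The helper name decode_pg_hex_bytea and the sample value are made up for illustration and are not part of Prodigy or postgres.

```python
import json


def decode_pg_hex_bytea(value: str) -> str:
    """Decode postgres' hex bytea text form (e.g. '\\x7b7d') into a UTF-8 string.

    Hypothetical helper for illustration; assumes the bytes are UTF-8 text.
    """
    if not value.startswith("\\x"):
        raise ValueError("expected a hex bytea value starting with \\x")
    # Drop the leading \x, turn the hex digits into bytes, then decode as UTF-8.
    return bytes.fromhex(value[2:]).decode("utf8")


# Made-up sample value, shaped like what psql prints for a bytea column.
raw = "\\x7b2274657874223a20226869227d"

decoded = decode_pg_hex_bytea(raw)
record = json.loads(decoded)  # assumes the stored content is JSON
print(decoded)
```

This is roughly what CONVERT_FROM(content, 'utf8') does on the server side; doing it in SQL is usually simpler when you can run the query yourself.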
Describe dataset
prodigy=# \d+ dataset
 Column  | Type                   | Modifiers                                            | Storage  | Stats target | Description
---------+------------------------+------------------------------------------------------+----------+--------------+------------
 id      | integer                | not null default nextval('dataset_id_seq'::regclass) | plain    |              |
 name    | character varying(255) | not null                                             | extended |              |
 created | integer                | not null                                             | plain    |              |
 meta    | bytea                  | not null                                             | extended |              |
 session | boolean                | not null                                             | plain    |              |
Describe example
prodigy=# \d+ example
 Column     | Type    | Modifiers                                            | Storage  | Stats target | Description
------------+---------+------------------------------------------------------+----------+--------------+-------------
 id         | integer | not null default nextval('example_id_seq'::regclass) | plain    |              |
 input_hash | bigint  | not null                                             | plain    |              |
 task_hash  | bigint  | not null                                             | plain    |              |
 content    | bytea   | not null                                             | extended |              |