Customize the JSON format when saving the annotations in database?

kavan · May 22, 2018, 6:44pm

Is there a way to customize the JSON format when saving the annotations in the database?

{"text":"this is an example of some_location.","_input_hash":79275707,"_task_hash":565275264,"spans":[{"start":23,"end":35,"text":"some_location","rank":0,"label":"LOC","score":0.6479370919,"source":"en_core_web_lg","input_hash":79275707}],"meta":{"score":0.6479370919},"answer":"accept"}

I would like to customize the above format.

How can I modify the default schema?

ines · May 22, 2018, 7:07pm

What exactly would you like to customise?

If you want to add properties, you can include them when you load in your data in JSONL format. As long as it’s valid JSON, your custom properties will be passed through and saved in the database with the annotations. For example:

{"text": "Some text", "custom_id": 123}

Internally, Prodigy uses the JSON format to communicate annotation tasks. Depending on the database you’re using, the data is then converted to the respective database fields and formats. If you want to export your data in a different format, you can always interact with the database directly, request the dataset and then export it:

from prodigy.components.db import connect

db = connect()  # uses the settings from your prodigy.json
examples = db.get_dataset('your_dataset_name')
# `examples` is a list of dictionaries in Prodigy's format – you can
# now convert it however you want, and save it out in any file format

You can find more details on the database methods in your PRODIGY_README.html.
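As a rough sketch of the "convert it however you want" step: the function below flattens Prodigy-style examples into one row per accepted span, keeping only the text, offsets and label. The function name `to_custom_rows` and the output keys are made up for illustration, and the sample record is invented. In practice `examples` would be the list returned by `db.get_dataset`.

```python
def to_custom_rows(examples):
    """Flatten Prodigy-style examples into one dict per accepted span."""
    rows = []
    for eg in examples:
        # Skip rejected or ignored examples; keep only accepted annotations.
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            rows.append({
                "text": eg["text"],
                # Slice the span text from the offsets rather than trusting
                # a separate "text" key on the span, which may be absent.
                "span_text": eg["text"][span["start"]:span["end"]],
                "start": span["start"],
                "end": span["end"],
                "label": span["label"],
            })
    return rows


# Invented sample data standing in for db.get_dataset('your_dataset_name')
examples = [
    {
        "text": "I live in Berlin.",
        "spans": [{"start": 10, "end": 16, "label": "LOC"}],
        "answer": "accept",
    },
    {"text": "Nothing here.", "spans": [], "answer": "reject"},
]
rows = to_custom_rows(examples)
```

The resulting list of flat dictionaries can then be written to whatever target you need, for example with `json.dumps` or a database insert.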

When modifying the dataset, you usually want to keep a copy of the original data. That’s also the reason Prodigy prevents you from overwriting annotations directly before they’re saved. The original dataset should always reflect the exact data that came back from the annotator – otherwise, you can never be sure that the labels you’re training on are really what was annotated and it’s too easy to accidentally destroy data if there’s a bug in your code.

kavan · May 24, 2018, 4:50pm

Thanks for your prompt response.

When using Postgres, Prodigy has a fixed schema with dataset, example and link tables and their corresponding column structure. However, I would like to extract some parts of that data and save them under my own custom database schema and format.

Is there a way to override Prodigy's database behaviour for saving annotations and align it with a custom schema and format?

If not, what would you suggest? (Basically, I want to save the completed annotations in a different format under a custom database schema, by extracting the spans and text from the output Prodigy produces for saved annotations.)

ines · May 24, 2018, 5:12pm

Ah okay, thanks for the clarification! There are two main options:

1. Create a custom Database class

The first one would be to plug in your own Database class. You can find the detailed API documentation of the structure Prodigy expects in your PRODIGY_README.html. Your custom class needs to implement the same methods as Prodigy’s built-in class. You can then plug it into Prodigy via the 'db' setting returned by the recipe. For example:

return {
    'dataset': dataset,
    'db': YourCustomDB(),
    # other stuff
}

(In the upcoming version of Prodigy, you’ll also be able to wrap your custom loader as a Python package and expose it via the entry points. You can then simply set "db": "your_custom_db" in your prodigy.json and won’t have to customise any recipe.)

2. Send the answers to your database via the update callback

Alternatively, you could also send a copy of every batch of annotations to your database and then store it however you like. Recipes can define an optional update method that is called with a list of annotations every time Prodigy receives a new batch.

# in your recipe
def update(answers):
    # do something here
    ADD_ANSWERS_TO_YOUR_DB(answers)

return {
    'dataset': dataset,
    'update': update,
    # etc.
}

In theory, you could also set 'db': False if you go for this approach. This will disable Prodigy’s built-in database. But it also means that you could lose data if something goes wrong – so you might still want to keep at least a local SQLite backup, just in case.
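To make the update-callback approach more concrete, here is a minimal sketch of what such a callback could look like if the custom store were a SQLite file. The table name `span_annotations`, its columns and the `make_update` helper are all assumptions for illustration and not part of Prodigy. For Postgres you would swap in your own driver and schema. The connection is opened inside each call so the callback does not depend on which thread the server invokes it from.

```python
import sqlite3
from contextlib import closing

# Hypothetical custom schema: one row per accepted span.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS span_annotations ("
    "text TEXT, start_char INTEGER, end_char INTEGER, label TEXT)"
)


def make_update(db_path):
    """Build an update callback that copies accepted spans to db_path."""

    def update(answers):
        # Keep only the text and span fields from accepted answers.
        rows = [
            (eg["text"], span["start"], span["end"], span["label"])
            for eg in answers
            if eg.get("answer") == "accept"
            for span in eg.get("spans", [])
        ]
        # Open a fresh connection per batch and close it afterwards.
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute(SCHEMA)
            conn.executemany(
                "INSERT INTO span_annotations VALUES (?, ?, ?, ?)", rows
            )
            conn.commit()

    return update
```

In the recipe, you would then return `'update': make_update('/path/to/custom.db')` alongside the other components, with the path being a placeholder for your own location.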
