Question: Automatically save

Hi Team,
Question 1: This is a question rather than a bug report. Is there a way to save progress automatically after accepting an annotation?

Question 2: Is there any way to customize the output file?
For example, this is what Prodigy outputs:
{"text":"66.50 USD/BBL","_input_hash":-1240399113,"_task_hash":514467501,"_is_binary":false,"tokens":[{"text":"66.50","start":0,"end":5,"id":0,"ws":true},{"text":"USD","start":6,"end":9,"id":1,"ws":false},{"text":"/","start":9,"end":10,"id":2,"ws":false},{"text":"BBL","start":10,"end":13,"id":3,"ws":false}],"_view_id":"ner_manual","spans":[{"start":0,"end":5,"token_start":0,"token_end":0,"label":"priceIndex"},{"start":6,"end":9,"token_start":1,"token_end":1,"label":"priceCCY"},{"start":10,"end":13,"token_start":3,"token_end":3,"label":"priceUOM"}],"answer":"accept","_timestamp":1651000095}

Can I change it to something like this, with the "text" keys renamed?
{"textSegmentAnnotations":"66.50 USD/BBL","_input_hash":-1240399113,"_task_hash":514467501,"_is_binary":false,"tokens":[{"textSegmentAnnotations":"66.50","start":0,"end":5,"id":0,"ws":true},{"textSegmentAnnotations":"USD","start":6,"end":9,"id":1,"ws":false},{"textSegmentAnnotations":"/","start":9,"end":10,"id":2,"ws":false},{"textSegmentAnnotations":"BBL","start":10,"end":13,"id":3,"ws":false}],"_view_id":"ner_manual","spans":[{"start":0,"end":5,"token_start":0,"token_end":0,"label":"priceIndex"},{"start":6,"end":9,"token_start":1,"token_end":1,"label":"priceCCY"},{"start":10,"end":13,"token_start":3,"token_end":3,"label":"priceUOM"}],"answer":"accept","_timestamp":1651000095}

Answer 1:

If you want to store each annotation immediately after hitting "accept", change this setting in the prodigy.json file:

"batch_size": 1

More information on this settings file can be found here. Note that you can always hit the "save" icon in the upper left-hand corner if you want to store everything you've labelled in the current batch so far.

Answer 2:

You can always use the output from Prodigy in another script in Python. That's what I usually do.

from prodigy.components.db import connect

db = connect()                               # uses settings from prodigy.json
dataset = db.get_dataset("test_dataset")     # retrieve a dataset

# If we assume you have a `change_item` function that can turn
# the old json structure into a new one then you can do something like:
dataset = [change_item(item) for item in dataset]
    

More information on the code can be found on our database API documentation page.
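
For reference, here is a minimal sketch of what the `change_item` function assumed above could look like for the renaming asked about in Question 2. The function body and the `rename_key` helper are hypothetical and not part of Prodigy; the sketch only assumes each example is a plain dict shaped like the JSON in the question.

```python
import json

OLD_KEY = "text"
NEW_KEY = "textSegmentAnnotations"  # target key name taken from the question


def rename_key(record: dict) -> dict:
    """Return a copy of `record` with OLD_KEY renamed to NEW_KEY (key order kept)."""
    return {(NEW_KEY if key == OLD_KEY else key): value for key, value in record.items()}


def change_item(item: dict) -> dict:
    """Hypothetical converter: rename "text" at the top level and inside each token."""
    new_item = rename_key(item)
    if "tokens" in item:
        new_item["tokens"] = [rename_key(token) for token in item["tokens"]]
    return new_item


# A shortened example in the same shape as the Prodigy output above.
example = {
    "text": "66.50 USD/BBL",
    "tokens": [{"text": "66.50", "start": 0, "end": 5, "id": 0, "ws": True}],
    "answer": "accept",
}

# One JSON object per line is the usual JSONL layout.
print(json.dumps(change_item(example)))
```

The original example is left untouched, so the data stored in the Prodigy database stays in the format Prodigy expects and only the exported copy uses the new key names.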
