Track of new entities added

Zainpann · December 7, 2018, 10:20am

Hello,
Is there any way to keep track of the entities newly added to our dataset after annotation, and also a way to check what those entities are?

ines · December 8, 2018, 5:27am

I’m not 100% sure I understand your question correctly – do you want to get an overview of all entity annotations in the dataset?

In general, the db-out command lets you export a given dataset to JSONL (newline-delimited JSON). For example:

prodigy db-out your_dataset > some_file.jsonl
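If you want to work with the export in Python, here's a minimal sketch of reading it back in with only the standard library. The name read_jsonl is just a hypothetical helper for illustration, and the usage line assumes the file name from the command above:

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one annotation record (a dict) per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


# Usage, assuming the export created above:
# examples = list(read_jsonl("some_file.jsonl"))
```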

The data contains one record per example that was annotated. This way, you’ll always be able to reproduce what the annotator saw and how they answered (which is obviously super important). For example, if you’ve annotated using ner.teach, a record in the data could look like this:

{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"}
    ],
    "answer": "accept"
}
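To check which entity a record refers to, you can slice the text with the span's character offsets. In both records shown here, "end" points one past the last character (0 to 5 covers "Apple"), so a plain Python slice works. A small sketch, where span_texts is a hypothetical helper name:

```python
from typing import List, Tuple


def span_texts(example: dict) -> List[Tuple[str, str]]:
    """Return (label, entity text) pairs for every span in one example."""
    text = example["text"]
    # "start"/"end" are character offsets; "end" is exclusive
    return [(span["label"], text[span["start"]:span["end"]])
            for span in example.get("spans", [])]


record = {
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    "answer": "accept",
}
print(span_texts(record))  # [('ORG', 'Apple')]
```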

Or if you’ve been annotating with a manual recipe like ner.manual, which allows you to select one or more entity spans manually, the result could look like this:

{
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1}
    ],
    "spans": [
        {"start": 0, "end": 5, "label": "GREETING", "token_start": 0, "token_end": 0},
        {"start": 6, "end": 11, "label": "ORG", "token_start": 1, "token_end": 1}
    ],
    "answer": "accept"
}
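The token_start and token_end values let you recover the tokens a span covers. Judging from the record above, where a one-token span has the same start and end index, token_end looks inclusive, so the sketch below adds 1 when slicing. It also assumes each token's position in the list matches its "id", as it does here; span_tokens is a hypothetical helper:

```python
from typing import List, Tuple


def span_tokens(example: dict) -> List[Tuple[str, List[str]]]:
    """Return (label, token texts) for each span in one example."""
    tokens = example["tokens"]
    result = []
    for span in example.get("spans", []):
        # Assumption: token_end is inclusive, so slice up to token_end + 1
        covered = tokens[span["token_start"]:span["token_end"] + 1]
        result.append((span["label"], [tok["text"] for tok in covered]))
    return result


record = {
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1},
    ],
    "spans": [
        {"start": 0, "end": 5, "label": "GREETING", "token_start": 0, "token_end": 0},
        {"start": 6, "end": 11, "label": "ORG", "token_start": 1, "token_end": 1},
    ],
    "answer": "accept",
}
print(span_tokens(record))  # [('GREETING', ['Hello']), ('ORG', ['Apple'])]
```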

So depending on what you need, you can read in this file and compute some statistics (number of entities, labels, frequencies and so on). One reason we chose JSON / JSONL as the standard format is that it’s easy to process and analyse in most programming languages and tools.
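For example, here's a rough sketch that counts entity spans per label. It assumes you only want to count examples with "answer": "accept"; the helper name label_counts and the third (rejected) record are made up for illustration:

```python
from collections import Counter
from typing import Iterable


def label_counts(examples: Iterable[dict]) -> Counter:
    """Count entity spans per label across accepted examples only."""
    counts = Counter()
    for eg in examples:
        # Skip rejected/ignored examples; their spans weren't confirmed
        if eg.get("answer") != "accept":
            continue
        counts.update(span["label"] for span in eg.get("spans", []))
    return counts


examples = [
    {"text": "Apple updates its analytics service with new metrics",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"},
    {"text": "Hello Apple",
     "spans": [{"start": 0, "end": 5, "label": "GREETING"},
               {"start": 6, "end": 11, "label": "ORG"}], "answer": "accept"},
    {"text": "Pear season",  # made-up rejected example
     "spans": [{"start": 0, "end": 4, "label": "ORG"}], "answer": "reject"},
]
counts = label_counts(examples)
print(counts.most_common())  # [('ORG', 2), ('GREETING', 1)]
print(sum(counts.values()))  # 3
```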

Btw, you can also connect to the Prodigy database directly from a Python script (see your PRODIGY_README.html for more details and the full API reference). Here’s an example – db.get_dataset returns a list of examples (dicts) in your dataset, in the same format as described above:

from prodigy.components.db import connect

db = connect()  # uses the settings in your prodigy.json
examples = db.get_dataset('your_dataset')
# do something with the examples, e.g. loop over them and inspect eg["spans"]
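If "newly added" means entities that weren't in the dataset the last time you checked, one option is to keep an earlier export around and compare the unique (label, text) pairs. This is only a sketch: accepted_entities, before and after are hypothetical names, and the function works the same on the list from db.get_dataset or on records read from a db-out file:

```python
from typing import Iterable, Set, Tuple


def accepted_entities(examples: Iterable[dict]) -> Set[Tuple[str, str]]:
    """Collect unique (label, entity text) pairs from accepted examples."""
    found = set()
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # only keep spans the annotator confirmed
        text = eg["text"]
        for span in eg.get("spans", []):
            found.add((span["label"], text[span["start"]:span["end"]]))
    return found


# Hypothetical snapshots: an older db-out export vs. the current dataset
before = [{"text": "Apple updates its analytics service with new metrics",
           "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}]
after = before + [{"text": "Hello Apple",
                   "spans": [{"start": 0, "end": 5, "label": "GREETING"},
                             {"start": 6, "end": 11, "label": "ORG"}],
                   "answer": "accept"}]
new = accepted_entities(after) - accepted_entities(before)
print(sorted(new))  # [('GREETING', 'Hello')]
```

Using a set means an entity that was annotated many times only shows up once, which is usually what you want when asking "which entities are new?".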
