Track of new entities added

Hello,
Is there any way to keep track of the newly added entities after annotation in our dataset, and is there a way to check what those entities are?

I’m not 100% sure I understand your question correctly – do you want to get an overview of all entity annotations in the dataset?

In general, the db-out command lets you export a given dataset to JSONL (newline-delimited JSON). For example:

prodigy db-out your_dataset > some_file.jsonl

The data contains one record per example that was annotated. This way, you’ll always be able to reproduce what the annotator saw and how they answered (which is obviously super important). For example, if you’ve annotated using ner.teach, a record in the data could look like this:

{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"}
    ],
    "answer": "accept"
}

Or if you’ve been annotating with a manual recipe like ner.manual, which allows you to select one or more entity spans manually, the result could look like this:

{
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1}
    ],
    "spans": [
        {"start": 0, "end": 5, "label": "GREETING", "token_start": 0, "token_end": 0},
        {"start": 6, "end": 11, "label": "ORG", "token_start": 1, "token_end": 1}
    ],
    "answer": "accept"
}

So depending on what you need, you can read in this file and compute some statistics (number of entities, labels, frequencies and so on). One reason we chose JSON / JSONL as the standard format is that it’s easy to process and analyse in most programming languages and tools.
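To make that a bit more concrete, here's a rough sketch of how you could count entity labels in an exported file. The function name count_entity_labels is just something I made up for illustration, and I'm assuming you only care about examples with "answer": "accept" (for binary recipes like ner.teach, a rejected example means the suggested span was wrong). Adjust as needed:

```python
import json
from collections import Counter


def count_entity_labels(lines):
    """Count entity labels over accepted examples.

    `lines` is any iterable of JSONL strings, e.g. an open file handle.
    Hypothetical helper, not part of Prodigy.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # assumption: ignore rejected / ignored examples
        for span in record.get("spans", []):
            counts[span["label"]] += 1
    return counts


# Two records in the same shape as the examples above
sample = [
    '{"text": "Hello Apple", "spans": [{"start": 0, "end": 5, "label": "GREETING"}, '
    '{"start": 6, "end": 11, "label": "ORG"}], "answer": "accept"}',
    '{"text": "Apple updates its analytics service with new metrics", '
    '"spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}',
]
print(count_entity_labels(sample))

# With a real export you'd pass the file handle instead:
# with open("some_file.jsonl", encoding="utf8") as f:
#     print(count_entity_labels(f))
```
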

Btw, you can also connect to the Prodigy database from a Python script for direct access (see your PRODIGY_README.html for more details and the full API reference). Here’s an example: db.get_dataset returns a list of examples (dicts) in your dataset, in the same format as described above:

from prodigy.components.db import connect

db = connect()  # uses the settings in your prodigy.json
examples = db.get_dataset('your_dataset')
# do something with the examples
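If what you're after is a list of which entities were actually annotated, something like the following might work as a starting point. It takes a list of example dicts (like the one returned by db.get_dataset) and groups the annotated entity texts by label. The helper entity_texts_by_label is hypothetical and not part of Prodigy, and again I'm assuming you only want accepted examples:

```python
from collections import defaultdict


def entity_texts_by_label(examples):
    """Group the distinct annotated entity texts by label.

    `examples` is a list of dicts with "text", "spans" and "answer" keys.
    Hypothetical helper, not part of Prodigy.
    """
    found = defaultdict(set)
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # assumption: only accepted annotations count
        for span in eg.get("spans", []):
            # Slice the entity text out of the example via character offsets
            entity_text = eg["text"][span["start"]:span["end"]]
            found[span["label"]].add(entity_text)
    return {label: sorted(texts) for label, texts in found.items()}


# Stand-in for `examples = db.get_dataset('your_dataset')`
examples = [
    {
        "text": "Hello Apple",
        "spans": [
            {"start": 0, "end": 5, "label": "GREETING"},
            {"start": 6, "end": 11, "label": "ORG"},
        ],
        "answer": "accept",
    },
    {
        "text": "Apple updates its analytics service with new metrics",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
        "answer": "accept",
    },
]
print(entity_texts_by_label(examples))
```

If you save the result from an earlier run, comparing it with a later one (for instance with a set difference per label) would show you which entities are new since then.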