Training on part of the custom annotations

ines · October 20, 2021, 8:08am

Hi! If you've annotated all labels in one dataset, one option would be to separate them into multiple sets (e.g. one per label, or one for the label you're most interested in) by connecting to the database in Python:

from prodigy.components.db import connect

# Connect to the database configured in your prodigy.json
db = connect()
examples = db.get_dataset("your_dataset_with_all_labels")
new_examples = []
for eg in examples:
    # Keep only the spans with the label you're interested in
    spans = [span for span in eg.get("spans", []) if span["label"] == "EDUCATION"]
    eg["spans"] = spans
    new_examples.append(eg)

# Create the new dataset and add the filtered examples to it
db.add_dataset("your_dataset_education")
db.add_examples(new_examples, ["your_dataset_education"])

You now have a dataset that contains only the spans you labelled as EDUCATION, and you can run experiments with it separately. Examples without any EDUCATION spans are still included, just with an empty list of spans.
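
If you want one dataset per label rather than just one, the same filtering can be wrapped in a small helper. This is only a rough sketch: `split_by_label` is a hypothetical name, the sample examples and the EMPLOYER label are made up for illustration, and it works on plain dictionaries so you can try it without a database connection.

```python
import copy


def split_by_label(examples, labels):
    """Return a dict mapping each label to a list of copied examples
    that contain only the spans with that label."""
    by_label = {label: [] for label in labels}
    for eg in examples:
        for label in labels:
            # Copy so the original example keeps all of its spans
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [
                span for span in eg.get("spans", []) if span["label"] == label
            ]
            by_label[label].append(new_eg)
    return by_label


# Made-up sample data in the shape of NER annotations
examples = [
    {
        "text": "BSc at MIT, then Acme Corp",
        "spans": [
            {"start": 0, "end": 10, "label": "EDUCATION"},
            {"start": 17, "end": 26, "label": "EMPLOYER"},
        ],
    },
    {"text": "No entities here"},
]

subsets = split_by_label(examples, ["EDUCATION", "EMPLOYER"])
```

Each list in `subsets` could then be saved to its own dataset with `db.add_dataset` and `db.add_examples`, as in the snippet above.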
