Training on part of the custom annotations

mikkelyo · October 19, 2021, 8:09am

Is it possible to train on a part of the custom annotations with prodigy/spacy?

For example, if I have a dataset where I annotated name, education, phone number, plus some other things, is it then possible to make a separate model for finding each of the entities? For example, a model that only finds education and is trained only on the education annotations.

ines · October 20, 2021, 8:08am

Hi! If you've annotated all labels in one dataset, one option would be to separate them into multiple sets (e.g. one per label, or one for the label you're most interested in) by connecting to the database in Python:

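from prodigy.components.db import connect

# Connect to the database configured in your prodigy.json
db = connect()
examples = db.get_dataset("your_dataset_with_all_labels")
new_examples = []
for eg in examples:
    # Keep only the spans labelled EDUCATION (examples without spans get an empty list)
    spans = [span for span in eg.get("spans", []) if span["label"] == "EDUCATION"]
    eg["spans"] = spans
    new_examples.append(eg)

# Create the new dataset and add the filtered examples to it
db.add_dataset("your_dataset_education")
db.add_examples(new_examples, ["your_dataset_education"])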

You now have one dataset with only the spans you labelled as EDUCATION and you'll be able to run experiments with it separately.
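If you'd rather not change the loaded examples in place, the same filtering could be sketched as a small helper that works on plain task dictionaries. The name `keep_span_label` and the sample data are made up for illustration; only the `spans` and `label` keys come from the snippet above.

```python
import copy


def keep_span_label(examples, label):
    """Return copies of the examples that keep only the spans for `label`."""
    filtered = []
    for eg in examples:
        new_eg = copy.deepcopy(eg)  # leave the original example untouched
        new_eg["spans"] = [s for s in eg.get("spans", []) if s["label"] == label]
        filtered.append(new_eg)
    return filtered


# Hypothetical sample data in the same shape as the annotated tasks
sample = [
    {
        "text": "Jane, MSc",
        "spans": [
            {"start": 0, "end": 4, "label": "NAME"},
            {"start": 6, "end": 9, "label": "EDUCATION"},
        ],
    },
    {"text": "no entities here"},
]
education_only = keep_span_label(sample, "EDUCATION")
```

The result can then be passed to `db.add_examples` in the same way as `new_examples` above.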

mikkelyo · October 21, 2021, 7:43am

Yeah, that's what I ended up doing. I was just wondering if there was a built-in function for it, seeing as there are so many other smart functions.

Thanks for replying

ines · October 22, 2021, 11:27am

Yeah, maybe we should expose something like it as a utility – I think the only tricky part is that there might be so many different combinations of things that a user could want. You might want to filter by span labels, or by selected options, or by top-level labels, or by relation labels, or by some combination.
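To make the combinations concrete, here is a rough sketch of what such a filter could look like for span labels, selected options and relation labels. This is not an existing Prodigy utility: `filter_task` and its parameters are hypothetical, and the `accept` and `relations` keys are assumptions about how the tasks are stored, so check them against your own data.

```python
def filter_task(eg, span_labels=None, options=None, relation_labels=None):
    """Return a shallow copy of a task with the requested filters applied.

    A filter left as None is skipped, so filters can be combined freely.
    """
    new_eg = dict(eg)
    if span_labels is not None and "spans" in eg:
        new_eg["spans"] = [s for s in eg["spans"] if s["label"] in span_labels]
    if options is not None and "accept" in eg:
        # Assumption: selected options are stored as a list under "accept"
        new_eg["accept"] = [o for o in eg["accept"] if o in options]
    if relation_labels is not None and "relations" in eg:
        # Assumption: relations are stored as dicts with a "label" key
        new_eg["relations"] = [
            r for r in eg["relations"] if r["label"] in relation_labels
        ]
    return new_eg


# Hypothetical task that combines spans and selected options
task = {
    "text": "Jane, MSc",
    "spans": [
        {"start": 0, "end": 4, "label": "NAME"},
        {"start": 6, "end": 9, "label": "EDUCATION"},
    ],
    "accept": ["cv", "english"],
}
filtered_task = filter_task(task, span_labels={"EDUCATION"}, options={"cv"})
```

Even this small version needs one parameter per kind of annotation, which is part of why a general-purpose utility is hard to design.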

Btw, if you know jq, there's probably a super smart and magical way to do all of this in a simple one-liner, but I don't know it well enough to give you the solution.

mikkelyo · October 22, 2021, 12:25pm

Yeah, I'm not sure how you should design it. It's probably easier for people to just write a Python script to do it.
