Training a model on both gold and binary data

usage
ner
(Einar Bui Magnusson) #1

Hello,
I’ve just acquired Prodigy, planning to improve data annotation for an existing chatbot NLU app (NER as well as intent / text classification). That is, I started out with procedurally created gold/complete training data (~30 intents, ~10 entity types, some overlap with the pre-trained spaCy model (ORG, PERSON, LOC …), ~10000 examples) and use it to train/fine-tune a spaCy NER model. I have then been using a custom annotation loop where I just take the lowest-confidence user logs and laboriously correct the full annotations in a home-made, sub-par interface. I was hoping to use Prodigy and its accept/reject approach to speed this up dramatically.

The more I look into the forum and documentation, the more it seems that training a model with both gold and binary training data isn’t properly supported. With ner.batch-train, it seems I have to specify that the data is all one or the other via the --no-missing flag, and there doesn’t seem to be any easy way to use spaCy directly with binary training data.

Is there any way to use both types of data effectively, or is this anywhere on the Prodigy roadmap?

(Matthew Honnibal) #2

You’re right that this isn’t well exposed at the recipe level at the moment. However, you can attach a no_missing field to your examples to indicate whether their annotations should be treated as gold-standard.

We’re not fully satisfied with the usability of the "no_missing" flags, but they were the best interim solution we could put in place. We hope this can be improved for Prodigy 2.0. I’ll give you a bit of backstory about how we’re thinking about this, and why the problem is a bit tricky.

We’ve opted to use a pretty minimal data model in Prodigy so far, to keep the task storage very flexible. The database views the annotation tasks as opaque blobs, which are associated with dataset IDs. Of course, there’s a trade-off here: the disadvantage of being able to store any type of data is that you don’t get a type system. A key piece of information we’re missing is the provenance of the examples, and how complete their annotations are.

The truth about what’s complete and correct can be much more subtle than just a binary flag. You could use NER annotations that are “complete” with respect to entity types A, B and C, but not some new entity type D. The underlying training algorithm could support this level of detail, but the implementation gets more complicated, and it gets harder to communicate what’s going on to users. Already the mechanism by which the model learns from the incomplete information is fairly subtle, so we’re reluctant to make it even more complicated.

(Einar Bui Magnusson) #3

Thanks for your response @honnibal, that all makes sense. I can see how you’d want flexibility in the definition of “completeness”. I do think it would be a huge improvement, though, to be able to pass two datasets to the training recipe: one treated as complete (the strictest definition) and one treated as binary. While we’re at it, it would actually be nice to be able to pass multiple datasets instead of having to merge them.

But OK, you’re saying that currently, if I want to do this, I add a no_missing field to the complete examples like this?

{
    "text": "Apple updates its analytics service with new metrics",
    "label": "CORRECT",
    "spans": [{
        "start": 0,
        "end": 5,
        "label": "ORG"
    }],
    "no_missing": true
}

That’s a reasonable workflow for the complete data that I already have and need to load into the Prodigy database anyway (so adding the field is a single line of code). It would be good to have a nice solution for using both gold and binary data that I’ve annotated in Prodigy, without having to extract the examples, add the field, and put them back in the database. But that’s not such a big deal; database manipulations aren’t that much work.
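(For reference, the round trip through the database only takes a few lines with the Python API; something like the sketch below, where the dataset names are placeholders and I haven’t double-checked every detail.)

from prodigy.components.db import connect

# Pull the existing complete examples out of the database, flag them as
# gold-standard, and save them to a new dataset. Dataset names are made up.
db = connect()
examples = db.get_dataset("gold_ner")
for eg in examples:
    eg["no_missing"] = True
db.add_dataset("gold_ner_no_missing")
db.add_examples(examples, datasets=["gold_ner_no_missing"])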

Next task for me is to see whether I can reach the same accuracy training my NER model in Prodigy as I do with spaCy. I took my old, complete, “accept” data, generated an equal amount of “reject” data, and chucked it into ner.batch-train, and only got around 70% accuracy (I had an F1 score of ~95% in spaCy, on somewhat homogeneous generated data). But that’s something I should probably read more about and play around with before asking for help.
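(Roughly, the kind of thing I mean by generating “reject” data is sketched below: copy each gold example and give the span a deliberately wrong label. The label list is just an illustration, not my real schema.)

import copy
import random

# Made-up label list, purely for the sketch.
LABELS = ["ORG", "PERSON", "LOC"]

def make_reject(example):
    # Copy a gold example, swap each span label for a wrong one,
    # and mark the task as rejected.
    eg = copy.deepcopy(example)
    for span in eg.get("spans", []):
        span["label"] = random.choice([l for l in LABELS if l != span["label"]])
    eg["answer"] = "reject"
    return eg

accept = {
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    "answer": "accept",
}
print(make_reject(accept))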

"Gold Standard" dataset as evaluation for ner.batch-train with binary annotation?
(Einar Bui Magnusson) #4

@honnibal @ines, could you please confirm whether this is the right syntax to specify an example as no_missing?

{
    "text": "Apple updates its analytics service with new metrics",
    "label": "CORRECT",
    "spans": [{
        "start": 0,
        "end": 5,
        "label": "ORG"
    }],
    "no_missing": true
}

Thanks!

(Matthew Honnibal) #5

Yes, that looks right to me. Does it seem to do the right thing when you try it?