Training a model on both gold and binary data

usage
ner
(Einar Bui Magnusson) #1

Hello,
I’ve just acquired Prodigy, planning to improve data annotation for an existing chatbot NLU app (NER as well as intent / text classification). That is, I started out with procedurally created gold/complete training data (~30 intents, ~10 entity types, some overlap with the pre-trained spaCy model (ORG, PERSON, LOC …), ~10000 examples) and use it to train/fine-tune a spaCy NER model. I have then been using a custom annotation loop where I just take the lowest-confidence user logs and laboriously correct the full annotations in a home-made, sub-par interface. I was hoping to use Prodigy and its accept/reject approach to speed this up dramatically.

The more I look into the forum and documentation, the more it seems that training a model with both gold and binary training data isn’t properly supported. With ner.batch-train, it seems I have to specify that the data is all one or the other via the --no-missing flag, and there doesn’t seem to be any easy way to use spaCy directly with binary training data.

Is there any way to use both types of data effectively, or is this anywhere on the Prodigy roadmap?

(Matthew Honnibal) #2

You’re right that this isn’t well exposed at the recipe level at the moment. However, you can attach a no_missing field to your examples to indicate whether their annotations should be treated as gold-standard.

We’re not fully satisfied with the usability of the "no_missing" flags, but they were the best interim solution we could put in place. We hope this can be improved for Prodigy 2.0. I’ll give you a bit of backstory about how we’re thinking about this, and why the problem is a bit tricky.

We’ve opted to use a pretty minimal data model in Prodigy so far, to keep the task storage very flexible. The database views the annotation tasks as opaque blobs, which are associated with dataset IDs. Of course, there’s a trade-off here: the disadvantage of being able to store any type of data is that you don’t get a type system. A key piece of information we’re missing is the provenance of the examples, and how complete their annotations are.

The truth about what’s complete and correct can be much more subtle than just a binary flag. You could use NER annotations that are “complete” with respect to entity types A, B and C, but not some new entity type D. The underlying training algorithm could support this level of detail, but the implementation gets more complicated, and it gets harder to communicate what’s going on to users. Already the mechanism by which the model learns from the incomplete information is fairly subtle, so we’re reluctant to make it even more complicated.

(Einar Bui Magnusson) #3

Thanks for your response @honnibal, that all makes sense. I can see how you’d want flexibility in the definition of “completeness”. I do think it would be a huge improvement, though, to be able to pass two datasets to the training recipe: one treated as complete (the strictest definition) and one treated as binary. While we’re at it, it would actually be nice to be able to pass multiple datasets instead of having to merge them.

But OK, you’re saying that currently, if I want to do this, I add a no_missing field to the complete examples like this?

{
    "text": "Apple updates its analytics service with new metrics",
    "label": "CORRECT",
    "spans": [{
        "start": 0,
        "end": 5,
        "label": "ORG"
    }],
    "no_missing": true
}

That’s a reasonable workflow for the complete data that I already have and need to load into the Prodigy database anyway (so adding the field is a single line of code). It would be good to have a nice solution for using both gold and binary data that I’ve annotated in Prodigy, without having to extract the examples, add the field, and put them back in the database. But that’s not such a big deal; database manipulations aren’t that much work.
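(For reference, the round trip through the database only takes a few lines with the Python API; something like the sketch below, where the dataset names are placeholders and I haven’t double-checked every detail.)

from prodigy.components.db import connect

# Pull the existing complete examples out of the database, flag them as
# gold-standard, and save them to a new dataset. Dataset names are made up.
db = connect()
examples = db.get_dataset("gold_ner")
for eg in examples:
    eg["no_missing"] = True
db.add_dataset("gold_ner_no_missing")
db.add_examples(examples, datasets=["gold_ner_no_missing"])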

Next task for me is to see whether I can reach the same accuracy training my NER model in Prodigy as I do with spaCy. I took my old, complete, “accept” data, generated an equal amount of “reject” data, and chucked it into ner.batch-train, and only got around 70% accuracy (I had an F1 score of ~95% in spaCy, on somewhat homogeneous generated data). But that’s something I should probably read more about and play around with before asking for help.
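(Roughly, the kind of thing I mean by generating “reject” data is sketched below: copy each gold example and give the span a deliberately wrong label. The label list is just an illustration, not my real schema.)

import copy
import random

# Made-up label list, purely for the sketch.
LABELS = ["ORG", "PERSON", "LOC"]

def make_reject(example):
    # Copy a gold example, swap each span label for a wrong one,
    # and mark the task as rejected.
    eg = copy.deepcopy(example)
    for span in eg.get("spans", []):
        span["label"] = random.choice([l for l in LABELS if l != span["label"]])
    eg["answer"] = "reject"
    return eg

accept = {
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    "answer": "accept",
}
print(make_reject(accept))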

"Gold Standard" dataset as evaluation for ner.batch-train with binary annotation?
(Einar Bui Magnusson) #4

@honnibal @ines, could you please confirm whether this is the right syntax to specify an example as no_missing?

{
    "text": "Apple updates its analytics service with new metrics",
    "label": "CORRECT",
    "spans": [{
        "start": 0,
        "end": 5,
        "label": "ORG"
    }],
    "no_missing": true
}

Thanks!

(Matthew Honnibal) #5

Yes, that looks right to me. Does it seem to do the right thing when you try it?