Does ner.manual require negative examples?

In addition to my main question, I’m hoping to understand how to build a good workflow for my problem.

I have a new tag I’d like to train (a particular set of names, companies and persons only) in documents. I’ve split a few of those documents into sentences and I’m feeding them into ner.manual.
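For context, the command I’m running looks roughly like this (dataset and file names are placeholders, and the exact arguments may differ across Prodigy versions):

```
# Placeholder dataset/file names; the labels are my new tags
prodigy ner.manual my_ner_data en_core_web_sm ./sentences.jsonl --label COMPANY,PERSON
```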

As I understood ner.teach, it’s good to have a somewhat equal balance between positive and negative examples. How does this work for ner.manual?

My guess is that the workflow should be like so:

  • ner.manual
  • ner.teach based on the manual tags from before (see the sketch below)
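In commands, I imagine it would look roughly like this (dataset and path names are made up, and the training recipe depends on the Prodigy version):

```
# 1) Manually annotate complete sentences
prodigy ner.manual my_ner_data en_core_web_sm ./sentences.jsonl --label COMPANY,PERSON

# 2) Train a first model on those manual annotations
#    (ner.batch-train in older versions; the train recipe in newer ones)
prodigy ner.batch-train my_ner_data en_core_web_sm --output ./model-v1 --label COMPANY,PERSON

# 3) Collect binary annotations with the updated model in the loop
prodigy ner.teach my_ner_data_teach ./model-v1 ./sentences.jsonl --label COMPANY,PERSON
```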

Alternatively, I also tried this workflow:

  • terms.teach
  • ner.teach (see the sketch below)
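That looked roughly like this, with terms.to-patterns turning the accepted terms into match patterns for ner.teach (dataset names and seed terms below are just examples, and the exact arguments vary by version):

```
# 1) Build a terminology list from word vectors
prodigy terms.teach company_terms en_core_web_lg --seeds "Apple, Siemens, Deloitte"

# 2) Export the accepted terms as match patterns
prodigy terms.to-patterns company_terms ./company_patterns.jsonl --label COMPANY

# 3) Use the patterns to suggest candidates during ner.teach
prodigy ner.teach my_ner_data en_core_web_sm ./sentences.jsonl --label COMPANY --patterns ./company_patterns.jsonl
```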

But this one focused a lot more on person names than on company names, presumably because the word vectors are better trained on the former. I can’t be sure of that, though.

When you’re doing the ner.manual annotation, all of the tokens you don’t tag as entities are effectively “negative examples”. If the model predicts an entity over those tokens during training, the annotations tell it that prediction is wrong.
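For example, a completed ner.manual task might be saved like this (text and offsets made up for illustration):

```
{
  "text": "Siemens hired Maria Fischer last year.",
  "spans": [
    {"start": 0, "end": 7, "label": "COMPANY"},
    {"start": 14, "end": 27, "label": "PERSON"}
  ],
  "answer": "accept"
}
```

Because the whole text was annotated, every token outside the two spans (“hired”, “last”, “year” and so on) is implicitly labelled as not part of any entity, which is exactly the negative signal the model needs.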

A problem would occur if you had binary annotations and they were all “accept”. In this case, the model knows the span being annotated is an entity, but it doesn’t know whether it’s the only entity in the sentence. So there would be no way for the model to learn from only these “accepts”, because it can’t see what isn’t an entity.
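Compare the binary case, again with made-up text:

```
{
  "text": "Siemens hired Maria Fischer last year.",
  "spans": [{"start": 0, "end": 7, "label": "COMPANY"}],
  "answer": "accept"
}
```

This only confirms that “Siemens” is a COMPANY. It says nothing about “Maria Fischer” or the rest of the sentence, so accepts alone never show the model which tokens are not entities. The rejected suggestions supply that missing signal, which is why the balance you mention matters for ner.teach but not for ner.manual.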
