I recorded a new video about frequently asked Prodigy questions (inspired by the forum), and some more general advice about structuring annotation projects, designing label schemes and setting up NLP pipelines.
Very nice!
About the accept or reject: what if your model only detects one named entity in the feed but it should have found two? Is that a reject as well?
If you're doing binary annotation (e.g. using ner.teach), you're only giving feedback on that particular entity span that's highlighted. It will only ever show you one at a time, so if that one entity is correct, then it should be an accept. The model is only updated with that information. (Btw, my slides here show an example of how the updating with incomplete information works.)
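To make the binary feedback concrete, an accepted vs. a rejected task looks roughly like this (a simplified sketch with a made-up text and label; real ner.teach tasks also carry extra metadata like hashes and token indices):

# Simplified sketch of binary annotation tasks (hypothetical text and label)
accepted = {
    "text": "I flew to Berlin last week.",
    "spans": [{"start": 10, "end": 16, "label": "GPE"}],
    "answer": "accept",  # this one highlighted span is correct
}
rejected = {
    "text": "I flew to Berlin last week.",
    "spans": [{"start": 0, "end": 1, "label": "GPE"}],  # "I" is not a GPE
    "answer": "reject",  # only says this particular span is wrong, nothing else
}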
Great. Btw, I wanted to create new training data by using the Matcher first. So I produced ~10k examples with multi-word entities, e.g.
[{
    'text': 'This is an example of a Big Volcano',
    'entities': [(24, 35, 'MY_LABEL')],
}, ...]
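Roughly, the matching step looked like this (a simplified sketch with a placeholder pattern, assuming the spaCy v2 Matcher API; not my exact script):

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # only the tokenizer is needed to build the data
matcher = Matcher(nlp.vocab)
matcher.add("MY_LABEL", None, [{"IS_TITLE": True}, {"LOWER": "volcano"}])

examples = []
for text in ["This is an example of a Big Volcano"]:
    doc = nlp(text)
    entities = []
    for match_id, start, end in matcher(doc):
        span = doc[start:end]
        entities.append((span.start_char, span.end_char, "MY_LABEL"))
    if entities:
        examples.append({"text": text, "entities": entities})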
Then I loaded en_core_web_sm with ner disabled (I want a fresh NER model), added a new ner to the pipeline and started training. After training I thought I would go ahead and use ner.make-gold, but the results were disappointing.
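Roughly, the pipeline setup looked like this (a simplified sketch, assuming the spaCy v2 API; not my exact script):

import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner"])  # keep tagger/parser, drop the pretrained NER
ner = nlp.create_pipe("ner")                         # fresh, blank NER component
nlp.add_pipe(ner, last=True)
ner.add_label("MY_LABEL")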
My question is: is this approach wrong for training a fresh NER model?
Your approach doesn't sound bad, but I can imagine a number of details that could go wrong. When you say the results were disappointing, were you mostly finding it proposed too many entities, or too few? Did you have an evaluation so you could measure accuracy? Also, how did you train? It sounds like you wrote your own script, which is fine, but there could be a couple of small things that might have gone wrong, e.g. no minibatching, not shuffling between epochs, etc.
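For reference, the minibatching and shuffling I mean would look roughly like this in a spaCy v2-style training loop (a minimal self-contained sketch on a blank pipeline, not a drop-in replacement for your script):

import random
import spacy
from spacy.util import minibatch

# Data in the {"text": ..., "entities": [...]} format from above
raw_examples = [{"text": "This is an example of a Big Volcano",
                 "entities": [(24, 35, "MY_LABEL")]}]

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("MY_LABEL")

# Convert to spaCy v2's (text, annotations) training format
train_data = [(eg["text"], {"entities": eg["entities"]}) for eg in raw_examples]

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(train_data)              # reshuffle between epochs
    losses = {}
    for batch in minibatch(train_data, size=8):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
    print(epoch, losses)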