✨ VIDEO: FAQ #1: Tips & tricks for NLP, annotation & training with Prodigy and spaCy


(Ines Montani) #1

I recorded a new video about frequently asked Prodigy questions (inspired by the forum), and some more general advice about structuring annotation projects, designing label schemes and setting up NLP pipelines :smiley:


(Nicolai Bjerre Pedersen) #3

Very nice!

About accept vs. reject: what if your model only detects one named entity in the feed but it should have found two? Is that a reject as well?


(Ines Montani) #4

If you’re doing binary annotation (e.g. using ner.teach), you’re only giving feedback on the particular entity span that’s highlighted. It will only ever show you one at a time, so if that one entity is correct, then it should be an accept. The model is only updated with that information. (Btw, my slides here show an example of how the updating with incomplete information works.)
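To make the "one span at a time" idea concrete, here is a minimal sketch of what a set of binary answers does and does not say. It assumes tasks shaped like Prodigy's JSON records (`text`, `spans`, `answer`); the helper `summarize_binary_feedback` and the example sentence are made up for illustration, and this is not how Prodigy or spaCy performs the update internally.

```python
def summarize_binary_feedback(tasks):
    """Map each annotated span to True (accepted) or False (rejected).

    Anything that is not in the returned dict is simply unknown:
    a binary answer says nothing about the rest of the text.
    """
    known = {}
    for task in tasks:
        answer = task.get("answer")
        if answer not in ("accept", "reject"):
            continue  # e.g. "ignore" carries no information
        for span in task.get("spans", []):
            key = (task["text"], span["start"], span["end"], span["label"])
            known[key] = answer == "accept"
    return known


text = "Apple hired Tim Cook"
tasks = [
    # The highlighted span is correct, so this is an accept,
    # even though the model did not suggest "Tim Cook" here.
    {"text": text, "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"},
    # "Tim" alone is not the full entity, so this suggestion is a reject.
    {"text": text, "spans": [{"start": 12, "end": 15, "label": "PERSON"}], "answer": "reject"},
]

known = summarize_binary_feedback(tasks)
```

In this sketch the accepted `ORG` span is a known positive and the rejected `PERSON` span a known negative, while the full "Tim Cook" span (12 to 20) never appears in `known` at all: it stays an open question rather than counting against the accept.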


(Nicolai Bjerre Pedersen) #5

Great. Btw, I wanted to create new training data by using the Matcher first, so I produced ~10k examples of multi-word entities, e.g.

[{
  'text': 'This is an example of a Big Volcano', 
  'entities': [(23, 35, 'MY_LABEL'), ]
}, ...]
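With generated data like this, a quick sanity check on the character offsets can be worthwhile, since spans that don't line up with token boundaries typically can't be used as-is for NER training. The sketch below is plain Python and only assumes the `(start, end, label)` format shown above; `check_offsets` is a hypothetical helper name.

```python
def check_offsets(example):
    """Return (start, end, label, covered_text, is_clean) for each entity.

    A span counts as clean here if the slice is non-empty and has no
    leading or trailing whitespace. This is only a rough proxy for
    "aligned with token boundaries".
    """
    text = example["text"]
    report = []
    for start, end, label in example["entities"]:
        covered = text[start:end]
        is_clean = bool(covered) and covered == covered.strip()
        report.append((start, end, label, covered, is_clean))
    return report


example = {
    "text": "This is an example of a Big Volcano",
    "entities": [(23, 35, "MY_LABEL")],
}

for start, end, label, covered, is_clean in check_offsets(example):
    print(start, end, label, repr(covered), "ok" if is_clean else "check this span")
```

For the example as posted, the slice `text[23:35]` is `' Big Volcano'` with a leading space, whereas `text[24:35]` is `'Big Volcano'`. If the real data has the same off-by-one, that alone could explain weak results.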

Then I loaded en_core_web_sm with ner disabled (I want a fresh NER model), added a new ner to the pipeline and started training. After training I thought I would go ahead and use ner.make-gold but results were disappointing.

My question is: is this approach wrong for training a fresh NER model?


(Matthew Honnibal) #6

Your approach doesn’t sound bad, but I can imagine a number of details that could go wrong. When you say the results were disappointing, were you mostly finding it proposed too many entities, or too few? Did you have an evaluation so you could measure accuracy? Also, how did you train? It sounds like you wrote your own script, which is fine, but there could be a couple of small things that might have gone wrong, e.g. no minibatching, not shuffling between epochs, etc.
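For reference, here is a rough sketch of the two loop details mentioned above (shuffling between epochs and minibatching) plus a simple span-level evaluation. It is library-agnostic on purpose: `update` is a placeholder callback that, in a spaCy script, would wrap the call to `nlp.update`, and the names `train`, `minibatch` and `span_prf` as well as the default values are assumptions for illustration only.

```python
import random


def minibatch(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def train(examples, update, n_epochs=10, batch_size=8, seed=0):
    """Call `update(batch)` on shuffled minibatches for each epoch."""
    rng = random.Random(seed)
    examples = list(examples)  # copy, so the caller's list is not reordered
    for _ in range(n_epochs):
        rng.shuffle(examples)  # new order every epoch
        for batch in minibatch(examples, batch_size):
            update(batch)


def span_prf(gold, predicted):
    """Precision, recall and F-score over exact (start, end, label) matches."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Usage: record the batches instead of updating a real model.
seen_batches = []
train(range(10), seen_batches.append, n_epochs=2, batch_size=4)

# One correct prediction plus one spurious one.
p, r, f = span_prf(
    gold=[(24, 35, "MY_LABEL")],
    predicted=[(24, 35, "MY_LABEL"), (0, 4, "MY_LABEL")],
)
```

Running `span_prf` on held-out examples also answers the "too many or too few" question directly: low precision means the model proposes too many entities, low recall means it misses too many.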