I recorded a new video about frequently asked Prodigy questions (inspired by the forum), and some more general advice about structuring annotation projects, designing label schemes and setting up NLP pipelines.
Very nice!
About the accept or reject: what if your model only detects one named entity in the feed but it should have found two? Is that a reject as well?
If you're doing binary annotation (e.g. using ner.teach), you're only giving feedback on that particular entity span that's highlighted. It will only ever show you one at a time, so if that one entity is correct, then it should be an accept. The model is only updated with that information. (Btw, my slides here show an example of how the updating with incomplete information works.)
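To make the binary feedback concrete, an accepted vs. a rejected task looks roughly like this (a simplified sketch with a made-up text and label; real ner.teach tasks also carry extra metadata like hashes and token indices):

# Simplified sketch of binary annotation tasks (hypothetical text and label)
accepted = {
    "text": "I flew to Berlin last week.",
    "spans": [{"start": 10, "end": 16, "label": "GPE"}],
    "answer": "accept",  # this one highlighted span is correct
}
rejected = {
    "text": "I flew to Berlin last week.",
    "spans": [{"start": 0, "end": 1, "label": "GPE"}],  # "I" is not a GPE
    "answer": "reject",  # only says this particular span is wrong, nothing else
}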
Great. Btw, I wanted to create new training data by using the Matcher first. So I produced ~10k examples with multi-word entities, e.g.
[{
    'text': 'This is an example of a Big Volcano',
    'entities': [(24, 35, 'MY_LABEL')],
}, ...]
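Roughly, the matching step looked like this (a simplified sketch with a placeholder pattern, assuming the spaCy v2 Matcher API; not my exact script):

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # only the tokenizer is needed to build the data
matcher = Matcher(nlp.vocab)
matcher.add("MY_LABEL", None, [{"IS_TITLE": True}, {"LOWER": "volcano"}])

examples = []
for text in ["This is an example of a Big Volcano"]:
    doc = nlp(text)
    entities = []
    for match_id, start, end in matcher(doc):
        span = doc[start:end]
        entities.append((span.start_char, span.end_char, "MY_LABEL"))
    if entities:
        examples.append({"text": text, "entities": entities})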
Then I loaded en_core_web_sm with ner disabled (I want a fresh NER model), added a new ner to the pipeline and started training. After training I thought I would go ahead and use ner.make-gold, but the results were disappointing.
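Roughly, the pipeline setup looked like this (a simplified sketch, assuming the spaCy v2 API; not my exact script):

import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner"])  # keep tagger/parser, drop the pretrained NER
ner = nlp.create_pipe("ner")                         # fresh, blank NER component
nlp.add_pipe(ner, last=True)
ner.add_label("MY_LABEL")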
My question is: is this approach wrong for training a fresh NER model?
Your approach doesn't sound bad, but I can imagine a number of details that could go wrong. When you say the results were disappointing, were you mostly finding it proposed too many entities, or too few? Did you have an evaluation so you could measure accuracy? Also, how did you train? It sounds like you wrote your own script, which is fine, but there could be a couple of small things that might have gone wrong, e.g. no minibatching, not shuffling between epochs, etc.
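For reference, the minibatching and shuffling I mean would look roughly like this in a spaCy v2-style training loop (a minimal self-contained sketch on a blank pipeline, not a drop-in replacement for your script):

import random
import spacy
from spacy.util import minibatch

# Data in the {"text": ..., "entities": [...]} format from above
raw_examples = [{"text": "This is an example of a Big Volcano",
                 "entities": [(24, 35, "MY_LABEL")]}]

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
ner.add_label("MY_LABEL")

# Convert to spaCy v2's (text, annotations) training format
train_data = [(eg["text"], {"entities": eg["entities"]}) for eg in raw_examples]

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(train_data)              # reshuffle between epochs
    losses = {}
    for batch in minibatch(train_data, size=8):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
    print(epoch, losses)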