NER annotation highlights nothing at the beginning of the sentence

Hi guys,

I am training my NER model, and from time to time I see a highlight at the beginning of the sentence that isn’t actually highlighting anything. For instance, here is what I copy-pasted from the text of the web UI for the CURRENT_EMPLOYER_ORG label I am labelling:

" CURRENT_EMPLOYER_ORGInterested in Nordstrom."

Could it be that a space (\s) is being suggested as a match for my label? In that case, I should reject it, right?

Second question:

The same thing is happening with the period at the end of the sentence. I get about ten tasks in a row with just the period highlighted (all of which I reject), then finally a few proper examples (most of which I reject as well, but at least the highlights are words or phrases), then another ten or so highlighting the last character (usually a period). The scores for these are around 0.40 too. Thoughts?

Last question: when Prodigy is serving up tasks, does it always make a predicted NER tag? What if there is really no match, or the prediction is very low confidence (I assume that shows up as a low score in the UI)? Will it just pick the tagged tokens with the greatest score, even if that score is intolerably low and below any threshold we would consider a confident match/prediction?

Thanks for your help

I’ll answer the last question first, since I think it sheds light on the other issues:

Yes, even if the predictions are very low, the model will eventually ask something. The reason is that we can’t be sure the model is well calibrated: it might be predicting scores without much relationship to the truth. In this situation, we don’t want it to get “stuck”. We want to always have a way to get out of the problem and into a better situation.
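
To make that concrete, here’s a minimal sketch of the kind of uncertainty sorter that avoids getting stuck. This is illustrative only, not Prodigy’s actual implementation, and the function name and parameters are just assumptions for the example:

```python
def prefer_uncertain(scored_stream, smoothing=0.9, slack=0.05):
    """Yield examples whose uncertainty is at least the recent average.

    scored_stream yields (score, example) pairs, with score in [0, 1].
    """
    avg = 0.5  # running estimate of the stream's typical uncertainty
    for score, example in scored_stream:
        # 1.0 when the model is maximally unsure (score 0.5),
        # 0.0 when it is fully confident (score 0.0 or 1.0).
        uncertainty = 1.0 - 2.0 * abs(score - 0.5)
        # Exponential moving average: a long run of confident (or
        # uniformly terrible) scores drags the bar down, so the sorter
        # eventually starts emitting again instead of getting stuck.
        avg = smoothing * avg + (1.0 - smoothing) * uncertainty
        if uncertainty >= avg - slack:
            yield example
```

The point is just the adaptive threshold: even if every score in the stream is around 0.05, the bar sinks to match, so you always get asked something rather than nothing.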

That’s why the predictions are sometimes strange, including predictions of punctuation, spaces, etc. The model doesn’t start off knowing that these things are never entities, so you’ll sometimes get asked about them. And yes, rejecting those suggestions is exactly right: each rejection teaches the model that spans like these aren’t entities.
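
If those punctuation and whitespace suggestions get tedious before the model learns better, you could also pre-filter the stream yourself. Here’s a hedged sketch, assuming tasks follow the usual `{"text": ..., "spans": [...]}` format with character offsets; the `skip_trivial_spans` helper is hypothetical, not a built-in:

```python
import string

JUNK = set(string.whitespace + string.punctuation)

def skip_trivial_spans(stream):
    """Drop tasks whose suggested spans are only whitespace/punctuation.

    Hypothetical helper, not a Prodigy built-in.
    """
    for task in stream:
        spans = task.get("spans", [])
        span_texts = [task["text"][s["start"]:s["end"]] for s in spans]
        # Keep the task unless every suggested span is junk.
        if span_texts and all(set(t) <= JUNK for t in span_texts):
            continue
        yield task
```

Bear in mind that rejecting these suggestions in the UI is itself useful training signal, so filtering them out trades a bit of that signal for less tedium.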