Does the spaCy NER model use POS tags for modelling?

Hi guys,

Quick question… I am training a new entity type (similar to ORG; it is actually the “currently employed org” for staffing recruiters typing notes like “Carl is currently at BoA” or “John is working at Home Depot”…)

The training is going well… but the NER often stumbles when picking up the full “phrase” of the company name, mostly because the recruiters don’t use capitalization. For instance, in “John is working at Home depot but really wants to move on”… NER does well with “Home Depot”, but not so much with “Home depot”.

I would like NER to capture the full phrase “Home depot” (which of course I am labelling and correcting through ner.make-gold). But… if I were forced to build a rule-based system, I have found a distinct pattern (through repeated labelling): there is often a conjunction or other POS-transition word that ends the company phrase (like the “but” in my Home depot example above). Some others: “then/during/…”. These almost never appear as part of a company name (how many companies do you know named “Home but”?).
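For reference, here is a minimal sketch of that rule as post-processing of the model’s predicted span: grow the span to the right until a boundary word or boundary POS tag appears. It assumes tokens already carry Universal POS tags (as spaCy’s `token.pos_` would provide); the stop lists, the `max_extra` cap and the function name are hypothetical, and the example sentence is hand-tagged for illustration.

```python
# Hypothetical boundary lists: words and Universal POS tags that, in the
# recruiters' notes, tend to end a company-name phrase.
STOP_WORDS = {"but", "then", "during", "and", "or", "since", "until"}
STOP_POS = {"CCONJ", "SCONJ", "ADP", "AUX", "VERB", "PUNCT"}


def extend_company_span(tokens, start, end, max_extra=3):
    """Grow a predicted entity span to the right until a boundary token.

    tokens:    list of (text, pos) pairs, e.g. from spaCy's token.text / token.pos_
    start/end: token offsets of the model's predicted span (end is exclusive)
    max_extra: cap on how many tokens may be added, so a missing boundary
               word can't swallow the rest of the sentence
    """
    limit = min(len(tokens), end + max_extra)
    while end < limit:
        text, pos = tokens[end]
        if text.lower() in STOP_WORDS or pos in STOP_POS:
            break  # boundary found: the company phrase ends before this token
        end += 1
    return start, end


# Hand-tagged example sentence (tags are illustrative, not model output).
tokens = [
    ("John", "PROPN"), ("is", "AUX"), ("working", "VERB"), ("at", "ADP"),
    ("Home", "PROPN"), ("depot", "NOUN"), ("but", "CCONJ"), ("really", "ADV"),
    ("wants", "VERB"), ("to", "PART"), ("move", "VERB"), ("on", "ADP"),
]
# Suppose the model only predicted "Home" (token offsets 4..5).
start, end = extend_company_span(tokens, 4, 5)
print(" ".join(t for t, _ in tokens[start:end]))  # Home depot
```

The heuristic is deliberately crude: a trailing common noun that isn’t part of the name (e.g. “yesterday”) would still be absorbed, which is why the extension is capped.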

In short… just curious: does the spaCy modelling algorithm (which algorithm is it, BTW?) take POS into account, e.g. noun phrases or these conjunctions, when figuring out the correct boundaries? Is there any way to influence the model with rules (other than pre-processing the input or post-processing the results)?

Will the spaCy model pick up this word-transition boundary correctly without me pushing it?



You can find out more about the NER model here:

The short answer is: no, the NER model doesn’t use POS tags as features. However, you could use POS tags in a patterns file to have the model suggest noun phrases to annotate for you. On the other hand, I’m not sure it will be faster than using the ner.make-gold workflow.
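As an illustration of the patterns-file idea, the sketch below writes POS-based token patterns (spaCy Matcher syntax, one JSON object per line with `label` and `pattern` keys) to a JSONL file. The label name `CURRENT_ORG`, the file name and the two patterns are assumptions; POS patterns only match when the model processing the text assigns POS tags, and whether lowercase “depot” is tagged NOUN depends on that model.

```python
import json

# Hypothetical label name for the "current employer" entity.
LABEL = "CURRENT_ORG"

# Token patterns in spaCy Matcher syntax. They only match if the pipeline
# that processes the text assigns POS tags (i.e. the model includes a tagger).
PATTERNS = [
    # One or more proper nouns in a row, e.g. "Home Depot"
    [{"POS": "PROPN", "OP": "+"}],
    # A proper noun followed by a common noun, e.g. "Home depot" when the
    # lowercase "depot" is tagged NOUN (this depends on the model)
    [{"POS": "PROPN"}, {"POS": "NOUN"}],
]


def to_jsonl_lines(label, patterns):
    """Serialize patterns as one JSON object per line: {"label": ..., "pattern": ...}."""
    return [json.dumps({"label": label, "pattern": p}) for p in patterns]


def write_patterns(path, label, patterns):
    """Write the patterns to a JSONL file, one pattern per line."""
    with open(path, "w", encoding="utf8") as f:
        for line in to_jsonl_lines(label, patterns):
            f.write(line + "\n")
```

Calling `write_patterns("current_org_patterns.jsonl", LABEL, PATTERNS)` produces a file that could be passed to a recipe that accepts patterns, such as `ner.teach` with `--patterns`. These patterns are broad (the first also matches names like “John”), so they only suggest candidates; you still accept or reject each one.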

Would it make sense to add a multitask objective that inserts the POS as a feature?


You can add a multi-task objective to train the model with POS information, yes. Effectively you’re making the CNN layers jointly predict the POS and the other objective, which is sort of equivalent to using POS features.
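To make “jointly predict” concrete, here is a toy illustration in plain Python, not spaCy’s actual implementation: one shared layer (standing in for the CNN layers) feeds both an NER output layer and a POS output layer, and the training objective is the sum of the two losses. The function names, the single linear layers and the `aux_weight` parameter are assumptions for illustration.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, gold):
    # Negative log-probability of the gold class.
    return -math.log(softmax(scores)[gold])


def linear(weights, features):
    # One output per weight row: the dot product of that row with the features.
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def multitask_loss(token_vector, shared_w, ner_w, pos_w, ner_gold, pos_gold, aux_weight=1.0):
    """Toy joint objective: one shared layer feeds an NER head and a POS head."""
    shared = linear(shared_w, token_vector)  # stands in for the shared CNN layers
    ner_loss = cross_entropy(linear(ner_w, shared), ner_gold)
    pos_loss = cross_entropy(linear(pos_w, shared), pos_gold)
    # Both losses depend on `shared`, so minimizing their sum pushes the shared
    # layer to encode information useful for predicting POS as well as entities.
    return ner_loss + aux_weight * pos_loss
```

Setting `aux_weight=0` recovers the plain NER objective, which is the “multi-task off” case.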

My experience with multi-task objectives has mostly been negative, though. They usually give me less than 0.5% improvement in accuracy, so I leave them off by default. Still, they should be easy to try!