Currently I use spaCy only for lemmatization/parsing of colloquial language, without word vector capabilities. I have a problem with the POS tagging. My code
tokens = [tok.lemma_.lower().strip() for tok in doc if tok.pos_ != 'PRON']
recognizes “my”, “your”, etc. as PRON, but not “mine” or “yours”. My current hack is to replace all occurrences of “mine” with “my”, but it’s hardly elegant. (The word “mine” does not occur in the sense of “the explosive device” in my document.)
Once you’ve labelled some examples, you can run pos.batch-train and see if the tagger improves on these pronouns. Ideally, you also want to evaluate it against a representative set that includes a good mix of pronouns.
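Until a retrained tagger picks these words up, you could avoid rewriting the text by filtering on the token text as well as on pos_. This is only a rough sketch: the helper name content_lemmas and the word list EXTRA_PRONOUNS are made up for illustration, and the stand-in tokens are there only so the snippet runs without a model. With spaCy you would pass the Doc returned by your nlp object instead.

```python
from types import SimpleNamespace

# Assumed list of possessive pronouns the tagger may miss; extend as needed.
EXTRA_PRONOUNS = {"mine", "yours", "hers", "ours", "theirs"}


def content_lemmas(doc, extra_pronouns=EXTRA_PRONOUNS):
    """Return lowercased lemmas, skipping tokens tagged PRON
    and tokens whose text is in extra_pronouns."""
    return [
        tok.lemma_.lower().strip()
        for tok in doc
        if tok.pos_ != "PRON" and tok.text.lower() not in extra_pronouns
    ]


# Stand-in tokens exposing the same attributes as spaCy tokens
# (text, lemma_, pos_). "mine" is deliberately mis-tagged as NOUN.
fake_doc = [
    SimpleNamespace(text="That", lemma_="that", pos_="DET"),
    SimpleNamespace(text="book", lemma_="book", pos_="NOUN"),
    SimpleNamespace(text="is", lemma_="be", pos_="AUX"),
    SimpleNamespace(text="mine", lemma_="mine", pos_="NOUN"),
]

print(content_lemmas(fake_doc))  # ['that', 'book', 'be']
```

This leaves the original text untouched, so the lemmas and the parse of the surrounding tokens are not affected by a substitution. It is still a word-list workaround, though, and will not help with pronouns you have not listed.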