colloquial pronouns not labeled as pron

aph61 · January 29, 2019, 9:35am

Currently I use spaCy only for lemmatization/parsing of colloquial language, without word vector capabilities. I have a problem with the POS tagging. My code

tokens = [tok.lemma_.lower().strip() for tok in doc if tok.pos_ != 'PRON']

recognizes “my”, “your” etc. as PRON, but not “mine” or “your’s”. My current hack is to replace all occurrences of “mine” with “my”, but it’s hardly elegant. (The word “mine” does not occur as “the explosive device” in my document.)

Suggestions? Or should I just keep hardcoding?

thanks,

andreas

ines · January 29, 2019, 4:28pm

If your hard-coded solution works well, why not?
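If you stick with hard-coding, one option is to leave the text alone and extend the filter instead. Here's a rough sketch, not a definitive implementation: `EXTRA_PRONOUNS`, `is_pronoun` and `content_lemmas` are made-up names for illustration, and the word list is only a guess at what your data needs. It only assumes each token has `text`, `pos_` and `lemma_` attributes, as spaCy tokens do.

```python
# Hypothetical word list (an assumption): extend it to match your own data.
EXTRA_PRONOUNS = {"mine", "yours", "hers", "ours", "theirs"}


def is_pronoun(tok):
    """Return True if the tagger says PRON or the word is in the hard-coded list."""
    return tok.pos_ == "PRON" or tok.text.lower() in EXTRA_PRONOUNS


def content_lemmas(doc):
    """Collect lowercased lemmas of every token that is not treated as a pronoun."""
    return [tok.lemma_.lower().strip() for tok in doc if not is_pronoun(tok)]
```

You'd call it as `tokens = content_lemmas(doc)` in place of the list comprehension. This way the original text isn't modified, and the list of exceptions lives in one place. Forms like “your’s” may be split into several tokens by the tokenizer, so check how they come out before adding them to the list.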

But you could also try and improve the POS tagger, specifically the PRON label, on your data using the pos.teach recipe:

prodigy pos.teach your_dataset en_core_web_sm ./your_data.jsonl --label PRON

Once you’ve labelled some examples, you can run pos.batch-train and see if it improves. Ideally, you also want to evaluate it against a representative set that includes a good mix of pronouns.
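For a quick check on just the PRON label, something like the following could work. It's a minimal sketch under assumptions: `pron_precision_recall` is a made-up helper, and it expects two aligned lists of coarse-grained tags, one predicted by the model and one from your own hand-labelled evaluation set.

```python
def pron_precision_recall(predicted, gold, label="PRON"):
    """Precision and recall for one tag over two aligned lists of POS tags."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == label and g == label for p, g in pairs)  # correctly tagged
    fp = sum(p == label and g != label for p, g in pairs)  # tagged PRON by mistake
    fn = sum(p != label and g == label for p, g in pairs)  # missed pronouns
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Low recall would point to pronouns like “mine” still being missed, while low precision would mean the updated model now over-predicts PRON.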
