How to do POS tagging with this tool?

westofpluto · December 13, 2019, 12:01am

I have downloaded the en_core_web_lg model and I have my own input text data (JSONL) from a new domain (medical/clinical). Now I want to use Prodigy to train a POS tagger that works well on my domain, which of course includes new words that were probably not in the data the original model was trained on. As far as I can tell, your documentation does not describe how to do this.

What command do I need to run to do this? There seems to be no pos.manual recipe. The closest I found is pos.teach, but that doesn't seem to show what I want. I was expecting to see a set of possible tags in the interface so that I can select a word in my text and apply a tag to it.

Please describe how to use this tool to do POS tagging for a new domain like this.

ines · December 13, 2019, 9:55am

Hi! There's no built-in pos.manual, because it's kinda rare that you want to do part-of-speech labelling entirely from scratch – and if you do, you can also use ner.manual, which pretty much does the same thing and lets you highlight tokens manually.
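
To make the ner.manual route a bit more concrete, here is a rough sketch of how manually highlighted single-token spans could be read back as (token, tag) pairs. The task layout ("text", "tokens", "spans" with "token_start" / "token_end") follows Prodigy's span annotation format as I understand it; the example sentence, the labels and the helper name spans_to_tags are made up for illustration.

```python
# Hypothetical example task, shaped like a Prodigy span annotation
# where every highlighted span covers exactly one token.
task = {
    "text": "Patient denies dyspnea",
    "tokens": [
        {"text": "Patient", "start": 0, "end": 7, "id": 0},
        {"text": "denies", "start": 8, "end": 14, "id": 1},
        {"text": "dyspnea", "start": 15, "end": 22, "id": 2},
    ],
    "spans": [
        {"start": 0, "end": 7, "token_start": 0, "token_end": 0, "label": "NOUN"},
        {"start": 8, "end": 14, "token_start": 1, "token_end": 1, "label": "VERB"},
        {"start": 15, "end": 22, "token_start": 2, "token_end": 2, "label": "NOUN"},
    ],
}


def spans_to_tags(task, default=None):
    """Return (token text, label) pairs; unannotated tokens get `default`."""
    tags = {}
    for span in task.get("spans", []):
        # POS labels apply to single tokens, so reject multi-token spans.
        if span["token_start"] != span["token_end"]:
            raise ValueError("span covers more than one token: %r" % span)
        tags[span["token_start"]] = span["label"]
    return [(tok["text"], tags.get(tok["id"], default)) for tok in task["tokens"]]


print(spans_to_tags(task))
```

Tokens you did not highlight come back with the default value, so partially annotated examples stay distinguishable from fully tagged ones.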

I'd recommend starting with pos.make-gold and your model, and focus on the most relevant POS tags. The model will pre-highlight the tags it predicts for the incoming text, and you can correct them or add annotations it's missing. Even if the model isn't doing great on your domain, there are still a lot of decisions it'll make that are correct, so you won't have to replicate them all by hand. Seeing the predictions visualized also gives you a better sense of what the model is struggling with.
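
As a minimal sketch of what that could look like on the command line, the snippet below only assembles and prints the command. The argument order (dataset, model, source) and the --label option are how I remember the recipe being called, so please check them against prodigy pos.make-gold --help for your version. The dataset name pos_clinical, the file path and the three labels are placeholders, not recommendations.

```python
import shlex


def make_gold_command(dataset, model, source, labels):
    """Build a pos.make-gold call (assumed layout: dataset, model, source, --label)."""
    argv = [
        "prodigy",
        "pos.make-gold",
        dataset,            # hypothetical dataset name to save annotations to
        model,              # model whose predictions get pre-highlighted
        source,             # path to the JSONL input (placeholder)
        "--label",
        ",".join(labels),   # restrict the session to a few POS tags
    ]
    # shlex.join (Python 3.8+) quotes arguments where the shell needs it.
    return shlex.join(argv)


print(make_gold_command(
    "pos_clinical",
    "en_core_web_lg",
    "./clinical_notes.jsonl",
    ["NOUN", "VERB", "ADJ"],
))
```

Keeping the label list short matches the advice to focus on the most relevant tags first; you can run further sessions with other labels into the same dataset later.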

You probably want to pick a few labels to start with that are both easy to annotate and have the biggest impact. Also, a little tip in case you haven't seen it: since you'll only be annotating single tokens, you can also double-click on the token to highlight it in the UI (instead of clicking and dragging).
