I have downloaded the en_core_web_lg model and I have my own input JSONL text data in a new domain (medical/clinical). Now I want to use Prodigy to train a POS tagger that works well on my domain (which of course includes new words that were probably not in the original trained model). As near as I can tell, your documentation does not describe how to do this.
What command do I need to run to do this? There seems to be no pos.manual recipe. The closest I found is pos.teach, but that doesn't seem to show what I want. I was expecting to see a set of possible tags in the interface so that I can select a word in my text and apply that tag to that word.
Please describe how to use this tool to do POS tagging for a new domain like this.
Hi! There's no built-in pos.manual, because it's kinda rare that you want to do part-of-speech labelling entirely from scratch. If you do, you can use ner.manual with your POS tags passed in as the labels: it pretty much does the same thing and lets you highlight tokens manually.
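In case it helps, here's a rough sketch of how the ner.manual route could be put together. The dataset name `pos_clinical`, the file name `clinical.jsonl` and the three tags are placeholders I made up, so swap in your own. As far as I know the model is only used for tokenization in this recipe, so nothing will be pre-highlighted.

```python
import shlex

# Placeholder values: swap in your own dataset name, input file and tags.
dataset = "pos_clinical"
model = "en_core_web_lg"  # ner.manual uses the model to tokenize the text
source = "clinical.jsonl"
labels = ["NOUN", "VERB", "ADJ"]  # illustrative subset of POS tags

# prodigy ner.manual <dataset> <model> <source> --label <comma-separated labels>
manual_cmd = ["prodigy", "ner.manual", dataset, model, source, "--label", ",".join(labels)]

# Print the command so it can be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in manual_cmd))
```

The labels you pass in are the ones that show up as selectable options above the text in the UI.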
I'd recommend starting with pos.make-gold and your model, focusing on the most relevant POS tags. The model will pre-highlight the tags it predicts for the incoming text, and you can correct them or add annotations it's missing. Even if the model isn't doing great on your domain, there are still a lot of decisions it'll make that are correct, so you won't have to replicate them all by hand. Seeing the predictions visualized also gives you a better sense of what the model is struggling with.
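Here's a minimal sketch of that workflow, under a few assumptions: the two sentences, the file name `clinical_sample.jsonl`, the dataset name `pos_clinical` and the label subset are all made up for illustration. Prodigy reads JSONL with one object per line and the text under a `"text"` key. Whether the recipe expects coarse or fine-grained tag names depends on your Prodigy version and model, so it's worth checking `prodigy pos.make-gold --help` before settling on labels.

```python
import json
import shlex
from pathlib import Path

# Made-up example sentences: one JSON object per line, text under the "text" key.
examples = [
    {"text": "Patient denies chest pain or dyspnea."},
    {"text": "Start metoprolol 25 mg PO BID."},
]
source = Path("clinical_sample.jsonl")
source.write_text("\n".join(json.dumps(eg) for eg in examples) + "\n", encoding="utf-8")

# Start with a small subset of tags (illustrative; check which tag scheme the recipe expects).
labels = ["NOUN", "VERB"]

# prodigy pos.make-gold <dataset> <model> <source> --label <comma-separated labels>
gold_cmd = ["prodigy", "pos.make-gold", "pos_clinical", "en_core_web_lg",
            str(source), "--label", ",".join(labels)]

# Print the command so it can be pasted into a terminal.
print(" ".join(shlex.quote(part) for part in gold_cmd))
```

Restricting `--label` to a couple of tags keeps each example quick to review, and you can rerun the recipe with other tags later and save to the same dataset.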
You probably want to pick a few labels to start with that are both easy to annotate and have the biggest impact. Also, a little tip in case you haven't seen it: since you'll only be annotating single tokens, you can also double-click on the token to highlight it in the UI (instead of clicking and dragging).