lemmas in the annotation workflow

Hi all,

Since spaCy now has a trainable lemmatizer, I was wondering how lemmatization is currently handled in Prodigy.

Let's say I have data with POS, dependency and lemma annotations that was prepared with spaCy, and I want to correct the POS tags using Prodigy. Will the lemmas be kept after exporting the corrected data to JSON?



Hi Jacobo.

You can re-use spaCy in our pos.correct recipe, and the resulting data can be used to train a new spaCy model. However, I'm not aware of a lemmatization interface. Did you create a custom one?

With regard to training models, it's good to be aware that the train recipe in Prodigy supports POS tagging and dependency parsing via --tagger and --parser, but it does not currently offer an interface for lemmatization.

Some details on train

You can pass a base model to train via --base-model, which gives you a pipeline as a starting point to train on. This includes the pre-trained weights, but your trained model will also contain all the pipeline components from the original model. That means that right now, if you fine-tune using --base-model, you would also get a lemmatizer in the pipeline.

Use spaCy directly

In your case, it sounds like you want something more custom than what Prodigy's train command offers, so I'd recommend using spaCy directly. You can use the spacy-config recipe to generate a config file as a starting point, and you might also find some inspiration in the example projects on this GitHub repo. I also found this example that demonstrates a trainable lemmatizer, which might be of interest.

Does this help? Feel free to ask extra questions!


Thanks for your answer. What I would like to do is use Prodigy to correct some CoNLL-U files that I generated with my own spaCy models (greCy). To be more precise, I want to correct dependencies, POS, and tags, but I'm afraid that the lemmatization will be lost when importing the data into Prodigy. I will have to look into this in more detail and possibly develop my own recipe.
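
As a possible starting point for such a custom recipe, here is a minimal sketch of one way to carry lemmas through a correction round. It reads one CoNLL-U sentence into a task dict with "text", "tokens" and token-aligned "spans" for the POS labels, and stores each lemma as an extra "lemma" key on its token. The function names conllu_to_task and merge_lemmas are made up for illustration, and the extra "lemma" key is an assumption: it is worth checking on a small export that custom token keys come back unchanged before relying on this.

```python
from typing import Dict, List


def conllu_to_task(block: str) -> Dict:
    """Convert one CoNLL-U sentence block into a task dict that keeps lemmas."""
    tokens: List[Dict] = []
    spans: List[Dict] = []
    text = ""
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        form, lemma, upos, misc = cols[1], cols[2], cols[3], cols[9]
        ws = "SpaceAfter=No" not in misc.split("|")
        start = len(text)
        end = start + len(form)
        i = len(tokens)
        # "lemma" is a custom key (assumption: it is carried through untouched).
        tokens.append({"text": form, "start": start, "end": end,
                       "id": i, "ws": ws, "lemma": lemma})
        spans.append({"start": start, "end": end,
                      "token_start": i, "token_end": i, "label": upos})
        text += form + (" " if ws else "")
    if tokens:
        tokens[-1]["ws"] = False  # trailing whitespace is stripped below
    return {"text": text.rstrip(), "tokens": tokens, "spans": spans}


def merge_lemmas(task: Dict) -> List[Dict]:
    """Pair each token's (possibly corrected) POS label with its stored lemma."""
    pos_by_token = {s["token_start"]: s["label"] for s in task.get("spans", [])}
    return [
        {"text": t["text"], "pos": pos_by_token.get(t["id"]), "lemma": t.get("lemma")}
        for t in task["tokens"]
    ]


sample = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])
task = conllu_to_task(sample)
rows = merge_lemmas(task)
```

After the corrected examples are exported, merge_lemmas can be run over each record to recombine the corrected POS labels with the original lemmas. If the custom key turns out not to survive, the same lookup could be done against the original CoNLL-U file by token index instead.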


