lemmas in the annotation workflow

jcbmyrstn · April 5, 2023, 4:26am

Hi all,

Since spaCy now has a trainable lemmatizer, I was wondering how lemmatization is currently handled in Prodigy.

Let's say I have data with POS, dependency, and lemma annotations that was prepared with spaCy, and I want to correct the POS tags using Prodigy. Will the lemmata be kept after exporting the corrected data to JSON?

Thanks,

Jacobo

koaning · April 6, 2023, 1:08pm

Hi Jacobo.

You can re-use spaCy in our pos.correct recipe, and the resulting data can be used to train a new spaCy model. However, I'm not aware of a lemmatization interface. Did you create a custom one?

With regard to training models, it's good to be aware that the train recipe in Prodigy supports POS/dependency annotations via --tagger and --parser, but it does not currently support lemmatization.

Some Details on `train`

You can pass a base model to train via --base-model, and this gives you a pipeline to use as a starting point for training. This includes the pre-trained weights, but your trained model will also contain all the pipeline components from the original model. That means that right now, if you fine-tune using --base-model, you also get the base model's lemmatizer in the resulting pipeline.

Use spaCy directly

In your case, it sounds like you want something more custom/specific than what Prodigy's train command offers, so I'd recommend using spaCy directly. You can use the spacy-config recipe to generate a config file to use as a starting point, and you might also find some inspiration in the example projects on this GitHub repo. I also found this example that demonstrates a trainable lemmatizer, which might be of interest.

Does this help? Feel free to ask extra questions!

jcbmyrstn · April 7, 2023, 11:05pm

Hi,

Thanks for your answer. What I would like to do is use Prodigy to correct some CoNLL-U files that I generated with my own spaCy models (greCy). To be more precise, I want to correct dependencies, POS, and tags, but I'm afraid that the lemmatization will be lost when importing the data into Prodigy. I will have to look into this in more detail and possibly develop my own recipe.

Thanks,

J.
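
For anyone following the custom-recipe route mentioned above, here is a minimal sketch of how CoNLL-U sentences might be turned into Prodigy-style tasks without dropping the lemmas. It uses only the standard library. The function name `conllu_sentence_to_task` and the extra per-token `lemma` key are hypothetical and not part of any Prodigy API. The sketch assumes that Prodigy passes unknown token keys through to the exported JSON, which should be verified on a small sample before relying on it.

```python
import json

# The ten tab-separated columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def conllu_sentence_to_task(block: str) -> dict:
    """Convert one CoNLL-U sentence block into a task dict (hypothetical helper)."""
    tokens, spans, text = [], [], ""
    for line in block.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        row = dict(zip(CONLLU_FIELDS, line.split("\t")))
        if not row["id"].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        # CoNLL-U marks a missing trailing space with SpaceAfter=No in MISC.
        has_space = "SpaceAfter=No" not in row.get("misc", "_").split("|")
        index = len(tokens)
        start = len(text)
        end = start + len(row["form"])
        tokens.append({
            "text": row["form"], "start": start, "end": end,
            "id": index, "ws": has_space,
            # Assumption: an extra key that is simply carried along per token.
            "lemma": row["lemma"],
        })
        # One single-token span per token, labelled with the UPOS tag.
        spans.append({"start": start, "end": end, "token_start": index,
                      "token_end": index, "label": row["upos"]})
        text += row["form"] + (" " if has_space else "")
    return {"text": text.rstrip(), "tokens": tokens, "spans": spans}


# Made-up example sentence for illustration only.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
])

if __name__ == "__main__":
    # One JSON object per line (JSONL) is the usual input format for tasks.
    print(json.dumps(conllu_sentence_to_task(SAMPLE)))
```

After annotation, the lemmas could be read back from the exported examples with something like `[t["lemma"] for t in task["tokens"]]`, provided the tokenization was not changed in between. The same pattern would work for carrying heads and dependency labels alongside the tokens.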
