Export POS tags

yllwpr · May 28, 2022, 2:30pm

I have annotated a dataset with ner.manual, trained a model and annotated more data using ner.correct.

Is it now possible to also get the POS tags for the tokens using db-out?

koaning · May 30, 2022, 8:22am

Just to confirm, have you been using ner.manual and ner.correct to label part of speech tags that you'd now like to export? Or are you interested in exporting the labelled entities together with the POS information that spaCy might predict?

In the former case, db-out will return all annotations.

In the latter case, you'll need to use a spaCy model to attach the POS information after exporting the data via db-out. Alternatively, you could train a pipeline that uses en_core_web_md (assuming you use English) as a starting point. That pipeline can already predict POS tags, so once it has also learned the entities that you've labelled, you'll have a single model that predicts both.
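As a rough sketch of the first option (attaching POS after the export): the code below assumes each record written by db-out has a "text" field and a "spans" list with character offsets "start" and "end" plus a "label". The helper names read_jsonl, tag_with_spacy and pos_for_spans are made up for illustration and are not part of Prodigy or spaCy.

```python
import json


def read_jsonl(path):
    """Read one JSON record per line, as written by db-out."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def tag_with_spacy(nlp, text):
    """Return (start_char, end_char, pos) for every token the pipeline produces."""
    doc = nlp(text)
    return [(t.idx, t.idx + len(t), t.pos_) for t in doc]


def pos_for_spans(record, tagged_tokens):
    """Collect the POS tags of the tokens that fall inside each entity span.

    Assumption: spans carry character offsets ("start", "end") and a "label".
    """
    results = []
    for span in record.get("spans", []):
        pos = [
            p
            for (start, end, p) in tagged_tokens
            if start >= span["start"] and end <= span["end"]
        ]
        results.append(
            {
                "text": record["text"][span["start"]:span["end"]],
                "label": span["label"],
                "pos": pos,
            }
        )
    return results
```

With a loaded pipeline, something like pos_for_spans(record, tag_with_spacy(nlp, record["text"])) could then be called for every record returned by read_jsonl. Matching on character offsets rather than token indices keeps this usable even if the tagging model tokenizes slightly differently.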

yllwpr · May 30, 2022, 9:48am

I want to export the labeled entities including the POS tags. If I use en_core_web_md, are its default entities included as well? If I want to determine the POS tags of the entities afterwards, I have to make sure the tokenization is the same. I trained on blank:en and used the default tokenizer.

koaning · May 30, 2022, 1:20pm

The English models should all be using the same tokenizer unless you've customised it.
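If you want to double-check that, a small sketch like this could compare the token offsets stored in an exported record with the offsets a model produces. It assumes the record has a "tokens" list whose entries carry "start" and "end" character offsets; tokenization_mismatches is a hypothetical helper name.

```python
def tokenization_mismatches(record, model_offsets):
    """Return exported tokens whose (start, end) offsets the model did not produce.

    record: one exported annotation with a "tokens" list (assumed format).
    model_offsets: iterable of (start_char, end_char) pairs from the model.
    """
    produced = set(model_offsets)
    return [
        tok
        for tok in record.get("tokens", [])
        if (tok["start"], tok["end"]) not in produced
    ]
```

The model offsets could come from [(t.idx, t.idx + len(t)) for t in nlp.tokenizer(record["text"])]. An empty result means the two tokenizations agree for that record.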

Here's how I might go about it.

prodigy train --ner <datasetname> --base-model en_core_web_md --lang en <folder-out>

I'm using the train command here that uses the en_core_web_md pipeline as a starting point. Once the model is done training, I can load it.

import spacy

# Load the trained model
nlp = spacy.load("<folder-out>/model-best/")

# Run the model
doc = nlp("do you speak Python")

# Confirm the POS tags
print([t.pos_ for t in doc])
# ['AUX', 'PRON', 'VERB', 'PROPN']

# Confirm the entities
print(doc.ents)
# (Python,)

This way, you re-use the POS tagger from the base model while adding the NER component trained on your Prodigy labels. Note that the POS tags are statistical predictions, though, so they will be wrong once in a while.
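If the goal is a flat export, a minimal sketch like the following could turn the pipeline output into one row per token. Token.text, Token.pos_ and Token.ent_type_ are regular spaCy token attributes; token_rows and rows_to_tsv are names made up for this example.

```python
import csv
import io


def token_rows(doc):
    """One row per token: text, predicted POS, entity label ("" outside entities)."""
    return [(t.text, t.pos_, t.ent_type_) for t in doc]


def rows_to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["token", "pos", "entity"])
    writer.writerows(rows)
    return buf.getvalue()
```

For example, print(rows_to_tsv(token_rows(nlp("do you speak Python")))) would list each token next to its predicted POS tag and entity label.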
