Initializing a custom model for NER

I am working on training a new entity type and have been following the demonstration videos. I'm currently trying to use word embeddings created with FastText in Gensim. I initialized my model like this:

!python3 -m spacy init-model en /tmp/vectors --vectors-loc dispo_vectors.txt
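
For context, dispo_vectors.txt is in the plain word2vec text format that init-model expects. It was exported from Gensim roughly like this (the model filename here is just a placeholder):

from gensim.models import FastText

# load the trained FastText model and write its vectors out in the
# plain word2vec text format that `spacy init-model` can read
model = FastText.load('dispo_fasttext.model')  # placeholder filename
model.wv.save_word2vec_format('dispo_vectors.txt')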

Then I used Prodigy to generate a terminology list, which I converted to a set of patterns stored as a JSONL file. I want to train the model on my text summaries, which are saved in plain-text (txt) format. Here is the code that I am running:

!prodigy ner.teach opioids_ner /tmp/vectors text_summaries.txt --loader txt --label OPIOIDS --patterns opioid_patterns.jsonl
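
The patterns file has one JSON object per line, roughly like this (these two terms are just examples):

{"label": "OPIOIDS", "pattern": [{"lower": "oxycodone"}]}
{"label": "OPIOIDS", "pattern": [{"lower": "hydrocodone"}]}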

This produces the following error:

KeyError: "[E001] No component 'ner' found in pipeline. Available names: ['sentencizer']"

I understand that this is due to the pipeline not having an ner component. I found that the fix is to add it, which I did using the following:

nlp = spacy.load('/tmp/vectors')
nlp.add_pipe(nlp.create_pipe('ner'))
nlp.to_disk('unigram-empty_ner')

This creates a new directory called unigram-empty_ner, containing a meta.json file and two subdirectories, ner and vocab. I assumed that I could now just load the model using something like:

vectors_ner_added = spacy.load('unigram-empty_ner')

And then replace the original model (/tmp/vectors) with vectors_ner_added:

!prodigy ner.teach opioids_ner vectors_ner_added text_summaries.txt --loader txt --label OPIOIDS --patterns opioid_patterns.jsonl

But, obviously, that doesn't work, since vectors_ner_added is a Python variable, not a model name or path. Any guidance would be greatly appreciated.

Thanks

Hi! In theory, you can just add the component exactly like you did, but you also need to call nlp.begin_training() to initialize the component's weights before saving the output to a directory (in spaCy v3, this would be nlp.initialize()). Prodigy supports loading a model from a package name or from a path, so you don't need to call spacy.load in Python: you can just pass your directory path unigram-empty_ner as the spaCy model when you start Prodigy. (If you're in a notebook, this is maybe a bit less intuitive, but keep in mind that the prodigy commands are CLI commands, not Python code, so they're executed in a different context than your Python code.)
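
Putting it together, something along these lines should work, assuming spaCy v2.x (which matches the init-model and create_pipe usage above):

import spacy

# load the vectors-only model, add a blank ner component,
# initialize its weights and save everything to a directory
nlp = spacy.load('/tmp/vectors')
nlp.add_pipe(nlp.create_pipe('ner'))
nlp.begin_training()
nlp.to_disk('unigram-empty_ner')

And then, from the command line (or with ! in a notebook):

!prodigy ner.teach opioids_ner unigram-empty_ner text_summaries.txt --loader txt --label OPIOIDS --patterns opioid_patterns.jsonl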

Btw, since the drug prediction video is a bit older and Prodigy now has a bunch of additional workflows: instead of doing a "cold start" with ner.teach (using a model that doesn't know anything and trying to teach it enough with patterns so it can make suggestions), you could also try collecting a small dataset of semi-manual annotations using ner.manual plus your patterns. This can give you a more reliable start, because you can make sure that the model sees enough examples of your entity type, and enough texts where it sees the correct answer for every single token.
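
For example, something like this (the dataset name opioids_manual is just a placeholder, and you can reuse the model directory and patterns file from above):

!prodigy ner.manual opioids_manual unigram-empty_ner text_summaries.txt --loader txt --label OPIOIDS --patterns opioid_patterns.jsonl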