Blank spacy model without being trained

(Anne) #1

Hi,

I would like to run ner.manual with a blank spaCy model, and I was wondering how to create one (completely empty, without any training). Or does the model need to be trained first?

I have tried things like:

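import spacy

model_nlp = spacy.blank('en')  # blank English pipeline (tokenizer only)
model_nlp.add_pipe(model_nlp.create_pipe('ner'))  # add an untrained ner component
model_nlp.begin_training()  # initialize the model weights
model_nlp.to_disk('model-ner-en-blank')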

But when I use this in the command I get this error:

FileNotFoundError: [Errno 2] No such file or directory: 'model-ner-en-blank/ner/tok2vec_model'

This is the command that I want to use:

prodigy ner.manual NerTagDB model-ner-en-blank AllSentences.jsonl --label "NSAID, COAG"

Thanks,
Anne

(Ines Montani) #2

Hi! If you only want to run ner.manual, you don’t even need the ner component in the pipeline. The model is only used for tokenization. So you can use a completely blank model, or even a pre-trained model, as long as it has the same language and tokenization rules.
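
As a minimal sketch of the "completely blank model" option: the snippet below creates a blank English pipeline with no ner component, saves it, reloads it, and tokenizes a sentence. The directory name and the example sentence are made up for illustration, and the spaCy version has to match the one your Prodigy installation uses.

```python
import tempfile
from pathlib import Path

import spacy

# A blank English pipeline: tokenizer and vocab only, no "ner" component.
nlp = spacy.blank("en")

# Save it to disk (a temporary directory here; the name is illustrative).
model_dir = Path(tempfile.mkdtemp()) / "model-en-blank"
nlp.to_disk(model_dir)

# Reload the saved model and tokenize a text. Tokenization is the only
# thing ner.manual needs from the model.
reloaded = spacy.load(model_dir)
doc = reloaded("Patient takes ibuprofen daily.")
tokens = [token.text for token in doc]
print(reloaded.pipe_names, tokens)
```

The saved directory can then be passed to ner.manual in place of a model name.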

Your code does seem okay, though, for creating a blank model with a blank entity recognizer. Did you double-check that the directory model-ner-en-blank you’re referencing on the command line definitely exists? And if it does, can you check what’s in it?

(Anne) #3

Hi, thank you for your quick response.
Yes, the model-ner-en-blank directory exists. It contains two files, meta.json and tokenizer, and two folders, ner and vocab. The ner folder contains the files cfg, model and moves. The vocab folder contains key2row, lexemes.bin, strings.json and vectors.

When I run this:

prodigy ner.manual NerTagDB AllSentences.jsonl --label "NSAID, COAG"

I get the following error:

OSError: [E053] Could not read meta.json from AllSentences.jsonl / meta.json

Or is this not what you meant by “you don’t need the ner component”? Did you mean that I can use, for example, en_core_web_sm?

When I run the following:

prodigy ner.manual NerTagDB model-ner-en-blank AllSentences.jsonl --label "NSAID, COAG"

I get the error again:

FileNotFoundError: [Errno 2] No such file or directory: 'model-ner-en-blank/ner/tok2vec_model'

What I want to do with ner.manual is tag two entities (NSAID and COAG) manually, and then probably use ner.teach and/or ner.batch-train with the resulting annotations.

(Anne) #4

Hi,

We have succeeded in making a blank spaCy model. The strange thing is that it works with an older version of spaCy, 2.0.18, and not with the version I had, 2.1.3.
With the older version, the files that are needed are created. And now it works with this command:

prodigy ner.manual NerTagDB model-ner-en-blank AllSentences.jsonl --label "NSAID, COAG"
(Ines Montani) #5

Ahh, this explains a lot. Glad you got it working! And yes, Prodigy currently uses spaCy 2.0. Models between spaCy 2.0 and 2.1 aren’t compatible, which is likely why you were seeing this error.

It’s also the reason we’re still working on testing spaCy v2.1 with Prodigy before we release the new Prodigy version that depends on spaCy v2.1 (see this thread for details). Once that’s out, everyone will need to retrain their models, so we need to make sure everything works as expected.
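
If you want to check which spaCy version a saved model was created with, here is a rough sketch that reads the model's meta.json. It assumes the file has a "spacy_version" entry. The helper name read_model_spacy_version is hypothetical, and the demo writes a stand-in meta.json with made-up values so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def read_model_spacy_version(model_dir):
    """Return the "spacy_version" entry from a model's meta.json, or None."""
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    return meta.get("spacy_version")


# Demo with a stand-in meta.json; the values below are made up.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "meta.json").write_text(
    json.dumps({"lang": "en", "spacy_version": ">=2.1.3"}), encoding="utf8"
)
recorded = read_model_spacy_version(demo_dir)
print(recorded)
```

Pointing the helper at a real model directory such as model-ner-en-blank and comparing the result with spacy.__version__ in the environment Prodigy runs in should show whether the two line up.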

Also, just to clarify:

Yes, I meant that you could also use the en_core_web_sm model. Its tokenization rules will be the same as the tokenization rules of the blank model, and the tokenizer is all ner.manual needs. (It pre-tokenizes the text to make it easier to highlight words because the selection can snap to the token boundaries. It also helps you spot tokenization issues and prevents you from blindly creating annotations that will never be “true” in real life because the tokenization doesn’t match the entities you highlight.)
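
To illustrate the idea of snapping a selection to token boundaries, here is a simplified sketch. It is not Prodigy's actual implementation: it uses plain whitespace splitting as a stand-in for the spaCy tokenizer, and the function names are hypothetical.

```python
import re


def token_spans(text):
    """Character offsets of whitespace-separated tokens (tokenizer stand-in)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text, start, end):
    """Expand a character selection so it covers whole tokens only."""
    overlapping = [(s, e) for s, e in token_spans(text) if s < end and e > start]
    if not overlapping:
        return None  # the selection touched no token, e.g. only whitespace
    return overlapping[0][0], overlapping[-1][1]


text = "Patient takes ibuprofen daily"
# A sloppy selection that starts and ends in the middle of "ibuprofen".
snapped = snap_to_tokens(text, 16, 20)
print(text[snapped[0]:snapped[1]])
```

With real tokenization rules, the same principle means that every highlighted span lines up with tokens the entity recognizer can actually predict.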