de_core_news_sm label question

(Uwe Steiner) #1

Hi,
we want to use de_core_news_sm.

The following command works:
python -m prodigy ner.manual text_prodigy_KISME_09042019 de_core_news_sm data/text_prodigy_KISME_09042019.jsonl --exclude text_prodigy_KISME_09042019 --label ORG

The following command does not work:
python -m prodigy ner.manual text_prodigy_KISME_09042019 de_core_news_sm data/text_prodigy_KISME_09042019.jsonl --exclude text_prodigy_KISME_09042019 --label ORG, COMPANY

But we need two labels for annotation. Which labels are allowed?

Prodigy version 1.6.1
spaCy version 2.0.16

Thanks,
Uwe


(Ines Montani) #2

I first thought you were asking about the different annotation schemes – but I think the solution might be much simpler. Try replacing this:

--label ORG, COMPANY

with this:

--label ORG,COMPANY

Spaces on the command line usually separate arguments and values – if a value contains spaces, there’s no way to tell where the value of the label argument ends, so everything after the space is read as a separate argument. You can also put the labels in quotation marks, like --label "ORG, COMPANY".
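
To make the splitting visible, here is a minimal sketch that uses Python's standard shlex module to tokenize the three variants roughly the way a POSIX shell would. The split_labels helper is hypothetical and only illustrates how a comma-separated value could be turned into a list of labels; it is an assumption for this example, not Prodigy's actual code.

```python
import shlex


def split_labels(value):
    # Hypothetical helper (an assumption, not Prodigy's implementation):
    # split a comma-separated value and strip surrounding whitespace.
    return [label.strip() for label in value.split(",") if label.strip()]


# Unquoted with a space: the value is cut off at the space, so "COMPANY"
# arrives as a separate, unexpected argument.
broken = shlex.split("--label ORG, COMPANY")

# No space: the whole label list arrives as a single value.
fixed = shlex.split("--label ORG,COMPANY")

# Quoted: the space is preserved inside a single value.
quoted = shlex.split('--label "ORG, COMPANY"')

print(broken)  # ['--label', 'ORG,', 'COMPANY']
print(fixed)   # ['--label', 'ORG,COMPANY']
print(quoted)  # ['--label', 'ORG, COMPANY']
print(split_labels(quoted[1]))  # ['ORG', 'COMPANY']
```

In the first case the program sees three tokens instead of two, which is why the command with the space fails while the other two forms work.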

Btw, just for completeness: if you’re running the ner.manual recipe, you’ll be labelling by hand anyway and the model is only used for tokenization. So the labels that are already in the model won’t matter at this step. However, if your plan is to update an existing pre-trained model, you probably want to use labels that are consistent with the ones it already predicts. You can find more details on the label schemes used in spaCy’s pre-trained models in the spaCy documentation.


(Uwe Steiner) #3

Thanks a lot,

prodigy ner.manual text_prodigy_KISME_09042019 de_core_news_sm data/text_prodigy_KISME_09042019.jsonl --exclude text_prodigy_KISME_09042019 --label ORG,COMPANY

works fine.

And yes, we use ner.manual because we are just starting out. As soon as we have enough data to train a model, we will switch to the other method.

Kind regards
Uwe


(Uwe Steiner) #4

And by the way, you are very fast with your answers. Perfect service, thanks again.
