combined labelling for NER and Classification purposes

pooja · October 9, 2019, 2:15pm

Hello,
I used ner.manual with 10 labels. Of those 10, a couple are for classification and the rest are for NER. I have several questions after annotating with ner.manual.

First of all, is it okay to combine labelling for both purposes? If so, I have the questions below:
Do I need to delete classificationlabel1 and classificationlabel2 from the "spans" section before running ner.batch-train, or is it enough not to mention those two labels in the --label option of ner.batch-train?
Is there an easy way to convert Prodigy JSONL to IOB format so I can feed it to non-spaCy models?

ines · October 14, 2019, 9:44am

You can use the same source data, but it's not required. I'd also recommend using separate datasets for your NER and text classification annotations, to avoid conflicts and make it easier to run separate experiments. You'll still be training separate model components and you might want to run them differently. Or maybe it turns out that you need slightly different annotations for the components to achieve better results.

When you labelled the text categories, did you use spans to do that? If so, yes – the NER model will be trained using the "spans" and expects them to be named entities. If you want to train a text classifier, you typically want to have one text and a top-level "label" and not labelled spans in the text.
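If the categories were annotated as spans, one way to separate them afterwards is sketched below. This is only an illustration under stated assumptions: the label names are the placeholders from the question, `split_example` is a hypothetical helper rather than part of Prodigy, and carrying over the "answer" field into the text classification records is an assumption about what your training setup expects.

```python
import copy

# Placeholder label names taken from the question; replace with your own.
CATEGORY_LABELS = {"classificationlabel1", "classificationlabel2"}


def split_example(example, category_labels=CATEGORY_LABELS):
    """Split one annotated record into an NER record and textcat records.

    Hypothetical helper: assumes `example` has "text" and a "spans" list
    whose items carry a "label" key.
    """
    spans = example.get("spans", [])

    # NER record: same content, but only the genuine entity spans remain.
    ner_example = copy.deepcopy(example)
    ner_example["spans"] = [s for s in spans if s["label"] not in category_labels]

    # Textcat records: one text with a top-level "label" per category found.
    found = sorted({s["label"] for s in spans if s["label"] in category_labels})
    textcat_examples = [
        {
            "text": example["text"],
            "label": label,
            # Assumption: reuse the original decision, defaulting to "accept".
            "answer": example.get("answer", "accept"),
        }
        for label in found
    ]
    return ner_example, textcat_examples
```

The two outputs could then be written to separate datasets, so the NER and text classification experiments stay independent.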

Prodigy's output gives you the original text and the character offsets into the text. This should let you write converters for any common format you need. You can also use spaCy's biluo_tags_from_offsets helper to convert character offsets to token-based BILUO tags.
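As a rough sketch of such a converter, the snippet below turns records into token-per-line IOB text without depending on spaCy. It assumes each record also contains a "tokens" list with "text", "start" and "end" keys alongside the "spans", which is worth checking in your own export. The function names, the tab separator and the choice to skip records whose "answer" is not "accept" are illustrative assumptions.

```python
import json


def spans_to_iob(example):
    """Return (token_text, tag) pairs for one record.

    Assumes `example["tokens"]` items have "text", "start" and "end",
    and `example["spans"]` items have "start", "end" and "label".
    """
    tokens = example["tokens"]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        # Indices of tokens lying entirely inside the span's character range.
        inside = [
            i
            for i, tok in enumerate(tokens)
            if tok["start"] >= span["start"] and tok["end"] <= span["end"]
        ]
        for n, i in enumerate(inside):
            tags[i] = ("B-" if n == 0 else "I-") + span["label"]
    return [(tok["text"], tag) for tok, tag in zip(tokens, tags)]


def jsonl_to_iob(lines):
    """Convert an iterable of JSONL lines into blank-line-separated IOB text."""
    blocks = []
    for line in lines:
        if not line.strip():
            continue
        example = json.loads(line)
        # Assumption: only accepted annotations should be exported.
        if example.get("answer", "accept") != "accept":
            continue
        pairs = spans_to_iob(example)
        blocks.append("\n".join(f"{text}\t{tag}" for text, tag in pairs))
    return "\n\n".join(blocks)
```

Spans that do not line up with token boundaries are only partially tagged by this sketch, so it is worth checking the output against a few known examples.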

pooja · October 17, 2019, 9:10am

@ines thanks a lot for the clarity on NER vs. classification labelling.
I have another question. My input is a document with a lot of paragraphs, and I sometimes missed labelling entities. I am now editing the dataset to cover the missing annotations. Is the model affected if I accidentally miss tagging a lot of entities in the document?

ines · October 18, 2019, 9:55am

Yes, if your data is inconsistent, your model may produce significantly worse results. During training, you're essentially asking it to come up with a strategy that's consistent with the training data – and a strategy based on wrong or inconsistent annotations may not generalise that well.
