Evaluating multi-class classification

JanP · November 14, 2019, 12:17pm

Hi team!

I have managed to annotate enough data for each of my labels to reach an F-score of around 0.8 with batch-train.
Next, I would like to merge all the label-specific datasets, train a multi-class classifier (non-exclusive labels), and evaluate it against a gold-standard evaluation set.

1] According to both the docs and this post:

Trouble creating evaluation set with textcat.eval

I’d suggest starting with the mark recipe, which takes a stream, an optional label and the name of an annotation interface, and will show you whatever comes in, in exactly that order. So you can do something like this: pass in your label wonen and annotate whether the label applies to the text or not:
prodigy mark your_eval_set your_data.jsonl --label wonen --view-id classification

the mark recipe is recommended for creating evaluation datasets.

Why is the mark recipe preferred over the textcat.manual recipe? What is the actual difference between the two?

2] Is there any way I could use textcat.eval for multi-class classification on data annotated with the choice view, or does textcat.eval only work for single-label evaluation? I only care whether the correct label is among those assigned to a text...

Thank you!

Best regards,
Jan

honnibal · November 15, 2019, 1:47pm

Hi @JanP,

The textcat.manual is newer than that comment, which is why Ines didn't mention it there. I think you're probably right that textcat.manual is the way to go for you.

Regarding the textcat.eval, the recipe is very simple, so you should be able to adjust it for your needs quite easily. All it really does is run and print the evaluation at the end of the recipe, in the on_exit callback. There's a helper method prodigy.components.preprocess.convert_options_to_cats that converts from the options format used by the choice interface into the cats dictionary used by the text classification models. If you add this transformation in the on_exit callback, the recipe should work on the choice data.
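
To make the two formats concrete, here is a minimal sketch of the kind of conversion involved. It is not Prodigy's source: options_to_cats is a hypothetical stand-in written for illustration, and it assumes the usual choice-style task with an "options" list and an "accept" list of selected option IDs. The real helper may handle more cases (for example exclusive labels or rejected tasks).

```python
def options_to_cats(example):
    """Hypothetical illustration, not Prodigy's implementation.

    Takes one choice-style task and returns a copy with a "cats" dict
    that maps every option ID to True (selected) or False (not selected).
    """
    selected = set(example.get("accept", []))
    cats = {opt["id"]: opt["id"] in selected for opt in example.get("options", [])}
    return {**example, "cats": cats}


# A made-up task in the choice format, with two of three options selected
example = {
    "text": "Some text",
    "options": [
        {"id": "SPORTS", "text": "Sports"},
        {"id": "POLITICS", "text": "Politics"},
        {"id": "TECH", "text": "Tech"},
    ],
    "accept": ["SPORTS", "POLITICS"],
    "answer": "accept",
}

converted = options_to_cats(example)
print(converted["cats"])
```

Note that in this sketch a text with several selected labels stays a single example with several positive entries in "cats"; it is not split into one example per label.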

JanP · November 17, 2019, 4:25pm

Thank you @honnibal !

How should the results of this version of textcat.eval be interpreted? If only one label out of two is predicted correctly, will this count as Correct or Incorrect?

JanP · November 19, 2019, 1:39pm

@honnibal

It seems that I am confused about what textcat.eval does. The docs describe it as "Evaluating a trained text classification model, create an evaluation set."
I was hoping that after creating the evaluation set using textcat.manual, I could then use textcat.eval to see how my model trained using textcat.batch-train fares on unseen examples.

1] How can I use it just for the evaluation part?
Then, I suppose that the [source] argument is the labeled evaluation set, but why does it start a server for more annotation?

2] I have just tried using the prodigy.components.preprocess.convert_options_to_cats in the same way as it is used in textcat.batch-train - right after it connects to the db

def on_exit(ctrl):
    examples = ctrl.db.get_dataset(dataset)
    data = dict(model.evaluate(convert_options_to_cats(examples)))
    print(printers.tc_result(data))

However, this seems to result in an empty stream.

Thank you in advance for any suggestions!

ines · November 19, 2019, 10:38pm

The idea of textcat.eval is to create an evaluation set and evaluate your model on it. If you already have an evaluation set created with textcat.manual, all you should have to do is load the data, load the model and then call model.evaluate and print the results. So you don't need the recipe to return the dictionary of components and start the annotation server with data. You just want to run a quick script that loads the model and evaluates it against your data.

So basically, something like this:

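from prodigy.components.db import connect
from prodigy.models.textcat import TextClassifier
from prodigy.components.preprocess import convert_options_to_cats
from prodigy.components import printers
import spacy

db = connect()  # uses the database settings from your Prodigy config
dataset = "YOUR_EVAL_DATASET"  # name of the dataset holding your evaluation annotations
label = "YOUR_LABEL"
# Only the text classifier is needed here, so the other components are disabled
nlp = spacy.load("/your_model", disable=["tagger", "parser", "ner"])
model = TextClassifier(nlp, label)
examples = db.get_dataset(dataset)
# Convert the choice-style annotations to the "cats" format before evaluating
data = dict(model.evaluate(convert_options_to_cats(examples)))
print(printers.tc_result(data))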

JanP · November 20, 2019, 4:28pm

Hi @ines!

I see! Thanks for this!

Since I want to evaluate how many of the texts in the evaluation set got at least one correct label from my model (multi-class), should I simply replace "YOUR_LABEL" with the list of all my labels, or do I need to modify the code to make it work for multiple cats?

What exactly does convert_options_to_cats do? If a text labeled in the choice interface has several labels, does it just convert it into several examples, one for each assigned label?
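
For reference, the "at least one correct label" criterion can also be checked independently of the recipe. The following is only a sketch under assumptions: any_label_hit_rate is a hypothetical helper, the 0.5 threshold is an arbitrary choice, and the gold and predicted values are made up. In practice the gold dicts would come from the annotated "cats" and the scores from something like spaCy's doc.cats.

```python
def any_label_hit_rate(gold_cats, predicted_scores, threshold=0.5):
    """Hypothetical helper: share of texts where at least one predicted
    label (score >= threshold) is also a gold label."""
    hits = 0
    for gold, scores in zip(gold_cats, predicted_scores):
        gold_labels = {label for label, value in gold.items() if value}
        pred_labels = {label for label, score in scores.items() if score >= threshold}
        if gold_labels & pred_labels:
            hits += 1
    return hits / len(gold_cats) if gold_cats else 0.0


# Made-up gold annotations and model scores for three texts
gold = [
    {"A": True, "B": True},    # predicted A only: counts as a hit
    {"A": False, "B": True},   # predicted A only: no overlap, a miss
    {"A": True, "B": False},   # predicted A and B: counts as a hit
]
pred = [
    {"A": 0.9, "B": 0.2},
    {"A": 0.7, "B": 0.1},
    {"A": 0.8, "B": 0.6},
]

rate = any_label_hit_rate(gold, pred)
print(f"{rate:.2f}")
```

This measure is lenient by design: extra wrong labels are not penalized, so it is worth reading alongside per-label precision and recall.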
