Evaluating multi-class classification

Hi team!

I have managed to annotate enough data for each of my labels to reach an F-score of around 0.8 with batch-train.
Next, I would like to merge all the label-specific datasets, train a multi-class classifier (non-exclusive labels), and evaluate it against a gold-standard evaluation set.

1] According to both the docs and this post, the mark recipe is recommended for creating evaluation datasets.

I wonder why the mark recipe is preferred over the textcat.manual recipe. What is the actual difference between the two?

2] Is there any way I could use textcat.eval for multi-class classification on data annotated using the choice view, or does textcat.eval only work for single-label evaluation? I only care whether the correct label is among those assigned to a text.

Thank you!

Best regards,
Jan

Hi @JanP,

The textcat.manual recipe is newer than that comment, which is why Ines didn't mention it there. I think you're probably right that textcat.manual is the way to go for you.

Regarding textcat.eval: the recipe is very simple, so you should be able to adjust it for your needs quite easily. All it really does is run and print the evaluation at the end of the recipe, in the on_exit callback. There's a helper function, prodigy.components.preprocess.convert_options_to_cats, that converts from the options format used by the choice interface into the cats dictionary used by the text classification models. If you add this transformation in the on_exit callback, the recipe should work on the choice data.
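
To make the idea of that transformation concrete, here is a minimal sketch in plain Python. It is not Prodigy's implementation: the function name options_to_cats is hypothetical, and it assumes each choice task stores the available labels under "options" (each with an "id") and the selected ids under "accept".

```python
def options_to_cats(task):
    """Hypothetical helper: map one choice-style task to a task with a "cats" dict.

    Assumes "options" is a list of {"id": ...} dicts and "accept" lists the
    ids the annotator selected. Not Prodigy's own implementation.
    """
    selected = set(task.get("accept", []))
    # Every offered option becomes a category: 1.0 if selected, 0.0 otherwise.
    cats = {opt["id"]: float(opt["id"] in selected) for opt in task.get("options", [])}
    # Keep the other fields (e.g. "text") and drop the choice-specific ones.
    converted = {k: v for k, v in task.items() if k not in ("options", "accept")}
    converted["cats"] = cats
    return converted


example = {
    "text": "Great battery, poor screen",
    "options": [
        {"id": "BATTERY", "text": "Battery"},
        {"id": "SCREEN", "text": "Screen"},
        {"id": "PRICE", "text": "Price"},
    ],
    "accept": ["BATTERY", "SCREEN"],
}
converted = options_to_cats(example)
print(converted["cats"])
```

In this sketch a text with several selected labels stays a single example with several positive categories. Whether the real helper behaves the same way should be checked against the Prodigy source.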

Thank you @honnibal !

How should the results of this version of textcat.eval be interpreted? If only one label out of two is predicted correctly, will this count as Correct or Incorrect?

@honnibal

It seems that I am confused about what textcat.eval does. The docs say: "Evaluating a trained text classification model, create an evaluation set."
I was hoping that after creating the evaluation set using textcat.manual, I could then use textcat.eval to see how my model trained using textcat.batch-train fares on unseen examples.

1] How can I use it just for the evaluation part?
Then, I suppose that the [source] argument is the labeled evaluation set, but why does it start a server for more annotation?

2] I have just tried using prodigy.components.preprocess.convert_options_to_cats in the same way as it is used in textcat.batch-train, right after it connects to the db:

def on_exit(ctrl):
    examples = ctrl.db.get_dataset(dataset)
    data = dict(model.evaluate(convert_options_to_cats(examples)))
    print(printers.tc_result(data))

However, this seems to result in an empty stream :thinking:

Thank you in advance for any suggestions!

The idea of textcat.eval is to create an evaluation set and evaluate your model on it. If you already have an evaluation set created with textcat.manual, all you should have to do is load the data, load the model and then call model.evaluate and print the results. So you don't need the recipe to return the dictionary of components and start the annotation server with data. You just want to run a quick script that loads the model and evaluates it against your data.

So basically, something like this:

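from prodigy.components.db import connect
from prodigy.models.textcat import TextClassifier
from prodigy.components.preprocess import convert_options_to_cats
from prodigy.components import printers
import spacy

db = connect()  # connect to the database configured for Prodigy
dataset = "YOUR_EVAL_DATASET"  # placeholder: name of the dataset created with textcat.manual
label = "YOUR_LABEL"
# load the trained model; only the text classifier is needed here
nlp = spacy.load("/your_model", disable=["tagger", "parser", "ner"])
model = TextClassifier(nlp, label)
examples = db.get_dataset(dataset)
# convert the choice-style annotations to the cats format, then evaluate
data = dict(model.evaluate(convert_options_to_cats(examples)))
print(printers.tc_result(data))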

Hi @ines!

I see! Thanks for this!

Since I want to evaluate how many of the texts in the evaluation set got at least one correct label from my model (multi-class), should I simply replace "YOUR_LABEL" with the list of all my labels, or do I need to modify the code to make it work for multiple cats?
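
To be explicit about the metric I have in mind, here is a minimal sketch in plain Python. It makes no Prodigy calls; the function name any_label_hit_rate is hypothetical, and it assumes the model's scores have already been turned into a set of predicted labels per text (for example by thresholding, which is not shown).

```python
def any_label_hit_rate(gold, predicted):
    """Fraction of texts where at least one predicted label is also a gold label.

    gold and predicted are parallel sequences of label collections, one per text.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    # A text counts as a hit when the two label sets share at least one label.
    hits = sum(1 for g, p in zip(gold, predicted) if set(g) & set(p))
    return hits / len(gold)


# Made-up labels purely for illustration.
gold = [{"A", "B"}, {"C"}, {"A"}, {"B", "C"}]
predicted = [{"A"}, {"A"}, {"A", "C"}, set()]
rate = any_label_hit_rate(gold, predicted)
print(rate)
```

Under this definition a text with two gold labels where only one is predicted still counts as correct, and extra wrong predictions are not penalized.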

What exactly does convert_options_to_cats do? If a text labeled in the choice interface has several labels, does it just convert it into several examples, one per assigned label?