Trouble creating evaluation set with textcat.eval

ines · August 11, 2018, 12:48pm

Hi and sorry about the per-post link limit! (It's mostly for spam bot protection, and since we've already had issues with spam bots before so I'm scared to turn off this setting )

textcat.eval is mostly intended as a quick tool for performing live evaluations of existing models. It lets you answer questions like "How would my model perform on this new data?", without having to create a full evaluation set first and then running a separate evaluation. Instead, the recipe uses the model, runs it over the new text and asks you whether the predictions are correct. This way way, Prodigy can immediately show you the results when you exit the session. Here's the link to the thread you mentioned, which explains this in more detail:

That said, the recipe is mostly intended for evaluations you do during development. Once you're ready to perform a full, repeatable and gold-standard evaluation, you usually want to create a new set manually. The nice thing is that if you use Prodigy's textcat.batch-train recipe, you can evaluate from the same binary annotation style – so a set of examples with "accept" and "reject"
annotations.

I'd suggest starting with the mark recipe, which takes a stream, and optional label and the name of an annotation interface, and will show you whatever comes in in exactly that order. So you can do something like this, pass in your label wonen and annotate whether the label applies to the text or not:

prodigy mark your_eval_set your_data.jsonl --label wonen --view-id classification

For each label you want to annotate, you can then start a new session over the same data, and add it to your evaluation set. It might sound unintuitive at first, but we've found that it's often faster and more efficient to make several passes over the data and annotate it once for each label. Your brain gets to focus on one label and concept at a time, you won't have to click as much (because you're only saying yes or no), and you'll end up with one binary decision for each label on each text, which you can evaluate your model on later.

(If you do want to solve the gold-standard annotation differently and do it all in one – for example, if you have too many labels – you could also create a custom evaluation recipe using the choice interface. The selected labels will then be stored like "accept": ["wonen"], so if you want to run your evaluation within Prodigy, you'll have to convert the data to the "label": "wonen" format. You can find an example of the recipe and workflow in the "Quickstart" section at the bottom of this page.)

Quick note, also in case others come across this thread later: It's important to keep in mind that spaCy's text classifier assumes that the categories are not mutually exclusive, so this will also be the basis of the built-in evaluation when you pass an --eval-id dataset to textcat.batch-train.

Let me have a look at this! The label should specify the category you want to annotate, and only predictions for that category should be shown. Prodigy should add those to the data as it comes in, so you shouldn't have to add any labels to the data in advance.

Topic		Replies	Views
Evaluating multi-class classification usage , textcat	5	610	November 20, 2019
Stream textcat.eval from stdin? textcat , done	2	660	April 21, 2019
textcat.eval use and Prodi.gy's evaluation workflow? textcat , solved	3	808	May 10, 2018
Help with textcat workflow usage , textcat , solved	3	641	August 13, 2021
evaluate text classification model using spacy evaluate? textcat , spacy , solved	2	735	February 14, 2020

Trouble creating evaluation set with textcat.eval

Related topics