Textcat model with multiple classes

ines · October 31, 2019, 10:33am

Hi! Glad to hear things have been going well so far.

Your current approach does indeed sound a bit complicated for what it is, and I'm sure there's an easier way to achieve the same result. Have you had a look at the textcat.manual recipe yet? It shows the text and options in a multiple-choice interface and the format you get out is directly compatible with textcat.teach.

A single annotation task in the choice format could look like this:

{
    "text": "Some text",
    "options": [
        {"id": "LABEL1", "text": "Label 1"},
        {"id": "LABEL2", "text": "Label 2"}
    ]
}

When you select an option, a key "accept" is added to the task and it holds a list of the selected IDs. For example: "accept": ["LABEL1", "LABEL2"]. You can also provide those when you load in the data to pre-select certain categories – e.g. based on your rules – and then correct them if needed.
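To make the pre-selection idea a bit more concrete, here's a minimal sketch of how you could generate tasks with "accept" already filled in from simple keyword rules. The `RULES` mapping, the keywords and the `make_task` helper are all made up for illustration, so you'd swap in your own rules and labels:

```python
import json

# Options shown in the multiple-choice interface (same shape as above)
OPTIONS = [
    {"id": "LABEL1", "text": "Label 1"},
    {"id": "LABEL2", "text": "Label 2"},
]

# Hypothetical keyword rules, purely for illustration
RULES = {
    "LABEL1": ["refund"],
    "LABEL2": ["delivery"],
}


def make_task(text):
    """Build one choice task and pre-select labels whose keywords match."""
    lowered = text.lower()
    accept = [
        label
        for label, keywords in RULES.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return {"text": text, "options": OPTIONS, "accept": accept}


def to_jsonl(texts):
    """Serialize tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(make_task(text)) for text in texts)
```

You could then write the output of `to_jsonl` to a file and use it as the input source, so the matched options show up already selected and you only need to correct them.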

For training, you might also consider training with spaCy directly – this gives you more flexibility and you get to tweak more settings, experiment with different architectures etc. See here for an example script.
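If you go down that route, you'd need to convert the choice annotations into something a text classifier can train on. Here's a rough sketch, assuming you want a `(text, {"cats": {...}})` tuple per example with one boolean per option. The `to_training_example` helper is hypothetical, and skipping tasks whose "answer" isn't "accept" is an assumption about how you'd want to treat rejected or ignored tasks:

```python
def to_training_example(task):
    """Convert one choice-format task into a (text, annotations) tuple.

    Returns None for tasks that weren't accepted (assumption: rejected
    or ignored tasks should not be used for training).
    """
    if task.get("answer", "accept") != "accept":
        return None
    selected = set(task.get("accept", []))
    # One boolean per option: True if the option was selected
    cats = {option["id"]: option["id"] in selected for option in task["options"]}
    return task["text"], {"cats": cats}


def to_training_data(tasks):
    """Convert a list of tasks, dropping the ones that were skipped."""
    converted = (to_training_example(task) for task in tasks)
    return [example for example in converted if example is not None]
```

Because every option gets an explicit True or False, this works for both mutually exclusive and non-exclusive categories.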

Datasets in Prodigy hold the annotations you collect. There's typically no need to import raw data before you annotate – this can all be done on the command line when you start the recipe.

Datasets are append-only so you'll never lose any state or data. So if you want to manually edit examples in an existing dataset, you should export it, edit it and then import it to a new dataset. This creates more data overall – but it means you'll always be able to recover the previous dataset. We recommend creating a new dataset for every annotation experiment, annotation type etc. Merging datasets later is easy – there's a db-merge command and each example has hashes that let you find all annotations on the same input text. You can also think of a dataset as one unit of data you'd run a particular experiment with.
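As a small illustration of how the hashes help when you work with exported data, here's a sketch that groups examples by their input hash so you can see all annotations made on the same text. It assumes the examples are already loaded as dicts and that the hash is stored under the "_input_hash" key; `group_by_input` is just a hypothetical helper name:

```python
from collections import defaultdict


def group_by_input(examples):
    """Group annotated examples by the hash of their input text."""
    grouped = defaultdict(list)
    for example in examples:
        # Assumption: each exported example carries an "_input_hash" key
        grouped[example["_input_hash"]].append(example)
    return dict(grouped)
```

Each key in the result then maps to every annotation collected for that input, regardless of which dataset it originally came from.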
