How to do multiclass textcat?

kwtrnka · September 15, 2017, 11:06pm

I think I’m not understanding something basic about the API. If I need to categorize text into 20 classes, do I need to make 20 different datasets? Or do I need to pretrain a spaCy model to randomly output those classes first?

ines · September 16, 2017, 9:48am

Prodigy supports annotating multiple classes or labels at once, so you can do something like:

prodigy textcat.teach my_dataset en_core_web_sm my_source.jsonl --label POLITICS,ECONOMY

You can always keep adding more examples of different labels to the same dataset. When you use the textcat.batch-train command, Prodigy will read all classes present in your dataset and train on them.
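To make the idea of "reading the classes from the dataset" concrete, here is a minimal sketch in plain Python. It assumes each annotation is a dict with `text`, `label` and `answer` keys (the shape used later in this thread). `collect_labels` is a hypothetical helper for illustration, not part of Prodigy's API, and the example texts are made up.

```python
from collections import Counter


def collect_labels(examples):
    """Count accept/reject decisions per label, skipping ignored examples."""
    counts = {}
    for eg in examples:
        if eg["answer"] == "ignore":
            continue  # ignored examples carry no training signal
        counts.setdefault(eg["label"], Counter())[eg["answer"]] += 1
    return counts


# Hypothetical annotations collected into one dataset over several sessions.
examples = [
    {"text": "Parliament votes on the budget", "label": "POLITICS", "answer": "accept"},
    {"text": "Stocks rally after rate cut", "label": "ECONOMY", "answer": "accept"},
    {"text": "Stocks rally after rate cut", "label": "POLITICS", "answer": "reject"},
    {"text": "Lorem ipsum", "label": "ECONOMY", "answer": "ignore"},
]

label_counts = collect_labels(examples)
labels = sorted(label_counts)
print(labels)
print({label: dict(c) for label, c in label_counts.items()})
```

The point is only that one dataset can hold decisions for any number of labels, and the set of classes can be recovered from the data itself.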

When using Prodigy for text classification, there’s no explicit need for the spaCy model to know the classes beforehand. Depending on the data you’re working with and the classes you want to annotate, it might make sense to start off with a terminology list, which you can bootstrap using the terms.teach recipe. The list could either cover all classes, or you could create one for each class (depending on the data and how fine-grained the categories are). If you haven’t seen it yet, check out the end-to-end example of training an insults classifier with Prodigy. The example only covers two classes (“insult” and “not insult”), but the same approach should work for a multi-class task as well.

Ultimately, it all comes down to experimenting with what works best on your data, and Prodigy can hopefully help with that.

Btw, a quick note on the annotation strategy: To make the most of the binary annotation UI, we generally recommend not annotating too many classes at once, especially if they’re very different content-wise. Moving through the examples quickly works best if you (or the annotator) can focus on one objective at a time and don’t have to spend much time reading and analysing the annotation task. For example, if you’re annotating whether a text is about food or about cars, switching between those objectives on each decision can make annotation less effective, so it might be better to annotate both classes separately. (This is mostly a UX psychology consideration, though.)

kwtrnka · September 18, 2017, 4:59pm

Oh ok. Say when I give 3 classes and reject the annotation, what does it do? Is the underlying model forced to be a one-vs-rest configuration so that it can use that as training data or is that data just ignored?

On the annotation strategy I get the idea, but are there any studies backing it up? Is it faster to do 1000 30-way annotations vs 30 * 1000 binary annotations? And how do you manage the active learning part if you decompose it as one-vs-rest? One class may need only 100 examples to plateau but another may need 5000.

Partly I ask because I tried doing one of my classes as binary, but the true vs. false split was very skewed and the classifier always predicted “no”, which defeated the active learning.

honnibal · September 19, 2017, 10:58am

Hi Keith,

Thanks for the questions. In order:

1. How the multi-class classification works

The model supports potential “multi-tag” classification — so each class is a neuron in the final layer, with the output scores compressed using a logistic transform. You can see the network definition here:

https://github.com/explosion/spaCy/blob/3fa76c17d19b49162652976207e030e484888f02/spacy/_ml.py#L529


            if bp_y is not None and bp_y is not None:
                d_Xs.append(d_y, sgd=sgd)
            else:
                d_Xs.append(None)
        return d_Xs
    return ys, foreach_bwd
model = wrap(foreach_fwd, layer)
return model




def build_text_classifier(nr_class, width=64, **cfg):
    nr_vector = cfg.get('nr_vector', 5000)
    with Model.define_operators({'>>': chain, '+': add, '|': concatenate,
                                 '**': clone}):
        if cfg.get('low_data'):
            model = (
                SpacyVectors
                >> flatten_add_lengths
                >> with_getitem(0,
                    Affine(width, 300)
                )

Note that there are really two models defined here: a small model for learning quickly, and then a larger model for when you have more examples. In each model, the last weight layer is an Affine layer initialized to zero, with no dropout. The number of output neurons here matches the number of classes being predicted. Normally a softmax transform would apply across all of the classes, so that the scores sum to 1. We instead perform an elementwise logistic transform, and interpret each score >= t as a prediction of True. I suggest t=0.5 is usually sensible.
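The difference between a softmax and an elementwise logistic output can be sketched in a few lines of plain Python. This is only an illustration of the idea described above, not the spaCy implementation; the label names and raw scores are made up, and `predict` is a hypothetical helper.

```python
import math


def logistic(scores):
    """Squash each score independently into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def softmax(scores):
    """Normalise scores so they compete and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels, t=0.5):
    """Treat every label whose logistic score is >= t as True."""
    return {label: p >= t for label, p in zip(labels, logistic(scores))}


labels = ["POLITICS", "ECONOMY", "SPORTS"]
raw = [2.0, 1.0, -3.0]  # hypothetical outputs of the final Affine layer

independent = logistic(raw)   # each label scored on its own
normalised = softmax(raw)     # labels forced to share probability mass
predictions = predict(raw, labels)

print(independent, sum(independent))
print(normalised, sum(normalised))
print(predictions)
```

With the logistic transform, a text can be both POLITICS and ECONOMY at once, or belong to no class at all, which a softmax cannot express.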

2. Experimental evaluations

We plan to organise some experiments once the system is more stable — we don’t want to run the evaluation now and then have it invalidated by the next round of changes.

I think it’s important to make the experiment very directly evaluate the system being discussed. I’m always frustrated when tools or products claim “scientific” support from studies that address very different experimental setups from the tool itself. If we’re discussing usability, I don’t expect to see many linear relationships, which really limits the generality of any finding.

3. Imbalanced classes

The active learning should work really well for imbalanced classes. However it’s important that it sees some positive examples at the start. If you have a look at Ines’s tutorial, you’ll see how to encourage that by first building a terminology list, and using that to help bootstrap the initial classifier.
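As a rough sketch of why a terminology list helps with a skewed class: if examples mentioning a seed term are shown first, the annotator is far more likely to see positives early on. The code below is a simplified stand-in, not how Prodigy's recipes work internally; `seed_first`, the seed terms and the texts are all hypothetical.

```python
def seed_first(texts, seed_terms):
    """Order texts so that those mentioning any seed term come first."""
    terms = {t.lower() for t in seed_terms}

    def hits(text):
        # Naive whitespace tokenisation, with trailing punctuation stripped.
        tokens = text.lower().split()
        return sum(1 for tok in tokens if tok.strip(".,!?") in terms)

    # sorted() is stable, so texts with equal hit counts keep their order.
    return sorted(texts, key=hits, reverse=True)


seed_terms = ["idiot", "moron"]  # hypothetical terminology list
texts = [
    "The weather is nice today.",
    "You are an idiot!",
    "See you at lunch.",
    "What a moron, honestly.",
]

ordered = seed_first(texts, seed_terms)
for text in ordered:
    print(text)
```

Once a handful of positives have been accepted, the model's own scores can take over from the seed terms when choosing what to ask about next.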

kwtrnka · September 20, 2017, 12:49am

So there’s a single model with logistic output? When only one class is annotated, do you only backprop the error from that one unit?

So say you annotate car=no but truck=? and bike=? would the target be like
[0, NaN, NaN]

or
[0, 0.5, 0.5]

or
[0, 0, 0]

honnibal · September 20, 2017, 10:51am

Now that I’m looking at it, the model might benefit from using the range [-1., 1.] instead of [0., 1.]. Not sure whether I’ve tried that.

Let’s say there’s only one class, is_vehicle. If you have:

{'text': 'car', 'label': 'is_vehicle', 'answer': 'reject'},
{'text': 'truck', 'label': 'is_vehicle', 'answer': 'ignore'},
{'text': 'bike', 'label': 'is_vehicle', 'answer': 'ignore'}

The target will be [0.0], because the ignores are filtered out before the update is performed on the batch. If you have multiple classes:

{'text': 'car', 'label': 'is_vehicle', 'answer': 'reject'},
{'text': 'truck', 'label': 'is_vehicle', 'answer': 'ignore'},
{'text': 'bike', 'label': 'uses_road', 'answer': 'accept'}

The gradients will be zeroed for classes for which no feedback is provided. So you’ll get:

# Output scores:
'car': {'is_vehicle': 0.7, 'uses_road': 0.9},
'truck': {'is_vehicle': 0.6, 'uses_road': 0.87},
'bike': {'is_vehicle': 0.93, 'uses_road': 0.2}

# Gradient
'car': {'is_vehicle': 0.7, 'uses_road': 0.0},
'bike': {'is_vehicle': 0.0, 'uses_road': -0.8}

…But when I went to the implementation to link it:

https://github.com/explosion/spaCy/blob/develop/spacy/pipeline.pyx#L647


    with p.open('rb') as file_:
        self.model.from_bytes(file_.read())


def load_tag_map(p):
    with p.open('rb') as file_:
        tag_map = msgpack.loads(file_.read(), encoding='utf8')
    self.vocab.morphology = Morphology(
        self.vocab.strings, tag_map=tag_map,
        lemmatizer=self.vocab.morphology.lemmatizer,
        exc=self.vocab.morphology.exc)


deserialize = OrderedDict((
    ('cfg', lambda p: self.cfg.update(_load_cfg(p))),
    ('vocab', lambda p: self.vocab.from_disk(p)),
    ('tag_map', load_tag_map),
    ('model', load_model),
))
util.from_disk(path, deserialize, exclude)
return self

This doesn’t look correct. The gradient looks wrong for the missing values.
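For reference, the intended behaviour described above (filter out ignores, then zero the gradient for any class without feedback) can be sketched in plain Python. This is a minimal illustration, not the spaCy code: it assumes the gradient is simply the score minus the target, and `masked_gradient` is a hypothetical helper.

```python
def masked_gradient(scores, annotations, labels):
    """Return per-label gradients (score - target), zero where unannotated."""
    targets = {}
    for eg in annotations:
        if eg["answer"] == "ignore":
            continue  # ignored examples provide no feedback
        targets[eg["label"]] = 1.0 if eg["answer"] == "accept" else 0.0
    return [
        scores[label] - targets[label] if label in targets else 0.0
        for label in labels
    ]


labels = ["is_vehicle", "uses_road"]

car_grad = masked_gradient(
    {"is_vehicle": 0.7, "uses_road": 0.9},
    [{"text": "car", "label": "is_vehicle", "answer": "reject"}],
    labels,
)
truck_grad = masked_gradient(
    {"is_vehicle": 0.6, "uses_road": 0.87},
    [{"text": "truck", "label": "is_vehicle", "answer": "ignore"}],
    labels,
)
bike_grad = masked_gradient(
    {"is_vehicle": 0.93, "uses_road": 0.2},
    [{"text": "bike", "label": "uses_road", "answer": "accept"}],
    labels,
)

print(car_grad, truck_grad, bike_grad)
```

Only the annotated unit receives an error signal, so a rejected label pushes that one score down without telling the model anything about the other classes.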

plusepsilon · May 22, 2018, 5:45pm

Hi Matthew,

Are the gradient issues sorted out for the missing values? (ner, tagger, and textcat)

If I were to do this in PyTorch, the strategy would be to add a zero mask where the values are missing? (instead of futzing with zero-ing gradients)

honnibal · May 22, 2018, 7:20pm

I did fix that bug in the textcat, yes: https://github.com/explosion/spaCy/blob/develop/spacy/pipeline.pyx#L936 . It should be working in the tagger too.

I recently made improvements to the way missing values are handled in the parser and NER as well, but I’m not sure they’ll be relevant to Prodigy. They’re on the develop branch, and will be released in the forthcoming v2.1.0a0, which will be published on spacy-nightly.

plusepsilon · May 25, 2018, 4:23pm

With the spaCy update, would it be possible to pass in REJECT samples in the nlp.update API for the NER model?

https://github.com/explosion/spacy/blob/master/examples/training/train_ner.py#L66


        ner.add_label(ent[2])


# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):  # only train NER
    optimizer = nlp.begin_training()
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update(
                [text],  # batch of texts
                [annotations],  # batch of annotations
                drop=0.5,  # dropout - make it harder to memorise data
                sgd=optimizer,  # callable to update weights
                losses=losses)
        print(losses)


# test the trained model
for text, _ in TRAIN_DATA:
    doc = nlp(text)
