Change loss in Custom Model

Hi all,

When I train a custom spaCy model, where can I change the loss function? It matters a lot for an algorithm. I noticed that in /spacy/pipeline.pyx, the loss is calculated like this:

def get_loss(self, docs, golds, scores):
    truths = numpy.zeros((len(golds), len(self.labels)), dtype='f')
    not_missing = numpy.ones((len(golds), len(self.labels)), dtype='f')
    for i, gold in enumerate(golds):
        for j, label in enumerate(self.labels):
            if label in gold.cats:
                truths[i, j] = gold.cats[label]
            else:
                # mask out labels that are missing from the annotation
                not_missing[i, j] = 0.
    truths = self.model.ops.asarray(truths)
    not_missing = self.model.ops.asarray(not_missing)
    d_scores = (scores - truths) / scores.shape[0]
    d_scores *= not_missing
    mean_square_error = ((scores - truths) ** 2).sum(axis=1).mean()
    return mean_square_error, d_scores

This is just the mean squared error, and the gradient used for the update is d_scores, which is only the difference between scores and truths. Can I customize my own loss?
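(For context, a quick check of my own, not from the spaCy source: the returned d_scores = (scores - truths) / N is exactly the gradient of half the batch-mean squared error that get_loss reports, which you can verify numerically with a finite-difference check.)

```python
import numpy

numpy.random.seed(0)
N, C = 4, 3  # batch size, number of labels
scores = numpy.random.rand(N, C)
truths = numpy.random.rand(N, C)

def loss(s):
    # 0.5 * the mean_square_error computed in get_loss
    return 0.5 * ((s - truths) ** 2).sum(axis=1).mean()

# analytic gradient, as returned by get_loss
d_scores = (scores - truths) / scores.shape[0]

# central finite-difference approximation of the gradient
eps = 1e-5
num_grad = numpy.zeros_like(scores)
for i in range(N):
    for j in range(C):
        bump = numpy.zeros_like(scores)
        bump[i, j] = eps
        num_grad[i, j] = (loss(scores + bump) - loss(scores - bump)) / (2 * eps)

assert numpy.allclose(d_scores, num_grad, atol=1e-6)
```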

If you subclass the component, you should be able to override the get_loss method in your subclass, to implement your custom loss. You would then just need to make sure it’s your subclass that Prodigy is using for the text classification. The easiest thing to do would be to replace the Language.factories['textcat'] entry like this:

from spacy.language import Language
from spacy.pipeline import TextCategorizer

class CustomLossTextCategorizer(TextCategorizer):
    def get_loss(self, docs, golds, scores):
        # Implement your custom logic here.
        return loss, d_scores

Language.factories['textcat'] = lambda nlp, **kwargs: CustomLossTextCategorizer(nlp.vocab, **kwargs)

After replacing the factories entry, when a call is made to nlp.create_pipe('textcat'), you should see an instance of your model created, instead of the default class. Check the entries of nlp.pipeline to make sure it’s correct. If necessary you can also modify the pipeline explicitly, after everything is loaded.
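As a concrete sketch of what the custom logic could be (the function name, the weighting scheme and the toy data below are my own, not part of spaCy's API): a squared-error loss with per-label weights, written here as a standalone function that mirrors the masking in the default get_loss so it can be tried without a pipeline.

```python
import numpy

def weighted_get_loss(labels, label_weights, gold_cats, scores):
    """Hypothetical custom loss: squared error with a per-label weight.
    label_weights maps label -> float (defaults to 1.0);
    gold_cats is a list of dicts like gold.cats."""
    truths = numpy.zeros((len(gold_cats), len(labels)), dtype='f')
    not_missing = numpy.ones((len(gold_cats), len(labels)), dtype='f')
    weights = numpy.asarray([label_weights.get(l, 1.0) for l in labels], dtype='f')
    for i, cats in enumerate(gold_cats):
        for j, label in enumerate(labels):
            if label in cats:
                truths[i, j] = cats[label]
            else:
                # mask out labels missing from the annotation, as in the default
                not_missing[i, j] = 0.
    d_scores = weights * (scores - truths) / scores.shape[0]
    d_scores *= not_missing
    loss = (weights * (scores - truths) ** 2 * not_missing).sum(axis=1).mean()
    return loss, d_scores

# toy usage: up-weight the POSITIVE label; the second example
# has no POSITIVE annotation, so its gradient is masked to zero
labels = ['POSITIVE', 'NEGATIVE']
gold_cats = [{'POSITIVE': 1.0, 'NEGATIVE': 0.0}, {'NEGATIVE': 1.0}]
scores = numpy.asarray([[0.2, 0.1], [0.3, 0.9]], dtype='f')
loss, d_scores = weighted_get_loss(labels, {'POSITIVE': 5.0}, gold_cats, scores)
```

Inside the subclass, get_loss would then be a thin wrapper, e.g. `return weighted_get_loss(self.labels, weights, [gold.cats for gold in golds], scores)`.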

Hope that helps! If you have a lot of success with a custom loss, I’d definitely be interested to hear about it. I can imagine that instance weighting, for example, might help.

Note that there are a few details of the optimizer that might make experimenting with the loss a bit confusing. One is that the gradients for each batch are rescaled to have a norm of 1 before the updates are calculated. This might interfere with your experiments: for instance, if you set an enormous scale for some instances to test whether your changes are having an effect, the rescaling might mean you don’t see much of it. You can turn this off by setting the max_grad_norm parameter on the optimizer.
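To see why that rescaling washes out the overall scale of the loss, here is a toy illustration in plain NumPy (a simplified stand-in for the optimizer's rescaling, not thinc's actual code):

```python
import numpy

def clip_to_norm(grad, max_norm=1.0):
    # rescale the whole batch gradient so its L2 norm is at most max_norm
    norm = numpy.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

grad = numpy.asarray([[1.0, -2.0], [4.0, 3.0]])

# scaling the loss (and hence the gradient) by a huge factor changes
# nothing once the gradient is renormalized: the update is identical
assert numpy.allclose(clip_to_norm(grad), clip_to_norm(grad * 1e6))
```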

I’m sorry you’ll have to dig through the source somewhat on these things — I hope we can have these more advanced internals of spaCy better documented in future.
