How to write a model.update() function

Really glad to help you do this. I do hope we’ll be able to add some Chinese models for spaCy soon too.

You should actually be able to use the built-in recipes for NER and textcat, even with Chinese. But to answer your question about the update() function: there’s some documentation in the PRODIGY_README.html file that you might want to look at. The signature of the function is very simple. Example:


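# One answered example: the span covers characters 0-4 ("some").
examples = [{"text": "some text", "spans": [{"start": 0, "end": 4, "label": "DT"}], "answer": "accept"}]

def update(answered_examples):
    """Take a minibatch of answered examples and return the loss."""
    # Placeholder: update your model here and accumulate the real loss.
    loss = 0.0
    return loss

update(examples)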

The update() function must take a minibatch of dict objects. Each dict should have an "answer" key whose value is one of "accept", "reject" or "ignore". For the NER recipe, each example should also have a "spans" key holding a list of dicts. Each span dict should have the keys "start", "end" and "label", where "start" and "end" are character offsets into the text and "label" is a string.
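If it helps while debugging, here is a rough sketch of a checker for the format described above. The names check_example and VALID_ANSWERS are made up for illustration and are not part of Prodigy. It also assumes each example carries a "text" key, as in the example earlier.

```python
VALID_ANSWERS = {"accept", "reject", "ignore"}

def check_example(eg):
    """Return a list of problems found in one answered NER example.

    Hypothetical helper, not part of Prodigy: it only checks the
    keys and types described in the post above.
    """
    problems = []
    if eg.get("answer") not in VALID_ANSWERS:
        problems.append("answer must be accept, reject or ignore")
    text = eg.get("text", "")
    for span in eg.get("spans", []):
        missing = [k for k in ("start", "end", "label") if k not in span]
        if missing:
            problems.append("span is missing keys: " + ", ".join(missing))
            continue
        # Offsets are character positions into the text.
        if not (0 <= span["start"] < span["end"] <= len(text)):
            problems.append("span offsets fall outside the text")
        if not isinstance(span["label"], str):
            problems.append("span label must be a string")
    return problems

good = {"text": "some text",
        "spans": [{"start": 0, "end": 4, "label": "DT"}],
        "answer": "accept"}
bad = {"text": "some text",
       "spans": [{"start": 0, "end": 40}],
       "answer": "maybe"}
good_problems = check_example(good)
bad_problems = check_example(bad)
```

Running it over a batch before calling update() should catch malformed examples early, before they reach the model.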

To make the update function work well, there are a few things to consider. First, in the NER update, you won’t have complete annotations for the inputs: you might only have one entity annotated for the whole sentence. Second, you need a way to learn from "reject" examples. If the answer is "reject", it’s easy to calculate the gradient of the error for the class you got wrong, but for the other classes you probably want to zero the gradient, since you don’t know which of them is correct. I’m not sure what the neatest way to express this in TensorFlow or PyTorch would be. Personally I wouldn’t bother trying to express it as a loss. I would just calculate the gradient and pass that in.
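To make the gradient idea concrete, here is a minimal sketch in plain Python. It assumes a softmax output layer over the classes and a single decision per answer. The names softmax and answer_gradient are made up for illustration, and a real NER model would need this for every token or transition it scores.

```python
import math

def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def answer_gradient(scores, class_index, answer):
    """Gradient with respect to the scores for one annotated decision.

    Hypothetical helper: assumes a softmax output layer, and that
    class_index is the class the annotator was asked about.
    """
    probs = softmax(scores)
    grad = [0.0] * len(scores)
    if answer == "accept":
        # The class is known to be correct: the usual softmax
        # cross-entropy gradient, probabilities minus one-hot target.
        grad = list(probs)
        grad[class_index] -= 1.0
    elif answer == "reject":
        # Only the rejected class is known to be wrong. The other
        # classes stay at zero because the truth is unknown.
        grad[class_index] = probs[class_index]
    # "ignore" leaves the gradient at zero for every class.
    return grad

scores = [2.0, 0.5, -1.0]
reject_grad = answer_gradient(scores, class_index=0, answer="reject")
accept_grad = answer_gradient(scores, class_index=0, answer="accept")
ignore_grad = answer_gradient(scores, class_index=0, answer="ignore")
```

In a framework, you would pass a gradient like this to the backward pass for the output layer instead of deriving it from a scalar loss.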