active learning and update function

Hi Team Prodigy,

We had some questions about the correct way to customize Prodigy's active learning features for our use case.

In order to minimize annotation time, we tried to create an NER recipe that combines ner.teach and ner.correct. At a high level, our goal is to present the annotator with abstracts that are tagged by the model, let the annotator update the annotations using the ner.correct/ner.manual interface, and then use active learning to update the model, similar to how it is done in ner.teach. In other words, we want to use active learning, but instead of the binary annotation interface used in ner.teach, we want to use the annotation interface from ner.manual/ner.correct, because we believe it may be a more efficient use of the annotator's time.

The annotated examples are then passed to either the update function of the NER model being retrained, or to the update function returned by the combine_models function that combines the NER model with the PatternMatcher.

Our questions are:

  • What is the format of the input arguments we should pass to the update functions above in order to ensure that the models are updated in the best possible way? Could you provide us with the function signature, as well as either the function code or a sufficient explanation of it?
  • For example, is it OK if we only pass accepts to the update functions and never rejects? Is it OK to pass a list of texts with associated metadata, where each dictionary in the list includes multiple spans? Should we transform this input in any way to use the update function correctly? Maybe we should transform the inputs into a binary accept/reject format and make sure each example has exactly one span? We include a more detailed example of what we pass as input below.
  • Is there a way to empirically check whether the model is being updated correctly? Two quantitative signals we observe are the loss returned by the update function and a logging statement that says something like PROGRESS: Estimating progress of 0.1667. Currently, we do not understand how the loss and progress values are related to each other. Can you help us understand that, and more generally, help us find a quantitative way of verifying whether we are using active learning successfully?
  • Does the number of examples passed to the update function matter? What are the best practices in that regard?

Currently, we pass the update function a list where each element has the keys text, spans, and answer. spans is itself a list where each element is a dictionary like the following:

{'start': 74,
'end': 83,
'token_start': 13,
'token_end': 13,
'label': 'PROTEIN'}

thank you for your help,
berk

Hi! The update callback is called by Prodigy whenever a batch of examples comes back from the web app. It receives a list of annotated examples in Prodigy's JSON format – so basically whatever was sent out via the stream, with the added annotations (e.g. manually added spans) and the "answer". See here for the API docs: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP
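
For a manual NER interface, an annotated example that comes back typically looks roughly like this (the hash values are placeholders, there's usually also a "tokens" list that I've left out for brevity, and the exact keys can vary a bit by version and interface):

{
  "text": "The p53 protein binds DNA.",
  "spans": [
    {"start": 4, "end": 7, "token_start": 1, "token_end": 1, "label": "PROTEIN"}
  ],
  "_input_hash": -1487539834,
  "_task_hash": 1837462890,
  "answer": "accept"
}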

Prodigy's binary annotation recipes use a more complex annotation model (e.g. the EntityRecognizer class implemented by Prodigy) to handle updating a spaCy model from binary yes/no annotations. See my slides here for details on why this is slightly more complex.

If you're annotating manually, that shouldn't be necessary – or at least, you should be able to assume that the examples you're getting back are complete and corrected annotations. So you'll be able to just call nlp.update on your spaCy model directly. Pretty much exactly like you would train the model from scratch: https://v2.spacy.io/usage/training#training-simple-style (just with nlp.resume_training instead of begin_training – otherwise you'd be resetting the weights).
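
For example, a minimal update callback for manual NER annotations could look something like this – just a sketch assuming the spaCy v2 API from the link above, not the exact code from the built-in recipes; skipping non-accepted examples and the dropout value are choices you'd want to adjust for your use case:

import spacy

nlp = spacy.load("en_core_web_sm")   # or your own base model
optimizer = nlp.resume_training()    # resume_training, not begin_training

def update(answers):
    # answers: a batch of annotated tasks in Prodigy's JSON format, as described above
    texts = []
    annotations = []
    for eg in answers:
        if eg["answer"] != "accept":
            continue   # one option: only learn from accepted, fully corrected examples
        entities = [(span["start"], span["end"], span["label"])
                    for span in eg.get("spans", [])]
        texts.append(eg["text"])
        annotations.append({"entities": entities})
    losses = {}
    if texts:
        nlp.update(texts, annotations, drop=0.2, sgd=optimizer, losses=losses)
    return losses.get("ner", 0.0)   # whatever you return here is passed to the progress callback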

The loss is definitely a good value to track, and it's also returned by nlp.update. If you want to test the whole end-to-end process, you could also simulate an annotation session: call nlp.update with batches of examples, and keep evaluating the predictions after multiple updates.
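
A rough sketch of what that simulation could look like – the batching, the exact-match check and the data format are just assumptions for illustration:

import random

def simulate_session(nlp, train_examples, eval_examples, batch_size=10):
    # train_examples / eval_examples: lists of task dicts in the format shown above
    optimizer = nlp.resume_training()
    random.shuffle(train_examples)
    for i in range(0, len(train_examples), batch_size):
        batch = train_examples[i:i + batch_size]
        texts = [eg["text"] for eg in batch]
        golds = [{"entities": [(s["start"], s["end"], s["label"])
                               for s in eg.get("spans", [])]}
                 for eg in batch]
        losses = {}
        nlp.update(texts, golds, sgd=optimizer, losses=losses)
        # simple sanity check: how many gold spans does the model now predict exactly?
        correct = total = 0
        for eg in eval_examples:
            doc = nlp(eg["text"])
            predicted = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
            gold = {(s["start"], s["end"], s["label"]) for s in eg.get("spans", [])}
            correct += len(predicted & gold)
            total += len(gold)
        print("ner loss:", losses.get("ner", 0.0), "span recall:", correct / max(total, 1))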

For the active learning recipes, Prodigy uses the loss to estimate the progress, to basically give you a rough idea of when to stop annotating (when the loss might hit zero and there's nothing left to learn). You could also implement your own progress callback that receives whatever is returned from your update callback (e.g. the loss) and returns a progress value. This could be based on how many examples are left, or a combination of that and the loss over time.
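
For instance, a very simple version based on how many annotations you're aiming for could look like this – TARGET_EXAMPLES is just a number you'd pick for your project, and you'd want to double-check the callback signature against the custom recipes docs for your version:

TARGET_EXAMPLES = 2000   # assumption: how many annotations you plan to collect
seen_examples = 0        # increment this in your update callback, e.g. seen_examples += len(answers)

def progress(ctrl, update_return_value):
    # update_return_value is whatever your update callback returned (here: the loss);
    # this version only looks at how far along you are, but you could also factor in
    # the loss over time
    return min(seen_examples / TARGET_EXAMPLES, 1.0)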

It kinda depends – this is the batch size used to update the model, so you typically want to find a good trade-off between large enough to be effective and small enough to be efficient. We've specifically optimised spaCy to be updatable with small batches to allow workflows like this, so the default batch size of 10 should work okay – but if you're working with other implementations or newer transformer-based pipelines, you might want to experiment with using larger batch sizes.
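
To tie it all together, the batch size can be set via the "config" returned by your custom recipe – roughly like this, where the recipe name and label are placeholders and update / progress are the callbacks sketched above:

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens

@prodigy.recipe("ner.manual-teach")   # placeholder name for your custom recipe
def ner_manual_teach(dataset, spacy_model, source):
    nlp = spacy.load(spacy_model)

    def make_stream():
        # pre-annotate each incoming example with the model's current predictions
        for eg in JSONL(source):
            doc = nlp(eg["text"])
            eg["spans"] = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
                           for ent in doc.ents]
            yield eg

    stream = add_tokens(nlp, make_stream())   # the ner_manual interface needs a "tokens" key
    return {
        "dataset": dataset,
        "view_id": "ner_manual",
        "stream": stream,
        "update": update,       # the update/progress callbacks sketched above
        "progress": progress,
        "config": {"labels": ["PROTEIN"], "batch_size": 10},
    }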