How are answers from active learning used in training?

I have what I think is a basic question (if this has already been addressed on the forum, please point me to it; poking around, I couldn't find an answer).

The annotations from active learning have a label and an answer. What role does the answer play in the supervision of the learner that Prodigy invokes when it trains? I had assumed that all rejections are thrown away for training, but I'm not sure, given the accept/reject matrix that Prodigy displays after training.

Here is how I invoke the teach and train phases:

prodigy textcat.teach direction en_vectors_web_lg data/filtered.jsonl -t2v vectors/lmao_vectors.bin -pt patterns/direction_patterns.jsonl -l BUY,SELL

prodigy textcat.batch-train direction en_vectors_web_lg -o models/direction_cat_model -n 10 -t2v vectors/lmao_vectors.bin -l SELL,BUY

(FYI, the lmao_vectors.bin file is the result of training on our corpus using en_vectors_web_lg.)

and here are the results:

Loaded model en_vectors_web_lg
Using 20% of examples (471) for evaluation
Using 100% of remaining examples (1886) for training
Dropout: 0.2  Batch size: 10  Iterations: 10  

#            LOSS         F-SCORE      ACCURACY  
01           0.243        0.967        0.955                                    
02           0.064        0.972        0.962                                    
03           0.038        0.966        0.953                                    
04           0.031        0.971        0.960                                    
05           0.022        0.971        0.960                                    
06           0.031        0.966        0.953                                    
07           0.026        0.967        0.955                                    
08           0.048        0.972        0.962                                    
09           0.042        0.969        0.958                                    
10           0.038        0.967        0.955                                    

accept   accept   295
accept   reject   1  
reject   reject   136
reject   accept   16 

Correct     431
Incorrect   17

Baseline    0.35              
Precision   1.00              
Recall      0.95              
F-score     0.97              
Accuracy    0.96

In that context, how is the reported baseline calculated?

Much thanks

When you collect binary annotations, you'll only have incomplete information about the text – but Prodigy can still use that information to update the model accordingly. spaCy's models were specifically designed to be updated with sparse annotations (which isn't true of all NLP model implementations). The data you train on doesn't have to be a complete gold standard: we can still update the model and move it in the right direction, even if all we know is "these tokens are not an ORG, but could be anything else" or "this text is not about buying, but could be about any of the other labels".

I'm showing some examples in my slides here: https://speakerdeck.com/inesmontani/belgium-nlp-meetup-rapid-nlp-annotation-through-binary-decisions-pattern-bootstrapping-and-active-learning?slide=12

It basically works like this: To update the model, we need the gradient of the loss function, which is calculated from the predicted distribution and the target distribution. If we don't know the full target distribution, and only that some labels are wrong, we can assign those a probability of 0, and then split the rest proportionally. So if we know that label A is wrong, and nothing about labels B and C, but the model predicted a much higher probability for B, we can reflect this in the update we make. That's essentially how Prodigy updates the model with binary annotations.
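
To make that concrete, here's a minimal sketch of that proportional split in Python. The numbers, the OTHER label and the rejection_target helper are made up for illustration; this shows the idea, not Prodigy's internal code:

import numpy as np

def rejection_target(pred, labels, rejected):
    # Build a target distribution from a binary "reject" answer:
    # the rejected label gets probability 0.0 and the remaining
    # probability mass is split across the other labels in
    # proportion to the model's own predictions.
    target = np.asarray(pred, dtype=float).copy()
    target[labels.index(rejected)] = 0.0
    target /= target.sum()
    return target

labels = ["BUY", "SELL", "OTHER"]     # hypothetical label set
pred = np.array([0.7, 0.2, 0.1])      # model is fairly sure it's BUY

# Annotator rejects BUY: BUY drops to 0, SELL/OTHER share the mass 2:1
target = rejection_target(pred, labels, "BUY")
# target is approx. [0.0, 0.667, 0.333]

# For a softmax + cross-entropy loss, the gradient with respect to the
# scores is prediction minus target; this is what drives the update.
gradient = pred - target
# gradient is approx. [0.7, -0.467, -0.233]

So the rejected label is pushed down, while everything the annotation says nothing about is left roughly where the model put it, just renormalised.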

Here's an NER example that shows the calculation:
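
As a sketch of that calculation, with made-up entity labels and scores and reusing the rejection_target helper from above:

labels = ["ORG", "PERSON", "GPE", "PRODUCT"]   # hypothetical entity labels
pred = np.array([0.5, 0.3, 0.15, 0.05])        # model leans towards ORG

# The annotator rejects ORG for this span
target = rejection_target(pred, labels, "ORG")
# target is approx. [0.0, 0.6, 0.3, 0.1]

gradient = pred - target
# gradient is approx. [0.5, -0.3, -0.15, -0.05]
# ORG is pushed down; the other labels are nudged up, most of all
# PERSON, which the model already considered the runner-up.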

Hi Ines, thanks for your prompt reply (as always!) and the presentation...very clear.