In a typical training scenario, you're updating a model with examples and the correct answer – e.g. with a text and the entities in it. In some cases you may also have partial annotations: you know some entities but not all.
Prodigy's active learning recipes like ner.teach also let you collect binary yes/no decisions. The data you create here is different again: for the spans you accepted, you know that they are entities. For the ones you rejected, you only know that they're not of type X – they could still be an entity of some other type. This requires a different way of updating the model: you want to update with the positive examples where you know the answer, and proportionally with the "negative" examples where you only know that a certain label doesn't apply. That's the type of training the --binary flag enables.
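To make that distinction concrete, here's a minimal pure-Python sketch of what the binary decisions actually tell you. The record format below is illustrative only – it mimics the shape of Prodigy's accept/reject output but isn't the exact internal representation, and the helper function is hypothetical:

```python
# Illustrative records in the style of binary NER annotations:
# each example holds one candidate span plus an accept/reject decision.
# (Field names are an assumption, not Prodigy's exact schema.)
examples = [
    {"text": "Apple opened a store in Paris.",
     "span": {"start": 0, "end": 5, "label": "ORG"}, "answer": "accept"},
    {"text": "Apple opened a store in Paris.",
     "span": {"start": 24, "end": 29, "label": "ORG"}, "answer": "reject"},
]

def constraints(examples):
    """Turn binary decisions into per-span label knowledge.

    - accepted span -> the label is known to be correct
    - rejected span -> only that one label is ruled out; the span
      could still be an entity of another type, or no entity at all
    """
    known, excluded = [], []
    for eg in examples:
        span = (eg["span"]["start"], eg["span"]["end"])
        if eg["answer"] == "accept":
            known.append((span, eg["span"]["label"]))
        elif eg["answer"] == "reject":
            excluded.append((span, eg["span"]["label"]))
    return known, excluded

known, excluded = constraints(examples)
print(known)     # spans with a confirmed label
print(excluded)  # spans where only this one label is ruled out
```

The key point is the asymmetry: an accept gives you a full positive example, while a reject only removes one label from the space of possibilities, which is why the two need to be weighted differently during the update.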
Fine-grained per-label accuracy is a very new feature in spaCy, so we only just added it to the regular training recipe in Prodigy v1.9. Binary training requires very different evaluation (for the reasons explained above), so if we wanted more fine-grained accuracy there, we'd have to come up with our own implementation and logic for it. It's also not clear whether per-label accuracy translates well to the binary setting, or whether it would actually make the results easier to reason about.