Pre-annotation with custom annotators

I want to use my own low-precision annotators that are similar in principle to Matchers, but slightly more involved than plain string matching. My plan is to use these to generate tasks of the form {"text": xxxx, "spans": ...}, with the spans filled in by the pre-annotation. I have a few questions about this:
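To make the task format concrete, here's a minimal sketch of what such a pre-annotator could look like. The function `find_spans` is a made-up stand-in for your own low-precision logic (regexes, gazetteers, etc.); the regex and the `ORG` label are purely illustrative:

```python
import re

def find_spans(text):
    # Toy stand-in for a custom annotator: mark every capitalized
    # word as ORG. Real logic would be more involved.
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        yield {"start": match.start(), "end": match.end(), "label": "ORG"}

def make_tasks(texts):
    # Emit tasks in the {"text": ..., "spans": [...]} shape described above.
    for text in texts:
        yield {"text": text, "spans": list(find_spans(text))}

tasks = list(make_tasks(["Apple hired Alice."]))
# tasks[0]["spans"] now holds two pre-annotated ORG spans with
# character offsets into the text.
```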

1) Multiple labels per text are possible, but the documentation seems to prefer one label/entity per task so that the binary teach UI can be used. Would it be possible to use the manual UI, but with a model in the loop? The point of using the manual UI would be to correct the pre-annotated labels (removing and adding spans).

2) It's not clear at what point the model in the loop starts making suggestions. Does it just ignore the pre-filled spans and inject its own?

This is kinda tricky and I'm not 100% sure it'd make much sense. The binary workflow with a model in the loop is really designed to give feedback on one particular prediction and update the model accordingly. To create better suggestions, Prodigy uses beam search to get multiple possible analyses of the given input text and focuses on the ones that give you the best possible gradient for updating.

If the texts already have pre-annotated spans, where do the model's predictions come in? And if you're always correcting the examples manually, the model only ever gets to see perfect, gold-standard annotations. Using a model in the loop that you're updating only really makes sense if you want to give feedback on the model's predictions. If you just want to stream in your pre-annotated examples and correct them, it seems like it'd be better to use ner.manual, label everything and then train a model afterwards. You can still use that pre-trained model with ner.teach afterwards to fine-tune its predictions with binary feedback.
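If you go the ner.manual route, all you need is your pre-annotated tasks serialized as JSONL, one task per line. This is a sketch under the assumption that your stream looks like the format above; the file name, text, and span offsets are made up for illustration:

```python
import json

# Pre-annotated tasks in the {"text": ..., "spans": [...]} format.
# ner.manual will render existing spans as highlighted entities that
# you can remove or extend in the UI.
tasks = [
    {
        "text": "Apple hired Alice.",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    },
]

# Write one JSON object per line (JSONL), which Prodigy loaders accept.
with open("tasks.jsonl", "w", encoding="utf8") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```

You'd then point the recipe at that file, e.g. something along the lines of `prodigy ner.manual my_dataset en_core_web_sm tasks.jsonl --label ORG` (exact arguments depend on your Prodigy version).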

Active learning-powered recipes like ner.teach can't respect pre-defined spans, for the reasons mentioned above. If you're training a new category and you're bootstrapping with patterns, the model will start predicting something pretty much immediately, but whether you get to see those predictions depends on whether there are pattern matches available, how the predictions are scored, and so on.
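For reference, bootstrapping patterns are also just a JSONL file, with one pattern per line. Here's a minimal sketch; the `FRUIT` label, the file name, and the patterns themselves are invented for illustration:

```python
import json

# Each line is either a token-based pattern (spaCy Matcher style,
# a list of token attribute dicts) or an exact string match.
patterns = [
    {"label": "FRUIT", "pattern": [{"lower": "apple"}]},   # token pattern
    {"label": "FRUIT", "pattern": "dragon fruit"},          # string match
]

with open("fruit_patterns.jsonl", "w", encoding="utf8") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

You'd pass this to the recipe with something like `--patterns fruit_patterns.jsonl`, so the stream contains pattern matches even before the model's predictions become useful.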