Using ner.manual on HTML Input

You can always write a custom recipe, but the easiest way is to run the built-in mark recipe, which simply streams in your data as-is and renders it with the interface you specify. For example:

prodigy mark your_dataset your_data.jsonl --view-id html
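In case it helps, here's a minimal sketch of how you could produce the input file for that command. It assumes each record carries its markup under an "html" key, which is what the html interface renders. The example records and the to_jsonl helper are made up for illustration.

```python
import json

# Hypothetical example records. The "html" key holds the markup to render.
examples = [
    {"html": "<p>Apple is opening a new <b>office</b> in Berlin.</p>"},
    {"html": "<p>The invoice is due on <i>March 3</i>.</p>"},
]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Write the file name used in the command above.
    with open("your_data.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(examples))
```

Each line of the resulting file is one annotation task, so you can inspect or filter the data with any tool that reads line-delimited JSON.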

You could also experiment with different ways of breaking down the annotation into smaller binary decisions. For example, maybe you're able to extract candidates for the highlighted spans programmatically, e.g. via matcher rules or regular expressions. Even if there are many false positives, you'll be able to click through them very quickly and you'd probably still be faster than if you selected them manually. It'll also give you more consistent annotations, and you'll be able to spot potential problems or difficulties in the data that might also be tricky for a statistical model to learn later on.
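Here's a rough sketch of what the regular expression approach could look like. It assumes you have the plain text available and want one task per candidate, with the candidate stored as a character-offset span. The pattern, the label and the candidate_tasks function are hypothetical, so swap in whatever fits your data.

```python
import re

# Hypothetical pattern and label, for illustration only.
DATE_PATTERN = re.compile(r"\b\d{1,2} (?:January|February|March) \d{4}\b")


def candidate_tasks(text, pattern=DATE_PATTERN, label="DATE"):
    """Yield one task per regex match, so each candidate is a single
    accept/reject decision rather than part of a larger selection."""
    for match in pattern.finditer(text):
        yield {
            "text": text,
            "spans": [
                {"start": match.start(), "end": match.end(), "label": label}
            ],
        }


tasks = list(candidate_tasks("Due on 3 March 2019, renewed on 12 January 2020."))
```

Because every task contains exactly one span, false positives only cost you a single reject click each.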

Btw, if you want to try out more dynamic interfaces, there's also experimental support for custom JavaScript (see this thread for discussion and examples). I'd still recommend using it sparingly, though. It's very tempting to overcomplicate the task, but we've found that you collect much better data if you're able to break the task down into a series of simple decisions, rather than fewer complex ones.

Your example should work out-of-the-box! Prodigy will preserve any additional properties in the data and simply pass them through. You can add your own custom property on the root object, or even on the individual tokens etc. The data can be anything, as long as it's JSON-serializable 🙂
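To illustrate, here's a sketch of a task with custom properties on the root object and on the tokens, plus a quick check that it's JSON-serializable. The "source_url" and "html_tag" keys and the is_json_serializable helper are invented for this example and aren't part of Prodigy.

```python
import json

# "source_url" and "html_tag" are hypothetical custom properties.
task = {
    "text": "Hello world",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0, "html_tag": "b"},
        {"text": "world", "start": 6, "end": 11, "id": 1, "html_tag": None},
    ],
    "source_url": "https://example.com/page",
}


def is_json_serializable(obj):
    """Return True if obj can be dumped to JSON without errors."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


# The custom properties survive a JSON round trip unchanged.
restored = json.loads(json.dumps(task))
```

Values like sets or custom class instances would fail this check, so convert them to lists, dicts or strings before adding them to a task.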