Using ner.manual on HTML Input

ines · October 12, 2018, 2:49pm

You can always write a custom recipe, but the easiest way would be to run the built-in mark recipe, which will simply stream in whatever comes in and render it with a given interface. For example:

prodigy mark your_dataset your_data.jsonl --view-id html
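For reference, here's a minimal sketch of how the input file for that command could be put together. It assumes the html interface renders each task's "html" key. The snippets and the helper names (to_html_tasks, to_jsonl) are made up for illustration and aren't part of Prodigy:

```python
import json

# Hypothetical raw HTML snippets; in practice these would come from your own files.
snippets = [
    "<p>Apple is opening a new store in <strong>Berlin</strong>.</p>",
    "<p>The meeting is scheduled for Monday.</p>",
]


def to_html_tasks(html_snippets):
    """Wrap each snippet in a task dict (assumption: the html view reads the "html" key)."""
    return [{"html": snippet} for snippet in html_snippets]


def to_jsonl(tasks):
    """Serialize tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)


if __name__ == "__main__":
    # Write the file name used in the command above.
    with open("your_data.jsonl", "w", encoding="utf8") as f:
        f.write(to_jsonl(to_html_tasks(snippets)) + "\n")
```

Each line of the resulting file is one standalone JSON object, which is what the mark recipe streams in.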

You could also experiment with different ways of breaking down the annotation into smaller binary decisions. For example, maybe you're able to extract candidates for the highlighted spans programmatically, e.g. via matcher rules or regular expressions. Even if there are many false positives, you'll be able to click through them very quickly and you'd probably still be faster than if you selected them manually. It'll also give you more consistent annotations, and you'll be able to spot potential problems or difficulties in the data that might also be tricky for a statistical model to learn later on.
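To make the candidate idea a bit more concrete, here's a rough sketch using a regular expression. The pattern, the ORG label and the candidate_tasks helper are all hypothetical. The only thing it relies on is the usual task shape with "text" plus "spans" given as character offsets:

```python
import re

# Hypothetical pattern: a capitalized word followed by a company suffix.
# It is deliberately loose, since false positives are cheap to reject.
COMPANY_PATTERN = re.compile(r"\b[A-Z][a-z]+ (?:Inc|Ltd|GmbH)\b")


def candidate_tasks(text, pattern, label):
    """Yield one task per regex match, so each candidate is a single accept/reject decision."""
    for match in pattern.finditer(text):
        yield {
            "text": text,
            "spans": [{"start": match.start(), "end": match.end(), "label": label}],
        }


if __name__ == "__main__":
    example = "Acme Inc hired staff from Globex Ltd."
    for task in candidate_tasks(example, COMPANY_PATTERN, "ORG"):
        span = task["spans"][0]
        print(example[span["start"]:span["end"]], span["label"])
```

Emitting one span per task means a text with several matches shows up several times, once for each candidate, which keeps every decision binary.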

Btw, if you want to try out more dynamic interfaces, there's also experimental support for custom JavaScript – see this thread for discussion and examples. I'd still recommend using it sparingly, though – it's very tempting to overcomplicate the task, but we've found that you collect much better data if you're able to break the task down into a series of simple decisions, rather than fewer complex ones.

Your example should work out-of-the-box! Prodigy will preserve any additional properties in the data and simply pass them through. You can add your own custom property on the root object, or even on the individual tokens etc. The data can be anything, as long as it's JSON-serializable.
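As an illustration, a task with custom properties might look something like the sketch below. The property names source_url and source_tag are invented for the example, and is_json_serializable is just a small helper to check your data before loading it:

```python
import json

# "source_url" and "source_tag" are hypothetical custom properties, not Prodigy fields.
task = {
    "text": "Hello world",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0, "source_tag": "h1"},
        {"text": "world", "start": 6, "end": 11, "id": 1, "source_tag": "p"},
    ],
    "source_url": "https://example.com/page",
}


def is_json_serializable(obj):
    """Return True if obj survives json.dumps, i.e. it could be written to a JSONL file."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


if __name__ == "__main__":
    print(is_json_serializable(task))              # plain dicts, lists, strings and numbers
    print(is_json_serializable({"tags": {"a"}}))   # a Python set is not JSON-serializable
```

Values like sets, datetime objects or custom class instances would need to be converted to lists, strings or dicts first.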
