Masking Form Prompts?

Hi All,
We are a health care agency and we have about 20 different key forms represented in our medical records. I have them all accessible as text in a SQL database and also as indexed text supporting faceted search (using Azure Search).

We've been exploring the use of custom NER labeling as a way to provide structure in our unstructured text. We have a variety of labeling schemas that we are interested in, but we are starting with suicide risk as the highest-value set of labels. We've focused on two of our 20 note types, which are mostly free-form text written by clinical staff. The project is showing signs of success (thank you spaCy and Prodigy!) and staff are asking if we can start labeling some of our assessment note types, which are quite structured with small areas of free-text entry: the opposite of the note types we started with.

The question I'm looking for help answering is how best to 'mask' form prompts so our NER custom labels don't label the form prompts but instead stay focused on the interspersed free clinical text.

I've searched the forum for clues and while I'm sure this has been asked before, I wasn't successful at finding the discussion. Any thoughts or links to prior threads would be welcomed.


So if I understand this correctly, you want to display additional text (in this case, the form prompts) with the text to be annotated? But you don't want that text to be part of the "annotatable" content, right?

There are several ways you could solve this. One would be to put together a custom interface using blocks. The first block could be a simple html block that shows the prompt, maybe with some formatting to make it stand out. The second block could then be the ner_manual block.

blocks = [
    {"view_id": "html", "html_template": "<strong>{{prompt}}</strong>"},
    {"view_id": "ner_manual"},
]

You could then feed in data in the following format:

{"prompt": "This is the prompt:", "text": "This is the rest"}

Another option would be to set "disabled": true on the tokens of the prompts in the "tokens" property of the ner_manual data. This will show them slightly greyed out and will make them unselectable (and any spans containing those tokens will be invalid). So the text will be there, but it can't be selected and will look different. This feature is also useful if you have tokens somewhere within the text that you know should never be part of an entity – either because it's additional meta, or because you just know that it's never an entity (like newlines, which Prodigy disables by default).
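
Here is a minimal sketch of how you could build such a task yourself. It assumes you can split each form into segments and know which ones are prompts; the `build_task` helper and the sample text are hypothetical, and the whitespace tokenization is a simplification (in practice you would probably want the tokens to come from the same spaCy tokenizer as your model).

```python
import re


def build_task(segments):
    """Build a task with pre-defined tokens, disabling the prompt tokens.

    `segments` is a list of (segment_text, is_prompt) pairs in document order.
    Segments are joined with a single space to form the task text.
    """
    text = " ".join(seg for seg, _ in segments)
    tokens = []
    offset = 0  # character offset of the current segment within `text`
    for seg, is_prompt in segments:
        # Simplification: treat every run of non-whitespace as one token.
        for match in re.finditer(r"\S+", seg):
            start = offset + match.start()
            end = offset + match.end()
            token = {
                "text": match.group(),
                "start": start,
                "end": end,
                "id": len(tokens),
                # True if the token is followed by whitespace in the text.
                "ws": end < len(text) and text[end].isspace(),
            }
            if is_prompt:
                # Greyed out and unselectable in the ner_manual interface.
                token["disabled"] = True
            tokens.append(token)
        offset += len(seg) + 1  # +1 for the joining space
    return {"text": text, "tokens": tokens}


# Invented example: one prompt followed by one free-text answer.
task = build_task([
    ("Current ideation:", True),
    ("Client reports passive thoughts.", False),
])
```

With this approach the whole form stays in one task, so the annotator sees the answers in context while only the free-text tokens can be highlighted.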