Customizing rel.manual interface

Hi,

My team will be annotating paragraphs for relation extraction of intra- and cross-sentence relationships. Our planned approach is to have the text annotated for entities first, and then use rel.manual to annotate relationships between those entities.

The interface of rel.manual is presenting some difficulties for this task though. Since tokens appear to have more whitespace separating them and all tokens have bounding boxes, the text becomes hard to read for our annotators. I can see how the default interface is useful in other cases, like single sentences. However, is there a way to customize how tokens are displayed in rel.manual?

Specifically, I would like to customize the color and presence of bounding boxes for tokens (keep them for entities but remove them for other tokens) and decrease the whitespace between tokens in a sentence so that the text displays similar to a normal paragraph (I see whitespace between lines can be changed with relationHeightWrap).

Thanks

Hi @ale,

Re disabling tokens in REL UI
It is possible to disable certain tokens from selection via the --disable-patterns argument to rel.manual. This argument takes a JSONL file with Matcher patterns for the tokens to be disabled. You can see an example of how it is used in the "Dependencies and Relations" section of the Prodigy documentation.
In your case you would want to enable only the tokens that are also entities so the pattern would look something like:
{"pattern": [{"_": {"label": {"NOT_IN": ["ORG"]}}}]}
This pattern disables every token whose label is not ORG, so only the tokens with the ORG label stay enabled. However, as I was testing out this rule, I realized there's a small bug in our rel.manual recipe: the default label for each token is None, while it should be a string for the Matcher to work.
While we are preparing the fix, you can work around it by making a small modification to your local copy of the recipe, which should be located at {path_to_your_prodigy_installation}/prodigy/recipes/rel.py (to find the path to your Prodigy installation, run prodigy stats).
In this file in line 151 change None to "None" so:
Token.set_extension("label", default="None", force=True)
With this change the pattern above should work as expected.
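
In case it helps, here is a minimal sketch of generating the patterns file with plain Python. The file name disable_patterns.jsonl and the two helper functions are hypothetical, and ORG is just the example label from above; the resulting path is what you would pass to --disable-patterns.

```python
import json
from pathlib import Path


def build_disable_patterns(entity_labels):
    """Build one Matcher-style pattern that matches (and so disables) every
    token whose custom `label` attribute is NOT one of the entity labels."""
    return [{"pattern": [{"_": {"label": {"NOT_IN": sorted(entity_labels)}}}]}]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Hypothetical label set and file name, for illustration only.
patterns = build_disable_patterns({"ORG"})
write_jsonl("disable_patterns.jsonl", patterns)
```

If you annotate several entity types, pass all of them to build_disable_patterns so that tokens with any of those labels stay selectable.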

Re customizing the color
You can customize the color of the relation labels using the custom_theme config setting. In rel.manual this applies only to the relation labels: the NER labels, i.e. the bounding boxes, are gray by design to give prominence to the relation arrows in the UI, so we currently do not expose a setting to modify the color of the entity bounding boxes.
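
As a rough sketch, this is the kind of fragment you could merge into your prodigy.json. The relation label names, the colors and the relationHeightWrap value are made up for illustration, and I am assuming here that both the label colors and relationHeightWrap are nested under custom_theme, so please check the exact keys against the docs for your version.

```python
import json

# Hypothetical relation labels and colors, for illustration only.
relation_colors = {"WORKS_FOR": "#ffd882", "LOCATED_IN": "#c5bdf4"}

settings = {
    "custom_theme": {
        # Assumed key: maps label names to colors.
        "labels": relation_colors,
        # Arbitrary example value for the setting mentioned earlier in the thread.
        "relationHeightWrap": 30,
    }
}

# Print the fragment so it can be copied into prodigy.json.
config_fragment = json.dumps(settings, indent=2)
print(config_fragment)
```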

Re decreasing the whitespace between the tokens
This is best done as part of preprocessing, i.e. before annotating named entities and relations, because all annotations (NER and REL) are bound to the token offsets, including whitespace tokens. It's tricky to annotate with one tokenization (e.g. for NER) and then modify the tokenization (by removing whitespace) afterwards. Alternatively, you could supply a spaCy pipeline with a custom tokenizer if you want to change the default tokenization from within a recipe.
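
To illustrate why the order matters, here is a small sketch in plain Python (no Prodigy or spaCy involved). The collapse_whitespace helper and the example text are hypothetical, and the spans use plain character offsets rather than Prodigy's full task format. It collapses runs of whitespace and remaps the span offsets at the same time; reusing the old offsets on the cleaned text would point at the wrong characters.

```python
def collapse_whitespace(text, spans):
    """Collapse each run of whitespace to a single space and remap
    character-offset spans so they still cover the same words."""
    new_chars = []
    offset_map = {}  # old character index -> new character index
    prev_space = False
    for i, ch in enumerate(text):
        offset_map[i] = len(new_chars)
        if ch.isspace():
            if prev_space:
                continue  # drop repeated whitespace
            new_chars.append(" ")
            prev_space = True
        else:
            new_chars.append(ch)
            prev_space = False
    offset_map[len(text)] = len(new_chars)
    new_text = "".join(new_chars)
    new_spans = [
        {**s, "start": offset_map[s["start"]], "end": offset_map[s["end"]]}
        for s in spans
    ]
    return new_text, new_spans


# Hypothetical example with extra spaces between some tokens.
text = "Acme Corp   hired  Jane Doe."
spans = [
    {"start": 0, "end": 9, "label": "ORG"},
    {"start": 19, "end": 27, "label": "PERSON"},
]
clean_text, clean_spans = collapse_whitespace(text, spans)

for span in clean_spans:
    print(span["label"], repr(clean_text[span["start"]:span["end"]]))
```

Running the cleanup before any annotation avoids the remapping step entirely, which is why I would recommend doing it first.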

Thanks for your answer @magdaaniol.

Re decreasing the whitespace between the tokens: I understand that the whitespace and tokenization play a role. However, I was wondering whether the front-end rendering could be customized to decrease the visual space between tokens (even if there is no setting for this, perhaps using JavaScript). The reason is that the text is hard to read with all the bounding boxes and the space between tokens. Since our annotators will be annotating both NER and RE at the same time, and the texts are dense in content, we would prefer to customize the interface to make it more similar to what is displayed in ner.manual, that is, to display it as normal text.

  • More specifically, is there a way to remove bounding boxes on non-entity tokens?
  • Is it possible to decrease the padding of the bounding boxes and decrease the visual space between tokens (just the visual space, not the actual whitespace in the original text)?

If the options above are not possible, is there a way to modify the interface to display the normal text above the interactive interface of rel.manual? This would let annotators read the normal text first and then annotate it in the interface below. I hope this is not confusing, but I can try to explain it better.

Thanks!

Hi @ale,

Before moving on to details on fallback solutions, I just wanted to check - have you not managed to successfully disable tokens using the information I provided in the previous post? I believe that would make a lot of difference and I sense that you agree it would be the ideal solution.
I also understand now that the problem is not spurious spaces in the source text.
If possible perhaps you could share a screenshot of how it currently renders for you? Ideally with "show_whitespace": true added to your .prodigy.json so that we can see how many whitespace tokens there actually are.