Hi @ale,
Re disabling tokens in REL UI
It is possible to disable certain tokens from selection via the --disable-patterns argument to rel.manual. This argument takes a JSONL file with Matcher patterns describing the tokens to be disabled. You can see an example of how it is used on the "Dependencies and Relations" page of the Prodigy docs.
In your case you would want to enable only the tokens that are also entities, so the pattern would look something like:
{"pattern": [{"_": {"label": {"NOT_IN": ["ORG"]}}}]}
This pattern disables every token whose label is not ORG, so only the tokens with the ORG label remain selectable. However, as I was testing this rule, I realized there's a small bug in our rel.manual recipe: the default label for each token is None, while it should be a string for the Matcher to work.
While we are preparing the fix, you can work around it by making a small modification to your local copy of the recipe, which should be located at {path_to_your_prodigy_installation}/prodigy/recipes/rel.py (to find the path to your Prodigy installation, run prodigy stats).
In this file, on line 151, change None to "None", so that it reads:
Token.set_extension("label", default="None", force=True)
With this change the pattern above should work as expected.
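In case it helps, here is a minimal sketch of how the patterns file could be generated. The file name disable_patterns.jsonl and the helper names are just placeholders I made up, and is_disabled is a plain-Python emulation of the NOT_IN check for illustration only, not spaCy's actual Matcher logic:

```python
import json
from pathlib import Path

# Pattern from above: match (and therefore disable) every token whose
# custom "label" attribute is not "ORG".
DISABLE_PATTERNS = [
    {"pattern": [{"_": {"label": {"NOT_IN": ["ORG"]}}}]},
]


def write_patterns(path, patterns):
    """Write one JSON object per line (JSONL) and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")
    return path


def is_disabled(label, entry):
    """Illustrative emulation of NOT_IN: True if the label is not listed."""
    listed = entry["pattern"][0]["_"]["label"]["NOT_IN"]
    return label not in listed


if __name__ == "__main__":
    # Hypothetical output file name; use whatever suits your project.
    out = write_patterns("disable_patterns.jsonl", DISABLE_PATTERNS)
    print(out.read_text(encoding="utf-8"))
    # Tokens without an entity carry the default label "None" after the fix.
    for label in ["ORG", "None", "PERSON"]:
        print(label, "disabled" if is_disabled(label, DISABLE_PATTERNS[0]) else "enabled")
```

The resulting file is what you would pass to rel.manual via --disable-patterns.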
Re customizing the color
You can customize the color of the relation labels using the custom_theme config setting. In rel.manual this only applies to the relation labels: the NER labels (i.e. the bounding boxes) are gray by design, to give prominence to the relation arrows in the UI, so we are currently not exposing a setting to modify the color of the entity bounding boxes.
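As a rough sketch, the snippet below prints a config fragment you could merge into your prodigy.json. It assumes custom_theme takes a "labels" mapping from label name to color, which is worth double-checking against the configuration docs for your Prodigy version. The relation labels and colors are made-up examples:

```python
import json

# Hypothetical relation labels and colors; replace with your own.
RELATION_COLORS = {
    "WORKS_AT": "#ffd700",
    "LOCATED_IN": "#7fc8f8",
}


def theme_config(label_colors):
    """Build a custom_theme fragment (assumed shape: labels -> color)."""
    return {"custom_theme": {"labels": dict(label_colors)}}


if __name__ == "__main__":
    print(json.dumps(theme_config(RELATION_COLORS), indent=2))
```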
Re decreasing the whitespace between the tokens
This is best done as part of preprocessing, i.e. before annotating named entities and relations. All annotations (NER and REL) are bound to the token offsets, including whitespace tokens, so it's tricky to annotate with one tokenization (e.g. for NER) and then modify it afterwards (by removing whitespace). Alternatively, if you want to modify the default tokenization from within a recipe, you could supply a custom tokenizer as part of the spaCy pipeline.
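For the preprocessing route, something along these lines could work. It is a sketch that assumes your input is JSONL with a "text" field and that nothing has been annotated yet, since collapsing whitespace shifts all character offsets. The function names are placeholders:

```python
import json
import re

_WHITESPACE = re.compile(r"\s+")


def normalize_whitespace(text):
    """Collapse every run of whitespace into a single space and trim the ends."""
    return _WHITESPACE.sub(" ", text).strip()


def preprocess_jsonl(lines):
    """Yield JSONL lines whose "text" field has normalized whitespace."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record["text"] = normalize_whitespace(record["text"])
        yield json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    raw = ['{"text": "Acme  Corp \\n\\n hired   Ana "}']
    for cleaned in preprocess_jsonl(raw):
        print(cleaned)
```

You would run this once over the raw input and then use the cleaned file as the source for both the NER and the REL annotation steps, so the token offsets stay consistent.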