I currently have a text classification recipe that uses the spaCy RulesMatcher to identify and highlight spans of text. The highlighting flags keywords to help annotators assign labels to sentences. An example of the UI is here:
After exporting the data with `db-out`, I see that the `accept` field in the JSONL seems to contain both the matcher name (e.g. maturity_date) and the annotator's labels (e.g. Maturity Date). This means that the `accept` field will often contain near-duplicates such as 'Maturity Date' and 'maturity_date'.
Is this expected behaviour? And if so, is there a recommended way to isolate only the annotator's label without the matcher spans?
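As a stopgap, I could filter the exported records against the set of labels I offer to annotators. Below is a minimal sketch of that idea, assuming each exported JSONL record carries an `accept` list as described above. `ANNOTATOR_LABELS`, `keep_annotator_labels`, and `clean_jsonl` are hypothetical names for illustration, not part of Prodigy or spaCy:

```python
import json

# Hypothetical: the labels shown to annotators as choices in the UI.
ANNOTATOR_LABELS = {"Maturity Date"}


def keep_annotator_labels(record, allowed=ANNOTATOR_LABELS):
    """Return a copy of the record whose "accept" list only holds allowed labels."""
    cleaned = dict(record)
    cleaned["accept"] = [
        label for label in record.get("accept", []) if label in allowed
    ]
    return cleaned


def clean_jsonl(lines, allowed=ANNOTATOR_LABELS):
    """Parse JSONL lines (skipping blanks) and filter each record's "accept" field."""
    return [
        keep_annotator_labels(json.loads(line), allowed)
        for line in lines
        if line.strip()
    ]


# Made-up record mirroring the duplicate described above.
example = ['{"text": "...", "accept": ["Maturity Date", "maturity_date"]}']
print(clean_jsonl(example))
```

This only post-processes the export; it does not explain why the matcher names end up in `accept` in the first place.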
Thanks a lot