Splitting labels in NER

While reviewing the labels on a large number of documents, which I originally annotated with the help of patterns, I realized that I should split one of the labels in two.

The process I am considering is similar to the one described in the thread "Renaming labels in NER":

export from Prodigy -> substitute the label manually -> import back into Prodigy

The label to be changed follows a very simple structure, so the substitution can easily be done by hand.

I am now reviewing the JSONL file that I exported with
prodigy db-out MAStipif2 > ./prodigy.jsonl
and my only concern is that each entry includes a code for the pattern.

I am guessing that this code records which pattern produced the original suggestion and is only there for presentation purposes in the Prodigy UI. But will it have any impact down the line if I change the label and leave the pattern code in the exported file unchanged?
Thanks

Hi! By "code for the pattern", do you mean the pattern IDs included in the "meta"? If so, this doesn't really matter – the information is just there for reference to display in the UI, and so you know which pattern match was used to produce the suggested annotations. It's not used for anything. If you think it might be confusing later on, you could also just remove it from the data (should be very easy to do programmatically).
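
In case it helps, here is a minimal sketch of how both steps (splitting the label and dropping the pattern ID) could be done programmatically on the exported file. It assumes each line of the export is a JSON object with a "text", a list of "spans" carrying "start", "end" and "label", and a "meta" dict whose pattern ID sits under the key "pattern"; check your own export before relying on those names. The function names, the label names and the rule for choosing the new label are all hypothetical placeholders.

```python
import json


def split_label(task, old_label, pick_new_label):
    """Replace old_label on every span with whatever pick_new_label returns.

    pick_new_label is a user-supplied (hypothetical) rule that receives the
    span's text and returns the new label.
    """
    for span in task.get("spans") or []:
        if span.get("label") == old_label:
            span_text = task["text"][span["start"]:span["end"]]
            span["label"] = pick_new_label(span_text)
    return task


def strip_pattern_id(task, meta_key="pattern"):
    """Remove the pattern ID from the task's meta, if there is one.

    The key name "pattern" is an assumption about the export format.
    """
    meta = task.get("meta")
    if isinstance(meta, dict):
        meta.pop(meta_key, None)
    return task


def convert(in_path, out_path, old_label, pick_new_label):
    """Read a JSONL export, apply both steps, and write a new JSONL file."""
    with open(in_path, encoding="utf8") as src, \
            open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            task = json.loads(line)
            task = strip_pattern_id(split_label(task, old_label, pick_new_label))
            dst.write(json.dumps(task, ensure_ascii=False) + "\n")
```

For example, convert("prodigy.jsonl", "prodigy_split.jsonl", "OLD", my_rule) would write a new file, where "OLD" and my_rule stand in for your actual label and splitting rule. The result can then be loaded into a new dataset with prodigy db-in.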
