Is it possible to merge 2 labels into 1? & How to add a corpus to a specific label

akbarnotoponco · November 10, 2019, 2:41pm

Hi, I accidentally created two labels for the same type of value when continuing my previous process. Let's say they are "SERVICE" and "SERVICES". Is it possible to merge their data into one label? Or do I just need a script that replaces every "SERVICES" with "SERVICE" after exporting? I think that might affect the data as well, but I'm not sure.

And let's say I also have a "LOCATION" label, and I want to add a location corpus that I got from the client to it. How can I accomplish this?

Thank you! Sorry for my bad English.

ines · November 11, 2019, 12:44pm

Yes, I would recommend exporting the data with db-out, opening it in your editor and replacing "SERVICES" with "SERVICE". It's just the label name, so it shouldn't cause any problems with the data. When you're done, import the edited data to a new dataset.

(Prodigy doesn't allow just changing an existing set, because your data would then be out-of-sync with what the annotator saw. It would also make it easier to accidentally lose data, which is bad. So if you want to edit data manually, you'll need to export and import to a new set.)
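If you'd rather not do the replacement by hand in an editor, here is a minimal sketch of a script that does it. It assumes the db-out export is a JSONL file (one JSON task per line) and that the label sits either in a top-level "label" field or inside each entry of "spans". The function names and file paths are made up for the example, and the direction of the merge is just a parameter (here "SERVICES" is folded into "SERVICE").

```python
import json


def rename_label(task, old, new):
    """Return a copy of one exported task with label `old` renamed to `new`."""
    task = dict(task)
    # Some tasks (e.g. text classification) carry a top-level label.
    if task.get("label") == old:
        task["label"] = new
    # NER tasks carry one label per span.
    if "spans" in task:
        task["spans"] = [
            {**span, "label": new} if span.get("label") == old else span
            for span in task["spans"]
        ]
    return task


def rename_label_in_file(src_path, dst_path, old, new):
    """Read a JSONL export and write a copy with the label renamed."""
    with open(src_path, encoding="utf8") as src, \
            open(dst_path, "w", encoding="utf8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                task = rename_label(json.loads(line), old, new)
                dst.write(json.dumps(task) + "\n")


# Hypothetical usage; the paths are placeholders:
# rename_label_in_file("export.jsonl", "merged.jsonl", "SERVICES", "SERVICE")
```

Because it compares whole label values rather than doing a plain text search and replace, it won't touch the word "services" inside the text itself, and it won't turn an existing "SERVICES" into something else by accident. The resulting file can then go into a new dataset with db-in.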

Is the location corpus from your client already annotated? If so, you can convert it to Prodigy's format and then import it to the dataset using the db-in command. You can find more details on the JSON format in the "Annotation task formats" section of your PRODIGY_README.html.

It should hopefully be easy to write a small script that converts your data. For NER, you need the original text, and the start/end character offsets and label for each entity. For example:

{
    "text": "Apple updates its analytics service with new metrics",
    "spans": [{ "start": 0, "end": 5, "label": "ORG" }]
}

Maybe try it with a new dataset and a small sample first, to make sure it all works correctly.
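As a rough sketch of what such a conversion script could look like: the helper below assumes your existing annotations can be read as a text plus a list of (start, end, label) tuples. That input shape and the function names are assumptions for illustration, not anything Prodigy prescribes. It builds tasks in the format shown above and fails early if an offset doesn't fit the text.

```python
import json


def make_task(text, entities):
    """Build one NER task from a text and (start, end, label) tuples."""
    spans = []
    for start, end, label in entities:
        # Offsets are character positions into `text`; end is exclusive.
        if not 0 <= start < end <= len(text):
            raise ValueError(
                f"bad offsets {start}-{end} for text of length {len(text)}"
            )
        spans.append({"start": start, "end": end, "label": label})
    return {"text": text, "spans": spans}


def write_jsonl(path, tasks):
    """Write one JSON task per line."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


example = make_task(
    "Apple updates its analytics service with new metrics",
    [(0, 5, "ORG")],
)
```

A quick sanity check before importing is to slice the text with each span's offsets (for example `text[0:5]`) and confirm you get the entity you expect.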

akbarnotoponco · November 12, 2019, 1:14am

Unfortunately it is not annotated, it's raw: a list of location names in my country. So I guess the solution is just to write a script that converts my location list into Prodigy's JSON format, right?

ines · November 12, 2019, 11:21am

Ah okay! So you have raw text plus a location list? Then you could have a small script that loads the raw data and adds the spans by matching the locations (using regular expressions or something like spaCy's rule-based matcher).

You can then either import it to the dataset directly, or load the data with ner.manual so you can correct the spans first. (This depends on the quality of the location list and the data.)
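Here is a minimal sketch of the regular-expression variant, using only the standard library. It assumes the location list is a plain text file with one name per line and that the raw texts are available as strings; the function names, the "LOCATION" default and the example sentence are placeholders. Matching is case-insensitive and on word boundaries, which is a simplification and may need adjusting for names that begin or end with punctuation.

```python
import re


def load_locations(path):
    """Read one location name per line, skipping blank lines."""
    with open(path, encoding="utf8") as f:
        return [line.strip() for line in f if line.strip()]


def build_pattern(locations):
    """Compile one regex that matches any of the location names."""
    names = sorted({loc.strip() for loc in locations if loc.strip()},
                   key=len, reverse=True)  # try longer names first
    if not names:
        raise ValueError("location list is empty")
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def annotate(text, pattern, label="LOCATION"):
    """Return a task with one span per non-overlapping match in the text."""
    spans = [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in pattern.finditer(text)
    ]
    return {"text": text, "spans": spans}


pattern = build_pattern(["Bandung", "Jakarta", "West Java"])
task = annotate("I flew from Jakarta to Bandung.", pattern)
```

Writing each resulting task as one line of JSON gives a file you can either import directly or load for manual correction first. Exact string matching will produce false positives for ambiguous names, so checking a sample by hand is worthwhile.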

akbarnotoponco · November 13, 2019, 6:15pm

How can I insert it into the dataset directly? There are so many entries, and the file just contains one location per line.
