Dear Ines,
Moin! As mentioned, I have refined my named entity recognition. I built it entity by entity and then merged everything together. Before merging, my data has this format:
{"text":"Therefore, as 100,000 is to 70,711, so is 7560 to 5346, the sine of the arc of 3\u00b0 4' 52\", which is CEB.","spans":[{"start":79,"end":88,"label":"LONG"}]}
{"text":"Subtracting this from 45\u00b0 leaves CBE, 41\u00b0 55' 8\", whose half is 20\u00b0 57' 34\", the tangent to which arc is 38,304.","spans":[{"start":38,"end":48,"label":"LONG"},{"start":64,"end":75,"label":"LONG"}]}
That is:
{"text":"text","spans":[{"start":79,"end":88,"label":"LABEL"}]}
But after merging I cannot get the same format.
I get something like this:
{"text":"Therefore, as 100,000 is to 70,711, so is 7560 to 5346, the sine of the arc of 3\u00b0 4' 52\", which is CEB.","spans":[{"start":14,"end":21,"label":"PARA","answer":"accept"},{"start":79,"end":88,"label":"LONG","answer":"accept"}],"_input_hash":16621573,"_task_hash":-706401107,"answer":"accept"}
which means
I feel that I should somehow run
db-in
maybe before modifying the annotations ...
I merged my data like this:
from prodigy import set_hashes
from prodigy.components.db import connect
from prodigy.models.ner import merge_spans

db = connect()  # connect to the DB using the prodigy.json settings
datasets = ['ner_date_v02','ner_time_v02','ner_para_v05','ner_astr_v03','ner_long_v10','ner_star_v02','ner_plan_v02','ner_name_v02','ner_geom_v01']
examples = []
for dataset in datasets:
    examples += db.get_dataset(dataset)  # get examples from the database
merged_examples = merge_spans(examples)  # merge the spans from all datasets
# recompute the hashes on the merged examples
merged_examples = [set_hashes(eg, overwrite=True) for eg in merged_examples]
db.add_dataset('data_merged_v12')
db.add_examples(merged_examples, datasets=['data_merged_v12'])
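To show the format I am after, here is a minimal sketch in plain Python that strips the extra keys from a merged example so that only the text and each span's start, end and label remain. `to_plain_format` is a hypothetical helper name of my own, not a Prodigy function, and I am assuming that every example has a "text" key.

```python
import json


def to_plain_format(example):
    """Keep only the text and each span's start, end and label.

    Hypothetical helper (not part of Prodigy): drops "answer",
    "_input_hash", "_task_hash" and any other extra keys.
    """
    spans = [
        {"start": s["start"], "end": s["end"], "label": s["label"]}
        for s in example.get("spans", [])
    ]
    return {"text": example["text"], "spans": spans}


# The merged example from above, written as a Python dict.
merged = {
    "text": "Therefore, as 100,000 is to 70,711, so is 7560 to 5346, "
            "the sine of the arc of 3\u00b0 4' 52\", which is CEB.",
    "spans": [
        {"start": 14, "end": 21, "label": "PARA", "answer": "accept"},
        {"start": 79, "end": 88, "label": "LONG", "answer": "accept"},
    ],
    "_input_hash": 16621573,
    "_task_hash": -706401107,
    "answer": "accept",
}

plain = to_plain_format(merged)
print(json.dumps(plain))  # one JSON object per line, as in the original data
```

This only changes the shape of each record; whether the hashes and "answer" keys should be kept for Prodigy itself is a separate question.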
The reason for this setup is that I want to calculate the NER metrics for each entity based on this.
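To make clear what I mean by metrics per entity, here is a rough sketch in plain Python. It assumes exact span matching (same start, end and label) and that the gold and predicted examples are in the same order over the same texts. `per_label_scores` and `span_set` are hypothetical names of mine, and the predicted spans below are made up for illustration.

```python
from collections import Counter


def span_set(example):
    """Return the spans of one example as a set of (start, end, label)."""
    return {(s["start"], s["end"], s["label"]) for s in example.get("spans", [])}


def per_label_scores(gold_examples, predicted_examples):
    """Precision, recall and F1 per label, using exact span matches.

    Assumes both lists are aligned: item i in each refers to the same text.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_examples, predicted_examples):
        g, p = span_set(gold), span_set(pred)
        for _, _, label in g & p:   # predicted and correct
            tp[label] += 1
        for _, _, label in p - g:   # predicted but not in gold
            fp[label] += 1
        for _, _, label in g - p:   # in gold but missed
            fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"p": precision, "r": recall, "f": f1}
    return scores


# Gold spans taken from my second example; the prediction is invented.
gold = [{"spans": [{"start": 38, "end": 48, "label": "LONG"},
                   {"start": 64, "end": 75, "label": "LONG"}]}]
predicted = [{"spans": [{"start": 38, "end": 48, "label": "LONG"}]}]

scores = per_label_scores(gold, predicted)
print(scores)
```

With the merged data in the plain format, something like this could be run once over all examples and would report one row per label.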