Many thanks for your responses. As a test,
I loaded my raw data into Prodigy and added arbitrary labels (defined with --label in ner.manual). This shows that I can annotate raw data with an arbitrary set of labels. For this I used a JSON file containing the sentences of my data, converted it to JSONL, and everything worked.
Now back to my question: since I applied a special tokenization to my data with regex and also annotated the data with regex,
I have the annotated data in this format in Python:
[[('Therefore', 'None'),
('CD', 'GEOM'),
('being', 'None'),
('dropped', 'None'),
('perpendicular', 'None'),
('to', 'None'),
('AB', 'GEOM'),
('where', 'None'),
('AD', 'GEOM'),
('which', 'None'),
('is', 'None'),
('half', 'None'),
('AB', 'GEOM'),
('is', 'None'),
('1000', 'NUM'),
('AC', 'GEOM'),
('will', 'None'),
('be', 'None'),
('3333⅓', 'NUM')],
[('Looking', 'None'),
('this', 'None'),
('up', 'None'),
('in', 'None'),
('a', 'None'),
('table', 'None'),
('of', 'None'),
('secants', 'None'),
('we', 'None'),
('find', 'None'),
('the', 'None'),
('angles', 'None'),
('CAD', 'GEOM'),
('and', 'None'),
('CBD', 'GEOM'),
('to', 'None'),
('be', 'None'),
("72° 33'", 'COORD')],
[('So', 'None'),
('also', 'None'),
('at', 'None'),
('16°', 'ANG'),
('or', 'None'),
('17°', 'ANG'),
('Aquarius', 'None'),
('with', 'None'),
('AB', 'GEOM'),
('1000', 'NUM'),
('AC', 'GEOM'),
('is', 'None'),
('1375', 'NUM'),
('so', 'None'),
('if', 'None'),
('AD', 'GEOM'),
('1000', 'NUM'),
('AC', 'GEOM'),
('is', 'None'),
('2750', 'NUM'),
('showing', 'None'),
("68° 40'", 'COORD'),
('in', 'None'),
('the', 'None'),
('table', 'None'),
('of', 'None'),
('secants', 'None')]]
Do you have any suggestions on how I can proceed from here? I probably need to convert this into the same format that you mentioned. Is there any way I can still use Prodigy with data that is already tokenized and annotated like this?
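To make the idea concrete, here is a minimal sketch of the conversion I have in mind. It assumes the target is a JSONL file with one task per line, each holding "text", "tokens" and "spans" with character and token offsets, which is how I understand the NER format. The helper names (to_prodigy_task, write_jsonl) and the output path are hypothetical, and rebuilding the text by joining tokens with single spaces is my own assumption, so the original spacing and punctuation are not recovered. Each labelled token becomes its own span; adjacent tokens with the same label are not merged.

```python
import json


def to_prodigy_task(pairs, none_label="None"):
    """Turn [(token, label), ...] into a dict with text, tokens and spans.

    Assumption: tokens are separated by exactly one space in the rebuilt text.
    """
    tokens, spans = [], []
    offset = 0
    for i, (word, label) in enumerate(pairs):
        start, end = offset, offset + len(word)
        is_last = i == len(pairs) - 1
        # "ws" records whether a space follows the token
        tokens.append(
            {"text": word, "start": start, "end": end, "id": i, "ws": not is_last}
        )
        if label != none_label:
            # one span per labelled token; token_end is inclusive here
            spans.append(
                {
                    "start": start,
                    "end": end,
                    "token_start": i,
                    "token_end": i,
                    "label": label,
                }
            )
        offset = end + (0 if is_last else 1)
    text = " ".join(word for word, _ in pairs)
    return {"text": text, "tokens": tokens, "spans": spans}


def write_jsonl(sentences, path):
    """Write one task per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for pairs in sentences:
            f.write(json.dumps(to_prodigy_task(pairs), ensure_ascii=False) + "\n")


# Small example taken from the first sentence above
example = [("CD", "GEOM"), ("being", "None"), ("1000", "NUM")]
task = to_prodigy_task(example)
```

If this is roughly right, calling write_jsonl on my full list of sentences would give a file I could load the same way as the raw JSONL from my test, with the regex annotations already present as spans.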
Many thanks.