displaying annotated dataset

usage
ner
database
solved

(Zain Muhammad) #1

image


Q1)
Here I am getting all the annotations of a dataset from the database. Is there any way to get only the annotations for a particular entity that the annotator labelled? For example, I want to display just the ‘ORG’ annotations in a dataset.
Q2)
One more thing, which I think may not be possible: can I get the entity label and the annotated word together? For example, if ‘youtube’ is annotated as ‘ORG’, I would like to get both together rather than the start token and end token.


(Ines Montani) #2

Sure – you have the raw data right there, and all the information you’re looking for is available in that data. So you can extract that however you want.

For example:

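# examples: the list of annotated examples loaded from your dataset
only_orgs = []
for eg in examples:
    # collect the spans in this example that are labelled ORG
    org_spans = [span for span in eg["spans"] if span["label"] == "ORG"]
    if org_spans:  # only include example if ORG spans are available
        only_orgs.append(eg)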
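
Note that the loop above keeps the whole example, including any spans with other labels. If you also want to drop the non-ORG spans, here is a minimal sketch. The helper name keep_only_label and the sample data are made up for illustration; the "text", "spans", "start", "end" and "label" keys are assumed to match what your examples contain.

```python
def keep_only_label(examples, label):
    """Return shallow copies of the examples that contain `label`,
    with every span of a different label dropped."""
    filtered = []
    for eg in examples:
        spans = [span for span in eg.get("spans", []) if span["label"] == label]
        if spans:
            # dict(eg, spans=...) copies the example so the original is not modified
            filtered.append(dict(eg, spans=spans))
    return filtered


# Made-up sample data in the same shape as the annotations in this thread
examples = [
    {"text": "YouTube is owned by Google.", "spans": [
        {"start": 0, "end": 7, "label": "ORG"},
        {"start": 20, "end": 26, "label": "ORG"},
    ]},
    {"text": "Ines lives in Berlin.", "spans": [
        {"start": 0, "end": 4, "label": "PERSON"},
        {"start": 14, "end": 20, "label": "GPE"},
    ]},
    {"text": "Anna works at Apple.", "spans": [
        {"start": 0, "end": 4, "label": "PERSON"},
        {"start": 14, "end": 19, "label": "ORG"},
    ]},
]

only_orgs = keep_only_label(examples, "ORG")
for eg in only_orgs:
    print(eg["text"], eg["spans"])
```

The second example has no ORG spans, so it is left out. The third one is kept, but only with its ORG span.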

Your annotations also include the character offsets into the text as the start and end of each span. For example, the first span in your example defines "start": 0, "end": 12. So the characters in the text that this span refers to are text[0:12].

for eg in examples:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end], span["label"])
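
Both ideas can be combined to answer Q1 and Q2 in one go. The following is only a sketch: the function name labelled_texts and the sample data are hypothetical, and it assumes each example has a "text" key and a "spans" list with character offsets, as in your data.

```python
def labelled_texts(examples, label=None):
    """Collect (span text, label) pairs; if `label` is given, keep only that label."""
    pairs = []
    for eg in examples:
        text = eg["text"]
        for span in eg.get("spans", []):
            if label is None or span["label"] == label:
                # slice the annotated characters out of the text
                pairs.append((text[span["start"]:span["end"]], span["label"]))
    return pairs


# Made-up sample data in the same shape as the annotations in this thread
examples = [
    {"text": "YouTube is owned by Google.", "spans": [
        {"start": 0, "end": 7, "label": "ORG"},
        {"start": 20, "end": 26, "label": "ORG"},
    ]},
    {"text": "Ines lives in Berlin.", "spans": [
        {"start": 0, "end": 4, "label": "PERSON"},
        {"start": 14, "end": 20, "label": "GPE"},
    ]},
]

print(labelled_texts(examples, "ORG"))
# [('YouTube', 'ORG'), ('Google', 'ORG')]
```

Calling labelled_texts(examples) without a label returns the pairs for every span.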

(Zain Muhammad) #3

Thanks a lot!