(Zain Muhammad) #1


Here I am getting all the annotations of a dataset from the database. Is there any way to get only the annotations for a particular entity label assigned by the annotator? For example, I want to display just the ‘ORG’ entities in a dataset.
One more thing, which may not be possible: can I get the entity label and the annotated word together? For example, if youtube is annotated as ‘ORG’, I would like to get both together rather than just the start and end tokens.

(Ines Montani) #2

Sure – you have the raw data right there, and all the information you’re looking for is available in that data. So you can extract that however you want.

For example:

only_orgs = []
for eg in examples:
    org_spans = [span for span in eg["spans"] if span["label"] == "ORG"]
    if org_spans:  # only include example if ORG spans are available
        # copy the example, keeping only its ORG spans
        only_orgs.append({**eg, "spans": org_spans})

Your annotations also include the character offsets into the text as the start and end of each span. For example, the first span in your example defines "start": 0, "end": 12. So the characters in the text that this span refers to are text[0:12].

for eg in examples:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end], span["label"])
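
If it helps, the two ideas above (filtering by label and slicing the text by character offsets) can be combined into one small helper. This is only a rough sketch: the function name spans_with_text and the sample data are made up for illustration, and it assumes each example is a dict with a "text" key and an optional "spans" list whose entries have "start", "end" and "label".

```python
def spans_with_text(examples, label=None):
    """Return (span_text, span_label) pairs, optionally limited to one label."""
    pairs = []
    for eg in examples:
        text = eg["text"]
        # Assumption: examples without annotations may lack the "spans" key
        for span in eg.get("spans", []):
            if label is None or span["label"] == label:
                # Character offsets slice the annotated string out of the text
                pairs.append((text[span["start"]:span["end"]], span["label"]))
    return pairs


# Hypothetical sample data in the same shape as the annotations discussed above
examples = [
    {
        "text": "YouTube is owned by Google.",
        "spans": [
            {"start": 0, "end": 7, "label": "ORG"},
            {"start": 20, "end": 26, "label": "ORG"},
        ],
    },
    {
        "text": "Ines lives in Berlin.",
        "spans": [
            {"start": 0, "end": 4, "label": "PERSON"},
            {"start": 14, "end": 20, "label": "GPE"},
        ],
    },
]

org_pairs = spans_with_text(examples, label="ORG")
print(org_pairs)
```

Calling it without a label returns every annotated span together with its label, so the same helper covers both questions.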

(Zain Muhammad) #3

Thanks a lot!