displaying annotated dataset

solved
usage
ner
database
(Zain Muhammad) #1

image


Q1)
Here I am getting all the annotations of a dataset from the database. Is there any way I can get only the annotations of a particular entity type labelled by the annotator? For example, I want to display just the ‘ORG’ entities in a dataset.
Q2)
One more thing, which I think may not be possible: can I get the entity label and the annotated word together? For example, if youtube is annotated as ‘ORG’, I would like to get both together rather than the start and end tokens.

(Ines Montani) #2

Sure – you have the raw data right there, and all the information you’re looking for is available in that data. So you can extract that however you want.

For example:

only_orgs = []
for eg in examples:
    org_spans = [span for span in eg["spans"] if span["label"] == "ORG"]
    if org_spans:  # only include example if ORG spans are available
        only_orgs.append(eg) 

Your annotations also include the character offsets into the text as the start and end of each span. For example, the first span in your example defines "start": 0, "end": 12. So the characters in the text that this span refers to are text[0:12].

for eg in examples:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end], span["label"])
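
The two snippets above can also be combined, so that only the spans of one label are returned together with the text they cover. This is a minimal sketch: the helper name `spans_with_label` and the sample data are made up for illustration, and it assumes every example has both a "text" and a "spans" property.

```python
def spans_with_label(examples, label):
    """Return (annotated_text, label) pairs for every span with the given label."""
    pairs = []
    for eg in examples:
        text = eg["text"]
        for span in eg["spans"]:
            if span["label"] == label:
                # Character offsets index directly into the example text
                pairs.append((text[span["start"]:span["end"]], span["label"]))
    return pairs


# Hypothetical sample data in the same shape as the annotations above
examples = [
    {"text": "YouTube is owned by Google", "spans": [
        {"start": 0, "end": 7, "label": "ORG"},
        {"start": 20, "end": 26, "label": "ORG"},
    ]},
    {"text": "Ines lives in Berlin", "spans": [
        {"start": 0, "end": 4, "label": "PERSON"},
        {"start": 14, "end": 20, "label": "GPE"},
    ]},
]

for word, entity in spans_with_label(examples, "ORG"):
    print(word, entity)
```
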
(Zain Muhammad) #3

Thanks a lot!

(Zain Muhammad) #4

for eg in ex:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end] + "\t", span["label"])

This gives me just 1042 annotations, even after adding new entities.

It gives me the error below after displaying 1042 annotations:

KeyError                                  Traceback (most recent call last)
in
      1 for eg in ex:
      2     text = eg["text"]
----> 3     for span in eg["spans"]:
      4         start = span["start"]
      5         end = span["end"]

KeyError: 'spans'

(Ines Montani) #5

What are the new annotations you added to the set and how did you create them? It looks like the 1042 annotations you have in your set are regular named entity annotations with a "spans" property, but the other ones are not.

You can run the db-out command to export the dataset and inspect the annotations. Maybe you accidentally added text classification annotations or something else to the same set?
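
Once the dataset is exported, one way to inspect it is to count which combinations of top-level keys the examples have. This is a rough sketch that assumes the export was saved as a JSONL file with one JSON object per line. The file name `annotations.jsonl` and both helper names are hypothetical.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_keys(examples):
    """Count how many examples share each combination of top-level keys."""
    return Counter(tuple(sorted(eg.keys())) for eg in examples)


# Usage (hypothetical file name; adjust to wherever the export was saved):
# exported = load_jsonl("annotations.jsonl")
# for keys, count in summarize_keys(exported).most_common():
#     print(count, keys)
```

Any key combination that lacks "spans" points at the examples that were created differently from the regular named entity annotations.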

(Zain Muhammad) #6

No, I was following the same procedure when adding the annotations, and I have tried this with another dataset too, which also displays only 1042 annotations. Is there a limit so that no more than 1042 annotations are displayed?

(Ines Montani) #7

No, that shouldn’t be a problem – datasets often have thousands or tens of thousands of examples. So unless you’re running out of memory or something (which is unlikely), there’s no reason why it would randomly stop at 1042 examples.

But are you always seeing the KeyError? If so, there’s definitely a problem: at least one of the examples in your datasets doesn’t have a "spans" property. That shouldn’t be the case if they’re all named entity annotations, since even examples with no entities should have "spans": [] set.

If you just want to work around this for now, you could do for span in eg.get("spans", []) in your code, to make the spans default to an empty list and essentially skip examples that do not have it. You might still want to investigate the examples it fails on, though, and check how they ended up in the dataset.
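
As a sketch of that workaround, the loop below uses eg.get("spans", []) and also records the positions of the examples that have no "spans" property, so they can be checked afterwards. The function name `collect_entities` and the sample data are illustrative only, and the sketch assumes every example that has spans also has a "text" property.

```python
def collect_entities(examples):
    """Return (pairs, missing): entity pairs plus indices of examples without "spans"."""
    pairs = []
    missing = []
    for i, eg in enumerate(examples):
        if "spans" not in eg:
            missing.append(i)  # remember this example for later inspection
        # Default to an empty list so examples without "spans" are skipped
        for span in eg.get("spans", []):
            word = eg["text"][span["start"]:span["end"]]
            pairs.append((word, span["label"]))
    return pairs, missing


# Hypothetical data: the second example has no "spans" property
sample = [
    {"text": "YouTube is popular", "spans": [{"start": 0, "end": 7, "label": "ORG"}]},
    {"text": "An example without spans"},
    {"text": "No entities here", "spans": []},
]

pairs, missing = collect_entities(sample)
print(pairs)
print("Examples without spans:", missing)
```
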

(Zain Muhammad) #8

Thanks a ton! Now I am getting all the entities!
