displaying annotated dataset

Zainpann · March 12, 2019, 10:12am

Q1)
Here I am getting all the annotations of a dataset from a database. Is there any way I can get only the annotations with a particular entity label? For example, I want to display just the ‘ORG’ annotations in a dataset.
Q2)
One more thing, which I think may not be possible: can I get the entity label and the annotated word together? For example, if youtube is annotated as ‘ORG’, can I get both together rather than the start token and end token?

ines · March 12, 2019, 10:45am

Sure – you have the raw data right there, and all the information you're looking for is available in that data. So you can extract that however you want.

For example:

only_orgs = []
for eg in examples:
    org_spans = [span for span in eg["spans"] if span["label"] == "ORG"]
    if org_spans:  # only include example if ORG spans are available
        only_orgs.append(eg)
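The loop above keeps whole examples, including any non-ORG spans they also contain. If you only want the ORG spans themselves, a minimal sketch could copy each example and filter its "spans" list. This assumes examples is a list of dicts in the format shown above; filter_spans_by_label is a hypothetical helper name and the sample data is made up.

```python
import copy
from typing import Any, Dict, List


def filter_spans_by_label(examples: List[Dict[str, Any]], label: str) -> List[Dict[str, Any]]:
    """Return copies of the examples that contain `label`, keeping only those spans.

    Hypothetical helper: the original examples are left unmodified.
    """
    filtered = []
    for eg in examples:
        spans = [span for span in eg.get("spans", []) if span["label"] == label]
        if spans:
            new_eg = copy.deepcopy(eg)  # don't mutate the data loaded from the database
            new_eg["spans"] = spans
            filtered.append(new_eg)
    return filtered


# Made-up examples in the same format as the annotations from the database
examples = [
    {"text": "YouTube is owned by Google", "spans": [
        {"start": 0, "end": 7, "label": "ORG"},
        {"start": 20, "end": 26, "label": "ORG"},
    ]},
    {"text": "Alice lives in Berlin", "spans": [
        {"start": 0, "end": 5, "label": "PERSON"},
        {"start": 15, "end": 21, "label": "GPE"},
    ]},
]
only_orgs = filter_spans_by_label(examples, "ORG")
print(len(only_orgs))  # 1
```

Copying the examples means you can filter the same data by several labels without the results interfering with each other.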

Your annotations also include the character offsets into the text as the start and end of each span. For example, the first span in your example defines "start": 0, "end": 12, so the characters it refers to are text[0:12].

for eg in examples:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end], span["label"])
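Combining both questions, you could also collect the annotated strings under their label, for example to list every word annotated as ORG. A sketch under the same assumption (examples is a list of dicts with "text" and "spans"); group_spans_by_label and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_spans_by_label(examples: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each label to the annotated strings, in dataset order."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for eg in examples:
        text = eg["text"]
        # Skip examples without "spans" (e.g. other annotation types)
        for span in eg.get("spans", []):
            by_label[span["label"]].append(text[span["start"]:span["end"]])
    return dict(by_label)


examples = [
    {"text": "I watched it on youtube", "spans": [{"start": 16, "end": 23, "label": "ORG"}]},
    {"text": "Google bought YouTube", "spans": [
        {"start": 0, "end": 6, "label": "ORG"},
        {"start": 14, "end": 21, "label": "ORG"},
    ]},
]
words = group_spans_by_label(examples)
print(words["ORG"])  # ['youtube', 'Google', 'YouTube']
```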

Zainpann · March 12, 2019, 11:06am

Thanks a lot!

Zainpann · May 3, 2019, 11:08am

for eg in ex:
    text = eg["text"]
    for span in eg["spans"]:
        start = span["start"]
        end = span["end"]
        print(text[start:end] + "\t", span["label"])

This is giving me just 1042 annotations, even after adding new entities.

After displaying the 1042 annotations, it gives me the error below:

KeyError                                  Traceback (most recent call last)
in
      1 for eg in ex:
      2     text = eg["text"]
----> 3     for span in eg["spans"]:
      4         start = span["start"]
      5         end = span["end"]

KeyError: 'spans'

ines · May 3, 2019, 11:17am

What are the new annotations you added to the set and how did you create them? It looks like the 1042 annotations you have in your set are regular named entity annotations with a "spans" property, but the other ones are not.

You can run the db-out command to export the dataset and inspect the annotations. Maybe you accidentally added text classification annotations or something else to the same set?
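db-out writes newline-delimited JSON (one example per line), for instance via prodigy db-out your_dataset > annotations.jsonl, where the dataset and file names are placeholders. A small sketch for listing which exported examples have no "spans" and which keys they do have, which can hint at what kind of annotation they are. parse_jsonl, find_examples_without_spans and the sample lines are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def find_examples_without_spans(examples: List[Dict[str, Any]]) -> List[Tuple[int, List[str]]]:
    """Return (index, sorted keys) for every example that has no "spans" key."""
    return [(i, sorted(eg)) for i, eg in enumerate(examples) if "spans" not in eg]


# With a real export you could do:
# with open("annotations.jsonl", encoding="utf8") as f:
#     examples = parse_jsonl(f)
lines = [
    '{"text": "Google bought YouTube", "spans": [], "answer": "accept"}',
    '{"text": "Great video!", "label": "POSITIVE", "answer": "accept"}',
]
examples = parse_jsonl(lines)
missing = find_examples_without_spans(examples)
print(missing)  # [(1, ['answer', 'label', 'text'])]
```

An example with a "label" key but no "spans", like the second one here, would suggest a text classification annotation ended up in the set.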

Zainpann · May 3, 2019, 11:49am

No, I followed the same procedure when adding annotations. I also tried this with another dataset, and it also displays only 1042 annotations. Is there a limit that prevents more than 1042 annotations from being displayed?

ines · May 3, 2019, 2:50pm

No, that shouldn’t be a problem – datasets often have thousands or tens of thousands of examples. So unless you’re running out of memory or something (which is unlikely), there’s no reason why it would randomly stop at 1042 examples.

But are you always seeing the KeyError? If so, there's definitely a problem with your data: at least one of the examples in your datasets doesn't have a "spans" property (which shouldn't be the case if they're all named entity annotations; even examples with no entities should have "spans": [] set).

If you just want to work around this for now, you could write for span in eg.get("spans", []) in your code, so that the spans default to an empty list and examples without them are essentially skipped. You might still want to investigate the examples it fails on, though, and check how they ended up in the dataset.
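A variant of that workaround that also records which examples were skipped, so they can be inspected afterwards. This is a sketch assuming the same example format as above; extract_spans and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def extract_spans(examples: List[Dict[str, Any]]) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Return (annotated text, label) pairs plus the indices of examples without "spans"."""
    pairs = []
    skipped = []
    for i, eg in enumerate(examples):
        spans = eg.get("spans")
        if spans is None:
            skipped.append(i)  # remember it instead of raising a KeyError
            continue
        text = eg["text"]
        for span in spans:
            pairs.append((text[span["start"]:span["end"]], span["label"]))
    return pairs, skipped


examples = [
    {"text": "Google bought YouTube", "spans": [{"start": 0, "end": 6, "label": "ORG"}]},
    {"text": "No entities here", "spans": []},
    {"text": "Great video!", "label": "POSITIVE"},
]
pairs, skipped = extract_spans(examples)
for word, label in pairs:
    print(word + "\t" + label)
print("Examples without spans:", skipped)  # Examples without spans: [2]
```

Note that an example with "spans": [] is not counted as skipped, since it is a valid NER annotation with no entities.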

Zainpann · May 6, 2019, 5:54am

Thanks a ton! Now I am getting all the entities!
