I annotated sentences with ner.manual to mark sentiment spans. These spans are not traditional "NER" entities; instead, I would like to classify them into classes such as positive, neutral, negative, etc.
So far, a sentence might look like this: Mr. Müller was very friendly and also, he successfully did the surgery.
Here, "very friendly" and "successfully did the surgery" are the marked spans, belonging to two different classes out of many possible ones.
Since my sentences can have several annotated spans (or none), I would like to classify every span individually as positive, negative, etc. How can I do that? I tried duplicating those sentences in the db-out JSONL so that each copy contains just one span, but the duplicate protection mechanism is hindering me (though this is generally a very useful feature). Do you have any ideas?
Also: will the ner.print function be as pretty (that is, colourful) as it used to be?
Hi, I hope I understand your use case correctly, but you should be able to implement a custom stream that takes your already annotated examples, creates a new example for each span and sends it out with options. That way, you can annotate each span in the choice interface and assign it an additional category.
```python
import copy

from prodigy import set_hashes
from prodigy.components.db import connect

# One option per sentiment class for the choice interface
options = [
    {"id": "POSITIVE", "text": "positive"},
    {"id": "NEUTRAL", "text": "neutral"},
    {"id": "NEGATIVE", "text": "negative"},
]


def get_stream():
    db = connect()
    examples = db.get_dataset("name_of_your_ner_dataset")
    for eg in examples:
        for span in eg.get("spans", []):
            # Create a new example for each span
            new_eg = copy.deepcopy(eg)
            new_eg["spans"] = [span]
            new_eg["options"] = options
            # Rehash to prevent duplicate hashes
            new_eg = set_hashes(new_eg, overwrite=True)
            yield new_eg
```
Calling set_hashes with overwrite=True recomputes the input and task hashes of each new example, which makes sure the examples you create from the same sentence don't end up being treated as duplicates.
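To make the hashing idea more concrete, here's a minimal, self-contained sketch that doesn't need Prodigy. It splits one example into one copy per span and hashes each copy. The `simple_hash` helper is hypothetical and is NOT how Prodigy computes its hashes; it only mimics the idea that the input hash depends on the text while the task hash also covers the span. The span labels are made up, too.

```python
import copy
import hashlib
import json


def simple_hash(value) -> int:
    """Hypothetical stand-in hash: NOT Prodigy's actual hashing scheme."""
    raw = json.dumps(value, sort_keys=True).encode("utf8")
    return int(hashlib.sha256(raw).hexdigest()[:8], 16)


def split_by_span(eg: dict, options: list) -> list:
    """Return one deep copy of `eg` per span, each carrying exactly one span."""
    new_egs = []
    for span in eg.get("spans", []):
        new_eg = copy.deepcopy(eg)
        new_eg["spans"] = [span]
        new_eg["options"] = options
        # The input hash depends only on the text, so all copies share it
        new_eg["_input_hash"] = simple_hash(new_eg["text"])
        # The task hash also covers the span, so each copy gets its own
        new_eg["_task_hash"] = simple_hash([new_eg["text"], new_eg["spans"]])
        new_egs.append(new_eg)
    return new_egs


text = "Mr. Müller was very friendly and also, he successfully did the surgery."
eg = {
    "text": text,
    # Made-up labels standing in for the real span classes
    "spans": [
        {"start": 15, "end": 28, "label": "LABEL_A"},
        {"start": 42, "end": 70, "label": "LABEL_B"},
    ],
}
options = [{"id": "POSITIVE", "text": "positive"}]
for new_eg in split_by_span(eg, options):
    span = new_eg["spans"][0]
    print(text[span["start"]:span["end"]], new_eg["_input_hash"], new_eg["_task_hash"])
```

In this sketch both copies share the same input hash; only the task hash tells them apart, and the original example is left untouched because each copy is a deep copy.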
My idea is to split the example sentence into two tasks to be annotated:
Example: Mr. Müller was very friendly and also, he successfully did the surgery.
which would then be annotated as two tasks, each highlighting one span:
Mr. Müller was [very friendly] and also, he successfully did the surgery.
Mr. Müller was very friendly and also, he [successfully did the surgery].
I do not really want to annotate a sentence twice, but I do not see another way to mark each span as either positive or negative. Do you have one?
This is the main goal: annotate a set of already annotated sentences again, but this time classify the pre-annotated spans as positive, negative or neutral. I need to see the full sentence either way, because without the context, the annotations often do not make sense. I hope it is understandable now. And many thanks for your reply!
I think annotating each sentence with a focus on one span at a time makes the most sense here. Yes, you'll see the same text multiple times, but each time you're focusing on a different span within the sentence, potentially with a very different sentiment.
An alternative solution could be to add "options", make them multiple choice and create one option per span + sentiment combination. So you'd have "very friendly": positive, "very friendly": neutral, ..., "successfully did the surgery": positive, and so on. You could maybe also use the "style" property on the choice options to assign different colors. But this still seems messier and means there's a lot more going on per task (and more potential for mistakes).
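For comparison, the multiple-choice alternative could be generated roughly like this. It's a sketch under assumptions: the colors, the option IDs and the `"style"` dict on each option are hypothetical (the "style" property is only the suggestion from the paragraph above, not something verified here), and multiple selection would still need to be enabled in the recipe config (e.g. via `"choice_style": "multiple"`; check the docs for your Prodigy version).

```python
import copy

SENTIMENTS = ["positive", "neutral", "negative"]
# Hypothetical background colors per sentiment
COLORS = {"positive": "#c8f7c5", "neutral": "#e8e8e8", "negative": "#f7c5c5"}


def add_combined_options(eg: dict) -> dict:
    """Return a copy of `eg` with one option per span + sentiment combination."""
    new_eg = copy.deepcopy(eg)
    options = []
    for i, span in enumerate(eg.get("spans", [])):
        span_text = eg["text"][span["start"]:span["end"]]
        for sentiment in SENTIMENTS:
            options.append({
                # The span index keeps IDs unique even if two spans share text
                "id": f"{i}_{sentiment.upper()}",
                "text": f"{span_text}: {sentiment}",
                # Assumption: options accept a "style" dict for coloring
                "style": {"background": COLORS[sentiment]},
            })
    new_eg["options"] = options
    return new_eg


text = "Mr. Müller was very friendly and also, he successfully did the surgery."
eg = {"text": text, "spans": [{"start": 15, "end": 28}, {"start": 42, "end": 70}]}
task = add_combined_options(eg)
for option in task["options"]:
    print(option["id"], "|", option["text"])
```

With just two spans and three sentiments, a single task already carries six options, which illustrates why this gets messier than focusing on one span at a time.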