Annotate text with multiple entities using ner_manual

I just got a personal license and am trying to annotate text data with multiple custom entities for NER training. For now I only need the annotated data and care less about model training.

I followed the tutorial and did the following:
(1) Created a dataset:
prodigy dataset my_data "My data for Annotation"
(2) Prepared a JSONL file with a few records: tmp.jsonl
(3) Tried ner.teach, but I can't provide a custom label, and ner.teach opens the web interface for only one entity:
prodigy ner.teach my_data en_core_web_sm tmp.jsonl
(4) Tried mark, but it doesn't load any data:
prodigy mark my_data tmp.jsonl --view-id ner_manual

All I want is to annotate text with a set of custom entities.

Any help is much appreciated.

Thanks,

You’re probably looking for the ner.manual recipe, which streams in your text, tokenizes it and asks you to label one or more spans of text. The --label argument lets you pass in one or more labels to annotate. For example:

prodigy ner.manual my_data en_core_web_sm tmp.jsonl --label LABEL_ONE,LABEL_TWO
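For reference, ner.manual reads JSONL input: one JSON object per line, with the raw text under a "text" key. The sketch below writes such a file from Python. The example sentences are placeholders and `write_jsonl` is a hypothetical helper, not part of Prodigy. Note that running it overwrites tmp.jsonl in the current directory.

```python
import json
from pathlib import Path

# Placeholder texts (hypothetical); replace them with your own records.
texts = [
    "Apple is looking at buying a U.K. startup.",
    "San Francisco considers banning sidewalk delivery robots.",
]


def write_jsonl(path, texts):
    """Write one {"text": ...} object per line, the JSONL shape ner.manual reads."""
    with Path(path).open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


write_jsonl("tmp.jsonl", texts)
```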

Some background on the other recipes: because ner.teach only asks you for binary feedback, it also only asks about one entity at a time. You can still annotate multiple labels and multiple entities in the same text, just in separate steps. The mark recipe, on the other hand, streams in whatever you give it without transforming your data. So if you set --view-id ner_manual, it expects the data to already be tokenized and in the correct format for the manual interface (see your PRODIGY_README.html for details if you're interested).
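To make the mark point concrete: with --view-id ner_manual, each task needs pre-computed tokens in addition to its "text". The sketch below builds tasks in the shape assumed here for the manual interface (a "tokens" list whose entries have "text", "start", "end" and "id", plus an empty "spans" list). Treat those field names as an assumption and check PRODIGY_README.html for your version. The sketch also swaps in a simple regex tokenizer (`TOKEN_RE` and `to_ner_manual_task` are hypothetical names) for spaCy's, so its token boundaries can differ from what ner.manual would produce.

```python
import json
import re

# Assumption: a simple regex tokenizer (runs of word characters, or single
# punctuation marks). ner.manual tokenizes with spaCy, so results may differ.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_ner_manual_task(text):
    """Build a pre-tokenized task with character offsets and token ids."""
    tokens = [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(TOKEN_RE.finditer(text))
    ]
    # No highlighted entities yet, so "spans" starts out empty.
    return {"text": text, "tokens": tokens, "spans": []}


task = to_ner_manual_task("Prodigy is made by Explosion.")
print(json.dumps(task["tokens"][:2]))
```

Writing one such task per line gives a file that mark can stream as-is, whereas ner.manual does this tokenization step for you.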

Thanks for the quick reply.

I ran prodigy ner.manual my_data en_core_web_sm tmp.jsonl --label ONE,TWO but the web UI shows “NO TASK AVAILABLE”.

I have a few JSON records in tmp.jsonl, but the my_data dataset should be empty because I just created it.

Do I have to load the my_data dataset first?

Below is the screenshot.

[Screenshot: the Prodigy web app showing “NO TASK AVAILABLE”]

No, what you’re doing is correct. The my_data dataset is where the answers will be saved when you submit them in the app.

The “Progress” section in the sidebar says the total number of annotations in your dataset is 5. Did you already annotate something with that dataset previously? And what happens if you create a completely new dataset and try again?

If that doesn't work, what's in your tmp.jsonl? Can you show an example? And can you run the command with PRODIGY_LOGGING=basic and check whether anything in the logs looks suspicious? For example:

PRODIGY_LOGGING=basic prodigy ner.manual ...
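Before sharing tmp.jsonl, a quick check is to confirm that every line parses as JSON and carries a string "text" field. The `check_jsonl` helper below is a hypothetical standard-library sketch, not a Prodigy command; it only checks that minimal shape and skips blank lines.

```python
import json


def check_jsonl(path):
    """Return (line_number, problem) pairs for lines without a usable "text" field."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # blank lines are skipped by this check
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((lineno, f"invalid JSON: {err.msg}"))
                continue
            # Each line should be an object with the raw text under "text".
            if not isinstance(record, dict) or not isinstance(record.get("text"), str):
                problems.append((lineno, 'missing a string "text" field'))
    return problems
```

For example, check_jsonl("tmp.jsonl") returns an empty list when every line has the expected shape. Typographic quotes (“ ”) are not valid JSON string delimiters, so a file edited in a word processor can fail this check.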

Hi,

Yes, I had previously annotated my_data. I then dropped the dataset and re-created it with the same name; it seems that dropping a dataset doesn't delete the table in SQLite.
When I created a dataset with a different name, it worked.

Thanks for your help.
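A likely reason the reused name showed “NO TASK AVAILABLE” is that Prodigy can skip incoming examples that are already annotated in the dataset it saves to, so if the old annotations were still in the database, every record in tmp.jsonl would have been filtered out. The sketch below illustrates that kind of hash-based filtering under stated assumptions: `text_hash` and `filter_seen` are hypothetical helpers, and hashing only the raw text with SHA-1 is a simplification, since Prodigy computes its own input and task hashes.

```python
import hashlib


def text_hash(example):
    """Hash an example by its text only (an illustrative stand-in for Prodigy's hashes)."""
    return hashlib.sha1(example["text"].encode("utf-8")).hexdigest()


def filter_seen(stream, seen_hashes):
    """Yield only the examples whose hash is not already in the dataset."""
    for example in stream:
        if text_hash(example) not in seen_hashes:
            yield example


# Hypothetical data: the same records were annotated before the dataset was re-created.
already_annotated = [{"text": "First record."}, {"text": "Second record."}]
incoming = [{"text": "First record."}, {"text": "Second record."}]

seen = {text_hash(eg) for eg in already_annotated}
remaining = list(filter_seen(incoming, seen))
print(len(remaining))  # 0: every incoming example was already annotated
```

With a fresh dataset name, the set of seen hashes starts out empty, so all incoming examples are shown.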