ner.manual - simple usage

Dear Support,
I have a couple of newbie questions about the usage of ner.manual.

Q1: In ner.manual, do I have to annotate a term (e.g. "gigigi") each time I see it across different tasks? This term may be repeated many times in the document. Is that useful for disambiguation?

Q2: In ner.manual I have around 600 phrases to annotate, but the annotator received only 120 of them. I believe there's no active learning in ner.manual – or am I wrong?

Thanks in advance for your support.

All the best

C.

Yes, if you're labelling manually, you usually want to label every instance of the term "gigigi" every single time. Named entity recognition is context-dependent, so you want your data to include the entities in a variety of contexts. For ambiguous entities, this is especially important.
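For reference, a single manually annotated task ends up looking roughly like this in Prodigy's format (the text, offsets and label here are made up for illustration; check your version's docs for the exact keys):

```python
# One annotated example: the entity is marked as a character-offset span
task = {
    "text": "I met gigigi at the conference.",
    "spans": [{"start": 6, "end": 12, "label": "PERSON"}],
}
# The offsets slice out exactly the annotated term
assert task["text"][6:12] == "gigigi"
```

Annotating "gigigi" in many different sentences gives the model many different contexts for the same surface form, which is what makes disambiguation learnable.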

Because labelling everything manually can be annoying and tedious, Prodigy tries to make this easier with semi-automated recipes like ner.teach (with active learning) or ner.match (without active learning) that suggest candidates and let you say yes or no.

No, the default ner.manual recipe should stream in all examples as they come in and not skip any. By "received", do you mean that they annotated everything, but you only have 120 tasks in the dataset? Some possible explanations could be:

  • Does your data contain any duplicate sentences? If so, Prodigy will filter those out.
  • If the annotator refreshes the browser, the Prodigy app will request the next batch of tasks – and until you've received all answers and the session is over, Prodigy can't know whether a task needs to be sent out again. This thread has more details on this and suggestions for a solution.
  • Always make sure to save your progress in the web app after you're done annotating. Otherwise, you might lose the last batch of annotations when you close the browser.
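The deduplication in the first point works on a hash of the input text. As a rough sketch of the idea (this is not Prodigy's actual implementation, just an illustration of input-hash filtering):

```python
import hashlib

def filter_duplicates(examples):
    """Yield only examples whose input text hasn't been seen before.
    Rough sketch of input-hash deduplication, not Prodigy's real code."""
    seen = set()
    for eg in examples:
        input_hash = hashlib.md5(eg["text"].encode("utf8")).hexdigest()
        if input_hash not in seen:
            seen.add(input_hash)
            yield eg

examples = [{"text": "Hello world"}, {"text": "Hello world"}, {"text": "Bye"}]
unique = list(filter_duplicates(examples))
assert len(unique) == 2  # the duplicate sentence is dropped
```

So if your 600 phrases contain many repeats, the number of tasks the annotator actually sees can be much lower.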

Thanks @ines for the answers!

I have another couple of questions:

Q3: I've also checked your demo server at https://prodi.gy/demo?view_id=ner_manual:

  • On your demo, the annotator never sees a "No tasks available" message until the progress reaches 100%, which is what we want.
  • On our server, the progress bar shows the infinity symbol instead. How can I set it up to show the same progress bar as the demo?

The source file is a TXT file, for example:

prodigy ner.manual export3 it_core_news_sm export1.txt --label "PERSON, ORG, LOC" &

Q4: How do I add the license to the product? Do I just download the file you provided via email and use it?

Thanks again for your support, and any suggestions would be really appreciated.

All the best.

C.

Ah, I think I know what's going on here: by default, Prodigy streams are generators, and if a file can be read in line by line, Prodigy will do so and start yielding out tasks immediately. Generators have no length, because they don't know how many items there are in total – and if Prodigy doesn't know how many items there are in total, it can't display the progress.
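You can see the difference with plain Python (nothing Prodigy-specific here):

```python
# A generator stream has no length, so progress can't be computed up front
stream = (text for text in ["First example.", "Second example."])
try:
    len(stream)
except TypeError:
    print("generators don't support len()")

# Converting to a list materializes all items and exposes __len__,
# at the cost of loading everything into memory up front
stream = list(stream)
assert len(stream) == 2
```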

A simple thing you could do is edit the recipe in recipes/ner.py and find the following line:

stream = get_stream(source, api=api, loader=loader, rehash=True,
                        dedup=True, input_key='text')

... and replace it with this:

stream = get_stream(source, api=api, loader=loader, rehash=True,
                        dedup=True, input_key='text')
stream = list(stream)

To find the source of your Prodigy installation, you can run the following:

python -c "import prodigy; print(prodigy.__file__)"

Do you mean the software license? The Prodigy library doesn't connect to the internet or otherwise "phone home", so you don't need to enter the license key when you use the software. However, you should keep it safe for future reference.

Many thanks @ines for your reply! I've modified ner.py following your instructions:

But without success. For example, in the ner.manual recipe I added stream = list(stream) as follows:

@recipe('ner.manual',
        dataset=recipe_args['dataset'],
        spacy_model=recipe_args['spacy_model'],
        source=recipe_args['source'],
        api=recipe_args['api'],
        loader=recipe_args['loader'],
        label=recipe_args['label_set'],
        exclude=recipe_args['exclude'])
def manual(dataset, spacy_model, source=None, api=None, loader=None,
           label=None, exclude=None):
    """
    Mark spans by token. Requires only a tokenizer and no entity recognizer,
    and doesn't do any active learning.
    """
    log("RECIPE: Starting recipe ner.manual", locals())
    nlp = spacy.load(spacy_model)
    log("RECIPE: Loaded model {}".format(spacy_model))
    # Get the label set from the `label` argument, which is either a
    # comma-separated list or a path to a text file. If labels is None, check
    # if labels are present in the model.
    labels = label
    if not labels:
        labels = get_labels_from_ner(nlp)
        print("Using {} labels from model: {}"
              .format(len(labels), ', '.join(labels)))
    log("RECIPE: Annotating with {} labels".format(len(labels)), labels)
    stream = get_stream(source, api=api, loader=loader, rehash=True,
                        dedup=True, input_key='text')

    stream = list(stream)

I'd be happy to know if there's something wrong in this edited ner.py, and whether I also need to modify the interface to make it work. :sunny:

Another thing that IMHO would be useful to add to the ner.manual interface is the total number of tasks alongside the progress, like "3% of 150".

Again, thanks for your support.

All the best.

C.

Hi @ines, if you have any ideas or suggestions, or just a pointer to the relevant documentation, please let me know.

All the best

C.

Sorry, I think I told you to add this in the wrong position: the idea is that the recipe needs to return a stream that exposes a __len__ (i.e. has a length, which is the case for regular lists, but not for generators). So try converting the stream as late as possible in the recipe:

stream = add_tokens(nlp, stream)
stream = list(stream)
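The "as late as possible" part matters because wrappers like add_tokens are themselves generators: converting the stream to a list before the last wrapper leaves you with a generator again at the end. A sketch with a stand-in wrapper (add_tokens_stub is made up here to mimic the lazy behaviour, it's not Prodigy's function):

```python
def add_tokens_stub(stream):
    """Stand-in for a lazy generator wrapper like Prodigy's add_tokens:
    whatever it wraps, its own output has no __len__ either."""
    for eg in stream:
        yield dict(eg, tokens=eg["text"].split())

stream = (eg for eg in [{"text": "hello world"}, {"text": "good bye"}])
stream = add_tokens_stub(stream)  # still a generator: no __len__ yet
stream = list(stream)             # materialize last, after all wrappers
assert len(stream) == 2
assert stream[0]["tokens"] == ["hello", "world"]
```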

Thanks a lot @ines! It works like a charm: https://imgur.com/HBL73Tu

I really appreciate your support.

All the best

C.
