Multi-labels not working

rose · August 23, 2019, 2:39pm

I am new to Prodigy and am setting up an annotation task with three labels for the annotator to choose from. I use the command below to create the dataset and start annotating. Note that I specifically want to use the mark recipe:

prodigy mark dataset_name /path/to/data.json --view-id classification --label "AGE,DATE,TYPE" --memorize

Unfortunately, when I try to annotate, the labels are not shown as three individual labels but as one long label.

I tried several variations of the label argument, but none of the following worked:

--label "AGE, DATE ,TYPE"
--label AGE,DATE,TYPE
--label AGE, DATE, TYPE
-l "AGE, DATE ,TYPE"
-l "AGE,DATE,TYPE"
-l AGE,DATE,TYPE
-l AGE, DATE, TYPE

Does anyone know the possible cause of the issue?

ines · August 23, 2019, 3:12pm

Hi! The problem here is that the mark recipe will stream in exactly what you load in and render it in a given interface. The classification interface shows one label at the top and the content – so in this case, it'll use the value you passed in for "label", which is the string "AGE,DATE,TYPE".
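To make the difference concrete, here is a minimal sketch in plain Python. It does not import Prodigy: `split_labels` is a hypothetical stand-in for a converter like `split_string`, and the task dictionary and its example text are illustrative assumptions rather than what mark produces internally.

```python
def split_labels(value):
    """Split a comma-separated string into stripped, non-empty labels.

    Hypothetical helper for illustration; it only approximates what a
    converter such as prodigy.util.split_string is meant to do.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


raw = "AGE,DATE,TYPE"

# Without a converter, the value stays one string, so an interface that
# renders a single "label" shows it as one long label.
single_label_task = {"text": "She was 42 in 2019.", "label": raw}

# With a converter, the same value becomes three separate labels.
labels = split_labels(raw)

print(single_label_task["label"])
print(labels)
```

Quoting or spacing the value differently on the command line does not change this: the value is still a single string unless the recipe splits it.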

I'm not sure why you specifically want to use the mark recipe – what you describe sounds more like a classic task for textcat.manual (assuming you want to assign top-level categories to a text) or ner.manual (assuming you want to highlight spans in a text)?

You can see how ner.manual is implemented here:

github.com

explosion/prodigy-recipes/blob/master/ner/ner_manual.py

# coding: utf8
from __future__ import unicode_literals

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens
from prodigy.util import split_string
import spacy


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe('ner.manual',
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    exclude=("Names of datasets to exclude", "option", "e", split_string)
)

(The excerpt is truncated here; the full recipe is in the file linked above.)

The main differences are: the incoming data needs to be tokenized and have a "tokens" property; the interface to use should be ner_manual (multiple selectable labels at the top, highlightable spans in the text); and the "config" returned by the recipe should contain the full label set.
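The three differences can be sketched roughly as follows. This is plain Python with no Prodigy or spaCy import, and it rests on several assumptions: `whitespace_tokens` is a hypothetical stand-in for model-based tokenization (the real recipe uses `add_tokens` with a spaCy model), the token keys (`text`, `start`, `end`, `id`) reflect my understanding of the expected format, and `make_components` is an illustrative name, not part of the library.

```python
def whitespace_tokens(text):
    """Tokenize on whitespace and record character offsets for each token.

    Hypothetical stand-in for spaCy tokenization, for illustration only.
    """
    tokens = []
    offset = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, offset)
        end = start + len(word)
        tokens.append({"text": word, "start": start, "end": end, "id": i})
        offset = end
    return tokens


def make_components(dataset, texts, labels):
    """Build the kind of dictionary a manual NER recipe might return."""
    # 1. Every incoming example carries a "tokens" property.
    stream = ({"text": t, "tokens": whitespace_tokens(t)} for t in texts)
    return {
        "dataset": dataset,
        "view_id": "ner_manual",       # 2. The interface with selectable labels
        "stream": stream,
        "config": {"labels": labels},  # 3. The full label set, as a list
    }


components = make_components(
    "your_dataset", ["Aged 42 in 2019"], ["AGE", "DATE", "TYPE"]
)
first_task = next(components["stream"])
print(first_task["tokens"])
```

The point of the sketch is the shape of the data: labels travel in the config as a list, not on each task as a single string.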

rose · August 23, 2019, 3:19pm

Hello,

Thank you for your quick response. The reason for using mark is that we were previously using ner.teach. However, ner.teach only shows you the most relevant tasks, so out of the 100 examples, the “most relevant” selection seems to be only about 10-20%. I want to annotate all the documents, so following another post, I decided to use mark. Would ner.manual allow us to annotate all the documents we upload?

ines · August 23, 2019, 3:23pm

Yes, ner.manual will show you every example as it comes in and let you annotate manually. It doesn't do any active learning or example selection – it's fully manual. The only thing it uses the spaCy model for is tokenization, so you can highlight faster.

(Btw, if you check out the recipe docs in your PRODIGY_README.html, it should also give you more details on the built-in recipes, what they need, what they do under the hood and whether they use active learning or not.)

rose · August 23, 2019, 3:29pm

Thank you, this is great info.

I tried to create a dataset with ner.manual, but it's failing. The file I am uploading is called data.json, but the error says it cannot read meta.json. I am very new to this tool and am not sure what it is referring to when it looks for a meta.json file.

ines · August 23, 2019, 3:33pm

You likely forgot to pass in an argument, or set the arguments in the wrong order on the command line. So Prodigy thinks that your data file is the spaCy model and spaCy complains that it can't load your model, because it's not a model. (If there's an error, you can usually scroll up and see what caused it – whether it's Prodigy, spaCy or something else.)

Your command should look something like this:

prodigy ner.manual your_dataset en_core_web_sm data.jsonl --label AGE,DATE,TYPE
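As a rough illustration of why a missing argument produces a spaCy error, the sketch below binds command-line values to the recipe's positional arguments in order. It is plain Python, not how Prodigy parses arguments internally; `bind_positionals` is a hypothetical helper, and the argument names are taken from the ner.manual recipe excerpt earlier in the thread.

```python
# Positional arguments of ner.manual, in the order the recipe declares them.
RECIPE_POSITIONALS = ("dataset", "spacy_model", "source")


def bind_positionals(args):
    """Pair each command-line value with the next positional argument name.

    Hypothetical helper for illustration only.
    """
    return dict(zip(RECIPE_POSITIONALS, args))


# Model name left out: the data file slides into the spacy_model slot.
missing_model = bind_positionals(["your_dataset", "data.json"])

# All three positionals supplied in the expected order.
complete = bind_positionals(["your_dataset", "en_core_web_sm", "data.jsonl"])

print(missing_model)
print(complete)
```

In the first case the data file ends up where the model name is expected, which is consistent with spaCy then trying to load it as a model and failing.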

You can also find some more examples in the docs.

rose · August 23, 2019, 3:33pm

Brilliant, thank you.
