I am new to Prodigy and am creating an annotation task with three labels for the annotator to select from, using the command below to create the dataset and start annotating. Note that I specifically want to use the mark recipe:
prodigy mark dataset_name /path/to/data.json --view-id classification --label "AGE,DATE,TYPE" --memorize
Unfortunately, when I try to annotate, the labels are not shown as three individual labels but as one long label, as shown below:
Hi! The problem here is that the mark recipe will stream in exactly what you load in and render it in a given interface. The classification interface shows one label at the top and the content – so in this case, it'll use the value you passed in for "label", which is the string "AGE,DATE,TYPE".
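To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is not Prodigy's source code: attach_label is a hypothetical stand-in for a recipe that passes the label through untouched, and the example text is made up.

```python
# Hypothetical stand-in for a pass-through recipe; NOT Prodigy's actual code.
def attach_label(stream, label):
    """Copy the label onto each task exactly as given, without splitting it."""
    for example in stream:
        task = dict(example)   # avoid mutating the incoming example
        task["label"] = label  # the value is used verbatim
        yield task


raw_label = "AGE,DATE,TYPE"
tasks = list(attach_label([{"text": "Born in 1984."}], raw_label))

# The task carries the whole comma-separated string as a single label.
print(tasks[0]["label"])

# A multi-label interface would need the string split into separate labels.
print(raw_label.split(","))
```

The point is only that nothing in a pass-through setup splits the string on commas, so a single-label interface ends up with one long label.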
I'm not sure why you specifically want to use the mark recipe – what you describe sounds more like a classic task for textcat.manual (assuming you want to assign top-level categories to a text) or ner.manual (assuming you want to highlight spans in a text)?
You can see how ner.manual is implemented here:
The main differences are that the incoming data needs to be tokenized and have a "tokens" property, the interface to use should be ner_manual (multiple selectable labels at the top, highlightable spans in the text), and the "config" of the recipe should return the full label set.
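As a rough illustration of those three differences, the sketch below builds a tokenized task and the matching recipe settings in plain Python. The whitespace tokenizer, the helper name add_whitespace_tokens and the example text are assumptions for illustration only (the real recipe tokenizes with a spaCy model), and the token fields shown ("text", "start", "end", "id") are my understanding of the format rather than something stated in this thread.

```python
import re

LABELS = ["AGE", "DATE", "TYPE"]


def add_whitespace_tokens(example):
    """Return a copy of the example with a "tokens" list added.

    Hypothetical helper: splits on whitespace instead of using a spaCy model.
    """
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", example["text"])):
        tokens.append(
            {
                "text": match.group(),
                "start": match.start(),  # character offset where the token begins
                "end": match.end(),      # character offset just past the token
                "id": i,                 # position of the token in the text
            }
        )
    task = dict(example)
    task["tokens"] = tokens
    return task


task = add_whitespace_tokens({"text": "Aged 42 in 2019"})

# The parts of a recipe's return value that differ from the mark setup:
# the interface name and the full label set in the config.
recipe_components = {"view_id": "ner_manual", "config": {"labels": LABELS}}

print(task["tokens"])
print(recipe_components)
```

In a real custom recipe these settings would sit alongside the dataset name and the stream of tokenized tasks.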
Thank you for your quick response. The reason for using mark is that we were previously using ner.teach. However, ner.teach only shows you the most relevant tasks, so out of the 100 examples, the “most relevant” selection seems to be only about 10-20%. I want to annotate all the documents, so following another post I decided to use mark. Would ner.manual allow us to annotate all the uploaded documents?
Yes, ner.manual will show you every example as it comes in and let you annotate manually. It doesn't do any active learning or example selection – it's fully manual. The only thing it uses the spaCy model for is tokenization, so you can highlight faster.
(Btw, if you check out the recipe docs in your PRODIGY_README.html, it should also give you more details on the built-in recipes, what they need, what they do under the hood and whether they use active learning or not.)
I tried to create a dataset with ner.manual but it's failing. The file I am uploading is called data.json, but the error says it cannot read meta.json. I am very new to this tool and I am not sure what it is referring to when it's looking for a meta.json file?
You likely forgot to pass in an argument, or set the arguments in the wrong order on the command line. ner.manual expects the dataset name first, then the spaCy model, then the data file. So Prodigy thinks that your data file is the spaCy model and spaCy complains that it can't load your model, because it's not a model (meta.json is a file spaCy looks for inside a model directory). (If there's an error, you can usually scroll up and see what caused it – whether it's Prodigy, spaCy or something else.)