Is it possible to do both NER and text classification annotation at once? Or should I annotate the data twice, first using ner.manual and then using textcat.manual? I have a text dataset that will be used to develop both an NER model and a text classification model.
To make the annotation task as efficient as possible, we recommend doing your annotation in two passes: NER and textcat separately.
Prodigy's train and data-to-spacy commands will take care of merging the annotations for the purpose of training a spaCy pipeline with these two components.
Annotation is easier and less error-prone if annotators can focus on one task at a time.
That said, it is definitely possible to set up Prodigy to collect NER and textcat annotations at the same time via a custom recipe. You can combine different UIs using blocks: as long as your input file contains all the necessary info to fill in the ner_manual and classification view_ids, Prodigy will be able to render such a block.
See here for more info and examples.
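As a rough illustration of what "all the necessary info" means for a combined block: the choice interface reads its options from an "options" key on each task, so every example in the stream needs that key in addition to the text (and the tokens, for ner_manual). The sketch below is plain Python and does not call Prodigy; the helper name add_options and the sample text are made up for illustration.

```python
def add_options(stream, options):
    """Yield a copy of each task with the choice options attached.

    Hypothetical helper: the name and the sample data below are
    assumptions for illustration, not part of Prodigy's API.
    """
    for task in stream:
        task = dict(task)          # shallow copy so the input is left untouched
        task["options"] = options  # the choice UI looks for this key on the task
        yield task


options = [
    {"id": 3, "text": "General Comments"},
    {"id": 2, "text": "Detect Localization"},
    {"id": 1, "text": "Detect Suggestion"},
    {"id": 0, "text": "Detect Problem"},
]

# A made-up example standing in for one line of the input JSONL file.
examples = [{"text": "The method section needs more detail."}]

tasks = list(add_options(examples, options))
print(tasks[0]["options"][0]["text"])
```

In a custom recipe, a generator like this would be applied to the stream before it is returned, alongside the tokenization step that ner_manual needs.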
import prodigy
from prodigy.components.preprocess import add_tokens
from prodigy.components.stream import get_stream
import spacy


@prodigy.recipe("peer_review")
def peer_review_ner_cat(dataset, lang, file_path):
    # We can use the blocks to override certain config and content, and set
    # "text": None for the choice interface so it doesn't also render the text
    blocks = [
        {"view_id": "ner_manual"},
        {"view_id": "choice", "text": None}
    ]
    options = [
        {"id": 3, "text": "General Comments"},
        {"id": 2, "text": "Detect Localization"},
        {"id": 1, "text": "Detect Suggestion"},
        {"id": 0, "text": "Detect Problem"}
    ]
    nlp = spacy.blank(lang)  # blank spaCy pipeline for tokenization
    stream = get_stream(file_path, loader="jsonl")  # set up the stream
    stream.apply(add_tokens, nlp=nlp, stream=stream)  # tokenize the stream for ner_manual
    return {
        "dataset": dataset,  # the dataset to save annotations to
        "view_id": "blocks",  # set the view_id to "blocks"
        "stream": stream,  # the stream of incoming examples
        "config": {
            "choice_style": "multiple",
            "labels": ["Problem", "Suggestion", "Localization", "General"],  # the labels for the manual NER interface
            "blocks": blocks  # add the blocks to the config
        }
    }
But when I run Prodigy with this recipe using this command: prodigy peer_review ner_cat_peer_review id dataset/peer-review-masdig.jsonl -F prodigy/recipe.py
an error appears in the web interface, as shown in this image: