Hi,
We are using Prodigy mostly as an annotation and vetting tool. I have a custom recipe that puts our bespoke model in the loop. However, Prodigy sometimes shows "No tasks available" when the custom recipe is deployed. I think this could be because of the active learning in the ner_manual interface, which I am using in the custom recipe. Is there any way to disable the active learning, or is there an interface that lets the user vet and annotate the text without that feature?
I am using ner_manual because it lets me add labels from the bespoke model to the text while also letting the user annotate terms on the same text. I tried setting "view_id" to 'mark', but that interface does not exist.
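To make the pre-labelling concrete, this is roughly the task shape I am aiming for, as far as I understand the ner_manual format: the text, its tokens with character offsets, and spans that carry both character and token indices. It is only a sketch. `make_task` is a hypothetical helper written for this post (not part of Prodigy), the sentence and the PRODUCT label are made up, and I am assuming `token_end` is the inclusive index of the last token in the span.

```python
def make_task(text, tokens, spans):
    """Build one task dict with pre-set spans (hypothetical helper)."""
    task_tokens = []
    offset = 0
    for i, tok in enumerate(tokens):
        # Locate each token in the text to recover its character offsets.
        start = text.index(tok, offset)
        end = start + len(tok)
        task_tokens.append({"text": tok, "start": start, "end": end, "id": i})
        offset = end
    task_spans = []
    for token_start, token_end, label in spans:
        task_spans.append({
            "start": task_tokens[token_start]["start"],
            "end": task_tokens[token_end]["end"],
            "token_start": token_start,
            "token_end": token_end,  # assumed inclusive
            "label": label,
        })
    return {"text": text, "tokens": task_tokens, "spans": task_spans}


text = "First look at the new MacBook Pro"
task = make_task(text, text.split(" "), [(5, 6, "PRODUCT")])
span = task["spans"][0]
print(text[span["start"]:span["end"]])
```

The point is that character offsets and token indices have to agree, which is why I set the token indices by hand for the placeholder span in the recipe.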
My recipe:
import prodigy
import spacy
from prodigy.components.loaders import JSON, JSONL
from prodigy.components.preprocess import add_tokens
# NOTE: the lcs module that provides our bespoke LungScreen model is imported here as well.

@prodigy.recipe(
    'custom.ner.match',
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a JSON file", "positional", None, str),
)
def custom_ner_match(dataset, source):
    stream = JSON(source)
    stream = get_stream(stream)
    nlp = spacy.load('en_core_web_sm')
    stream = add_tokens(nlp, stream)
    return {
        'view_id': 'ner_manual',  # Annotation interface to use
        'dataset': dataset,       # Name of dataset to save annotations
        'stream': stream,         # Incoming stream of examples
        'config': {
            'labels': [label1, label2, label3]
        }
    }

def get_stream(stream):
    model = lcs.LungScreen()
    for eg in stream:
        spans_from_model = get_spans_from_your_model(eg["text"], model)
        spans = []
        for start_char, end_char, label in spans_from_model:
            # The model function returns the start offset, end offset and
            # label of each span. All spans for a text are collected on a
            # single task.
            # When a 0,0 span is returned, specify the start and end token
            # explicitly to avoid the mismatched token error.
            # NOTE: remember to ignore tokens for this span when ingesting
            # the database file after annotations are done.
            if start_char == 0 and end_char == 0:
                spans.append({"start": start_char, "end": end_char, "label": label,
                              "token_start": 0, "token_end": 0})
            else:
                spans.append({"start": start_char, "end": end_char, "label": label})
        yield {"text": eg["text"], "spans": spans}
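
def get_spans_from_your_model(text, model):
    result = model.run(text)
    # Keep every key whose value is set, formatted as "key:value".
    labels = []
    for k, v in result.items():
        if v is not None:
            labels.append(str(k) + ':' + str(v))
    if not labels:
        return [[0, 0, "NO ENTITY RETURNED"]]
    # The model gives no offsets, so every label is attached as a 0,0 span.
    output = [[0, 0, i] for i in labels]
    return output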
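
For the NOTE in get_stream about ignoring the 0,0 spans at ingestion time, this is a minimal sketch of what I have in mind. It assumes the annotations have already been exported as a list of task dicts with a "spans" key. `drop_placeholder_spans` is a hypothetical helper of my own, not a Prodigy function, and the example data is made up.

```python
def drop_placeholder_spans(examples):
    """Return copies of the tasks without the zero-length 0,0 spans."""
    cleaned = []
    for eg in examples:
        spans = [
            s for s in eg.get("spans", [])
            if not (s["start"] == 0 and s["end"] == 0)
        ]
        # Copy the task so the exported data is left untouched.
        cleaned.append({**eg, "spans": spans})
    return cleaned


examples = [
    {
        "text": "Nodule in left lung",
        "spans": [
            {"start": 0, "end": 0, "label": "NO ENTITY RETURNED",
             "token_start": 0, "token_end": 0},
            {"start": 0, "end": 6, "label": "label1"},
        ],
    },
    {"text": "No findings"},
]
cleaned = drop_placeholder_spans(examples)
print(cleaned)
```

Only the spans the annotator actually drew on the text should survive this step, while the model's placeholder labels are dropped.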
Hoping to hear from you soon!
Thanks!