Disable active-learning component ner_manual

daniyalSelani · November 22, 2019, 12:43pm

Hi,
We are using Prodigy mostly as an annotation and vetting tool. I have a custom recipe that brings our bespoke model into the loop. However, Prodigy sometimes shows "No tasks available" when the custom recipe is deployed. I think this could be because of the active learning in the ner_manual interface, which I am using in the custom recipe. Is there any way to disable the active learning, or is there an interface that lets the user vet and annotate the text without that feature?
I am using ner_manual because it lets me add labels from the bespoke model to the text while also allowing the user to annotate terms on the same text. I tried setting "view_id" to 'mark', but that interface does not exist.

My recipe:

import prodigy
import spacy
from prodigy.components.loaders import JSON
from prodigy.components.preprocess import add_tokens

@prodigy.recipe('custom.ner.match',
	dataset=("The dataset to use", "positional", None, str),
	source=("The source data as a JSON file", "positional", None, str),
)
def custom_ner_match(dataset, source):
	stream = JSON(source)
	stream = get_stream(stream)
	nlp = spacy.load('en_core_web_sm')
	stream = add_tokens(nlp, stream)

	return {
		'view_id': 'ner_manual',  # Annotation interface to use
		'dataset': dataset,       # Name of dataset to save annotations
		'stream': stream,         # Incoming stream of examples
		'config': {
			# label1, label2, label3 stand in for the actual label strings
			'labels': [label1, label2, label3]
		}
	}


def get_stream(stream):
	# lcs is the module providing the bespoke model; its import is not shown in this post
	model = lcs.LungScreen()
	for eg in stream:
		spans_from_model = get_spans_from_your_model(eg["text"], model)
		spans = []
		for start_char, end_char, label in spans_from_model:
			# Each item is a (start offset, end offset, label) triple.
			# All spans for a text are collected on a single task.

			# When a 0,0 span is returned, specify the start and end token
			# explicitly to avoid a mismatched-token error.
			# NOTE: remember to ignore tokens for this span when ingesting
			# the database file after annotations are done.
			if start_char == 0 and end_char == 0:
				spans.append({"start": start_char, "end": end_char, "label": label, "token_start": 0, "token_end": 0})
			else:
				spans.append({"start": start_char, "end": end_char, "label": label})
		yield {"text": eg["text"], "spans": spans}

def get_spans_from_your_model(text, model):
	result = model.run(text)
	# Keep only the keys for which the model returned a value
	labels = [str(k) + ':' + str(v) for k, v in result.items() if v is not None]

	if not labels:
		return [[0, 0, "NO ENTITY RETURNED"]]
	# The model gives no character offsets, so every label becomes a 0,0 span
	return [[0, 0, label] for label in labels]

Hoping to hear from you soon!
Thanks!

ines · November 24, 2019, 2:41pm

Hi! I think there might be some confusion here around the active learning features: The manual recipes do not use any active learning – and if you implement features like sorting or pre-selection via a model, all of this happens in your recipe code. In your recipe, you're only using the model to pre-highlight spans and are not actually updating the model in the loop. And you're also not filtering the results. So there's no active learning happening here.

Prodigy shows "No tasks available." if your stream doesn't yield any examples. This can happen for different reasons, and I don't know what your data looks like. For example, one explanation could be that all examples are already annotated in your dataset. Or, if you refresh a bunch of times and request new batches from the stream and don't answer them, they'll only be re-queued if you restart the server, or if you make your stream "infinite" (see here for an example) and re-queue unanswered questions until they're in the database.
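To illustrate the "infinite" stream idea, here is a minimal sketch in plain Python. It is not Prodigy's own implementation: `task_hash` is a stand-in for Prodigy's task hashes, and `get_answered_hashes` is a hypothetical callable that you would back with a lookup of the hashes already saved in your dataset.

```python
import hashlib


def task_hash(eg):
    # Stand-in for a real task hash: a stable hash of the example text.
    return hashlib.md5(eg["text"].encode("utf8")).hexdigest()


def infinite_stream(examples, get_answered_hashes):
    """Keep looping over the examples until all of them have been answered.

    examples: a list of task dicts, each with a "text" key.
    get_answered_hashes: hypothetical callable returning the set of hashes
        that are already stored in the dataset.
    """
    while True:
        sent_any = False
        for eg in examples:
            # Skip anything that has made it into the database already.
            if task_hash(eg) not in get_answered_hashes():
                sent_any = True
                yield eg
        # A full pass without sending anything means everything is answered.
        if not sent_any:
            break
```

The stream only ends once a full pass finds nothing left to send, so tasks that were requested but never answered come back on the next pass instead of disappearing until a restart. The examples have to be held in a list (not a one-shot generator) so they can be iterated more than once.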

Sometimes it can help to add some print statements to your code to log what's going on.
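For example, a small pass-through generator can report what the stream actually yields. This is only a sketch; `log_stream` is a hypothetical helper name, not part of Prodigy.

```python
import sys


def log_stream(stream, name="stream"):
    """Pass examples through unchanged while printing what goes by."""
    count = 0
    for eg in stream:
        count += 1
        text = eg.get("text", "")
        n_spans = len(eg.get("spans", []))
        # Write to stderr so the messages show up in the server's terminal.
        print(f"[{name}] task {count}: {text[:50]!r} spans={n_spans}", file=sys.stderr)
        yield eg
    # Reaching this line means the stream has nothing more to send.
    print(f"[{name}] exhausted after {count} task(s)", file=sys.stderr)
```

In the recipe above you could wrap each step, e.g. `stream = log_stream(JSON(source), "loaded")` and `stream = log_stream(get_stream(stream), "with spans")`, to see at which stage the examples run out.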

The view_id returned by your recipe is the UI used to render the content – you can find an overview of the available options in the Readme.

daniyalSelani · November 26, 2019, 1:03pm

Thanks for the detailed and helpful response as always. The problem got solved.
