Showing "No tasks available" even though the data is not yet completely annotated


I am not getting all of the sentences that I hosted on the Prodigy server. It frequently shows "No tasks available" even though the data is not yet finished.

For example: if I host a dataset of 800 sentences, after some time (around 100 or 150 annotations) the Prodigy UI shows "No tasks available".

Can you tell me why this is happening?

Hi! Which recipe are you running? If you're running an active learning-powered recipe like ner.teach or textcat.teach, this is expected – remember that you're annotating the most relevant examples here, so Prodigy will score them and skip some examples in favour of others. In ner.teach, you'll also be seeing lots of different analyses of individual examples.
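To make the skipping behaviour concrete, here's a minimal pure-Python sketch of uncertainty-based filtering. This is a hypothetical illustration, not Prodigy's actual `prefer_uncertain` sorter (which uses an exponential moving average of scores rather than a fixed threshold), but it shows why fewer tasks reach the UI than there are lines in the file:

```python
def prefer_uncertain_sketch(scored_stream, threshold=0.2):
    """Yield only examples whose score is close to 0.5 (most uncertain).

    Simplified stand-in for an active learning sorter: examples the
    model is already confident about are silently skipped, so the
    annotator never sees them.
    """
    for score, example in scored_stream:
        if abs(score - 0.5) <= threshold:
            yield example


scored = [(0.95, 'a'), (0.55, 'b'), (0.1, 'c'), (0.48, 'd')]
kept = list(prefer_uncertain_sketch(scored))
# Only 'b' and 'd' are uncertain enough to be shown
```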

If your goal is to just annotate every single example in your stream, you probably want to be using a manual recipe. You can find more details in this thread:

I am running the command below to start the process:
python3.5 -m prodigy ner.make-silver testingof22kto27kdatasm en_core_web_md testingof22kto27kdatasm.jsonl --label Techskill,Softskill,Duration,EducationUniversity,EducationDegree,EducationSubject,Location,Title,ExperienceTitle,DescriptiveTechskills,Descriptivesoftskills,Degreeprovider --patterns skill_patt.jsonl -F

We are using ner.make-silver. Can you give a solution to fix this issue?

How does your ner.make-silver recipe look and what does it do?

Also, what’s in your testingof22kto27kdatasm dataset already? By default, Prodigy won’t show you examples that have already been annotated in the current dataset.

testingof22kto27kdatasm is the JSONL file which has the sentences for NER annotation

Ah, I meant the first argument here, which I assume is the name of the dataset the annotations are saved to?

Also, since it's a custom recipe, it'd be great if you could share the recipe script or at least more details on what it does and/or what it's based on.

Yes, testingof22kto27kdatasm is the name of the database to store our annotations, and my recipe is:

import copy

import prodigy
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

from prodigy.components.preprocess import split_sentences, add_tokens
from prodigy.components.loaders import get_stream
from prodigy.util import read_jsonl, set_hashes, log, get_labels_from_ner
from prodigy.util import INPUT_HASH_ATTR


# Register the recipe under the name used on the command line, so it can
# be loaded with -F
@prodigy.recipe('ner.make-silver')
def make_gold(dataset, spacy_model, source=None, api=None, loader=None,
              patterns=None, labels=None, exclude=None, unsegmented=False):
    """Create gold data for NER by correcting a model's suggestions."""
    #log("RECIPE: Starting recipe ner.make-silver", locals())
    nlp = spacy.load(spacy_model)
    #log("RECIPE: Loaded model {}".format(spacy_model))

    # Group the match patterns by label and add them to the matcher
    patterns_by_label = {}
    for entry in read_jsonl(patterns):
        patterns_by_label.setdefault(entry['label'], []).append(entry['pattern'])
    matcher = Matcher(nlp.vocab)
    for pattern_label, label_patterns in patterns_by_label.items():
        matcher.add(pattern_label, None, *label_patterns)

    # Get the label set from the `label` argument, which is either a
    # comma-separated list or a path to a text file. If labels is None, check
    # if labels are present in the model.
    if labels is None:
        labels = set(get_labels_from_ner(nlp) + list(patterns_by_label.keys()))
        print("Using {} labels from model: {}"
              .format(len(labels), ', '.join(labels)))
    log("RECIPE: Annotating with {} labels".format(len(labels)), labels)
    stream = get_stream(source, api=api, loader=loader, rehash=True,
                        dedup=True, input_key='text')
    # Split the stream into sentences
    if not unsegmented:
        stream = split_sentences(nlp, stream)
    # Tokenize the stream
    stream = add_tokens(nlp, stream)

    def make_tasks():
        """Add a 'spans' key to each example, with predicted entities."""
        texts = ((eg['text'], eg) for eg in stream)
        for doc, eg in nlp.pipe(texts, as_tuples=True):
            task = copy.deepcopy(eg)
            spans = []
            matches = matcher(doc)
            pattern_matches = tuple(Span(doc, start, end, label)
                                    for label, start, end in matches)
            for ent in doc.ents + pattern_matches:
                if labels and ent.label_ not in labels:
                    continue
                spans.append({
                    'token_start': ent.start,
                    'token_end': ent.end - 1,
                    'start': ent.start_char,
                    'end': ent.end_char,
                    'text': ent.text,
                    'label': ent.label_,
                    'source': spacy_model,
                    'input_hash': eg[INPUT_HASH_ATTR]
                })
            task['spans'] = spans
            task = set_hashes(task)
            yield task

    return {
        'view_id': 'ner_manual',
        'dataset': dataset,
        'stream': make_tasks(),
        'exclude': exclude,
        'update': None,
        'config': {'lang': nlp.lang, 'labels': labels}
    }
Okay, but what's saved in that dataset already? (You can check this by using the db-out command). If you already have annotations in there, Prodigy will skip incoming examples if they're the same tasks.
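To illustrate why a non-empty dataset makes incoming tasks disappear, here's a simplified pure-Python sketch of dedup-by-hash filtering. The `input_hash` helper is hypothetical and only stands in for Prodigy's actual `set_hashes` mechanism, but the effect is the same: anything whose input already exists in the dataset never reaches the annotator.

```python
import hashlib


def input_hash(example):
    """Hypothetical stand-in for Prodigy's input hashing: hash the text."""
    return hashlib.md5(example['text'].encode('utf8')).hexdigest()


def filter_seen(stream, annotated_examples):
    """Skip incoming examples whose input hash is already in the dataset."""
    seen = {input_hash(eg) for eg in annotated_examples}
    for eg in stream:
        if input_hash(eg) not in seen:
            yield eg


dataset = [{'text': 'Python developer needed'}]          # already annotated
stream = [{'text': 'Python developer needed'},           # duplicate: skipped
          {'text': 'Java engineer wanted'}]              # new: shown
remaining = list(filter_seen(stream, dataset))
# Only the unseen example remains in the queue
```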

We create a new database for every dataset. I'm sure that when we host a dataset, the database is empty.

This didn't work from the CLI. I am still getting the same error.

I use textcat.manual.

I have a lot of datasets, but I still get "No tasks available".

How are you running the command and what version of Prodigy are you using? And did you double-check that the examples you're loading in aren't yet annotated in the dataset? If you already have annotations for them, Prodigy will skip them, so you'll only see the examples that aren't yet saved in the dataset in the database.