Showing "No tasks available" even though the data is not yet completely annotated

(pradeep) #1

Hi team,

I am not getting all of the sentences that I hosted in the Prodigy server. It frequently shows "No tasks available" even though the data is not yet finished.

For example: if I host a dataset of 800 sentences, after some time (around 100 or 150 examples) the Prodigy UI shows "No tasks available".

Can you tell me why this is happening?


(Ines Montani) #2

Hi! Which recipe are you running? If you’re running an active learning-powered recipe like ner.teach or textcat.teach, this is expected – remember that you’re annotating the most relevant examples here, so Prodigy will score them and skip some examples in favour of others. In ner.teach, you’ll also be seeing lots of different analyses of individual examples.
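To illustrate why an active learning recipe can "run out" of tasks early, here is a rough sketch of uncertainty sampling. This is only a conceptual illustration, not Prodigy's actual implementation (the real `prefer_uncertain` sorter in `prodigy.components.sorters` uses an exponential moving average of scores rather than a hard threshold):

```python
def prefer_uncertain(scored_stream, threshold=0.3):
    """Yield only examples whose score is close to 0.5, i.e. the ones the
    model is most uncertain about. Sketch only -- Prodigy's real sorter
    adapts the cutoff dynamically instead of using a fixed threshold."""
    for score, example in scored_stream:
        if abs(score - 0.5) <= threshold:
            yield example

# Out of many scored examples, only a fraction may be uncertain enough to
# ask about -- the rest are skipped, so the stream appears to "end early".
scored = [(0.1, 'a'), (0.5, 'b'), (0.95, 'c'), (0.6, 'd')]
print(list(prefer_uncertain(scored, threshold=0.2)))  # ['b', 'd']
```

This is why annotating 100–150 examples out of an 800-sentence stream can be the expected behaviour with `ner.teach`-style recipes: the remaining examples were scored but deemed uninformative.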

If your goal is to just annotate every single example in your stream, you probably want to be using a manual recipe. You can find more details in this thread:


Struggling to create a multiple choice image classification
(pradeep) #3

I am running the command below to start the process:

python3.5 -m prodigy ner.make-silver testingof22kto27kdatasm en_core_web_md testingof22kto27kdatasm.jsonl --label Techskill,Softskill,Duration,EducationUniversity,EducationDegree,EducationSubject,Location,Title,ExperienceTitle,DescriptiveTechskills,Descriptivesoftskills,Degreeprovider --patterns skill_patt.jsonl -F recipe.py

We are using ner.make-silver. Can you give us a solution to fix this issue?


(Ines Montani) #4

How does your ner.make-silver recipe in your recipe.py look and what does it do?

Also, what’s in your testingof22kto27kdatasm dataset already? By default, Prodigy won’t show you examples that have already been annotated in the current dataset.


(pradeep) #5

testingof22kto27kdatasm is the JSONL file that contains the sentences for NER annotation.


(Ines Montani) #6

Ah, I meant the first argument here, which I assume is the name of the dataset the annotations are saved to?

Also, since it’s a custom recipe, it’d be great if you could share the recipe script or at least more details on what it does and/or what it’s based on.


(pradeep) #7

Yes, testingof22kto27kdatasm is the name of the dataset where our annotations are stored, and my recipe is:

recipe.py

import copy

import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

import prodigy
from prodigy.components.preprocess import split_sentences, add_tokens
from prodigy.components.loaders import get_stream
from prodigy.core import recipe_args
from prodigy.util import read_jsonl, set_hashes, log, get_labels_from_ner
from prodigy.util import INPUT_HASH_ATTR


@prodigy.recipe('ner.make-silver',
                dataset=recipe_args['dataset'],
                spacy_model=recipe_args['spacy_model'],
                source=recipe_args['source'],
                api=recipe_args['api'],
                loader=recipe_args['loader'],
                patterns=recipe_args['patterns'],
                labels=recipe_args['label_set'],
                exclude=recipe_args['exclude'],
                unsegmented=recipe_args['unsegmented'])
def make_gold(dataset, spacy_model, source=None, api=None, loader=None,
              patterns=None, labels=None, exclude=None, unsegmented=False):
    """
    Create silver data for NER by correcting a model's suggestions,
    combined with matches from a patterns file.
    """
    # log("RECIPE: Starting recipe ner.make-silver", locals())
    nlp = spacy.load(spacy_model)
    # log("RECIPE: Loaded model {}".format(spacy_model))

    # Group the patterns by label and add them to a Matcher
    patterns_by_label = {}
    for entry in read_jsonl(patterns):
        patterns_by_label.setdefault(entry['label'], []).append(entry['pattern'])
    matcher = Matcher(nlp.vocab)
    for pattern_label, label_patterns in patterns_by_label.items():
        matcher.add(pattern_label, None, *label_patterns)

    # Get the label set from the `label` argument, which is either a
    # comma-separated list or a path to a text file. If labels is None, check
    # if labels are present in the model.
    if labels is None:
        labels = set(get_labels_from_ner(nlp) + list(patterns_by_label.keys()))
        print("Using {} labels from model: {}"
              .format(len(labels), ', '.join(labels)))
    log("RECIPE: Annotating with {} labels".format(len(labels)), labels)
    stream = get_stream(source, api=api, loader=loader, rehash=True,
                        dedup=True, input_key='text')
    # Split the stream into sentences
    if not unsegmented:
        stream = split_sentences(nlp, stream)
    # Tokenize the stream
    stream = add_tokens(nlp, stream)

    def make_tasks():
        """Add a 'spans' key to each example, with predicted entities."""
        texts = ((eg['text'], eg) for eg in stream)
        for doc, eg in nlp.pipe(texts, as_tuples=True):
            task = copy.deepcopy(eg)
            spans = []
            matches = matcher(doc)
            pattern_matches = tuple(Span(doc, start, end, label)
                                    for label, start, end in matches)
            for ent in doc.ents + pattern_matches:
                if labels and ent.label_ not in labels:
                    continue
                spans.append({
                    'token_start': ent.start,
                    'token_end': ent.end - 1,
                    'start': ent.start_char,
                    'end': ent.end_char,
                    'text': ent.text,
                    'label': ent.label_,
                    'source': spacy_model,
                    'input_hash': eg[INPUT_HASH_ATTR]
                })
            task['spans'] = spans
            task = set_hashes(task)
            yield task

    return {
        'view_id': 'ner_manual',
        'dataset': dataset,
        'stream': make_tasks(),
        'exclude': exclude,
        'update': None,
        'config': {'lang': nlp.lang, 'labels': labels}
    }

(Ines Montani) #8

Okay, but what’s saved in that dataset already? (You can check this by using the db-out command). If you already have annotations in there, Prodigy will skip incoming examples if they’re the same tasks.
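Conceptually, the filtering works something like the sketch below: each incoming task's raw input is hashed, and tasks whose input hash already exists in the dataset are skipped. This is only an illustration using `hashlib` and made-up helper names (`input_hash`, `filter_seen`); Prodigy's actual hashing via `set_hashes` differs in detail:

```python
import hashlib


def input_hash(task):
    """Hash only the raw input text -- a sketch of the input-hash idea."""
    return hashlib.md5(task['text'].encode('utf8')).hexdigest()


def filter_seen(stream, annotated):
    """Skip incoming tasks whose input was already annotated."""
    seen = {input_hash(task) for task in annotated}
    for task in stream:
        if input_hash(task) not in seen:
            yield task


annotated = [{'text': 'Python developer needed'}]
stream = [{'text': 'Python developer needed'},
          {'text': 'Java engineer wanted'}]
print([t['text'] for t in filter_seen(stream, annotated)])
# ['Java engineer wanted']
```

So if the source file contains duplicate sentences, or the dataset already holds annotations for the same inputs, the stream will surface fewer tasks than the raw sentence count suggests.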


(pradeep) #9

We create a new dataset every time we host data, so I'm sure the dataset is empty when we start hosting.
