Hi, I am using a custom ner.manual recipe (nothing fancy) to annotate a small dataset (just four examples for the sake of trying out the recipe). All seems to run fine and I click save and close the webpage when completing the annotation. After a few minutes I type Ctrl+C in my terminal and get an asyncio error: asyncio.exceptions.CancelledError
Nothing is saved to my sqlite dataset.
Is there something in my recipe that makes the syncing with the database very slow? Am I misunderstanding something really basic when it comes to how you complete a session?
This is my recipe:
def get_stream(nlp, source, patterns, highlight_chars):
    """Yield hashed annotation tasks read from a JSONL source.

    If *patterns* is given, a PatternMatcher pre-highlights matching spans
    (keeping examples without matches, too). Each task is then tokenized
    for the ner_manual UI, guaranteed to carry a "spans" list, and given
    Prodigy input/task hashes before being yielded.
    """
    stream = JSONL(source)
    if patterns is not None:
        # Combine all pattern matches and keep all examples, matched or not.
        matcher = PatternMatcher(nlp, combine_matches=True, all_examples=True)
        matcher = matcher.from_disk(patterns)
        # The matcher yields (score, example) pairs; we only need the example.
        stream = (example for _, example in matcher(stream))
    stream = add_tokens(nlp, stream, use_chars=highlight_chars)
    for example in stream:
        example.setdefault("spans", [])
        yield prodigy.set_hashes(example)
def get_stream_loop(nlp, source, dataset, patterns, highlight_chars):
    """Loop over the source until every task has been annotated.

    On each pass the source is re-read and any task whose task hash is
    already stored in *dataset* is skipped. Once a full pass sends out
    nothing new, the generator stops, which ends the stream.
    """
    db = connect()
    while True:
        sent_any = False
        # Refresh the hash set each pass so newly saved answers are seen.
        known_hashes = db.get_task_hashes(dataset)
        for task in get_stream(nlp, source, patterns, highlight_chars):
            if task["_task_hash"] in known_hashes:
                continue  # already annotated on an earlier pass
            sent_any = True
            yield task
        if not sent_any:
            break  # nothing left to send out
# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "custom_ner_manual",
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    patterns=("The match patterns file", "option", "p", str),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
    highlight_chars=("Allow for highlighting individual characters instead of tokens", "flag", "C", bool),
)
def custom_ner_manual(
    dataset: str,
    spacy_model: str,
    source: str,
    label: Optional[List[str]] = None,
    patterns: Optional[str] = None,
    exclude: Optional[List[str]] = None,
    highlight_chars: bool = False,
):
    """Manual NER annotation with optional pattern suggestions and
    optional character-level highlighting.

    Returns the components dict consumed by the Prodigy server.
    """
    import spacy  # local import so this block works even if the header lacks it

    # BUG FIX: `nlp` was referenced below but never defined, which raises
    # NameError as soon as the stream is consumed. Load the base model
    # from the spacy_model argument first.
    nlp = spacy.load(spacy_model)
    # Looping stream: keeps re-reading the source, skipping tasks whose
    # hashes are already saved, until nothing new is left to send out.
    stream = get_stream_loop(nlp, source, dataset, patterns, highlight_chars)
    return {
        "view_id": "ner_manual",  # Annotation interface to use
        "dataset": dataset,       # Name of dataset to save annotations
        "stream": stream,
        "exclude": exclude,       # List of dataset names to exclude
        # Remove token information to permit highlighting individual characters
        "before_db": remove_tokens if highlight_chars else None,
        "config": {               # Additional config settings, mostly for app UI
            "lang": nlp.lang,
            "labels": label,      # Selectable label options
        },
    }  # BUG FIX: the return dict/function were never closed in the original
In advance, thanks for all suggested solutions!