Adding a text box to a recipe

klopez · February 3, 2022, 10:11pm

I am trying to add a simple text box to a recipe, specifically the textcat.teach recipe, as I want to annotate/classify some text samples while keeping the model in the loop.

I copied the recipe from here, added a blocks variable to the config, and also added the pipe_name argument to the TextClassifier model (as shown here):

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.models.textcat import TextClassifier
from prodigy.models.matcher import PatternMatcher
from prodigy.components.sorters import prefer_uncertain
from prodigy.util import combine_models, split_string
import spacy
from typing import List, Optional


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "textcat.teach.BOX",
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    patterns=("Optional match patterns", "option", "p", str),
    exclude=("Names of datasets to exclude", "option", "e", split_string),
)
def textcat_teach(
    dataset: str,
    spacy_model: str,
    source: str,
    label: Optional[List[str]] = None,
    patterns: Optional[str] = None,
    exclude: Optional[List[str]] = None,
):
    """
    Collect the best possible training data for a text classification model
    with the model in the loop. Based on your annotations, Prodigy will decide
    which questions to ask next.
    """
    blocks = [
        {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"}
    ]
    # Load the stream from a JSONL file and return a generator that yields a
    # dictionary for each example in the data.
    stream = JSONL(source)

    # Load the spaCy model
    nlp = spacy.load(spacy_model)

    # Initialize Prodigy's text classifier model, which outputs
    # (score, example) tuples
    model = TextClassifier(nlp, label, pipe_name="textcat")

    if patterns is None:
        # No patterns are used, so just use the model to suggest examples
        # and only use the model's update method as the update callback
        predict = model
        update = model.update
    else:
        # Initialize the pattern matcher and load in the JSONL patterns.
        # Set the matcher to not label the highlighted spans, only the text.
        matcher = PatternMatcher(
            nlp,
            prior_correct=5.0,
            prior_incorrect=5.0,
            label_span=False,
            label_task=True,
        )
        matcher = matcher.from_disk(patterns)
        # Combine the text classifier and the matcher and interleave their
        # suggestions and update both at the same time
        predict, update = combine_models(model, matcher)

    # Use the prefer_uncertain sorter to focus on suggestions that the model
    # is most uncertain about (i.e. with a score closest to 0.5). The model
    # yields (score, example) tuples and the sorter yields just the example
    stream = prefer_uncertain(predict(stream))
    
    return {
        "view_id": "classification",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "update": update,  # Update callback, called with batch of answers
        "exclude": exclude,  # List of dataset names to exclude
        "config": {"lang": nlp.lang, "blocks": blocks},  # Additional config settings, mostly for app UI
    }

but when I try to run:

python -m prodigy textcat.teach.BOX news_groups blank:en newsgroups_space.txt --label NODULE --patterns nodule_patterns.jsonl -F text_cat_with_box.py

I get:

  File "text_cat_with_box.py", line 48, in textcat_teach
    model = TextClassifier(nlp, label, pipe_name="textcat")
  File "cython_src\prodigy\models\textcat.pyx", line 90, in prodigy.models.textcat.TextClassifier.__init__
  File "cython_src\prodigy\models\textcat.pyx", line 23, in prodigy.models.textcat.infer_exclusive
ValueError: Can't infer exclusive vs. non-exclusive categories from 'textcat': not in the pipeline. Available: []

How would I add a simple text box where the annotator can give a reason for their choice in this recipe? I also tried just pasting the original code directly and running it, and it gives the same error. Any ideas what could be happening here?

ines · February 5, 2022, 1:21pm

Hi! It looks like the problem here is that you're using a blank:en pipeline with no text classifier, so there's nothing that the recipe can use to predict the initial categories. One thing you can do in your recipe is to make sure a text classifier is added and has the correct labels:

from prodigy.models.textcat import add_text_classifier

# in your recipe
add_text_classifier(nlp, label)

That said, this will start you off with a blank text classifier: the model will know essentially nothing, and it might take you a lot longer to get to a state where it can make useful suggestions. So if possible, you ideally want to start off with a text classifier that was trained on at least a small sample of manually annotated data. For example, you could run textcat.manual, collect a few representative examples, train your model with prodigy train and then use that in your custom textcat.teach workflow to improve it further.
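
For reference, the same idea can be sketched with spaCy's public v3 API instead of the Prodigy helper: check nlp.pipe_names, add a textcat component if it is missing, and register the labels. This is only a rough sketch under assumptions: ensure_textcat is a made-up name, spaCy v3 is assumed, and add_text_classifier from the reply above remains the route suggested in this thread.

```python
from typing import Iterable


def ensure_textcat(nlp, labels: Iterable[str], factory: str = "textcat"):
    """Make sure `nlp` has a text classifier that knows all `labels`.

    `nlp` is expected to be a spaCy v3 Language object, e.g. the result of
    spacy.blank("en"). `ensure_textcat` is a hypothetical helper name and is
    not part of Prodigy or spaCy.
    """
    if factory not in nlp.pipe_names:
        # A blank pipeline has no components, so add an untrained classifier.
        # "textcat" treats labels as mutually exclusive; spaCy v3 also ships
        # "textcat_multilabel" for non-exclusive labels.
        nlp.add_pipe(factory)
    textcat = nlp.get_pipe(factory)
    for name in labels:
        if name not in textcat.labels:
            textcat.add_label(name)
    return textcat


# Usage (requires spaCy v3):
#   import spacy
#   nlp = spacy.blank("en")
#   ensure_textcat(nlp, ["NODULE"])
#   nlp.pipe_names  # now includes "textcat"
```

The component added this way is untrained, so the caveat above about a blank classifier applies here as well.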

klopez · February 10, 2022, 7:09pm

Hi Ines,

Thank you for the insight. I think there are discrepancies between the internal recipes and the recipes that exist on Explosion's GitHub? We can see that this textcat.manual is different: it doesn't have the --loader parameter. I really just want to slightly modify the built-in recipe so that I can add 1 extra block (a text box), but I can't seem to find the built-in recipes?

Thank you

klopez · February 10, 2022, 8:46pm

OK, I figured out how to accomplish this!
First I had to check out the built-in recipe. The install location can be found by running python -c "import prodigy;print(prodigy.__file__)", and the recipe itself can be found here
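
The lookup above can also be done from a script. This is a small sketch using only the standard library; package_dir is a made-up helper name, and the "recipes" subfolder is an assumption about how the installed package is laid out.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(package: str) -> Optional[Path]:
    """Return the directory an installed package lives in, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file module
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    root = package_dir("prodigy")
    if root is None:
        print("prodigy is not installed in this environment")
    else:
        # Assumption: the built-in recipes live in a "recipes" subfolder.
        print(root / "recipes" / "textcat.py")
```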

I literally took the textcat.manual part of the textcat.py file and modified it to:

- add 2 components to the block (the actual text to annotate and the input box)
- include the blocks
- add the block as the view id

Here is what my code looks like:

import prodigy
from prodigy.components.db import connect
from prodigy.components.loaders import get_stream
from prodigy.components.preprocess import add_label_options, add_labels_to_stream
from prodigy.types import RecipeSettingsType
from prodigy.util import split_string, get_labels, log
from typing import List, Optional, Union, Iterable
from wasabi import msg  # used for msg.fail below

# Note: filter_accepted_inputs (used further down) is a helper from the
# built-in textcat.py. It also has to be copied over or imported for the
# single-label --exclusive path to run.

@prodigy.recipe(
    "textcat.manual.BOX",
    # fmt: off
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Data to annotate (file path or '-' to read from standard input)", "positional", None, str),
    loader=("Loader (guessed from file extension if not set)", "option", "lo", str),
    label=("Comma-separated label(s) to annotate or text file with one label per line", "option", "l", get_labels),
    exclusive=("Treat classes as mutually exclusive (if not set, an example can have multiple correct classes)", "flag", "E", bool),
    exclude=("Comma-separated list of dataset IDs whose annotations to exclude", "option", "e", split_string),
    # fmt: on
)
def manual(
    dataset: str,
    source: Union[str, Iterable[dict]],
    loader: Optional[str] = None,
    label: Optional[List[str]] = None,
    exclusive: bool = False,
    exclude: Optional[List[str]] = None,
) -> RecipeSettingsType:
    """
    Manually annotate categories that apply to a text. If more than one label
    is specified, categories are added as multiple choice options. If the
    --exclusive flag is set, categories become mutually exclusive, meaning that
    only one can be selected during annotation.
    """
    
    log("RECIPE: Starting recipe textcat.manual", locals())
    labels = label
    if not labels:
        msg.fail("textcat.manual requires at least one --label", exits=1)
    has_options = len(labels) > 1
    log(f"RECIPE: Annotating with {len(labels)} labels", labels)
    stream = get_stream(
        source, loader=loader, rehash=True, dedup=True, input_key="text"
    )
    blocks = [
        {"view_id": "choice" if has_options else "classification"},
        {"view_id": "text_input", "field_rows": 3, "field_label": "Explain your decision"}
    ]
    if has_options:
        stream = add_label_options(stream, label)
    else:
        stream = add_labels_to_stream(stream, label)
        if exclusive:
            # Use the dataset to decide what's left to annotate
            db = connect()
            if dataset in db:
                stream = filter_accepted_inputs(db.get_dataset(dataset), stream)

    return {
        #"view_id": "choice" if has_options else "classification",
        "view_id": "blocks", 
        "dataset": dataset,
        "stream": stream,
        "exclude": exclude,
        "config": {
            "labels": labels,
            "choice_style": "single" if exclusive else "multiple",
            "choice_auto_accept": exclusive,
            "exclude_by": "input" if has_options else "task",
            "auto_count_stream": True,
            "blocks": blocks,
        },
    }
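
A short sketch of how the typed explanations might be read back out afterwards. It assumes the text_input block stores its value under the key user_input (the default when no field_id is set) and that multiple-choice selections end up in an accept list; the summarize helper and the example records are made up for illustration.

```python
from typing import Dict, Iterable, List


def summarize(examples: Iterable[dict], field_id: str = "user_input") -> List[Dict]:
    """Pull text, selected labels and the typed explanation out of annotations."""
    rows = []
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected and ignored examples
        # Multiple labels: choice options land in "accept".
        # Single label: the task carries a "label" key instead.
        labels = list(eg.get("accept") or [])
        if not labels and "label" in eg:
            labels = [eg["label"]]
        rows.append(
            {
                "text": eg.get("text", ""),
                "labels": labels,
                "reason": eg.get(field_id, "").strip(),
            }
        )
    return rows


# Hypothetical records, shaped like saved annotation tasks
examples = [
    {"text": "A 4 mm nodule is seen.", "accept": ["NODULE"],
     "answer": "accept", "user_input": " mentions a nodule "},
    {"text": "No acute findings.", "accept": [], "answer": "reject"},
]
rows = summarize(examples)
```

With real data, the list of examples would come from the database instead, for instance via connect().get_dataset(...) as used in the recipe above.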

ines · February 13, 2022, 9:48am

Glad you got it working!

Yes, the versions of the recipes in the prodigy_recipes repo are slightly modified and simplified so they work better as templates to start from and modify, and contain less "magic" than the built-in recipes, which need to deal with all kinds of input etc.

klopez · February 15, 2022, 3:33pm

Thanks again!
