Is it possible to have a recipe receive multiple streams?

SanVijey · April 7, 2025, 1:49pm

We are currently creating a relations recipe that consists of two blocks. We want the relations block to contain only the numerical tokens/spans of the text, since those are the tokens we are creating relationships between. We also want an ner block containing the full, labelled text of the current example, so that we have the full context and can create better relationships. To achieve this, we tried to have a full stream (essentially the full task) and a filtered stream (the numerical tokens only), passing the full stream to the ner block and the filtered stream to the relations block. We have tried this several times without success. Does anyone know how we can achieve this? We are open to a solution that uses multiple streams, or to any better idea for having these two blocks show different content.

magdaaniol · April 8, 2025, 4:53pm

Welcome to the forum @SanVijey!

The front-end expects a single stream of tasks to render, so for most UIs, including blocks, it's not possible to receive multiple streams (and in most cases it shouldn't be necessary). The exception is pages, where you can define each component UI completely independently. I provide a pages-based solution below, but first I'd like to point out some simpler options.

I understand that you're trying to limit which tokens can be selected for relation annotations while preserving the entire sentence for context. Not sure if you've seen it, but the relations.manual recipe lets you define "disable" patterns: patterns for tokens that should be unselectable in the UI while still remaining visible.

For example, with a pattern that disables anything that is not a number, you'd get a UI like this:

As you can see, only the numbers are selectable, while the rest of the tokens are present but grayed out.

The pattern used in this example is:

{"label": "noNum","pattern": [{"LIKE_NUM": false}]}

You can also use entity labels as well as other spaCy token properties. See here for more details on pattern options.
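As a minimal sketch of how such a pattern could be stored, the snippet below writes it to a JSONL file (one JSON object per line) and reads it back. The file name and the two helper functions are hypothetical and only for illustration. Note that Python's `False` is serialized as JSON `false`, which is the form shown above.

```python
import json
import tempfile
from pathlib import Path

# The pattern from the post: disable every token that does not look like a number.
DISABLE_PATTERNS = [
    {"label": "noNum", "pattern": [{"LIKE_NUM": False}]},
]


def write_patterns(path: Path, patterns: list) -> None:
    """Write one JSON object per line (JSONL). Hypothetical helper."""
    with path.open("w", encoding="utf-8") as f:
        for pattern in patterns:
            f.write(json.dumps(pattern) + "\n")


def read_patterns(path: Path) -> list:
    """Read the JSONL file back, skipping blank lines. Hypothetical helper."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical file name in a temporary directory; any path works.
patterns_path = Path(tempfile.mkdtemp()) / "disable_patterns.jsonl"
write_patterns(patterns_path, DISABLE_PATTERNS)
loaded = read_patterns(patterns_path)
```

The resulting file is what you would pass to a recipe as the disable patterns path.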

One problem with this approach in your case could be that you also want to preserve the NER labels of the disabled tokens if any.

You could combine ner with relations in blocks, but since both UIs share the underlying token representations, the disable patterns would apply in both UIs.

If you want to implement a UI that combines ner UI with relations UI and keep them independent, you'd need to use pages rather than blocks.

In a way, pages could support multiple streams in that you'd feed different data to your pages creating function.
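To illustrate the idea in isolation, here is a rough sketch that pairs two plain Python iterables into one task per example, with one page per source. It assumes both sources yield examples in the same order, and it leaves out everything Prodigy-specific (tokens, hashing). The function name `merge_streams_into_pages` and the sample data are made up for this sketch.

```python
from typing import Any, Dict, Iterable, Iterator


def merge_streams_into_pages(
    full_stream: Iterable[Dict[str, Any]],
    filtered_stream: Iterable[Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Pair up examples from two sources and emit one pages task per pair.

    Assumption: both sources are aligned, i.e. the n-th item of each
    belongs to the same underlying example. zip() stops at the shorter one.
    """
    for full, filtered in zip(full_stream, filtered_stream):
        yield {
            "pages": [
                {**full, "view_id": "ner"},  # full context
                {**filtered, "view_id": "relations"},  # restricted content
            ]
        }


# Made-up sample data standing in for the two sources
full = [{"text": "Revenue rose from 10 to 12", "spans": []}]
filtered = [{"text": "10 12"}]
tasks = list(merge_streams_into_pages(full, filtered))
```

Each resulting task is a single item in a single stream, which is what the front-end expects, while each page carries its own data.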

Here's an example of what that could look like.
In the recipe below I programmatically create a task for ner and a task for relations while keeping them completely independent:

import copy
from pathlib import Path
from typing import Any, Dict, List

import prodigy
import spacy
from prodigy.components.preprocess import add_tokens
from prodigy.components.stream import get_stream
from prodigy.core import Arg
from prodigy.recipes.rel import preprocess_stream, setup_matchers
from prodigy.types import StreamType
from prodigy.util import set_hashes

REL_LABELS = ["REL_LABEL"]
NER_LABELS = ["PERSON", "ORG"]


def create_ner_page(
    text: str, tokens: List[Dict], spans: List[Dict], labels: List[str]
) -> Dict:
    """Create a ner page configuration."""
    # make sure all tokens are visible
    visible_tokens = []
    for token in tokens:
        token_copy = copy.deepcopy(token)
        if token_copy.get("disabled"):
            del token_copy["disabled"]
        visible_tokens.append(token_copy)
    return set_hashes(
        {
            "text": text,
            "view_id": "ner",
            "tokens": visible_tokens,
            "spans": spans,
            "config": {"labels": labels},
        }
    )


def create_relations_page(text: str, tokens: List[Dict], labels: List[str]) -> Dict:
    """Create a relations page configuration."""
    return set_hashes(
        {
            "text": text,
            "view_id": "relations",
            "tokens": tokens,
            "config": {"labels": labels, "wrap_relations": True},
        }
    )


def create_pages(example: Dict[str, Any]) -> Dict[str, Any]:
    """Create all pages for a given example."""
    pages = [
        create_ner_page(
            text=example["text"],
            tokens=example.get("tokens", []),
            spans=example.get("spans", []),
            labels=NER_LABELS,
        ),
        create_relations_page(
            text=example["text"], tokens=example.get("tokens", []), labels=REL_LABELS
        ),
    ]
    return set_hashes({"pages": pages})


def add_pages(stream: StreamType) -> StreamType:
    """Process the input stream and generate pages."""
    for example in stream:
        paginated_example = create_pages(example)
        yield set_hashes(paginated_example)


@prodigy.recipe(
    "test-recipe",
    dataset=Arg(help="Dataset to save answers to."),
    source=Arg(help="Input source"),
    disable_patterns_path=Arg(help="Disable patterns path"),
)
def test_recipe(
    dataset: str, source: str, disable_patterns_path: Path
) -> Dict[str, Any]:
    """
    Process text files and create a multi-page annotation interface.

    Args:
        dataset: Name of the dataset to save annotations
        source: Input source
        disable_patterns_path: Path to the file containing disable patterns

    Returns:
        Dictionary containing recipe configuration
    """
    stream = get_stream(source)
    nlp = spacy.blank("en")
    disable_matcher, disable_patterns = setup_matchers(nlp, disable_patterns_path)

    # Process stream
    stream.apply(add_tokens, stream=stream, nlp=nlp)
    # Apply matcher rules to the stream
    stream.apply(
        preprocess_stream,
        stream=stream,
        nlp=nlp,
        matcher=None,
        disable_matcher=disable_matcher,
        span_label=NER_LABELS,
        add_nps=False,
        add_ents=False,
    )
    stream = add_pages(stream=stream)

    return {
        "dataset": dataset,
        "view_id": "pages",
        "stream": stream,
        "config": {
            "custom_theme": {"cardMaxWidth": "90%"},
        },
    }

I reuse the disabling function from the relations recipe, which applies spaCy matcher rules to set the disabled attribute on the tokens. The function that creates the ner page then undoes this, so that all tokens are visible in the ner UI.
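The effect on the token data can be sketched with plain dictionaries. The tokens below are hand-written and simplified (real token dicts carry more fields, such as character offsets), and `strip_disabled` is a hypothetical stand-alone version of the loop in create_ner_page. Unlike that loop, it drops the key even when its value is falsy.

```python
import copy
from typing import Dict, List


def strip_disabled(tokens: List[Dict]) -> List[Dict]:
    """Return copies of the tokens without the "disabled" key."""
    cleaned = []
    for token in tokens:
        token_copy = copy.deepcopy(token)  # leave the original untouched
        token_copy.pop("disabled", None)
        cleaned.append(token_copy)
    return cleaned


# Hand-written tokens, as if the disable matcher had already run
tokens = [
    {"text": "Revenue", "id": 0, "disabled": True},
    {"text": "rose", "id": 1, "disabled": True},
    {"text": "12", "id": 2},
]

ner_tokens = strip_disabled(tokens)  # everything selectable on the ner page
relations_tokens = tokens  # only "12" selectable on the relations page
```

Because the ner page works on copies, the relations page keeps its disabled tokens and the two pages stay independent.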

The resulting UI looks like this:
[Screenshot: pages_rel]

Let me know if you need any clarification or have questions about implementing either approach!
