Callback and Update

[image]
How can I obtain this in Prodigy?
When I use this command
python3 -m prodigy progress demo_resume
it shows
[image]
How can I get the output shown in the first image, as in this video: Using the Update callback - Prodigy Shorts - YouTube

hi @kushalrsharma!

Please watch the video where he describes how to use that code.

@koaning does an excellent job: for almost all of his videos, he posts the accompanying code in the YouTube description.

Also be sure to check out the prodigy-recipes repo, especially the tutorials folder. That's where the code for several of his other videos is available.

I used the same recipe for my NER task by changing the view_id, but I got the output shown in the later of the images mentioned above.

You ran the built-in Prodigy progress recipe, not Vincent's custom recipe.

Since Vincent's recipe is a custom recipe, you have to point the -F option to the Python file containing the recipe. The confusion arose simply because Vincent's custom recipe and the built-in recipe share the same name, progress.

A few suggestions as you're continuing to post on this forum.

First, I would highly recommend spending time reading through our Prodigy documentation. In this case, please be sure to read through our custom recipes documentation. After pointing to the custom script, I suspect you may get tripped up on other aspects of this recipe (e.g., you'll need to change the stream in Vincent's recipe to point to the data you want to annotate). Check out our docs on File Loaders to find the different functions for creating streams. You may also find the prodigy-recipes repo helpful (e.g., combine the ner_manual script with Vincent's for a combined ner-update recipe).

Second, please provide reproducible examples when posting. Typically, this would include a dummy data set, a minimal code snippet (e.g., formatted using Markdown), and any accompanying details (e.g., snippets of prodigy.json if modified). By doing so, you'll get faster responses, as we can quickly scan and identify the problem.

I understand Prodigy can take time to learn (we have a lot of functionality!); however, as you're likely already seeing, there are lots of resources in our documentation, support, videos, and repos (e.g., prodigy-recipes) that we've provided to help you move quickly up the learning curve.

Thank you so much @ryanwesslen. I love being in this community. Prodigy has changed the way I dive into NLP problems, and the constant support and guidance you all provide are awesome. Thank you so much.

Error:

Open the app in your browser and start annotating!

✘ Invalid task format for view ID 'ner'

spans   field required

{'text': 'SUMMARY', '_input_hash': -661986175, '_task_hash': -1889599281}
ERROR:    Traceback (most recent call last):
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 80, in validate
    Schema(**obj)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/pydantic/main.py", line 406, in __init__
    raise validation_error
pydantic.error_wrappers.ValidationError: 1 validation error for SpansTask
spans
  field required (type=value_error.missing)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 633, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1896, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 34, in coro
    await self.app(scope, request.receive, send_stream.send)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 656, in __call__
    await route.handle(scope, receive, send)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 259, in handle
    await self.app(scope, receive, send)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 61, in app
    response = await func(request)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 227, in app
    raw_response = await run_endpoint_function(
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/app.py", line 440, in get_session_questions
    return _shared_get_questions(req.session_id, excludes=req.excludes)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/app.py", line 405, in _shared_get_questions
    tasks = controller.get_questions(session_id=session_id, excludes=excludes)
  File "cython_src/prodigy/core.pyx", line 256, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/core.pyx", line 257, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 379, in prodigy.components.feeds.Feed.get_batch
  File "cython_src/prodigy/components/feeds.pyx", line 330, in prodigy.components.feeds.Feed._enqueue_tasks
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 135, in check
    validate(self.Schema, obj, error_msg=self.error_msg)
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 92, in validate
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 624, in lifespan
    await receive()
  File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/uvicorn/lifespan/on.py", line 135, in receive
    return await self.receive_queue.get()
  File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

This happens when running:

prodigy progress demo_final_data -F recipe.py 

I have not changed the prodigy.json file.
This is recipe.py:

import time
from typing import List

from rich import box
from rich.table import Table
from rich.console import Console

import prodigy
from prodigy.components.loaders import JSONL


class ProgressTable:
    def __init__(self):
        self.start_time = time.time()
        self.n_examples = {
            "n_accept": 0,
            "n_reject": 0,
            "n_skip": 0,
        }
        self.console = Console()

    def make_table(self):
        """Generates a pretty Rich table from the results."""
        seconds_sofar = time.time() - self.start_time
        minutes = seconds_sofar / 60
        total_counts = sum(self.n_examples.values())
        time_mark = f"{int(seconds_sofar // 60)}m{int(seconds_sofar % 60)}s"
        table = Table(title=f"Summary at {time_mark}", box=box.SIMPLE)

        table.add_column("Answer", style="magenta", footer="Total")
        table.add_column(
            "Count", justify="right", style="cyan", footer=str(total_counts)
        )
        table.add_column(
            "Annot per Hour",
            justify="right",
            style="green",
            footer=str(int(total_counts / minutes * 60)),
        )

        for key, value in self.n_examples.items():
            table.add_row(key, str(value), str(int(value / seconds_sofar * 60 * 60)))
        table.show_footer = True
        return table

    def update(self, examples: List[dict]):
        self.n_examples["n_accept"] += len([e for e in examples if e["answer"] == "accept"])
        self.n_examples["n_reject"] += len([e for e in examples if e["answer"] == "reject"])
        self.n_examples["n_skip"] += len([e for e in examples if e["answer"] == "ignore"])
        table = self.make_table()
        self.console.print(table)


@prodigy.recipe(
    "progress",
    dataset=("Dataset to save answers to", "positional", None, str),
)
def progress(dataset: str):
    # Load your own streams from anywhere you want
    stream = JSONL("/home/kushal/Documents/spacyprodigy/outputfiles/resume925.jsonl")
    ptable = ProgressTable()

    return {
        "dataset": dataset,
        "view_id": "ner",
        "stream": stream,
        "update": ptable.update,
    }

where the path in
stream = JSONL("/home/kushal/Documents/spacyprodigy/outputfiles/resume925.jsonl")
is the location of the JSONL file that I want to annotate and keep track of.

Be sure to read the error messages.

This says that you're using the ner interface, which requires your data to include a spans field, like:

{
  "text": "Apple updates its analytics service with new metrics",
  "spans": [{ "start": 0, "end": 5, "label": "ORG" }]
}
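
If you want to see which records would trip this check before starting the server, a rough sketch like the following can help. Note that find_invalid_tasks and REQUIRED_FIELDS are hypothetical names of my own, not part of Prodigy, and the list of required fields is only an assumption based on the error message above.

```python
import json
from typing import Dict, Iterable, List

# Assumption: the `ner` view needs at least "text" and "spans", as the error
# message above suggests. Adjust this tuple for other interfaces.
REQUIRED_FIELDS = ("text", "spans")


def find_invalid_tasks(lines: Iterable[str]) -> List[Dict]:
    """Return one report per JSONL record that lacks a required field."""
    reports = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        task = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in task]
        if missing:
            reports.append({"line": line_no, "missing": missing})
    return reports


sample = [
    '{"text": "SUMMARY"}',
    '{"text": "Apple updates its analytics service", '
    '"spans": [{"start": 0, "end": 5, "label": "ORG"}]}',
]
print(find_invalid_tasks(sample))  # [{'line': 1, 'missing': ['spans']}]
```

To run it on a real file, you could pass an open file handle instead of the sample list, since the function only iterates over lines.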

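I think the problem is that JSONL only loads your raw records; it adds neither spans nor tokens. Try to mimic the built-in ner.manual recipe, which adds tokens to each task and uses the ner_manual interface (as the view_id) rather than ner.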

Therefore, instead of doing this:

stream = JSONL(...)

Do this:

# add to top
import spacy
from prodigy.components.preprocess import add_tokens

...
# be sure to add spacy_model and source as recipe arguments
def progress(dataset: str, spacy_model: str, source: str):
    stream = JSONL(source)
    nlp = spacy.load(spacy_model)
    stream = add_tokens(nlp, stream)  # adds the tokens needed for manual span annotation (`ner_manual`)

You'll need to modify your recipe's arguments as shown above: add your file path as an input called source, and add a spacy_model to use for tokenization.
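
To give a feel for what the added tokens field contains, here is a simplified, pure-Python sketch. It is not how add_tokens works internally (that uses the spaCy tokenizer); whitespace_tokens is a hypothetical helper, and the exact token keys shown are an assumption modeled on the token format described in the Prodigy docs.

```python
import re
from typing import Dict, List


def whitespace_tokens(text: str) -> List[Dict]:
    """Build a Prodigy-style token list by splitting on whitespace."""
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append(
            {
                "text": match.group(),
                "start": match.start(),  # character offset where the token begins
                "end": match.end(),      # character offset just past the token
                "id": i,                 # position of the token in the text
            }
        )
    return tokens


task = {"text": "Apple updates its analytics service"}
task["tokens"] = whitespace_tokens(task["text"])
print(task["tokens"][0])  # {'text': 'Apple', 'start': 0, 'end': 5, 'id': 0}
```

The character offsets are what let the interface snap a highlighted selection to token boundaries, which is why manual span annotation needs tokenized tasks.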

I think you should think carefully about what you're trying to accomplish. For example, the code currently counts annotations the way you would for a binary task like textcat, i.e., by looking at the answer key:

        self.n_examples["n_accept"] += len([e for e in examples if e["answer"] == "accept"])
        self.n_examples["n_reject"] += len([e for e in examples if e["answer"] == "reject"])
        self.n_examples["n_skip"] += len([e for e in examples if e["answer"] == "ignore"])

However, as I mentioned above, annotated spans (like entities) are saved in the spans key:

{
  "text": "Apple updates its analytics service with new metrics",
  "spans": [{ "start": 0, "end": 5, "label": "ORG" }]
}

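Instead, it would look more like this (spans is a list, so you need to loop over it, and "ORG" would need to be initialized in n_examples first):

        self.n_examples["ORG"] += len([s for e in examples for s in e.get("spans", []) if s["label"] == "ORG"])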
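
If you'd rather not hard-code each label, one option is to count every label with a Counter. This is only a minimal sketch under my own assumptions: LabelCounter is a hypothetical class, and skipping examples whose answer is not "accept" is a design choice you may want to change.

```python
from collections import Counter
from typing import Dict, List


class LabelCounter:
    """Keeps a running count of accepted span labels across update calls."""

    def __init__(self) -> None:
        self.n_labels: Counter = Counter()

    def update(self, examples: List[Dict]) -> None:
        for eg in examples:
            if eg.get("answer") != "accept":
                continue  # only count spans from accepted answers
            for span in eg.get("spans", []):
                self.n_labels[span["label"]] += 1


counter = LabelCounter()
counter.update(
    [
        {
            "text": "Apple updates its analytics service with new metrics",
            "answer": "accept",
            "spans": [{"start": 0, "end": 5, "label": "ORG"}],
        },
        {"text": "SUMMARY", "answer": "ignore"},
    ]
)
print(dict(counter.n_labels))  # {'ORG': 1}
```

The update method has the same shape as ptable.update in your recipe (it takes the list of answered examples), so it could be passed as the "update" callback or merged into ProgressTable.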