How to obtain this in Prodigy?
When I use this command
python3 -m prodigy progress demo_resume
It shows
How can I obtain the output shown in the first sample in this video: Using the Update callback - Prodigy Shorts - YouTube
Hi @kushalrsharma!
Please watch the video, where he describes how to use that code.
@koaning does an excellent job: for almost all of his videos, he posts the accompanying code in the YouTube description.
Also be sure to check out the prodigy-recipes repo, especially the tutorials folder. That's where the code for several of his other videos is available.
I used the same recipe for my NER task, changing only the view_id, but I got the output in the images mentioned above.
You ran the built-in Prodigy progress recipe, not Vincent's custom recipe.
Since Vincent's recipe is a custom recipe, you have to point the -F option to the Python file containing your recipe. The confusion arose simply because Vincent's custom recipe and the built-in recipe share the same name, progress.
A few suggestions as you're continuing to post on this forum.
First, I would highly recommend spending time reading through our Prodigy documentation. In this case, please be sure to read through our custom recipes documentation. After pointing to the custom script, I suspect you may get tripped up on other aspects of this recipe (e.g., you'll need to change the stream in Vincent's recipe to point to the data you want to annotate). Check out our docs on File Loaders to find the different functions for creating streams. You may also find the prodigy-recipes repo helpful (e.g., combine the ner_manual script with Vincent's for a combined ner-update recipe).
Second, please provide reproducible examples when posting. Typically, this would include a dummy data set, a minimal code snippet (e.g., using Markdown), and any accompanying details (e.g., snippets of prodigy.json if modified). That way, we can quickly scan the post, identify the problem, and get you a faster response.
I understand Prodigy can take time to learn (we have a lot of functionality!); however, as you're likely already seeing, there are lots of resources in our documentation, support, videos, and repos (e.g., prodigy-recipes) that we've provided to help you progress quickly up the learning curve.
Thank you so much, @ryanwesslen. I love being part of this community. Prodigy has changed the way I approach NLP problems, and the constant support and guidance you all provide are awesome. Thank you so much.
Error:
Open the app in your browser and start annotating!
✘ Invalid task format for view ID 'ner'
spans field required
{'text': 'SUMMARY', '_input_hash': -661986175, '_task_hash': -1889599281}
ERROR: Traceback (most recent call last):
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 80, in validate
Schema(**obj)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/pydantic/main.py", line 406, in __init__
raise validation_error
pydantic.error_wrappers.ValidationError: 1 validation error for SpansTask
spans
field required (type=value_error.missing)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 633, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 600, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1896, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/middleware/base.py", line 34, in coro
await self.app(scope, request.receive, send_stream.send)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 656, in __call__
await route.handle(scope, receive, send)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 259, in handle
await self.app(scope, receive, send)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 61, in app
response = await func(request)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 227, in app
raw_response = await run_endpoint_function(
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 162, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/app.py", line 440, in get_session_questions
return _shared_get_questions(req.session_id, excludes=req.excludes)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/app.py", line 405, in _shared_get_questions
tasks = controller.get_questions(session_id=session_id, excludes=excludes)
File "cython_src/prodigy/core.pyx", line 256, in prodigy.core.Controller.get_questions
File "cython_src/prodigy/core.pyx", line 257, in prodigy.core.Controller.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 379, in prodigy.components.feeds.Feed.get_batch
File "cython_src/prodigy/components/feeds.pyx", line 330, in prodigy.components.feeds.Feed._enqueue_tasks
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 135, in check
validate(self.Schema, obj, error_msg=self.error_msg)
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/prodigy/components/validate.py", line 92, in validate
sys.exit(1)
SystemExit: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/starlette/routing.py", line 624, in lifespan
await receive()
File "/home/kushal/Documents/spacyprodigy/.venv/lib/python3.10/site-packages/uvicorn/lifespan/on.py", line 135, in receive
return await self.receive_queue.get()
File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
await getter
asyncio.exceptions.CancelledError
This happens when I run:
prodigy progress demo_final_data -F recipe.py
I have made no changes to the prodigy.json file.
This is recipe.py:
import time
from typing import List

from rich import box
from rich.table import Table
from rich.console import Console

import prodigy
from prodigy.components.loaders import JSONL


class ProgressTable:
    def __init__(self):
        self.start_time = time.time()
        self.n_examples = {
            "n_accept": 0,
            "n_reject": 0,
            "n_skip": 0,
        }
        self.console = Console()

    def make_table(self):
        """Generates a pretty Rich table from the results."""
        seconds_sofar = time.time() - self.start_time
        minutes = seconds_sofar / 60
        total_counts = sum(self.n_examples.values())
        time_mark = f"{int(seconds_sofar // 60)}m{int(seconds_sofar % 60)}s"
        table = Table(title=f"Summary at {time_mark}", box=box.SIMPLE)
        table.add_column("Answer", style="magenta", footer="Total")
        table.add_column(
            "Count", justify="right", style="cyan", footer=str(total_counts)
        )
        table.add_column(
            "Annot per Hour",
            justify="right",
            style="green",
            footer=str(int(total_counts / minutes) * 60),
        )
        for key, value in self.n_examples.items():
            table.add_row(key, str(value), str(int(value / seconds_sofar * 60 * 60)))
        table.show_footer = True
        return table

    def update(self, examples: List[dict]):
        self.n_examples["n_accept"] += len([e for e in examples if e["answer"] == "accept"])
        self.n_examples["n_reject"] += len([e for e in examples if e["answer"] == "reject"])
        self.n_examples["n_skip"] += len([e for e in examples if e["answer"] == "ignore"])
        table = self.make_table()
        self.console.print(table)


@prodigy.recipe(
    "progress",
    dataset=("Dataset to save answers to", "positional", None, str),
)
def progress(dataset: str):
    # Load your own streams from anywhere you want
    stream = JSONL("/home/kushal/Documents/spacyprodigy/outputfiles/resume925.jsonl")
    ptable = ProgressTable()
    return {
        "dataset": dataset,
        "view_id": "ner",
        "stream": stream,
        "update": ptable.update,
    }
where
stream = JSONL("/home/kushal/Documents/spacyprodigy/outputfiles/resume925.jsonl")
points to the JSONL file that I want to annotate and keep track of.
Be sure to read the error messages.
This one says that you're using the ner interface, which requires your data to include a spans field, like:
{
"text": "Apple updates its analytics service with new metrics",
"spans": [{ "start": 0, "end": 5, "label": "ORG" }]
}
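A quick way to see which tasks would trip this check is to scan the JSONL lines before starting the server. This is only a minimal sketch in plain Python: the helper name `find_missing_keys` and the sample lines are hypothetical, and the required keys are taken from the error message above rather than from Prodigy's own validation code.

```python
import json
from typing import Iterable, List, Tuple


def find_missing_keys(
    lines: Iterable[str], required: Tuple[str, ...] = ("text", "spans")
) -> List[Tuple[int, List[str]]]:
    """Return (line_number, missing_keys) for every task lacking a required key."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the file
        task = json.loads(line)
        missing = [key for key in required if key not in task]
        if missing:
            problems.append((line_number, missing))
    return problems


# Hypothetical sample: the second task mirrors the one from the error message.
sample = [
    '{"text": "Apple updates its analytics service with new metrics", '
    '"spans": [{"start": 0, "end": 5, "label": "ORG"}]}',
    '{"text": "SUMMARY"}',
]
print(find_missing_keys(sample))  # → [(2, ['spans'])]
```

To check a real file, pass an open file handle instead of the list, e.g. `find_missing_keys(open(path, encoding="utf8"))`.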
I think the problem is that you're loading the data with JSONL alone, which doesn't add the tokens. Try to mimic the normal ner.manual recipe, which adds tokens to the stream and uses the ner_manual interface.
Therefore, instead of doing this:
stream = JSONL(...)
Do this:
# add to top
import spacy
from prodigy.components.preprocess import add_tokens
...
# be sure to add in spacy_model and source
def progress(dataset: str, spacy_model: str, source: str):
    stream = JSONL(source)
    nlp = spacy.load(spacy_model)
    stream = add_tokens(nlp, stream)  # this adds the tokens that manual span annotation (`ner_manual`) needs
You'll need to modify your arguments by adding your file path as an input called source, plus a spacy_model for tokenization, as in the snippet above.
I think you should think carefully about what you're trying to accomplish. For example, the code currently counts annotations the way you would for textcat, i.e., by looking at the answer key:
self.n_examples["n_accept"] += len([e for e in examples if e["answer"] == "accept"])
self.n_examples["n_reject"] += len([e for e in examples if e["answer"] == "reject"])
self.n_examples["n_skip"] += len([e for e in examples if e["answer"] == "ignore"])
However, as I mentioned above, annotated spans (like entities) are saved in the spans key:
{
"text": "Apple updates its analytics service with new metrics",
"spans": [{ "start": 0, "end": 5, "label": "ORG" }]
}
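Instead, it would look more like this (spans is a list, so you need to loop over it, and n_examples needs an "ORG" key initialized to 0):
self.n_examples["ORG"] += sum(1 for e in examples for s in e.get("spans", []) if s["label"] == "ORG")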
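Because spans is a list of dictionaries, a per-label tally has to loop over each span in each example. Here is one possible sketch using `collections.Counter`, so labels don't need to be initialized in advance. The function name `count_span_labels` and the sample batch are hypothetical, and counting only accepted examples is an assumption about what you want to track.

```python
from collections import Counter
from typing import List


def count_span_labels(examples: List[dict]) -> Counter:
    """Count span labels across the accepted examples in one batch."""
    counts: Counter = Counter()
    for example in examples:
        if example.get("answer") != "accept":
            continue  # assumption: only accepted annotations are tallied
        # Tasks without annotated spans simply contribute nothing.
        for span in example.get("spans", []):
            counts[span["label"]] += 1
    return counts


# Hypothetical batch in the shape the update callback receives.
batch = [
    {
        "text": "Apple updates its analytics service with new metrics",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
        "answer": "accept",
    },
    {"text": "SUMMARY", "answer": "accept"},
    {
        "text": "Google hires",
        "spans": [{"start": 0, "end": 6, "label": "ORG"}],
        "answer": "reject",
    },
]
label_counts = count_span_labels(batch)
print(label_counts)  # → Counter({'ORG': 1})
```

Inside the `update` method, the result could be merged into a running total kept on the table object, for example with `Counter.update`.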