Happy to hear it!
The errors in this thread were solved, but I have a new one that is preventing us from upgrading to v1.14.6.
We are getting an error message stating that it cannot find any of our custom recipes:
✘ Can't find recipe or command 'ner_task_v3_b_validation'.
Run prodigy --help for more details. If you're using a custom recipe, provide
the path to the Python file using the -F argument.
Available recipes: ab.llm.tournament, ab.openai.prompts, ab.openai.tournament,
abandonment, account-matching, agent-speedbump, audio.manual, audio.transcribe,
compare, coref.manual, custom-intent-calibration, custom-labels,
custom-labels-prompt-multi-intent, data-to-spacy, db-in, db-merge, db-out,
dep.correct, dep.teach, drop, eup-corpus-validation,
eup-corpus-validation-with-options, filter-by-patterns, helpful-banking-moments,
helpful-banking-moments-repeat-intents, image.manual, login-failure, mark,
match, metric.iaa.binary, metric.iaa.doc, metric.iaa.span, ner-spans,
ner.correct, ner.eval-ab, ner.llm.correct, ner.llm.fetch, ner.manual,
ner.model-annotate, ner.openai.correct, ner.openai.fetch, ner.silver-to-gold,
ner.teach, pos.correct, pos.teach, print-dataset, print-stream, progress,
rel.manual, review, sent.correct, sent.teach, spacy-config, spans.correct,
spans.llm.correct, spans.llm.fetch, spans.manual, spans.model-annotate, stats,
stt-error-validation, stt-spans, terms.llm.fetch, terms.openai.fetch,
terms.teach, terms.to-patterns, textcat.correct, textcat.llm.correct,
textcat.llm.fetch, textcat.manual, textcat.model-annotate,
The error only appears when we run the Prodigy server programmatically, i.e., using the prodigy.serve() method as explained here (https://prodi.gy/docs/api-components#serve). It worked just fine until we upgraded to v1.14.6.
And this error appears for all our custom recipes, not just the one in the error message above.
We tried adding the recipe path to the serve() command (with -F, as I do when I invoke prodigy directly), but that did not solve the error.
Were there any changes in this update that would have caused this?
The only changes in v1.14.6 should be the Pydantic/spaCy version bumps. But I can't imagine that would cause this issue.
Looking at your output though ... I can't help but notice stt-spans and stt-error-validation in the list of known recipes. So it seems it is able to detect some of your custom recipes.
I also just downloaded Prodigy v1.14.6 to try and reproduce your error. I have this custom recipe:
import prodigy

@prodigy.recipe(
    "my-custom-recipe",
    dataset=("Dataset to save answers to", "positional", None, str),
    view_id=("Annotation interface", "option", "v", str)
)
def my_custom_recipe(dataset, view_id="text"):
    # Load your own streams from anywhere you want
    stream = [{"text": f"omg {i}"} for i in range(1000)]

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": dataset,
        "view_id": view_id,
        "stream": stream,
        "update": update
    }
I'm able to confirm that this runs fine:
python -m prodigy my-custom-recipe xxx --view-id text -F recipe.py
When I move it into a Python script, like so:
import prodigy
prodigy.serve("prodigy my-custom-recipe xxx --view-id text -F recipe.py", port=9000)
Then I seem to hit the same issue.
✘ Can't find recipe or command 'my-custom-recipe'.
Run prodigy --help for more details. If you're using a custom recipe, provide
the path to the Python file using the -F argument.
Available recipes: ab.llm.tournament, ab.openai.prompts, ab.openai.tournament,
audio.manual, audio.transcribe, compare, coref.manual, data-to-spacy, db-in,
db-merge, db-out, dep.correct, dep.teach, drop, filter-by-patterns,
image.manual, mark, match, metric.iaa.binary, metric.iaa.doc, metric.iaa.span,
ner.correct, ner.eval-ab, ner.llm.correct, ner.llm.fetch, ner.manual,
ner.model-annotate, ner.openai.correct, ner.openai.fetch, ner.silver-to-gold,
ner.teach, pos.correct, pos.teach, print-dataset, print-stream, progress,
rel.manual, review, sent.correct, sent.teach, spacy-config, spans.correct,
spans.llm.correct, spans.llm.fetch, spans.manual, spans.model-annotate, stats,
terms.llm.fetch, terms.openai.fetch, terms.teach, terms.to-patterns,
textcat.correct, textcat.llm.correct, textcat.llm.fetch, textcat.manual,
textcat.model-annotate, textcat.openai.correct, textcat.openai.fetch,
textcat.teach, train, train-curve
When I revert to v1.14.1, however, I seem to get the same error.
> python -m pip install prodigy==1.14.1 -f https://<license-key>@download.prodi.gy
> python serve.py
Just to check, what version of Prodigy does work for you here? I'm definitely eager to dive into this, but it would help to know when this feature might've broken.
Ah! I think I've spotted the issue. Not 100% sure, but this might be it.
This was my serve.py file originally.
import prodigy
prodigy.serve("prodigy my-custom-recipe xxx --view-id text -F recipe.py", port=9000)
The reason why you pass -F recipe.py locally is that this file needs to run in order for the prodigy.recipe decorator to register the recipe. But we can also achieve that by simply importing it within the Python script.
import prodigy
import recipe
prodigy.serve("prodigy my-custom-recipe xxx --view-id text", port=9000)
When I run import recipe, the entire script runs as a side effect, which also registers the recipe. From there, prodigy.serve is able to run it.
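That side effect is the whole trick: a registering decorator runs when the module's top-level code runs. A toy sketch of the mechanism, assuming a dict-based registry (RECIPES and recipe below are illustrative names, not Prodigy's actual internals):

```python
# Toy stand-in for a recipe registry: the decorator writes into a dict at the
# moment the defining module is executed.
RECIPES = {}

def recipe(name):
    def register(func):
        RECIPES[name] = func  # registration happens when the module runs
        return func
    return register

# In the real setup this definition lives in recipe.py; running that file
# (directly, via -F, or via `import recipe`) executes the decorator and
# fills the registry.
@recipe("my-custom-recipe")
def my_custom_recipe(dataset):
    return {"dataset": dataset}

print("my-custom-recipe" in RECIPES)  # True
```

So a recipe is only findable by name after the file defining it has actually been executed in the serving process.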
Might this explain what is happening on your end?
Here is our server.py, which seems similar to your serve.py. I tried adding from prodigy import recipes, and from prodigy.core import recipe, but this didn't resolve the error. I'm not sure why two recipes are "working" with the latest update and the rest are not -- but here's our code in case it helps troubleshoot.
import logging
import multiprocessing as mp
import os
import time
from datetime import datetime
from threading import Lock
from typing import Dict, List

import prodigy
from google.cloud import storage

# The import is used implicitly by the "run_server" method below
from annotator import GCS_FILES_FOLDER, GCS_ROOT, recipes  # noqa
from annotator.utils import timestamp

logger = logging.getLogger("hypercorn.access")
gcs_client = storage.Client(project="xxxx")


class TaskDefinition(dict):
    """
    An annotation task consists of a named recipe and a dataset
    """
    _required_keys = ("recipe", "dataset")
    _optional_keys = (
        "filepath", "input_sets", "spacy_model", "labels", "label_field",
        "choice_field", "view_id"
    )

    def __init__(self, **kwargs):
        """
        Make sure that all the required key/values are present and that all the
        keys are known (either required or optional)
        """
        for key in self._required_keys:
            assert key in kwargs
        for key in list(kwargs):
            assert key in self._required_keys + self._optional_keys, \
                f"{key} is not a recognized attribute"
        self.update({k: v for k, v in kwargs.items()})

    def convert_to_args(self) -> str:
        """
        Convert the task's attributes to a string of arguments that can be
        passed to the prodigy serve command

        :return: a string with the command-line arguments
        """
        # The filepath attribute is the GCS location, which consists of two
        # parts: the first two characters identify the subfolder, the remaining
        # 30 are the filename. We discard the subfolder and prepend "data" to
        # the remainder to obtain the target path (i.e., where the GCS download
        # will store the file), then use it (rather than the original filepath)
        # as the source specification (third argument) for the prodigy command
        command_args = [self["recipe"], self["dataset"]]
        if "filepath" in self:
            bucket = gcs_client.bucket(GCS_ROOT)
            source_path = os.path.join(
                GCS_FILES_FOLDER, self["filepath"][:2], self['filepath'][2:]
            )
            target_path = f"data/{self['filepath'][2:]}.jsonl"
            blob = bucket.blob(source_path)
            blob.download_to_filename(target_path)
            logger.info(f"DOWNLOADED [GCS] '{source_path}' to '{target_path}'")
            command_args.append(target_path)
        if "spacy_model" in self:
            command_args.insert(2, self["spacy_model"])
        if "input_sets" in self:
            command_args.append(f"{','.join(self['input_sets'])}")
        if "labels" in self:
            command_args.append(f"--label {','.join(self['labels'])}")
        if "label_field" in self:
            command_args.append(f"-l {self['label_field']}")
        if "choice_field" in self:
            command_args.append(f"-c {self['choice_field']}")
        if "view_id" in self:
            command_args.append(f"--view-id {self['view_id']}")
        logger.info(f"CREATING task with '{command_args}'")
        return " ".join(command_args)


class ProdigyServer:
    """
    Stores information about the Prodigy web server (such as the port number)
    and manages the start and termination of the actual subprocess.
    """
    @classmethod
    def set_url_prefix(cls, prefix: str) -> None:
        """
        Set the URL prefix to be used by all servers

        :param prefix: a string, obtained from the config file, and dependent
        on the environment where the app is running
        """
        cls.prefix = prefix

    def __init__(self, port_num: int):
        """
        Create a new server with the given port number
        """
        self.port_num = port_num
        self._proc = None
        self.start_time = None

    @property
    def url(self):
        return self.prefix + str(self.port_num)

    def is_available(self):
        return self._proc is None or not self._proc.is_alive()

    def is_running(self):
        return self._proc.is_alive()

    def start(
        self, taskdef: TaskDefinition, start_time: datetime = None,
        wait_time: int = 10
    ):
        """
        Start the server with the attributes from the task definition and give
        it a bit of time to settle down

        :param taskdef: the TaskDefinition (with the recipe name, filepath, and
        other attributes required for the server command)
        :param start_time: date and time the task first started; will not be
        None if the task was recreated from the active_tasks table at startup
        :param wait_time: the number of seconds to wait to give the server a
        chance to initialize properly (default: 10)
        """
        # Start the Prodigy webserver and give it 10 seconds before returning
        # (the .is_alive() method returns True immediately, so we cannot wait
        # for that)
        command_args = taskdef.convert_to_args()
        self._proc = mp.Process(
            target=run_server, args=(command_args, self.port_num,)
        )
        self._proc.daemon = False
        self._proc.start()
        time.sleep(wait_time)
        self.start_time = start_time or timestamp()

    def terminate(self):
        """
        If the server process is currently running, terminate it
        """
        if self._proc.is_alive():
            self._proc.terminate()
            while self._proc.is_alive():
                time.sleep(.1)
        self._proc = None
        self.start_time = None


class AnnotationTask:
    """
    Stores the task definition and manages the multiprocessing.Process for an
    annotation task. Also keeps a list of annotators working on the task.
    """
    # A finite list of available port numbers; the Helm chart assigns explicit
    # addresses to each of them, so they cannot be random
    _reserved_ports: List[int] = [port_num for port_num in range(9091, 9101)]
    # For use as a context manager so only one thread can obtain/return a port
    # number at any one time
    _lock = Lock()

    @classmethod
    def _claim_port(cls, port_num: int = None) -> int:
        """
        Claim a port by number, or get the next available one; raises an error
        if we've run out of port numbers

        :param port_num: the port number to assign to the server; if not given,
        select the next available one from the list of reserved port numbers
        :return: an integer value between 9091 and 9100 (inclusive)
        :raises: ValueError if no port numbers are available
        """
        with cls._lock:
            if cls._reserved_ports:
                if port_num is not None:
                    cls._reserved_ports.remove(port_num)
                else:
                    port_num = cls._reserved_ports.pop(0)
                return port_num
            raise ValueError("All reserved port numbers are taken")

    @classmethod
    def available_ports(cls) -> List[int]:
        return cls._reserved_ports

    def __init__(self, taskdef: TaskDefinition, port_num: int = None):
        """
        Start a new Prodigy server with the task described in the definition

        :param taskdef: an object with all the attributes needed to start the
        process
        :param port_num: the port number for the task; if not specified, select
        the next available one
        """
        port_num = AnnotationTask._claim_port(port_num=port_num)
        self._task_def = taskdef
        self._annotators = set()
        self._server = ProdigyServer(port_num)

    def start(self, start_time: datetime = None, wait_time: int = 10):
        """
        :param start_time: date and time the task first started; will not be
        None if the task was recreated from the active_tasks table at startup
        :param wait_time: the number of seconds to wait to give the server a
        chance to initialize properly (default: 10)
        """
        self._server.start(
            self._task_def, start_time=start_time, wait_time=wait_time
        )

    @property
    def url(self) -> str:
        """
        Return the externally accessible address for the server handling the
        current task

        :return: a URL (string)
        """
        return self._server.url

    def is_running(self):
        return self._server.is_running()

    def add_annotator(self, annotator_name: str) -> None:
        self._annotators.add(annotator_name)

    def terminate(self):
        """
        Terminate the Prodigy server process; as a side effect, the port number
        assigned to the server is now available for another task
        """
        with self._lock:
            self._reserved_ports.append(self._server.port_num)
        self._server.terminate()

    def summary(self) -> Dict:
        """
        Return useful information about the task
        """
        return {
            "task": self._task_def,
            "url": self._server.url,
            "annotators": list(self._annotators),
            "started_at": self._server.start_time
        }


def run_server(command_args: str, port: int):
    prodigy.serve(command_args, port=port, host="0.0.0.0")
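For reference, here is a trimmed-down, standalone sketch of the command string that a convert_to_args-style helper produces for a simple task (the GCS download is left out, and the function below is an illustration of the pattern in our code, not the production method; the recipe and dataset names are just examples):

```python
# Simplified stand-in for TaskDefinition.convert_to_args: turn a task dict
# into the argument string handed to prodigy.serve.
def convert_to_args(task: dict) -> str:
    command_args = [task["recipe"], task["dataset"]]
    if "spacy_model" in task:
        # The spaCy model is inserted as the third positional argument
        command_args.insert(2, task["spacy_model"])
    if "labels" in task:
        command_args.append(f"--label {','.join(task['labels'])}")
    if "view_id" in task:
        command_args.append(f"--view-id {task['view_id']}")
    return " ".join(command_args)

args = convert_to_args({
    "recipe": "ner_task_v3_b_validation",
    "dataset": "my_dataset",
    "labels": ["PERSON", "ORG"],
})
print(args)  # ner_task_v3_b_validation my_dataset --label PERSON,ORG
```

Note that the resulting string starts with the recipe name, so prodigy.serve must already know that name from a registered recipe before the call is made.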
Ah! The reason that I import recipe is because this file is called recipe.py.
import prodigy

@prodigy.recipe(
    "my-custom-recipe",
    dataset=("Dataset to save answers to", "positional", None, str),
    view_id=("Annotation interface", "option", "v", str)
)
def my_custom_recipe(dataset, view_id="text"):
    # Load your own streams from anywhere you want
    stream = [{"text": f"omg {i}"} for i in range(1000)]

    def update(examples):
        # This function is triggered when Prodigy receives annotations
        print(f"Received {len(examples)} annotations!")

    return {
        "dataset": dataset,
        "view_id": view_id,
        "stream": stream,
        "update": update
    }
If the file with your custom recipes is called dinosaurhead.py, then you should be able to have this import statement in your server.py file:
import dinosaurhead
If you import using the file name, does that help?
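If a plain import by module name is awkward (for instance, because the recipe file lives outside the package), the same side effect can be triggered by executing the file by path with the standard library. A sketch, using a throwaway stand-in module written to disk instead of a real recipe file:

```python
import importlib.util
import os
import tempfile

# Create a tiny stand-in for a recipe file (illustration only); in practice
# this would be your real file with @prodigy.recipe decorators at top level.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("REGISTERED = ['my-custom-recipe']\n")
    path = f.name

# Load the file by path; exec_module runs its top-level code, which is the
# same side effect that lets @prodigy.recipe register the recipe by name.
spec = importlib.util.spec_from_file_location("custom_recipes", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
os.unlink(path)

print(module.REGISTERED)  # ['my-custom-recipe']
```

With a real recipe file loaded this way, a subsequent prodigy.serve call should then be able to find the recipe by name, just as it would after a regular import.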
We tried this latest approach, and it didn't work for us. So we're still on v1.13.3, because it is stable for us, even though it's quite far behind the current version.
We re-installed v1.13.3 (which is what's currently working on staging) and commented out the , recipes part from this line in server.py:
from annotator import GCS_FILES_FOLDER, GCS_ROOT, recipes
to
from annotator import GCS_FILES_FOLDER, GCS_ROOT
That produces the error ✘ Can't find recipe 'account-matching' and no list of available recipes. This seems to confirm that it is the global import of recipes that loads the available recipes. And in the error message that initially prompted our investigation, all our custom recipes are listed under Available recipes:. So the question remains: why did this approach work prior to v1.14.*? To which we now add another: why can v1.14.6 not load a recipe that is listed as available?