Love this reply. Thank you for the detail. Was enough for me to get what I need working.
I thought writing up a custom recipe was going to be super fiddly / hard, but with the example overlap it wasn't actually too bad
I'll probably write up the things I learnt as a blog post, if only for future me to remember what I did.
The only thing I haven't been able to get working is the exclude_by parameter in the config. I set it to "input" instead of "task". You can see in the following code that I've temporarily disabled the get_random_stream functionality in order to debug.
I keep getting the same images that I've already annotated, even though, when I export the annotations to check, the hashes are identical for both input and task.
Any idea what I'm doing wrong there?
import prodigy
import random
from prodigy.components.loaders import Images

LABEL = "some_label"


def before_db(examples):
    # Store the file path instead of the base64-encoded image data
    for eg in examples:
        if eg["image"].startswith("data:") and "path" in eg:
            eg["image"] = eg["path"]
    return examples


@prodigy.recipe("classify-images")
def classify_images(dataset, source):
    # Temporarily disabled while debugging exclude_by
    # def get_random_stream():
    #     stream = Images(source)
    #     for eg in stream:
    #         if random.random() < 0.05:  # or whatever
    #             yield eg

    def get_stream():
        # stream = get_random_stream()
        stream = Images(source)
        for eg in stream:
            eg["label"] = LABEL
            yield eg

    return {
        "dataset": dataset,
        "stream": get_stream(),
        "view_id": "classification",
        "before_db": before_db,
        "config": {
            "choice_style": "single",
            "exclude_by": "input"
        }
    }
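To make clear what I'm expecting from exclude_by, here's a minimal sketch in plain Python. It is not Prodigy's actual implementation: input_hash, task_hash and exclude_seen are hypothetical stand-ins I made up for illustration, and I'm assuming the input hash depends only on the image while the task hash also depends on the label.

```python
import hashlib
import json


def input_hash(eg):
    # Hypothetical stand-in for an input hash: depends only on the image
    return hashlib.sha256(eg["image"].encode("utf8")).hexdigest()


def task_hash(eg):
    # Hypothetical stand-in for a task hash: depends on image and label
    payload = json.dumps(
        {"image": eg["image"], "label": eg.get("label")}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf8")).hexdigest()


def exclude_seen(stream, seen, hash_fn):
    # Yield only the examples whose hash is not among the already-annotated ones
    for eg in stream:
        if hash_fn(eg) not in seen:
            yield eg


annotated = [{"image": "a.jpg", "label": "some_label"}]
incoming = [
    {"image": "a.jpg", "label": "other_label"},
    {"image": "b.jpg", "label": "some_label"},
]

# Excluding by input: a.jpg is skipped even though its label differs
by_input = list(
    exclude_seen(incoming, {input_hash(eg) for eg in annotated}, input_hash)
)

# Excluding by task: a.jpg with a different label still comes through
by_task = list(
    exclude_seen(incoming, {task_hash(eg) for eg in annotated}, task_hash)
)
```

In my recipe the label never changes, so I'd expect either setting to skip the images I've already annotated, which is why the repeats confuse me.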