How can I override Controller.get_questions

thibault · October 2, 2021, 8:28am

Hi,
In the doc it mentions that we can override the Controller.

I found this in the code (in __main__.py):

recipe = get_recipe(command)
controller = recipe(*args, use_plac=True)

I couldn't find the implementation of the @recipe decorator.

My question is: What is the best way to override the get_questions method?

Or if it's easier: How can I create a Controller that would behave similarly to the one created via the @recipe decorator?

I'd like to leverage the syntax and docs from @recipe and only override what I need (in this instance, the get_questions method).

And the follow-up question: Once I have a Controller created, how can I easily link it with the prodigy binary? Is calling set_controller from app.py the correct way? Or is there a cleaner way?

Thank you for your help
Thibault

ines · October 4, 2021, 9:05am

Hi! Could you provide some more details on what exactly you want to achieve by overriding the get_questions method? I wouldn't necessarily recommend this for most use cases because it's fairly complex and you'd have to make sure you handle all possible scenarios correctly and manage the already annotated examples that should be excluded etc.

That said, a recipe can also return an instance of the Controller instead of a dictionary of components. So this would be the correct and most elegant way to do it: construct the controller in your recipe and return it.

thibault · October 10, 2021, 3:04am

Hi,
For my use case, I'd like to implement my own logic for deciding whether a user should get more questions (basically, I'd like each user to get exactly N questions).

So I could simply reuse any existing controller and, with some custom logic, decide whether to return the result of Prodigy's get_questions or nothing if I know the user (identified by session_id) has already processed N questions.

At the moment I'm thinking of overriding __main__.py to use the controller and then monkey-patching the get_questions method to add my own logic, but that sounds super ugly...

Could you show me an example of a recipe that creates its own controller? Ideally reusing the logic behind @recipe? As I said, I'm happy with all the heavy lifting handled by Prodigy. I just want to implement my own "job dispatching" (get_questions) logic.

Hope that clarifies!

ines · October 11, 2021, 10:51am

Ah, so if you already have a monkey-patched solution that works, then this should be pretty straightforward to move to the custom recipe. Basically, all you have to do is construct the Controller in the recipe with the given arguments, instead of returning just a dictionary. For example:

@recipe("foo")
def some_function(dataset):
    ...
    return {"dataset": dataset, "stream": stream}  # etc.

@recipe("foo")
def some_function(dataset):
    ...
    # define whatever you need and set the rest to None
    ctrl = Controller(
        dataset, view_id, stream, update, store, progress, on_load, 
        on_exit, before_db, get_session_id, exclude, config, None)
    # monkey-patch your controller here
    return ctrl

thibault · October 12, 2021, 10:39am

Thanks for the reply.

I'm trying to test it with the ner.manual recipe for now without any changes (except I want to return the Controller instead of a dict)

This is the relevant part:

@recipe("custom_recipe",
....)
def manual(...):
    ...
    view_id = "ner_manual"
    update = None
    db = None
    progress = None
    on_load = None
    on_exit = None
    before_db = None
    get_session_id = None
    config = {
        "lang": nlp.lang,
        "labels": labels,
        "exclude_by": "input",
        "ner_manual_highlight_chars": highlight_chars,
        "auto_count_stream": True,
    }
    ctrl = Controller(
        dataset, view_id, stream, update, db, progress, on_load,
        on_exit, before_db, get_session_id, exclude, config, None)

    return ctrl

When I try to run it, I get the following error:

 prodigy custom_recipe test1 blank:en ./news_headlines_short.jsonl --label PERSON,ORG,PRODUCT,LOCATION -F ./recipe.py 
Using 4 label(s): PERSON, ORG, PRODUCT, LOCATION
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/data/Project/hexagone/prodigy/./recipe.py", line 97, in manual
    ctrl = Controller(
  File "cython_src/prodigy/core.pyx", line 45, in prodigy.core.Controller.__init__
TypeError: __init__() takes exactly 15 positional arguments (14 given)

Could you tell me, what is the missing argument? I've checked the documentation and it only shows 13...
This is the version of prodigy I'm using: prodigy-1.11.4-cp39-cp39-linux_x86_64.whl

And another question: for all the arguments I'm passing as None, is Prodigy using the default implementation or disabling the method? For example, if get_session_id is None, does it mean I won't be able to use session IDs?

Thanks
Thibault

ines · October 14, 2021, 11:06am

Ah, sorry, looks like we forgot to update that. Just pushed an update to the site – the expected signature is this:

controller = Controller(dataset, view_id, stream, update, db,
                        progress, on_load, on_exit, before_db,
                        validate_answer, get_session_id, exclude,
                        config, None)

No, the arguments here correspond to what you would (or wouldn't) return from your recipe as a dictionary. So passing in None for update, before_db or get_session_id is the equivalent of not returning this component from your recipe. Prodigy will then fall back to the default.
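To make the mapping between a recipe's dictionary and the positional arguments concrete, here is a minimal sketch. The helper name controller_args is hypothetical and not part of Prodigy; the argument order is copied from the signature quoted above, and the final argument is simply passed as None, as in that signature.

```python
from typing import Any, Dict, Tuple

# Positional order as quoted in the signature above (Prodigy 1.11.x).
CONTROLLER_ARG_ORDER = (
    "dataset", "view_id", "stream", "update", "db",
    "progress", "on_load", "on_exit", "before_db",
    "validate_answer", "get_session_id", "exclude", "config",
)


def controller_args(components: Dict[str, Any]) -> Tuple[Any, ...]:
    """Hypothetical helper: turn a recipe-style dict into positional args.

    Components missing from the dict become None, which is the equivalent
    of not returning them from the recipe.
    """
    unknown = set(components) - set(CONTROLLER_ARG_ORDER)
    if unknown:
        raise KeyError(f"Unknown component(s): {sorted(unknown)}")
    args = tuple(components.get(name) for name in CONTROLLER_ARG_ORDER)
    # The quoted signature ends with one extra argument, passed as None.
    return args + (None,)
```

Inside a recipe, this could be used as Controller(*controller_args({"dataset": dataset, "view_id": view_id, "stream": stream})). That gives 14 arguments plus self, which matches the 15 positional arguments mentioned in the error message.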

thibault · October 15, 2021, 11:31am

Thank you for the updates! It does work like I was expecting!

Just to understand get_questions: it mostly calls the stream iterator batch_size times (in the case of feed_overlap=True) and skips all the annotations already done, using the exclude parameter.

This raises 2 questions:

1. Are the hashes for a given question the same for all sessions? (The answer seems to be yes, but I wanted to confirm.)
2. If I override the get_questions method, do I get any benefit from defining a stream plus extra logic in get_questions? Or could I just override the get_questions method to retrieve my data and forget about the stream? I wonder if this stream is used in other places... As it seems to be just an iterator that could be calling any API, I'd be tempted to say that get_questions is the only place calling it. Is that a correct assumption?

Thanks for the quick replies
Thibault

ines · October 18, 2021, 9:26am

By default, yes.

In theory, this is really the only relevant method that's called from the API, so if you override that, it should work. The only thing that might be off is the progress, so you probably want to provide a custom progress function as well.
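As a rough illustration of what such a progress function could look like, here is a minimal sketch. It assumes that the total number of examples is known up front and that the progress callable is invoked with the controller and the return value of update; the ProgressTracker class and its method names are hypothetical.

```python
from typing import Any, List, Optional, Set


class ProgressTracker:
    """Tracks answered examples against a known total (illustrative only)."""

    def __init__(self, total: int) -> None:
        if total <= 0:
            raise ValueError("total must be positive")
        self.total = total
        self.seen_hashes: Set[int] = set()

    def update(self, answers: List[dict]) -> None:
        # Count each task once, even if it is submitted again after an undo.
        for eg in answers:
            self.seen_hashes.add(eg["_task_hash"])

    def progress(self, ctrl: Any = None,
                 update_return_value: Optional[Any] = None) -> float:
        # Assumed call convention: (controller, return value of update).
        return min(len(self.seen_hashes) / self.total, 1.0)
```

The idea would be to pass tracker.update as the update callback and tracker.progress as the progress callback, so the progress no longer depends on the stream.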

thibault · October 24, 2021, 2:01am

Thank you for the answer.
I've been trying to remove the stream and here is what I've found so far:

If I specify stream=None, I get:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 331, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "/home/thibanir/.virtualenvs/prodigy/lib/python3.9/site-packages/plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/data/Project/hexagone/prodigy/./recipe.py", line 150, in manual
    ctrl = CustomController(
  File "cython_src/prodigy/core.pyx", line 126, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 168, in prodigy.components.feeds.Feed.__init__
  File "cython_src/prodigy/components/stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
  File "cython_src/prodigy/components/stream.pyx", line 57, in prodigy.components.stream.validate_stream
TypeError: 'NoneType' object is not iterable

If I try with stream=[], I get:

✘ Error while validating stream: no first example
This likely means that your stream is empty.This can also mean all the examples
in your stream have been annotated in datasets included in your --exclude recipe
parameter.

If I pass stream=[1], it works

However, I was wondering how I can implement the custom progress function you mentioned.
I tried to figure out the parameters of the progress callable, and the signature seems to be progress(self, update_return_value=None).

How can I get the session_id injected here? I'm opening the app at http://localhost:8080/?session=tutu.

If I pass this method for progress:

def on_progress(*args, **kwargs):
    print(args[0].session_id)
    print(args[0].all_session_ids)
    return None

It will print:

2021-10-24_12-53-10
()

While I would have hoped to see tutu...

Thanks for your help
Thibault

ines · October 28, 2021, 8:30am

Ah, so you'd definitely need to provide the stream of examples on initialization – but you can use your custom get_questions method to implement your own logic for how to fetch examples from the stream. I guess if you're doing everything fully custom in get_questions, you could also pass in a dummy stream with one dummy example to bypass the validation.

The get_questions method has the following signature (also see the API docs here):

def get_questions(
    self, session_id: Optional[str] = None, excludes: Optional[Iterable[int]] = None
) -> StreamType:

The session_id here will be the session making the requests, e.g. for ?session=tutu, the method will receive "tutu".
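For the "exactly N questions per user" use case, the dispatching logic could look roughly like the sketch below. It is plain Python and independent of Prodigy: the QuotaDispatcher class is hypothetical, it assumes the tasks are available as a list with hashes already set, and it counts tasks sent to a session rather than tasks answered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class QuotaDispatcher:
    """Hands out at most `limit` tasks per session (illustrative only)."""

    def __init__(self, tasks: List[dict], limit: int, batch_size: int = 1) -> None:
        self.tasks = tasks
        self.limit = limit
        self.batch_size = batch_size
        self.sent: Dict[str, int] = defaultdict(int)    # tasks sent per session
        self.cursor: Dict[str, int] = defaultdict(int)  # list position per session

    def get_questions(self, session_id: Optional[str] = None,
                      excludes: Optional[Iterable[int]] = None) -> List[dict]:
        key = session_id or "default"
        excluded = set(excludes or ())
        batch: List[dict] = []
        while (len(batch) < self.batch_size
               and self.sent[key] < self.limit
               and self.cursor[key] < len(self.tasks)):
            task = self.tasks[self.cursor[key]]
            self.cursor[key] += 1
            if task.get("_task_hash") in excluded:
                continue  # skipped tasks do not count against the quota
            batch.append(task)
            self.sent[key] += 1
        return batch
```

A Controller subclass could then delegate from its own get_questions(self, session_id=None, excludes=None) to an instance of this class and return the resulting list.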

Yes, you can also see an example here: Custom Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

When you tested your progress function, did you already make a request and annotate examples using the tutu session? One thing to keep in mind is that the controller will only know about the tutu session once it has made a request, so when the progress first runs on startup, it won't yet know about tutu.

thibault · October 31, 2021, 1:11am

So, I've been able to make quite some progress thanks to your explanations...Thanks a lot for that. I'm now facing a strange behavior which I'm not sure how to solve and I wonder if it's a bug related to hijacking the stream to my own needs.

TLDR:

How can I make sure that /give_answers is called after each submission and not every 3 times?
Why is /give_answers sending only one answer when 3 have been done on the ui side?
Am I supposed to override other methods? And if so, which ones?

Here is my simple recipe:

from prodigy.core import recipe, Controller
from prodigy.types import RecipeSettingsType
from prodigy import set_hashes

CACHE = [
    {"audio": "http://localhost:8000/1.mp3", "pk": 1, "_view_id": "audio_manual"},
    {"audio": "http://localhost:8000/2.mp3", "pk": 2, "_view_id": "audio_manual"},
    {"audio": "http://localhost:8000/3.mp3", "pk": 3, "_view_id": "audio_manual"},
    {"audio": "http://localhost:8000/4.mp3", "pk": 4, "_view_id": "audio_manual"},
]

counter = 0


def update(examples):
    print("Update: %s" % examples)
    return examples

def before_db(examples):
    print("before_db: %s" % examples)
    return examples

def validate_answer(eg):
    print("Validating: %s" % eg)
    return True

class CustomController(Controller):
    def __init__(self):

        dataset = "1"
        view_id = "audio_manual"
        stream = [None]
        db = None
        progress = None
        on_load = None
        on_exit = None
        get_session_id = None
        exclude = None
        label = ["LBL 1", "LBL 2"]
        config = {
            "labels": label,
            "audio_autoplay": False,
            "auto_count_stream": True,
            "batch_size": 1,
            "feed_overlap": True
        }


        super().__init__(dataset, view_id, stream, update, db, progress, on_load,
        on_exit, before_db, validate_answer, get_session_id, exclude, config, None)

    def receive_answers(self, tasks, **kwargs):
        print("Tasks: %s" % tasks)
        print("kwargs: %s" % kwargs)
        return super().receive_answers(tasks, **kwargs)

    def get_questions(self, session_id=None, excludes=None):
        global CACHE
        global counter
        if counter >= len(CACHE):
            return []
        question = set_hashes(CACHE[counter])
        questions = [
            question
        ]
        counter += 1
        return questions

@recipe(
    "annotate",
)
def annotate() -> RecipeSettingsType:
    ctrl = CustomController()

    return ctrl

Highlights of this code: get_questions returns the 4 audio files one by one (batch_size=1), and the stream contains only one dummy value.

When starting the server with prodigy annotate -F ./recipe2.py, the server starts without problems and shows me the 4 audio files as I annotate them.

My issue is that /give_answers is called only after 3 samples have been annotated, and not after each one as I would expect with batch_size = 1.

And when /give_answers is called, it only provides one answer.

This is the payload sent the first time /give_answers is called (pk=1):

{"answers":[{"audio":"http://localhost:8000/1.mp3","pk":1,"_view_id":"audio_manual","_input_hash":900824920,"_task_hash":-615701225,"audio_spans":[{"start":5.324999664616006,"end":11.874999252077949,"label":"LBL 1","id":"a691f77f-b5e1-44e3-827d-4e2163e7a75a","color":"rgba(255,215,0,0.2)"},{"start":18.724998820645016,"end":26.97499830103601,"label":"LBL 1","id":"dcf7db7f-8237-423a-b899-4bf93079bb28","color":"rgba(255,215,0,0.2)"}],"answer":"accept","_timestamp":1635640274}],"session_id":"no_dataset-e8f4329571d74d5bb52de68b953ee3a5","annotator_id":"no_dataset-e8f4329571d74d5bb52de68b953ee3a5"}

This is the payload sent the second time (when there are no samples left) and we can see it's pk=2 :

{"answers":[{"audio":"http://localhost:8000/2.mp3","pk":2,"_view_id":"audio_manual","_input_hash":-356468264,"_task_hash":-1895313176,"audio_spans":[{"start":4.574999711853188,"end":11.574999270972821,"label":"LBL 1","id":"67787522-c126-4190-bdc2-a222d2bcd908","color":"rgba(255,215,0,0.2)"},{"start":21.374998653740306,"end":27.62499826009712,"label":"LBL 1","id":"a9161c51-1d97-429b-bc8e-5cab02e30dd3","color":"rgba(255,215,0,0.2)"}],"answer":"accept","_timestamp":1635640284}],"session_id":"no_dataset-e8f4329571d74d5bb52de68b953ee3a5","annotator_id":"no_dataset-e8f4329571d74d5bb52de68b953ee3a5"}

Here are the logs as well, where we can see that validate_answer is called each time, but before_db and update are only called after the third annotation:

prodigy annotate -F ./recipe2.py 

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Validating: {'audio': 'http://localhost:8000/1.mp3', 'pk': 1, '_view_id': 'audio_manual', '_input_hash': 900824920, '_task_hash': -615701225, 'audio_spans': [{'start': 5.324999664616006, 'end': 11.874999252077949, 'label': 'LBL 1', 'id': 'a691f77f-b5e1-44e3-827d-4e2163e7a75a', 'color': 'rgba(255,215,0,0.2)'}, {'start': 18.724998820645016, 'end': 26.97499830103601, 'label': 'LBL 1', 'id': 'dcf7db7f-8237-423a-b899-4bf93079bb28', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept'}
Validating: {'audio': 'http://localhost:8000/2.mp3', 'pk': 2, '_view_id': 'audio_manual', '_input_hash': -356468264, '_task_hash': -1895313176, 'audio_spans': [{'start': 4.574999711853188, 'end': 11.574999270972821, 'label': 'LBL 1', 'id': '67787522-c126-4190-bdc2-a222d2bcd908', 'color': 'rgba(255,215,0,0.2)'}, {'start': 21.374998653740306, 'end': 27.62499826009712, 'label': 'LBL 1', 'id': 'a9161c51-1d97-429b-bc8e-5cab02e30dd3', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept'}
Validating: {'audio': 'http://localhost:8000/3.mp3', 'pk': 3, '_view_id': 'audio_manual', '_input_hash': 1255985390, '_task_hash': 1655745300, 'audio_spans': [{'start': 4.774999699256606, 'end': 9.974999371745476, 'label': 'LBL 1', 'id': '515e34cd-020b-42ec-b5ad-95b41e0797f3', 'color': 'rgba(255,215,0,0.2)'}, {'start': 20.724998694679197, 'end': 24.62499844904585, 'label': 'LBL 1', 'id': '3c09e38c-370b-492d-bf13-8463684da25a', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept'}
Tasks: [{'audio': 'http://localhost:8000/1.mp3', 'pk': 1, '_view_id': 'audio_manual', '_input_hash': 900824920, '_task_hash': -615701225, 'audio_spans': [{'start': 5.324999664616006, 'end': 11.874999252077949, 'label': 'LBL 1', 'id': 'a691f77f-b5e1-44e3-827d-4e2163e7a75a', 'color': 'rgba(255,215,0,0.2)'}, {'start': 18.724998820645016, 'end': 26.97499830103601, 'label': 'LBL 1', 'id': 'dcf7db7f-8237-423a-b899-4bf93079bb28', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640274}]
kwargs: {'session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', 'annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}
Update: [{'audio': 'http://localhost:8000/1.mp3', 'pk': 1, '_view_id': 'audio_manual', '_input_hash': 900824920, '_task_hash': -615701225, 'audio_spans': [{'start': 5.324999664616006, 'end': 11.874999252077949, 'label': 'LBL 1', 'id': 'a691f77f-b5e1-44e3-827d-4e2163e7a75a', 'color': 'rgba(255,215,0,0.2)'}, {'start': 18.724998820645016, 'end': 26.97499830103601, 'label': 'LBL 1', 'id': 'dcf7db7f-8237-423a-b899-4bf93079bb28', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640274, '_annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', '_session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}]
before_db: [{'audio': 'http://localhost:8000/1.mp3', 'pk': 1, '_view_id': 'audio_manual', '_input_hash': 900824920, '_task_hash': -615701225, 'audio_spans': [{'start': 5.324999664616006, 'end': 11.874999252077949, 'label': 'LBL 1', 'id': 'a691f77f-b5e1-44e3-827d-4e2163e7a75a', 'color': 'rgba(255,215,0,0.2)'}, {'start': 18.724998820645016, 'end': 26.97499830103601, 'label': 'LBL 1', 'id': 'dcf7db7f-8237-423a-b899-4bf93079bb28', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640274, '_annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', '_session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}]
Validating: {'audio': 'http://localhost:8000/4.mp3', 'pk': 4, '_view_id': 'audio_manual', '_input_hash': -887226071, '_task_hash': -695329649, 'audio_spans': [{'start': 8.224999481965568, 'end': 16.57499895605827, 'label': 'LBL 1', 'id': '4f7d6ab1-eb26-402a-b1b3-d026adcc8437', 'color': 'rgba(255,215,0,0.2)'}, {'start': 26.074998357720627, 'end': 30.924998052253514, 'label': 'LBL 1', 'id': '1bc3ae77-c721-4a7e-aecb-8ea612b31561', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept'}
Tasks: [{'audio': 'http://localhost:8000/2.mp3', 'pk': 2, '_view_id': 'audio_manual', '_input_hash': -356468264, '_task_hash': -1895313176, 'audio_spans': [{'start': 4.574999711853188, 'end': 11.574999270972821, 'label': 'LBL 1', 'id': '67787522-c126-4190-bdc2-a222d2bcd908', 'color': 'rgba(255,215,0,0.2)'}, {'start': 21.374998653740306, 'end': 27.62499826009712, 'label': 'LBL 1', 'id': 'a9161c51-1d97-429b-bc8e-5cab02e30dd3', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640284}]
kwargs: {'session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', 'annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}
Update: [{'audio': 'http://localhost:8000/2.mp3', 'pk': 2, '_view_id': 'audio_manual', '_input_hash': -356468264, '_task_hash': -1895313176, 'audio_spans': [{'start': 4.574999711853188, 'end': 11.574999270972821, 'label': 'LBL 1', 'id': '67787522-c126-4190-bdc2-a222d2bcd908', 'color': 'rgba(255,215,0,0.2)'}, {'start': 21.374998653740306, 'end': 27.62499826009712, 'label': 'LBL 1', 'id': 'a9161c51-1d97-429b-bc8e-5cab02e30dd3', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640284, '_annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', '_session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}]
before_db: [{'audio': 'http://localhost:8000/2.mp3', 'pk': 2, '_view_id': 'audio_manual', '_input_hash': -356468264, '_task_hash': -1895313176, 'audio_spans': [{'start': 4.574999711853188, 'end': 11.574999270972821, 'label': 'LBL 1', 'id': '67787522-c126-4190-bdc2-a222d2bcd908', 'color': 'rgba(255,215,0,0.2)'}, {'start': 21.374998653740306, 'end': 27.62499826009712, 'label': 'LBL 1', 'id': 'a9161c51-1d97-429b-bc8e-5cab02e30dd3', 'color': 'rgba(255,215,0,0.2)'}], 'answer': 'accept', '_timestamp': 1635640284, '_annotator_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5', '_session_id': 'no_dataset-e8f4329571d74d5bb52de68b953ee3a5'}]

I guess my questions are:

How can I make sure that /give_answers is called after each submission and not every 3 times?
Why is /give_answers sending only one answer when 3 have been done on the UI side?
Am I supposed to override other methods? And if so, which ones?

ines · October 31, 2021, 12:09pm

With the default configuration, it's expected that there's a delay in the first example being sent back to the server, because you have to factor in the example(s) that's still in the history in the app and the example that's currently being annotated, as well as the fact that Prodigy will always request the next batch in the background so you never run out of questions. The answers sent back to the server correspond to the batch size, which in your case is 1.

So with a batch_size of 1, you start annotating and you'll see example 1 and Prodigy will ask for the next batch (example 2). You annotate example 1 and it stays in the history so you can undo if you need to, while you see example 2 for annotation and Prodigy queues up example 3 in the background. You annotate example 2, it goes into the history, example 1 is "outboxed" to be sent back with the next request to /give_answers, example 3 is shown for annotation, and so on.

If you set "instant_submit": true in your config, the history part will be skipped and each example you annotate will be sent back instantly. However, you still want to prevent running out of questions to annotate, so it makes sense to design your stream to expect that there's always at least one example "in transit" in the background and being queued up while you still annotate.
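As a sketch, the config dictionary from the recipe above would then look like this. Only the instant_submit entry is new; note that inside a Python recipe the value is True, while in a JSON config file it would be written as true.

```python
# Config from the recipe above, with instant_submit enabled.
config = {
    "labels": ["LBL 1", "LBL 2"],
    "audio_autoplay": False,
    "auto_count_stream": True,
    "batch_size": 1,
    "feed_overlap": True,
    "instant_submit": True,  # send each answer back as soon as it is submitted
}
```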

thibault · November 1, 2021, 12:01pm

Awesome, the instant_submit: true does the job!

Are you sure we need to have one sample in transit? It seems that if I send questions one by one with instant_submit: true, I get get_questions called, I submit an answer, it calls /give_answers and after that it calls get_question...Or could it be that because I'm testing in localhost there is some network "magic" that works out well for me?

Not fully related to my previous questions:

Is there a way to get all the parameters that config accept? ìnstant_submitfor example only appears in the changelog...
In the documentation, receive_answers is supposed to only have answers and session_id as a parameter but I also see a annotator_id...which seems to be the same as session_id...Which one should I use to identify the annotator?
I tried to use the prodigyend listener to redirect to another page when the annotation is done (using window.href and I get a popup saying This page is asking you to confirm that you want to leave — information you’ve entered may not be saved....Adding a setTimeout(() => { window.href = "" }, 3000); solves the issue..but i wonder if there is a cleaner way to wait for the page to be fully done?

Thanks again for the quick answers (I think those might be the last questions )

ines · November 4, 2021, 10:35am

If it works like this, that's definitely good. More generally, I think it can always be possible that there's an example in transit, and it'll depend on the data and how long it takes to queue up the next example from the stream. There may be scenarios where you may end up with the next example being queued up before the previous one is received or vice versa – depending on how fast you annotate, how large the data is and how long your stream takes (if you do any slow processing within the generator).

You can find an overview of all general-purpose config parameters here: Installation & Setup · Prodigy · An annotation tool for AI, Machine Learning & NLP

Individual interfaces may also take specific settings to customise them, and those are documented with the interfaces. For example, here are config settings for ner_manual: Annotation interfaces · Prodigy · An annotation tool for AI, Machine Learning & NLP

This typically happens when there's still unsaved state in the app. Since you're using instant_submit, you won't end up with actual unsaved state (because every annotation is saved immediately), so I'm guessing that you're simply hitting a race condition where prodigyend fires while the last example is still being submitted. So your workaround seems like an okay solution for this.

(Also, this reminds me, we should provide a window.prodigy.save function for use cases without instant_submit, so you can ensure that unsaved state is submitted before redirecting etc. This won't make a difference for your case, but it'll be nice to have for others.)

thibault · November 9, 2021, 12:59pm

Thanks for the clarification.

It seems that prodigyend is fired as soon as any annotation has been done...Could you tell me how you determine that prodigyend needs to fire? Is it relying on the progress or something else?

If I inject the following js:

document.addEventListener('prodigyend', event => {
    console.log("Prodigyend triggered");
})
document.addEventListener('prodigyanswer', event => {
    console.log("Prodigyanswer triggered");
})

We get the following log every time I submit an answer:

I would expect prodigyend to fire only when get_session_questions returns []... Or am I missing something? (In this example I had 3 different samples and it was firing every time.)

And not exactly related, but is it possible to have a loading bar or equivalent while waiting for the audio to load? I'm working with 30-minute-long audio files and they take a while to load, giving the feeling that the UI is stuck (and this happens even by default, when I don't try to change the way questions are delivered).

Thanks for the help

ines · November 10, 2021, 9:37am

Ah, I wonder if you're hitting an interesting edge case here because you're using instant_submit and are only ever dealing with one annotation at a time. Normally, prodigyend will fire when there are no more examples left on the queue, no current task to display and when Prodigy isn't loading and hasn't errored either. This is typically the same state in which you'd see "No tasks available."

However, now that I think about it, this condition may also be true temporarily in your scenario right after you submit an answer. This is tricky, I need to think about a workaround for cases like this. In the meantime, if you know how many questions you're expecting, you could just keep a counter in JavaScript and only redirect once you hit prodigyend or prodigyanswer and X unique hashes have been answered.

I'll look into whether we can expose this somehow!

The audio player is implemented using wavesurfer and the Wavesurfer instance is exposed as window.wavesurfer. So you can interact with it from JavaScript or within the JS console. So you could listen to the loading event or similar and at least log to the console every N seconds while it's still loading.

thibault · November 10, 2021, 9:53am

Thanks for the answer. I've indeed started to monitor the prodigyanswer event and check if there are any questions left to do the redirect!

Regarding prodigyend, wouldn't it make more sense to use all the conditions above plus get_session_questions returning no results?

provRaminHamediZavie · October 26, 2022, 12:44am

Would be really useful to have this feature, I asked a question in this regard here:
