feed_overlap bug?

Hi there,

I'm wondering if the feed_overlap functionality was broken in a recent update or if I'm using it incorrectly. It seems like even with feed_overlap set to true, labeling tasks are only going to one session. I was seeing this with an html recipe, but I made this POC recipe using the classification view and it also seems broken:

import prodigy

@prodigy.recipe('Basic Classification')
def basic_classification():
    stream = [
        {
            'text': 'text1',
            'label': 'label1'
        },
        {
            'text': 'text2',
            'label': 'label2'
        }, {
            'text': 'text3',
            'label': 'label3'
        }
    ]

    return {
        "view_id": "classification",
        "dataset": 'basic_classification_dataset2',
        "stream": stream,
        "config": {
            'feed_overlap': True
        }
    }

(I also have "feed_overlap": true in my ~/.prodigy/prodigy.json just in case.) If I classify an annotation with session=austin, that question isn't showing up for session=bob. By extension, once I've answered all three annotations as austin, loading the page as bob gives me a "No tasks available" message.

Let me know if I'm doing something incorrectly!
Austin

Some more stats output, and verbose logging from Prodigy.

  ✨  Prodigy stats

Version          1.8.3
Location         /usr/local/miniconda3/envs/prodigy/lib/python3.7/site-packages/prodigy
Prodigy Home     /Users/austin/.prodigy
Platform         Darwin-17.7.0-x86_64-i386-64bit
Python Version   3.7.3
Database Name    PostgreSQL
Database Id      postgresql
Total Datasets   57
Total Sessions   240

16:35:12 - DB: Loading dataset 'basic_classification_dataset2' (3 examples)

  ✨  Dataset 'basic_classification_dataset2'

Dataset       basic_classification_dataset2
Created       2019-06-28 16:25:54
Description   None
Author        None
Annotations   3
Accept        2
Reject        0
Ignore        1

Logging: this is from after I’d already answered three questions as austin and had restarted Prodigy to answer them as bob. It seems like it’s filtering out the answered questions before the UI is even hit?

16:33:34 - APP: Using Hug endpoints (deprecated)
16:33:34 - RECIPE: Loading recipe from file recipes/basic_testing_recipes.py
16:33:34 - RECIPE: Calling recipe 'Basic Classification'
16:33:34 - CONTROLLER: Initialising from recipe
{'config': {'feed_overlap': True, 'dataset': 'basic_classification_dataset2', 'recipe_name': 'Basic Classification', 'db': 'postgresql', 'db_settings': {'postgresql': {'dbname': 'prodigy_dev', 'host': 'workflow-dev.private-dev', 'user': 'prodigy_dev', 'password': 'prodigy_dev'}}, 'custom_theme': {'bgButton': '#f3f3f3', 'ignore': '#F7DA4A'}}, 'dataset': 'basic_classification_dataset2', 'db': True, 'exclude': None, 'get_session_id': None, 'on_exit': None, 'on_load': None, 'progress': <prodigy.components.progress.ProgressEstimator object at 0x115210080>, 'self': <prodigy.core.Controller object at 0x1152100b8>, 'stream': [{'text': 'text1', 'label': 'label1'}, {'text': 'text2', 'label': 'label2'}, {'text': 'text3', 'label': 'label3'}], 'update': None, 'view_id': 'classification'}

16:33:34 - VALIDATE: Creating validator for view ID 'classification'
16:33:34 - DB: Initialising database PostgreSQL
16:33:35 - DB: Connecting to database PostgreSQL
16:33:35 - DB: Loading dataset 'basic_classification_dataset2' (3 examples)
16:33:35 - DB: Creating dataset '2019-06-28_16-33-34'
{'created': datetime.datetime(2019, 6, 28, 16, 25, 54)}

16:33:35 - DatasetFilter: Getting hashes for excluded examples
16:33:35 - DatasetFilter: Excluding 3 tasks from datasets: basic_classification_dataset2
16:33:35 - CONTROLLER: Initialising from recipe
{'batch_size': 10, 'db': None, 'filters': [<prodigy.components.feeds.DatasetFilter object at 0x1151f7e80>, <prodigy.components.feeds.RelatedSessionsFilter object at 0x1136dd898>], 'max_sessions': 10, 'self': <prodigy.components.feeds.SessionFeed object at 0x1136dd4e0>, 'stream': [{'text': 'text1', 'label': 'label1'}, {'text': 'text2', 'label': 'label2'}, {'text': 'text3', 'label': 'label3'}], 'validator': <prodigy.components.validate.Validator object at 0x115210278>, 'view_id': 'classification'}

16:33:35 - CORS: initialize wildcard "*" CORS origins

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

Task queue depth is 1
Task queue depth is 2
Task queue depth is 3
Task queue depth is 4
16:33:41 - GET: /project
{'feed_overlap': True, 'dataset': 'basic_classification_dataset2', 'recipe_name': 'Basic Classification', 'db': 'postgresql', 'custom_theme': {'bgButton': '#f3f3f3', 'ignore': '#F7DA4A'}, 'view_id': 'classification', 'batch_size': 10, 'version': '1.8.3'}

16:33:41 - GET: /project
{'feed_overlap': True, 'dataset': 'basic_classification_dataset2', 'recipe_name': 'Basic Classification', 'db': 'postgresql', 'custom_theme': {'bgButton': '#f3f3f3', 'ignore': '#F7DA4A'}, 'view_id': 'classification', 'batch_size': 10, 'version': '1.8.3'}

Task queue depth is 1
Task queue depth is 1
16:33:41 - POST: /get_session_questions
Task queue depth is 1
Task queue depth is 2
16:33:42 - FEED: Finding next batch of questions in stream
16:33:42 - CONTROLLER: Validating the first batch for session: basic_classification_dataset2-bob
16:33:42 - RESPONSE: /get_session_questions (0 examples)
{'tasks': [], 'total': 3, 'progress': 0.0, 'session_id': 'basic_classification_dataset2-bob'}
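
The "Excluding 3 tasks from datasets" line and the empty response for bob are consistent with answered tasks being excluded for the whole dataset rather than per session. A minimal sketch of that difference in plain Python follows. It is not Prodigy's internal code: the `ANSWERED` list and both helper functions are hypothetical stand-ins for illustration only.

```python
# Hypothetical stand-in for saved annotations: (session_id, task_text) pairs.
ANSWERED = [
    ("basic_classification_dataset2-austin", "text1"),
    ("basic_classification_dataset2-austin", "text2"),
    ("basic_classification_dataset2-austin", "text3"),
]

STREAM = ["text1", "text2", "text3"]


def exclude_by_dataset(stream, answered):
    """Drop every task that any session has already answered."""
    seen = {text for _, text in answered}
    return [task for task in stream if task not in seen]


def exclude_by_session(stream, answered, session_id):
    """Drop only the tasks that this particular session has answered."""
    seen = {text for sid, text in answered if sid == session_id}
    return [task for task in stream if task not in seen]


austin = "basic_classification_dataset2-austin"
bob = "basic_classification_dataset2-bob"

# Dataset-wide exclusion: bob gets nothing, which matches the report.
dataset_wide_for_bob = exclude_by_dataset(STREAM, ANSWERED)

# Per-session exclusion: bob still sees all three, austin sees none.
per_session_for_bob = exclude_by_session(STREAM, ANSWERED, bob)
per_session_for_austin = exclude_by_session(STREAM, ANSWERED, austin)

print(dataset_wide_for_bob, per_session_for_bob, per_session_for_austin)
```

With overlap enabled, the per-session behaviour is what the original post expected to see.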

Thanks for the report – I’ll take a look! Btw, what happens when you set "feed_overlap": False?

Hi Austin,

Thanks for the reproduction case! I was able to find your issue and come up with a test/fix. Can you verify it by trying this updated recipe and confirming if it works for you?

recipe.py

import prodigy


@prodigy.recipe("basic")
def basic_classification():
    DB = prodigy.components.db.connect()

    def remove_builtin_filter(controller: prodigy.core.Controller):
        # HACK: remove the builtin RelatedSessionsFilter because it's not serializable
        controller.config["feed_filters"] = controller.config["feed_filters"][:-1]

    class WorkaroundFilter(dict):
        def __init__(self):
            self.session_hashes = {}

        def prepare_batch_filter(self, session_id: str):
            if session_id is not None and session_id not in self.session_hashes:
                self.session_hashes[session_id] = DB.get_task_hashes([session_id])

        def filter_task(self, task, session_id: str):
            hashes = self.session_hashes.get(session_id, set())
            return task[prodigy.util.TASK_HASH_ATTR] in hashes

    stream = [
        {"text": "text1", "label": "label1"},
        {"text": "text2", "label": "label2"},
        {"text": "text3", "label": "label3"},
    ]
    return {
        "on_load": remove_builtin_filter,
        "view_id": "classification",
        "dataset": "basic_classification_dataset7",
        "stream": stream,
        "config": {"feed_filters": [WorkaroundFilter()]},
    }

You should be fine to use the WorkaroundFilter until the next release.
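
For anyone wondering why WorkaroundFilter subclasses dict: the HACK comment says the builtin filter is not serializable, so presumably anything placed in the config has to survive JSON encoding. Here is a small sketch of that idea using only the standard library `json` module (not Prodigy or Hug). `PlainFilter`, `DictFilter` and `try_dump` are hypothetical names for illustration.

```python
import json


class PlainFilter:
    """Hypothetical filter written as an ordinary object."""

    def __init__(self):
        self.session_hashes = {}


class DictFilter(dict):
    """Hypothetical filter that subclasses dict, like WorkaroundFilter."""

    def __init__(self):
        super().__init__()
        # Stored as an instance attribute, not as a dict item,
        # so it does not show up in the JSON output.
        self.session_hashes = {}


def try_dump(config):
    """Return the JSON string, or the error message if encoding fails."""
    try:
        return json.dumps(config)
    except TypeError as err:
        return "TypeError: " + str(err)


plain_result = try_dump({"feed_filters": [PlainFilter()]})
dict_result = try_dump({"feed_filters": [DictFilter()]})

print(plain_result)
print(dict_result)
```

The ordinary object fails to encode, while the dict subclass is encoded as an empty JSON object, so a config containing it can still be sent to the front end.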


Hi,
I am having the same issue. I tried the code above and ran the basic task with:
prodigy basic -F recipe.py

However, when I go to localhost:8080, the UI displays “Oops, something went wrong :(”

and the logs show:
(base) ronalds-mbp:public-version-test u0115286$ prodigy basic -F recipe.py
Added dataset basic_classification_dataset7 to database SQLite.

✨ Starting the web server at http://localhost:8080
Open the app in your browser and start annotating!

Exception when serving /project/default
Traceback (most recent call last):
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/waitress/channel.py", line 336, in service
    task.service()
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 175, in service
    self.execute()
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/waitress/task.py", line 452, in execute
    app_iter = self.channel.server.application(env, start_response)
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/api.py", line 451, in api_auto_instantiate
    return module.__hug_wsgi__(*args, **kwargs)
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/falcon/api.py", line 244, in __call__
    responder(req, resp, **params)
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/interface.py", line 789, in __call__
    raise exception
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/interface.py", line 762, in __call__
    self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/interface.py", line 709, in render_content
    content = self.outputs(content, **self._arguments(self._params_for_outputs, request, response))
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/output_format.py", line 129, in json
    return json_converter.dumps(content, default=_json_converter, ensure_ascii=ensure_ascii, **kwargs).encode('utf8')
  File "/Users/u0115286/anaconda3/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/Users/u0115286/anaconda3/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/u0115286/anaconda3/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/u0115286/anaconda3/lib/python3.6/site-packages/hug/output_format.py", line 80, in _json_converter
    raise TypeError("Type not serializable")
TypeError: Type not serializable

ronalds-mbp:public-version-test u0115286$ prodigy stats

✨ Prodigy stats

Version 1.8.3
Location /Users/u0115286/anaconda3/lib/python3.6/site-packages/prodigy
Prodigy Home /Users/u0115286/.prodigy
Platform Darwin-18.6.0-x86_64-i386-64bit
Python Version 3.6.0
Database Name SQLite
Database Id sqlite
Total Datasets 2
Total Sessions 35


Thanks for trying it out! I edited the example to resolve the issue you reported. Sadly, it’s a bit tricky to disable the builtin feed filters without a new version of the library, so I got it wrong the first time. Oops!

Try out the updated snippet and let me know if it still doesn’t work for you.


@ines I did try that… I was hoping it was just a boolean bug of some sort, but no such luck :(

@justindujardin Thank you for that! That worked perfectly, and I was able to incorporate the same changes into my HTML recipe successfully.


@justindujardin The edit you made to the original fixed the issue. Thank you so much!
