Multi-user sessions using a custom transformer-based span-cat pipeline

My team is trying to use the spans.correct recipe with a transformer-based pipeline and multi-user sessions (feed_overlap set to true). However, after starting the server with the following command:

prodigy spans.correct cer_2.0_sent_spans model-best data/cer_sentences_round_2.jsonl --update --label BODYPART,PSYCHOLOGICALCONDITION,DATETIME,VITAL_MEASUREMENT,VITAL_SIGN,RELATIONSHIPSTATUS,DOSAGE_STRENGTH,ADMISSIONDISCHARGE,DISEASE,AGE,DRUG,TEST,LATERALITY,PROCEDURE,CLINICALDEPT,SYMPTOM,VACCINE,ROUTE,FREQUENCY,DURATION,GENDER,FORM,EMOTION
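
For context, both annotators open named sessions in the browser (e.g. ?session=zach), and feed_overlap is enabled in our prodigy.json, roughly like this (other settings omitted):

{
  "feed_overlap": true
}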

One annotator was able to label 154 examples, but the other only received 32 before hitting "No Tasks Available". The annotator with 154 examples still has more examples loading in their session.

After enabling verbose logging, we get the following output when trying to access the user session that only received 32 examples:

INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://192.168.2.224:8080 (Press CTRL+C to quit)
INFO:     192.168.103.187:60200 - "GET /?session=zach HTTP/1.1" 200 OK
INFO:     192.168.103.187:60200 - "GET /bundle.js HTTP/1.1" 200 OK
INFO:     192.168.103.187:60200 - "GET /project/zach HTTP/1.1" 200 OK
19:51:29: POST: /get_session_questions
19:51:29: CONTROLLER: Getting batch of questions for session: cer_2.0_enming_sent_spans-zach
19:51:29: FEED: Finding next batch of questions in stream
Exception in thread Thread-2 (_count):
INFO:     192.168.103.187:60201 - "GET /favicon.ico HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "cython_src/prodigy/components/stream.pyx", line 122, in prodigy.components.stream.Stream._count
  File "cython_src/prodigy/components/stream.pyx", line 26, in prodigy.components.stream.count_iterator
RuntimeError: cannot re-enter the tee iterator
19:51:29: FEED: re-adding open tasks to stream
19:51:29: FEED: Stream is empty
19:51:29: FEED: batch of questions requested for session cer_2.0_enming_sent_spans-zach: 0
19:51:29: RESPONSE: /get_session_questions (0 examples)
{'tasks': [], 'total': 32, 'progress': None, 'session_id': 'cer_2.0_enming_sent_spans-zach'}

INFO:     192.168.103.187:60200 - "POST /get_session_questions HTTP/1.1" 200 OK
19:51:33: STREAM: Counting iterator exceeded timeout of 10 seconds after 159 tasks

I am using Prodigy 1.11.14.

Hi @coltonflowers1!

Thanks for your question and welcome to the Prodigy community :wave:

And thanks for the logs -- they're very helpful.

As in this post, I suspect there's an issue with multi-processing.

We may need to take a deeper dive into this.

Were annotators 1 and 2 annotating at the same time? Given you were using --update, I suspect you could run into an issue if both annotators are trying to update the model at the same time. As an interim step, I'd suggest removing --update and seeing whether you hit the same error. In the meantime, I'll let you know if I find anything else on our end.
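
For example, the same command as yours, just with --update removed:

prodigy spans.correct cer_2.0_sent_spans model-best data/cer_sentences_round_2.jsonl --label BODYPART,PSYCHOLOGICALCONDITION,DATETIME,VITAL_MEASUREMENT,VITAL_SIGN,RELATIONSHIPSTATUS,DOSAGE_STRENGTH,ADMISSIONDISCHARGE,DISEASE,AGE,DRUG,TEST,LATERALITY,PROCEDURE,CLINICALDEPT,SYMPTOM,VACCINE,ROUTE,FREQUENCY,DURATION,GENDER,FORM,EMOTION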

Also, one small recommendation: since you have a lot of labels, you can list them in a labels.txt file, one label per line, and then pass labels.txt to --label in your prodigy command. That makes the command a bit easier to manage (and less prone to forgetting one of your labels), as sketched below.
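
For example, a labels.txt file (name and location are up to you) with one label per line:

BODYPART
PSYCHOLOGICALCONDITION
DATETIME
VITAL_MEASUREMENT
VITAL_SIGN
(...and so on for the rest of your labels)

and then:

prodigy spans.correct cer_2.0_sent_spans model-best data/cer_sentences_round_2.jsonl --label labels.txt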