ner.teach transformer model server error

Hello,

I am trying to refine a transformer-based NER model using ner.teach on the 1.11.0a8 release. Per this thread, I will check out the regular release for ner.teach.

However, I also wanted to report that I am regularly experiencing server failure after about 10-20 tasks when running ner.teach for a single entity on a 5-entity transformer NER model. I have tried dropping the -U flag, since I saw it may blow up text-parsing times, and I have also tried swapping the raw .txt for an explicitly formatted .jsonl in {"text": "..."} format, with the same behavior. I was not having these issues when I used this feature on a non-transformer model.
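
For reference, the JSONL variant I tried has one object per line, e.g. (the texts here are just placeholders):

{"text": "First example sentence from my corpus."}
{"text": "Second example sentence from my corpus."}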

I set logging to verbose in order to capture some logs, in case they're of any use.

(spacy_prodigy_nightly_latest) $ prodigy ner.teach -l LABEL1 ner_binary_label1_set1 path_to_transformer_ner_model path_to_input_text.txt
10:55:47: INIT: Setting all logging levels to 10
10:55:48: RECIPE: Calling recipe 'ner.teach'
Using 1 label(s): LABEL1
10:55:48: RECIPE: Starting recipe ner.teach
{'dataset': 'ner_binary_label1_set1', 'spacy_model': 'path_to_transformer_ner_model', 'source': 'path_to_input_text.txt', 'loader': None, 'label': ['LABEL1'], 'patterns': None, 'exclude': None, 'unsegmented': False}

10:55:48: LOADER: Using file extension 'txt' to find loader
path_to_input_text.txt

10:55:48: LOADER: Loading stream from txt
10:55:48: LOADER: Rehashing stream
10:55:51: MODEL: Added sentence boundary detector to model pipeline
['sentencizer', 'transformer', 'ner']

10:55:51: RECIPE: Creating EntityRecognizer using model path_to_transformer_ner_model
10:55:52: RECIPE: Making sure all labels are in the model
['LABEL1']

10:55:52: SORTER: Resort stream to prefer uncertain scores (bias 0.0)
10:55:52: VALIDATE: Validating components returned by recipe
10:55:52: CONTROLLER: Initialising from recipe
{'before_db': None, 'config': {'lang': 'en', 'label': 'LABEL1', 'dataset': 'ner_binary_label1_set1', 'recipe_name': 'ner.teach', 'batch_size': 25}, 'dataset': 'ner_binary_label1_set1', 'db': True, 'exclude': None, 'get_session_id': None, 'metrics': None, 'on_exit': None, 'on_load': None, 'progress': <prodigy.components.progress.ProgressEstimator object at 0x10c837850>, 'self': <prodigy.core.Controller object at 0x123bbd7c0>, 'stream': <prodigy.components.sorters.ExpMovingAverage object at 0x127281a90>, 'update': <bound method EntityRecognizer.update of <prodigy.models.ner.EntityRecognizer object at 0x126fcbeb0>>, 'validate_answer': None, 'view_id': 'ner'}

10:55:52: VALIDATE: Creating validator for view ID 'ner'
10:55:52: VALIDATE: Validating Prodigy and recipe config
10:55:52: DB: Initializing database SQLite
10:55:52: DB: Connecting to database SQLite
10:55:52: DB: Creating dataset '2021-06-08_10-55-52'
{'created': datetime.datetime(2021, 6, 7, 15, 16, 56)}

10:55:52: DatasetFilter: Excluding examples based on task hashes
10:55:52: DatasetFilter: Getting hashes for excluded examples
10:55:52: DatasetFilter: Excluding 198 tasks from datasets: ner_binary_label1_set1
10:55:52: CONTROLLER: Initialising from recipe
{'batch_size': 25, 'dataset': 'ner_binary_label1_set1', 'db': <prodigy.components.db.Database object at 0x1277b9a30>, 'exclude': 'task', 'filters': [{'name': 'DatasetFilter', 'datasets': ['ner_binary_label1_set1']}], 'max_sessions': 10, 'overlap': False, 'self': <prodigy.components.feeds.SharedFeed object at 0x1272be0d0>, 'stream': <prodigy.components.sorters.ExpMovingAverage object at 0x127281a90>, 'validator': <prodigy.components.validate.Validator object at 0x1276aefd0>, 'view_id': 'ner'}

10:55:52: CONTROLLER: Validating the first batch for session: None
10:55:52: PREPROCESS: Splitting sentences
{'batch_size': 32, 'min_length': None, 'nlp': <spacy.lang.en.English object at 0x1272816d0>, 'no_sents_warned': False, 'stream': <generator object at 0x123bb3ee0>, 'text_key': 'text'}

10:55:52: FILTER: Filtering duplicates from stream
{'by_input': True, 'by_task': True, 'stream': <generator object at 0x123bb3d30>, 'warn_fn': <bound method Printer.warn of <wasabi.printer.Printer object at 0x10c8371c0>>, 'warn_threshold': 0.4}

10:55:52: FILTER: Filtering out empty examples for key 'text'
10:56:08: MODEL: Predicting spans for batch (batch size 64)
10:56:19: MODEL: Sorting batch by entity type (batch size 32)
10:56:19: FILTER: Skipping invalid 'text' task
{'text': '', '_input_hash': 616230852, '_task_hash': 1542943224}

10:56:19: FILTER: Skipping invalid 'text' task
{'text': '', '_input_hash': 616230852, '_task_hash': 1542943224}

10:56:27: MODEL: Predicting spans for batch (batch size 64)
10:56:36: MODEL: Sorting batch by entity type (batch size 32)
10:56:36: MODEL: Sorting batch by entity type (batch size 32)
10:56:36: MODEL: Predicting spans for batch (batch size 64)
10:56:49: MODEL: Sorting batch by entity type (batch size 32)
10:56:49: MODEL: Sorting batch by entity type (batch size 32)
10:56:56: MODEL: Predicting spans for batch (batch size 64)
10:57:06: MODEL: Sorting batch by entity type (batch size 32)
10:57:06: CORS: initialized with wildcard "*" CORS origins

✨  Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

INFO:     Started server process [2596]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8080 (Press CTRL+C to quit)
INFO:     ::1:50604 - "GET / HTTP/1.1" 200 OK
INFO:     ::1:50604 - "GET /bundle.js HTTP/1.1" 200 OK
10:57:17: GET: /project
{'lang': 'en', 'label': 'LABEL1', 'dataset': 'ner_binary_label1_set1', 'recipe_name': 'ner.teach', 'batch_size': 25, 'view_id': 'ner', 'version': '1.11.0a8'}

INFO:     ::1:50604 - "GET /project HTTP/1.1" 200 OK
10:57:17: POST: /get_session_questions
10:57:17: FEED: Finding next batch of questions in stream
10:57:17: MODEL: Sorting batch by entity type (batch size 32)
10:57:23: MODEL: Predicting spans for batch (batch size 64)
10:57:33: MODEL: Sorting batch by entity type (batch size 32)
10:57:33: MODEL: Sorting batch by entity type (batch size 32)
10:57:33: MODEL: Predicting spans for batch (batch size 64)
10:57:45: MODEL: Sorting batch by entity type (batch size 32)
10:57:45: MODEL: Sorting batch by entity type (batch size 32)
10:57:51: MODEL: Predicting spans for batch (batch size 64)
10:58:03: MODEL: Sorting batch by entity type (batch size 32)
10:58:03: MODEL: Sorting batch by entity type (batch size 32)
10:58:10: MODEL: Predicting spans for batch (batch size 64)
10:58:20: MODEL: Sorting batch by entity type (batch size 32)
10:58:20: MODEL: Sorting batch by entity type (batch size 32)
10:58:20: MODEL: Predicting spans for batch (batch size 64)
10:58:29: MODEL: Sorting batch by entity type (batch size 32)
10:58:29: MODEL: Sorting batch by entity type (batch size 32)
10:58:36: MODEL: Predicting spans for batch (batch size 64)
10:58:46: MODEL: Sorting batch by entity type (batch size 32)
10:58:46: MODEL: Sorting batch by entity type (batch size 32)
10:58:52: MODEL: Predicting spans for batch (batch size 64)
10:59:03: MODEL: Sorting batch by entity type (batch size 32)
10:59:03: MODEL: Sorting batch by entity type (batch size 32)
10:59:03: MODEL: Predicting spans for batch (batch size 64)
10:59:13: MODEL: Sorting batch by entity type (batch size 32)
10:59:13: MODEL: Sorting batch by entity type (batch size 32)
10:59:18: MODEL: Predicting spans for batch (batch size 64)
10:59:28: MODEL: Sorting batch by entity type (batch size 32)
10:59:28: MODEL: Sorting batch by entity type (batch size 32)
10:59:34: MODEL: Predicting spans for batch (batch size 64)
10:59:43: MODEL: Sorting batch by entity type (batch size 32)
10:59:43: MODEL: Sorting batch by entity type (batch size 32)
10:59:48: MODEL: Predicting spans for batch (batch size 64)
10:59:57: MODEL: Sorting batch by entity type (batch size 32)
10:59:57: MODEL: Sorting batch by entity type (batch size 32)
10:59:57: MODEL: Predicting spans for batch (batch size 64)
11:00:06: MODEL: Sorting batch by entity type (batch size 32)
11:00:06: MODEL: Sorting batch by entity type (batch size 32)
11:00:11: MODEL: Predicting spans for batch (batch size 64)
11:00:20: MODEL: Sorting batch by entity type (batch size 32)
11:00:20: MODEL: Sorting batch by entity type (batch size 32)
11:00:25: MODEL: Predicting spans for batch (batch size 64)
11:00:33: MODEL: Sorting batch by entity type (batch size 32)
11:00:34: MODEL: Sorting batch by entity type (batch size 32)
11:00:38: MODEL: Predicting spans for batch (batch size 64)
11:00:46: MODEL: Sorting batch by entity type (batch size 32)
11:00:46: MODEL: Sorting batch by entity type (batch size 32)
11:00:52: MODEL: Predicting spans for batch (batch size 64)
11:01:01: MODEL: Sorting batch by entity type (batch size 32)
11:01:01: MODEL: Sorting batch by entity type (batch size 32)
11:01:01: MODEL: Predicting spans for batch (batch size 64)
11:01:10: MODEL: Sorting batch by entity type (batch size 32)
11:01:10: MODEL: Sorting batch by entity type (batch size 32)
11:01:14: MODEL: Predicting spans for batch (batch size 64)
11:01:23: MODEL: Sorting batch by entity type (batch size 32)
11:01:23: RESPONSE: /get_session_questions (25 examples)

Right after that last line, it printed the JSON of 'tasks' to stdout and the app showed me the "Oops, something went wrong :(" error.

Also, in the run above I used a custom batch size of 25, in case I could just let it load longer, but with that setting I couldn't even get the first task to load. I switched back to the default of 10, got the original issue of it failing after several tasks, and the end of that logging looked like this:

MODEL: Sorting batch by entity type (batch size 32)
11:25:59: MODEL: Sorting batch by entity type (batch size 32)
11:25:59: RESPONSE: /get_session_questions (10 examples)
{'tasks': ...}

INFO:     ::1:50691 - "POST /get_session_questions HTTP/1.1" 200 OK

11:26:47: POST: /give_answers (received 10)
[{'text': ...},...]

11:26:47: CONTROLLER: Receiving 10 answers
11:26:47: MODEL: Merging entity spans of 10 examples
11:26:47: MODEL: Using 10 examples (without 'ignore')
11:26:51: PROGRESS: Estimating progress of 0.2308
11:26:51: DB: Getting dataset '2021-06-08_11-14-45'
11:26:51: DB: Getting dataset 'ner_binary_label1_set1'
11:26:51: DB: Getting dataset '2021-06-08_11-14-45'
11:26:51: DB: Added 10 examples to 2 datasets
11:26:51: CONTROLLER: Added 10 answers to dataset 'ner_binary_label1_set1' in database SQLite
11:26:51: RESPONSE: /give_answers
{'progress': 0.23076923076923078}

INFO:     ::1:50699 - "POST /give_answers HTTP/1.1" 200 OK
11:27:05: POST: /get_session_questions
11:27:05: FEED: Finding next batch of questions in stream
11:27:05: RESPONSE: /get_session_questions (10 examples)
{'tasks': ...}

INFO:     ::1:50700 - "POST /get_session_questions HTTP/1.1" 200 OK
11:27:17: POST: /get_session_questions
11:27:17: FEED: Finding next batch of questions in stream
11:27:23: MODEL: Predicting spans for batch (batch size 64)
11:27:32: MODEL: Sorting batch by entity type (batch size 32)
11:27:32: RESPONSE: /get_session_questions (10 examples)
{'tasks': ...}

INFO:     ::1:50701 - "POST /get_session_questions HTTP/1.1" 200 OK
11:27:44: POST: /give_answers (received 20)
[{'text': ...},...]

11:27:44: CONTROLLER: Receiving 20 answers
11:27:44: MODEL: Merging entity spans of 20 examples
11:27:44: MODEL: Using 20 examples (without 'ignore')
11:27:49: PROGRESS: Estimating progress of 0.2857
11:27:49: DB: Getting dataset '2021-06-08_11-14-45'
11:27:49: DB: Getting dataset 'ner_binary_label1_set1'
11:27:49: DB: Getting dataset '2021-06-08_11-14-45'
11:27:49: DB: Added 20 examples to 2 datasets
11:27:49: CONTROLLER: Added 20 answers to dataset 'ner_binary_label1_set1' in database SQLite
11:27:49: RESPONSE: /give_answers
{'progress': 0.2857142857142857}

INFO:     ::1:50703 - "POST /give_answers HTTP/1.1" 200 OK
11:28:47: POST: /get_session_questions
11:28:47: FEED: Finding next batch of questions in stream
11:28:47: MODEL: Sorting batch by entity type (batch size 32)
11:28:52: MODEL: Predicting spans for batch (batch size 64)
11:29:03: MODEL: Sorting batch by entity type (batch size 32)
11:29:03: MODEL: Sorting batch by entity type (batch size 32)
11:29:03: MODEL: Predicting spans for batch (batch size 64)
11:29:10: MODEL: Sorting batch by entity type (batch size 32)
11:29:10: MODEL: Sorting batch by entity type (batch size 32)
11:29:15: MODEL: Predicting spans for batch (batch size 64)
11:29:24: MODEL: Sorting batch by entity type (batch size 32)
11:29:24: MODEL: Sorting batch by entity type (batch size 32)
11:29:29: MODEL: Predicting spans for batch (batch size 64)
11:29:38: MODEL: Sorting batch by entity type (batch size 32)
11:29:38: MODEL: Sorting batch by entity type (batch size 32)
11:29:43: MODEL: Predicting spans for batch (batch size 64)
11:29:51: MODEL: Sorting batch by entity type (batch size 32)
11:29:51: MODEL: Sorting batch by entity type (batch size 32)
11:29:56: MODEL: Predicting spans for batch (batch size 64)
11:30:05: MODEL: Sorting batch by entity type (batch size 32)
11:30:05: MODEL: Sorting batch by entity type (batch size 32)

The above continued for another 10 minutes before I finally terminated it. It had started to show obviously incorrect tokens towards the last few examples, as in the other thread I linked at the top, so I suspect this caught-in-a-loop behavior may have to do with the active learning piece getting stuck somehow. Just thought I'd leave this here in case it's useful. Loving Prodigy and everything else from Explosion!
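
P.S. For what it's worth, my rough mental model of the uncertainty sorter, based on the "Resort stream to prefer uncertain scores" and ExpMovingAverage entries in the logs above, is something like the sketch below. To be clear, this is my own simplification of the idea, not Prodigy's actual implementation:

def prefer_uncertain(scored_stream, alpha=0.1):
    """Toy uncertainty sorter: prefer scores near 0.5, gated by a moving average.

    NOT Prodigy's actual code, just my approximation of the idea behind
    the ExpMovingAverage sorter shown in the logs above.
    """
    moving_avg = 0.5
    for score, example in scored_stream:
        uncertainty = 1.0 - abs(score - 0.5) * 2.0  # 1.0 at score 0.5, 0.0 at 0 or 1
        moving_avg = alpha * uncertainty + (1.0 - alpha) * moving_avg
        if uncertainty >= moving_avg:
            yield example
        # If the model becomes confident on everything (scores near 0 or 1),
        # hardly anything clears the moving threshold, so the feed keeps
        # pulling and scoring fresh batches, which would look just like the
        # Predicting/Sorting spiral in the log above.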

Hi Matt,

Apologies for the late follow-up. I had this thread on my TODO list for some time because I wanted to spend some proper time investigating this. We've also been releasing updates to the ner.teach functionality that I was hoping would fix the issues you've been seeing.

I've been trying to replicate your issues with the latest releases of Prodigy (1.11.2), spaCy (3.1.2) and spacy-transformers (1.0.5), and I haven't really been able to.

What I have seen is that en_core_web_lg and en_core_web_trf can sometimes predict very different entities, especially if the input text for ner.teach is quite different from your original training data. This is because their ability to generalize to other domains can differ quite a bit, and that also means the confidence scores of their predictions can be quite different.

There are a few things I want to check with you. First, if you upgrade all packages, are you still running into issues? If you still see the "Oops, something went wrong :(" error, something is definitely wrong, and we need to find a way to replicate it so we can fix it.
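
For reference, upgrading everything would look roughly like this, where the XXXX placeholder in the Prodigy download URL stands for your license key:

pip install -U spacy spacy-transformers
pip install -U prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy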

If instead you're not running into any "real" crashes but you're still getting a limited set of examples and running out of tasks quickly, I have a few additional questions:

  • How many documents are you feeding as input into the ner.teach call?
  • Is the domain of these texts significantly different from the texts you originally trained the model on?
  • If you just load your trained model and run it over all the texts (in code with spaCy, not with Prodigy, and without any active learning), how many entities in total does the model predict? Do these entities look mostly correct or mostly wrong to you? (See the sketch after this list for what I mean.)
  • When you run ner.teach, is it still the case that after some time, the model seems to be predicting worse entities instead of better ones?
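
To illustrate the third question, I mean something along these lines, reusing the placeholder paths from your command at the top (the paths and batch size are just for illustration):

import spacy
from collections import Counter

# Load the trained transformer pipeline directly, outside of Prodigy
nlp = spacy.load("path_to_transformer_ner_model")

# One text per line, mirroring the .txt input passed to ner.teach
with open("path_to_input_text.txt") as f:
    texts = [line.strip() for line in f if line.strip()]

# Count every entity the model predicts, without any active learning
counts = Counter()
for doc in nlp.pipe(texts, batch_size=32):
    for ent in doc.ents:
        counts[ent.label_] += 1

print(counts)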