NER not containing <word_list>

Hi Matt,

First of all, thank you very much for your very detailed answer to my numerous questions!

It was a random split, but I'm currently generating a gold/eval set for better comparability. I got somewhat confused by the difference some recipes would make in this particular use case, so I opened a separate topic (Gold/Silver Dataset Confusion).

I think I was unclear in my explanation: the 1-15 token spans referred to the length of the input sentences; the entities are mostly 1-4 tokens. I wanted to avoid the word "sentence" because these lines are not necessarily sentences in a grammatical sense. I generate the data by simply taking after the first newline character and suppressing further segmentation during teaching with the -U flag.

The 2000 were just for a first glimpse. I'm confident that something can be achieved here, because the entities are often "framed" by a set of typical stopwords like "-" or "in".

Then, no :sweat_smile:. Now, yes! I explain my workflow in the above-mentioned Gold/Silver Dataset Confusion topic, but in short: it greatly improved the predictions. I have only a few false positives now, and I am trying to improve the accuracy with more examples (and a dedicated eval set :wink:).

How many would "a lot of" be in this context? Do you have some kind of documentation on how to experiment with the hyperparameters of batch-train? It feels like you need a lot of experience in the field to get some intuition for when things like the beam width, the number of iterations or the batch size have to be changed. For quick results I am happy with the sensible defaults of the Prodigy recipes, but of course I want to get a deeper understanding for the final improvements.

I think this solved itself due to the misunderstanding above. I am already using text classification for other tasks, mainly for verifying the plausibility of rule-based entities.

Remaining Question
Regarding my update with the recipe wrapper... is it possible to write a wrapper that can auto accept/reject examples and still keep them in the normal controller workflow? I mean, I can add explicit commands to add them to the database and incorporate them in the model update, but I will always filter them from the example stream in order to avoid them being displayed.
But inside the wrapper I have no access to the controller (except in on_exit), so I can't update things like the total number of processed examples in the current session.
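
To make the question more concrete, here is a minimal sketch of the filtering I mean, in plain Python without any Prodigy calls. The names auto_answer_stream, decide_blank_lines and auto_answered are made up for illustration, and the rule itself is just a placeholder. The only assumption about the data is that each example is a dict with a "text" key and that a decision is stored under "answer".

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Task = Dict[str, object]


def auto_answer_stream(
    stream: Iterable[Task],
    decide: Callable[[Task], Optional[str]],
    auto_answered: List[Task],
) -> Iterator[Task]:
    """Yield only the tasks that still need a manual decision.

    `decide` is a hypothetical rule: it returns "accept" or "reject",
    or None when the task should be shown to the annotator.
    """
    for task in stream:
        answer = decide(task)
        if answer is None:
            # No automatic decision: pass the task on to be displayed.
            yield task
        else:
            # Store a copy with the answer; the task is not displayed.
            auto_answered.append({**task, "answer": answer})


def decide_blank_lines(task: Task) -> Optional[str]:
    # Placeholder rule for illustration only: reject whitespace-only lines.
    text = str(task.get("text", ""))
    if not text.strip():
        return "reject"
    return None


auto_answered: List[Task] = []
stream = [{"text": "Berlin - Mitte"}, {"text": "   "}, {"text": "in Hamburg"}]
to_annotate = list(auto_answer_stream(stream, decide_blank_lines, auto_answered))
```

Because the generator is lazy, auto_answered only fills up as the stream is consumed. I can save that list myself and take len(auto_answered) as my own counter, but that is exactly the part that happens outside the controller's bookkeeping.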