✘ Error while validating stream: no first example
This likely means that your stream is empty. This can also mean all the examples
in your stream have been annotated in datasets included in your --exclude recipe
parameter.
Typically, this means there's nothing to load from the file. Can you provide an example of your data? You can remove any sensitive data. As in the answers below, it's common for a small mistake in the input data to cause this.
Unfortunately, there are 44 issues that mention a similar error, so there could be a variety of causes.
Let me know if you can provide an example, and then we can go from there. Also, let us know if you can enable logging and share any details it provides.
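The easiest way to get those details is to set the PRODIGY_LOGGING environment variable when you run the recipe; Prodigy will then print what it's loading and where the stream stops. For example (the dataset and file names here are placeholders, substitute your own):

PRODIGY_LOGGING=basic python -m prodigy ner.teach my_dataset en_core_web_sm data/my_data.jsonl --label PERSON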
I saved your example to data/sample.jsonl and ran this command (since I didn't have the model used in your recipe, I used en_core_web_sm as an alternative):
python -m prodigy ner.teach sample en_core_web_sm data/sample.jsonl --label PERSON
I got this error:
✘ Error while validating stream: no first example
This likely means that your stream is empty. This can also mean all the examples
in your stream have been annotated in datasets included in your --exclude recipe
parameter.
However, like the error says, I think the problem is that you're not getting any predicted entities or pattern matches, hence the stream is empty.
For example, if I modify the data to:
{"text": "blabla \"A\" FAULT", "meta": {"id": "id1"}}
{"text": "Joe Biden is president of the United States.", "meta": {"id": "id2"}}
{"text": "blablabla", "meta": {"id": "id6"}}
I suspect your problem is that not enough of the predictions on your data meet the active learning criteria. ner.teach is really just running this (if we ignore patterns):
from prodigy.components.sorters import prefer_uncertain

def score_stream(stream):
    for example in stream:
        # model stands in for the recipe's NER model, which assigns
        # each example a score between 0 and 1
        score = model.predict(example["text"])
        yield (score, example)

stream = prefer_uncertain(score_stream(stream))
When you run this, prefer_uncertain isn't returning any of your predictions from your stream because of the default algorithm, ema. It tracks the exponential moving average of the uncertainties, and also tracks a moving variance. It then asks questions which are one standard deviation or more above the current average uncertainty.
What you may want to do is modify the algorithm (or bias) used by the prefer_uncertain sorter.
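For example, here's a minimal sketch of swapping the algorithm, reusing score_stream from the snippet above. The sorters also accept algorithm="probability", which, as I understand it, uses the score itself as the probability of asking the question instead of the ema-based standard-deviation criterion, so examples aren't silently skipped:

from prodigy.components.sorters import prefer_uncertain

# "probability" asks based on the score directly, rather than requiring
# the uncertainty to exceed the ema-based threshold
stream = prefer_uncertain(score_stream(stream), algorithm="probability")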
Here's some more background:
FYI, to find your local recipes, run python -m prodigy stats, find your Location: path, then open recipes/ner.py in that folder; that's where ner.teach is defined. You can then either modify it directly or copy it out to create your own modified ner.teach recipe and experiment there.
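If you go the copy route, you don't have to touch the installed package: rename your copied recipe (say, my.ner.teach in a file my_recipe.py; both names here are just placeholders) and point Prodigy at it with the -F flag:

python -m prodigy my.ner.teach sample en_core_web_sm data/sample.jsonl --label PERSON -F my_recipe.py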
So I can't guarantee you won't miss examples -- but as I mentioned, the current default algorithm is ema:
prefer_uncertain(stream, algorithm='ema'): This is the default sorter. It tracks the exponential moving average of the uncertainties, and also tracks a moving variance. It then asks questions which are one standard deviation or more above the current average uncertainty.
Therefore, it will only "ask questions" for spans that are one or more standard deviations above the current average uncertainty. I think you're not seeing any examples because none of your spans meet this criterion, hence why I recommended modifying the sorter.
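If you want full control, you can also skip the built-in sorters entirely and write your own filter. This is just a sketch of the idea -- prefer_uncertain_band and the 0.25 band are mine, not Prodigy's API:

def prefer_uncertain_band(scored_stream, band=0.25):
    # emit any example whose score lies within +/- band of 0.5,
    # i.e. the region where the model is most uncertain
    for score, example in scored_stream:
        if abs(score - 0.5) <= band:
            yield example

stream = prefer_uncertain_band(score_stream(stream))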
Yes, you could, but I would just use ner.correct, which will let you correct both good and bad examples. Remember, ner.teach is for modifying the order of the examples you see (active learning). If you don't want to use it, that's completely fine. You're welcome to filter out entities, but remember that you do need negative examples (e.g., examples without any entities). So if you only label texts with predicted entities, you may not build as robust a model.
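In your case that would be something like this, reusing the dataset, model, and file from the command above:

python -m prodigy ner.correct sample en_core_web_sm data/sample.jsonl --label PERSON

(Note: ner.correct is the name in Prodigy v1.10+; in older versions the same recipe is called ner.make-gold.)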