I'm trying to repeat your steps. I started by creating a folder called issue-6037 and moving your files in there with the names news_headlines.jsonl and news_headlines_small.jsonl. From there I started annotating via this recipe:
python -m prodigy ner.manual ner_news_headlines blank:en news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION
This is what that interface looks like:
I annotated six examples and I hit the save button. Next, I ran your terms recipe.
python -m prodigy terms.to-patterns ner_news_headlines --label PERSON,ORG,PRODUCT,LOCATION --spacy-model blank:en > news_pattern.jsonl
This is what my news_pattern.jsonl file looks like:
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"uber"},{"lower":"\u2019s"},{"lower":"lesson"},{"lower":":"},{"lower":"silicon"},{"lower":"valley"},{"lower":"\u2019s"},{"lower":"start"},{"lower":"-"},{"lower":"up"},{"lower":"machine"},{"lower":"needs"},{"lower":"fixing"}]}
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"pearl"},{"lower":"automation"},{"lower":","},{"lower":"founded"},{"lower":"by"},{"lower":"apple"},{"lower":"veterans"},{"lower":","},{"lower":"shuts"},{"lower":"down"}]}
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"how"},{"lower":"silicon"},{"lower":"valley"},{"lower":"pushed"},{"lower":"coding"},{"lower":"into"},{"lower":"american"},{"lower":"classrooms"}]}
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"women"},{"lower":"in"},{"lower":"tech"},{"lower":"speak"},{"lower":"frankly"},{"lower":"on"},{"lower":"culture"},{"lower":"of"},{"lower":"harassment"}]}
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"silicon"},{"lower":"valley"},{"lower":"investors"},{"lower":"flexed"},{"lower":"their"},{"lower":"muscles"},{"lower":"in"},{"lower":"uber"},{"lower":"fight"}]}
{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"uber"},{"lower":"is"},{"lower":"a"},{"lower":"creature"},{"lower":"of"},{"lower":"an"},{"lower":"industry"},{"lower":"struggling"},{"lower":"to"},{"lower":"grow"},{"lower":"up"}]}
Looking at this file, I don't think the recipe is doing what you hoped. Notice how each row has "PERSON,ORG,PRODUCT,LOCATION" as its label, and how each pattern covers an entire headline rather than a single entity? While this isn't the error message that you're experiencing, I'm assuming that it's related. The terms.to-patterns recipe is designed to be used together with the terms.teach recipe, not the ner.manual one.
This YouTube video helps explain how it's meant to be used.
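If you want to check a patterns file for the label problem described above, a few lines of plain Python will do. This is just a sketch: the helper name find_suspicious_labels is made up for illustration, and the two sample rows are shortened stand-ins for the rows shown in this post, not real output.

```python
import json


def find_suspicious_labels(jsonl_lines):
    """Return the distinct labels that contain a comma, sorted."""
    suspicious = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        label = entry.get("label", "")
        # A comma means several label names were written as one label
        if "," in label:
            suspicious.add(label)
    return sorted(suspicious)


# Shortened sample rows (hypothetical, for illustration only)
sample = [
    '{"label":"PERSON,ORG,PRODUCT,LOCATION","pattern":[{"lower":"uber"}]}',
    '{"label":"ORG","pattern":"Apple"}',
]
print(find_suspicious_labels(sample))
```

To run it on a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.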
Custom Recipe
That said, nothing is stopping you from writing a custom script that turns your previous annotations into patterns. Here's a small script that does that.
import srsly
import prodigy
from prodigy.components.db import connect


@prodigy.recipe(
    "terms.from-ner",
    ner_dataset=("Dataset to load NER annotations from", "positional", None, str),
    file_out=("File to write patterns into", "positional", None, str),
)
def custom_recipe(ner_dataset: str, file_out: str):
    # Connect to Prodigy database
    db = connect()
    # Load in annotated examples
    annotated = db.get_dataset(ner_dataset)
    # Loop over examples and collect unique (text, label) pairs
    pattern_set = set()
    for example in annotated:
        for span in example.get("spans", []):
            pattern_str = example["text"][span["start"]:span["end"]]
            # Store as a tuple, because set members must be hashable
            tup = (pattern_str, span["label"])
            pattern_set.add(tup)
    patterns = [{"pattern": p, "label": l} for p, l in pattern_set]
    srsly.write_jsonl(file_out, patterns)
If you're curious how to work with patterns and custom code, you may appreciate the guide in the docs here. When I run this locally via:
python -m prodigy terms.from-ner ner_news_headlines patterns.jsonl -F recipe.py
Then the file patterns.jsonl contains this:
{"pattern":"Apple","label":"ORG"}
{"pattern":"Silicon Valley","label":"LOCATION"}
{"pattern":"Uber","label":"ORG"}
{"pattern":"Pearl Automation","label":"ORG"}
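If you'd like to try the span-to-pattern step without a Prodigy database, here is a minimal sketch of the same logic on in-memory examples. The function name spans_to_patterns and the sample data are made up for illustration (including the capitalisation of the headline and the character offsets), and I sort the pairs so the output order is stable, which the recipe itself does not do.

```python
def spans_to_patterns(examples):
    """Collect unique (text, label) pairs from annotated spans."""
    pairs = set()
    for example in examples:
        # Examples without annotations simply have no "spans" key
        for span in example.get("spans", []):
            text = example["text"][span["start"]:span["end"]]
            pairs.add((text, span["label"]))
    # Sort so the output order is the same on every run
    return [{"pattern": p, "label": l} for p, l in sorted(pairs)]


# Hypothetical examples in the shape Prodigy uses for NER annotations
examples = [
    {
        "text": "Pearl Automation, Founded by Apple Veterans, Shuts Down",
        "spans": [
            {"start": 0, "end": 16, "label": "ORG"},
            {"start": 29, "end": 34, "label": "ORG"},
        ],
    },
    {"text": "No entities here"},
]

for pattern in spans_to_patterns(examples):
    print(pattern)
```

Because the pairs go through a set, an entity that was annotated in several headlines ends up as a single pattern.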
I can now use these patterns to do ner.manual.
python -m prodigy ner.manual news_data blank:en news_headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION --patterns patterns.jsonl
Here's what it looks like:
Note how some entities are pre-labelled but also note that there's now PATTERN metadata in there. This tells you which patterns got activated. I hope this helps!