I'm trying to create a new entity label for broadband-related products, and I started by creating a patterns file.
The problem occurs at the end, when I run the new model on new or existing phrases: it identifies every word in the sentence as the label PRODUCT. I'm starting from a completely new, empty model.
Step 1: Create the patterns file and save it as service_pattern.jsonl.
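For reference, here's a minimal sketch of what a Prodigy match-patterns file in JSONL format looks like. The label name and the example terms are my assumptions, not the actual patterns used here:

```python
import json

# Hypothetical broadband-related patterns. Each line of the JSONL file is
# one pattern: a "label" plus either a token-pattern list or a plain string.
patterns = [
    {"label": "PRODUCT", "pattern": [{"lower": "broadband"}]},
    {"label": "PRODUCT", "pattern": [{"lower": "fiber"}, {"lower": "internet"}]},
    {"label": "PRODUCT", "pattern": "ADSL"},
]

with open("service_pattern.jsonl", "w", encoding="utf8") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

Each entry uses the same token-attribute syntax as spaCy's `Matcher`, so you can test a pattern against spaCy directly before handing it to Prodigy.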
Step 2: I import a file with suitable phrases and start the annotation tool. In this case I'm using a file with 50 sentences, but I've also used a file with thousands of rows and get the same result.
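The annotation step would be started with something like the command below. The dataset name, model, and file names are assumptions based on the thread; adjust them to your own setup:

```shell
# Hypothetical invocation: start ner.teach from a base model, feeding it
# the raw phrases and the patterns file to bootstrap suggestions.
prodigy ner.teach ner_product en_core_web_sm service-phrases.txt \
  --label PRODUCT --patterns service_pattern.jsonl
```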
Step 4: Training appears to go well, with an accuracy of 1.0:
Accuracy 1.000
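In Prodigy v1.x this training step would typically be `ner.batch-train`; the exact command and output path below are my guess at what was run:

```shell
# Hypothetical training run: train from the annotations in the dataset
# and save the resulting model to disk with --output.
prodigy ner.batch-train ner_product en_core_web_sm \
  --output /tmp/model-product --label PRODUCT --eval-split 0.2
```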
Problem: The trouble starts when I use the new model as the input model for ner.teach on new phrases.
Prodigy identifies every word inside the phrase as a product.
Even when I use the service-phrases.txt file above, it mislabels the text.
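For context, this is roughly how the trained model would be fed back into ner.teach; the paths and dataset name are assumed:

```shell
# Hypothetical: use the freshly trained model from --output above as the
# base model for a new ner.teach session on unseen phrases.
prodigy ner.teach ner_product_v2 /tmp/model-product new-phrases.txt \
  --label PRODUCT
```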
Yes, an accuracy of 1.0 is always suspicious! Your workflow sounds alright – could you post the full results after training, including all the statistics?
I can't reproduce the scenario with the same 1.0 accuracy, but I started over with a new dataset and a fresh model. The result is the same, just with lower accuracy. I also ran a second batch of training after the first round, but it makes no difference.
I created a new dataset named ner_product and started annotating with the same patterns file.
Do you have any full results and/or examples you can share from when you used the larger dataset with thousands of examples? The problem here is that it's very difficult to draw any conclusions from these results, because you're only training from 24 examples. Even if the result looks similar to what you see when training with thousands of examples, it might occur for completely different reasons.