I’m trying to train a custom NER model for Swedish street addresses, using the label STREET.
I’ve added a patterns file to cover a street name followed by a number:
{"label": "STREET", "pattern": [{"is_alpha": true},{"is_digit": true}]}
I have also generated a text file with 60k phrases. Each phrase contains an address that fits the pattern, plus some surrounding words so that there is some context.
Sample phrases, translated to English:
I want to book a cab to Streetname 1
Can you send me a car to Streetname 2
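A file like this could be produced roughly as follows. This is only a sketch of one way to do it: the templates come from the samples above, while the street names, the number range, and the `generate_phrases` helper are made up for illustration.

```python
import itertools

# Templates taken from the sample phrases; street names are hypothetical.
TEMPLATES = [
    "I want to book a cab to {street} {number}",
    "Can you send me a car to {street} {number}",
]
STREETS = ["Streetname", "Otherstreet"]


def generate_phrases(templates, streets, numbers):
    """Yield one phrase per (template, street, number) combination."""
    for template, street, number in itertools.product(templates, streets, numbers):
        yield template.format(street=street, number=number)


phrases = list(generate_phrases(TEMPLATES, STREETS, range(1, 4)))
for phrase in phrases[:3]:
    print(phrase)
```

Writing `phrases` out one per line would give a plain-text source in the same shape as streetphrases.txt.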
I ran the following command and annotated around 2k examples:
prodigy ner.teach street swedish-model streetphrases.txt --label STREET --patterns patterns.jsonl
After that I ran ner.batch-train with the following command and got this result:
prodigy ner.batch-train street new-swedish-model --label STREET --output new_model
Using 1 labels: STREET
Loaded model new-swedish-model
Using 20% of accept/reject examples (282) for evaluation
Using 100% of remaining examples (1132) for training
Dropout: 0.2 Batch size: 16 Iterations: 10
BEFORE 0.000
Correct 0
Incorrect 2
Entities 0
Unknown 0
#    LOSS       RIGHT   WRONG   ENTS    SKIP   ACCURACY
01   2480.805   0       2       1180    0      0.000
02   1165.106   0       2       1175    0      0.000
03   989.513    0       2       1173    0      0.000
04   900.055    1       1       1171    0      0.500
05   846.116    1       1       1170    0      0.500
06   821.424    1       1       1173    0      0.500
07   884.305    1       1       1173    0      0.500
08   848.783    2       0       1172    0      1.000
09   825.181    1       1       1171    0      0.500
10   787.083    2       0       1170    0      1.000
Correct 2
Incorrect 0
Baseline 0.000
Accuracy 1.000
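As a sanity check on these numbers, the ACCURACY column appears to be RIGHT / (RIGHT + WRONG). That formula is an assumption that happens to fit every row of the log, not something taken from Prodigy's source. A short sketch that recomputes it from the values above:

```python
# (iteration, right, wrong) copied from the training log above
rows = [
    (1, 0, 2), (2, 0, 2), (3, 0, 2), (4, 1, 1), (5, 1, 1),
    (6, 1, 1), (7, 1, 1), (8, 2, 0), (9, 1, 1), (10, 2, 0),
]


def accuracy(right, wrong):
    """Assumed formula: share of scored predictions that were right."""
    total = right + wrong
    return right / total if total else 0.0


for iteration, right, wrong in rows:
    print(f"{iteration:02d}  {accuracy(right, wrong):.3f}")

# The evaluation split reported in the log: 282 of 282 + 1132 examples.
eval_examples, train_examples = 282, 1132
print(f"eval share: {eval_examples / (eval_examples + train_examples):.1%}")
```

RIGHT and WRONG only ever add up to 2, so each accuracy value rests on two scored predictions even though 282 examples were held out for evaluation.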
What am I doing wrong here?
I thought there might be an issue in the dataset, so I printed it to check:
0.97 A car to Central Street 51 STREET