I have trained a spaCy NER model with the following training data:
TRAIN_DATA = [
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
]
When I tested the model with the sentence "CPIMG is not a technology", "CPIMG" was detected as LIBOR_WRD. What makes spaCy detect it as LIBOR_WRD? Neither the context nor the neighbouring words match the training data. The only thing the two words have in common is that both are written in all capitals. How can I avoid this problem?
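To make the "all capitals" observation concrete, here is a minimal sketch in plain Python. The `word_shape` helper is hypothetical and written only for illustration: it is a simplified stand-in for a word-shape feature (similar in spirit to spaCy's `Token.shape_`, as far as I understand it), not spaCy's own implementation. The sketch also counts how many distinct sentences the training set above contains.

```python
# The training set from above: the same example repeated 42 times.
TRAIN_DATA = [
    ("LIBOR Interest rates", {"entities": [(0, 5, "LIBOR_WRD")]}),
] * 42


def word_shape(text: str) -> str:
    """Hypothetical, simplified word shape (an assumption, not spaCy's code).

    Uppercase letters map to 'X', lowercase letters to 'x', digits to 'd',
    and any other character is kept as-is. Runs of the same shape symbol
    are capped at four characters.
    """
    shape = []
    for ch in text:
        if ch.isalpha():
            sym = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        # Skip the symbol if the last four symbols are already identical to it.
        if len(shape) >= 4 and all(s == sym for s in shape[-4:]):
            continue
        shape.append(sym)
    return "".join(shape)


# Number of distinct training sentences.
unique_texts = {text for text, _ in TRAIN_DATA}
print(len(TRAIN_DATA), "examples,", len(unique_texts), "distinct sentence(s)")

# Compare the shape of the annotated entity with the shape of the false positive.
print(word_shape("LIBOR"), word_shape("CPIMG"))
```

Under this simplified definition, "LIBOR" and "CPIMG" have the same shape, and the training set contains only one distinct sentence, so it has no example of an all-capitals token that is not a LIBOR_WRD.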