UserWarning: [W030] Some entities could not be aligned in the text

Hello! We used ner.manual to annotate 10 custom entities and ran into the following issue when trying to train a NER model:

  UserWarning: [W030] Some entities could not be aligned in the text
  "The apartments are nice, quiet and peaceful. Every..." with entities "[]".
  Use `spacy.gold.biluo_tags_from_offsets(nlp.make_doc(text), entities)` to check the alignment.
  Misaligned entities (with BILUO tag '-') will be ignored during training.

prodigy train ner reviews_20210420_annotated_sample blank:en --ner-missing

Could you please point us to a guide on how to annotate data so that entities are aligned with tokens?

Running the same command without --ner-missing makes the warning go away.

Hi!

In essence this is just a warning and you can probably continue training. In the background, the misaligned entities will be ignored. This is an option if you have just a few misaligned cases, but obviously it's always good to try and understand what went wrong with the original data annotation & alignment.
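
For context, "misaligned" here means that an entity's character offsets do not start and end exactly on token boundaries. Here is a minimal sketch of that check. It assumes a simple regex tokenizer as a stand-in for spaCy's own tokenizer (which applies more rules), and the labels PLACE and QUALITY are made up for illustration:

```python
import re


def token_boundaries(text):
    """Collect start and end character offsets of tokens.

    Assumption: a simple regex tokenizer (words and single punctuation
    marks) stands in for spaCy's tokenizer, which is more elaborate.
    """
    starts, ends = set(), set()
    for match in re.finditer(r"\w+|[^\w\s]", text):
        starts.add(match.start())
        ends.add(match.end())
    return starts, ends


def find_misaligned(text, entities):
    """Return the (start, end, label) entities that do not sit on token boundaries."""
    starts, ends = token_boundaries(text)
    return [
        (start, end, label)
        for start, end, label in entities
        if start not in starts or end not in ends
    ]


text = "The apartments are nice, quiet and peaceful."
# Hypothetical annotations: the second span stops one character short of "nice".
entities = [(4, 14, "PLACE"), (19, 22, "QUALITY")]
print(find_misaligned(text, entities))  # [(19, 22, 'QUALITY')]
```

A span like the second one is the kind that would be tagged '-' and skipped during training.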

I'm surprised that you ended up with misaligned entities when you created the dataset with ner.manual. Moreover, to add to my confusion, the warning you quoted reads:

with entities "[]"

This is weird, because typically you'd get a list of entity offsets there, but now you're just getting an empty list.

Some questions to try and get to the bottom of this:

  • Are you positive your dataset contains entity annotations?
  • Could you share the exact commands you ran to annotate your data and then to train it?
  • Before you started annotation with ner.manual, was the database empty?
  • Is there any chance you can share some example instances from your annotated dataset that would produce this error when running prodigy train?
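
To help answer the first and last questions, here is a rough sketch for inspecting an exported dataset. It assumes the annotations were exported as JSONL (for example with `prodigy db-out`) and that each record has the "text", "tokens" and "spans" keys that ner.manual normally stores. The function name and the sample records below are made up for illustration:

```python
import json


def summarize_annotations(jsonl_lines):
    """Count records with entity spans and list spans that miss token boundaries.

    Assumption: each line is one JSON record with "text", an optional
    "tokens" list of {"start", "end", ...} and an optional "spans" list
    of {"start", "end", "label"}.
    """
    total = 0
    with_spans = 0
    misaligned = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        total += 1
        spans = task.get("spans") or []
        if spans:
            with_spans += 1
        tokens = task.get("tokens") or []
        if not tokens:
            continue  # nothing to compare against, so skip the alignment check
        starts = {token["start"] for token in tokens}
        ends = {token["end"] for token in tokens}
        for span in spans:
            if span["start"] not in starts or span["end"] not in ends:
                snippet = task["text"][span["start"]:span["end"]]
                misaligned.append((snippet, span.get("label")))
    return {"total": total, "with_spans": with_spans, "misaligned": misaligned}


# Hand-written sample records (not real data) in the assumed format.
sample = [
    '{"text": "Nice flat", "tokens": [{"text": "Nice", "start": 0, "end": 4, "id": 0}, '
    '{"text": "flat", "start": 5, "end": 9, "id": 1}], '
    '"spans": [{"start": 5, "end": 9, "label": "PLACE"}], "answer": "accept"}',
    '{"text": "Quiet area", "tokens": [{"text": "Quiet", "start": 0, "end": 5, "id": 0}, '
    '{"text": "area", "start": 6, "end": 10, "id": 1}], "spans": [], "answer": "accept"}',
]
print(summarize_annotations(sample))
```

If "with_spans" comes out as 0 for your real export, the dataset contains no entity annotations at all, which would explain the empty list in the warning. To run it on a file, pass the open file object instead of the sample list.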

If we can reproduce the problem, it'll be much easier for us to help you debug and resolve it :wink: