spaCy NER training: how to handle name placeholders in a text

Hi
I want to train a spaCy model with Prodigy, but I have a conceptual question. In the end, I am only interested in person names and organization names in a text, which I want to extract with spaCy. When I train spaCy, do I have to label placeholders for names too, or only the names themselves?

For example:
1a. Mark Zuckerberg decided to invest in AI.
1b. The Facebook-CEO decided to invest in AI.

In 1a, I would label Mark Zuckerberg as a person, but do I also label Facebook-CEO as a person? Strictly speaking, it refers to a person too, but in the end I am only interested in the names. Will spaCy be confused if I label only the names and not placeholders like "CEO"? Or should I label everything in the sentence that refers to a person and filter out the non-names when I process the results?

Same problem with the organizations. Example:
2a. Facebook invests in promising startups.
2b. The company invests in promising startups.

Same here. In the end, I need "Facebook" for further processing, not "the company". But will it affect spaCy's results if I label only the names when I train the model?

Thanks for your help.

Hi! I'd say the question here is less about what's "right" or "wrong" from spaCy's perspective and more about how to best design the label scheme to make sure that the model learns effectively, and the annotations are consistent :slightly_smiling_face:

When person names are annotated for named entity recognition, you'd typically only annotate the actual name, i.e. the proper noun, not references. So "Mark Zuckerberg" would be labelled as PERSON, and "Facebook CEO" wouldn't be labelled. This is also the scheme used in other corpora, and what the model is optimised for, and it's also much easier to enforce consistently.
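To make the scheme concrete, here is a minimal sketch of what "names only" annotations for the four example sentences could look like. It uses plain Python and character-offset spans in the `{"text": ..., "spans": [...]}` shape that Prodigy works with. The `annotate` helper and the name lists are hypothetical, just for illustration, and whether "Facebook" inside "Facebook-CEO" should get its own ORG span is a separate scheme decision that this sketch leaves out.

```python
def annotate(text, names, label):
    """Hypothetical helper: return character-offset spans for each proper
    name that literally occurs in the text. References such as
    "the Facebook-CEO" or "the company" are deliberately left unlabelled."""
    spans = []
    for name in names:
        start = text.find(name)
        if start != -1:
            spans.append({"start": start, "end": start + len(name), "label": label})
    return {"text": text, "spans": spans}


# Assumed name lists for this toy example only.
PERSON_NAMES = ["Mark Zuckerberg"]
ORG_NAMES = ["Facebook"]

examples = [
    annotate("Mark Zuckerberg decided to invest in AI.", PERSON_NAMES, "PERSON"),
    annotate("The Facebook-CEO decided to invest in AI.", PERSON_NAMES, "PERSON"),
    annotate("Facebook invests in promising startups.", ORG_NAMES, "ORG"),
    annotate("The company invests in promising startups.", ORG_NAMES, "ORG"),
]

for eg in examples:
    print(eg["text"], "->", eg["spans"])
```

Sentences 1b and 2b end up with an empty span list. They are still useful training examples, because they show the model contexts in which no name occurs.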

Resolving references is usually a separate task and something you'd probably want to approach differently – for instance, by predicting whether a noun or pronoun refers to a previously mentioned known entity, or by using rules to extract a previously mentioned organization if you come across "the company" as the subject of a sentence.
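As a rough illustration of the rule-based option, the sketch below resolves a sentence-initial "the company" to the most recently mentioned organization. It is plain Python and assumes the ORG names per sentence have already been produced by an NER model. The function name, the input format and the "starts with 'the company'" check (a crude stand-in for "is the subject of the sentence") are all assumptions made for this example, not part of spaCy or Prodigy.

```python
import re

# Assumed pattern: a generic reference at the very start of the sentence,
# used here as a crude approximation of "subject of the sentence".
GENERIC_ORG_REF = re.compile(r"^the company\b", re.IGNORECASE)


def resolve_org_references(sentences):
    """sentences: list of (text, org_names) pairs, where org_names are the
    ORG entities an NER model found in that sentence (possibly empty).
    Returns a list of (text, org_names) pairs in which a generic reference
    is replaced by the most recently seen organization, if there is one."""
    last_org = None
    resolved = []
    for text, orgs in sentences:
        if orgs:
            # Remember the latest real name for later references.
            last_org = orgs[-1]
            resolved.append((text, list(orgs)))
        elif GENERIC_ORG_REF.match(text) and last_org is not None:
            resolved.append((text, [last_org]))
        else:
            resolved.append((text, []))
    return resolved


sentences = [
    ("Facebook invests in promising startups.", ["Facebook"]),
    ("The company invests in promising startups.", []),
]
resolved = resolve_org_references(sentences)
for text, orgs in resolved:
    print(text, "->", orgs)
```

Keeping this step separate from the entity recognizer means the model only has to learn what a name looks like, and the resolution rules can be adjusted without re-annotating or retraining.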