Add on new name entity incrementally...

Hi! In case you haven't seen it, you might find the NER flowchart useful. It describes different scenarios and includes some general-purpose tips and tricks.

That's something you have to include in your training data if this is how you want the model to behave. The ner.make-gold workflow could help you with that: it'll highlight all entities that the model already predicts and lets you correct the predictions and add more labels if needed.

If your new labels potentially overlap with existing types, you probably want to start training from scratch. Otherwise, you're constantly "fighting" the existing predictions and you'll need much more data to teach the model to suddenly predict a label very differently. You can still use a workflow like ner.make-gold to create the data semi-automatically, though.

While you can use the previously trained model in the loop, it's not really the best strategy if you want to train your final model. It's usually better and cleaner to use the same base model and train it on all annotations.

We don't distribute pretrained spaCy NER models for Chinese, so unless you already have your own pretrained model, you probably need to start from scratch anyways. The model components are completely independent – so in order to train an entity recognizer, you won't need a tagger or parser.

However, having word vectors can improve accuracy: if word vectors are available, they're used as features in the model and give it more information to work with. How well this works in Chinese depends on the tokenization, and you typically want to make sure that the vectors you use include embeddings for the same words that the tokenizer produces (and not just individual characters).

It's not that different? Prodigy just uses named keys for the start, end and label – so instead of (start, end, label), an entity is described as {"start": start, "end": end, "label": label}.
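
To make the mapping concrete, here's a rough sketch of how you could convert between the two. It assumes each exported example is a dict with a "text" key and a list of span dicts under "spans", and that the target is the simple (text, {"entities": [...]}) tuple format. The function name prodigy_to_spacy is made up for this example.

```python
def prodigy_to_spacy(example):
    """Turn one Prodigy-style example dict into a (text, annotations) tuple.

    Assumes the spans live under a "spans" key next to "text"; the helper
    name is hypothetical and only used for illustration.
    """
    entities = [
        (span["start"], span["end"], span["label"])
        for span in example.get("spans", [])
    ]
    return example["text"], {"entities": entities}


example = {
    "text": "Apple is opening an office in Berlin",
    "spans": [
        {"start": 0, "end": 5, "label": "ORG"},
        {"start": 30, "end": 36, "label": "GPE"},
    ],
}

text, annotations = prodigy_to_spacy(example)
for start, end, label in annotations["entities"]:
    # Character offsets index straight into the original text
    print(text[start:end], label)
```

The offsets stay untouched, so the only thing that changes is the container around them.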

Sure, you can always export the data and use it in spaCy. The values are the same – and if you ever need BILUO tags etc., you can use spaCy's conversion utilities.
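
If you're curious what the BILUO scheme looks like, here's a small hand-written illustration in plain Python. This is not spaCy's conversion utility, just a sketch of the idea: it assumes simple whitespace tokenization (which won't fit Chinese text), and entities that don't line up with token boundaries are simply skipped here. The names whitespace_tokens and offsets_to_biluo are made up for this example.

```python
import re


def whitespace_tokens(text):
    """Return (start, end) character offsets for whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_biluo(tokens, entities):
    """Map (start, end, label) character spans onto per-token BILUO tags."""
    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # Tokens that fall completely inside the entity span
        inside = [
            i for i, (start, end) in enumerate(tokens)
            if start >= ent_start and end <= ent_end
        ]
        if not inside:
            continue
        # Skip entities whose edges don't match token boundaries
        if tokens[inside[0]][0] != ent_start or tokens[inside[-1]][1] != ent_end:
            continue
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label  # single-token (Unit) entity
        else:
            tags[inside[0]] = "B-" + label  # Beginning
            for i in inside[1:-1]:
                tags[i] = "I-" + label  # Inside
            tags[inside[-1]] = "L-" + label  # Last
    return tags


text = "She works at Acme Corp in Berlin"
entities = [(13, 22, "ORG"), (26, 32, "GPE")]
tags = offsets_to_biluo(whitespace_tokens(text), entities)
print(tags)
```

For real data you'd want to rely on the tokenizer your model actually uses, since the tags only make sense relative to its tokens.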
