Annotating a few labels plus a new label, but training on all labels

Hi,

I have annotated a good number of examples for labels like PERSON, ORG, PRODUCT, CARDINAL, DATE and TECH (a new label I created using terms.teach). For batch training the model, I am passing all labels as below.
--label [PERSON,DATE,ORG,GPE,LOC,TECH,CARDINAL,ORDINAL,LAW,WORK_OF_ART,EVENT,PRODUCT,FACILITY,NORP,LANGUAGE,TIME,PERCENT,MONEY,QUANTITY]
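
For context, the full command looks roughly like this (assuming the ner.batch-train recipe; the dataset name, base model and output path here are placeholders for my setup):

```bash
prodigy ner.batch-train my_ner_dataset en_core_web_lg \
  --output /tmp/model \
  --label "[PERSON,DATE,ORG,GPE,LOC,TECH,CARDINAL,ORDINAL,LAW,WORK_OF_ART,EVENT,PRODUCT,FACILITY,NORP,LANGUAGE,TIME,PERCENT,MONEY,QUANTITY]" \
  --n-iter 10 --eval-split 0.2
```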

Is there any side effect on the accuracy of the other labels by not having annotations (even rejects) for some of these labels?

I’ve tried to design the training algorithm to minimise this problem, so hopefully there won’t be much side effect on the accuracy of the other labels. However, the answer is ultimately empirical: you’ll need to run a test and see what’s happening on your data.
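
Something like this sketch works for that test (model paths and sample texts are placeholders; it counts, per label, where the retrained model's entity predictions diverge from the original model's on held-out unannotated text):

```python
from collections import Counter

import spacy

# Placeholder paths: the original base model and the batch-trained output.
base_nlp = spacy.load("en_core_web_lg")
new_nlp = spacy.load("/tmp/model")

sample_texts = [
    "Apple hired Jane Doe in March 2018 for $1,000,000.",
    # ... more held-out, unannotated texts from your corpus
]

def entity_set(nlp, text):
    # Represent each prediction as (start, end, label) so the sets are comparable.
    return {(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents}

lost = Counter()    # entities the base model predicted that the new model dropped
gained = Counter()  # entities only the new model predicts

for text in sample_texts:
    before = entity_set(base_nlp, text)
    after = entity_set(new_nlp, text)
    for _, _, label in before - after:
        lost[label] += 1
    for _, _, label in after - before:
        gained[label] += 1

print("Dropped per label:", dict(lost))
print("Gained per label:", dict(gained))
```

If the "dropped" counts pile up on labels you never annotated, that's the side effect you're asking about.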

If you’re seeing that the accuracy does decline on these unlabelled entity types, you might try including some text for which you have no annotations. This should stabilise the training somewhat, by encouraging the model to stick to its original behaviour. If that still doesn’t work, you could try adding the initial annotations to those unlabelled sentences as gold annotations.
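
If you go that route, one way to generate those gold annotations is to run the original model over the unlabelled sentences and save its predictions in Prodigy's JSONL task format (a sketch; the file names are placeholders, and I'm assuming tasks with "text", "spans" and "answer" keys):

```python
import json

import spacy

base_nlp = spacy.load("en_core_web_lg")

# Placeholder input: one plain-text sentence per line.
with open("unlabelled_sentences.txt") as f:
    texts = [line.strip() for line in f if line.strip()]

with open("gold_annotations.jsonl", "w") as out:
    for text in texts:
        doc = base_nlp(text)
        spans = [
            {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
            for ent in doc.ents
        ]
        # Treat the original model's predictions as accepted gold annotations.
        task = {"text": text, "spans": spans, "answer": "accept"}
        out.write(json.dumps(task) + "\n")
```

You can then import the file into a dataset with db-in and include it in the batch training.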
