ner.teach annotations to improve a model produced by ner.batch-train

Hi,

We would like to take around 800 previously annotated examples (labeled by our annotators), turn them into a model, and then improve it by manually reviewing each label.

The current path we are taking right now:

db-in the annotations --> ner.batch-train to build a model (adjusting the settings to get the highest accuracy) --> ner.teach to manually accept or reject each instance
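
In commands, that path looks roughly like this (a sketch with placeholder names; our real dataset and file names differ):

  # import the existing annotations into a Prodigy dataset
  prodigy db-in my_dataset annotations.json
  # batch train a model from the dataset and save it to a directory
  prodigy ner.batch-train my_dataset en_core_web_sm --output my_model
  # step through the model's suggestions and accept or reject each one
  prodigy ner.teach my_dataset my_model annotations.json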

How do we export the manually adjusted annotations and use them to update the previously trained model? Or is it possible to export these annotations to train a new model?

Hi! I don't think I understand the question, sorry! Could you give an example?

Hi Ines,

Here is what I have done for my project (to generate a model that auto-labels data based on 800 pre-annotated examples):

  1. I created a dataset that contains the 800 pre-annotated examples
  2. I ran ner.batch-train to generate model A, which reports the accuracy of this model
  3. To improve the model, I used ner.teach to manually go through each of the pre-annotated examples and was able to fix a lot of them by clicking accept or reject

My question: how do I build a model B that includes the manually annotated data from step 3? Or is all the ner.teach data automatically used to update model A when I save and close ner.teach?

Thanks for the clarification!

ner.teach doesn't save out the updated model, no. You typically want to batch train the model "properly" afterwards to get even better results.

When you annotate with ner.teach, make sure you save those annotations to a separate dataset. When you're done annotating, you can then take model A, run ner.batch-train with the accept/reject annotations and output model B.

Hi Ines,

Thanks for your response. What is the command for saving ner.teach results to a separate dataset? I initiated ner.teach as follows:

  1. prodigy dataset DatasetA "DatasetA"
  2. prodigy db-in DatasetA pre_annotated_data_1.json
  3. prodigy ner.batch-train DatasetA en_core_web_sm --output modelA
  4. prodigy ner.teach DatasetA modelA pre_annotated_data_1.json

Are you suggesting that I should db-out the saved annotations from step 4 to a separate dataset (DatasetB) and run ner.batch-train on DatasetB?

Sorry if this was unclear! I meant that when you run ner.teach, the first argument (in your example, "DatasetA") should be a different name, for example "DatasetB". This will save the annotations you create to the other dataset.

When you then batch train again in step 5, you can update the output model of step 3 with the annotations from DatasetB.
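
So, reusing your names, the full sequence would look something like this (a sketch; modelA here is the output directory from step 3, which should be loadable as the base model):

  # annotate with model A, saving the accept/reject decisions to a new dataset
  prodigy ner.teach DatasetB modelA pre_annotated_data_1.json
  # batch train model A on the new annotations and save the result as model B
  prodigy ner.batch-train DatasetB modelA --output modelB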

I think I understand now.

I should be doing this: prodigy ner.teach DatasetB modelA pre_annotated_data_1.json, in order to save the newly annotated data into a separate dataset (DatasetB).

Then I should be able to run ner.batch-train like this:
prodigy ner.batch-train DatasetB en_core_web_sm --output modelB

Let me know if I'm understanding this right.

I have encountered another issue regarding the above:

When I followed what you mentioned:
1) ner.teach results saved to a separate dataset (DatasetB), around 1600 annotations
2) prodigy ner.batch-train DatasetB en_core_web_sm --output modelB

I only see 300 examples loaded in ner.batch-train. Which command did I use wrong?

Are these the actual examples that are loaded, or the examples used for training? It's possible that you end up with fewer unique training examples, because Prodigy will merge all annotations on the same text into one example. The recipe will also hold back examples for evaluation if you don't provide an evaluation set, so it can output accuracy results.
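
If you want more control over the evaluation data, you can set the split yourself or pass in a dedicated evaluation dataset. For example, something like this (a sketch; eval_dataset is just a placeholder name, and this assumes ner.batch-train's --eval-split and --eval-id options):

  # hold back 20% of the examples for evaluation
  prodigy ner.batch-train DatasetB en_core_web_sm --output modelB --eval-split 0.2
  # or evaluate against a separate, dedicated dataset
  prodigy ner.batch-train DatasetB en_core_web_sm --output modelB --eval-id eval_dataset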

Well, the total number of examples is 400 and I used a 20/80 split, even though the total number of annotations I made was 1000+.

When I db-in the original examples (800 pre-annotated data) and ran ner.batch-train the first time, it used 700-ish examples for training. Does that mean it loaded all the examples for training the model?