Does Prodigy allow multiple recipes to run concurrently?

usage
ner
solved

(Bhanu Sharma) #1

So, is there a way I can run ner.manual to manually annotate data and also use ner.teach concurrently where possible?


(Ines Montani) #2

Hi! I’m not 100% sure what you mean by “concurrently”, so here are some possible use cases and solutions:

1. Running two recipes at the same time, e.g. for different annotators.

If you want to run both recipes at the same time and have different annotators work on them, that’s easy: you can just start two separate processes. As of v1.4.0, the PRODIGY_HOST and PRODIGY_PORT environment variables let you override the host and port used to serve the web app and REST API, so you can easily set different ports on the command line:

PRODIGY_PORT=1234 prodigy ner.manual your_dataset en_core_web_sm your_data.jsonl --label PERSON
PRODIGY_PORT=5678 prodigy ner.teach other_dataset en_core_web_sm your_data.jsonl

In theory, you can also add the annotations to the same dataset. However, I wouldn’t recommend that, because both recipes produce different training data and ideally, you’d want to run your experiments separately. The annotations produced by ner.manual are great for evaluation data or for correcting very specific edge cases, whereas the data you collect with ner.teach is the best selection of examples to improve an existing model.

2. Using a hybrid of ner.manual and ner.teach.

If you’re looking for a recipe that uses an existing model’s predictions and lets you correct them manually, you might want to try the ner.make-gold recipe instead. It will stream in the texts and highlight the model’s predictions for a given label, which you can then accept or reject, or correct accordingly:

prodigy ner.make-gold your_dataset en_core_web_sm your_data.jsonl --label PERSON

Note that unlike ner.teach, ner.make-gold won’t update a model in the loop – the reason is that with manual annotation only, you’ll need a lot of examples to get meaningful results, especially if you’re training a new category. So the model in the loop wouldn’t be able to learn quickly enough for the active learning to really make a difference. Which also leads to the next point…

3. Collecting “new” annotations with a model in the loop.

Fully manual annotation is very tedious and we actually think that it’s something you should only have to resort to for very difficult edge cases and for gold-standard evaluation data. If you’re looking to bootstrap a new entity type from scratch, Prodigy offers other, more efficient ways that help you do that – for example, supplying a list of --patterns with explicit or abstract examples of the entities you’re looking for.

prodigy ner.teach your_dataset en_core_web_sm your_data.jsonl --patterns patterns.jsonl --label DOG

Each pattern consists of a list of token descriptions, similar to the patterns used by spaCy’s Matcher. The patterns will be used to find initial examples in your data, until the model has seen enough to start making its own suggestions in the loop.

{"label": "DOG", "pattern": [{"lower": "poodle"}]}
{"label": "DOG", "pattern": [{"lower": "golden"}, {"lower": "retriever"}]}
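To make the matching logic concrete, here’s a minimal pure-Python sketch of how token patterns like these find spans. The `match_pattern` helper is hypothetical (it’s not part of Prodigy or spaCy, and the real Matcher supports many more attributes than `lower`), but it shows the core idea: each entry in the pattern describes one token, and the whole pattern must match a contiguous run of tokens.

```python
# Hypothetical minimal matcher: each pattern is a list of token
# descriptions, like the JSONL patterns above.
def match_pattern(tokens, pattern):
    """Return (start, end) token spans where every token description
    in the pattern matches the corresponding token, case-insensitively."""
    spans = []
    for i in range(len(tokens) - len(pattern) + 1):
        if all(tokens[i + j].lower() == spec["lower"]
               for j, spec in enumerate(pattern)):
            spans.append((i, i + len(pattern)))
    return spans

pattern = [{"lower": "golden"}, {"lower": "retriever"}]
tokens = "I saw a Golden Retriever at the park".split()
print(match_pattern(tokens, pattern))  # → [(3, 5)]
```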

Here’s a video tutorial we’ve recorded that shows the whole end-to-end workflow (see here for a TL;DR summary). We start with collecting examples of the new entity type DRUG using word vectors, convert those examples to match patterns and then use those patterns to suggest more examples in the data during ner.teach, until the model has learned enough so it can start making suggestions.


(Bhanu Sharma) #3

So my use case is similar to what you describe in method 2, but unfortunately it doesn’t use active learning. Is it possible to create annotations from ner.manual and ner.teach for separate kinds of entities (for example, using ner.manual for toy names and ner.teach for extracting commencement dates) and then combine those annotations to train my model?


(Ines Montani) #4

Sure, that’s no problem! Prodigy datasets can be exported as simple JSONL files, so you can merge them into a single dataset later on, run separate experiments training on the individual datasets or use the pre-trained model as the input model for the next training session:

prodigy ner.batch-train toys_dataset en_core_web_sm --output /output-model
prodigy ner.batch-train dates_dataset /output-model --output /new-output-model
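If you do want to merge the datasets instead, the export/import round trip could look something like this (a sketch: `db-out` and `db-in` are Prodigy’s built-in export and import commands, and the dataset and file names here are placeholders):

```shell
# export both datasets as JSONL (db-out writes to stdout)
prodigy db-out toys_dataset > toys.jsonl
prodigy db-out dates_dataset > dates.jsonl

# concatenate and import into a new, combined dataset
cat toys.jsonl dates.jsonl > combined.jsonl
prodigy db-in combined_dataset combined.jsonl
```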

Once you’re getting more “serious” about training, you might also want to look into strategies for preventing the “catastrophic forgetting problem”. You don’t want your model to overfit on the new data and “forget” what it had previously learned. So one solution could be to always make sure to include examples of what the model previously “got right”. Prodigy should hopefully make this a lot easier, because you can put together different datasets and run quick experiments to find out what works best on your data. (It’s always difficult to give definitive advice here, because it always depends on your very specific use case and the data you’re working with.)


(Bhanu Sharma) #5

Thanks! Also, does ner.make-gold allow using patterns for detecting entities, too? For example, instead of the standard spaCy entities such as PERSON, ORG and DATE, I would want PARTY, COMMENCEMENT_DATE etc., just like ner.teach allows patterns?


(Ines Montani) #6

No, ner.make-gold uses the model’s predictions to suggest the entities – so its main purpose is bootstrapping gold-standard data by correcting the already existing entities, not adding new types. To train your new entity types, you might want to chain different recipes together and try a workflow like this:

  1. Create match patterns for your new types like PARTY, COMMENCEMENT_DATE etc.
  2. Use ner.teach with a model in the loop and patterns to bootstrap training examples. The active learning plus patterns can be really helpful here, because it lets you collect a larger training set more quickly.
  3. Pre-train your model from the annotations.
  4. Run ner.make-gold with your model, see how it performs and correct its predictions.
  5. Train again and evaluate the results (also try the ner.train-curve to see how your model is improving).
  6. Identify the areas that are still problematic, and collect more specific examples.

One recipe I forgot to mention in my comment is ner.match, which uses a patterns file to suggest entities, which you can then accept or reject. The source of the recipes is also included with Prodigy, so once you’re familiar with the built-in recipes, you could also experiment with building your own and taking inspiration from the existing recipes. For example, you could try a recipe using the ner_manual interface that’s populated with patterns. See this page for more details on custom recipes.

Btw, based on your examples, I assume you’re working with legal texts? If so, I’d also recommend checking out @wpm’s posts on this forum. He’s actually built some pretty sophisticated training pipelines with Prodigy for legal NER and has shared a lot of his findings and work in progress.


(Bhanu Sharma) #7

Yeah, true, I am currently working with legal texts. Thanks for the advice, I’ll check out @wpm’s posts.