For the example "Example: Custom interfaces with choice, manual NER, free-form input and custom API loader", could you please let me know how to create "cat_facts_data"?
cat_facts_data in the example command is the first argument of the recipe, i.e. the name of the dataset that the annotations will be saved to. This is a name you can choose: when you annotate, Prodigy will create this dataset in your database and save your annotations to it. You can then export your annotations using the db-out command with the name of your dataset, i.e. cat_facts_data.
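For example, to export the annotations to a JSONL file (the output filename here is arbitrary):

```bash
prodigy db-out cat_facts_data > ./cat_facts_data.jsonl
```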
The example recipes in the repo are slightly simplified and set up to work as standalone functions – we definitely want to give them another update, though, for the next release to reflect some of the newer features we added.
If you want to see the exact code Prodigy runs in the built-in recipes, you can always run prodigy stats to find the location of your local Prodigy installation, and then check out the Python files in prodigy/recipes.
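If you'd rather do it from Python, the package path gives you the same information (this is just standard Python, nothing Prodigy-specific):

```python
from pathlib import Path
import prodigy

# The built-in recipes live in the "recipes" subfolder of the installation
print(Path(prodigy.__file__).parent / "recipes")
```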
Now I have raw data and patterns, and use spaCy to generate a "doc".
You mentioned: "you can create data in Prodigy's format pretty easily using the processed doc. ... You can then add it to a dataset using Prodigy's database API: https://prodi.gy/docs/api-database#database"
Could you please show me some sample code to generate a Prodigy dataset from a spaCy "doc"?
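For reference, here's a minimal sketch of what the quoted suggestion could look like. The pipeline name, the example text and the idea of taking the spans from doc.ents are assumptions for illustration, not the only way to do it:

```python
import spacy
from prodigy import set_hashes
from prodigy.components.db import connect

nlp = spacy.load("en_core_web_sm")  # assumption: any pipeline that sets doc.ents
texts = ["Cats sleep for 12 to 16 hours a day."]  # assumption: your raw texts

examples = []
for doc in nlp.pipe(texts):
    # Convert the doc into Prodigy's JSON task format for NER
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    eg = {"text": doc.text, "spans": spans}
    examples.append(set_hashes(eg))  # add the hashes Prodigy uses to deduplicate

db = connect()  # uses the settings from your prodigy.json
if "cat_facts_data" not in db.datasets:
    db.add_dataset("cat_facts_data")
db.add_examples(examples, datasets=["cat_facts_data"])
```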
In general, you can always write your own filter function in your custom recipe – streams are regular Python generators, so you can do something like this and apply any filtering you need:
```python
def filter_stream(stream):
    for eg in stream:
        # filter based on some properties in the example here
        yield eg
```
The update callback also gives you access to the batches of annotated examples that are submitted in the UI. You could then store any information about those already annotated examples in a variable in your recipe function that the filter_stream generator also has access to. This way, it can respond to collected annotations (which is also how annotating with a model in the loop works under the hood).
```python
def update(answers):
    # do something with the answers here
    ...
```
One thing to keep in mind is that the stream and answers are sent in batches. So any update you make based on collected annotations will only affect the next batch that's being created afterwards (not the examples that are already queued up for annotation in the app).
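To make that concrete, here's a minimal sketch of how the two callbacks could share state inside a custom recipe. The recipe name, the JSONL source, the "text" interface and the idea of skipping examples whose text already came back annotated are placeholder assumptions; in practice you might compare the _task_hash values instead of the raw text:

```python
import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe(
    "filter-annotated",  # hypothetical recipe name
    dataset=("Dataset to save annotations to", "positional", None, str),
    source=("Path to a JSONL file of examples with a 'text' key", "positional", None, str),
)
def filter_annotated(dataset, source):
    seen_texts = set()  # shared state both callbacks can access

    def filter_stream(stream):
        for eg in stream:
            # skip examples whose text has already come back annotated
            if eg["text"] not in seen_texts:
                yield eg

    def update(answers):
        # record the submitted answers so the filter can react to them
        for eg in answers:
            seen_texts.add(eg["text"])

    stream = JSONL(source)
    return {
        "dataset": dataset,
        "stream": filter_stream(stream),
        "update": update,
        "view_id": "text",
    }
```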