Use Prodigy purely as an annotating tool?

Tags: usage, spacy, solved

#1

Hi there,

I am new to both Prodigy and spaCy, and I just want to use Prodigy as an annotation tool. For example: I have a dataset with 10 entries, 2 of which are false, and I need to filter out those 2 false entries. Finally, I'd like to export the filtered result in whatever format for other usage.

All the best.


(Ines Montani) #2

Hi! And yes, absolutely! Prodigy mostly orchestrates the flow of incoming data → annotation UI → callbacks → collected annotations. Each of these workflows is expressed via a “recipe”, a Python script that returns a dictionary of components. While the tool ships with various advanced recipes that update a model in the loop etc., you can also just stream in pretty much any data, visualize it, label it and get the result back.
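To give a feel for the shape of those components, here is an illustrative sketch (the function name and file handling are made up for this example; in real Prodigy recipes, the function is also registered with the @prodigy.recipe decorator):

```python
# Illustrative sketch of the components a recipe returns.
# "my_recipe" is a placeholder name, not a built-in recipe.
def my_recipe(dataset, source):
    # The stream is an iterable of task dicts, one per example to annotate
    stream = ({"text": line.strip()} for line in open(source, encoding="utf8"))
    return {
        "dataset": dataset,  # dataset the annotations are saved to
        "stream": stream,    # incoming examples
        "view_id": "text",   # annotation interface to use
    }
```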

For example, let’s say your input is a .jsonl file that looks like this:

{"text": "Hello world"}
{"text": "Another text"}
{"text": "And another one"}

You could then use the mark recipe, which takes whatever data comes in, presents it for annotation and saves the results in the dataset. For example:

prodigy mark your_dataset_name /path/to/data.jsonl --view-id text

The view-id is the name of the annotation interface to use. Depending on the interface, you can add more properties to your data – e.g. the named entity spans, a top-level label, an image, HTML etc. You can find more details and examples of the formats in your PRODIGY_README.html.
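For instance (illustrative values – the exact fields accepted depend on the interface you choose; see your PRODIGY_README.html for the full formats), a task can carry pre-set entity spans or a top-level label:

```
{"text": "I like Berlin", "spans": [{"start": 7, "end": 13, "label": "CITY"}]}
{"text": "I like Berlin", "label": "POSITIVE"}
```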

Executing the above command will start Prodigy and serve up the annotation app. You can then open it in your browser and start labelling. Your answers will be sent back to the server and saved in the dataset. After annotation, you can export the collected data from the dataset:

prodigy db-out your_dataset_name > some_file.jsonl

The result could look like this:

{"text": "Hello world", "answer": "accept"}
{"text": "Another text", "answer": "reject"}
{"text": "And another one", "answer": "ignore"}

Prodigy uses newline-delimited JSON as its standard output format, since it’s easy to work with and easy to read in and manipulate in any language or library. So if you need a different format, it shouldn’t be difficult to convert the data.
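For example, a small stand-alone script (not part of Prodigy; the function and file names are placeholders) that keeps only the accepted entries from the export above and writes them to CSV could look like this:

```python
import csv
import json

def jsonl_accepted_to_csv(in_path, out_path):
    # Read the db-out export: one JSON object per line
    with open(in_path, encoding="utf8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    # Keep only the entries the annotator accepted
    accepted = [row for row in rows if row.get("answer") == "accept"]
    with open(out_path, "w", newline="", encoding="utf8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "answer"],
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(accepted)
    return len(accepted)
```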

Btw, if you’re interested in writing your own recipe scripts that do more custom stuff (load in data from a different format, perform certain actions when you receive annotations, render something custom), a good place to start is the prodigy-recipes repo: https://github.com/explosion/prodigy-recipes It also includes the code for the mark recipe I mentioned above.


#3

Hi Ines,

So many thanks for your assistance, which is indeed, really helpful and quick!

All the best


(Alonisser) #4

We’ve followed this thread and run the command below (where news_headline_mark is the dataset name we’d like and news_headlines_options.jsonl is an example of the news_headline JSONL, but with an options array):
prodigy mark news_headline_mark news_headlines_options.jsonl --view-id choice
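(For reference, each line of news_headlines_options.jsonl looks something like this, with illustrative values:)

```
{"text": "Some news headline", "options": [{"id": "politics", "text": "Politics"}, {"id": "sports", "text": "Sports"}]}
```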
We’ve been able to annotate with the UI, but we found that results are saved to the db only when we press the “save” button. This does not seem to be the behavior when using ner.teach instead of mark. How can we get “auto save” when the annotator confirms a choice?

Thanks!


(Ines Montani) #5

Annotations should be saved the same way across all recipes and interfaces. Prodigy sends collected annotations back to the server in batches: as soon as a batch is full, it’s submitted and saved automatically. The most recent answers are kept on the client to allow hitting “undo”.

If you want the answers to be sent back sooner, you can change the "batch_size" setting in your prodigy.json or recipe config. The default batch size is 10.
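For example, a minimal prodigy.json that lowers the batch size:

```json
{
  "batch_size": 5
}
```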


(Alonisser) #6

Thanks. I’ll try that


(Alonisser) #7

Thanks @ines, I’ve tried reducing batch_size to 5 in prodigy.json, but I can’t get it to autosave. While using mark, if I don’t explicitly press the “save” button, it does not save to the db, no matter if I’ve annotated more than 5 items.
Any solution to that?


(Ines Montani) #8

Hmm, that’s strange! Initially, you do have to annotate two batches for the first one to get autosaved. The most recent annotations will always stay in the app so you can undo them easily – so before you close the tab, you’ll always have to save manually to submit everything that’s left.

One thing you can do to check for intermediate autosaving is to open the developer tools and look at the console or network tab for requests made by the app. After the initial answers, the app should make a POST request to /give_answers (sending back one batch of answers). It will also periodically make requests to /get_questions to request new tasks from the server.


(Alonisser) #9

Oh, sorry, it was the initial “two batches” thing that got me. After the second batch, the first one was saved.


(Heilein Izaguirre) #10

Hello Ines, is it possible to customize the UI? What language is it built in?


(Ines Montani) #11

The Prodigy library that configures the annotation workflows is written in Python; the app is built in JavaScript (React) and shipped with the core library as a compiled bundle.

See this page for theming options and details on custom HTML annotation views.
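For instance, the html interface lets you render arbitrary markup per task; a task could look roughly like this (illustrative example):

```
{"html": "<strong>Some headline</strong> with custom markup"}
```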

We’re also testing support for custom scripts, currently available in the "html" interface. This lets you define your own actions and interfaces via custom recipes. See here for details and examples.