Use Prodigy purely as an annotating tool?

Tags: usage, spacy, solved

#1

Hi there,

I am new to both Prodigy and spaCy, and I just want to use Prodigy as an annotation tool. For example: I have a dataset with 10 entries, 2 of which are false, and I need to filter out those 2 false entries. Finally, I'd like to export the filtered result in whatever format for other usage.

All the best.


(Ines Montani) #2

Hi! And yes, absolutely! Prodigy mostly orchestrates the flow of incoming data → annotation UI → callbacks → collected annotations. Each of these workflows is expressed via a “recipe”, a Python script that returns a dictionary of components. While the tool ships with various advanced recipes that update a model in the loop etc., you can also just stream in pretty much any data, visualize it, label it and get the result back.
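To give a feel for the shape of those components, here is an illustrative sketch (the function name and file handling are made up for this example; in real Prodigy recipes, the function is also registered with the @prodigy.recipe decorator):

```python
# Illustrative sketch of the components a recipe returns.
# "my_recipe" is a placeholder name, not a built-in recipe.
def my_recipe(dataset, source):
    # The stream is an iterable of task dicts, one per example to annotate
    stream = ({"text": line.strip()} for line in open(source, encoding="utf8"))
    return {
        "dataset": dataset,  # dataset the annotations are saved to
        "stream": stream,    # incoming examples
        "view_id": "text",   # annotation interface to use
    }
```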

For example, let’s say your input is a .jsonl file that looks like this:

{"text": "Hello world"}
{"text": "Another text"}
{"text": "And another one"}

You could then use the mark recipe, which takes whatever data comes in, presents it for annotation and saves the results in the dataset. For example:

prodigy mark your_dataset_name /path/to/data.jsonl --view-id text

The view-id is the name of the annotation interface to use. Depending on the interface, you can add more properties to your data – e.g. the named entity spans, a top-level label, an image, HTML etc. You can find more details and examples of the formats in your PRODIGY_README.html.
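For instance (illustrative values – the exact fields accepted depend on the interface you choose; see your PRODIGY_README.html for the full formats), a task can carry pre-set entity spans or a top-level label:

```
{"text": "I like Berlin", "spans": [{"start": 7, "end": 13, "label": "CITY"}]}
{"text": "I like Berlin", "label": "POSITIVE"}
```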

Executing the above command will start Prodigy and serve up the annotation app. You can then open it in your browser and start labelling. Your answers will be sent back to the server and saved in the dataset. After annotation, you can export the collected data from the dataset:

prodigy db-out your_dataset_name > some_file.jsonl

The result could look like this:

{"text": "Hello world", "answer": "accept"}
{"text": "Another text", "answer": "reject"}
{"text": "And another one", "answer": "ignore"}

Prodigy uses newline-delimited JSON as its standard output format, since it’s easy to work with and easy to read in and manipulate in any language or library. So if you need a different format, it shouldn’t be difficult to convert the data.
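For example, a small stand-alone script (not part of Prodigy; the function and file names are placeholders) that keeps only the accepted entries from the export above and writes them to CSV could look like this:

```python
import csv
import json

def jsonl_accepted_to_csv(in_path, out_path):
    # Read the db-out export: one JSON object per line
    with open(in_path, encoding="utf8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    # Keep only the entries the annotator accepted
    accepted = [row for row in rows if row.get("answer") == "accept"]
    with open(out_path, "w", newline="", encoding="utf8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "answer"],
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(accepted)
    return len(accepted)
```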

Btw, if you’re interested in writing your own recipe scripts that do more custom stuff (load in data from a different format, perform certain actions when you receive annotations, render something custom), a good place to start is the prodigy-recipes repo: https://github.com/explosion/prodigy-recipes It also includes the code for the mark recipe I mentioned above.


#3

Hi Ines,

So many thanks for your assistance, which is indeed, really helpful and quick!

All the best


(Alonisser) #4

We’ve followed this thread and run the command below (where news_headline_mark is the dataset name we’d like and news_headlines_options.jsonl is an example of the news_headline JSONL, but with an options array):
prodigy mark news_headline_mark news_headlines_options.jsonl --view-id choice
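(For reference, each line of news_headlines_options.jsonl looks something like this, with illustrative values:)

```
{"text": "Some news headline", "options": [{"id": "politics", "text": "Politics"}, {"id": "sports", "text": "Sports"}]}
```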
We’ve been able to annotate with the UI, but we found that results are saved to the db only when we press the “save” button. This does not seem to be the behavior when using ner.teach instead of mark. How can we get “auto save” when the annotator confirms a choice?

Thanks!


(Ines Montani) #5

Annotations should be saved the same way across all recipes and interfaces. Prodigy sends collected annotations back to the server in batches: as soon as a batch is full, it’s submitted and saved automatically. The most recent answers are kept on the client to allow hitting “undo”.

If you want the answers to be sent back sooner, you can change the "batch_size" setting in your prodigy.json or recipe config. The default batch size is 10.
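For example, a minimal prodigy.json that lowers the batch size:

```json
{
  "batch_size": 5
}
```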


(Alonisser) #6

Thanks. I’ll try that


(Alonisser) #7

Thanks @ines, I’ve tried reducing batch_size to 5 in prodigy.json, but I can’t get it to autosave. While using mark, if I don’t explicitly press the “save” button, it does not save to the db, no matter if I’ve annotated more than 5 items.
Any solution to that?


(Ines Montani) #8

Hmm, that’s strange! Initially, you do have to annotate two batches for the first one to get autosaved. The most recent annotations will always stay in the app so you can undo them easily – so before you close the tab, you’ll always have to save manually to submit everything that’s left.

One thing you can do to check for intermediate autosaving is to open the developer tools and look at the console or network tab for requests made by the app. After the initial answers, the app should make a POST request to /give_answers (sending back one batch of answers). It will also periodically make requests to /get_questions to request new tasks from the server.


(Alonisser) #9

Oh, sorry, it was the initial “two batches” thing that got me. After the second batch, the first one was saved.


(Heilein Izaguirre) #10

Hello Ines, is it possible to customize the UI? What language is it built in?


(Ines Montani) #11

The Prodigy library that configures the annotation workflows is written in Python; the app is built in JavaScript (React) and shipped with the core library as a compiled bundle.

See this page for theming options and details on custom HTML annotation views.
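For instance, the html interface lets you render arbitrary markup per task; a task could look roughly like this (illustrative example):

```
{"html": "<strong>Some headline</strong> with custom markup"}
```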

We’re also testing support for custom scripts, currently available in the "html" interface. This lets you define your own actions and interfaces via custom recipes. See here for details and examples.