A good place to start is the recipes overview on the website, which lists all available recipes and commands with visual examples. The first steps guide also walks through basic usage of the database and datasets. You might also want to check out our video tutorials: even though they show different use cases, they can give you a better feeling for an end-to-end workflow using Prodigy.
Prodigy uses a simple JSON/dictionary format for the annotation tasks, which makes it easy to reuse the data in other processes. For example, if your input looks like this:
{"text": "Hello world"}
The annotated task will look like this:
{"text": "Hello world", "answer": "accept"}
You can find more details and examples of this in the "Annotation task format" section of your PRODIGY_README.html.
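Because each annotated task is just a dictionary with an added "answer" key, reusing the data elsewhere mostly comes down to reading JSON. Here is a minimal sketch, assuming the annotations have been exported as one JSON object per line; the sample records and the load_tasks helper are made up for illustration and are not part of Prodigy:

```python
import json
from collections import Counter

# Hypothetical exported annotations, one JSON object per line (JSONL).
exported = """\
{"text": "Hello world", "answer": "accept"}
{"text": "Goodbye world", "answer": "reject"}
{"text": "Hello again", "answer": "accept"}
"""


def load_tasks(jsonl_text):
    """Parse JSONL text into a list of task dictionaries."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


tasks = load_tasks(exported)

# Count how often each answer was given.
answer_counts = Counter(task["answer"] for task in tasks)

# Keep only the texts of accepted tasks, e.g. to feed them into another process.
accepted_texts = [task["text"] for task in tasks if task["answer"] == "accept"]
```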
This indicates that there's likely something going wrong with the database connection – i.e. the connection doesn't work, the tables are not created correctly or the data is not saved. (I'm surprised it didn't throw an error, though!)
Could you try running the same commands with the default SQLite database and see if this works as expected? And do you see any suspicious output on the command line or in the log?
Yes, the database issue above likely also explains why the exclude option is not working: the dataset is empty, so there are no examples to exclude. Under the hood, Prodigy assigns two hashes to each incoming task. One is for the input data (e.g. "text", "html" or the JSON-dumped task). The other is for the input data plus the features you're annotating (e.g. labels or spans). The second one is less relevant in your case, though, since you're only annotating the incoming HTML data.
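To make the difference between the two hashes more concrete, here is a rough sketch in plain Python. It is not Prodigy's actual implementation: the stable_hash and sketch_hashes helpers, the choice of SHA-1 and the key lists are all assumptions for illustration. It only shows the idea that the input hash depends on the raw input, while the task hash also depends on what is being annotated:

```python
import hashlib
import json


def stable_hash(value):
    """Hash any JSON-serializable value deterministically (illustrative only)."""
    dumped = json.dumps(value, sort_keys=True)
    return hashlib.sha1(dumped.encode("utf8")).hexdigest()


def sketch_hashes(task, input_keys=("text", "html"), task_keys=("label", "spans")):
    """Return (input_hash, task_hash) for a task dictionary."""
    # Use the first input key that is present, else fall back to the whole task.
    input_value = next((task[k] for k in input_keys if k in task), task)
    input_hash = stable_hash(input_value)
    # The task hash covers the input plus the features being annotated.
    features = {k: task[k] for k in task_keys if k in task}
    task_hash = stable_hash([input_hash, features])
    return input_hash, task_hash


# Same text, different label: same input hash, different task hash.
a = sketch_hashes({"text": "Hello world", "label": "GREETING"})
b = sketch_hashes({"text": "Hello world", "label": "OTHER"})

# HTML-only tasks with no extra features: identical input gives identical hashes.
c = sketch_hashes({"html": "<b>John Doe</b>"})
d = sketch_hashes({"html": "<b>John Doe</b>"})
```

With HTML-only tasks like the last two, both hashes are determined by the input alone, which is why the distinction matters less in that case.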
Yes, the easiest way to do this would be to create a custom HTML template. For example, let's say your data looks like this:
{"customer": {"first_name": "John", "last_name": "Doe"}, "master": {"first_name": "John", "last_name": "Doe"}}
You can then access the data as Mustache variables in your HTML template, including nested values. For example, something like this:
<table>
  <tr>
    <td>{{customer.first_name}}</td>
    <td>{{customer.last_name}}</td>
  </tr>
  <tr>
    <td>{{master.first_name}}</td>
    <td>{{master.last_name}}</td>
  </tr>
</table>
You can add the HTML template to the 'config' returned by your recipe, e.g.:
return {
    'config': {'html_template': HTML_TEMPLATE},
    # etc.
}
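Putting the pieces together, the returned dictionary could look roughly like the sketch below. The function name, dataset name and example stream are hypothetical, and in a real recipe this logic would live inside a function registered with the @prodigy.recipe decorator; it is left out here so the snippet runs without Prodigy installed:

```python
# Mustache-style template with nested variables, as in the table above.
HTML_TEMPLATE = """
<table>
  <tr><td>{{customer.first_name}}</td><td>{{customer.last_name}}</td></tr>
  <tr><td>{{master.first_name}}</td><td>{{master.last_name}}</td></tr>
</table>
"""


def compare_records_components(dataset, stream):
    """Build the dictionary of components a recipe function would return."""
    return {
        "dataset": dataset,   # dataset to save the annotations to
        "stream": stream,     # iterable of task dictionaries
        "view_id": "html",    # render each task with the HTML interface
        "config": {"html_template": HTML_TEMPLATE},
    }


# Hypothetical stream with one task in the nested format shown earlier.
example_stream = [
    {
        "customer": {"first_name": "John", "last_name": "Doe"},
        "master": {"first_name": "John", "last_name": "Doe"},
    }
]

components = compare_records_components("customer_matching", example_stream)
```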
Edit:
Yes, the same functionality should also be taken care of by the exclude logic. TASK_HASH_ATTR is simply a constant for "_task_hash", which you might have seen in your annotated examples. The main reason we're using the variable here is that it's a little cleaner than hard-coding the string. But as I said, you shouldn't have to worry about this. Instead, you can use one of the stream filter functions or the set_hashes() helper if you need to do hashing and filtering yourself (see the docs for the detailed API).
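If you do end up filtering by hand, the underlying idea is small enough to sketch in plain Python. This is not one of Prodigy's built-in filter functions: filter_seen is a hypothetical helper, and the value of HASH_KEY is an assumption about the string the constant stands for, so check it against your own annotated examples:

```python
# Assumption: the key under which the task hash is stored on each example.
HASH_KEY = "_task_hash"


def filter_seen(stream, seen_hashes, hash_key=HASH_KEY):
    """Yield only tasks whose hash is not in the set of already seen hashes."""
    for task in stream:
        # Tasks without a hash are passed through unchanged.
        if task.get(hash_key) not in seen_hashes:
            yield task


# Hashes of examples that are already in the dataset (made-up values).
already_annotated = {111, 222}

incoming = [
    {"html": "<b>A</b>", "_task_hash": 111},
    {"html": "<b>B</b>", "_task_hash": 333},
    {"html": "<b>C</b>", "_task_hash": 222},
]

remaining = list(filter_seen(incoming, already_annotated))
```

Writing the filter as a generator keeps it compatible with streams that are themselves generators, so nothing has to be loaded into memory up front.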