Custom view templates with scripts

front-end
enhancement
done

(Justin Du Jardin) #1

I’m having a difficult time getting Prodigy to allow custom scripts in my recipe-specific templates, e.g.

<div class="button">Custom Stuff</div>
<script>
  alert("Yay!");
</script>

There’s a React GitHub discussion about it, but the TL;DR is that a script added via ele.innerHTML = '<script>...</script>'; never gets executed.

I saw a suggestion that maybe static/index.html could be modified to include a script, but I don’t think that’s a very workable solution (unless I misunderstand) because you’d have to copy the files into or out of site-packages.

Perhaps a new recipe option to specify a single script file that will be injected into the base page, when combined with an html_template, e.g.

with open('recipes/textcat_eval.html') as txt:
    template_text = txt.read()
with open('recipes/textcat_eval.js') as txt:
    script_text = txt.read()

@recipe('textcat.eval',...)
def evaluate(...):
    ...
    return {
        'view_id': 'html',
        'config': {
            'html_template': template_text,
            'html_script': script_text,
        }
    }

I’m interested in generating more than one example from a single example being reviewed, so I would like to add new interactions that let the user give slightly more feedback before a yes/no. This should allow me to generate multiple examples for the DB when they do finally accept/reject the example.

I can’t really approach this kind of problem without being able to include scripts. I’m not blocked on this now, but I want to bring it up for a discussion because I think allowing scripting will lead to much more interesting custom templates.


(Ines Montani) #2

Thanks – this is actually something I’ve been thinking about a lot, and I still don’t have a perfect solution. The main problem is that in order to interact with the current annotation task that’s displayed, user scripts would also need to be able to call the pre-compiled app’s internals (more specifically, dispatch an action). This is pretty difficult.

Because Prodigy is a developer tool and only intended to be hosted internally, preventing cross-site scripting isn’t such a high priority here. So executing random user code wouldn’t be a problem either – but it also wouldn’t be very useful.

I did experiment with a few possible solutions, including some crazy stuff like evaling raw JSX with Babel. To my surprise, it worked – but it still feels very wrong.

In theory, we could also allow users to add custom hooks to their markup to tell Prodigy how to pass values forward (like <input data-prodigy-field="user_text">). But markup added via dangerouslySetInnerHTML is essentially a black box, so it would always require the front-end to actually parse the HTML and convert it to React elements. This seems doable – I’m just not sure if it’s a reasonable approach. However, it still feels more natural to me than reinventing the wheel and coming up with our own templating API – like, an arbitrary JSON format or something like that. After all, this API already exists: it’s HTML and JavaScript.

Another question is how opinionated we want the Prodigy app to be, and to what extent the front-end should be allowed to modify the incoming data. Allowing it to freely edit annotation tasks is potentially problematic – a small bug or inconsistency can easily ruin an entire dataset without the user even noticing. So initially, the front-end’s editing privileges were limited to the "answer" property only. (With the choice and manual NER interface, we ended up adding "accept" and "spans" as well.)

Anyway, I’m curious to hear your thoughts on this! (Here’s a related discussion on user input fields in Prodigy btw.)


(Justin Du Jardin) #3

One trick I’ve used in the past is to place functions on the root object from some appropriate place in my app, e.g. a singleton or app component init:

window.prodigy = {
  addExample: (...) => {
    this.store.dispatch(...);
  }
}

By using global functions, you can hide your app/framework internals from the consumer of the API. The implementation is also relatively straightforward, so it could be an easy way to dip your toes into exposing an API without building up too much technical debt.
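
A minimal sketch of that pattern, assuming a Redux-style store with a dispatch method. The store, the ADD_EXAMPLE action type and the addExample signature are all hypothetical and only illustrate the idea; they are not Prodigy's actual internals.

```javascript
// Hypothetical stand-in for the app's internal store (not Prodigy's real one).
function createStore() {
  const state = { examples: [] };
  return {
    getState: () => state,
    dispatch(action) {
      if (action.type === 'ADD_EXAMPLE') {
        state.examples.push(action.payload);
      }
    },
  };
}

// Called once from app init: expose a small public API on the global object
// so user scripts never touch the store directly.
function exposeApi(root, store) {
  root.prodigy = {
    addExample(example) {
      store.dispatch({ type: 'ADD_EXAMPLE', payload: example });
    },
  };
}

const store = createStore();
exposeApi(globalThis, store); // globalThis is window in a browser

// A user script only sees the global function:
globalThis.prodigy.addExample({ text: 'hello', answer: 'accept' });
console.log(store.getState().examples.length);
```

Because the closure captures the store, the internals can be refactored freely as long as the global function keeps its signature.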

Nice! Sometimes hacks are needed for building compelling tools. :sweat_smile: There was also a post on that React thread that suggested a less hacky approach using createContextualFragment.

I would be very opinionated about the data that comes into your system and leave the frontend as open for hacking as you can tolerate. This way you can make guarantees about not letting a user screw up the dataset by doing something stupid while also letting them execute basic project-specific scripts/hacks/workarounds/prototypes inside the prodigy frame.

Perhaps something like JSON Schema could help? You define schema/s for the types of JSON data your system uses, and you can use them to validate objects before they touch your data layer. There are implementations in most major languages, and modern IDEs can usually pick up on the schemas for inline validation.

Here’s what a text annotation schema could look like:

{
  "title": "TextAnnotation",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "text": { "type": "string" },
    "answer": {
      "type": "string",
      "enum": ["accept", "reject", "ignore"]
    },
    "meta": { "type": "object" },
    "_input_hash": { "type": "number" },
    "_task_hash": { "type": "number" }
  },
  "required": ["text", "answer", "_input_hash", "_task_hash"]
}

You could allow for user-specific data by adding something like a “user” field to your schema and lock down the rest of the object. When an object fails to validate, the specific properties that failed are available for generating helpful error messages. If it proved useful you could even apply it to validation during other prodigy things like db-in or prodigy.json.
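
To make the "which properties failed" part concrete, here is a rough sketch of validating a task before it reaches the data layer. It is a hand-rolled check that only covers the features used in the schema above (required, enum, basic types, no additional properties), not a real JSON Schema implementation; in practice you would probably reach for an existing validator library. The validate and typeOf helpers are made up for this example.

```javascript
// Mirrors the TextAnnotation schema above; only the parts this sketch checks.
const textAnnotationSchema = {
  required: ['text', 'answer', '_input_hash', '_task_hash'],
  properties: {
    text: { type: 'string' },
    answer: { type: 'string', enum: ['accept', 'reject', 'ignore'] },
    meta: { type: 'object' },
    _input_hash: { type: 'number' },
    _task_hash: { type: 'number' },
  },
};

// typeof reports null and arrays as 'object', so separate them out.
function typeOf(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
}

// Returns a list of human-readable errors; an empty list means valid.
function validate(schema, obj) {
  const errors = [];
  for (const key of schema.required) {
    if (!(key in obj)) errors.push(`${key}: missing required property`);
  }
  for (const [key, value] of Object.entries(obj)) {
    const rule = schema.properties[key];
    if (!rule) {
      errors.push(`${key}: additional property not allowed`);
    } else if (typeOf(value) !== rule.type) {
      errors.push(`${key}: expected ${rule.type}, got ${typeOf(value)}`);
    } else if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${key}: must be one of ${rule.enum.join(', ')}`);
    }
  }
  return errors;
}

const good = { text: 'hi', answer: 'accept', _input_hash: 1, _task_hash: 2 };
const bad = { text: 'hi', answer: 'maybe', _input_hash: 1, extra: true };

console.log(validate(textAnnotationSchema, good));
console.log(validate(textAnnotationSchema, bad));
```

The second call reports the missing _task_hash, the invalid answer and the unexpected extra property, which is the information you would surface in an error message.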


(Ines Montani) #4

Thanks – this is a really nice solution! (No idea why I hadn’t thought of this earlier.) I just tested it and it actually works pretty well: here’s a quick proof of concept :tada:

I’ve also been refactoring the web app to use a simpler scheme of props passed down to the annotation interface components. There will now only be four, all of which could be exposed via window.prodigy:

| Prop | Type | Description |
| --- | --- | --- |
| config | object | The Prodigy configuration – includes everything defined in the prodigy.json files and recipe config (minus sensitive data), plus dataset meta. |
| content | object | The annotation card’s content, i.e. the current annotation task. |
| update | function | Function to update the current annotation task. Takes an object of the top-level properties to overwrite (currently no deep merging). |
| answer | function | Function to answer an annotation task and trigger all respective actions. Takes a string of the answer, e.g. 'accept'. Should be used cautiously and only in special cases like the "choice_auto_accept" – if possible, answering should only happen via the action buttons, not the annotation card interface. |

So assuming Prodigy allows you to add a custom script (which will be fairly trivial to implement), this would let you build pretty complex, custom interfaces and even powerful manual annotation modes. You could even introduce your own config parameters. It still wouldn’t fully solve your custom use case of creating multiple examples from one task, but we could probably add another function for this.
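
One practical consequence of update only overwriting top-level properties (no deep merging) is easy to miss, so here is a small sketch. It assumes the update behaves like a plain object spread; applyUpdate is a hypothetical helper written for this example, not part of Prodigy.

```javascript
// Sketch of a shallow (top-level) merge, as described for update.
function applyUpdate(content, changes) {
  return { ...content, ...changes };
}

const task = {
  text: 'Some text',
  meta: { source: 'news', score: 0.5 },
};

// Overwriting meta replaces the whole object: source is lost.
const replaced = applyUpdate(task, { meta: { score: 0.9 } });

// To keep nested values, merge them yourself before updating.
const merged = applyUpdate(task, { meta: { ...task.meta, score: 0.9 } });

console.log(replaced.meta); // { score: 0.9 }
console.log(merged.meta);   // { source: 'news', score: 0.9 }
```

So a custom script that wants to change one nested value would need to copy the rest of that nested object first.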

Now, if we also rerender the Mustache template when the content is updated, you’ll end up with a pretty React-like experience in your custom components. For example:

{{#spans}}<span onClick="removeSpan('{{id}}')">{{text}}</span>{{/spans}}

function removeSpan(id) {
    // here we assume that each span has a unique ID assigned
    const newSpans = window.prodigy.content.spans.filter(span => span.id != id)
    window.prodigy.update({ spans: newSpans })
}

Prodigy could even dispatch custom events that your interface could listen to – for example, to trigger additional actions when a question is answered:

document.addEventListener('prodigyanswer', ev => {
    console.log('The answer was: ', ev.answer)
})

This is a good idea! I’ve always been a little unhappy about the hacky and rudimentary validation on the front-end, so using a proper validation scheme might actually make things a lot easier. I’ll look into this!

Anyway, thanks for your great input – I feel like this discussion was super productive, and I can’t wait to start playing around with the custom script API. (If you’re interested in beta testing it, I could get in touch once I have a working version?)


(Matthew Honnibal) #5

Woo! This is super cool – really breaks down one of the current limitations. Also, definitely +1 on the schemas… The lack of structure in the data is a hassle in the backend as well, so having some validation would help.


(Ines Montani) #6

Update: All works pretty smoothly already! :tada:

[image: custom_js_poc]

The above example only needed the following html_template and javascript config:

<button class="custom-button" onClick="updateText()">
    👇 Text to uppercase
</button>
<br />
<strong>{{text}}</strong>

let upper = false;

function updateText() {
    const text = window.prodigy.content.text;
    const newText = !upper ? text.toUpperCase() : text.toLowerCase();
    window.prodigy.update({ text: newText });
    upper = !upper;
    document.querySelector('.custom-button').textContent = '👇 Text to ' + (upper ? 'lowercase' : 'uppercase');
}

Things I tested in the above example that already work:

  • Dynamic update of the annotation task’s content in the HTML template.
  • Keeping arbitrary state within the custom JavaScript – pretty rudimentary here, but could easily be expanded. (Still need to test ways to keep cross-task state.)
  • Accessing elements in custom HTML components via the DOM – alternatively, I think you could also pass this to updateText(), but working with query selectors is often cleaner. (Nice plus: class names are unlikely to conflict with built-ins, because Styled Components mangles them when the Prodigy app is compiled.)

I’ve also added a user_config option to the config parameters passed in via the recipe and prodigy.json. This allows custom interfaces to refer to arbitrary config via window.prodigy.config.userConfig.

Edit: Dispatching custom events works as well. So you’ll also be able to do this:

document.addEventListener('prodigyanswer', ev => {
    console.log('The answer was: ', ev.detail)
});

I didn’t realise you couldn’t write to the top-level event and had to use event.detail. So we might want to change it slightly to always expose an object here.
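
A small sketch of what "always expose an object" could look like. The { answer: ... } shape of the payload is an assumption made for this example; the thread does not fix the final format. It uses a plain EventTarget in place of document so it also runs outside the browser.

```javascript
// CustomEvent is a browser global and also exists in recent Node versions;
// fall back to a tiny stand-in built on Event where it is missing.
const CustomEventImpl = typeof CustomEvent === 'function'
  ? CustomEvent
  : class extends Event {
      constructor(type, init = {}) {
        super(type);
        this.detail = init.detail;
      }
    };

// In the browser this would be document; a plain EventTarget works the same.
const target = new EventTarget();
const received = [];

target.addEventListener('prodigyanswer', (ev) => {
  // The payload is read from ev.detail, not from the event itself.
  received.push(ev.detail.answer);
});

// Wrapping the payload in an object leaves room for more fields later.
target.dispatchEvent(
  new CustomEventImpl('prodigyanswer', { detail: { answer: 'accept' } })
);

console.log(received); // [ 'accept' ]
```

With an object in detail, listeners written today keep working if more fields (for example the task itself) are added to the payload later.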


(Justin Du Jardin) #7

Wow, that’s some otherworldly turnaround time. Awesome! :clap:

Yeah, I’m definitely interested in testing anything you come up with!

How are you loading the user script? Rendered dynamically mixed in with the annotation template, or in the head of the document once when prodigy starts up? I suppose either way you could store cross-task state on the window object if no one is looking. :sweat_smile:

:clap:

If you’re looking at JSON Schema, I always find the TypeScript tsconfig.json schema to be a good reference so I don’t have to go spelunking into the spec itself.


(Ines Montani) #8

Cool, thanks! We might even ship this as an undocumented, experimental feature with the next release – it’s currently limited to the scope of the html interface anyway, so I don’t see much harm in including it.

Appending a <script> tag to the body on mount – so cross-task state shouldn’t actually be a problem. I just tested it by adding a simple counter to the script and it all seems to work fine. The window trick is a good idea, too – you could even go one step further and use the localStorage if you need more persistent state.
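
Roughly what such a counter could look like, with the optional persistence step included. This is only a sketch: createTaskCounter, the taskCount key and the in-memory storage stand-in are made up for the example. In the browser you would pass window.localStorage instead of the stand-in.

```javascript
// Hypothetical helper: a counter that lives as long as the script does and
// mirrors its value into a localStorage-like object.
function createTaskCounter(storage, key = 'taskCount') {
  // getItem returns null for unknown keys, which falls back to 0 here.
  let count = Number(storage.getItem(key)) || 0;
  return {
    increment() {
      count += 1;
      storage.setItem(key, String(count));
      return count;
    },
    get value() {
      return count;
    },
  };
}

// Stand-in with the same getItem/setItem shape as window.localStorage,
// so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => data.set(k, String(v)),
  };
}

const storage = createMemoryStorage();
const counter = createTaskCounter(storage);
counter.increment();
counter.increment();

// A "reloaded" script picks up where the previous one stopped.
const afterReload = createTaskCounter(storage);
console.log(afterReload.value); // 2
```

Without the storage part, a plain top-level variable in the script is enough for state that only has to survive between tasks in one session.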

Thanks! This actually looks like something I’ll really enjoy writing, haha :sweat_smile:


(Ines Montani) #9

Another use case I just tested which will now be possible: user inputs! :+1: (However, I’d only recommend using free-form text inputs if it’s absolutely necessary for the task – it’s easy to overcomplicate the UI and start re-building surveys, which will make the whole annotation flow a lot less efficient.)

<input type="text" class="input" placeholder="User text here..." />
<button onClick="updateFromInput()">Update</button>
<br />
{{user_text}}

function updateFromInput() {
    const text = document.querySelector('.input').value;
    window.prodigy.update({ user_text: text });
}

(Small detail: I was surprised to find out that Prodigy’s keyboard shortcuts don’t actually propagate in the inputs added here – maybe it’s because they’re not part of the virtual DOM? I might need to investigate this further, just to make sure there are no weird edge cases. Not being able to type a or space would definitely be a big problem.)

Another thing that might be interesting to explore is range inputs with pre-defined labels, steps and values.


(Ines Montani) #10

Another small update: The theme settings are now also exposed as window.prodigy.theme and the {{theme}} variable within the HTML template. This includes the default colours and settings, and/or any modifications made by the user via the customTheme config.

I thought this could be pretty nice, because it’ll let you style custom interfaces more consistently, and keep all theming variables in one place. You can also add your own properties to the custom theme, e.g. "customTheme": {"customColor": "blue"}.

Example use case 1: in HTML

<mark style="background: {{theme.bgHighlight}}">{{label}}</mark>

Example use case 2: in CSS in <style> tag

<input required type="text" class="input" />

<style>
.input:invalid {
    border-color: {{theme.reject}};
}
</style>

Example use case 3: in JavaScript

function updateText() {
    const input = document.querySelector('.input');
    input.style.borderColor = window.prodigy.theme.accept;
    window.prodigy.update({ text: input.value });
}

(Had Seddiqi) #11

I want to implement something to help me be quicker for text classification. Simply, I just want to highlight certain keywords so I can visually jump to them for the classification of a long sentence or multiple sentences.

I tried generating the HTML in my data file, but the data is shown as-is (no rendering). I then added a column for the word to highlight, so I could modify it after the page loaded using a little JavaScript, but it turns out templates can’t be used inside the javascript config.

Any suggestions?

(I realize however that maybe I can chop my data records up a little more, that might be the way I’ll go, but curious about what else could be done here without changing the data processing part.)


(Ines Montani) #12

@hadsed If you just want to highlight certain words, you might not even need a custom template. You could simply add a "spans" property to the annotation tasks (just like when you’re annotating named entities). For example:

{
    "text": "Super long text",
    "label": "SOME_LABEL",
    "spans": [
        {"start": 20, "end": 30},
        {"start": 100, "end": 110}
    ]
}

The only thing that’s important here is that you need to know the character offsets of your keywords – but those should be pretty easy to generate. If a text classification task has spans assigned, Prodigy will highlight those inline. You could even add a "label" to the span.
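
As a rough sketch of generating those offsets, the helper below finds every occurrence of each keyword and returns start/end pairs. findKeywordSpans is a made-up name, the matching is a simple case-insensitive substring search, and it assumes the end offset is exclusive (as with string slicing). The sketch is in JavaScript for illustration; the same logic carries over to whatever language your preprocessing uses.

```javascript
// Find every occurrence of each keyword and return character offsets
// in the {"start", "end"} shape used by the task above (end is exclusive).
// Assumes lowercasing does not change the length of the text.
function findKeywordSpans(text, keywords) {
  const spans = [];
  const haystack = text.toLowerCase();
  for (const keyword of keywords) {
    const needle = keyword.toLowerCase();
    if (!needle) continue; // skip empty keywords to avoid an endless loop
    let index = haystack.indexOf(needle);
    while (index !== -1) {
      spans.push({ start: index, end: index + needle.length });
      index = haystack.indexOf(needle, index + needle.length);
    }
  }
  // Sort so spans appear in reading order.
  return spans.sort((a, b) => a.start - b.start);
}

const text = 'The cat sat. A cat ran.';
const task = {
  text,
  label: 'SOME_LABEL',
  spans: findKeywordSpans(text, ['cat', 'ran']),
};
console.log(JSON.stringify(task.spans));
```

Note that a substring search also matches inside longer words ("cat" in "category"), so real data may need a word-boundary check on top of this.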

That said, if you do want to use a custom template for more flexibility, make sure to set 'view_id': 'html' in your (custom) recipe, and add the HTML as the task’s "html" key. You can still add a "text" key that contains the plain text – this makes sense if you’re annotating with a model in the loop, or later want to use the annotations to train a model. The text classifier for example will then be trained on the "text" (not the HTML containing your custom markup).

{
    "text": "some text",
    "html": "<strong>some text</strong>"
}

Btw, the custom JavaScript examples I outlined in this thread already work! They’re just not documented yet, because they’re very experimental. So if you have a look at my examples above, you can see how to add custom JavaScript (via the recipe’s "javascript" config setting) and how to access the current task data and UI actions from within your script.


(Ines Montani) #13

A post was merged into an existing topic: Custom HTML