Customizing ner.teach recipe + annotation UI

My goal is to create a simple add-on for ner.teach that shows the top 5 Google results for the predicted entity span underneath the text. A similar idea was mentioned here: Is it possible to customize annotation UI?, and your advice was that doing something like this is generally not a good idea. However, it seems like a reasonable solution for our scenario: one of our goals is to discover new entities, the unstructured text is short and does not always give enough context to be 100% sure that our NER model's entity prediction is correct, and we know we will be re-annotating often regardless. A quick glance at Google search results is often enough to confirm a prediction.

From reading Custom view templates with scripts, PRODIGY_README.html#custom-interfaces and a few other support threads, it appears the only way to do this is to create a custom recipe that alters ner.teach to use "view_id": "html" instead of "view_id": "ner" (this loses the span highlighting, which I haven't figured out how to work around), plus a custom HTML template and some custom JavaScript that grabs the HTML <mark> on the prodigyupdate event and then embeds an iframe with a Google Search query for the marked span text. An additional feature would be the option to have the search results pop up on a button click (somewhat like @ines' example in Custom view templates with scripts). Is this the correct approach to take? Am I overcomplicating this? Is it possible to retrieve the predicted entity span via Mustache, like you can retrieve the text of the annotation task with {{text}}?

EDIT:
I found a pretty ugly, but working JavaScript-only solution.

In my custom recipe, before return components, I have components['config']['javascript'] = script_text. At the top of the custom recipe file, I have:

with open('insert_google_search_results.js') as txt:
    script_text = txt.read()

And insert_google_search_results.js looks like this :grimacing: :

document.addEventListener('prodigyupdate', event => {
    try {
        var old_iframe = document.querySelector('#ifrm');
        old_iframe.parentNode.removeChild(old_iframe);
    } catch(err) {
        // No iframes to remove
    }

    var span = document.querySelector('mark').cloneNode(true);
    span.removeChild(span.querySelector('span'));

    var prediction = span.textContent;
    console.log('The prediction is: ', prediction);
    // URL-encode the query so characters like & or # don't break the search URL
    var replaced = encodeURIComponent(prediction.trim());
    
    var ifrm = document.createElement('iframe'); // create the iframe
    ifrm.setAttribute('id', 'ifrm'); // assign an id

    var prodigyContainer = document.querySelector('.prodigy-container');

    ifrm.setAttribute('src', `https://www.google.com/search?igu=1&ei=&q=${replaced}`);
    ifrm.setAttribute('height', '100%');
    ifrm.setAttribute('width', '100%');

    prodigyContainer.insertAdjacentElement('beforeend', ifrm);
})

:hear_no_evil: Anyway, it works, although it really slows down the annotation flow, and Google asks you to do a CAPTCHA every now and then because it assumes you're a bot :man_facepalming:

I think the way you solved this is fine :slightly_smiling_face: If you're writing a fully custom HTML interface, you also have to do the span highlighting yourself – the task exposes the "spans" with their start and end index, and you can use that in your logic to generate the HTML. If you just insert the text, all you get is the raw text. But redoing all of this seems inefficient, because ultimately, all you want to do is add something to an existing interface.
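For reference, here's a minimal sketch of what doing the highlighting yourself could look like if you did go the fully custom HTML route. It only relies on the task's "text" and on each span's start and end index. The helper names (escapeHtml, highlightSpans), the optional "label" on each span and the exact <mark>/<span> markup are assumptions for illustration, not what the built-in ner interface actually renders, and it assumes the spans don't overlap:

```javascript
// Hypothetical helper: escape text so it can be safely inserted as HTML.
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap every span of a task in <mark> tags.
// Assumes non-overlapping spans with character offsets "start" and "end".
function highlightSpans(task) {
  // Sort a copy of the spans by start offset so the text can be walked left to right
  var spans = (task.spans || []).slice().sort(function (a, b) {
    return a.start - b.start;
  });
  var html = '';
  var cursor = 0;
  spans.forEach(function (span) {
    // Plain text between the previous span and this one
    html += escapeHtml(task.text.slice(cursor, span.start));
    html += '<mark>' + escapeHtml(task.text.slice(span.start, span.end));
    if (span.label) {
      html += ' <span>' + escapeHtml(span.label) + '</span>';
    }
    html += '</mark>';
    cursor = span.end;
  });
  // Remaining text after the last span
  return html + escapeHtml(task.text.slice(cursor));
}
```

You'd then have to get the resulting string into your custom template yourself, which is part of why extending the existing interface is probably less work.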

What you're trying to do here is get the text of the highlighted span, right? If so, your approach is kinda overcomplicating things, because you're reading it off the DOM, cloning elements etc. Prodigy exposes the current task data as window.prodigy.content, so you could also do something like this:

var task = window.prodigy.content
var span = task.spans[0]
var spanText = task.text.slice(span.start, span.end)

Also, instead of removing and re-adding the iframe, you should be able to just write to its .src, which should speed things up a little as well. I don't think you'll be able to work around the Google captcha, though, because in some way, you kind of are a bot :wink:
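Putting those two suggestions together, a rough sketch could look like the following. It reads the span text from window.prodigy.content and reuses a single iframe by writing to its .src. The helper name buildSearchUrl is made up for this example, the #ifrm id and the .prodigy-container selector are taken from your script, and I'm assuming window.prodigy.content already holds the new task when prodigyupdate fires, so treat it as a starting point rather than a tested solution:

```javascript
// Hypothetical helper: build the search URL for the first span of a task.
// Returns null if the task has no spans.
function buildSearchUrl(task) {
  if (!task || !task.spans || task.spans.length === 0) {
    return null;
  }
  var span = task.spans[0];
  var spanText = task.text.slice(span.start, span.end);
  // encodeURIComponent also handles spaces and characters like & or #
  return 'https://www.google.com/search?igu=1&ei=&q=' + encodeURIComponent(spanText);
}

// Browser-only part, guarded so the helper above can also be run outside a browser.
if (typeof document !== 'undefined') {
  document.addEventListener('prodigyupdate', function () {
    var url = buildSearchUrl(window.prodigy.content);
    if (!url) {
      return;
    }
    var iframe = document.querySelector('#ifrm');
    if (!iframe) {
      // Create the iframe once and reuse it for all later tasks
      iframe = document.createElement('iframe');
      iframe.setAttribute('id', 'ifrm');
      iframe.setAttribute('width', '100%');
      document.querySelector('.prodigy-container')
        .insertAdjacentElement('beforeend', iframe);
    }
    // Only the src changes between tasks
    iframe.src = url;
  });
}
```

Keeping the URL logic in its own function also makes it easy to check in the browser console without waiting for a new task.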

Truly speaking my language here :slight_smile:

So, I wrote a much cleaner looking implementation of the Google search idea, one that doesn't hit Google's servers too often and doesn't bog down the annotator - by adding a nice button to the annotation task!

The custom recipe:

import prodigy
from prodigy.recipes.ner import teach

with open('google_search.js') as txt:
    script_text = txt.read()

@prodigy.recipe('ner_teach_google',
    dataset=prodigy.recipe_args["dataset"],
    spacy_model=prodigy.recipe_args["spacy_model"],
    label=prodigy.recipe_args["label_set"],
    patterns=prodigy.recipe_args["patterns"],
)
def ner_teach_google(dataset, spacy_model, label, patterns):
    components = teach(dataset=dataset, spacy_model=spacy_model, patterns=patterns, label=label)
    components['config']['javascript'] = script_text
    return components

(I wish I didn't have to create a custom recipe for it, since the recipe isn't doing anything besides reading the JS file as a string and adding it to the components, but it's the cleanest way I've discovered so far.)

And the JS, google_search.js (:warning: I am not a JS dev and I'm sure this could be better!):

document.addEventListener('prodigyupdate', event => {
  
  if (document.getElementById("search-iframe")) {
    resetIframe();
  }

  var styleIframeVisible = 'width: 675px; display: block; border: 1px solid #ddd; margin: 40px auto 0 auto; min-height: 300px;';
  var styleIframeHidden = 'width: 675px; display: none';
  var container = document.querySelector('.prodigy-container');
  var buttonsContainer = document.querySelector('.prodigy-buttons');

  function onSearchClick(event) {
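    var markEl = document.querySelector('mark');
    if (!markEl) {
      return; // no highlighted span in the current task
    }
    // The first child node of <mark> is the span text (the label follows it)
    var mark = markEl.childNodes[0];
    var prediction = mark.textContent;
    // URL-encode the query so characters like & or # don't break the search URL
    var replaced = encodeURIComponent(prediction.trim());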

    if (!document.getElementById("search-iframe")) {
      addIframe();
    }
    updateIframeSrc(replaced);
  }

  function addSearchButton() {
    var button = document.createElement('button');
    button.innerHTML = '<svg width=40 height=40 y="0px" x="0px" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0 0 25 25" id="svg_1"><circle id="svg_3" r="6.532" cy="10.887" cx="10.887" stroke-width="2.3" stroke="white" fill-opacity="0" fill="#FFFFFF"/>  <line fill="none" stroke="white" stroke-width="2.3" x1="15.72656" y1="15.92109" x2="22.82031" y2="22.99021" id="svg_4" stroke-linejoin="undefined" stroke-linecap="undefined"/></svg>';
    button.setAttribute('id', 'search-button'); // assign an id
    button.setAttribute('style', 'width: 100px; height: 100px; border: 5px solid white; border-width: 4px 4px 0 2px !important; background-color: lightskyblue; text-align: center;');
    buttonsContainer.insertAdjacentElement('beforeend', button);
    button.addEventListener('click', onSearchClick);
  }

  function addIframe() {
    var ifrm = document.createElement('iframe');
    ifrm.setAttribute('style', styleIframeHidden);
    ifrm.setAttribute('id', 'search-iframe');
    container.insertAdjacentElement('afterend', ifrm);
  }

  function updateIframeSrc(text) {
    var iframe = document.querySelector('#search-iframe');
    iframe.setAttribute('style', styleIframeVisible);
    iframe.setAttribute('src', `https://www.google.com/search?igu=1&ei=&q=${text}`);
  }

  function resetIframe() {
    var iframe = document.querySelector('#search-iframe');
    iframe.setAttribute('style', styleIframeHidden);
    iframe.setAttribute('src', '');
  }

  if (!document.getElementById("search-button")) {
    addSearchButton();
  }

  if (!document.getElementById("search-iframe")) {
    addIframe();
  }

});

When you click on the little blue search box, an iframe pops up with the search results from the highlighted span, and when you accept/reject/skip the annotation, the iframe hides itself!

So far it's proving to be a neat addition to my annotation UI that helps add a little additional context for annotators who are not domain experts :confetti_ball:

Thanks again for all your work on Prodigy and your quick replies here in the forums @ines!

Wow, that looks really elegant :ok_hand: Glad to hear you got it to work nicely.

In theory, you could add it to your prodigy.json as well – but that's even less convenient. I definitely see your point about the recipe wrapper, but I can't think of a better way at the moment... at least with a recipe script, you can make it specific to a given recipe, add custom Python if needed, easily put it in version control and send it to others so they can try it out.

at least with a recipe script, you can make it specific to a given recipe, add custom Python if needed, easily put it in version control and send it to others so they can try it out.

Very good point, it's likely the recipe will continue to evolve, and thus having it as a custom recipe will be better in the long run.

Thanks again :+1:
