My goal is to create a simple add-on for ner.teach that shows the top 5 Google results for the predicted entity span underneath the text. A similar idea was mentioned here: Is it possible to customize annotation UI?, and your advice was that doing something like this is generally not a good idea. However, it seems like a reasonable solution for our scenario: one of our goals is to discover new entities, the unstructured text is short and does not always give enough context to be sure that our NER model's prediction is correct, and we know we will be re-annotating often regardless. A quick glance at Google search results is often enough to confirm a prediction.
From reading Custom view templates with scripts, PRODIGY_README.html#custom-interfaces and a few other support tickets, it appears the only way to do this is to create a custom recipe that alters ner.teach to use "view_id": "html" instead of "view_id": "ner" (this loses the span highlighting, which I haven't figured out how to work around), plus a custom HTML template and some custom JavaScript that grabs the HTML <mark> on the prodigyupdate event and then embeds an iframe with the Google Search query including the marked span text. An additional feature would be the option to have the search results pop up on a button click (somewhat like @ines's example in Custom view templates with scripts).
Is this the correct approach to take? Am I overcomplicating this? Is it possible to retrieve the predicted entity span via Mustache, like you can retrieve the annotation task text with {{text}}?
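To make that last question concrete, here is a minimal sketch of reading the span text from the task data instead of from the rendered <mark>. It assumes the task has the usual shape, with a text string and a spans list holding character offsets in start and end. spanTexts is a hypothetical helper name, and how the current task is obtained in the browser (possibly something like window.prodigy.content) is an assumption I have not verified.

```javascript
// Hypothetical helper: return the text covered by each span in a task.
// Assumes task = { text: string, spans: [{ start, end, label }] },
// where start/end are character offsets and end is exclusive.
function spanTexts(task) {
    const spans = task.spans || [];
    return spans.map(span => task.text.slice(span.start, span.end));
}

// Example task, made up for illustration
const exampleTask = {
    text: 'Apple opened a new office in Berlin.',
    spans: [{ start: 0, end: 5, label: 'ORG' }]
};
console.log(spanTexts(exampleTask)); // [ 'Apple' ]
```

Working from the offsets would avoid depending on how the highlighted entity and its label happen to be nested in the DOM.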
EDIT:
I found a pretty ugly, but working JavaScript-only solution.
In my custom recipe, before return components, I have components['config']['javascript'] = script_text. At the top of the custom recipe file, I have:
# Read the custom JavaScript so it can be passed to Prodigy via the recipe config
with open('insert_google_search_results.js') as txt:
    script_text = txt.read()
And insert_google_search_results.js looks like this:
document.addEventListener('prodigyupdate', event => {
    // Remove the iframe left over from the previous task, if any
    const oldIframe = document.querySelector('#ifrm');
    if (oldIframe) {
        oldIframe.remove();
    }
    // Copy the first highlighted entity and drop its nested label <span>
    const mark = document.querySelector('mark');
    if (!mark) {
        return; // nothing highlighted, so nothing to search for
    }
    const span = mark.cloneNode(true);
    const label = span.querySelector('span');
    if (label) {
        label.remove();
    }
    const prediction = span.textContent.trim();
    console.log('The prediction is: ', prediction);
    // Escape the query so characters such as & or # survive in the URL
    const query = encodeURIComponent(prediction);
    const ifrm = document.createElement('iframe'); // create the iframe
    ifrm.setAttribute('id', 'ifrm'); // assign an id
    ifrm.setAttribute('src', `https://www.google.com/search?igu=1&ei=&q=${query}`);
    ifrm.setAttribute('height', '100%');
    ifrm.setAttribute('width', '100%');
    const prodigyContainer = document.querySelector('.prodigy-container');
    prodigyContainer.insertAdjacentElement('beforeend', ifrm);
});
Anyway, it works, although it really slows down the annotation flow, and Google asks you to solve a CAPTCHA every now and then because it assumes you're a bot.
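A possible variation is the button-click idea mentioned earlier: only create the iframe when the annotator asks for it, so a search request is sent only for the examples that need one. This is an untested sketch, not a confirmed fix for the CAPTCHA issue. buildSearchUrl, showSearchButton and the search-btn id are hypothetical names, and the selectors are the same ones my script above relies on.

```javascript
// Build the search URL for a span; encodeURIComponent escapes &, # and spaces.
function buildSearchUrl(spanText) {
    const query = encodeURIComponent(spanText.trim());
    return `https://www.google.com/search?igu=1&ei=&q=${query}`;
}

// Replace any previous button or iframe with a fresh button for this span.
// The iframe is only created once the button is clicked.
function showSearchButton(container, spanText) {
    container.querySelectorAll('#ifrm, #search-btn').forEach(el => el.remove());
    const button = document.createElement('button');
    button.id = 'search-btn'; // hypothetical id
    button.textContent = `Search for "${spanText}"`;
    button.addEventListener('click', () => {
        const ifrm = document.createElement('iframe');
        ifrm.id = 'ifrm';
        ifrm.src = buildSearchUrl(spanText);
        ifrm.width = '100%';
        ifrm.height = '100%';
        button.replaceWith(ifrm);
    });
    container.insertAdjacentElement('beforeend', button);
}

// Only attach the listener when running in a browser
if (typeof document !== 'undefined') {
    document.addEventListener('prodigyupdate', () => {
        const mark = document.querySelector('mark');
        const container = document.querySelector('.prodigy-container');
        if (!mark || !container) return;
        // Copy the highlighted entity and drop its nested label <span>
        const span = mark.cloneNode(true);
        const label = span.querySelector('span');
        if (label) label.remove();
        showSearchButton(container, span.textContent.trim());
    });
}
```

Keeping the URL construction in its own function also makes it easy to check the escaping separately, for example that buildSearchUrl('AT&T Inc') ends in q=AT%26T%20Inc.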