Annotate an HTML iframe

Hello !

First of all, bravo on prodigy, the tool looks amazing and I will, for sure, use it a lot in the future.

I have a question about how to annotate an HTML snippet using the choice interface.
I have a case where I want to pick one of 3 choices for each snippet.
I created a JSONL file that looks like this:

{"html": ">iframe src='https://www.qwant.com/?q=prodigy+ai&client=opensearch' width='990' height='950'>", "text": "prodigy ai", "answer": " ", "options": [{"id": 1, "text": "UP"}, {"id": 0, "text": "Down"}, {"id": 2, "text": "OK"}]}
{"html": ">iframe src='https://www.qwant.com/?q=prodigy&client=opensearch' width='990' height='950'>", "text": "prodigy", "answer": " ", "options": [{"id": 1, "text": "UP"}, {"id": 0, "text": "Down"}, {"id": 2, "text": "OK"}]}
{"html": ">iframe src='https://www.qwant.com/?q=prodigy+.+ai&client=opensearch' width='990' height='950'>", "text": "prodigy.ai", "answer": " ", "options": [{"id": 1, "text": "UP"}, {"id": 0, "text": "Down"}, {"id": 2, "text": "OK"}]}

(The >iframe is on purpose; <iframe would try to render inside the post otherwise.)

As a starting point, I wrote a custom recipe to visualise the iframe and the choices:

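# Imports were not shown in the original post; these are assumed.
from prodigy import recipe, recipe_args
from prodigy.components.loaders import JSONL

@recipe('ad', dataset=recipe_args['dataset'], source=recipe_args['source'], label=recipe_args['label'])
def addresse(dataset, source, label=''):
    return {
        'dataset': dataset,
        'view_id': 'choice',
        'stream': JSONL(source),
        'update': None,
        'batch_size': 25,
        'progress': None,
        'config': {'lang': 'fr', 'labels': label}
    }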

However, when I launch my recipe, I can see the choices, but the HTML is not displayed.
Am I doing something wrong?

I also have a bonus question: having seen your image segmentation example, do you think it would be doable to do the same with HTML objects?
Thank you,
Robin

Thanks for the report. Your recipe looks correct – I just tested it locally with your example and it turns out that there’s currently a small bug in the web app that mistakenly sanitizes HTML with settings that don’t allow iframes (which is obviously a bug and a regression we didn’t test for – very sorry about that :sweat:).

I already have a fix for this and it will be included in the next release.

Re image segmentation example: This depends on what you’re trying to do and which HTML elements you’re interested in. DOM elements are trickier than images, because their rendering depends on too many factors, and you can’t easily pinpoint exact, absolute (or even relative) coordinates in px, and expect them to mark the exact same area across viewports, browsers and devices. But depending on what you’re trying to do, maybe we can find a solution!

Edit: Forgot to mention this earlier, but once the bug mentioned above is fixed, you could even consider adding a custom HTML template to your recipe’s config. This might make your life a little easier, since you won’t have to duplicate the markup in each task.

Assuming your tasks look like this:

{"url": "https://...", "text": "prodigy", "options": [{"id": 1, "text": "UP"},{"id":0, "text": "Down"},{"id": 2, "text": "OK"}]}

You can then reference any task properties as Mustache variables, e.g. {{url}}:

'html_template': '<iframe src="{{url}}" width="990" height="950"></iframe>'

It even works for nested objects like {{url.prefix}} and stuff like that – although I haven’t tested this extensively yet.
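To illustrate what the variable lookup does, here is a rough Python sketch. It is not how Prodigy renders templates (that happens in the web app), and the render function is a made-up name for illustration only. It assumes simple {{name}} and dotted {{a.b}} tags and escapes values the way Mustache does for double-brace tags.

```python
import html
import re

def render(template, task):
    """Replace each {{name}} or {{a.b}} tag with the matching task value."""
    def lookup(match):
        value = task
        # Walk dotted names one key at a time, e.g. "url.prefix".
        for key in match.group(1).split("."):
            value = value[key]
        # Mustache HTML-escapes double-brace values by default.
        return html.escape(str(value))
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

template = '<iframe src="{{url}}" width="990" height="950"></iframe>'
task = {"url": "https://www.qwant.com/?q=prodigy&client=opensearch", "text": "prodigy"}
print(render(template, task))
```

Note that the & in the URL comes out as &amp; inside the src attribute, which is the correct way to write it in HTML.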

The HTML template strategy also means you won’t have to store all the random HTML markup with your tasks – when you export the annotations, they only contain the url property (which I assume is what you mostly care about in your case). And if you want to change your iframe template, for example to adjust the size or add formatting, you’ll only have to change it once in the template. But as I said, unfortunately this example currently doesn’t work because of the HTML sanitisation bug – but it will, once that’s fixed :blush:
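Putting this together, here is a minimal sketch of how the tasks and the recipe components could look. The helper names make_task and make_components are hypothetical, and the query parameters are simply copied from your example URLs. The dictionary mirrors the one returned by your recipe, with the template added to the config.

```python
import json
from urllib.parse import urlencode

OPTIONS = [{"id": 1, "text": "UP"}, {"id": 0, "text": "Down"}, {"id": 2, "text": "OK"}]

def make_task(query):
    """Build one task that stores only the URL, not the iframe markup."""
    params = urlencode({"q": query, "client": "opensearch"})
    return {
        "url": "https://www.qwant.com/?" + params,
        "text": query,
        "options": [dict(option) for option in OPTIONS],
    }

def make_components(dataset, stream):
    """Recipe components with the iframe markup defined once in the config."""
    return {
        "dataset": dataset,
        "view_id": "choice",
        "stream": stream,
        "config": {
            "html_template": '<iframe src="{{url}}" width="990" height="950"></iframe>',
        },
    }

tasks = [make_task(query) for query in ["prodigy ai", "prodigy"]]
for task in tasks:
    print(json.dumps(task))  # one JSON object per line, i.e. JSONL
```

In a real recipe, you would return the make_components result from the decorated recipe function and pass in your JSONL stream.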

Thank you for your thorough answer, this will help me a lot. I will definitely use the Mustache template in the future to make it easier to play around.

Good to know about the sanitization, looking forward to the next update then!

Regarding the segmentation, my case would be each result of the search (for example https://www.qwant.com/?q=prodi%20gy&t=web).
Since the positions of the results are all pretty standard, I think the positioning issue should not have too much of an impact.

No worries and sorry again about the bug!

Ah okay. The problem here is that – browser-specific differences aside – the responsive layout and site rendering depend on the viewport width. So simply adding absolute-positioned (or even relative-positioned) overlays on top like in the image segmentation view won’t work – as soon as you resize the browser window, those positions will change. And even if you do find a way to keep them consistent, I imagine that resolving them back to the original position later on might be a problem.

<iframe>s are a little tricky for this as well, because the page embedding the iframe has very little control over the iframe if the frame content is not hosted on the same domain (which is usually a good thing!). So in the long run, you might be better off streaming the search results from an API (if that’s possible), or saving out each page as a .html file and extracting content from there (although that seems pretty unsatisfying – but still better than taking screenshots).
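For the saved .html route, a rough sketch using only Python's standard library could look like this. It assumes each result is a plain link element, which is almost certainly too simple for a real results page, so you would need to adapt it to the actual markup. The LinkExtractor class, the extract_links helper and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"url": self._href, "text": text})
            self._href = None

def extract_links(page_html):
    parser = LinkExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.links

sample = (
    '<ul><li><a href="https://example.com/a">First result</a></li>'
    '<li><a href="https://example.com/b">Second <b>result</b></a></li></ul>'
)
print(extract_links(sample))
```

Each extracted dict could then become one annotation task, so you would be annotating individual results instead of regions on a rendered page.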

Annotating live websites is an interesting problem, though – I’ll definitely keep thinking about this!

Update: The <iframe> issue will be fixed in the upcoming Prodigy v0.3.0! :tada:

Awesome news !

By the way, while testing the upcoming version, I tried your example again. The iframe now renders correctly, but the site itself is not loaded. It looks like the X-Frame-Options response header on the site is now set to SAMEORIGIN, which basically means the page now blocks iframe embedding from other domains.

This might not matter to you if you’re running Prodigy on your internal network or from a whitelisted host, but I thought I’d give you a heads-up.
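If you want to check this for other pages before building tasks from them, here is a small sketch using only the standard library. The function names are hypothetical, and it only looks at the X-Frame-Options header; a site can also restrict framing via a Content-Security-Policy frame-ancestors directive, which this does not cover.

```python
from urllib.request import Request, urlopen

def blocks_cross_origin_framing(headers):
    """Return True if X-Frame-Options is set to DENY or SAMEORIGIN."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "x-frame-options":
            return value.strip().upper() in {"DENY", "SAMEORIGIN"}
    return False

def fetch_headers(url):
    """Fetch the response headers with a HEAD request (needs network access)."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return dict(response.headers.items())

print(blocks_cross_origin_framing({"X-Frame-Options": "SAMEORIGIN"}))
```

You could call blocks_cross_origin_framing(fetch_headers(url)) for each task URL and skip the ones that would show up as an empty frame.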