Creating custom labels review recipe to remove noise from the dataset

hi @nlp-guy!

Very interesting project! Thanks for sharing, and for your questions.

Four different input panels (arms) may be challenging. The simplest would be to label one arm at a time. However, I bet you've already rejected that idea to avoid doing 4x annotations.

Another option is to stack four text input boxes vertically, one per arm, each with field suggestions. You'd use the open-ended text_input interface but add field_suggestions, which turns on auto-suggest and auto-complete. You can then tab between the boxes and fill in each category by auto-completing.

Here's the code of an example:

import prodigy
from prodigy.components.loaders import Images

@prodigy.recipe("data-review-recipe")
def data_review_recipe(dataset, images_path):
    # Load the images in images_path as a stream of annotation tasks
    stream = Images(images_path)
   
    tools = [
        "needle_driver",
        "monopolar_curved_scissor",
        "force_bipolar",
        "clip_applier",
        "tip_up_fenestrated_grasper",
        "cadiere_forceps",
        "bipolar_forceps",
        "vessel_sealer",
        "suction_irrigator",
        "bipolar_dissector",
        "prograsp_forceps",
        "stapler",
        "permanent_cautery_hook_spatula",
        "grasping_retractor",
        "nan",
        "blank"
    ]

    blocks = [
        {"view_id":"image"},
        {"view_id": "text_input", "field_id": "arm_a", "field_placeholder": "Arm A", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_b", "field_placeholder": "Arm B", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_c", "field_placeholder": "Arm C", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_d", "field_placeholder": "Arm D", "field_suggestions": tools},
    ]

    return {
        "view_id": "blocks",
        "config": {"blocks": blocks},
        "dataset": dataset,
        "stream": stream,
    }

A few downsides to this (there may be workarounds):

  • You can only select one field at a time.
  • You can still enter text other than these categories, which is a problem if you accidentally mistype something. Ideally you'd validate the fields afterwards (e.g., using the validate_answer callback) to ensure they only contain these categories. I found that using the auto-complete (press the down arrow) ensures it picks the closest match.
  • This could likely be improved with defaults/placeholders.
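
For the validation point above, here's a minimal sketch of what such a callback could look like. It assumes the recipe above, where each text_input block writes its value to the task under its field_id (arm_a to arm_d). The ARM_FIELDS name, the choice to only check accepted answers, and the error wording are my own assumptions, not something Prodigy prescribes.

```python
# Same categories as the `tools` list in the recipe above.
ALLOWED_TOOLS = {
    "needle_driver", "monopolar_curved_scissor", "force_bipolar",
    "clip_applier", "tip_up_fenestrated_grasper", "cadiere_forceps",
    "bipolar_forceps", "vessel_sealer", "suction_irrigator",
    "bipolar_dissector", "prograsp_forceps", "stapler",
    "permanent_cautery_hook_spatula", "grasping_retractor",
    "nan", "blank",
}

# field_id values used by the four text_input blocks in the recipe.
ARM_FIELDS = ("arm_a", "arm_b", "arm_c", "arm_d")


def validate_answer(eg):
    """Raise ValueError if an accepted task has an arm value outside ALLOWED_TOOLS."""
    # Assumption: rejected or ignored tasks don't need complete fields.
    if eg.get("answer") != "accept":
        return
    for field in ARM_FIELDS:
        # Missing or empty fields become "" and fail the check below.
        value = (eg.get(field) or "").strip()
        if value not in ALLOWED_TOOLS:
            raise ValueError(f"{field}: {value!r} is not one of the known tools")
```

To hook it up, you'd add "validate_answer": validate_answer to the dictionary the recipe returns. As far as I know, Prodigy then shows the error message in the UI when an answer fails the check, so a mistyped tool name gets caught before it's saved.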

If this doesn't work, your next option would likely be custom JavaScript. This post below has one idea: adding a checkbox so that only one category (arm) is shown at a time. For example, you could create four checkboxes (ARM A, ARM B, ARM C, ARM D), each one showing the input for its arm.

Not sure. Does it always show a duplicate image or a different image?

Could you have modified something in your prodigy.json? Perhaps open it (e.g., vim /path/to/prodigy.json) and double-check that you don't have any overrides.

Alternatively, this should work too if you want to reset your config overrides:

export PRODIGY_CONFIG_OVERRIDES="{}"
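
If you'd rather inspect things from Python, here's a rough sketch that collects the settings from the places I'd check first. It assumes the usual locations: a global prodigy.json in ~/.prodigy (or wherever PRODIGY_HOME points), a local prodigy.json in the current working directory, and the PRODIGY_CONFIG_OVERRIDES variable. The function name find_config_settings is made up for this example.

```python
import json
import os
from pathlib import Path


def find_config_settings(cwd=None, environ=None):
    """Return {source: settings} for every config source that sets something."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else Path(cwd)
    # Assumption: the global config lives in PRODIGY_HOME, defaulting to ~/.prodigy
    home = Path(environ.get("PRODIGY_HOME") or Path.home() / ".prodigy")

    found = {}
    for path in (home / "prodigy.json", cwd / "prodigy.json"):
        if path.is_file():
            found[str(path)] = json.loads(path.read_text(encoding="utf8"))

    # The overrides variable holds a JSON object; "{}" means no overrides.
    overrides = json.loads(environ.get("PRODIGY_CONFIG_OVERRIDES") or "{}")
    if overrides:
        found["PRODIGY_CONFIG_OVERRIDES"] = overrides
    return found
```

Calling find_config_settings() from the directory where you start Prodigy and printing the result should make it easy to spot a setting you didn't expect.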

Let me know if this persists and we can follow up, but I suspect it's something in your code somewhere.