Creating a custom label review recipe to remove noise from the dataset

hi @nlp-guy,

Yep. To pre-fill the inputs, each label's key in the task needs to match the field_id of the corresponding text_input block, for example:

{"image": "data/image-arms/images.png", "arm_a": "needle_driver", "arm_b": "nan", "arm_c": "needle_driver", "arm_d": "cadiere_forceps"}

See this post for more details
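In case it's useful, here is a minimal sketch of how you could generate a file in that shape with plain Python. The names `FIELD_IDS`, `make_task` and `to_jsonl` are just illustrative helpers I made up (not part of Prodigy), and the record reuses the example values from above:

```python
import json

# The field_ids used by the text_input blocks; every task needs a key with the same name.
FIELD_IDS = ["arm_a", "arm_b", "arm_c", "arm_d"]


def make_task(image_path, labels):
    """Build one task dict whose keys match the text_input field_ids (hypothetical helper)."""
    missing = [field_id for field_id in FIELD_IDS if field_id not in labels]
    if missing:
        raise ValueError(f"missing labels for: {missing}")
    task = {"image": image_path}
    task.update({field_id: labels[field_id] for field_id in FIELD_IDS})
    return task


def to_jsonl(tasks):
    """Serialize tasks as JSONL text: one JSON object per line."""
    return "\n".join(json.dumps(task) for task in tasks)


# Example record, same values as the line shown above
tasks = [
    make_task(
        "data/image-arms/images.png",
        {"arm_a": "needle_driver", "arm_b": "nan", "arm_c": "needle_driver", "arm_d": "cadiere_forceps"},
    )
]
jsonl_text = to_jsonl(tasks)
print(jsonl_text)
```

You'd then write `jsonl_text` to your .jsonl file. The check in `make_task` is only there so a missing arm fails early instead of showing up as an empty input in the UI.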

Since you'll be loading a .jsonl file, you'll need the JSONL loader, plus the fetch_media preprocessor to load the images.

I wrote a modified version of the script above that assumes a .jsonl file like the one I showed above:

import prodigy
from prodigy.components.preprocess import fetch_media
from prodigy.components.loaders import JSONL

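@prodigy.recipe("data-review-recipe")
def data_review_recipe(dataset, image_file):
    # Load the tasks (image path plus pre-filled labels) from the .jsonl file
    stream = JSONL(image_file)
    # Load the image referenced by each task's "image" key;
    # skip=True skips tasks whose image can't be fetched
    stream = fetch_media(stream, ["image"], skip=True)
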
    # Tool names offered as suggestions in each text input
    tools = [
        "needle_driver",
        "monopolar_curved_scissor",
        "force_bipolar",
        "clip_applier",
        "tip_up_fenestrated_grasper",
        "cadiere_forceps",
        "bipolar_forceps",
        "vessel_sealer",
        "suction_irrigator",
        "bipolar_dissector",
        "prograsp_forceps",
        "stapler",
        "permanent_cautery_hook_spatula",
        "grasping_retractor",
        "nan",
        "blank"
    ]

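    # One image block plus one text input per arm;
    # each field_id matches a key in the .jsonl tasks so the value is pre-filled
    blocks = [
        {"view_id": "image"},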
        {"view_id": "text_input", "field_id": "arm_a", "field_placeholder": "Arm A", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_b", "field_placeholder": "Arm B", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_c", "field_placeholder": "Arm C", "field_suggestions": tools},
        {"view_id": "text_input", "field_id": "arm_d", "field_placeholder": "Arm D", "field_suggestions": tools},
    ]

    return {
        "view_id": "blocks",
        "config": {"blocks": blocks},
        "dataset": dataset,
        "stream": stream,
    }
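
Since the pre-filled values come straight from your file, it may also be worth checking them against the tool list before you start annotating. Here's a rough sketch using only the standard library; `find_unknown_labels` is a hypothetical helper (not a Prodigy function), the `TOOLS` set is a shortened subset of the list in the recipe, and the sample lines are made up for illustration:

```python
import json

# Shortened subset of the tool names from the recipe (use the full list in practice)
TOOLS = {"needle_driver", "cadiere_forceps", "nan", "blank"}
FIELD_IDS = ("arm_a", "arm_b", "arm_c", "arm_d")


def find_unknown_labels(lines, tools=TOOLS, field_ids=FIELD_IDS):
    """Return (line_number, field_id, value) for each pre-filled value that is not a known tool."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        task = json.loads(line)
        for field_id in field_ids:
            value = task.get(field_id)  # None if the key is missing, which is also reported
            if value not in tools:
                problems.append((line_no, field_id, value))
    return problems


# Made-up sample lines; the second one contains a typo in arm_a
sample = [
    '{"image": "a.png", "arm_a": "needle_driver", "arm_b": "nan", "arm_c": "needle_driver", "arm_d": "cadiere_forceps"}',
    '{"image": "b.png", "arm_a": "needle_drivr", "arm_b": "nan", "arm_c": "blank", "arm_d": "nan"}',
]
problems = find_unknown_labels(sample)
print(problems)
```

For a real file you could pass the open file object instead of `sample`, since iterating over it yields one line at a time.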

It seemed to work for me. Hope this helps!