Hello, I need to import a bunch of decomposed sentences into Prodigy for human correction before sending them off to a model. I'm in the middle of figuring out a workflow for this: import an LLM's output, correct it in Prodigy, then send the corrections/updates to a model.
Question: the Prodigy spancat format is flat, correct? My original format is nested, which I think is easier for me to digest.
I can write a script to flatten the list, as shown below. Or can Prodigy read nested lists, as the graphical interface seems to imply? I'm very new to Prodigy; I bought the license last week.
Any help or push in the right direction is appreciated.
import json
# Nested JSON input
nested_json = [
    {
        "term": "cond",
        "text": "When X is between 7 and 12 inches",
        "pos": [0, 33],
        "details": {
            "exp": {
                "text": "X is between 7 and 12 inches",
                "pos": [5, 33],
                "details": {
                    "param": "X",
                    "value": [7, 12],
                    "unit": "inches"
                }
            }
        }
    },
    {
        "term": "subj",
        "text": "the system",
        "pos": [35, 44]
    },
    {
        "term": "pred",
        "text": "send a class Z type message to the mix bus",
        "pos": [46, 88],
        "details": {
            "act": {
                "text": "send",
                "pos": [46, 49]
            },
            "obj": {
                "text": "a class Z type message",
                "pos": [51, 72]
            },
            "dest": {
                "text": "to the mix bus",
                "pos": [74, 88]
            }
        }
    }
]
# Function to extract spans
def extract_spans(nested_json):
    spans = []
    for item in nested_json:
        # Top-level term (cond, subj, pred) becomes its own span
        term = item['term'].upper()
        spans.append({"start": item["pos"][0], "end": item["pos"][1], "label": term})
        # First level of nested details becomes additional spans
        if 'details' in item:
            for key, value in item['details'].items():
                label = key.upper()
                spans.append({"start": value["pos"][0], "end": value["pos"][1], "label": label})
    return spans
# Extract spans
spans = extract_spans(nested_json)
# Create Prodigy compatible JSON
prodigy_json = {
    "text": "When X is between 7 and 12 inches, the system shall send a class Z type message to the mix bus",
    "spans": spans
}
# Output the Prodigy formatted JSON (wrapped in a list, one entry per sentence)
print(json.dumps([prodigy_json], indent=2))
Outputs:
[
  {
    "text": "When X is between 7 and 12 inches, the system shall send a class Z type message to the mix bus",
    "spans": [
      {
        "start": 0,
        "end": 33,
        "label": "COND"
      },
      {
        "start": 5,
        "end": 33,
        "label": "EXP"
      },
      {
        "start": 35,
        "end": 44,
        "label": "SUBJ"
      },
      {
        "start": 46,
        "end": 88,
        "label": "PRED"
      },
      {
        "start": 46,
        "end": 49,
        "label": "ACT"
      },
      {
        "start": 51,
        "end": 72,
        "label": "OBJ"
      },
      {
        "start": 74,
        "end": 88,
        "label": "DEST"
      }
    ]
  }
]
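One thing I'm not sure about is whether the LLM's offsets actually line up with the full sentence, since the decomposed pieces leave out "shall". Here is a minimal sketch of a check I could run before importing. It assumes the spans should use Python-style, end-exclusive character offsets, so that text[start:end] gives the span text (that is my understanding of what Prodigy expects, but I have not confirmed it). The helper names flatten and mismatches are just ones I made up, and the sample is a shortened copy of the data above.

```python
def flatten(items, spans=None):
    """Recursively collect every node that carries a "pos" pair.

    `items` is an iterable of (label, node) pairs, so the same function
    handles the top-level list and any depth of nested "details".
    """
    spans = [] if spans is None else spans
    for label, node in items:
        if not isinstance(node, dict):
            continue  # plain values such as "param": "X" carry no offsets
        if "pos" in node:
            start, end = node["pos"]
            spans.append({"start": start, "end": end,
                          "label": label.upper(),
                          "expected": node.get("text")})
        if isinstance(node.get("details"), dict):
            flatten(node["details"].items(), spans)
    return spans


def mismatches(text, spans):
    """Return the spans whose slice of `text` differs from the LLM's text."""
    return [s for s in spans if text[s["start"]:s["end"]] != s["expected"]]


text = ("When X is between 7 and 12 inches, the system shall send "
        "a class Z type message to the mix bus")
nested = [
    {"term": "cond", "text": "When X is between 7 and 12 inches", "pos": [0, 33],
     "details": {"exp": {"text": "X is between 7 and 12 inches", "pos": [5, 33],
                         "details": {"param": "X", "value": [7, 12], "unit": "inches"}}}},
    {"term": "subj", "text": "the system", "pos": [35, 44]},
    {"term": "pred", "text": "send a class Z type message to the mix bus", "pos": [46, 88]},
]

spans = flatten((item["term"], item) for item in nested)
bad = mismatches(text, spans)
for s in bad:
    print(s["label"], repr(text[s["start"]:s["end"]]), "!=", repr(s["expected"]))
```

On this sample the check flags SUBJ and PRED: text[35:44] comes out as "the syste", and text[46:88] starts at "shall" rather than "send". So it looks like some of my end offsets are inclusive and some were computed against a sentence without "shall", which I would need to fix before importing.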