Accessing custom metadata when editing annotations derived from model predictions

This is similar to another ongoing query, but different enough that I thought it probably deserves a separate enquiry.

I'm looking through the annotations that Prodigy is creating after I run a custom recipe which uses a model to predict annotations.

Normally when I run an image.manual custom recipe, the JSON entry generated looks something like this (when exported from the database):

{
  "image": "sample/path/redacted/04-F-0269_Global_Screening_Guidance-03.jpg",
  "text": "04-F-0269_Global_Screening_Guidance-03",
  "meta": { "file": "04-F-0269_Global_Screening_Guidance-03.jpg" },
  "path": "sample/path/redacted/04-F-0269_Global_Screening_Guidance-03.jpg",
  "_is_binary": false,
  "_input_hash": 1413334570,
  "_task_hash": 1588323116,
  "_view_id": "image_manual",
  "width": 800,
  "height": 1035,
  "spans": [
    {
      "id": "0ef6ccd0-4a79-471d-9aa1-9c903c83801e",
      "label": "CONTENT",
      "color": "yellow",
      "x": 76.5,
      "y": 112.5,
      "height": 786.1,
      "width": 587.6,
      "center": [370.3, 505.55],
      "type": "rect",
      "points": [
        [76.5, 112.5],
        [76.5, 898.6],
        [664.1, 898.6],
        [664.1, 112.5]
      ]
    },
    {
      "id": "cd05d521-8efb-416b-87df-4624f16ca7f3",
      "label": "REDACTION",
      "color": "cyan",
      "x": 80.3,
      "y": 786.2,
      "height": 20.2,
      "width": 428.4,
      "center": [294.5, 796.3],
      "type": "rect",
      "points": [
        [80.3, 786.2],
        [80.3, 806.4],
        [508.7, 806.4],
        [508.7, 786.2]
      ]
    },
    {
      "id": "3e268e33-4eba-457d-8d17-8271a79ee589",
      "label": "REDACTION",
      "color": "magenta",
      "x": 108.1,
      "y": 772.3,
      "height": 15.1,
      "width": 400.6,
      "center": [308.4, 779.85],
      "type": "rect",
      "points": [
        [108.1, 772.3],
        [108.1, 787.4],
        [508.7, 787.4],
        [508.7, 772.3]
      ]
    }
  ],
  "answer": "accept",
  "_timestamp": 1638214078
}

I have noticed that when I run my custom recipe with a model in the middle, all I get is this:

{
  "image": "sample/path/redacted/04-F-0269_Global_Screening_Guidance-03.jpg",
  "text": "04-F-0269_Global_Screening_Guidance-03",
  "meta": { "file": "04-F-0269_Global_Screening_Guidance-03.jpg" },
  "path": "sample/path/redacted/04-F-0269_Global_Screening_Guidance-03.jpg",
  "_is_binary": false,
  "_input_hash": 1413334570,
  "_task_hash": 1588323116,
  "_view_id": "image_manual",
  "width": 800,
  "height": 1035,
  "spans": [
    {
      "label": "CONTENT",
      "points": [
        [76.5, 112.5],
        [76.5, 898.6],
        [664.1, 898.6],
        [664.1, 112.5]
      ]
    },
    {
      "label": "REDACTION",
      "points": [
        [80.3, 786.2],
        [80.3, 806.4],
        [508.7, 806.4],
        [508.7, 786.2]
      ]
    },
    {
      "label": "REDACTION",
      "points": [
        [108.1, 772.3],
        [108.1, 787.4],
        [508.7, 787.4],
        [508.7, 772.3]
      ]
    }
  ],
  "answer": "accept",
  "_timestamp": 1638214078
}

Inside the spans property, the only two properties each span has are label and points. I don't get an autogenerated annotation id, and I don't get anything like the height and width.

Is this a bug or is this how it's meant to be working?

In my current self-training workflow, what I would really want (and expect) would be for the initial (full) set of properties to be generated regardless of whether some of the spans were initially derived from a model or not.

Is there any way to get access to whatever is used on Prodigy's end to generate these properties, especially things like the id generator and the colour?

Thank you!

Deleted my previous reply because I was wrong :sweat_smile:

I think what's happening here is that for new spans you create, Prodigy will populate them with the complete format, including the x/y/width/height, as well as an internal ID (not that relevant, mostly just a uuid used to disambiguate the boxes in the UI). However, it doesn't currently update existing spans in the data. So if you feed in your spans produced by your recipe and model, it will leave them as-is.

I think the initial motivation behind this was to try and not mess with pre-defined data, but maybe that's not great either and Prodigy should just update existing spans with all the additional properties that it generates under the hood anyway.

In the meantime, you can easily generate the x/y/width/height/center yourself based on the points – it's a relatively simple calculation:

def points_to_rect(x1, y1, x2, y2):
    # (x1, y1) and (x2, y2) are two opposite corners of the box
    x = min(x1, x2)
    y = min(y1, y2)
    width = abs(x1 - x2)
    height = abs(y1 - y2)
    center = [x + width / 2, y + height / 2]
    return x, y, width, height, center
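
To apply the calculation above to whole spans as they come out of a model, something like the following might work. This is only a sketch: `enrich_span` is a hypothetical helper name, it assumes each span's `points` describe an axis-aligned rectangle, and using `uuid.uuid4()` for the id is an assumption based on the ids looking like UUIDs, not Prodigy's actual generator.

```python
import uuid


def enrich_span(span):
    """Return a copy of an image span with rect properties derived from its points."""
    xs = [p[0] for p in span["points"]]
    ys = [p[1] for p in span["points"]]
    x, y = min(xs), min(ys)
    width, height = max(xs) - x, max(ys) - y
    enriched = dict(span)
    enriched.update({
        "x": x,
        "y": y,
        "width": width,
        "height": height,
        "center": [x + width / 2, y + height / 2],
        "type": "rect",
    })
    # Assumption: any unique string works as an id; an existing id is kept as-is.
    enriched.setdefault("id", str(uuid.uuid4()))
    return enriched
```

In a custom recipe this could be run over `eg["spans"]` for each example in the stream before it is sent out for annotation.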

Is the id generation exposed anywhere to me? I actually am using it in a separate part of my annotation workflow that deals with duplicate annotations.

Thanks for the other suggestions. As you say, it's easy to add all this in there. I mainly wanted to know if I was missing something or if this is the current intended functionality. For now I'll just add in some custom / manual fixes to handle all this.

On the one hand I guess I was sad that it isn't already handled, but on the other hand that's just the flip side of the fact that it's eminently customisable. On balance I'm happy :slight_smile:

You can definitely feed in examples with your own IDs if you want, and those will be preserved – in that case, you just need to make sure that they're unique.
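
A minimal sketch of assigning your own ids while guarding against collisions could look like this. `assign_span_ids` is a hypothetical helper, and checking uniqueness across every example passed in (rather than only within one image) is a deliberately strict assumption.

```python
import uuid


def assign_span_ids(examples):
    """Fill in missing span ids in place and fail loudly on duplicates."""
    seen = set()
    for eg in examples:
        for span in eg.get("spans", []):
            span_id = span.get("id")
            if span_id is None:
                # Assumption: a random UUID string is an acceptable id.
                span_id = str(uuid.uuid4())
                span["id"] = span_id
            elif span_id in seen:
                raise ValueError(f"duplicate span id: {span_id}")
            seen.add(span_id)
    return examples
```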

Also, I just noticed on the other thread that your shapes are off and not actual rectangles, so this might explain what's going on and why Prodigy doesn't detect them as rectangular bounding boxes and fill in the other properties.
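
One way to sanity-check model output before feeding it in is to test whether each span's points form an axis-aligned rectangle. This is only a sketch: `is_axis_aligned_rect` is a hypothetical helper, it is not how Prodigy itself detects rectangles, and it does not check the order of the corners.

```python
def is_axis_aligned_rect(points):
    """Return True if four points are the corners of an axis-aligned rectangle."""
    if len(points) != 4:
        return False
    xs = {p[0] for p in points}
    ys = {p[1] for p in points}
    corners = {(p[0], p[1]) for p in points}
    # Exactly two distinct x values, two distinct y values and four distinct
    # corners means every (x, y) combination appears exactly once.
    return len(xs) == 2 and len(ys) == 2 and len(corners) == 4
```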

So I fixed the bounding box issue (my fault, sorry!), but it doesn't seem to make the id and the other metadata get generated. I have a good idea about how I can add that in myself, though. Thanks for your help on these two issues!

This is what I get for an annotation now:

{
  "image": "path_to_repo/data/raw/foia-scans/08-F-0481_ARB3DecisionMemos1261-1823-031.jpg",
  "text": "08-F-0481_ARB3DecisionMemos1261-1823-031",
  "meta": { "file": "08-F-0481_ARB3DecisionMemos1261-1823-031.jpg" },
  "path": "path_to_repo/data/raw/foia-scans/08-F-0481_ARB3DecisionMemos1261-1823-031.jpg",
  "_input_hash": -1804373820,
  "_task_hash": -527543016,
  "spans": [
    {
      "points": [
        [143, 196],
        [143, 4547],
        [3363, 4547],
        [3363, 196]
      ],
      "label": "CONTENT"
    },
    {
      "points": [
        [402, 2756],
        [402, 4036],
        [3193, 4036],
        [3193, 2756]
      ],
      "label": "REDACTION"
    },
    {
      "points": [
        [375, 1378],
        [375, 1763],
        [3112, 1763],
        [3112, 1378]
      ],
      "label": "REDACTION"
    },
    {
      "points": [
        [411, 1915],
        [411, 2210],
        [3139, 2210],
        [3139, 1915]
      ],
      "label": "REDACTION"
    },
    {
      "points": [
        [545, 358],
        [545, 653],
        [3148, 653],
        [3148, 358]
      ],
      "label": "REDACTION"
    },
    {
      "points": [
        [393, 2327],
        [393, 2640],
        [3184, 2640],
        [3184, 2327]
      ],
      "label": "REDACTION"
    }
  ],
  "_view_id": "image_manual",
  "width": 3542,
  "height": 4583,
  "answer": "accept",
  "_timestamp": 1640194443
}