Pattern loading error: "ValueError: invalid literal for int() with base 10"

Using the ner.manual recipe, I got ValueError: invalid literal for int() with base 10 for this pattern in my patterns JSONL file:

{"label": "OPS_CON", "pattern":[{"LEMMA": "incident"}], "id": "id-incident"}

Everything is OK if I remove the id, like this:
{"label": "OPS_CON", "pattern":[{"LEMMA": "incident"}]}

Also everything is OK if I change incident to Incident (uppercase) like this:
{"label": "OPS_CON", "pattern":[{"LEMMA": "Incident"}], "id": "id-incident"}

The same problem appears with the following strings:
bomb, approach, fail, ...

Hi! The problem here is the "id" in your pattern, which Prodigy typically uses for its internal IDs that encode the pattern number. Is there a specific reason you included it? If it's just for internal purposes, could you use a different key, e.g. "_id"?

(The reason the error only occurs with certain spellings is that these are the ones that match: if a pattern doesn't match, the error won't trigger.)
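
For anyone who wants to keep a single patterns file for spaCy and derive a Prodigy-safe copy from it, here is a minimal sketch of the key rename suggested above. It only uses Python's standard json module. The helper name rename_id_key and the sample line are made up for illustration; this is not part of Prodigy or spaCy.

```python
import json


def rename_id_key(lines, old_key="id", new_key="_id"):
    """Yield JSONL pattern lines with old_key renamed to new_key.

    Hypothetical helper for illustration; not a Prodigy or spaCy API.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the patterns file
        pattern = json.loads(line)
        if old_key in pattern:
            # Move the value to the new key so it is kept but no longer
            # stored under the name "id".
            pattern[new_key] = pattern.pop(old_key)
        yield json.dumps(pattern)


# Sample input taken from the pattern that triggered the error.
spacy_patterns = [
    '{"label": "OPS_CON", "pattern": [{"LEMMA": "incident"}], "id": "id-incident"}',
]
prodigy_patterns = list(rename_id_key(spacy_patterns))
print(prodigy_patterns[0])
```

The same generator can be fed an open file handle instead of a list, with the output written line by line to a second .jsonl file for Prodigy.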

I followed spaCy's recommendation for adding IDs to patterns because this feature looks very interesting for my case.

I am testing NER in the Aviation Safety Report domain and I would like to group several patterns under the same entity ID, like this:
{"label": "ACTOR", "pattern":[{"LOWER": "first"}, {"LOWER": "officer"}], "id":"flight-crew"}
{"label": "ACTOR", "pattern":[{"LOWER": "captain"}], "id":"flight-crew"}
{"label": "ACTOR", "pattern":[{"LOWER": "pilot"}, {"LOWER": "flying"}], "id":"flight-crew"}
{"label": "ACTOR", "pattern":[{"LOWER": "second"}, {"LOWER": "officer"}], "id":"flight-crew"}

I don't think I will be able to get "ent.ent_id_" directly from "ent" in spaCy if I replace "id" with "_id".

Okay, I can write a dictionary for each id, but the built-in approach above is much more elegant.
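
The dictionary fallback mentioned here could look roughly like the sketch below. It assumes the group ID can be looked up from the lowercased entity text alone. The names GROUP_IDS and group_id_for are hypothetical, and the entries simply mirror the four patterns above.

```python
# Hypothetical lookup table: lowercased entity text -> group ID.
GROUP_IDS = {
    "first officer": "flight-crew",
    "captain": "flight-crew",
    "pilot flying": "flight-crew",
    "second officer": "flight-crew",
}


def group_id_for(entity_text, default=None):
    """Return the group ID for an entity's text, or default if unknown."""
    # Lowercase and collapse whitespace so "First  Officer" still matches.
    key = " ".join(entity_text.lower().split())
    return GROUP_IDS.get(key, default)


print(group_id_for("First Officer"))
```

With spaCy, the argument would typically be ent.text for each entity in doc.ents. Unlike the built-in "id", this table has to be kept in sync with the patterns file by hand.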

Ah, if you're working with spaCy and the EntityRuler, assigning IDs to your patterns makes sense. But in this case, it all happens within Prodigy and your patterns are only used to pre-highlight entities. The "label" is then used as the suggested entity label. So there's kinda no place where you would need to define pattern IDs.

It's definitely true, though, that this makes it kinda annoying to reuse pattern files between spaCy and Prodigy. So we should just ignore any preexisting "id" values in Prodigy to not cause any conflicts.

Edit: Fixed in v1.11!
