Hi All,
I would like to train the parser to recognize a new dependency label (MONTHLY_SALES) between DATE and MONEY entities, so that the training data would look like this:
```python
TRAIN_DATA = [
    (
        "Sales were $1 million in the first quarter of 2018.",
        {
            "heads": [1, 1, 4, 4, 8, 1, 8, 8, 5, 8, 9, 1],
            "deps": ["nsubj", "ROOT", "quantmod", "compound", "MONTHLY_SALES",
                     "prep", "det", "amod", "pobj", "prep", "pobj", "punct"],
        },
    ),
]
```
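Before training, it may help to sanity-check that a token-level annotation like the one above is a well-formed tree over the tokens the tokenizer will actually produce. The sketch below assumes the tokenization written out by hand in `TOKENS` (not generated by spaCy); `check_annotation` and `arcs_with_label` are hypothetical helpers, not part of spaCy's API.

```python
# Hand-written tokenization of the example sentence. ASSUMPTION: spaCy's
# tokenizer splits "$1" into "$" and "1"; this is not verified here.
TOKENS = ["Sales", "were", "$", "1", "million", "in", "the",
          "first", "quarter", "of", "2018", "."]

TRAIN_DATA = [
    (
        "Sales were $1 million in the first quarter of 2018.",
        {
            "heads": [1, 1, 4, 4, 8, 1, 8, 8, 5, 8, 9, 1],
            "deps": ["nsubj", "ROOT", "quantmod", "compound", "MONTHLY_SALES",
                     "prep", "det", "amod", "pobj", "prep", "pobj", "punct"],
        },
    ),
]


def check_annotation(tokens, heads, deps):
    """Return a list of problems in one heads/deps annotation (empty if OK)."""
    n = len(tokens)
    if not (n == len(heads) == len(deps)):
        return [f"length mismatch: {n} tokens, {len(heads)} heads, {len(deps)} deps"]
    problems = []
    roots = [i for i, d in enumerate(deps) if d == "ROOT"]
    if len(roots) != 1:
        problems.append(f"expected exactly one ROOT, found {len(roots)}")
    for i, (h, d) in enumerate(zip(heads, deps)):
        if not 0 <= h < n:
            problems.append(f"token {i} has out-of-range head {h}")
        elif (h == i) != (d == "ROOT"):
            problems.append(f"token {i}: the ROOT, and only the ROOT, is its own head")
    # Following heads upward from any token must reach the ROOT (no cycles).
    if not problems:
        for i in range(n):
            node, steps = i, 0
            while node != roots[0] and steps < n:
                node, steps = heads[node], steps + 1
            if node != roots[0]:
                problems.append(f"token {i} never reaches the ROOT")
    return problems


def arcs_with_label(tokens, heads, deps, label):
    """List (child, head) token pairs connected by the given label."""
    return [(tokens[i], tokens[h])
            for i, (h, d) in enumerate(zip(heads, deps)) if d == label]


text, annot = TRAIN_DATA[0]
print(check_annotation(TOKENS, annot["heads"], annot["deps"]))  # []
print(arcs_with_label(TOKENS, annot["heads"], annot["deps"], "MONTHLY_SALES"))
# [('million', 'quarter')]
```

Under this annotation the MONTHLY_SALES arc links the head words of the two entities ("million" and "quarter"), not the entities as wholes.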
It seems more logical, however, to treat the DATE and MONEY entities as spans, so that each entity is merged into a single token and the dependency arcs are indexed over these merged tokens rather than over the individual tokens, as in:
(Note: in the example below, “$1 million” is a MONEY entity and “the first quarter of 2018” is a DATE entity.)
```python
TRAIN_DATA = [
    (
        "Sales were $1 million in the first quarter of 2018.",
        {
            "heads": [1, 1, 4, 1, 3, 1],
            "deps": ["nsubj", "ROOT", "MONTHLY_SALES", "prep", "pobj", "punct"],
        },
    ),
]
```
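For comparison, the span-level annotation can be derived mechanically from the token-level one, assuming each merged entity keeps the head and label of its syntactic root (the single token inside the span whose head lies outside it). The sketch below is pure Python and does not call spaCy (spaCy itself merges spans via `Doc.retokenize()`, whose head assignment may differ); `merge_spans`, `SPANS`, and the `SPACES` whitespace flags are hypothetical names, and the token split is assumed rather than taken from spaCy's tokenizer.

```python
# Token-level annotation from the first example. ASSUMPTION: the token split
# and trailing-space flags match what spaCy's tokenizer would produce.
TOKENS = ["Sales", "were", "$", "1", "million", "in", "the",
          "first", "quarter", "of", "2018", "."]
SPACES = [True, True, False, True, True, True, True,
          True, True, True, False, False]
HEADS = [1, 1, 4, 4, 8, 1, 8, 8, 5, 8, 9, 1]
DEPS = ["nsubj", "ROOT", "quantmod", "compound", "MONTHLY_SALES",
        "prep", "det", "amod", "pobj", "prep", "pobj", "punct"]

# Entity spans as [start, end) token offsets: MONEY "$1 million" and
# DATE "the first quarter of 2018".
SPANS = [(2, 5), (6, 11)]


def merge_spans(tokens, spaces, heads, deps, spans):
    """Collapse each span into one node that keeps the head and label of the
    span's root: the token whose head lies outside the span (or is itself)."""
    span_end = dict(spans)
    old_to_new = {}  # original token index -> merged token index
    new_tokens = []  # merged token texts
    keep = []        # original index whose head/dep each merged token keeps
    i = 0
    while i < len(tokens):
        end = span_end.get(i, i + 1)  # a plain token is a span of length 1
        inside = range(i, end)
        roots = [j for j in inside if heads[j] not in inside or heads[j] == j]
        if len(roots) != 1:
            raise ValueError(f"tokens {i}..{end - 1} have {len(roots)} roots")
        for j in inside:
            old_to_new[j] = len(new_tokens)
        text = "".join(tokens[j] + (" " if spaces[j] else "") for j in inside)
        new_tokens.append(text.rstrip())
        keep.append(roots[0])
        i = end
    new_heads = [old_to_new[heads[r]] for r in keep]
    new_deps = [deps[r] for r in keep]
    return new_tokens, new_heads, new_deps


tokens, heads, deps = merge_spans(TOKENS, SPACES, HEADS, DEPS, SPANS)
print(tokens)  # ['Sales', 'were', '$1 million', 'in', 'the first quarter of 2018', '.']
print(heads)   # [1, 1, 4, 1, 3, 1]
print(deps)    # ['nsubj', 'ROOT', 'MONTHLY_SALES', 'prep', 'pobj', 'punct']
```

Under that assumption, the merged heads and labels come out identical to the second TRAIN_DATA example, so the two formats encode the same arcs; they differ only in which token sequence the indices refer to.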
Which of these two training data formats would work, or do I need to use Prodigy?
Thanks for your assistance.