patterns and relevant sentence on batch training

Dear Prodigy support,

When I run active learning with patterns or with the long text option, the output is a JSONL file that also includes a spans list for some annotations.

In the spans list we find:

  1. for patterns: the start and end of the matched text, the priority, the score, and the pattern number
  2. for relevant sentences: the start and end of the sentence, and only the score

I am wondering whether the presence of the spans has any effect on the NN during batch training, or whether the patterns and the long text option are just ways to catch annotations that could otherwise be lost during active learning (patterns) or to parse the text for more relevant annotations (long text option).

In short: if I provide only the text and labels (without spans), will I get the same batch training results?

Thank you in advance. I am really grateful for this wonderful tool.

Claudio Nespoli

You’re referring to training the text classifier and textcat.batch-train, right?

If so, then the answer is no, the spans have no effect: none of this info is used during training. The text classifier only gets trained on the "text" and "label" (and of course the "answer" is used to determine whether the label is correct or not).

The "spans" are only present in the data because they’re needed to show you the highlighted text during annotation – and to preserve this information, so you’ll always be able to reproduce exactly what the annotator saw on the screen.
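
To make this concrete, here is a minimal sketch in plain Python (it does not call Prodigy itself) that reduces each annotation to the three fields mentioned above. The example records, the label name, and the exact span field names ("priority", "score", "pattern") are assumptions based on the description in this thread, and the helper names are made up for illustration.

```python
import json

# Hypothetical records, shaped like the JSONL output described in this thread.
# The span field names are assumptions, not a confirmed schema.
raw_jsonl = """\
{"text": "Contact support about the refund.", "label": "BILLING", "answer": "accept", "spans": [{"start": 26, "end": 32, "priority": 0.5, "score": 0.5, "pattern": 3}]}
{"text": "The weather is nice today.", "label": "BILLING", "answer": "reject"}
"""

# The only fields the reply says matter for text classifier training.
TRAINING_KEYS = ("text", "label", "answer")


def keep_training_fields(record):
    """Return a copy of the record with only the training-relevant keys."""
    return {key: record[key] for key in TRAINING_KEYS if key in record}


def strip_jsonl(jsonl_text):
    """Parse JSONL text and drop everything except the training-relevant keys."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return [keep_training_fields(record) for record in records]


stripped = strip_jsonl(raw_jsonl)
for record in stripped:
    print(json.dumps(record))
```

If the explanation above holds, the stripped records carry everything the text classifier is trained on. What you lose by dropping the spans is only the ability to reproduce the highlighting the annotator saw.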
