questions on Multi NERs Annotation & Training at Once in a Sentence

ryanwesslen · October 3, 2022, 8:53pm

The command you're looking for is spacy evaluate, which evaluates a trained pipeline against a dataset. Since spaCy is open source, you can read the code for it:

https://github.com/explosion/spaCy/blob/70e21dfcad28b044903ba33b2b8831d925151b76/spacy/cli/evaluate.py#L54


      
            data_path,
            output=output,
            use_gpu=use_gpu,
            gold_preproc=gold_preproc,
            displacy_path=displacy_path,
            displacy_limit=displacy_limit,
            silent=False,
        )


    def evaluate(
        model: str,
        data_path: Path,
        output: Optional[Path] = None,
        use_gpu: int = -1,
        gold_preproc: bool = False,
        displacy_path: Optional[Path] = None,
        displacy_limit: int = 25,
        silent: bool = True,
        spans_key: str = "sc",
    ) -> Dict[str, Any]:

The important part is the function handle_scores_per_type. This is what gets called when you use --label-stats. As you can see, spacy evaluate calls it by default.

https://github.com/explosion/spaCy/blob/70e21dfcad28b044903ba33b2b8831d925151b76/spacy/cli/evaluate.py#L138


      
                ents=render_ents,
            )
            msg.good(f"Generated {displacy_limit} parses as HTML", displacy_path)

        if output_path is not None:
            srsly.write_json(output_path, data)
            msg.good(f"Saved results to {output_path}")
        return data


    def handle_scores_per_type(
        scores: Dict[str, Any],
        data: Dict[str, Any] = {},
        *,
        spans_key: str = "sc",
        silent: bool = False,
    ) -> Dict[str, Any]:
        msg = Printer(no_print=silent, pretty=not silent)
        if "morph_per_feat" in scores:
            if scores["morph_per_feat"]:
                print_prf_per_type(msg, scores["morph_per_feat"], "MORPH", "feat")
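To make the per-label breakdown more concrete, here is a minimal sketch of formatting such scores yourself. It assumes the scores dictionary has an "ents_per_type" entry that maps each label to a dict with "p", "r" and "f" values between 0 and 1. The helper name per_label_rows and the numbers in example_scores are made up for illustration and are not part of spaCy.

```python
from typing import Any, Dict, List


def per_label_rows(scores: Dict[str, Any], key: str = "ents_per_type") -> List[str]:
    """Format per-label precision/recall/F-score as aligned text rows.

    Assumes scores[key] looks like {"LABEL": {"p": 0.9, "r": 0.8, "f": 0.85}}.
    """
    per_type = scores.get(key) or {}
    rows = [f"{'LABEL':<12}{'P':>8}{'R':>8}{'F':>8}"]
    for label, prf in sorted(per_type.items()):
        # Scale the 0-1 values to percentages for display.
        rows.append(
            f"{label:<12}{prf['p'] * 100:>8.2f}{prf['r'] * 100:>8.2f}{prf['f'] * 100:>8.2f}"
        )
    return rows


# Made-up numbers, for illustration only.
example_scores = {
    "ents_per_type": {
        "PERSON": {"p": 0.9, "r": 0.8, "f": 0.847},
        "ORG": {"p": 0.75, "r": 0.6, "f": 0.667},
    }
}

for row in per_label_rows(example_scores):
    print(row)
```

With several entity labels trained at once, a table like this makes it easier to see which label is dragging the overall score down.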

One thing you may want to do is create a dedicated hold-out (evaluation) dataset. Prodigy can do the splitting for you with --eval-split 0.2. The problem is that each run may hold out a different set of examples, so scores aren't directly comparable between runs. Ideally, you should split the data yourself once and reuse the same evaluation set.
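As a rough sketch of splitting the data yourself, the helper below shuffles with a fixed seed so the same examples end up in the evaluation set every time. The function name split_once and the placeholder examples are hypothetical. In practice the items would be your exported annotations, and you would save the two parts separately and reuse them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_once(
    examples: Sequence[T], eval_frac: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Return (train, eval) lists; the same seed always gives the same split."""
    if not 0.0 < eval_frac < 1.0:
        raise ValueError("eval_frac must be between 0 and 1")
    shuffled = list(examples)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_frac))
    return shuffled[n_eval:], shuffled[:n_eval]


# Placeholder data standing in for annotated examples.
examples = [{"text": f"example {i}"} for i in range(10)]
train_set, eval_set = split_once(examples)
print(len(train_set), len(eval_set))
```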

If you're going to use spacy train, the data-to-spacy recipe can help. It partitions your data and converts it to the binary .spacy format, which is the format spacy train expects.

Unfortunately, there isn't a spaCy version of train-curve, but you can write your own. See this for details:

You can also save your prodigy train-curve results to a .txt file by appending "> train_curve.txt" to the command:

prodigy train-curve ... > train_curve.txt
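If you do write your own train-curve, the core loop could look roughly like the sketch below. It assumes the idea is to train on growing portions of the training data and score each model against the same evaluation set. The train_and_score callable is a hypothetical stand-in for whatever training and evaluation you run (for example, a wrapper around spacy train and spacy evaluate), and the dummy scorer here only exists to make the example run.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_curve(
    train: Sequence[T],
    dev: Sequence[T],
    train_and_score: Callable[[Sequence[T], Sequence[T]], float],
    n_samples: int = 4,
) -> List[Tuple[float, float]]:
    """Train on growing fractions of the data and collect (fraction, score) pairs."""
    results = []
    for i in range(1, n_samples + 1):
        frac = i / n_samples
        n = max(1, int(len(train) * frac))
        # Always score against the same dev set so the points are comparable.
        score = train_and_score(train[:n], dev)
        results.append((frac, score))
    return results


# Dummy scorer for illustration: pretends the score grows with the data size.
def dummy_scorer(train_part: Sequence[int], dev_part: Sequence[int]) -> float:
    return len(train_part) / 100


for frac, score in train_curve(list(range(100)), [], dummy_scorer):
    print(f"{frac:.0%}\t{score:.2f}")
```

If the score is still climbing at the last step, collecting more annotations is likely to help.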
