Hi,
I've been experimenting with training a NER system from scratch. I followed the steps in "Labeling sequence labeling (e.g. NER) task from scratch" to get started, and annotated some examples.
Firstly, the UX for labelling is awesome - way nicer than brat, for instance. Having labelled these sentences, is there a way to output gold JSON files? db-out seems to dump the model's predictions, but not necessarily what the labelling has confirmed.
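To make the first question concrete, here is a rough sketch of what I mean by "gold" output. It assumes the export is JSONL and that each record carries an "answer" field whose value is "accept" for confirmed annotations; the field name, its values, the helper `filter_accepted`, and the sample records are all assumptions for illustration, not something I've confirmed against the docs.

```python
import json


def filter_accepted(jsonl_lines, answer_key="answer", accepted="accept"):
    """Keep only records whose answer field marks them as confirmed.

    `answer_key` and `accepted` are assumed names; adjust them to whatever
    the exported records actually contain.
    """
    gold = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        if record.get(answer_key) == accepted:
            gold.append(record)
    return gold


# Hypothetical records standing in for lines of an exported JSONL file.
sample = [
    '{"text": "Apple is hiring", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "accept"}',
    '{"text": "Apple pie recipe", "spans": [{"start": 0, "end": 5, "label": "ORG"}], "answer": "reject"}',
]

gold = filter_accepted(sample)
for record in gold:
    print(json.dumps(record))
```

Something like this over the exported file would be fine as a workaround, but a built-in way to get only the confirmed annotations would be nicer.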
Secondly, when using ner.batch-train I ran into the following error:
Loaded model models/en_ner_test/
Using 20% of examples (771) for evaluation
Using 100% of remaining examples (7324) for training
Dropout: 0.2 Batch size: 128 Iterations: 50
BEFORE 0.135
Correct 315
Incorrect 2019
Entities 897
Unknown 582
# LOSS RIGHT WRONG ENTS SKIP ACCURACY
01 0.459 277 2057 816 0 0.119
02 0.424 286 2048 865 0 0.123
03 0.378 271 2063 1018 0 0.116
04 0.336 277 2057 1229 0 0.119
05 0.320 279 2055 1157 0 0.120
06 0.289 278 2056 1398 0 0.119
07 0.284 273 2061 1507 0 0.117
08 0.253 266 2068 1440 0 0.114
09 0.221 259 2075 1453 0 0.111
10 0.218 254 2080 1402 0 0.109
11 0.204 246 2088 1435 0 0.105
12 0.191 253 2081 1588 0 0.108
13 0.175 246 2088 1513 0 0.105
14 0.178 251 2083 1544 0 0.108
15 0.162 243 2091 1508 0 0.104
16 0.162 246 2088 1542 0 0.105
17 0.139 242 2092 1567 0 0.104
18 0.159 238 2096 1548 0 0.102
19 0.143 239 2095 1711 0 0.102
20 0.129 229 2105 1444 0 0.098
21 0.128 231 2103 1659 0 0.099
22 0.133 227 2107 1436 0 0.097
28% | 2048/7324 [01:41<04:22, 20.12it/s]
fish: 'python -m prodigy ner.batch-tra…' terminated by signal SIGSEGV (Address boundary error)