Hi, I'm trying to use the ner.teach recipe with a patterns file and would like to label examples for only a single label at a time. However, the --label parameter seems to be ignored. I can get the desired behavior if I use a separate patterns file containing only the label currently in use, but I was hoping to avoid this workaround.
I am using Prodigy v1.10. Am I missing something obvious here?
test.jsonl
{"text": "spam is bad"}
{"text": "ham is good"}
{"text": "this ham is also good"}
{"text": "spam ham is confusing"}
test_patterns.jsonl
{"pattern": "spam", "label": "Spam"}
{"pattern": "ham", "label": "Ham"}
Command and logging output
12:19PM ~/work/ad-hoc/> PRODIGY_LOGGING=basic prodigy ner.teach test_spam en_core_web_lg ./test.jsonl --patterns test_patterns.jsonl --label "Ham"
12:19:07: INIT: Setting all logging levels to 20
email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
12:19:08: RECIPE: Calling recipe 'ner.teach'
Using 1 label(s): Ham
12:19:08: RECIPE: Starting recipe ner.teach
12:19:08: LOADER: Using file extension 'jsonl' to find loader
12:19:08: LOADER: Loading stream from jsonl
12:19:08: LOADER: Rehashing stream
12:19:12: RECIPE: Creating EntityRecognizer using model en_core_web_lg
12:19:21: MODEL: Added sentence boundary detector to model pipeline
12:19:21: MODEL: Loading match patterns from disk
12:19:21: MODEL: Adding 2 patterns
12:19:21: MODEL: Ensure pattern labels are added to EntityRecognizer
12:19:21: RECIPE: Created PatternMatcher and loaded in patterns
12:19:21: SORTER: Resort stream to prefer uncertain scores (bias 0.0)
12:19:21: VALIDATE: Validating components returned by recipe
12:19:21: CONTROLLER: Initialising from recipe
12:19:21: VALIDATE: Creating validator for view ID 'ner'
12:19:21: VALIDATE: Validating Prodigy and recipe config
12:19:21: DB: Initializing database SQLite
12:19:21: DB: Connecting to database SQLite
12:19:21: DB: Creating dataset '2020-06-18_12-19-21'
12:19:21: CONTROLLER: Initialising from recipe
12:19:21: CONTROLLER: Validating the first batch for session: None
12:19:21: PREPROCESS: Splitting sentences
12:19:21: FILTER: Filtering duplicates from stream
12:19:21: FILTER: Filtering out empty examples for key 'text'
12:19:21: MODEL: Predicting spans for batch (batch size 64)
12:19:21: MODEL: Sorting batch by entity type (batch size 32)
12:19:21: CORS: initialized with wildcard "*" CORS origins
✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!
INFO: ::1:49482 - "GET / HTTP/1.1" 200 OK
INFO: ::1:49482 - "GET /bundle.js HTTP/1.1" 200 OK
12:19:26: GET: /project
INFO: ::1:49482 - "GET /project HTTP/1.1" 200 OK
12:19:26: POST: /get_session_questions
12:19:26: FEED: Finding next batch of questions in stream
12:19:26: RESPONSE: /get_session_questions (5 examples)
INFO: ::1:49482 - "POST /get_session_questions HTTP/1.1" 200 OK
INFO: ::1:49482 - "GET /favicon.ico HTTP/1.1" 200 OK
Screenshot: