Hi! We try to do our best and answer questions as soon as possible, and I usually put a lot of effort into my replies. However, we can’t guarantee instant replies and help with your implementation. You posted your question late at night my time, and already bumped the thread at noon my time. This really isn’t productive.
You can also always use the search function (button in the top right corner) to see if a question has already been answered before. For example, if you type in “regex”, you’ll find threads related to using regular expressions (https://support.prodi.gy/search?q=“regex” ). The first result actually shows a very similar approach and solution.
If you just want to stream in regex matches and annotate whether they are correct / suitable training data or not, the easiest way would be to write a function that takes the incoming stream of examples, finds matches in the texts and creates an annotation example with a "span" for each match (see the “Annotation task formats” section in your PRODIGY_README.html for details on the JSON format).
Here’s a simple example:

    import re
    import copy

    expression = re.compile(YOUR_REGEX_HERE)  # replace with your own pattern
    label = 'ORG'  # or any other label

    def regex_matcher(stream):
        for eg in stream:
            for match in re.finditer(expression, eg['text']):  # find match in example text
                task = copy.deepcopy(eg)  # match found, so copy the example
                start, end = match.span()  # get matched indices
                task['spans'] = [{'start': start, 'end': end, 'label': label}]  # label match
                yield task

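To make the output format more concrete, here is a small self-contained run of the same idea on made-up data. The pattern and the two example texts are purely illustrative assumptions, not something from your project, so swap in your own regex and stream.

```python
import copy
import re

# Illustrative pattern (an assumption): a capitalised word followed by a company suffix
expression = re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp|Ltd)\b")
label = "ORG"


def regex_matcher(stream):
    """Yield one annotation task per regex match in each incoming example."""
    for eg in stream:
        for match in expression.finditer(eg["text"]):
            task = copy.deepcopy(eg)   # keep the original example untouched
            start, end = match.span()  # character offsets of the match
            task["spans"] = [{"start": start, "end": end, "label": label}]
            yield task


# Hypothetical input stream: one text with two matches, one text with none
stream = [
    {"text": "Acme Inc acquired Globex Corp last year."},
    {"text": "No company names here."},
]

tasks = list(regex_matcher(stream))
for task in tasks:
    span = task["spans"][0]
    print(task["text"][span["start"]:span["end"]], span)
```

Each match becomes its own task, so a text with two matches is shown twice (once per highlighted span) and a text without matches is skipped entirely.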
Here’s a custom recipe template to get you started:
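As a rough sketch rather than an official template: the function below assumes that a recipe returns a dictionary of components with "dataset", "stream" and "view_id" keys, and that in a real recipe it would be registered with Prodigy's recipe decorator. The decorator is left out here so the snippet runs with only the standard library. The names `regex_recipe`, `source` and `pattern`, and the JSONL loading via `json`, are my own illustrative choices.

```python
import copy
import json
import re


def regex_recipe(dataset, source, pattern, label="ORG"):
    """Build the components for a regex-matching annotation recipe.

    Assumption: in an actual Prodigy recipe this function would be wrapped
    in the recipe decorator; it is omitted so the sketch has no dependencies.
    """
    expression = re.compile(pattern)

    def stream():
        # Assumption: `source` is a JSONL file with one {"text": ...} object per line
        with open(source, encoding="utf8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                eg = json.loads(line)
                for match in expression.finditer(eg["text"]):
                    task = copy.deepcopy(eg)
                    start, end = match.span()
                    task["spans"] = [{"start": start, "end": end, "label": label}]
                    yield task

    return {
        "dataset": dataset,   # where the accepted / rejected answers are saved
        "stream": stream(),   # generator of tasks, one per regex match
        "view_id": "ner",     # render each span as a highlighted entity
    }
```

The stream is a generator, so the file is only read as examples are requested.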
Using the view_id "ner", you can render the examples as highlighted entities, and then accept or reject them. The annotations will then be saved to the given dataset, and you can then use them to update a model.