Where can I find the full API documentation for all Prodigy functions/classes?

I went through the Prodigy API and usage docs. Maybe I'm wrong, but I felt that some Prodigy utilities might be missing from the API documentation.

Later, I inspected the textcat.teach recipe and found this import: from prodigy.models.textcat import TextClassifier. I looked for documentation of this class, but I didn't find any.

So these are the questions:

Is there documentation for this particular utility, prodigy.models.textcat.TextClassifier?
If not, how many other utilities are missing from the API docs, and how can we access documentation for them?

Thank you in advance.

Sergio M.

Hi! I totally thought we had a short section on the binary annotation models in the components/functions docs, but it seems like we don't. Sorry about that, I'll add it back :+1:

Basically, the TextClassifier and EntityRecognizer included in Prodigy are the "annotation models" for scoring binary suggestions and updating spaCy from binary answers. That's all they do. They only have two public methods: __call__ (takes the stream and yields (score, example) tuples of suggestions) and update (the update callback that can be used in a recipe). They're only relevant as wrappers around spaCy's nlp object.
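To make that two-method contract concrete, here's a toy sketch. It is not Prodigy's code and doesn't use spaCy: KeywordModel, its keyword weights and its learning rule are hypothetical stand-ins. It assumes tasks are dicts with a "text" key and that answers carry "accept", "reject" or "ignore" under an "answer" key.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Task = Dict[str, str]


class KeywordModel:
    """Hypothetical stand-in for an annotation model (not Prodigy's code).

    __call__ scores incoming tasks; update learns from binary answers.
    A real annotation model would wrap spaCy's nlp object instead.
    """

    def __init__(self, keywords: Iterable[str], learn_rate: float = 0.1) -> None:
        # One weight per keyword, all starting at 0.5 (maximally uncertain).
        self.weights: Dict[str, float] = {word: 0.5 for word in keywords}
        self.learn_rate = learn_rate

    def _score(self, text: str) -> float:
        # Highest weight of any keyword in the text, or 0.0 if none match.
        tokens = text.lower().split()
        return max((self.weights[t] for t in tokens if t in self.weights), default=0.0)

    def __call__(self, stream: Iterable[Task]) -> Iterator[Tuple[float, Task]]:
        # Yield (score, example) tuples, ready to be passed to a sorter.
        for task in stream:
            yield self._score(task["text"]), task

    def update(self, answers: List[Task]) -> float:
        # Move keyword weights toward 1.0 for "accept" and 0.0 for "reject";
        # other answers (e.g. "ignore") are skipped. Returns the summed error.
        loss = 0.0
        for answer in answers:
            if answer.get("answer") not in ("accept", "reject"):
                continue
            target = 1.0 if answer["answer"] == "accept" else 0.0
            for token in answer["text"].lower().split():
                if token in self.weights:
                    error = target - self.weights[token]
                    self.weights[token] += self.learn_rate * error
                    loss += abs(error)
        return loss


model = KeywordModel(["refund", "invoice"])
for score, task in model([{"text": "Where is my refund"}, {"text": "Nice weather today"}]):
    print(score, task["text"])
```

In a recipe, the output of model(stream) would go through a sorter, and model.update would be returned as the update callback so each batch of answers adjusts the scores of later suggestions.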


Thank you for the answer.

I'm back because I found another piece of functionality that I can't find documented anywhere.
While looking for a solution, another question came up, so now I have two questions. (I'm probably more concerned about the second one: if I could inspect the code myself, I wouldn't need to depend on the documentation as much.)

Question 1:

Inspecting the terms.teach recipe, I found this import:

from prodigy.components.sorters import Probability

I read the sorters documentation. It explains the general API of sorters and documents three sorter functions:

  • prefer_uncertain
  • prefer_high_scores
  • prefer_low_scores

But there's nothing about the Probability class.

Question 2:

To understand how it works, I tried to inspect the code myself.
When I did, I realized that all the functions and classes are empty. I've never seen this before. Maybe it's related to Cython, or to the way you deploy the package to keep the source private? I don't know. Here's an example:

class Probability(object):
    """ Given a stream of (p, item) pairs, ask questions with probability 1-p. """
    def __init__(self, stream): # real signature unknown; restored from __doc__
        """ Probability.__init__(self, stream) """
        pass

    def __iter__(self): # real signature unknown; restored from __doc__
        """ Probability.__iter__(self) """
        pass

    __weakref__ = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """list of weak references to the object (if defined)"""

or


def get_uncertainty(score, bias=0.0): # real signature unknown; restored from __doc__
    """ get_uncertainty(score, bias=0.0) """
    pass

def prefer_high_scores(stream, bias=0.0): # real signature unknown; restored from __doc__
    """ prefer_high_scores(stream, bias=0.0) """
    pass

def prefer_low_scores(stream, bias=0.0): # real signature unknown; restored from __doc__
    """ prefer_low_scores(stream, bias=0.0) """
    pass

def prefer_uncertain(stream, bias=0.0, algorithm=None): # real signature unknown; restored from __doc__
    """ prefer_uncertain(stream, bias=0.0, algorithm=UNSET) """
    pass

If there's no way to see the code, that's a pity, as I'd like to go through it to understand everything in detail.

Greetings

Sergio M.

Ah, the Probability class is a super simple probability sorter. Here's the code:

import random


class Probability(object):
    """
    Given a stream of (p, item) pairs, ask questions with probability 1-p.
    """
    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        for i, (prob, task) in enumerate(self.stream):
            if not isinstance(prob, int) and not isinstance(prob, float):
                raise ValueError(f"Sorting priority needs to be numeric, not {type(prob)}")
            # The first task is always sent out
            if i == 0:
                yield task
                continue
            # Floor the score so that even 0-scored tasks can occasionally appear
            prob = max(0.0001, prob)
            if random.random() < prob:
                yield task
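
As written, every task after the first is yielded when random.random() < max(0.0001, p), so a task with score p is shown with probability p (floored at 0.0001), and the first task always goes through. Here's a minimal sketch of the same logic as a generator function; the name sample_by_probability and the injectable rng parameter are hypothetical additions (not part of Prodigy) so that runs can be reproduced with a seeded random.Random.

```python
import random
from typing import Any, Iterable, Iterator, Optional, Tuple


def sample_by_probability(
    stream: Iterable[Tuple[float, Any]],
    rng: Optional[random.Random] = None,
) -> Iterator[Any]:
    """Hypothetical function version of the Probability sorter above."""
    if rng is None:
        rng = random.Random()
    for i, (prob, task) in enumerate(stream):
        # Same type check as the class: scores must be int or float.
        if not isinstance(prob, (int, float)):
            raise ValueError(f"Sorting priority needs to be numeric, not {type(prob)}")
        if i == 0:
            # The first task is always sent out, whatever its score.
            yield task
            continue
        # Keep the task with probability max(0.0001, prob).
        if rng.random() < max(0.0001, prob):
            yield task


scored = [(0.2, "a"), (1.0, "b"), (0.0, "c"), (1.0, "d")]
print(list(sample_by_probability(scored, random.Random(0))))
```

Since random.random() returns a value in [0.0, 1.0), tasks scored 1.0 always pass, while a task scored 0.0 is only shown about once in 10,000 times.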

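Yes, this is a result of compiling the code with Cython with docstrings enabled. The compiled module keeps the docstrings, so you can still call help() on a Cython function in your editor, but it doesn't ship the Python source. Your editor can only generate empty placeholder stubs from those docstrings, which is why the bodies are just pass and the comments say "real signature unknown; restored from __doc__".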
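You can see the same effect with the standard library. This sketch uses the built-in len as a stand-in for a compiled function, on the assumption that Cython-compiled functions behave similarly (no readable source, but a docstring); describe is a hypothetical helper, not a Prodigy or stdlib function.

```python
import inspect
import json


def describe(obj) -> str:
    """Hypothetical helper: can Python show obj's source, or only its docstring?"""
    try:
        inspect.getsource(obj)
        return "source available"
    except (TypeError, OSError):
        # Compiled callables have no readable .py source, but their
        # docstring survives, which is what help() and editors can show.
        doc = inspect.getdoc(obj) or ""
        return "docstring only: " + (doc.splitlines() or [""])[0]


print(describe(json.dumps))  # pure-Python stdlib function
print(describe(len))         # built-in, compiled into the interpreter
```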
