I am implementing a custom component to set some attributes on my `Token`s, following the same structure as in this example. I will probably put the custom component in its own package, exploiting entry points as explained here.
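A component of the kind described above might look roughly like the following minimal sketch. It assumes spaCy v3 (`@Language.component` and `nlp.add_pipe` by name; spaCy v2 registered components differently). The component name `country_flagger`, the `is_country` attribute and the hard-coded country set are hypothetical and not taken from the linked example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token

# Hypothetical lookup set; a real component would use a proper source.
COUNTRIES = {"germany", "canada"}

# Register a custom Token attribute (force=True allows re-running the script).
Token.set_extension("is_country", default=False, force=True)


@Language.component("country_flagger")
def country_flagger(doc):
    # A component receives the Doc, modifies it in place and must return it.
    for token in doc:
        if token.lower_ in COUNTRIES:
            token._.is_country = True
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("country_flagger")
doc = nlp("I flew from Germany to Canada")
print([t.text for t in doc if t._.is_country])  # ['Germany', 'Canada']
```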
My question is: how do I add methods to my `Doc`, like the post-processing done in `main()` in the example referenced above? It could be a method like `get_countries()`. I suppose I could add it as an attribute as well, but is that the best way to do it?
Yes, that’s pretty much exactly what custom attributes with getters and custom pipeline components were designed for.
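For a callable like `get_countries()`, spaCy's extension API also accepts `method=` in `Doc.set_extension`; the `Doc` is passed in as the first argument and any further arguments come from the call. A minimal sketch, assuming a blank English pipeline and a hypothetical filter on the `GPE` entity label. A blank pipeline has no NER, so the entities are set by hand purely for the demo.

```python
import spacy
from spacy.tokens import Doc, Span


def get_countries(doc, label="GPE"):
    """Return the text of every entity in `doc` with the given label."""
    return [ent.text for ent in doc.ents if ent.label_ == label]


# method= registers a callable: doc._.get_countries(...) calls get_countries(doc, ...)
Doc.set_extension("get_countries", method=get_countries, force=True)

nlp = spacy.blank("en")
doc = nlp("I flew from Germany to Canada")
# No NER in a blank pipeline, so assign entities manually (tokens 3 and 5).
doc.ents = [Span(doc, 3, 4, label="GPE"), Span(doc, 5, 6, label="GPE")]

print(doc._.get_countries())       # ['Germany', 'Canada']
print(doc._.get_countries("ORG"))  # []
```

A method is a reasonable fit when the call takes parameters (here `label`); without parameters, a getter-based attribute reads the same way without the parentheses.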
But when you create the getter functions, you don’t have access to the `Doc`, only to the `Token`s, right? At least, that’s how it looks based on `has_tech_org(self, tokens)` in this example.
I guess I could just add `self.doc` as an attribute in the `__call__()` method? Is that the way to do it?
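The getter function always receives the object it’s called on as its argument, so if you add a getter to the `Doc`, that function will receive the `Doc` itself:

```python
def get_countries(doc):  # receives the Doc the attribute is accessed on
    return  # something
```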
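To make the getter behaviour concrete, here is a minimal sketch (assuming a blank English pipeline with hand-set entities, and hypothetical attribute names `countries` and `doc_countries`). It registers a getter on `Doc`, which receives the `Doc`, and a getter on `Token`, which receives the `Token` but can still reach its parent through the real `Token.doc` attribute, so no `self.doc` workaround is needed.

```python
import spacy
from spacy.tokens import Doc, Span, Token


def get_countries(doc):
    # Doc getter: called with the Doc the attribute is accessed on.
    return [ent.text for ent in doc.ents if ent.label_ == "GPE"]


def get_doc_countries(token):
    # Token getter: called with the Token, whose parent Doc is token.doc.
    return token.doc._.countries


Doc.set_extension("countries", getter=get_countries, force=True)
Token.set_extension("doc_countries", getter=get_doc_countries, force=True)

nlp = spacy.blank("en")
doc = nlp("I flew from Germany to Canada")
doc.ents = [Span(doc, 3, 4, label="GPE"), Span(doc, 5, 6, label="GPE")]

print(doc._.countries)         # ['Germany', 'Canada']
print(doc[0]._.doc_countries)  # ['Germany', 'Canada']
```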
In the example you linked, I called the object `tokens` because the `has_tech_org` method is used for both the `Doc` and the `Span`. Both have tokens you can iterate over, so we can reuse the method, but maybe the code would have been clearer here if I had called that argument `obj` for object.
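The reuse pattern described above can be sketched as one getter registered on both `Doc` and `Span`. This is not the linked example's code: the `has_country` name and the `GPE` label check are hypothetical stand-ins for `has_tech_org`, and the entities are set by hand because a blank pipeline has no NER.

```python
import spacy
from spacy.tokens import Doc, Span


def has_country(obj):
    # obj is a Doc or a Span; iterating over either yields Tokens.
    return any(token.ent_type_ == "GPE" for token in obj)


# The same function serves as the getter on both object types.
Doc.set_extension("has_country", getter=has_country, force=True)
Span.set_extension("has_country", getter=has_country, force=True)

nlp = spacy.blank("en")
doc = nlp("I flew from Germany to Canada")
doc.ents = [Span(doc, 3, 4, label="GPE"), Span(doc, 5, 6, label="GPE")]

print(doc._.has_country)       # True
print(doc[0:3]._.has_country)  # False ("I flew from")
print(doc[2:4]._.has_country)  # True ("from Germany")
```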
Btw, if you’re adding extension attributes on tokens and spans and you need the parent `Doc`, it’s available as the