Python Descriptor — A Thorough Guide
Learn about the descriptor protocol, data vs. non-data descriptors, the lookup chain, the mechanism behind functions, and the __getattribute__ that powers descriptors behind the scenes
A descriptor is any object which defines the methods __get__(), __set__(), or __delete__(). When a class attribute is a descriptor, its special binding behaviour is triggered upon attribute lookup. Normally, using a.b to get, set or delete an attribute looks up the object named b in the class dictionary for a, but if b is a descriptor, the respective descriptor method gets called.
Descriptors are Python objects that implement any of the methods of the descriptor protocol:
__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
- If an object defines __set__() or __delete__(), it is a data descriptor.
- If it only defines __get__(), it is a non-data descriptor.
- If it defines both __get__() and __set__(), with __set__() raising an AttributeError when called, it is a read-only data descriptor.
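The classification above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named classify (not part of Python) that inspects the type of an object for the protocol methods:

```python
def classify(obj):
    """Return 'data', 'non-data', or 'not a descriptor' for obj.

    Hypothetical helper for illustration; it looks at type(obj) because
    the descriptor methods must be defined on the class, not the instance.
    """
    cls = type(obj)
    if hasattr(cls, "__set__") or hasattr(cls, "__delete__"):
        return "data"
    if hasattr(cls, "__get__"):
        return "non-data"
    return "not a descriptor"


def plain_function():
    pass


print(classify(property()))                    # data
print(classify(plain_function))                # non-data
print(classify(staticmethod(plain_function)))  # non-data
print(classify(42))                            # not a descriptor
```

Note that property counts as a data descriptor even when no setter is supplied, because the property type always defines __set__.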
In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. If any of those methods are defined for an object, it is said to be a descriptor.
What does this “binding behaviour” mean? Let’s look at an example:
class DataDescriptor(object):
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print("__get__")
        return self.value

    def __set__(self, obj, value):
        print("__set__")
        try:
            self.value = value
        except AttributeError:
            print(f"Can not set value {value}")

    def __delete__(self, obj):
        print("__delete__")

class Example():
    attr = DataDescriptor()

d = DataDescriptor()
e = Example()

e.attr                    # prints __get__, evaluates to 0
e.attr = "new attribute"  # prints __set__
del e.attr                # prints __delete__

print(d.__dict__)
# {'value': 0}
print(e.__dict__)
# {}
print(Example.__dict__)
# {'__module__': '__main__', 'attr': <__main__.DataDescriptor object at 0x7f1635e58940>, '__dict__': <attribute '__dict__' of 'Example' objects>, '__weakref__': <attribute '__weakref__' of 'Example' objects>, '__doc__': None}
As you can see, we don't normally call the descriptor's methods ourselves: in the example we never write d.__get__() or d.__set__(). Instead we work with the class attribute the descriptor manages, through e.attr or e.attr = "value". This means we can define how a value is read, set, and deleted through the __get__(), __set__() and __delete__() methods of a descriptor, and then bind that behaviour to a given attribute.
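As an illustration of binding custom get/set/delete behaviour to an attribute, here is a minimal sketch of a validating descriptor. The names PositiveNumber and Order are hypothetical, and storing the value in the instance dictionary is one design choice among several:

```python
class PositiveNumber:
    """Hypothetical data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember the attribute name.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive, got {value!r}")
        obj.__dict__[self.name] = value  # stored per instance

    def __delete__(self, obj):
        del obj.__dict__[self.name]


class Order:
    quantity = PositiveNumber()

    def __init__(self, quantity):
        self.quantity = quantity  # goes through PositiveNumber.__set__


order = Order(3)
print(order.quantity)   # 3
print(order.__dict__)   # {'quantity': 3}
try:
    order.quantity = -1
except ValueError as exc:
    print(exc)          # quantity must be positive, got -1
```

Because PositiveNumber is a data descriptor, it keeps control of reads even though a key with the same name sits in the instance dictionary.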
Lookup chain
The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.
For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a) excluding metaclasses.
If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.
Likewise, when we access attr on the instance e of Example, the expression e.attr looks up the attribute attr in the chain of namespaces for e. If the search finds a descriptor outside of the instance __dict__, its __get__() method is invoked.
e.attr
# equivalent to: type(e).__dict__['attr'].__get__(e, type(e))
# prints __get__

e.attr = "another new attribute"
# equivalent to: type(e).__dict__['attr'].__set__(e, "another new attribute")
# prints __set__
The details of invocation depend on whether e is an object, a class, or an instance of super. Behind the scenes, it is __getattribute__() that does all the magic work.
A descriptor can even be called directly by its method name, for example d.__get__(obj).
Invocation from an instance:
object.__getattribute__() transforms b.x into type(b).__dict__['x'].__get__(b, type(b)).
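The transformation can be observed by performing both forms of the lookup by hand. This is a small sketch, assuming a hypothetical non-data descriptor named Ten:

```python
class Ten:
    """Hypothetical non-data descriptor that always returns 10."""

    def __get__(self, obj, objtype=None):
        return 10


class B:
    x = Ten()


b = B()
via_dot = b.x                                             # ordinary dotted access
via_protocol = type(b).__dict__["x"].__get__(b, type(b))  # explicit protocol call

print(via_dot, via_protocol)  # 10 10
```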
Instance lookup scans through a chain of namespaces in the following order. Let's say we are looking for attribute x on object o:
- data descriptors: the value from the __get__ method of the data descriptor named x
- instance variables: the value of o.__dict__ for the key x
- non-data descriptors: the value from the __get__ method of the non-data descriptor named x
- class variables: the value of type(o).__dict__ for the key x
- parent classes' variables, all the way along the MRO
- __getattr__(), if it is provided
If a descriptor is found for a.x, then it is invoked with: desc.__get__(a, type(a)).
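The precedence order can be demonstrated by planting competing values in the instance dictionary. The class names below (DataDesc, NonDataDesc, O) are hypothetical and exist only for this sketch:

```python
class DataDesc:
    """Data descriptor: defines __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class O:
    data = DataDesc()
    nondata = NonDataDesc()

    def __getattr__(self, name):
        # Last resort, reached only when the normal lookup fails.
        return f"fallback for {name}"


o = O()
# Write to the instance dictionary directly, bypassing __set__.
o.__dict__["data"] = "from instance dict"
o.__dict__["nondata"] = "from instance dict"

print(o.data)     # from data descriptor   (data descriptor beats instance dict)
print(o.nondata)  # from instance dict     (instance dict beats non-data descriptor)
print(o.missing)  # fallback for missing   (__getattr__ runs last)
```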
Note that attribute lookup doesn't call object.__getattribute__() directly. Instead, both the dot operator and the getattr() function perform attribute lookup by way of a helper function:
def getattr_hook(obj, name):
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    return type(obj).__getattr__(obj, name)  # __getattr__
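To see the helper in action, the sketch below repeats it so the snippet runs on its own and applies it to two hypothetical classes, one with a __getattr__ fallback and one without:

```python
def getattr_hook(obj, name):
    "Same helper as above, repeated so this snippet is self-contained."
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), "__getattr__"):
            raise
    return type(obj).__getattr__(obj, name)


class WithFallback:
    present = 1

    def __getattr__(self, name):
        return f"<no {name}>"


class WithoutFallback:
    present = 1


print(getattr_hook(WithFallback(), "present"))  # 1
print(getattr_hook(WithFallback(), "absent"))   # <no absent>
try:
    getattr_hook(WithoutFallback(), "absent")
except AttributeError as exc:
    print("AttributeError:", exc)
```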
Invocation from a class:
type.__getattribute__() transforms B.x into B.__dict__['x'].__get__(None, B).
The logic for a dotted lookup such as A.x is in type.__getattribute__(). The steps are similar to those for the instance dictionary lookup, but the search goes through the class's method resolution order.
If a descriptor is found, it is invoked with desc.__get__(None, A).
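A descriptor can tell the two kinds of access apart by checking whether its obj argument is None. The following sketch uses a hypothetical descriptor named Reveal that simply returns the arguments it received:

```python
class Reveal:
    """Hypothetical descriptor that reports how it was accessed."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class A:
    x = Reveal()


a = A()
print(A.x == (None, A))  # True: class access passes obj=None
print(a.x == (a, A))     # True: instance access passes the instance
print(A.__dict__["x"].__get__(None, A) == A.x)  # True: same call made explicitly
```

This is why many descriptors begin __get__ with "if obj is None: return self", so that class-level access returns the descriptor object itself.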
Invocation from super
Objects returned by super() have a custom __getattribute__() method for invoking descriptors.
If a is an instance of super, then the binding super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then invokes the descriptor with the call: A.__dict__['m'].__get__(obj, obj.__class__).
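The super lookup can be reproduced by hand. This is a simplified sketch with hypothetical classes A and B; it only inspects the single class right after B in the MRO, whereas the real super keeps searching further along the MRO if that class does not define the name:

```python
class A:
    def m(self):
        return "A.m"


class B(A):
    def m(self):
        return "B.m"


obj = B()
mro = type(obj).__mro__             # (B, A, object)
following = mro[mro.index(B) + 1]   # the class right after B in the MRO: A
manual = following.__dict__["m"].__get__(obj, type(obj))

print(obj.m())            # B.m
print(super(B, obj).m())  # A.m
print(manual())           # A.m
```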
Non-data descriptor
We saw an example of a data descriptor earlier; what about a non-data descriptor? A non-data descriptor defines only the __get__ method:
class NonDataDescriptor():
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print("__get__")
        self.value += 1  # state lives on the descriptor object itself
        return self.value

class Example():
    attr = NonDataDescriptor()

e = Example()
d = Example.__dict__['attr']  # the descriptor object bound to Example.attr

print(e.attr)      # __get__ 1
print(e.attr)      # __get__ 2
print(e.__dict__)  # {}
print(d.__dict__)  # {'value': 2}

e.attr = 4         # no __set__, so the value goes into the instance dictionary
print(e.attr)      # 4
print(e.__dict__)  # {'attr': 4}
print(d.__dict__)  # {'value': 2}
With a non-data descriptor, an assigned value (e.g. e.attr = 4) is stored in the instance dictionary, and from then on it shadows the descriptor. With the data descriptor shown earlier, the assignment is routed to the descriptor's __set__ method, which in that example stores the value on the descriptor object itself.
Also note that a descriptor is instantiated just once, when the class body is executed, not once per instance. Its state is therefore shared across all instances of the class. That is why the value increments to 2 when we access e.attr the second time.
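The sharing becomes more visible with two instances. In this sketch, SharedCounter and Page are hypothetical names, and the counter is deliberately kept on the descriptor to show the effect:

```python
class SharedCounter:
    """Hypothetical descriptor that keeps its count on the descriptor itself."""

    def __init__(self):
        self.count = 0

    def __get__(self, obj, objtype=None):
        self.count += 1
        return self.count


class Page:
    hits = SharedCounter()  # created once, when the class body runs


first, second = Page(), Page()
print(first.hits)   # 1
print(second.hits)  # 2, because both instances use the same descriptor object
print(Page.__dict__["hits"].count)  # 2
```

When per-instance state is wanted instead, a common approach is to store the value in obj.__dict__ inside __get__ and __set__ rather than on the descriptor.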
Functions and Methods
Have you ever wondered how a dotted method call such as x.method() works? It works thanks to non-data descriptors.
Functions stored in class dictionaries get turned into methods when they are accessed as attributes, because the lookup invokes their __get__ method. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling cls.f(*args) becomes f(*args).
The function class has the __get__()
method for binding methods during attribute access. This means that functions are non-data descriptors that return bound methods during dotted lookup from an instance. Here’s how it works:
from types import MethodType

class Function:
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return MethodType(self, obj)

class D:
    def f(self, x):
        return x
Below are different scenarios where you can see how the descriptor works:
1. Access through the class dictionary --> function
>>> D.__dict__['f']
<function D.f at 0x00C45070>
2. Dotted access from class --> function
>>> D.f
<function D.f at 0x00C45070>
3. Dotted access from instance --> bound method
>>> d = D()
>>> d.f
<bound method D.f of <__main__.D object at 0x00B18C90>>
Internally, the bound method stores the underlying function and the bound instance:
>>> d.f.__func__
<function D.f at 0x00C45070>
>>> d.f.__self__
<__main__.D object at 0x00B18C90>
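The same binding can be triggered manually, which makes the role of __get__ explicit. A minimal sketch, reusing the class D from above:

```python
from types import MethodType


class D:
    def f(self, x):
        return x


d = D()
func = D.__dict__["f"]                      # the plain function object
bound_via_get = func.__get__(d, D)          # what d.f does under the hood
bound_via_methodtype = MethodType(func, d)  # building the bound method directly

print(bound_via_get(5))         # 5
print(bound_via_methodtype(5))  # 5
print(d.f == bound_via_get)     # True
print(bound_via_get.__self__ is d, bound_via_get.__func__ is func)  # True True
```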
It is similar for class methods and static methods: a classmethod binds the class instead of the instance, and a staticmethod returns the underlying function unchanged.
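The three kinds of binding can be compared side by side. The class E and its method names are hypothetical and chosen only for this sketch:

```python
class E:
    def plain(self):
        return self

    @classmethod
    def cm(cls):
        return cls

    @staticmethod
    def sm():
        return "no implicit argument"


e = E()
print(e.plain() is e)  # True: the instance is passed as the first argument
print(e.cm() is E)     # True: the class is passed, even from an instance
print(E.cm() is E)     # True
print(e.sm(), "|", E.sm())  # no implicit argument | no implicit argument

# The same results via explicit descriptor calls:
print(E.__dict__["cm"].__get__(None, E)() is E)                        # True
print(E.__dict__["sm"].__get__(None, E) is E.__dict__["sm"].__func__)  # True
```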
Property vs Descriptor
Remember the @property decorator we mentioned earlier? In the following, we are going to show that property() is just syntactic sugar for a data descriptor.
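One way to see this is to write a reduced stand-in for property in pure Python. The sketch below is a simplified assumption of how such a class could look, not the actual CPython implementation; it omits the deleter and docstring handling, and Property and Circle are hypothetical names:

```python
class Property:
    """Simplified pure-Python stand-in for the built-in property()."""

    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def setter(self, fset):
        # Return a new descriptor that keeps the getter and adds the setter.
        return type(self)(self.fget, fset)


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @Property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


c = Circle(2)
print(c.radius)  # 2
c.radius = 5     # routed through Property.__set__
print(c.radius)  # 5
print("radius" in c.__dict__)  # False: only _radius is stored on the instance
```

Because Property defines __set__, it is a data descriptor, so it always takes precedence over the instance dictionary.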
__getattribute__
Finally, we are going to look at __getattribute__, which underpins descriptors, and compare it with __getattr__.
object.__getattribute__(self, name)
Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError. This method should return the (computed) attribute value or raise an AttributeError exception. In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).
object.__getattr__(self, name)
Called when the default attribute access fails with an AttributeError. This method should either return the (computed) attribute value or raise an AttributeError exception.
The documentation might seem confusing. But essentially,
- With __getattr__, if you try to access an undefined attribute, Python will call this method;
- With __getattribute__, if you try to access any attribute (defined or undefined), Python will call this method.
Let’s see an example:
>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
AttributeError: 'Example' object has no attribute 'invalid_attr'
Now with __getattr__, you can see that accessing the previously non-existent invalid_attr triggers the __getattr__ method, which stores the attribute in __dict__ and returns a value.
>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattr__(self, attr):
...         self.__dict__[attr] = "this is invalid"
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
this is indeed invalid
>>> print(e.__dict__)
{'valid_attr': 'valid', 'invalid_attr': 'this is invalid'}
Now with __getattribute__, you can see that both invalid_attr and valid_attr trigger the __getattribute__ method, which returns the same string every time. Even the lookup of e.__dict__ is intercepted:
>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattribute__(self, attr):
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
this is indeed invalid
>>> print(e.valid_attr)
this is indeed invalid
>>> print(e.invalid_attr)
this is indeed invalid
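In practice, __getattribute__ is rarely written this way. A more typical implementation intercepts only the names it cares about and delegates everything else to the base class: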
def __getattribute__(self, attr):
    if attr == "invalid":
        return "this is indeed invalid"
    else:
        return object.__getattribute__(self, attr)
        # same as super().__getattribute__(attr)
A word of warning: to avoid infinite recursion, always call the base class method with the same name to access any attribute. Don't return self.__dict__[name], because looking up self.__dict__ triggers your own __getattribute__ over and over again. When we call object.__getattribute__() directly rather than our own version, we need to pass in self explicitly, as well as the attribute name attr.
Also note that if a class defines both __getattr__ and __getattribute__, then __getattribute__ is called first and __getattr__ is not consulted. Only when __getattribute__ raises an AttributeError is that exception swallowed and the __getattr__ method invoked.
Understanding the above mechanism helps us understand descriptors better. For example, below is a pure-Python emulation of the dotted lookup logic in object.__getattribute__():
def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = getattr(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
            or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)  # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                       # instance variable
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)      # non-data descriptor
    if cls_var is not null:
        return cls_var                               # class variable
    raise AttributeError(name)
So in summary, descriptors are a powerful protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself to implement the new style classes. Descriptors simplify the underlying C code and offer a flexible set of new tools for everyday Python programs.
Happy Reading!