Python Descriptor — A Thorough Guide

Learn about the descriptor protocol, data vs. non-data descriptors, the lookup chain, the mechanism behind functions, and the __getattribute__ that powers descriptors behind the scenes

E.Y.
Jan 22, 2021

A descriptor is any object that defines the methods __get__(), __set__(), or __delete__(). When a class attribute is a descriptor, its special binding behaviour is triggered upon attribute lookup. Normally, using a.b to get, set or delete an attribute looks up the object named b in the class dictionary for a, but if b is a descriptor, the respective descriptor method gets called.

Descriptors are Python objects that implement any of the methods of the descriptor protocol:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
  • If an object defines __set__() or __delete__(), it is a data descriptor.
  • If it only defines __get__(), it is a non-data descriptor.
  • If it defines both __get__() and __set__(), with __set__() raising an AttributeError when called, it is a read-only data descriptor.
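The read-only case from the list above can be sketched roughly as follows. The names ReadOnly and Config are hypothetical and exist only for this illustration; storing the value on the descriptor itself is an assumption made to keep the sketch short.

```python
class ReadOnly:
    """Data descriptor whose __set__ always refuses assignment."""

    def __init__(self, value):
        self._value = value  # kept on the descriptor for simplicity

    def __get__(self, obj, objtype=None):
        return self._value

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor; raising makes it read-only.
        raise AttributeError("read-only attribute")


class Config:
    version = ReadOnly("1.0")


c = Config()
print(c.version)      # 1.0
try:
    c.version = "2.0"
except AttributeError as exc:
    print(exc)        # read-only attribute
print(c.__dict__)     # {}
```

Because __set__() is defined, the assignment never reaches the instance dictionary, which is why c.__dict__ stays empty.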

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. If any of those methods are defined for an object, it is said to be a descriptor.

What does this “binding behaviour” mean? Let’s look at an example:

class DataDescriptor(object):
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print("__get__")
        return self.value

    def __set__(self, obj, value):
        print("__set__")
        try:
            self.value = value
        except AttributeError:
            print(f"Can not set value {value}")

    def __delete__(self, obj):
        print("__delete__")

class Example():
    attr = DataDescriptor()

d = DataDescriptor()
e = Example()
e.attr                    # prints __get__, evaluates to 0
e.attr = "new attribute"  # prints __set__
del e.attr                # prints __delete__
print(d.__dict__)
# {'value': 0}
print(e.__dict__)
# {}
print(Example.__dict__)
# {'__module__': '__main__', 'attr': <__main__.DataDescriptor object at 0x7f1635e58940>, '__dict__': <attribute '__dict__' of 'Example' objects>, '__weakref__': <attribute '__weakref__' of 'Example' objects>, '__doc__': None}

As you can see, we normally do not call a descriptor’s methods ourselves: in the example we never write d.__get__() or d.__set__(). Instead we go through the class attribute the descriptor is bound to, using e.attr or e.attr = "value", and Python calls the matching method for us. This means we can define how a value is read, set, and deleted through the __get__(), __set__() and __delete__() methods of a descriptor and then bind that behaviour to a given attribute.

Lookup chain

The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.

For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a) excluding metaclasses.

If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.

In our example, when we access attr on the instance e of Example, the expression e.attr looks up the attribute attr in the chain of namespaces for e. If the search finds a descriptor outside of the instance __dict__, its __get__() method is invoked.

e.attr
# is equivalent to:
type(e).__dict__['attr'].__get__(e, type(e))
# __get__
e.attr = "another new attribute"
# is equivalent to:
type(e).__dict__['attr'].__set__(e, "another new attribute")
# __set__

The details of invocation depend on whether e is an object, a class, or an instance of super. Behind the scenes, it is __getattribute__() that does all the work.

A descriptor can even be called directly by its method name. For example, d.__get__(obj).

Invocation from an instance:

object.__getattribute__() transforms b.x into type(b).__dict__['x'].__get__(b, type(b)).

Instance lookup scans through a chain of namespaces in the following order. Let’s say we are looking for attribute x on object o:

  • data descriptors: the value from the __get__ method of the data descriptor named x
  • instance variables: the value in o.__dict__ for the key x
  • non-data descriptors: the value from the __get__ method of the non-data descriptor named x
  • class variables: the value in type(o).__dict__ for the key x
  • the parent classes’ variables, all the way along the MRO
  • __getattr__(), if it is provided.

If a descriptor is found for a.x, then it is invoked with: desc.__get__(a, type(a)) .
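The precedence order listed above can be observed with a small experiment. This is only a sketch; the classes DataDesc, NonDataDesc and Demo are hypothetical names introduced for illustration.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    x = DataDesc()
    y = NonDataDesc()
    z = "class variable"

    def __getattr__(self, name):
        # Only reached when the normal lookup fails.
        return "from __getattr__"


fresh = Demo()
print(fresh.y)    # non-data descriptor
print(fresh.z)    # class variable

o = Demo()
# Write straight into the instance dictionary, bypassing DataDesc.__set__.
o.__dict__.update(x="instance x", y="instance y", z="instance z")

print(o.x)        # data descriptor   (data descriptor beats the instance dict)
print(o.y)        # instance y        (instance dict beats a non-data descriptor)
print(o.z)        # instance z        (instance dict beats a class variable)
print(o.missing)  # from __getattr__  (last resort)
```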

Note that attribute lookup doesn’t call object.__getattribute__() directly. Instead, both the dot operator and the getattr() function perform attribute lookup by way of a helper function:

def getattr_hook(obj, name):
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    return type(obj).__getattr__(obj, name)  # __getattr__

Invocation from a class:

type.__getattribute__() transforms B.x into B.__dict__['x'].__get__(None, B).

The logic for a dotted lookup such as A.x is in type.__getattribute__(). The steps are similar to those for an instance, but the search goes through the class’s method resolution order instead of an instance dictionary.

If a descriptor is found, it is invoked with desc.__get__(None, A).
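A quick way to see which arguments reach __get__() on class-level access is a descriptor that simply returns them. The Show descriptor below is a hypothetical helper used only for this sketch.

```python
class Show:
    """Returns the arguments it was invoked with."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class A:
    x = Show()


class B(A):
    pass


print(A.x == (None, A))                            # True
print(B.x == (None, B))                            # True: found via the MRO, bound to B
print(A.__dict__['x'].__get__(None, A) == A.x)     # True
print(A().x[1] is A)                               # True: instance access passes type(obj)
```

On class access obj is None, which is why many descriptors start __get__() with an "if obj is None: return self" check.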

Invocation from super:

The object returned by super() has a custom __getattribute__() method for invoking descriptors.

The binding super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then invokes the descriptor with the call: A.__dict__['m'].__get__(obj, obj.__class__).
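The super() case can be checked with a descriptor that reports the arguments it receives. This is a sketch with hypothetical names (Recorder, A, B, C); it relies on CPython passing the instance and the instance's class to __get__().

```python
class Recorder:
    """Returns the (obj, owner) pair it was invoked with."""

    def __get__(self, obj, owner=None):
        return (obj, owner)


class A:
    m = Recorder()


class B(A):
    pass


class C(B):
    pass


c = C()
# The MRO of C is (C, B, A, object); super(B, c) starts searching after B, so it finds A.m.
obj, owner = super(B, c).m
print(obj is c)      # True
print(owner is C)    # True: the class of the instance, not A or B
print(super(B, c).m == A.__dict__['m'].__get__(c, type(c)))  # True
```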

Non-data descriptor

We saw an example of a data descriptor earlier. What about a non-data descriptor? A non-data descriptor defines only the __get__() method:

class NonDataDescriptor():
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print(" __get__")
        self.value += 1  # state lives on the descriptor itself
        return self.value

class Example():
    attr = NonDataDescriptor()

e = Example()
d = Example.__dict__['attr']  # the descriptor object attached to Example
print(e.attr) # __get__ 1
print(e.attr) # __get__ 2
print(e.__dict__) # {}
print(d.__dict__) # {'value': 2}
e.attr = 4
print(e.attr) # 4
print(e.__dict__) # {'attr': 4}
print(d.__dict__) # {'value': 2}

With a non-data descriptor, an assignment such as e.attr = 4 is stored in the instance dictionary, and from then on the instance value shadows the descriptor. With a data descriptor, the assignment is routed to the descriptor’s __set__() method, which decides where the value is kept (in the earlier example, on the descriptor object itself).

Also note that a descriptor is instantiated just once, when the class body is executed, not once per instance. Any state stored on the descriptor itself is therefore shared by all instances of the class. This is why the second e.attr lookup returns 2.
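To make the sharing visible, the sketch below contrasts a descriptor that keeps state on itself with one that keeps state in each instance's __dict__. All names (SharedCounter, PerInstance, the "_count" key) are hypothetical and chosen only for this illustration; storing per-instance data in obj.__dict__ is one common approach, not the only one.

```python
class SharedCounter:
    """Keeps its state on the descriptor, so every instance sees the same counter."""

    def __init__(self):
        self.value = 0

    def __get__(self, obj, objtype=None):
        self.value += 1
        return self.value


class PerInstance:
    """Keeps its state in the instance dictionary under a hypothetical key."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get("_count", 0)

    def __set__(self, obj, value):
        obj.__dict__["_count"] = value


class Example:
    shared = SharedCounter()
    own = PerInstance()


a, b = Example(), Example()
print(a.shared)      # 1
print(b.shared)      # 2  (same counter, different instance)
a.own = 10
print(a.own, b.own)  # 10 0
```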

Functions and Methods

Have you ever wondered how a dotted method call such as x.method() works? It works thanks to non-data descriptors.

Functions stored in class dictionaries are turned into methods by their __get__() method when they are accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling cls.f(*args) becomes f(*args).

The function class has the __get__() method for binding methods during attribute access. This means that functions are non-data descriptors that return bound methods during dotted lookup from an instance. Here’s how it works:

from types import MethodType

class Function:
    ...

    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return MethodType(self, obj)

class D:
    def f(self, x):
        return x

Below are different scenarios where you can see how the descriptor works:

1. Access through the class dictionary --> function
>>> D.__dict__['f']
<function D.f at 0x00C45070>

2. Dotted access from the class --> function
>>> D.f
<function D.f at 0x00C45070>

3. Dotted access from an instance --> bound method
>>> d = D()
>>> d.f
<bound method D.f of <__main__.D object at 0x00B18C90>>

Internally, the bound method stores the underlying function and the bound instance:

>>> d.f.__func__
<function D.f at 0x00C45070>
>>> d.f.__self__
<__main__.D object at 0x00B18C90>
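The three scenarios can also be checked without reading addresses, by calling the function's __get__() by hand. This is a small sketch that redefines the class D from above so it is self-contained.

```python
class D:
    def f(self, x):
        return x


d = D()
func = D.__dict__['f']      # plain function stored in the class dictionary
bound = func.__get__(d, D)  # what d.f does behind the scenes

print(bound(5))             # 5
print(d.f(5))               # 5
print(D.f(d, 5))            # 5  (class access returns the plain function)
print(bound.__self__ is d)  # True
print(bound.__func__ is func)  # True
print(D.f is func)          # True
```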

The same mechanism underlies class methods and static methods. They differ only in what, if anything, gets bound as the first argument of the call.
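The difference in bindings can be illustrated with a short sketch. The class E and its method names are hypothetical, introduced only for this comparison.

```python
class E:
    def plain(self, x):
        return (self, x)

    @classmethod
    def cm(cls, x):
        return (cls, x)

    @staticmethod
    def sm(x):
        return x


e = E()
print(e.plain(1) == (e, 1))   # True: the instance is bound
print(e.cm(1) == (E, 1))      # True: the class is bound, even from an instance
print(E.cm(1) == (E, 1))      # True
print(e.sm(1), E.sm(1))       # 1 1  (nothing is bound)

# In the class dictionary the wrappers are still visible as descriptor objects.
print(type(E.__dict__['cm']).__name__)  # classmethod
print(type(E.__dict__['sm']).__name__)  # staticmethod
```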

Property vs Descriptor

Remember the @property decorator? The built-in property() is essentially syntactic sugar for a data descriptor: it defines __get__(), __set__() and __delete__(), which call the getter, setter and deleter functions you supply.
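A much simplified stand-in for property() might look like the sketch below. MyProperty and Circle are hypothetical names; the real built-in also handles deleters, docstrings and more detailed error messages, which are left out here.

```python
class MyProperty:
    """Simplified, hypothetical emulation of the built-in property()."""

    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access returns the descriptor itself
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def setter(self, fset):
        # Return a new descriptor that keeps the getter and adds the setter.
        return type(self)(self.fget, fset)


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @MyProperty
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


c = Circle(1)
print(c.radius)   # 1
c.radius = 3
print(c.radius)   # 3
print(hasattr(property, "__set__"))  # True: the built-in is a data descriptor too
```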

__getattribute__

Finally, we are going to look at __getattribute__, which underpins descriptors, and compare it with __getattr__.

object.__getattribute__(self, name)

Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError.

This method should return the (computed) attribute value or raise an AttributeError exception.

In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

object.__getattr__(self, name)

Called when the default attribute access fails with an AttributeError . This method should either return the (computed) attribute value or raise an AttributeError exception.

The documentation might seem confusing. But essentially,

  • With __getattr__, Python calls the method only if you try to access an undefined attribute;
  • With __getattribute__, Python calls the method whenever you access any attribute, defined or undefined.

Let’s see an example:

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
AttributeError: 'Example' object has no attribute 'invalid_attr'

Now with __getattr__, the previously non-existent invalid_attr triggers the __getattr__ method, which in this example stores a value in __dict__ and returns a string.

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattr__(self, attr):
...         self.__dict__[attr] = "this is invalid"
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
this is indeed invalid
>>> print(e.__dict__)
{'valid_attr': 'valid', 'invalid_attr': 'this is invalid'}

Now with __getattribute__, both invalid_attr and valid_attr (and even __dict__) trigger the __getattribute__ method, which returns the same string for every name.

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattribute__(self, attr):
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
this is indeed invalid
>>> print(e.valid_attr)
this is indeed invalid
>>> print(e.invalid_attr)
this is indeed invalid

Note that a real __getattribute__ is rarely written this way. Normally it intercepts only the names it cares about and delegates everything else to the base class:

def __getattribute__(self, attr):
    if attr == "invalid":
        return "this is indeed invalid"
    else:
        return object.__getattribute__(self, attr)
        # same as: super().__getattribute__(attr)

Warning: to avoid infinite recursion, always call the base class method of the same name to access attributes inside __getattribute__. Don’t return self.__dict__[name], because looking up self.__dict__ is itself an attribute access and will trigger your own __getattribute__ over and over again. Because we call the base class’s __getattribute__() through the class rather than through the instance, we have to pass self explicitly along with the attribute name attr.

Also note that if a class defines both __getattr__ and __getattribute__, __getattribute__ is called first and __getattr__ is not consulted as long as it returns normally. If __getattribute__ raises AttributeError, the exception is swallowed and __getattr__ is invoked instead.
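The interplay between the two hooks can be sketched as follows. The attribute name "secret" and the fallback string are hypothetical choices for this illustration.

```python
class Example:
    def __init__(self, valid_attr):
        self.valid_attr = valid_attr

    def __getattribute__(self, name):
        if name == "secret":
            # Pretend the attribute does not exist, even if it does.
            raise AttributeError(name)
        # Delegate to the base class to avoid infinite recursion.
        return super().__getattribute__(name)

    def __getattr__(self, name):
        # Called only after __getattribute__ raised AttributeError.
        return f"fallback for {name}"


e = Example("valid")
e.__dict__["secret"] = "hidden"

print(e.valid_attr)  # valid
print(e.secret)      # fallback for secret
print(e.missing)     # fallback for missing
```

Even though "secret" is present in the instance dictionary, the lookup never reaches it, because __getattribute__ runs first and raises before delegating.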

Understanding the above mechanism helps us understand descriptors better. For example, below is a pure Python emulation of the dotted lookup logic in object.__getattribute__():

def find_name_in_mro(cls, name, default):
    "Emulate _PyType_Lookup() in Objects/typeobject.c"
    # Read the class dictionaries directly so that no descriptor is triggered.
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default

def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = find_name_in_mro(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
                or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)  # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                       # instance variable
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)      # non-data descriptor
    if cls_var is not null:
        return cls_var                               # class variable
    raise AttributeError(name)

So in summary, descriptors are a powerful protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself to implement the new style classes. Descriptors simplify the underlying C-code and offer a flexible set of new tools for everyday Python programs.

Happy Reading!
