Python Descriptor — A Thorough Guide

Learn about the descriptor protocol, data vs. non-data descriptors, the lookup chain, the mechanism behind functions, and the __getattribute__ method that powers descriptors behind the scenes

Photo by Joseph Gonzalez on Unsplash

A descriptor is any object which defines the methods __get__(), __set__(), or __delete__(). When a class attribute is a descriptor, its special binding behaviour is triggered upon attribute lookup. Normally, using a.b to get, set or delete an attribute looks up the object named b in the class dictionary for a, but if b is a descriptor, the respective descriptor method gets called.

Descriptors are Python objects that implement any of the methods of the descriptor protocol:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
  • If an object defines __set__() or __delete__(), it is a data descriptor.
  • If it only defines __get__(), it is a non-data descriptor.
  • If it defines both __get__() and __set__(), with __set__() raising an AttributeError when called, it is a read-only data descriptor.
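The read-only case can be sketched in a few lines. The ReadOnly and Circle names below are hypothetical and exist only for this illustration; the sketch also shows that a data descriptor is consulted before the instance dictionary.

```python
class ReadOnly:
    """Data descriptor whose __set__ always raises, making the attribute read-only."""

    def __init__(self, value):
        self._value = value

    def __get__(self, obj, objtype=None):
        return self._value

    def __set__(self, obj, value):
        # Defining __set__ is what makes this a *data* descriptor;
        # raising AttributeError is what makes it read-only.
        raise AttributeError("read-only attribute")


class Circle:  # hypothetical example class
    sides = ReadOnly(0)


c = Circle()
print(c.sides)  # 0

try:
    c.sides = 4
except AttributeError as exc:
    print("rejected:", exc)

# Writing straight into the instance dictionary does not help either:
# a data descriptor takes precedence over instance variables.
c.__dict__["sides"] = 4
print(c.sides)  # still 0
```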

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. If any of those methods are defined for an object, it is said to be a descriptor.

What does this “binding behaviour” mean? Let’s look at an example:

class DataDescriptor(object):
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print("__get__")
        return self.value

    def __set__(self, obj, value):
        print("__set__")
        try:
            self.value = value
        except AttributeError:
            print(f"Can not set value {value}")

    def __delete__(self, obj):
        print("__delete__")

class Example():
    attr = DataDescriptor()

d = DataDescriptor()  # a separate descriptor object, not attached to any class
e = Example()
e.attr  # prints __get__, evaluates to 0
e.attr = "new attribute"  # prints __set__
del e.attr  # prints __delete__
print(d.__dict__)
# {'value': 0}
print(e.__dict__)
# {}
print(Example.__dict__)
# {'__module__': '__main__', 'attr': <__main__.DataDescriptor object at 0x7f1635e58940>, '__dict__': <attribute '__dict__' of 'Example' objects>, '__weakref__': <attribute '__weakref__' of 'Example' objects>, '__doc__': None}

As you can see, a descriptor’s methods are not meant to be called on the descriptor by hand: in the example we never call d.__get__() or d.__set__(). Instead, we go through the class attribute that holds the descriptor, using e.attr or e.attr = "new attribute". In other words, we define how a value is read, set, and deleted through the __get__, __set__, and __delete__ methods of a descriptor, and then bind that behaviour to a given attribute.

Lookup chain

The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary.

For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a), excluding metaclasses.

If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.

You can see the same thing when we read e.attr on an instance of Example: the expression e.attr looks up the attribute attr in the chain of namespaces for e. If the search finds a descriptor outside of the instance __dict__, its __get__() method is invoked.

e.attr
type(e).__dict__['attr'].__get__(e, type(e))
# __get__
e.attr = "another new attribute"
type(e).__dict__['attr'].__set__(e, "another new attribute")
# __set__

The details of invocation depend on whether obj is an object, a class, or an instance of super. Behind the scenes, it is __getattribute__() that does all the magic work.

A descriptor can even be called directly by its method name. For example, d.__get__(obj).

Invocation from an instance:

Through object.__getattribute__(), a.x transforms into type(a).__dict__['x'].__get__(a, type(a)).

Instance lookup scans through a chain of namespaces in the following order. Let’s say we are looking for attribute x on object a:

  • data descriptors: the value from the __get__() method of the data descriptor named x
  • instance variables: the value of a.__dict__ for the key named x
  • non-data descriptors: the value from the __get__() method of the non-data descriptor named x
  • class variables: the value of type(a).__dict__ for the key named x
  • the parent classes’ variables, all the way along the MRO
  • __getattr__(), if it is provided.

If a descriptor is found for a.x, then it is invoked with: desc.__get__(a, type(a)).
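The precedence order above can be observed directly. In this minimal sketch, DataDesc, NonDataDesc, and A are hypothetical names; same-named entries are planted in the instance dictionary to see which source wins.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        pass  # ignore writes; defining __set__ makes this a data descriptor


class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class A:
    x = DataDesc()
    y = NonDataDesc()
    z = "from class variable"

    def __getattr__(self, name):
        # Reached only when every other step has failed.
        return f"from __getattr__ ({name})"


a = A()
# Plant same-named entries directly in the instance dictionary.
a.__dict__.update(x="from instance", y="from instance")

print(a.x)        # from data descriptor (beats the instance dictionary)
print(a.y)        # from instance (beats the non-data descriptor)
print(a.z)        # from class variable
print(a.missing)  # from __getattr__ (missing)
```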

Note that attribute lookup doesn’t call object.__getattribute__() directly. Instead, both the dot operator and the getattr() function perform attribute lookup by way of a helper function:

def getattr_hook(obj, name):
    try:
        return obj.__getattribute__(name)
    except AttributeError:
        if not hasattr(type(obj), '__getattr__'):
            raise
    return type(obj).__getattr__(obj, name)  # __getattr__

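Invocation from a class:

For classes, the machinery is in type.__getattribute__(), which transforms A.x into A.__dict__['x'].__get__(None, A).

The logic for a dotted lookup such as A.x is in type.__getattribute__(). The steps are similar to those for an instance, but the instance dictionary lookup is replaced by a search through the class’s method resolution order.

If a descriptor is found, it is invoked with desc.__get__(None, A).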
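To see which arguments reach __get__ in each case, here is a small sketch with a hypothetical Recorder descriptor that simply returns what it was called with.

```python
class Recorder:
    """Non-data descriptor that reports the arguments passed to __get__."""

    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class A:
    x = Recorder()


class B(A):
    pass


a = A()
print(a.x == (a, A))     # True: instance access passes the instance and its type
print(A.x == (None, A))  # True: class access passes None as the instance
print(B.x == (None, B))  # True: found on A through B's MRO, but invoked for B
```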

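Invocation from super:

The object returned by super() also has a custom __getattribute__() method for invoking descriptors.

If a is an instance of super, then the binding super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then invokes the descriptor with the call: A.__dict__['m'].__get__(obj, obj.__class__).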
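A short sketch of the super case, again with a hypothetical Recorder descriptor: B shadows the attribute m, and super(B, b) skips B to find the descriptor on A.

```python
class Recorder:
    def __get__(self, obj, objtype=None):
        return (obj, objtype)


class A:
    m = Recorder()


class B(A):
    m = "shadowed in B"


b = B()
print(b.m)  # shadowed in B

found = super(B, b).m   # skips B, finds A.__dict__['m'], then calls its __get__
print(found == (b, B))  # True

# The explicit form of the same call:
print(found == A.__dict__["m"].__get__(b, type(b)))  # True
```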

Non-data descriptor

We saw an example of a data descriptor earlier. What about a non-data descriptor? A non-data descriptor defines only the __get__() method:

class NonDataDescriptor():
    def __init__(self):
        self.value = 0

    def __get__(self, obj, type):
        print(" __get__")
        self.value += 1  # the state lives on the descriptor object itself
        return self.value

class Example():
    attr = NonDataDescriptor()

e = Example()
d = Example.__dict__['attr']  # the descriptor object attached to Example
print(e.attr)      # __get__ 1
print(e.attr)      # __get__ 2
print(e.__dict__)  # {}
print(d.__dict__)  # {'value': 2}
e.attr = 4         # no __set__, so the value goes into the instance dictionary
print(e.attr)      # 4
print(e.__dict__)  # {'attr': 4}
print(d.__dict__)  # {'value': 2}

With a non-data descriptor, an assigned value (e.g. e.attr = 4) is stored in the instance dictionary, and from then on it shadows the descriptor. With a data descriptor, the assignment is routed to the descriptor’s __set__() method instead; in the earlier example, that method stores the value in the descriptor’s own dictionary.

Also note that a descriptor is instantiated just once per class, not once per instance, which means the descriptor’s state is shared across all instances of that class. So when we read e.attr the second time, the value increments to 2.
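When shared state is not what you want, one common approach is to keep the value in each instance’s own dictionary rather than on the descriptor. The sketch below is one possible way to do it, not the only one; PerInstance and Counter are hypothetical names, and it relies on __set_name__, which is available from Python 3.6.

```python
class PerInstance:
    """Data descriptor that keeps each value in the owning instance's __dict__."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; records the attribute name.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor itself
        return obj.__dict__.get(self.name, 0)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class Counter:  # hypothetical example class
    count = PerInstance()


first, second = Counter(), Counter()
first.count = 10
print(first.count)   # 10
print(second.count)  # 0 (unaffected by the write to first)
```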

Functions and Methods

Have you ever wondered how a dotted method call such as obj.f(*args) works? That is thanks to non-data descriptors.

Functions stored in class dictionaries get turned into methods when they are accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling cls.f(*args) becomes f(*args).

The function class has the __get__() method for binding methods during attribute access. This means that functions are non-data descriptors that return bound methods during dotted lookup from an instance. Here’s how it works:

from types import MethodType

class Function:
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return MethodType(self, obj)

class D:
    def f(self, x):
        return x

Below are different scenarios where you can see how the descriptor works:

1. access through the class dictionary --> function
>>> D.__dict__['f']
<function D.f at 0x00C45070>
2. dotted access from class --> function
>>> D.f
<function D.f at 0x00C45070>
3. dotted access from instance --> bound method
>>> d = D()
>>> d.f
<bound method D.f of <__main__.D object at 0x00B18C90>>

Internally, the bound method stores the underlying function and the bound instance:

>>> d.f.__func__
<function D.f at 0x00C45070>
>>> d.f.__self__
<__main__.D object at 0x00B18C90>
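The same binding can be reproduced by calling the function’s __get__ by hand. This is a small sketch reusing the D class from above; the func and bound names are introduced only for the illustration.

```python
class D:
    def f(self, x):
        return x


d = D()
func = D.__dict__["f"]  # the plain function stored in the class dictionary

# Dotted access from the instance is equivalent to calling the function's __get__.
bound = func.__get__(d, D)
print(bound == d.f)            # True: both wrap the same function and instance
print(bound.__func__ is func)  # True
print(bound.__self__ is d)     # True

# Calling the bound method is the same as calling the function with d prepended.
print(d.f(42) == func(d, 42))  # True

# Dotted access from the class passes obj=None, so the function comes back as is.
print(func.__get__(None, D) is func)  # True
```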

It is similar for class methods and static methods. They differ only in what, if anything, gets bound as the first argument: a class method receives the class, and a static method receives nothing extra.
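The three kinds of binding can be compared side by side. In this sketch, each function in the hypothetical class E returns whatever positional arguments it actually received.

```python
class E:
    def plain(*args):
        return args

    @classmethod
    def cm(*args):
        return args

    @staticmethod
    def sm(*args):
        return args


e = E()
print(e.plain(1) == (e, 1))  # True: the instance is prepended
print(e.cm(1) == (E, 1))     # True: the class is prepended
print(E.cm(1) == (E, 1))     # True: same result when called on the class
print(e.sm(1) == (1,))       # True: nothing is prepended
print(E.plain(1) == (1,))    # True: from the class, a function is left unbound
```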

Property vs Descriptor

Remember the @property decorator we mentioned earlier? In the following, we are going to show that @property is just syntactic sugar for a data descriptor.
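As a rough illustration, the sketch below puts a property next to a hand-written data descriptor that does the same job. The class names are hypothetical, and the hand-written version is a simplification rather than how property is actually implemented.

```python
class WithProperty:
    def __init__(self):
        self._x = 0

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value


class XDescriptor:
    """Hand-written data descriptor doing the same job as the property above."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._x

    def __set__(self, obj, value):
        obj._x = value


class WithDescriptor:
    x = XDescriptor()

    def __init__(self):
        self._x = 0


prop = WithProperty.__dict__["x"]
# A property object implements the full descriptor protocol.
print(all(hasattr(type(prop), m) for m in ("__get__", "__set__", "__delete__")))  # True

for obj in (WithProperty(), WithDescriptor()):
    obj.x = 5
    print(type(obj).__name__, obj.x, obj.__dict__)
```

Both loop iterations print the value 5 and an instance dictionary of {'_x': 5}: in each case the assignment is intercepted by the descriptor living on the class, not stored under the key x.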

Finally, we are going to look at __getattribute__, which underpins descriptors, and compare it with the usage of __getattr__.

object.__getattribute__(self, name)

Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError.

This method should return the (computed) attribute value or raise an exception.

In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

object.__getattr__(self, name)

Called when the default attribute access fails with an AttributeError. This method should either return the (computed) attribute value or raise an exception.

The documentation might seem confusing. But essentially,

  • With __getattr__, if you try to access an undefined attribute, Python will call this method;
  • With __getattribute__, if you try to access any attribute (defined or undefined), Python will call this method.

Let’s see an example:

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
AttributeError: 'Example' object has no attribute 'invalid_attr'

Now with __getattr__, you can see that the previously non-existent invalid_attr triggers the method, which sets the attribute in e.__dict__ and returns a value.

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattr__(self, attr):
...         self.__dict__[attr] = "this is invalid"
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
{'valid_attr': 'valid'}
>>> print(e.valid_attr)
valid
>>> print(e.invalid_attr)
this is indeed invalid
>>> print(e.__dict__)
{'valid_attr': 'valid', 'invalid_attr': 'this is invalid'}

Now with __getattribute__, you can see that both valid_attr and invalid_attr (and even __dict__) trigger the method, which returns the string "this is indeed invalid".

>>> class Example():
...     def __init__(self, valid_attr):
...         self.valid_attr = valid_attr
...     def __getattribute__(self, attr):
...         return "this is indeed invalid"
...
>>> e = Example("valid")
>>> print(e.__dict__)
this is indeed invalid
>>> print(e.valid_attr)
this is indeed invalid
>>> print(e.invalid_attr)
this is indeed invalid

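Note that this is not how __getattribute__ is normally written, though. A typical implementation handles only the names it cares about and delegates everything else to the base class: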

def __getattribute__(self, attr):
    if attr == "invalid":
        return "this is indeed invalid"
    else:
        return object.__getattribute__(self, attr)
        # same as: super().__getattribute__(attr)

A warning: in order to avoid infinite recursion, always call the base class method with the same name to access any attribute. Don’t return self.__dict__[attr], because looking up self.__dict__ triggers your own version of __getattribute__ over and over again. Since we call the base class’s version of __getattribute__() rather than our own, we need to pass in the instance self as well as the attribute name attr.

Also note that if a class defines both __getattr__ and __getattribute__, then __getattr__ is normally ignored. But if __getattribute__ raises an AttributeError, the exception is swallowed and the __getattr__ method is invoked instead.

Understanding the above mechanism helps us understand descriptors better. For example, below is a pure Python equivalent of the dotted lookup logic in object.__getattribute__():

def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    null = object()
    objtype = type(obj)
    cls_var = getattr(objtype, name, null)
    descr_get = getattr(type(cls_var), '__get__', null)
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
                or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)  # data descriptor
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                       # instance variable
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)      # non-data descriptor
    if cls_var is not null:
        return cls_var                               # class variable
    raise AttributeError(name)

So in summary, descriptors are a powerful protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself to implement the new style classes. Descriptors simplify the underlying C-code and offer a flexible set of new tools for everyday Python programs.

Happy Reading!
