Construct Your Object with Python Data Class

E.Y.
7 min read · Jun 6, 2021


If you are into OOP (object-oriented programming), then you might have heard about the comparison between objects and data structures:

  • Objects hide their data behind abstractions and expose functions that operate on that data.
  • Data structures expose their data and have no meaningful functions.

Most classes we construct are mainly objects in this sense, meaning their main purpose is to operate on data rather than to act as containers for it. But there are times when we need to construct a complex data object. What do we do then?

Before I used data classes, I simply used a combination of dictionaries, sets, and lists, maybe with some other collections types such as namedtuple. These data types can do the work, e.g. with a nested dictionary, but they are clumsy to work with, less flexible (e.g. a namedtuple has to be immutable, but what if I want a mutable data object?), and normally less expressive than I would like. Hence, the data class.

The functionality was introduced in Python 3.7 and was originally described in PEP 557. The PEP was inspired by the attrs project, which can do even more (e.g. slots, validators), but since dataclasses is part of the standard library, it is more accessible and convenient in that regard.

Data classes are most useful for creating complex data objects with less boilerplate code and more expressive syntax. For example, compare a data class (top) with the equivalent regular class (bottom):

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def total_cost(self) -> int:
        return self.unit_price * self.quantity

====================================
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def __init__(
        self,
        product: str,
        unit_price: int,
        quantity: int = 0
    ) -> None:
        self.product = product
        self.unit_price = unit_price
        self.quantity = quantity

    def total_cost(self) -> int:
        return self.unit_price * self.quantity

    def __hash__(self) -> int:
        return hash((self.product, self.unit_price, self.quantity))

    def __eq__(self, other) -> bool:
        # Only compare against other Stock instances
        if not isinstance(other, Stock):
            return NotImplemented
        return (
            (self.product, self.unit_price, self.quantity) ==
            (other.product, other.unit_price, other.quantity))

That is a big cut in code! Note that, apart from the convenience, a data class is no different from a regular class, e.g. you can add methods as usual, like total_cost() above:

>>> card = Stock('Card', 2, 20)
>>> card.total_cost()
40
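
To make the saving concrete, here is a minimal sketch of what the decorator generates for free. It reuses the Stock data class from above; the second instance, same_card, is made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def total_cost(self) -> int:
        return self.unit_price * self.quantity


card = Stock("Card", 2, 20)
same_card = Stock("Card", 2, 20)

# __repr__ is generated from the field names and values
print(card)  # Stock(product='Card', unit_price=2, quantity=20)

# __eq__ compares the fields as a tuple, in definition order
print(card == same_card)  # True

# unsafe_hash=True adds __hash__, so equal instances collapse in a set
print(len({card, same_card}))  # 1
```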

Module-level decorators

You may have noticed that we added (unsafe_hash=True) to the @dataclass decorator. This forces the addition of a __hash__() method, and there is more fine-grained control you can exert:

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)

These are the flags you can add to denote whether or not to add various “dunder” methods to the class:

  • init: If true (the default), a __init__() method will be generated.
  • repr: If true (the default), a __repr__() method will be generated.
  • eq: If true (the default), an __eq__() method will be generated.
  • order: If true (the default is False), __lt__(), __le__(), __gt__(), and __ge__() methods will be generated.
  • unsafe_hash: If False (the default), a __hash__() method is generated according to how eq and frozen are set.
  • frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances.

So instead of implementing all these methods yourself, as in the example with __hash__ and __eq__, you can just use the flags to turn each feature on or off.
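
As a rough illustration of one of these flags, the sketch below turns on order and compares it with the default. The Version and Unordered classes are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int


# order=True generates __lt__, __le__, __gt__ and __ge__, which compare
# instances as if they were tuples of their fields, in definition order
releases = [Version(3, 10), Version(2, 7), Version(3, 7)]
print(sorted(releases))
# [Version(major=2, minor=7), Version(major=3, minor=7), Version(major=3, minor=10)]
print(Version(3, 7) < Version(3, 10))  # True


@dataclass  # order defaults to False, so no ordering methods are generated
class Unordered:
    value: int


try:
    Unordered(1) < Unordered(2)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```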

Field-level control

The dataclass() decorator adds special methods to the class, but to control individual fields, the field() function is used.

dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)

What is a Field? Field objects describe each defined field. They are created internally and returned by the module-level fields() function. Each has:

  • name: The name of the field.
  • type: The type of the field.
  • default, default_factory, init, repr, hash, compare, and metadata have the identical meaning and values as they do in the field() declaration.

For example, in the Stock class above, product, unit_price, and quantity are the fields.

@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

You can define the level of control for each field:

  • default: If provided, this will be the default value for this field. This is needed because the field() call itself replaces the normal position of the default value.
  • default_factory: A function that returns the initial value of the field. If provided, it must be a zero-argument callable.
  • init: If true (the default), this field is included as a parameter to the generated __init__() method.
  • repr: If true (the default), this field is included in the string by the generated __repr__() method.
  • compare: If true (the default), this field is included in the generated equality and comparison methods (__eq__(), __gt__(), et al.).
  • hash: This can be a bool or None. If true, this field is included in the generated __hash__() method. If None (the default), use the value of compare: this would normally be the expected behavior.
  • metadata: This can be a mapping or None.
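
A short sketch of how some of these options behave in practice. The User class and its sample values are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # repr=False keeps the value out of the generated __repr__()
    password: str = field(repr=False)
    # compare=False leaves the field out of __eq__() and the ordering methods
    last_login: str = field(default="never", compare=False)
    # init=False removes the field from the __init__() parameters
    login_count: int = field(default=0, init=False)


alice = User("alice", "s3cret", last_login="2021-06-06")
print(alice)
# User(name='alice', last_login='2021-06-06', login_count=0)

# Equal even though last_login differs, because that field is not compared
print(alice == User("alice", "s3cret"))  # True
```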

Most of these are self-explanatory, so I just want to briefly illustrate default_factory and metadata with examples:

from dataclasses import dataclass, field

@dataclass
class Todo:
    date: str
    completed: bool = field(default=False)
    todo_list: list[str] = field(default_factory=list)

# date has no default, so it must be passed in
todos = Todo("2021-06-06")
todos.todo_list.append("get up early")
print(todos.todo_list)
>> ['get up early']

Notice the field() call for todo_list? The dataclass decorator rejects mutable defaults such as list, dict, and set, because a single default object would be shared by every instance. To give a field a mutable default, pass a default_factory; otherwise a ValueError is raised:

@dataclass
class Todo:
    todo_list: list = []

ValueError: mutable default <class 'list'> for field todo_list is not allowed: use default_factory
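
The reason for this restriction is easier to see side by side. The sketch below contrasts a plain class attribute (SharedTodo is a hypothetical name for this example) with a default_factory field.

```python
from dataclasses import dataclass, field


class SharedTodo:
    # Plain class: this single list object is shared by every instance
    todo_list = []


first, second = SharedTodo(), SharedTodo()
first.todo_list.append("get up early")
print(second.todo_list)  # ['get up early'], leaked from `first`


@dataclass
class Todo:
    # default_factory calls list() once per instance, so nothing is shared
    todo_list: list = field(default_factory=list)


a, b = Todo(), Todo()
a.todo_list.append("get up early")
print(a.todo_list)  # ['get up early']
print(b.todo_list)  # []
```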

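The metadata parameter, on the other hand, lets you attach extra information to a field. It must be a mapping (or None), and data classes do not use it themselves; the key name below is arbitrary:

@dataclass
class Todo:
    date: str = field(metadata={"description": "date the todo was completed"})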

fields()

To get the details of all the fields, we use the fields() function, which returns a tuple of Field objects that define the fields of the data class. It accepts either a dataclass or an instance of a dataclass.

dataclasses.fields(class_or_instance)

For example:

>>> from dataclasses import fields
>>> fields(Todo)
(Field(name='date',type=<class 'str'>,...,metadata=mappingproxy({...})),)
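
In practice you rarely print the whole tuple; a loop over the Field objects is usually more readable. A minimal sketch, assuming a Todo class whose metadata uses a "description" key chosen for this example:

```python
from dataclasses import dataclass, field, fields


@dataclass
class Todo:
    # "description" is a key chosen for this example; any mapping works
    date: str = field(metadata={"description": "date of completion"})
    completed: bool = False


for f in fields(Todo):
    # Each Field exposes name, type and a read-only metadata mapping
    print(f.name, f.type, dict(f.metadata))
# date <class 'str'> {'description': 'date of completion'}
# completed <class 'bool'> {}
```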

Outlier: ClassVar & InitVar

Apart from field-level control, there is another way to control how a field is initialised, using the type annotation:

  • If a field is a ClassVar, it is excluded from consideration as a field and is ignored by the dataclass mechanisms.
  • If a field is an InitVar, it is an init-only field. Such fields are added as parameters to the generated __init__() method and passed to the optional __post_init__() method, but they are not stored on the instance.

from dataclasses import dataclass, InitVar

# DatabaseType and my_database are placeholders for your own database class and instance
@dataclass
class C:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = C(10, database=my_database)

In this case, fields() will return Field objects for i and j, but not for database.
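
Here is a self-contained sketch of the same idea that can be run as is. FakeDatabase is a hypothetical stand-in for a real database client, and the instances_created counter is added only to show a ClassVar next to an InitVar.

```python
from dataclasses import InitVar, dataclass, fields
from typing import ClassVar, Optional


class FakeDatabase:
    """Hypothetical stand-in for a real database client."""

    def lookup(self, key: str) -> int:
        return {"j": 42}[key]


@dataclass
class C:
    i: int
    j: Optional[int] = None
    # InitVar: accepted by __init__ and passed to __post_init__, not a field
    database: InitVar[Optional[FakeDatabase]] = None
    # ClassVar: a plain class attribute, ignored by the dataclass machinery
    instances_created: ClassVar[int] = 0

    def __post_init__(self, database: Optional[FakeDatabase]) -> None:
        if self.j is None and database is not None:
            self.j = database.lookup("j")
        C.instances_created += 1


c = C(10, database=FakeDatabase())
print(c)  # C(i=10, j=42)
print([f.name for f in fields(c)])  # ['i', 'j']
print(C.instances_created)  # 1
```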

Use __post_init__ to control data class initialisation

Sometimes you may want even further control over the initialized data class instance, especially when a field's value depends on one or more other fields. This is where the __post_init__() method comes in: if the class defines it, the generated __init__() calls it, normally as self.__post_init__().

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b
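
The same hook is also a convenient place for simple validation. The sketch below is one possible use rather than anything the dataclasses module prescribes; the Order class and its negative-quantity check are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    unit_price: float
    quantity: int
    # Derived value: excluded from __init__ and filled in by __post_init__
    total: float = field(init=False)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ has assigned the fields
        if self.quantity < 0:
            raise ValueError("quantity must not be negative")
        self.total = self.unit_price * self.quantity


print(Order(2.5, 4))  # Order(unit_price=2.5, quantity=4, total=10.0)
```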

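Note that, with inheritance, the __init__() method generated by dataclass() does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call it in a __post_init__() method:

class Rectangle:
    def __init__(self, height, width):
        self.height = height
        self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)

Here Rectangle is a regular class. If the base class is itself a data class, its fields are inherited and initialized by the derived class's generated __init__(), so this call is generally not needed.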

Other methods:

dataclasses.asdict(instance, *, dict_factory=dict)

Converts the dataclass instance to a dict. Each dataclass is converted to a dict of its fields, as name: value pairs. Dataclasses, dicts, lists, and tuples are recursed into. For example:

from dataclasses import asdict, dataclass

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: list[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

dataclasses.astuple(instance, *, tuple_factory=tuple)

Converts the dataclass instance to a tuple.
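
A minimal sketch of astuple(), mirroring the asdict() example. The Segment class is a hypothetical addition that shows the recursion into nested data classes.

```python
from dataclasses import astuple, dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    start: Point
    end: Point


print(astuple(Point(10, 20)))  # (10, 20)

# Nested data classes are converted recursively
print(astuple(Segment(Point(0, 0), Point(10, 4))))  # ((0, 0), (10, 4))
```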

dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)

This is another way of making a data class. It creates a new dataclass with name cls_name, fields as defined in fields, base classes as given in bases, and initialized with a namespace as given in namespace.

C = make_dataclass('C',
                   [('x', int),
                    'y',
                    ('z', int, field(default=5))],
                   namespace={'add_one': lambda self: self.x + 1})

=======================> Is equivalent to:

@dataclass
class C:
    x: int
    y: 'typing.Any'
    z: int = 5

    def add_one(self):
        return self.x + 1
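
To check that the dynamically built class behaves like a hand-written one, here is a short runnable sketch. The argument values passed to C are made up for this example.

```python
from dataclasses import field, is_dataclass, make_dataclass

C = make_dataclass(
    "C",
    [("x", int), "y", ("z", int, field(default=5))],
    namespace={"add_one": lambda self: self.x + 1},
)

c = C(1, "anything")
print(c)                # C(x=1, y='anything', z=5)
print(c.add_one())      # 2
print(is_dataclass(C))  # True
```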

dataclasses.replace(instance, /, **changes)

Creates a new object of the same type as instance, replacing fields with values from changes.
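
A small sketch of replace(), which is especially handy with frozen data classes because they cannot be modified in place. It reuses the Stock fields from earlier; making the class frozen is a choice made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0


card = Stock("Card", 2, 20)

# replace() builds a new instance through __init__; `card` is untouched
restocked = replace(card, quantity=50)

print(card)       # Stock(product='Card', unit_price=2, quantity=20)
print(restocked)  # Stock(product='Card', unit_price=2, quantity=50)
```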

dataclasses.is_dataclass(class_or_instance)

Return True if its parameter is a dataclass or an instance of one, otherwise return False.
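
Because is_dataclass() accepts both classes and instances, telling the two apart takes one extra check. A minimal sketch:

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Point:
    x: int
    y: int


print(is_dataclass(Point))        # True, the class itself
print(is_dataclass(Point(1, 2)))  # True, an instance
print(is_dataclass((1, 2)))       # False, a plain tuple

# To accept instances only, also rule out classes
p = Point(1, 2)
print(is_dataclass(p) and not isinstance(p, type))          # True
print(is_dataclass(Point) and not isinstance(Point, type))  # False
```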

Frozen instances and Immutability

To emulate immutability, you can pass frozen=True to the dataclass() decorator. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class, and these raise a FrozenInstanceError when invoked.
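
A minimal sketch of a frozen data class in action. The Point class is a hypothetical example; the hashing behaviour shown at the end follows from combining frozen=True with the default eq=True.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
try:
    p.x = 10  # the generated __setattr__ rejects the assignment
except FrozenInstanceError as exc:
    print(type(exc).__name__)  # FrozenInstanceError

# With eq=True (the default) and frozen=True, __hash__ is generated too,
# so frozen instances can be used as dict keys or set members
print({p: "found"}[Point(1, 2)])  # found
```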

Inheritance with reverse MRO

When the dataclass() decorator creates a class, it looks through all of the base classes in reverse MRO (that is, starting at object) and collects the fields of every data class it finds. As a result, a derived class overrides the base class on repeated attributes:

@dataclass
class Base:
    x: int = 0

@dataclass
class A(Base):
    x: int = 15

# The generated __init__() for A then looks like:
# def __init__(self, x: int = 15): ...
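
With more than one field, the ordering rule becomes visible: an overridden field keeps the position it had in the base class. A short sketch, using hypothetical Parent and Child classes with an extra y and z field:

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Parent:
    x: Any = 15.0
    y: int = 0


@dataclass
class Child(Parent):
    z: int = 10
    x: int = 15  # overrides Parent.x but keeps its original position


print([f.name for f in fields(Child)])  # ['x', 'y', 'z']
print(Child())  # Child(x=15, y=0, z=10)
```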

That’s pretty much it!

Happy Reading!
