If you are into OOP (object-oriented programming), you may have heard about the distinction between objects and data structures:
- Objects hide their data behind abstractions and expose functions that operate on that data.
- Data structures expose their data and have no meaningful functions.
Most classes we write are objects in this sense: their main purpose is to operate on data rather than to act as containers for it. But what do we do when we need to construct a complex data object?
Before I started using data classes, I simply used a combination of dictionaries, sets, and lists, and perhaps another collections type such as namedtuple. These types can do the job, e.g. with nested dictionaries, but they are clumsy to work with, less flexible (e.g. a namedtuple has to be immutable, but what if I want a mutable data object?), and usually less expressive than I would like. Hence, the data class.
The feature was introduced in Python 3.7 and was originally described in PEP 557. The PEP was inspired by the attrs project, which can do even more (e.g. slots, validators), but because the dataclasses module is built in, it is more accessible and convenient in that regard.
Data classes are most useful for creating complex data objects with less boilerplate code and a more expressive syntax. For example, compare the data class below with the equivalent regular class that follows it:
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def total_cost(self) -> int:
        return self.unit_price * self.quantity

====================================

class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def __init__(
        self,
        product: str,
        unit_price: int,
        quantity: int = 0
    ) -> None:
        self.product = product
        self.unit_price = unit_price
        self.quantity = quantity

    def total_cost(self) -> int:
        return self.unit_price * self.quantity

    def __hash__(self) -> int:
        return hash((self.product, self.unit_price, self.quantity))

    def __eq__(self, other) -> bool:
        # Compare only against other Stock instances
        if not isinstance(other, Stock):
            return NotImplemented
        return (
            (self.product, self.unit_price, self.quantity) ==
            (other.product, other.unit_price, other.quantity))
That is a big cut in code! Note that, apart from the convenience, a data class is no different from a regular class, e.g. you can add methods as usual, like total_cost() above:
>>> card = Stock('Card', 2, 20)
>>> card.total_cost()
40
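As a rough illustration of what the decorator generates for free, the sketch below redefines the Stock data class from above and exercises the generated __repr__(), __eq__() and __hash__() methods. The sample values are made up for the demonstration.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0

    def total_cost(self) -> int:
        return self.unit_price * self.quantity


card = Stock("Card", 2, 20)
same_card = Stock("Card", 2, 20)

print(card)                    # generated __repr__ lists every field
print(card == same_card)       # generated __eq__ compares field by field
print(len({card, same_card}))  # generated __hash__ lets equal objects collapse in a set
```

None of these three methods was written by hand; they all come from the decorator and its unsafe_hash=True flag.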
Module-level decorators
You may notice that we added (unsafe_hash=True) to the @dataclass decorator. This forces the addition of a .__hash__() method, and there is more fine-grained control you can exert.
@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
These are the flags you can set to control which “dunder” methods are added to the class:
- init: If true (the default), an __init__() method will be generated.
- repr: If true (the default), a __repr__() method will be generated.
- eq: If true (the default), an __eq__() method will be generated.
- order: If true (the default is False), __lt__(), __le__(), __gt__(), and __ge__() methods will be generated.
- unsafe_hash: If False (the default), a __hash__() method is generated according to how eq and frozen are set.
- frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances.
So instead of implementing all these methods yourself, as in the example with __hash__ and __eq__, you can just use the flags to turn each feature on and off.
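To make the flags a little more concrete, here is a minimal sketch of order=True. The Version class is a hypothetical example, not something from the standard library; with order=True, instances compare as if they were tuples of their fields, in definition order.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical class, for illustration only
    major: int
    minor: int


# Instances are ordered like the tuple (major, minor).
versions = sorted([Version(1, 4), Version(0, 9), Version(1, 0)])
print(versions)
print(Version(1, 0) < Version(1, 4))
```

Without order=True, the call to sorted() would raise a TypeError because no __lt__() method would exist.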
Field-level decorators
The dataclass() decorator adds special methods to the class, but to control each individual class variable, the field-level field() function is used.
dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)
What are Field objects? Field objects describe each defined field. These objects are created internally and are returned by the module-level fields() function. Each of them has:
- name: The name of the field.
- type: The type of the field.
- default, default_factory, init, repr, hash, compare, and metadata have the identical meaning and values as they do in the field() declaration.
In the example above, product, unit_price, and quantity are the fields.
@dataclass(unsafe_hash=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0
You can define the level of control for each field:
- default: If provided, this will be the default value for this field. This is needed because the field() call itself replaces the normal position of the default value.
- default_factory: A function that returns the initial value of the field. If provided, it must be a zero-argument callable.
- init: If true (the default), this field is included as a parameter to the generated __init__() method.
- repr: If true (the default), this field is included in the string returned by the generated __repr__() method.
- compare: If true (the default), this field is included in the generated equality and comparison methods (__eq__(), __gt__(), et al.).
- hash: This can be a bool or None. If true, this field is included in the generated __hash__() method. If None (the default), use the value of compare: this would normally be the expected behavior.
- metadata: This can be a mapping or None.
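As a small sketch of how the repr and compare options behave, consider the hypothetical User class below (the class and its field names are assumptions made up for this illustration):

```python
from dataclasses import dataclass, field


@dataclass
class User:  # hypothetical class, for illustration only
    name: str
    # Kept out of the generated __repr__ so it does not leak into logs.
    password: str = field(repr=False)
    # Ignored by the generated __eq__.
    login_count: int = field(default=0, compare=False)


a = User("ann", "secret", login_count=1)
b = User("ann", "secret", login_count=7)

print(a)       # the password is not shown
print(a == b)  # login_count does not take part in the comparison
```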
Most of these are self-explanatory; I just want to briefly illustrate default_factory and metadata with examples:
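from dataclasses import dataclass, field

@dataclass
class Todo:
    date: str
    completed: bool = field(default=False)
    todo_list: list[str] = field(default_factory=list)

todos = Todo(date="2021-01-01")  # date is required; this value is only a placeholder
todos.todo_list.append("get up early")
print(todos.todo_list)
>> ['get up early']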
Notice the field() call for todo_list? A mutable object such as a list, dict, or set cannot be supplied as a plain default value, so if we want a mutable default we have to use default_factory; otherwise a ValueError is raised:
@dataclass
class Todo:
    todo_list: list = []

ValueError: mutable default <class 'list'> for field todo_list is not allowed: use default_factory
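The reason for this restriction is that a single list used as a default would be shared by every instance. The sketch below, a simplified Todo class with only the list field, suggests how default_factory avoids that by calling list() once per instance:

```python
from dataclasses import dataclass, field


@dataclass
class Todo:
    # list() is called for each new instance, so nothing is shared.
    todo_list: list = field(default_factory=list)


first = Todo()
second = Todo()
first.todo_list.append("get up early")

print(first.todo_list)
print(second.todo_list)
print(first.todo_list is second.todo_list)
```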
The metadata parameter, on the other hand, lets you attach extra information to a field. It must be a mapping (or None):
@dataclass
class Todo:
    date: str = field(metadata={"description": "date of the completion todo"})
Fields()
To get the details of all the fields, we use the fields() function, which returns a tuple of Field objects that define the fields of the dataclass. It accepts either a dataclass or an instance of a dataclass.
dataclasses.fields(class_or_instance)
For example:
>>> from dataclasses import fields
>>> fields(Todo)
(Field(name='date',type=<class 'str'>,...,metadata=mappingproxy({'description': 'date of the completion todo'}),...),)
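As a hedged sketch of how this might be used in practice, the snippet below loops over fields() and reads each field's metadata. It assumes the metadata is given as a dict, and the "description" key is just a name chosen for this example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Todo:
    # "description" is an arbitrary key picked for this illustration.
    date: str = field(metadata={"description": "date of the completion todo"})
    completed: bool = False


for f in fields(Todo):
    # metadata is exposed as a read-only mapping; it is empty if none was given.
    print(f.name, f.type, f.metadata.get("description"))
```

The dataclasses module itself does not use metadata at all; it is there purely as an extension point for your own code or third-party tools.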
Outlier: ClassVar & InitVar
Apart from field-level control, there is another way to control how a field is initialised, using the type annotation:
- If a field is a ClassVar, it is excluded from consideration as a field and is ignored by the dataclass mechanisms.
- If a field is an InitVar, it is an init-only field. Such fields are added as parameters to the generated __init__() method and are passed to the optional __post_init__() method, but they are not stored in the class instance.
@dataclass
class C:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = C(10, database=my_database)
In this case, fields() will return Field objects for i and j, but not for database.
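Since the example above relies on a database object that is not defined here, the following is a self-contained sketch of the same idea. The Circle class and its field names are hypothetical, chosen only to show ClassVar and InitVar side by side.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import ClassVar


@dataclass
class Circle:  # hypothetical class, for illustration only
    radius: float
    unit: InitVar[str]            # init-only: passed to __post_init__, not stored
    label: str = field(init=False)
    instances: ClassVar[int] = 0  # class-level attribute, not a dataclass field

    def __post_init__(self, unit: str) -> None:
        self.label = f"{self.radius} {unit}"
        Circle.instances += 1


c = Circle(2.5, unit="mm")
print(c.label)
print([f.name for f in fields(c)])  # neither unit nor instances is listed
print("unit" in vars(c))            # the InitVar was not stored on the instance
```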
Use __post_init__ to control data class initialisation
Sometimes you may want even more control over the initialised data class instance, especially when you want to set field values that depend on one or more other fields. This is where the __post_init__() method comes in: if it is defined on the class, the generated __init__() calls it, normally as self.__post_init__().
@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b
Note that, with inheritance, the __init__() method generated by dataclass() does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call this method in a __post_init__() method:
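class Rectangle:
    # A regular (non-dataclass) base class with its own __init__()
    def __init__(self, height, width):
        self.height = height
        self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)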
Other methods:
dataclasses.asdict(instance, *, dict_factory=dict)
Converts the dataclass instance to a dict. Each dataclass is converted to a dict of its fields, as name: value pairs. Dataclasses, dicts, lists, and tuples are recursed into. For example:
from dataclasses import asdict, dataclass

@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: list[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
dataclasses.astuple(instance, *, tuple_factory=tuple)
Converts the dataclass instance to a tuple.
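Like asdict(), astuple() recurses into nested dataclasses. A minimal sketch, using a hypothetical Segment class built from two Point instances:

```python
from dataclasses import astuple, dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:  # hypothetical class, for illustration only
    start: Point
    end: Point


seg = Segment(Point(0, 0), Point(10, 4))
print(astuple(seg))  # nested dataclasses become nested tuples
```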
dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
This is another way of making a data class. It creates a new dataclass with the name cls_name, fields as defined in fields, base classes as given in bases, and initialized with a namespace as given in namespace.
C = make_dataclass('C',
                   [('x', int),
                    'y',
                    ('z', int, field(default=5))],
                   namespace={'add_one': lambda self: self.x + 1})

Is equivalent to:

@dataclass
class C:
    x: int
    y: 'typing.Any'
    z: int = 5

    def add_one(self):
        return self.x + 1
dataclasses.replace(instance, /, **changes)
Creates a new object of the same type as instance, replacing fields with values from changes.
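This is particularly handy with frozen data classes, where fields cannot be reassigned. A small sketch, reusing the Stock fields from earlier with made-up values:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stock:
    product: str
    unit_price: int
    quantity: int = 0


card = Stock("Card", 2, 20)
restocked = replace(card, quantity=50)  # a new object; card is left untouched

print(card.quantity, restocked.quantity)
```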
dataclasses.is_dataclass(class_or_instance)
Returns True if its parameter is a dataclass or an instance of one, otherwise returns False.
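Because the function accepts both classes and instances, an extra isinstance check is needed if you only want to match instances. A brief sketch:

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Point:
    x: int
    y: int


p = Point(1, 2)

print(is_dataclass(Point))  # the class itself
print(is_dataclass(p))      # an instance
print(is_dataclass(dict))   # not a dataclass

# Narrow the check down to instances only.
is_instance_only = is_dataclass(p) and not isinstance(p, type)
print(is_instance_only)
```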
Frozen instances and Immutability
To emulate immutability, you can pass frozen=True to the dataclass() decorator. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class.
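A minimal sketch of what this looks like in practice: the generated __setattr__() raises dataclasses.FrozenInstanceError when you try to assign to a field. The Point class here is only an illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
try:
    p.x = 10  # blocked by the generated __setattr__
except FrozenInstanceError as exc:
    error = exc
    print("cannot assign:", exc)

# With eq=True (the default) and frozen=True, instances are also hashable.
print({p: "usable as a dict key"})
```

Note that this is emulated immutability only: a mutable object stored in a field, such as a list, can still be changed in place.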
Inheritance with reverse MRO
When the dataclass() decorator is used with inheritance, it looks through all of the class's base classes in reverse MRO (starting at object) and collects their fields. As a result, derived classes override base classes on repeated attributes.
@dataclass
class Base:
    x: int = 0

@dataclass
class A(Base):
    x: int = 15

# The generated __init__() for A has the signature:
# def __init__(self, x: int = 15):
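To see how the field order works out when a derived class both overrides a field and adds a new one, here is a short sketch. The extra y and z fields are assumptions added for illustration; an overridden field keeps its original position, while new fields are appended at the end.

```python
from dataclasses import dataclass, fields


@dataclass
class Base:
    x: int = 0
    y: int = 0


@dataclass
class A(Base):
    z: int = 10
    x: int = 15  # overrides Base.x but keeps its original position


print([f.name for f in fields(A)])
print(A())
```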
That’s pretty much it!
Happy Reading!