Python Abstract Base Classes for Containers and Container Data Types
Apart from common data types such as dictionaries, sets, and lists, Python offers a wealth of other data types, and lets you create your own types with a particular shape. In this blog, we are going to look at collections.abc, which provides abstract base classes that can be used to test whether a class provides a particular interface, and at the specialised container datatypes in collections, which provide alternatives to Python’s general-purpose built-in containers dict, list, set, and tuple.
collections.abc
In Python, how do you know whether len() can be called on an object? You would need to check whether it defines a __len__ method. Doing that by hand for every object is a hassle, and the same problem applies to other kinds of objects, such as sequences, mappings, and iterables.
Other languages, such as Java and TypeScript, have a concept called an interface, which requires an object to have certain properties and so gives it a certain shape.
In object-oriented programming, an interface is a description of all the functions that an object must have in order to be an “X”. The purpose of interfaces is to let the computer enforce these properties and to know that an object of type T must have functions called X, Y, Z, etc.
Python is a dynamically typed language, so there is no such enforcement built in. Luckily, since Python 3.3 the collections.abc module has provided a variety of abstract base classes (ABCs) that describe the shape of an object. If you want to hook into one of these protocols, you can subclass the matching ABC.
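As a minimal sketch of the len() question raised earlier, the check can be written with isinstance() against collections.abc.Sized instead of looking for __len__ by hand. The Teapot class and its cups attribute are hypothetical names made up for this illustration.

```python
from collections.abc import Iterable, Sized


class Teapot:
    """Hypothetical class that only implements __len__."""

    def __init__(self, cups):
        self.cups = cups

    def __len__(self):
        return self.cups


pot = Teapot(4)

# Sized recognises any class that defines __len__; no inheritance is needed.
pot_is_sized = isinstance(pot, Sized)
# Teapot defines no __iter__, so it is not recognised as Iterable.
pot_is_iterable = isinstance(pot, Iterable)
# Built-in containers pass the same check, plain numbers do not.
list_is_sized = isinstance([1, 2, 3], Sized)
int_is_sized = isinstance(42, Sized)

print(pot_is_sized, pot_is_iterable, list_is_sized, int_is_sized)
```

The same isinstance() pattern works for the other ABCs in the module, so one check replaces a series of hasattr() calls.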
At a high level:
- Container: ABC for classes that provide the __contains__() method. It should return True if the item is in self, False otherwise. Note that for objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__().
- Hashable: ABC for classes that provide the __hash__() method.
- Sized: ABC for classes that provide the __len__() method.
- Callable: ABC for classes that provide the __call__() method. It is called when the instance is “called” as a function; if this method is defined, x(arg1, arg2, ...) roughly translates to type(x).__call__(x, arg1, ...).
- Iterable: ABC for classes that provide the __iter__() method. Checking isinstance(obj, Iterable) detects classes that are registered as Iterable or that have an __iter__() method, but it does not detect classes that iterate with the __getitem__() method. The only reliable way to determine whether an object is iterable is to call iter(obj).
- Collection: ABC for sized iterable container classes.
- Iterator: ABC for classes that provide the __iter__() and __next__() methods. See also the definition of iterator in the Python glossary.
- Reversible: ABC for iterable classes that also provide the __reversed__() method. It is called by the reversed() built-in to implement reverse iteration, and should return a new iterator object that iterates over all the objects in the container in reverse order. If the __reversed__() method is not provided, the reversed() built-in falls back to the sequence protocol (__len__() and __getitem__()).
- Generator: ABC for generator classes that implement the protocol defined in PEP 342, which extends iterators with the send(), throw() and close() methods. See also the definition of generator in the Python glossary.
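Two points from the list can be sketched in a few lines: the Iterable check misses classes that iterate only through __getitem__(), and subclassing an ABC such as Sequence supplies mixin methods for free. OldStyle and TeaShelf are hypothetical classes written for this example only.

```python
from collections.abc import Iterable, Sequence


class OldStyle:
    """Iterates through the legacy __getitem__ protocol only."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


old = OldStyle()
looks_iterable = isinstance(old, Iterable)  # the ABC check does not see it
old_items = list(iter(old))                 # yet iter() works: 0, 10, 20


class TeaShelf(Sequence):
    """Hypothetical read-only sequence: only two methods are written by hand."""

    def __init__(self, *teas):
        self._teas = list(teas)

    def __getitem__(self, index):
        return self._teas[index]

    def __len__(self):
        return len(self._teas)


shelf = TeaShelf("white", "green", "oolong")
has_green = "green" in shelf        # __contains__ is a Sequence mixin
backwards = list(reversed(shelf))   # __reversed__ is a Sequence mixin
position = shelf.index("oolong")    # index() is a Sequence mixin

print(looks_iterable, old_items, has_green, backwards, position)
```

Sequence only requires __getitem__() and __len__(); forgetting either one makes the class impossible to instantiate, which is as close as Python gets to interface enforcement here.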
Container Types
The collections module implements specialised container datatypes that provide alternatives to Python’s general-purpose built-in containers dict, list, set, and tuple.
- namedtuple(): factory function for creating tuple subclasses with named fields
- deque: list-like container with fast appends and pops on either end
- ChainMap: dict-like class for creating a single view of multiple mappings
- Counter: dict subclass for counting hashable objects
- OrderedDict: dict subclass that remembers the order entries were added
- defaultdict: dict subclass that calls a factory function to supply missing values
- UserDict: wrapper around dictionary objects for easier dict subclassing
- UserList: wrapper around list objects for easier list subclassing
- UserString: wrapper around string objects for easier string subclassing
Let’s take a look at some of them:
Counter: A Counter is a subclass of dict that is used to count hashable objects.
class collections.Counter([iterable-or-mapping])
A Counter can be created from an iterable, a mapping, or keyword arguments:
from collections import Counter

print(Counter(['A','A','A','B','B','C']))
print(Counter({'A':3, 'B':2, 'C':1}))
print(Counter(A=3, B=2, C=1))
=============
Counter({'A': 3, 'B': 2, 'C': 1})
Counter({'A': 3, 'B': 2, 'C': 1})
Counter({'A': 3, 'B': 2, 'C': 1})
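Beyond construction, a Counter is mostly useful for the operations it adds on top of a dictionary. The following is a small sketch with made-up tea orders; the variable names are illustrative only.

```python
from collections import Counter

orders = Counter(["white", "green", "white", "oolong", "white", "green"])

# most_common(n) returns the n highest counts as (element, count) pairs.
top_two = orders.most_common(2)

# update() adds to the existing counts instead of replacing them.
orders.update(["oolong", "oolong"])
oolong_count = orders["oolong"]

# A missing element counts as zero instead of raising KeyError.
missing = orders["black"]

# Counters can be added together; counts for shared keys are summed.
combined = Counter(A=3, B=2) + Counter(A=1, C=4)

print(top_two, oolong_count, missing, combined)
```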
OrderedDict: An OrderedDict is also a subclass of dict, and it remembers the order in which keys were inserted.
class collections.OrderedDict([items])
Note that overwriting the value of an existing key doesn’t change the position of that key.
from collections import OrderedDict

letters = OrderedDict([("B", 2), ("A", 3), ("C", 1)])
for key, value in letters.items():
    print(key, value)
==============
B 2
A 3
C 1
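The note about overwriting can be checked directly, along with a few order-aware operations that OrderedDict offers. This is a short sketch that reuses the letters example from the text.

```python
from collections import OrderedDict

letters = OrderedDict([("B", 2), ("A", 3), ("C", 1)])

# Overwriting a value keeps the key where it was first inserted.
letters["B"] = 20
after_overwrite = list(letters)

# move_to_end() repositions a key explicitly, to the end or to the front.
letters.move_to_end("B")
after_move = list(letters)
letters.move_to_end("C", last=False)
after_front = list(letters)

# popitem() removes and returns the last pair by default.
last_item = letters.popitem()

# Unlike plain dicts, two OrderedDicts compare equal only if the order matches.
same_order = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)

print(after_overwrite, after_move, after_front, last_item, same_order)
```

Since Python 3.7 a regular dict also preserves insertion order, so OrderedDict is mainly worth reaching for when you need these reordering methods or order-sensitive equality.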
defaultdict: A defaultdict is also a subclass of dict. It supplies a default value for a missing key, so looking up such a key doesn’t raise a KeyError.
class collections.defaultdict(default_factory)
The default_factory is called with no arguments to produce the value for a missing key; you can pass a built-in type or your own function. With defaultdict(object), for example, each missing key gets a fresh object() instance, whose repr shows its memory address. If default_factory is not provided, it defaults to None and a missing key raises a KeyError as usual.
from collections import defaultdict

example_dict = defaultdict(object)
print(example_dict['tea'])
# <object object at 0x...> (the address varies)

def get_default():
    return 'value not exist'

example_dict = defaultdict(get_default)
print(example_dict['coffee'])
# value not exist
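A common use of default_factory is grouping items without first checking whether the key exists. The sketch below uses made-up tea data, and also shows what happens when no factory is given.

```python
from collections import defaultdict

teas = [
    ("white", "silver needle"),
    ("green", "sencha"),
    ("white", "white peony"),
]

# list is called with no arguments the first time each key is seen.
by_type = defaultdict(list)
for kind, name in teas:
    by_type[kind].append(name)

# Membership tests and get() do not trigger the factory or add keys.
has_oolong = "oolong" in by_type
got = by_type.get("oolong")
size = len(by_type)

# Without a factory, a defaultdict behaves like a plain dict on missing keys.
no_factory = defaultdict()
try:
    no_factory["tea"]
    raised = False
except KeyError:
    raised = True

print(dict(by_type), has_oolong, got, size, raised)
```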
ChainMap: A ChainMap groups several dictionaries into a single view; lookups search each dictionary in turn, and the underlying dictionaries are available as a list through the maps attribute.
class collections.ChainMap(*maps)
Some usage:
from collections import ChainMap

# Get all dicts:
dic1 = {'red': 5, 'black': 1, 'white': 2}
dic2 = {'B': 2, 'A': 3, 'C': 1}
my_chain = ChainMap(dic1, dic2)
my_chain.maps
# [{'red': 5, 'black': 1, 'white': 2}, {'B': 2, 'A': 3, 'C': 1}]

# Get all keys:
list(my_chain.keys())
# ['B', 'A', 'C', 'red', 'black', 'white']

# Get all values:
list(my_chain.values())
# [2, 3, 1, 5, 1, 2]

# Add a new dictionary:
dict3 = {'F': 10, 'D': 12}
new_chain = my_chain.new_child(dict3)
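The usual reason to reach for a ChainMap is layered lookup, where earlier mappings shadow later ones. Here is a minimal sketch with hypothetical defaults and overrides dictionaries.

```python
from collections import ChainMap

defaults = {"temp": 95, "price": 10}
overrides = {"price": 30}

# Lookups search the maps from left to right and return the first hit.
settings = ChainMap(overrides, defaults)
price = settings["price"]   # found in overrides
temp = settings["temp"]     # falls through to defaults

# Writes always go to the first mapping; the others are left untouched.
settings["name"] = "silver needle"

# new_child() puts a new mapping in front of the existing ones.
child = settings.new_child({"temp": 80})
child_temp = child["temp"]
# parents is a view of everything except the first mapping.
parent_temp = child.parents["temp"]

print(price, temp, overrides, defaults, child_temp, parent_temp)
```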
namedtuple: A namedtuple is an immutable data type that works just like a tuple, except that you can access its members by field name as well as by integer index.
collections.namedtuple(typename, field_names)
In this way, namedtuples are like dictionaries, except that they are immutable. One advantage of namedtuples is that they are self-documenting. Moreover, as namedtuple instances do not have per-instance dictionaries, they are lightweight and require no more memory than regular tuples.
from collections import namedtuple

Tea = namedtuple('Tea', 'name price temp')
white_tea = Tea(name="silver needle", price=30, temp=98)
==========================
white_tea
# Tea(name='silver needle', price=30, temp=98)
white_tea.name
# 'silver needle'
Note that since namedtuples are immutable, updating a field requires the ._replace() method, which returns a new instance rather than changing the existing one.
white_tea = Tea(name="silver needle", price=30, temp=98)
white_tea = white_tea._replace(price=35)
# Tea(name='silver needle', price=35, temp=98)
To convert to a dictionary and back:
# convert namedtuple to dictionary
Tea = namedtuple('Tea', 'name price temp')
white_tea = Tea(name="silver needle", price=30, temp=98)
white_tea._asdict()
# {'name': 'silver needle', 'price': 30, 'temp': 98}
# (a regular dict since Python 3.8; an OrderedDict in earlier versions)

# convert dictionary to namedtuple
dict_tea = {"name": "silver needle", "price": 30, "temp": 98}
Tea = namedtuple('Tea', ['name', 'price', 'temp'])
white_tea = Tea(**dict_tea)
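A few more namedtuple behaviours are worth seeing side by side: tuple-style access, default values, and immutability. This is a small sketch; the Tea fields and the default temperature of 90 are illustrative choices, and the defaults parameter needs Python 3.7 or later.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, here only to temp.
Tea = namedtuple("Tea", ["name", "price", "temp"], defaults=[90])

green = Tea("sencha", 12)

# A namedtuple is still a tuple: indexing and unpacking both work.
by_index = green[1]
name, price, temp = green

# _fields lists the field names in order.
fields = Tea._fields

# Assigning to a field fails because instances are immutable.
try:
    green.price = 15
    mutated = True
except AttributeError:
    mutated = False

# _make() builds an instance from any iterable of the right length.
from_list = Tea._make(["oolong", 20, 85])

print(green, by_index, fields, mutated, from_list)
```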
deque: A deque (double-ended queue) is a list-like container for appending and popping from both ends. It provides O(1) time complexity for append and pop operations at either end, whereas inserting or removing an item at the left end of a list costs O(n).
class collections.deque([iterable[, maxlen]])
Some useful methods:
- append(): add the given value to the right end of the deque.
- appendleft(): add the given value to the left end of the deque.
- pop(): remove and return an item from the right end of the deque.
- popleft(): remove and return an item from the left end of the deque.
from collections import deque

d = deque([1,2,3,4])
d.append(5)      # deque([1, 2, 3, 4, 5])
d.appendleft(6)  # deque([6, 1, 2, 3, 4, 5])
d.pop()          # returns 5; deque([6, 1, 2, 3, 4])
d.popleft()      # returns 6; deque([1, 2, 3, 4])
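Two other deque features come up often in practice: a bounded length and rotation. The following sketch uses made-up readings to show both, along with extendleft().

```python
from collections import deque

# With maxlen set, appending to a full deque drops items from the other end.
recent = deque(maxlen=3)
for reading in [1, 2, 3, 4, 5]:
    recent.append(reading)
window = list(recent)

# rotate(n) moves n items from the right end to the left; negative n reverses it.
ring = deque([1, 2, 3, 4, 5])
ring.rotate(2)
rotated = list(ring)
ring.rotate(-2)
restored = list(ring)

# extendleft() appends items one at a time, so they end up in reverse order.
ring.extendleft([0, -1])
extended = list(ring)

print(window, rotated, restored, extended)
```

A bounded deque is a convenient way to keep only the last few items of a stream without trimming by hand.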
UserDict: UserDict is a dictionary-like container that acts as a wrapper around dictionary objects.
class collections.UserDict([initialdata])
It is used to customise the behaviour of a user-defined dictionary, especially when you want to override some built-in methods.
from collections import UserDict

class MyTea(UserDict):
    def pop(self, key=None):
        raise RuntimeError("Not possible to throw the tea")

white_tea = MyTea({"name":"silver needle", "price":30, "temp":98})
white_tea.pop()
===========
RuntimeError: Not possible to throw the tea
UserList: UserList is a list-like container that acts as a wrapper around list objects, just as UserDict does for dictionaries.
class collections.UserList([list])
UserString: UserString is a string-like container that, just like UserDict and UserList, acts as a wrapper around string objects.
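As a rough sketch of how these two wrappers are used, the example below subclasses each one. PositiveList and ShoutingString are hypothetical classes invented for this illustration; in both cases the wrapped object is available through the data attribute.

```python
from collections import UserList, UserString


class PositiveList(UserList):
    """Hypothetical list that rejects non-positive numbers on append."""

    def append(self, item):
        if item <= 0:
            raise ValueError("only positive numbers allowed")
        super().append(item)


prices = PositiveList([30, 12])
prices.append(20)
underlying = prices.data  # the wrapped plain list

try:
    prices.append(-5)
    rejected = False
except ValueError:
    rejected = True


class ShoutingString(UserString):
    """Hypothetical string with one extra method."""

    def shout(self):
        return self.data.upper() + "!"


tea = ShoutingString("silver needle")
shouted = tea.shout()
# Regular string methods still work and return the subclass type.
titled = tea.title()

print(underlying, rejected, shouted, titled, type(titled).__name__)
```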
That’s all for now!
Happy Reading!