Python Abstract Base Classes for Containers and Container Data Types


Apart from common data types like dict, set, and list, Python offers a wealth of other data types and lets you create your own types with a particular shape. In this blog, we are going to look at collections.abc, which provides abstract base classes that can be used to test whether a class provides a particular interface, and at the specialised container datatypes in collections, which provide alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple.

In Python, how do you know whether len() can be called on an object? You may need to check whether a __len__ method is defined on it. This is quite a hassle if you need to check objects one by one. The same applies not only to sized objects but also to other kinds, such as sequences, mappings, and iterables.
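As a rough sketch of the check described above, the snippet below compares a manual hasattr test with an isinstance test against collections.abc.Sized. The helper name has_len is hypothetical and only used for illustration.

```python
from collections.abc import Sized


def has_len(obj):
    """Hypothetical helper: True if len(obj) is supported."""
    return isinstance(obj, Sized)


# Manual check: look for the special method on the object's type.
manual = hasattr(type([1, 2, 3]), "__len__")

# ABC check: the same question asked through collections.abc.
results = {
    "list": has_len([1, 2, 3]),
    "dict": has_len({"a": 1}),
    "int": has_len(42),
    "generator": has_len(x for x in range(3)),
}
print(manual, results)
```

The isinstance form reads the same for every interface, so the check for a sized object looks just like the check for an iterable or a mapping.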

Other languages, such as Java and TypeScript, have a concept called an interface, which enforces certain properties on an object so that it has a certain shape.

In Object Oriented Programming, an Interface is a description of all functions that an object must have in order to be an “X”. The purpose of interfaces is to allow the computer to enforce these properties and to know that an object of TYPE T must have functions called X,Y,Z, etc.

Since Python is a dynamically typed language, there is no such enforcement built in. Luckily, the collections.abc module (available since Python 3.3) provides a variety of abstract base classes (ABCs) that describe the shape of an object. If you want to hook into one of these interfaces, you can subclass the corresponding ABC.
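Here is a minimal sketch of subclassing one of these ABCs. The class names TeaBox and BrokenBox are made up for this example; the point is that a subclass that forgets a required method cannot be instantiated.

```python
from collections.abc import Sized


class TeaBox(Sized):
    """Hypothetical container that implements the Sized interface."""

    def __init__(self, teas):
        self._teas = list(teas)

    def __len__(self):
        return len(self._teas)


class BrokenBox(Sized):
    """Subclasses Sized but forgets to define __len__."""


box = TeaBox(["green", "white"])
print(len(box), isinstance(box, Sized))

# Instantiating a subclass with a missing abstract method raises TypeError.
try:
    BrokenBox()
    enforced = False
except TypeError:
    enforced = True
print(enforced)
```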

collections.abc

At a high level:

  • Container: ABC for classes that provide the __contains__() method, which should return True if the item is in self and False otherwise. Note that for objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__().
  • Hashable: ABC for classes that provide the __hash__() method.
  • Sized: ABC for classes that provide the __len__() method.
  • Callable: ABC for classes that provide the __call__() method. Called when the instance is “called” as a function; if this method is defined, x(arg1, arg2, ...) roughly translates to type(x).__call__(x, arg1, ...).
  • Iterable: ABC for classes that provide the __iter__() method. Checking isinstance(obj, Iterable) detects classes that are registered as Iterable or that have an __iter__() method, but it does not detect classes that iterate with the __getitem__() method. The only reliable way to determine whether an object is iterable is to call iter(obj).
  • Collection: ABC for sized iterable container classes.
  • Iterator: ABC for classes that provide the __iter__() and __next__() methods.
  • Reversible: ABC for iterable classes that also provide the __reversed__() method. Called by the reversed() built-in to implement reverse iteration. It should return a new iterator object that iterates over all the objects in the container in reverse order. If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()).
  • Generator: ABC for generator classes that implement the protocol defined in PEP 342, which extends iterators with the send(), throw() and close() methods.
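Two points from the list above can be sketched in code: isinstance(obj, Iterable) misses classes that iterate only through __getitem__(), and subclassing an ABC such as Sequence supplies extra methods for free. OldStyle and Shelf are hypothetical classes written for this illustration.

```python
from collections.abc import Iterable, Sequence


class OldStyle:
    """Hypothetical class that iterates only through __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


old = OldStyle()
detected = isinstance(old, Iterable)  # no __iter__, so not detected
items = list(iter(old))               # iter() still works


class Shelf(Sequence):
    """Hypothetical sequence: only __getitem__ and __len__ are hand-written."""

    def __init__(self, *items):
        self._items = items

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


shelf = Shelf("oolong", "pu-erh", "sencha")
# __contains__, __reversed__, index() and count() are inherited from Sequence.
print(detected, items)
print("sencha" in shelf, list(reversed(shelf)), shelf.index("pu-erh"))
```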

The collections module itself implements specialised container datatypes that provide alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple.

  • namedtuple(): factory function for creating tuple subclasses with named fields
  • deque: list-like container with fast appends and pops on either end
  • ChainMap: dict-like class for creating a single view of multiple mappings
  • Counter: dict subclass for counting hashable objects
  • OrderedDict: dict subclass that remembers the order entries were added
  • defaultdict: dict subclass that calls a factory function to supply missing values
  • UserDict: wrapper around dictionary objects for easier dict subclassing
  • UserList: wrapper around list objects for easier list subclassing
  • UserString: wrapper around string objects for easier string subclassing

Let’s take a look at some of them:

Counter: A Counter is a subclass of dict that is used to count hashable objects.

class collections.Counter([iterable-or-mapping])

The Counter constructor accepts an iterable, a mapping, or keyword arguments:

from collections import Counter

print(Counter(['A', 'A', 'A', 'B', 'B', 'C']))
print(Counter({'A': 3, 'B': 2, 'C': 1}))
print(Counter(A=3, B=2, C=1))
# Each call prints the same result, ordered from most to least common:
# Counter({'A': 3, 'B': 2, 'C': 1})
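Beyond construction, a few Counter operations come up often. The sketch below uses a made-up list of tea orders to show most_common(), the zero default for missing keys, and addition of two counters.

```python
from collections import Counter

# Hypothetical sample data for illustration.
orders = Counter(["green", "white", "green", "black", "green", "white"])

# most_common(n) returns the n highest counts as (element, count) pairs.
top_two = orders.most_common(2)

# Missing elements report a count of 0 instead of raising KeyError.
missing = orders["oolong"]

# Counters can be added together; counts for shared keys are summed.
combined = orders + Counter({"black": 2, "oolong": 1})

print(top_two, missing, combined["black"], combined["oolong"])
```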

OrderedDict: An OrderedDict is also a subclass of dict, and it remembers the order in which keys were inserted.

class collections.OrderedDict([items])

Note that overwriting the value of an existing key doesn’t change the position of that key.

from collections import OrderedDict

letters = OrderedDict([("B", 2), ("A", 3), ("C", 1)])
for key, value in letters.items():
    print(key, value)
# Output:
# B 2
# A 3
# C 1
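To make the ordering rules concrete, here is a small sketch showing that overwriting a value keeps the key in place, while move_to_end() and popitem() let you reorder and consume entries from either end.

```python
from collections import OrderedDict

letters = OrderedDict([("B", 2), ("A", 3), ("C", 1)])

# Overwriting an existing key keeps its original position.
letters["B"] = 20
after_overwrite = list(letters)

# move_to_end() explicitly moves a key to the right end.
letters.move_to_end("B")
after_move = list(letters)

# popitem() removes from the right end by default, or the left with last=False.
last = letters.popitem(last=True)
first = letters.popitem(last=False)

print(after_overwrite, after_move, last, first)
```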

defaultdict: A defaultdict is also a subclass of dict. It supplies a default value for a missing key, so looking up a key that doesn’t exist won’t raise a KeyError.

class collections.defaultdict(default_factory)

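The default_factory is called with no arguments to produce the value for a missing key. Passing object, as in the first example below, stores and returns a new bare object instance, which prints with its memory address. You can also define your own function to generate the value. If no default_factory is provided, a missing key raises KeyError as in a regular dict.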

from collections import defaultdict

example_dict = defaultdict(object)
print(example_dict['tea'])
# <object object at 0x...>

def get_default():
    return 'value not exist'

example_dict = defaultdict(get_default)
print(example_dict['coffee'])
# value not exist
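A common use of defaultdict is grouping items without checking whether the key already exists. The sketch below uses list as the factory on made-up tea data, and also shows that a defaultdict created without a factory still raises KeyError.

```python
from collections import defaultdict

# Hypothetical sample data: (tea type, tea name) pairs.
teas = [("green", "sencha"), ("white", "silver needle"), ("green", "matcha")]

# list() is called to create an empty list the first time a key is seen.
by_type = defaultdict(list)
for tea_type, name in teas:
    by_type[tea_type].append(name)

# Without a factory, a missing key behaves as in a regular dict.
plain = defaultdict()
try:
    plain["coffee"]
    raised = False
except KeyError:
    raised = True

print(dict(by_type), raised)
```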

ChainMap: A ChainMap groups several dictionaries into a single view. The underlying dictionaries are kept in a list that is available through the maps attribute.

class collections.ChainMap(*maps)

Some usage:

from collections import ChainMap

# Get all dicts:
dic1 = {'red': 5, 'black': 1, 'white': 2}
dic2 = {'B': 2, 'A': 3, 'C': 1}
my_chain = ChainMap(dic1, dic2)
my_chain.maps
# [{'red': 5, 'black': 1, 'white': 2}, {'B': 2, 'A': 3, 'C': 1}]

# Get all keys (iteration starts from the last mapping):
list(my_chain.keys())
# ['B', 'A', 'C', 'red', 'black', 'white']

# Get all values:
list(my_chain.values())
# [2, 3, 1, 5, 1, 2]

# Add new dictionary:
dict3 = {'F': 10, 'D': 12}
new_chain = my_chain.new_child(dict3)
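What makes ChainMap useful is its lookup order: mappings are searched from first to last, and writes go to the first mapping only. A short sketch, with made-up brewing settings as the data:

```python
from collections import ChainMap

# Hypothetical settings: overrides take priority over defaults.
defaults = {"temp": 90, "minutes": 3}
overrides = {"temp": 75}
settings = ChainMap(overrides, defaults)

temp = settings["temp"]        # found in overrides first
minutes = settings["minutes"]  # falls through to defaults

# Writes only affect the first mapping; defaults stays untouched.
settings["minutes"] = 2

# new_child() puts a new mapping in front; parents skips the first one.
child = settings.new_child({"temp": 60})

print(temp, minutes, overrides, defaults, child["temp"], child.parents["temp"])
```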

namedtuple: A namedtuple is an immutable data type that is just like a tuple, except that you can access members by field name as well as by integer index.

collections.namedtuple(typename, field_names)

In this way, namedtuples are like dictionaries except that they are immutable. One advantage of namedtuples is that they are self-documenting. Moreover, as namedtuple instances do not have per-instance dictionaries, they need no more memory than regular tuples.

from collections import namedtuple

Tea = namedtuple('Tea', 'name price temp')
white_tea = Tea(name="silver needle", price=30, temp=98)

white_tea
# Tea(name='silver needle', price=30, temp=98)
white_tea.name
# 'silver needle'

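Note that since namedtuples are immutable, updating a field means creating a new instance with the ._replace() method. The original instance is left unchanged, so keep the returned value.

white_tea = Tea(name="silver needle", price=30, temp=98)
white_tea = white_tea._replace(price=35)
# Tea(name='silver needle', price=35, temp=98)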
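The following sketch puts the immutability claim to the test: assigning to a field raises AttributeError, and _replace() hands back a new instance while the original keeps its values.

```python
from collections import namedtuple

Tea = namedtuple("Tea", "name price temp")
white_tea = Tea(name="silver needle", price=30, temp=98)

# Direct assignment to a field is rejected.
try:
    white_tea.price = 35
    mutable = True
except AttributeError:
    mutable = False

# _replace() returns a new Tea; white_tea itself is unchanged.
pricier = white_tea._replace(price=35)

# Instances still behave like tuples: indexing and unpacking both work.
name, price, temp = pricier

print(mutable, white_tea.price, pricier.price, pricier[1], name)
```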

To convert to a dictionary and back:

# convert namedtuple to dictionary
Tea = namedtuple('Tea', 'name price temp')
white_tea = Tea(name="silver needle", price=30, temp=98)
white_tea._asdict()
# {'name': 'silver needle', 'price': 30, 'temp': 98}
# (a regular dict on Python 3.8+; an OrderedDict on earlier versions)

# convert dictionary to namedtuple
dict_tea = {"name": "silver needle", "price": 30, "temp": 98}
Tea = namedtuple('Tea', ['name', 'price', 'temp'])
white_tea = Tea(**dict_tea)

deque: A deque (double-ended queue) is a list-like container optimised for appending and popping at both ends. These operations take O(1) time at either end, whereas a list is O(1) only at the right end and O(n) for inserts and pops at the left end.

class collections.deque([iterable[, maxlen]])

Some useful methods:

  • append(): add a value to the right end of the deque.
  • appendleft(): add a value to the left end of the deque.
  • pop(): remove and return an item from the right end of the deque.
  • popleft(): remove and return an item from the left end of the deque.

from collections import deque

d = deque([1, 2, 3, 4])
d.append(5)      # deque([1, 2, 3, 4, 5])
d.appendleft(6)  # deque([6, 1, 2, 3, 4, 5])
d.pop()          # returns 5; d is deque([6, 1, 2, 3, 4])
d.popleft()      # returns 6; d is deque([1, 2, 3, 4])
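Two other deque features are worth a quick sketch: the maxlen argument, which turns a deque into a bounded buffer that discards items from the opposite end, and rotate(), which shifts items around the ends. The tea names are made-up sample data.

```python
from collections import deque

# With maxlen=3, appending a fourth item drops the oldest one on the left.
recent = deque(maxlen=3)
for cup in ["green", "white", "black", "oolong"]:
    recent.append(cup)

d = deque([1, 2, 3, 4, 5])
d.rotate(2)    # rotate two steps to the right
after_right = list(d)
d.rotate(-1)   # rotate one step to the left
after_left = list(d)

print(list(recent), after_right, after_left)
```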

UserDict: UserDict is a dictionary-like container that acts as a wrapper around the dictionary objects.

class collections.UserDict([initialdata])

This is useful when you want a dictionary with customised behaviour, especially when you need to override some built-in methods. The wrapped dictionary is available through the data attribute.

from collections import UserDict

class MyTea(UserDict):
    def pop(self, s=None):
        raise Exception("Not possible to throw the tea")

white_tea = MyTea({"name": "silver needle", "price": 30, "temp": 98})
white_tea.pop()
# Exception: Not possible to throw the tea
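Another way to customise a UserDict is to override __setitem__. In this sketch, LowerKeyDict is a hypothetical class that normalises string keys; because UserDict routes both the constructor and update() through __setitem__, the rule applies however the data arrives.

```python
from collections import UserDict


class LowerKeyDict(UserDict):
    """Hypothetical dict that stores every string key in lower case."""

    def __setitem__(self, key, value):
        if isinstance(key, str):
            key = key.lower()
        super().__setitem__(key, value)


tea = LowerKeyDict({"Name": "silver needle"})  # constructor
tea["PRICE"] = 30                              # item assignment
tea.update(Temp=98)                            # update()

# The wrapped plain dict lives in the data attribute.
print(sorted(tea), type(tea.data).__name__)
```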

UserList: UserList is a list-like container that acts as a wrapper around list objects, in the same way that UserDict wraps a dictionary.

class collections.UserList([list])

UserString: UserString is a string-like container and, just like UserDict and UserList, it acts as a wrapper around string objects.
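UserList and UserString follow the same pattern as the UserDict example. The sketch below uses two hypothetical subclasses, NoRemoveList and Label, to show a blocked list method and a string type with an extra method.

```python
from collections import UserList, UserString


class NoRemoveList(UserList):
    """Hypothetical list that refuses to remove items."""

    def remove(self, item):
        raise RuntimeError("Removing items is not allowed")


class Label(UserString):
    """Hypothetical string with one extra method."""

    def shout(self):
        return self.data.upper() + "!"


teas = NoRemoveList(["green", "white"])
teas.append("black")  # other list methods still work
try:
    teas.remove("green")
    blocked = False
except RuntimeError:
    blocked = True

label = Label("silver needle")
# String methods such as upper() return an instance of the subclass.
print(teas.data, blocked, label.shout(), type(label.upper()).__name__)
```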

That’s all for now!

Happy Reading!