Data classes in Python
Let’s talk classes in Python. They’re a great way to organize functions & behaviors that center around a particular instance of data & functionality. A simple class has an __init__ method to initialize some properties on the instance. Python has other special methods too, such as __repr__, which returns a string representation of an instance. Below I’ll break down which special methods you’ll need to add to your class to get the most out of it, & how to simplify writing classes that are centered around data by using Data Classes.
tl;dr
Python 3.7 introduced Data Classes, as described in PEP 557. A Data Class is a way to write less boilerplate by using the @dataclass decorator, which adds special methods to a class based on the class variables that have type annotations. If you’d like to read more about them yourself, you can read PEP 557 & the dataclasses documentation.
The inspiration for this post was reading through the Pydantic documentation on dataclasses, which led me back to these very useful class decorators for data-driven classes in Python.
Writing a data-driven class the long way
Okay, so bear with me here. The next few sections show you how to write a good-practice data-driven class without using the Data Classes library available in Python >= 3.7.
Curious to see @dataclass in action? If you’d like to skip the examples of special methods written without Data Classes, ⬇️ jump ahead to the section “Writing our SimpleFlag class with dataclasses”.
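A minimal sketch of that starting point is below, assuming the class is named SimpleFlag with two properties, name & is_active, plus __init__ & __repr__. The flag name "beta-dashboard" is only a hypothetical value.

```python
class SimpleFlag:
    """A feature flag with a name and an on/off state."""

    def __init__(self, name: str, is_active: bool) -> None:
        # Store the values passed in on the instance.
        self.name = name
        self.is_active = is_active

    def __repr__(self) -> str:
        # Mirror the constructor call so the output is unambiguous.
        return f"SimpleFlag(name={self.name!r}, is_active={self.is_active!r})"


flag = SimpleFlag("beta-dashboard", True)
print(flag)  # SimpleFlag(name='beta-dashboard', is_active=True)
```

Since the class defines no __str__, print() falls back to __repr__.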
This works OK, but we aren’t done! A whole set of special methods is still missing. Initializing & printing values are only the bare minimum.
Setting up equality checking
First, let’s start with checking for equality. This is useful if we’re passing around flags & want to compare them to each other. Here’s the code for adding the __eq__ & __ne__ methods, short for equal & not equal respectively.
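Here is a sketch of equality checking, assuming two flags are equal when both name & is_active match. Python 3 already derives != from __eq__, so __ne__ is optional; it's written out here only to match the text.

```python
class SimpleFlag:
    def __init__(self, name: str, is_active: bool) -> None:
        self.name = name
        self.is_active = is_active

    def __repr__(self) -> str:
        return f"SimpleFlag(name={self.name!r}, is_active={self.is_active!r})"

    def __eq__(self, other: object) -> bool:
        # Returning NotImplemented lets Python try the other operand instead.
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.name, self.is_active) == (other.name, other.is_active)

    def __ne__(self, other: object) -> bool:
        # Invert __eq__, passing NotImplemented through unchanged.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


a = SimpleFlag("beta", True)
b = SimpleFlag("beta", True)
print(a == b, a != b)  # True False
```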
Now that we can compare instances of SimpleFlag, we can think about hashability & making this class hashable. To do that, we’ll need to create a deterministic __hash__ method that returns a value based on the properties that identify an instance.
💡 What does "hashable" mean for a class in Python?
Understanding why hashability is important
To understand what a hashable object is in Python, let’s define what hashability means & why it’s needed. You need to make your class hashable when you will use instances as keys in a dictionary or as items in a set. Python includes a built-in hash() function that you can call on the properties of the class, typically bundled together in a tuple.
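The __hash__ special method needs to return the same integer every time it’s called on an instance, so it works best when the values being hashed are immutable. Python also requires that instances which compare equal have equal hashes, so you’ll need to implement __eq__ as well. Note that defining __eq__ on its own sets __hash__ to None, which makes instances unhashable until you define __hash__ yourself.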
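The sketch below shows the __eq__/__hash__ relationship using a hypothetical OnlyEq class: defining __eq__ without __hash__ leaves the class unhashable, while equal tuples of immutable values always hash to the same integer within a single run of Python.

```python
class OnlyEq:
    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.name == other.name


# Defining __eq__ without __hash__ makes Python set __hash__ to None.
print(OnlyEq.__hash__ is None)  # True

try:
    {OnlyEq("beta")}  # building a set needs hash()
except TypeError:
    print("OnlyEq instances can't be put in a set")

# Equal tuples of immutable values hash to the same integer.
print(hash(("beta", True)) == hash(("beta", True)))  # True
```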
Practical uses for hashable classes
Having a hashable class is important because its instances can be used as keys in dictionaries, which allows for efficient lookup & retrieval. When you put class instances in a set, you can filter & de-duplicate data. Lastly, hashable classes are needed for custom data structures such as bloom filters & hash tables.
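A short sketch of the set & dictionary uses, with a hypothetical HashableFlag class that hashes the same fields its __eq__ compares. Its attributes are still freely assignable, which is the gap the next section closes.

```python
class HashableFlag:
    """Hypothetical minimal flag used only to show set/dict usage."""

    def __init__(self, name: str, is_active: bool) -> None:
        self.name = name
        self.is_active = is_active

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, HashableFlag):
            return NotImplemented
        return (self.name, self.is_active) == (other.name, other.is_active)

    def __hash__(self) -> int:
        # Hash exactly the fields that __eq__ compares.
        return hash((self.name, self.is_active))


flags = [
    HashableFlag("beta", True),
    HashableFlag("beta", True),
    HashableFlag("dark-mode", False),
]

# A set drops the duplicate "beta" flag.
unique = set(flags)
print(len(unique))  # 2

# An equal but separate instance finds the same dictionary entry.
owners = {HashableFlag("dark-mode", False): "design team"}
print(owners[HashableFlag("dark-mode", False)])  # design team
```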
Now, if we make the instance hashable, we have to make the name & is_active properties that our class initializes immutable.
Making properties immutable manually
Okay, so the next step is to rewrite the __init__ method so it stores the values in private attributes & add new methods with the @property decorator so the properties can still be read.
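A sketch of this step, assuming the values are stored in double-underscore attributes & exposed through read-only properties, with __hash__ added alongside the equality methods. The properties have no setter, so they're read-only by convention; the mangled attribute can still be reached from outside.

```python
class SimpleFlag:
    def __init__(self, name: str, is_active: bool) -> None:
        # The __ prefix triggers name mangling (stored as _SimpleFlag__name).
        self.__name = name
        self.__is_active = is_active

    @property
    def name(self) -> str:
        # No setter is defined, so `flag.name = ...` raises AttributeError.
        return self.__name

    @property
    def is_active(self) -> bool:
        return self.__is_active

    def __repr__(self) -> str:
        return f"SimpleFlag(name={self.__name!r}, is_active={self.__is_active!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) == (other.__name, other.__is_active)

    def __ne__(self, other: object) -> bool:
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        # Hash the same values __eq__ compares, so equal flags hash equally.
        return hash((self.__name, self.__is_active))


flag = SimpleFlag("beta", True)
try:
    flag.name = "gamma"
except AttributeError:
    print("name is read-only")
```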
As you can see in the highlighted sections above, that’s a lot to add to this simple class. So far, though, we have immutability (through read-only properties), hashing in case you want to store instances in a dictionary or set, equality & inequality checks, a string representation of the class, & initialization.
Another thing to keep in mind is that because we’ve used a double-underscore (__) prefix on our properties, Python applies name mangling: inside the class, self.__name is actually stored as self._SimpleFlag__name. That means we have to use the same prefix whenever we access these properties inside the class. For the next change, I’ll use the __ prefix when accessing the properties, but keep in mind that you have to update how you access properties internally in the class when you make properties immutable this way.
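A tiny, hypothetical Example class shows what name mangling does to a double-underscore attribute: inside the class body the name is rewritten automatically, while from outside only the mangled name exists.

```python
class Example:
    def __init__(self) -> None:
        self.__secret = 42  # stored on the instance as _Example__secret

    def reveal(self) -> int:
        # Inside the class body, self.__secret becomes self._Example__secret.
        return self.__secret


e = Example()
print(e.reveal())  # 42
print(hasattr(e, "__secret"))  # False
print(e._Example__secret)  # 42
```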
Setting up comparison methods
We’re still missing one major feature: sorting. So we have to implement the rich comparison methods as well.
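Here is a sketch of the complete class with ordering added, assuming flags sort by their (name, is_active) tuple, the same values __eq__ compares. Counting the two properties, it has 11 methods.

```python
class SimpleFlag:
    def __init__(self, name: str, is_active: bool) -> None:
        self.__name = name
        self.__is_active = is_active

    @property
    def name(self) -> str:
        return self.__name

    @property
    def is_active(self) -> bool:
        return self.__is_active

    def __repr__(self) -> str:
        return f"SimpleFlag(name={self.__name!r}, is_active={self.__is_active!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) == (other.__name, other.__is_active)

    def __ne__(self, other: object) -> bool:
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self) -> int:
        return hash((self.__name, self.__is_active))

    # Ordering: compare (name, is_active) tuples, mirroring __eq__.
    def __lt__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) < (other.__name, other.__is_active)

    def __le__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) <= (other.__name, other.__is_active)

    def __gt__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) > (other.__name, other.__is_active)

    def __ge__(self, other: object) -> bool:
        if not isinstance(other, SimpleFlag):
            return NotImplemented
        return (self.__name, self.__is_active) >= (other.__name, other.__is_active)


flags = [SimpleFlag("gamma", False), SimpleFlag("alpha", True), SimpleFlag("alpha", False)]
print(sorted(flags))
# [SimpleFlag(name='alpha', is_active=False), SimpleFlag(name='alpha', is_active=True), SimpleFlag(name='gamma', is_active=False)]
```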
With those last changes, we now have a fully implemented class with the special methods needed for initializing, printing, sorting, comparing, & hashability. We’re done with this class as long as none of the properties need to be changed.
The complete simple-flag.py file brings all of these special methods together in a single class. In total, 11 methods were added: __init__, __repr__, __eq__, __ne__, __hash__, the name & is_active properties, & the four ordering methods __lt__, __le__, __gt__, & __ge__.
Adding more properties to the class
Now let’s add a new feature to simple-flag.py, because tracking name & is_active is not enough. We also want to track a new property named created_at. To do this, you’ll need to modify nine methods (__init__, __repr__, __eq__, __ne__, __hash__, & the four ordering methods) & add a new method with the @property decorator to return the immutable created_at value. That means modifying all the code that sets & gets properties on the class, as well as every comparison & hash method.
Taking the original simple-flag.py file & making all the changes I mentioned produces a fairly large diff, as measured with git diff --no-index --stat, just for adding a single property.
That’s a lot of modifications just because a single property was added to the class. As more properties get added, you’ll need to modify more & more sections of your class in similar ways.
Every property you add or remove means another round of these changes, & it’s easy to make a mistake while you’re finding & replacing text. Thankfully, there is a better way to write classes in Python where the equivalent of these 11 methods & more are added automatically. These classes are called Data Classes, & they are a great way to succinctly define classes that store data.
Writing our SimpleFlag class with dataclasses
To write the same SimpleFlag class from simple-flag.py using the Data Classes decorator, the first thing you will notice is that there are far fewer method definitions involved.
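A minimal sketch of the Data Class version, assuming the same two properties as before:

```python
from dataclasses import dataclass


@dataclass
class SimpleFlag:
    # Each annotated class variable becomes a parameter of the generated __init__.
    name: str
    is_active: bool


flag = SimpleFlag("beta", True)
print(flag)  # SimpleFlag(name='beta', is_active=True)
print(flag == SimpleFlag("beta", True))  # True
```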
Your eyes aren’t playing tricks on you. That’s it. As PEP 557 says, this is a convenient way to create classes. The @dataclass decorator reads the type annotations from the class variable definitions & adds the following special methods to our SimpleFlag class.
- __init__ - How the class gets initialized is taken care of for us. You can run your own code with __post_init__ if you want something to run after the auto-created __init__ method.
- __repr__ - How the class is represented is taken care of for us.
- __eq__ - How instances of the class are compared is taken care of for us.
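A small sketch of the __post_init__ hook mentioned in the list above. The lowercase-name rule is a hypothetical example of "your own code"; the point is that it runs after the generated __init__ has assigned every field.

```python
from dataclasses import dataclass


@dataclass
class SimpleFlag:
    name: str
    is_active: bool

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ sets name and is_active.
        # Hypothetical rule: normalize flag names to lowercase.
        self.name = self.name.lower()


print(SimpleFlag("Beta", True))  # SimpleFlag(name='beta', is_active=True)
```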
Assigning default values
Assigning default values to properties is also straightforward. Taking the example above, let’s give the is_active property a default value of False for when it isn’t passed in.
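A sketch of the default value. Note that fields with defaults must come after fields without defaults, otherwise the decorator raises a TypeError when the class is defined.

```python
from dataclasses import dataclass


@dataclass
class SimpleFlag:
    name: str
    # Used whenever is_active isn't passed to the constructor.
    is_active: bool = False


print(SimpleFlag("beta"))  # SimpleFlag(name='beta', is_active=False)
print(SimpleFlag("beta", True))  # SimpleFlag(name='beta', is_active=True)
```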
Adding immutability to Data Classes
Next, let’s add immutability to SimpleFlag by passing the frozen=True keyword argument into the decorator call.
Now the Data Classes library will add the following special methods as well.
- __eq__ - How instances of the class are compared is taken care of for us.
- __hash__ - How the class generates an integer hashed from the values of the instance, to support using instances as keys in a dictionary or items in a set.
- __setattr__ - Raises a FrozenInstanceError if it’s called to set a value on an immutable property.
- __delattr__ - Raises a FrozenInstanceError if del is called to delete an immutable property.
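A sketch of a frozen SimpleFlag showing the generated __setattr__, __delattr__ & __hash__ at work. The dictionary value "release team" is only illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SimpleFlag:
    name: str
    is_active: bool = False


flag = SimpleFlag("beta")

try:
    flag.is_active = True  # rejected by the generated __setattr__
except FrozenInstanceError:
    print("can't assign to a field of a frozen instance")

try:
    del flag.name  # rejected by the generated __delattr__
except FrozenInstanceError:
    print("can't delete a field of a frozen instance")

# eq=True (the default) plus frozen=True also generates __hash__.
enabled_by = {flag: "release team"}
print(enabled_by[SimpleFlag("beta")])  # release team
```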
Modifying properties by creating new copies
With the frozen=True keyword argument in the decorator call, you can’t modify properties of an instance directly anymore. However, you can create a copy with a different value for an immutable property. With the dataclasses.replace() function, you pass in the instance you’d like to copy as the first argument, followed by keyword arguments for the properties you’d like to change.
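A sketch of dataclasses.replace() on the frozen SimpleFlag: the original instance is left untouched & a new one is returned with only the given fields changed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimpleFlag:
    name: str
    is_active: bool = False


beta = SimpleFlag("beta")
# replace() builds a new instance; fields not passed keep their values.
beta_on = replace(beta, is_active=True)

print(beta)  # SimpleFlag(name='beta', is_active=False)
print(beta_on)  # SimpleFlag(name='beta', is_active=True)
```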
It is still possible to set properties from inside a class whose properties are immutable. There’s an example further along that shows how to set a property when frozen=True is passed into the decorator, using the object.__setattr__ method.
Adding ordering to Data Classes
Adding comparisons to a Data Class is easy with the order=True argument passed into the decorator.
All of the following special methods get generated. The comparison is done between tuples of the class’s fields, in the order they are defined.
- __ge__ - How the class runs a rich comparison for greater than or equal comparisons (>=).
- __gt__ - How the class runs a rich comparison for greater than comparisons (>).
- __le__ - How the class runs a rich comparison for less than or equal comparisons (<=).
- __lt__ - How the class runs a rich comparison for less than comparisons (<).
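A sketch of order=True with the two-field SimpleFlag: instances compare as (name, is_active) tuples, so name decides first & is_active breaks ties.

```python
from dataclasses import dataclass


@dataclass(order=True)
class SimpleFlag:
    name: str
    is_active: bool = False


flags = [SimpleFlag("gamma"), SimpleFlag("alpha", True), SimpleFlag("alpha")]
# Compared as (name, is_active) tuples: name first, then is_active.
print(sorted(flags))
# [SimpleFlag(name='alpha', is_active=False), SimpleFlag(name='alpha', is_active=True), SimpleFlag(name='gamma', is_active=False)]
print(SimpleFlag("alpha") < SimpleFlag("beta"))  # True
```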
Adding a sort_index to control ordering
While order=True writes the comparison methods for us, we can still control ordering by tying it to a particular property on the class.
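A sketch of a sort_index driven by created_at, assuming sort_index is declared as the first field. created_at is placed before is_active here because fields without defaults must come before fields with defaults; the dates are made-up values.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class SimpleFlag:
    # First field, so it's the first item compared; kept out of __init__ and repr.
    sort_index: datetime = field(init=False, repr=False)
    name: str
    created_at: datetime
    is_active: bool = False

    def __post_init__(self) -> None:
        # Runs after the generated __init__; copy created_at into sort_index.
        self.sort_index = self.created_at


older = SimpleFlag("zeta", datetime(2023, 1, 1))
newer = SimpleFlag("alpha", datetime(2024, 6, 1))
# sort_index is compared first, so "zeta" sorts before "alpha".
print([flag.name for flag in sorted([newer, older])])  # ['zeta', 'alpha']
print(older)  # sort_index doesn't appear in the repr
```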
Take a look at the highlighted sections above. First, we have to import a new function from dataclasses called field. We need this to control how the sort_index property gets initialized (it isn’t passed to __init__) & whether it appears in the string representation.
You can also see that there’s a new special method, __post_init__, that runs after the initializing step. This is where we set the value of sort_index from the created_at property. Because comparisons use the fields in the order they’re defined, declaring sort_index as the first field makes it the property that decides rich comparisons like >, >=, <, & <=.
Setting sort_index when the class is immutable
If you’re using frozen=True & order=True in the decorator, you won’t be able to assign fields in the __post_init__ method the same way as above. Instead of assigning a value to the property on self directly, you’ll need to use the object.__setattr__ method to assign a value to the immutable property.
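A sketch of the frozen variant, under the same field-order assumptions as the earlier sort_index example. A plain self.sort_index = ... inside __post_init__ would raise FrozenInstanceError, so the value is set through object.__setattr__, which bypasses the generated __setattr__.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True, order=True)
class SimpleFlag:
    sort_index: datetime = field(init=False, repr=False)
    name: str
    created_at: datetime
    is_active: bool = False

    def __post_init__(self) -> None:
        # object.__setattr__ skips the frozen check in the generated __setattr__.
        object.__setattr__(self, "sort_index", self.created_at)


older = SimpleFlag("zeta", datetime(2023, 1, 1))
newer = SimpleFlag("alpha", datetime(2024, 6, 1))
print([flag.name for flag in sorted([newer, older])])  # ['zeta', 'alpha']
```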
As you can see from the highlight above, the arguments are the object (self), the name of the property as a string ("sort_index"), & the value (self.created_at) for that property.
Setting default values
With Data Classes you can also set defaults for properties of the class. You may have noticed this in the section Assigning default values, where we gave the is_active property a default value of False.
This works well with immutable types such as bool, str, or int. For mutable types like dict, list, or set, though, you’ll need to set the property’s default with the field function & the argument default_factory=<type>.
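A sketch of default_factory with a dict, using a hypothetical rollout field that maps regions to rollout percentages. The factory is called with no arguments for each new instance, so every flag gets its own dictionary. The built-in generic dict[str, int] assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleFlag:
    name: str
    is_active: bool = False
    # Hypothetical field: dict() is called once per new instance.
    rollout: dict[str, int] = field(default_factory=dict)


a = SimpleFlag("beta")
b = SimpleFlag("gamma")
a.rollout["us-east"] = 10
print(a.rollout)  # {'us-east': 10}
print(b.rollout)  # {}
```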
Creating properties that are unique to an instance
When working with Data Classes, you will have to use the field function from the dataclasses library to make certain mutable properties unique to each instance. Below we’ll create a new tags property that is a list of strings.
In the example above, a plain = [] default would be a single list shared by every instance of the class, so the @dataclass decorator refuses it & raises a ValueError when the class is defined. To make tags unique, we need to import the field function from dataclasses & set its default_factory argument to Python’s list function.
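A sketch of the fix: with field(default_factory=list), each SimpleFlag gets a fresh list, so appending to one instance's tags doesn't affect another's. The tag "experimental" is a made-up value.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleFlag:
    name: str
    is_active: bool = False
    # list() is called once per instance, giving each flag its own list.
    tags: list[str] = field(default_factory=list)


a = SimpleFlag("beta")
b = SimpleFlag("gamma")
a.tags.append("experimental")
print(a.tags)  # ['experimental']
print(b.tags)  # []
```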
With the highlighted code added above, each instance of SimpleFlag now has its own unique tags property.
Whew, & that’s a wrap folks
I’ve covered a lot in this post. Python classes generally fall into two kinds: behavior-driven classes & data-driven classes. If you’re writing the latter, you’ll want to write much less boilerplate & use the @dataclass decorator, which makes Python classes much easier to maintain without hand-writing special methods that could lead to mistakes. Thanks for reading!
This post was written by a human & not by artificial intelligence (AI) tools. I don't have anything against AI, but I am interested in differentiating content created by people versus machines.