Object oriented programming
For decades procedural programming was the main tool to structure and divide software into little pieces, but around 1980 a new paradigm became popular: Object Oriented Programming (OOP). However, OOP did not replaced procedural programming, both paradigms still coexist, and, to this day, both of them are the most common programming paradigms. There are languages, like C, that only use functions, others like Java, that only use classes, the building block in Object Oriented Programming languages, and still others, like Python and Rust, are multiparadigmatic: the programmer choses.
Classes
In Object Oriented Programming the class is the main building block. To introduce the concept of the classes let’s write a small program, using first a procedural approach.
Now let’s do the same, but using classes. (In general any program that you can think of can be programmed using both paradigms).
There’s quite a bit to unpack here. We have defined a class with three methods: init, calc_area, and print.
A method is a function defined within a class. Methods are called with the syntax “object.method_name()”. This is a syntax that you have already seen many times in Python, e.g. text_string.upper() or my_list.append(item).
In this class, Rectangle, there is a special method: init. Every time that we want to initialize an object, the class uses this special method. You can create Python classes without the init method, but this is because Python will create a default __init__method for you.
In the previous example the class that we have defined is Rectangle, and we have also created two objects of that class: rectangle1 and rectangle2. An object is an instance of the class, in this case every object is a particular rectangle, with its width and height.
The class has properties, data associated, like, in this case, width and height, and also, its methods define some behaviours. Methods are functions and functions are actions that act on data. So a class has methods, actions, behaviors, and objects have data associated to them and the behaviours provided by the class methods. Sometimes it is useful to think of a class as the representation of a type of thing, like rectangles, cars, or web servers. The methods would represent the behaviour of that type of thing and its properties would be the data associated to every particular instance, every object, of that type of thing.
You could think of a class as a data structure, like a dictionary, with some methods that act on the data stored in that dictionary. It would be something like this:
So, you could almost think of a class as syntactic sugar that bounds some functions to a data structure, like a dictionary. The data structure that holds the specific data of each object, like rectangle1 and rectangle2, in the class methods is represented by self. That is why, in Python, every standard class method has as a first argument self, the argument that represents the data for a particular object. In some languages, like in the old good Perl, this association of the methods to the data structure to create a class is explicit. In Perl they use the bless statement. They bless the data stored in a data structure with the methods/functions located in what they call a package, that is a kind of Perl module.
By the way, our Rectangle class has a print method, but in Python it would be better to name that method as str:
These kind of methods, that start and end with two underscores, in Python are called magic methods.
Attributes: properties and methods
Finnaly, you will hear about attributes. In Python we talk about methods, properties and attributes. Attributes are the sum of the properties and methods, so all that is accessed with the dot notation: “object.attribute”. In this regard you will also read about other terms like: [fields]. There is no consensus about the exact meaning of all these terms: attribute, field, property, and in different programming languages they tend to use them in sligtly different ways. But remember no matter the exact term used: classes have an interface comprised by data and behaviour.
If you want to access the list of attributes of an object, you can use the dir built-in function.
dir returns a big list of attributes because Python, under the hood, adds a lot of functionality to any object. Moreover, there are quite a lot of methods whose names start and ends with two under scores, like: init, class, or name. These are known as special or magic methods.
Encapsulation
Classes and modularity
So, classes we created by the fussion of data structures and methods that worked on those data structures. One could ask, if we can write our rectangle logic using a dict as a data structure and a couple of functions, are classes really necessary and, moreover, why bother? Well, classes are not really necessary. There are many languages, like C, that do not use classes at all, they do not even have the concept, and those languages are used to create huge and successfull software projects. For instance, the Linux kernel, that is mostly C, has more than 28 millions lines of code, and more than 20 thousand contributors.
So clases are not necessary, but they can be very convinient because they are even more modular than mere functions. By using classes we can isolate the different pieces that comprise the software even more effectively than with functions because they merge the data structure and the behaviour. And in that way the structure of the data becames a hidden detail implementation, it is not part of the interface anymore. In the function based rectangle implementation the data structure, in that case a dictionary, was part of the interface of the rectangle_print funtion, while in the print method of the Rectangle class the way in which the data is given to the method is not a user concern. For instance, we could reimplement the Rectangle class that stores the object data in a different way, but that wouldn’t need to alter the interface.
In this particular case, to maintain the same exact interface, we have created a private (more about this later) dictionary to hold the object data, _sides, and we have used the property decorator to provide the width and height properties that the previous implementation had.
Classes are a way of approaching the question of how data and action, data structure and functions, relate. In the procedural approach data structures are part of the interface, whereas in the class they way in which the data is stored is mostly hidden, it is almost an implementation detail.
This tight relation between data and behaviour is known as encapsulation. The class interface is smaller than the combined interface of all the functions required to do the same job. So classes are convinient, among other reasons, because they allow to modularize the software projects more effectively that functions. This is one of the major appeals of the Object Oriented Programming approach, the modularization is deeper than in the procedural case. With the right design, Object Oriented Programs are easier to scale.
You can create huge projects based on the procedural paradigm, but to have a highly modularized project, you have to be very disciplined when defining and changing the data structures. For instance, you could collect all functions that work with a particular data structure in the same place, like in a file, a Python module. In that way if the data structure would had to be changed, the functions that should be modifed would all be collected together in the same file. This kind of procedural approach would be a limited kind of encapsulaiton and would have some of the benefits of the Object Oriented paradigm.
Let’s see another example of how hidding the structure of the data helps with modularity. Imagine that we have implemented a Circle class.
Now we are asked to add a feature related to ellipses and to avoid extra code we take advantage of the fact that circles are just a special kind of ellipses and we decide to reimplement our Circle class in a completely different way.
We have implemented Circle as a subclass of Ellipse. This is called: object inheritance. But this is not the most relevant part here. From the modularization point of view, the most important lines are the two last ones. The user of the Circle class, despite the complete reimplementation of the Circle class, has not changed anything at all in its code. This is what we mean by modularity. This code is modular, the programmers that build the Circle class and the ones that use it could be completely independent and the implementers could radically change their implementation, including the data structures that store the object data, without the users of that funcionality needing, in many cases, to change their code.
Private and public attributes
Classes can hide part of its data and methods from the rest of the program to create an even smaller interface, and the smaller the interface, the looser the conection between the code modules, and the more maintainable and extensible the code will be.
One way to create smaller interfaces is to limit the variables/properties made available to the class users. This is also part of the encapsulation provided by the object oriented approach, we can make variables private, available only to the class and not the class users. For instance, in our rectangle example we could make the width and height variables private, not available outside the class.
The only difference is that now the width and height properties are called _width and _private. In other programming languages they have ways of enforcing the distinction between private and public methods and properties, but that is not the case in Python. In Python everything is public, can be accessed, and privacy is a mere convention. Methods and properties whose names start with an underscore are supposed to be private, they should be only accessed by the class methods. That is not enforced by the language, but be aware that Python maintainers will asume that you understand and honor this convention and, thus, they will change these “private” attributes without any previous warning.
The main advantage of having private attributes is that they are only used in a limited part of the code, like within the class, and can not used by the rest of the code base. That means that when we need to change something related to a private attribute, the change will be limited to that class, and maybe, to their descendants (more about later). This is particularly useful in large projects where changes are inevitable. In contrast, in procedural programming, a change in any variable used by a function will require changes in several other parts, making it more complex and time-consuming to fix bugs and to functionalities. As a general rule, to guarantee a high level of modularity, make public as few attributes as possible.
Objects store state
Sometimes it is necessary to keep track of a state, like the value of a variable, between function calls. In the procedural approach we could do it by storing the state in a variable that is passed every time to the function. Let’s imagine that we want to count how many times a function has been called, we could write something like this:
This would work, but to pass and return the num_times variable everytime to the function it is a bit cumbersome. Moreover, this variable might be only relevant to that function, so having the variable available outside the function scope is uncessary and it would be just a source of problems. Alternatively, we could use a global variable.
This is much more convinient and requires less code, but we are making use of a global variable to keep track of the state, the number of times that the function has been called, and global variables break modularity because they are avaible in all code base, any part of the code could alter them, and, thus, are prone to create maintainability problems.
Objects provide a solution that it is both convenient and modular because it does not use global variables.
Now the variable that we are using to store the state, number_of_times, is only available to the greeter objects.
In this case number_of_times variable is an object property, so it is not shared between objects, it will reflect only how many times the print_hello method has been called in a particular object.
However, classes even allow us to create class level variables shared by all objects if we need to.
Be careful with the class level variables because their scope is larger than the object variables, and a more limited scope facilitates code modularity.
This is another big advantage of the Object Oriented paradigm: it allows us to keep track of a state between calls in a very natural way without having to use global variables.
Abstraction
Classes facilitate code reuse by creating hierarchies of classes that share some of their behaviours by using inheritance. This is a feature that I do not usually need and that some languages with object oriented capabilities, like Rust, do not implement or do not encourage.
Another way to share behaviors between different clases is to just use the idea of a shared behaviour, that is, to have similar interfaces. In Python this kind of ideas are widely used. For instance, there are core pythonic ideas that are just interfaces, shared behaviours, like: file-objects, iterators, sequences or mappings. If you are interested in these ideas I recommend to you the Real Python tutorial on interfaces.
Procedural vs object oriented
Python, as we have seen, allows us to write procedural and object oriented code. You can chose different approaches to solve different problems and you can combine them. That forces you to decide what to do in each occasion and there are no strict rules to follow, you will develop your style by reading and writting code.