How you write code is as much a matter of personal style as how you wear your hair. But just as there are bad hair days, there are bad code days.
The Pragmatic Programmer
Syntax
Syntax refers to the arrangement of words and phrases to create well-formed sentences in a language.
Programming languages, usually, have a well defined grammar; a set of reserved keywords, data types, operators, punctuation, scoping rules, etc... In Python 3, this syntax is defined in the Python Language Reference.
While the grammar of a language gives us a framework with which to produce valid programs, it does not prescribe how to write clean code. Knowing the grammar of a spoken language allows one to form coherent sentences, but does not necessarily make one an eloquent communicator.
Conventions
No, I don't mean comicon, but rather a set of additional rules that are not enforced by the language, though third party tools often exist to add a layer of enforcement, but are generally accepted by the community. Conventions will be mentioned throughout these articles, as following them is a key part of writing clean code. Conventions allow programmers around the world to write code in a unified manner that extends beyond syntactical rules. This normalization of code reduces the mental barrier to entry when reading foreign code, as a well trained programmers brain knows what to expect.
Conventions establish a predictable structure and format that facilitate a more pleasant reading experience by setting expectations on code organization, appearance, and levels of abstraction, making it easier for readers to navigate and understand the codebase.
Imagine you were asked to help set the table at a friends house, you are looking for forks, spoons and knives. While reading this sentence, you are already imagining a drawer around 20 - 30cm in height, around 40 - 70cm in width, probably located right under the counter above a larger drawer or cabinet. This mental image is formed because most people follow a similar convention of organizing kitchen utensils. Similarly, adhering to programming conventions can make it easier to understand and navigate code. Just as you know where to look for the utensils, following conventions in programming helps you know where to look in the code and what to expect.
Naming
Other than reserved keywords, programmers are left to their own devices when it comes to naming variables, functions, classes, their children, and so on.
In Python, the PEP 8 style guide is the de facto standard for naming conventions. It is a good idea to read through it at least once, and to keep it handy for reference, though we will cover much of what it contains throughout the articles in this series.
For example, here are a few conventions around naming various entities in Python:
Convention | Example | |
---|---|---|
Variables | lower case separated by _ | my_variable |
Constants | UPPER CASE words separated _ | MY_CONSTANT |
Functions & Methods | lower case words separated _ | my_function |
Classes | CamelCase | MyClass |
So one might think, following this set of conventions will lead one to write clean code. Clean code, sadly and luckily for our jobs, can't be generated from a set of rules, it extends far beyond the scope of rules to the realm of art. Though any code that doesn't follow these basic rules likely meant the code isn't clean.
Note that there are many conventions beyond naming, including ones that relate to design, comments, docstring, testing and so on. It is highly recommended to learn and follow conventions.
Variables
In programs, variables are names which refer/point to an object. For example, the following program defines two variables name
and age
which refer to a string and integer respectively:
name = "alexis"
age = 28
The thoughtful naming of these variables is a key component of clean code. If the above code just said x = 28
the reader would have no reasonable way of ascertaining that x
refers to an age. If the code instead said age_of_the_person_called_alexis = 28
that may be quite specific and clear to the reader, but would add unnecessary complexity!
The art of naming is again the same art of the rest of clean code. What is the minimum (required) complexity to naming a variable such that the context can reasonably be ascertained without slowing down or confusing the reader. The names of variables should tell the reader what they contain in the context they live in. In the above context, it is clear that age
refers to the age of the person who's name is defined of the preceding line. Knowing the context and the level of abstraction is an important part of not only naming variables, but of all clean code concepts. We will discuss abstraction in a separate article as it is a large topic that deserves its own treatment.
Common Naming Mistakes/Anti-Patterns
Not Following Conventions
Follow the pep8 conventions or the conventions of the code base you are writing in.
Unecessary Acronyms & Shortening of Wrds
Avoid trying to save a few keystrokes by shortening variable names. For example usr_inp
instead of user_input
. The exception to this advice is for well known or agreed upon acronyms/placeholders such as _
, i
, num
, N
and so on.
Inconsitent Style
Whatever style of naming you choose, keep it consistent throughout the code base. If the code base is not yours, see if you can figure out what style they use. It doesn't matter if the style is not your preferred one, consistency is key.
Overwriting Reserved Keywords
Python has a small set of reserved keywords, avoid using variable names that overwrite them, I quite often see list
, dict
or type
used as variable names, which is a bad idea!
Datastructures in the Name
It is tempting to clarify what the name is storing. For example let's say we have a variable that is storing a list of floats representing velocities. It can be tempting to name this velocities_list
as it allows the reader to quickly figure out what the variable is storing, so it seems like an easy way to improve code clarity. It is a better idea however to use type hints (discussed further down) if the type of a variable is not clear from the context. The question also arrises, do I care if velocities are stored in a list? or do I only care if they are in a Sequence
type? Should it therefore be called velocities_sequence
? Variable names should contain the minimum necessary complexity to convey the context what the variable stores. Specifying that the velocities are in a list
is not always necessary.
Comments
Clean code is self-documenting!
At first glance, comments seem like a brilliant idea. We can add them everywhere to make it extremely clear to any reader what our intention was, what the code is doing, what our favorite cats name is and so on. However, comments can add unecessary complexity!
Comments are often seen as a freeby when it comes to improving code legability, with programmers ignoring the fact that each comment is an additional thing to:
- maintain: as code changes, so do any comments that refer to this code (see Section 2.4)
- read: some comments are good, however if there are too many comments that are bad, it becomes difficult to know which are important to read. It becomes easy for readers to skip comments entirely when being bombarded by them.
- not enforced: There are no enforcement mechanisms for comments. From the perspective of the interpreter, comments might as well not exist. Comments therefore are entirely enforced by the programmer and readers. This is related to maintainability, as each comment must be read, understood and validated manually.
Comments that state the obvious
It is common for programmers to want to comment everything:
# add x and y and assign it to v
v = x + y
# double the value of v
v *= 2
# print v
print(v)
Comments fall out of date
Another common issue in fast moving codebases are comments that fall out of date. What might start as:
# add x and y and assign it to v
v = x + y
can quickly become:
# add x and y and assign it to v
v = x * y
due to a changing requirement or bug fix. We now have a comment which disagrees with the code it comments! What a nightmare! Comments which are out of date, and therefore wrong, also contribute to the mental overhead of reading code. The reader must now decide whether to trust the comment or the code, and if the comment is wrong, what else is wrong?
Commented out code
Version control systems (such as Git) are a critical tool for the development process. We will not be delving into Git in this article but it is worth mentioning that it is a tool that allows us to track changes to our code over time.
Even with a version control system, it is common to see:
# v = x + y
v = x * y
where old code has been commented out in favor if new code. Do not do this! Delete the old code! If it is commented out, it is no longer needed. If you need it again, you can always revert to a previous version of the code.
The example above may not seem so terrible, however imagine a large code base with seemingly random commented out code throughout it. After a short while nobody remembers why the code was commented out, what it used to do, will it still work if we uncomment it? and so on.
How to Write Good Comments
Now that we have gone over what bad comments commonly look like, what do good comments look like?
Good comments reduce the complexity of the code they comment.
Common ways of achieving this is to add context and information, when required, that may not be apparent from reading the code itself. It is incredibly rare that this missing information relates to what the code does itself, rather it's more likely to be related to WHY the code exists at all.
# Need v to keep track of the largest possible value
v = x * y
Control Flow
Type Hints
Python is a dynamically typed language, meaning that the type of a variable is not known until runtime. Languages such as C++ are statically typed, meaning that the type of a variable is known at compile time.
Let's say we want to define a function which takes a sequence of integers as input and returns the sum of squares as output. In C++11 onwards we could write:
int sum_of_squares(std::vector<int> numbers) {
int sum = 0;
for (int number : numbers) {
sum += number * number;
}
return sum;
}
Note that the type of the input is specified in the function signature, and the type of the output is specified in the return type.
We specify to the compiler that numbers is a vector of integers (std::vector<int>
), and that the output is an integer (left hand side of int sum_of_squares
). Inside the function we can see that the sum
is defined to be of type integer, with even inside the loop the variable number being defined as being of type integer.
In Python, we could write the same function as follows:
def sum_of_squares(numbers):
total = 0
for number in numbers:
total += number * number
return total
with there being no reqirement to specify the types of any inputs, outputs or variables.
Type hints were introduced in PEP 484 and allow us to specify the types of variables, inputs and outputs. The above function could be written as follows:
from typing import Sequence
def sum_of_squares(numbers: Sequence[int]) -> int:
total: int = 0
for number in numbers:
total += number * number
return total
The syntax of adding some text after a colon is called an annotation. These are not enforced by the Python interpreter, and are simply there to provide information to the reader. For example, we could write:
total: "hello how are you today" = 0
Type hinting variables
In practice it is rare to add type hints to variables in Python, type hints are primarily used in function and method signatures.
Type hints are an annotation that uses third party tooling such as MyPy for enforcement. These tools leverage the Python annotation functionality to add typing information.
The syntax is:
variable_name: type = value
with the special exception of function/method signatures where the syntax is:
def function_name(input_name: input_type) -> return_type:
So you might imagine signatures such as:
def add(a: int, b: int) -> int:
def write_text_to_file(file_name: str, contents: str) -> None:
def grade_students(students: Sequence[Student]) -> Sequence[Grade]:
Reading typed signatures reduces the mental overhead of understanding what a function expects as input and returns as output.