TDD is a common practice in software development. It makes the programmer focus on the requirements before writing the code, a subtle but important difference. As detailed in this post, TDD has multiple benefits: it reduces the number of bugs in your code, it helps you prioritize interfaces over implementation, it alerts you when a change breaks previously working code, it improves the quality of your code and it can act as code documentation.
In this post we are going to analyze four kinds of tests that can be useful in software projects: unit tests, smoke tests, integration tests and utility tests.
I want this post to be hands-on, so we are going to do some coding based on Pytest. Pytest is one of the most widely used libraries for testing in Python; the other one is unittest. The unittest library is Python's default testing library and is included in the standard library. Pytest, however, has to be installed, but from my point of view it has a cleaner API and some interesting features that make it my primary choice when I'm developing libraries.
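Installing it is a one-liner:

pip install pytest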
To illustrate each kind of test, we are going to create a dummy class. Since I'm interested in artificial intelligence, my dummy class is going to be a neural network.
class NeuralNetwork:
    def __init__(self, n_layers):
        self.name = "NN"
        self.n_layers = n_layers
        self.model = None

    def train(self, dataset):
        self.model = train_model(dataset)

    def compute_accuracy(self, dataset):
        return self.model.compute_acc(dataset)


def train_model(dataset):
    # code to train a model given a dataset
    ...
The code is really simple: we can initialize the network with a specific number of layers, train it and compute its accuracy on a dataset. For those in the AI field, this network may look too simple, but it is enough for the purpose of this post.
Unit tests check whether a function or a class method behaves as expected. They are very common in library development. For simplicity, we are just going to test the class initialization.
import pytest


@pytest.fixture()
def nn5():
    return NeuralNetwork(n_layers=5)


def test_unit_create_nn(nn5):
    assert nn5.name == "NN"
    assert nn5.n_layers == 5
    assert nn5.model is None
The first function is a fixture: a piece of setup code that can be used to initialize a test. To use it, you simply add it as an argument to the test, and Pytest injects it.
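Fixtures can also run teardown code. As a minimal sketch (the trained_nn fixture is my own example, reusing the load_dataset helper that appears later in this post), everything after the yield executes once the test has finished:

@pytest.fixture()
def trained_nn():
    # build and train a network, then hand it to the test
    nn = NeuralNetwork(n_layers=5)
    nn.train(load_dataset('data.csv'))  # load_dataset is defined later in this post
    yield nn  # the test body runs at this point
    nn.model = None  # teardown: release the trained model after the test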
The second function, test_unit_create_nn, is the unit test itself. We check that the name, number of layers and model of the instance have been set as expected. In a real project we would write a unit test for every method.
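As a sketch of what a unit test for train could look like without running a real training loop, we can use Pytest's built-in monkeypatch fixture to stub out train_model (the module name neural_network and the fake return value are assumptions for this example):

def test_unit_train(monkeypatch, nn5):
    # replace the expensive training routine with a cheap stub
    monkeypatch.setattr('neural_network.train_model', lambda dataset: 'fake-model')
    nn5.train(dataset=None)  # the stub ignores its input
    assert nn5.model == 'fake-model'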
It is common to link the unit tests of your library to a continuous integration (CI) system like Travis. The typical setup makes Travis run all the unit tests whenever a developer opens a pull request against the master branch of your repo. That way you make sure that the code in master doesn't break. CI systems may be complemented with continuous delivery; the combination is normally referred to as CI/CD.
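For reference, a minimal .travis.yml for this setup could look like the sketch below (the Python version and the tests folder are just an example):

language: python
python:
  - "3.8"
install:
  - pip install pytest
script:
  - pytest tests/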
Smoke tests are used to make sure that critical parts of the system work. They can be used in production systems to quickly check that there are no obvious failures.
def load_dataset(filename):
    # code to load a dataset from a file
    ...


def test_smoke_train():
    nn = NeuralNetwork(n_layers=5)
    dataset = load_dataset('data.csv')
    nn.train(dataset)
    assert nn.model is not None
In our example, we just check that training completes without failing and that a model has actually been produced. These tests can also be run by the CI system on every pull request; for that reason they need to be quick.
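A simple way to keep an eye on this is Pytest's --durations flag, which reports the slowest tests of the run:

pytest --durations=5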
Integration tests are used to check the correct behavior of a whole system. While developing, we could write code that doesn't fail but produces wrong or suboptimal results. In the case of a neural network algorithm, we would like to make sure that for a known dataset we always reach a certain accuracy.
@pytest.mark.parametrize('n_layers, targ_acc', [
    (2, 0.70),
    (5, 0.80),
])
def test_integration_acc(n_layers, targ_acc):
    nn = NeuralNetwork(n_layers=n_layers)
    dataset = load_dataset('data.csv')
    nn.train(dataset)
    acc = nn.compute_accuracy(dataset)
    assert acc >= targ_acc
For these situations, Pytest's parametrize decorator is really helpful. In our case, Pytest will run the same test twice. With a neural network of two layers, we expect an accuracy of at least 0.70; with a neural network of five layers, we expect at least 0.80.
In the CI system, these tests are not executed on every pull request, but maybe once a day, because they typically take longer than the other kinds of tests.
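One way to organize this is with a custom marker; the sketch below (the integration marker name is my own choice, not a Pytest built-in) registers the marker, tags the test and lets the CI select or deselect it:

# pytest.ini: register the custom marker
[pytest]
markers =
    integration: slow tests that train a real model

# in the test file: tag the test
@pytest.mark.integration
@pytest.mark.parametrize('n_layers, targ_acc', [(2, 0.70), (5, 0.80)])
def test_integration_acc(n_layers, targ_acc):
    ...

# on every pull request: skip the slow tests
pytest -m "not integration"

# once a day: run only the slow tests
pytest -m integration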
The types of tests analyzed so far are designed to make sure that the code works as expected. But in some situations, tests can be used simply to show an example of how to use the code. This is the job of utility tests.
The most common place to add utility tests is the documentation. In this case, utility tests are also referred to as doctests.
class NeuralNetwork:
    """Neural network.

    Args:
        n_layers (int): Number of layers.

    Examples:
        >>> dataset = load_dataset('data.csv')
        >>> nn = NeuralNetwork(n_layers=5)
        >>> nn.train(dataset)
        >>> accuracy = nn.compute_accuracy(dataset)
        >>> print(accuracy)
        0.87
    """
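A nice side effect is that Pytest can execute these examples as tests with the --doctest-modules flag, so the documentation is checked together with the rest of the suite:

pytest --doctest-modules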
In many situations, a developer might think that there is no need for a utility test if the documentation is clear enough. However, I find that utility tests help me understand the behavior of a piece of code very quickly and, at the same time, develop faster, especially when I'm reusing code. I use utility tests extensively in my codebase.