Test-Driven Development
TDD is a common practice in software development: it makes the programmer focus on the requirements before writing the code, a subtle but important difference. As detailed in this post, TDD brings multiple benefits: it reduces the number of bugs in your code, it helps you prioritize interfaces over implementations, it alerts you when your last change has broken previously working code, it improves the quality of your code, and it can act as code documentation.
In this post, we are going to analyze four kinds of tests that can be useful in software projects: unit tests, smoke tests, integration tests, and utility tests. These tests are inspired by the test pipeline of the LensKit library from the University of Minnesota.
To explain each kind of test, we are going to create examples based on Pytest. Since I'm interested in artificial intelligence, the code is going to be based on neural networks.
class NeuralNetwork:
    def __init__(self, n_layers):
        self.name = "NN"
        self.n_layers = n_layers
        self.model = None

    def train(self, dataset):
        self.model = train_model(dataset)

    def compute_accuracy(self, dataset):
        return self.model.compute_acc(dataset)

def train_model(dataset):
    # code to train a model given a dataset
    ...
The code is really simple: we can initialize the network with a specific number of layers, train it, and compute its accuracy on a dataset. For those in the AI field, this network may look too simple, but it is enough for the purpose of this post.
Unit Tests
A unit test checks whether a function or a class method behaves as expected. Unit tests are very common in library development. For simplicity, we are just going to test the class initialization.
import pytest

@pytest.fixture()
def nn5():
    return NeuralNetwork(n_layers=5)

def test_unit_create_nn(nn5):
    assert nn5.name == "NN"
    assert nn5.n_layers == 5
    assert nn5.model is None
The first function is a fixture: a piece of code that initializes a test and that can be passed directly as an argument to the test function.
The second function is the unit test. We check that the name, number of layers, and model of the class have been initialized as expected. In a real project, we would write a unit test for every method, as sketched below.
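As an illustration (not from the original class), here is how a unit test for the train method might look. It uses Pytest's built-in monkeypatch fixture to replace the real train_model function with a cheap stub, so the test does not depend on actual training; the module name neural_network is an assumption about where the class lives.

import neural_network  # hypothetical module containing NeuralNetwork and train_model

def test_unit_train(nn5, monkeypatch):
    # Replace the expensive training routine with a stub so the test stays fast.
    monkeypatch.setattr(neural_network, "train_model", lambda dataset: "stub_model")
    nn5.train([0, 1, 2])
    assert nn5.model == "stub_model"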
It is common to link the unit tests of your library to a continuous integration (CI) system like Travis. A typical setup makes Travis run all the unit tests whenever a developer opens a pull request against the master branch of your repo. That way you make sure that the code you have in master doesn't break. CI systems are often complemented with Continuous Delivery (CD); the combination is normally referred to as CI/CD.
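As a rough sketch (not tied to any particular project), a minimal .travis.yml that runs the Pytest suite on every push and pull request could look something like this; the Python version and the requirements file name are assumptions:

language: python
python:
  - "3.8"
install:
  - pip install -r requirements.txt  # assumed dependency file
  - pip install pytest
script:
  - pytest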
Smoke Tests
Smoke tests are used to make sure that critical parts of the system work. They can be used in production systems to quickly check that there are no obvious failures.
def load_dataset(filename):
    # code to load a dataset from a file
    ...

def test_smoke_train():
    nn = NeuralNetwork(n_layers=5)
    dataset = load_dataset('data.csv')
    nn.train(dataset)
    assert nn.model is not None
In our example, we just check that the network trains without failing and that a model has been produced. These tests can also be run by the CI system on every pull request; for that reason, they tend to be quick.
Integration/Functional Tests
Integration and functional tests are used to check the correct behavior of a system. While developing, we could write code that doesn't fail but that produces wrong or suboptimal results. In the case of a neural network algorithm, we would like to make sure that, for a known dataset, we always reach a certain accuracy.
@pytest.mark.parametrize('n_layers, targ_acc', [
    (2, 0.70),
    (5, 0.80),
])
def test_integration_acc(n_layers, targ_acc):
    nn = NeuralNetwork(n_layers=n_layers)
    dataset = load_dataset('data.csv')
    nn.train(dataset)
    acc = nn.compute_accuracy(dataset)
    assert acc >= targ_acc
For these situations, Pytest's parametrize marker is really helpful. In our case, Pytest will run the same test twice: with a neural network of 2 layers, we expect an accuracy of at least 0.70; with a neural network of 5 layers, the expected accuracy is higher.
In the CI system, these tests are not executed on every pull request, but maybe once a day, because they typically take longer than other kinds of tests.
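One common way to implement this split (an assumption on my part, not something from the original pipeline) is to tag the long-running tests with a custom Pytest marker, register it in pytest.ini, and deselect it in the fast CI job with pytest -m "not integration":

import pytest

# The marker name "integration" is an assumption; it would be registered in
# pytest.ini under the [pytest] section, e.g. markers = integration: long-running tests
@pytest.mark.integration
@pytest.mark.parametrize('n_layers, targ_acc', [
    (2, 0.70),
    (5, 0.80),
])
def test_integration_acc(n_layers, targ_acc):
    nn = NeuralNetwork(n_layers=n_layers)
    dataset = load_dataset('data.csv')
    nn.train(dataset)
    assert nn.compute_accuracy(dataset) >= targ_acc

The nightly job would then run pytest -m integration (or simply pytest) to cover everything.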
Utility Tests
The types of tests analyzed so far are designed to make sure that the code works as expected. But in some situations, tests can be used just to show an example of how to use the code. This is the task of utility tests.
The most common place to add utility tests is in the documentation. In this case, utility tests are also referred to as doctests.
class NeuralNetwork:
    """Neural network.

    Args:
        n_layers (int): Number of layers.

    Examples:
        >>> dataset = load_dataset('data.csv')
        >>> nn = NeuralNetwork(n_layers=5)
        >>> nn.train(dataset)
        >>> accuracy = nn.compute_accuracy(dataset)
        >>> print(accuracy)
        0.87
    """
In many situations, a developer might think that there is no need for a utility test if the documentation is clear enough. However, I find that it helps me understand the behavior of a piece of code very quickly and, at the same time, develop faster, especially when I'm reusing code. I use utility tests extensively in my codebase.
Happy testing!