Weeks 7 - 8: Test-driven development; Python documentation
Test-driven development
Next on our agenda is test-driven development (TDD). In test-driven development, we use software tests to catch bugs in our program, but also to provide a roadmap or specification for how it should behave. Here, we are mainly interested in unit tests, which are defined below, even though there are several other testing models.
Software (unit) test: A program that compares the output of a piece of code to its expected output.
Test-driven development builds on the following principle:
Testable code is good code.
Some examples of tests follow:
- a test that ensures adding elements maintains the heap invariant
- a test that ensures that `pop()` will return the last element added to a stack
- a test that calls a custom function with an unsupported type and ensures that it will throw a `TypeError`
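To make the idea concrete, here is a minimal sketch of the second test above, written with a plain `assert` statement (the `Stack` class is hypothetical at this point; we build one later in these notes):

def test_pop_returns_last_pushed():
    s = Stack()
    s.push(1)
    s.push(2)
    # the most recently pushed element should come back first
    assert s.pop() == 2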
In test-driven development, tests are written before the source code! The TDD workflow is as follows:
1. Plan a new piece of functionality (e.g., the `push(x)` operation on a stack). This step involves writing tests that help describe the desired functionality, before writing any code for the function itself.
2. Implement the new functionality. This step ends when all test cases from Step 1 pass, indicating the new component works as expected.
3. Refactor your code. Coding style, code efficiency, etc. can be taken care of in this step.
The above is not always strictly followed. In particular, steps 1-2 can be iterated, and we can always write more tests for existing software components or refactor our code in the future.
Python project structure
In modern software development, it is standard practice for every open-source project to "ship" with its own test suite, i.e., a collection of software tests that an end user can run after installing the module to ensure it runs as intended on their machine / environment.
This introduces a great opportunity to talk about the structure of a typical open-source project. Many open-source projects follow the structure shown below:
project/
    src/
        <source code goes here>
    test/
        <software tests go here>
    doc/
        src/
            <documentation source, e.g., Markdown files>
        build/
            <build location for the documentation>
    examples/
        <examples of using the code>
    README
    INSTALL
    LICENSE
    VERSION
Some remarks are in order:
- the file called `README` provides a synopsis of the project and may include self-contained examples, guidelines for reporting bugs, installing, etc.
- the file called `INSTALL` includes installation instructions, but can be omitted if these are particularly simple and can be included in `README`
- `LICENSE` contains the software license under which the software is offered. A list of open source licenses is available here. You can also read what happens if you omit a `LICENSE` file.
- including `examples/` of usage is optional. If the examples are simple enough, they can go in the `README` file. However, it is always a good idea to add examples that demonstrate use of your library / module in nontrivial settings here.
Python projects obey a similar structure. However, some things change slightly. For example, if your module is called `my_module`, you would be using that name instead of `src/`.
project/
    my_module/
        __init__.py
        # other .py files here
        my_submodule/
            __init__.py
            # submodule .py files here
        another_submodule/
            __init__.py
            # submodule .py files here
    test/
        # test code here
    ...
Here, `__init__.py` is a special kind of file; it informs Python that the folder it is located in is actually part of a Python module. It allows you to, e.g., write
>>> import my_module
>>> import my_module.my_submodule
>>> import my_module.another_submodule
If you had omitted the `__init__.py` file in the `another_submodule/` folder, you would not be able to import it as in the code snippet above.
Note
The `__init__.py` file is responsible for a number of other things, like exposing functions and classes to the end user. It is implicitly executed when you import the module it lives inside. You can read more here, in addition to the examples we will be presenting here.
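As a small illustration of what "exposing" means, an `__init__.py` can re-export selected names so that users can import them directly from the package. A sketch (the submodule and function names below are hypothetical):

# my_module/__init__.py
# Re-export a name so users can write `from my_module import useful_function`
# instead of reaching into the submodule that defines it.
from .my_submodule import useful_function

__all__ = ["useful_function"]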
Example I: writing tests for a stack
Suppose we want to implement our own stack in Python, backed by a Python list. The first stage in test-driven development is to identify what functionality we want to implement - e.g., what operations our stack should support. Here is what the project would look like:
stackProject/
    stack/
        __init__.py
        stack.py
    test/
        __init__.py
        test_stack.py
    # INSTALL, LICENSE etc.
We expect to define a class called `Stack` in `stack/stack.py`. We would also like to expose this class to a user that imports the `stack` module. This means that our `stack/__init__.py` file should look like this:
from .stack import Stack
Initially, our `stack/stack.py` file only contains a skeleton of the class, since in test-driven development we write our test suite before writing the actual implementation. Here is how `stack/stack.py` will look:
class Stack(object):
    def __init__(self):
        self.stack = []

    def __len__(self):
        return len(self.stack)

    def push(self, x):
        pass

    def pop(self):
        pass

    def peek(self):
        pass

    def clear(self):
        pass
To write our unit tests, we will use the help of a library. There are several libraries for testing in Python, e.g.:
- `unittest`: a built-in module for writing unit tests.
- `nose`: extends `unittest` with extra functionality.
- `hypothesis`: a module for property-based testing.
Here, we will use `unittest`. With `unittest`, we create a class that subclasses `unittest.TestCase` and contains our tests. We typically create a new such class for each piece of code we are testing (e.g., one test class per submodule, or even one test class per class in the source code).
To write test cases, we need to figure out what we want our `Stack` class to implement. Here is a list of operations we would like it to support:
- `__init__(self)`: initialize the stack with an empty list
- `push(self, x)`: push an element onto the stack
- `pop(self)`: remove the element on top of the stack and return it
- `peek(self)`: return the element on top of the stack without removing it
- `__len__(self)`: return the number of elements in the stack
- `clear(self)`: empty the stack
Here is what (an incomplete version of) `test_stack.py` could look like:
import unittest
from stack import Stack

# convention: test classes should be named Test<X>
class TestStack(unittest.TestCase):

    # setUp(): a special method to run before each test method
    def setUp(self):
        """Create empty stack before each test"""
        self.stack = Stack()

    # convention: test methods should be named test_<name>
    def test_push(self):
        self.assertEqual(0, len(self.stack))
        self.stack.push(0)
        self.assertEqual(1, len(self.stack))
        self.stack.push(100)
        self.assertEqual(2, len(self.stack))
        self.stack.pop()
        self.assertEqual(1, len(self.stack))

    def test_pop_empty(self):
        # should raise a ValueError when popping empty stack
        with self.assertRaises(ValueError):
            self.stack.pop()

    def test_pop(self):
        self.stack.push(5)
        self.stack.push(10)
        self.assertEqual(10, self.stack.pop())
Naming conventions
The naming conventions used above are important. In particular, if we want `unittest` to discover our tests automatically, we should follow these conventions:
- tests should live inside a `test/` or `tests/` folder
- all test classes should be named like `Test<Name>`
- all test methods should be named like `test_<name>`

In addition, the `test/` subdirectory should contain an `__init__.py` file as well.
Assertions
In most tests, we are asserting that a condition is true. This is done using the `assert<X>` family of methods provided by `unittest.TestCase`. For example, we write `self.assertEqual(1, len(self.stack))` to make sure that the length of the stack is 1. It is pointless to memorize all these methods, since there are many of them and their usage is similar. You can check the official documentation for the most commonly used ones.
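For reference, here is a small sketch (not part of our actual test suite) showing a few of the most commonly used assertion methods:

class TestStack(unittest.TestCase):
    ...

    def test_assertion_examples(self):
        self.assertEqual(len(self.stack), 0)        # equality
        self.assertTrue(len(self.stack) == 0)       # a condition is true
        self.assertIsNone(self.stack.push(1))       # a value is None (push returns nothing)
        self.assertAlmostEqual(0.1 + 0.2, 0.3)      # floating-point comparison
        with self.assertRaises(ZeroDivisionError):  # the block must raise this exception
            1 / 0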
Now, to run our tests, we should navigate to the top-level `stackProject/` folder and run:
python -m unittest
At this point, all our tests will fail. The output will look something like below:
FFF
======================================================================
FAIL: test_pop (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 25, in test_pop
self.assertEqual(10, self.stack.pop())
AssertionError: 10 != None
======================================================================
FAIL: test_pop_empty (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 20, in test_pop_empty
self.stack.pop()
AssertionError: ValueError not raised
======================================================================
FAIL: test_push (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 15, in test_push
self.assertEqual(1, len(self.stack))
AssertionError: 1 != 0
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (failures=3)
The top line is a string `FFF`, which indicates that all 3 tests failed. The next few lines give detailed information for each test.
Let us try to fix the code in `Stack`. In `stack/stack.py`, we modify the code for the 3 methods that currently do nothing as follows:
class Stack(object):
    ...

    def push(self, x):
        self.stack.append(x)

    def pop(self):
        return self.stack.pop()  # python lists offer a pop() function

    def peek(self):
        return self.stack[-1]
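As a quick sanity check outside the test suite, the class should now behave as expected in an interactive session (an illustrative sketch):

>>> from stack import Stack
>>> s = Stack()
>>> s.push(1)
>>> s.push(2)
>>> s.pop()
2
>>> len(s)
1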
Let us re-run the tests:
$ python -m unittest
.E.
======================================================================
ERROR: test_pop_empty (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/chunkster/Documents/ORIE/TA/orie-5270-spring-2021/examples/week7/stack_example/test/test_stack.py", line 20, in test_pop_empty
self.stack.pop()
File "/home/chunkster/Documents/ORIE/TA/orie-5270-spring-2021/examples/week7/stack_example/stack/stack.py", line 13, in pop
return self.stack.pop()
IndexError: pop from empty list
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (errors=1)
Two of our tests passed, but the test involving an empty stack failed. The reason it failed was that we expected `pop` on an empty stack to raise a `ValueError`, but it raised an `IndexError` instead. Therefore, we need to fix that:
class Stack(object):
    ...

    def pop(self):
        try:
            return self.stack.pop()
        except IndexError:
            raise ValueError("Popping from an empty stack!")
All our tests will pass now.
Test-driven development can influence design choices
Our test class does not use `peek` and `clear` at all. Implementing a test for `clear` is straightforward, since the intended behavior is simple to identify; see the sketch below.
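For instance, a test for `clear` might look like the following (a sketch, assuming `clear` should simply empty the stack):

class TestStack(unittest.TestCase):
    ...

    def test_clear(self):
        self.stack.push(1)
        self.stack.push(2)
        self.stack.clear()
        self.assertEqual(0, len(self.stack))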
However, the case of `peek` is more subtle. Notice that `peek` in its current form assumes that we always call it on a nonempty stack. What should `peek()` do if the stack is empty? This is one of the many instances where tests can dictate the expected behavior.
For example, suppose we added the following test in `test_stack.py`:
class TestStack(unittest.TestCase):
    ...

    def test_peek_empty(self):
        with self.assertRaises(ValueError):
            self.stack.peek()
The above test would only succeed if peeking at an empty stack raises a `ValueError` (consistent with the `test_pop_empty()` test case). Therefore, it instructs us to modify the code for `peek` as follows:
class Stack(object):
    ...

    def peek(self):
        try:
            return self.stack[-1]
        except IndexError:
            raise ValueError("Peeking into an empty stack!")
Importantly, the intended behavior was completely described by a test case.
Test skipping decorators
The `unittest` library defines a collection of decorators that allow you to execute or skip tests conditionally. These can be useful for:
- distinguishing between "legacy" and "new" behavior when running tests
- differentiating between different platforms (e.g. Windows vs. Linux)
Some examples follow below:
import sys
import unittest

import stack
from stack import Stack

class TestStack(unittest.TestCase):
    ...

    @unittest.skipIf(stack.__version__ < (1, 0),
                     "Not supported in this version!")
    def test_stack_printing(self):
        ...  # test code

    @unittest.skipUnless(sys.platform.startswith("win"),
                         "Requires Windows")
    def test_windows_support(self):
        ...  # test code
The skipping decorators are documented here.
Example II: An algorithms module
Let us look at another example. Suppose we have a module called `algos` which contains implementations of standard algorithms. Here is the project structure:
project/
    algos/
        __init__.py
        core.py
        graph/
            __init__.py
            core.py
    test/
        __init__.py
        test_core.py
        test_graph.py
Note that the above contains a submodule called `graph`, which could contain implementations of standard graph algorithms. Since it is a submodule, it requires its own `__init__.py` file to be treated as such. With this structure, all of the following imports will work:
import algos
import algos.graph
import algos.graph as graph_algos
from algos import graph
from algos import graph as graph_algos
What to import
Note that while it is possible to do, e.g.,

import algos.core

this is not the recommended way to use a Python library. Your import statements should only use modules and submodules.
Under `algos/core.py` we have included an implementation of binary search:
def binary_search(arr, x):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] > x:
            hi = mid - 1
        elif arr[mid] < x:
            lo = mid + 1
        else:
            return mid
    return hi
This version of binary search works as follows:
- if `x` is part of `arr`, it returns an index `i` such that `arr[i] = x`
- if `x` is not part of `arr`, it returns the index of the largest element in `arr` that is smaller than `x` (or `-1` if no such element exists).
The `algos/__init__.py` file exposes binary search for imports:
$ cat algos/__init__.py
from .core import binary_search
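For instance, the behavior described above can be checked interactively (an illustrative session, assuming `algos` is on the Python path):

>>> from algos import binary_search
>>> binary_search([-1, 4, 9, 12, 109], 9)    # present at index 2
2
>>> binary_search([-1, 4, 9, 12, 109], 5)    # absent; 4 (index 1) is the largest element below it
1
>>> binary_search([-1, 4, 9, 12, 109], -5)   # smaller than every element
-1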
What kinds of tests should we add? We should always try to cover the following:
- "corner" cases (e.g.
arr
is empty or consists of a single element) - "difficult" inputs (e.g., when
x
smaller or larger than all the elements ofarr
) - unsupported types (not always applicable)
Here are some example tests under `test/test_core.py`:
import unittest
from algos import binary_search

class TestSearch(unittest.TestCase):

    def test_empty(self):
        arr = []
        self.assertEqual(binary_search(arr, -1), -1)
        self.assertEqual(binary_search(arr, 10), -1)

    def test_singleton(self):
        arr = [5]
        self.assertEqual(binary_search(arr, 4), -1)
        self.assertEqual(binary_search(arr, 5), 0)
        self.assertEqual(binary_search(arr, 6), 0)

    def test_nonelement(self):
        arr = [-1, 4, 9, 12, 109]
        self.assertEqual(binary_search(arr, 2), 0)
        self.assertEqual(binary_search(arr, 5), 1)
        self.assertEqual(binary_search(arr, 200), len(arr) - 1)
        self.assertEqual(binary_search(arr, -5), -1)

    def test_element(self):
        arr = [-10, -5, 4, 12, 13, 18]
        # make sure every element is found at its index
        for (idx, elt) in enumerate(arr):
            with self.subTest(idx=idx):
                self.assertEqual(binary_search(arr, elt), idx)
Exercise
Our version of binary search does not always return intuitive results when the element `x` is not part of the array (see e.g. what happens under `test_singleton`). Try the following:
- Modify the test cases so that looking for an element `x` that is not part of the array returns `None`
- Modify `binary_search` so that the new tests pass.
Measuring code coverage
Another important feature in test-driven development is that of code coverage. Code coverage is informally defined below.
Coverage: percentage of code blocks with corresponding unit tests.
Of course, "code blocks" is ambiguous in the above statement. This is on purpose: there are different ways to measure coverage, such as:
- condition coverage: do our tests cover all possible evaluations of conditional statements?
- function coverage: does every function / method have a test?
- edge coverage: do our tests cover every edge in the control flow graph?
Computing code coverage is very tedious to do manually. Instead, we can use a standard Python tool called [coverage](https://coverage.readthedocs.io/) to measure it.
Here is a minimal working example of measuring code coverage:
- Install the library with `pip`:

  $ pip install --user coverage

- Invoke `unittest` using the `coverage` module:

  $ coverage run --source=algos -m unittest

- View the code coverage report:

  $ coverage report -m
  Name                      Stmts   Miss  Cover   Missing
  -------------------------------------------------------
  algos/__init__.py             1      0   100%
  algos/core.py                10      0   100%
  algos/graph/__init__.py       1      0   100%
  algos/graph/core.py           4      2    50%   2, 5
  -------------------------------------------------------
  TOTAL                        16      2    88%
Note that our coverage report indicates that our tests do not cover the functions in `algos/graph/core.py`, and also indicates which statements are not covered in the tests.
Coverage best practices: here are some practical tips related to utilizing code coverage.
- Aim for high code coverage: many developers recommend 80%.
- If you are a maintainer, introduce a policy about tests, e.g.
  - only merge pull requests that introduce new functionality if they also include tests
  - do not merge to `master` if the change would drop code coverage below a certain threshold
- Be ready to redesign your tests if code coverage is consistently low.
- Consider automating code coverage reports in your open source projects.
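As an aside, the `coverage` tool can also generate a browsable HTML report, which makes it easy to see exactly which lines are missed:

$ coverage html   # writes an htmlcov/ directory; open htmlcov/index.html in a browser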
The `doctest` module: embedding tests in documentation
In addition to `unittest` and related modules, there is a way to embed unit tests in your code's documentation. This is made possible via the `doctest` module, which is also included in your Python installation by default.
There are certain advantages to using `doctest`. For example, tests are part of the documentation, which means users can read them off the help page. In addition, running the tests themselves is easier and does not rely on your tests following any naming conventions or a particular project structure. It is ideal if your code is targeted to a specialized audience or just shared back and forth between you and a few collaborators.
On the other hand, `unittest` is better for large, public-facing projects as it separates writing code from writing tests. It is also better at testing certain things, such as exceptions. The reason is that `doctest` describes tests via the read-eval-print loop of Python, and the expected output is given as a string rather than tested via an `assert`-type method.
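To be concrete, `doctest` can still check exceptions by matching the printed traceback text, but this is more brittle than `assertRaises`. Here is a sketch with a hypothetical `safe_sqrt` function (not part of our examples):

import math

def safe_sqrt(x):
    """Return the square root of a nonnegative number.

    >>> safe_sqrt(4.0)
    2.0
    >>> safe_sqrt(-1.0)
    Traceback (most recent call last):
        ...
    ValueError: math domain error
    """
    return math.sqrt(x)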
Here is an example: suppose we are writing a `binomial` function that computes the binomial coefficient \(\binom{n}{k}\):
import math

def binomial(n, k):
    """Implements the `n choose k` operation.

    Note the return values are always integers. If `k > n`, returns
    zero.

    >>> binomial(5, 2)
    10
    >>> binomial(4, 2)
    6
    >>> binomial(2, 4)
    0
    """
    if n < k:
        return 0
    list_top = [i for i in range(k+1, n+1)]
    list_bot = [i for i in range(1, n - k + 1)]
    return math.prod(list_top) // math.prod(list_bot)
Note how the tests are embedded in the documentation: a line starting with `>>>`, which looks like the Python interpreter prompt, indicates a statement to run. The next line contains the expected output.
Running the documentation tests is very simple. One option is to invoke `doctest` from the command line:
$ python -m doctest binom.py
If you see no output, it means the tests all passed! You can get detailed output using the `-v` or `--verbose` flag:
$ python -m doctest -v binom.py
Trying:
binomial(5, 2)
Expecting:
10
ok
Trying:
binomial(4, 2)
Expecting:
6
ok
Trying:
binomial(2, 4)
Expecting:
0
ok
1 items had no tests:
binom
1 items passed all tests:
3 tests in binom.binomial
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
Another option, useful when `binom.py` is intended to be imported rather than run as a script itself, is to add the following code to `binom.py`:
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Running `python binom.py` will now run all the documentation tests in that file. Again, we can use the `-v` flag to get verbose output:
$ python binom.py -v
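Alternatively, if you prefer not to rely on the command-line flag, `doctest.testmod` accepts a `verbose` argument directly (a small variation on the snippet above):

if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)  # always print per-example results, even on success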