Weeks 7 - 8: Test-driven development; Python documentation
Test-driven development
Next on our agenda is test-driven development (TDD). In test-driven development, we use software tests to catch bugs in our program, but also to provide a roadmap or specification for how it should behave. Here, we are mainly interested in unit tests, which are defined below, even though there are several other testing models.
Software (unit) test: A program that compares the output of a piece of code to its expected output.
Test-driven development builds on the following principle:
Testable code is good code.
Some examples of tests follow:
- a test that ensures adding elements maintains the heap invariant
- a test that ensures that `pop()` will return the last element added to a stack
- a test that calls a custom function with an unsupported type and ensures that it will throw a `TypeError`
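To make the idea concrete, here is a minimal sketch of the second test above, written with a plain `assert` statement (the `Stack` class is hypothetical at this point; we build one later in these notes):

def test_pop_returns_last_pushed():
    s = Stack()
    s.push(1)
    s.push(2)
    # the most recently pushed element should come back first
    assert s.pop() == 2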
In test-driven development, tests are written before the source code! The TDD workflow is as follows:
1. Plan a new piece of functionality (e.g., the `push(x)` operation on a stack). This step involves writing tests that help describe the desired functionality, before writing any code for the function itself.
2. Implement the new functionality. This step ends when all test cases from Step 1 pass, indicating the new component works as expected.
3. Refactor your code. Coding style, code efficiency, etc. can be taken care of in this step.
The above is not always strictly followed. In particular, steps 1-2 can be iterated, and we can always write more tests for existing software components or refactor our code in the future.
Python project structure
In modern software development, it is standard practice for every open-source project to "ship" with its own test suite, i.e., a collection of software tests that an end user can run after installing the module to ensure it runs as intended on their machine / environment.
This introduces a great opportunity to talk about the structure of a typical open-source project. Many open-source projects follow the structure shown below:
project/
    src/
        <source code goes here>
    test/
        <software tests go here>
    doc/
        src/
            <documentation source, e.g., Markdown files>
        build/
            <build location for the documentation>
    examples/
        <examples of using the code>
    README
    INSTALL
    LICENSE
    VERSION
Some remarks are in order:
- the file called `README` provides a synopsis of the project and may include self-contained examples, guidelines for reporting bugs, installing, etc.
- the file called `INSTALL` includes installation instructions, but can be omitted if these are particularly simple and can be included in `README`
- `LICENSE` contains the software license under which the software is offered. A list of open source licenses is available here. You can also read what happens if you omit a `LICENSE` file.
- including `examples/` of usage is optional. If the examples are simple enough, they can go in the `README` file. However, it is always a good idea to add examples that demonstrate use of your library / module in nontrivial settings here.
Python projects obey a similar structure. However, some things change slightly. For example, if your module is called `my_module`, you would be using that name instead of `src/`.
project/
    my_module/
        __init__.py
        # other .py files here
        my_submodule/
            __init__.py
            # submodule .py files here
        another_submodule/
            __init__.py
            # submodule .py files here
    test/
        # test code here
    ...
Here, `__init__.py` is a special kind of file; it informs Python that the folder it is located in is actually part of a Python module. It allows you to, e.g., write
>>> import my_module
>>> import my_module.my_submodule
>>> import my_module.another_submodule
If you had omitted the `__init__.py` file in the `another_submodule/` folder, you would not be able to import it as in the code snippet above.
Note
The `__init__.py` file is responsible for a number of other things, like exposing functions and classes to the end user. It is implicitly executed when you import the module it lives inside. You can read more here, in addition to the examples we will be presenting here.
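As a small illustration of what "exposing" means, an `__init__.py` can re-export selected names so that users can import them directly from the package. A sketch (the submodule and function names below are hypothetical):

# my_module/__init__.py
# Re-export a name so users can write `from my_module import useful_function`
# instead of reaching into the submodule that defines it.
from .my_submodule import useful_function

__all__ = ["useful_function"]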
Example I: writing tests for a stack
Suppose we want to implement our own stack in Python, backed by a Python list. The first stage in test-driven development is to identify what functionality we want to implement - e.g., what operations our stack should support. Here is what the project would look like:
stackProject/
    stack/
        __init__.py
        stack.py
    test/
        __init__.py
        test_stack.py
    # INSTALL, LICENSE etc.
We expect to define a class called `Stack` in `stack/stack.py`. We would also like to expose this class to a user that imports the `stack` module. This means that our `stack/__init__.py` file should look like this:
from .stack import Stack
Initially, our `stack/stack.py` file only contains a skeleton of the class, since in test-driven development we write our test suite before writing the actual implementation. Here is how `stack/stack.py` will look:
class Stack(object):
    def __init__(self):
        self.stack = []

    def __len__(self):
        return len(self.stack)

    def push(self, x):
        pass

    def pop(self):
        pass

    def peek(self):
        pass

    def clear(self):
        pass
To write our unit tests, we will use the help of a library. There are several libraries for testing in Python, e.g.:
- `unittest`: a built-in module for writing unit tests.
- `nose`: extends `unittest` with extra functionality.
- `hypothesis`: a module for property-based testing.
Here, we will use `unittest`. With `unittest`, we create a class that subclasses `unittest.TestCase` and contains our tests. We typically create a new such class for each piece of code we are testing (e.g., one test class per submodule, or even one test class per class in the source code).
To write test cases, we need to figure out what we want our `Stack` class to implement. Here is a list of operations we would like it to support:
- `__init__(self)`: initialize the stack with an empty list
- `push(self, x)`: push an element onto the stack
- `pop(self)`: remove the element on top of the stack and return it
- `peek(self)`: return the element on top of the stack without removing it
- `__len__(self)`: return the number of elements in the stack
- `clear(self)`: empty the stack
Here is what (an incomplete version of) `test_stack.py` could look like:
import unittest
from stack import Stack

# convention: test classes should be named Test<X>
class TestStack(unittest.TestCase):

    # setUp(): a special method to run before each test method
    def setUp(self):
        """Create empty stack before each test"""
        self.stack = Stack()

    # convention: test methods should be named test_<name>
    def test_push(self):
        self.assertEqual(0, len(self.stack))
        self.stack.push(0)
        self.assertEqual(1, len(self.stack))
        self.stack.push(100)
        self.assertEqual(2, len(self.stack))
        self.stack.pop()
        self.assertEqual(1, len(self.stack))

    def test_pop_empty(self):
        # should raise a ValueError when popping empty stack
        with self.assertRaises(ValueError):
            self.stack.pop()

    def test_pop(self):
        self.stack.push(5)
        self.stack.push(10)
        self.assertEqual(10, self.stack.pop())
Naming conventions
The naming conventions used above are important. In particular, if we want `unittest` to discover our tests automatically, we should follow these conventions:
- tests should live inside a `test/` or `tests/` folder
- all test classes should be named like `Test<Name>`
- all test methods should be named like `test_<name>`

In addition, the `test/` subdirectory should contain an `__init__.py` file as well.
Assertions
In most tests, we are asserting that a condition is true. This is done using the `assert<X>` family of methods provided by `unittest.TestCase`. For example, we write `self.assertEqual(1, len(self.stack))` to make sure that the length of the stack is 1. It is pointless to memorize all these methods, since there are many of them and their usage is similar. You can check the official documentation for the most commonly used ones.
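For reference, here is a small sketch (not part of our actual test suite) showing a few of the most commonly used assertion methods:

class TestStack(unittest.TestCase):
    ...

    def test_assertion_examples(self):
        self.assertEqual(len(self.stack), 0)        # equality
        self.assertTrue(len(self.stack) == 0)       # a condition is true
        self.assertIsNone(self.stack.push(1))       # a value is None (push returns nothing)
        self.assertAlmostEqual(0.1 + 0.2, 0.3)      # floating-point comparison
        with self.assertRaises(ZeroDivisionError):  # the block must raise this exception
            1 / 0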
Now, to run our tests, we should navigate to the top-level `stackProject/` folder and run:
python -m unittest
At this point, all our tests will fail. The output will look something like below:
FFF
======================================================================
FAIL: test_pop (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 25, in test_pop
self.assertEqual(10, self.stack.pop())
AssertionError: 10 != None
======================================================================
FAIL: test_pop_empty (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 20, in test_pop_empty
self.stack.pop()
AssertionError: ValueError not raised
======================================================================
FAIL: test_push (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Documents/stack_example/test/test_stack.py", line 15, in test_push
self.assertEqual(1, len(self.stack))
AssertionError: 1 != 0
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (failures=3)
The top line is a string `FFF`, which indicates that all 3 tests failed. The next few lines give detailed information for each test.
Let us try to fix the code in `Stack`. In `stack/stack.py`, we modify the code for the 3 methods that currently do nothing as follows:
class Stack(object):
    ...

    def push(self, x):
        self.stack.append(x)

    def pop(self):
        return self.stack.pop()  # python lists offer a pop() function

    def peek(self):
        return self.stack[-1]
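As a quick sanity check outside the test suite, the class should now behave as expected in an interactive session (an illustrative sketch):

>>> from stack import Stack
>>> s = Stack()
>>> s.push(1)
>>> s.push(2)
>>> s.pop()
2
>>> len(s)
1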
Let us re-run the tests:
$ python -m unittest
.E.
======================================================================
ERROR: test_pop_empty (test.test_stack.TestStack)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/chunkster/Documents/ORIE/TA/orie-5270-spring-2021/examples/week7/stack_example/test/test_stack.py", line 20, in test_pop_empty
self.stack.pop()
File "/home/chunkster/Documents/ORIE/TA/orie-5270-spring-2021/examples/week7/stack_example/stack/stack.py", line 13, in pop
return self.stack.pop()
IndexError: pop from empty list
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (errors=1)
Two of our tests passed, but the test involving an empty stack failed. The reason it failed was that we expected `pop` on an empty stack to raise a `ValueError`, but it raised an `IndexError` instead. Therefore, we need to fix that:
class Stack(object):
    ...

    def pop(self):
        try:
            return self.stack.pop()
        except IndexError:
            raise ValueError("Popping from an empty stack!")
All our tests will pass now.
Test-driven development can influence design choices
Our test class does not use `peek` and `clear` at all. Implementing a test for `clear` is straightforward, since the intended behavior is simple to identify; see the sketch below.
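For instance, a test for `clear` might look like the following (a sketch, assuming `clear` should simply empty the stack):

class TestStack(unittest.TestCase):
    ...

    def test_clear(self):
        self.stack.push(1)
        self.stack.push(2)
        self.stack.clear()
        self.assertEqual(0, len(self.stack))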
However, the case of `peek` is more subtle. Notice that `peek` in its current form assumes that we always call it on a nonempty stack. What should `peek()` do if the stack is empty? This is one of the many instances where tests can dictate the expected behavior.
For example, suppose we added the following test in `test_stack.py`:
class TestStack(unittest.TestCase):
    ...

    def test_peek_empty(self):
        with self.assertRaises(ValueError):
            self.stack.peek()
The above test would only succeed if peeking at an empty stack raises a `ValueError` (consistent with the `test_pop_empty()` test case). Therefore, it instructs us to modify the code for `peek` as follows:
class Stack(object):
    ...

    def peek(self):
        try:
            return self.stack[-1]
        except IndexError:
            raise ValueError("Peeking into an empty stack!")
Importantly, the intended behavior was completely described by a test case.
Test skipping decorators
The `unittest` library defines a collection of decorators that allow you to execute or skip tests conditionally. These can be useful for:
- distinguishing between "legacy" and "new" behavior when running tests
- differentiating between different platforms (e.g. Windows vs. Linux)
Some examples follow below:
import sys
import unittest

import stack
from stack import Stack

class TestStack(unittest.TestCase):
    ...

    @unittest.skipIf(stack.__version__ < (1, 0),
                     "Not supported in this version!")
    def test_stack_printing(self):
        ...  # test code

    @unittest.skipUnless(sys.platform.startswith("win"),
                         "Requires Windows")
    def test_windows_support(self):
        ...  # test code
The skipping decorators are documented here.
Example II: An algorithms module
Let us look at another example. Suppose we have a module called `algos` which contains implementations of standard algorithms. Here is the project structure:
project/
    algos/
        __init__.py
        core.py
        graph/
            __init__.py
            core.py
    test/
        __init__.py
        test_core.py
        test_graph.py
Note that the above contains a submodule called `graph`, which could contain implementations of standard graph algorithms. Since it is a submodule, it requires its own `__init__.py` file to be treated as such. With this structure, all of the following imports will work:
import algos
import algos.graph
import algos.graph as graph_algos
from algos import graph
from algos import graph as graph_algos
What to import
Note that while it is possible to do, e.g.,

import algos.core

this is not the recommended way to use a Python library. Your import statements should only use modules and submodules.
Under `algos/core.py` we have included an implementation of binary search:
def binary_search(arr, x):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] > x:
            hi = mid - 1
        elif arr[mid] < x:
            lo = mid + 1
        else:
            return mid
    return hi
This version of binary search works as follows:
- if `x` is part of `arr`, it returns an index `i` such that `arr[i] = x`
- if `x` is not part of `arr`, it returns the index of the largest element in `arr` that is smaller than `x` (or `-1` if no such element exists).
The `algos/__init__.py` file exposes binary search for imports:
$ cat algos/__init__.py
from .core import binary_search
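For instance, the behavior described above can be checked interactively (an illustrative session, assuming `algos` is on the Python path):

>>> from algos import binary_search
>>> binary_search([-1, 4, 9, 12, 109], 9)    # present at index 2
2
>>> binary_search([-1, 4, 9, 12, 109], 5)    # absent; 4 (index 1) is the largest element below it
1
>>> binary_search([-1, 4, 9, 12, 109], -5)   # smaller than every element
-1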
What kinds of tests should we add? We should always try to cover the following:
- "corner" cases (e.g.
arr
is empty or consists of a single element) - "difficult" inputs (e.g., when
x
smaller or larger than all the elements ofarr
) - unsupported types (not always applicable)
Here are some example tests under `test/test_core.py`:
import unittest
from algos import binary_search

class TestSearch(unittest.TestCase):

    def test_empty(self):
        arr = []
        self.assertEqual(binary_search(arr, -1), -1)
        self.assertEqual(binary_search(arr, 10), -1)

    def test_singleton(self):
        arr = [5]
        self.assertEqual(binary_search(arr, 4), -1)
        self.assertEqual(binary_search(arr, 5), 0)
        self.assertEqual(binary_search(arr, 6), 0)

    def test_nonelement(self):
        arr = [-1, 4, 9, 12, 109]
        self.assertEqual(binary_search(arr, 2), 0)
        self.assertEqual(binary_search(arr, 5), 1)
        self.assertEqual(binary_search(arr, 200), len(arr) - 1)
        self.assertEqual(binary_search(arr, -5), -1)

    def test_element(self):
        arr = [-10, -5, 4, 12, 13, 18]
        # make sure every element is found at its index
        for (idx, elt) in enumerate(arr):
            with self.subTest(idx=idx):
                self.assertEqual(binary_search(arr, elt), idx)
Exercise
Our version of binary search does not always return intuitive results when the element `x` is not part of the array (see e.g. what happens under `test_singleton`). Try the following:
- Modify the test cases so that looking for an element `x` that is not part of the array returns `None`
- Modify `binary_search` so that the new tests pass.
Measuring code coverage
Another important feature in test-driven development is that of code coverage. Code coverage is informally defined below.
Coverage: percentage of code blocks with corresponding unit tests.
Of course, "code blocks" is ambiguous in the above statement. This is on purpose: there are different ways to measure coverage, such as:
- condition coverage: do our tests cover all possible evaluations of conditional statements?
- function coverage: does every function / method have a test?
- edge coverage: do our tests cover every edge in the control flow graph?
Computing code coverage is very tedious to do manually. Instead, we can use a standard Python tool called [coverage](https://coverage.readthedocs.io/) to measure it.
Here is a minimal working example of measuring code coverage:
- Install the library with `pip`:

  $ pip install --user coverage

- Invoke `unittest` using the `coverage` module:

  $ coverage run --source=algos -m unittest

- View the code coverage report:

  $ coverage report -m
  Name                      Stmts   Miss  Cover   Missing
  -------------------------------------------------------
  algos/__init__.py             1      0   100%
  algos/core.py                10      0   100%
  algos/graph/__init__.py       1      0   100%
  algos/graph/core.py           4      2    50%   2, 5
  -------------------------------------------------------
  TOTAL                        16      2    88%
Note that our coverage report indicates that our tests do not cover the functions in `algos/graph/core.py`, and also indicates which statements are not covered in the tests.
Coverage best practices: here are some practical tips related to utilizing code coverage.
- Aim for high code coverage: many developers recommend 80%.
- If you are a maintainer, introduce a policy about tests, e.g.
  - only merge pull requests that introduce new functionality if they also include tests
  - do not merge to `master` if the change would drop code coverage below a certain threshold
- Be ready to redesign your tests if code coverage is consistently low.
- Consider automating code coverage reports in your open source projects.
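As an aside, the `coverage` tool can also generate a browsable HTML report, which makes it easy to see exactly which lines are missed:

$ coverage html   # writes an htmlcov/ directory; open htmlcov/index.html in a browser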
The `doctest` module: embedding tests in documentation
In addition to `unittest` and related modules, there is a way to embed unit tests in your code's documentation. This is made possible via the `doctest` module, which is also included in your Python installation by default.
There are certain advantages to using `doctest`. For example, tests are part of the documentation, which means users can read them off the help page. In addition, running the tests themselves is easier and does not rely on your tests following any naming conventions or a particular project structure. It is ideal if your code is targeted to a specialized audience or just shared back and forth between you and a few collaborators.
On the other hand, `unittest` is better for large, public-facing projects as it separates writing code from writing tests. It is also better at testing certain things, such as exceptions. The reason is that `doctest` describes tests via the read-eval-print loop of Python, and the expected output is given as a string rather than tested via an `assert`-type method.
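To be concrete, `doctest` can still check exceptions by matching the printed traceback text, but this is more brittle than `assertRaises`. Here is a sketch with a hypothetical `safe_sqrt` function (not part of our examples):

import math

def safe_sqrt(x):
    """Return the square root of a nonnegative number.

    >>> safe_sqrt(4.0)
    2.0
    >>> safe_sqrt(-1.0)
    Traceback (most recent call last):
        ...
    ValueError: math domain error
    """
    return math.sqrt(x)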
Here is an example: suppose we are writing a `binomial` function that computes the binomial coefficient \(\binom{n}{k}\):
import math

def binomial(n, k):
    """Implements the `n choose k` operation.

    Note the return values are always integers. If `k > n`, returns
    zero.

    >>> binomial(5, 2)
    10
    >>> binomial(4, 2)
    6
    >>> binomial(2, 4)
    0
    """
    if n < k:
        return 0
    list_top = [i for i in range(k+1, n+1)]
    list_bot = [i for i in range(1, n - k + 1)]
    return math.prod(list_top) // math.prod(list_bot)
Note how the tests are embedded in the documentation: a line starting with `>>>`, which looks like the Python interpreter prompt, indicates a statement to run. The next line contains the expected output.
Running the documentation tests is very simple. One option is to invoke `doctest` from the command line:
$ python -m doctest binom.py
If you see no output, it means the tests all passed! You can get detailed output using the `-v` or `--verbose` flag:
$ python -m doctest -v binom.py
Trying:
binomial(5, 2)
Expecting:
10
ok
Trying:
binomial(4, 2)
Expecting:
6
ok
Trying:
binomial(2, 4)
Expecting:
0
ok
1 items had no tests:
binom
1 items passed all tests:
3 tests in binom.binomial
3 tests in 2 items.
3 passed and 0 failed.
Test passed.
Another option, useful when `binom.py` is intended to be imported rather than run as a script itself, is to add the following code to `binom.py`:
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Running `python binom.py` will now run all the documentation tests in that file. Again, we can use the `-v` flag to get verbose output:
$ python binom.py -v
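Alternatively, if you prefer not to rely on the command-line flag, `doctest.testmod` accepts a `verbose` argument directly (a small variation on the snippet above):

if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)  # always print per-example results, even on success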