{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "offensive-enterprise",
   "metadata": {},
   "source": [
    "# Working with sparse matrices in `scipy`\n",
    "\n",
    "In this notebook, we demonstrate some examples utilizing sparse matrices in `scipy`. Make sure you have `numpy` and `scipy` installed:\n",
    "\n",
    "```bash\n",
    "pip install --user numpy scipy   # or: pip3 install --user numpy scipy\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "average-clinton",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<4x3 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 6 stored elements in Compressed Sparse Row format>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sps\n",
    "\n",
    "# Method 1: create a scipy array directly from numpy / list\n",
    "A = [\n",
    "    [1, 0, 0],\n",
    "    [4, 0, 1],\n",
    "    [0, 0, 2],\n",
    "    [5, 4, 0]\n",
    "]\n",
    "A_sp = sps.csr_matrix(A)    # use csr/csc directly if array is already built\n",
    "A_sp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "seven-cliff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([1, 4, 1, 2, 5, 4]),\n",
       " array([0, 0, 2, 2, 0, 1], dtype=int32),\n",
       " array([0, 1, 3, 4, 6], dtype=int32))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# we can access the fields of A_sp directly\n",
    "A_sp.data, A_sp.indices, A_sp.indptr   # same interface for csc format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "annoying-wings",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Keys: dict_keys([(0, 0), (0, 2), (1, 1), (2, 0), (2, 2), (3, 1)]) Values: dict_values([1, 1, 1, 1, 1, 1])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<4x3 sparse matrix of type '<class 'numpy.int64'>'\n",
       "\twith 6 stored elements in Compressed Sparse Column format>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Method 2: creating an array incrementally\n",
    "A_sp = sps.dok_matrix((4, 3), dtype=np.int)\n",
    "for i in range(4):\n",
    "    for j in range(3):\n",
    "        if (i + j) % 2 == 0:\n",
    "            A_sp[i, j] = 1\n",
    "# dok matrices have similar interface to dict\n",
    "print(\"Keys:\", A_sp.keys(), \"Values:\", A_sp.values())\n",
    "\n",
    "# can convert to other formats\n",
    "A_sp.tocsc()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "addressed-cambodia",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 1],\n",
       "       [0, 1, 0],\n",
       "       [1, 0, 1],\n",
       "       [0, 1, 0]])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# can also convert to numpy (dense) array\n",
    "# do *NOT* use .todense(), use .toarray() instead\n",
    "A_sp.toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "frozen-necklace",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0],\n",
       "       [4, 0, 1],\n",
       "       [0, 0, 2],\n",
       "       [5, 4, 0]])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Method 3: create CSR / CSC directly\n",
    "\n",
    "# CSR: col_idx, row_ptr\n",
    "values = [1, 4, 1, 2, 5, 4]\n",
    "col_id = [0, 0, 2, 2, 0, 1]\n",
    "row_pt = [0, 1, 3, 4, len(values)]\n",
    "\n",
    "A_csr = sps.csr_matrix((values, col_id, row_pt), shape=(4, 3))\n",
    "A_csr.toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "several-portsmouth",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0],\n",
       "       [4, 0, 1],\n",
       "       [0, 0, 2],\n",
       "       [5, 4, 0]])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# CSC: row_idx, col_ptr\n",
    "values = [1, 4, 5, 4, 1, 2]\n",
    "row_id = [0, 1, 3, 3, 1, 2]\n",
    "col_pt = [0, 3, 4, len(values)]\n",
    "\n",
    "A_csc = sps.csc_matrix((values, row_id, col_pt), shape=(4, 3))\n",
    "A_csc.toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "golden-custom",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([[1, 0, 0],\n",
       "        [4, 0, 1],\n",
       "        [0, 0, 2],\n",
       "        [5, 4, 0]]),\n",
       " array([[1, 0, 0],\n",
       "        [4, 0, 1],\n",
       "        [0, 0, 2],\n",
       "        [5, 4, 0]]))"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# another way: use (data, (i, j)) for either CSC/CSR format (convenience method)\n",
    "# note: values can appear in any order (not necessarily per row / column)\n",
    "data = [1, 5, 4, 1, 2, 4]\n",
    "rows = [0, 3, 1, 1, 2, 3]\n",
    "cols = [0, 0, 0, 2, 2, 1]\n",
    "B_csr  = sps.csr_matrix((data, (rows, cols)), shape=(4, 3))\n",
    "B_csc  = sps.csc_matrix((data, (rows, cols)), shape=(4, 3))\n",
    "B_csr.toarray(), B_csc.toarray()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "certified-preserve",
   "metadata": {},
   "source": [
    "## Linear algebra in `scipy`\n",
    "\n",
    "Dedicated modules:\n",
    "1. `scipy.sparse.linalg` for linear algebra with sparse matrices\n",
    "1. `scipy.linalg` for numpy-like linear algebra (many common methods, more fine-grained control)\n",
    "\n",
    "**Note**: in `numpy`, `A * B` is elementwise multiplication (with broadcasting). But if `A` or `B` is a scipy sparse array, `*` stands for usual matrix-matrix multiplication. Use `A.multiply(B)` for elementwise multiplication instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "fixed-second",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([[1, 0, 1],\n",
       "        [0, 1, 0],\n",
       "        [1, 0, 1],\n",
       "        [0, 1, 0]]),\n",
       " array([[1, 0, 0],\n",
       "        [4, 0, 1],\n",
       "        [0, 0, 2],\n",
       "        [5, 4, 0]]),\n",
       " array([[2, 0, 1],\n",
       "        [4, 1, 1],\n",
       "        [1, 0, 3],\n",
       "        [5, 5, 0]]))"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A_sp.toarray(), B_csc.toarray(), (A_sp + B_csc).toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "bizarre-impossible",
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "dimension mismatch",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-41-f1afa1f4998b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mA_sp\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mB_csc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/.local/lib/python3.9/site-packages/scipy/sparse/base.py\u001b[0m in \u001b[0;36m__mul__\u001b[0;34m(self, other)\u001b[0m\n\u001b[1;32m    477\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0missparse\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    478\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 479\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'dimension mismatch'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    480\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_mul_sparse_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    481\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: dimension mismatch"
     ]
    }
   ],
   "source": [
    "A_sp * B_csc   # this will try matrix multiplication, but shapes don't match"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "featured-favorite",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 0, 0],\n",
       "       [0, 0, 0],\n",
       "       [0, 0, 2],\n",
       "       [0, 4, 0]])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the below will work\n",
    "C = A_sp.multiply(B_csc) # or B_csc.multiply(A_sp)\n",
    "C.toarray()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "mathematical-satisfaction",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([0.2838    , 3.0328217 , 3.79524337, 1.68237336]),\n",
       " array([0.2838    , 3.0328217 , 3.79524337, 1.68237336]))"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# matrix-vector multiplication\n",
    "v = np.random.randn(3)\n",
    "A_csr @ v, A_csr * v       # same results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "identical-cream",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(array([[1, 0, 0],\n",
       "        [4, 1, 0],\n",
       "        [0, 2, 0],\n",
       "        [5, 0, 4]]),\n",
       " array([[1, 0, 0],\n",
       "        [4, 1, 0],\n",
       "        [0, 2, 0],\n",
       "        [5, 0, 4]]))"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# matrix-matrix multiplication\n",
    "V = sps.csc_matrix([[1, 0, 0], [0, 0, 1], [0, 1, 0]])\n",
    "(A_csr @ V).toarray(), (A_csr * V).toarray()    # same results with `@` and `*`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "violent-calculation",
   "metadata": {},
   "source": [
    "The `scipy.sparse.linalg` module offers several implementations of iterative algorithms for solving systems of the form $Ax = b$, least-squares problems, and eigenvalue problems. A full list can be found [here](https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "id": "irish-excerpt",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MINRES distance: 8.497616290366594e-06\n"
     ]
    }
   ],
   "source": [
    "import scipy.sparse.linalg as sps_linalg\n",
    "\n",
    "# setup a sparse linear system\n",
    "A = np.random.randint(0, 2, (100, 100))\n",
    "A = sps.csr_matrix(A + A.T, dtype=np.float64)\n",
    "b = np.random.rand(100)\n",
    "\n",
    "# Example 1: solve using MINRES (for symmetric matrices)\n",
    "x_minres, status = sps_linalg.minres(A, b, tol=1e-8)\n",
    "print(\"MINRES distance:\", np.linalg.norm(A @ x_minres - b))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "id": "analyzed-folder",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LSQR error: 1.3225258756063518\n"
     ]
    }
   ],
   "source": [
    "# Example 2: solve noisy system using least squares\n",
    "m, n = 250, 50\n",
    "A_ls = np.random.rand(250, 50)\n",
    "A_ls[A_ls < 0.5] = 0.0\n",
    "A_ls = sps.csr_matrix(A_ls)\n",
    "x = np.random.rand(50)\n",
    "b_ls = A_ls @ x + 0.1 * np.random.randn(250)\n",
    "\n",
    "x_ls = sps_linalg.lsqr(A_ls, b_ls)[0]\n",
    "print(\"LSQR error:\", np.linalg.norm(A_ls @ x_ls - b_ls))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "living-audience",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-13.59685714, -13.02633564, -12.66742863, -11.95097689,\n",
       "       -11.79172183,  12.02290524,  12.35624536,  13.53789273,\n",
       "        14.55214696, 102.60929514])"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Example 3: Compute the eigenvectors and eigenvalues of a sparse matrix\n",
    "# Use which='LM' to get largest magnitude eigenvalues\n",
    "λ, V = sps_linalg.eigsh(A, k=10, which='LM')          # k=6 by default\n",
    "λ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "smooth-volunteer",
   "metadata": {},
   "outputs": [],
   "source": [
    "# use which='LA' to get algebraically largest eigenvalues\n",
    "λ_a, V_a = sps.linalg.eigsh(A, k=10, which='LA')\n",
    "λ_a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "id": "bridal-hostel",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "((100, 10), (100, 10))"
      ]
     },
     "execution_count": 129,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# V and V_a have the 10 eigenvectors as their columns\n",
    "V.shape, V_a.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aquatic-ribbon",
   "metadata": {},
   "source": [
    "### Application: Spectral Clustering\n",
    "\n",
    "In spectral clustering, we want a partition a graph containing nodes and edges that connect each other to a set of distinct \"communities\". For simplicity, we treat the undirected setting.\n",
    "\n",
    "Formally, our graph is represented by an *adjacency matrix* $A \\in \\mathbb{R}^{n \\times n}$, such that\n",
    "\n",
    "$$\n",
    "A_{ij} = \\begin{cases}\n",
    "1, & \\text{ if node $i$ is connected to node $j$}, \\\\\n",
    "0, & \\text{ otherwise}\n",
    "\\end{cases}\n",
    "$$\n",
    "\n",
    "Here we are implicitly assuming that nodes are numbered from $1$ to $n$. Moreover, let $D$ be a diagonal matrix such that\n",
    "$D_{ii} = \\sum_{j=1}^n A_{ij}$, i.e., the number of neighbors of node $i$. We then form the *normalized adjacency matrix*:\n",
    "\n",
    "$$\n",
    "A_N = D^{-1/2} A D^{-1/2}.\n",
    "$$\n",
    "\n",
    "Here, $A_N$ is \"weighting\" each edge $(i, j)$ by the harmonic mean of the number of neighbors of $i$ and $j$. Spectral clustering computes the top-$k$ eigenvectors, $V \\in \\mathbb{R}^{n \\times k}$, and treats each row, $V_{i}$, as the $k$-dimensional \"feature\" of node $i$. **These features are then fed to a point clustering algorithm such as k-means.**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "engaged-romantic",
   "metadata": {},
   "source": [
    "The graph we will build here is called a Stochastic Block Model (SBM) with $k$ communities. It is a random graph where each node is randomly assigned to one of the $k$ communities. Then, for each pair of nodes $i$ and $j$:\n",
    "\n",
    "- if $i$ and $j$ belong to the same community, they are connected with an edge with probability $p$.\n",
    "- if $i$ and $j$ belong to different communities, they are connected with an edge with probability $q \\ll p$.\n",
    "\n",
    "The functions below generate the (sparse) adjacency matrix of an instance of an SBM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "stretch-locking",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import random\n",
    "import scipy.sparse as sps\n",
    "\n",
    "def gen_sbm(n, k, p, q):\n",
    "    \"\"\"\n",
    "    Generate a k-community SBM where every node has the same probability\n",
    "    of belonging to either community.\n",
    "    \n",
    "    Arguments\n",
    "    ---------\n",
    "    n : int\n",
    "        The number of nodes\n",
    "    k : int\n",
    "        The number of communities\n",
    "    p, q : float\n",
    "        The probability of intra-group and inter-group edges, respectively.\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    communities : list\n",
    "        A list containing the community of each node\n",
    "    A : scipy.sparse.spmatrix\n",
    "        The adjacency matrix of the graph\n",
    "    \"\"\"\n",
    "    A = sps.dok_matrix((n, n), dtype=np.int)\n",
    "    communities = [0] * n\n",
    "    # generate community memberships\n",
    "    for i in range(n):\n",
    "        communities[i] = random.randint(0, k-1)\n",
    "    # generate edges\n",
    "    # NOTE: random.random() <= x is event with probability x if 0 < x < 1\n",
    "    for i in range(n):\n",
    "        for j in range(i+1, n):   # no self-loops\n",
    "            if communities[i] == communities[j]:\n",
    "                A[i, j] = random.random() <= p      # automatically cast to 1 if True\n",
    "            else:\n",
    "                A[i, j] = random.random() <= q\n",
    "            A[j, i] = A[i, j]                       # symmetry\n",
    "    # return membership and adjacency matrix\n",
    "    return communities, A.tocsr()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "varying-headline",
   "metadata": {},
   "source": [
    "We now need some utility functions for forming the normalized adjacency matrix and computing its top-$k$ eigenvectors. Since we are dealing with sparse matrices, we will use the linear algebra routines found under the `sparse` module.\n",
    "\n",
    "Recall that for the normalized adjacency matrix we want to form $A_N = D^{-1/2} A D^{-1/2}$, where $D_{ii} = \\sum_{j=1}^n A_{ij}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "id": "ready-darkness",
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalized_adjacency(A):\n",
    "    \"\"\"\n",
    "    Given an adjacency matrix `A`, convert it to a normalized adjacency matrix.\n",
    "    \n",
    "    Arguments\n",
    "    ---------\n",
    "    A : scipy.sparse.spmatrix\n",
    "        The adjacency matrix\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    scipy.sparse.spmatrix\n",
    "        The normalized adjacency matrix\n",
    "    \"\"\"\n",
    "    n = A.shape[0]\n",
    "    d = A * np.ones(n, dtype=np.int)     # sum_{j} A_{i,j} for each i\n",
    "    Dinv = sps.diags(1 / np.sqrt(d))     # create D^(-1/2)\n",
    "    return Dinv * A * Dinv              # normalized adjacency matrix\n",
    "\n",
    "\n",
    "def top_k_evecs(A, k):\n",
    "    \"\"\"\n",
    "    Compute the top-k (largest in magnitude) eigenvectors of a symmetric matrix `A`.\n",
    "    \"\"\"\n",
    "    _, evecs = sps_linalg.eigsh(A, k)\n",
    "    return evecs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "healthy-wales",
   "metadata": {},
   "source": [
    "We now create an SBM with $k = 2$ communities and $n = 100$ nodes and use the k-means algorithm to classify each node to a community using its coefficients in the top-$k$ eigenvectors as node features. The k-means algorithm is also available in `scipy` in the `cluster.vq` module.\n",
    "\n",
    "**Note**: below we use `kmeans2`, which returns the list of assigned labels, instead of `kmeans`, which only returns the centroids found by the k-means algorithm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "id": "appropriate-technology",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.cluster.vq import kmeans2\n",
    "\n",
    "communities, A = gen_sbm(100, 2, 0.25, 0.05)\n",
    "V = top_k_evecs(normalized_adjacency(A), 2)\n",
    "_, labels = kmeans2(V, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ready-recipe",
   "metadata": {},
   "source": [
    "At this point, note that the labels we generated could be permuted arbitrarily (i.e. we could make all `0`s into `1`s and vice-versa) without changing the community structure. Therefore, when computing the accuracy of our cluster assignment, we should consider both versions. This is reflected in the evaluation function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "id": "incorrect-segment",
   "metadata": {},
   "outputs": [],
   "source": [
    "def cluster_acc(true_labels, pred_labels):\n",
    "    return max(np.mean(true_labels == pred_labels), np.mean(true_labels == 1 - pred_labels))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "id": "animal-cedar",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 133,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cluster_acc(communities, labels)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "loved-helmet",
   "metadata": {},
   "source": [
    "### References\n",
    "\n",
    "[Scipy Lecture Notes]("
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}