Minimization for Visualization
In today’s lecture, we’ll study the Kamada-Kawai algorithm for network visualization. We can think of the primary problem in network visualization as assigning to each node \(i\) a coordinate \(\mathbf{x}_i\) in \(d\)-dimensional space (for drawing a network on paper or screen, \(d = 2\)). We can collect all the node coordinates into a matrix \(\mathbf{X} \in \mathbb{R}^{n\times d}\), where the \(i\)th row is \(\mathbf{x}_i\). So, our challenge is to find a “good” matrix \(\mathbf{X}\) that leads to a nice drawing.
The way we do this algorithmically in the Kamada-Kawai algorithm is by minimizing a function of the form
\[ \begin{aligned} C(\mathbf{X}) = \sum_{i \in N} \sum_{j \neq i} h_{ij}\left(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert - \ell_{ij}\right)^2 \end{aligned} \] for constants \(h_{ij}\) and \(\ell_{ij}\) that depend on the network structure. The goal is to find the matrix \(\mathbf{X}\) that minimizes \(C(\mathbf{X})\). In practice, we can’t usually find a global minimizer of \(C\) and so we instead settle for heuristics that will get us to a local minimum.
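To make the objective concrete, here is a minimal NumPy sketch of how \(C(\mathbf{X})\) could be evaluated directly from the double sum above. The function name `stress` and the arrays `h` and `ell` (holding the constants \(h_{ij}\) and \(\ell_{ij}\)) are illustrative choices of ours, not part of any particular library.

```python
import numpy as np

def stress(X, h, ell):
    """Evaluate the Kamada-Kawai cost C(X) from its definition.

    X   : (n, d) array of node coordinates (row i is x_i).
    h   : (n, n) array of the constants h_ij.
    ell : (n, n) array of the target lengths ell_ij.
    """
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = np.linalg.norm(X[i] - X[j])          # ||x_i - x_j||
            total += h[i, j] * (dist - ell[i, j]) ** 2  # one term of the double sum
    return total
```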
In class, we’ll use a built-in optimization function that progressively minimizes \(C\) by updating the coordinates \(\mathbf{x}_i\) of each node \(i\), cycling through the nodes until we converge to a solution. This function relies on gradient descent, and in this problem you will compute the gradient of \(C\) with respect to the node coordinates.
The gradient of \(C\) with respect to the \(i\)th node coordinate \(\mathbf{x}_i\), which we’ll call \(\nabla_i C(\mathbf{X})\), is the vector of partial derivatives of \(C\) with respect to each component of \(\mathbf{x}_i\). That is,
\[ \begin{aligned} \nabla_i C(\mathbf{X}) = \left[\frac{\partial C}{\partial x_{i1}}, \frac{\partial C}{\partial x_{i2}}, \ldots, \frac{\partial C}{\partial x_{id}}\right] \end{aligned} \]
If we know the gradient, then we can perform an epoch of our minimization algorithm like this:
For each node \(i\):
- While not converged:
  - Compute the gradient \(\nabla_i C(\mathbf{X})\).
  - Update the node coordinate \(\mathbf{x}_i\) by moving in the direction of the negative gradient: \(\mathbf{x}_i \leftarrow \mathbf{x}_i - \alpha \nabla_i C(\mathbf{X})\) for some small step size \(\alpha\).
We then repeat this procedure for some number of epochs until the improvement in the overall cost function \(C\) becomes sufficiently small.
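To tie the steps above together, here is one possible sketch of such an epoch in NumPy. The names `epoch` and `grad_i` are hypothetical: `grad_i` stands in for whatever routine computes \(\nabla_i C(\mathbf{X})\), which is exactly the quantity this problem is about.

```python
import numpy as np

def epoch(X, grad_i, alpha=0.01, tol=1e-6, max_steps=100):
    """One epoch of coordinate-wise gradient descent on C.

    X       : (n, d) array of node coordinates, updated in place.
    grad_i  : callable grad_i(X, i) returning the gradient of C
              with respect to row i of X.
    alpha   : step size.
    tol     : a node is considered converged once its step is this small.
    """
    n = X.shape[0]
    for i in range(n):              # cycle through the nodes
        for _ in range(max_steps):  # "while not converged" for node i
            g = grad_i(X, i)
            X[i] -= alpha * g       # move against the gradient
            if np.linalg.norm(alpha * g) < tol:
                break
    return X
```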
What You Should Do
Ok, that’s a lot of motivating preamble! To perform this algorithm, we need to be able to compute the gradient \(\nabla_u C(\mathbf{X})\) with respect to the position of a single node \(u\). To keep things manageable, we’re going to assume that \(d = 1\), so each node position is actually a scalar. The matrix \(\mathbf{X}\) becomes a vector \(\mathbf{x}\), and we have
\[ \begin{aligned} C(\mathbf{x}) = \sum_{i \in N} \sum_{j \neq i} h_{ij}\left(\lvert x_i - x_j \rvert - \ell_{ij}\right)^2 \end{aligned} \]
Please differentiate the function \(C\) with respect to the position \(x_u\) of a single node \(u\), holding the positions of the other nodes constant. You may assume:
- \(x_i \neq x_j\) for all \(i \neq j\), which ensures that the function is indeed differentiable.
- The derivative of the function \(\lvert z \rvert\) when \(z \neq 0\) is \(\mathrm{sgn}(z)\), the sign of \(z\).
- It holds that \(h_{ij} = h_{ji}\) and \(\ell_{ij} = \ell_{ji}\) for all \(i\) and \(j\).
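If you want to sanity-check the expression you derive, one option (purely optional, with hypothetical helper names) is to compare it against a central finite-difference approximation of the \(d = 1\) cost:

```python
import numpy as np

def cost_1d(x, h, ell):
    """The d = 1 cost C(x) from the problem statement."""
    n = len(x)
    return sum(
        h[i, j] * (abs(x[i] - x[j]) - ell[i, j]) ** 2
        for i in range(n) for j in range(n) if i != j
    )

def numerical_partial(x, h, ell, u, eps=1e-6):
    """Central finite-difference estimate of dC/dx_u."""
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[u] += eps
    x_minus[u] -= eps
    return (cost_1d(x_plus, h, ell) - cost_1d(x_minus, h, ell)) / (2 * eps)
```

Your analytical derivative evaluated at \(x_u\) should agree with `numerical_partial(x, h, ell, u)` up to a small numerical error, provided the positions satisfy the assumption \(x_i \neq x_j\).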