This week I will be looking at image segmentation and how it can be used to divide an image into layers so that each segment can be upscaled individually. This is useful because it allows for better edge retention after upscaling.
In their paper, J. Shi and J. Malik propose and analyze the Normalized Cuts approach to image segmentation, aiming at a general solution to this class of problems. The underlying perceptual grouping problem was raised by Wertheimer around 85 years earlier in the context of Gestalt theory, and is concerned with partitioning an image into multiple regions according to some homogeneity criterion.
The algorithm treats the image as a graph: each pixel, the smallest unit of the image, becomes a node, and segmentation is cast as a graph partitioning problem. The algorithm measures both the total dissimilarity between the different groups and the total similarity within each group. Treated in this manner, the problem is transformed into one of solving a generalized eigenvalue system.
Jianbo Shi and Jitendra Malik aim to extract global impressions of an image, such as coherence of brightness, color, and texture. They approach the problem as a graph partitioning problem, such that each segmented part carries one “global impression,” making neighboring segments distinct (e.g., a bright segment next to a dark one).
In order to do this, they had to consider existing graph-theoretic algorithms that segment data by minimizing some criterion. Such cuts are known as minimum cuts: cuts defined and computed as the minimum sum of the weights of the edges removed.
However, as observed by Wu and Leahy, partitioning based on minimum edge-cut weight favors cutting off small sets of isolated nodes. In contrast, Shi and Malik want to segment based on higher-level traits, which in general correspond to larger segments. Thus they had to reformulate the cut criterion to account for this.
The general idea behind the new formulation is that the cost of a cut is normalized by each segment’s total connections to the rest of the graph. This normalization ensures that cutting off a small section yields a very high “normalized cut value” (recalling that small cut values are more desirable), because we are removing a high percentage of all the connections that section once had. Below is the definition of Ncut.
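As given in Shi and Malik's paper, the normalized cut is:

```latex
Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}
```

where $cut(A,B) = \sum_{u \in A,\, v \in B} w(u, v)$ is the total weight of the edges removed, and $assoc(A,V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection from nodes in $A$ to all nodes in the graph.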
The normalized association within groups can be defined analogously.
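In the paper's notation, the normalized association is:

```latex
Nassoc(A, B) = \frac{assoc(A, A)}{assoc(A, V)} + \frac{assoc(B, B)}{assoc(B, V)}
```

where $assoc(A,A)$ is the total weight of the edges connecting nodes within $A$.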
Combining the two equations above relates the two measures directly.
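Expanding $cut(A,B) = assoc(A,V) - assoc(A,A)$ (and likewise for $B$) in the Ncut definition gives:

```latex
Ncut(A, B) = \left(1 - \frac{assoc(A,A)}{assoc(A,V)}\right) + \left(1 - \frac{assoc(B,B)}{assoc(B,V)}\right) = 2 - Nassoc(A, B)
```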
This shows that minimizing disassociation (minimizing Ncut) and maximizing association (maximizing Nassoc) are related, and thus either can be used to define cuts.
Although not discussed in this introduction, the derivation and expansion of these algorithms will show an efficient way of segmenting images to produce separate images with individual properties, such as segmenting the individual sides of this cube.
However, looking back at this algorithm after the overall analysis, the major issue is computational cost. In particular, calculating the eigenvectors and the weight matrix takes a long time (O(n^3) with a traditional dense eigensolver), which makes the algorithm impractical for large images when the smallest node is defined as a single pixel. Efficiency can be dramatically improved, to O(mn), by exploiting the properties of the normalized cut system and solving it in a matrix-free fashion with the Lanczos method. Even so, as image resolution increases the computation becomes ill-conditioned, which drives the cost higher still.
When it comes to general applications, such a program can easily exhaust the memory available on low-end chipsets, and because it requires so many instructions it cannot run in a real-time manner on that class of hardware.
Nowadays, as computational power increases, it is becoming possible to run the algorithm in near real time, using higher-level languages on more capable chipsets.
Knowing the concept of the normalized cut we can now understand the algorithm. The algorithm is generalized as follows.
- Given an image, generate a graph G=(V,E) with vertices (V) and edges (E), and calculate the weight (w) on each edge.
- Solve the equation (discussed below) for the second smallest eigenvalue and its eigenvector.
- Use the second smallest eigenvalue and eigenvector to calculate the minimum normalized cut and partition the graph into two new graphs.
- Determine whether the two new graphs can be partitioned again, given a minimum Ncut condition (which we set).
- If the condition in step four holds, recursively call the algorithm to partition again.
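The steps above can be sketched in Python. This is only illustrative: it uses a dense NumPy/SciPy eigensolver rather than the paper's sparse Lanczos approach (so it only scales to small graphs), and `max_ncut` and `min_size` are hypothetical parameters standing in for the stopping condition we set.

```python
import numpy as np
from scipy.linalg import eigh

def two_way_ncut(W):
    """One bipartition of a weighted graph via the second smallest
    generalized eigenvector of (D - W) y = lambda * D * y."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # Dense generalized eigensolver; the paper uses Lanczos on the
    # sparse system instead, which is what makes it O(mn).
    _, vecs = eigh(D - W, D)
    y = vecs[:, 1]  # second smallest eigenvector
    # Threshold the eigenvector to get a discrete partition (the paper
    # searches several splitting points; 0 is the simplest choice).
    mask = y > 0
    if mask.all() or not mask.any():
        return mask, np.inf  # degenerate split: everything on one side
    cut = W[mask][:, ~mask].sum()
    return mask, cut / W[mask].sum() + cut / W[~mask].sum()

def recursive_ncut(W, ids, max_ncut=0.25, min_size=2, segments=None):
    """Steps four and five: keep bipartitioning while the Ncut value
    stays below the threshold we set."""
    if segments is None:
        segments = []
    if len(ids) <= min_size:
        segments.append(ids)
        return segments
    mask, ncut = two_way_ncut(W)
    if ncut > max_ncut:
        segments.append(ids)  # cut too expensive: this segment is final
        return segments
    recursive_ncut(W[mask][:, mask], ids[mask], max_ncut, min_size, segments)
    recursive_ncut(W[~mask][:, ~mask], ids[~mask], max_ncut, min_size, segments)
    return segments
```

A usage note: on a toy graph with two tightly connected pairs joined by weak edges, the recursion performs exactly one cut and returns the two pairs as segments.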
In order to calculate all the normalized cuts necessary we will need to solve the following equation.
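Following the paper, the minimization is written as a Rayleigh quotient:

```latex
\min_x Ncut(x) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}
```

subject to the constraints $y_i \in \{1, -b\}$ and $y^T D \mathbf{1} = 0$.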
In this equation there are several variables to define.
- x: an N = |V| dimensional indicator vector, with x_i = 1 if node i is in segment A and x_i = -1 if it is in segment B
- Ncut(x): the normalized cut value computed for the partition encoded by x
- w_ij: the weight of the edge between nodes i and j
- D: an N-by-N diagonal matrix with d(i) = Σ_j w_ij, the total connection from node i to all other nodes, on the diagonal
- W: an N-by-N symmetric matrix with W(i,j) = w_ij
- y^T: the transpose of y, where y is a real-valued relaxation of the indicator x
Relaxing y to take real values means we need to solve the following generalized eigenvalue system. By solving this we obtain two important things: the eigenvalues and eigenvectors of the system. By the Rayleigh quotient, for this symmetric system the minimizer subject to the constraint $y^T D \mathbf{1} = 0$ is the eigenvector corresponding to the second smallest eigenvalue.
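The generalized eigenvalue system in question, as stated in the paper, is:

```latex
(D - W)\, y = \lambda D\, y
```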
Going through the computation and linear algebra, substituting $z = D^{1/2} y$ transforms the generalized system into the standard eigensystem $D^{-1/2}(D - W)\, D^{-1/2} z = \lambda z$. The second smallest eigenvector of this system, mapped back through $y = D^{-1/2} z$, is the real-valued solution used to partition the graph.
The implementation of the code in Matlab has to be split into several parts. The flowchart of the program is shown below, and the full implementation of the code is attached to this paper.
In the calculation of the weights, the Gaussian function is used. Shi and Malik found that this function (given well-chosen scaling factors) is a good representation of the weight on each edge, because it falls off sharply, which makes it well suited to capturing singularities such as edges and sharp changes in contrast. In this implementation the weight represents the difference in brightness between two pixels: the smaller the difference in brightness, the more likely the two pixels belong to the same segment, and the larger the difference, the less likely they do. This results in the following equation for calculating the weights.
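The weight equation from the paper combines a brightness term with a spatial term that is zeroed beyond a radius $r$:

```latex
w_{ij} = e^{\frac{-\|F(i) - F(j)\|_2^2}{\sigma_I^2}} \times
\begin{cases}
e^{\frac{-\|X(i) - X(j)\|_2^2}{\sigma_X^2}} & \text{if } \|X(i) - X(j)\|_2 < r \\
0 & \text{otherwise}
\end{cases}
```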
Here the variables in the equation are defined as follows:
- F(i) is a feature vector representing brightness at node i (it can carry intensity, color, or texture information)
- σ_I and σ_X are scaling factors that Shi and Malik found should be set to 10-20% of the range of the feature distance, max(d) - min(d)
- ||F(i) - F(j)||_2 represents the Euclidean distance between F(i) and F(j)
- X(i) represents the spatial location of node i
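As a sketch of how this weight computation might look, here is a dense Python version for a small grayscale image. The pairwise computation is O(N^2) in memory, so it is only feasible for tiny images, and the default parameter values are illustrative assumptions, not the values used in our Matlab code.

```python
import numpy as np

def weight_matrix(img, sigma_i=0.1, sigma_x=4.0, r=5.0):
    """Gaussian edge weights from brightness and spatial distance.

    img: 2-D array of intensities in [0, 1]. sigma_i here is 10% of the
    brightness range, following Shi and Malik's 10-20% guideline;
    sigma_x and r are illustrative choices.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    F = img.ravel().astype(float)
    # Pairwise squared brightness and spatial distances.
    dF2 = (F[:, None] - F[None, :]) ** 2
    dX2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-dF2 / sigma_i**2) * np.exp(-dX2 / sigma_x**2)
    W[dX2 >= r**2] = 0.0   # no edge beyond spatial radius r
    np.fill_diagonal(W, 0.0)
    return W
```

On an image that is dark on the left half and bright on the right, adjacent pixels of equal brightness get a much larger weight than adjacent pixels across the brightness boundary, which is exactly the behavior the segmentation relies on.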
The implementation in Matlab is incredibly inefficient in its current state. After investigating Shi and Malik's C implementation of the algorithm, we believe this is due to the extremely memory-intensive computation in Ncuts.m. There should be a way to circumvent the 8-9 gigabyte RAM requirement by adding a second condition when checking whether we have reached the end of our recursion: namely, a minimum segment area. Checking the area is cheaper than performing an Ncut computation and thus would be much faster. However, we are not sure what area size to pick, and as such an exact solution escapes us. Below is an example of the algorithm with a run time of ~2 minutes.
This algorithm developed by Shi and Malik is simple and ingenious. It takes the traditional concept of graph cuts and extends it: instead of optimizing the raw cut cost, we take normalized cuts, which, depending on the feature vector used, can be adapted to compute cuts based on perceptual grouping. By relaxing the problem to a generalized eigenvalue system with a real-valued solution, an otherwise intractable discrete optimization becomes solvable.
Our implementation of the algorithm, though inefficient, follows logically from the description and the original C implementation provided by Shi and Malik themselves. This is easy to see, as Shi and Malik's algorithm has been shown to be O(mn), in contrast to our implementation of unknown but definitely slower complexity. This can be improved both by refining the algorithm and by changing hardware, since Shi and Malik's algorithm relies heavily on matrix operations, which are easily parallelized.
In conclusion, after thoroughly reading the paper and implementing a rudimentary version of the algorithm, we believe we have gained a strong understanding of image segmentation and how it can be used to segment images based on parameters relating to human perception.
Shi, Jianbo, and Jitendra Malik. “Normalized Cuts and Image Segmentation.” IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 22.8 (2000): 888-905. Web.