Upscaling Block Diagram
Discussion on Merge:
The process of merge took an extensive amount of time to make working. For some reason after denoising and upscaling there were many anomalies within the image datat. For instance a black pixel may have non zero RGB valuers or in YUV form it may have non 0 UV channels or very large Y values for 0 UV. We concluded that we cannot use ==0 as the threshold for computing whether a pixel is a part of a segment or not.
Originally each segment was passed as a map of RGB pixels to the merge portion of the code and we checked if the R,G,B channels had values <1. And if it did we would do nothing to the pixel. However this proved impossible as after denoising and upscaling some of the supposed black pixels weren’t truly black. Worst of all the values weren’t consistent sometimes they were (1,1,1) or (3,3,3) or (5,5,5) and we were not able to discern a pattern or upper bound on these values.
After that we decided to convert the image to a black and white format and then directly compare whether an image is “black” or not. The same problem occurred as above and some segments would end up overwriting parts of other segments due to blurring of pixels around the edges.
We tried the same thing going from RGB to YUV and performing the same computation. However we were faced with the same problem again with erratic Y values.
There are two things to consider about why the above three ways failed. Firstly, we were using OpenCV’s implementation of pixel conversion. We are not sure how it is converted and whether it will produce interpolation errors or computation errors. Secondly, the way we accessed the Y values of the YUV converted image were (probably) wrong. Thus, we were getting values as uint output instead of the float values the Y channel was supposed to be.
To resolve these issues, after much trial and error we discovered that the best way to correctly do this is to calculate the Y channel ourselves manually. This required scaling the output Y value by 255.0 and doing calculations in floating point to maintain as much accuracy as possible. Only with this way were we able to properly use a threshold to determine whether a pixel is black or not with a much smaller margin of error in the use of floating point.
Image Segmentation Block Diagram
Discussion on Disjoint-Set Forests as Segment Storage
Disjoint-set forests with union by rank and path compression. Let us define what these things mean. A disjoin-set data structure allows us to keep track of a set of elements made up of disjoint components. In our case, this would mean that each disjoint set would represent an individual segment. The two notable functions that this provides is the find and union (named join in the implementation). This allows us to determine if two representatives are part of the same root and to join two trees into one single tree building the segmentation. The two notable functions that this provides is the find and union (named join in the implementation). This allows us to determine if two representatives are part of the same root and to join two trees into one single tree building the segmentation.
In union-by-size, the determining factor for which set to join to the other is the number of elements in each set. Union by rank ensures that we can retain minimum depth for maximum efficiency. This is results in a complexity of ~O (log n). Rank is initially 0 when a one element set is formed. Then as we begin joining sets, sets of the same rank will be joined and then incrementing the rank variable of the new union set.
Now what is a disjoint-set forest? Instead of implementing a disjoint-set data structure using either an array or vector, in which a representative could be any element inside the set, a disjoint set forest uses a tree structure where the representative is the root of the tree. This alters the find and so that when called it reaturs root of each tree instead. In the case of find called on a node, it will follow parent nodes until it reaches the root. This means that we do not need to keep track of the actual set in which an element belongs to as we can easily search through a tree to determine if two elements are in the same set. In addition to this, when using path compression when find is called on any node it will attach itself and all its parent nodes directly to the root hence lowering the height of the tree making subsequent finds much faster.
Felenswalb and Huttenlocher’s algorithm has complexity O (m α(m)) Where α(m) is the inverse Ackerman function and m represents the number of vertices in the graph. Because there are at most three edges connected to each vertex (one for each RGB component) all operations can be done in constant time. Therefore, the only limiting factor is the rank (depth) which is bounded by O (m α(m)).