Things to note: 1. We are using OpenCV version 3.0 RC1 for all the tests below. Versions 2.8 and 3.1 each failed in one way or another: 2.8 failed to compile on Linux, and 3.1 failed for an unknown reason when deallocating memory for matrices as a function went out of scope.
Results from the i7-5600U Processor (CPU version of the algorithm using filter2)
Again, we use the same 3-color bar as a control base case, with the same input arguments of sigma=0.1, k=30, min_size=10. With an upscale ratio of 2.0x, we see the following results. We look at the scaled segments first and then at the merged result. The runtime of scaling+segmentation+merge is 15.814 seconds. Here is the input image.
Here are the upscaled segments…
And now merged together…
Now we make the image more complex, using the Undertale logo for this test. We used parameters sigma=0.8, k=900, min_size=550, and a scale ratio of 2.0x. The runtime was 9 minutes 41 seconds, an enormous increase over the color bar test. This is due to the differences in resolution, bit depth, and the number of segments produced (12). First, we look at the input image.
Now we take a look at the upscaled segments (only a few shown).
And after a few merges…
And finally, here is the output with all the segments merged together. Even though the image has been upscaled, we can already see that the heart inside the R appears to have gained some sharpness, and in general the whole image seems to have gained contrast. However, due to the filter used (filter2), the output seems a bit softer than it really should be.
Below we look at the source image for our last example.
On this computer's CPU, it took a total of 44 min 12 sec to upscale one segment (the original image) to 2160p. Thus we concluded that a complete upscaling of this image to 2160p would be infeasible. The output of the 4x upscaling is below.
Now we use a completely different setup to perform the upscaling. Because image segmentation multiplies the runtime, as we saw above, all of the following tests are done without image segmentation. We first perform the same upscaling on an i7-5930k, a 6-core CPU capable of 12 threads with a base frequency of 3.5GHz that turbos up to 3.70GHz. Using the same input image as above, this CPU performed the upscaling in 38 minutes and 42 seconds. Considering the difference in price and core count compared to the Thinkpad, this is only marginally faster. This shows that doing the computation on the CPU yields only diminishing returns. Below is the output produced with filter2.
Our third test case was an AMD FX 9590 Black, an 8-core processor with a 4.7GHz clock speed and a turbo speed of 5.0GHz. There are two main issues to address when it comes to using AMD products to perform the computation. Firstly, because OpenCV was first developed by people working at Intel, it uses SSE (Streaming SIMD Extensions) optimizations. Although AMD added support for SSE instructions starting with its Athlon XP line, its implementation is still not up to par with Intel's, so computation is inherently slower than on an Intel product. Thus, even with its superior core count and clock speed, this processor is slower than the dual-core Intel i7, with a runtime of 1 hour 31 minutes and 9 seconds.
We then ran the same image through a GPU implementation, which uses CUDA and allows much more efficient parallelization. We used an Nvidia GTX 1070 (1920 CUDA cores, clock 1506 MHz, boost clock 1683 MHz), a GTX 980Ti (2816 CUDA cores, clock 1075 MHz, boost clock 1075 MHz), and 2 GTX 980Ti in SLI. This was significantly faster, on the order of hundreds of times faster than the CPU implementation.
All results have been summarized below for the city scene.
| Processor type | Time to execute |
|---|---|
| AMD FX 9590 Black (filter2) | 1 hour 31 minutes 9 seconds |
| Intel i7-5600U (filter2) | 44 minutes 12 seconds |
| Intel i7-5930k (filter2) | 38 minutes 42 seconds |
| Intel E5-2690v4 (filter2) | 16 minutes 27 seconds |
| Nvidia GTX 1070 (filter2) | 5.0823 seconds |
| Nvidia GTX 980Ti (filter2) | 8.0292 seconds |
| Nvidia GTX 980Ti SLI (2) (filter2) | 1.16984 seconds |
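To make the gap between these timings easier to compare, here is a minimal sketch that converts each entry in the table to seconds and expresses it as a speedup over a chosen baseline. The timing strings are copied from the table. The helper names `to_seconds` and `speedups`, and the choice of the i7-5600U as the baseline, are our own assumptions for illustration and are not part of the upscaling code.

```python
import re

# Timings copied verbatim from the summary table (city scene, filter2).
TIMINGS = {
    "AMD FX 9590 Black": "1 hour 31 minutes 9 seconds",
    "Intel i7-5600U": "44 minutes 12 seconds",
    "Intel i7-5930k": "38 minutes 42 seconds",
    "Intel E5-2690v4": "16 minutes 27 seconds",
    "Nvidia GTX 1070": "5.0823 seconds",
    "Nvidia GTX 980Ti": "8.0292 seconds",
    "Nvidia GTX 980Ti SLI (2)": "1.16984 seconds",
}

UNIT_SECONDS = {"hour": 3600.0, "minute": 60.0, "second": 1.0}


def to_seconds(text):
    """Convert a string like '1 hour 31 minutes 9 seconds' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(hour|minute|second)s?", text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


def speedups(timings, baseline):
    """Return how many times faster each entry is than the baseline entry.

    The baseline is a hypothetical choice made for this sketch; values
    below 1.0 mean the entry is slower than the baseline.
    """
    base = to_seconds(timings[baseline])
    return {name: base / to_seconds(t) for name, t in timings.items()}


if __name__ == "__main__":
    result = speedups(TIMINGS, "Intel i7-5600U")
    # Print slowest first, fastest last.
    for name, factor in sorted(result.items(), key=lambda kv: kv[1]):
        print(f"{name:<26} {factor:8.1f}x")
```

With the i7-5600U as the baseline, the i7-5930k comes out at roughly 1.14x, which matches the "only marginally faster" observation above, while each GPU entry lands in the hundreds or more.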
And the output