We still have several things to address for our project. Most of the issues have to do with the runtime and efficiency of applying segmentation and upscaling together. The current implementation requires upscaling every segment individually and then merging the results. For larger images, and especially for the CPU-based implementation, 1080p->2160p upscaling already takes about 44 minutes for a single segment. With many segments, as in the city scene with its 205 segments, this works out to 205*44 = 9020 minutes, or roughly 150 hours, which is unacceptable. If we were to incorporate segmentation again, it would have to happen inside the upscaling loop itself or somehow be built into the models. In hindsight, this problem disappears entirely when a discrete GPU does the processing, since a single segment then takes only several seconds and any image can be scaled by 2x in several minutes. However, that would completely nullify any notion of a low-power (no discrete GPU) or mobile implementation using image segmentation. Another way to achieve a much better runtime would be to use a super-resolution algorithm developed by someone else, for example Milanfar’s algorithm.
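To make the idea concrete, the following is a rough OpenCV/C++ sketch of what folding segmentation into the upscaling loop could look like: the segment label is looked up per pixel during a single pass, instead of upscaling each segment image on its own and merging afterwards. SegmentMap, upscalePatch and upscale2xWithSegments are hypothetical names for illustration only, not part of the current code.

```cpp
#include <opencv2/core.hpp>

// Hypothetical sketch only: instead of upscaling every segment image
// separately and merging afterwards, the segment lookup happens inside a
// single upscaling pass. SegmentMap and upscalePatch() are placeholders for
// the project's actual segmentation output and upscaling kernel.
struct SegmentMap {
    cv::Mat labels;                                  // CV_32S, one label per source pixel
    int at(int y, int x) const { return labels.at<int>(y, x); }
};

// Stands in for the real per-patch upscaler (CNN or classical filter); here it
// is just a nearest-neighbour copy so that the sketch is self-contained.
static cv::Vec3b upscalePatch(const cv::Mat& src, int y, int x, int /*label*/) {
    return src.at<cv::Vec3b>(y, x);
}

cv::Mat upscale2xWithSegments(const cv::Mat& src, const SegmentMap& segments) {
    CV_Assert(src.type() == CV_8UC3);
    cv::Mat dst(src.rows * 2, src.cols * 2, src.type());
    for (int y = 0; y < dst.rows; ++y) {
        for (int x = 0; x < dst.cols; ++x) {
            int sy = y / 2, sx = x / 2;              // source coordinates
            int label = segments.at(sy, sx);         // segment of this pixel
            dst.at<cv::Vec3b>(y, x) = upscalePatch(src, sy, sx, label);
        }
    }
    return dst;                                      // one pass, no merge step
}
```

The point of the restructuring is that the expensive upscaling work runs once over the whole image, so the total cost no longer multiplies with the number of segments.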
The current denoising model developed for this project behaves somewhat peculiarly: it introduces noise even into a completely black image. Although the output appears identical to the human eye, the segmentation algorithm is very sensitive to the noise added by the denoising process, harmless as it looks. It would be good to rework the model so that pixels with RGB values of (0,0,0) are left untouched. Because of these inconsistencies, the original file has had to be used to produce the segments instead of the denoised output.
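A minimal sketch of the proposed exception, assuming the original and the denoised output are both available as 8-bit, 3-channel OpenCV matrices; preserveBlackPixels is a hypothetical helper, not an existing function in the project:

```cpp
#include <opencv2/core.hpp>

// Hypothetical post-processing step for the denoising model: restore pure
// black pixels from the original so that RGB (0,0,0) regions pass through the
// pipeline untouched and do not disturb the segmentation stage.
cv::Mat preserveBlackPixels(const cv::Mat& original, const cv::Mat& denoised) {
    CV_Assert(original.size() == denoised.size() && original.type() == CV_8UC3);
    cv::Mat blackMask;
    // 255 where the original pixel is exactly (0,0,0), 0 everywhere else.
    cv::inRange(original, cv::Scalar(0, 0, 0), cv::Scalar(0, 0, 0), blackMask);
    cv::Mat result = denoised.clone();
    original.copyTo(result, blackMask);              // black pixels stay (0,0,0)
    return result;
}
```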
Currently, the segmentation code uses a custom RGB class. Although it is simpler than the OpenCV matrix representation, the code would be easier to read and to adapt if everything used OpenCV types. In addition, the segmentation, although currently written against the RGB class, can be implemented in the YUV format by adapting the diff condition, which would make it unnecessary to use cv::cvtColor to convert between YUV and RGB.
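A rough sketch of what the diff condition could look like once it operates on OpenCV YUV pixels directly; pixelsSimilarYUV and the threshold values are illustrative assumptions, not the project's actual condition or tuned parameters:

```cpp
#include <opencv2/core.hpp>
#include <cstdlib>

// Hypothetical YUV version of the segmentation diff condition, working
// directly on cv::Vec3b pixels instead of the custom RGB class. Channel 0 is
// luma (Y), channels 1 and 2 are chroma (U, V); the thresholds are
// illustrative defaults only.
bool pixelsSimilarYUV(const cv::Vec3b& a, const cv::Vec3b& b,
                      int lumaThreshold = 10, int chromaThreshold = 15) {
    int dY = std::abs(a[0] - b[0]);
    int dU = std::abs(a[1] - b[1]);
    int dV = std::abs(a[2] - b[2]);
    return dY <= lumaThreshold && dU <= chromaThreshold && dV <= chromaThreshold;
}
```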
For improvements to the CNN, a better optimization algorithm could be adopted in the future (when one is developed) to further reduce the memory required for computation. Currently, running the model takes several days and a server with 128 GB of RAM to finish; a rig with any less RAM will run out of memory. Although probably infeasible now, there is also the possibility of moving the core algorithm (without segmentation) onto mobile devices.
Many Android devices expose OpenCL through their GPU drivers, which opens the possibility of implementing the algorithm on the device’s own GPU for mobile use. This would significantly reduce the runtime, as we have seen above, thanks to the parallelization capabilities of GPUs. As mobile chips keep getting faster, it may become possible to run the implementation in real time, especially for 480P->720P and 720P->1080P. This makes it possible to get the most out of the available cable bandwidth, which is very desirable for streaming services (albeit to the dismay of the user), since they could stream compressed video, upscale it on the users’ end, and claim “true HD resolution”.
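As a very rough illustration of the per-pixel parallelism a mobile GPU offers, an OpenCL kernel for a naive 2x upscale could look like the sketch below; this is an assumption-laden placeholder (nearest-neighbour only, host-side setup omitted), not the project's upscaling model.

```cpp
// Sketch of an OpenCL C kernel held in a C++ string constant so that it can be
// built at runtime with clCreateProgramWithSource. Each work-item produces one
// destination pixel, which is where the GPU parallelism comes from. This is a
// naive nearest-neighbour 2x placeholder, not the project's upscaling model,
// and the host-side setup (platform, context, queue, buffers) is omitted.
static const char* kUpscale2xKernelSource = R"CLC(
__kernel void upscale2x(__global const uchar4* src,
                        __global uchar4* dst,
                        const int srcWidth,
                        const int srcHeight)
{
    int x = get_global_id(0);               /* destination column */
    int y = get_global_id(1);               /* destination row    */
    int dstWidth = srcWidth * 2;
    if (x >= dstWidth || y >= srcHeight * 2) return;
    dst[y * dstWidth + x] = src[(y / 2) * srcWidth + (x / 2)];
}
)CLC";
```

Because every destination pixel is computed by its own work-item, the runtime is governed by the GPU's throughput rather than a serial CPU loop, which is what makes real-time 480P->720P or 720P->1080P plausible on newer chips.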