Size | Format
---|---
13.07 MB | Adobe PDF
Abstract(s)
Advanced video applications in smart environments (e.g., smart cities) pose new
challenges, driven by increasingly intelligent systems and demanding
requirements in emerging fields such as urban surveillance, industrial computer
vision, and medicine. As a consequence, a huge amount of visual data
is captured to be analyzed by task-driven machine algorithms. The volume of
data generated can cause problems at the data management level, and
overcoming them requires efficient compression methods that reduce the
storage resources needed.
This thesis presents research on image compression methods based on
deep learning, analyzing the properties of different algorithms, since
these have recently shown good results in image compression. It also explains
convolutional neural networks and presents the state of the art in autoencoders.
Two compression approaches using autoencoders were studied, implemented and
tested: an object-oriented compression scheme, and algorithms oriented
to high-resolution images (UHD and 360° images). In the first approach, a video
surveillance scenario with objects such as people, cars, faces, bicycles and
motorbikes was considered, and a compression method using autoencoders was developed
so that the decoded images can be delivered for machine vision
processing. In this approach, performance was measured using traditional
image quality metrics and the accuracy of machine-driven tasks performed on the
decoded images. In the second approach, several high-resolution images were considered,
adapting the method from the first approach to take into account properties of the
image, such as variance, gradients or PCA of the features, instead of the content that
the image represents.
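Two of the image properties named above, variance and gradients, can be computed cheaply per image patch. The following is a minimal sketch under stated assumptions, not the implementation used in the thesis: the function name `patch_statistics`, the forward-difference gradient, and the representation of a grayscale patch as a list of rows are all hypothetical choices made for illustration. The abstract does not state how these properties are computed or how they drive the compression method.

```python
import math


def patch_statistics(patch):
    """Return the variance and mean gradient magnitude of a grayscale patch.

    `patch` is assumed to be a rectangular list of rows of numbers
    (an illustrative representation, not one taken from the thesis).
    """
    height, width = len(patch), len(patch[0])
    if height < 2 or width < 2:
        raise ValueError("patch must be at least 2x2")

    # Population variance over all pixel values.
    values = [float(v) for row in patch for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)

    # Forward differences in x and y, evaluated where both are defined.
    magnitudes = []
    for y in range(height - 1):
        for x in range(width - 1):
            gx = patch[y][x + 1] - patch[y][x]
            gy = patch[y + 1][x] - patch[y][x]
            magnitudes.append(math.hypot(gx, gy))

    return {
        "variance": variance,
        "mean_gradient": sum(magnitudes) / len(magnitudes),
    }


# Illustrative inputs: a flat patch and a horizontal ramp.
flat_patch = [[5] * 4 for _ in range(4)]
ramp_patch = [list(range(8)) for _ in range(8)]
```

A flat patch yields zero for both statistics, while a textured or edge-heavy patch yields larger values, which is the kind of signal a content-independent scheme could use to distinguish image regions.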
Regarding the first approach, in comparison with the Versatile Video Coding
(VVC) standard, the proposed approach achieves significantly better coding efficiency,
e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is
also significantly higher when performed on visual objects compressed with the
proposed scheme than when the same tasks are performed on the same visual
objects compressed with VVC. These results demonstrate that the proposed
learning-based approach is a more efficient solution for compressing visual
objects than standard encoding. As for the second approach, although it is possible to
obtain better results than VVC on the test subsets, the presented approach only
shows significant gains for 360° images.
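For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed. It assumes exactly four rate-distortion points per codec and PSNR as the quality measure; the abstract does not state which quality metric or tooling the thesis used. The function names and the sample numbers are hypothetical and are not results from the thesis. With four points, the usual cubic fit of log-rate against PSNR is the interpolating cubic, and Simpson's rule integrates a cubic exactly.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial interpolating (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnr, rate, lo, hi):
    """Mean of the cubic log-rate curve over the PSNR interval [lo, hi]."""
    logs = [math.log(r) for r in rate]
    mid = 0.5 * (lo + hi)
    # Simpson's rule (exact for cubics), already divided by (hi - lo).
    return (
        _lagrange(psnr, logs, lo)
        + 4.0 * _lagrange(psnr, logs, mid)
        + _lagrange(psnr, logs, hi)
    ) / 6.0


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec versus the anchor.

    Negative values mean the test codec needs less bitrate for the
    same quality.
    """
    for seq in (rate_anchor, psnr_anchor, rate_test, psnr_test):
        if len(seq) != 4:
            raise ValueError("this sketch assumes four points per curve")
    # Compare only over the PSNR range covered by both curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    diff = _mean_log_rate(psnr_test, rate_test, lo, hi) - _mean_log_rate(
        psnr_anchor, rate_anchor, lo, hi
    )
    return (math.exp(diff) - 1.0) * 100.0


# Made-up numbers for illustration only (kbps, dB).
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
test_rate = [r / 2.0 for r in anchor_rate]
test_psnr = list(anchor_psnr)
```

In this made-up case the test codec reaches each PSNR value at half the anchor bitrate, so `bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr)` is a 50% reduction. Read the same way, "up to 46.7% BD-rate reduction" means that in the best case the proposed scheme needs on average 46.7% less bitrate than VVC for equal quality.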
Keywords
Learning-based compression; Autoencoders; Visual objects; Video surveillance; UHD images; 360° images; Convolutional neural networks