ESTG - Mestrado em Engenharia Eletrotécnica - Telecomunicações
Browsing ESTG - Mestrado em Engenharia Eletrotécnica - Telecomunicações by advisor "Assunção, Pedro António Amado"
- JOINT CODING OF MULTIMODAL BIOMEDICAL IMAGES USING CONVOLUTIONAL NEURAL NETWORKS
Publication. Parracho, João Oliveira; Assunção, Pedro António Amado; Távora, Luís Miguel de Oliveira Pegado de Noronha e; Thomaz, Lucas Arrabal

The massive volume of data generated daily by the acquisition of medical images in different modalities can be difficult to store in medical facilities and to share through communication networks. To alleviate this issue, efficient compression methods must be implemented to reduce the storage and transmission resources required by such applications. However, since preserving all image detail is highly important in the medical context, the use of lossless image compression algorithms is of utmost importance.

This thesis presents research results on a lossless compression scheme designed to jointly encode computed tomography (CT) and positron emission tomography (PET) images. Different techniques, such as image-to-image translation, intra prediction, and inter prediction, are used, and redundancies between the two image modalities are also investigated. In the image-to-image translation approach, the original CT data is losslessly compressed and a cross-modality image translation generative adversarial network is applied to obtain an estimation of the corresponding PET. Two approaches were implemented and evaluated to determine a PET residue that is compressed along with the original CT. In the first method, the residue resulting from the differences between the original PET and its estimation is encoded, whereas in the second method the residue is obtained using the encoder's inter-prediction coding tools. Thus, instead of compressing two independent image modalities, i.e., both images of the original PET-CT pair, the proposed method independently encodes only the CT, alongside the PET residue. In addition to the proposed pipeline, a post-processing optimization algorithm that modifies the estimated PET image by altering its contrast and rescaling it is implemented to maximize compression efficiency.

Four different versions (subsets) of a publicly available PET-CT dataset were tested. The first subset was used to demonstrate that the concept developed in this work can surpass traditional compression schemes; the results showed gains of up to 8.9% using HEVC. On the other hand, JPEG 2000 proved not to be suitable, as it failed to obtain good results, reaching only -9.1% compression gain. For the remaining (more challenging) subsets, the results show that the proposed refined post-processing scheme attains, compared to conventional compression methods, up to 6.33% compression gain using HEVC and 7.78% using VVC.
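The core idea of the residue-based pipeline described above can be summarized in a few lines of code. The sketch below is only a minimal illustration under stated assumptions: the cross-modality estimator is modeled as a generic callable (a hypothetical stand-in for the GAN used in the thesis), and the actual HEVC/VVC lossless entropy coding of the CT and residue is left out.

```python
# Minimal sketch of the residue-based PET-CT joint coding idea.
# `estimate_pet` is a hypothetical stand-in for the cross-modality translation
# network; real entropy coding of `ct` and `residue` is not shown.
import numpy as np

def encode_pair(ct: np.ndarray, pet: np.ndarray, estimate_pet):
    """Return the data that actually needs to be stored: the CT and a PET residue."""
    pet_hat = estimate_pet(ct)                                    # estimate PET from CT
    residue = pet.astype(np.int32) - pet_hat.astype(np.int32)     # signed residue
    return ct, residue                                            # both would be losslessly coded

def decode_pair(ct: np.ndarray, residue: np.ndarray, estimate_pet):
    """Reconstruct the PET exactly from the decoded CT and the residue."""
    pet_hat = estimate_pet(ct)                                    # same estimator as the encoder
    pet = (pet_hat.astype(np.int32) + residue).astype(np.uint16)
    return ct, pet

# Toy usage with a placeholder estimator (not the thesis GAN).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ct = rng.integers(0, 4096, size=(64, 64), dtype=np.uint16)
    pet = rng.integers(0, 4096, size=(64, 64), dtype=np.uint16)
    fake_estimator = lambda x: (x // 2).astype(np.uint16)
    stored_ct, stored_residue = encode_pair(ct, pet, fake_estimator)
    _, pet_rec = decode_pair(stored_ct, stored_residue, fake_estimator)
    assert np.array_equal(pet_rec, pet)                           # lossless reconstruction
```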
- LEARNING-BASED IMAGE COMPRESSION USING MULTIPLE AUTOENCODERS
Publication. António, Rúben Duarte; Assunção, Pedro António Amado; Faria, Sérgio Manuel Maciel de; Távora, Luís Miguel de Oliveira Pegado de Noronha e

Advanced video applications in smart environments (e.g., smart cities) bring different challenges associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, industrial computer vision, medicine, and others. As a consequence, huge amounts of visual data are captured to be analyzed by machines running task-driven algorithms. Due to the large amount of data generated, problems may occur at the data management level; to overcome them, efficient compression methods are needed to reduce the required storage resources.

This thesis presents research on image compression methods based on deep learning algorithms, which have recently shown good results in image compression, and analyzes the properties of different algorithms. Convolutional neural networks are also explained and the state of the art of autoencoders is reviewed. Two compression approaches using autoencoders were studied, implemented, and tested: an object-oriented compression scheme, and algorithms oriented to high-resolution images (UHD and 360º images). The first approach targets a video surveillance scenario with objects such as people, cars, faces, bicycles, and motorbikes, for which an autoencoder-based compression method was developed so that the decoded images can be delivered for machine vision processing. In this approach, performance was measured using traditional image quality metrics and the accuracy of machine vision tasks performed on the decoded images. The second approach considers several high-resolution images and adapts the previous method to rely on image properties, such as variance, gradients, or PCA of the features, rather than on the content that the image represents.

Regarding the first approach, in comparison with the Versatile Video Coding (VVC) standard, the proposed method achieves significantly better coding efficiency, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when they are performed on visual objects compressed with the proposed scheme than on the same objects compressed with VVC. These results demonstrate that the proposed learning-based approach is a more efficient solution for compressing visual objects than standard encoding. For the second approach, although better results than VVC are achievable on the test subsets, significant gains are obtained only for 360º images.
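To make the encode/decode structure behind such autoencoder-based schemes concrete, the following is a minimal convolutional autoencoder sketch (PyTorch is assumed here purely for illustration). It does not reproduce the multiple autoencoders, the object- or property-based model selection, or the entropy coding studied in the thesis, and all layer sizes are illustrative assumptions.

```python
# Minimal convolutional autoencoder sketch: downsampling encoder producing a
# compact latent, and an upsampling decoder producing the reconstruction.
# In a real learned codec the latent would also be quantized and entropy-coded.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, channels: int = 3, latent: int = 32):
        super().__init__()
        # Encoder: reduce spatial resolution by 4x into `latent` feature channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, latent, 5, stride=2, padding=2),
        )
        # Decoder: upsample the latent back to the original image resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent, 64, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 5, stride=2, padding=2, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.encoder(x)      # latent representation (the "compressed" data)
        return self.decoder(y)   # reconstruction used for quality / task-accuracy evaluation

# Toy usage: one reconstruction pass on a random image batch.
model = TinyAutoencoder()
x = torch.rand(1, 3, 128, 128)
x_hat = model(x)
print(x_hat.shape)               # torch.Size([1, 3, 128, 128])
```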
- Subjective assessment of 3D still images using attention models
Publication. Arruda, Auridélia Moura de; Assunção, Pedro António Amado

The subjective process associated with image quality evaluation is supported by human psychophysical and physiological measurements. In the human visual system (HVS), visual attention (VA) is a crucial element that quickly identifies the salient regions of an image, subjectively linked to Regions of Interest (ROI). These regions are represented by a binary mask indicating whether each pixel of the corresponding image belongs to the ROI. Subjective mechanisms dominate eye movements in the first two seconds of viewing and, due to the strong relation between eye movements and VA, eye-tracking tests are used to validate 2D attention models: eye movements are recorded and processed to generate a Fixation Density Map (FDM). In the 3D domain, scene depth is an essential additional factor for VA.

The main objective of this work was to study the impact of a particular ROI on the subjective quality perception of 3D still images, considering different types and levels of noise (or distortion) inside and outside the ROI. The 3DGaze image and eye-movement database, obtained from an eye-tracking experiment described in [1] and specifically created for performance evaluation of stereoscopic 3D attention models, was used. Besides being fully publicly available, this database contains original images in various HD sizes, all in PNG format and with natural content. Binary masks were generated from each FDM, and noise of different types and intensities was added to each corresponding image according to the ROIs. Using the generated binary masks and the pixel positions in the ROI, the noise was added to image regions located either inside or outside the ROI. Subjective tests, in which users observed the images and scored their quality, were carried out to verify the importance of these regions in subjective quality evaluation. The images were classified according to whether the noise was added inside or outside the ROI, the noise type (Gaussian, speckle), the parameter values (intensity level), and the noisy view (left or right).

The results show that Gaussian noise has less impact on quality than speckle noise, and that the impact increases with the intensity level, when the noise is added to the right view, and when it is added inside the ROI. This is justified by the fact that viewers fixate on the ROI for longer, thus perceiving more of the distortion. As the number of different image contents was small, the findings regarding the dominant eye appear inconclusive. Changing some parameters is suggested for future work, so that the results become more conclusive and free of interference between analyses. As this work was limited to data contained in the image (bottom-up visual interest), concepts related to the visualization context (top-down visual interest), such as rarity or surprise, may naturally be included in future work, as well as the tendency to look primarily at human faces or human-like figures.
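As a rough illustration of the mask-based noise injection described above, the sketch below adds Gaussian or speckle noise only inside (or only outside) an ROI given as a binary mask. The noise models, parameter values, and toy mask are illustrative assumptions, not the exact settings used in the experiments.

```python
# Minimal sketch: confine noise to (or exclude it from) an ROI given by a binary mask.
import numpy as np

def add_noise_in_roi(image: np.ndarray, roi_mask: np.ndarray,
                     noise_type: str = "gaussian", inside: bool = True,
                     sigma: float = 0.05, seed: int = 0) -> np.ndarray:
    """Add noise only where the binary ROI mask selects pixels (inside or outside)."""
    rng = np.random.default_rng(seed)
    img = image.astype(np.float64) / 255.0                          # work in [0, 1]
    if noise_type == "gaussian":
        noisy = img + rng.normal(0.0, sigma, img.shape)             # additive Gaussian noise
    elif noise_type == "speckle":
        noisy = img * (1.0 + rng.normal(0.0, sigma, img.shape))     # multiplicative speckle noise
    else:
        raise ValueError(f"unknown noise type: {noise_type}")
    select = roi_mask.astype(bool) if inside else ~roi_mask.astype(bool)
    out = np.where(select, noisy, img)                              # keep original pixels elsewhere
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

# Toy usage: speckle noise added only outside a centred rectangular ROI.
img = np.full((120, 160), 128, dtype=np.uint8)
mask = np.zeros_like(img)
mask[40:80, 60:100] = 1
degraded = add_noise_in_roi(img, mask, noise_type="speckle", inside=False)
```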