Browsing by Author "Miragaia, Rolando"
- Applying deep learning to real-time UAV-based forest monitoring: Leveraging multi-sensor imagery for improved results
  Publication. Marques, Tomás; Carreira, Samuel; Miragaia, Rolando; Ramos, João; Pereira, António
  Rising global fire incidents necessitate effective solutions, with forest surveillance emerging as a crucial strategy. This paper proposes a complete solution using technology that integrates visible and infrared spectrum images through Unmanned Aerial Vehicles (UAVs) for enhanced detection of people and vehicles in forest environments. Unlike existing computer vision models relying on single-sensor imagery, this approach overcomes limitations posed by limited spectrum coverage, particularly addressing challenges in low-light conditions, fog, or smoke. The developed 4-channel model uses both types of images to take advantage of the strengths of each one simultaneously. This article presents the development and implementation of a solution for forest monitoring ranging from the transmission of images captured by a UAV to their analysis with an object detection model without human intervention. This model consists of a new version of the YOLOv5 (You Only Look Once) architecture. After the model analyzes the images, the results can be observed on a web platform on any device, anywhere in the world. For the model training, a dataset with thermal and visible images from the aerial perspective was captured with a UAV. From the development of this proposal, a new 4-channel model was created, presenting a substantial increase in precision and mAP (Mean Average Precision) metrics compared to traditional SOTA (state-of-the-art) models that only make use of red, green, and blue (RGB) images. Allied with the increase in precision, we confirmed the hypothesis that our model would perform better in conditions unfavorable to RGB images, identifying objects in situations with low light and reduced visibility with partial occlusions. With the model’s training using our dataset, we observed a significant increase in the model’s performance for images in the aerial perspective. This study introduces a modular system architecture featuring key modules: multisensor image capture, transmission, processing, analysis, and results presentation. Powered by an innovative object detection deep-learning model, these components collaborate to enable real-time, efficient, and distributed forest monitoring across diverse environments.
- Multi Pitch Estimation of Piano Music using Cartesian Genetic Programming with Spectral Harmonic Mask
  Publication. Miragaia, Rolando; Reis, Gustavo; Fernández de Vega, Francisco; Chávez, Francisco
  Piano note recognition, or pitch estimation of piano notes, has been a popular research topic for many years and is still under investigation. It is a fundamental task in automatic music transcription (extracting the musical score from an acoustic signal). We take advantage of Cartesian Genetic Programming (CGP) to evolve mathematical functions that act as independent classifiers for piano notes. These classifiers are then used to identify the presence of piano notes in polyphonic audio signals. This paper describes our technique and the latest improvements made in our research. The main feature is the introduction of spectral harmonic masks in the binarization process used to measure fitness values, which improved the classification rate by 10% in the mean F-measure result. Our system architecture is also described to show the feasibility of its parallelization, which will reduce the computing time.
- Synthetic image generation for effective deep learning model training for ceramic industry applications
  Publication. Gaspar, Fábio; Carreira, Daniel; Rodrigues, Nuno; Miragaia, Rolando; Ribeiro, José; Costa, Paulo; Pereira, António
  In the rapidly evolving field of machine learning engineering, access to large, high-quality, and well-balanced labeled datasets is indispensable for accurate product classification. This necessity holds particular significance in sectors such as the ceramics industry, in which effective production line activities are paramount and deep learning classification mechanisms are particularly relevant for streamlining processes; but real-world image samples are scarce and difficult to obtain, hindering dataset building and consequently model training and deployment. This paper presents a novel approach for dataset building in the context of the ceramic industry, which involves employing synthetic images for building or complementing datasets for image classification problems. The proposed methodology was implemented in CeramicFlow, an innovative computer graphics rendering pipeline designed to create synthetic images by employing computer-aided design models of ceramic objects and incorporating domain randomization techniques. As a result, a fully synthetic image dataset named Synthetic CeramicNet was created and validated in real-world ceramic classification problems. The results demonstrate that synthetic images provide an adequate basis for datasets and can significantly reduce reliance on real-world data when developing deep learning approaches for image classification problems in the ceramic industry. Furthermore, the proposed approach can potentially be applied to other industrial fields.
- Systematic Review of Emotion Detection with Computer Vision and Deep Learning
  Publication. Pereira, Rafael; Mendes, Carla; Ribeiro, José; Ribeiro, Roberto; Miragaia, Rolando; Rodrigues, Nuno; Costa, Nuno; Pereira, António
  Emotion recognition has become increasingly important in the field of Deep Learning (DL) and computer vision due to its broad applicability through human–computer interaction (HCI) in areas such as psychology, healthcare, and entertainment. In this paper, we conduct a systematic review of facial and pose emotion recognition using DL and computer vision, analyzing and evaluating 77 papers from different sources under Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our review covers several topics, including the scope and purpose of the studies, the methods employed, and the datasets used. The studies were categorized based on a proposed taxonomy that describes the type of expressions used for emotion detection, the testing environment, the currently relevant DL methods, and the datasets used. The taxonomy of methods in our review includes Convolutional Neural Network (CNN), Faster Region-based Convolutional Neural Network (R-CNN), Vision Transformer (ViT), and “Other NNs”, which are the most commonly used models in the analyzed studies, indicating their prominence in the field. Hybrid and augmented models are not explicitly categorized within this taxonomy, but they are still important to the field. This review offers an understanding of state-of-the-art computer vision algorithms and datasets for emotion recognition through facial expressions and body poses, allowing researchers to understand its fundamental components and trends.
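
The synthetic image generation entry in this list mentions domain randomization without saying which scene properties are varied. As a rough illustration of the general idea only, the Python sketch below samples a fresh set of render settings for every synthetic image. Every name here (`SceneParams`, `BACKGROUNDS`, `sample_scene`, `sample_dataset`) and every numeric range is a hypothetical assumption made for this example; none of it is taken from CeramicFlow.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """Hypothetical per-render settings; not taken from CeramicFlow."""
    light_intensity: float   # arbitrary units
    camera_yaw_deg: float    # rotation around the object
    camera_pitch_deg: float  # elevation above the supporting surface
    background: str          # name of a backdrop texture
    object_scale: float      # multiplier applied to the CAD model size


# Illustrative backdrop names, assumed for this sketch.
BACKGROUNDS = ("conveyor", "white_table", "dark_cloth")


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration from assumed ranges."""
    return SceneParams(
        light_intensity=rng.uniform(0.5, 2.0),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        camera_pitch_deg=rng.uniform(15.0, 75.0),
        background=rng.choice(BACKGROUNDS),
        object_scale=rng.uniform(0.9, 1.1),
    )


def sample_dataset(n_images: int, seed: int = 0) -> list[SceneParams]:
    """Sample settings for n_images renders, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n_images)]


if __name__ == "__main__":
    # Each printed line would drive one render in a real pipeline.
    for params in sample_dataset(3):
        print(params)
```

In a real pipeline each sampled configuration would be handed to the renderer together with the CAD model and its class label. Seeding the generator keeps a generated dataset reproducible.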