
Search Results

Now showing 1 - 8 of 8
  • Applying deep learning to real-time UAV-based forest monitoring: Leveraging multi-sensor imagery for improved results
    Publication . Marques, Tomás; Carreira, Samuel; Miragaia, Rolando; Ramos, João; Pereira, António
    Rising global fire incidents necessitate effective solutions, with forest surveillance emerging as a crucial strategy. This paper proposes a complete solution using technology that integrates visible and infrared spectrum images through Unmanned Aerial Vehicles (UAVs) for enhanced detection of people and vehicles in forest environments. Unlike existing computer vision models relying on single-sensor imagery, this approach overcomes limitations posed by limited spectrum coverage, particularly addressing challenges in low-light conditions, fog, or smoke. The developed 4-channel model uses both types of images to take advantage of the strengths of each one simultaneously. This article presents the development and implementation of a solution for forest monitoring ranging from the transmission of images captured by a UAV to their analysis with an object detection model without human intervention. This model consists of a new version of the YOLOv5 (You Only Look Once) architecture. After the model analyzes the images, the results can be observed on a web platform on any device, anywhere in the world. For the model training, a dataset with thermal and visible images from the aerial perspective was captured with a UAV. From the development of this proposal, a new 4-channel model was created, presenting a substantial increase in precision and mAP (Mean Average Precision) metrics compared to traditional SOTA (state-of-the-art) models that only make use of red, green, and blue (RGB) images. Allied with the increase in precision, we confirmed the hypothesis that our model would perform better in conditions unfavorable to RGB images, identifying objects in situations with low light and reduced visibility with partial occlusions. With the model’s training using our dataset, we observed a significant increase in the model’s performance for images in the aerial perspective.
    This study introduces a modular system architecture featuring key modules: multisensor image capture, transmission, processing, analysis, and results presentation. Powered by an innovative object detection deep-learning model, these components collaborate to enable real-time, efficient, and distributed forest monitoring across diverse environments.
  • Systematic Review of Emotion Detection with Computer Vision and Deep Learning
    Publication . Pereira, Rafael; Mendes, Carla; Ribeiro, José; Ribeiro, Roberto; Miragaia, Rolando; Rodrigues, Nuno; Costa, Nuno; Pereira, António
    Emotion recognition has become increasingly important in the field of Deep Learning (DL) and computer vision due to its broad applicability to human–computer interaction (HCI) in areas such as psychology, healthcare, and entertainment. In this paper, we conduct a systematic review of facial and pose emotion recognition using DL and computer vision, analyzing and evaluating 77 papers from different sources under Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our review covers several topics, including the scope and purpose of the studies, the methods employed, and the datasets used. The scope of this work is to conduct a systematic review of facial and pose emotion recognition using DL methods and computer vision. The studies were categorized based on a proposed taxonomy that describes the type of expressions used for emotion detection, the testing environment, the currently relevant DL methods, and the datasets used. The taxonomy of methods in our review includes Convolutional Neural Network (CNN), Faster Region-based Convolutional Neural Network (R-CNN), Vision Transformer (ViT), and “Other NNs”, which are the most commonly used models in the analyzed studies, indicating their popularity in the field. Hybrid and augmented models are not explicitly categorized within this taxonomy, but they are still important to the field. This review offers an understanding of state-of-the-art computer vision algorithms and datasets for emotion recognition through facial expressions and body poses, allowing researchers to understand its fundamental components and trends.
  • Synthetic image generation for effective deep learning model training for ceramic industry applications
    Publication . Gaspar, Fábio; Carreira, Daniel; Rodrigues, Nuno; Miragaia, Rolando; Ribeiro, José; Costa, Paulo; Pereira, António
    In the rapidly evolving field of machine learning engineering, access to large, high-quality, and well-balanced labeled datasets is indispensable for accurate product classification. This necessity holds particular significance in sectors such as the ceramics industry, in which effective production line activities are paramount and deep learning classification mechanisms are particularly relevant for streamlining processes; but real-world image samples are scarce and difficult to obtain, hindering dataset building and consequently model training and deployment. This paper presents a novel approach for dataset building in the context of the ceramic industry, which involves employing synthetic images for building or complementing datasets for image classification problems. The proposed methodology was implemented in CeramicFlow, an innovative computer graphics rendering pipeline designed to create synthetic images by employing computer-aided design models of ceramic objects and incorporating domain randomization techniques. As a result, a fully synthetic image dataset named Synthetic CeramicNet was created and validated in real-world ceramic classification problems. The results demonstrate that synthetic images provide an adequate basis for datasets and can significantly reduce reliance on real-world data when developing deep learning approaches for image classification problems in the ceramic industry. Furthermore, the proposed approach can potentially be applied to other industrial fields.
  • Driving Behavior Classification Using a ConvLSTM
    Publication . Pingo, Alberto; Castro, João; Loureiro, Paulo; Mendes, Silvio; Bernardino, Anabela; Miragaia, Rolando; Husyeva, Iryna
    This work explores the classification of driving behaviors using a hybrid deep learning model that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks (ConvLSTM). Sensor data are collected from a smartphone application and undergo a preprocessing pipeline, including data normalization, labeling, and feature extraction, to enhance the model’s performance. By capturing temporal and spatial dependencies within driving patterns, the proposed ConvLSTM model effectively differentiates between normal and aggressive driving behaviors. The model is trained and evaluated against traditional stacked LSTM and Bidirectional LSTM (BiLSTM) architectures, demonstrating superior accuracy and robustness. Experimental results confirm that the preprocessing techniques improve classification performance, ensuring high reliability in driving behavior recognition. The novelty of this work lies in a simple data preprocessing methodology combined with the specific application scenario. By enhancing data quality before feeding it into the AI model, we improve classification accuracy and robustness. The proposed framework not only optimizes model performance but also demonstrates practical feasibility, making it a strong candidate for real-world deployment.
  • Cartesian genetic programming applied to pitch estimation of piano notes
    Publication . Inacio, Tiago; Miragaia, Rolando; Reis, Gustavo; Grilo, Carlos; Fernandez, Francisco
    Pitch Estimation, also known as Fundamental Frequency (F0) estimation, has been a popular research topic for many years, and is still investigated nowadays. This paper presents a novel approach to the problem of Pitch Estimation, using Cartesian Genetic Programming (CGP). We take advantage of evolutionary algorithms, in particular CGP, to evolve mathematical functions that act as classifiers. These classifiers are used to identify piano notes' pitches in an audio signal. For a first approach, the obtained results are very promising: our error rate outperforms two of three state-of-the-art pitch estimators.
  • CGP4Matlab - A Cartesian Genetic Programming MATLAB Toolbox for Audio and Image Processing
    Publication . Miragaia, Rolando; Jorge dos Reis, Gustavo Miguel; Fernández, Francisco; Inácio, Tiago; Grilo, Carlos
    This paper presents and describes CGP4Matlab, a powerful toolbox for running Cartesian Genetic Programming within MATLAB. This toolbox is particularly suited for signal processing and image processing problems. The implementation of CGP4Matlab, which can be freely downloaded, is described. Some encouraging results on the problem of pitch estimation of musical piano notes achieved using this toolbox are also presented. Pitch estimation of audio signals is a very hard problem for which no generic and robust solution has yet been found. Due to the high flexibility of CGP4Matlab, we managed to apply a new Cartesian Genetic Programming-based approach to the problem of pitch estimation. The obtained results are comparable with state-of-the-art algorithms. © Springer International Publishing AG, part of Springer Nature 2018.
  • Artificial intelligence applied to the stone manufacturing industry: A systematic literature review
    Publication . Santos Silva, Alexandre; Antunes, Carolina; Miragaia, Rolando; Costa, Rogério Luís C.; Silva, Fernando; Ribeiro, José
    Natural stone has long been used in construction, as its properties provide functional and visual value, and the natural stone market currently holds significant importance in the global economy. It is important to consider integrating new technologies in the production chain to aid the industry in moving forward, increasing profit margins and reducing wasted material. This article reviews recent trends in using Artificial Intelligence and Machine Learning techniques in the industry between 2017 and 2024, following a methodology for Systematic Literature Reviews in computer science. It was found that extensive research has been conducted on the subject of tile classification, with solid solutions proposed, achieving results that can be considered robust enough for industrial application. Other subjects comprise tasks regarding stone cutting and defect detection, as well as variable prediction, and quarry activity monitoring. Some authors propose solutions to integrate new technologies into the complete production chain. While more research needs to be done on specific subjects, this review provides a solid first step to future research.
  • INTU-AI: Digitalization of Police Interrogation Supported by Artificial Intelligence
    Publication . Garcia, José António; Grilo, Carlos; Domingues, Patrício; Miragaia, Rolando
    Traditional police interrogation processes remain largely time-consuming and reliant on substantial human effort for both analysis and documentation. Intuition Artificial Intelligence (INTU-AI) is a Windows application designed to digitalize the administrative workflow associated with police interrogations, while enhancing procedural efficiency through the integration of AI-driven emotion recognition models. The system employs a multimodal approach that captures and analyzes emotional states using three primary vectors: Facial Expression Recognition (FER), Speech Emotion Recognition (SER), and Text-based Emotion Analysis (TEA). This triangulated methodology aims to identify emotional inconsistencies and detect potential suppression or concealment of affective responses by interviewees. INTU-AI serves as a decision-support tool rather than a replacement for human judgment. By automating bureaucratic tasks, it allows investigators to focus on critical aspects of the interrogation process. The system was validated in practical training sessions with inspectors and with a 12-question questionnaire. The results indicate a strong acceptance of the system in terms of its usability, existing functionalities, practical utility of the program, user experience, and open-ended qualitative responses.