CIIC - Peer-Reviewed Journal Articles


Recent Submissions

Showing 1 - 10 of 81
  • Evolving a multi-classifier system with cartesian genetic programming for multi-pitch estimation of polyphonic piano music
    Publication . Miragaia, Rolando; Vega, Francisco Fernandez de; Reis, Gustavo
    This paper presents a new method for multi-pitch estimation on piano recordings. We propose a framework based on a set of classifiers to analyze the audio input and identify the piano notes present in the given audio signal. Our system's classifiers were evolved using Cartesian Genetic Programming: we take advantage of Cartesian Genetic Programming to evolve a set of mathematical functions that act as independent classifiers for piano notes. Our latest improvements are also presented, including test results using F-measure metrics. Our system architecture is also described to show the feasibility of its parallelization and implementation as a real-time system. The proposed approach achieved competitive results when compared to the state of the art.
  • Evolving a Multi-Classifier System for Multi-Pitch Estimation of Piano Music and Beyond: An Application of Cartesian Genetic Programming
    Publication . Miragaia, Rolando; Fernández, Francisco; Reis, Gustavo; Inácio, Tiago
    This paper presents a new method with a set of desirable properties for multi-pitch estimation of piano recordings. We propose a framework based on a set of classifiers to analyze audio input and to identify piano notes present in a given audio signal. Our system’s classifiers are evolved using Cartesian genetic programming: we take advantage of Cartesian genetic programming to evolve a set of mathematical functions that act as independent classifiers for piano notes. Two significant improvements are described: the use of a harmonic mask for better fitness values and a data augmentation process for improving the training stage. The proposed approach achieves competitive results using F-measure metrics when compared to state-of-the-art algorithms. Then, we go beyond piano and show how it can be directly applied to other musical instruments, achieving even better results. Our system’s architecture is also described to show the feasibility of its parallelization and its implementation as a real-time system. Our methodology is also a white-box optimization approach that allows for clear analysis of the solutions found and for researchers to learn and test improvements based on the new findings.
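The evolutionary core of the two abstracts above can be illustrated with a minimal Cartesian Genetic Programming sketch. Everything here (the linear genotype encoding, the function set, the mutation rate, the classification threshold) is an illustrative assumption, not the papers' actual implementation:

```python
import random

# Hypothetical CGP encoding: each gene is (function_index, input_a, input_b),
# where inputs refer to earlier node outputs or to the raw spectral features.
# The final node's output is thresholded to decide whether a note is present.
FUNCTIONS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
]

def evaluate(genotype, features):
    """Decode and run a CGP genotype on a feature vector."""
    values = list(features)               # nodes 0..n-1 are the inputs
    for f_idx, a, b in genotype:          # each gene appends one node
        values.append(FUNCTIONS[f_idx](values[a], values[b]))
    return values[-1]                     # last node is the classifier output

def classify(genotype, features, threshold=0.5):
    return evaluate(genotype, features) > threshold

def mutate(genotype, n_inputs, rate=0.2):
    """Point mutation; connections may only reach backwards in the graph."""
    child = []
    for i, (f, a, b) in enumerate(genotype):
        limit = n_inputs + i
        if random.random() < rate:
            f = random.randrange(len(FUNCTIONS))
        if random.random() < rate:
            a = random.randrange(limit)
        if random.random() < rate:
            b = random.randrange(limit)
        child.append((f, a, b))
    return child
```

For example, the genotype `[(0, 0, 1), (2, 2, 1)]` on features `[1.0, 2.0]` computes `(1.0 + 2.0) * 2.0 = 6.0`; in a real search, a (1+lambda) loop would repeatedly `mutate` a parent and keep the fittest offspring.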
  • Optimising anti-spam filters with evolutionary algorithms
    Publication . Yevseyeva, Iryna; Basto-Fernandes, Vitor; Ruano-Ordás, David; Méndez, José R.
    This work is devoted to the problem of optimising scores for anti-spam filters, which is essential for the accuracy of any filter-based anti-spam system and is one of the biggest challenges in this research area. In particular, this optimisation problem is considered from two different points of view: single- and multiobjective problem formulations. Some of the existing approaches within both formulations are surveyed, and their advantages and disadvantages are discussed. The two most popular evolutionary multiobjective algorithms and one single-objective algorithm are adapted to the optimisation of anti-spam filter scores and compared on publicly available datasets widely used for benchmarking purposes. This comparison is discussed, and recommendations for the developers and users of anti-spam filter optimisation are provided.
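As a hedged illustration of the single-objective formulation described above, the following sketch tunes rule scores for a rule-based filter with a (1+1) evolutionary algorithm: each message fires a subset of rules, the filter sums the fired rules' scores, and the message is flagged as spam when the sum exceeds a threshold. The corpus encoding, threshold, and parameters are assumptions, not the paper's setup:

```python
import random

def errors(scores, corpus, threshold=5.0):
    """Count misclassified messages; corpus = [(fired_rule_ids, is_spam), ...]."""
    wrong = 0
    for fired, is_spam in corpus:
        total = sum(scores[r] for r in fired)
        if (total > threshold) != is_spam:
            wrong += 1
    return wrong

def optimise(n_rules, corpus, generations=500, sigma=1.0, seed=0):
    """(1+1)-EA: perturb all scores with Gaussian noise, keep if no worse."""
    rng = random.Random(seed)
    scores = [0.0] * n_rules
    best = errors(scores, corpus)
    for _ in range(generations):
        child = [s + rng.gauss(0.0, sigma) for s in scores]
        e = errors(child, corpus)
        if e <= best:
            scores, best = child, e
    return scores, best
```

A multiobjective variant would instead track a Pareto front over (false positives, false negatives), since the two error types have very different costs for spam filtering.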
  • Customized crowds and active learning to improve classification
    Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
    Traditional classification algorithms can be limited in their performance when a specific user is targeted. User preferences, e.g. in recommendation systems, constitute a challenge for learning algorithms. Additionally, in recent years user interaction through crowdsourcing has drawn significant interest, although it is still underused in learning settings. In this work we focus on an active strategy that uses crowd-based non-expert information to tackle the problem of capturing the drift between user preferences in a recommendation system. The proposed method combines two main ideas: applying active strategies for adaptation to each user, and using crowdsourcing to avoid excessive user feedback. A similitude technique is put forward to optimize the choice of the most appropriate similitude-wise crowd, under the guidance of basic user feedback. The proposed active learning framework allows non-expert classification performed by crowds to be used to define the user profile, mitigating the labeling effort normally requested of the user. The framework is designed to be generic and suitable for different scenarios, whilst customizable for each specific user. A case study on a humor classification scenario demonstrates experimentally that the approach can improve baseline active learning results.
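The active-learning-with-crowds loop can be sketched roughly as follows: instead of asking the target user to label every uncertain item, the item is routed to a crowd oracle, and the returned label updates the user's model. The nearest-centroid model, the `crowd_oracle` callback, and all names here are hypothetical simplifications, not the paper's framework:

```python
def centroid(points):
    """Mean point of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def most_uncertain(unlabeled, centroids):
    """Uncertainty sampling: pick the point whose distances to the two
    nearest class centroids differ the least."""
    def margin(x):
        d = sorted(dist2(x, c) for c in centroids.values())
        return d[1] - d[0]
    return min(unlabeled, key=margin)

def active_loop(labeled, unlabeled, crowd_oracle, budget):
    """labeled: {class: [points]}; crowd_oracle(point) -> class label."""
    labeled = {c: list(pts) for c, pts in labeled.items()}
    pool = list(unlabeled)
    for _ in range(min(budget, len(pool))):
        cents = {c: centroid(pts) for c, pts in labeled.items()}
        x = most_uncertain(pool, cents)
        pool.remove(x)
        labeled[crowd_oracle(x)].append(x)   # crowd supplies the label
    return labeled
```

The user only provides the basic feedback needed to select a similar crowd; the per-item labeling burden falls on the crowd oracle.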
  • A Graph Database Representation of Portuguese Criminal-Related Documents
    Publication . Carnaz, Gonçalo; Nogueira, Vitor Beires; Antunes, Mário
    Organizations have been challenged by the need to process an increasing amount of data, both structured and unstructured, retrieved from heterogeneous sources. Criminal investigation police are among these organizations, as they have to manually process a vast number of criminal reports, news articles related to crimes, occurrence and evidence reports, and other unstructured documents. Automatic extraction and representation of data and knowledge in such documents is an essential task to reduce the manual analysis burden and to automate the discovery of name and entity relationships that may exist in a case. This paper presents SEMCrime, a framework used to extract and classify named entities and relations in Portuguese criminal reports and documents, and to represent the retrieved data in a graph database. A 5W1H (Who, What, Why, Where, When, and How) information extraction method was applied, and a graph database representation was used to store and visualize the relations extracted from the documents. Promising results were obtained with a prototype developed to evaluate the framework, namely named-entity recognition with an F-measure of 0.73, and 5W1H information extraction with an F-measure of 0.65.
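A minimal sketch (not the SEMCrime implementation) of how extracted named entities and 5W1H relations can be stored as a property graph: nodes are entities, edges carry relation labels, mirroring what a graph database such as Neo4j would hold. The entity and relation names below are invented examples:

```python
from collections import defaultdict

class EntityGraph:
    """Toy in-memory stand-in for a graph database of extracted triples."""

    def __init__(self):
        self.edges = defaultdict(list)       # subject -> [(relation, object)]

    def add_relation(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def related(self, subject, relation=None):
        """Entities linked to `subject`, optionally filtered by relation."""
        return [o for r, o in self.edges[subject]
                if relation is None or r == relation]

g = EntityGraph()
# Triples of the kind a 5W1H extractor might emit from a criminal report:
g.add_relation("Suspect A", "SEEN_AT", "Lisbon")         # where
g.add_relation("Suspect A", "ACCUSED_OF", "Burglary")    # what
g.add_relation("Burglary", "OCCURRED_ON", "2020-03-14")  # when
```

In a real deployment the same triples would be loaded into the graph database via its query language (e.g. Cypher `CREATE` statements), enabling relationship queries across an entire case file.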
  • Information Security and Cybersecurity Management: A Case Study with SMEs in Portugal
    Publication . Antunes, Mário; Maximiano, Marisa; Gomes, Ricardo; Pinto, Daniel
    Information security plays a key role in enterprise management, as it deals with the confidentiality, privacy, integrity, and availability of one of an enterprise's most valuable resources: data and information. Small and medium-sized enterprises (SMEs) are seen as a blind spot in information security and cybersecurity management, mainly due to their size, their regional and family-run scope, and their limited financial resources. This paper presents an information security and cybersecurity management project in which a methodology based on the well-known ISO 27001:2013 standard was designed and implemented in fifty SMEs located in the center region of Portugal. The project was conducted by a business association located in the center of Portugal whose members are mainly SMEs; the Polytechnic of Leiria and an IT auditing/consulting team were the other two entities that participated in the project. The characterisation of the participating enterprises, the ISO 27001:2013-based methodology developed and implemented in the SMEs, and the results obtained in this case study are described and analysed in the paper. The results show a clear benefit to the audited and intervened SMEs, attested mainly by the increased robustness of their information security management and their collaborators' cyberawareness.
  • Exposing Manipulated Photos and Videos in Digital Forensics Analysis
    Publication . Ferreira, Sara; Antunes, Mário; Correia, Manuel E.
    Tampered multimedia content is being increasingly used in a broad range of cybercrime activities. The spread of fake news, misinformation, digital kidnapping, and ransomware-related crimes are amongst the most recurrent crimes in which manipulated digital photos and videos are the perpetrating and disseminating medium. Criminal investigation has been challenged in applying machine learning techniques to automatically distinguish between fake and genuine seized photos and videos. Despite the pertinent need for manual validation, easy-to-use platforms for digital forensics are essential to automate and facilitate the detection of tampered content and to help criminal investigators with their work. This paper presents a machine learning Support Vector Machines (SVM) based method to distinguish between genuine and fake multimedia files, namely digital photos and videos, which may indicate the presence of deepfake content. The method was implemented in Python and integrated as new modules in the widely used digital forensics application Autopsy. The implemented approach extracts a set of simple features resulting from the application of a Discrete Fourier Transform (DFT) to digital photos and video frames. The model was evaluated with a large dataset of classified multimedia files containing both legitimate and fake photos and frames extracted from videos. Regarding deepfake detection in videos, the Celeb-DFv1 dataset was used, featuring 590 original videos collected from YouTube and covering different subjects. The results obtained with 5-fold cross-validation outperformed SVM-based methods documented in the literature, achieving an average F1-score of 99.53%, 79.55%, and 89.10%, respectively, for photos, videos, and a mixture of both types of content. A benchmark against state-of-the-art methods was also carried out, comparing the proposed SVM method with deep learning approaches, namely Convolutional Neural Networks (CNN). Although the CNN outperformed the proposed DFT-SVM compound method, the competitiveness of the results attained by DFT-SVM and its substantially lower processing time make it appropriate to be embedded into Autopsy modules, where it predicts the level of fakeness of each analyzed multimedia file.
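A simplified sketch in the spirit of the DFT feature extraction described above: compute the magnitude spectrum of each pixel row with a naive 1-D DFT, average across rows, and use the resulting vector as the feature input to a classifier such as an SVM. The actual pipeline works on full transforms of photos and video frames; this row-wise reduction is an illustrative assumption:

```python
import cmath

def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform, returning |X[k]| for each bin."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def frame_features(frame):
    """frame: list of equal-length rows of grayscale pixel values.
    Returns the per-bin magnitude spectrum averaged over all rows."""
    spectra = [dft_magnitudes(row) for row in frame]
    n_rows, n_bins = len(spectra), len(spectra[0])
    return [sum(s[k] for s in spectra) / n_rows for k in range(n_bins)]
```

The intuition behind such features is that generative manipulation pipelines leave characteristic artifacts in the high-frequency part of the spectrum, which a lightweight classifier can pick up far faster than a CNN processes raw pixels.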
  • Usability of Smartbands by the Elderly Population in the Context of Ambient Assisted Living Applications
    Publication . Correia, Luís; Fuentes, Daniel; Ribeiro, José; Costa, Nuno; Reis, Arsénio; Rabadão, Carlos; Barroso, João; Pereira, António
    Nowadays, the Portuguese population is aging at a fast pace. The situation is more severe in the interior regions of the country, where the rural areas have few people and have been constantly losing population; these are mostly elderly who, in some cases, live socially isolated. They are also often deprived of some types of social, health and technological services. One of the current challenges with respect to the elderly is that of improving the quality of life for those who still have some autonomy and live in their own residences, so that they may continue living autonomously while receiving the assistance of exterior monitoring and supporting services. The Internet of Things (IoT) paradigm demonstrates great potential for creating technological solutions in this area, as it aims to seamlessly integrate information technology with the daily lives of people. In this context, it is necessary to develop services that monitor the activity and health of the elderly in real time and alert caregivers or other family members in the case of an unusual event or behaviour. It is crucial that the technological system is able to collect data in a nonintrusive manner and without requiring much interaction with the elderly. Smartband devices are very good candidates for this purpose and, therefore, this work proposes assessing the level of acceptance of smartbands by senior users in their daily activities. Through the definition of an architecture and the development of a prototype, it was possible to test the level of acceptance of smartbands by a sample of the elderly population, with surprising results from both the elderly and the caregivers, which constitutes an important contribution to the research field of Ambient Assisted Living (AAL). The evaluation showed that most users did not feel that the smartband was intrusive to their daily tasks and even considered using it in the future, while caregivers considered that the platform was very intuitive.
  • A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms
    Publication . Costa, Rogério Luís de C.; Moreira, José; Pintor, Paulo; Santos, Veronica dos; Lifschitz, Sérgio
    Many research works deal with big data platforms aimed at data science and analytics. These are complex and usually distributed environments, composed of several systems and tools. As expected, there is a need for a closer look at performance issues. In this work, we review performance tuning strategies in the big data environment. We focus on data-driven tuning techniques, discussing the use of database-inspired approaches. Concerning big data and NoSQL stores, performance tuning issues are quite different from those of so-called conventional systems. Many existing solutions are mostly ad hoc activities that do not generalize across situations, but some categories of data-driven solutions can be taken as guidelines and incorporated into general-purpose auto-tuning modules for big data systems. We examine typical performance tuning actions, discussing available solutions to support some of the tuning process's primary activities. We also discuss recent implementations of data-driven performance tuning solutions for big data platforms. We propose an initial classification based on the domain's state of the art and present selected tuning actions for large-scale data processing systems. Finally, we organize existing works on self-tuning big data systems according to this classification and present general and system-specific tuning recommendations. We found that most works in the literature evaluate tuning actions from the physical design perspective, and that there is a lack of self-tuning machine-learning-based solutions for big data systems.
  • A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing
    Publication . Ferreira, Sara; Antunes, Mário; Correia, Manuel E.
    Deepfake and manipulated digital photos and videos are being increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, in which tampered multimedia content has been the primordial disseminating vehicle. Digital forensic analysis tools are widely used by criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need to employ efficient digital forensics techniques grounded on state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods to improve the automatic detection of manipulated multimedia content. However, such methods have not yet been widely incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of the datasets are crucial to benchmark ML models and to evaluate their appropriateness for real-world digital forensics applications. An example is the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos, which are part of existing state-of-the-art datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository, and the total amount of photos and video frames is 40,588 and 12,400, respectively. The dataset was validated and benchmarked with deep learning Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) methods; however, a plethora of other existing methods can be applied. Overall, the results show a better F1-score for CNN compared with SVM, both for photo and video processing. CNN achieved an F1-score of 0.9968 and 0.8415 for photos and videos, respectively. Regarding SVM, the results obtained with 5-fold cross-validation are 0.9953 and 0.7955, respectively, for photo and video processing. A set of methods written in Python is available to researchers, namely to preprocess and extract the features from the original photo and video files and to build the training and testing sets. Additional methods are also available to convert the original PKL files into CSV and TXT, which gives ML researchers more flexibility to use the dataset on existing ML frameworks and tools.
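A hedged sketch of the kind of PKL-to-CSV conversion the dataset tooling provides: each pickled entry is assumed to be a (label, feature_vector) pair, written out as one CSV row of `label, f0, f1, ...`. The function name and the pickled layout are illustrative assumptions, not the repository's actual API:

```python
import csv
import pickle

def pkl_to_csv(pkl_path, csv_path):
    """Convert a pickled list of (label, features) entries to a CSV file."""
    with open(pkl_path, "rb") as fh:
        entries = pickle.load(fh)            # [(label, [f0, f1, ...]), ...]
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for label, features in entries:
            writer.writerow([label] + list(features))
```

The CSV form can then be loaded directly by common ML frameworks (e.g. pandas or scikit-learn pipelines) without depending on Python's pickle format.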