32 results
Search results
Showing 1 - 10 of 32
- Adaptive learning for dynamic environments: A comparative approach
  Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Nowadays, most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning settings to learn in such environments, especially because of their non-trivial nature, since changes occur between the data distribution used to define the model and the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework to tackle adaptive learning in dynamic environments based on recent and retained knowledge. DARK handles an ensemble of multiple Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results revealed that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
- Improving Text Classification Performance with Incremental Background Knowledge
  Publication. Silva, Catarina; Ribeiro, Bernardete
  Text classification is generally the process of extracting interesting and non-trivial information and knowledge from text. One of the main problems with text classification systems is the lack of labeled data, as well as the cost of labeling unlabeled data. Thus, there is a growing interest in exploring the use of unlabeled data as a way to improve classification performance in text classification. The ready availability of this kind of data in most applications makes it an appealing source of information. In this work we propose an Incremental Background Knowledge (IBK) technique to introduce unlabeled data into the training set by expanding it using initial classifiers to deliver oracle decisions. The defined incremental SVM margin-based method was tested on the Reuters-21578 benchmark, showing promising results.
- High-performance bankruptcy prediction model using Graphics Processing Units
  Publication. Ribeiro, Bernardete; Lopes, Noel; Silva, Catarina
  In recent years the potential and programmability of Graphics Processing Units (GPU) have raised noteworthy interest in the research community for applications that demand high computational power. In particular, in financial applications containing thousands of high-dimensional samples, machine learning techniques such as neural networks are often used. One of their main limitations is that the learning phase can be extremely time-consuming due to the long training times required, which constitute a hard bottleneck for their use in practice. Thus their implementation in graphics hardware is highly desirable as a way to speed up the training process. In this paper we present a bankruptcy prediction model based on the parallel implementation of the Multiple BackPropagation (MBP) algorithm, which is tested on a real data set of French companies (healthy and bankrupt). Results from running the MBP algorithm in a sequential CPU version and in a parallel GPU implementation show that the latter reduces computational costs while yielding very competitive performance.
- Financial distress model prediction using SVM+
  Publication. Ribeiro, Bernardete; Silva, Catarina; Vieira, Armando; Gaspar-Cunha, A.; Neves, João C. das
  Financial distress prediction is of great importance to all stakeholders in order to enable better decision-making in evaluating firms. In recent years, the rate of bankruptcy has risen and it is becoming harder to estimate as companies become more complex and the asymmetric information between banks and firms increases. Although a great variety of techniques have been applied over the years, no comprehensive method incorporating a holistic perspective had hitherto been considered. Recently, SVM+, a technique proposed by Vapnik [17], has provided a formal way to incorporate privileged information into the learning models, improving generalization. By exploiting additional information to improve traditional inductive learning, we propose a prediction model where data is naturally separated into several groups according to the size of the firm. Experimental results on a heterogeneous data set of French companies demonstrated that the proposed model showed superior performance in terms of prediction accuracy in bankruptcy prediction and misclassification cost.
- Enhanced default risk models with SVM+
  Publication. Ribeiro, Bernardete; Silva, Catarina; Chen, Ning; Vieira, Armando; Carvalho das Neves, João
  Default risk models have lately raised great interest due to the recent world economic crisis. In spite of the many advanced techniques that have been proposed, no comprehensive method incorporating a holistic perspective has hitherto been considered. Thus, the existing models for bankruptcy prediction lack full coverage of contextual knowledge, which may prevent decision makers such as investors and financial analysts from taking the right decisions. Recently, SVM+ has provided a formal way to incorporate additional information (not only training data) into the learning models, improving generalization. In financial settings, examples of such non-financial (though relevant) information are marketing reports, competitor landscape, economic environment, customer screening, industry trends, etc. By exploiting additional information able to improve classical inductive learning, we propose a prediction model where data is naturally separated into several structured groups clustered by the size and annual turnover of the firms. Experimental results on a heterogeneous data set of French companies demonstrated that the proposed default risk model showed better predictive performance than the baseline SVM and multi-task learning with SVM.
- DOTS: Drift Oriented Tool System
  Publication. Antunes, Mário; Costa, Joana; Silva, Catarina; Ribeiro, Bernardete
  Drift is a given in most machine learning applications. The idea that models must accommodate changes, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. There are multiple types of drift patterns: concepts that appear and disappear suddenly, recurrently, or even gradually or incrementally. Researchers strive to propose and test algorithms and techniques to deal with drift in text classification, but it is difficult to find adequate benchmarks in such dynamic environments. In this paper we present DOTS, Drift Oriented Tool System, a framework that allows for the definition and generation of text-based datasets where drift characteristics can be thoroughly defined, implemented and tested. The usefulness of DOTS is presented using a Twitter stream case study. DOTS is used to define datasets and test the effectiveness of using different document representations in a Twitter scenario. Results show the potential of DOTS in machine learning research.
- Assistive mobile software for public transportation
  Publication. Silva, João De Sousa E; Silva, Catarina; Marcelino, Luís; Ferreira, Rui; Pereira, António
  Mobility on public transport is essential for persons with visual impairment. While traveling on public transport, simply knowing the current location is almost impossible for such persons. To overcome this hurdle, we developed an assistive application that can alert its user to the proximity of all public transportation stops, giving emphasis to the chosen final stop. The application is adjustable to any transportation system and is particularly relevant for public transport that does not have any audio system available. The developed prototype runs on an Android OS device equipped with Global Positioning System (GPS). To ensure the highest possible level of reliability and to make it predictable to users, the application's architecture is free of as many dependencies as possible. Therefore, only GPS, or another localization mechanism, is required. The interface was designed to be suitable not only for TalkBack (Android's built-in screen reader), aimed at blind users, but also for people with low vision who can still use their sight to check the screen. Thus, it was meant to be graphically simple and unobtrusive. It was tested by visually impaired persons, leading to the conclusion that it addresses an existing need and opens a new perspective on public transportation accessibility.
- On using crowdsourcing and active learning to improve classification performance
  Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Crowdsourcing is an emergent trend for general-purpose classification problem solving. Over the past decade, this notion has been embodied by enlisting a crowd of humans to help solve problems. There is a growing number of real-world projects that take advantage of this technique, such as Wikipedia, Linux or Amazon Mechanical Turk. In this paper, we evaluate its suitability for classification, namely whether it can outperform state-of-the-art models when combined with active learning techniques. We propose two approaches based on crowdsourcing and active learning and empirically evaluate the performance of a baseline Support Vector Machine when active learning examples are chosen and made available for classification to a crowd in a web-based scenario. The proposed crowdsourcing active learning approach was tested with the Jester data set, a text humour classification benchmark, resulting in promising improvements over baseline results.
- Citizens@City Mobile Application for Urban Problem Reporting
  Publication. Ribeiro, António Miguel; Costa, Rui Pedro; Marcelino, Luís; Silva, Catarina
  Urban problems, such as holes in the pavement, poor wheelchair access or lack of public lighting, are becoming pervasive. Although most of these problems directly affect quality of life and sometimes even safety, not everyone has the readiness or initiative to report them to the proper authorities. This makes these “black spots” difficult to identify and the repair process slow. Citizens@City is an Android mobile application that allows the general population to play a more active role in the identification of these problems by reporting them to the proper authorities in a simple and fast way. Moreover, citizens will be able to follow the identification and repair processes and know the status of a problem at any given moment (e.g. identified, repair scheduled, solved). Additionally, it will also allow the proper authorities to identify and manage the reported problems, from their identification until they are solved.
- Distributed Text Classification With an Ensemble Kernel-Based Learning Approach
  Publication. Silva, Catarina; Lotric, Uros; Ribeiro, Bernardete; Dobnikar, Andrej
  Constructing a single text classifier that excels in any given application is a rather unfeasible goal. As a result, ensemble systems are becoming an important resource, since they permit the use of simpler classifiers and the integration of different knowledge in the learning process. However, many text-classification ensemble approaches have an extremely high computational burden, which poses limitations in applications in real environments. Moreover, state-of-the-art kernel-based classifiers, such as support vector machines and relevance vector machines, demand large resources when applied to large databases. Therefore, we propose the use of a new systematic distributed ensemble framework to tackle these challenges, based on a generic deployment strategy in a cluster distributed environment. We employ a combination of both task and data decomposition of the text-classification system, based on partitioning, communication, agglomeration, and mapping to define and optimize a graph of dependent tasks. Additionally, the framework includes an ensemble system where we exploit diverse patterns of errors and gain from the synergies between the ensemble classifiers. The ensemble data partitioning strategy used is shown to improve the performance of baseline state-of-the-art kernel-based machines. The experimental results show that the proposed framework outperforms standard methods in both speed and classification performance.
