Browsing by Author "Costa, Joana"
- Adaptive learning for dynamic environments: A comparative approach
  Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Nowadays most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning settings to learn in such environments, especially because of their non-trivial nature, since changes occur between the data distribution used to define the model and the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework to tackle adaptive learning in dynamic environments based on recent and retained knowledge. DARK handles an ensemble of multiple Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results revealed that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
- Choice of Best Samples for Building Ensembles in Dynamic Environments
  Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Machine learning approaches often focus on optimizing the algorithm rather than assuring that the source data is as rich as possible. However, when it is possible to enhance the input examples to construct models, one should consider it thoroughly. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data is constantly appearing, old data is usually disregarded, but sometimes some of those disregarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than others, are kept for a longer time-window than the rest. Results on a Twitter scenario show that keeping those examples enhances the final classification performance.
- Functional health literacy: psychometric properties of the newest vital sign for Portuguese adolescents (NVS-PTeen)
  Publication . Santos, Osvaldo; Stefanovska-Petkovska, Miodraga; Virgolino, Ana; Miranda, Ana Cristina; Costa, Joana; Fernandes, Elisabete; Cardoso, Susana; Carneiro, António Vaz
  Self-management of health requires skills to obtain, process, understand, and use health-related information. Assessment of adolescents’ functional health literacy requires valid, reliable, and low-burden tools. The main objective of this study was to adapt and study the psychometric properties of the Newest Vital Sign for the Portuguese adolescents’ population (NVS-PTeen). Classic psychometric indicators of reliability and validity were combined with item response theory (IRT) analyses in a cross-sectional survey, complemented with a 3-month test-retest assessment. The NVS-PTeen was self-administered to students enrolled in grades 8 to 12 (12 to 17 years old) in a school setting. Overall, 386 students (191 girls) from 16 classes of the same school participated in the study (mean age = 14.5; SD = 1.5). Internal reliability of the NVS-PTeen was 0.60. The NVS-PTeen total score was positively and significantly correlated with Portuguese (r = 0.28) and mathematics scores (r = 0.31), school years (r = 0.31), and age (r = 0.19). Similar to the original scale (for the U.S.), the NVS-PTeen is composed of two dimensions, reading-related literacy and numeracy. Temporal reliability is adequate, though with a learning effect. IRT analyses revealed differences in difficulty and discriminative capacity among items, all with adequate outfit and infit values. Results showed that the NVS-PTeen is valid and reliable, sensitive to inter-individual educational differences, and adequate for regular screening of functional health literacy in adolescents.
- Harvesting opinions in Twitter for sentiment analysis
  Publication . Guevara, Juan; Costa, Joana; Arroba, Jorge; Silva, Catarina
  Sentiment analysis is a very popular technique for social network analysis. Twitter is one of the most popular microblogging social networks and has grown considerably; it allows people to express their opinions using short, simple sentences. These texts are generated daily, and for this reason people commonly want to know which topics are trending and how they drift. In this paper we propose to deploy a mobile app that provides information focusing on areas such as Politics, Social, Tourism, and Marketing using a statistical lexicon approach. The application shows the polarity of each theme as positive, negative, or neutral.
- The impact of longstanding messages in micro-blogging classification
  Publication . Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete
  Social networks are part of the daily routine of millions of users. Twitter is, along with Facebook and Instagram, one of the most used, and can be seen as a relevant source of information, as users not only share daily status updates but also rapidly propagate news and events that occur worldwide. Considering the dynamic nature of social networks, and their potential for spreading information, it is imperative to find learning strategies able to learn in these environments and cope with their dynamic nature. Time plays an important role because it easily outdates information, so it is crucial to understand how informative past events can be to current learning models and for how long it is relevant to store previously seen information, to avoid the computational burden associated with the amount of data produced. In this paper we study the impact of longstanding messages in micro-blogging classification by using different training time-window sizes in the learning process. Since there are few studies dealing with drift in Twitter and thus little is known about the types of drift that may occur, we simulate different types of drift in an artificial dataset to evaluate and validate our strategy. Results shed light on the relevance of previously seen examples according to different types of drift.
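
The time-window idea in the last entry above can be illustrated with a small, self-contained sketch. It is not the authors' method or data: the one-dimensional artificial stream, the abrupt drift, the nearest-centroid learner, and the names `make_batch`, `fit_threshold`, `accuracy`, and `evaluate` are all assumptions chosen only to show how the size of the training window changes what a model learns after a drift.

```python
import random
from statistics import mean

def make_batch(rng, centres, size=50):
    """One batch of (x, y) pairs; class y is drawn around centres[y] (assumed toy data)."""
    return [(rng.gauss(centres[i % 2], 0.5), i % 2) for i in range(size)]

def fit_threshold(examples):
    """Nearest-centroid rule in one dimension: split halfway between the class means."""
    neg = mean(x for x, y in examples if y == 0)
    pos = mean(x for x, y in examples if y == 1)
    return (neg + pos) / 2

def accuracy(threshold, examples):
    """Fraction of examples on the correct side of the threshold."""
    return mean(int((x > threshold) == (y == 1)) for x, y in examples)

def evaluate(window, n_batches=20, drift_at=10, seed=0):
    """Train on the last `window` batches, test on the next one,
    and average the accuracy over the batches that follow the drift."""
    rng = random.Random(seed)
    # Abrupt drift (assumption): both class centres shift by +2 at batch `drift_at`.
    batches = [
        make_batch(rng, (0.0, 2.0) if t < drift_at else (2.0, 4.0))
        for t in range(n_batches)
    ]
    scores = []
    for t in range(drift_at + 1, n_batches):
        # Keep only the most recent `window` batches as training data.
        train = [ex for batch in batches[max(0, t - window):t] for ex in batch]
        scores.append(accuracy(fit_threshold(train), batches[t]))
    return mean(scores)

if __name__ == "__main__":
    for w in (2, 5, 20):
        print(f"window={w:2d}  mean accuracy after drift={evaluate(w):.3f}")
```

In this toy setting a short window forgets the outdated concept quickly, while a window that keeps every batch stays anchored to pre-drift examples. Other kinds of drift, such as gradual or recurring ones, may well favour longer windows, which is the kind of question the entry above investigates.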