Browse by author "Costa, Joana"
Showing 1 - 10 of 14
- Active Manifold Learning with Twitter Big Data. Publication. Silva, Catarina; Antunes, Mário; Costa, Joana; Ribeiro, Bernardete. The data produced by Internet applications have increased substantially. Big data is a fast-growing field that deals with this deluge of data by using storage techniques, dedicated infrastructures and development frameworks for the parallelization of defined tasks and their consequent reduction. These solutions, however, fall short in online and highly data-demanding scenarios, since users expect swift feedback. Reduction techniques are used efficiently in big data online applications to improve classification. Reduction in big data usually falls into one of two main methods: (i) reducing the dimensionality by pruning or reformulating the feature set; (ii) reducing the sample size by choosing the most relevant examples. Both approaches have benefits, not only in the time needed to build a model but potentially also in performance, usually by reducing overfitting and improving generalization capabilities. In this paper we investigate reduction techniques that tackle both the dimensionality and the size of big data. We propose a framework that combines a manifold learning approach to reduce dimensionality and an active learning SVM-based strategy to reduce the size of the labeled sample. Results on Twitter data show the potential of the proposed active manifold learning approach.
- Adaptive learning for dynamic environments: A comparative approach. Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete. Nowadays most learning problems demand adaptive solutions. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. Various efforts have been pursued in machine learning settings to learn in such environments, especially because of their non-trivial nature, since changes occur between the distribution of the data used to define the model and the current environment. In this work we present the Drift Adaptive Retain Knowledge (DARK) framework to tackle adaptive learning in dynamic environments based on recent and retained knowledge. DARK handles an ensemble of multiple Support Vector Machine (SVM) models that are dynamically weighted and have distinct training window sizes. A comparative study with benchmark solutions in the field, namely the Learn++.NSE algorithm, is also presented. Experimental results revealed that DARK outperforms Learn++.NSE with two different base classifiers, an SVM and a Classification and Regression Tree (CART).
- Association between living setting and malnutrition among older adults: The PEN-3S study. Publication. Madeira, Teresa; Peixoto-Plácido, Catarina; Sousa-Santos, Nuno; Santos, Osvaldo; Costa, Joana; Alarcão, Violeta; Nicola, Paulo Jorge; Severo, Milton; Lopes, Carla; Clara, João Gorjão. Objectives: Malnutrition is frequent among older adults, especially those living in nursing homes, but the association between residential setting and nutritional status is controversial. The aim of this study was to examine the association between living setting (nursing home versus community) and malnutrition while adjusting for demographic, socioeconomic, health-related, and psychosocial factors. Methods: This cross-sectional study included a randomly selected representative sample of Portuguese adults ≥65 y of age. Interviewers collected data regarding demographic and socioeconomic characteristics, nutritional status, physical activity, energy intake, cognitive function, self-reported general health, functional status, symptoms of depression, and loneliness. Logistic regression models were used to estimate the association between residential setting and malnutrition. Results: Participants were 1186 nursing home residents (72.8% women, 49.2% ≥85 y of age) and 1120 community dwellers (49% women, 21.3% ≥85 y of age). Following Mini Nutritional Assessment (MNA®) criteria, 29.6% of nursing home residents and 14.1% of community dwellers were at risk of malnutrition, whereas 2.3% and 0.3%, respectively, were malnourished. The living setting was not significantly associated with malnutrition after adjusting for functional status, symptoms of depression, and feelings of loneliness (odds ratio, 1.03; 95% confidence interval, 0.67–1.58). Conclusions: Risk of malnutrition and malnutrition are more prevalent among nursing home residents than community dwellers. Physical health (functional status) and mental health (symptoms of depression and loneliness) seem more relevant to nutritional status than residential setting by itself. These findings should be taken into account when designing public health policies to tackle malnutrition among older adults.
- Boosting dynamic ensemble’s performance in Twitter. Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete. Many text classification problems in social networks, and other contexts, are also dynamic problems, where concepts drift through time and meaningful labels are dynamic. In Twitter-based applications in particular, ensembles are often applied to problems that fit this description, for example sentiment analysis or adapting to drifting circumstances. While it can be straightforward to request different classifiers' input on such ensembles, our goal is to boost dynamic ensembles by combining performance metrics as efficiently as possible. We present a twofold performance-based framework to classify incoming tweets based on recent tweets. On the one hand, the performance of individual ensemble classifiers is paramount in defining their contribution to the ensemble. On the other hand, examples are actively selected based on their ability to effectively contribute to the performance in classifying drifting concepts. The main step of the algorithm uses different performance metrics to determine both each classifier's strength in the ensemble and each example's importance, and hence lifetime, in the learning process. We demonstrate, on a drifted benchmark dataset, that our framework improves classification performance considerably, enough to make a difference in a variety of applications.
- Choice of Best Samples for Building Ensembles in Dynamic Environments. Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete. Machine learning approaches often focus on optimizing the algorithm rather than assuring that the source data is as rich as possible. However, when it is possible to enhance the input examples to construct models, one should consider it thoroughly. In this work, we propose a technique to define the best set of training examples using dynamic ensembles in text classification scenarios. In dynamic environments, where new data is constantly appearing, old data is usually disregarded, but sometimes some of those disregarded examples may carry substantial information. We propose a method that determines the most relevant examples by analysing their behaviour when defining separating planes or thresholds between classes. Those examples, deemed better than others, are kept for a longer time-window than the rest. Results on a Twitter scenario show that keeping those examples enhances the final classification performance.
- CrowdTargeting: Making Crowds More Personal. Publication. Costa, Joana; Silva, Catarina; Ribeiro, Bernardete; Antunes, Mário. Crowdsourcing is a burgeoning research topic that has the potential to be applied in numerous online and social scenarios. It consists of obtaining services or information by soliciting contributions from a large group of people. However, the question of defining the appropriate scope of a crowd to tackle each scenario is still open. In this work we compare two approaches to define the scope of a crowd in a classification problem, cast as a recommendation system. We propose a similarity measure to determine the closeness of a specific user to each crowd contributor and hence to define the appropriate crowd scope. We compare different levels of customization using crowd-based information, allowing non-expert classification by crowds to be tuned to substitute the user profile definition. Results on a real recommendation data set show the potential of making crowds more personal, i.e. of tuning the crowd to the crowdtarget.
- Customized crowds and active learning to improve classification. Publication. Costa, Joana; Silva, Catarina; Antunes, Mário; Ribeiro, Bernardete. Traditional classification algorithms can be limited in their performance when a specific user is targeted. User preferences, e.g. in recommendation systems, constitute a challenge for learning algorithms. Additionally, in recent years user interaction through crowdsourcing has drawn significant interest, although it is still underused in learning settings. In this work we focus on an active strategy that uses crowd-based non-expert information to appropriately tackle the problem of capturing the drift between user preferences in a recommendation system. The proposed method combines two main ideas: applying active strategies for adaptation to each user, and implementing crowdsourcing to avoid excessive user feedback. A similitude technique is put forward to optimize the choice of the most appropriate crowd in terms of similitude, under the guidance of basic user feedback. The proposed active learning framework allows non-expert classification performed by crowds to be used to define the user profile, mitigating the labeling effort normally requested of the user. The framework is designed to be generic and suitable for different scenarios, whilst customizable for each specific user. A case study on a humor classification scenario is used to demonstrate experimentally that the approach can improve on baseline active learning results.
- DOTS: Drift Oriented Tool System. Publication. Antunes, Mário; Costa, Joana; Silva, Catarina; Ribeiro, Bernardete. Drift is a given in most machine learning applications. The idea that models must accommodate changes, and thus be dynamic, is ubiquitous. Current challenges include temporal data streams, drift and non-stationary scenarios, often with text data, whether in social networks or in business systems. There are multiple types of drift patterns: concepts that appear and disappear suddenly, recurrently, or even gradually or incrementally. Researchers strive to propose and test algorithms and techniques to deal with drift in text classification, but it is difficult to find adequate benchmarks in such dynamic environments. In this paper we present DOTS, Drift Oriented Tool System, a framework that allows for the definition and generation of text-based datasets where drift characteristics can be thoroughly defined, implemented and tested. The usefulness of DOTS is presented using a Twitter stream case study. DOTS is used to define datasets and test the effectiveness of using different document representations in a Twitter scenario. Results show the potential of DOTS in machine learning research.
- Framework for Intelligent Swimming Analytics with Wearable Sensors for Stroke Classification. Publication. Costa, Joana; Silva, Catarina; Santos, Miguel; Fernandes, Telmo; Faria, Sérgio. Intelligent approaches in sports that use IoT devices to gather data, attempting to optimize athletes’ training and performance, are cutting-edge research. Synergies between recent wearable hardware and wireless communication strategies, together with the advances in intelligent algorithms, which are able to perform online pattern recognition and classification with seamless results, are at the front line of high-performance sports coaching. In this work, an intelligent data analytics system for swimmer performance is proposed. The system includes (i) pre-processing of raw signals; (ii) feature representation of wearable sensors and biosensors; (iii) online recognition of the swimming style and turns; and (iv) post-analysis of the performance for coaching decision support, including stroke counting and average speed. The system is supported by wearable inertial sensors (AHRS) and biosensors (heart rate and pulse oximetry) placed on a swimmer’s body. Radio-frequency links are employed to communicate with the heart rate sensor and the station in the vicinity of the swimming pool, where analytics is carried out. Experiments were carried out in a real training setup, including 10 athletes aged 15 to 17 years. This scenario resulted in a set of circa 8000 samples. The experimental results show that the proposed system for intelligent swimming analytics with wearable sensors effectively yields immediate feedback to coaches and swimmers based on real-time data analysis. The best result was achieved with a Random Forest classifier with a macro-averaged F1 of 95.02%. The benefit of the proposed framework was demonstrated by effectively supporting coaches while monitoring the training of several swimmers.
- Functional health literacy: psychometric properties of the newest vital sign for Portuguese adolescents (NVS-PTeen). Publication. Santos, Osvaldo; Stefanovska-Petkovska, Miodraga; Virgolino, Ana; Miranda, Ana Cristina; Costa, Joana; Fernandes, Elisabete; Cardoso, Susana; Carneiro, António Vaz. Self-management of health requires skills to obtain, process, understand, and use health-related information. Assessment of adolescents’ functional health literacy requires valid, reliable, and low-burden tools. The main objective of this study was to adapt and study the psychometric properties of the Newest Vital Sign for the Portuguese adolescent population (NVS-PTeen). Classic psychometric indicators of reliability and validity were combined with item response theory (IRT) analyses in a cross-sectional survey, complemented with a 3-month test-retest assessment. The NVS-PTeen was self-administered to students enrolled in grades 8 to 12 (12 to 17 years old) in a school setting. Overall, 386 students (191 girls) from 16 classes of the same school participated in the study (mean age = 14.5; SD = 1.5). Internal reliability of the NVS-PTeen was = 0.60. The NVS-PTeen total score was positively and significantly correlated with Portuguese (r = 0.28) and mathematics scores (r = 0.31), school years (r = 0.31), and age (r = 0.19). Similar to the original scale (for the U.S.), the NVS-PTeen is composed of two dimensions, reading-related literacy and numeracy. Temporal reliability is adequate, though with a learning effect. IRT analyses revealed differences in difficulty and discriminative capacity among items, all with adequate outfit and infit values. Results showed that the NVS-PTeen is valid and reliable, sensitive to inter-individual educational differences, and adequate for regular screening of functional health literacy in adolescents.
