Publication

A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms

datacite.subject.fos: Social Sciences::Economics and Management
datacite.subject.fos: Natural Sciences::Computer and Information Sciences
datacite.subject.sdg: 08:Decent Work and Economic Growth
datacite.subject.sdg: 09:Industry, Innovation and Infrastructure
datacite.subject.sdg: 10:Reduced Inequalities
dc.contributor.author: Costa, Rogério Luís de C.
dc.contributor.author: Moreira, José
dc.contributor.author: Pintor, Paulo
dc.contributor.author: Santos, Veronica dos
dc.contributor.author: Lifschitz, Sérgio
dc.date.accessioned: 2026-02-20T16:48:34Z
dc.date.available: 2026-02-20T16:48:34Z
dc.date.issued: 2021-07-15
dc.description: Former faculty member
dc.description.abstract [eng]: Many research works deal with big data platforms looking forward to data science and analytics. These are complex and usually distributed environments, composed of several systems and tools. As expected, there is a need for a closer look at performance issues. In this work, we review performance tuning strategies in the big data environment. We focus on data-driven tuning techniques, discussing the use of database inspired approaches. Concerning big data and NoSQL stores, performance tuning issues are quite different from the so-called conventional systems. Many existing solutions are mostly ad-hoc activities that do not fit for multiple situations. But there are some categories of data-driven solutions that can be taken as guidelines and incorporated into general-purpose auto-tuning modules for big data systems. We examine typical performance tuning actions, discussing available solutions to support some of the tuning process's primary activities. We also discuss recent implementations of data-driven performance tuning solutions for big data platforms. We propose an initial classification based on the domain state-of-the-art and present selected tuning actions for large-scale data processing systems. Finally, we organized existing works towards self-tuning big data systems based on this classification and presented general and system-specific tuning recommendations. We found that most of the literature pieces evaluate the use of tuning actions at the physical design perspective, and there is a lack of self-tuning machine-learning-based solutions for big data systems.
dc.description.sponsorship: This work is partially funded by National Funds through the FCT (Foundation for Science and Technology) in the context of the projects UIDB/04524/2020 and UIDB/00127/2020, and by Fundo Europeu de Desenvolvimento Regional (FEDER), Programa Operacional Competitividade e Internacionalização in the context of the projects POCI-01-0145-FEDER-032636 and Produtech II SIF – POCI-01-0247-FEDER-024541. Some of the authors are partially supported by grants from CNPq and CAPES, Brazilian public funding agencies and research institutes.
dc.identifier.citation: Rogério Luís de C. Costa, José Moreira, Paulo Pintor, Veronica dos Santos, Sérgio Lifschitz, A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms, Big Data Research, Volume 25, 2021, 100206, ISSN 2214-5796, https://doi.org/10.1016/j.bdr.2021.100206.
dc.identifier.doi: 10.1016/j.bdr.2021.100206
dc.identifier.eissn: 2214-580X
dc.identifier.issn: 2214-5796
dc.identifier.uri: http://hdl.handle.net/10400.8/15691
dc.language.iso: eng
dc.peerreviewed: yes
dc.publisher: Elsevier
dc.relation: Research Center in Informatics and Communications
dc.relation.hasversion: https://www.sciencedirect.com/science/article/pii/S221457962100023X
dc.relation.ispartof: Big Data Research
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Big data systems
dc.subject: Big data platforms
dc.subject: Performance tuning
dc.subject: Database systems
dc.title [eng]: A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Research Center in Informatics and Communications
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04524%2F2020/PT
oaire.citation.endPage: 17
oaire.citation.startPage: 1
oaire.citation.title: Big Data Research
oaire.citation.volume: 25
oaire.fundingStream: 6817 - DCRRNI ID
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyName: de Carvalho Costa
person.givenName: Rogério Luís
person.identifier.ciencia-id: 7717-9573-0C0F
person.identifier.orcid: 0000-0003-2306-7585
person.identifier.rid: A-7940-2016
person.identifier.scopus-author-id: 7801604983
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
relation.isAuthorOfPublication: 5654d934-3fa0-4afb-9b3b-f2736104924c
relation.isAuthorOfPublication.latestForDiscovery: 5654d934-3fa0-4afb-9b3b-f2736104924c
relation.isProjectOfPublication: 67435020-fe0d-4b46-be85-59ee3c6138c7
relation.isProjectOfPublication.latestForDiscovery: 67435020-fe0d-4b46-be85-59ee3c6138c7

Files

Main
Name:
A Survey on Data-driven Performance Tuning for Big Data Analytics Platforms.pdf
Size:
965.67 KB
Format:
Adobe Portable Document Format
Description:
License
Name:
license.txt
Size:
1.32 KB
Format:
Item-specific license agreed upon to submission
Description: