Publication

Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding

dc.contributor.authorPereira, Pedro M. M.
dc.contributor.authorDomingues, Patrício
dc.contributor.authorM. M. Rodrigues, Nuno
dc.contributor.authorGabriel Falcao
dc.contributor.authorFaria, Sergio M. M. de
dc.date.accessioned2025-07-16T15:55:24Z
dc.date.available2025-07-16T15:55:24Z
dc.date.issued2016
dc.description.abstractAlthough OpenCL aims to achieve portability at the code level, different hardware platforms require different approaches in order to extract the best performance for OpenCL-based code. In this work, we use an image encoder originally tuned for OpenCL on GPU (OpenCL-GPU), and optimize it for multi-CPU-based platforms. We produce two OpenCL-based versions: i) a regular one (OpenCL-CPU) and ii) a CPU vector-based one (OpenCL-CPU-Vect). The use of CPU vectorization exploits the OpenCL support, making it much simpler than directly coding with SIMD instructions such as SSE and AVX. Globally, while the OpenCL-GPU version is the fastest when run on a high-end GPU, requiring around 580 seconds to encode the Lenna image, its performance drops roughly 65% when run unchanged on a multicore CPU machine. For the CPU-tuned versions, OpenCL-CPU encodes the Lenna image in 805 seconds, while the vectorization-based approach executes the same operation in 672 seconds. Results show that meaningful performance gains can be achieved by tailoring the OpenCL code to the CPU, and that the use of CPU vectorization instructions through OpenCL is both rather simple and performance rewarding.eng
dc.description.sponsorshipFinancial support provided in the scope of R&D Unit 50008, financed by the applicable financial framework (FCT/MEC through national funds and co-funded by FEDER - PT2020 partnership agreement).
dc.identifier.citationPereira, P.M.M., Domingues, P., Rodrigues, N.M.M., Falcao, G., de Faria, S.M.M. (2016). Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_42
dc.identifier.doi10.1007/978-3-319-49583-5_42
dc.identifier.isbn9783319495828
dc.identifier.isbn9783319495835
dc.identifier.issn0302-9743
dc.identifier.issn1611-3349
dc.identifier.urihttp://hdl.handle.net/10400.8/13676
dc.language.isoeng
dc.peerreviewedn/a
dc.publisherSpringer International Publishing
dc.relation.hasversionhttps://www.worldscientific.com/doi/abs/10.1142/S0218196716500090
dc.relation.ispartofLecture Notes in Computer Science
dc.relation.ispartofAlgorithms and Architectures for Parallel Processing
dc.rights.uriN/A
dc.subjectOpenCL
dc.subjectMulticore
dc.subjectManycore
dc.subjectImage encoding
dc.subjectSIMD
dc.titleOptimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Codingeng
dc.typebook part
dspace.entity.typePublication
oaire.citation.endPage545
oaire.citation.startPage537
oaire.citation.titleAlgorithms and Architectures for Parallel Processing
oaire.versionhttp://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyNameDomingues
person.familyNameM. M. Rodrigues
person.familyNameFaria
person.givenNamePatrício
person.givenNameNuno
person.givenNameSergio
person.identifier.ciencia-idAA15-6185-C477
person.identifier.ciencia-id6917-B121-4E34
person.identifier.ciencia-id8815-4101-28DD
person.identifier.orcid0000-0002-6207-6292
person.identifier.orcid0000-0001-9536-1017
person.identifier.orcid0000-0002-0993-9124
person.identifier.ridC-5245-2011
person.identifier.scopus-author-id13411315400
person.identifier.scopus-author-id7006052345
person.identifier.scopus-author-id14027853900
relation.isAuthorOfPublicationb88ada5f-0d8b-4e55-ab0a-62aa82ea1388
relation.isAuthorOfPublicationb4ebe652-7f0e-4e67-adb0-d5ea29fc9e69
relation.isAuthorOfPublicationf69bd4d6-a6ef-4d20-8148-575478909661
relation.isAuthorOfPublication.latestForDiscoveryb88ada5f-0d8b-4e55-ab0a-62aa82ea1388

Files

Main
Showing 1 - 1 of 1
Name:
284.pdf
Size:
185.19 KB
Format:
Adobe Portable Document Format
License
Showing 1 - 1 of 1
Name:
license.txt
Size:
1.32 KB
Format:
Item-specific license agreed upon to submission
Description: