Publication
Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding
| dc.contributor.author | Pereira, Pedro M. M. | |
| dc.contributor.author | Domingues, Patrício | |
| dc.contributor.author | M. M. Rodrigues, Nuno | |
| dc.contributor.author | Gabriel Falcao | |
| dc.contributor.author | Faria, Sergio M. M. de | |
| dc.date.accessioned | 2025-07-16T15:55:24Z | |
| dc.date.available | 2025-07-16T15:55:24Z | |
| dc.date.issued | 2016 | |
| dc.description.abstract | Although OpenCL aims to achieve portability at the code level, different hardware platforms require different approaches in order to extract the best performance from OpenCL-based code. In this work, we use an image encoder originally tuned for OpenCL on GPU (OpenCL-GPU), and optimize it for multi-CPU based platforms. We produce two OpenCL-based versions: i) a regular one (OpenCL-CPU) and ii) a CPU vector-based one (OpenCL-CPU-Vect). The use of CPU vectorization exploits the OpenCL support, making it much simpler than directly coding with SIMD instructions such as SSE and AVX. Globally, while the OpenCL-GPU version is the fastest when run on a high-end GPU, requiring around 580 seconds to encode the Lenna image, its performance drops roughly 65% when run unchanged on a multicore CPU machine. For the CPU-tuned versions, OpenCL-CPU encodes the Lenna image in 805 seconds, while the vectorization-based approach executes the same operation in 672 seconds. Results show that meaningful performance gains can be achieved by tailoring the OpenCL code to the CPU, and that the use of CPU vectorization instructions through OpenCL is both rather simple and performance rewarding. | eng |
| dc.description.sponsorship | Financial support provided in the scope of R&D Unit 50008, financed by the applicable financial framework (FCT/MEC through national funds and co-funded by FEDER - PT2020 partnership agreement). | |
| dc.identifier.citation | Pereira, P.M.M., Domingues, P., Rodrigues, N.M.M., Falcao, G., de Faria, S.M.M. (2016). Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_42 | |
| dc.identifier.doi | 10.1007/978-3-319-49583-5_42 | |
| dc.identifier.isbn | 9783319495828 | |
| dc.identifier.isbn | 9783319495835 | |
| dc.identifier.issn | 0302-9743 | |
| dc.identifier.issn | 1611-3349 | |
| dc.identifier.uri | http://hdl.handle.net/10400.8/13676 | |
| dc.language.iso | eng | |
| dc.peerreviewed | n/a | |
| dc.publisher | Springer International Publishing | |
| dc.relation.hasversion | https://www.worldscientific.com/doi/abs/10.1142/S0218196716500090 | |
| dc.relation.ispartof | Lecture Notes in Computer Science | |
| dc.relation.ispartof | Algorithms and Architectures for Parallel Processing | |
| dc.rights.uri | N/A | |
| dc.subject | OpenCL | |
| dc.subject | Multicore | |
| dc.subject | Manycore | |
| dc.subject | Image encoding | |
| dc.subject | SIMD | |
| dc.title | Optimizing GPU Code for CPU Execution Using OpenCL and Vectorization: A Case Study on Image Coding | eng |
| dc.type | book part | |
| dspace.entity.type | Publication | |
| oaire.citation.endPage | 545 | |
| oaire.citation.startPage | 537 | |
| oaire.citation.title | Algorithms and Architectures for Parallel Processing | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.familyName | Domingues | |
| person.familyName | M. M. Rodrigues | |
| person.familyName | Faria | |
| person.givenName | Patrício | |
| person.givenName | Nuno | |
| person.givenName | Sergio | |
| person.identifier.ciencia-id | AA15-6185-C477 | |
| person.identifier.ciencia-id | 6917-B121-4E34 | |
| person.identifier.ciencia-id | 8815-4101-28DD | |
| person.identifier.orcid | 0000-0002-6207-6292 | |
| person.identifier.orcid | 0000-0001-9536-1017 | |
| person.identifier.orcid | 0000-0002-0993-9124 | |
| person.identifier.rid | C-5245-2011 | |
| person.identifier.scopus-author-id | 13411315400 | |
| person.identifier.scopus-author-id | 7006052345 | |
| person.identifier.scopus-author-id | 14027853900 | |
| relation.isAuthorOfPublication | b88ada5f-0d8b-4e55-ab0a-62aa82ea1388 | |
| relation.isAuthorOfPublication | b4ebe652-7f0e-4e67-adb0-d5ea29fc9e69 | |
| relation.isAuthorOfPublication | f69bd4d6-a6ef-4d20-8148-575478909661 | |
| relation.isAuthorOfPublication.latestForDiscovery | b88ada5f-0d8b-4e55-ab0a-62aa82ea1388 |
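
The encoding times quoted in the abstract can be compared directly. The following is a minimal sketch, not part of the original record: the helper names `speedup` and `time_reduction_pct` are hypothetical, and only the three timings stated in the abstract (580 s, 805 s, 672 s) are used. The "roughly 65%" drop for the unchanged GPU code on a CPU is left out, since the abstract does not give the underlying time.

```python
# Timings (seconds) to encode the Lenna image, as stated in the abstract.
OPENCL_GPU_S = 580.0       # OpenCL-GPU on a high-end GPU
OPENCL_CPU_S = 805.0       # OpenCL-CPU (CPU-tuned, no explicit vectorization)
OPENCL_CPU_VECT_S = 672.0  # OpenCL-CPU-Vect (CPU-tuned, vectorized)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Hypothetical helper: how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def time_reduction_pct(baseline_s: float, candidate_s: float) -> float:
    """Hypothetical helper: percentage of the baseline run time saved by the candidate."""
    return 100.0 * (baseline_s - candidate_s) / baseline_s


if __name__ == "__main__":
    # Gain from vectorization on the CPU-tuned code.
    print(f"Vect vs CPU speedup: {speedup(OPENCL_CPU_S, OPENCL_CPU_VECT_S):.2f}x")
    print(f"Vect vs CPU time saved: {time_reduction_pct(OPENCL_CPU_S, OPENCL_CPU_VECT_S):.1f}%")
    # Remaining gap between the vectorized CPU version and the GPU version.
    print(f"Vect / GPU time ratio: {OPENCL_CPU_VECT_S / OPENCL_GPU_S:.2f}")
```

Under these figures, vectorization makes the CPU-tuned encoder about 1.2 times faster (roughly 16.5% less run time), while the vectorized CPU version still takes about 1.16 times as long as the GPU version.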
