Abstract
Attention models, particularly Transformers, have significantly advanced deep learning in fields like natural language processing and computer vision by capturing contextual relationships in both sequential and spatial data. This ability is valuable for Point Clouds (PCs), which are unstructured sets of points in 3D space. Transformers can effectively identify correlations between distant points, allowing them to focus on the most critical regions of the data. To demonstrate this capability, this paper proposes a novel, scalable Graph-Guided Transformer model, named 2GFormer, for static PC geometry coding. This model is built on a scalable architecture that leverages Graph Convolutions to enhance a Relational Neighborhood Self-Attention (RNSA) base layer model. Both models are integrated into the JPEG Pleno Learning-based Point Cloud Coding (JPEG PCC) standard, resulting in two attention-enabled codecs for static PC coding: JPEG RNSA and JPEG 2GFormer. While the JPEG RNSA codec delivers significant compression improvements for solid and dense PCs compared to the baseline JPEG PCC standard, JPEG 2GFormer extends these gains to solid, dense, and sparse PCs with only a marginal increase in model parameters. Additionally, JPEG 2GFormer outperforms both conventional and learning-based state-of-the-art PC codecs. These results position JPEG 2GFormer as a highly efficient solution for versatile PC coding.
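The abstract describes a block in which graph convolutions guide a neighborhood-restricted self-attention layer over points in 3D space. The PyTorch sketch below is a rough illustration of that general idea only, combining an EdgeConv-style graph convolution with k-nearest-neighbor self-attention; the class name, the k-NN graph construction, and all hyper-parameters are assumptions made for this sketch and do not reproduce the authors' 2GFormer/RNSA implementation.

```python
# Hypothetical sketch: graph convolution guiding neighborhood self-attention.
# NOT the authors' 2GFormer/RNSA layer; all design choices here are assumed.
import torch
import torch.nn as nn


def knn_indices(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of each point. xyz: (B, N, 3)."""
    dist = torch.cdist(xyz, xyz)                         # (B, N, N) distances
    return dist.topk(k, dim=-1, largest=False).indices   # (B, N, k)


def gather_neighbors(feats: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Gather neighbor features. feats: (B, N, C), idx: (B, N, k) -> (B, N, k, C)."""
    B, N, C = feats.shape
    k = idx.shape[-1]
    return torch.gather(
        feats.unsqueeze(1).expand(B, N, N, C), 2,
        idx.unsqueeze(-1).expand(B, N, k, C),
    )


class GraphGuidedNeighborhoodAttention(nn.Module):
    def __init__(self, dim: int, k: int = 16, heads: int = 4):
        super().__init__()
        self.k = k
        # EdgeConv-style graph convolution on (center, neighbor - center) pairs.
        self.graph_conv = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # Multi-head attention restricted to each point's k-NN neighborhood.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        B, N, C = feats.shape
        idx = knn_indices(xyz, self.k)                   # (B, N, k)
        nbr = gather_neighbors(feats, idx)               # (B, N, k, C)
        # Graph convolution: max-aggregate relative edge features per point.
        center = feats.unsqueeze(2).expand_as(nbr)
        g = self.graph_conv(torch.cat([center, nbr - center], dim=-1)).max(dim=2).values
        feats = self.norm1(feats + g)
        # Neighborhood self-attention: each point attends to its k neighbors.
        nbr = gather_neighbors(feats, idx)               # re-gather updated features
        q = feats.reshape(B * N, 1, C)
        kv = nbr.reshape(B * N, self.k, C)
        out, _ = self.attn(q, kv, kv)
        return self.norm2(feats + out.reshape(B, N, C))


if __name__ == "__main__":
    xyz = torch.rand(2, 1024, 3)       # random point coordinates
    feats = torch.rand(2, 1024, 64)    # per-point features
    block = GraphGuidedNeighborhoodAttention(dim=64, k=16, heads=4)
    print(block(feats, xyz).shape)     # torch.Size([2, 1024, 64])
```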
Keywords
Graph Convolutions; JPEG Pleno; Point Cloud Coding; Scalable Transformer; Self-Attention
Citation
M. Ghafari, A. F. R. Guarda, N. M. M. Rodrigues and F. Pereira, "Scalable Graph-Guided Transformer for Point Cloud Geometry Coding," in IEEE Transactions on Multimedia, doi: 10.1109/TMM.2025.3598605.
Publisher
IEEE
