AHCI RESEARCH GROUP

Publications

Papers published in international journals,
proceedings of conferences, workshops and books.

Scientific Publications

Here you can find the complete list of our publications.

2025

Tian, Y.; Li, X.; Cheng, Z.; Huang, Y.; Yu, T.

Design of Realistic and Artistically Expressive 3D Facial Models for Film AIGC: A Cross-Modal Framework Integrating Audience Perception Evaluation Journal Article

In: Sensors, vol. 25, no. 15, 2025, ISSN: 1424-8220, (Publisher: Multidisciplinary Digital Publishing Institute (MDPI)).

@article{tian_design_2025,

title = {Design of Realistic and Artistically Expressive 3D Facial Models for Film AIGC: A Cross-Modal Framework Integrating Audience Perception Evaluation},

author = {Y. Tian and X. Li and Z. Cheng and Y. Huang and T. Yu},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105013137724&doi=10.3390%2Fs25154646&partnerID=40&md5=8508a27b693f0857ce7cb58e97a2705c},

doi = {10.3390/s25154646},

issn = {1424-8220},

year  = {2025},

date = {2025-01-01},

journal = {Sensors},

volume = {25},

number = {15},

abstract = {The rise of virtual production has created an urgent need for both efficient and high-fidelity 3D face generation schemes for cinema and immersive media, but existing methods are often limited by lighting–geometry coupling, multi-view dependency, and insufficient artistic quality. To address this, this study proposes a cross-modal 3D face generation framework based on single-view semantic masks. It utilizes Swin Transformer for multi-level feature extraction and combines with NeRF for illumination decoupled rendering. We utilize physical rendering equations to explicitly separate surface reflectance from ambient lighting to achieve robust adaptation to complex lighting variations. In addition, to address geometric errors across illumination scenes, we construct geometric a priori constraint networks by mapping 2D facial features to 3D parameter space as regular terms with the help of semantic masks. On the CelebAMask-HQ dataset, this method achieves a leading score of SSIM = 0.892 (37.6% improvement from baseline) with FID = 40.6. The generated faces excel in symmetry and detail fidelity with realism and aesthetic scores of 8/10 and 7/10, respectively, in a perceptual evaluation with 1000 viewers. By combining physical-level illumination decoupling with semantic geometry a priori, this paper establishes a quantifiable feedback mechanism between objective metrics and human aesthetic evaluation, providing a new paradigm for aesthetic quality assessment of AI-generated content.},

note = {Publisher: Multidisciplinary Digital Publishing Institute (MDPI)},

keywords = {3D faces, 3d facial model, 3D facial models, 3D modeling, adaptation, adult, Article, Audience perception evaluation, benchmarking, controlled study, Cross-modal, Face generation, Facial modeling, facies, Feature extraction, feedback, feedback system, female, Geometry, High-fidelity, human, illumination, Immersive media, Lighting, male, movie, Neural radiance field, Neural Radiance Fields, perception, Quality control, Rendering (computer graphics), Semantics, sensor, Three dimensional computer graphics, Virtual production, Virtual Reality},

pubstate = {published},

tppubtype = {article}

}

2024

Chen, M.; Liu, M.; Wang, C.; Song, X.; Zhang, Z.; Xie, Y.; Wang, L.

Cross-Modal Graph Semantic Communication Assisted by Generative AI in the Metaverse for 6G Journal Article

In: Research, vol. 7, 2024, ISSN: 2096-5168; 2639-5274, (Publisher: American Association for the Advancement of Science).

@article{chen_cross-modal_2024,

title = {Cross-Modal Graph Semantic Communication Assisted by Generative AI in the Metaverse for 6G},

author = {M. Chen and M. Liu and C. Wang and X. Song and Z. Zhang and Y. Xie and L. Wang},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192245049&doi=10.34133%2Fresearch.0342&partnerID=40&md5=7cc27ebe995b04a10e3a0bff6591bd2d},

doi = {10.34133/research.0342},

issn = {2096-5168; 2639-5274},

year  = {2024},

date = {2024-01-01},

journal = {Research},

volume = {7},

abstract = {Recently, the development of the Metaverse has become a frontier spotlight, which is an important demonstration of the integration innovation of advanced technologies in the Internet. Moreover, artificial intelligence (AI) and 6G communications will be widely used in our daily lives. However, the effective interactions with the representations of multimodal data among users via 6G communications is the main challenge in the Metaverse. In this work, we introduce an intelligent cross-modal graph semantic communication approach based on generative AI and 3-dimensional (3D) point clouds to improve the diversity of multimodal representations in the Metaverse. Using a graph neural network, multimodal data can be recorded by key semantic features related to the real scenarios. Then, we compress the semantic features using a graph transformer encoder at the transmitter, which can extract the semantic representations through the cross-modal attention mechanisms. Next, we leverage a graph semantic validation mechanism to guarantee the exactness of the overall data at the receiver. Furthermore, we adopt generative AI to regenerate multimodal data in virtual scenarios. Simultaneously, a novel 3D generative reconstruction network is constructed from the 3D point clouds, which can transfer the data from images to 3D models, and we infer the multimodal data into the 3D models to increase realism in virtual scenarios. Finally, the experiment results demonstrate that cross-modal graph semantic communication, assisted by generative AI, has substantial potential for enhancing user interactions in the 6G communications and Metaverse.},

note = {Publisher: American Association for the Advancement of Science},

keywords = {3-dimensional, 3Dimensional models, Cross-modal, Graph neural networks, Graph semantics, Metaverses, Multi-modal data, Point-clouds, Semantic communication, Semantic features, Semantics, Three dimensional computer graphics, Virtual scenario},

pubstate = {published},

tppubtype = {article}

}

Xie, W.; Liu, Y.; Wang, K.; Wang, M.

LLM-Guided Cross-Modal Point Cloud Quality Assessment: A Graph Learning Approach Journal Article

In: IEEE Signal Processing Letters, vol. 31, pp. 2250–2254, 2024, ISSN: 1558-2361; 1070-9908, (Publisher: Institute of Electrical and Electronics Engineers Inc.).

@article{xie_llm-guided_2024,

title = {LLM-Guided Cross-Modal Point Cloud Quality Assessment: A Graph Learning Approach},

author = {W. Xie and Y. Liu and K. Wang and M. Wang},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203417746&doi=10.1109%2FLSP.2024.3452556&partnerID=40&md5=36dc618026f173aeb192ddc3ac430e76},

doi = {10.1109/LSP.2024.3452556},

issn = {1558-2361; 1070-9908},

year  = {2024},

date = {2024-01-01},

journal = {IEEE Signal Processing Letters},

volume = {31},

pages = {2250--2254},

abstract = {This paper addresses the critical need for accurate and reliable point cloud quality assessment (PCQA) in various applications, such as autonomous driving, robotics, virtual reality, and 3D reconstruction. To meet this need, we propose a large language model (LLM)-guided PCQA approach based on graph learning. Specifically, we first utilize the LLM to generate quality description texts for each 3D object, and employ two CLIP-like feature encoders to represent the image and text modalities. Next, we design a latent feature enhancer module to improve contrastive learning, enabling more effective alignment performance. Finally, we develop a graph network fusion module that utilizes a ranking-based loss to adjust the relationship of different nodes, which explicitly considers both modality fusion and quality ranking. Experimental results on three benchmark datasets demonstrate the effectiveness and superiority of our approach over 12 representative PCQA methods, which demonstrate the potential of multi-modal learning, the importance of latent feature enhancement, and the significance of graph-based fusion in advancing the field of PCQA.},

note = {Publisher: Institute of Electrical and Electronics Engineers Inc.},

keywords = {3D reconstruction, Cross-modal, Language Model, Large language model, Learning approach, Multi-modal, Multimodal quality assessment, Point cloud quality assessment, Point-clouds, Quality assessment},

pubstate = {published},

tppubtype = {article}

}

2023

Feng, Y.; Zhu, H.; Peng, D.; Peng, X.; Hu, P.

RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross-Modal Retrieval Proceedings Article

In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 11610–11619, IEEE Computer Society, 2023, ISSN: 1063-6919.

@inproceedings{feng_rono_2023,

title = {RONO: Robust Discriminative Learning with Noisy Labels for 2D-3D Cross-Modal Retrieval},

author = {Y. Feng and H. Zhu and D. Peng and X. Peng and P. Hu},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85170845124&doi=10.1109%2fCVPR52729.2023.01117&partnerID=40&md5=2eee285207ff3ea8e774480e29d96ec1},

doi = {10.1109/CVPR52729.2023.01117},

issn = {1063-6919},

year  = {2023},

date = {2023-01-01},

booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},

volume = {2023-June},

pages = {11610--11619},

publisher = {IEEE Computer Society},

abstract = {Recently, with the advent of Metaverse and AI Generated Content, cross-modal retrieval becomes popular with a burst of 2D and 3D data. However, this problem is challenging given the heterogeneous structure and semantic discrepancies. Moreover, imperfect annotations are ubiquitous given the ambiguous 2D and 3D content, thus inevitably producing noisy labels to degrade the learning performance. To tackle the problem, this paper proposes a robust 2D-3D retrieval framework (RONO) to robustly learn from noisy multimodal data. Specifically, one novel Robust Discriminative Center Learning mechanism (RDCL) is proposed in RONO to adaptively distinguish clean and noisy samples for respectively providing them with positive and negative optimization directions, thus mitigating the negative impact of noisy labels. Besides, we present a Shared Space Consistency Learning mechanism (SSCL) to capture the intrinsic information inside the noisy data by minimizing the cross-modal and semantic discrepancy between common space and label space simultaneously. Comprehensive mathematical analyses are given to theoretically prove the noise tolerance of the proposed method. Furthermore, we conduct extensive experiments on four 3D-model multimodal datasets to verify the effectiveness of our method by comparing it with 15 state-of-the-art methods. © 2023 IEEE.},

keywords = {3D content, 3D data, 3D modeling, Adversarial machine learning, Contrastive Learning, Cross-modal, Discriminative learning, Federated learning, Heterogeneous structures, Learning mechanism, Learning performance, Metaverses, Multi-modal learning, Noisy labels, Spatio-temporal data},

pubstate = {published},

tppubtype = {inproceedings}

}