AHCI RESEARCH GROUP
Publications
Papers published in international journals,
in the proceedings of conferences and workshops, and in books.
OUR RESEARCH
Scientific Publications
How to
You can use the tag cloud to select only the papers dealing with specific research topics.
You can expand the Abstract, Links and BibTeX record for each paper.
2025
Lv, J.; Slowik, A.; Rani, S.; Kim, B. -G.; Chen, C. -M.; Kumari, S.; Li, K.; Lyu, X.; Jiang, H.
Multimodal Metaverse Healthcare: A Collaborative Representation and Adaptive Fusion Approach for Generative Artificial-Intelligence-Driven Diagnosis Journal Article
In: Research, vol. 8, 2025, ISSN: 2096-5168.
Abstract | Links | BibTeX | Tags: Adaptive fusion, Collaborative representations, Diagnosis, Electronic health record, Generative adversarial networks, Health care application, Healthcare environments, Immersive, Learning frameworks, Metaverses, Multi-modal, Multi-modal learning, Performance
@article{lv_multimodal_2025,
title = {Multimodal Metaverse Healthcare: A Collaborative Representation and Adaptive Fusion Approach for Generative Artificial-Intelligence-Driven Diagnosis},
author = {J. Lv and A. Slowik and S. Rani and B. -G. Kim and C. -M. Chen and S. Kumari and K. Li and X. Lyu and H. Jiang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-86000613924&doi=10.34133%2fresearch.0616&partnerID=40&md5=fdc8ae3b29db905105dada9a5657b54b},
doi = {10.34133/research.0616},
issn = {2096-5168},
year = {2025},
date = {2025-01-01},
journal = {Research},
volume = {8},
abstract = {The metaverse enables immersive virtual healthcare environments, presenting opportunities for enhanced care delivery. A key challenge lies in effectively combining multimodal healthcare data and generative artificial intelligence abilities within metaverse-based healthcare applications. This paper proposes a novel multimodal learning framework for metaverse healthcare, MMLMH, based on collaborative intra- and intersample representation and adaptive fusion. Our framework introduces a collaborative representation learning approach that captures shared and modality-specific features across text, audio, and visual health data. By combining modality-specific and shared encoders with carefully formulated intrasample and intersample collaboration mechanisms, MMLMH achieves superior feature representation for complex health assessments. The framework’s adaptive fusion approach, utilizing attention mechanisms and gated neural networks, demonstrates robust performance across varying noise levels and data quality conditions. Experiments on metaverse healthcare datasets demonstrate MMLMH’s superior performance over baseline methods across multiple evaluation metrics. Longitudinal studies and visualization further illustrate MMLMH’s adaptability to evolving virtual environments and balanced performance across diagnostic accuracy, patient–system interaction efficacy, and data integration complexity. The proposed framework has a unique advantage in that a similar level of performance is maintained across various patient populations and virtual avatars, which could lead to greater personalization of healthcare experiences in the metaverse. MMLMH’s successful functioning in such complicated circumstances suggests that it can combine and process information streams from several sources, and that these streams can be successfully utilized in next-generation healthcare delivery through virtual reality. © 2025 Jianhui Lv et al.},
keywords = {Adaptive fusion, Collaborative representations, Diagnosis, Electronic health record, Generative adversarial networks, Health care application, Healthcare environments, Immersive, Learning frameworks, Metaverses, Multi-modal, Multi-modal learning, Performance},
pubstate = {published},
tppubtype = {article}
}
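The adaptive fusion stage described in the abstract combines attention mechanisms with gated neural networks to weight text, audio, and visual features. The paper does not include reference code, so the following is a minimal sketch of gated attention fusion in which the module names, head count, and dimensions are illustrative assumptions, not the authors' MMLMH implementation:

# Hypothetical sketch of gated attention fusion over per-modality features.
# Names, heads, and dimensions are illustrative, not the MMLMH reference code.
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    def __init__(self, dim: int, n_modalities: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Sequential(
            nn.Linear(dim * n_modalities, n_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats):
        # feats: list of (batch, dim) embeddings, one per modality
        x = torch.stack(feats, dim=1)             # (batch, n_modalities, dim)
        attended, _ = self.attn(x, x, x)          # cross-modal attention
        weights = self.gate(attended.flatten(1))  # (batch, n_modalities)
        return (attended * weights.unsqueeze(-1)).sum(dim=1)  # (batch, dim)

fusion = GatedAttentionFusion(dim=256)
text, audio, visual = (torch.randn(8, 256) for _ in range(3))
print(fusion([text, audio, visual]).shape)  # torch.Size([8, 256])

Because the gate learns per-modality weights, a noisy modality (for example, degraded audio in a virtual consultation) can be down-weighted at fusion time, which is consistent with the robustness across noise levels the abstract reports.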
Kai, W. -H.; Xing, K. -X.
Video-driven musical composition using large language model with memory-augmented state space Journal Article
In: Visual Computer, vol. 41, no. 5, pp. 3345–3357, 2025, ISSN: 0178-2789.
Abstract | Links | BibTeX | Tags: Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space
@article{kai_video-driven_2025,
title = {Video-driven musical composition using large language model with memory-augmented state space},
author = {W. -H. Kai and K. -X. Xing},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001073242&doi=10.1007%2fs00371-024-03606-w&partnerID=40&md5=7ea24f13614a9a24caf418c37a10bd8c},
doi = {10.1007/s00371-024-03606-w},
issn = {0178-2789},
year = {2025},
date = {2025-01-01},
journal = {Visual Computer},
volume = {41},
number = {5},
pages = {3345–3357},
abstract = {The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. However, the research work on LLMs for music inspiration is still in its infancy. To fill the gap in this field and break through the dilemma that LLMs can only understand short videos with limited frames, we propose a large language model with state space for long-term video-to-music generation. To capture long-range dependencies and maintain high performance while further decreasing the computing cost, our overall network includes the Enhanced Video Mamba, which incorporates continuous moving window partitioning and local feature augmentation, and a long-term memory bank that captures and aggregates historical video information to mitigate information loss in long sequences. This framework achieves both subquadratic-time computation and near-linear memory complexity, enabling effective long-term video-to-music generation. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models. Our code is released at https://github.com/kai211233/S2L2-V2M. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.},
keywords = {Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space},
pubstate = {published},
tppubtype = {article}
}
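The abstract pairs a state-space video encoder with a long-term memory bank that aggregates historical window features; the authors' actual code is available at the linked GitHub repository. As a toy illustration of the memory-bank idea only (the capacity, feature dimension, and similarity-weighted read below are assumptions, not the S2L2-V2M design):

# Toy long-term memory bank over moving-window video features; an
# illustration of the idea only, not the released S2L2-V2M code.
from collections import deque
import torch

class LongTermMemoryBank:
    def __init__(self, capacity: int = 64):
        self.bank = deque(maxlen=capacity)  # oldest windows evicted first

    def write(self, window_feat: torch.Tensor) -> None:
        self.bank.append(window_feat.detach())  # store without gradients

    def read(self, query: torch.Tensor) -> torch.Tensor:
        if not self.bank:
            return query
        mem = torch.stack(list(self.bank))          # (n_windows, dim)
        scores = torch.softmax(mem @ query, dim=0)  # similarity-weighted recall
        return query + scores @ mem                 # fuse history into the query

bank = LongTermMemoryBank()
for _ in range(10):                  # a long video as successive windows
    feat = torch.randn(256)          # per-window feature from the encoder
    fused = bank.read(feat)          # current feature enriched with history
    bank.write(feat)
print(fused.shape)                   # torch.Size([256])

A fixed-capacity bank keeps memory use constant regardless of video length, which is how a design like this can approach the near-linear memory complexity the abstract claims.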
2024
Lee, L. -K.; Chan, E. H.; Tong, K. K. -L.; Wong, N. K. -H.; Wu, B. S. -Y.; Fung, Y. -C.; Fong, E. K. S.; Hou, U. Leong; Wu, N. -I.
Utilizing Virtual Reality and Generative AI Chatbot for Job Interview Simulations Proceedings Article
In: Chui, K.T.; Hui, Y.K.; Yang, D.; Lee, L.-K.; Wong, L.-P.; Reynolds, B.L. (Ed.): Proc. - Int. Symp. Educ. Technol., ISET, pp. 209–212, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-8-3503-6141-4.
Abstract | Links | BibTeX | Tags: chatbot, Chatbots, Computer interaction, Computer simulation languages, Generative adversarial networks, Generative AI, Hong-kong, Human computer interaction, ITS applications, Job interview simulation, Job interviews, Performance, Science graduates, User friendliness, Virtual environments, Virtual Reality
@inproceedings{lee_utilizing_2024,
title = {Utilizing Virtual Reality and Generative AI Chatbot for Job Interview Simulations},
author = {L. -K. Lee and E. H. Chan and K. K. -L. Tong and N. K. -H. Wong and B. S. -Y. Wu and Y. -C. Fung and E. K. S. Fong and U. Leong Hou and N. -I. Wu},
editor = {Chui K.T. and Hui Y.K. and Yang D. and Lee L.-K. and Wong L.-P. and Reynolds B.L.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206582338&doi=10.1109%2fISET61814.2024.00048&partnerID=40&md5=c6986c0697792254e167e143b75f14c6},
doi = {10.1109/ISET61814.2024.00048},
isbn = {979-8-3503-6141-4},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - Int. Symp. Educ. Technol., ISET},
pages = {209–212},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Stress and anxiety experienced by interviewees, particularly fresh graduates, can significantly impact their performance in job interviews. Due to its increased affordability and user-friendliness, virtual reality (VR) has seen a surge in its application within the educational sector. This paper presents the design and implementation of a job interview simulation system, leveraging VR and a generative AI chatbot to provide an immersive environment for computer science graduates in Hong Kong. The system aims to help graduates practice and familiarize themselves with various real-world scenarios of a job interview in English, Mandarin, and Cantonese, tailored to the unique language requirements of Hong Kong's professional environment. The system comprises three core modules: a mock question and answer reading module, an AI speech analysis module, and a virtual interview module facilitated by the generative AI chatbot, ChatGPT. We anticipate that the proposed simulator will provide valuable insights to education practitioners on utilizing VR and generative AI for job interview training, extending beyond computer science graduates. © 2024 IEEE.},
keywords = {chatbot, Chatbots, Computer interaction, Computer simulation languages, Generative adversarial networks, Generative AI, Hong-kong, Human computer interaction, ITS applications, Job interview simulation, Job interviews, Performance, Science graduates, User friendliness, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
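The virtual interview module is driven by the generative AI chatbot ChatGPT. The paper does not disclose its prompts or integration code, so the following is a hypothetical sketch, assuming the OpenAI Python SDK, of how such an interviewer persona could be wired up; the model name and prompt wording are placeholders, not the authors' system:

# Hypothetical interviewer loop; the prompt text and model name are
# placeholders, not taken from the paper. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{
    "role": "system",
    "content": ("You are a job interviewer for a junior software engineering "
                "role in Hong Kong. Ask one question at a time, in the "
                "candidate's chosen language: English, Mandarin, or Cantonese."),
}]

def interviewer_turn(candidate_answer: str) -> str:
    messages.append({"role": "user", "content": candidate_answer})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    question = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": question})
    return question

print(interviewer_turn("Hello, I'm ready to begin."))

Keeping the full message history in the request is what lets the interviewer ask coherent follow-up questions across turns; in a VR front end, the returned text would additionally be passed to speech synthesis and avatar animation.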
Kang, Z.; Liu, Y.; Zheng, J.; Sun, Z.
Revealing the Difficulty in Jailbreak Defense on Language Models for Metaverse Proceedings Article
In: Gong, Q.; He, X. (Ed.): SocialMeta - Proc. Int. Workshop Soc. Metaverse Comput., Sens. Netw., Part: ACM SenSys, pp. 31–37, Association for Computing Machinery, Inc, 2024, ISBN: 979-8-4007-1299-9.
Abstract | Links | BibTeX | Tags: Attack strategies, Computer simulation languages, Defense, Digital elevation model, Guard rails, Jailbreak, Language Model, Large language model, Metaverse Security, Metaverses, Natural languages, Performance, Virtual Reality
@inproceedings{kang_revealing_2024,
title = {Revealing the Difficulty in Jailbreak Defense on Language Models for Metaverse},
author = {Z. Kang and Y. Liu and J. Zheng and Z. Sun},
editor = {Gong Q. and He X.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85212189363&doi=10.1145%2f3698387.3699998&partnerID=40&md5=673326728c3db35ffbbaf807eb7f003c},
doi = {10.1145/3698387.3699998},
isbn = {979-8-4007-1299-9},
year = {2024},
date = {2024-01-01},
booktitle = {SocialMeta - Proc. Int. Workshop Soc. Metaverse Comput., Sens. Netw., Part: ACM SenSys},
pages = {31–37},
publisher = {Association for Computing Machinery, Inc},
abstract = {Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks, fueling innovations in emerging areas such as the metaverse. These models enable dynamic virtual communities, enhancing user interactions and revolutionizing industries. However, their increasing deployment exposes vulnerabilities to jailbreak attacks, where adversaries can manipulate LLM-driven systems to generate harmful content. While various defense mechanisms have been proposed, their efficacy against diverse jailbreak techniques remains unclear. This paper addresses this gap by evaluating the performance of three popular defense methods (Backtranslation, Self-reminder, and Paraphrase) against different jailbreak attack strategies (GCG, BEAST, and Deepinception), while also utilizing three distinct models. Our findings reveal that while defenses are highly effective against optimization-based jailbreak attacks and reduce the attack success rate by 79% on average, they struggle in defending against attacks that alter attack motivations. Additionally, methods relying on self-reminding perform better when integrated with models featuring robust safety guardrails. For instance, Llama2-7b shows a 100% reduction in Attack Success Rate, while Vicuna-7b and Mistral-7b, lacking safety alignment, exhibit a lower average reduction of 65.8%. This study highlights the challenges in developing universal defense solutions for securing LLMs in dynamic environments like the metaverse. Furthermore, our study highlights that the three distinct models utilized demonstrate varying initial defense performance against different jailbreak attack strategies, underscoring the complexity of effectively securing LLMs. © 2024 Copyright held by the owner/author(s).},
keywords = {Attack strategies, Computer simulation languages, Defense, Digital elevation model, Guard rails, Jailbreak, Language Model, Large language model, Metaverse Security, Metaverses, Natural languages, Performance, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
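Of the three defenses evaluated, self-reminding is the easiest to illustrate: the user prompt is wrapped in safety-reminder text before it reaches the model. The sketch below shows that wrapping plus the attack-success-rate arithmetic behind the reported reductions; the reminder wording and the relative-reduction formula are plausible reconstructions, not taken from the paper:

# Self-reminder defense: wrap the user prompt in safety reminders before it
# reaches the model. Wording is illustrative, not the evaluated prompt.
def self_remind(user_prompt: str) -> str:
    return (
        "You should be a responsible assistant and must not generate harmful "
        "or misleading content.\n"
        f"{user_prompt}\n"
        "Remember: answer responsibly; refuse harmful requests."
    )

def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of jailbreak attempts that elicited harmful output."""
    return sum(outcomes) / len(outcomes)

# One common definition of the reported reduction (assumed; the abstract
# does not define the metric): the relative drop in ASR after the defense.
asr_before = attack_success_rate([True] * 8 + [False] * 2)  # 0.8
asr_after = attack_success_rate([True] * 1 + [False] * 9)   # 0.1
print(f"reduction: {(asr_before - asr_after) / asr_before:.0%}")  # reduction: 88%

Under this reading, the paper's finding that Llama2-7b reaches a 100% reduction means its post-defense ASR drops to zero, while models without safety alignment retain a nonzero residual ASR.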
2023
Si, J.; Song, J.; Woo, M.; Kim, D.; Lee, Y.; Kim, S.
Generative AI Models for Virtual Interviewers: Applicability and Performance Comparison Proceedings Article
In: IET. Conf. Proc., vol. 2023, pp. 27–28, Institution of Engineering and Technology, 2023, ISSN: 2732-4494.
Abstract | Links | BibTeX | Tags: 3D Generation, College admissions, Digital elevation model, Effective practices, Generative AI, Job hunting, Metaverse, Metaverses, Performance, Performance comparison, Virtual environments, Virtual Interview, Virtual Reality
@inproceedings{si_generative_2023,
title = {Generative AI Models for Virtual Interviewers: Applicability and Performance Comparison},
author = {J. Si and J. Song and M. Woo and D. Kim and Y. Lee and S. Kim},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203492324&doi=10.1049%2ficp.2024.0193&partnerID=40&md5=84eb48f6b51c941da9c77fa3aba46262},
doi = {10.1049/icp.2024.0193},
issn = {2732-4494},
year = {2023},
date = {2023-01-01},
booktitle = {IET. Conf. Proc.},
volume = {2023},
pages = {27–28},
publisher = {Institution of Engineering and Technology},
abstract = {Interviewing processes are considered crucial steps in job hunting or college admissions, and effective practice plays a significant role in successfully navigating these stages. Although various platforms have recently emerged for practicing virtual interviews, they often lack the tension and realism of actual interviews due to repetitive and formal content. This study aims to analyze and compare the performance of different generative AI models for creating a diverse set of virtual interviewers. Specifically, we examine the characteristics and applicability of each model, as well as the differences and advantages between them, and evaluate the performance of the generated virtual interviewers. Through this analysis, we aim to propose solutions for enhancing the practicality and efficiency of virtual interviews. © The Institution of Engineering & Technology 2023.},
keywords = {3D Generation, College admissions, Digital elevation model, Effective practices, Generative AI, Job hunting, Metaverse, Metaverses, Performance, Performance comparison, Virtual environments, Virtual Interview, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}