AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
OUR RESEARCH
Scientific Publications
How to
You can use the tag cloud to select only the papers dealing with specific research topics.
You can expand the Abstract, Links and BibTeX record for each paper.
2025
Mekki, Y. M.; Simon, L. V.; Freeman, W. D.; Qadir, J.
Medical Education Metaverses (MedEd Metaverses): Opportunities, Use Case, and Guidelines Journal Article
In: Computer, vol. 58, no. 3, pp. 60–70, 2025, ISSN: 0018-9162.
Abstract | Links | BibTeX | Tags: Adaptive feedback, Augmented Reality, Immersive learning, Medical education, Metaverses, Performance tracking, Remote resources, Remote training, Resource efficiencies, Training efficiency, Virtual environments
@article{mekki_medical_2025,
title = {Medical Education Metaverses (MedEd Metaverses): Opportunities, Use Case, and Guidelines},
author = {Y. M. Mekki and L. V. Simon and W. D. Freeman and J. Qadir},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218631349&doi=10.1109%2fMC.2024.3474033&partnerID=40&md5=65f46cf9b8d98eaf0fcd6843b9ebc41e},
doi = {10.1109/MC.2024.3474033},
issn = {0018-9162},
year = {2025},
date = {2025-01-01},
journal = {Computer},
volume = {58},
number = {3},
pages = {60–70},
abstract = {This article explores how artificial intelligence (AI), particularly generative AI (GenAI), can enhance extended reality (XR) applications in medical education (MedEd) metaverses. We compare traditional augmented reality/virtual reality methods with AI-enabled XR metaverses, highlighting improvements in immersive learning, adaptive feedback, personalized performance tracking, remote training, and resource efficiency. © 1970-2012 IEEE.},
keywords = {Adaptive feedback, Augmented Reality, Immersive learning, Medical education, Metaverses, Performance tracking, Remote resources, Remote training, Resource efficiencies, Training efficiency, Virtual environments},
pubstate = {published},
tppubtype = {article}
}
Dong, Y.
Enhancing Painting Exhibition Experiences with the Application of Augmented Reality-Based AI Video Generation Technology Proceedings Article
In: Zaphiris, P.; Ioannou, A.; Ioannou, A.; Sottilare, R.A.; Schwarz, J.; Rauterberg, M. (Ed.): Lect. Notes Comput. Sci., pp. 256–262, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-76814-9.
Abstract | Links | BibTeX | Tags: 3D modeling, AI-generated art, Art and Technology, Arts computing, Augmented Reality, Augmented reality technology, Digital Exhibition Design, Dynamic content, E-Learning, Education computing, Generation technologies, Interactive computer graphics, Knowledge Management, Multi dimensional, Planning designs, Three dimensional computer graphics, Video contents, Video generation
@inproceedings{dong_enhancing_2025,
title = {Enhancing Painting Exhibition Experiences with the Application of Augmented Reality-Based AI Video Generation Technology},
author = {Y. Dong},
editor = {Zaphiris P. and Ioannou A. and Ioannou A. and Sottilare R.A. and Schwarz J. and Rauterberg M.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85213302959&doi=10.1007%2f978-3-031-76815-6_18&partnerID=40&md5=35484f5ed199a831f1a30f265a0d32d5},
doi = {10.1007/978-3-031-76815-6_18},
issn = {0302-9743},
isbn = {978-3-031-76814-9},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15378 LNCS},
pages = {256–262},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional painting exhibitions often rely on flat presentation methods, such as walls and stands, limiting their impact. Augmented Reality (AR) technology presents an opportunity to transform these experiences by turning static, flat artwork into dynamic, multi-dimensional presentations. However, creating and integrating video or dynamic content can be time-consuming and challenging, requiring meticulous planning, design, and production. In the context of urban renewal and community revitalization, particularly in China’s first-tier cities where real estate development has saturated the market, there is a growing trend to repurpose traditional commercial and office spaces with cultural and artistic exhibitions. These exhibitions not only enhance the spatial quality but also elevate the user experience, making the spaces more competitive. However, these non-traditional exhibition venues often lack the amenities of professional galleries, relying on walls, windows, and corners for displays, and requiring quick setup times. For visitors, who are often office workers or shoppers with limited time, the use of personal mobile devices for interaction is common. WeChat, China’s most widely used mobile application, provides a platform for convenient digital interactive experiences through mini-programs, which can support lightweight AR applications. AI video generation technologies, such as Conditional Generative Adversarial Networks (ControlNet) and Latent Consistency Models (LCM), have seen significant advancements. These technologies now allow for the creation of 3D models and video content from text and images. Tools like Meshy and Pika provide the ability to generate various video styles and offer precise control over video content. New AI video applications like Stable Video further expand the possibilities by rapidly converting static images into dynamic videos, facilitating easy adjustments and edits. This paper explores the application of AR-based AI video generation technology in enhancing the experience of painting exhibitions. By integrating these technologies, traditional paintings can be transformed into interactive, engaging displays that enrich the viewer’s experience. The study demonstrates the potential of these innovations to make art exhibitions more appealing and competitive in various public spaces, thereby improving both artistic expression and audience engagement. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D modeling, AI-generated art, Art and Technology, Arts computing, Augmented Reality, Augmented reality technology, Digital Exhibition Design, Dynamic content, E-Learning, Education computing, Generation technologies, Interactive computer graphics, Knowledge Management, Multi dimensional, Planning designs, Three dimensional computer graphics, Video contents, Video generation},
pubstate = {published},
tppubtype = {inproceedings}
}
Yokoyama, N.; Kimura, R.; Nakajima, T.
ViGen: Defamiliarizing Everyday Perception for Discovering Unexpected Insights Proceedings Article
In: Degen, H.; Ntoa, S. (Ed.): Lect. Notes Comput. Sci., pp. 397–417, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-93417-9.
Abstract | Links | BibTeX | Tags: Artful Expression, Artistic technique, Augmented Reality, Daily lives, Defamiliarization, Dynamic environments, Engineering education, Enhanced vision systems, Generative AI, generative artificial intelligence, Human augmentation, Human engineering, Human-AI Interaction, Human-artificial intelligence interaction, Semi-transparent
@inproceedings{yokoyama_vigen_2025,
title = {ViGen: Defamiliarizing Everyday Perception for Discovering Unexpected Insights},
author = {N. Yokoyama and R. Kimura and T. Nakajima},
editor = {Degen H. and Ntoa S.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007760030&doi=10.1007%2f978-3-031-93418-6_26&partnerID=40&md5=dee6f54688284313a45579aab5f934d6},
doi = {10.1007/978-3-031-93418-6_26},
issn = {0302-9743},
isbn = {978-3-031-93417-9},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15821 LNAI},
pages = {397–417},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {This paper proposes ViGen, an Augmented Reality (AR) and Artificial Intelligence (AI)-enhanced vision system designed to facilitate defamiliarization in daily life. Humans rely on sight to gather information, think, and act, yet the act of seeing often becomes passive in daily life. Inspired by Victor Shklovsky’s concept of defamiliarization and the artistic technique of photomontage, ViGen seeks to disrupt habitual perceptions. It achieves this by overlaying semi-transparent, AI-generated images, created based on the user’s view, through an AR display. The system is evaluated by several structured interviews, in which participants experience ViGen in three different scenarios. Results indicate that AI-generated visuals effectively supported defamiliarization by transforming ordinary scenes into unfamiliar ones. However, the user’s familiarity with a place plays a significant role. Also, while the feature that adjusts the transparency of overlaid images enhances safety, its limitations in dynamic environments suggest the need for further research across diverse cultural and geographic contexts. This study demonstrates the potential of AI-augmented vision systems to stimulate new ways of seeing, offering insights for further development in visual augmentation technologies. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {Artful Expression, Artistic technique, Augmented Reality, Daily lives, Defamiliarization, Dynamic environments, Engineering education, Enhanced vision systems, Generative AI, generative artificial intelligence, Human augmentation, Human engineering, Human-AI Interaction, Human-artificial intelligence interaction, Semi-transparent},
pubstate = {published},
tppubtype = {inproceedings}
}
Weerasinghe, M.; Kljun, M.; Pucihar, K. Č.
A Cross-Device Interaction with the Smartphone and HMD for Vocabulary Learning Proceedings Article
In: Zaina, L.; Campos, J.C.; Spano, D.; Luyten, K.; Palanque, P.; Veer, G.; Ebert, A.; Humayoun, S.R.; Memmesheimer, V. (Ed.): Lect. Notes Comput. Sci., pp. 269–282, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-91759-2.
Abstract | Links | BibTeX | Tags: Augmented Reality, Context-based, Context-based vocabulary learning, Cross-reality interaction, Engineering education, Head-mounted displays, Head-mounted-displays, Images synthesis, Keyword method, Mixed reality, Smart phones, Smartphones, Students, Text-to-image synthesis, Visualization, Vocabulary learning
@inproceedings{weerasinghe_cross-device_2025,
title = {A Cross-Device Interaction with the Smartphone and HMD for Vocabulary Learning},
author = {M. Weerasinghe and M. Kljun and K. Č. Pucihar},
editor = {Zaina L. and Campos J.C. and Spano D. and Luyten K. and Palanque P. and Veer G. and Ebert A. and Humayoun S.R. and Memmesheimer V.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007828696&doi=10.1007%2f978-3-031-91760-8_18&partnerID=40&md5=4ebf202715ba880dcfeb3232dba7e2c4},
doi = {10.1007/978-3-031-91760-8_18},
issn = {0302-9743},
isbn = {978-3-031-91759-2},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15518 LNCS},
pages = {269–282},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Cross-reality (XR) systems facilitate interaction between devices with differing levels of virtual content. By engaging with a variety of such devices, XR systems offer the flexibility to choose the most suitable modality for a specific task or context. This capability enables rich applications in training and education, including vocabulary learning. Vocabulary acquisition is a vital part of language learning, employing techniques such as word rehearsal, flashcards, labelling environments with post-it notes, and mnemonic strategies such as the keyword method. Traditional mnemonics typically rely on visual stimuli or mental visualisations. Recent research highlights that AR can enhance vocabulary learning by combining real objects with augmented stimuli such as in labelling environments. Additionally, advancements in generative AI now enable high-quality, synthetically generated images from text descriptions, facilitating externalisation of personalised visual stimuli of mental visualisations. However, creating interfaces for effective real-world augmentation remains challenging, particularly given the limited text input capabilities of Head-Mounted Displays (HMDs). This work presents an XR system that combines smartphones and HMDs by leveraging Augmented Reality (AR) for contextually relevant information and a smartphone for efficient text input. The system enables users to visually annotate objects with personalised images of keyword associations generated with DALL-E 2. To evaluate the system, we conducted a user study with 16 university graduate students, assessing both usability and overall user experience. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {Augmented Reality, Context-based, Context-based vocabulary learning, Cross-reality interaction, Engineering education, Head-mounted displays, Head-mounted-displays, Images synthesis, Keyword method, Mixed reality, Smart phones, Smartphones, Students, Text-to-image synthesis, Visualization, Vocabulary learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Otsuka, T.; Li, D.; Siriaraya, P.; Nakajima, S.
Development of A Relaxation Support System Utilizing Stereophonic AR Proceedings Article
In: Int. Conf. Comput., Netw. Commun., ICNC, pp. 463–467, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-8-3315-2096-0.
Abstract | Links | BibTeX | Tags: Augmented Reality, Environmental sounds, Generative AI, Immersive, Mental Well-being, Soundscapes, Spatial Audio, Stereo image processing, Support method, Support systems, Well being
@inproceedings{otsuka_development_2025,
title = {Development of A Relaxation Support System Utilizing Stereophonic AR},
author = {T. Otsuka and D. Li and P. Siriaraya and S. Nakajima},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105006602014&doi=10.1109%2fICNC64010.2025.10993739&partnerID=40&md5=abdaca1aefc88381072c1e8090697638},
doi = {10.1109/ICNC64010.2025.10993739},
isbn = {979-8-3315-2096-0},
year = {2025},
date = {2025-01-01},
booktitle = {Int. Conf. Comput., Netw. Commun., ICNC},
pages = {463–467},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Given the high prevalence of stress and anxiety in today's society, there is an urgent need to explore effective methods to help people manage stress. This research aims to develop a relaxation support system using stereophonic augmented reality (AR), designed to help alleviate stress by recreating relaxing environments with immersive stereo soundscapes, including stories created from generative AI and environmental sounds while users are going for a walk. This paper presents a preliminary evaluation of the effectiveness of the proposed relaxation support method. © 2025 IEEE.},
keywords = {Augmented Reality, Environmental sounds, Generative AI, Immersive, Mental Well-being, Soundscapes, Spatial Audio, Stereo image processing, Support method, Support systems, Well being},
pubstate = {published},
tppubtype = {inproceedings}
}
Sajiukumar, A.; Ranjan, A.; Parvathi, P. K.; Satheesh, A.; Udayan, J. Divya; Subramaniam, U.
Generative AI-Enabled Virtual Twin for Meeting Assistants Proceedings Article
In: Saba, T.; Rehman, A. (Ed.): Proc. - Int. Women Data Sci. Conf. at Prince Sultan Univ., WiDS-PSU, pp. 60–65, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-8-3315-2092-2.
Abstract | Links | BibTeX | Tags: 3D avatar generation, 3D Avatars, 3D reconstruction, AI-augmented interaction, Augmented Reality, Communication and collaborations, Conversational AI, Neural radiation field, neural radiation fields (NeRF), Radiation field, Real time performance, real-time performance, Three dimensional computer graphics, Virtual spaces, Voice cloning
@inproceedings{sajiukumar_generative_2025,
title = {Generative AI-Enabled Virtual Twin for Meeting Assistants},
author = {A. Sajiukumar and A. Ranjan and P. K. Parvathi and A. Satheesh and J. Divya Udayan and U. Subramaniam},
editor = {Saba T. and Rehman A.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007691247&doi=10.1109%2fWiDS-PSU64963.2025.00025&partnerID=40&md5=f0bfb74a8f854c427054c73582909185},
doi = {10.1109/WiDS-PSU64963.2025.00025},
isbn = {979-8-3315-2092-2},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - Int. Women Data Sci. Conf. at Prince Sultan Univ., WiDS-PSU},
pages = {60–65},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The growing dependence on virtual spaces for communication and collaboration has transformed interactions in numerous industries, ranging from professional meetings to education, entertainment, and healthcare. Despite the advancement of AI technologies such as three-dimensional modeling, voice cloning, and conversational AI, the convergence of these technologies in a single platform is still challenging. This paper introduces a unified framework that brings together state-of-the-art 3D avatar generation, real-time voice cloning, and conversational AI to enhance virtual interactions. The system utilizes Triplane neural representations and neural radiation fields (NeRF) for high-fidelity 3D avatar generation, speaker encoders coupled with Tacotron 2 and WaveRNN for natural voice cloning, and a context-aware chat algorithm for adaptive conversations. By overcoming the challenges of customization, integration, and real-time performance, the proposed framework addresses the increasing needs for realistic virtual representations, setting new benchmarks for AI-augmented interaction in virtual conferences, online representation, education, and healthcare. © 2025 IEEE.},
keywords = {3D avatar generation, 3D Avatars, 3D reconstruction, AI-augmented interaction, Augmented Reality, Communication and collaborations, Conversational AI, Neural radiation field, neural radiation fields (NeRF), Radiation field, Real time performance, real-time performance, Three dimensional computer graphics, Virtual spaces, Voice cloning},
pubstate = {published},
tppubtype = {inproceedings}
}
Song, T.; Liu, Z.; Zhao, R.; Fu, J.
ElderEase AR: Enhancing Elderly Daily Living with the Multimodal Large Language Model and Augmented Reality Proceedings Article
In: ICVRT - Proc. Int. Conf. Virtual Real. Technol., pp. 60–67, Association for Computing Machinery, Inc, 2025, ISBN: 979-8-4007-1018-6.
Abstract | Links | BibTeX | Tags: Age-related, Assisted living, Augmented Reality, Augmented reality technology, Daily Life Support, Daily living, Daily-life supports, Elderly, Elderly users, Independent living, Independent living systems, Language Model, Modeling languages, Multi agent systems, Multi-modal, Multimodal large language model
@inproceedings{song_elderease_2025,
title = {ElderEase AR: Enhancing Elderly Daily Living with the Multimodal Large Language Model and Augmented Reality},
author = {T. Song and Z. Liu and R. Zhao and J. Fu},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001924899&doi=10.1145%2f3711496.3711505&partnerID=40&md5=4df693735547b505172657a73359f3ca},
doi = {10.1145/3711496.3711505},
isbn = {979-8-4007-1018-6},
year = {2025},
date = {2025-01-01},
booktitle = {ICVRT - Proc. Int. Conf. Virtual Real. Technol.},
pages = {60–67},
publisher = {Association for Computing Machinery, Inc},
abstract = {Elderly individuals often face challenges in independent living due to age-related cognitive and physical decline. To address these issues, we propose an innovative Augmented Reality (AR) system, “ElderEase AR”, designed to assist elderly users in their daily lives by leveraging a Multimodal Large Language Model (MLLM). This system enables elderly users to capture images of their surroundings and ask related questions, providing context-aware feedback. We evaluated the system’s perceived ease-of-use and feasibility through a pilot study involving 30 elderly users, aiming to enhance their independence and quality of life. Our system integrates advanced AR technology with an intelligent agent trained on multimodal datasets. Through prompt engineering, the agent is tailored to respond in a manner that aligns with the speaking style of elderly users. Experimental results demonstrate high accuracy in object recognition and question answering, with positive feedback from user trials. Specifically, the system accurately identified objects in various environments and provided relevant answers to user queries. This study highlights the powerful potential of AR and AI technologies in creating support tools for the elderly. It suggests directions for future improvements and applications, such as enhancing the system’s adaptability to different user needs and expanding its functionality to cover more aspects of daily living. © 2024 Copyright held by the owner/author(s).},
keywords = {Age-related, Assisted living, Augmented Reality, Augmented reality technology, Daily Life Support, Daily living, Daily-life supports, Elderly, Elderly users, Independent living, Independent living systems, Language Model, Modeling languages, Multi agent systems, Multi-modal, Multimodal large language model},
pubstate = {published},
tppubtype = {inproceedings}
}
Li, H.; Wang, Z.; Liang, W.; Wang, Y.
X’s Day: Personality-Driven Virtual Human Behavior Generation Journal Article
In: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 5, pp. 3514–3524, 2025, ISSN: 1077-2626.
Abstract | Links | BibTeX | Tags: adult, Augmented Reality, Behavior Generation, Chatbots, Computer graphics, computer interface, Contextual Scene, female, human, Human behaviors, Humans, Long-term behavior, male, Novel task, Personality, Personality traits, Personality-driven Behavior, physiology, Social behavior, User-Computer Interface, Users' experiences, Virtual agent, Virtual environments, Virtual humans, Virtual Reality, Young Adult
@article{li_xs_2025,
title = {X’s Day: Personality-Driven Virtual Human Behavior Generation},
author = {H. Li and Z. Wang and W. Liang and Y. Wang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003864932&doi=10.1109%2fTVCG.2025.3549574&partnerID=40&md5=a865bbd2b0fa964a4f0f4190955dc787},
doi = {10.1109/TVCG.2025.3549574},
issn = {1077-2626},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {31},
number = {5},
pages = {3514–3524},
abstract = {Developing convincing and realistic virtual human behavior is essential for enhancing user experiences in virtual reality (VR) and augmented reality (AR) settings. This paper introduces a novel task focused on generating long-term behaviors for virtual agents, guided by specific personality traits and contextual elements within 3D environments. We present a comprehensive framework capable of autonomously producing daily activities autoregressively. By modeling the intricate connections between personality characteristics and observable activities, we establish a hierarchical structure of Needs, Task, and Activity levels. Integrating a Behavior Planner and a World State module allows for the dynamic sampling of behaviors using large language models (LLMs), ensuring that generated activities remain relevant and responsive to environmental changes. Extensive experiments validate the effectiveness and adaptability of our approach across diverse scenarios. This research makes a significant contribution to the field by establishing a new paradigm for personalized and context-aware interactions with virtual humans, ultimately enhancing user engagement in immersive applications. Our project website is at: https://behavior.agent-x.cn/. © 2025 IEEE. All rights reserved,},
keywords = {adult, Augmented Reality, Behavior Generation, Chatbots, Computer graphics, computer interface, Contextual Scene, female, human, Human behaviors, Humans, Long-term behavior, male, Novel task, Personality, Personality traits, Personality-driven Behavior, physiology, Social behavior, User-Computer Interface, Users' experiences, Virtual agent, Virtual environments, Virtual humans, Virtual Reality, Young Adult},
pubstate = {published},
tppubtype = {article}
}
Song, T.; Pabst, F.; Eck, U.; Navab, N.
Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations Journal Article
In: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 5, pp. 2901–2911, 2025, ISSN: 1077-2626.
Abstract | Links | BibTeX | Tags: 3D reconstruction, adult, Augmented Reality, Computer graphics, computer interface, echography, female, human, Humans, Imaging, Intelligent robots, Intelligent virtual agents, Language Model, male, Medical robotics, Middle Aged, Mixed reality, Patient Acceptance of Health Care, patient attitude, Patient comfort, procedures, Real-world, Reality visualization, Robotic Ultrasound, Robotics, Three-Dimensional, three-dimensional imaging, Trust and Acceptance, Ultrasonic applications, Ultrasonic equipment, Ultrasonography, Ultrasound probes, User-Computer Interface, Virtual agent, Virtual assistants, Virtual environments, Virtual Reality, Visual languages, Visualization, Young Adult
@article{song_enhancing_2025,
title = {Enhancing Patient Acceptance of Robotic Ultrasound through Conversational Virtual Agent and Immersive Visualizations},
author = {T. Song and F. Pabst and U. Eck and N. Navab},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003687673&doi=10.1109%2fTVCG.2025.3549181&partnerID=40&md5=1d46569933582ecf5e967f0794aafc07},
doi = {10.1109/TVCG.2025.3549181},
issn = {1077-2626},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {31},
number = {5},
pages = {2901–2911},
abstract = {Robotic ultrasound systems have the potential to improve medical diagnostics, but patient acceptance remains a key challenge. To address this, we propose a novel system that combines an AI-based virtual agent, powered by a large language model (LLM), with three mixed reality visualizations aimed at enhancing patient comfort and trust. The LLM enables the virtual assistant to engage in natural, conversational dialogue with patients, answering questions in any format and offering real-time reassurance, creating a more intelligent and reliable interaction. The virtual assistant is animated as controlling the ultrasound probe, giving the impression that the robot is guided by the assistant. The first visualization employs augmented reality (AR), allowing patients to see the real world and the robot with the virtual avatar superimposed. The second visualization is an augmented virtuality (AV) environment, where the real-world body part being scanned is visible, while a 3D Gaussian Splatting reconstruction of the room, excluding the robot, forms the virtual environment. The third is a fully immersive virtual reality (VR) experience, featuring the same 3D reconstruction but entirely virtual, where the patient sees a virtual representation of their body being scanned in a robot-free environment. In this case, the virtual ultrasound probe, mirrors the movement of the probe controlled by the robot, creating a synchronized experience as it touches and moves over the patient's virtual body. We conducted a comprehensive agent-guided robotic ultrasound study with all participants, comparing these visualizations against a standard robotic ultrasound procedure. Results showed significant improvements in patient trust, acceptance, and comfort. Based on these findings, we offer insights into designing future mixed reality visualizations and virtual agents to further enhance patient comfort and acceptance in autonomous medical procedures. © 1995-2012 IEEE.},
keywords = {3D reconstruction, adult, Augmented Reality, Computer graphics, computer interface, echography, female, human, Humans, Imaging, Intelligent robots, Intelligent virtual agents, Language Model, male, Medical robotics, Middle Aged, Mixed reality, Patient Acceptance of Health Care, patient attitude, Patient comfort, procedures, Real-world, Reality visualization, Robotic Ultrasound, Robotics, Three-Dimensional, three-dimensional imaging, Trust and Acceptance, Ultrasonic applications, Ultrasonic equipment, Ultrasonography, Ultrasound probes, User-Computer Interface, Virtual agent, Virtual assistants, Virtual environments, Virtual Reality, Visual languages, Visualization, Young Adult},
pubstate = {published},
tppubtype = {article}
}
Kai, W. -H.; Xing, K. -X.
Video-driven musical composition using large language model with memory-augmented state space Journal Article
In: Visual Computer, vol. 41, no. 5, pp. 3345–3357, 2025, ISSN: 0178-2789.
Abstract | Links | BibTeX | Tags: 'current, Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space
@article{kai_video-driven_2025,
title = {Video-driven musical composition using large language model with memory-augmented state space},
author = {W. -H. Kai and K. -X. Xing},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001073242&doi=10.1007%2fs00371-024-03606-w&partnerID=40&md5=7ea24f13614a9a24caf418c37a10bd8c},
doi = {10.1007/s00371-024-03606-w},
issn = {0178-2789},
year = {2025},
date = {2025-01-01},
journal = {Visual Computer},
volume = {41},
number = {5},
pages = {3345–3357},
abstract = {The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. However, research on LLMs for music inspiration is still in its infancy. To fill the gap in this field and break through the dilemma that LLMs can only understand short videos with limited frames, we propose a large language model with state space for long-term video-to-music generation. To capture long-range dependencies and maintain high performance while further decreasing the computing cost, our overall network includes the Enhanced Video Mamba, which incorporates continuous moving window partitioning and local feature augmentation, and a long-term memory bank that captures and aggregates historical video information to mitigate information loss in long sequences. This framework achieves both subquadratic-time computation and near-linear memory complexity, enabling effective long-term video-to-music generation. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models. Our code is released at https://github.com/kai211233/S2L2-V2M. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.},
keywords = {'current, Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space},
pubstate = {published},
tppubtype = {article}
}
Linares-Pellicer, J.; Izquierdo-Domenech, J.; Ferri-Molla, I.; Aliaga-Torro, C.
Breaking the Bottleneck: Generative AI as the Solution for XR Content Creation in Education Book Section
In: Lecture Notes in Networks and Systems, vol. 1140, pp. 9–30, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 2367-3370.
Abstract | Links | BibTeX | Tags: Adversarial machine learning, Augmented Reality, Breakings, Content creation, Contrastive Learning, Development process, Educational context, Federated learning, Generative adversarial networks, Immersive learning, Intelligence models, Learning experiences, Mixed reality, Resource intensity, Technical skills, Virtual environments
@incollection{linares-pellicer_breaking_2025,
title = {Breaking the Bottleneck: Generative AI as the Solution for XR Content Creation in Education},
author = {J. Linares-Pellicer and J. Izquierdo-Domenech and I. Ferri-Molla and C. Aliaga-Torro},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85212478399&doi=10.1007%2f978-3-031-71530-3_2&partnerID=40&md5=aefee938cd5b8a74ee811a463d7409ae},
doi = {10.1007/978-3-031-71530-3_2},
issn = {2367-3370},
year = {2025},
date = {2025-01-01},
booktitle = {Lecture Notes in Networks and Systems},
volume = {1140},
pages = {9–30},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {The integration of Extended Reality (XR) technologies-Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR)-promises to revolutionize education by offering immersive learning experiences. However, the complexity and resource intensity of content creation hinders the adoption of XR in educational contexts. This chapter explores Generative Artificial Intelligence (GenAI) as a solution, highlighting how GenAI models can facilitate the creation of educational XR content. GenAI enables educators to produce engaging XR experiences without needing advanced technical skills by automating aspects of the development process from ideation to deployment. Practical examples demonstrate GenAI’s current capability to generate assets and program applications, significantly lowering the barrier to creating personalized and interactive learning environments. The chapter also addresses challenges related to GenAI’s application in education, including technical limitations and ethical considerations. Ultimately, GenAI’s integration into XR content creation makes immersive educational experiences more accessible and practical, driven by only natural interactions, promising a future where technology-enhanced learning is universally attainable. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {Adversarial machine learning, Augmented Reality, Breakings, Content creation, Contrastive Learning, Development process, Educational context, Federated learning, Generative adversarial networks, Immersive learning, Intelligence models, Learning experiences, Mixed reality, Resource intensity, Technical skills, Virtual environments},
pubstate = {published},
tppubtype = {incollection}
}
Shi, J.; Jain, R.; Chi, S.; Doh, H.; Chi, H. -G.; Quinn, A. J.; Ramani, K.
CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-8-4007-1394-1.
Abstract | Links | BibTeX | Tags: 'current, Application scenario, AR application, Augmented Reality, Context-Aware, Contextual information, Generative adversarial networks, generative artificial intelligence, Humanoid avatars, In-situ learning, Learning experiences, Power
@inproceedings{shi_caring-ai_2025,
title = {CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence},
author = {J. Shi and R. Jain and S. Chi and H. Doh and H. -G. Chi and A. J. Quinn and K. Ramani},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005725461&doi=10.1145%2f3706598.3713348&partnerID=40&md5=e88afd8426e020155599ef3b2a044774},
doi = {10.1145/3706598.3713348},
isbn = {979-8-4007-1394-1},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (Gen-AI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that the current AIGC lacks contextual information to adapt to varying application scenarios and is therefore limited in authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instruction while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animation as AR instructions that blend in the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions based on a design space of AIGC in AR Instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the easiness and effectiveness of authoring with Gen-AI. © 2025 Copyright held by the owner/author(s).},
keywords = {'current, Application scenario, AR application, Augmented Reality, Context-Aware, Contextual information, Generative adversarial networks, generative artificial intelligence, Humanoid avatars, In-situ learning, Learning experiences, Power},
pubstate = {published},
tppubtype = {inproceedings}
}
Behravan, M.; Gračanin, D.
From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 150–155, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-8-3315-1484-6.
Abstract | Links | BibTeX | Tags: 3D modeling, 3D object, 3D Object Generation, 3D reconstruction, Augmented Reality, Cutting edges, Generative AI, Interactive computer systems, Language Model, Large language model, large language models, matrix, Multilingual speech interaction, Real- time, Speech enhancement, Speech interaction, Volume Rendering
@inproceedings{behravan_voices_2025,
title = {From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality},
author = {M. Behravan and D. Gračanin},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005153589&doi=10.1109%2fVRW66409.2025.00038&partnerID=40&md5=b8aaab4e2378cde3595d98d79266d371},
doi = {10.1109/VRW66409.2025.00038},
isbn = {979-8-3315-1484-6},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {150–155},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper presents Matrix, an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models (LLMs), the system enables seamless user interactions through spoken commands. The framework processes speech inputs, generates 3D objects, and provides object recommendations based on contextual understanding, enhancing AR experiences. A key feature of this framework is its ability to optimize 3D models by reducing mesh complexity, resulting in significantly smaller file sizes and faster processing on resource-constrained AR devices. Our approach addresses the challenges of high GPU usage, large model output sizes, and real-time system responsiveness, ensuring a smoother user experience. Moreover, the system is equipped with a pre-generated object repository, further reducing GPU load and improving efficiency. We demonstrate the practical applications of this framework in various fields such as education, design, and accessibility, and discuss future enhancements including image-to-3D conversion, environmental object detection, and multimodal support. The open-source nature of the framework promotes ongoing innovation and its utility across diverse industries. © 2025 IEEE.},
keywords = {3D modeling, 3D object, 3D Object Generation, 3D reconstruction, Augmented Reality, Cutting edges, Generative AI, Interactive computer systems, Language Model, Large language model, large language models, matrix, Multilingual speech interaction, Real- time, Speech enhancement, Speech interaction, Volume Rendering},
pubstate = {published},
tppubtype = {inproceedings}
}
Volkova, S.; Nguyen, D.; Penafiel, L.; Kao, H. -T.; Cohen, M.; Engberson, G.; Cassani, L.; Almutairi, M.; Chiang, C.; Banerjee, N.; Belcher, M.; Ford, T. W.; Yankoski, M. G.; Weninger, T.; Gomez-Zara, D.; Rebensky, S.
VirTLab: Augmented Intelligence for Modeling and Evaluating Human-AI Teaming Through Agent Interactions Proceedings Article
In: Sottilare, R.A.; Schwarz, J. (Ed.): Lect. Notes Comput. Sci., pp. 279–301, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-92969-4.
Abstract | Links | BibTeX | Tags: Agent based simulation, agent-based simulation, Augmented Reality, Causal analysis, HAT processes and states, Human digital twin, human digital twins, Human-AI team process and state, Human-AI teaming, Intelligent virtual agents, Operational readiness, Personnel training, Team performance, Team process, Virtual teaming, Visual analytics
@inproceedings{volkova_virtlab_2025,
title = {VirTLab: Augmented Intelligence for Modeling and Evaluating Human-AI Teaming Through Agent Interactions},
author = {S. Volkova and D. Nguyen and L. Penafiel and H. -T. Kao and M. Cohen and G. Engberson and L. Cassani and M. Almutairi and C. Chiang and N. Banerjee and M. Belcher and T. W. Ford and M. G. Yankoski and T. Weninger and D. Gomez-Zara and S. Rebensky},
editor = {Sottilare R.A. and Schwarz J.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007830752&doi=10.1007%2f978-3-031-92970-0_20&partnerID=40&md5=c578dc95176a617f6de2a1c6f998f73f},
doi = {10.1007/978-3-031-92970-0_20},
issn = {0302-9743},
isbn = {978-3-031-92969-4},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15813 LNCS},
pages = {279–301},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {This paper introduces VirTLab (Virtual Teaming Laboratory), a novel augmented intelligence platform designed to simulate and analyze interactions between human-AI teams (HATs) through the use of human digital twins (HDTs) and AI agents. VirTLab enhances operational readiness by systematically analyzing HAT dynamics, fostering trust development, and providing actionable recommendations to improve team performance outcomes. VirTLab combines agents driven by large language models (LLM) interacting in a simulated environment with integrated HAT performance measures obtained using interactive visual analytics. VirTLab integrates four key components: (1) HDTs with configurable profiles, (2) operational AI teammates, (3) a simulation engine that enforces temporal and spatial environment constraints, ensures situational awareness, and coordinates events between HDT and AI agents to deliver high-fidelity simulations, and (4) an evaluation platform that validates simulations against ground truth and enables exploration of how HDTs and AI attributes influence HAT functioning. We demonstrate VirTLab’s capabilities through focused experiments examining how variations in HDT openness, agreeableness, propensity to trust, and AI reliability and transparency influence HAT performance. Our HAT performance evaluation framework incorporates both objective measures such as communication patterns and mission completion, and subjective measures to include perceived trust and team coordination. Results on search and rescue missions reveal that AI teammate reliability significantly impacts communication dynamics and team assistance behaviors, whereas HDT personality traits influence trust development and team coordination -insights that directly inform the design of HAT training programs. VirTLab enables instructional designers to explore interventions in HAT behaviors through controlled experiments and causal analysis, leading to improved HAT performance. Visual analytics support the examination of HAT functioning across different conditions, allowing for real-time assessment and adaptation of scenarios. VirTLab contributes to operational readiness by preparing human operators to work seamlessly with AI counterparts in real-world situations. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {Agent based simulation, agent-based simulation, Augmented Reality, Causal analysis, HAT processes and states, Human digital twin, human digital twins, Human-AI team process and state, Human-AI teaming, Intelligent virtual agents, Operational readiness, Personnel training, Team performance, Team process, Virtual teaming, Visual analytics},
pubstate = {published},
tppubtype = {inproceedings}
}
Behravan, M.; Haghani, M.; Gračanin, D.
Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article
In: Chen, J.Y.C.; Fragomeni, G. (Ed.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-93699-9.
Abstract | Links | BibTeX | Tags: 3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering
@inproceedings{behravan_transcending_2025,
title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},
author = {M. Behravan and M. Haghani and D. Gračanin},
editor = {Chen J.Y.C. and Fragomeni G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},
doi = {10.1007/978-3-031-93700-2_2},
issn = {0302-9743},
isbn = {978-3-031-93699-9},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15788 LNCS},
pages = {13–32},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive, model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},
pubstate = {published},
tppubtype = {inproceedings}
}
Graziano, M.; Cante, L. Colucci; Martino, B. Di
Deploying Large Language Model on Cloud-Edge Architectures: A Case Study for Conversational Historical Characters Book Section
In: Lecture Notes on Data Engineering and Communications Technologies, vol. 250, pp. 196–205, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 2367-4512.
Abstract | Links | BibTeX | Tags: Agent based, Augmented Reality, Case-studies, Chatbots, Cloud computing architecture, Conversational Agents, EDGE architectures, Historical characters, Language Model, Modeling languages, Real time performance, WEB application, Web applications, Work analysis
@incollection{graziano_deploying_2025,
title = {Deploying Large Language Model on Cloud-Edge Architectures: A Case Study for Conversational Historical Characters},
author = {M. Graziano and L. Colucci Cante and B. Di Martino},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002995405&doi=10.1007%2f978-3-031-87778-0_19&partnerID=40&md5=c54e9ce66901050a05de68602e4a8266},
doi = {10.1007/978-3-031-87778-0_19},
issn = {2367-4512},
year = {2025},
date = {2025-01-01},
booktitle = {Lecture Notes on Data Engineering and Communications Technologies},
volume = {250},
pages = {196–205},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {This work analyzes the deployment of conversational agents based on large language models (LLMs) in cloud-edge architectures, placing emphasis on scalability, efficiency and real-time performance. Through a case study, we present a web application that allows users to interact with an augmented reality avatar that impersonates a historical character. The agent, powered by an LLM delivers immersive and contextually coherent dialogues. We discuss the solutions adopted to manage latency and distribute the computational load between the cloud, which takes care of language processing, and the edge nodes, ensuring a smooth user experience. The results obtained demonstrate how accurate design can optimize the use of LLMs in distributed environments, offering advanced and high-performance interactions even in applications with high reactivity and customization requirements. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {Agent based, Augmented Reality, Case-studies, Chatbots, Cloud computing architecture, Conversational Agents, EDGE architectures, Historical characters, Language Model, Modeling languages, Real time performance, WEB application, Web applications, Work analysis},
pubstate = {published},
tppubtype = {incollection}
}
Behravan, M.; Matković, K.; Gračanin, D.
Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 73–81, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-8-3315-2157-8.
Abstract | Links | BibTeX | Tags: 3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages
@inproceedings{behravan_generative_2025,
title = {Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality},
author = {M. Behravan and K. Matković and D. Gračanin},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000292700&doi=10.1109%2fAIxVR63409.2025.00018&partnerID=40&md5=b40fa769a6b427918c3fcd86f7c52a75},
doi = {10.1109/AIxVR63409.2025.00018},
isbn = {979-8-3315-2157-8},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {73–81},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We present a novel Artificial Intelligence (AI) system that functions as a designer assistant in augmented reality (AR) environments. Leveraging Vision Language Models (VLMs) like LLaVA and advanced text-to-3D generative models, users can capture images of their surroundings with an Augmented Reality (AR) headset. The system analyzes these images to recommend contextually relevant objects that enhance both functionality and visual appeal. The recommended objects are generated as 3D models and seamlessly integrated into the AR environment for interactive use. Our system utilizes open-source AI models running on local systems to enhance data security and reduce operational costs. Key features include context-aware object suggestions, optimal placement guidance, aesthetic matching, and an intuitive user interface for real-time interaction. Evaluations using the COCO 2017 dataset and real-world AR testing demonstrated high accuracy in object detection and contextual fit rating of 4.1 out of 5. By addressing the challenge of providing context-aware object recommendations in AR, our system expands the capabilities of AI applications in this domain. It enables users to create personalized digital spaces efficiently, leveraging AI for contextually relevant suggestions. © 2025 IEEE.},
keywords = {3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages},
pubstate = {published},
tppubtype = {inproceedings}
}
Xiao, T.; Chen, Y.; Zhong, S.; Kiefer, P.; Krukar, J.; Kim, K. G.; Hurni, L.; Schwering, A.; Raubal, M.
Sketch2Terrain: AI-Driven Real-Time Terrain Sketch Mapping in Augmented Reality Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-8-4007-1394-1.
Abstract | Links | BibTeX | Tags: 3D information, Augmented Reality, Drawing (graphics), Freehand sketching, Generative 3D sketch mapping, Generative AI, Mapping systems, Photomapping, Real-time terrains, Sketch maps, Spatial cognition, Spatial informations, terrain generation, Terrain generations, Three dimensional computer graphics
@inproceedings{xiao_sketch2terrain_2025,
title = {Sketch2Terrain: AI-Driven Real-Time Terrain Sketch Mapping in Augmented Reality},
author = {T. Xiao and Y. Chen and S. Zhong and P. Kiefer and J. Krukar and K. G. Kim and L. Hurni and A. Schwering and M. Raubal},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005747437&doi=10.1145%2f3706598.3713467&partnerID=40&md5=bc38e658cfe7ae83792e8837d496f2c7},
doi = {10.1145/3706598.3713467},
isbn = {979-8-4007-1394-1},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Sketch mapping is an effective technique to externalize and communicate spatial information. However, it has been limited to 2D mediums, making it difficult to represent 3D information, particularly for terrains with elevation changes. We present Sketch2Terrain, an intuitive generative-3D-sketch-mapping system combining freehand sketching with generative Artificial Intelligence that radically changes sketch map creation and representation using Augmented Reality. Sketch2Terrain empowers non-experts to create unambiguous sketch maps of natural environments and provides a homogeneous interface for researchers to collect data and conduct experiments. A between-subject study (N=36) revealed that generative-3D-sketch-mapping improved efficiency by 38.4%, terrain-topology accuracy by 12.5%, and landmark accuracy by up to 12.1%, with only a 4.7% trade-off in terrain-elevation accuracy compared to freehand 3D-sketch-mapping. Additionally, generative-3D-sketch-mapping reduced perceived strain by 60.5% and stress by 39.5% over 2D-sketch-mapping. These findings underscore potential applications of generative-3D-sketch-mapping for in-depth understanding and accurate representation of vertically complex environments. The implementation is publicly available. © 2025 Copyright held by the owner/author(s).},
keywords = {3D information, Augmented Reality, Drawing (graphics), Freehand sketching, Generative 3D sketch mapping, Generative AI, Mapping systems, Photomapping, Real-time terrains, Sketch maps, Spatial cognition, Spatial informations, terrain generation, Terrain generations, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {inproceedings}
}
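As a rough illustration of the data flow behind generative 3D sketch mapping (not the authors' model), the sketch below turns a few freehand ridge samples into a dense heightmap, using simple inverse-distance weighting purely as a stand-in for the generative step.

# Illustrative sketch: a handful of (x, y, elevation) samples from freehand
# strokes are densified into a heightmap that an AR client could mesh.
# Inverse-distance weighting stands in for Sketch2Terrain's generative model.
import numpy as np

def strokes_to_heightmap(samples: np.ndarray, size: int = 64) -> np.ndarray:
    """samples: (N, 3) array with x, y in [0, 1] and an elevation value."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)               # (size*size, 2)
    d2 = ((grid[:, None, :] - samples[None, :, :2]) ** 2).sum(-1)   # squared distances
    w = 1.0 / (d2 + 1e-6)                                           # inverse-distance weights
    heights = (w * samples[:, 2]).sum(1) / w.sum(1)
    return heights.reshape(size, size)

if __name__ == "__main__":
    ridge = np.array([[0.2, 0.3, 0.9], [0.5, 0.5, 1.0], [0.8, 0.7, 0.6]])
    hm = strokes_to_heightmap(ridge)
    print(hm.shape, hm.min(), hm.max())  # 64x64 terrain ready for AR meshing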
Guo, P.; Zhang, Q.; Tian, C.; Xue, W.; Feng, X.
Digital Human Techniques for Education Reform Proceedings Article
In: ICETM - Proc. Int. Conf. Educ. Technol. Manag., pp. 173–178, Association for Computing Machinery, Inc, 2025, ISBN: 979-840071746-8 (ISBN).
Abstract | Links | BibTeX | Tags: Augmented Reality, Contrastive Learning, Digital elevation model, Digital human technique, Digital Human Techniques, Digital humans, Education Reform, Education reforms, Educational Technology, Express emotions, Federated learning, Human behaviors, Human form models, Human techniques, Immersive, Innovative technology, Modeling languages, Natural language processing systems, Teachers', Teaching, Virtual environments, Virtual humans
@inproceedings{guo_digital_2025,
title = {Digital Human Techniques for Education Reform},
author = {P. Guo and Q. Zhang and C. Tian and W. Xue and X. Feng},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001671326&doi=10.1145%2f3711403.3711428&partnerID=40&md5=dd96647315af9409d119f68f9cf4e980},
doi = {10.1145/3711403.3711428},
isbn = {979-840071746-8 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {ICETM - Proc. Int. Conf. Educ. Technol. Manag.},
pages = {173–178},
publisher = {Association for Computing Machinery, Inc},
abstract = {The rapid evolution of artificial intelligence, big data, and generative AI models has ushered in significant transformations across various sectors, including education. Digital Human Technique, an innovative technology grounded in advanced computer science and artificial intelligence, is reshaping educational paradigms by enabling virtual humans to simulate human behavior, express emotions, and interact with users. This paper explores the application of Digital Human Technique in education reform, focusing on creating immersive, intelligent classroom experiences that foster meaningful interactions between teachers and students. We define Digital Human Technique and delve into its key technical components such as character modeling and rendering, natural language processing, computer vision, and augmented reality technologies. Our methodology involves analyzing the role of educational digital humans created through these technologies, assessing their impact on educational processes, and examining various application scenarios in educational reform. Results indicate that Digital Human Technique significantly enhances the learning experience by enabling personalized teaching, increasing engagement, and fostering emotional connections. Educational digital humans serve as virtual teachers, interactive learning aids, and facilitators of emotional interaction, effectively addressing the challenges of traditional educational methods. They also promote a deeper understanding of complex concepts through simulated environments and interactive digital content. © 2024 Copyright held by the owner/author(s).},
keywords = {Augmented Reality, Contrastive Learning, Digital elevation model, Digital human technique, Digital Human Techniques, Digital humans, Education Reform, Education reforms, Educational Technology, Express emotions, Federated learning, Human behaviors, Human form models, Human techniques, Immersive, Innovative technology, Modeling languages, Natural language processing systems, Teachers', Teaching, Virtual environments, Virtual humans},
pubstate = {published},
tppubtype = {inproceedings}
}
Angelopoulos, J.; Manettas, C.; Alexopoulos, K.
Industrial Maintenance Optimization Based on the Integration of Large Language Models (LLM) and Augmented Reality (AR) Proceedings Article
In: K., Alexopoulos; S., Makris; P., Stavropoulos (Ed.): Lect. Notes Mech. Eng., pp. 197–205, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 21954356 (ISSN); 978-303186488-9 (ISBN).
Abstract | Links | BibTeX | Tags: Augmented Reality, Competition, Cost reduction, Critical path analysis, Crushed stone plants, Generative AI, generative artificial intelligence, Human expertise, Industrial equipment, Industrial maintenance, Language Model, Large language model, Maintenance, Maintenance optimization, Maintenance procedures, Manufacturing data processing, Potential errors, Problem oriented languages, Scheduled maintenance, Shopfloors, Solar power plants
@inproceedings{angelopoulos_industrial_2025,
title = {Industrial Maintenance Optimization Based on the Integration of Large Language Models (LLM) and Augmented Reality (AR)},
author = {J. Angelopoulos and C. Manettas and K. Alexopoulos},
editor = {Alexopoulos K. and Makris S. and Stavropoulos P.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001421726&doi=10.1007%2f978-3-031-86489-6_20&partnerID=40&md5=63be31b9f4dda4aafd6a641630506c09},
doi = {10.1007/978-3-031-86489-6_20},
isbn = {21954356 (ISSN); 978-303186488-9 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Mech. Eng.},
pages = {197–205},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional maintenance procedures often rely on manual data processing and human expertise, leading to inefficiencies and potential errors. In the context of Industry 4.0, several digital technologies, such as Artificial Intelligence (AI), Big Data Analytics (BDA), and eXtended Reality (XR), have been developed and are constantly being integrated into a plethora of manufacturing activities (including industrial maintenance) in an attempt to minimize human error, support shop-floor technicians, reduce costs, and reduce equipment downtime. The latest developments in the field of AI point towards Large Language Models (LLM), which can communicate with human operators in an intuitive manner. On the other hand, Augmented Reality, as part of XR technologies, offers useful functionalities for improving user perception and interaction with modern, complex industrial equipment. Therefore, this research work develops and trains an LLM to provide suggestions and actionable items for mitigating unforeseen events (e.g., equipment breakdowns), in order to support shop-floor technicians during their everyday tasks. Paired with AR visualizations over the physical environment, the technicians will get instructions for performing tasks and checks on the industrial equipment in a manner similar to human-to-human communication. The functionality of the proposed framework extends to the integration of modules for exchanging information with the engineering department towards the scheduling of Maintenance and Repair Operations (MRO), as well as the creation of a repository of historical data in order to constantly retrain and optimize the LLM. © The Author(s) 2025.},
keywords = {Augmented Reality, Competition, Cost reduction, Critical path analysis, Crushed stone plants, Generative AI, generative artificial intelligence, Human expertise, Industrial equipment, Industrial maintenance, Language Model, Large language model, Maintenance, Maintenance optimization, Maintenance procedures, Manufacturing data processing, Potential errors, Problem oriented languages, Scheduled maintenance, Shopfloors, Solar power plants},
pubstate = {published},
tppubtype = {inproceedings}
}
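A minimal sketch of how an LLM-to-AR hand-off for maintenance guidance might be structured, assuming a hypothetical call_llm stub and an illustrative JSON step schema; none of these names or fields come from the paper.

# Sketch: a fault report is phrased as a chat-style request, the (stubbed)
# maintenance LLM answers with ordered steps, and each step names an AR
# anchor where the headset overlays the instruction.
import json

def call_llm(messages: list[dict]) -> str:
    """Stub for a fine-tuned maintenance LLM; returns a JSON list of steps."""
    return json.dumps([
        {"step": 1, "instruction": "Isolate power to the conveyor drive.",
         "ar_anchor": "main_breaker"},
        {"step": 2, "instruction": "Inspect the belt tensioner for wear.",
         "ar_anchor": "tensioner_assembly"},
    ])

def breakdown_to_ar_steps(machine_id: str, fault_code: str) -> list[dict]:
    messages = [
        {"role": "system", "content": "You assist shop-floor technicians. "
         "Reply only with a JSON list of {step, instruction, ar_anchor}."},
        {"role": "user", "content": f"Machine {machine_id} reports fault {fault_code}. "
         "Give the checks to perform, in order."},
    ]
    return json.loads(call_llm(messages))

if __name__ == "__main__":
    for s in breakdown_to_ar_steps("CONV-07", "E42"):
        print(s["step"], s["instruction"], "->", s["ar_anchor"])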
Xu, F.; Zhou, T.; Nguyen, T.; Bao, H.; Lin, C.; Du, J.
Integrating augmented reality and LLM for enhanced cognitive support in critical audio communications Journal Article
In: International Journal of Human-Computer Studies, vol. 194, 2025, ISSN: 10715819 (ISSN).
Abstract | Links | BibTeX | Tags: Audio communications, Augmented Reality, Cognitive loads, Cognitive support, Decisions makings, Language Model, Large language model, LLM, Logic reasoning, Maintenance, Operations and maintenance, Oral communication, Situational awareness
@article{xu_integrating_2025,
title = {Integrating augmented reality and LLM for enhanced cognitive support in critical audio communications},
author = {F. Xu and T. Zhou and T. Nguyen and H. Bao and C. Lin and J. Du},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85208467299&doi=10.1016%2fj.ijhcs.2024.103402&partnerID=40&md5=153d095b837ee1666a7da0f7ed03362c},
doi = {10.1016/j.ijhcs.2024.103402},
issn = {10715819 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {International Journal of Human-Computer Studies},
volume = {194},
abstract = {Operation and Maintenance (O&M) missions are often time-sensitive and accuracy-dependent, requiring rapid and precise information processing in noisy, chaotic environments where oral communication can lead to cognitive overload and impaired decision-making. Augmented Reality (AR) and Large Language Models (LLMs) offer potential for enhancing situational awareness and lowering cognitive load by integrating digital visualizations with the physical world and improving dialogue management. However, synthesizing these technologies into a real-time system that effectively aids operators remains a challenge. This study explores the integration of AR and GPT-4, an advanced LLM, in time-sensitive O&M tasks, aiming to enhance situational awareness and manage cognitive load during oral communications. A customized AR system, incorporating the Microsoft HoloLens2 for cognitive monitoring and GPT-4 for decision making assistance, was tested in a human subject experiment with 30 participants. The 2×2 factorial experiment evaluated the effects of AR and LLM assistance on task performance and cognitive load. Results demonstrated significant improvements in task accuracy and reductions in cognitive load, highlighting the effectiveness of AR and LLM integration in supporting O&M missions. These findings emphasize the need for further research to optimize operational strategies in mission critical environments. © 2024 Elsevier Ltd},
keywords = {Audio communications, Augmented Reality, Cognitive loads, Cognitive support, Decisions makings, Language Model, Large language model, LLM, Logic reasoning, Maintenance, Operations and maintenance, Oral communication, Situational awareness},
pubstate = {published},
tppubtype = {article}
}
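The sketch below illustrates the kind of dialogue-management step this study describes: condensing a noisy O&M radio transcript into a few short action items an AR headset could display to lower cognitive load. The prompt wording and the call_gpt4 stub are assumptions, not the authors' implementation.

# Sketch: turn chaotic voice chatter into at most a few imperative HUD items.
def call_gpt4(prompt: str) -> str:
    """Stub for the LLM call; returns one action item per line."""
    return "Close valve V-12 before restart\nConfirm pressure below 30 psi"

def transcript_to_hud_items(transcript: str, max_items: int = 3) -> list[str]:
    prompt = (
        "Extract at most {n} short, imperative action items from this O&M "
        "radio transcript. One per line, no extra text.\n\n{t}"
    ).format(n=max_items, t=transcript)
    return [line.strip() for line in call_gpt4(prompt).splitlines() if line.strip()]

if __name__ == "__main__":
    chatter = ("uh copy that, pump two is vibrating, shut valve V-12 first, "
               "then, uh, check pressure, keep it under thirty")
    for item in transcript_to_hud_items(chatter):
        print("HUD:", item)  # short items keep the overlay glanceable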
Casas, L.; Mitchell, K.
Structured Teaching Prompt Articulation for Generative-AI Role Embodiment with Augmented Mirror Video Displays Proceedings Article
In: S.N., Spencer (Ed.): Proc.: VRCAI - ACM SIGGRAPH Int. Conf. Virtual-Reality Contin. Appl. Ind., Association for Computing Machinery, Inc, 2025, ISBN: 979-840071348-4 (ISBN).
Abstract | Links | BibTeX | Tags: Artificial intelligence, Augmented Reality, Computer interaction, Contrastive Learning, Cultural icon, Experiential learning, Generative adversarial networks, Generative AI, human-computer interaction, Immersive, Pedagogical practices, Role-based, Teachers', Teaching, Video display, Virtual environments, Virtual Reality
@inproceedings{casas_structured_2025,
title = {Structured Teaching Prompt Articulation for Generative-AI Role Embodiment with Augmented Mirror Video Displays},
author = {L. Casas and K. Mitchell},
editor = {Spencer S.N.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85217997060&doi=10.1145%2f3703619.3706049&partnerID=40&md5=7141c5dac7882232c6ee8e0bef0ba84e},
doi = {10.1145/3703619.3706049},
isbn = {979-840071348-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc.: VRCAI - ACM SIGGRAPH Int. Conf. Virtual-Reality Contin. Appl. Ind.},
publisher = {Association for Computing Machinery, Inc},
abstract = {We present a classroom enhanced with an augmented reality video display in which students adopt snapshots of their corresponding virtual personas according to their teacher's live articulated spoken educational theme, either linearly (such as historical figures, famous scientists, and cultural icons) or laterally, according to archetypal categories such as world dance styles. We define a structure of generative AI prompt guidance to assist teachers with focused, specified visual role-embodiment stylization. By leveraging role-based immersive embodiment, our proposed approach enriches pedagogical practices that prioritize experiential learning. © 2024 ACM.},
keywords = {Artificial intelligence, Augmented Reality, Computer interaction, Contrastive Learning, Cultural icon, Experiential learning, Generative adversarial networks, Generative AI, human-computer interaction, Immersive, Pedagogical practices, Role-based, Teachers', Teaching, Video display, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
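To show what a structured teaching prompt along these lines could look like, here is a small sketch with an illustrative RolePrompt schema; the field names and wording are assumptions, not the authors' prompt structure.

# Sketch: the teacher's spoken theme is slotted into a fixed template before
# being sent to the image model behind the augmented mirror display.
from dataclasses import dataclass

@dataclass
class RolePrompt:
    theme: str        # teacher's spoken theme, e.g. "famous scientists"
    role: str         # e.g. "Marie Curie"
    archetype: str    # lateral category, e.g. "laboratory attire"
    constraints: str  # safety / age-appropriateness guidance

    def to_prompt(self) -> str:
        return (f"Stylize the student's mirrored video persona as {self.role} "
                f"within the theme '{self.theme}' (archetype: {self.archetype}). "
                f"Constraints: {self.constraints}.")

if __name__ == "__main__":
    p = RolePrompt(theme="famous scientists", role="Marie Curie",
                   archetype="laboratory attire",
                   constraints="classroom-appropriate, respectful")
    print(p.to_prompt())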
Aloudat, M. Z.; Aboumadi, A.; Soliman, A.; Al-Mohammed, H. A.; Al-Ali, M.; Mahgoub, A.; Barhamgi, M.; Yaacoub, E.
Metaverse Unbound: A Survey on Synergistic Integration Between Semantic Communication, 6G, and Edge Learning Journal Article
In: IEEE Access, vol. 13, pp. 58302–58350, 2025, ISSN: 21693536 (ISSN).
Abstract | Links | BibTeX | Tags: 6g wireless system, 6G wireless systems, Augmented Reality, Block-chain, Blockchain, Blockchain technology, Digital Twin Technology, Edge learning, Extended reality (XR), Language Model, Large language model, large language models (LLMs), Metaverse, Metaverses, Semantic communication, Virtual environments, Wireless systems
@article{aloudat_metaverse_2025,
title = {Metaverse Unbound: A Survey on Synergistic Integration Between Semantic Communication, 6G, and Edge Learning},
author = {M. Z. Aloudat and A. Aboumadi and A. Soliman and H. A. Al-Mohammed and M. Al-Ali and A. Mahgoub and M. Barhamgi and E. Yaacoub},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003088610&doi=10.1109%2fACCESS.2025.3555753&partnerID=40&md5=8f3f9421ce2d6be57f8154a122ee192c},
doi = {10.1109/ACCESS.2025.3555753},
issn = {21693536 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {IEEE Access},
volume = {13},
pages = {58302–58350},
abstract = {With a focus on edge learning, blockchain, sixth generation (6G) wireless systems, semantic communication, and large language models (LLMs), this survey paper examines the revolutionary integration of cutting-edge technologies within the metaverse. This thorough examination highlights the critical role these technologies play in improving realism and user engagement on three main levels: technical, virtual, and physical. While the virtual layer focuses on building immersive experiences, the physical layer highlights improvements to the user interface through augmented reality (AR) goggles and virtual reality (VR) headsets. The blockchain-powered technical layer enables safe, decentralized communication. By exploring applications in a variety of fields, such as immersive education, remote work, and entertainment, the survey highlights how the metaverse has the potential to drastically change how people interact in society. Concerns about privacy, scalability, and interoperability are raised, highlighting the necessity of continued study to realize the full potential of the metaverse. For scholars looking to broaden the reach and significance of the metaverse in the digital age, this paper is a useful tool. © 2013 IEEE.},
keywords = {6g wireless system, 6G wireless systems, Augmented Reality, Block-chain, Blockchain, Blockchain technology, Digital Twin Technology, Edge learning, Extended reality (XR), Language Model, Large language model, large language models (LLMs), Metaverse, Metaverses, Semantic communication, Virtual environments, Wireless systems},
pubstate = {published},
tppubtype = {article}
}
Zhang, G.; Wang, Y.; Luo, C.; Xu, S.; Ming, Y.; Peng, J.; Zhang, M.
Visual Harmony: LLM’s Power in Crafting Coherent Indoor Scenes from Images Proceedings Article
In: Z., Lin; H., Zha; M.-M., Cheng; R., He; C.-L., Liu; K., Ubul; W., Silamu; J., Zhou (Ed.): Lect. Notes Comput. Sci., pp. 3–17, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-981978507-0 (ISBN).
Abstract | Links | BibTeX | Tags: Augmented Reality, Depth perception, Indoor scene generation, Input image, Language Model, Large language model, Metaverses, Point-clouds, Power, Scene completion, Scene Generation, Scene-graphs, Semantic Segmentation, Semantics, Virtual Reality, Visual languages
@inproceedings{zhang_visual_2025,
title = {Visual Harmony: LLM’s Power in Crafting Coherent Indoor Scenes from Images},
author = {G. Zhang and Y. Wang and C. Luo and S. Xu and Y. Ming and J. Peng and M. Zhang},
editor = {Lin Z. and Zha H. and Cheng M.-M. and He R. and Liu C.-L. and Ubul K. and Silamu W. and Zhou J.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209374797&doi=10.1007%2f978-981-97-8508-7_1&partnerID=40&md5=5231ab0bce95fb3f09db80392acd58ff},
doi = {10.1007/978-981-97-8508-7_1},
isbn = {03029743 (ISSN); 978-981978507-0 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15036 LNCS},
pages = {3–17},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Indoor scene generation has recently attracted significant attention as it is crucial for the metaverse, 3D animation, visual effects in movies, and virtual/augmented reality. Existing image-based indoor scene generation methods often produce scenes that are not realistic enough, with issues such as floating objects, incorrect object orientations, and incomplete scenes that only include the part of the scene captured by the input image. To address these challenges, we propose Visual Harmony, a method that leverages the powerful spatial imagination capabilities of a Large Language Model (LLM) to generate corresponding indoor scenes based on the input image. Specifically, we first extract information from the input image through depth estimation and panorama segmentation, reconstructing a semantic point cloud. Using this reconstructed semantic point cloud, we extract a scene graph that describes only the objects in the image. Then we leverage the strong spatial imagination capabilities of the LLM to complete the scene graph, forming a representation of a complete room scene. Based on this refined scene graph, we can generate an entire indoor scene that includes both the captured and uncaptured parts of the input image. Extensive experiments demonstrate that our method can generate realistic, plausible, and highly relevant complete indoor scenes related to the input image. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.},
keywords = {Augmented Reality, Depth perception, Indoor scene generation, Input image, Language Model, Large language model, Metaverses, Point-clouds, Power, Scene completion, Scene Generation, Scene-graphs, Semantic Segmentation, Semantics, Virtual Reality, Visual languages},
pubstate = {published},
tppubtype = {inproceedings}
}
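To illustrate the scene-graph completion step described above, the sketch below hands a partial graph extracted from one image to a stubbed LLM that imagines the unobserved objects and relations; the graph encoding and the complete_with_llm stub are assumptions, not the paper's implementation.

# Sketch: a partial scene graph (objects + relations seen in the photo) is
# completed with plausible unseen furniture, then used to place 3D assets.
import json

partial_graph = {
    "objects": ["bed", "nightstand", "lamp"],
    "relations": [["lamp", "on", "nightstand"], ["nightstand", "left_of", "bed"]],
}

def complete_with_llm(graph: dict) -> dict:
    """Stub for the LLM call that imagines the unobserved part of the room."""
    completed = json.loads(json.dumps(graph))  # deep copy
    completed["objects"] += ["wardrobe", "rug"]
    completed["relations"] += [["wardrobe", "opposite", "bed"], ["rug", "under", "bed"]]
    return completed

if __name__ == "__main__":
    full = complete_with_llm(partial_graph)
    # The completed graph would drive asset placement for the parts of the
    # room the input photo never showed.
    print(json.dumps(full, indent=2))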
Chen, J.; Wu, X.; Lan, T.; Li, B.
LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models Journal Article
In: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 5, pp. 2715–2724, 2025, ISSN: 10772626 (ISSN).
Abstract | Links | BibTeX | Tags: % reductions, 3D modeling, algorithm, Algorithms, Augmented Reality, Coding errors, Computer graphics, Computer interaction, computer interface, Computer simulation languages, Extended reality, generative artificial intelligence, human, Human users, human-computer interaction, Humans, Imaging, Immersive, Language, Language Model, Large language model, large language models, Metadata, Natural Language Processing, Natural language processing systems, Natural languages, procedures, Script generation, Spatio-temporal data, Three dimensional computer graphics, Three-Dimensional, three-dimensional imaging, User-Computer Interface, Virtual Reality
@article{chen_llmer_2025,
title = {LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models},
author = {J. Chen and X. Wu and T. Lan and B. Li},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003825793&doi=10.1109%2fTVCG.2025.3549549&partnerID=40&md5=da4681d0714548e3a7e0c8c3295d2348},
doi = {10.1109/TVCG.2025.3549549},
issn = {10772626 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {31},
number = {5},
pages = {2715–2724},
abstract = {The integration of Large Language Models (LLMs) like GPT-4 with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments that interact with human users through natural language, e.g., generating and animating 3D scenes from audio inputs. However, the complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. It leads to not only increased costs with pay-per-use models, but also elevated levels of generation errors. Moreover, existing approaches focusing on coding script generation are often prone to generation errors, resulting in flawed or invalid scripts, application crashes, and ultimately a degraded user experience. To overcome these challenges, we introduce LLMER, a novel framework that creates interactive XR worlds using JSON data generated by LLMs. Unlike prior approaches focusing on coding script generation, LLMER translates natural language inputs into JSON data, significantly reducing the likelihood of application crashes and processing latency. It employs a multi-stage strategy to supply only the essential contextual information adapted to the user's request and features multiple modules designed for various XR tasks. Our preliminary user study reveals the effectiveness of the proposed system, with over 80% reduction in consumed tokens and around 60% reduction in task completion time compared to state-of-the-art approaches. The analysis of users' feedback also illuminates a series of directions for further optimization. © 1995-2012 IEEE.},
keywords = {% reductions, 3D modeling, algorithm, Algorithms, Augmented Reality, Coding errors, Computer graphics, Computer interaction, computer interface, Computer simulation languages, Extended reality, generative artificial intelligence, human, Human users, human-computer interaction, Humans, Imaging, Immersive, Language, Language Model, Large language model, large language models, Metadata, Natural Language Processing, Natural language processing systems, Natural languages, procedures, Script generation, Spatio-temporal data, Three dimensional computer graphics, Three-Dimensional, three-dimensional imaging, User-Computer Interface, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}
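The JSON-as-intermediate-representation idea in LLMER can be sketched as follows: the LLM returns structured data rather than an executable script, and a fixed runtime validates and applies it, so malformed output degrades gracefully instead of crashing the application. The schema and the ask_llm stub below are assumptions for illustration, not the paper's API.

# Sketch: natural language -> JSON command list -> applied by a fixed runtime.
import json

def ask_llm(utterance: str) -> str:
    """Stub for the LLM call; returns a JSON command list."""
    return json.dumps([
        {"action": "spawn", "object": "campfire", "position": [0, 0, 2]},
        {"action": "animate", "object": "campfire", "clip": "flicker"},
    ])

def apply_commands(raw_json: str, scene: dict) -> dict:
    """Validate and apply commands; unknown actions are ignored, not executed."""
    for cmd in json.loads(raw_json):
        if cmd.get("action") == "spawn":
            scene[cmd["object"]] = {"position": cmd["position"]}
        elif cmd.get("action") == "animate" and cmd["object"] in scene:
            scene[cmd["object"]]["clip"] = cmd["clip"]
    return scene

if __name__ == "__main__":
    scene = apply_commands(ask_llm("add a campfire in front of me and make it flicker"), {})
    print(scene)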