AHCI RESEARCH GROUP
Publications
Papers published in international journals, conference and workshop proceedings, and books.
OUR RESEARCH
Scientific Publications
2025
Da Cruz, T. A.; Munoz, O.; Giligny, F.; Gouranton, V.
For a Perception of Monumentality in Eastern Arabia from the Neolithic to the Bronze Age: 3D Reconstruction and Multidimensional Simulations of Monuments and Landscapes Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 47–50, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).
Abstract | Links | BibTeX | Tags: 3D reconstruction, 4D simulations, Archaeological Site, Bronze age, Digital elevation model, Eastern Arabia, Eastern arabium, Monumentality, Multidimensional simulation, Simulation virtual realities, Spatial dimension, Temporal dimensions, Three dimensional computer graphics, Virtual Reality
@inproceedings{da_cruz_for_2025,
title = {For a Perception of Monumentality in Eastern Arabia from the Neolithic to the Bronze Age: 3D Reconstruction and Multidimensional Simulations of Monuments and Landscapes},
author = {T. A. Da Cruz and O. Munoz and F. Giligny and V. Gouranton},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005139996&doi=10.1109%2fVRW66409.2025.00018&partnerID=40&md5=14e05ff7019a4c9d712fe42aef776c8d},
doi = {10.1109/VRW66409.2025.00018},
isbn = {979-833151484-6 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {47–50},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The monumentality of Neolithic and Early Bronze Age (6th to 3rd millennium BC) structures in the Arabian Peninsula has never been approached through a comprehensive approach of simulations and reconstructions. As a result, its perception remains understudied. By combining archaeological and paleoenvironmental data, 3D reconstruction, 4D simulations, virtual reality and generative AI, this PhD research project proposes to analyse the perception of monuments, exploring their spatial, visual and temporal dimensions, in order to answer to the following question: how can we reconstruct and analyse the perception of monumentality in Eastern Arabia through 4D simulations, and how can the study of this perception influence our understanding of monumentality and territories? This article presents a work in progress, after three months of research of which one month on the Dhabtiyah archaeological site (Saudi Arabia, Eastern Province). © 2025 IEEE.},
keywords = {3D reconstruction, 4D simulations, Archaeological Site, Bronze age, Digital elevation model, Eastern Arabia, Eastern arabium, Monumentality, Multidimensional simulation, Simulation virtual realities, Spatial dimension, Temporal dimensions, Three dimensional computer graphics, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Li, C.; Da, F.
Refined dense face alignment through image matching Journal Article
In: Visual Computer, vol. 41, no. 1, pp. 157–171, 2025, ISSN: 01782789 (ISSN).
Abstract | Links | BibTeX | Tags: 3D Avatars, Alignment, Dense geometric supervision, Face alignment, Face deformations, Face reconstruction, Geometry, Human computer interaction, Image enhancement, Image matching, Image Reconstruction, Metaverses, Outlier mixup, Pixels, Rendered images, Rendering (computer graphics), State of the art, Statistics, Target images, Three dimensional computer graphics
@article{li_refined_2025,
title = {Refined dense face alignment through image matching},
author = {C. Li and F. Da},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85187924785&doi=10.1007%2fs00371-024-03316-3&partnerID=40&md5=839834c6ff3320398d5ef75b055947cb},
doi = {10.1007/s00371-024-03316-3},
issn = {01782789 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {Visual Computer},
volume = {41},
number = {1},
pages = {157–171},
abstract = {Face alignment is the foundation of building 3D avatars for virtual communication in the metaverse, human-computer interaction, AI-generated content, etc., and therefore, it is critical that face deformation is reflected precisely to better convey expression, pose and identity. However, misalignment exists in the currently best methods that fit a face model to a target image and can be easily captured by human perception, thus degrading the reconstruction quality. The main reason is that the widely used metrics for training, including the landmark re-projection loss, pixel-wise loss and perception-level loss, are insufficient to address the misalignment and suffer from ambiguity and local minima. To address misalignment, we propose an image MAtchinG-driveN dEnse geomeTrIC supervision (MAGNETIC). Specifically, we treat face alignment as a matching problem and establish pixel-wise correspondences between the target and rendered images. Then reconstructed facial points are guided towards their corresponding points on the target image, thus improving reconstruction. Synthesized image pairs are mixed up with face outliers to simulate the target and rendered images with ground-truth pixel-wise correspondences to enable the training of a robust prediction network. Compared with existing methods that turn to 3D scans for dense geometric supervision, our method reaches comparable shape reconstruction results with much lower effort. Experimental results on the NoW testset show that we reach the state-of-the-art among all self-supervised methods and even outperform methods using photo-realistic images. We also achieve comparable results with the state-of-the-art on the benchmark of Feng et al. Codes will be available at: github.com/ChunLLee/ReconstructionFromMatching. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.},
keywords = {3D Avatars, Alignment, Dense geometric supervision, Face alignment, Face deformations, Face reconstruction, Geometry, Human computer interaction, Image enhancement, Image matching, Image Reconstruction, Metaverses, Outlier mixup, Pixels, Rendered images, Rendering (computer graphics), State of the art, Statistics, Target images, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {article}
}
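The core MAGNETIC idea above, treating alignment as dense matching and pulling the projected facial points toward their matched pixels in the target image, can be illustrated with a short numpy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the dense flow field and the point set are synthetic stand-ins, and in a real fitting loop the matched positions would serve as fixed 2D targets for optimizing the face-model parameters that produce the projections.

import numpy as np

def correspondence_guidance_loss(proj_pts, flow):
    """Toy version of dense-matching supervision for face alignment.

    proj_pts : (N, 2) array of projected 3D facial points in the rendered
               image, given as (x, y) pixel coordinates.
    flow     : (H, W, 2) dense correspondence field mapping each rendered
               pixel to its matched pixel in the target image (dx, dy).

    Each projected point is pushed toward its matched location in the
    target image; the loss is the mean squared residual.
    """
    h, w, _ = flow.shape
    x = np.clip(np.round(proj_pts[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.round(proj_pts[:, 1]).astype(int), 0, h - 1)
    matched = proj_pts + flow[y, x]      # corresponding positions in the target
    residual = matched - proj_pts        # displacement a fitting loop would minimize
    return float(np.mean(np.sum(residual ** 2, axis=1))), matched

# Example with random data standing in for a real correspondence field.
rng = np.random.default_rng(0)
flow = rng.normal(scale=2.0, size=(224, 224, 2))
pts = rng.uniform(0, 223, size=(68, 2))
loss, targets = correspondence_guidance_loss(pts, flow)
print(f"guidance loss: {loss:.3f}")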
Dong, Y.
Enhancing Painting Exhibition Experiences with the Application of Augmented Reality-Based AI Video Generation Technology Proceedings Article
In: Zaphiris, P.; Ioannou, A.; Sottilare, R. A.; Schwarz, J.; Rauterberg, M. (Ed.): Lect. Notes Comput. Sci., pp. 256–262, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 03029743 (ISSN); ISBN: 978-303176814-9 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, AI-generated art, Art and Technology, Arts computing, Augmented Reality, Augmented reality technology, Digital Exhibition Design, Dynamic content, E-Learning, Education computing, Generation technologies, Interactive computer graphics, Knowledge Management, Multi dimensional, Planning designs, Three dimensional computer graphics, Video contents, Video generation
@inproceedings{dong_enhancing_2025,
title = {Enhancing Painting Exhibition Experiences with the Application of Augmented Reality-Based AI Video Generation Technology},
author = {Y. Dong},
editor = {Zaphiris P. and Ioannou A. and Ioannou A. and Sottilare R.A. and Schwarz J. and Rauterberg M.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85213302959&doi=10.1007%2f978-3-031-76815-6_18&partnerID=40&md5=35484f5ed199a831f1a30f265a0d32d5},
doi = {10.1007/978-3-031-76815-6_18},
isbn = {03029743 (ISSN); 978-303176814-9 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15378 LNCS},
pages = {256–262},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional painting exhibitions often rely on flat presentation methods, such as walls and stands, limiting their impact. Augmented Reality (AR) technology presents an opportunity to transform these experiences by turning static, flat artwork into dynamic, multi-dimensional presentations. However, creating and integrating video or dynamic content can be time-consuming and challenging, requiring meticulous planning, design, and production. In the context of urban renewal and community revitalization, particularly in China’s first-tier cities where real estate development has saturated the market, there is a growing trend to repurpose traditional commercial and office spaces with cultural and artistic exhibitions. These exhibitions not only enhance the spatial quality but also elevate the user experience, making the spaces more competitive. However, these non-traditional exhibition venues often lack the amenities of professional galleries, relying on walls, windows, and corners for displays, and requiring quick setup times. For visitors, who are often office workers or shoppers with limited time, the use of personal mobile devices for interaction is common. WeChat, China’s most widely used mobile application, provides a platform for convenient digital interactive experiences through mini-programs, which can support lightweight AR applications. AI video generation technologies, such as Conditional Generative Adversarial Networks (ControlNet) and Latent Consistency Models (LCM), have seen significant advancements. These technologies now allow for the creation of 3D models and video content from text and images. Tools like Meshy and Pika provide the ability to generate various video styles and offer precise control over video content. New AI video applications like Stable Video further expand the possibilities by rapidly converting static images into dynamic videos, facilitating easy adjustments and edits. This paper explores the application of AR-based AI video generation technology in enhancing the experience of painting exhibitions. By integrating these technologies, traditional paintings can be transformed into interactive, engaging displays that enrich the viewer’s experience. The study demonstrates the potential of these innovations to make art exhibitions more appealing and competitive in various public spaces, thereby improving both artistic expression and audience engagement. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D modeling, AI-generated art, Art and Technology, Arts computing, Augmented Reality, Augmented reality technology, Digital Exhibition Design, Dynamic content, E-Learning, Education computing, Generation technologies, Interactive computer graphics, Knowledge Management, Multi dimensional, Planning designs, Three dimensional computer graphics, Video contents, Video generation},
pubstate = {published},
tppubtype = {inproceedings}
}
Scofano, L.; Sampieri, A.; De Matteis, E.; Spinelli, I.; Galasso, F.
Social EgoMesh Estimation Proceedings Article
In: Proc. - IEEE Winter Conf. Appl. Comput. Vis., WACV, pp. 5948–5958, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151083-1 (ISBN).
Abstract | Links | BibTeX | Tags: Augmented reality applications, Ego-motion, Egocentric view, Generative AI, Human behaviors, Human mesh recovery, Limited visibility, Recent researches, Three dimensional computer graphics, Video sequences, Virtual and augmented reality
@inproceedings{scofano_social_2025,
title = {Social EgoMesh Estimation},
author = {L. Scofano and A. Sampieri and E. De Matteis and I. Spinelli and F. Galasso},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003632729&doi=10.1109%2fWACV61041.2025.00580&partnerID=40&md5=3c2b2d069ffb596c64ee8dbc211b74a8},
doi = {10.1109/WACV61041.2025.00580},
isbn = {979-833151083-1 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Winter Conf. Appl. Comput. Vis., WACV},
pages = {5948–5958},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Accurately estimating the 3D pose of the camera wearer in egocentric video sequences is crucial to modeling human behavior in virtual and augmented reality applications. The task presents unique challenges due to the limited visibility of the user's body caused by the front-facing camera mounted on their head. Recent research has explored the utilization of the scene and ego-motion, but it has overlooked humans' interactive nature. We propose a novel framework for Social Egocentric Estimation of body MEshes (SEE-ME). Our approach is the first to estimate the wearer's mesh using only a latent probabilistic diffusion model, which we condition on the scene and, for the first time, on the social wearer-interactee interactions. Our in-depth study sheds light on when social interaction matters most for ego-mesh estimation; it quantifies the impact of interpersonal distance and gaze direction. Overall, SEEME surpasses the current best technique, reducing the pose estimation error (MPJPE) by 53%. The code is available at SEEME. © 2025 IEEE.},
keywords = {Augmented reality applications, Ego-motion, Egocentric view, Generative AI, Human behaviors, Human mesh recovery, Limited visibility, Recent researches, Three dimensional computer graphics, Video sequences, Virtual and augmented reality},
pubstate = {published},
tppubtype = {inproceedings}
}
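For readers unfamiliar with conditional diffusion sampling, the loop below sketches the general mechanism SEE-ME builds on: a denoiser conditioned on scene and interactee features is applied along a reverse-diffusion schedule, starting from noise and ending at a pose/mesh parameter vector. The schedule, the pose dimensionality and the dummy denoiser are illustrative assumptions, not the paper's network or training setup.

import numpy as np

def sample_ego_mesh(denoiser, scene_feat, interactee_feat, dim=72, steps=50, rng=None):
    """Schematic conditional reverse-diffusion loop (DDPM-style).

    denoiser(x_t, t, cond) must return a prediction of the noise in x_t.
    scene_feat / interactee_feat are concatenated into the condition,
    mirroring the idea of conditioning on the scene and social context.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    cond = np.concatenate([scene_feat, interactee_feat])
    x = rng.standard_normal(dim)                      # start from pure noise
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(steps)):
        eps_hat = denoiser(x, t, cond)                # predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(dim)   # stochastic step
    return x                                          # e.g. an SMPL-style pose vector

# Dummy denoiser standing in for the trained, conditioned network.
dummy = lambda x, t, cond: 0.1 * x
pose = sample_ego_mesh(dummy, scene_feat=np.zeros(128), interactee_feat=np.ones(16))
print(pose.shape)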
Shen, Y.; Li, B.; Huang, J.; Wang, Z.
GaussianShopVR: Facilitating Immersive 3D Authoring Using Gaussian Splatting in VR Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 1292–1293, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).
Abstract | Links | BibTeX | Tags: 3D authoring, 3D modeling, Digital replicas, Gaussian distribution, Gaussian Splatting editing, Gaussians, Graphical user interfaces, High quality, Immersive, Immersive environment, Interactive computer graphics, Rendering (computer graphics), Rendering pipelines, Splatting, Three dimensional computer graphics, User profile, Virtual Reality, Virtual reality user interface, Virtualization, VR user interface
@inproceedings{shen_gaussianshopvr_2025,
title = {GaussianShopVR: Facilitating Immersive 3D Authoring Using Gaussian Splatting in VR},
author = {Y. Shen and B. Li and J. Huang and Z. Wang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005138672&doi=10.1109%2fVRW66409.2025.00292&partnerID=40&md5=9b644bd19394a289d3027ab9a2dfed6a},
doi = {10.1109/VRW66409.2025.00292},
isbn = {979-833151484-6 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {1292–1293},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Virtual reality (VR) applications require massive high-quality 3D assets to create immersive environments. Generating mesh-based 3D assets typically involves a significant amount of manpower and effort, which makes VR applications less accessible. 3D Gaussian Splatting (3DGS) has attracted much attention for its ability to quickly create digital replicas of real-life scenes and its compatibility with traditional rendering pipelines. However, it remains a challenge to edit 3DGS in a flexible and controllable manner. We propose GaussianShopVR, a system that leverages VR user interfaces to specify target areas to achieve flexible and controllable editing of reconstructed 3DGS. In addition, selected areas can provide 3D information to generative AI models to facilitate the editing. GaussianShopVR integrates object hierarchy management while keeping the backpropagated gradient flow to allow local editing with context information. © 2025 IEEE.},
keywords = {3D authoring, 3D modeling, Digital replicas, Gaussian distribution, Gaussian Splatting editing, Gaussians, Graphical user interfaces, High quality, Immersive, Immersive environment, Interactive computer graphics, Rendering (computer graphics), Rendering pipelines, Splatting, Three dimensional computer graphics, User profile, Virtual Reality, Virtual reality user interface, Virtualization, VR user interface},
pubstate = {published},
tppubtype = {inproceedings}
}
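A minimal sketch of the region-selected editing idea behind GaussianShopVR: Gaussians whose centres fall inside a user-specified box (e.g. drawn with VR controllers) are masked, and only those parameters receive optimizer updates, so edits stay local while the rest of the reconstructed scene is preserved. The function names, the axis-aligned box and the fake gradients are assumptions for illustration; the actual system operates on full 3DGS parameters inside a differentiable rendering loop.

import numpy as np

def select_gaussians(centers, box_min, box_max):
    """Boolean mask of Gaussians whose centers lie inside an axis-aligned
    region specified by the user."""
    return np.all((centers >= box_min) & (centers <= box_max), axis=1)

def masked_update(params, grads, mask, lr=0.01):
    """Apply a gradient step only to the selected Gaussians, so edits stay
    local while untouched Gaussians keep the original scene intact."""
    params = params.copy()
    params[mask] -= lr * grads[mask]
    return params

rng = np.random.default_rng(1)
centers = rng.uniform(-1, 1, size=(10_000, 3))      # Gaussian means
colors = rng.uniform(0, 1, size=(10_000, 3))        # per-Gaussian color parameters
mask = select_gaussians(centers, np.array([0.0, 0.0, 0.0]), np.array([0.5, 0.5, 0.5]))
fake_grads = rng.normal(size=colors.shape)           # stand-in for backpropagated gradients
colors = masked_update(colors, fake_grads, mask, lr=0.05)
print(f"{mask.sum()} of {len(mask)} Gaussians selected for editing")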
Sajiukumar, A.; Ranjan, A.; Parvathi, P. K.; Satheesh, A.; Udayan, J. Divya; Subramaniam, U.
Generative AI-Enabled Virtual Twin for Meeting Assistants Proceedings Article
In: Saba, T.; Rehman, A. (Ed.): Proc. - Int. Women Data Sci. Conf. at Prince Sultan Univ., WiDS-PSU, pp. 60–65, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152092-2 (ISBN).
Abstract | Links | BibTeX | Tags: 3D avatar generation, 3D Avatars, 3D reconstruction, AI-augmented interaction, Augmented Reality, Communication and collaborations, Conversational AI, Neural radiation field, neural radiation fields (NeRF), Radiation field, Real time performance, real-time performance, Three dimensional computer graphics, Virtual spaces, Voice cloning
@inproceedings{sajiukumar_generative_2025,
title = {Generative AI-Enabled Virtual Twin for Meeting Assistants},
author = {A. Sajiukumar and A. Ranjan and P. K. Parvathi and A. Satheesh and J. Divya Udayan and U. Subramaniam},
editor = {Saba T. and Rehman A.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007691247&doi=10.1109%2fWiDS-PSU64963.2025.00025&partnerID=40&md5=f0bfb74a8f854c427054c73582909185},
doi = {10.1109/WiDS-PSU64963.2025.00025},
isbn = {979-833152092-2 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - Int. Women Data Sci. Conf. at Prince Sultan Univ., WiDS-PSU},
pages = {60–65},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The growing dependence on virtual spaces for communication and collaboration has transformed interactions in numerous industries, ranging from professional meetings to education, entertainment, and healthcare. Despite the advancement of AI technologies such as three-dimensional modeling, voice cloning, and conversational AI, the convergence of these technologies in a single platform is still challenging. This paper introduces a unified framework that brings together state-of-the-art 3D avatar generation, real-time voice cloning, and conversational AI to enhance virtual interactions. The system utilizes Triplane neural representations and neural radiance fields (NeRF) for high-fidelity 3D avatar generation, speaker encoders coupled with Tacotron 2 and WaveRNN for natural voice cloning, and a context-aware chat algorithm for adaptive conversations. By overcoming the challenges of customization, integration, and real-time performance, the proposed framework addresses the increasing needs for realistic virtual representations, setting new benchmarks for AI-augmented interaction in virtual conferences, online representation, education, and healthcare. © 2025 IEEE.},
keywords = {3D avatar generation, 3D Avatars, 3D reconstruction, AI-augmented interaction, Augmented Reality, Communication and collaborations, Conversational AI, Neural radiation field, neural radiation fields (NeRF), Radiation field, Real time performance, real-time performance, Three dimensional computer graphics, Virtual spaces, Voice cloning},
pubstate = {published},
tppubtype = {inproceedings}
}
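How the three components described above might be wired together is sketched below. The class and method names (MeetingTwin, reply, synthesize, animate) are hypothetical stand-ins rather than the paper's API; a real implementation would plug in the Triplane/NeRF avatar generator, the Tacotron 2 + WaveRNN voice pipeline and the context-aware chat model in place of the dummies.

from dataclasses import dataclass

@dataclass
class MeetingTwin:
    avatar_model: object     # stand-in for a Triplane/NeRF-based avatar generator
    voice_model: object      # stand-in for speaker encoder + TTS vocoder
    chat_model: object       # stand-in for a context-aware conversational model

    def respond(self, transcript: str, context: list) -> dict:
        """Turn an incoming utterance into an animated, voiced reply."""
        reply_text = self.chat_model.reply(transcript, context)
        audio = self.voice_model.synthesize(reply_text)       # cloned voice
        frames = self.avatar_model.animate(audio)             # lip-synced avatar frames
        return {"text": reply_text, "audio": audio, "frames": frames}

class DummyChat:
    def reply(self, transcript, context):
        return f"Noted: {transcript}"

class DummyVoice:
    def synthesize(self, text):
        return b"\x00" * len(text)        # placeholder audio bytes

class DummyAvatar:
    def animate(self, audio):
        return [f"frame_{i}" for i in range(len(audio) // 4)]

twin = MeetingTwin(DummyAvatar(), DummyVoice(), DummyChat())
print(twin.respond("Can you summarise the action items?", [])["text"])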
Ly, D. -N.; Do, H. -N.; Tran, M. -T.; Le, K. -D.
Evaluation of AI-Based Assistant Representations on User Interaction in Virtual Explorations Proceedings Article
In: Buntine, W.; Fjeld, M.; Tran, T.; Tran, M.-T.; Huynh Thi Thanh, B.; Miyoshi, T. (Ed.): Commun. Comput. Info. Sci., pp. 323–337, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 18650929 (ISSN); ISBN: 978-981964287-8 (ISBN).
Abstract | Links | BibTeX | Tags: 360-degree Video, AI-Based Assistant, Cultural heritages, Cultural science, Multiusers, Single users, Social interactions, Three dimensional computer graphics, User interaction, Users' experiences, Virtual environments, Virtual Exploration, Virtual Reality, Virtualization
@inproceedings{ly_evaluation_2025,
title = {Evaluation of AI-Based Assistant Representations on User Interaction in Virtual Explorations},
author = {D. -N. Ly and H. -N. Do and M. -T. Tran and K. -D. Le},
editor = {Buntine W. and Fjeld M. and Tran T. and Tran M.-T. and Huynh Thi Thanh B. and Miyoshi T.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105004253350&doi=10.1007%2f978-981-96-4288-5_26&partnerID=40&md5=5f0a8c1e356cd3bdd4dda7f96f272154},
doi = {10.1007/978-981-96-4288-5_26},
isbn = {18650929 (ISSN); 978-981964287-8 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Commun. Comput. Info. Sci.},
volume = {2352 CCIS},
pages = {323–337},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Exploration activities, such as tourism, cultural heritage, and science, enhance knowledge and understanding. The rise of 360-degree videos allows users to explore cultural landmarks and destinations remotely. While multi-user VR environments encourage collaboration, single-user experiences often lack social interaction. Generative AI, particularly Large Language Models (LLMs), offer a way to improve single-user VR exploration through AI-driven virtual assistants, acting as tour guides or storytellers. However, it’s uncertain whether these assistants require a visual presence, and if so, what form it should take. To investigate this, we developed an AI-based assistant in three different forms: a voice-only avatar, a 3D human-sized avatar, and a mini-hologram avatar, and conducted a user study to evaluate their impact on user experience. The study, which involved 12 participants, found that the visual embodiments significantly reduce feelings of being alone, with distinct user preferences between the Human-sized avatar and the Mini hologram. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.},
keywords = {360-degree Video, AI-Based Assistant, Cultural heritages, Cultural science, Multiusers, Single users, Social interactions, Three dimensional computer graphics, User interaction, Users' experiences, Virtual environments, Virtual Exploration, Virtual Reality, Virtualization},
pubstate = {published},
tppubtype = {inproceedings}
}
Rasch, J.; Töws, J.; Hirzle, T.; Müller, F.; Schmitz, M.
CreepyCoCreator? Investigating AI Representation Modes for 3D Object Co-Creation in Virtual Reality Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071394-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3D Creation, 3D modeling, 3D object, Building process, Co-creation, Co-creative system, Co-creative systems, Creative systems, Creatives, Generative AI, Three dimensional computer graphics, User expectations, User Studies, User study, Virtual Reality, Virtualization
@inproceedings{rasch_creepycocreator_2025,
title = {CreepyCoCreator? Investigating AI Representation Modes for 3D Object Co-Creation in Virtual Reality},
author = {J. Rasch and J. Töws and T. Hirzle and F. Müller and M. Schmitz},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005742763&doi=10.1145%2f3706598.3713720&partnerID=40&md5=e6cdcb6cc7249a8836ecc39ae103cd53},
doi = {10.1145/3706598.3713720},
isbn = {979-840071394-1 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Generative AI in Virtual Reality offers the potential for collaborative object-building, yet challenges remain in aligning AI contributions with user expectations. In particular, users often struggle to understand and collaborate with AI when its actions are not transparently represented. This paper thus explores the co-creative object-building process through a Wizard-of-Oz study, focusing on how AI can effectively convey its intent to users during object customization in Virtual Reality. Inspired by human-to-human collaboration, we focus on three representation modes: the presence of an embodied avatar, whether the AI's contributions are visualized immediately or incrementally, and whether the areas modified are highlighted in advance. The findings provide insights into how these factors affect user perception and interaction with object-generating AI tools in Virtual Reality as well as satisfaction and ownership of the created objects. The results offer design implications for co-creative world-building systems, aiming to foster more effective and satisfying collaborations between humans and AI in Virtual Reality. © 2025 Copyright held by the owner/author(s).},
keywords = {3D Creation, 3D modeling, 3D object, Building process, Co-creation, Co-creative system, Co-creative systems, Creative systems, Creatives, Generative AI, Three dimensional computer graphics, User expectations, User Studies, User study, Virtual Reality, Virtualization},
pubstate = {published},
tppubtype = {inproceedings}
}
Cao, X.; Ju, K. P.; Li, C.; Jain, D.
SceneGenA11y: How can Runtime Generative tools improve the Accessibility of a Virtual 3D Scene? Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071395-8 (ISBN).
Abstract | Links | BibTeX | Tags: 3D application, 3D modeling, 3D scenes, Accessibility, BLV, DHH, Discrete event simulation, Generative AI, Generative tools, Interactive computer graphics, One dimensional, Runtimes, Three dimensional computer graphics, Video-games, Virtual 3d scene, virtual 3D scenes, Virtual environments, Virtual Reality
@inproceedings{cao_scenegena11y_2025,
title = {SceneGenA11y: How can Runtime Generative tools improve the Accessibility of a Virtual 3D Scene?},
author = {X. Cao and K. P. Ju and C. Li and D. Jain},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005772656&doi=10.1145%2f3706599.3720265&partnerID=40&md5=9b0bf29c3e89b70efa2d6a3e740829fb},
doi = {10.1145/3706599.3720265},
isbn = {979-840071395-8 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {With the popularity of virtual 3D applications, from video games to educational content and virtual reality scenarios, the accessibility of 3D scene information is vital to ensure inclusive and equitable experiences for all. Previous work includes information substitutions like audio description and captions, as well as personalized modifications, but they could only provide predefined accommodations. In this work, we propose SceneGenA11y, a system that responds to the user’s natural language prompts to improve accessibility of a 3D virtual scene in runtime. The system primes LLM agents with accessibility-related knowledge, allowing users to explore the scene and perform verifiable modifications to improve accessibility. We conducted a preliminary evaluation of our system with three blind and low-vision people and three deaf and hard-of-hearing people. The results show that our system is intuitive to use and can successfully improve accessibility. We discussed usage patterns of the system, potential improvements, and integration into apps. We ended with highlighting plans for future work. © 2025 Copyright held by the owner/author(s).},
keywords = {3D application, 3D modeling, 3D scenes, Accessibility, BLV, DHH, Discrete event simulation, Generative AI, Generative tools, Interactive computer graphics, One dimensional, Runtimes, Three dimensional computer graphics, Video-games, Virtual 3d scene, virtual 3D scenes, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
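The runtime flow described in the abstract, a natural-language prompt, a model-proposed edit, a verification step, then the scene modification, can be sketched as follows. The scene dictionary, the edit schema and fake_llm are illustrative assumptions; the actual system primes LLM agents with accessibility knowledge and works against a live 3D scene rather than a Python dict.

# Minimal sketch of prompt-driven, verifiable scene modification: a language
# model proposes a structured edit, the system checks it against the scene,
# then applies it.  All names below are illustrative.

SCENE = {
    "lamp_1": {"brightness": 0.4, "caption": "a desk lamp"},
    "door_1": {"sound_cue": None, "caption": "the exit door"},
}

def fake_llm(prompt: str) -> dict:
    # A real system would query an LLM agent primed with accessibility knowledge.
    return {"target": "lamp_1", "property": "brightness", "value": 1.0}

def apply_accessibility_edit(scene: dict, user_prompt: str) -> dict:
    edit = fake_llm(user_prompt)
    target, prop = edit["target"], edit["property"]
    if target not in scene or prop not in scene[target]:   # verifiable-modification check
        raise ValueError(f"edit rejected: {edit}")
    scene[target][prop] = edit["value"]
    return scene

print(apply_accessibility_edit(SCENE, "Make the lamp brighter so I can see the desk."))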
Tong, Y.; Qiu, Y.; Li, R.; Qiu, S.; Heng, P. -A.
MS2Mesh-XR: Multi-Modal Sketch-to-Mesh Generation in XR Environments Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 272–276, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8 (ISBN).
Abstract | Links | BibTeX | Tags: 3D meshes, 3D object, ControlNet, Hand-drawn sketches, Hands movement, High quality, Image-based, immersive visualization, Mesh generation, Multi-modal, Pipeline codes, Realistic images, Three dimensional computer graphics, Virtual environments, Virtual Reality
@inproceedings{tong_ms2mesh-xr_2025,
title = {MS2Mesh-XR: Multi-Modal Sketch-to-Mesh Generation in XR Environments},
author = {Y. Tong and Y. Qiu and R. Li and S. Qiu and P. -A. Heng},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000423684&doi=10.1109%2fAIxVR63409.2025.00052&partnerID=40&md5=caeace6850dcbdf8c1fa0441b98fa8d9},
doi = {10.1109/AIxVR63409.2025.00052},
isbn = {979-833152157-8 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {272–276},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We present MS2Mesh-XR, a novel multimodal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments using hand-drawn sketches assisted by voice inputs. In specific, users can intuitively sketch objects using natural hand movements in mid-air within a virtual environment. By integrating voice inputs, we devise ControlNet to infer realistic images based on the drawn sketches and interpreted text prompts. Users can then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh using the Convolutional Reconstruction Model. In particular, our proposed pipeline can generate a high-quality 3D mesh in less than 20 seconds, allowing for immersive visualization and manipulation in runtime XR scenes. We demonstrate the practicability of our pipeline through two use cases in XR settings. By leveraging natural user inputs and cutting-edge generative AI capabilities, our approach can significantly facilitate XR-based creative production and enhance user experiences. Our code and demo will be available at: https://yueqiu0911.github.io/MS2Mesh-XR/. © 2025 IEEE.},
keywords = {3D meshes, 3D object, ControlNet, Hand-drawn sketches, Hands movement, High quality, Image-based, immersive visualization, Mesh generation, Multi-modal, Pipeline codes, Realistic images, Three dimensional computer graphics, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
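The pipeline stages named in the MS2Mesh-XR abstract (voice interpretation, ControlNet-guided image generation from the hand-drawn sketch, then image-to-mesh reconstruction) are sketched below as placeholder functions to show how data would flow from sketch and speech to a mesh. Every stage is a stand-in for the real models, not the authors' code.

# Schematic of the sketch-to-mesh stages; each function is a stub standing in
# for a speech recognizer, a ControlNet image generator and an image-to-3D
# reconstructor respectively.

def transcribe_voice(audio_samples):
    return "a ceramic teapot"                     # interpreted text prompt

def controlnet_generate(sketch_image, prompt):
    # conditions image synthesis on the drawn sketch plus the prompt
    return {"prompt": prompt, "guided_by": "sketch", "pixels": sketch_image}

def reconstruct_mesh(image):
    # image-to-3D reconstruction returning vertices and faces
    return {"vertices": [(0, 0, 0), (1, 0, 0), (0, 1, 0)], "faces": [(0, 1, 2)]}

def sketch_to_mesh(sketch_image, audio_samples):
    prompt = transcribe_voice(audio_samples)
    candidate = controlnet_generate(sketch_image, prompt)
    return reconstruct_mesh(candidate)            # mesh placed into the XR scene

mesh = sketch_to_mesh(sketch_image=[[0] * 8] * 8, audio_samples=b"")
print(len(mesh["vertices"]), "vertices")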
Behravan, M.; Haghani, M.; Gračanin, D.
Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article
In: Chen, J. Y. C.; Fragomeni, G. (Ed.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 03029743 (ISSN); ISBN: 978-303193699-9 (ISBN).
Abstract | Links | BibTeX | Tags: 3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering
@inproceedings{behravan_transcending_2025,
title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},
author = {M. Behravan and M. Haghani and D. Gračanin},
editor = {Chen J.Y.C. and Fragomeni G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},
doi = {10.1007/978-3-031-93700-2_2},
isbn = {03029743 (ISSN); 978-303193699-9 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15788 LNCS},
pages = {13–32},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},
pubstate = {published},
tppubtype = {inproceedings}
}
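The object-isolation step mentioned above (segmenting the detected object and removing the background before handing the image to an image-to-3D model) reduces to masking and cropping, as in this small numpy sketch. The mask here is synthetic; a real pipeline would take it from an instance-segmentation model such as Mask R-CNN, which is not included.

import numpy as np

def isolate_object(image, mask):
    """Cut a detected object out of a photo before image-to-3D conversion.

    image : (H, W, 3) uint8 array
    mask  : (H, W) boolean array, e.g. from an instance-segmentation model.
    Returns the background-suppressed image cropped to the object's bounding box.
    """
    cut = np.where(mask[..., None], image, 0)          # zero out the background
    ys, xs = np.nonzero(mask)
    return cut[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# Toy example: a fake photo and a square "detection" near its centre.
img = np.random.default_rng(2).integers(0, 255, size=(64, 64, 3), dtype=np.uint8)
m = np.zeros((64, 64), dtype=bool)
m[20:40, 24:44] = True
crop = isolate_object(img, m)
print(crop.shape)        # the isolated region handed to the 2D-to-3D model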
Xiao, T.; Chen, Y.; Zhong, S.; Kiefer, P.; Krukar, J.; Kim, K. G.; Hurni, L.; Schwering, A.; Raubal, M.
Sketch2Terrain: AI-Driven Real-Time Terrain Sketch Mapping in Augmented Reality Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071394-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3D information, Augmented Reality, Drawing (graphics), Freehand sketching, Generative 3D sketch mapping, Generative AI, Mapping systems, Photomapping, Real-time terrains, Sketch maps, Spatial cognition, Spatial informations, terrain generation, Terrain generations, Three dimensional computer graphics
@inproceedings{xiao_sketch2terrain_2025,
title = {Sketch2Terrain: AI-Driven Real-Time Terrain Sketch Mapping in Augmented Reality},
author = {T. Xiao and Y. Chen and S. Zhong and P. Kiefer and J. Krukar and K. G. Kim and L. Hurni and A. Schwering and M. Raubal},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005747437&doi=10.1145%2f3706598.3713467&partnerID=40&md5=bc38e658cfe7ae83792e8837d496f2c7},
doi = {10.1145/3706598.3713467},
isbn = {979-840071394-1 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Sketch mapping is an effective technique to externalize and communicate spatial information. However, it has been limited to 2D mediums, making it difficult to represent 3D information, particularly for terrains with elevation changes. We present Sketch2Terrain, an intuitive generative-3D-sketch-mapping system combining freehand sketching with generative Artificial Intelligence that radically changes sketch map creation and representation using Augmented Reality. Sketch2Terrain empowers non-experts to create unambiguous sketch maps of natural environments and provides a homogeneous interface for researchers to collect data and conduct experiments. A between-subject study (N=36) revealed that generative-3D-sketch-mapping improved efficiency by 38.4%, terrain-topology accuracy by 12.5%, and landmark accuracy by up to 12.1%, with only a 4.7% trade-off in terrain-elevation accuracy compared to freehand 3D-sketch-mapping. Additionally, generative-3D-sketch-mapping reduced perceived strain by 60.5% and stress by 39.5% over 2D-sketch-mapping. These findings underscore potential applications of generative-3D-sketch-mapping for in-depth understanding and accurate representation of vertically complex environments. The implementation is publicly available. © 2025 Copyright held by the owner/author(s).},
keywords = {3D information, Augmented Reality, Drawing (graphics), Freehand sketching, Generative 3D sketch mapping, Generative AI, Mapping systems, Photomapping, Real-time terrains, Sketch maps, Spatial cognition, Spatial informations, terrain generation, Terrain generations, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {inproceedings}
}
Vachha, C.; Kang, Y.; Dive, Z.; Chidambaram, A.; Gupta, A.; Jun, E.; Hartmann, B.
Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071394-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, 3D scenes, AI assisted creativity tool, Animation, Computer vision, Direct manipulation, Drawing (graphics), Gaussian Splatting, Gaussians, Generative AI, Graphic, Graphics, High level languages, Immersive, Interactive computer graphics, Splatting, Three dimensional computer graphics, Virtual Reality, Worldbuilding interface
@inproceedings{vachha_dreamcrafter_2025,
title = {Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs},
author = {C. Vachha and Y. Kang and Z. Dive and A. Chidambaram and A. Gupta and E. Jun and B. Hartmann},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005725679&doi=10.1145%2f3706598.3714312&partnerID=40&md5=68cf2a08d3057fd9756e25d53959872b},
doi = {10.1145/3706598.3714312},
isbn = {979-840071394-1 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Authoring 3D scenes is a central task for spatial computing applications. Competing visions for lowering existing barriers are (1) focus on immersive, direct manipulation of 3D content or (2) leverage AI techniques that capture real scenes (3D Radiance Fields such as NeRFs, 3D Gaussian Splatting) and modify them at a higher level of abstraction, at the cost of high latency. We unify the complementary strengths of these approaches and investigate how to integrate generative AI advances into real-time, immersive 3D Radiance Field editing. We introduce Dreamcrafter, a VR-based 3D scene editing system that: (1) provides a modular architecture to integrate generative AI algorithms; (2) combines different levels of control for creating objects, including natural language and direct manipulation; and (3) introduces proxy representations that support interaction during high-latency operations. We contribute empirical findings on control preferences and discuss how generative AI interfaces beyond text input enhance creativity in scene editing and world building. © 2025 Copyright held by the owner/author(s).},
keywords = {3D modeling, 3D scenes, AI assisted creativity tool, Animation, Computer vision, Direct manipulation, Drawing (graphics), Gaussian Splatting, Gaussians, Generative AI, Graphic, Graphics, High level languages, Immersive, Interactive computer graphics, Splatting, Three dimensional computer graphics, Virtual Reality, Worldbuilding interface},
pubstate = {published},
tppubtype = {inproceedings}
}
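Point (3) above, proxy representations that keep the scene interactive during high-latency operations, is essentially asynchronous generation with an immediate placeholder that is swapped out when the slow result arrives. A minimal Python sketch of that pattern, with illustrative names and a sleep standing in for the generative model call:

import time
from concurrent.futures import ThreadPoolExecutor

def slow_generative_edit(prompt: str) -> str:
    time.sleep(1.0)                      # stands in for a high-latency model call
    return f"high-fidelity asset for '{prompt}'"

class Scene:
    def __init__(self):
        self.objects = {}

    def place_with_proxy(self, name, prompt, executor):
        self.objects[name] = f"proxy box for '{prompt}'"      # instant feedback in VR
        future = executor.submit(slow_generative_edit, prompt)
        future.add_done_callback(lambda f: self.objects.__setitem__(name, f.result()))

scene = Scene()
with ThreadPoolExecutor() as pool:
    scene.place_with_proxy("rock_1", "a mossy boulder", pool)
    print(scene.objects["rock_1"])       # proxy shown while generation runs
    time.sleep(1.2)
print(scene.objects["rock_1"])           # swapped for the generated asset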
Chen, J.; Wu, X.; Lan, T.; Li, B.
LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models Journal Article
In: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 5, pp. 2715–2724, 2025, ISSN: 10772626 (ISSN).
Abstract | Links | BibTeX | Tags: % reductions, 3D modeling, algorithm, Algorithms, Augmented Reality, Coding errors, Computer graphics, Computer interaction, computer interface, Computer simulation languages, Extended reality, generative artificial intelligence, human, Human users, human-computer interaction, Humans, Imaging, Immersive, Language, Language Model, Large language model, large language models, Metadata, Natural Language Processing, Natural language processing systems, Natural languages, procedures, Script generation, Spatio-temporal data, Three dimensional computer graphics, Three-Dimensional, three-dimensional imaging, User-Computer Interface, Virtual Reality
@article{chen_llmer_2025,
title = {LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models},
author = {J. Chen and X. Wu and T. Lan and B. Li},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003825793&doi=10.1109%2fTVCG.2025.3549549&partnerID=40&md5=da4681d0714548e3a7e0c8c3295d2348},
doi = {10.1109/TVCG.2025.3549549},
issn = {10772626 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Visualization and Computer Graphics},
volume = {31},
number = {5},
pages = {2715–2724},
abstract = {The integration of Large Language Models (LLMs) like GPT-4 with Extended Reality (XR) technologies offers the potential to build truly immersive XR environments that interact with human users through natural language, e.g., generating and animating 3D scenes from audio inputs. However, the complexity of XR environments makes it difficult to accurately extract relevant contextual data and scene/object parameters from an overwhelming volume of XR artifacts. It leads to not only increased costs with pay-per-use models, but also elevated levels of generation errors. Moreover, existing approaches focusing on coding script generation are often prone to generation errors, resulting in flawed or invalid scripts, application crashes, and ultimately a degraded user experience. To overcome these challenges, we introduce LLMER, a novel framework that creates interactive XR worlds using JSON data generated by LLMs. Unlike prior approaches focusing on coding script generation, LLMER translates natural language inputs into JSON data, significantly reducing the likelihood of application crashes and processing latency. It employs a multi-stage strategy to supply only the essential contextual information adapted to the user's request and features multiple modules designed for various XR tasks. Our preliminary user study reveals the effectiveness of the proposed system, with over 80% reduction in consumed tokens and around 60% reduction in task completion time compared to state-of-the-art approaches. The analysis of users' feedback also illuminates a series of directions for further optimization. © 1995-2012 IEEE.},
keywords = {% reductions, 3D modeling, algorithm, Algorithms, Augmented Reality, Coding errors, Computer graphics, Computer interaction, computer interface, Computer simulation languages, Extended reality, generative artificial intelligence, human, Human users, human-computer interaction, Humans, Imaging, Immersive, Language, Language Model, Large language model, large language models, Metadata, Natural Language Processing, Natural language processing systems, Natural languages, procedures, Script generation, Spatio-temporal data, Three dimensional computer graphics, Three-Dimensional, three-dimensional imaging, User-Computer Interface, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}
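The central design choice in LLMER, having the model emit constrained JSON that is validated and dispatched to XR operations instead of executable scripts, can be sketched as below. The action schema, fake_llm_reply and the handler are assumptions for illustration, not the LLMER specification; the point is that malformed output is rejected gracefully rather than crashing the application.

import json

ALLOWED_ACTIONS = {"spawn", "move", "animate"}

def fake_llm_reply(user_request: str) -> str:
    # Stand-in for the LLM call; a real system would send the request plus
    # the essential scene context selected by the multi-stage strategy.
    return json.dumps([{"action": "spawn", "object": "chair", "position": [0, 0, 1]}])

def validate(commands):
    for cmd in commands:
        if cmd.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {cmd}")
    return commands

def handle_request(user_request: str):
    try:
        commands = validate(json.loads(fake_llm_reply(user_request)))
    except (json.JSONDecodeError, ValueError) as err:
        return f"rejected malformed output: {err}"     # retry path, no crash
    return [f"{c['action']} {c['object']} at {c.get('position')}" for c in commands]

print(handle_request("Put a chair in front of me"))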
Azzarelli, A.; Anantrasirichai, N.; Bull, D. R.
Intelligent Cinematography: a review of AI research for cinematographic production Journal Article
In: Artificial Intelligence Review, vol. 58, no. 4, 2025, ISSN: 02692821 (ISSN).
Abstract | Links | BibTeX | Tags: Artificial intelligence research, Computer vision, Content acquisition, Creative industries, Holistic view, machine learning, Machine-learning, Mergers and acquisitions, Review papers, Three dimensional computer graphics, Video applications, Video processing, Video processing and applications, Virtual production, Virtual Reality, Vision research
@article{azzarelli_intelligent_2025,
title = {Intelligent Cinematography: a review of AI research for cinematographic production},
author = {A. Azzarelli and N. Anantrasirichai and D. R. Bull},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85217373428&doi=10.1007%2fs10462-024-11089-3&partnerID=40&md5=360923b5ba8f63b6edfa1b7fd135c926},
doi = {10.1007/s10462-024-11089-3},
issn = {02692821 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {Artificial Intelligence Review},
volume = {58},
number = {4},
abstract = {This paper offers the first comprehensive review of artificial intelligence (AI) research in the context of real camera content acquisition for entertainment purposes and is aimed at both researchers and cinematographers. Addressing the lack of review papers in the field of intelligent cinematography (IC) and the breadth of related computer vision research, we present a holistic view of the IC landscape while providing technical insight, important for experts across disciplines. We provide technical background on generative AI, object detection, automated camera calibration and 3-D content acquisition, with references to assist non-technical readers. The application sections categorize work in terms of four production types: General Production, Virtual Production, Live Production and Aerial Production. Within each application section, we (1) sub-classify work according to research topic and (2) describe the trends and challenges relevant to each type of production. In the final chapter, we address the greater scope of IC research and summarize the significant potential of this area to influence the creative industries sector. We suggest that work relating to virtual production has the greatest potential to impact other mediums of production, driven by the growing interest in LED volumes/stages for in-camera virtual effects (ICVFX) and automated 3-D capture for virtual modeling of real world scenes and actors. We also address ethical and legal concerns regarding the use of creative AI that impact on artists, actors, technologists and the general public. © The Author(s) 2025.},
keywords = {Artificial intelligence research, Computer vision, Content acquisition, Creative industries, Holistic view, machine learning, Machine-learning, Mergers and acquisitions, Review papers, Three dimensional computer graphics, Video applications, Video processing, Video processing and applications, Virtual production, Virtual Reality, Vision research},
pubstate = {published},
tppubtype = {article}
}
Chen, J.; Grubert, J.; Kristensson, P. O.
Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 206–216, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833153645-9 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, 3D reconstruction, 3D scene editing, 3D scenes, Computer simulation languages, Editing systems, Immersive environment, Interaction pattern, Interaction strategy, Language Model, Large language model, large language models, Multimodal Interaction, Scene editing, Three dimensional computer graphics, Virtual environments, Virtual Reality
@inproceedings{chen_analyzing_2025,
title = {Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes},
author = {J. Chen and J. Grubert and P. O. Kristensson},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002716635&doi=10.1109%2fVR59515.2025.00045&partnerID=40&md5=306aa7fbb3dad0aa9d43545f3c7eb9ea},
doi = {10.1109/VR59515.2025.00045},
isbn = {979-833153645-9 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},
pages = {206–216},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {As more applications of large language models (LLMs) for 3D content in immersive environments emerge, it is crucial to study user behavior to identify interaction patterns and potential barriers to guide the future design of immersive content creation and editing systems which involve LLMs. In an empirical user study with 12 participants, we combine quantitative usage data with post-experience questionnaire feedback to reveal common interaction patterns and key barriers in LLM-assisted 3D scene editing systems. We identify opportunities for improving natural language interfaces in 3D design tools and propose design recommendations. Through an empirical study, we demonstrate that LLM-assisted interactive systems can be used productively in immersive environments. © 2025 IEEE.},
keywords = {3D modeling, 3D reconstruction, 3D scene editing, 3D scenes, Computer simulation languages, Editing systems, Immersive environment, Interaction pattern, Interaction strategy, Language Model, Large language model, large language models, Multimodal Interaction, Scene editing, Three dimensional computer graphics, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Zhou, J.; Weber, R.; Wen, E.; Lottridge, D.
Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 1177–1187, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, Chatbots, Computer interaction, Deep learning, Deep-Learning Dance Model, Design of Human-Computer Interaction, Digital elevation model, Generative AI, Input output programs, Input sequence, Interactivity, Motion capture, Motion tracking, Movement analysis, Output sequences, Problem oriented languages, Real- time, Text mining, Three dimensional computer graphics, User input, Virtual environments, Virtual Reality
@inproceedings{zhou_real-time_2025,
title = {Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance},
author = {J. Zhou and R. Weber and E. Wen and D. Lottridge},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001922427&doi=10.1145%2f3708359.3712077&partnerID=40&md5=cea9213198220480b80b7a4840d26ccc},
doi = {10.1145/3708359.3712077},
isbn = {979-840071306-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {1177–1187},
publisher = {Association for Computing Machinery},
abstract = {Interactive AI chatbots put the power of Large-Language Models (LLMs) into people's hands; it is this interactivity that fueled explosive worldwide influence. In the generative dance space, however, there are few deep-learning-based generative dance models built with interactivity in mind. The release of the AIST++ dance dataset in 2021 led to an uptick of capabilities in generative dance models. Whether these models could be adapted to support interactivity and how well this approach will work is not known. In this study, we explore the capabilities of existing generative dance models for motion-to-motion synthesis on real-time, full-body motion-captured contemporary dance data. We identify an existing model that we adapted to support interactivity: the Bailando++ model, which is trained on the AIST++ dataset and was modified to take music and a motion sequence as input parameters in an interactive loop. We worked with two professional contemporary choreographers and dancers to record and curate a diverse set of 203 motion-captured dance sequences as a set of "user inputs"captured through the Optitrack high-precision motion capture 3D tracking system. We extracted 17 quantitative movement features from the motion data using the well-established Laban Movement Analysis theory, which allowed for quantitative comparisons of inter-movement correlations, which we used for clustering input data and comparing input and output sequences. A total of 10 pieces of music were used to generate a variety of outputs using the adapted Bailando++ model. We found that, on average, the generated output motion achieved only moderate correlations to the user input, with some exceptions of movement and music pairs achieving high correlation. The high-correlation generated output sequences were deemed responsive and relevant co-creations in relation to the input sequences. We discuss implications for interactive generative dance agents, where the use of 3D joint coordinate data should be used over SMPL parameters for ease of real-time generation, and how the use of Laban Movement Analysis could be used to extract useful features and fine-tune deep-learning models. © 2025 Copyright held by the owner/author(s).},
keywords = {3D modeling, Chatbots, Computer interaction, Deep learning, Deep-Learning Dance Model, Design of Human-Computer Interaction, Digital elevation model, Generative AI, Input output programs, Input sequence, Interactivity, Motion capture, Motion tracking, Movement analysis, Output sequences, Problem oriented languages, Real- time, Text mining, Three dimensional computer graphics, User input, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
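The quantitative comparison described above, extracting per-frame movement features from motion capture and correlating a dancer's input with the generated output, can be approximated with a toy example. The two features below (overall speed and body "expansion") are crude stand-ins for the 17 Laban Movement Analysis descriptors used in the study; the data is synthetic.

import numpy as np

def movement_features(joints):
    """Very reduced stand-in for Laban-style movement descriptors.

    joints : (T, J, 3) array of 3D joint positions over T frames.
    Returns per-frame features: total joint speed and mean distance of the
    joints from the body centroid ("expansion").
    """
    vel = np.linalg.norm(np.diff(joints, axis=0), axis=2).sum(axis=1)       # (T-1,)
    centroid = joints.mean(axis=1, keepdims=True)
    expansion = np.linalg.norm(joints - centroid, axis=2).mean(axis=1)[1:]  # (T-1,)
    return np.stack([vel, expansion], axis=1)

def responsiveness(input_seq, output_seq):
    """Correlate per-frame features of the dancer's input with the model output."""
    fi, fo = movement_features(input_seq), movement_features(output_seq)
    return [float(np.corrcoef(fi[:, k], fo[:, k])[0, 1]) for k in range(fi.shape[1])]

rng = np.random.default_rng(3)
dancer = rng.normal(size=(120, 24, 3)).cumsum(axis=0) * 0.01
model_out = dancer + rng.normal(scale=0.05, size=dancer.shape)
print(responsiveness(dancer, model_out))   # per-feature input/output correlation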
Leininger, P.; Weber, C. J.; Rothe, S.
Understanding Creative Potential and Use Cases of AI-Generated Environments for Virtual Film Productions: Insights from Industry Professionals Proceedings Article
In: IMX - Proc. ACM Int. Conf. Interact. Media Experiences, pp. 60–78, Association for Computing Machinery, Inc, 2025, ISBN: 979-840071391-0 (ISBN).
Abstract | Links | BibTeX | Tags: 3-D environments, 3D reconstruction, 3D Scene Reconstruction, 3d scenes reconstruction, AI-generated 3d environment, AI-Generated 3D Environments, Computer interaction, Creative Collaboration, Creatives, Digital content creation, Digital Content Creation., Filmmaking workflow, Filmmaking Workflows, Gaussian distribution, Gaussian Splatting, Gaussians, Generative AI, Graphical user interface, Graphical User Interface (GUI), Graphical user interfaces, Human computer interaction, human-computer interaction, Human-Computer Interaction (HCI), Immersive, Immersive Storytelling, Interactive computer graphics, Interactive computer systems, Interactive media, Mesh generation, Previsualization, Real-Time Rendering, Splatting, Three dimensional computer graphics, Virtual production, Virtual Production (VP), Virtual Reality, Work-flows
@inproceedings{leininger_understanding_2025,
title = {Understanding Creative Potential and Use Cases of AI-Generated Environments for Virtual Film Productions: Insights from Industry Professionals},
author = {P. Leininger and C. J. Weber and S. Rothe},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007976841&doi=10.1145%2f3706370.3727853&partnerID=40&md5=0d4cf7a2398d12d04e4f0ab182474a10},
doi = {10.1145/3706370.3727853},
isbn = {979-840071391-0 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {IMX - Proc. ACM Int. Conf. Interact. Media Experiences},
pages = {60–78},
publisher = {Association for Computing Machinery, Inc},
abstract = {Virtual production (VP) is transforming filmmaking by integrating real-time digital elements with live-action footage, offering new creative possibilities and streamlined workflows. While industry experts recognize AI's potential to revolutionize VP, its practical applications and value across different production phases and user groups remain underexplored. Building on initial research into generative and data-driven approaches, this paper presents the first systematic pilot study evaluating three types of AI-generated 3D environments - Depth Mesh, 360° Panoramic Meshes, and Gaussian Splatting - through the participation of 15 filmmaking professionals from diverse roles. Unlike commonly used 2D AI-generated visuals, our approach introduces navigable 3D environments that offer greater control and flexibility, aligning more closely with established VP workflows. Through expert interviews and literature research, we developed evaluation criteria to assess their usefulness beyond concept development, extending to previsualization, scene exploration, and interdisciplinary collaboration. Our findings indicate that different environments cater to distinct production needs, from early ideation to detailed visualization. Gaussian Splatting proved effective for high-fidelity previsualization, while 360° Panoramic Meshes excelled in rapid concept ideation. Despite their promise, challenges such as limited interactivity and customization highlight areas for improvement. Our prototype, EnVisualAIzer, built in Unreal Engine 5, provides an accessible platform for diverse filmmakers to engage with AI-generated environments, fostering a more inclusive production process. By lowering technical barriers, these environments have the potential to make advanced VP tools more widely available. This study offers valuable insights into the evolving role of AI in VP and sets the stage for future research and development. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.},
keywords = {3-D environments, 3D reconstruction, 3D Scene Reconstruction, 3d scenes reconstruction, AI-generated 3d environment, AI-Generated 3D Environments, Computer interaction, Creative Collaboration, Creatives, Digital content creation, Digital Content Creation., Filmmaking workflow, Filmmaking Workflows, Gaussian distribution, Gaussian Splatting, Gaussians, Generative AI, Graphical user interface, Graphical User Interface (GUI), Graphical user interfaces, Human computer interaction, human-computer interaction, Human-Computer Interaction (HCI), Immersive, Immersive Storytelling, Interactive computer graphics, Interactive computer systems, Interactive media, Mesh generation, Previsualization, Real-Time Rendering, Splatting, Three dimensional computer graphics, Virtual production, Virtual Production (VP), Virtual Reality, Work-flows},
pubstate = {published},
tppubtype = {inproceedings}
}
Ademola, A.; Sinclair, D.; Koniaris, B.; Hannah, S.; Mitchell, K.
NeFT-Net: N-window extended frequency transformer for rhythmic motion prediction Journal Article
In: Computers and Graphics, vol. 129, 2025, ISSN: 00978493 (ISSN).
Abstract | Links | BibTeX | Tags: Cosine transforms, Discrete cosine transforms, Human motions, Immersive, machine learning, Machine-learning, Motion analysis, Motion prediction, Motion processing, Motion sequences, Motion tracking, Real-world, Rendering, Rendering (computer graphics), Rhythmic motion, Three dimensional computer graphics, Virtual environments, Virtual Reality
@article{ademola_neft-net_2025,
title = {NeFT-Net: N-window extended frequency transformer for rhythmic motion prediction},
author = {A. Ademola and D. Sinclair and B. Koniaris and S. Hannah and K. Mitchell},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105006724723&doi=10.1016%2fj.cag.2025.104244&partnerID=40&md5=08fd0792837332404ec9acdd16f608bf},
doi = {10.1016/j.cag.2025.104244},
issn = {00978493 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {Computers and Graphics},
volume = {129},
abstract = {Advancements in prediction of human motion sequences are critical for enabling online virtual reality (VR) users to dance and move in ways that accurately mirror real-world actions, delivering a more immersive and connected experience. However, latency in networked motion tracking remains a significant challenge, disrupting engagement and necessitating predictive solutions to achieve real-time synchronization of remote motions. To address this issue, we propose a novel approach leveraging a synthetically generated dataset based on supervised foot anchor placement timings for rhythmic motions, ensuring periodicity and reducing prediction errors. Our model integrates a discrete cosine transform (DCT) to encode motion, refine high-frequency components, and smooth motion sequences, mitigating jittery artifacts. Additionally, we introduce a feed-forward attention mechanism designed to learn from N-window pairs of 3D key-point pose histories for precise future motion prediction. Quantitative and qualitative evaluations on the Human3.6M dataset highlight significant improvements in mean per joint position error (MPJPE) metrics, demonstrating the superiority of our technique over state-of-the-art approaches. We further introduce novel result pose visualizations through the use of generative AI methods. © 2025 The Authors},
keywords = {Cosine transforms, Discrete cosine transforms, Human motions, Immersive, machine learning, Machine-learning, Motion analysis, Motion prediction, Motion processing, Motion sequences, Motion tracking, Real-world, Rendering, Rendering (computer graphics), Rhythmic motion, Three dimensional computer graphics, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}
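The abstract above names two standard building blocks: a temporal discrete cosine transform for encoding and smoothing motion sequences, and the MPJPE metric used for evaluation on Human3.6M. The minimal NumPy/SciPy sketch below illustrates both in isolation; it is not code from the paper, and the joint count, sequence length, and coefficient cutoff are arbitrary placeholders.
# Illustrative sketch only (not the authors' code): temporal DCT encoding/smoothing of a
# motion sequence and the mean per joint position error (MPJPE) metric.
import numpy as np
from scipy.fft import dct, idct

def dct_encode(motion, keep=20):
    """Encode a motion sequence (frames x joints x 3) with a temporal DCT,
    zeroing all but the first `keep` low-frequency coefficients (smoothing)."""
    T = motion.shape[0]
    flat = motion.reshape(T, -1)                      # frames x (joints*3)
    coeffs = dct(flat, type=2, norm='ortho', axis=0)  # DCT along the time axis
    coeffs[keep:] = 0.0                               # discard high-frequency (jittery) components
    return coeffs

def dct_decode(coeffs):
    """Invert the temporal DCT back to a (frames x channels) sequence."""
    return idct(coeffs, type=2, norm='ortho', axis=0)

def mpjpe(pred, gt):
    """Mean per joint position error: average Euclidean distance between
    predicted and ground-truth 3D joint positions."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Random data standing in for 3D key-point pose histories (50 frames, 17 joints).
gt = np.random.rand(50, 17, 3)
smoothed = dct_decode(dct_encode(gt, keep=10)).reshape(50, 17, 3)
print(mpjpe(smoothed, gt))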
Zhang, H.; Chen, P.; Xie, X.; Jiang, Z.; Wu, Y.; Li, Z.; Chen, X.; Sun, L.
FusionProtor: A Mixed-Prototype Tool for Component-level Physical-to-Virtual 3D Transition and Simulation Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071394-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, 3D prototype, 3D simulations, 3d transition, Component levels, Conceptual design, Creatives, Generative AI, High-fidelity, Integrated circuit layout, Mixed reality, Product conceptual designs, Prototype tools, Prototype workflow, Three dimensional computer graphics, Usability engineering, Virtual Prototyping
@inproceedings{zhang_fusionprotor_2025,
title = {FusionProtor: A Mixed-Prototype Tool for Component-level Physical-to-Virtual 3D Transition and Simulation},
author = {H. Zhang and P. Chen and X. Xie and Z. Jiang and Y. Wu and Z. Li and X. Chen and L. Sun},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005745450&doi=10.1145%2f3706598.3713686&partnerID=40&md5=e51eac0cc99293538422d98a4070cd09},
doi = {10.1145/3706598.3713686},
isbn = {979-840071394-1 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Developing and simulating 3D prototypes is crucial in product conceptual design for ideation and presentation. Traditional methods often keep physical and virtual prototypes separate, leading to a disjointed prototype workflow. In addition, acquiring high-fidelity prototypes is time-consuming and resource-intensive, distracting designers from creative exploration. Recent advancements in generative artificial intelligence (GAI) and extended reality (XR) have provided new solutions for rapid prototype transition and mixed simulation. We conducted a formative study to understand current challenges in the traditional prototyping process and to explore how to effectively utilize GAI and XR capabilities in prototyping. We then introduced FusionProtor, a mixed-prototype tool for component-level 3D prototype transition and simulation. We proposed a step-by-step generation pipeline in FusionProtor that effectively transitions 3D prototypes from physical to virtual and from low to high fidelity for rapid ideation and iteration. We also developed a component-level 3D creation method and applied it in an XR environment for mixed-prototype presentation and interaction. We conducted technical and user experiments to verify FusionProtor's usability in supporting diverse designs. Our results verified that it achieves a seamless workflow between physical and virtual domains, enhancing efficiency and promoting ideation. We also explored the effect of mixed interaction on design and critically discussed best practices for the HCI community. © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.},
keywords = {3D modeling, 3D prototype, 3D simulations, 3d transition, Component levels, Conceptual design, Creatives, Generative AI, High-fidelity, Integrated circuit layout, Mixed reality, Product conceptual designs, Prototype tools, Prototype workflow, Three dimensional computer graphics, Usability engineering, Virtual Prototyping},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Harinee, S.; Raja, R. Vimal; Mugila, E.; Govindharaj, I.; Sanjaykumar, V.; Ragavendhiran, T.
Elevating Medical Training: A Synergistic Fusion of AI and VR for Immersive Anatomy Learning and Practical Procedure Mastery Proceedings Article
In: Int. Conf. Syst., Comput., Autom. Netw., ICSCAN, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833151002-2 (ISBN).
Abstract | Links | BibTeX | Tags: 'current, Anatomy education, Anatomy educations, Computer interaction, Curricula, Embodied virtual assistant, Embodied virtual assistants, Generative AI, Human- Computer Interaction, Immersive, Intelligent virtual agents, Medical computing, Medical education, Medical procedure practice, Medical procedures, Medical training, Personnel training, Students, Teaching, Three dimensional computer graphics, Usability engineering, Virtual assistants, Virtual environments, Virtual Reality, Visualization
@inproceedings{harinee_elevating_2024,
title = {Elevating Medical Training: A Synergistic Fusion of AI and VR for Immersive Anatomy Learning and Practical Procedure Mastery},
author = {S. Harinee and R. Vimal Raja and E. Mugila and I. Govindharaj and V. Sanjaykumar and T. Ragavendhiran},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000334626&doi=10.1109%2fICSCAN62807.2024.10894451&partnerID=40&md5=100899b489c00335e0a652f2efd33e23},
doi = {10.1109/ICSCAN62807.2024.10894451},
isbn = {979-833151002-2 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Int. Conf. Syst., Comput., Autom. Netw., ICSCAN},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Virtual reality, with its 3D visualization, has brought an overwhelming change to medical education, especially for courses such as human anatomy. The proposed virtual reality system aims to bring massive improvements to the education received by medical students during their degree courses. The project puts forward a text-to-speech and speech-to-text aligned system that simplifies the use of a chatbot powered by OpenAI GPT-4 and allows students to speak vocally with Avatar, the designated virtual assistant. In contrast to current methodologies, the virtual reality setup is driven by avatars and thus provides an enhanced virtual assistant environment. The avatar offers students repeated practice of medical procedures on it, which is the real uniqueness of the proposed product. The developed virtual reality environment improves on current training techniques by letting students interact with and immerse themselves in three-dimensional human organs, visualizing them in three dimensions and thereby gaining deeper knowledge of the subject. A virtual assistant guides the whole process, giving insights and support to help the student bridge the gap from theory to practice. The system essentially follows a knowledge-based and analysis-based approach. The combination of generative AI with embodied virtual agents has great potential for customized virtual conversational assistants across a much wider range of applications. The study brings out the value of acquiring hands-on skills through simulated medical procedures and opens new frontiers of research and development in AI, VR, and medical education. In addition to assessing the effectiveness of these novel functionalities, the study also explores user-experience dimensions such as usability, task load, and the sense of presence in the proposed virtual medical environment. © 2024 IEEE.},
keywords = {'current, Anatomy education, Anatomy educations, Computer interaction, Curricula, Embodied virtual assistant, Embodied virtual assistants, Generative AI, Human- Computer Interaction, Immersive, Intelligent virtual agents, Medical computing, Medical education, Medical procedure practice, Medical procedures, Medical training, Personnel training, Students, Teaching, Three dimensional computer graphics, Usability engineering, Virtual assistants, Virtual environments, Virtual Reality, Visualization},
pubstate = {published},
tppubtype = {inproceedings}
}
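As a rough illustration of the speech-aligned GPT-4 chatbot described in the abstract above, the sketch below wraps an OpenAI chat call in a simple turn loop; the system prompt, function names, and stubbed speech output are assumptions for illustration, not the authors' implementation.
# Minimal sketch of the Avatar assistant's turn loop, assuming the OpenAI Python
# SDK (>= 1.0). In the real system the question would come from speech-to-text and
# the answer would be synthesized as speech; both ends are stubbed here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = ("You are Avatar, a virtual anatomy instructor. Guide the medical "
                 "student through procedures step by step and keep answers concise.")

def ask_virtual_assistant(question: str, history: list[dict]) -> str:
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

def text_to_speech(text: str) -> None:
    # Placeholder: a real deployment would synthesize and play audio in the headset.
    print(f"[Avatar says] {text}")

history: list[dict] = []
text_to_speech(ask_virtual_assistant("Walk me through heart auscultation.", history))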
Gaudi, T.; Kapralos, B.; Quevedo, A.
Structural and Functional Fidelity of Virtual Humans in Immersive Virtual Learning Environments Proceedings Article
In: IEEE Gaming, Entertain., Media Conf., GEM, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037453-7 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, Computer aided instruction, Digital representations, E-Learning, Engagement, fidelity, Immersive, Immersive virtual learning environment, Serious game, Serious games, Three dimensional computer graphics, Virtual character, virtual human, Virtual humans, Virtual instructors, Virtual learning environments, Virtual Reality, virtual simulation, Virtual simulations
@inproceedings{gaudi_structural_2024,
title = {Structural and Functional Fidelity of Virtual Humans in Immersive Virtual Learning Environments},
author = {T. Gaudi and B. Kapralos and A. Quevedo},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199517136&doi=10.1109%2fGEM61861.2024.10585535&partnerID=40&md5=bf271019e077b5e464bcd62b1b28312b},
doi = {10.1109/GEM61861.2024.10585535},
isbn = {979-835037453-7 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Gaming, Entertain., Media Conf., GEM},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Central to many immersive virtual learning environments (iVLEs) are virtual humans: digitally represented characters that can serve as virtual instructors to facilitate learning. Current technology allows the production of photo-realistic (high-fidelity, highly realistic) avatars, whether through traditional approaches relying on 3D modeling or through modern tools leveraging generative AI and virtual character creation. However, fidelity (i.e., level of realism) is complex, as it can be analyzed from various points of view, including structure, function, interactivity, and behavior, among others. Given its relevance, fidelity can influence various aspects of iVLEs, including engagement and, ultimately, learning outcomes. In this work-in-progress paper, we propose a study that will examine the effect of the structural and functional fidelity of a virtual human assistant on engagement within a virtual simulation designed to teach the cognitive aspects (e.g., the steps of the procedure) of heart auscultation. © 2024 IEEE.},
keywords = {3D modeling, Computer aided instruction, Digital representations, E-Learning, Engagement, fidelity, Immersive, Immersive virtual learning environment, Serious game, Serious games, Three dimensional computer graphics, Virtual character, virtual human, Virtual humans, Virtual instructors, Virtual learning environments, Virtual Reality, virtual simulation, Virtual simulations},
pubstate = {published},
tppubtype = {inproceedings}
}
Liu, Z.; Zhu, Z.; Zhu, L.; Jiang, E.; Hu, X.; Peppler, K.; Ramani, K.
ClassMeta: Designing Interactive Virtual Classmate to Promote VR Classroom Participation Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2024, ISBN: 979-840070330-0 (ISBN).
Abstract | Links | BibTeX | Tags: 3D Avatars, Behavioral Research, Classroom learning, Collaborative learning, Computational Linguistics, Condition, E-Learning, Human behaviors, Language Model, Large language model, Learning experiences, Learning systems, pedagogical agent, Pedagogical agents, Students, Three dimensional computer graphics, Virtual Reality, VR classroom
@inproceedings{liu_classmeta_2024,
title = {ClassMeta: Designing Interactive Virtual Classmate to Promote VR Classroom Participation},
author = {Z. Liu and Z. Zhu and L. Zhu and E. Jiang and X. Hu and K. Peppler and K. Ramani},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85194868458&doi=10.1145%2f3613904.3642947&partnerID=40&md5=0592b2f977a2ad2e6366c6fa05808a6a},
doi = {10.1145/3613904.3642947},
isbn = {979-840070330-0 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Peer influence plays a crucial role in promoting classroom participation, where behaviors from active students can contribute to a collective classroom learning experience. However, the presence of these active students depends on several conditions and is not consistently available across all circumstances. Recently, Large Language Models (LLMs) such as GPT have demonstrated the ability to simulate diverse human behaviors convincingly due to their capacity to generate contextually coherent responses based on their role settings. Inspired by this advancement in technology, we designed ClassMeta, a GPT-4 powered agent to help promote classroom participation by playing the role of an active student. These agents, which are embodied as 3D avatars in virtual reality, interact with actual instructors and students with both spoken language and body gestures. We conducted a comparative study to investigate the potential of ClassMeta for improving the overall learning experience of the class. © 2024 Copyright held by the owner/author(s)},
keywords = {3D Avatars, Behavioral Research, Classroom learning, Collaborative learning, Computational Linguistics, Condition, E-Learning, Human behaviors, Language Model, Large language model, Learning experiences, Learning systems, pedagogical agent, Pedagogical agents, Students, Three dimensional computer graphics, Virtual Reality, VR classroom},
pubstate = {published},
tppubtype = {inproceedings}
}
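The role-setting idea in the abstract above, an LLM configured to behave as an active classmate, can be sketched in a few lines; the system prompt, model choice, and single-turn structure below are illustrative assumptions, not the authors' configuration.
# Illustrative sketch (not the ClassMeta implementation): a GPT-4 agent role-configured
# as an "active student" that responds to classroom prompts, assuming the OpenAI SDK.
from openai import OpenAI

client = OpenAI()

CLASSMATE_ROLE = (
    "You are an active student in a VR classroom. Answer the instructor's questions "
    "enthusiastically, keep answers short, and end with a follow-up question that "
    "invites your classmates to participate."
)

def classmate_turn(instructor_prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": CLASSMATE_ROLE},
            {"role": "user", "content": instructor_prompt},
        ],
    )
    return response.choices[0].message.content

print(classmate_turn("Who can summarize what peer influence means?"))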
Guo, Y.; Hou, K.; Yan, Z.; Chen, H.; Xing, G.; Jiang, X.
Sensor2Scene: Foundation Model-Driven Interactive Realities Proceedings Article
In: Proc. - IEEE Int. Workshop Found. Model. Cyber-Phys. Syst. Internet Things, FMSys, pp. 13–19, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835036345-6 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, Augmented Reality, Computational Linguistics, Data integration, Data visualization, Foundation models, Generative model, Language Model, Large language model, large language models, Model-driven, Sensor Data Integration, Sensors data, Text-to-3d generative model, Text-to-3D Generative Models, Three dimensional computer graphics, User interaction, User Interaction in AR, User interaction in augmented reality, User interfaces, Virtual Reality, Visualization
@inproceedings{guo_sensor2scene_2024,
title = {Sensor2Scene: Foundation Model-Driven Interactive Realities},
author = {Y. Guo and K. Hou and Z. Yan and H. Chen and G. Xing and X. Jiang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199893762&doi=10.1109%2fFMSys62467.2024.00007&partnerID=40&md5=c3bf1739e8c1dc6227d61609ddc66910},
doi = {10.1109/FMSys62467.2024.00007},
isbn = {979-835036345-6 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Workshop Found. Model. Cyber-Phys. Syst. Internet Things, FMSys},
pages = {13–19},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Augmented Reality (AR) is acclaimed for its potential to bridge the physical and virtual worlds. Yet, current integration between these realms often lacks a deep understanding of the physical environment and the subsequent scene generation that reflects this understanding. This research introduces Sensor2Scene, a novel system framework designed to enhance user interactions with sensor data through AR. At its core, an AI agent leverages large language models (LLMs) to decode subtle information from sensor data, constructing detailed scene descriptions for visualization. To enable these scenes to be rendered in AR, we decompose the scene creation process into tasks of text-to-3D model generation and spatial composition, allowing new AR scenes to be sketched from the descriptions. We evaluated our framework using an LLM evaluator based on five metrics on various datasets to examine the correlation between sensor readings and corresponding visualizations, and demonstrated the system's effectiveness with scenes generated end-to-end. The results highlight the potential of LLMs to understand IoT sensor data. Furthermore, generative models can aid in transforming these interpretations into visual formats, thereby enhancing user interaction. This work not only displays the capabilities of Sensor2Scene but also lays a foundation for advancing AR with the goal of creating more immersive and contextually rich experiences. © 2024 IEEE.},
keywords = {3D modeling, Augmented Reality, Computational Linguistics, Data integration, Data visualization, Foundation models, Generative model, Language Model, Large language model, large language models, Model-driven, Sensor Data Integration, Sensors data, Text-to-3d generative model, Text-to-3D Generative Models, Three dimensional computer graphics, User interaction, User Interaction in AR, User interaction in augmented reality, User interfaces, Virtual Reality, Visualization},
pubstate = {published},
tppubtype = {inproceedings}
}
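The abstract above describes a two-stage decomposition: an LLM turns sensor readings into a scene description, which is then broken into objects for text-to-3D generation and spatial composition. A minimal sketch of that flow follows; the prompts, JSON schema, and model choice are assumptions for illustration, not the paper's implementation.
# Minimal sketch of the sensor-readings -> scene-description -> object-list flow,
# assuming the OpenAI Python SDK; each listed object would feed a text-to-3D generator.
import json
from openai import OpenAI

client = OpenAI()

def describe_scene(sensor_readings: dict) -> str:
    prompt = ("Given these IoT sensor readings, describe the physical scene they "
              f"imply in two sentences: {json.dumps(sensor_readings)}")
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def decompose_for_3d(description: str) -> list[dict]:
    prompt = ("List the objects in this scene as a JSON array of "
              '{"object": ..., "position": [x, y, z]} entries, with no other text: '
              + description)
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    # A real system would validate/repair the JSON before trusting it.
    return json.loads(resp.choices[0].message.content)

readings = {"temperature_c": 29.5, "humidity_pct": 70, "co2_ppm": 900, "motion": True}
for obj in decompose_for_3d(describe_scene(readings)):
    print(obj["object"], obj["position"])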
Chheang, V.; Sharmin, S.; Marquez-Hernandez, R.; Patel, M.; Rajasekaran, D.; Caulfield, G.; Kiafar, B.; Li, J.; Kullu, P.; Barmaki, R. L.
Towards Anatomy Education with Generative AI-based Virtual Assistants in Immersive Virtual Reality Environments Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 21–30, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037202-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3-D visualization systems, Anatomy education, Anatomy educations, Cognitive complexity, E-Learning, Embodied virtual assistant, Embodied virtual assistants, Generative AI, generative artificial intelligence, Human computer interaction, human-computer interaction, Immersive virtual reality, Interactive 3d visualizations, Knowledge Management, Medical education, Three dimensional computer graphics, Verbal communications, Virtual assistants, Virtual Reality, Virtual-reality environment
@inproceedings{chheang_towards_2024,
title = {Towards Anatomy Education with Generative AI-based Virtual Assistants in Immersive Virtual Reality Environments},
author = {V. Chheang and S. Sharmin and R. Marquez-Hernandez and M. Patel and D. Rajasekaran and G. Caulfield and B. Kiafar and J. Li and P. Kullu and R. L. Barmaki},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85187216893&doi=10.1109%2fAIxVR59861.2024.00011&partnerID=40&md5=33e8744309add5fe400f4f341326505f},
doi = {10.1109/AIxVR59861.2024.00011},
isbn = {979-835037202-1 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {21–30},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Virtual reality (VR) and interactive 3D visualization systems have enhanced educational experiences and environments, particularly in complicated subjects such as anatomy education. VR-based systems surpass the potential limitations of traditional training approaches in facilitating interactive engagement among students. However, research on embodied virtual assistants that leverage generative artificial intelligence (AI) and verbal communication in the anatomy education context is underrepresented. In this work, we introduce a VR environment with a generative AI-embodied virtual assistant to support participants in responding to varying cognitive complexity anatomy questions and enable verbal communication. We assessed the technical efficacy and usability of the proposed environment in a pilot user study with 16 participants. We conducted a within-subject design for virtual assistant configuration (avatar- and screen-based), with two levels of cognitive complexity (knowledge- and analysis-based). The results reveal a significant difference in the scores obtained from knowledge- and analysis-based questions in relation to avatar configuration. Moreover, results provide insights into usability, cognitive task load, and the sense of presence in the proposed virtual assistant configurations. Our environment and results of the pilot study offer potential benefits and future research directions beyond medical education, using generative AI and embodied virtual agents as customized virtual conversational assistants. © 2024 IEEE.},
keywords = {3-D visualization systems, Anatomy education, Anatomy educations, Cognitive complexity, E-Learning, Embodied virtual assistant, Embodied virtual assistants, Generative AI, generative artificial intelligence, Human computer interaction, human-computer interaction, Immersive virtual reality, Interactive 3d visualizations, Knowledge Management, Medical education, Three dimensional computer graphics, Verbal communications, Virtual assistants, Virtual Reality, Virtual-reality environment},
pubstate = {published},
tppubtype = {inproceedings}
}