AHCI RESEARCH GROUP

Publications

Papers published in international journals,
proceedings of conferences, workshops and books.

OUR RESEARCH

Scientific Publications

How to

Here you can find the complete list of our publications.
You can use the tag cloud to select only the papers dealing with specific research topics.
You can expand the Abstract, Links and BibTex record for each paper.

Show all

86 entries « ‹ 1 of 4 › »

2025

Liu, G.; Du, H.; Wang, J.; Niyato, D.; Kim, D. I.

Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse Journal Article

In: IEEE Transactions on Mobile Computing, 2025, ISSN: 15361233 (ISSN).

Abstract | Links | BibTeX | Tags: Contest Theory, Deep learning, Deep reinforcement learning, Diffusion Model, Generative adversarial networks, Generative AI, High quality, Image generation, Image generations, Immersive technologies, Metaverses, Mobile edge computing, Reinforcement Learning, Reinforcement learnings, Resource allocation, Resources allocation, Semantic data, Virtual addresses, Virtual environments, Virtual Reality

@article{liu_contract-inspired_2025,

title = {Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse},

author = {G. Liu and H. Du and J. Wang and D. Niyato and D. I. Kim},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000066834&doi=10.1109%2fTMC.2025.3550815&partnerID=40&md5=3cb5a2143b9ce4ca7f931a60f1bf239c},

doi = {10.1109/TMC.2025.3550815},

issn = {15361233 (ISSN)},

year  = {2025},

date = {2025-01-01},

journal = {IEEE Transactions on Mobile Computing},

abstract = {The rapid advancement of immersive technologies has propelled the development of the Metaverse, where the convergence of virtual and physical realities necessitates the generation of high-quality, photorealistic images to enhance user experience. However, generating these images, especially through Generative Diffusion Models (GDMs), in mobile edge computing environments presents significant challenges due to the limited computing resources of edge devices and the dynamic nature of wireless networks. This paper proposes a novel framework that integrates contract-inspired contest theory, Deep Reinforcement Learning (DRL), and GDMs to optimize image generation in these resource-constrained environments. The framework addresses the critical challenges of resource allocation and semantic data transmission quality by incentivizing edge devices to efficiently transmit high-quality semantic data, which is essential for creating realistic and immersive images. The use of contest and contract theory ensures that edge devices are motivated to allocate resources effectively, while DRL dynamically adjusts to network conditions, optimizing the overall image generation process. Experimental results demonstrate that the proposed approach not only improves the quality of generated images but also achieves superior convergence speed and stability compared to traditional methods. This makes the framework particularly effective for optimizing complex resource allocation tasks in mobile edge Metaverse applications, offering enhanced performance and efficiency in creating immersive virtual environments. © 2002-2012 IEEE.},

keywords = {Contest Theory, Deep learning, Deep reinforcement learning, Diffusion Model, Generative adversarial networks, Generative AI, High quality, Image generation, Image generations, Immersive technologies, Metaverses, Mobile edge computing, Reinforcement Learning, Reinforcement learnings, Resource allocation, Resources allocation, Semantic data, Virtual addresses, Virtual environments, Virtual Reality},

pubstate = {published},

tppubtype = {article}

}

Kurai, R.; Hiraki, T.; Hiroi, Y.; Hirao, Y.; Perusquia-Hernandez, M.; Uchiyama, H.; Kiyokawa, K.

An implementation of MagicCraft: Generating Interactive 3D Objects and Their Behaviors from Text for Commercial Metaverse Platforms Proceedings Article

In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 1284–1285, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).

Abstract | Links | BibTeX | Tags: 3D modeling, 3D models, 3D object, 3D Object Generation, 3d-modeling, AI-Assisted Design, Generative AI, Immersive, Metaverse, Metaverses, Model skill, Object oriented programming, Programming skills

Arai, K.

Digital Twin Model from Freehanded Sketch to Facade Design, 2D-3D Conversion for Volume Design Journal Article

In: International Journal of Advanced Computer Science and Applications, vol. 16, no. 1, pp. 88–95, 2025, ISSN: 2158107X (ISSN).

Abstract | Links | BibTeX | Tags: 2D/3D conversion, AI, Architectural design, BIM, Digital Twins, Facade design, Facades, GauGAN, Generative AI, GeoTiff, GIS, IFC format, Metaverse, Metaverses, SketchUp, TriPo, Volume design, Volume Rendering

Chang, K. -Y.; Lee, C. -F.

Enhancing Virtual Restorative Environment with Generative AI: Personalized Immersive Stress-Relief Experiences Proceedings Article

In: V.G., Duffy (Ed.): Lect. Notes Comput. Sci., pp. 132–144, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-303193501-5 (ISBN).

Abstract | Links | BibTeX | Tags: Artificial intelligence generated content, Artificial Intelligence Generated Content (AIGC), Electroencephalography, Electroencephalography (EEG), Generative AI, Immersive, Immersive environment, Mental health, Physical limitations, Restorative environment, Stress relief, Virtual reality exposure therapies, Virtual reality exposure therapy, Virtual Reality Exposure Therapy (VRET), Virtualization

@inproceedings{chang_enhancing_2025,

title = {Enhancing Virtual Restorative Environment with Generative AI: Personalized Immersive Stress-Relief Experiences},

author = {K. -Y. Chang and C. -F. Lee},

editor = {Duffy V.G.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007759157&doi=10.1007%2f978-3-031-93502-2_9&partnerID=40&md5=ee620a5da9b65e90ccb1eaa75ec8b724},

doi = {10.1007/978-3-031-93502-2_9},

isbn = {03029743 (ISSN); 978-303193501-5 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {15791 LNCS},

pages = {132–144},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {In today’s fast-paced world, stress and mental health challenges are becoming more common. Restorative environments help people relax and recover emotionally, and Virtual Reality Exposure Therapy (VRET) offers a way to experience these benefits beyond physical limitations. However, most VRET applications rely on pre-designed content, limiting their adaptability to individual needs. This study explores how Generative AI can enhance VRET by creating personalized, immersive environments that better match users’ preferences and improve relaxation. To evaluate the impact of AI-generated restorative environments, we combined EEG measurements with user interviews. Thirty university students participated in the study, experiencing two different modes: static mode and walking mode. The EEG results showed an increase in Theta (θ) and High Beta (β) brain waves, suggesting a state of deep immersion accompanied by heightened cognitive engagement and mental effort. While participants found the experience enjoyable and engaging, the AI-generated environments tended to create excitement and focus rather than conventional relaxation. These findings suggest that for AI-generated environments in VRET to be more effective for stress relief, future designs should reduce cognitive load while maintaining immersion. This study provides insights into how AI can enhance relaxation experiences and introduces a new perspective on personalized digital stress-relief solutions. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},

keywords = {Artificial intelligence generated content, Artificial Intelligence Generated Content (AIGC), Electroencephalography, Electroencephalography (EEG), Generative AI, Immersive, Immersive environment, Mental health, Physical limitations, Restorative environment, Stress relief, Virtual reality exposure therapies, Virtual reality exposure therapy, Virtual Reality Exposure Therapy (VRET), Virtualization},

pubstate = {published},

tppubtype = {inproceedings}

}

Paduraru, C.; Bouruc, P. -L.; Stefanescu, A.

Generative AI for Human 3D Body Emotions: A Dataset and Baseline Methods Proceedings Article

In: A.P., Rocha; L., Steels; H.J., Herik (Ed.): Int. Conf. Agent. Artif. Intell., pp. 646–653, Science and Technology Publications, Lda, 2025, ISBN: 21843589 (ISSN).

Abstract | Links | BibTeX | Tags: Animations, Body Emotions, Generative AI, Parametric Models

Koizumi, M.; Ohsuga, M.; Corchado, J. M.

Development and Assessment of a System to Help Students Improve Self-compassion Proceedings Article

In: R., Chinthaginjala; P., Sitek; N., Min-Allah; K., Matsui; S., Ossowski; S., Rodríguez (Ed.): Lect. Notes Networks Syst., pp. 43–52, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 23673370 (ISSN); 978-303182072-4 (ISBN).

Abstract | Links | BibTeX | Tags: Avatar, Generative adversarial networks, Generative AI, Health issues, Mental health, Self-compassion, Students, Training program, University students, Virtual avatar, Virtual environments, Virtual Reality, Virtual Space, Virtual spaces, Visual imagery

@inproceedings{koizumi_development_2025,

title = {Development and Assessment of a System to Help Students Improve Self-compassion},

author = {M. Koizumi and M. Ohsuga and J. M. Corchado},

editor = {Chinthaginjala R. and Sitek P. and Min-Allah N. and Matsui K. and Ossowski S. and Rodríguez S.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218979175&doi=10.1007%2f978-3-031-82073-1_5&partnerID=40&md5=b136d4a114ce5acfa89f907ccecc145f},

doi = {10.1007/978-3-031-82073-1_5},

isbn = {23673370 (ISSN); 978-303182072-4 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Networks Syst.},

volume = {1259},

pages = {43–52},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {Mental health issues are becoming more prevalent among university students. The mindful self-compassion (MSC) training program, which was introduced to address this issue, has shown some efficacy. However, many people, particularly Japanese people, have difficulty recalling visual imagery or feel uncomfortable or resistant to treating themselves with compassion. This study proposes and develops a system that uses virtual space and avatars to help individuals improve their self-compassion. In the proposed system, the user first selects an avatar of a person with whom to talk (hereafter referred to as “partner”), and then talks about the problem to the avatar of his/her choice. Next, the user changes viewpoints and listens to the problem as the partner’s avatar and responds with compassion. Finally, the user returns to his/her own avatar and listens to the compassionate response spoken as the partner avatar. We first conducted surveys to understand the important system components, and then developed prototypes. In light of the results of the experiments, we improved the prototype by introducing a generative AI. The first prototype used the user’s spoken voice as it was, but the improved system uses the generative AI to organize and convert the voice and present it. In addition, we added a function to generate and add advice with compression. The proposed system is expected to contribute to the improvement of students’ self-compassion. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},

keywords = {Avatar, Generative adversarial networks, Generative AI, Health issues, Mental health, Self-compassion, Students, Training program, University students, Virtual avatar, Virtual environments, Virtual Reality, Virtual Space, Virtual spaces, Visual imagery},

pubstate = {published},

tppubtype = {inproceedings}

}

Yokoyama, N.; Kimura, R.; Nakajima, T.

ViGen: Defamiliarizing Everyday Perception for Discovering Unexpected Insights Proceedings Article

In: H., Degen; S., Ntoa (Ed.): Lect. Notes Comput. Sci., pp. 397–417, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-303193417-9 (ISBN).

Abstract | Links | BibTeX | Tags: Artful Expression, Artistic technique, Augmented Reality, Daily lives, Defamiliarization, Dynamic environments, Engineering education, Enhanced vision systems, Generative AI, generative artificial intelligence, Human augmentation, Human engineering, Human-AI Interaction, Human-artificial intelligence interaction, Semi-transparent

@inproceedings{yokoyama_vigen_2025,

title = {ViGen: Defamiliarizing Everyday Perception for Discovering Unexpected Insights},

author = {N. Yokoyama and R. Kimura and T. Nakajima},

editor = {Degen H. and Ntoa S.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007760030&doi=10.1007%2f978-3-031-93418-6_26&partnerID=40&md5=dee6f54688284313a45579aab5f934d6},

doi = {10.1007/978-3-031-93418-6_26},

isbn = {03029743 (ISSN); 978-303193417-9 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {15821 LNAI},

pages = {397–417},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {This paper proposes ViGen, an Augmented Reality (AR) and Artificial Intelligence (AI)-enhanced vision system designed to facilitate defamiliarization in daily life. Humans rely on sight to gather information, think, and act, yet the act of seeing often becomes passive in daily life. Inspired by Victor Shklovsky’s concept of defamiliarization and the artistic technique of photomontage, ViGen seeks to disrupt habitual perceptions. It achieves this by overlaying semi-transparent, AI-generated images, created based on the user’s view, through an AR display. The system is evaluated by several structured interviews, in which participants experience ViGen in three different scenarios. Results indicate that AI-generated visuals effectively supported defamiliarization by transforming ordinary scenes into unfamiliar ones. However, the user’s familiarity with a place plays a significant role. Also, while the feature that adjusts the transparency of overlaid images enhances safety, its limitations in dynamic environments suggest the need for further research across diverse cultural and geographic contexts. This study demonstrates the potential of AI-augmented vision systems to stimulate new ways of seeing, offering insights for further development in visual augmentation technologies. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},

keywords = {Artful Expression, Artistic technique, Augmented Reality, Daily lives, Defamiliarization, Dynamic environments, Engineering education, Enhanced vision systems, Generative AI, generative artificial intelligence, Human augmentation, Human engineering, Human-AI Interaction, Human-artificial intelligence interaction, Semi-transparent},

pubstate = {published},

tppubtype = {inproceedings}

}

Tortora, A.; Amaro, I.; Greca, A. Della; Barra, P.

Exploring the Role of Generative Artificial Intelligence in Virtual Reality: Opportunities and Future Perspectives Proceedings Article

In: J.Y.C., Chen; G., Fragomeni (Ed.): Lect. Notes Comput. Sci., pp. 125–142, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-303193699-9 (ISBN).

Abstract | Links | BibTeX | Tags: Ethical technology, Future perspectives, Generative AI, Image modeling, Immersive, immersive experience, Immersive Experiences, Information Management, Language Model, Personnel training, Professional training, Real- time, Sensitive data, Training design, Users' experiences, Virtual Reality

@inproceedings{tortora_exploring_2025,

title = {Exploring the Role of Generative Artificial Intelligence in Virtual Reality: Opportunities and Future Perspectives},

author = {A. Tortora and I. Amaro and A. Della Greca and P. Barra},

editor = {Chen J.Y.C. and Fragomeni G.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007788684&doi=10.1007%2f978-3-031-93700-2_9&partnerID=40&md5=7b69183bbf8172f9595f939254fb6831},

doi = {10.1007/978-3-031-93700-2_9},

isbn = {03029743 (ISSN); 978-303193699-9 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {15788 LNCS},

pages = {125–142},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {In recent years, generative AI, such as language and image models, have started to revolutionize virtual reality (VR) by offering new opportunities for immersive and personalized interaction. This paper explores the potential of these Intelligent Augmentation technologies in the context of VR, analyzing how the generation of text and images in real time can enhance the user experience through dynamic and personalized environments and contents. The integration of generative AI in VR scenarios holds promise in multiple fields, including education, professional training, design, and healthcare. However, their implementation involves significant challenges, such as privacy management, data security, and ethical issues related to cognitive manipulation and representation of reality. Through an overview of current applications and future prospects, this paper highlights the crucial role of generative AI in enhancing VR, helping to outline a path for the ethical and sustainable development of these immersive technologies. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},

keywords = {Ethical technology, Future perspectives, Generative AI, Image modeling, Immersive, immersive experience, Immersive Experiences, Information Management, Language Model, Personnel training, Professional training, Real- time, Sensitive data, Training design, Users' experiences, Virtual Reality},

pubstate = {published},

tppubtype = {inproceedings}

}

Li, Y.; Pang, E. C. H.; Ng, C. S. Y.; Azim, M.; Leung, H.

Enhancing Linear Algebra Education with AI-Generated Content in the CityU Metaverse: A Comparative Study Proceedings Article

In: T., Hao; J.G., Wu; X., Luo; Y., Sun; Y., Mu; S., Ge; W., Xie (Ed.): Lect. Notes Comput. Sci., pp. 3–16, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-981964406-3 (ISBN).

Abstract | Links | BibTeX | Tags: Comparatives studies, Digital age, Digital interactions, digital twin, Educational metaverse, Engineering education, Generative AI, Immersive, Matrix algebra, Metaverse, Metaverses, Personnel training, Students, Teaching, University campus, Virtual environments, virtual learning environment, Virtual learning environments, Virtual Reality, Virtualization

@inproceedings{li_enhancing_2025,

title = {Enhancing Linear Algebra Education with AI-Generated Content in the CityU Metaverse: A Comparative Study},

author = {Y. Li and E. C. H. Pang and C. S. Y. Ng and M. Azim and H. Leung},

editor = {Hao T. and Wu J.G. and Luo X. and Sun Y. and Mu Y. and Ge S. and Xie W.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105003632691&doi=10.1007%2f978-981-96-4407-0_1&partnerID=40&md5=c067ba5d4c15e9c0353bf315680531fc},

doi = {10.1007/978-981-96-4407-0_1},

isbn = {03029743 (ISSN); 978-981964406-3 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {15589 LNCS},

pages = {3–16},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {In today’s digital age, the metaverse is emerging as the forthcoming evolution of the internet. It provides an immersive space that marks a new frontier in the way digital interactions are facilitated and experienced. In this paper, we present the CityU Metaverse, which aims to construct a digital twin of our university campus. It is designed as an educational virtual world where learning applications can be embedded in this virtual campus, supporting not only remote and collaborative learning but also professional technical training to enhance educational experiences through immersive and interactive learning. To evaluate the effectiveness of this educational metaverse, we conducted an experiment focused on 3D linear transformation in linear algebra, with teaching content generated by generative AI, comparing our metaverse system with traditional teaching methods. Knowledge tests and surveys assessing learning interest revealed that students engaged with the CityU Metaverse, facilitated by AI-generated content, outperformed those in traditional settings and reported greater enjoyment during the learning process. The work provides valuable perspectives on the behaviors and interactions within the metaverse by analyzing user preferences and learning outcomes. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.},

keywords = {Comparatives studies, Digital age, Digital interactions, digital twin, Educational metaverse, Engineering education, Generative AI, Immersive, Matrix algebra, Metaverse, Metaverses, Personnel training, Students, Teaching, University campus, Virtual environments, virtual learning environment, Virtual learning environments, Virtual Reality, Virtualization},

pubstate = {published},

tppubtype = {inproceedings}

}

Tracy, K.; Spantidi, O.

Impact of GPT-Driven Teaching Assistants in VR Learning Environments Journal Article

In: IEEE Transactions on Learning Technologies, vol. 18, pp. 192–205, 2025, ISSN: 19391382 (ISSN).

Abstract | Links | BibTeX | Tags: Adversarial machine learning, Cognitive loads, Computer interaction, Contrastive Learning, Control groups, Experimental groups, Federated learning, Generative AI, Generative artificial intelligence (GenAI), human–computer interaction, Interactive learning environment, interactive learning environments, Learning efficacy, Learning outcome, learning outcomes, Student engagement, Teaching assistants, Virtual environments, Virtual Reality (VR)

@article{tracy_impact_2025,

title = {Impact of GPT-Driven Teaching Assistants in VR Learning Environments},

author = {K. Tracy and O. Spantidi},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001083336&doi=10.1109%2fTLT.2025.3539179&partnerID=40&md5=34fea4ea8517a061fe83b8294e1a9a87},

doi = {10.1109/TLT.2025.3539179},

issn = {19391382 (ISSN)},

year  = {2025},

date = {2025-01-01},

journal = {IEEE Transactions on Learning Technologies},

volume = {18},

pages = {192–205},

abstract = {Virtual reality (VR) has emerged as a transformative educational tool, enabling immersive learning environments that promote student engagement and understanding of complex concepts. However, despite the growing adoption of VR in education, there remains a significant gap in research exploring how generative artificial intelligence (AI), such as generative pretrained transformer can further enhance these experiences by reducing cognitive load and improving learning outcomes. This study examines the impact of an AI-driven instructor assistant in VR classrooms on student engagement, cognitive load, knowledge retention, and performance. A total of 52 participants were divided into two groups experiencing a VR lesson on the bubble sort algorithm, one with only a prescripted virtual instructor (control group), and the other with the addition of an AI instructor assistant (experimental group). Statistical analysis of postlesson quizzes and cognitive load assessments was conducted using independent t-tests and analysis of variance (ANOVA), with the cognitive load being measured through a postexperiment questionnaire. The study results indicate that the experimental group reported significantly higher engagement compared to the control group. While the AI assistant did not significantly improve postlesson assessment scores, it enhanced conceptual knowledge transfer. The experimental group also demonstrated lower intrinsic cognitive load, suggesting the assistant reduced the perceived complexity of the material. Higher germane and general cognitive loads indicated that students were more invested in meaningful learning without feeling overwhelmed. © 2008-2011 IEEE.},

keywords = {Adversarial machine learning, Cognitive loads, Computer interaction, Contrastive Learning, Control groups, Experimental groups, Federated learning, Generative AI, Generative artificial intelligence (GenAI), human–computer interaction, Interactive learning environment, interactive learning environments, Learning efficacy, Learning outcome, learning outcomes, Student engagement, Teaching assistants, Virtual environments, Virtual Reality (VR)},

pubstate = {published},

tppubtype = {article}

}

Nguyen, A.; Gul, F.; Dang, B.; Huynh, L.; Tuunanen, T.

Designing embodied generative artificial intelligence in mixed reality for active learning in higher education Journal Article

In: Innovations in Education and Teaching International, 2025, ISSN: 14703297 (ISSN).

Abstract | Links | BibTeX | Tags: Active learning, Generative AI, higher education, Mixed reality, Self-regulated learning

Scofano, L.; Sampieri, A.; Matteis, E. De; Spinelli, I.; Galasso, F.

Social EgoMesh Estimation Proceedings Article

In: Proc. - IEEE Winter Conf. Appl. Comput. Vis., WACV, pp. 5948–5958, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151083-1 (ISBN).

Abstract | Links | BibTeX | Tags: Augmented reality applications, Ego-motion, Egocentric view, Generative AI, Human behaviors, Human mesh recovery, Limited visibility, Recent researches, Three dimensional computer graphics, Video sequences, Virtual and augmented reality

Otsuka, T.; Li, D.; Siriaraya, P.; Nakajima, S.

Development of A Relaxation Support System Utilizing Stereophonic AR Proceedings Article

In: Int. Conf. Comput., Netw. Commun., ICNC, pp. 463–467, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152096-0 (ISBN).

Abstract | Links | BibTeX | Tags: Augmented Reality, Environmental sounds, Generative AI, Immersive, Mental Well-being, Soundscapes, Spatial Audio, Stereo image processing, Support method, Support systems, Well being

Oskooei, A. Rafiei; Aktaş, M. S.; Keleş, M.

Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation † Journal Article

In: Computers, vol. 14, no. 1, 2025, ISSN: 2073431X (ISSN).

Abstract | Links | BibTeX | Tags: Computer vision, Deep learning, face-to-face translation, Generative AI, human–computer interaction, lip synchronization, talking head generation

@article{rafiei_oskooei_seeing_2025,

title = {Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation †},

author = {A. Rafiei Oskooei and M. S. Aktaş and M. Keleş},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215974883&doi=10.3390%2fcomputers14010007&partnerID=40&md5=f4d244e3e1cba572d2a3beb9c0895d32},

doi = {10.3390/computers14010007},

issn = {2073431X (ISSN)},

year  = {2025},

date = {2025-01-01},

journal = {Computers},

volume = {14},

number = {1},

abstract = {Imagine a future where language is no longer a barrier to real-time conversations, enabling instant and lifelike communication across the globe. As cultural boundaries blur, the demand for seamless multilingual communication has become a critical technological challenge. This paper addresses the lack of robust solutions for real-time face-to-face translation, particularly for low-resource languages, by introducing a comprehensive framework that not only translates language but also replicates voice nuances and synchronized facial expressions. Our research tackles the primary challenge of achieving accurate lip synchronization across culturally diverse languages, filling a significant gap in the literature by evaluating the generalizability of lip sync models beyond English. Specifically, we develop a novel evaluation framework combining quantitative lip sync error metrics and qualitative assessments by human observers. This framework is applied to assess two state-of-the-art lip sync models with different architectures for Turkish, Persian, and Arabic languages, using a newly collected dataset. Based on these findings, we propose and implement a modular system that integrates language-agnostic lip sync models with neural networks to deliver a fully functional face-to-face translation experience. Inference Time Analysis shows this system achieves highly realistic, face-translated talking heads in real time, with a throughput as low as 0.381 s. This transformative framework is primed for deployment in immersive environments such as VR/AR, Metaverse ecosystems, and advanced video conferencing platforms. It offers substantial benefits to developers and businesses aiming to build next-generation multilingual communication systems for diverse applications. While this work focuses on three languages, its modular design allows scalability to additional languages. However, further testing in broader linguistic and cultural contexts is required to confirm its universal applicability, paving the way for a more interconnected and inclusive world where language ceases to hinder human connection. © 2024 by the authors.},

keywords = {Computer vision, Deep learning, face-to-face translation, Generative AI, human–computer interaction, lip synchronization, talking head generation},

pubstate = {published},

tppubtype = {article}

}

Imagine a future where language is no longer a barrier to real-time conversations, enabling instant and lifelike communication across the globe. As cultural boundaries blur, the demand for seamless multilingual communication has become a critical technological challenge. This paper addresses the lack of robust solutions for real-time face-to-face translation, particularly for low-resource languages, by introducing a comprehensive framework that not only translates language but also replicates voice nuances and synchronized facial expressions. Our research tackles the primary challenge of achieving accurate lip synchronization across culturally diverse languages, filling a significant gap in the literature by evaluating the generalizability of lip sync models beyond English. Specifically, we develop a novel evaluation framework combining quantitative lip sync error metrics and qualitative assessments by human observers. This framework is applied to assess two state-of-the-art lip sync models with different architectures for Turkish, Persian, and Arabic languages, using a newly collected dataset. Based on these findings, we propose and implement a modular system that integrates language-agnostic lip sync models with neural networks to deliver a fully functional face-to-face translation experience. Inference Time Analysis shows this system achieves highly realistic, face-translated talking heads in real time, with a throughput as low as 0.381 s. This transformative framework is primed for deployment in immersive environments such as VR/AR, Metaverse ecosystems, and advanced video conferencing platforms. It offers substantial benefits to developers and businesses aiming to build next-generation multilingual communication systems for diverse applications. While this work focuses on three languages, its modular design allows scalability to additional languages. However, further testing in broader linguistic and cultural contexts is required to confirm its universal applicability, paving the way for a more interconnected and inclusive world where language ceases to hinder human connection. © 2024 by the authors.

Li, Z.; Zhang, H.; Peng, C.; Peiris, R.

Exploring Large Language Model-Driven Agents for Environment-Aware Spatial Interactions and Conversations in Virtual Reality Role-Play Scenarios Proceedings Article

In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 1–11, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833153645-9 (ISBN).

Abstract | Links | BibTeX | Tags: Chatbots, Computer simulation languages, Context- awareness, context-awareness, Digital elevation model, Generative AI, Human-AI Interaction, Language Model, Large language model, large language models, Model agents, Role-play simulation, role-play simulations, Role-plays, Spatial interaction, Virtual environments, Virtual Reality, Virtual-reality environment

@inproceedings{li_exploring_2025,

title = {Exploring Large Language Model-Driven Agents for Environment-Aware Spatial Interactions and Conversations in Virtual Reality Role-Play Scenarios},

author = {Z. Li and H. Zhang and C. Peng and R. Peiris},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002706893&doi=10.1109%2fVR59515.2025.00025&partnerID=40&md5=60f22109e054c9035a0c2210bb797039},

doi = {10.1109/VR59515.2025.00025},

isbn = {979-833153645-9 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},

pages = {1–11},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {Recent research has begun adopting Large Language Model (LLM) agents to enhance Virtual Reality (VR) interactions, creating immersive chatbot experiences. However, while current studies focus on generating dialogue from user speech inputs, their abilities to generate richer experiences based on the perception of LLM agents' VR environments and interaction cues remain unexplored. Hence, in this work, we propose an approach that enables LLM agents to perceive virtual environments and generate environment-aware interactions and conversations for an embodied human-AI interaction experience in VR environments. Here, we define a schema for describing VR environments and their interactions through text prompts. We evaluate the performance of our method through five role-play scenarios created using our approach in a study with 14 participants. The findings discuss the opportunities and challenges of our proposed approach for developing environment-aware LLM agents that facilitate spatial interactions and conversations within VR role-play scenarios. © 2025 IEEE.},

keywords = {Chatbots, Computer simulation languages, Context- awareness, context-awareness, Digital elevation model, Generative AI, Human-AI Interaction, Language Model, Large language model, large language models, Model agents, Role-play simulation, role-play simulations, Role-plays, Spatial interaction, Virtual environments, Virtual Reality, Virtual-reality environment},

pubstate = {published},

tppubtype = {inproceedings}

}

Rasch, J.; Töws, J.; Hirzle, T.; Müller, F.; Schmitz, M.

CreepyCoCreator? Investigating AI Representation Modes for 3D Object Co-Creation in Virtual Reality Proceedings Article

In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071394-1 (ISBN).

Abstract | Links | BibTeX | Tags: 3D Creation, 3D modeling, 3D object, Building process, Co-creation, Co-creative system, Co-creative systems, Creative systems, Creatives, Generative AI, Three dimensional computer graphics, User expectations, User Studies, User study, Virtual Reality, Virtualization

@inproceedings{rasch_creepycocreator_2025,

title = {CreepyCoCreator? Investigating AI Representation Modes for 3D Object Co-Creation in Virtual Reality},

author = {J. Rasch and J. Töws and T. Hirzle and F. Müller and M. Schmitz},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005742763&doi=10.1145%2f3706598.3713720&partnerID=40&md5=e6cdcb6cc7249a8836ecc39ae103cd53},

doi = {10.1145/3706598.3713720},

isbn = {979-840071394-1 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Conf Hum Fact Comput Syst Proc},

publisher = {Association for Computing Machinery},

abstract = {Generative AI in Virtual Reality offers the potential for collaborative object-building, yet challenges remain in aligning AI contributions with user expectations. In particular, users often struggle to understand and collaborate with AI when its actions are not transparently represented. This paper thus explores the co-creative object-building process through a Wizard-of-Oz study, focusing on how AI can effectively convey its intent to users during object customization in Virtual Reality. Inspired by human-to-human collaboration, we focus on three representation modes: the presence of an embodied avatar, whether the AI's contributions are visualized immediately or incrementally, and whether the areas modified are highlighted in advance. The findings provide insights into how these factors affect user perception and interaction with object-generating AI tools in Virtual Reality as well as satisfaction and ownership of the created objects. The results offer design implications for co-creative world-building systems, aiming to foster more effective and satisfying collaborations between humans and AI in Virtual Reality. © 2025 Copyright held by the owner/author(s).},

keywords = {3D Creation, 3D modeling, 3D object, Building process, Co-creation, Co-creative system, Co-creative systems, Creative systems, Creatives, Generative AI, Three dimensional computer graphics, User expectations, User Studies, User study, Virtual Reality, Virtualization},

pubstate = {published},

tppubtype = {inproceedings}

}

Cao, X.; Ju, K. P.; Li, C.; Jain, D.

SceneGenA11y: How can Runtime Generative tools improve the Accessibility of a Virtual 3D Scene? Proceedings Article

In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-840071395-8 (ISBN).

Abstract | Links | BibTeX | Tags: 3D application, 3D modeling, 3D scenes, Accessibility, BLV, DHH, Discrete event simulation, Generative AI, Generative tools, Interactive computer graphics, One dimensional, Runtimes, Three dimensional computer graphics, Video-games, Virtual 3d scene, virtual 3D scenes, Virtual environments, Virtual Reality

@inproceedings{cao_scenegena11y_2025,

title = {SceneGenA11y: How can Runtime Generative tools improve the Accessibility of a Virtual 3D Scene?},

author = {X. Cao and K. P. Ju and C. Li and D. Jain},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005772656&doi=10.1145%2f3706599.3720265&partnerID=40&md5=9b0bf29c3e89b70efa2d6a3e740829fb},

doi = {10.1145/3706599.3720265},

isbn = {979-840071395-8 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Conf Hum Fact Comput Syst Proc},

publisher = {Association for Computing Machinery},

abstract = {With the popularity of virtual 3D applications, from video games to educational content and virtual reality scenarios, the accessibility of 3D scene information is vital to ensure inclusive and equitable experiences for all. Previous work include information substitutions like audio description and captions, as well as personalized modifications, but they could only provide predefined accommodations. In this work, we propose SceneGenA11y, a system that responds to the user’s natural language prompts to improve accessibility of a 3D virtual scene in runtime. The system primes LLM agents with accessibility-related knowledge, allowing users to explore the scene and perform verifiable modifications to improve accessibility. We conducted a preliminary evaluation of our system with three blind and low-vision people and three deaf and hard-of-hearing people. The results show that our system is intuitive to use and can successfully improve accessibility. We discussed usage patterns of the system, potential improvements, and integration into apps. We ended with highlighting plans for future work. © 2025 Copyright held by the owner/author(s).},

keywords = {3D application, 3D modeling, 3D scenes, Accessibility, BLV, DHH, Discrete event simulation, Generative AI, Generative tools, Interactive computer graphics, One dimensional, Runtimes, Three dimensional computer graphics, Video-games, Virtual 3d scene, virtual 3D scenes, Virtual environments, Virtual Reality},

pubstate = {published},

tppubtype = {inproceedings}

}

Pielage, L.; Schmidle, P.; Marschall, B.; Risse, B.

Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education Proceedings Article

In: Int Conf Intell User Interfaces Proc IUI, pp. 878–897, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).

Abstract | Links | BibTeX | Tags: Deep learning, Dermatology, Diffusion Model, diffusion models, Digital elevation model, Generative AI, Graphical user interfaces, Guidance Strategies, Guidance strategy, Image generation, Image generations, Inpainting, Interactive Generation, Medical education, Medical Imaging, Simulation training, Skin lesion, Upsampling, Virtual environments, Virtual Reality

@inproceedings{pielage_interactive_2025,

title = {Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education},

author = {L. Pielage and P. Schmidle and B. Marschall and B. Risse},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001923208&doi=10.1145%2f3708359.3712101&partnerID=40&md5=639eec55b08a54ce813f7c1016c621e7},

doi = {10.1145/3708359.3712101},

isbn = {979-840071306-4 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Int Conf Intell User Interfaces Proc IUI},

pages = {878–897},

publisher = {Association for Computing Machinery},

abstract = {Malignant melanoma is one of the most lethal forms of cancer when not detected early. As a result, cancer screening programs have been implemented internationally, all of which require visual inspection of skin lesions. Early melanoma detection is a crucial competence in medical and dermatological education, and it is primarily trained using 2D imagery. However, given the intrinsic 3D nature of skin lesions and the importance of incorporating additional contextual information about the patient (e.g., skin type, nearby lesions, etc.), this approach falls short of providing a comprehensive and scalable learning experience. A potential solution is the use of Virtual Reality (VR) scenarios, which can offer an effective strategy to train skin cancer screenings in a realistic 3D setting, thereby enhancing medical students' awareness of early melanoma detection. In this paper, we present a comprehensive pipeline and models for generating malignant melanomas and benign nevi, which can be utilized in VR-based medical training. We use diffusion models for the generation of skin lesions, which we have enhanced with various guiding strategies to give educators maximum flexibility in designing scenarios and seamlessly placing lesions on virtual agents. Additionally, we have developed a tool which comprises a graphical user interface (GUI) enabling the generation of new lesions and adapting existing ones using an intuitive and interactive inpainting strategy. The tool also offers a novel custom upsampling strategy to achieve a sufficient resolution required for diagnostic purposes. The generated skin lesions have been validated in a user study with trained dermatologists, confirming the overall high quality of the generated lesions and the utility for educational purposes. © 2025 Copyright held by the owner/author(s).},

keywords = {Deep learning, Dermatology, Diffusion Model, diffusion models, Digital elevation model, Generative AI, Graphical user interfaces, Guidance Strategies, Guidance strategy, Image generation, Image generations, Inpainting, Interactive Generation, Medical education, Medical Imaging, Simulation training, Skin lesion, Upsampling, Virtual environments, Virtual Reality},

pubstate = {published},

tppubtype = {inproceedings}

}

Sousa, R. T.; Oliveira, E. A. M.; Cintra, L. M. F.; Filho, A. R. G.

Transformative Technologies for Rehabilitation: Leveraging Immersive and AI-Driven Solutions to Reduce Recidivism and Promote Decent Work Proceedings Article

In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 168–171, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).

Abstract | Links | BibTeX | Tags: AI- Driven Rehabilitation, Artificial intelligence- driven rehabilitation, Emotional intelligence, Engineering education, Generative AI, generative artificial intelligence, Immersive, Immersive technologies, Immersive Technology, Language Model, Large language model, large language models, Skills development, Social Reintegration, Social skills, Sociology, Vocational training

@inproceedings{sousa_transformative_2025,

title = {Transformative Technologies for Rehabilitation: Leveraging Immersive and AI-Driven Solutions to Reduce Recidivism and Promote Decent Work},

author = {R. T. Sousa and E. A. M. Oliveira and L. M. F. Cintra and A. R. G. Filho},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005140551&doi=10.1109%2fVRW66409.2025.00042&partnerID=40&md5=89da6954863a272d48c0d8da3760bfb6},

doi = {10.1109/VRW66409.2025.00042},

isbn = {979-833151484-6 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},

pages = {168–171},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {The reintegration of incarcerated individuals into society presents significant challenges, particularly in addressing barriers related to vocational training, social skill development, and emotional rehabilitation. Immersive technologies, such as Virtual Reality and Augmented Reality, combined with generative Artificial Intelligence (AI) and Large Language Models, offer innovative opportunities to enhance these areas. These technologies create practical, controlled environments for skill acquisition and behavioral training, while generative AI enables dynamic, personalized, and adaptive experiences. This paper explores the broader potential of these integrated technologies in supporting rehabilitation, reducing recidivism, and fostering sustainable employment opportunities and these initiatives align with the overarching equity objective of ensuring Decent Work for All, reinforcing the commitment to inclusive and equitable progress across diverse communities, through the transformative potential of immersive and AI-driven systems in correctional systems. © 2025 IEEE.},

keywords = {AI- Driven Rehabilitation, Artificial intelligence- driven rehabilitation, Emotional intelligence, Engineering education, Generative AI, generative artificial intelligence, Immersive, Immersive technologies, Immersive Technology, Language Model, Large language model, large language models, Skills development, Social Reintegration, Social skills, Sociology, Vocational training},

pubstate = {published},

tppubtype = {inproceedings}

}

Behravan, M.; Gračanin, D.

From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality Proceedings Article

In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 150–155, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).

Abstract | Links | BibTeX | Tags: 3D modeling, 3D object, 3D Object Generation, 3D reconstruction, Augmented Reality, Cutting edges, Generative AI, Interactive computer systems, Language Model, Large language model, large language models, matrix, Multilingual speech interaction, Real- time, Speech enhancement, Speech interaction, Volume Rendering

@inproceedings{behravan_voices_2025,

title = {From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality},

author = {M. Behravan and D. Gračanin},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005153589&doi=10.1109%2fVRW66409.2025.00038&partnerID=40&md5=b8aaab4e2378cde3595d98d79266d371},

doi = {10.1109/VRW66409.2025.00038},

isbn = {979-833151484-6 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},

pages = {150–155},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {This paper presents Matrix, an advanced AI-powered framework designed for real-time 3D object generation in Augmented Reality (AR) environments. By integrating a cutting-edge text-to-3D generative AI model, multilingual speech-to-text translation, and large language models (LLMs), the system enables seamless user interactions through spoken commands. The framework processes speech inputs, generates 3D objects, and provides object recommendations based on contextual understanding, enhancing AR experiences. A key feature of this framework is its ability to optimize 3D models by reducing mesh complexity, resulting in significantly smaller file sizes and faster processing on resource-constrained AR devices. Our approach addresses the challenges of high GPU usage, large model output sizes, and real-time system responsiveness, ensuring a smoother user experience. Moreover, the system is equipped with a pre-generated object repository, further reducing GPU load and improving efficiency. We demonstrate the practical applications of this framework in various fields such as education, design, and accessibility, and discuss future enhancements including image-to-3D conversion, environmental object detection, and multimodal support. The open-source nature of the framework promotes ongoing innovation and its utility across diverse industries. © 2025 IEEE.},

keywords = {3D modeling, 3D object, 3D Object Generation, 3D reconstruction, Augmented Reality, Cutting edges, Generative AI, Interactive computer systems, Language Model, Large language model, large language models, matrix, Multilingual speech interaction, Real- time, Speech enhancement, Speech interaction, Volume Rendering},

pubstate = {published},

tppubtype = {inproceedings}

}

Behravan, M.; Haghani, M.; Gračanin, D.

Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article

In: J.Y.C., Chen; G., Fragomeni (Ed.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISBN: 03029743 (ISSN); 978-303193699-9 (ISBN).

Abstract | Links | BibTeX | Tags: 3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering

@inproceedings{behravan_transcending_2025,

title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},

author = {M. Behravan and M. Haghani and D. Gračanin},

editor = {Chen J.Y.C. and Fragomeni G.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},

doi = {10.1007/978-3-031-93700-2_2},

isbn = {03029743 (ISSN); 978-303193699-9 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {15788 LNCS},

pages = {13–32},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive, model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},

keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},

pubstate = {published},

tppubtype = {inproceedings}

}

Espinal, W. Y. Arevalo; Jimenez, J.; Corneo, L.

An eXtended Reality Data Transformation Framework for Internet of Things Devices Integration Proceedings Article

In: IoT - Proc. Int. Conf. Internet Things, pp. 10–18, Association for Computing Machinery, Inc, 2025, ISBN: 979-840071285-2 (ISBN).

Abstract | Links | BibTeX | Tags: Application programs, Comprehensive evaluation, Data integration, Data Transformation, Device and Data Integration, Devices integration, Extended reality, Generative AI, Interactive objects, Internet of Things, Language Model, Software runtime, Time-consuming tasks

@inproceedings{arevalo_espinal_extended_2025,

title = {An eXtended Reality Data Transformation Framework for Internet of Things Devices Integration},

author = {W. Y. Arevalo Espinal and J. Jimenez and L. Corneo},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002862430&doi=10.1145%2f3703790.3703792&partnerID=40&md5=6ba7d70e00e3b0803149854b5744e55e},

doi = {10.1145/3703790.3703792},

isbn = {979-840071285-2 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {IoT - Proc. Int. Conf. Internet Things},

pages = {10–18},

publisher = {Association for Computing Machinery, Inc},

abstract = {The multidisciplinary nature of XR applications makes device and data integration a resource-intensive and time-consuming task, especially in the context of the Internet of Things (IoT). This paper presents Visualize Interactive Objects, VIO for short, a data transformation framework aimed at simplifying visualization and interaction of IoT devices and their data into XR applications. VIO comprises a software runtime (VRT) running on XR headsets, and a JSON-based syntax for defining VIO Descriptions (VDs). The VRT interprets VDs to facilitate visualization and interaction within the application. By raising the level of abstraction, VIO enhances interoperability among XR experiences and enables developers to integrate IoT data with minimal coding effort. A comprehensive evaluation demonstrated that VIO is lightweight, incurring in negligible overhead compared to native implementations. Ten Large Language Models (LLM) were used to generate VDs and native source-code from user intents. The results showed that LLMs have superior syntactical and semantical accuracy in generating VDs compared to native XR application development code, thus indicating that the task of creating VDs can be effectively automated using LLMs. Additionally, a user study with 12 participants found that VIO is developer-friendly and easily extensible. © 2024 Copyright held by the owner/author(s).},

keywords = {Application programs, Comprehensive evaluation, Data integration, Data Transformation, Device and Data Integration, Devices integration, Extended reality, Generative AI, Interactive objects, Internet of Things, Language Model, Software runtime, Time-consuming tasks},

pubstate = {published},

tppubtype = {inproceedings}

}

Gatti, E.; Giunchi, D.; Numan, N.; Steed, A.

Around the Virtual Campfire: Early UX Insights into AI-Generated Stories in VR Proceedings Article

In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 136–141, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8 (ISBN).

Abstract | Links | BibTeX | Tags: Generative AI, Images synthesis, Immersive, Interactive Environments, Language Model, Large language model, Storytelling, User input, User study, Users' experiences, Virtual environments, VR

Kim, M.; Kim, T.; Lee, K. -T.

3D Digital Human Generation from a Single Image Using Generative AI with Real-Time Motion Synchronization Journal Article

In: Electronics (Switzerland), vol. 14, no. 4, 2025, ISSN: 20799292 (ISSN).

Abstract | Links | BibTeX | Tags: 3D digital human, 3D human generation, digital twin, Generative AI, pose estimation, real-time motion synchronization, single image processing, SMPL-X, Unity 3D

@article{kim_3d_2025,

title = {3D Digital Human Generation from a Single Image Using Generative AI with Real-Time Motion Synchronization},

author = {M. Kim and T. Kim and K. -T. Lee},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218855876&doi=10.3390%2felectronics14040777&partnerID=40&md5=f1d0a0238c6422327901e4d4b6a43727},

doi = {10.3390/electronics14040777},

issn = {20799292 (ISSN)},

year  = {2025},

date = {2025-01-01},

journal = {Electronics (Switzerland)},

volume = {14},

number = {4},

abstract = {The generation of 3D digital humans has traditionally relied on multi-view imaging systems and large-scale datasets, posing challenges in cost, accessibility, and real-time applicability. To overcome these limitations, this study presents an efficient pipeline that constructs high-fidelity 3D digital humans from a single frontal image. By leveraging generative AI, the system synthesizes additional views and generates UV maps compatible with the SMPL-X model, ensuring anatomically accurate and photorealistic reconstructions. The generated 3D models are imported into Unity 3D, where they are rigged for real-time motion synchronization using BlazePose-based lightweight pose estimation. To further enhance motion realism, custom algorithms—including ground detection and rotation smoothing—are applied, improving movement stability and fluidity. The system was rigorously evaluated through both quantitative and qualitative analyses. Results show an average generation time of 211.1 s, segmentation accuracy of 92.1%, and real-time rendering at 64.4 FPS. In qualitative assessments, expert reviewers rated the system using the SUS usability framework and heuristic evaluation, confirming its usability and effectiveness. This method eliminates the need for multi-view cameras or depth sensors, significantly reducing the barrier to entry for real-time 3D avatar creation and interactive AI-driven applications. It has broad applications in virtual reality (VR), gaming, digital content creation, AI-driven simulation, digital twins, and telepresence systems. By introducing a scalable and accessible 3D modeling pipeline, this research lays the groundwork for future advancements in immersive and interactive environments. © 2025 by the authors.},

keywords = {3D digital human, 3D human generation, digital twin, Generative AI, pose estimation, real-time motion synchronization, single image processing, SMPL-X, Unity 3D},

pubstate = {published},

tppubtype = {article}

}

Behravan, M.; Matković, K.; Gračanin, D.

Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality Proceedings Article

In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 73–81, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8 (ISBN).

Abstract | Links | BibTeX | Tags: 3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages

@inproceedings{behravan_generative_2025,

title = {Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality},

author = {M. Behravan and K. Matković and D. Gračanin},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000292700&doi=10.1109%2fAIxVR63409.2025.00018&partnerID=40&md5=b40fa769a6b427918c3fcd86f7c52a75},

doi = {10.1109/AIxVR63409.2025.00018},

isbn = {979-833152157-8 (ISBN)},

year  = {2025},

date = {2025-01-01},

booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},

pages = {73–81},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {We present a novel Artificial Intelligence (AI) system that functions as a designer assistant in augmented reality (AR) environments. Leveraging Vision Language Models (VLMs) like LLaVA and advanced text-to-3D generative models, users can capture images of their surroundings with an Augmented Reality (AR) headset. The system analyzes these images to recommend contextually relevant objects that enhance both functionality and visual appeal. The recommended objects are generated as 3D models and seamlessly integrated into the AR environment for interactive use. Our system utilizes open-source AI models running on local systems to enhance data security and reduce operational costs. Key features include context-aware object suggestions, optimal placement guidance, aesthetic matching, and an intuitive user interface for real-time interaction. Evaluations using the COCO 2017 dataset and real-world AR testing demonstrated high accuracy in object detection and contextual fit rating of 4.1 out of 5. By addressing the challenge of providing context-aware object recommendations in AR, our system expands the capabilities of AI applications in this domain. It enables users to create personalized digital spaces efficiently, leveraging AI for contextually relevant suggestions. © 2025 IEEE.},

keywords = {3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages},

pubstate = {published},

tppubtype = {inproceedings}

}