AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
OUR RESEARCH
Scientific Publications
2025
Coronado, A.; Carvalho, S. T.; Berretta, L.
See Through My Eyes: Using Multimodal Large Language Model for Describing Rendered Environments to Blind People Proceedings Article
In: IMX - Proc. ACM Int. Conf. Interact. Media Experiences, pp. 451–457, Association for Computing Machinery, Inc, 2025, ISBN: 979-840071391-0.
@inproceedings{coronado_see_2025,
title = {See Through My Eyes: Using Multimodal Large Language Model for Describing Rendered Environments to Blind People},
author = {A. Coronado and S. T. Carvalho and L. Berretta},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007991842&doi=10.1145%2f3706370.3731641&partnerID=40&md5=2f7cb1535d39d5e59b1f43f773de3272},
doi = {10.1145/3706370.3731641},
isbn = {979-840071391-0},
year = {2025},
date = {2025-01-01},
booktitle = {IMX - Proc. ACM Int. Conf. Interact. Media Experiences},
pages = {451--457},
publisher = {Association for Computing Machinery, Inc},
abstract = {Extended Reality (XR) is quickly expanding "as the next major technology wave in personal computing". Nevertheless, this expansion and adoption could also exclude certain disabled users, particularly people with visual impairment (VIP). According to the World Health Organization (WHO) in their 2019 publication, there were at least 2.2 billion people with visual impairment, a number that is also estimated to have increased in recent years. Therefore, it is important to include disabled users, especially visually impaired people, in the design of Head-Mounted Displays and Extended Reality environments. Indeed, this objective can be pursued by incorporating Multimodal Large Language Model (MLLM) technology, which can assist visually impaired people. As a case study, this study employs different prompts that result in environment descriptions from an MLLM integrated into a virtual reality (VR) escape room. Therefore, six potential prompts were engineered to generate valuable outputs for visually impaired users inside a VR environment. These outputs were evaluated using the G-Eval, and VIEScore metrics. Even though, the results show that the prompt patterns provided a description that aligns with the user's point of view, it is highly recommended to evaluate these outputs through "expected outputs" from Orientation and Mobility Specialists, and Sighted Guides. Furthermore, the subsequent step in the process is to evaluate these outputs by visually impaired people themselves to identify the most effective prompt pattern. © 2025 Copyright held by the owner/author(s).},
keywords = {Accessibility, Behavioral Research, Blind, Blind people, Helmet mounted displays, Human engineering, Human rehabilitation equipment, Interactive computer graphics, Interactive computer systems, Language Model, LLM, Multi-modal, Rendered environment, rendered environments, Spatial cognition, Virtual Reality, Vision aids, Visual impairment, Visual languages, Visually impaired people},
pubstate = {published},
tppubtype = {inproceedings}
}
Sabir, A.; Hussain, R.; Pedro, A.; Park, C.
Personalized construction safety training system using conversational AI in virtual reality Journal Article
In: Automation in Construction, vol. 175, 2025, ISSN: 0926-5805.
@article{sabir_personalized_2025,
title = {Personalized construction safety training system using conversational AI in virtual reality},
author = {A. Sabir and R. Hussain and A. Pedro and C. Park},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002741042&doi=10.1016%2fj.autcon.2025.106207&partnerID=40&md5=376284339bf10fd5d799cc56c6643d36},
doi = {10.1016/j.autcon.2025.106207},
issn = {0926-5805},
year = {2025},
date = {2025-01-01},
journal = {Automation in Construction},
volume = {175},
abstract = {Training workers in safety protocols is crucial for mitigating job site hazards, yet traditional methods often fall short. This paper explores integrating virtual reality (VR) and large language models (LLMs) into iSafeTrainer, an AI-powered safety training system. The system allows trainees to engage with trade-specific content tailored to their expertise level in a third-person perspective in a non-immersive desktop virtual environment, eliminating the need for head-mounted displays. An experimental study evaluated the system through qualitative, survey-based assessments, focusing on user satisfaction, experience, engagement, guidance, and confidence. Results showed high satisfaction rates (>85 %) among novice users, with improved safety knowledge. Expert users suggested advanced scenarios, highlighting the system's potential for expansion. The modular architecture supports customization across various construction settings, ensuring adaptability for future improvements. © 2024},
keywords = {Construction safety, Construction safety training, Conversational AI, Digital elevation model, Helmet mounted displays, Language Model, Large language model, large language models, Personalized safety training, Personnel training, Safety training, Training Systems, Virtual environments, Virtual Reality, Workers'},
pubstate = {published},
tppubtype = {article}
}
Buldu, K. B.; Özdel, S.; Lau, K. H. Carrie; Wang, M.; Saad, D.; Schönborn, S.; Boch, A.; Kasneci, E.; Bozkir, E.
CUIfy the XR: An Open-Source Package to Embed LLM-Powered Conversational Agents in XR Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 192–197, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8.
@inproceedings{buldu_cuify_2025,
title = {CUIfy the XR: An Open-Source Package to Embed LLM-Powered Conversational Agents in XR},
author = {K. B. Buldu and S. Özdel and K. H. Carrie Lau and M. Wang and D. Saad and S. Schönborn and A. Boch and E. Kasneci and E. Bozkir},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000229165&doi=10.1109%2fAIxVR63409.2025.00037&partnerID=40&md5=837b0e3425d2e5a9358bbe6c8ecb5754},
doi = {10.1109/AIxVR63409.2025.00037},
isbn = {979-833152157-8},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {192--197},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Recent developments in computer graphics, machine learning, and sensor technologies enable numerous opportunities for extended reality (XR) setups for everyday life, from skills training to entertainment. With large corporations offering affordable consumer-grade head-mounted displays (HMDs), XR will likely become pervasive, and HMDs will develop as personal devices like smartphones and tablets. However, having intelligent spaces and naturalistic interactions in XR is as important as technological advances so that users grow their engagement in virtual and augmented spaces. To this end, large language model (LLM)-powered non-player characters (NPCs) with speech-to-text (STT) and text-to-speech (TTS) models bring significant advantages over conventional or pre-scripted NPCs for facilitating more natural conversational user interfaces (CUIs) in XR. This paper provides the community with an open-source, customizable, extendable, and privacy-aware Unity package, CUIfy, that facilitates speech-based NPC-user interaction with widely used LLMs, STT, and TTS models. Our package also supports multiple LLM-powered NPCs per environment and minimizes latency between different computational models through streaming to achieve usable interactions between users and NPCs. We publish our source code in the following repository: https://gitlab.lrz.de/hctl/cuify © 2025 IEEE.},
keywords = {Augmented Reality, Computational Linguistics, Conversational user interface, conversational user interfaces, Extended reality, Head-mounted-displays, Helmet mounted displays, Language Model, Large language model, large language models, Non-player character, non-player characters, Open source software, Personnel training, Problem oriented languages, Speech models, Speech-based interaction, Text to speech, Unity, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Zheng, P.; Li, C.; Fan, J.; Wang, L.
A vision-language-guided and deep reinforcement learning-enabled approach for unstructured human-robot collaborative manufacturing task fulfilment Journal Article
In: CIRP Annals, vol. 73, no. 1, pp. 341–344, 2024, ISSN: 0007-8506.
@article{zheng_vision-language-guided_2024,
title = {A vision-language-guided and deep reinforcement learning-enabled approach for unstructured human-robot collaborative manufacturing task fulfilment},
author = {P. Zheng and C. Li and J. Fan and L. Wang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85190754943&doi=10.1016%2fj.cirp.2024.04.003&partnerID=40&md5=59c453e1931e912472e76b86b77a881b},
doi = {10.1016/j.cirp.2024.04.003},
issn = {0007-8506},
year = {2024},
date = {2024-01-01},
journal = {CIRP Annals},
volume = {73},
number = {1},
pages = {341--344},
abstract = {Human-Robot Collaboration (HRC) has emerged as a pivot in contemporary human-centric smart manufacturing scenarios. However, the fulfilment of HRC tasks in unstructured scenes brings many challenges to be overcome. In this work, mixed reality head-mounted display is modelled as an effective data collection, communication, and state representation interface/tool for HRC task settings. By integrating vision-language cues with large language model, a vision-language-guided HRC task planning approach is firstly proposed. Then, a deep reinforcement learning-enabled mobile manipulator motion control policy is generated to fulfil HRC task primitives. Its feasibility is demonstrated in several HRC unstructured manufacturing tasks with comparative results. © 2024 The Author(s)},
keywords = {Collaboration task, Collaborative manufacturing, Deep learning, Helmet mounted displays, Human robots, Human-centric, Human-guided robot learning, Human-Robot Collaboration, Interface states, Manipulators, Manufacturing system, Manufacturing tasks, Mixed reality, Mixed reality head-mounted displays, Reinforcement Learning, Reinforcement learnings, Robot vision, Smart manufacturing},
pubstate = {published},
tppubtype = {article}
}
Wong, A.; Zhao, Y.; Baghaei, N.
Effects of Customizable Intelligent VR Shopping Assistant on Shopping for Stress Relief Proceedings Article
In: Eck, U.; Sra, M.; Stefanucci, J.; Sugimoto, M.; Tatzgern, M.; Williams, I. (Ed.): Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct, pp. 304–308, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833150691-9.
@inproceedings{wong_effects_2024,
title = {Effects of Customizable Intelligent VR Shopping Assistant on Shopping for Stress Relief},
author = {A. Wong and Y. Zhao and N. Baghaei},
editor = {Eck, U. and Sra, M. and Stefanucci, J. and Sugimoto, M. and Tatzgern, M. and Williams, I.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214427097&doi=10.1109%2fISMAR-Adjunct64951.2024.00069&partnerID=40&md5=1530bc0a2139fb33b1a2917c3eb31296},
doi = {10.1109/ISMAR-Adjunct64951.2024.00069},
isbn = {979-833150691-9},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct},
pages = {304--308},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Shopping has long since been a method of distraction and relieving stress. Virtual Reality (VR) effectively simulates immersive experiences, including shopping through head-mounted displays (HMD), which create an environment through realistic renderings and sounds. Current studies in VR have shown that assistants can support users by reducing stress, indicating their ability to improve mental health within VR. Customization and personalization have also been used to enhance the user experience with users preferring the tailored experience and leading to a greater sense of immersion. There is a gap in knowledge on the effects of customization on a VR assistant's ability to reduce stress within the VR retailing space. This research aims to identify relationships between customization and shopping assistants within VR to better understand its effects on the user experience. Understanding this will help the development of VR assistants for mental health and consumer-ready VR shopping experiences. © 2024 IEEE.},
keywords = {Customisation, Customizable, generative artificial intelligence, Head-mounted-displays, Helmet mounted displays, Immersive, Mental health, mHealth, Realistic rendering, stress, Stress relief, Users' experiences, Virtual environments, Virtual Reality, Virtual shopping, Virtual shopping assistant},
pubstate = {published},
tppubtype = {inproceedings}
}
Rosati, R.; Senesi, P.; Lonzi, B.; Mancini, A.; Mandolini, M.
An automated CAD-to-XR framework based on generative AI and Shrinkwrap modelling for a User-Centred design approach Journal Article
In: Advanced Engineering Informatics, vol. 62, 2024, ISSN: 1474-0346.
@article{rosati_automated_2024,
title = {An automated CAD-to-XR framework based on generative AI and Shrinkwrap modelling for a User-Centred design approach},
author = {R. Rosati and P. Senesi and B. Lonzi and A. Mancini and M. Mandolini},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204897460&doi=10.1016%2fj.aei.2024.102848&partnerID=40&md5=3acce73b986bed7a9de42e6336d637ad},
doi = {10.1016/j.aei.2024.102848},
issn = {1474-0346},
year = {2024},
date = {2024-01-01},
journal = {Advanced Engineering Informatics},
volume = {62},
abstract = {CAD-to-XR is the workflow to generate interactive Photorealistic Virtual Prototypes (iPVPs) for Extended Reality (XR) apps from Computer-Aided Design (CAD) models. This process entails modelling, texturing, and XR programming. In the literature, no automatic CAD-to-XR frameworks simultaneously manage CAD simplification and texturing. There are no examples of their adoption for User-Centered Design (UCD). Moreover, such CAD-to-XR workflows do not seize the potentialities of generative algorithms to produce synthetic images (textures). The paper presents a framework for implementing the CAD-to-XR workflow. The solution consists of a module for texture generation based on Generative Adversarial Networks (GANs). The generated texture is then managed by another module (based on Shrinkwrap modelling) to develop the iPVP by simplifying the 3D model and UV mapping the generated texture. The geometric and material data is integrated into a graphic engine, which allows for programming an interactive experience with the iPVP in XR. The CAD-to-XR framework was validated on two components (rifle stock and forend) of a sporting rifle. The solution can automate the texturing process of different product versions in shorter times (compared to a manual procedure). After each product revision, it avoids tedious and manual activities required to generate a new iPVP. The image quality metrics highlight that images are generated in a “realistic” manner (the perceived quality of generated textures is highly comparable to real images). The quality of the iPVPs, generated through the proposed framework and visualised by users through a mixed reality head-mounted display, is equivalent to traditionally designed prototypes. © 2024 The Author(s)},
keywords = {Adversarial networks, Artificial intelligence, CAD-to-XR, Computer aided design models, Computer aided logic design, Computer-aided design, Computer-aided design-to-XR, Design simplification, Digital elevation model, Digital storage, Extended reality, Flow visualization, Generative adversarial networks, Guns (armament), Helmet mounted displays, Intellectual property core, Mixed reality, Photo-realistic, Shrinkfitting, Structural dynamics, User centered design, User-centered design, User-centered design approaches, User-centred, Virtual Prototyping, Work-flows},
pubstate = {published},
tppubtype = {article}
}