AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
2025
Carcangiu, A.; Manca, M.; Mereu, J.; Santoro, C.; Simeoli, L.; Spano, L. D.
Conversational Rule Creation in XR: User’s Strategies in VR and AR Automation Proceedings Article
In: Santoro, C.; Schmidt, A.; Matera, M.; Bellucci, A. (Ed.): Lect. Notes Comput. Sci., pp. 59–79, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743, ISBN: 978-3-031-95451-1.
@inproceedings{carcangiu_conversational_2025,
title = {Conversational Rule Creation in XR: User’s Strategies in VR and AR Automation},
author = {A. Carcangiu and M. Manca and J. Mereu and C. Santoro and L. Simeoli and L. D. Spano},
editor = {Santoro, C. and Schmidt, A. and Matera, M. and Bellucci, A.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105009012634&doi=10.1007%2f978-3-031-95452-8_4&partnerID=40&md5=67e2b8ca4bb2b508cd41548e3471705b},
doi = {10.1007/978-3-031-95452-8_4},
issn = {0302-9743},
isbn = {978-3-031-95451-1},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15713 LNCS},
pages = {59--79},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Rule-based approaches allow users to customize XR environments. However, the current menu-based interfaces still create barriers for end-user developers. Chatbots based on Large Language Models (LLMs) have the potential to reduce the threshold needed for rule creation, but how users articulate their intentions through conversation remains under-explored. This work investigates how users express event-condition-action automation rules in Virtual Reality (VR) and Augmented Reality (AR) environments. Through two user studies, we show that the dialogues share consistent strategies across the interaction setting (keywords, difficulties in expressing conditions, task success), even if we registered different adaptations for each setting (verbal structure, event vs action first rules). Our findings are relevant for the design and implementation of chatbot-based support for expressing automations in an XR setting. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {current, Automation, Chatbots, Condition, End-User Development, Extended reality, Human computer interaction, Immersive authoring, Language Model, Large language model, large language models, Rule, Rule-based approach, rules, User interfaces},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Yin, Z.; Wang, Y.; Papatheodorou, T.; Hui, P.
Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 701–711, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-8-3503-7402-5.
@inproceedings{yin_text2vrscene_2024,
title = {Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience},
author = {Z. Yin and Y. Wang and T. Papatheodorou and P. Hui},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191431035&doi=10.1109%2fVR58804.2024.00090&partnerID=40&md5=5484a5bc3939d003efe68308f56b15a6},
doi = {10.1109/VR58804.2024.00090},
isbn = {979-8-3503-7402-5},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},
pages = {701--711},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {With the recent development of the Virtual Reality (VR) industry, the increasing number of VR users pushes the demand for the massive production of immersive and expressive VR scenes in related industries. However, creating expressive VR scenes involves the reasonable organization of various digital content to express a coherent and logical theme, which is time-consuming and labor-intensive. In recent years, Large Language Models (LLMs) such as ChatGPT 3.5 and generative models such as stable diffusion have emerged as powerful tools for comprehending natural language and generating digital contents such as text, code, images, and 3D objects. In this paper, we have explored how we can generate VR scenes from text by incorporating LLMs and various generative models into an automated system. To achieve this, we first identify the possible limitations of LLMs for an automated system and propose a systematic framework to mitigate them. Subsequently, we developed Text2VRScene, a VR scene generation system, based on our proposed framework with well-designed prompts. To validate the effectiveness of our proposed framework and the designed prompts, we carry out a series of test cases. The results show that the proposed framework contributes to improving the reliability of the system and the quality of the generated VR scenes. The results also illustrate the promising performance of the Text2VRScene in generating satisfying VR scenes with a clear theme regularized by our well-designed prompts. This paper ends with a discussion about the limitations of the current system and the potential of developing similar generation systems based on our framework. © 2024 IEEE.},
keywords = {Automated systems, Automation, Digital contents, Generation systems, Generative model, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Interaction techniques, Language Model, Natural language processing systems, Text input, User interfaces, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Martini, M.; Valentini, V.; Ciprian, A.; Bottino, A.; Iacoviello, R.; Montagnuolo, M.; Messina, A.; Strada, F.; Zappia, D.
Semi-Automated Digital Human Production for Enhanced Media Broadcasting Proceedings Article
In: IEEE Gaming, Entertain., Media Conf., GEM, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-8-3503-7453-7.
@inproceedings{martini_semi_2024,
title = {Semi-Automated Digital Human Production for Enhanced Media Broadcasting},
author = {M. Martini and V. Valentini and A. Ciprian and A. Bottino and R. Iacoviello and M. Montagnuolo and A. Messina and F. Strada and D. Zappia},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199536742&doi=10.1109%2fGEM61861.2024.10585601&partnerID=40&md5=3703fba931b02f9615316db8ebbca70c},
doi = {10.1109/GEM61861.2024.10585601},
isbn = {979-8-3503-7453-7},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Gaming, Entertain., Media Conf., GEM},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In recent years, the application of synthetic humans in various fields has attracted considerable attention, leading to extensive exploration of their integration into the Metaverse and virtual production environments. This work presents a semi-automated approach that aims to find a fair trade-off between high-quality outputs and efficient production times. The project focuses on the Rai photo and video archives to find images of target characters for texturing and 3D reconstruction with the goal of reviving Rai's 2D footage and enhance the media experience. A key aspect of this study is to minimize the human intervention, ensuring an efficient, flexible, and scalable creation process. In this work, the improvements have been distributed among different stages of the digital human creation process, starting with the generation of 3D head meshes from 2D images of the reference character and then moving on to the generation, using a Diffusion model, of suitable images for texture development. These assets are then integrated into the Unreal Engine, where a custom widget facilitates posing, rendering, and texturing of Synthetic Humans models. Finally, an in-depth quantitative comparison and subjective tests were carried out between the original character images and the rendered synthetic humans, confirming the validity of the approach. © 2024 IEEE.},
keywords = {AI automation, Automation, Creation process, Digital humans, Economic and social effects, Extensive explorations, Face reconstruction, Generative AI, Image enhancement, media archive, Media archives, Metaverses, Rendering (computer graphics), Synthetic human, Synthetic Humans, Textures, Three dimensional computer graphics, Virtual production, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Joseph, S.; Priya, B. S.; Poorvaja, R.; Kumaran, M. Santhosh; Shivaraj, S.; Jeyanth, V.; Shivesh, R. P.
IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI Proceedings Article
In: Arya, K. V.; Wada, T. (Ed.): Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 979-8-3503-0514-2.
@inproceedings{joseph_iot_2023,
title = {IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI},
author = {S. Joseph and B. S. Priya and R. Poorvaja and M. Santhosh Kumaran and S. Shivaraj and V. Jeyanth and R. P. Shivesh},
editor = {Arya, K. V. and Wada, T.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189754688&doi=10.1109%2fCVMI59935.2023.10465077&partnerID=40&md5=9c1a9d7151c0b04bab83586f515d30aa},
doi = {10.1109/CVMI59935.2023.10465077},
isbn = {979-8-3503-0514-2},
year = {2023},
date = {2023-01-01},
booktitle = {Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In anticipation of the widespread adoption of augmented reality in the future, this paper introduces an advanced mobile application that seamlessly integrates AR and IoT technologies. The application aims to make these cutting-edge technologies more affordable and accessible to users while highlighting their immense benefits in assisting with household appliance control, as well as providing interactive and educational experiences. The app employs advanced algorithms such as object detection, Natural Language Processing (NLP), and Optical Character Recognition (OCR) to scan the smartphone's camera feed. Upon identification, AR controls for appliances, their power consumption, and electric bill tracking are displayed. Additionally, the application makes use of APIs to access the internet, retrieving relevant 3D generative models, 360-degree videos, 2D images, and textual information based on user interactions with detected objects. Users can effortlessly explore and interact with the 3D generative models using intuitive hand gestures, providing an immersive experience without the need for additional hardware or dedicated VR headsets. Beyond home automation, the app offers valuable educational benefits, serving as a unique learning tool for students to gain hands-on experience. Medical practitioners can quickly reference organ anatomy and utilize its feature-rich functionalities. Its cost-effectiveness, requiring only installation, ensures accessibility to a wide audience. The app's functionality is both intuitive and efficient, detecting objects in the camera feed and prompting user interactions. Users can select objects through simple hand gestures, choosing desired content like 3D generative models, 2D images, textual information, 360-degree videos, or shopping-related details. The app then retrieves and overlays the requested information onto the real-world view in AR. 
In conclusion, this groundbreaking AR and IoT -powered app revolutionizes home automation and learning experiences, leveraging only a smartphone's camera, without the need for additional hardware or expensive installations. Its potential applications extend to education, industries, and health care, making it a versatile and valuable tool for a broad range of users. © 2023 IEEE.},
keywords = {2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality (AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAI, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones},
pubstate = {published},
tppubtype = {inproceedings}
}