AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
2025
Behravan, M.; Matković, K.; Gračanin, D.
Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality
Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 73–81, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8.
Tags: 3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages
@inproceedings{behravan_generative_2025,
title = {Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality},
author = {M. Behravan and K. Matković and D. Gračanin},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000292700&doi=10.1109%2fAIxVR63409.2025.00018&partnerID=40&md5=b40fa769a6b427918c3fcd86f7c52a75},
doi = {10.1109/AIxVR63409.2025.00018},
isbn = {979-833152157-8},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {73--81},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We present a novel Artificial Intelligence (AI) system that functions as a designer assistant in augmented reality (AR) environments. Leveraging Vision Language Models (VLMs) like LLaVA and advanced text-to-3D generative models, users can capture images of their surroundings with an Augmented Reality (AR) headset. The system analyzes these images to recommend contextually relevant objects that enhance both functionality and visual appeal. The recommended objects are generated as 3D models and seamlessly integrated into the AR environment for interactive use. Our system utilizes open-source AI models running on local systems to enhance data security and reduce operational costs. Key features include context-aware object suggestions, optimal placement guidance, aesthetic matching, and an intuitive user interface for real-time interaction. Evaluations using the COCO 2017 dataset and real-world AR testing demonstrated high accuracy in object detection and contextual fit rating of 4.1 out of 5. By addressing the challenge of providing context-aware object recommendations in AR, our system expands the capabilities of AI applications in this domain. It enables users to create personalized digital spaces efficiently, leveraging AI for contextually relevant suggestions. © 2025 IEEE.},
keywords = {3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Xu, F.; Nguyen, T.; Du, J.
Augmented Reality for Maintenance Tasks with ChatGPT for Automated Text-To-Action
Journal Article
In: Journal of Construction Engineering and Management, vol. 150, no. 4, 2024, ISSN: 0733-9364.
Tags: Artificial intelligence systems, Augmented Reality, Augmented Reality (AR), ChatGPT, Complex sequences, Computational Linguistics, Diverse fields, Human like, Language Model, Maintenance, Maintenance tasks, Operations and maintenance, Optical character recognition, Sensor technologies, Virtual Reality
@article{xu_augmented_2024,
title = {Augmented Reality for Maintenance Tasks with ChatGPT for Automated Text-To-Action},
author = {F. Xu and T. Nguyen and J. Du},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85183669638&doi=10.1061%2fJCEMD4.COENG-14142&partnerID=40&md5=6b02d2f4f6e74a8152adf2eb30ee2d88},
doi = {10.1061/JCEMD4.COENG-14142},
issn = {0733-9364},
year = {2024},
date = {2024-01-01},
journal = {Journal of Construction Engineering and Management},
volume = {150},
number = {4},
abstract = {Advancements in sensor technology, artificial intelligence (AI), and augmented reality (AR) have unlocked opportunities across various domains. AR and large language models like GPT have witnessed substantial progress and increasingly are being employed in diverse fields. One such promising application is in operations and maintenance (OM). OM tasks often involve complex procedures and sequences that can be challenging to memorize and execute correctly, particularly for novices or in high-stress situations. By combining the advantages of superimposing virtual objects onto the physical world and generating human-like text using GPT, we can revolutionize OM operations. This study introduces a system that combines AR, optical character recognition (OCR), and the GPT language model to optimize user performance while offering trustworthy interactions and alleviating workload in OM tasks. This system provides an interactive virtual environment controlled by the Unity game engine, facilitating a seamless interaction between virtual and physical realities. A case study (N=30) was conducted to illustrate the findings and answer the research questions. The Multidimensional Measurement of Trust (MDMT) was applied to understand the complexity of trust engagement with such a human-like system. The results indicate that users can complete similarly challenging tasks in less time using our proposed AR and AI system. Moreover, the collected data also suggest a reduction in cognitive load when executing the same operations using the AR and AI system. A divergence of trust was observed concerning capability and ethical dimensions. © 2024 American Society of Civil Engineers.},
keywords = {Artificial intelligence systems, Augmented Reality, Augmented Reality (AR), ChatGPT, Complex sequences, Computational Linguistics, Diverse fields, Human like, Language Model, Maintenance, Maintenance tasks, Operations and maintenance, Optical character recognition, Sensor technologies, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}