AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
OUR RESEARCH
Scientific Publications
2025
Behravan, M.; Haghani, M.; Gračanin, D.
Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article
In: Chen, J. Y. C.; Fragomeni, G. (Ed.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743, ISBN: 978-3-031-93699-9.
@inproceedings{behravan_transcending_2025,
title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},
author = {M. Behravan and M. Haghani and D. Gračanin},
editor = {J. Y. C. Chen and G. Fragomeni},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},
doi = {10.1007/978-3-031-93700-2_2},
isbn = {978-3-031-93699-9},
issn = {0302-9743},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15788},
series = {LNCS},
pages = {13--32},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real-time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},
pubstate = {published},
tppubtype = {inproceedings}
}
Shi, L.; Gu, Y.; Zheng, Y.; Kameda, S.; Lu, H.
LWD-IUM: A Lightweight Detector for Advancing Robotic Grasp in VR-Based Industrial and Underwater Metaverse Proceedings Article
In: pp. 1384–1391, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 9798331508876.
@inproceedings{shi_lwd-ium_2025,
title = {LWD-IUM: A Lightweight Detector for Advancing Robotic Grasp in VR-Based Industrial and Underwater Metaverse},
author = {L. Shi and Y. Gu and Y. Zheng and S. Kameda and H. Lu},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105011354353&doi=10.1109%2FIWCMC65282.2025.11059637&partnerID=40&md5=77aa4cdb0a08a1db5d0027a71403da89},
doi = {10.1109/IWCMC65282.2025.11059637},
isbn = {9798331508876},
year = {2025},
date = {2025-01-01},
pages = {1384--1391},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In the burgeoning field of the virtual reality (VR) metaverse, the sophistication of interactions between robotic agents and their environment has become a critical concern. In this work, we present LWD-IUM, a novel lightweight detector designed to enhance robotic grasp capabilities in the VR metaverse. LWD-IUM applies deep learning techniques to discern and navigate the complex VR metaverse environment, aiding robotic agents in the identification and grasping of objects with high precision and efficiency. The algorithm is built on an advanced lightweight neural network structure based on a self-attention mechanism that ensures an optimal balance between computational cost and performance, making it highly suitable for real-time applications in VR. Evaluation on the KITTI 3D dataset demonstrated real-time detection capabilities (24-30 fps) of LWD-IUM, with its mean average precision (mAP) remaining 80% above standard 3D detectors, even with a 50% parameter reduction. In addition, we show that LWD-IUM outperforms existing models for object detection and grasping tasks through real-environment testing on a Baxter dual-arm collaborative robot. By pioneering advancements in robotic grasp in the VR metaverse, LWD-IUM promotes more immersive and realistic interactions, pushing the boundaries of what's possible in virtual experiences.},
keywords = {3D object, 3D object detection, Deep learning, generative artificial intelligence, Grasping and manipulation, Intelligent robots, Learning systems, Metaverses, Neural Networks, Object Detection, Object recognition, Objects detection, Real-time, Robotic grasping, robotic grasping and manipulation, Robotic manipulation, Virtual Reality, Vision transformer, Visual servoing},
pubstate = {published},
tppubtype = {inproceedings}
}
Pardo B., C. E.; Iglesias R., O. I.; León A., M. D.; Quintero M., C. G.
EverydAI: Virtual Assistant for Decision-Making in Daily Contexts, Powered by Artificial Intelligence Journal Article
In: Systems, vol. 13, no. 9, 2025, ISSN: 2079-8954, (Publisher: Multidisciplinary Digital Publishing Institute (MDPI)).
@article{pardo_b_everydai_2025,
title = {EverydAI: Virtual Assistant for Decision-Making in Daily Contexts, Powered by Artificial Intelligence},
author = {{Pardo B.}, C. E. and {Iglesias R.}, O. I. and {León A.}, M. D. and {Quintero M.}, C. G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105017115803&doi=10.3390%2Fsystems13090753&partnerID=40&md5=475327fffcdc43ee3466b4a65111866a},
doi = {10.3390/systems13090753},
issn = {2079-8954},
year = {2025},
date = {2025-01-01},
journal = {Systems},
volume = {13},
number = {9},
abstract = {In an era of information overload, artificial intelligence plays a pivotal role in supporting everyday decision-making. This paper introduces EverydAI, a virtual AI-powered assistant designed to help users make informed decisions across various daily domains such as cooking, fashion, and fitness. By integrating advanced natural language processing, object detection, augmented reality, contextual understanding, digital 3D avatar models, web scraping, and image generation, EverydAI delivers personalized recommendations and insights tailored to individual needs. The proposed framework addresses challenges related to decision fatigue and information overload by combining real-time object detection and web scraping to enhance the relevance and reliability of its suggestions. EverydAI is evaluated through a two-phase survey, each phase involving 30 participants with diverse demographic backgrounds. Results indicate that on average, 92.7% of users agreed or strongly agreed with statements reflecting the system’s usefulness, ease of use, and overall performance, indicating a high level of acceptance and perceived effectiveness. Additionally, EverydAI received an average user satisfaction score of 4.53 out of 5, underscoring its effectiveness in supporting users’ daily routines.},
note = {Publisher: Multidisciplinary Digital Publishing Institute (MDPI)},
keywords = {Artificial intelligence, Augmented Reality, Behavioral Research, Decision making, Decisions makings, Digital avatar, Digital avatars, Information overloads, Informed decision, Interactive computer graphics, Language Model, Large language model, large language models, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, recommendation systems, Recommender systems, Three dimensional computer graphics, Virtual assistants, Virtual Reality, web scraping, Web scrapings},
pubstate = {published},
tppubtype = {article}
}
2024
Qin, X.; Weaver, G.
Utilizing Generative AI for VR Exploration Testing: A Case Study Proceedings Article
In: Proc. - ACM/IEEE Int. Conf. Autom. Softw. Eng. Workshops, ASEW, pp. 228–232, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 9798400712494.
@inproceedings{qin_utilizing_2024,
title = {Utilizing Generative AI for VR Exploration Testing: A Case Study},
author = {X. Qin and G. Weaver},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85213332710&doi=10.1145%2F3691621.3694955&partnerID=40&md5=4a05244d58e7be8e1c8adc52b5ffab4a},
doi = {10.1145/3691621.3694955},
isbn = {9798400712494},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - ACM/IEEE Int. Conf. Autom. Softw. Eng. Workshops, ASEW},
pages = {228--232},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {As the virtual reality (VR) industry expands, the need for automated GUI testing for applications is growing rapidly. With its long-term memory and ability to process mixed data, including images and text, Generative AI (GenAI) shows the potential to understand complex user interfaces. In this paper, we conduct a case study to investigate the potential of using GenAI for field of view (FOV) analysis in VR exploration testing. Specifically, we examine how the model can assist in test entity selection and test action suggestions. Our experiments demonstrate that while GPT-4o achieves a 63% accuracy rate in object identification within an arbitrary FOV, it struggles with object organization and localization. We also identify critical contexts that can improve the accuracy of suggested actions across multiple FOVs. Finally, we discuss the limitations found during the experiment and offer insights into future research directions.},
keywords = {Ability testing, Accuracy rate, Case Study, Case-studies, Entity selections, Field of views, Generative adversarial networks, GUI Exploration Testing, GUI testing, Localisation, Long term memory, Mixed data, Object identification, Object recognition, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Joseph, S.; Priya, B. S. Baghavathi; Poorvaja, R.; Kumaran, M. Santhosh; Shivaraj, S.; Jeyanth, V.; Shivesh, R. P.
IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI Proceedings Article
In: Arya, K. V.; Wada, T. (Ed.): Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 9798350305142.
@inproceedings{joseph_iot_2023,
title = {IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI},
author = {S. Joseph and B. S. Baghavathi Priya and R. Poorvaja and M. Santhosh Kumaran and S. Shivaraj and V. Jeyanth and R. P. Shivesh},
editor = {K. V. Arya and T. Wada},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189754688&doi=10.1109%2FCVMI59935.2023.10465077&partnerID=40&md5=668a934a8558e5855fa176a5a25b037f},
doi = {10.1109/CVMI59935.2023.10465077},
isbn = {9798350305142},
year = {2023},
date = {2023-01-01},
booktitle = {Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In anticipation of the widespread adoption of augmented reality in the future, this paper introduces an advanced mobile application that seamlessly integrates AR and IoT technologies. The application aims to make these cutting-edge technologies more affordable and accessible to users while highlighting their immense benefits in assisting with household appliance control, as well as providing interactive and educational experiences. The app employs advanced algorithms such as object detection, Natural Language Processing (NLP), and Optical Character Recognition (OCR) to scan the smartphone's camera feed. Upon identification, AR controls for appliances, their power consumption, and electric bill tracking are displayed. Additionally, the application makes use of APIs to access the internet, retrieving relevant 3D generative models, 360-degree videos, 2D images, and textual information based on user interactions with detected objects. Users can effortlessly explore and interact with the 3D generative models using intuitive hand gestures, providing an immersive experience without the need for additional hardware or dedicated VR headsets. Beyond home automation, the app offers valuable educational benefits, serving as a unique learning tool for students to gain hands-on experience. Medical practitioners can quickly reference organ anatomy and utilize its feature-rich functionalities. Its cost-effectiveness, requiring only installation, ensures accessibility to a wide audience. The app's functionality is both intuitive and efficient, detecting objects in the camera feed and prompting user interactions. Users can select objects through simple hand gestures, choosing desired content like 3D generative models, 2D images, textual information, 360-degree videos, or shopping-related details. The app then retrieves and overlays the requested information onto the real-world view in AR. 
In conclusion, this groundbreaking AR and IoT-powered app revolutionizes home automation and learning experiences, leveraging only a smartphone's camera, without the need for additional hardware or expensive installations. Its potential applications extend to education, industries, and health care, making it a versatile and valuable tool for a broad range of users.},
keywords = {2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality(AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAI, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones},
pubstate = {published},
tppubtype = {inproceedings}
}