AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
2025
Behravan, M.; Haghani, M.; Gračanin, D.
Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article
In: Chen, J. Y. C.; Fragomeni, G. (Ed.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743, ISBN: 978-3-031-93699-9.
@inproceedings{behravan_transcending_2025,
title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},
author = {M. Behravan and M. Haghani and D. Gračanin},
editor = {Chen, J. Y. C. and Fragomeni, G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},
doi = {10.1007/978-3-031-93700-2_2},
issn = {0302-9743},
isbn = {978-3-031-93699-9},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15788 LNCS},
pages = {13–32},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real- time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},
pubstate = {published},
tppubtype = {inproceedings}
}
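The SUS figures quoted in the abstract above (69.64 overall, 80.71 for frequent AR/VR users) come from the standard System Usability Scale scoring procedure. As a minimal sketch with made-up item responses (not the study's data):

```python
def sus_score(responses):
    """Score ten 1-5 Likert responses on the System Usability Scale (0-100).

    Odd-numbered items are positively worded and contribute (r - 1);
    even-numbered items are negatively worded and contribute (5 - r);
    the summed contributions are scaled by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if not all(1 <= r <= 5 for r in responses):
        raise ValueError("responses must be on a 1-5 scale")
    total = sum((r - 1) if item % 2 == 1 else (5 - r)
                for item, r in enumerate(responses, start=1))
    return total * 2.5

# An all-neutral questionnaire (3 on every item) lands at the midpoint:
print(sus_score([3] * 10))  # 50.0
```

A study-level SUS score, like the 69.64 reported here, is then the mean of the per-participant scores.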
Shi, L.; Gu, Y.; Zheng, Y.; Kameda, S.; Lu, H.
LWD-IUM: A Lightweight Detector for Advancing Robotic Grasp in VR-Based Industrial and Underwater Metaverse Proceedings Article
In: pp. 1384–1391, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 9798331508876.
@inproceedings{shi_lwd-ium_2025,
title = {LWD-IUM: A Lightweight Detector for Advancing Robotic Grasp in VR-Based Industrial and Underwater Metaverse},
author = {L. Shi and Y. Gu and Y. Zheng and S. Kameda and H. Lu},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105011354353&doi=10.1109%2FIWCMC65282.2025.11059637&partnerID=40&md5=77aa4cdb0a08a1db5d0027a71403da89},
doi = {10.1109/IWCMC65282.2025.11059637},
isbn = {9798331508876},
year = {2025},
date = {2025-01-01},
pages = {1384–1391},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In the burgeoning field of virtual reality (VR) metaverse, the sophistication of interactions between robotic agents and their environment has become a critical concern. In this work, we present LWD-IUM, a novel light-weight detector designed to enhance robotic grasp capabilities in the VR metaverse. LWD-IUM applies deep learning techniques to discern and navigate the complex VR metaverse environment, aiding robotic agents in the identification and grasping of objects with high precision and efficiency. The algorithm is constructed with an advanced lightweight neural network structure based on self-attention mechanism that ensures optimal balance between computational cost and performance, making it highly suitable for real-time applications in VR. Evaluation on the KITTI 3D dataset demonstrated real-time detection capabilities (24-30 fps) of LWD-IUM, with its mean average precision (mAP) remaining 80% above standard 3D detectors, even with a 50% parameter reduction. In addition, we show that LWD-IUM outperforms existing models for object detection and grasping tasks through the real environment testing on a Baxter dual-arm collaborative robot. By pioneering advancements in robotic grasp in the VR metaverse, LWD-IUM promotes more immersive and realistic interactions, pushing the boundaries of what's possible in virtual experiences.},
keywords = {3D object, 3D object detection, Deep learning, generative artificial intelligence, Grasping and manipulation, Intelligent robots, Learning systems, Metaverses, Neural Networks, Object Detection, Object recognition, Objects detection, Real- time, Real-time, Robotic grasping, robotic grasping and manipulation, Robotic manipulation, Virtual Reality, Vision transformer, Visual servoing},
pubstate = {published},
tppubtype = {inproceedings}
}
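The mAP figures above rest on intersection-over-union (IoU) matching between predicted and ground-truth boxes. A minimal 2D sketch of the IoU computation (KITTI's 3D evaluation uses volumetric overlap, but the pattern is the same; the boxes below are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; clamp to zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.142857...
```

A prediction counts as a true positive when its IoU with a ground-truth box clears a threshold (commonly 0.5 or 0.7 on KITTI).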
Sinha, Y.; Shanmugam, S.; Sahu, Y. K.; Mukhopadhyay, A.; Biswas, P.
Diffuse Your Data Blues: Augmenting Low-Resource Datasets via User-Assisted Diffusion Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 538–552, Association for Computing Machinery, 2025, ISBN: 9798400713064.
@inproceedings{sinha_diffuse_2025,
title = {Diffuse Your Data Blues: Augmenting Low-Resource Datasets via User-Assisted Diffusion},
author = {Y. Sinha and S. Shanmugam and Y. K. Sahu and A. Mukhopadhyay and P. Biswas},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001924293&doi=10.1145%2F3708359.3712163&partnerID=40&md5=8de7dcc94f2b2ad9e8d6de168ba05840},
doi = {10.1145/3708359.3712163},
isbn = {9798400713064},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {538–552},
publisher = {Association for Computing Machinery},
abstract = {Mixed reality applications in industrial contexts necessitate extensive and varied datasets for training object detection models, yet actual data gathering may be obstructed by logistical or cost issues. This study investigates the implementation of generative AI methods to address this issue for mixed reality applications, with an emphasis on assembly and disassembly tasks. The novel objects found in industrial settings are difficult to describe using words, making text-based models less effective. In this study, a diffusion model is used to generate images by combining novel objects with various backgrounds. The backgrounds are selected where object detection in specific applications has been ineffective. This approach efficiently produces a diverse range of training samples. We compare three approaches: traditional augmentation methods, GAN-based augmentation, and diffusion-based augmentation. Results show that the diffusion model significantly improved detection metrics. For instance, applying diffusion models to the dataset containing mechanical components of a pneumatic cylinder raised the F1 score from 69.77 to 84.21 and the mAP@50 from 76.48 to 88.77, improving object detection performance with a dataset 67% smaller than the traditionally augmented one. The proposed image composition diffusion model and user-friendly interface further simplify dataset enrichment, proving effective for augmenting data and improving the robustness of detection models.},
keywords = {Data gathering, Detection models, Diffusion Model, diffusion models, Efficient Augmentation, Image Composition, Industrial context, Mixed reality, Object Detection, Objects detection, Synthetic Dataset, Synthetic datasets, Training objects},
pubstate = {published},
tppubtype = {inproceedings}
}
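The F1 and mAP@50 gains reported in the abstract follow the standard definitions: F1 is the harmonic mean of precision and recall, and mAP is the mean of per-class average precision. A small sketch with illustrative numbers (not the paper's measurements):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_average_precision(per_class_ap):
    """mAP: the per-class average-precision values, averaged."""
    return sum(per_class_ap) / len(per_class_ap)

# A detector at 90% precision and 80% recall:
print(round(f1_score(0.9, 0.8), 4))  # 0.8471
```

"mAP@50" means each per-class AP is computed with a 0.5 IoU threshold for counting a detection as correct.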
Pardo B., C. E.; Iglesias R., O. I.; León A., M. D.; Quintero M., C. G.
EverydAI: Virtual Assistant for Decision-Making in Daily Contexts, Powered by Artificial Intelligence Journal Article
In: Systems, vol. 13, no. 9, 2025, ISSN: 2079-8954, (Publisher: Multidisciplinary Digital Publishing Institute (MDPI)).
@article{pardo_b_everydai_2025,
title = {EverydAI: Virtual Assistant for Decision-Making in Daily Contexts, Powered by Artificial Intelligence},
author = {Pardo B., C. E. and Iglesias R., O. I. and León A., M. D. and Quintero M., C. G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105017115803&doi=10.3390%2Fsystems13090753&partnerID=40&md5=475327fffcdc43ee3466b4a65111866a},
doi = {10.3390/systems13090753},
issn = {2079-8954},
year = {2025},
date = {2025-01-01},
journal = {Systems},
volume = {13},
number = {9},
abstract = {In an era of information overload, artificial intelligence plays a pivotal role in supporting everyday decision-making. This paper introduces EverydAI, a virtual AI-powered assistant designed to help users make informed decisions across various daily domains such as cooking, fashion, and fitness. By integrating advanced natural language processing, object detection, augmented reality, contextual understanding, digital 3D avatar models, web scraping, and image generation, EverydAI delivers personalized recommendations and insights tailored to individual needs. The proposed framework addresses challenges related to decision fatigue and information overload by combining real-time object detection and web scraping to enhance the relevance and reliability of its suggestions. EverydAI is evaluated through a two-phase survey, each phase involving 30 participants with diverse demographic backgrounds. Results indicate that on average, 92.7% of users agreed or strongly agreed with statements reflecting the system’s usefulness, ease of use, and overall performance, indicating a high level of acceptance and perceived effectiveness. Additionally, EverydAI received an average user satisfaction score of 4.53 out of 5, underscoring its effectiveness in supporting users’ daily routines.},
note = {Publisher: Multidisciplinary Digital Publishing Institute (MDPI)},
keywords = {Artificial intelligence, Augmented Reality, Behavioral Research, Decision making, Decisions makings, Digital avatar, Digital avatars, Information overloads, Informed decision, Interactive computer graphics, Language Model, Large language model, large language models, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, recommendation systems, Recommender systems, Three dimensional computer graphics, Virtual assistants, Virtual Reality, web scraping, Web scrapings},
pubstate = {published},
tppubtype = {article}
}
2024
Liang, Q.; Chen, Y.; Li, W.; Lai, M.; Ni, W.; Qiu, H.
iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology Proceedings Article
In: Zhang, L.; Yu, W.; Wang, Q.; Laili, Y.; Liu, Y. (Ed.): Commun. Comput. Info. Sci., pp. 12–24, Springer Science and Business Media Deutschland GmbH, 2024, ISSN: 1865-0929, ISBN: 978-981-97-3947-9.
@inproceedings{liang_iknowisee_2024,
title = {iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology},
author = {Q. Liang and Y. Chen and W. Li and M. Lai and W. Ni and H. Qiu},
editor = {Zhang, L. and Yu, W. and Wang, Q. and Laili, Y. and Liu, Y.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200663840&doi=10.1007%2f978-981-97-3948-6_2&partnerID=40&md5=a0324ba6108674b1d39a338574269d60},
doi = {10.1007/978-981-97-3948-6_2},
issn = {1865-0929},
isbn = {978-981-97-3947-9},
year = {2024},
date = {2024-01-01},
booktitle = {Commun. Comput. Info. Sci.},
volume = {2139 CCIS},
pages = {12–24},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {AR glasses used in daily life have made good progress and have some practical value. However, the current design concept of AR glasses is essentially to port the content of a cell phone and act as a secondary screen for the phone. In contrast, the AR glasses we designed are based on actual situations, focus on real-world interactions, and utilize IoT technology with the aim of enabling users to fully extract and utilize the digital information in their lives. We have created two innovative features: one is a language learning translation system for users to learn foreign languages, which integrates a large language model with an open vocabulary recognition model to fully extract the visual semantic information of the scene; the other is a social conferencing system, which utilizes cloud-pipe-edge-end IoT development to reduce the cost of communication and improve the efficiency of exchanges in social situations. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.},
keywords = {Augmented Reality, Glass, Identity recognition, Internet of Things, Internet of things technologies, IoT, Language learning, Learning systems, LLM, Object Detection, Objects detection, Open Vocabulary Object Detection, Recognition systems, Semantics, Telephone sets, Translation (languages), Translation systems, Visual languages, Wearable computers, Wearable device, Wearable devices},
pubstate = {published},
tppubtype = {inproceedings}
}
Du, B.; Du, H.; Liu, H.; Niyato, D.; Xin, P.; Yu, J.; Qi, M.; Tang, Y.
YOLO-Based Semantic Communication with Generative AI-Aided Resource Allocation for Digital Twins Construction Journal Article
In: IEEE Internet of Things Journal, vol. 11, no. 5, pp. 7664–7678, 2024, ISSN: 2327-4662, (Publisher: Institute of Electrical and Electronics Engineers Inc.).
@article{du_yolo-based_2024,
title = {YOLO-Based Semantic Communication with Generative AI-Aided Resource Allocation for Digital Twins Construction},
author = {B. Du and H. Du and H. Liu and D. Niyato and P. Xin and J. Yu and M. Qi and Y. Tang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85173060990&doi=10.1109%2FJIOT.2023.3317629&partnerID=40&md5=f43d68afa033607054ea56a6080d8b3d},
doi = {10.1109/JIOT.2023.3317629},
issn = {2327-4662},
year = {2024},
date = {2024-01-01},
journal = {IEEE Internet of Things Journal},
volume = {11},
number = {5},
pages = {7664–7678},
abstract = {Digital Twins play a crucial role in bridging the physical and virtual worlds. Given the dynamic and evolving characteristics of the physical world, a huge volume of data transmission and exchange is necessary to attain synchronized updates in the virtual world. In this article, we propose a semantic communication framework based on you only look once (YOLO) to construct a virtual apple orchard with the aim of mitigating the costs associated with data transmission. Specifically, we first employ the YOLOv7-X object detector to extract semantic information from captured images of edge devices, thereby reducing the volume of transmitted data and saving transmission costs. Afterwards, we quantify the importance of each piece of semantic information by the confidence generated through the object detector. Based on this, we propose two resource allocation schemes, i.e., the confidence-based scheme and the AI-generated scheme, aimed at enhancing the transmission quality of important semantic information. The proposed diffusion model generates an optimal allocation scheme that outperforms both the average allocation scheme and the confidence-based allocation scheme. Moreover, to obtain semantic information more effectively, we enhance the detection capability of the YOLOv7-X object detector by introducing new efficient layer aggregation network-HorNet (ELAN-H) and SimAM attention modules, while reducing the model parameters and computational complexity, making it easier to run on edge devices with limited performance. The numerical results indicate that our proposed semantic communication framework and resource allocation schemes significantly reduce transmission costs while enhancing the transmission quality of important information in communication services.},
note = {Publisher: Institute of Electrical and Electronics Engineers Inc.},
keywords = {Cost reduction, Data transfer, Digital Twins, Edge detection, Image edge detection, Network layers, Object Detection, Object detectors, Objects detection, Physical world, Resource allocation, Resource management, Resources allocation, Semantic communication, Semantics, Semantics Information, Virtual Reality, Virtual worlds, Wireless communications},
pubstate = {published},
tppubtype = {article}
}
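The confidence-based allocation scheme described above weights transmission resources by the detector's confidence in each piece of semantic information. A hedged sketch of that idea (the function name and the simple proportional rule are my simplification; the paper's diffusion-generated optimal scheme is not reproduced here):

```python
def allocate_bandwidth(confidences, total_bandwidth):
    """Split a bandwidth budget across semantic segments in proportion
    to the detection confidence attached to each segment."""
    total_conf = sum(confidences)
    if total_conf == 0:
        # Degenerate case: no confident detections, fall back to an even split.
        return [total_bandwidth / len(confidences)] * len(confidences)
    return [total_bandwidth * c / total_conf for c in confidences]

# Three detected objects with confidences 0.9, 0.6, 0.5 sharing 100 units:
for share in allocate_bandwidth([0.9, 0.6, 0.5], 100.0):
    print(round(share, 2))
```

Under this rule the highest-confidence detections get the most reliable transmission, which is the intuition the paper's learned allocator builds on.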
2023
Joseph, S.; Baghavathi Priya, B. S.; Poorvaja, R.; Santhosh Kumaran, M.; Shivaraj, S.; Jeyanth, V.; Shivesh, R. P.
IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI Proceedings Article
In: Arya, K. V.; Wada, T. (Ed.): Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 9798350305142.
@inproceedings{joseph_iot_2023,
title = {IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI},
author = {S. Joseph and B. S. Baghavathi Priya and R. Poorvaja and M. Santhosh Kumaran and S. Shivaraj and V. Jeyanth and R. P. Shivesh},
editor = {K. V. Arya and T. Wada},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189754688&doi=10.1109%2FCVMI59935.2023.10465077&partnerID=40&md5=668a934a8558e5855fa176a5a25b037f},
doi = {10.1109/CVMI59935.2023.10465077},
isbn = {9798350305142},
year = {2023},
date = {2023-01-01},
booktitle = {Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In anticipation of the widespread adoption of augmented reality in the future, this paper introduces an advanced mobile application that seamlessly integrates AR and IoT technologies. The application aims to make these cutting-edge technologies more affordable and accessible to users while highlighting their immense benefits in assisting with household appliance control, as well as providing interactive and educational experiences. The app employs advanced algorithms such as object detection, Natural Language Processing (NLP), and Optical Character Recognition (OCR) to scan the smartphone's camera feed. Upon identification, AR controls for appliances, their power consumption, and electric bill tracking are displayed. Additionally, the application makes use of APIs to access the internet, retrieving relevant 3D generative models, 360-degree videos, 2D images, and textual information based on user interactions with detected objects. Users can effortlessly explore and interact with the 3D generative models using intuitive hand gestures, providing an immersive experience without the need for additional hardware or dedicated VR headsets. Beyond home automation, the app offers valuable educational benefits, serving as a unique learning tool for students to gain hands-on experience. Medical practitioners can quickly reference organ anatomy and utilize its feature-rich functionalities. Its cost-effectiveness, requiring only installation, ensures accessibility to a wide audience. The app's functionality is both intuitive and efficient, detecting objects in the camera feed and prompting user interactions. Users can select objects through simple hand gestures, choosing desired content like 3D generative models, 2D images, textual information, 360-degree videos, or shopping-related details. The app then retrieves and overlays the requested information onto the real-world view in AR. In conclusion, this groundbreaking AR- and IoT-powered app revolutionizes home automation and learning experiences, leveraging only a smartphone's camera, without the need for additional hardware or expensive installations. Its potential applications extend to education, industry, and health care, making it a versatile and valuable tool for a broad range of users.},
keywords = {2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality(AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAl, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones},
pubstate = {published},
tppubtype = {inproceedings}
}