AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
OUR RESEARCH
Scientific Publications
2025
Behravan, M.; Haghani, M.; Gračanin, D.
Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality Proceedings Article
In: Chen, J.Y.C.; Fragomeni, G. (Eds.): Lect. Notes Comput. Sci., pp. 13–32, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-3-031-93699-9.
@inproceedings{behravan_transcending_2025,
title = {Transcending Dimensions Using Generative AI: Real-Time 3D Model Generation in Augmented Reality},
author = {M. Behravan and M. Haghani and D. Gračanin},
editor = {J. Y. C. Chen and G. Fragomeni},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105007690904&doi=10.1007%2f978-3-031-93700-2_2&partnerID=40&md5=1c4d643aad88d08cbbc9dd2c02413f10},
doi = {10.1007/978-3-031-93700-2_2},
issn = {0302-9743},
isbn = {978-3-031-93699-9},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15788},
series = {LNCS},
pages = {13--32},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Traditional 3D modeling requires technical expertise, specialized software, and time-intensive processes, making it inaccessible for many users. Our research aims to lower these barriers by combining generative AI and augmented reality (AR) into a cohesive system that allows users to easily generate, manipulate, and interact with 3D models in real time, directly within AR environments. Utilizing cutting-edge AI models like Shap-E, we address the complex challenges of transforming 2D images into 3D representations in AR environments. Key challenges such as object isolation, handling intricate backgrounds, and achieving seamless user interaction are tackled through advanced object detection methods, such as Mask R-CNN. Evaluation results from 35 participants reveal an overall System Usability Scale (SUS) score of 69.64, with participants who engaged with AR/VR technologies more frequently rating the system significantly higher, at 80.71. This research is particularly relevant for applications in gaming, education, and AR-based e-commerce, offering intuitive model creation for users without specialized skills. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.},
keywords = {3D Model Generation, 3D modeling, 3D models, 3d-modeling, Augmented Reality, Generative AI, Image-to-3D conversion, Model generation, Object Detection, Object recognition, Objects detection, Real-time, Specialized software, Technical expertise, Three dimensional computer graphics, Usability engineering},
pubstate = {published},
tppubtype = {inproceedings}
}
Sinha, Y.; Shanmugam, S.; Sahu, Y. K.; Mukhopadhyay, A.; Biswas, P.
Diffuse Your Data Blues: Augmenting Low-Resource Datasets via User-Assisted Diffusion Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 538–552, Association for Computing Machinery, 2025, ISBN: 979-8-4007-1306-4.
@inproceedings{sinha_diffuse_2025,
title = {Diffuse Your Data Blues: Augmenting Low-Resource Datasets via User-Assisted Diffusion},
author = {Y. Sinha and S. Shanmugam and Y. K. Sahu and A. Mukhopadhyay and P. Biswas},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001924293&doi=10.1145%2f3708359.3712163&partnerID=40&md5=c13cb6b2ef757546239de8b3ba93fb14},
doi = {10.1145/3708359.3712163},
isbn = {979-8-4007-1306-4},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {538--552},
publisher = {Association for Computing Machinery},
abstract = {Mixed reality applications in industrial contexts necessitate extensive and varied datasets for training object detection models, yet actual data gathering may be obstructed by logistical or cost issues. This study investigates the implementation of generative AI methods to address this issue for mixed reality applications, with an emphasis on assembly and disassembly tasks. The novel objects found in industrial settings are difficult to describe using words, making text-based models less effective. In this study, a diffusion model is used to generate images by combining novel objects with various backgrounds. The backgrounds are selected where object detection in specific applications has been ineffective. This approach efficiently produces a diverse range of training samples. We compare three approaches: traditional augmentation methods, GAN-based augmentation, and Diffusion-based augmentation. Results show that the diffusion model significantly improved detection metrics. For instance, applying diffusion models to the dataset containing mechanical components of a pneumatic cylinder raised the F1 Score from 69.77 to 84.21 and the mAP@50 from 76.48 to 88.77, resulting in an increase in object detection performance with a dataset 67% smaller than the traditional augmented dataset. The proposed image composition diffusion model and user-friendly interface further simplify dataset enrichment, proving effective for augmenting data and improving the robustness of detection models. © 2025 Copyright held by the owner/author(s).},
keywords = {Data gathering, Detection models, Diffusion Model, diffusion models, Efficient Augmentation, Image Composition, Industrial context, Mixed reality, Object Detection, Objects detection, Synthetic Dataset, Synthetic datasets, Training objects},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Du, B.; Du, H.; Liu, H.; Niyato, D.; Xin, P.; Yu, J.; Qi, M.; Tang, Y.
YOLO-Based Semantic Communication with Generative AI-Aided Resource Allocation for Digital Twins Construction Journal Article
In: IEEE Internet of Things Journal, vol. 11, no. 5, pp. 7664–7678, 2024, ISSN: 2327-4662.
@article{du_yolo-based_2024,
title = {YOLO-Based Semantic Communication with Generative AI-Aided Resource Allocation for Digital Twins Construction},
author = {B. Du and H. Du and H. Liu and D. Niyato and P. Xin and J. Yu and M. Qi and Y. Tang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85173060990&doi=10.1109%2fJIOT.2023.3317629&partnerID=40&md5=60507e2f6ce2b1c345248867a9c527a1},
doi = {10.1109/JIOT.2023.3317629},
issn = {2327-4662},
year = {2024},
date = {2024-01-01},
journal = {IEEE Internet of Things Journal},
volume = {11},
number = {5},
pages = {7664--7678},
abstract = {Digital Twins play a crucial role in bridging the physical and virtual worlds. Given the dynamic and evolving characteristics of the physical world, a huge volume of data transmission and exchange is necessary to attain synchronized updates in the virtual world. In this article, we propose a semantic communication framework based on you only look once (YOLO) to construct a virtual apple orchard with the aim of mitigating the costs associated with data transmission. Specifically, we first employ the YOLOv7-X object detector to extract semantic information from captured images of edge devices, thereby reducing the volume of transmitted data and saving transmission costs. Afterwards, we quantify the importance of each semantic information by the confidence generated through the object detector. Based on this, we propose two resource allocation schemes, i.e., the confidence-based scheme and the AI-generated scheme, aimed at enhancing the transmission quality of important semantic information. The proposed diffusion model generates an optimal allocation scheme that outperforms both the average allocation scheme and the confidence-based allocation scheme. Moreover, to obtain semantic information more effectively, we enhance the detection capability of the YOLOv7-X object detector by introducing new efficient layer aggregation network-horNet (ELAN-H) and SimAM attention modules, while reducing the model parameters and computational complexity, making it easier to run on edge devices with limited performance. The numerical results indicate that our proposed semantic communication framework and resource allocation schemes significantly reduce transmission costs while enhancing the transmission quality of important information in communication services. © 2014 IEEE.},
keywords = {Cost reduction, Data transfer, Digital Twins, Edge detection, Image edge detection, Network layers, Object Detection, Object detectors, Objects detection, Physical world, Resource allocation, Resource management, Resources allocation, Semantic communication, Semantics, Semantics Information, Virtual Reality, Virtual worlds, Wireless communications},
pubstate = {published},
tppubtype = {article}
}
Liang, Q.; Chen, Y.; Li, W.; Lai, M.; Ni, W.; Qiu, H.
iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology Proceedings Article
In: Zhang, L.; Yu, W.; Wang, Q.; Laili, Y.; Liu, Y. (Eds.): Commun. Comput. Info. Sci., pp. 12–24, Springer Science and Business Media Deutschland GmbH, 2024, ISSN: 1865-0929; ISBN: 978-981-97-3947-9.
@inproceedings{liang_iknowisee_2024,
title = {iKnowiSee: AR Glasses with Language Learning Translation System and Identity Recognition System Built Based on Large Pre-trained Models of Language and Vision and Internet of Things Technology},
author = {Q. Liang and Y. Chen and W. Li and M. Lai and W. Ni and H. Qiu},
editor = {L. Zhang and W. Yu and Q. Wang and Y. Laili and Y. Liu},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200663840&doi=10.1007%2f978-981-97-3948-6_2&partnerID=40&md5=a0324ba6108674b1d39a338574269d60},
doi = {10.1007/978-981-97-3948-6_2},
issn = {1865-0929},
isbn = {978-981-97-3947-9},
year = {2024},
date = {2024-01-01},
booktitle = {Commun. Comput. Info. Sci.},
volume = {2139},
series = {CCIS},
pages = {12--24},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {AR glasses used in daily life have made good progress and have some practical value. However, the current design concept of AR glasses is basically to simply port the content of a cell phone and act as a secondary screen for the phone. In contrast, the AR glasses we designed are based on actual situations, focus on real-world interactions, and utilize IoT technology with the aim of enabling users to fully extract and utilize the digital information in their lives. We have created two innovative features, one is a language learning translation system for users to learn foreign languages, which integrates a large language model with an open vocabulary recognition model to fully extract the visual semantic information of the scene; and the other is a social conferencing system, which utilizes the IoT cloud, pipe, edge, and end development to reduce the cost of communication and improve the efficiency of exchanges in social situations. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.},
keywords = {Augmented Reality, Glass, Identity recognition, Internet of Things, Internet of things technologies, IoT, Language learning, Learning systems, LLM, Object Detection, Objects detection, Open Vocabulary Object Detection, Recognition systems, Semantics, Telephone sets, Translation (languages), Translation systems, Visual languages, Wearable computers, Wearable device, Wearable devices},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Joseph, S.; Priya, B. S.; Poorvaja, R.; Kumaran, M. Santhosh; Shivaraj, S.; Jeyanth, V.; Shivesh, R. P.
IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI Proceedings Article
In: Arya, K.V.; Wada, T. (Eds.): Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 979-8-3503-0514-2.
@inproceedings{joseph_iot_2023,
title = {IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI},
author = {S. Joseph and B. S. Priya and R. Poorvaja and M. Santhosh Kumaran and S. Shivaraj and V. Jeyanth and R. P. Shivesh},
editor = {K. V. Arya and T. Wada},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189754688&doi=10.1109%2fCVMI59935.2023.10465077&partnerID=40&md5=9c1a9d7151c0b04bab83586f515d30aa},
doi = {10.1109/CVMI59935.2023.10465077},
isbn = {979-8-3503-0514-2},
year = {2023},
date = {2023-01-01},
booktitle = {Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {In anticipation of the widespread adoption of augmented reality in the future, this paper introduces an advanced mobile application that seamlessly integrates AR and IoT technologies. The application aims to make these cutting-edge technologies more affordable and accessible to users while highlighting their immense benefits in assisting with household appliance control, as well as providing interactive and educational experiences. The app employs advanced algorithms such as object detection, Natural Language Processing (NLP), and Optical Character Recognition (OCR) to scan the smartphone's camera feed. Upon identification, AR controls for appliances, their power consumption, and electric bill tracking are displayed. Additionally, the application makes use of APIs to access the internet, retrieving relevant 3D generative models, 360-degree videos, 2D images, and textual information based on user interactions with detected objects. Users can effortlessly explore and interact with the 3D generative models using intuitive hand gestures, providing an immersive experience without the need for additional hardware or dedicated VR headsets. Beyond home automation, the app offers valuable educational benefits, serving as a unique learning tool for students to gain hands-on experience. Medical practitioners can quickly reference organ anatomy and utilize its feature-rich functionalities. Its cost-effectiveness, requiring only installation, ensures accessibility to a wide audience. The app's functionality is both intuitive and efficient, detecting objects in the camera feed and prompting user interactions. Users can select objects through simple hand gestures, choosing desired content like 3D generative models, 2D images, textual information, 360-degree videos, or shopping-related details. The app then retrieves and overlays the requested information onto the real-world view in AR. 
In conclusion, this groundbreaking AR and IoT-powered app revolutionizes home automation and learning experiences, leveraging only a smartphone's camera, without the need for additional hardware or expensive installations. Its potential applications extend to education, industries, and health care, making it a versatile and valuable tool for a broad range of users. © 2023 IEEE.},
keywords = {2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality (AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAI, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones},
pubstate = {published},
tppubtype = {inproceedings}
}