AHCI RESEARCH GROUP

Publications

Papers published in international journals,
proceedings of conferences, workshops and books.

OUR RESEARCH

Scientific Publications

How to

Here you can find the complete list of our publications.
You can use the tag cloud to select only the papers dealing with specific research topics.
You can expand the Abstract, Links and BibTex record for each paper.


2025

Ding, S.; Chen, Y.

RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments Proceedings Article

In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 131–136, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6.

Abstract | Links | BibTeX | Tags: Ambient intelligence, Computational Linguistics, Computer interaction, Computing methodologies, Computing methodologies-Artificial intelligence-Natural language processing-Natural language generation, Computing methodology-artificial intelligence-natural language processing-natural language generation, Data handling, Formal languages, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Language Model, Language processing, Natural language generation, Natural language processing systems, Natural languages, Virtual Reality, Word processing

@inproceedings{ding_rag-vr_2025,

title = {RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments},

author = {S. Ding and Y. Chen},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005140593&doi=10.1109%2fVRW66409.2025.00034&partnerID=40&md5=36dc5fef97aeea4d6e183c83ce9fcd89},

doi = {10.1109/VRW66409.2025.00034},

isbn = {979-833151484-6},

year  = {2025},

date = {2025-01-01},

booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},

pages = {131--136},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {Recent advances in large language models (LLMs) provide new opportunities for context understanding in virtual reality (VR). However, VR contexts are often highly localized and personalized, limiting the effectiveness of general-purpose LLMs. To address this challenge, we present RAG-VR, the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG), which augments an LLM with external knowledge retrieved from a localized knowledge database to improve the answer quality. RAG-VR includes a pipeline for extracting comprehensive knowledge about virtual environments and user conditions for accurate answer generation. To ensure efficient retrieval, RAG-VR offloads the retrieval process to a nearby edge server and uses only essential information during retrieval. Moreover, we train the retriever to effectively distinguish among relevant, irrelevant, and hard-to-differentiate information in relation to questions. RAG-VR improves answer accuracy by 17.9%-41.8% and reduces end-to-end latency by 34.5%-47.3% compared with two baseline systems. © 2025 IEEE.},

keywords = {Ambient intelligence, Computational Linguistics, Computer interaction, Computing methodologies, Computing methodologies-Artificial intelligence-Natural language processing-Natural language generation, Computing methodology-artificial intelligence-natural language processing-natural language generation, Data handling, Formal languages, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Language Model, Language processing, Natural language generation, Natural language processing systems, Natural languages, Virtual Reality, Word processing},

pubstate = {published},

tppubtype = {inproceedings}

}

2024

Venkatachalam, N.; Rayana, M.; Vignesh, S. Bala; Prathamesh, S.

Voice-Driven Panoramic Imagery: Real-Time Generative AI for Immersive Experiences Proceedings Article

In: Int. Conf. Intell. Data Commun. Technol. Internet Things, IDCIoT, pp. 1133–1138, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835032753-3.

Abstract | Links | BibTeX | Tags: Adaptive Visual Experience, First person, First-Person view, generative artificial intelligence, Generative Artificial Intelligence (AI), Image processing, Immersive, Immersive visual scene, Immersive Visual Scenes, Language processing, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Panoramic Images, Patient treatment, Personalized environment, Personalized Environments, Phobia Treatment, Prompt, prompts, Psychological intervention, Psychological Interventions, Real-Time Synthesis, User interaction, User interfaces, Virtual experience, Virtual Experiences, Virtual Reality, Virtual Reality (VR), Virtual-reality headsets, Visual experiences, Visual languages, Visual scene, Voice command, Voice commands, VR Headsets

@inproceedings{venkatachalam_voice-driven_2024,

title = {Voice-Driven Panoramic Imagery: Real-Time Generative AI for Immersive Experiences},

author = {N. Venkatachalam and M. Rayana and S. Bala Vignesh and S. Prathamesh},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85190121845&doi=10.1109%2fIDCIoT59759.2024.10467441&partnerID=40&md5=6594fbab013d9156b79a887f0d7209cb},

doi = {10.1109/IDCIoT59759.2024.10467441},

isbn = {979-835032753-3},

year  = {2024},

date = {2024-01-01},

booktitle = {Int. Conf. Intell. Data Commun. Technol. Internet Things, IDCIoT},

pages = {1133--1138},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {This research study introduces an innovative system that aims to synthesize 360-degree panoramic images in real time based on vocal prompts from the user, leveraging state-of-the-art Generative AI with a combination of advanced NLP models. The primary objective of this system is to transform spoken descriptions into immersive and interactive visual scenes, specifically designed to provide users with first-person field views. This cutting-edge technology has the potential to revolutionize the realm of virtual reality (VR) experiences, enabling users to effortlessly create and navigate through personalized environments. The fundamental goal of this system is to enable the generation of real-time images that are seamlessly compatible with VR headsets, offering a truly immersive and adaptive visual experience. Beyond its technological advancements, this research also highlights its significant potential for creating a positive social impact. One notable application lies in psychological interventions, particularly in the context of phobia treatment and therapeutic settings. Here, patients can safely confront and work through their fears within these synthesized environments, potentially offering new avenues for therapy. Furthermore, the system serves educational and entertainment purposes by bringing users' imaginations to life, providing an unparalleled platform for exploring the boundaries of virtual experiences. Overall, this research represents a promising stride towards a more immersive and adaptable future in VR technology, with the potential to enhance various aspects of human lives, from mental health treatment to entertainment and education. © 2024 IEEE.},

keywords = {Adaptive Visual Experience, First person, First-Person view, generative artificial intelligence, Generative Artificial Intelligence (AI), Image processing, Immersive, Immersive visual scene, Immersive Visual Scenes, Language processing, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Panoramic Images, Patient treatment, Personalized environment, Personalized Environments, Phobia Treatment, Prompt, prompts, Psychological intervention, Psychological Interventions, Real-Time Synthesis, User interaction, User interfaces, Virtual experience, Virtual Experiences, Virtual Reality, Virtual Reality (VR), Virtual-reality headsets, Visual experiences, Visual languages, Visual scene, Voice command, Voice commands, VR Headsets},

pubstate = {published},

tppubtype = {inproceedings}

}

Jeong, E.; Kim, H.; Park, S.; Yoon, S.; Ahn, J.; Woo, W.

Function-Adaptive Affordance Extraction from 3D Objects Using LLM for Interaction Authoring with Augmented Artifacts Proceedings Article

In: Eck, U.; Sra, M.; Stefanucci, J.; Sugimoto, M.; Tatzgern, M.; Williams, I. (Ed.): Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct, pp. 205–208, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833150691-9.

Abstract | Links | BibTeX | Tags: 3D modeling, Applied computing, Art and humanity, Artificial intelligence, Arts and humanities, Augmented Reality, Computer interaction, Computer vision, Computing methodologies, computing methodology, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Humanities computing, Interaction paradigm, Interaction paradigms, Language processing, Mixed / augmented reality, Mixed reality, Modeling languages, Natural Language Processing, Natural language processing systems, Natural languages, Three dimensional computer graphics

@inproceedings{jeong_function-adaptive_2024,

title = {Function-Adaptive Affordance Extraction from 3D Objects Using LLM for Interaction Authoring with Augmented Artifacts},

author = {E. Jeong and H. Kim and S. Park and S. Yoon and J. Ahn and W. Woo},

editor = {Eck, U. and Sra, M. and Stefanucci, J. and Sugimoto, M. and Tatzgern, M. and Williams, I.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214379963&doi=10.1109%2fISMAR-Adjunct64951.2024.00050&partnerID=40&md5=7222e0599a7e2aa0adaea38e4b9e13cc},

doi = {10.1109/ISMAR-Adjunct64951.2024.00050},

isbn = {979-833150691-9},

year  = {2024},

date = {2024-01-01},

booktitle = {Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct},

pages = {205--208},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {We propose an algorithm that extracts the most suitable affordances, interaction targets, and corresponding coordinates adaptively from 3D models of various artifacts based on their functional context for efficient authoring of XR content with artifacts. Traditionally, authoring AR scenes to convey artifact context required one-to-one manual work. Our approach leverages a Large Language Model (LLM) to extract interaction types, positions, and subjects based on the artifact's name and usage context. This enables templated XR experience creation, replacing repetitive manual labor. Consequently, our system streamlines the XR authoring process, making it more efficient and scalable. © 2024 IEEE.},

keywords = {3D modeling, Applied computing, Art and humanity, Artificial intelligence, Arts and humanities, Augmented Reality, Computer interaction, Computer vision, Computing methodologies, computing methodology, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Humanities computing, Interaction paradigm, Interaction paradigms, Language processing, Mixed / augmented reality, Mixed reality, Modeling languages, Natural Language Processing, Natural language processing systems, Natural languages, Three dimensional computer graphics},

pubstate = {published},

tppubtype = {inproceedings}

}

Cronin, I.

Understanding Generative AI Business Applications: A Guide to Technical Principles and Real-World Applications Book

Apress Media LLC, 2024, ISBN: 979-886880282-9; 979-886880281-2.

Abstract | Links | BibTeX | Tags: Artificial intelligence, Augmented Reality, Autonomous system, Autonomous systems, Business applications, Computer vision, Decision making, Gaussian Splatting, Gaussians, Generative AI, Language processing, Learning algorithms, Learning systems, machine learning, Machine-learning, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Splatting

@book{cronin_understanding_2024,

title = {Understanding Generative AI Business Applications: A Guide to Technical Principles and Real-World Applications},

author = {I. Cronin},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001777571&doi=10.1007%2f979-8-8688-0282-9&partnerID=40&md5=c0714ff3e1ad755596426ea092b830d6},

doi = {10.1007/979-8-8688-0282-9},

isbn = {979-886880282-9; 979-886880281-2},

year  = {2024},

date = {2024-01-01},

publisher = {Apress Media LLC},


abstract = {This guide covers the fundamental technical principles and various business applications of Generative AI for planning, developing, and evaluating AI-driven products. It equips you with the knowledge you need to harness the potential of Generative AI for enhancing business creativity and productivity. The book is organized into three sections: text-based, senses-based, and rationale-based. Each section provides an in-depth exploration of the specific methods and applications of Generative AI. In the text-based section, you will find detailed discussions on designing algorithms to automate and enhance written communication, including insights into the technical aspects of transformer-based Natural Language Processing (NLP) and chatbot architecture, such as GPT-4, Claude 2, Google Bard, and others. The senses-based section offers a glimpse into the algorithms and data structures that underpin visual, auditory, and multisensory experiences, including NeRF, 3D Gaussian Splatting, Stable Diffusion, AR and VR technologies, and more. The rationale-based section illuminates the decision-making capabilities of AI, with a focus on machine learning and data analytics techniques that empower applications such as simulation models, agents, and autonomous systems. In summary, this book serves as a guide for those seeking to navigate the dynamic landscape of Generative AI. Whether you’re a seasoned AI professional or a business leader looking to harness the power of creative automation, these pages offer a roadmap to leverage Generative AI for your organization’s success. © 2024 by Irena Cronin.},

keywords = {Artificial intelligence, Augmented Reality, Autonomous system, Autonomous systems, Business applications, Computer vision, Decision making, Gaussian Splatting, Gaussians, Generative AI, Language processing, Learning algorithms, Learning systems, machine learning, Machine-learning, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Splatting},

pubstate = {published},

tppubtype = {book}

}

Kapadia, N.; Gokhale, S.; Nepomuceno, A.; Cheng, W.; Bothwell, S.; Mathews, M.; Shallat, J. S.; Schultz, C.; Gupta, A.

Evaluation of Large Language Model Generated Dialogues for an AI Based VR Nurse Training Simulator Proceedings Article

In: Chen, J.Y.C.; Fragomeni, G. (Ed.): Lect. Notes Comput. Sci., pp. 200–212, Springer Science and Business Media Deutschland GmbH, 2024, ISSN: 0302-9743; ISBN: 978-303161040-0.

Abstract | Links | BibTeX | Tags: Bard, ChatGPT, ClaudeAI, Clinical research, Computational Linguistics, Dialogue Generation, Dialogue generations, Education computing, Extended reality, Health care education, Healthcare Education, Language Model, Language processing, Large language model, large language models, Natural Language Processing, Natural language processing systems, Natural languages, Nurse Training Simulation, Nursing, Patient avatar, Patient Avatars, Semantics, Students, Training simulation, Virtual Reality

@inproceedings{kapadia_evaluation_2024,

title = {Evaluation of Large Language Model Generated Dialogues for an AI Based VR Nurse Training Simulator},

author = {N. Kapadia and S. Gokhale and A. Nepomuceno and W. Cheng and S. Bothwell and M. Mathews and J. S. Shallat and C. Schultz and A. Gupta},

editor = {Chen, J.Y.C. and Fragomeni, G.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85196200653&doi=10.1007%2f978-3-031-61041-7_13&partnerID=40&md5=8890a8d0c289fdf6e7ab82e105249097},

doi = {10.1007/978-3-031-61041-7_13},

issn = {0302-9743},

isbn = {978-303161040-0},

year  = {2024},

date = {2024-01-01},

booktitle = {Lect. Notes Comput. Sci.},

volume = {14706},

pages = {200--212},

publisher = {Springer Science and Business Media Deutschland GmbH},

abstract = {This paper explores the efficacy of Large Language Models (LLMs) in generating dialogues for patient avatars in Virtual Reality (VR) nurse training simulators. With the integration of technology in healthcare education evolving rapidly, the potential of NLP to enhance nurse training through realistic patient interactions presents a significant opportunity. Our study introduces a novel LLM-based dialogue generation system, leveraging models such as ChatGPT, GoogleBard, and ClaudeAI. We detail the development of our script generation system, which was a collaborative endeavor involving nurses, technical artists, and developers. The system, tested on the Meta Quest 2 VR headset, integrates complex dialogues created through a synthesis of clinical expertise and advanced NLP, aimed at simulating real-world nursing scenarios. Through a comprehensive evaluation involving lexical and semantic similarity tests compared to clinical expert-generated scripts, we assess the potential of LLMs as suitable alternatives for script generation. The findings aim to contribute to the development of a more interactive and effective VR nurse training simulator, enhancing communication skills among nursing students for improved patient care outcomes. This research underscores the importance of advanced NLP applications in healthcare education, offering insights into the practicality and limitations of employing LLMs in clinical training environments. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.},

keywords = {Bard, ChatGPT, ClaudeAI, Clinical research, Computational Linguistics, Dialogue Generation, Dialogue generations, Education computing, Extended reality, Health care education, Healthcare Education, Language Model, Language processing, Large language model, large language models, Natural Language Processing, Natural language processing systems, Natural languages, Nurse Training Simulation, Nursing, Patient avatar, Patient Avatars, Semantics, Students, Training simulation, Virtual Reality},

pubstate = {published},

tppubtype = {inproceedings}

}

2023

Fuchs, A.; Appel, S.; Grimm, P.

Immersive Spaces for Creativity: Smart Working Environments Proceedings Article

In: Yunanto, A.A.; Ramadhani, A.D.; Prayogi, Y.R.; Putra, P.A.M.; Ruswiansari, M.; Ridwan, M.; Gamar, F.; Rahmawati, W.M.; Rusli, M.R.; Humaira, F.M.; Adila, A.F. (Ed.): IES - Int. Electron. Symp.: Unlocking Potential Immersive Technol. Live Better Life, Proceeding, pp. 610–617, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 979-835031473-1.

Abstract | Links | BibTeX | Tags: Artificial intelligence, Generative AI, Human computer interaction, Immersive, Innovative approaches, Intelligent systems, Interactive Environments, Language Model, Language processing, Large language model, large language models, Learning algorithms, machine learning, Natural language processing systems, Natural languages, User behaviors, User interfaces, Virtual Reality, Working environment

@inproceedings{fuchs_immersive_2023,

title = {Immersive Spaces for Creativity: Smart Working Environments},

author = {A. Fuchs and S. Appel and P. Grimm},

editor = {Yunanto, A.A. and Ramadhani, A.D. and Prayogi, Y.R. and Putra, P.A.M. and Ruswiansari, M. and Ridwan, M. and Gamar, F. and Rahmawati, W.M. and Rusli, M.R. and Humaira, F.M. and Adila, A.F.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85173627291&doi=10.1109%2fIES59143.2023.10242458&partnerID=40&md5=6ab1796f68c29d7747574272314a2e9d},

doi = {10.1109/IES59143.2023.10242458},

isbn = {979-835031473-1},

year  = {2023},

date = {2023-01-01},

booktitle = {IES - Int. Electron. Symp.: Unlocking Potential Immersive Technol. Live Better Life, Proceeding},

pages = {610--617},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {This paper presents an innovative approach to designing an immersive space that dynamically supports users' (inter-)action based on users' behavior, voice, and mood, providing a personalized experience. The objective of this research is to explore how a space can communicate with users in a seamless, engaging, and interactive environment. Therefore, it integrates natural language processing (NLP), generative artificial intelligence applications and human computer interaction that utilizes a combination of sensors, microphones, and cameras to collect real-time data on users' behavior, voice, and mood. This data is then processed and analyzed by an intelligent system that employs machine learning algorithms to identify patterns and adapt the environment accordingly. The adaptive features include changes in lighting, sound, and visual elements to facilitate creativity, focus, relaxation, or socialization, depending on the user's topics and emotional state. The paper discusses the technical aspects of implementing such a system. Additionally, it highlights the potential applications of this technology in various domains such as education, entertainment, and workplace settings. In conclusion, the immersive creative space represents a paradigm shift in human-environment interaction, offering a dynamic and personalized space that caters to the diverse needs of users. The research findings suggest that this innovative approach holds great promise for enhancing user experiences, fostering creativity, and promoting overall well-being. © 2023 IEEE.},

keywords = {Artificial intelligence, Generative AI, Human computer interaction, Immersive, Innovative approaches, Intelligent systems, Interactive Environments, Language Model, Language processing, Large language model, large language models, Learning algorithms, machine learning, Natural language processing systems, Natural languages, User behaviors, User interfaces, Virtual Reality, Working environment},

pubstate = {published},

tppubtype = {inproceedings}

}

Joseph, S.; Priya, B. S.; Poorvaja, R.; Kumaran, M. Santhosh; Shivaraj, S.; Jeyanth, V.; Shivesh, R. P.

IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI Proceedings Article

In: Arya, K.V.; Wada, T. (Ed.): Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2023, ISBN: 979-835030514-2.

Abstract | Links | BibTeX | Tags: 2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality (AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAI, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones

@inproceedings{joseph_iot_2023,

title = {IoT Empowered AI: Transforming Object Recognition and NLP Summarization with Generative AI},

author = {S. Joseph and B. S. Priya and R. Poorvaja and M. Santhosh Kumaran and S. Shivaraj and V. Jeyanth and R. P. Shivesh},

editor = {Arya, K.V. and Wada, T.},

url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189754688&doi=10.1109%2fCVMI59935.2023.10465077&partnerID=40&md5=9c1a9d7151c0b04bab83586f515d30aa},

doi = {10.1109/CVMI59935.2023.10465077},

isbn = {979-835030514-2},

year  = {2023},

date = {2023-01-01},

booktitle = {Proc. IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},

publisher = {Institute of Electrical and Electronics Engineers Inc.},

abstract = {In anticipation of the widespread adoption of augmented reality in the future, this paper introduces an advanced mobile application that seamlessly integrates AR and IoT technologies. The application aims to make these cutting-edge technologies more affordable and accessible to users while highlighting their immense benefits in assisting with household appliance control, as well as providing interactive and educational experiences. The app employs advanced algorithms such as object detection, Natural Language Processing (NLP), and Optical Character Recognition (OCR) to scan the smartphone's camera feed. Upon identification, AR controls for appliances, their power consumption, and electric bill tracking are displayed. Additionally, the application makes use of APIs to access the internet, retrieving relevant 3D generative models, 360-degree videos, 2D images, and textual information based on user interactions with detected objects. Users can effortlessly explore and interact with the 3D generative models using intuitive hand gestures, providing an immersive experience without the need for additional hardware or dedicated VR headsets. Beyond home automation, the app offers valuable educational benefits, serving as a unique learning tool for students to gain hands-on experience. Medical practitioners can quickly reference organ anatomy and utilize its feature-rich functionalities. Its cost-effectiveness, requiring only installation, ensures accessibility to a wide audience. The app's functionality is both intuitive and efficient, detecting objects in the camera feed and prompting user interactions. Users can select objects through simple hand gestures, choosing desired content like 3D generative models, 2D images, textual information, 360-degree videos, or shopping-related details. The app then retrieves and overlays the requested information onto the real-world view in AR. 
In conclusion, this groundbreaking AR and IoT-powered app revolutionizes home automation and learning experiences, leveraging only a smartphone's camera, without the need for additional hardware or expensive installations. Its potential applications extend to education, industries, and health care, making it a versatile and valuable tool for a broad range of users. © 2023 IEEE.},

keywords = {2D, 3D, Application program interface, Application Program Interface (API), Application program interfaces, Application programming interfaces (API), Application programs, Augmented Reality, Augmented Reality (AR), Automation, Cameras, Cost effectiveness, Domestic appliances, GenAI, Internet of Things, Internet of Things (IoT) technologies, Internet of things technologies, Language processing, Natural Language Processing, Natural language processing systems, Natural languages, Object Detection, Object recognition, Objects detection, Optical character recognition, Optical Character Recognition (OCR), Smartphones},

pubstate = {published},

tppubtype = {inproceedings}

}
