AHCI RESEARCH GROUP
Publications
Papers published in international journals, conference and workshop proceedings, and books.
OUR RESEARCH
Scientific Publications
How to
You can use the tag cloud to select only the papers dealing with specific research topics.
You can expand the Abstract, Links and BibTeX record for each paper.
2025
Behravan, M.; Matković, K.; Gračanin, D.
Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 73–81, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833152157-8.
Abstract | Links | BibTeX | Tags: 3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages
@inproceedings{behravan_generative_2025,
title = {Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality},
author = {M. Behravan and K. Matković and D. Gračanin},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000292700&doi=10.1109%2fAIxVR63409.2025.00018&partnerID=40&md5=b40fa769a6b427918c3fcd86f7c52a75},
doi = {10.1109/AIxVR63409.2025.00018},
isbn = {979-833152157-8},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {73–81},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We present a novel Artificial Intelligence (AI) system that functions as a designer assistant in augmented reality (AR) environments. Leveraging Vision Language Models (VLMs) like LLaVA and advanced text-to-3D generative models, users can capture images of their surroundings with an Augmented Reality (AR) headset. The system analyzes these images to recommend contextually relevant objects that enhance both functionality and visual appeal. The recommended objects are generated as 3D models and seamlessly integrated into the AR environment for interactive use. Our system utilizes open-source AI models running on local systems to enhance data security and reduce operational costs. Key features include context-aware object suggestions, optimal placement guidance, aesthetic matching, and an intuitive user interface for real-time interaction. Evaluations using the COCO 2017 dataset and real-world AR testing demonstrated high accuracy in object detection and contextual fit rating of 4.1 out of 5. By addressing the challenge of providing context-aware object recommendations in AR, our system expands the capabilities of AI applications in this domain. It enables users to create personalized digital spaces efficiently, leveraging AI for contextually relevant suggestions. © 2025 IEEE.},
keywords = {3D object, 3D Object Generation, Artificial intelligence systems, Augmented Reality, Capture images, Context-Aware, Generative adversarial networks, Generative AI, generative artificial intelligence, Generative model, Language Model, Object creation, Vision language model, vision language models, Visual languages},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Guo, Y.; Hou, K.; Yan, Z.; Chen, H.; Xing, G.; Jiang, X.
Sensor2Scene: Foundation Model-Driven Interactive Realities Proceedings Article
In: Proc. - IEEE Int. Workshop Found. Model. Cyber-Phys. Syst. Internet Things, FMSys, pp. 13–19, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835036345-6.
Abstract | Links | BibTeX | Tags: 3D modeling, Augmented Reality, Computational Linguistics, Data integration, Data visualization, Foundation models, Generative model, Language Model, Large language model, large language models, Model-driven, Sensor Data Integration, Sensors data, Text-to-3d generative model, Text-to-3D Generative Models, Three dimensional computer graphics, User interaction, User Interaction in AR, User interaction in augmented reality, User interfaces, Virtual Reality, Visualization
@inproceedings{guo_sensor2scene_2024,
title = {Sensor2Scene: Foundation Model-Driven Interactive Realities},
author = {Y. Guo and K. Hou and Z. Yan and H. Chen and G. Xing and X. Jiang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199893762&doi=10.1109%2fFMSys62467.2024.00007&partnerID=40&md5=c3bf1739e8c1dc6227d61609ddc66910},
doi = {10.1109/FMSys62467.2024.00007},
isbn = {979-835036345-6},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Workshop Found. Model. Cyber-Phys. Syst. Internet Things, FMSys},
pages = {13–19},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Augmented Reality (AR) is acclaimed for its potential to bridge the physical and virtual worlds. Yet, current integration between these realms often lacks a deep understanding of the physical environment and the subsequent scene generation that reflects this understanding. This research introduces Sensor2Scene, a novel system framework designed to enhance user interactions with sensor data through AR. At its core, an AI agent leverages large language models (LLMs) to decode subtle information from sensor data, constructing detailed scene descriptions for visualization. To enable these scenes to be rendered in AR, we decompose the scene creation process into tasks of text-to-3D model generation and spatial composition, allowing new AR scenes to be sketched from the descriptions. We evaluated our framework using an LLM evaluator based on five metrics on various datasets to examine the correlation between sensor readings and corresponding visualizations, and demonstrated the system's effectiveness with scenes generated from end-to-end. The results highlight the potential of LLMs to understand IoT sensor data. Furthermore, generative models can aid in transforming these interpretations into visual formats, thereby enhancing user interaction. This work not only displays the capabilities of Sensor2Scene but also lays a foundation for advancing AR with the goal of creating more immersive and contextually rich experiences. © 2024 IEEE.},
keywords = {3D modeling, Augmented Reality, Computational Linguistics, Data integration, Data visualization, Foundation models, Generative model, Language Model, Large language model, large language models, Model-driven, Sensor Data Integration, Sensors data, Text-to-3d generative model, Text-to-3D Generative Models, Three dimensional computer graphics, User interaction, User Interaction in AR, User interaction in augmented reality, User interfaces, Virtual Reality, Visualization},
pubstate = {published},
tppubtype = {inproceedings}
}
Yin, Z.; Wang, Y.; Papatheodorou, T.; Hui, P.
Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 701–711, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037402-5.
Abstract | Links | BibTeX | Tags: Automated systems, Automation, Digital contents, Generation systems, Generative model, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Interaction techniques, Language Model, Natural language processing systems, Text input, User interfaces, Virtual Reality
@inproceedings{yin_text2vrscene_2024,
title = {Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience},
author = {Z. Yin and Y. Wang and T. Papatheodorou and P. Hui},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191431035&doi=10.1109%2fVR58804.2024.00090&partnerID=40&md5=5484a5bc3939d003efe68308f56b15a6},
doi = {10.1109/VR58804.2024.00090},
isbn = {979-835037402-5},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},
pages = {701–711},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {With the recent development of the Virtual Reality (VR) industry, the increasing number of VR users pushes the demand for the massive production of immersive and expressive VR scenes in related industries. However, creating expressive VR scenes involves the reasonable organization of various digital content to express a coherent and logical theme, which is time-consuming and labor-intensive. In recent years, Large Language Models (LLMs) such as ChatGPT 3.5 and generative models such as stable diffusion have emerged as powerful tools for comprehending natural language and generating digital contents such as text, code, images, and 3D objects. In this paper, we have explored how we can generate VR scenes from text by incorporating LLMs and various generative models into an automated system. To achieve this, we first identify the possible limitations of LLMs for an automated system and propose a systematic framework to mitigate them. Subsequently, we developed Text2VRScene, a VR scene generation system, based on our proposed framework with well-designed prompts. To validate the effectiveness of our proposed framework and the designed prompts, we carry out a series of test cases. The results show that the proposed framework contributes to improving the reliability of the system and the quality of the generated VR scenes. The results also illustrate the promising performance of the Text2VRScene in generating satisfying VR scenes with a clear theme regularized by our well-designed prompts. This paper ends with a discussion about the limitations of the current system and the potential of developing similar generation systems based on our framework. © 2024 IEEE.},
keywords = {Automated systems, Automation, Digital contents, Generation systems, Generative model, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Interaction techniques, Language Model, Natural language processing systems, Text input, User interfaces, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Jayaraman, S.; Bhavya, R.; Srihari, V.; Rajam, V. Mary Anita
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions Proceedings Article
In: IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037687-6.
Abstract | Links | BibTeX | Tags: Adversarial networks, Computer simulation languages, Deep learning, Depth Estimation, Depth perception, Diffusion Model, diffusion models, Digital elevation model, Generative adversarial networks, Generative model, Generative systems, Language Model, Motion capture, Stereo image processing, Text-to-image, Training data, Video analysis, Video-clips, Virtual environments, Virtual Reality
@inproceedings{jayaraman_texavi_2024,
title = {TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions},
author = {S. Jayaraman and R. Bhavya and V. Srihari and V. Mary Anita Rajam},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215265234&doi=10.1109%2fCVMI61877.2024.10782691&partnerID=40&md5=8e20576af67b917ecfad83873a87ef29},
doi = {10.1109/CVMI61877.2024.10782691},
isbn = {979-835037687-6},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {While generative models such as text-to-image, large language models and text-to-video have seen significant progress, the extension to text-to-virtual-reality remains largely unexplored, due to a deficit in training data and the complexity of achieving realistic depth and motion in virtual environments. This paper proposes an approach to coalesce existing generative systems to form a stereoscopic virtual reality video from text. Carried out in three main stages, we start with a base text-to-image model that captures context from an input text. We then employ Stable Diffusion on the rudimentary image produced, to generate frames with enhanced realism and overall quality. These frames are processed with depth estimation algorithms to create left-eye and right-eye views, which are stitched side-by-side to create an immersive viewing experience. Such systems would be highly beneficial in virtual reality production, since filming and scene building often require extensive hours of work and post-production effort. We utilize image evaluation techniques, specifically Fréchet Inception Distance and CLIP Score, to assess the visual quality of frames produced for the video. These quantitative measures establish the proficiency of the proposed method. Our work highlights the exciting possibilities of using natural language-driven graphics in fields like virtual reality simulations. © 2024 IEEE.},
keywords = {Adversarial networks, Computer simulation languages, Deep learning, Depth Estimation, Depth perception, Diffusion Model, diffusion models, Digital elevation model, Generative adversarial networks, Generative model, Generative systems, Language Model, Motion capture, Stereo image processing, Text-to-image, Training data, Video analysis, Video-clips, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Upadhyay, A.; Dubey, A.; Bhardwaj, N.; Kuriakose, S. M.; Mohan, R.
CIGMA: Automated 3D House Layout Generation through Generative Models Proceedings Article
In: ACM Int. Conf. Proc. Ser., pp. 542–546, Association for Computing Machinery, 2024, ISBN: 979-840071634-8.
Abstract | Links | BibTeX | Tags: 3d house, 3D House Layout, 3D modeling, Floor Plan, Floorplans, Floors, Generative AI, Generative model, Houses, Large datasets, Layout designs, Layout generations, Metaverses, Textures, User constraints, Wall design
@inproceedings{upadhyay_cigma_2024,
title = {CIGMA: Automated 3D House Layout Generation through Generative Models},
author = {A. Upadhyay and A. Dubey and N. Bhardwaj and S. M. Kuriakose and R. Mohan},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85183577885&doi=10.1145%2f3632410.3632490&partnerID=40&md5=cf0c249faf0ce03590010426e0f6c1e0},
doi = {10.1145/3632410.3632490},
isbn = {979-840071634-8},
year = {2024},
date = {2024-01-01},
booktitle = {ACM Int. Conf. Proc. Ser.},
pages = {542–546},
publisher = {Association for Computing Machinery},
abstract = {In this work, we introduce CIGMA, a metaverse platform that empowers designers to generate multiple house layout designs using generative models. We propose a generative adversarial network that synthesizes 2D layouts guided by user constraints. Our platform generates 3D views of house layouts and provides users with the ability to customize the 3D house model by generating furniture items and applying various textures for personalized floor and wall designs. We evaluate our approach on a large-scale dataset, RPLAN, consisting of 80,000 real floor plans from residential buildings. The qualitative and quantitative evaluations demonstrate the effectiveness of our approach over the existing baselines. The demo is accessible at https://youtu.be/lgb_V-yZ5lw. © 2024 Owner/Author.},
keywords = {3d house, 3D House Layout, 3D modeling, Floor Plan, Floorplans, Floors, Generative AI, Generative model, Houses, Large datasets, Layout designs, Layout generations, Metaverses, Textures, User constraints, Wall design},
pubstate = {published},
tppubtype = {inproceedings}
}
Schmidt, P.; Arlt, S.; Ruiz-Gonzalez, C.; Gu, X.; Rodríguez, C.; Krenn, M.
Virtual reality for understanding artificial-intelligence-driven scientific discovery with an application in quantum optics Journal Article
In: Machine Learning: Science and Technology, vol. 5, no. 3, 2024, ISSN: 2632-2153.
Abstract | Links | BibTeX | Tags: 3-dimensional, Analysis process, Digital discovery, Generative adversarial networks, Generative model, generative models, Human capability, Immersive virtual reality, Intelligence models, Quantum entanglement, Quantum optics, Scientific discovery, Scientific understanding, Virtual Reality, Virtual-reality environment
@article{schmidt_virtual_2024,
title = {Virtual reality for understanding artificial-intelligence-driven scientific discovery with an application in quantum optics},
author = {P. Schmidt and S. Arlt and C. Ruiz-Gonzalez and X. Gu and C. Rodríguez and M. Krenn},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85201265211&doi=10.1088%2f2632-2153%2fad5fdb&partnerID=40&md5=3a6af280ba0ac81507ade10f5dd1efb3},
doi = {10.1088/2632-2153/ad5fdb},
issn = {2632-2153},
year = {2024},
date = {2024-01-01},
journal = {Machine Learning: Science and Technology},
volume = {5},
number = {3},
abstract = {Generative Artificial Intelligence (AI) models can propose solutions to scientific problems beyond human capability. To truly make conceptual contributions, researchers need to be capable of understanding the AI-generated structures and extracting the underlying concepts and ideas. When algorithms provide little explanatory reasoning alongside the output, scientists have to reverse-engineer the fundamental insights behind proposals based solely on examples. This task can be challenging as the output is often highly complex and thus not immediately accessible to humans. In this work we show how transferring part of the analysis process into an immersive virtual reality (VR) environment can assist researchers in developing an understanding of AI-generated solutions. We demonstrate the usefulness of VR in finding interpretable configurations of abstract graphs, representing Quantum Optics experiments. Thereby, we can manually discover new generalizations of AI-discoveries as well as new understanding in experimental quantum optics. Furthermore, it allows us to customize the search space in an informed way—as a human-in-the-loop—to achieve significantly faster subsequent discovery iterations. As concrete examples, with this technology, we discover a new resource-efficient 3-dimensional entanglement swapping scheme, as well as a 3-dimensional 4-particle Greenberger-Horne-Zeilinger-state analyzer. Our results show the potential of VR to enhance a researcher’s ability to derive knowledge from graph-based generative AI. This type of AI is a widely used abstract data representation in various scientific fields. © 2024 The Author(s). Published by IOP Publishing Ltd.},
keywords = {3-dimensional, Analysis process, Digital discovery, Generative adversarial networks, Generative model, generative models, Human capability, Immersive virtual reality, Intelligence models, Quantum entanglement, Quantum optics, Scientific discovery, Scientific understanding, Virtual Reality, Virtual-reality environment},
pubstate = {published},
tppubtype = {article}
}
Chamola, V.; Bansal, G.; Das, T. K.; Hassija, V.; Sai, S.; Wang, J.; Zeadally, S.; Hussain, A.; Yu, F. R.; Guizani, M.; Niyato, D.
Beyond Reality: The Pivotal Role of Generative AI in the Metaverse Journal Article
In: IEEE Internet of Things Magazine, vol. 7, no. 4, pp. 126–135, 2024, ISSN: 2576-3180.
Abstract | Links | BibTeX | Tags: Catalyst, 3D object, Diffusion, Generative adversarial networks, Generative model, Image objects, Immersive, Interconnected network, Metaverses, Physical reality, Video objects, Virtual landscapes, Virtual Reality
@article{chamola_beyond_2024,
title = {Beyond Reality: The Pivotal Role of Generative AI in the Metaverse},
author = {V. Chamola and G. Bansal and T. K. Das and V. Hassija and S. Sai and J. Wang and S. Zeadally and A. Hussain and F. R. Yu and M. Guizani and D. Niyato},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85198004913&doi=10.1109%2fIOTM.001.2300174&partnerID=40&md5=03c679195e42e677de596d7a38df0333},
doi = {10.1109/IOTM.001.2300174},
issn = {2576-3180},
year = {2024},
date = {2024-01-01},
journal = {IEEE Internet of Things Magazine},
volume = {7},
number = {4},
pages = {126–135},
abstract = {The Metaverse, an interconnected network of immersive digital realms, is poised to reshape the future by seamlessly merging physical reality with virtual environments. Its potential to revolutionize diverse aspects of human existence, from entertainment to commerce, underscores its significance. At the heart of this transformation lies Generative AI, a branch of artificial intelligence focused on creating novel content. Generative AI serves as a catalyst, propelling the Metaverse's evolution by enhancing it with immersive experiences. The Metaverse is comprised of three pivotal domains, namely, text, visual, and audio. The Metaverse's fabric intertwines with Generative AI models, ushering in innovative interactions. Within Visual, the triad of image, video, and 3D Object generation sets the stage for engaging virtual landscapes. Key to this evolution is five generative models: Transformers, Diffusion, Autoencoders, Autoregressive, and Generative Adversarial Networks (GANs). These models empower the Metaverse, enhancing it with dynamic and diverse content. Notably, technologies like BARD, Point-E, Stable Diffusion, DALL-E, GPT, and AIVA, among others, wield these models to enrich the Metaverse across domains. By discussing the technical issues and real-world applications, this study reveals the intricate tapestry of AI's role in the Metaverse. Anchoring these insights is a case study illuminating Stable Diffusion's role in metamorphosing the virtual realm. Collectively, this exploration illuminates the symbiotic relationship between Generative AI and the Metaverse, foreshadowing a future where immersive, interactive, and personalized experiences redefine human engagement with digital landscapes. © 2018 IEEE.},
keywords = {Catalyst, 3D object, Diffusion, Generative adversarial networks, Generative model, Image objects, Immersive, Interconnected network, Metaverses, Physical reality, Video objects, Virtual landscapes, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}