AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
OUR RESEARCH
Scientific Publications
2025
Cruz, T. A. Da; Munoz, O.; Giligny, F.; Gouranton, V.
For a Perception of Monumentality in Eastern Arabia from the Neolithic to the Bronze Age: 3D Reconstruction and Multidimensional Simulations of Monuments and Landscapes Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 47–50, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).
Abstract | Links | BibTeX | Tags: 3D reconstruction, 4D simulations, Archaeological Site, Bronze age, Digital elevation model, Eastern Arabia, Eastern arabium, Monumentality, Multidimensional simulation, Simulation virtual realities, Spatial dimension, Temporal dimensions, Three dimensional computer graphics, Virtual Reality
@inproceedings{da_cruz_for_2025,
title = {For a Perception of Monumentality in Eastern Arabia from the Neolithic to the Bronze Age: 3D Reconstruction and Multidimensional Simulations of Monuments and Landscapes},
author = {T. A. Da Cruz and O. Munoz and F. Giligny and V. Gouranton},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005139996&doi=10.1109%2fVRW66409.2025.00018&partnerID=40&md5=14e05ff7019a4c9d712fe42aef776c8d},
doi = {10.1109/VRW66409.2025.00018},
isbn = {979-833151484-6 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {47–50},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The monumentality of Neolithic and Early Bronze Age (6th to 3rd millennium BC) structures in the Arabian Peninsula has never been approached through a comprehensive combination of simulations and reconstructions. As a result, its perception remains understudied. By combining archaeological and paleoenvironmental data, 3D reconstruction, 4D simulations, virtual reality and generative AI, this PhD research project proposes to analyse the perception of monuments, exploring their spatial, visual and temporal dimensions, in order to answer the following question: how can we reconstruct and analyse the perception of monumentality in Eastern Arabia through 4D simulations, and how can the study of this perception influence our understanding of monumentality and territories? This article presents a work in progress, after three months of research, one of which was spent on the Dhabtiyah archaeological site (Saudi Arabia, Eastern Province). © 2025 IEEE.},
keywords = {3D reconstruction, 4D simulations, Archaeological Site, Bronze age, Digital elevation model, Eastern Arabia, Eastern arabium, Monumentality, Multidimensional simulation, Simulation virtual realities, Spatial dimension, Temporal dimensions, Three dimensional computer graphics, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Li, Z.; Zhang, H.; Peng, C.; Peiris, R.
Exploring Large Language Model-Driven Agents for Environment-Aware Spatial Interactions and Conversations in Virtual Reality Role-Play Scenarios Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 1–11, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833153645-9 (ISBN).
Abstract | Links | BibTeX | Tags: Chatbots, Computer simulation languages, Context- awareness, context-awareness, Digital elevation model, Generative AI, Human-AI Interaction, Language Model, Large language model, large language models, Model agents, Role-play simulation, role-play simulations, Role-plays, Spatial interaction, Virtual environments, Virtual Reality, Virtual-reality environment
@inproceedings{li_exploring_2025,
title = {Exploring Large Language Model-Driven Agents for Environment-Aware Spatial Interactions and Conversations in Virtual Reality Role-Play Scenarios},
author = {Z. Li and H. Zhang and C. Peng and R. Peiris},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002706893&doi=10.1109%2fVR59515.2025.00025&partnerID=40&md5=60f22109e054c9035a0c2210bb797039},
doi = {10.1109/VR59515.2025.00025},
isbn = {979-833153645-9 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},
pages = {1–11},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Recent research has begun adopting Large Language Model (LLM) agents to enhance Virtual Reality (VR) interactions, creating immersive chatbot experiences. However, while current studies focus on generating dialogue from user speech inputs, their abilities to generate richer experiences based on the perception of LLM agents' VR environments and interaction cues remain unexplored. Hence, in this work, we propose an approach that enables LLM agents to perceive virtual environments and generate environment-aware interactions and conversations for an embodied human-AI interaction experience in VR environments. Here, we define a schema for describing VR environments and their interactions through text prompts. We evaluate the performance of our method through five role-play scenarios created using our approach in a study with 14 participants. The findings discuss the opportunities and challenges of our proposed approach for developing environment-aware LLM agents that facilitate spatial interactions and conversations within VR role-play scenarios. © 2025 IEEE.},
keywords = {Chatbots, Computer simulation languages, Context- awareness, context-awareness, Digital elevation model, Generative AI, Human-AI Interaction, Language Model, Large language model, large language models, Model agents, Role-play simulation, role-play simulations, Role-plays, Spatial interaction, Virtual environments, Virtual Reality, Virtual-reality environment},
pubstate = {published},
tppubtype = {inproceedings}
}
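Illustrative note (not from the paper): Li et al. describe a text schema that exposes the VR scene and interaction cues to an LLM agent. The minimal Python sketch below shows roughly what such a scene-to-prompt step could look like; the object fields and the call_llm stub are assumptions, not the authors' implementation.

# Illustrative sketch only: a toy schema that turns VR scene state into a text
# prompt for an LLM agent. Names and fields are hypothetical, not from the paper.
from dataclasses import dataclass
from typing import List

@dataclass
class SceneObject:
    name: str
    position: tuple          # (x, y, z) in metres, agent-relative
    state: str               # e.g. "closed", "lit", "held by user"
    affords: List[str]       # interactions the object supports

def describe_scene(objects: List[SceneObject], user_action: str) -> str:
    """Serialize the agent's surroundings and the latest interaction cue."""
    lines = ["You are an embodied NPC in a VR role-play scene.",
             "Objects you can currently perceive:"]
    for o in objects:
        lines.append(f"- {o.name} at {o.position}, state: {o.state}, "
                     f"affords: {', '.join(o.affords)}")
    lines.append(f"The user just did: {user_action}")
    lines.append("Respond in character and, if useful, refer to a nearby object.")
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion backend."""
    raise NotImplementedError

if __name__ == "__main__":
    scene = [SceneObject("oil lamp", (0.4, 1.0, -0.6), "unlit", ["ignite", "pick up"]),
             SceneObject("wooden door", (2.0, 0.0, 1.5), "closed", ["open", "knock"])]
    print(describe_scene(scene, "pointed at the oil lamp"))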
Pielage, L.; Schmidle, P.; Marschall, B.; Risse, B.
Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 878–897, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).
Abstract | Links | BibTeX | Tags: Deep learning, Dermatology, Diffusion Model, diffusion models, Digital elevation model, Generative AI, Graphical user interfaces, Guidance Strategies, Guidance strategy, Image generation, Image generations, Inpainting, Interactive Generation, Medical education, Medical Imaging, Simulation training, Skin lesion, Upsampling, Virtual environments, Virtual Reality
@inproceedings{pielage_interactive_2025,
title = {Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education},
author = {L. Pielage and P. Schmidle and B. Marschall and B. Risse},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001923208&doi=10.1145%2f3708359.3712101&partnerID=40&md5=639eec55b08a54ce813f7c1016c621e7},
doi = {10.1145/3708359.3712101},
isbn = {979-840071306-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {878–897},
publisher = {Association for Computing Machinery},
abstract = {Malignant melanoma is one of the most lethal forms of cancer when not detected early. As a result, cancer screening programs have been implemented internationally, all of which require visual inspection of skin lesions. Early melanoma detection is a crucial competence in medical and dermatological education, and it is primarily trained using 2D imagery. However, given the intrinsic 3D nature of skin lesions and the importance of incorporating additional contextual information about the patient (e.g., skin type, nearby lesions, etc.), this approach falls short of providing a comprehensive and scalable learning experience. A potential solution is the use of Virtual Reality (VR) scenarios, which can offer an effective strategy to train skin cancer screenings in a realistic 3D setting, thereby enhancing medical students' awareness of early melanoma detection. In this paper, we present a comprehensive pipeline and models for generating malignant melanomas and benign nevi, which can be utilized in VR-based medical training. We use diffusion models for the generation of skin lesions, which we have enhanced with various guiding strategies to give educators maximum flexibility in designing scenarios and seamlessly placing lesions on virtual agents. Additionally, we have developed a tool which comprises a graphical user interface (GUI) enabling the generation of new lesions and adapting existing ones using an intuitive and interactive inpainting strategy. The tool also offers a novel custom upsampling strategy to achieve a sufficient resolution required for diagnostic purposes. The generated skin lesions have been validated in a user study with trained dermatologists, confirming the overall high quality of the generated lesions and the utility for educational purposes. © 2025 Copyright held by the owner/author(s).},
keywords = {Deep learning, Dermatology, Diffusion Model, diffusion models, Digital elevation model, Generative AI, Graphical user interfaces, Guidance Strategies, Guidance strategy, Image generation, Image generations, Inpainting, Interactive Generation, Medical education, Medical Imaging, Simulation training, Skin lesion, Upsampling, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
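Illustrative note (not from the paper): the authors build their own guided diffusion pipeline with custom guidance strategies and upsampling. As a generic sketch of diffusion-based inpainting only, the snippet below uses the Hugging Face diffusers library; the checkpoint, prompt, and file names are placeholders.

# Generic diffusion-inpainting sketch (not the authors' pipeline or weights):
# paint a lesion-like patch into a masked skin region.
# Requires: pip install diffusers transformers torch pillow
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

skin = Image.open("skin_patch.png").convert("RGB").resize((512, 512))   # placeholder input
mask = Image.open("lesion_mask.png").convert("L").resize((512, 512))    # white = area to fill

result = pipe(
    prompt="a benign melanocytic nevus on human skin, dermoscopic close-up",
    image=skin,
    mask_image=mask,
    guidance_scale=7.5,        # plain guidance; the paper explores richer strategies
    num_inference_steps=50,
).images[0]
result.save("generated_lesion.png")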
Sabir, A.; Hussain, R.; Pedro, A.; Park, C.
Personalized construction safety training system using conversational AI in virtual reality Journal Article
In: Automation in Construction, vol. 175, 2025, ISSN: 09265805 (ISSN).
Abstract | Links | BibTeX | Tags: Construction safety, Construction safety training, Conversational AI, Digital elevation model, Helmet mounted displays, Language Model, Large language model, large language models, Personalized safety training, Personnel training, Safety training, Training Systems, Virtual environments, Virtual Reality, Workers'
@article{sabir_personalized_2025,
title = {Personalized construction safety training system using conversational AI in virtual reality},
author = {A. Sabir and R. Hussain and A. Pedro and C. Park},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002741042&doi=10.1016%2fj.autcon.2025.106207&partnerID=40&md5=376284339bf10fd5d799cc56c6643d36},
doi = {10.1016/j.autcon.2025.106207},
issn = {09265805 (ISSN)},
year = {2025},
date = {2025-01-01},
journal = {Automation in Construction},
volume = {175},
abstract = {Training workers in safety protocols is crucial for mitigating job site hazards, yet traditional methods often fall short. This paper explores integrating virtual reality (VR) and large language models (LLMs) into iSafeTrainer, an AI-powered safety training system. The system allows trainees to engage with trade-specific content tailored to their expertise level in a third-person perspective in a non-immersive desktop virtual environment, eliminating the need for head-mounted displays. An experimental study evaluated the system through qualitative, survey-based assessments, focusing on user satisfaction, experience, engagement, guidance, and confidence. Results showed high satisfaction rates (>85 %) among novice users, with improved safety knowledge. Expert users suggested advanced scenarios, highlighting the system's potential for expansion. The modular architecture supports customization across various construction settings, ensuring adaptability for future improvements. © 2024},
keywords = {Construction safety, Construction safety training, Conversational AI, Digital elevation model, Helmet mounted displays, Language Model, Large language model, large language models, Personalized safety training, Personnel training, Safety training, Training Systems, Virtual environments, Virtual Reality, Workers'},
pubstate = {published},
tppubtype = {article}
}
Guo, P.; Zhang, Q.; Tian, C.; Xue, W.; Feng, X.
Digital Human Techniques for Education Reform Proceedings Article
In: ICETM - Proc. Int. Conf. Educ. Technol. Manag., pp. 173–178, Association for Computing Machinery, Inc, 2025, ISBN: 979-840071746-8 (ISBN).
Abstract | Links | BibTeX | Tags: Augmented Reality, Contrastive Learning, Digital elevation model, Digital human technique, Digital Human Techniques, Digital humans, Education Reform, Education reforms, Educational Technology, Express emotions, Federated learning, Human behaviors, Human form models, Human techniques, Immersive, Innovative technology, Modeling languages, Natural language processing systems, Teachers', Teaching, Virtual environments, Virtual humans
@inproceedings{guo_digital_2025,
title = {Digital Human Techniques for Education Reform},
author = {P. Guo and Q. Zhang and C. Tian and W. Xue and X. Feng},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001671326&doi=10.1145%2f3711403.3711428&partnerID=40&md5=dd96647315af9409d119f68f9cf4e980},
doi = {10.1145/3711403.3711428},
isbn = {979-840071746-8 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {ICETM - Proc. Int. Conf. Educ. Technol. Manag.},
pages = {173–178},
publisher = {Association for Computing Machinery, Inc},
abstract = {The rapid evolution of artificial intelligence, big data, and generative AI models has ushered in significant transformations across various sectors, including education. Digital Human Technique, an innovative technology grounded in advanced computer science and artificial intelligence, is reshaping educational paradigms by enabling virtual humans to simulate human behavior, express emotions, and interact with users. This paper explores the application of Digital Human Technique in education reform, focusing on creating immersive, intelligent classroom experiences that foster meaningful interactions between teachers and students. We define Digital Human Technique and delve into its key technical components such as character modeling and rendering, natural language processing, computer vision, and augmented reality technologies. Our methodology involves analyzing the role of educational digital humans created through these technologies, assessing their impact on educational processes, and examining various application scenarios in educational reform. Results indicate that Digital Human Technique significantly enhances the learning experience by enabling personalized teaching, increasing engagement, and fostering emotional connections. Educational digital humans serve as virtual teachers, interactive learning aids, and facilitators of emotional interaction, effectively addressing the challenges of traditional educational methods. They also promote a deeper understanding of complex concepts through simulated environments and interactive digital content. © 2024 Copyright held by the owner/author(s).},
keywords = {Augmented Reality, Contrastive Learning, Digital elevation model, Digital human technique, Digital Human Techniques, Digital humans, Education Reform, Education reforms, Educational Technology, Express emotions, Federated learning, Human behaviors, Human form models, Human techniques, Immersive, Innovative technology, Modeling languages, Natural language processing systems, Teachers', Teaching, Virtual environments, Virtual humans},
pubstate = {published},
tppubtype = {inproceedings}
}
Mao, H.; Xu, Z.; Wei, S.; Quan, Y.; Deng, N.; Yang, X.
LLM-powered Gaussian Splatting in VR interactions Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 1654–1655, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6 (ISBN).
Abstract | Links | BibTeX | Tags: 3D Gaussian Splatting, 3D reconstruction, Content creation, Digital elevation model, Gaussians, High quality, Language Model, material analysis, Materials analysis, Physical simulation, Quality rendering, Rendering (computer graphics), Splatting, Virtual Reality, Volume Rendering, VR systems
@inproceedings{mao_llm-powered_2025,
title = {LLM-powered Gaussian Splatting in VR interactions},
author = {H. Mao and Z. Xu and S. Wei and Y. Quan and N. Deng and X. Yang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005148017&doi=10.1109%2fVRW66409.2025.00472&partnerID=40&md5=ee725f655a37251ff335ad2098d15f22},
doi = {10.1109/VRW66409.2025.00472},
isbn = {979-833151484-6 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {1654–1655},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Recent advances in radiance field rendering, particularly 3D Gaussian Splatting (3DGS), have demonstrated significant potential for VR content creation, offering both high-quality rendering and an efficient production pipeline. However, current physics-based interaction systems for 3DGS are limited to either simplistic, unrealistic simulations or require substantial user input for complex scenes, largely due to the lack of scene comprehension. In this demonstration, we present a highly realistic interactive VR system powered by large language models (LLMs). After object-aware GS reconstruction, we prompt GPT-4o to analyze the physical properties of objects in the scene, which then guide physical simulations that adhere to real-world phenomena. Additionally, we design a GPT-assisted GS inpainting module to complete the areas occluded by manipulated objects. To facilitate rich interaction, we introduce a computationally efficient physical simulation framework through a PBD-based unified interpolation method, which supports various forms of physical interactions. In our research demonstrations, we reconstruct a variety of scenes enhanced by the LLM's understanding, showcasing how our VR system can support complex, realistic interactions without additional manual design or annotation. © 2025 IEEE.},
keywords = {3D Gaussian Splatting, 3D reconstruction, Content creation, Digital elevation model, Gaussians, High quality, Language Model, material analysis, Materials analysis, Physical simulation, Quality rendering, Rendering (computer graphics), Splatting, Virtual Reality, Volume Rendering, VR systems},
pubstate = {published},
tppubtype = {inproceedings}
}
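Illustrative note (not from the paper): the system prompts GPT-4o for the physical properties of reconstructed objects and feeds them to a physics step. A minimal sketch of that property-query pattern is given below; the JSON fields, defaults, and the call_llm stub are assumptions.

# Illustrative sketch: ask an LLM for per-object physical properties to drive a
# physics step, roughly in the spirit of the paper. Property names and defaults
# are invented for illustration.
import json

PROMPT_TEMPLATE = """For each object listed, estimate physical properties for a
real-time simulation. Reply with JSON: {{"<object>": {{"mass_kg": float,
"stiffness": float, "friction": float, "deformable": bool}}}}.
Objects: {objects}"""

def call_llm(prompt: str) -> str:
    """Placeholder for a GPT-4o-style chat completion call."""
    raise NotImplementedError

def analyse_materials(object_names):
    raw = call_llm(PROMPT_TEMPLATE.format(objects=", ".join(object_names)))
    props = json.loads(raw)
    # Fall back to rigid defaults for anything the model omitted.
    for name in object_names:
        props.setdefault(name, {"mass_kg": 1.0, "stiffness": 1.0,
                                "friction": 0.5, "deformable": False})
    return props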
Zhou, J.; Weber, R.; Wen, E.; Lottridge, D.
Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 1177–1187, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, Chatbots, Computer interaction, Deep learning, Deep-Learning Dance Model, Design of Human-Computer Interaction, Digital elevation model, Generative AI, Input output programs, Input sequence, Interactivity, Motion capture, Motion tracking, Movement analysis, Output sequences, Problem oriented languages, Real- time, Text mining, Three dimensional computer graphics, User input, Virtual environments, Virtual Reality
@inproceedings{zhou_real-time_2025,
title = {Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance},
author = {J. Zhou and R. Weber and E. Wen and D. Lottridge},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001922427&doi=10.1145%2f3708359.3712077&partnerID=40&md5=cea9213198220480b80b7a4840d26ccc},
doi = {10.1145/3708359.3712077},
isbn = {979-840071306-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {1177–1187},
publisher = {Association for Computing Machinery},
abstract = {Interactive AI chatbots put the power of Large-Language Models (LLMs) into people's hands; it is this interactivity that fueled explosive worldwide influence. In the generative dance space, however, there are few deep-learning-based generative dance models built with interactivity in mind. The release of the AIST++ dance dataset in 2021 led to an uptick of capabilities in generative dance models. Whether these models could be adapted to support interactivity and how well this approach will work is not known. In this study, we explore the capabilities of existing generative dance models for motion-to-motion synthesis on real-time, full-body motion-captured contemporary dance data. We identify an existing model that we adapted to support interactivity: the Bailando++ model, which is trained on the AIST++ dataset and was modified to take music and a motion sequence as input parameters in an interactive loop. We worked with two professional contemporary choreographers and dancers to record and curate a diverse set of 203 motion-captured dance sequences as a set of "user inputs" captured through the Optitrack high-precision motion capture 3D tracking system. We extracted 17 quantitative movement features from the motion data using the well-established Laban Movement Analysis theory, which allowed for quantitative comparisons of inter-movement correlations, which we used for clustering input data and comparing input and output sequences. A total of 10 pieces of music were used to generate a variety of outputs using the adapted Bailando++ model. We found that, on average, the generated output motion achieved only moderate correlations to the user input, with some exceptions of movement and music pairs achieving high correlation. The high-correlation generated output sequences were deemed responsive and relevant co-creations in relation to the input sequences. We discuss implications for interactive generative dance agents, where 3D joint coordinate data should be used over SMPL parameters for ease of real-time generation, and how Laban Movement Analysis could be used to extract useful features and fine-tune deep-learning models. © 2025 Copyright held by the owner/author(s).},
keywords = {3D modeling, Chatbots, Computer interaction, Deep learning, Deep-Learning Dance Model, Design of Human-Computer Interaction, Digital elevation model, Generative AI, Input output programs, Input sequence, Interactivity, Motion capture, Motion tracking, Movement analysis, Output sequences, Problem oriented languages, Real- time, Text mining, Three dimensional computer graphics, User input, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
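Illustrative note (not from the paper): the study extracts 17 Laban-inspired movement features and correlates user input with generated output. The simplified Python sketch below computes two stand-in kinematic features (speed and spatial extent) and their mean Pearson correlation; it is not the authors' feature set.

# Simplified sketch of comparing an input dance sequence with a generated one:
# extract a few kinematic features per frame and correlate them.
import numpy as np

def movement_features(joints: np.ndarray) -> np.ndarray:
    """joints: (frames, num_joints, 3) array of 3D joint positions."""
    velocity = np.linalg.norm(np.diff(joints, axis=0), axis=-1)   # per-joint speed
    speed = velocity.mean(axis=1)                                 # avg speed per frame
    extent = joints.max(axis=1) - joints.min(axis=1)              # bounding box per frame
    extent = np.linalg.norm(extent, axis=-1)[1:]                  # align lengths
    return np.stack([speed, extent], axis=1)                      # (frames-1, 2)

def feature_correlation(user_seq: np.ndarray, generated_seq: np.ndarray) -> float:
    """Mean Pearson correlation across features, truncated to the shorter sequence."""
    a, b = movement_features(user_seq), movement_features(generated_seq)
    n = min(len(a), len(b))
    corrs = [np.corrcoef(a[:n, i], b[:n, i])[0, 1] for i in range(a.shape[1])]
    return float(np.mean(corrs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    user = rng.normal(size=(120, 21, 3)).cumsum(axis=0)      # fake mocap, 120 frames
    gen = user + rng.normal(scale=0.5, size=user.shape)      # loosely related output
    print(f"mean feature correlation: {feature_correlation(user, gen):.2f}")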
2024
Taheri, M.; Tan, K.
Enhancing Presentation Skills: A Virtual Reality-Based Simulator with Integrated Generative AI for Dynamic Pitch Presentations and Interviews Proceedings Article
In: L.T., De Paolis; P., Arpaia; M., Sacco (Ed.): Lect. Notes Comput. Sci., pp. 360–366, Springer Science and Business Media Deutschland GmbH, 2024, ISBN: 03029743 (ISSN); 978-303171706-2 (ISBN).
Abstract | Links | BibTeX | Tags: Adversarial machine learning, AI feedback, Contrastive Learning, Digital elevation model, Dynamic pitch, Federated learning, feedback, Generative adversarial networks, Iterative practice, Language Model, Open source language, Open source software, Presentation skills, Simulation Design, Spoken words, Trial and error, Virtual environments, Virtual reality based simulators
@inproceedings{taheri_enhancing_2024,
title = {Enhancing Presentation Skills: A Virtual Reality-Based Simulator with Integrated Generative AI for Dynamic Pitch Presentations and Interviews},
author = {M. Taheri and K. Tan},
editor = {De Paolis L.T. and Arpaia P. and Sacco M.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204618832&doi=10.1007%2f978-3-031-71707-9_30&partnerID=40&md5=fd649ec5c0e2ce96593fe8a129e94449},
doi = {10.1007/978-3-031-71707-9_30},
isbn = {03029743 (ISSN); 978-303171706-2 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15027 LNCS},
pages = {360–366},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Presenting before an audience poses challenges throughout preparation and delivery, necessitating tools to refine skills securely. Interviews mirror presentations, showcasing oneself to convey qualifications. Virtual environments offer safe spaces for trial and error, crucial for iterative practice without emotional distress. This research proposes a Virtual Reality-Based Dynamic Pitch Simulation with Integrated Generative AI to effectively enhance presentation skills. The simulation converts spoken words to text, then uses AI to generate relevant questions for practice. Benefits include realistic feedback and adaptability to user proficiency. Open-source language models evaluate content, coherence, and delivery, offering personalized challenges. This approach supplements learning, enhancing presentation skills effectively. Voice-to-text conversion and AI feedback create a potent pedagogical tool, fostering a prompt feedback loop vital for learning effectiveness. Challenges in simulation design must be addressed for robustness and efficacy. The study validates these concepts by proposing a real-time 3D dialogue simulator, emphasizing the importance of continual improvement in presentation skill development. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.},
keywords = {Adversarial machine learning, AI feedback, Contrastive Learning, Digital elevation model, Dynamic pitch, Federated learning, feedback, Generative adversarial networks, Iterative practice, Language Model, Open source language, Open source software, Presentation skills, Simulation Design, Spoken words, Trial and error, Virtual environments, Virtual reality based simulators},
pubstate = {published},
tppubtype = {inproceedings}
}
Hart, A.; Shakir, M. Z.
Realtime AI Driven Environment Development for Virtual Metaverse Proceedings Article
In: IEEE Int. Conf. Metrol. Ext. Real., Artif. Intell. Neural Eng., MetroXRAINE - Proc., pp. 313–318, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037800-9 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, 3D models, 3d-modeling, AI in Metaverse Development, Artificial intelligence in metaverse development, Digital elevation model, Digital Innovation, Digital innovations, Metaverses, Real- time, Real-Time Adaptation, Scalable virtual world, Scalable Virtual Worlds, Unity Integration, Virtual environments, Virtual worlds
@inproceedings{hart_realtime_2024,
title = {Realtime AI Driven Environment Development for Virtual Metaverse},
author = {A. Hart and M. Z. Shakir},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216090810&doi=10.1109%2fMetroXRAINE62247.2024.10796022&partnerID=40&md5=e339d3117291e480231b7bc32f117506},
doi = {10.1109/MetroXRAINE62247.2024.10796022},
isbn = {979-835037800-9 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Metrol. Ext. Real., Artif. Intell. Neural Eng., MetroXRAINE - Proc.},
pages = {313–318},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The integration of Artificial Intelligence (AI) into the development of Metaverse environments denotes a noteworthy shift towards crafting virtual spaces with improved interactivity, immersion, and realism. This study delves into the various roles AI plays in using 3D models and enriching experiences in virtual and augmented reality to create scalable, dynamic virtual environments. It carefully examines the challenges related to computational demands, such as processing power and data storage, scalability issues, and ethical considerations concerning privacy and the misuse of AI-generated content. By exploring AI's application in game engine platforms such as Unity through ongoing research, this paper highlights the technical achievements and ever-growing possibilities unlocked by AI, such as creating lifelike virtual environments. © 2024 IEEE.},
keywords = {3D modeling, 3D models, 3d-modeling, AI in Metaverse Development, Artificial intelligence in metaverse development, Digital elevation model, Digital Innovation, Digital innovations, Metaverses, Real- time, Real-Time Adaptation, Scalable virtual world, Scalable Virtual Worlds, Unity Integration, Virtual environments, Virtual worlds},
pubstate = {published},
tppubtype = {inproceedings}
}
Jayaraman, S.; Bhavya, R.; Srihari, V.; Rajam, V. Mary Anita
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions Proceedings Article
In: IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037687-6 (ISBN).
Abstract | Links | BibTeX | Tags: Adversarial networks, Computer simulation languages, Deep learning, Depth Estimation, Depth perception, Diffusion Model, diffusion models, Digital elevation model, Generative adversarial networks, Generative model, Generative systems, Language Model, Motion capture, Stereo image processing, Text-to-image, Training data, Video analysis, Video-clips, Virtual environments, Virtual Reality
@inproceedings{jayaraman_texavi_2024,
title = {TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions},
author = {S. Jayaraman and R. Bhavya and V. Srihari and V. Mary Anita Rajam},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215265234&doi=10.1109%2fCVMI61877.2024.10782691&partnerID=40&md5=8e20576af67b917ecfad83873a87ef29},
doi = {10.1109/CVMI61877.2024.10782691},
isbn = {979-835037687-6 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {While generative models such as text-to-image, large language models and text-to-video have seen significant progress, the extension to text-to-virtual-reality remains largely unexplored, due to a deficit in training data and the complexity of achieving realistic depth and motion in virtual environments. This paper proposes an approach to coalesce existing generative systems to form a stereoscopic virtual reality video from text. Carried out in three main stages, we start with a base text-to-image model that captures context from an input text. We then employ Stable Diffusion on the rudimentary image produced, to generate frames with enhanced realism and overall quality. These frames are processed with depth estimation algorithms to create left-eye and right-eye views, which are stitched side-by-side to create an immersive viewing experience. Such systems would be highly beneficial in virtual reality production, since filming and scene building often require extensive hours of work and post-production effort. We utilize image evaluation techniques, specifically Fréchet Inception Distance and CLIP Score, to assess the visual quality of frames produced for the video. These quantitative measures establish the proficiency of the proposed method. Our work highlights the exciting possibilities of using natural language-driven graphics in fields like virtual reality simulations. © 2024 IEEE.},
keywords = {Adversarial networks, Computer simulation languages, Deep learning, Depth Estimation, Depth perception, Diffusion Model, diffusion models, Digital elevation model, Generative adversarial networks, Generative model, Generative systems, Language Model, Motion capture, Stereo image processing, Text-to-image, Training data, Video analysis, Video-clips, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
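Illustrative note (not from the paper): TexAVi's pipeline runs text-to-image generation, Stable Diffusion refinement, depth estimation, and side-by-side stereo stitching. The sketch below covers only the last step, turning one already-generated frame plus a depth map into a naive side-by-side stereo pair; the simple disparity model and file names are assumptions.

# Sketch of the final stereo step only: one generated frame plus a depth map
# becomes a naive side-by-side stereo pair via disparity-based pixel shifting.
import numpy as np
from PIL import Image

def stereo_side_by_side(frame: np.ndarray, depth: np.ndarray,
                        max_disparity_px: int = 12) -> np.ndarray:
    """frame: (H, W, 3) uint8; depth: (H, W) float, larger = closer."""
    h, w, _ = frame.shape
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)   # normalise to [0, 1]
    shift = (d * max_disparity_px).astype(int)                       # closer pixels shift more
    left = np.zeros_like(frame)
    right = np.zeros_like(frame)
    cols = np.arange(w)
    for y in range(h):
        left[y, np.clip(cols + shift[y], 0, w - 1)] = frame[y, cols]
        right[y, np.clip(cols - shift[y], 0, w - 1)] = frame[y, cols]
    return np.concatenate([left, right], axis=1)                     # (H, 2W, 3)

if __name__ == "__main__":
    frame = np.array(Image.open("generated_frame.png").convert("RGB"))
    depth = np.array(Image.open("depth_map.png").convert("L"), dtype=np.float32)
    Image.fromarray(stereo_side_by_side(frame, depth)).save("stereo_sbs.png")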
Geetha, S.; Aditya, G.; Reddy, M. Chetan; Nischith, G.
Human Interaction in Virtual and Mixed Reality Through Hand Tracking Proceedings Article
In: Proc. CONECCT - IEEE Int. Conf. Electron., Comput. Commun. Technol., Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835038592-2 (ISBN).
Abstract | Links | BibTeX | Tags: Computer interaction, Computer simulation languages, Daily lives, Digital elevation model, Hand gesture, hand tracking, Hand-tracking, human-computer interaction, Humaninteraction, Interaction dynamics, Mixed reality, Unity, User friendly interface, User interfaces, Virtual environments, Virtual Reality, Virtual spaces
@inproceedings{geetha_human_2024,
title = {Human Interaction in Virtual and Mixed Reality Through Hand Tracking},
author = {S. Geetha and G. Aditya and M. Chetan Reddy and G. Nischith},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85205768661&doi=10.1109%2fCONECCT62155.2024.10677239&partnerID=40&md5=173e590ca9a1e30b760d05af562f311a},
doi = {10.1109/CONECCT62155.2024.10677239},
isbn = {979-835038592-2 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. CONECCT - IEEE Int. Conf. Electron., Comput. Commun. Technol.},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper explores the potential and possibilities of hand tracking in virtual reality (VR) and mixed reality (MR), focusing on its role in human interaction dynamics. An application was designed in Unity leveraging the XR Interaction toolkit, within which various items across three domains (daily life, education, and recreation) were crafted to demonstrate the versatility of hand tracking along with hand gesture-based shortcuts for interaction. Integration of elements in MR ensures that users can seamlessly enjoy virtual experiences while remaining connected to their physical surroundings. Precise hand tracking enables effortless interaction with the virtual space, enhancing presence and control with a user-friendly interface. Additionally, the paper explores the effectiveness of integrating hand tracking into education and training scenarios. A computer assembly simulation was created to demonstrate this, featuring component inspection and zoom capabilities along with a large language model (LLM) integrated with hand gestures to provide interaction capabilities. © 2024 IEEE.},
keywords = {Computer interaction, Computer simulation languages, Daily lives, Digital elevation model, Hand gesture, hand tracking, Hand-tracking, human-computer interaction, Humaninteraction, Interaction dynamics, Mixed reality, Unity, User friendly interface, User interfaces, Virtual environments, Virtual Reality, Virtual spaces},
pubstate = {published},
tppubtype = {inproceedings}
}
Min, Y.; Jeong, J. -W.
Public Speaking Q&A Practice with LLM-Generated Personas in Virtual Reality Proceedings Article
In: U., Eck; M., Sra; J., Stefanucci; M., Sugimoto; M., Tatzgern; I., Williams (Ed.): Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct, pp. 493–496, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833150691-9 (ISBN).
Abstract | Links | BibTeX | Tags: Digital elevation model, Economic and social effects, Language Model, Large language model-based persona generation, LLM-based Persona Generation, Model-based OPC, Personnel training, Power, Practice systems, Presentation Anxiety, Public speaking, Q&A practice, user experience, Users' experiences, Virtual environments, Virtual Reality, VR training
@inproceedings{min_public_2024,
title = {Public Speaking Q&A Practice with LLM-Generated Personas in Virtual Reality},
author = {Y. Min and J. -W. Jeong},
editor = {Eck U. and Sra M. and Stefanucci J. and Sugimoto M. and Tatzgern M. and Williams I.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214393734&doi=10.1109%2fISMAR-Adjunct64951.2024.00143&partnerID=40&md5=992d9599bde26f9d57d549639869d124},
doi = {10.1109/ISMAR-Adjunct64951.2024.00143},
isbn = {979-833150691-9 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct},
pages = {493–496},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper introduces a novel VR-based Q&A practice system that harnesses the power of Large Language Models (LLMs). We support Q&A practice for upcoming public speaking by providing an immersive VR training environment populated with LLM-generated audiences, each capable of posing diverse and realistic questions based on different personas. We conducted a pilot user study involving 20 participants who engaged in VR-based Q&A practice sessions. The sessions featured a variety of questions regarding presentation material provided by the participants, all of which were generated by LLM-based personas. Through post-surveys and interviews, we evaluated the effectiveness of the proposed method. The participants valued the system for engagement and focus while also identifying several areas for improvement. Our study demonstrated the potential of integrating VR and LLMs to create a powerful, immersive tool for Q&A practice. © 2024 IEEE.},
keywords = {Digital elevation model, Economic and social effects, Language Model, Large language model-based persona generation, LLM-based Persona Generation, Model-based OPC, Personnel training, Power, Practice systems, Presentation Anxiety, Public speaking, Q&A practice, user experience, Users' experiences, Virtual environments, Virtual Reality, VR training},
pubstate = {published},
tppubtype = {inproceedings}
}
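Illustrative note (not from the paper): the system populates the virtual audience with LLM-generated personas that ask questions about the user's presentation material. The sketch below shows one way persona-conditioned question prompts could be assembled; the persona fields and call_llm stub are invented for illustration.

# Illustrative sketch of persona-conditioned audience questions, in the spirit
# of the paper but not its implementation.
PERSONAS = [
    {"role": "skeptical investor", "focus": "business viability and numbers"},
    {"role": "domain expert",      "focus": "technical depth and methodology"},
    {"role": "curious student",    "focus": "clarity of the core idea"},
]

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion backend."""
    raise NotImplementedError

def audience_questions(presentation_summary: str, n_per_persona: int = 2):
    questions = {}
    for p in PERSONAS:
        prompt = (f"You are a {p['role']} in the audience of a talk. "
                  f"You mainly care about {p['focus']}. "
                  f"Given this summary of the talk:\n{presentation_summary}\n"
                  f"Ask {n_per_persona} short, realistic questions.")
        questions[p["role"]] = call_llm(prompt)
    return questions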
Jia, Y.; Sin, Z. P. T.; Wang, X. E.; Li, C.; Ng, P. H. F.; Huang, X.; Dong, J.; Wang, Y.; Baciu, G.; Cao, J.; Li, Q.
NivTA: Towards a Naturally Interactable Edu-Metaverse Teaching Assistant for CAVE Proceedings Article
In: Proc. - IEEE Int. Conf. Metaverse Comput., Netw., Appl., MetaCom, pp. 57–64, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833151599-7 (ISBN).
Abstract | Links | BibTeX | Tags: Active learning, Adversarial machine learning, cave automatic virtual environment, Cave automatic virtual environments, Caves, Chatbots, Contrastive Learning, Digital elevation model, Federated learning, Interactive education, Language Model, Large language model agent, Learning Activity, LLM agents, Metaverses, Model agents, Natural user interface, Students, Teaching, Teaching assistants, Virtual environments, Virtual Reality, virtual teaching assistant, Virtual teaching assistants
@inproceedings{jia_nivta_2024,
title = {NivTA: Towards a Naturally Interactable Edu-Metaverse Teaching Assistant for CAVE},
author = {Y. Jia and Z. P. T. Sin and X. E. Wang and C. Li and P. H. F. Ng and X. Huang and J. Dong and Y. Wang and G. Baciu and J. Cao and Q. Li},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211447638&doi=10.1109%2fMetaCom62920.2024.00023&partnerID=40&md5=efefd453c426e74705518254bdc49e87},
doi = {10.1109/MetaCom62920.2024.00023},
isbn = {979-833151599-7 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Conf. Metaverse Comput., Netw., Appl., MetaCom},
pages = {57–64},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Edu-metaverse is a specialized metaverse dedicated for interactive education in an immersive environment. Its main purpose is to immerse the learners in a digital environment and conduct learning activities that could mirror reality. Not only does it enable activities that may be difficult to perform in the real world, but it also extends the interaction to personalized and CL. This is a more effective pedagogical approach as it tends to enhance the motivation and engagement of students and increases their active participation in lessons delivered. To this end, we propose to realize an interactive virtual teaching assistant called NivTA. To make NivTA easily accessible and engaging by multiple users simultaneously, we also propose to use a CAVE virtual environment (CAVE-VR) as a "metaverse window" into concepts, ideas, topics, and learning activities. The students simply need to step into the CAVE-VR and interact with a life-size teaching assistant that they can engage with naturally, as if they are approaching a real person. Instead of the text-based interaction currently developed for large language models (LLM), NivTA is given additional cues regarding the users so it can react more naturally via a specific prompt design. For example, the user can simply point to an educational concept and ask NivTA to explain what it is. To guide NivTA onto the educational concept, the prompt is also designed to feed in an educational KG to provide NivTA with the context of the student's question. The NivTA system is an integration of several components that are discussed in this paper. We further describe how the system is designed and implemented, along with potential applications and future work on interactive collaborative edu-metaverse environments dedicated for teaching and learning. © 2024 IEEE.},
keywords = {Active learning, Adversarial machine learning, cave automatic virtual environment, Cave automatic virtual environments, Caves, Chatbots, Contrastive Learning, Digital elevation model, Federated learning, Interactive education, Language Model, Large language model agent, Learning Activity, LLM agents, Metaverses, Model agents, Natural user interface, Students, Teaching, Teaching assistants, Virtual environments, Virtual Reality, virtual teaching assistant, Virtual teaching assistants},
pubstate = {published},
tppubtype = {inproceedings}
}
Shabanijou, M.; Sharma, V.; Ray, S.; Lu, R.; Xiong, P.
Large Language Model Empowered Spatio-Visual Queries for Extended Reality Environments Proceedings Article
In: W., Ding; C.-T., Lu; F., Wang; L., Di; K., Wu; J., Huan; R., Nambiar; J., Li; F., Ilievski; R., Baeza-Yates; X., Hu (Ed.): Proc. - IEEE Int. Conf. Big Data, BigData, pp. 5843–5846, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835036248-0 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, Digital elevation model, Emerging applications, Immersive environment, Language Model, Metaverses, Modeling languages, Natural language interfaces, Query languages, spatial data, Spatial queries, Structured Query Language, Technological advances, Users perspective, Virtual environments, Visual languages, Visual query
@inproceedings{shabanijou_large_2024,
title = {Large Language Model Empowered Spatio-Visual Queries for Extended Reality Environments},
author = {M. Shabanijou and V. Sharma and S. Ray and R. Lu and P. Xiong},
editor = {Ding W. and Lu C.-T. and Wang F. and Di L. and Wu K. and Huan J. and Nambiar R. and Li J. and Ilievski F. and Baeza-Yates R. and Hu X.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85218011140&doi=10.1109%2fBigData62323.2024.10825084&partnerID=40&md5=fdd78814b8e19830d1b8ecd4b33b0102},
doi = {10.1109/BigData62323.2024.10825084},
isbn = {979-835036248-0 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Conf. Big Data, BigData},
pages = {5843–5846},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {With the technological advances in creation and capture of 3D spatial data, new emerging applications are being developed. Digital Twins, metaverse and extended reality (XR) based immersive environments can be enriched by leveraging geocoded 3D spatial data. Unlike 2D spatial queries, queries involving 3D immersive environments need to take the query user's viewpoint into account. Spatio-visual queries return objects that are visible from the user's perspective. In this paper, we propose enhancing 3D spatio-visual queries with large language models (LLMs). These kinds of queries allow a user to interact with the visible objects using a natural language interface. We have implemented a proof-of-concept prototype and conducted preliminary evaluation. Our results demonstrate the potential of truly interactive immersive environments. © 2024 IEEE.},
keywords = {3D modeling, Digital elevation model, Emerging applications, Immersive environment, Language Model, Metaverses, Modeling languages, Natural language interfaces, Query languages, spatial data, Spatial queries, Structured Query Language, Technological advances, Users perspective, Virtual environments, Visual languages, Visual query},
pubstate = {published},
tppubtype = {inproceedings}
}
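Illustrative note (not from the paper): spatio-visual queries return only the objects visible from the user's viewpoint before the language interface is applied. The sketch below implements a simplified visibility filter (view-cone test, no occlusion handling); the thresholds and scene format are assumptions.

# Sketch of the spatial half of a spatio-visual query: keep only objects inside
# the user's view cone, then hand their names to a language interface.
import numpy as np

def visible_objects(objects, eye, view_dir, fov_deg=90.0, max_dist=50.0):
    """objects: list of (name, (x, y, z)); eye and view_dir are 3-vectors."""
    view_dir = np.asarray(view_dir, float)
    view_dir /= np.linalg.norm(view_dir)
    cos_half_fov = np.cos(np.radians(fov_deg) / 2)
    result = []
    for name, pos in objects:
        to_obj = np.asarray(pos, float) - np.asarray(eye, float)
        dist = np.linalg.norm(to_obj)
        if 0 < dist <= max_dist and np.dot(to_obj / dist, view_dir) >= cos_half_fov:
            result.append((name, dist))
    return sorted(result, key=lambda x: x[1])

if __name__ == "__main__":
    scene = [("fountain", (3, 0, 10)), ("statue", (-20, 0, 2)), ("bench", (1, 0, 4))]
    print(visible_objects(scene, eye=(0, 1.7, 0), view_dir=(0, 0, 1)))
    # The visible names could then be embedded in a natural-language query prompt.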
Si, J.; Yang, S.; Song, J.; Son, S.; Lee, S.; Kim, D.; Kim, S.
Generating and Integrating Diffusion Model-Based Panoramic Views for Virtual Interview Platform Proceedings Article
In: IEEE Int. Conf. Artif. Intell. Eng. Technol., IICAIET, pp. 343–348, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835038969-2 (ISBN).
Abstract | Links | BibTeX | Tags: AI, Deep learning, Diffusion, Diffusion Model, Diffusion technology, Digital elevation model, High quality, Manual process, Model-based OPC, New approaches, Panorama, Panoramic views, Virtual environments, Virtual Interview, Virtual Reality
@inproceedings{si_generating_2024,
title = {Generating and Integrating Diffusion Model-Based Panoramic Views for Virtual Interview Platform},
author = {J. Si and S. Yang and J. Song and S. Son and S. Lee and D. Kim and S. Kim},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209663031&doi=10.1109%2fIICAIET62352.2024.10730450&partnerID=40&md5=a52689715ec912c54696948c34fc0263},
doi = {10.1109/IICAIET62352.2024.10730450},
isbn = {979-835038969-2 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Artif. Intell. Eng. Technol., IICAIET},
pages = {343–348},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper presents a new approach to improve virtual interview platforms in education, which are gaining significant attention. This study aims to simplify the complex manual process of equipment setup to enhance the realism and reliability of virtual interviews. To this end, this study proposes a method for automatically constructing 3D virtual interview environments using diffusion technology in generative AI. In this research, we exploit a diffusion model capable of generating high-quality panoramic images. We generate images of interview rooms capable of delivering immersive interview experiences via refined text prompts. The resulting imagery is then reconstituted into 3D VR content utilizing the Unity engine, facilitating enhanced interaction and engagement within virtual environments. This research compares and analyzes various methods presented in related research and proposes a new process for efficiently constructing 360-degree virtual environments. Wearing an Oculus Quest 2 to experience the virtual environment created using the proposed method produced a high sense of immersion, similar to the actual interview environment. © 2024 IEEE.},
keywords = {AI, Deep learning, Diffusion, Diffusion Model, Diffusion technology, Digital elevation model, High quality, Manual process, Model-based OPC, New approaches, Panorama, Panoramic views, Virtual environments, Virtual Interview, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
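Illustrative note (not from the paper): the authors generate panoramic interview-room imagery with a diffusion model and rebuild it as VR content in Unity. As a generic stand-in only, the sketch below produces a 2:1 panorama-style image with the diffusers library; the checkpoint, prompt, and resolution are placeholders, and the Unity skybox import is a separate manual step.

# Generic text-to-image sketch (not the authors' model): a 2:1 panorama-style
# interview-room image that could be imported into Unity as a skybox texture.
# Requires: pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

image = pipe(
    prompt="360 degree equirectangular panorama of a formal interview room, "
           "long table, three empty chairs, soft office lighting",
    width=1024, height=512,          # 2:1 ratio expected for equirectangular skyboxes
    num_inference_steps=30,
).images[0]
image.save("interview_room_panorama.png")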
Jiang, H.; Song, L.; Weng, D.; Sun, Z.; Li, H.; Dongye, X.; Zhang, Z.
In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces Proceedings Article
In: MM - Proc. ACM Int. Conf. Multimed., pp. 3666–3675, Association for Computing Machinery, Inc, 2024, ISBN: 979-840070686-8 (ISBN).
Abstract | Links | BibTeX | Tags: 3D modeling, 3D scenes, affordance, Affordances, Chatbots, Computer simulation languages, Digital elevation model, Embodied interfaces, Language Model, Large language model, Physical environments, Scene synthesis, Synthesised, Three dimensional computer graphics, user demand, User demands, Virtual environments, Virtual Reality, Virtual scenes
@inproceedings{jiang_situ_2024,
title = {In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces},
author = {H. Jiang and L. Song and D. Weng and Z. Sun and H. Li and X. Dongye and Z. Zhang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209812307&doi=10.1145%2f3664647.3681616&partnerID=40&md5=e58acd404c8785868c69a4647cecacb2},
doi = {10.1145/3664647.3681616},
isbn = {979-840070686-8 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {MM - Proc. ACM Int. Conf. Multimed.},
pages = {3666–3675},
publisher = {Association for Computing Machinery, Inc},
abstract = {Virtual reality enables us to access and interact with immersive virtual environments anytime and anywhere in various fields such as entertainment, training, and education. However, users immersed in virtual scenes remain physically connected to their real-world surroundings, which can pose safety and immersion challenges. Although virtual scene synthesis has attracted widespread attention, many popular methods are limited to generating purely virtual scenes independent of physical environments or simply mapping physical objects as obstacles. To this end, we propose a scene agent that synthesizes situated 3D virtual scenes as a kind of ubiquitous embodied interface in VR for users. The scene agent synthesizes scenes by perceiving the user's physical environment as well as inferring the user's demands. The synthesized scenes maintain the affordances of the physical environment, enabling immersive users to interact with the physical environment and improving the user's sense of security. Meanwhile, the synthesized scenes maintain the style described by the user, improving the user's immersion. The comparison results show that the proposed scene agent can synthesize virtual scenes with better affordance maintenance, scene diversity, style maintenance, and 3D intersection over union compared to baselines. To the best of our knowledge, this is the first work that achieves in situ scene synthesis with virtual-real affordance consistency and user demand. © 2024 ACM.},
keywords = {3D modeling, 3D scenes, affordance, Affordances, Chatbots, Computer simulation languages, Digital elevation model, Embodied interfaces, Language Model, Large language model, Physical environments, Scene synthesis, Synthesised, Three dimensional computer graphics, user demand, User demands, Virtual environments, Virtual Reality, Virtual scenes},
pubstate = {published},
tppubtype = {inproceedings}
}
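Illustrative note (not from the paper): the scene agent keeps the affordances of detected physical objects while restyling them to match the user's request. The toy sketch below shows that substitution idea; the style catalogue and object fields are invented for illustration.

# Toy sketch of affordance-preserving substitution: each detected physical object
# is replaced by a virtual object with the same affordance and footprint.
CATALOGUE = {
    "medieval": {"sittable": "wooden stool", "leanable": "stone wall", "walkable": "dirt path"},
    "sci-fi":   {"sittable": "cockpit seat", "leanable": "bulkhead",   "walkable": "deck plating"},
}

def synthesize_scene(physical_objects, style="medieval"):
    """physical_objects: list of dicts with 'affordance', 'position', 'size'."""
    virtual_scene = []
    for obj in physical_objects:
        proxy = CATALOGUE[style].get(obj["affordance"], "decorative prop")
        virtual_scene.append({"asset": proxy,
                              "position": obj["position"],   # keep real-world placement
                              "size": obj["size"]})          # keep real-world footprint
    return virtual_scene

if __name__ == "__main__":
    detected = [{"affordance": "sittable", "position": (1.0, 0.0, 0.5), "size": (0.5, 0.45, 0.5)},
                {"affordance": "walkable", "position": (0.0, 0.0, 0.0), "size": (3.0, 0.0, 3.0)}]
    for item in synthesize_scene(detected, style="sci-fi"):
        print(item)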
Rosati, R.; Senesi, P.; Lonzi, B.; Mancini, A.; Mandolini, M.
An automated CAD-to-XR framework based on generative AI and Shrinkwrap modelling for a User-Centred design approach Journal Article
In: Advanced Engineering Informatics, vol. 62, 2024, ISSN: 14740346 (ISSN).
Abstract | Links | BibTeX | Tags: Adversarial networks, Artificial intelligence, CAD-to-XR, Computer aided design models, Computer aided logic design, Computer-aided design, Computer-aided design-to-XR, Design simplification, Digital elevation model, Digital storage, Extended reality, Flow visualization, Generative adversarial networks, Guns (armament), Helmet mounted displays, Intellectual property core, Mixed reality, Photo-realistic, Shrinkfitting, Structural dynamics, User centered design, User-centered design, User-centered design approaches, User-centred, Virtual Prototyping, Work-flows
@article{rosati_automated_2024,
title = {An automated CAD-to-XR framework based on generative AI and Shrinkwrap modelling for a User-Centred design approach},
author = {R. Rosati and P. Senesi and B. Lonzi and A. Mancini and M. Mandolini},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85204897460&doi=10.1016%2fj.aei.2024.102848&partnerID=40&md5=3acce73b986bed7a9de42e6336d637ad},
doi = {10.1016/j.aei.2024.102848},
issn = {14740346 (ISSN)},
year = {2024},
date = {2024-01-01},
journal = {Advanced Engineering Informatics},
volume = {62},
abstract = {CAD-to-XR is the workflow to generate interactive Photorealistic Virtual Prototypes (iPVPs) for Extended Reality (XR) apps from Computer-Aided Design (CAD) models. This process entails modelling, texturing, and XR programming. In the literature, no automatic CAD-to-XR frameworks simultaneously manage CAD simplification and texturing. There are no examples of their adoption for User-Centered Design (UCD). Moreover, such CAD-to-XR workflows do not seize the potentialities of generative algorithms to produce synthetic images (textures). The paper presents a framework for implementing the CAD-to-XR workflow. The solution consists of a module for texture generation based on Generative Adversarial Networks (GANs). The generated texture is then managed by another module (based on Shrinkwrap modelling) to develop the iPVP by simplifying the 3D model and UV mapping the generated texture. The geometric and material data is integrated into a graphic engine, which allows for programming an interactive experience with the iPVP in XR. The CAD-to-XR framework was validated on two components (rifle stock and forend) of a sporting rifle. The solution can automate the texturing process of different product versions in shorter times (compared to a manual procedure). After each product revision, it avoids tedious and manual activities required to generate a new iPVP. The image quality metrics highlight that images are generated in a “realistic” manner (the perceived quality of generated textures is highly comparable to real images). The quality of the iPVPs, generated through the proposed framework and visualised by users through a mixed reality head-mounted display, is equivalent to traditionally designed prototypes. © 2024 The Author(s)},
keywords = {Adversarial networks, Artificial intelligence, CAD-to-XR, Computer aided design models, Computer aided logic design, Computer-aided design, Computer-aided design-to-XR, Design simplification, Digital elevation model, Digital storage, Extended reality, Flow visualization, Generative adversarial networks, Guns (armament), Helmet mounted displays, Intellectual property core, Mixed reality, Photo-realistic, Shrinkfitting, Structural dynamics, User centered design, User-centered design, User-centered design approaches, User-centred, Virtual Prototyping, Work-flows},
pubstate = {published},
tppubtype = {article}
}
de Oliveira, E. A. Masasi; Silva, D. F. C.; Filho, A. R. G.
Improving VR Accessibility Through Automatic 360 Scene Description Using Multimodal Large Language Models Proceedings Article
In: ACM Int. Conf. Proc. Ser., pp. 289–293, Association for Computing Machinery, 2024, ISBN: 979-840070979-1 (ISBN).
Abstract | Links | BibTeX | Tags: 3D Scene, 3D scenes, Accessibility, Computer simulation languages, Descriptive information, Digital elevation model, Immersive, Language Model, Multi-modal, Multimodal large language model, Multimodal Large Language Models (MLLMs), Scene description, Virtual environments, Virtual Reality, Virtual Reality (VR), Virtual reality technology
@inproceedings{masasi_de_oliveira_improving_2024,
title = {Improving VR Accessibility Through Automatic 360 Scene Description Using Multimodal Large Language Models},
author = {E. A. Masasi de Oliveira and D. F. C. Silva and A. R. G. Filho},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206580797&doi=10.1145%2f3691573.3691619&partnerID=40&md5=6e80800fce0e6b56679fbcbe982bcfa7},
doi = {10.1145/3691573.3691619},
isbn = {979-840070979-1 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {ACM Int. Conf. Proc. Ser.},
pages = {289–293},
publisher = {Association for Computing Machinery},
abstract = {Advancements in Virtual Reality (VR) technology hold immense promise for enriching immersive experiences. Despite these advancements, there remains a significant gap in addressing accessibility concerns, particularly in automatically providing descriptive information for VR scenes. This paper explores the potential of leveraging Multimodal Large Language Models (MLLMs) to automatically generate text descriptions for 360 VR scenes according to Speech-to-Text (STT) prompts. As a case study, we conduct experiments on educational settings in VR museums, improving dynamic experiences across various contexts. Despite minor challenges in adapting MLLMs to VR scenes, the experiments demonstrate that they can generate descriptions with high quality. Our findings provide insights for enhancing VR experiences and ensuring accessibility for individuals with disabilities or diverse needs. © 2024 Copyright held by the owner/author(s).},
keywords = {3D Scene, 3D scenes, Accessibility, Computer simulation languages, Descriptive information, Digital elevation model, Immersive, Language Model, Multi-modal, Multimodal large language model, Multimodal Large Language Models (MLLMs), Scene description, Virtual environments, Virtual Reality, Virtual Reality (VR), Virtual reality technology},
pubstate = {published},
tppubtype = {inproceedings}
}
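Illustrative note (not from the paper): the described pipeline sends a 360-degree frame plus a transcribed spoken prompt to a multimodal LLM for scene description. The sketch below assumes an OpenAI-compatible vision endpoint; the model name and prompt wording are assumptions, not the authors' setup.

# Sketch of the MLLM description step, assuming an OpenAI-compatible multimodal
# endpoint. Requires: pip install openai (and OPENAI_API_KEY in the environment)
import base64
from openai import OpenAI

def describe_360_scene(image_path: str, user_question: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # assumed model choice
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "This is an equirectangular 360-degree VR scene. "
                         f"Describe it for a visually impaired visitor. {user_question}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content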
Kang, Z.; Liu, Y.; Zheng, J.; Sun, Z.
Revealing the Difficulty in Jailbreak Defense on Language Models for Metaverse Proceedings Article
In: Q., Gong; X., He (Ed.): SocialMeta - Proc. Int. Workshop Soc. Metaverse Comput., Sens. Netw., Part: ACM SenSys, pp. 31–37, Association for Computing Machinery, Inc, 2024, ISBN: 979-840071299-9 (ISBN).
Abstract | Links | BibTeX | Tags: % reductions, Attack strategies, Computer simulation languages, Defense, Digital elevation model, Guard rails, Jailbreak, Language Model, Large language model, Metaverse Security, Metaverses, Natural languages, Performance, Virtual Reality
@inproceedings{kang_revealing_2024,
title = {Revealing the Difficulty in Jailbreak Defense on Language Models for Metaverse},
author = {Z. Kang and Y. Liu and J. Zheng and Z. Sun},
editor = {Gong Q. and He X.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85212189363&doi=10.1145%2f3698387.3699998&partnerID=40&md5=673326728c3db35ffbbaf807eb7f003c},
doi = {10.1145/3698387.3699998},
isbn = {979-840071299-9 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {SocialMeta - Proc. Int. Workshop Soc. Metaverse Comput., Sens. Netw., Part: ACM SenSys},
pages = {31–37},
publisher = {Association for Computing Machinery, Inc},
abstract = {Large language models (LLMs) have demonstrated exceptional capabilities in natural language processing tasks, fueling innovations in emerging areas such as the metaverse. These models enable dynamic virtual communities, enhancing user interactions and revolutionizing industries. However, their increasing deployment exposes vulnerabilities to jailbreak attacks, where adversaries can manipulate LLM-driven systems to generate harmful content. While various defense mechanisms have been proposed, their efficacy against diverse jailbreak techniques remains unclear. This paper addresses this gap by evaluating the performance of three popular defense methods (Backtranslation, Self-reminder, and Paraphrase) against different jailbreak attack strategies (GCG, BEAST, and Deepinception), while also utilizing three distinct models. Our findings reveal that while defenses are highly effective against optimization-based jailbreak attacks and reduce the attack success rate by 79% on average, they struggle in defending against attacks that alter attack motivations. Additionally, methods relying on self-reminding perform better when integrated with models featuring robust safety guardrails. For instance, Llama2-7b shows a 100% reduction in Attack Success Rate, while Vicuna-7b and Mistral-7b, lacking safety alignment, exhibit a lower average reduction of 65.8%. This study highlights the challenges in developing universal defense solutions for securing LLMs in dynamic environments like the metaverse. Furthermore, our study highlights that the three distinct models utilized demonstrate varying initial defense performance against different jailbreak attack strategies, underscoring the complexity of effectively securing LLMs. © 2024 Copyright held by the owner/author(s).},
keywords = {% reductions, Attack strategies, Computer simulation languages, Defense, Digital elevation model, Guard rails, Jailbreak, Language Model, Large language model, Metaverse Security, Metaverses, Natural languages, Performance, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
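Illustrative note (not from the paper): the reported figures (e.g., a 79% average reduction) come from comparing attack success rates with and without each defense. The sketch below shows that evaluation arithmetic; the attack prompts, defense, target model, and harmfulness judge are placeholders.

# Sketch of the evaluation arithmetic behind attack-success-rate (ASR) reduction.
def attack_success_rate(prompts, model, judge, defense=None):
    successes = 0
    for p in prompts:
        guarded = defense(p) if defense else p     # e.g. paraphrase or self-reminder
        reply = model(guarded)
        if judge(reply):                           # True if the reply is harmful
            successes += 1
    return successes / len(prompts)

def asr_reduction(prompts, model, defense, judge):
    base = attack_success_rate(prompts, model, judge)
    defended = attack_success_rate(prompts, model, judge, defense=defense)
    return 100.0 * (base - defended) / max(base, 1e-9)   # percent reduction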
2023
Si, J.; Song, J.; Woo, M.; Kim, D.; Lee, Y.; Kim, S.
Generative AI Models for Virtual Interviewers: Applicability and Performance Comparison Proceedings Article
In: IET. Conf. Proc., pp. 27–28, Institution of Engineering and Technology, 2023, ISBN: 27324494 (ISSN).
Abstract | Links | BibTeX | Tags: 3D Generation, College admissions, Digital elevation model, Effective practices, Generative AI, Job hunting, Metaverse, Metaverses, Performance, Performance comparison, Virtual environments, Virtual Interview, Virtual Reality
@inproceedings{si_generative_2023,
title = {Generative AI Models for Virtual Interviewers: Applicability and Performance Comparison},
author = {J. Si and J. Song and M. Woo and D. Kim and Y. Lee and S. Kim},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203492324&doi=10.1049%2ficp.2024.0193&partnerID=40&md5=84eb48f6b51c941da9c77fa3aba46262},
doi = {10.1049/icp.2024.0193},
isbn = {27324494 (ISSN)},
year = {2023},
date = {2023-01-01},
booktitle = {IET. Conf. Proc.},
volume = {2023},
pages = {27–28},
publisher = {Institution of Engineering and Technology},
abstract = {Interviewing processes are considered crucial steps in job hunting or college admissions, and effective practice plays a significant role in successfully navigating these stages. Although various platforms have recently emerged for practicing virtual interviews, they often lack the tension and realism of actual interviews due to repetitive and formal content. This study aims to analyze and compare the performance of different generative AI models for creating a diverse set of virtual interviewers. Specifically, we examine the characteristics and applicability of each model, as well as the differences and advantages between them, and evaluate the performance of the generated virtual interviewers. Through this analysis, we aim to propose solutions for enhancing the practicality and efficiency of virtual interviews. © The Institution of Engineering & Technology 2023.},
keywords = {3D Generation, College admissions, Digital elevation model, Effective practices, Generative AI, Job hunting, Metaverse, Metaverses, Performance, Performance comparison, Virtual environments, Virtual Interview, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}