AHCI RESEARCH GROUP
Publications
Papers published in international journals, conference and workshop proceedings, and books.
2025
Hu, Y. -H.; Matsumoto, A.; Ito, K.; Narumi, T.; Kuzuoka, H.; Amemiya, T.
Avatar Motion Generation Pipeline for the Metaverse via Synthesis of Generative Models of Text and Video Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 767–771, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6.
@inproceedings{hu_avatar_2025,
title = {Avatar Motion Generation Pipeline for the Metaverse via Synthesis of Generative Models of Text and Video},
author = {Y. -H. Hu and A. Matsumoto and K. Ito and T. Narumi and H. Kuzuoka and T. Amemiya},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005158851&doi=10.1109%2fVRW66409.2025.00155&partnerID=40&md5=2bc9a6390e1cf710206835722ca8dbbf},
doi = {10.1109/VRW66409.2025.00155},
isbn = {979-833151484-6},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {767–771},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Efforts to integrate AI avatars into the metaverse to enhance interactivity have progressed in both research and commercial domains. AI avatars in the metaverse are expected to exhibit not only verbal responses but also avatar motions, such as non-verbal gestures, to enable seamless communication with users. Large Language Models (LLMs) are known for their advanced text-processing capabilities and can represent user input, avatar actions, and even entire virtual environments as text, making them a promising approach for planning avatar motions. However, generating avatar motions solely from textual information often requires extensive training data and challenging configuration, and the results often lack diversity and fail to match user expectations. On the other hand, AI technologies for generating videos have progressed to the point where they can depict diverse and natural human movements based on prompts. Therefore, this paper introduces a novel pipeline, TVMP, that synthesizes LLMs with advanced text-processing capabilities and video generation models with the ability to generate videos containing a variety of motions. The pipeline first generates videos from text input, then estimates the motions from the generated videos, and lastly exports the estimated motion data onto avatars in the metaverse. Feedback on the TVMP prototype suggests further refinement is needed, such as speed control, progress display, and direct editing, to improve contextual relevance and usability. The proposed method enables AI avatars to perform highly adaptive and diverse movements that fulfill user expectations and contributes to developing a more immersive metaverse. © 2025 IEEE.},
keywords = {Ambient intelligence, Design and evaluation methods, Distributed computer systems, Human-centered computing, Language Model, Metaverses, Processing capability, Text-processing, Treemap, Treemaps, Visualization, Visualization design and evaluation method, Visualization design and evaluation methods, Visualization designs, Visualization technique, Visualization techniques},
pubstate = {published},
tppubtype = {inproceedings}
}
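The TVMP pipeline above decomposes into three stages: text to video, video to motion, and motion to avatar. Below is a minimal Python sketch of that flow under stated assumptions; generate_video, estimate_motion, and retarget_to_avatar are hypothetical stand-ins for the authors' components, not their actual implementation.

from dataclasses import dataclass

@dataclass
class MotionClip:
    joint_rotations: list  # per-frame joint rotations, e.g. quaternions
    fps: int

def generate_video(prompt: str) -> bytes:
    """Hypothetical call to a text-to-video model."""
    raise NotImplementedError

def estimate_motion(video: bytes) -> MotionClip:
    """Hypothetical motion-estimation step on the generated video."""
    raise NotImplementedError

def retarget_to_avatar(motion: MotionClip, avatar_id: str) -> None:
    """Hypothetical export of the motion data onto a metaverse avatar."""
    raise NotImplementedError

def tvmp(user_text: str, avatar_id: str) -> None:
    video = generate_video(user_text)        # 1) text -> video
    motion = estimate_motion(video)          # 2) video -> motion
    retarget_to_avatar(motion, avatar_id)    # 3) motion -> avatar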
Ding, S.; Chen, Y.
RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 131–136, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 979-833151484-6.
@inproceedings{ding_rag-vr_2025,
title = {RAG-VR: Leveraging Retrieval-Augmented Generation for 3D Question Answering in VR Environments},
author = {S. Ding and Y. Chen},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005140593&doi=10.1109%2fVRW66409.2025.00034&partnerID=40&md5=36dc5fef97aeea4d6e183c83ce9fcd89},
doi = {10.1109/VRW66409.2025.00034},
isbn = {979-833151484-6},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {131–136},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Recent advances in large language models (LLMs) provide new opportunities for context understanding in virtual reality (VR). However, VR contexts are often highly localized and personalized, limiting the effectiveness of general-purpose LLMs. To address this challenge, we present RAG-VR, the first 3D question-answering system for VR that incorporates retrieval-augmented generation (RAG), which augments an LLM with external knowledge retrieved from a localized knowledge database to improve the answer quality. RAG-VR includes a pipeline for extracting comprehensive knowledge about virtual environments and user conditions for accurate answer generation. To ensure efficient retrieval, RAG-VR offloads the retrieval process to a nearby edge server and uses only essential information during retrieval. Moreover, we train the retriever to effectively distinguish among relevant, irrelevant, and hard-to-differentiate information in relation to questions. RAG-VR improves answer accuracy by 17.9%-41.8% and reduces end-to-end latency by 34.5%-47.3% compared with two baseline systems. © 2025 IEEE.},
keywords = {Ambient intelligence, Computational Linguistics, Computer interaction, Computing methodologies, Computing methodologies-Artificial intelligence-Natural language processing-Natural language generation, Computing methodology-artificial intelligence-natural language processing-natural language generation, Data handling, Formal languages, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Language Model, Language processing, Natural language generation, Natural language processing systems, Natural languages, Virtual Reality, Word processing},
pubstate = {published},
tppubtype = {inproceedings}
}
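The retrieval step RAG-VR describes (embed the question, rank a localized knowledge base, hand the top hits to an LLM) can be sketched as follows. embed() and ask_llm() are hypothetical stand-ins, and the plain cosine ranking here is the standard technique, not the paper's trained retriever.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical text-embedding model (unit-norm output assumed)."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def answer(question: str, kb: list[tuple[str, np.ndarray]], k: int = 3) -> str:
    q = embed(question)
    # Cosine similarity reduces to a dot product for unit vectors. In
    # RAG-VR this ranking runs on a nearby edge server, so only the
    # question and the top-k passages cross the network.
    ranked = sorted(kb, key=lambda entry: float(q @ entry[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    return ask_llm(f"Context:\n{context}\n\nQuestion: {question}")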
2024
Gottsacker, M.; Bruder, G.; Welch, G. F.
rlty2rlty: Transitioning Between Realities with Generative AI Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 1160–1161, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037449-0.
@inproceedings{gottsacker_rlty2rlty_2024,
title = {rlty2rlty: Transitioning Between Realities with Generative AI},
author = {M. Gottsacker and G. Bruder and G. F. Welch},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195556960&doi=10.1109%2fVRW62533.2024.00374&partnerID=40&md5=c6291f48ce2135a795a0a2d34681b83d},
doi = {10.1109/VRW62533.2024.00374},
isbn = {979-835037449-0},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {1160–1161},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We present a system for visually transitioning a mixed reality (MR) user between two arbitrary realities (e.g., between two virtual worlds or between the real environment and a virtual world). The system uses artificial intelligence (AI) to generate a 360° video that transforms the user's starting environment to another environment, passing through a liminal space that could help them relax between tasks or prepare them for the ending environment. The video can then be viewed on an MR headset. © 2024 IEEE.},
keywords = {Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Interactive computer graphics, Liminal spaces, Mixed / augmented reality, Mixed reality, Real environments, System use, User interfaces, Virtual worlds},
pubstate = {published},
tppubtype = {inproceedings}
}
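The core input to such a system is a description of the two endpoint environments and the liminal space between them. A minimal sketch of how a transition prompt for a text-to-video model might be assembled; the wording and this single-prompt framing are illustrative assumptions, not the published system.

def transition_prompt(start_env: str, end_env: str,
                      liminal: str = "a calm, softly lit passage") -> str:
    """Compose a prompt describing a 360-degree transition video."""
    return (
        f"A 360-degree video that starts in {start_env}, "
        f"slowly dissolves into {liminal}, "
        f"and finally resolves into {end_env}."
    )

# Example: moving a user from their real office into a virtual forest.
print(transition_prompt("a cluttered real-world office",
                        "a sunlit forest clearing"))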
Yin, Z.; Wang, Y.; Papatheodorou, T.; Hui, P.
Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR, pp. 701–711, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037402-5.
@inproceedings{yin_text2vrscene_2024,
title = {Text2VRScene: Exploring the Framework of Automated Text-driven Generation System for VR Experience},
author = {Z. Yin and Y. Wang and T. Papatheodorou and P. Hui},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191431035&doi=10.1109%2fVR58804.2024.00090&partnerID=40&md5=5484a5bc3939d003efe68308f56b15a6},
doi = {10.1109/VR58804.2024.00090},
isbn = {979-835037402-5},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces, VR},
pages = {701–711},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {With the recent development of the Virtual Reality (VR) industry, the increasing number of VR users pushes the demand for the massive production of immersive and expressive VR scenes in related industries. However, creating expressive VR scenes involves the reasonable organization of various digital content to express a coherent and logical theme, which is time-consuming and labor-intensive. In recent years, Large Language Models (LLMs) such as ChatGPT 3.5 and generative models such as Stable Diffusion have emerged as powerful tools for comprehending natural language and generating digital content such as text, code, images, and 3D objects. In this paper, we have explored how we can generate VR scenes from text by incorporating LLMs and various generative models into an automated system. To achieve this, we first identify the possible limitations of LLMs for an automated system and propose a systematic framework to mitigate them. Subsequently, we developed Text2VRScene, a VR scene generation system, based on our proposed framework with well-designed prompts. To validate the effectiveness of our proposed framework and the designed prompts, we carried out a series of test cases. The results show that the proposed framework contributes to improving the reliability of the system and the quality of the generated VR scenes. The results also illustrate the promising performance of Text2VRScene in generating satisfying VR scenes with a clear theme regularized by our well-designed prompts. This paper ends with a discussion about the limitations of the current system and the potential of developing similar generation systems based on our framework. © 2024 IEEE.},
keywords = {Automated systems, Automation, Digital contents, Generation systems, Generative model, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Interaction paradigm, Interaction paradigms, Interaction techniques, Language Model, Natural language processing systems, Text input, User interfaces, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
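The general shape of such a framework (constrain the LLM to a machine-readable scene description, validate it, then call generative models per object) can be sketched as follows. The JSON schema, the prompt, and both model calls are assumptions for illustration, not the paper's actual prompts.

import json

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call expected to return a JSON scene description."""
    raise NotImplementedError

def generate_3d_asset(description: str) -> str:
    """Hypothetical call to a 3D generative model; returns an asset path."""
    raise NotImplementedError

SCENE_PROMPT = (
    "Describe a VR scene for the theme '{theme}' as JSON with a top-level "
    "'objects' list; each object needs 'name', 'description', and "
    "'position' (x, y, z in meters). Output JSON only."
)

def text_to_vr_scene(theme: str) -> list[dict]:
    raw = ask_llm(SCENE_PROMPT.format(theme=theme))
    scene = json.loads(raw)  # a robust system would re-prompt on parse failure
    for obj in scene["objects"]:
        # Generate one asset per object described by the LLM.
        obj["asset"] = generate_3d_asset(obj["description"])
    return scene["objects"]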
Jeong, E.; Kim, H.; Park, S.; Yoon, S.; Ahn, J.; Woo, W.
Function-Adaptive Affordance Extraction from 3D Objects Using LLM for Interaction Authoring with Augmented Artifacts Proceedings Article
In: Eck, U.; Sra, M.; Stefanucci, J.; Sugimoto, M.; Tatzgern, M.; Williams, I. (Ed.): Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct, pp. 205–208, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-833150691-9.
@inproceedings{jeong_function-adaptive_2024,
title = {Function-Adaptive Affordance Extraction from 3D Objects Using LLM for Interaction Authoring with Augmented Artifacts},
author = {E. Jeong and H. Kim and S. Park and S. Yoon and J. Ahn and W. Woo},
editor = {U. Eck and M. Sra and J. Stefanucci and M. Sugimoto and M. Tatzgern and I. Williams},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214379963&doi=10.1109%2fISMAR-Adjunct64951.2024.00050&partnerID=40&md5=7222e0599a7e2aa0adaea38e4b9e13cc},
doi = {10.1109/ISMAR-Adjunct64951.2024.00050},
isbn = {979-833150691-9},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct},
pages = {205–208},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {We propose an algorithm that extracts the most suitable affordances, interaction targets, and corresponding coordinates adaptively from 3D models of various artifacts based on their functional context for efficient authoring of XR content with artifacts. Traditionally, authoring AR scenes to convey artifact context required one-to-one manual work. Our approach leverages a Large Language Model (LLM) to extract interaction types, positions, and subjects based on the artifact's name and usage context. This enables templated XR experience creation, replacing repetitive manual labor. Consequently, our system streamlines the XR authoring process, making it more efficient and scalable. © 2024 IEEE.},
keywords = {3D modeling, Applied computing, Art and humanity, Artificial intelligence, Arts and humanities, Augmented Reality, Computer interaction, Computer vision, Computing methodologies, computing methodology, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Humanities computing, Interaction paradigm, Interaction paradigms, Language processing, Mixed / augmented reality, Mixed reality, Modeling languages, Natural Language Processing, Natural language processing systems, Natural languages, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {inproceedings}
}
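The extraction step lends itself to a short sketch: prompt an LLM with the artifact's name and usage context, then parse structured affordances from the reply. The prompt wording and JSON schema below are illustrative assumptions, not the authors' algorithm.

import json

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call."""
    raise NotImplementedError

def extract_affordances(name: str, usage_context: str) -> list[dict]:
    prompt = (
        f"Artifact: {name}\nUsage context: {usage_context}\n"
        "List this artifact's affordances as a JSON array; each item needs "
        "'interaction' (e.g. grasp, pour), 'target_part', and 'position' "
        "as x, y, z coordinates on the 3D model. Output JSON only."
    )
    # Each parsed item can then seed a templated XR interaction.
    return json.loads(ask_llm(prompt))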
Imamura, S.; Hiraki, H.; Rekimoto, J.
Serendipity Wall: A Discussion Support System Using Real-Time Speech Recognition and Large Language Model Proceedings Article
In: Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW, pp. 588–590, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037449-0.
@inproceedings{imamura_serendipity_2024,
title = {Serendipity Wall: A Discussion Support System Using Real-Time Speech Recognition and Large Language Model},
author = {S. Imamura and H. Hiraki and J. Rekimoto},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195557406&doi=10.1109%2fVRW62533.2024.00113&partnerID=40&md5=22c393aa1ea99a9e64d382f1b56fb877},
doi = {10.1109/VRW62533.2024.00113},
isbn = {979-835037449-0},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Conf. Virtual Real. 3D User Interfaces Abstr. Workshops, VRW},
pages = {588–590},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Group discussions are important for exploring new ideas. One method to support discussions is presenting relevant keywords or images. However, such methods have tended not to take the context of the conversation into account. Therefore, we propose a system that develops group discussions by presenting related information in response to the discussion. As a specific example, this study addressed academic discussions among HCI researchers. During brainstorming sessions, the system continuously transcribes the dialogue and generates embedding vectors of the discussion. These vectors are matched against those of existing research articles to identify relevant studies. The system then presents the relevant studies on a large display, summarized by an LLM. In a case study, this system had the effect of broadening the topics of discussion and facilitating the acquisition of new knowledge. A larger display area is desirable in terms of information volume and size. Therefore, in addition to large displays, virtual reality environments with headsets could be suitable for this system. © 2024 IEEE.},
keywords = {Brainstorming sessions, Discussion support, Embeddings, Group discussions, Human computer interaction, Human computer interaction (HCI), Human-centered computing, Language Model, Large displays, Real-time, Speech recognition, Support systems, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
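One pass of the loop the abstract describes (transcribe, embed, match against precomputed article embeddings, summarize the best hit) might look like the sketch below; all three model calls are hypothetical stand-ins for the authors' components.

import numpy as np

def transcribe_latest_audio() -> str:
    """Hypothetical real-time speech-recognition call."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding model (unit-norm output assumed)."""
    raise NotImplementedError

def summarize(article: str) -> str:
    """Hypothetical LLM summarization call."""
    raise NotImplementedError

def next_suggestion(articles: list[tuple[str, np.ndarray]]) -> str:
    # Embed the ongoing discussion and find the nearest research article
    # by cosine similarity (a dot product, since the vectors are unit-norm).
    discussion = transcribe_latest_audio()
    d = embed(discussion)
    best_text, _ = max(articles, key=lambda a: float(d @ a[1]))
    # Summarize the matched article for display on the shared wall.
    return summarize(best_text)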