AHCI RESEARCH GROUP
Publications
Papers published in international journals, conference and workshop proceedings, and books.
2025
Shi, J.; Jain, R.; Chi, S.; Doh, H.; Chi, H. -G.; Quinn, A. J.; Ramani, K.
CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence Proceedings Article
In: Conf Hum Fact Comput Syst Proc, Association for Computing Machinery, 2025, ISBN: 979-8-4007-1394-1.
Tags: 'current, Application scenario, AR application, Augmented Reality, Context-Aware, Contextual information, Generative adversarial networks, generative artificial intelligence, Humanoid avatars, In-situ learning, Learning experiences, Power
@inproceedings{shi_caring-ai_2025,
title = {CARING-AI: Towards Authoring Context-aware Augmented Reality INstruction through Generative Artificial Intelligence},
author = {J. Shi and R. Jain and S. Chi and H. Doh and H. -G. Chi and A. J. Quinn and K. Ramani},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105005725461&doi=10.1145%2f3706598.3713348&partnerID=40&md5=e88afd8426e020155599ef3b2a044774},
doi = {10.1145/3706598.3713348},
isbn = {979-8-4007-1394-1},
year = {2025},
date = {2025-01-01},
booktitle = {Conf Hum Fact Comput Syst Proc},
publisher = {Association for Computing Machinery},
abstract = {Context-aware AR instruction enables adaptive and in-situ learning experiences. However, hardware limitations and expertise requirements constrain the creation of such instructions. With recent developments in Generative Artificial Intelligence (Gen-AI), current research tries to tackle these constraints by deploying AI-generated content (AIGC) in AR applications. However, our preliminary study with six AR practitioners revealed that the current AIGC lacks contextual information to adapt to varying application scenarios and is therefore limited in authoring. To utilize the strong generative power of GenAI to ease the authoring of AR instruction while capturing the context, we developed CARING-AI, an AR system to author context-aware humanoid-avatar-based instructions with GenAI. By navigating in the environment, users naturally provide contextual information to generate humanoid-avatar animation as AR instructions that blend in the context spatially and temporally. We showcased three application scenarios of CARING-AI: Asynchronous Instructions, Remote Instructions, and Ad Hoc Instructions based on a design space of AIGC in AR Instructions. With two user studies (N=12), we assessed the system usability of CARING-AI and demonstrated the easiness and effectiveness of authoring with Gen-AI. © 2025 Copyright held by the owner/author(s).},
keywords = {'current, Application scenario, AR application, Augmented Reality, Context-Aware, Contextual information, Generative adversarial networks, generative artificial intelligence, Humanoid avatars, In-situ learning, Learning experiences, Power},
pubstate = {published},
tppubtype = {inproceedings}
}
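The abstract above describes capturing contextual information while navigating the environment and feeding it to a generative model that produces humanoid-avatar AR instructions. As a rough illustration of that idea (not the authors' implementation), the Python sketch below serializes hypothetically detected objects into a prompt and stubs out the generative call; ContextSnapshot, build_prompt, and request_instruction_steps are invented names, and the returned steps are canned.

from dataclasses import dataclass
from typing import List

@dataclass
class ContextSnapshot:
    """One object observed while the author walks through the scene."""
    label: str
    position: tuple  # (x, y, z) in metres, headset coordinate frame

def build_prompt(task: str, context: List[ContextSnapshot]) -> str:
    # Serialize the spatial context so a generative model can ground each
    # instruction step to a real object in the room.
    lines = [f"- {c.label} at {c.position}" for c in context]
    return (
        f"Task: {task}\n"
        "Observed objects (position in metres):\n" + "\n".join(lines) + "\n"
        "Write numbered instruction steps, each anchored to one object."
    )

def request_instruction_steps(prompt: str) -> List[str]:
    # Placeholder for a call to a generative model; in an AR system the
    # returned steps would drive humanoid-avatar animation at the anchors.
    return ["1. Walk to the printer and open the top tray.",
            "2. Load paper, then press the green button."]

if __name__ == "__main__":
    context = [ContextSnapshot("printer", (1.2, 0.0, 3.4)),
               ContextSnapshot("paper shelf", (0.4, 0.0, 2.1))]
    prompt = build_prompt("Refill the office printer", context)
    for step in request_instruction_steps(prompt):
        print(step)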
Zhang, G.; Wang, Y.; Luo, C.; Xu, S.; Ming, Y.; Peng, J.; Zhang, M.
Visual Harmony: LLM’s Power in Crafting Coherent Indoor Scenes from Images Proceedings Article
In: Lin, Z.; Zha, H.; Cheng, M.-M.; He, R.; Liu, C.-L.; Ubul, K.; Silamu, W.; Zhou, J. (Eds.): Lect. Notes Comput. Sci., pp. 3–17, Springer Science and Business Media Deutschland GmbH, 2025, ISSN: 0302-9743; ISBN: 978-981-97-8507-0.
Tags: Augmented Reality, Depth perception, Indoor scene generation, Input image, Language Model, Large language model, Metaverses, Point-clouds, Power, Scene completion, Scene Generation, Scene-graphs, Semantic Segmentation, Semantics, Virtual Reality, Visual languages
@inproceedings{zhang_visual_2025,
title = {Visual Harmony: LLM’s Power in Crafting Coherent Indoor Scenes from Images},
author = {G. Zhang and Y. Wang and C. Luo and S. Xu and Y. Ming and J. Peng and M. Zhang},
editor = {Lin, Z. and Zha, H. and Cheng, M.-M. and He, R. and Liu, C.-L. and Ubul, K. and Silamu, W. and Zhou, J.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209374797&doi=10.1007%2f978-981-97-8508-7_1&partnerID=40&md5=5231ab0bce95fb3f09db80392acd58ff},
doi = {10.1007/978-981-97-8508-7_1},
issn = {0302-9743},
isbn = {978-981-97-8507-0},
year = {2025},
date = {2025-01-01},
booktitle = {Lect. Notes Comput. Sci.},
volume = {15036 LNCS},
pages = {3–17},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {Indoor scene generation has recently attracted significant attention as it is crucial for metaverse, 3D animation, visual effects in movies, and virtual/augmented reality. Existing image-based indoor scene generation methods often produce scenes that are not realistic enough, with issues such as floating objects, incorrect object orientations, and incomplete scenes that only include the part of the scenes captured by the input image. To address these challenges, we propose Visual Harmony, a method that leverages the powerful spatial imagination capabilities of Large Language Model (LLM) to generate corresponding indoor scenes based on the input image. Specifically, we first extract information from the input image through depth estimation and panorama segmentation, reconstructing a semantic point cloud. Using this reconstructed semantic point cloud, we extract a scene graph that describes only the objects in the image. Then we leverage the strong spatial imagination capabilities of LLM to complete the scene graph, forming a representation of a complete room scene. Based on this fine scene graph, we can generate entire indoor scene that includes both the captured and not captured parts of the input image. Extensive experiments demonstrate that our method can generate realistic, plausible, and highly relevant complete indoor scenes related to the input image. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.},
keywords = {Augmented Reality, Depth perception, Indoor scene generation, Input image, Language Model, Large language model, Metaverses, Point-clouds, Power, Scene completion, Scene Generation, Scene-graphs, Semantic Segmentation, Semantics, Virtual Reality, Visual languages},
pubstate = {published},
tppubtype = {inproceedings}
}
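The Visual Harmony abstract outlines a pipeline: depth estimation and panorama segmentation of the input image, a semantic point cloud, a partial scene graph, LLM-based scene-graph completion, and generation of the full room. The Python sketch below only mirrors that data flow under stated assumptions; every stage is a stub with invented names (estimate_depth_and_segments, complete_scene_graph_with_llm, and so on) and hard-coded toy outputs, not the paper's models.

from typing import Dict, List, Tuple

def estimate_depth_and_segments(image_path: str) -> List[Tuple[str, Tuple[float, float, float]]]:
    # Would run monocular depth estimation plus panoptic segmentation and
    # back-project labelled pixels into a semantic point cloud (label, xyz).
    return [("bed", (2.0, 0.0, 1.0)), ("nightstand", (2.8, 0.0, 1.0))]

def build_scene_graph(points: List[Tuple[str, Tuple[float, float, float]]]) -> Dict[str, List[str]]:
    # Derive object nodes and spatial relations only for what the image shows.
    return {"bed": ["nightstand right-of bed"], "nightstand": []}

def complete_scene_graph_with_llm(partial: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Placeholder for prompting an LLM to add plausible unseen objects and
    # relations (e.g. a wardrobe behind the camera) so the whole room is covered.
    completed = dict(partial)
    completed["wardrobe"] = ["wardrobe opposite bed"]
    return completed

def place_objects(graph: Dict[str, List[str]]) -> List[str]:
    # A real system would solve object poses from the relations; here we just
    # report which objects the completed graph asks for.
    return sorted(graph.keys())

if __name__ == "__main__":
    cloud = estimate_depth_and_segments("bedroom.jpg")
    partial_graph = build_scene_graph(cloud)
    full_graph = complete_scene_graph_with_llm(partial_graph)
    print("Objects to instantiate:", place_objects(full_graph))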
2024
Liu, Y.; Siau, K. L.
Generative Artificial Intelligence and Metaverse: Future of Work, Future of Society, and Future of Humanity Proceedings Article
In: Zhao, F.; Miao, D. (Eds.): Commun. Comput. Info. Sci., pp. 118–127, Springer Science and Business Media Deutschland GmbH, 2024, ISSN: 1865-0929; ISBN: 978-981-99-7586-0.
Tags: Artificial intelligence, ChatGPT, Future of works, Generative AI, Metaverse, Metaverses, Policy makers, Power, Research direction, Research directions, Research questions, Technical experts, Technical professionals
@inproceedings{liu_generative_2024,
title = {Generative Artificial Intelligence and Metaverse: Future of Work, Future of Society, and Future of Humanity},
author = {Y. Liu and K. L. Siau},
editor = {Zhao, F. and Miao, D.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85177219164&doi=10.1007%2f978-981-99-7587-7_10&partnerID=40&md5=524dd02b8766aa25766293cee4ee0e16},
doi = {10.1007/978-981-99-7587-7_10},
issn = {1865-0929},
isbn = {978-981-99-7586-0},
year = {2024},
date = {2024-01-01},
booktitle = {Commun. Comput. Info. Sci.},
volume = {1946 CCIS},
pages = {118–127},
publisher = {Springer Science and Business Media Deutschland GmbH},
abstract = {The rapid development of Generative Artificial Intelligence (GenAI) and the emergence of the Metaverse are dynamically reshaping our lives and societies. GenAI can enhance the development of Metaverse and empower the applications in Metaverse. Metaverse is also an excellent environment for GenAI to demonstrate its power and usefulness. This interwoven relationship fuels the potential of integrating GenAI and Metaverse. The paper discusses the integration potential of GenAI and Metaverse from four aspects. We further investigate how GenAI, Metaverse, and the integration of GenAI and Metaverse can reshape our future across the realms of work, society, and humanity. This paper offers theoretical and practical contributions by proposing research directions and specific research questions. Academic researchers can glean insights for future research and generate novel topics based on our findings. Policymakers, technical experts, and professionals across industries can gain a comprehensive grasp of GenAI and the Metaverse, enhancing their ability to adapt and contribute effectively to this emerging wave of innovation. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.},
keywords = {Artificial intelligence, ChatGPT, Future of works, Generative AI, Metaverse, Metaverses, Policy makers, Power, Research direction, Research directions, Research questions, Technical experts, Technical professionals},
pubstate = {published},
tppubtype = {inproceedings}
}
Min, Y.; Jeong, J. -W.
Public Speaking Q&A Practice with LLM-Generated Personas in Virtual Reality Proceedings Article
In: Eck, U.; Sra, M.; Stefanucci, J.; Sugimoto, M.; Tatzgern, M.; Williams, I. (Eds.): Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct, pp. 493–496, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-8-3315-0691-9.
Tags: Digital elevation model, Economic and social effects, Language Model, Large language model-based persona generation, LLM-based Persona Generation, Model-based OPC, Personnel training, Power, Practice systems, Presentation Anxiety, Public speaking, Q&A practice, user experience, Users' experiences, Virtual environments, Virtual Reality, VR training
@inproceedings{min_public_2024,
title = {Public Speaking Q&A Practice with LLM-Generated Personas in Virtual Reality},
author = {Y. Min and J. -W. Jeong},
editor = {Eck, U. and Sra, M. and Stefanucci, J. and Sugimoto, M. and Tatzgern, M. and Williams, I.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85214393734&doi=10.1109%2fISMAR-Adjunct64951.2024.00143&partnerID=40&md5=992d9599bde26f9d57d549639869d124},
doi = {10.1109/ISMAR-Adjunct64951.2024.00143},
isbn = {979-8-3315-0691-9},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Symp. Mixed Augment. Real. Adjunct, ISMAR-Adjunct},
pages = {493–496},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper introduces a novel VR-based Q&A practice system that harnesses the power of Large Language Models (LLMs). We support Q&A practice for upcoming public speaking by providing an immersive VR training environment populated with LLM-generated audiences, each capable of posing diverse and realistic questions based on different personas. We conducted a pilot user study involving 20 participants who engaged in VR-based Q&A practice sessions. The sessions featured a variety of questions regarding presentation material provided by the participants, all of which were generated by LLM-based personas. Through post-surveys and interviews, we evaluated the effectiveness of the proposed method. The participants valued the system for engagement and focus while also identifying several areas for improvement. Our study demonstrated the potential of integrating VR and LLMs to create a powerful, immersive tool for Q&A practice. © 2024 IEEE.},
keywords = {Digital elevation model, Economic and social effects, Language Model, Large language model-based persona generation, LLM-based Persona Generation, Model-based OPC, Personnel training, Power, Practice systems, Presentation Anxiety, Public speaking, Q&A practice, user experience, Users' experiences, Virtual environments, Virtual Reality, VR training},
pubstate = {published},
tppubtype = {inproceedings}
}
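The system above populates a VR auditorium with LLM-generated personas that ask questions about the user's presentation material. A minimal sketch of that persona-conditioned prompting step is given below, assuming a simple Persona record and a stubbed ask_llm call; neither the prompt wording nor the function names come from the paper.

from dataclasses import dataclass
from typing import List

@dataclass
class Persona:
    role: str
    attitude: str  # e.g. "skeptical", "supportive", "detail-oriented"

def question_prompt(persona: Persona, slide_summary: str) -> str:
    # One prompt per virtual audience member; each persona should yield a
    # different style of question about the same presentation material.
    return (
        f"You are a {persona.attitude} {persona.role} in the audience.\n"
        f"The speaker just presented: {slide_summary}\n"
        "Ask one realistic question you would raise in the Q&A."
    )

def ask_llm(prompt: str) -> str:
    # Placeholder for an LLM call; in the VR system the returned question
    # would be voiced by the corresponding audience avatar.
    return "Could you clarify how your results generalize beyond the pilot study?"

if __name__ == "__main__":
    personas = [Persona("professor", "skeptical"),
                Persona("industry engineer", "practical")]
    summary = "A VR system for practicing conference Q&A with generated audiences."
    questions: List[str] = [ask_llm(question_prompt(p, summary)) for p in personas]
    for q in questions:
        print(q)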
Patel, P.; Goiri, Í.; Choukse, E.; Warrier, B.; Bianchini, R.; Zhang, C.; Mahalingam, N.
Characterizing Power Management Opportunities for LLMs in the Cloud Proceedings Article
In: Int Conf Archit Support Program Lang Oper Syst ASPLOS, pp. 207–222, Association for Computing Machinery, 2024, ISBN: 979-8-4007-0386-7.
Tags: Cloud, Cloud providers, Computational Linguistics, Computing power, Consumption patterns, Datacenter, datacenters, Electric power utilization, GPUs, Language Model, Large language model, large language models, Model inference, Power, Power management, Power oversubscription, Power usage, Profiling, Program processors, Virtual Reality
@inproceedings{patel_characterizing_2024,
title = {Characterizing Power Management Opportunities for LLMs in the Cloud},
author = {P. Patel and Í. Goiri and E. Choukse and B. Warrier and R. Bianchini and C. Zhang and N. Mahalingam},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192199791&doi=10.1145%2f3620666.3651329&partnerID=40&md5=6102cbb096a789e297711420d4b8427a},
doi = {10.1145/3620666.3651329},
isbn = {979-8-4007-0386-7},
year = {2024},
date = {2024-01-01},
booktitle = {Int Conf Archit Support Program Lang Oper Syst ASPLOS},
volume = {3},
pages = {207–222},
publisher = {Association for Computing Machinery},
abstract = {Recent innovation in large language models (LLMs), and their myriad use cases have rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and other enterprises plan to substantially grow their datacenter capacity to support these new workloads. A key bottleneck resource in datacenters is power, which LLMs are quickly saturating due to their rapidly increasing model sizes. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the training and inference power consumption patterns. Based on our analysis, we claim that the average and peak power utilization in LLM inference clusters should not be very high. Our deductions align with data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment make it challenging to build a reliable and robust power management framework. We leverage the insights from our characterization to identify opportunities for better power management. As a detailed use case, we propose a new framework called POLCA, which enables power oversubscription in LLM inference clouds. POLCA is robust, reliable, and readily deployable. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in existing clusters with minimal performance loss. © 2024 Copyright held by the owner/author(s).},
keywords = {Cloud, Cloud providers, Computational Linguistics, Computing power, Consumption patterns, Datacenter, datacenters, Electric power utilization, GPUs, Language Model, Large language model, large language models, Model inference, Power, Power management, Power oversubscription, Power usage, Profiling, Program processors, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
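The abstract's key observation is that LLM inference clusters rarely reach their provisioned (nameplate) power, which leaves headroom for power oversubscription if a controller can throttle when aggregate draw nears the budget. The sketch below works through that arithmetic with made-up numbers (illustrative only, not measurements from the paper) and a hypothetical throttle check.

# Illustrative arithmetic only; the constants are invented, not production data.
CLUSTER_BUDGET_KW = 1000.0          # total power provisioned for the cluster
NAMEPLATE_PER_SERVER_KW = 10.0      # worst-case (allocated) draw per GPU server
OBSERVED_PEAK_PER_SERVER_KW = 7.5   # typical peak draw under inference load

def servers_without_oversubscription() -> int:
    # Conservative sizing: allocate every server its nameplate power.
    return int(CLUSTER_BUDGET_KW // NAMEPLATE_PER_SERVER_KW)

def servers_with_oversubscription() -> int:
    # Size the cluster against observed inference peaks instead of nameplate.
    return int(CLUSTER_BUDGET_KW // OBSERVED_PEAK_PER_SERVER_KW)

def throttle_needed(current_draw_kw: float, safety_margin: float = 0.95) -> bool:
    # A controller would cap GPU frequency (or defer low-priority requests)
    # whenever aggregate draw crosses the safety threshold below the budget.
    return current_draw_kw > CLUSTER_BUDGET_KW * safety_margin

if __name__ == "__main__":
    base, over = servers_without_oversubscription(), servers_with_oversubscription()
    print(f"{base} servers at nameplate, {over} with oversubscription "
          f"(+{100 * (over - base) / base:.0f}%)")
    print("Throttle now?", throttle_needed(over * OBSERVED_PEAK_PER_SERVER_KW))

With these example figures the budget hosts roughly a third more servers than a nameplate-based allocation, the same kind of headroom the abstract reports for production inference clusters.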
Li, K.; Gulati, M.; Shah, D.; Waskito, S.; Chakrabarty, S.; Varshney, A.
PixelGen: Rethinking Embedded Cameras for Mixed-Reality Proceedings Article
In: ACM MobiCom - Proc. Int. Conf. Mob. Comput. Netw., pp. 2128–2135, Association for Computing Machinery, Inc, 2024, ISBN: 979-8-4007-0489-5.
Tags: Blind spots, embedded systems, Embedded-system, Field of views, Language Model, Large language model, large language models, Mixed reality, Networking, Partial views, Pixels, Power, Visible spectrums
@inproceedings{li_pixelgen_2024,
title = {PixelGen: Rethinking Embedded Cameras for Mixed-Reality},
author = {K. Li and M. Gulati and D. Shah and S. Waskito and S. Chakrabarty and A. Varshney},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002721208&doi=10.1145%2f3636534.3696216&partnerID=40&md5=97ee680318c72552b3e642aa57aaeca5},
doi = {10.1145/3636534.3696216},
isbn = {979-8-4007-0489-5},
year = {2024},
date = {2024-01-01},
booktitle = {ACM MobiCom - Proc. Int. Conf. Mob. Comput. Netw.},
pages = {2128–2135},
publisher = {Association for Computing Machinery, Inc},
abstract = {Mixed-reality headsets offer new ways to perceive our environment. They employ visible spectrum cameras to capture and display the environment on screens in front of the user's eyes. However, these cameras lead to limitations. Firstly, they capture only a partial view of the environment. They are positioned to capture whatever is in front of the user, thus creating blind spots during complete immersion and failing to detect events outside the restricted field of view. Secondly, they capture only visible light fields, ignoring other fields like acoustics and radio that are also present in the environment. Finally, these power-hungry cameras rapidly deplete the mixed-reality headset's battery. We introduce PixelGen to rethink embedded cameras for mixed-reality headsets. PixelGen proposes to decouple cameras from the mixed-reality headset and balance resolution and fidelity to minimize the power consumption. It employs low-resolution, monochrome image sensors and environmental sensors to capture the surroundings around the headset. This approach reduces the system's communication bandwidth and power consumption. A transformer-based language and image model process this information to overcome resolution trade-offs, thus generating a higher-resolution representation of the environment. We present initial experiments that show PixelGen's viability. © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.},
keywords = {Blind spots, embedded systems, Embedded-system, Field of views, Language Model, Large language model, large language models, Mixed reality, Networking, Partial views, Pixels, Power, Visible spectrums},
pubstate = {published},
tppubtype = {inproceedings}
}
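PixelGen, as described above, decouples a low-resolution monochrome camera plus environmental sensors from the headset and uses a transformer-based model to reconstruct a higher-fidelity view, trading capture resolution for power and bandwidth. The sketch below only illustrates that data flow with invented types and a nearest-neighbour upsample standing in for the generative model; SensorFrame, capture_frame, and upsample_with_model are assumptions, not the system's API.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SensorFrame:
    """What a decoupled, low-power capture node might send to the headset."""
    grayscale: List[List[int]]      # low-resolution monochrome image
    environment: Dict[str, float]   # e.g. temperature, sound level

def capture_frame() -> SensorFrame:
    # Placeholder capture: a tiny 4x4 monochrome image plus a few scalar
    # environmental readings, standing in for the real sensor node.
    image = [[10, 20, 20, 10],
             [20, 40, 40, 20],
             [20, 40, 40, 20],
             [10, 20, 20, 10]]
    return SensorFrame(image, {"temperature_c": 22.5, "noise_db": 38.0})

def upsample_with_model(frame: SensorFrame) -> List[List[int]]:
    # Placeholder for the transformer-based language/image model that would
    # turn the low-fidelity capture into a higher-resolution representation.
    # Here we just nearest-neighbour upsample 2x to show the data flow.
    out: List[List[int]] = []
    for row in frame.grayscale:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out

if __name__ == "__main__":
    frame = capture_frame()
    rendered = upsample_with_model(frame)
    print(f"sent {len(frame.grayscale)}x{len(frame.grayscale[0])} pixels, "
          f"rendered {len(rendered)}x{len(rendered[0])}; env: {frame.environment}")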