AHCI RESEARCH GROUP
Publications
Papers published in international journals, conference and workshop proceedings, and books.
OUR RESEARCH
Scientific Publications
2025
Liu, G.; Du, H.; Wang, J.; Niyato, D.; Kim, D. I.
Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse Journal Article
In: IEEE Transactions on Mobile Computing, 2025, ISSN: 1536-1233.
@article{liu_contract-inspired_2025,
title = {Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse},
author = {G. Liu and H. Du and J. Wang and D. Niyato and D. I. Kim},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000066834&doi=10.1109%2fTMC.2025.3550815&partnerID=40&md5=3cb5a2143b9ce4ca7f931a60f1bf239c},
doi = {10.1109/TMC.2025.3550815},
issn = {1536-1233},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Mobile Computing},
abstract = {The rapid advancement of immersive technologies has propelled the development of the Metaverse, where the convergence of virtual and physical realities necessitates the generation of high-quality, photorealistic images to enhance user experience. However, generating these images, especially through Generative Diffusion Models (GDMs), in mobile edge computing environments presents significant challenges due to the limited computing resources of edge devices and the dynamic nature of wireless networks. This paper proposes a novel framework that integrates contract-inspired contest theory, Deep Reinforcement Learning (DRL), and GDMs to optimize image generation in these resource-constrained environments. The framework addresses the critical challenges of resource allocation and semantic data transmission quality by incentivizing edge devices to efficiently transmit high-quality semantic data, which is essential for creating realistic and immersive images. The use of contest and contract theory ensures that edge devices are motivated to allocate resources effectively, while DRL dynamically adjusts to network conditions, optimizing the overall image generation process. Experimental results demonstrate that the proposed approach not only improves the quality of generated images but also achieves superior convergence speed and stability compared to traditional methods. This makes the framework particularly effective for optimizing complex resource allocation tasks in mobile edge Metaverse applications, offering enhanced performance and efficiency in creating immersive virtual environments. © 2002-2012 IEEE.},
keywords = {Contest Theory, Deep learning, Deep reinforcement learning, Diffusion Model, Generative adversarial networks, Generative AI, High quality, Image generation, Image generations, Immersive technologies, Metaverses, Mobile edge computing, Reinforcement Learning, Reinforcement learnings, Resource allocation, Resources allocation, Semantic data, Virtual addresses, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {article}
}
Zhang, Z.; Jiang, Y.; Wei, X.; Chen, M.; Dong, H.; Yu, S.
Generative-AI for XR Content Transmission in the Metaverse: Potential Approaches, Challenges, and a Generation-Driven Transmission Framework Journal Article
In: IEEE Network, 2025, ISSN: 0890-8044.
@article{zhang_generative-ai_2025,
title = {Generative-AI for XR Content Transmission in the Metaverse: Potential Approaches, Challenges, and a Generation-Driven Transmission Framework},
author = {Z. Zhang and Y. Jiang and X. Wei and M. Chen and H. Dong and S. Yu},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-86000661262&doi=10.1109%2fMNET.2025.3547385&partnerID=40&md5=1e00d40542ec58ef1489934abb2a990c},
doi = {10.1109/MNET.2025.3547385},
issn = {0890-8044},
year = {2025},
date = {2025-01-01},
journal = {IEEE Network},
abstract = {How to efficiently transmit large volumes of Extended Reality (XR) content through current networks has been a major bottleneck in realizing the Metaverse. The recently emerging Generative Artificial Intelligence (GAI) has already revolutionized various technological fields and provides promising solutions to this challenge. In this article, we first demonstrate current networks' bottlenecks for supporting XR content transmission in the Metaverse. Then, we explore the potential approaches and challenges of utilizing GAI to overcome these bottlenecks. To address these challenges, we propose a GAI-based XR content transmission framework which leverages a cloud-edge collaboration architecture. The cloud servers are responsible for storing and rendering the original XR content, while edge servers utilize GAI models to generate essential parts of XR content (e.g., subsequent frames, selected objects, etc.) when network resources are insufficient to transmit them. A Deep Reinforcement Learning (DRL)-based decision module is proposed to solve the decision-making problems. Our case study demonstrates that the proposed GAI-based transmission framework achieves a 2.8-fold increase in normal frame ratio (percentage of frames that meet the quality and latency requirements for XR content transmission) over baseline approaches, underscoring the potential of GAI models to facilitate XR content transmission in the Metaverse. © 2025 IEEE.},
keywords = {'current, Cloud servers, Collaboration architecture, Content transmission, Decision making, Deep learning, Deep reinforcement learning, Edge server, Generative adversarial networks, Intelligence models, Large volumes, Metaverses, Network bottlenecks, Reinforcement Learning, Through current},
pubstate = {published},
tppubtype = {article}
}
Zhang, Z.; Wang, J.; Chen, J.; Fu, H.; Tong, Z.; Jiang, C.
Diffusion-Based Reinforcement Learning for Cooperative Offloading and Resource Allocation in Multi-UAV Assisted Edge-Enabled Metaverse Journal Article
In: IEEE Transactions on Vehicular Technology, 2025, ISSN: 0018-9545.
@article{zhang_diffusion-based_2025,
title = {Diffusion-Based Reinforcement Learning for Cooperative Offloading and Resource Allocation in Multi-UAV Assisted Edge-Enabled Metaverse},
author = {Z. Zhang and J. Wang and J. Chen and H. Fu and Z. Tong and C. Jiang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85219108203&doi=10.1109%2fTVT.2025.3544879&partnerID=40&md5=fdbe1554f6cf7d47d4bbbb73b4b0d487},
doi = {10.1109/TVT.2025.3544879},
issn = {0018-9545},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Vehicular Technology},
abstract = {As one of the typical applications of 6G, the metaverse, with its superior immersion and diversified services, has garnered widespread attention from both the global industry and academia. Simultaneously, the emergence of AI-generated content (AIGC), exemplified by ChatGPT, has revolutionized the means of content creation in the metaverse. Providing metaverse users with diversified AIGC services anytime and anywhere to meet the demand for immersive and blended virtual-real experiences in the physical world has become a major challenge in the development of the metaverse. Considering the flexibility and mobility of unmanned aerial vehicles (UAVs), we innovatively incorporate multiple UAVs as one of the AIGC service providers and construct a multi-UAV assisted edge-enabled metaverse system in the context of the AIGC-as-a-Service (AaaS) scenario. To solve the complex resource management and allocation problem in the aforementioned system, we formulate it as a Markov decision process (MDP) and propose utilizing the generative capabilities of the diffusion model in combination with the robust decision-making abilities of reinforcement learning to tackle these issues. In order to substantiate the efficacy of the proposed diffusion-based reinforcement learning framework, we propose a novel diffusion-based soft actor-critic algorithm for metaverse (Meta-DSAC). Subsequently, a series of experiments are executed and the simulation results empirically validate the proposed algorithm's comparative advantages of the ability to provide stable and substantial long-term rewards, as well as the enhanced capacity to model complex environments. © 2025 IEEE.},
keywords = {Aerial vehicle, Content creation, Content services, Contrastive Learning, Decision making, Deep learning, Deep reinforcement learning, Diffusion Model, Global industry, Helicopter services, Markov processes, Metaverse, Metaverses, Reinforcement Learning, Reinforcement learnings, Resource allocation, Resources allocation, Typical application, Unmanned aerial vehicle, Unmanned aerial vehicle (UAV), Unmanned aerial vehicles (UAV)},
pubstate = {published},
tppubtype = {article}
}
Oskooei, A. Rafiei; Aktaş, M. S.; Keleş, M.
Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation † Journal Article
In: Computers, vol. 14, no. 1, 2025, ISSN: 2073-431X.
@article{rafiei_oskooei_seeing_2025,
title = {Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation †},
author = {A. Rafiei Oskooei and M. S. Aktaş and M. Keleş},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215974883&doi=10.3390%2fcomputers14010007&partnerID=40&md5=f4d244e3e1cba572d2a3beb9c0895d32},
doi = {10.3390/computers14010007},
issn = {2073-431X},
year = {2025},
date = {2025-01-01},
journal = {Computers},
volume = {14},
number = {1},
abstract = {Imagine a future where language is no longer a barrier to real-time conversations, enabling instant and lifelike communication across the globe. As cultural boundaries blur, the demand for seamless multilingual communication has become a critical technological challenge. This paper addresses the lack of robust solutions for real-time face-to-face translation, particularly for low-resource languages, by introducing a comprehensive framework that not only translates language but also replicates voice nuances and synchronized facial expressions. Our research tackles the primary challenge of achieving accurate lip synchronization across culturally diverse languages, filling a significant gap in the literature by evaluating the generalizability of lip sync models beyond English. Specifically, we develop a novel evaluation framework combining quantitative lip sync error metrics and qualitative assessments by human observers. This framework is applied to assess two state-of-the-art lip sync models with different architectures for Turkish, Persian, and Arabic languages, using a newly collected dataset. Based on these findings, we propose and implement a modular system that integrates language-agnostic lip sync models with neural networks to deliver a fully functional face-to-face translation experience. Inference Time Analysis shows this system achieves highly realistic, face-translated talking heads in real time, with a throughput as low as 0.381 s. This transformative framework is primed for deployment in immersive environments such as VR/AR, Metaverse ecosystems, and advanced video conferencing platforms. It offers substantial benefits to developers and businesses aiming to build next-generation multilingual communication systems for diverse applications. While this work focuses on three languages, its modular design allows scalability to additional languages. However, further testing in broader linguistic and cultural contexts is required to confirm its universal applicability, paving the way for a more interconnected and inclusive world where language ceases to hinder human connection. © 2024 by the authors.},
keywords = {Computer vision, Deep learning, face-to-face translation, Generative AI, human–computer interaction, lip synchronization, talking head generation},
pubstate = {published},
tppubtype = {article}
}
Pielage, L.; Schmidle, P.; Marschall, B.; Risse, B.
Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 878–897, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).
@inproceedings{pielage_interactive_2025,
title = {Interactive High-Quality Skin Lesion Generation using Diffusion Models for VR-based Dermatological Education},
author = {L. Pielage and P. Schmidle and B. Marschall and B. Risse},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001923208&doi=10.1145%2f3708359.3712101&partnerID=40&md5=639eec55b08a54ce813f7c1016c621e7},
doi = {10.1145/3708359.3712101},
isbn = {979-840071306-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {878–897},
publisher = {Association for Computing Machinery},
abstract = {Malignant melanoma is one of the most lethal forms of cancer when not detected early. As a result, cancer screening programs have been implemented internationally, all of which require visual inspection of skin lesions. Early melanoma detection is a crucial competence in medical and dermatological education, and it is primarily trained using 2D imagery. However, given the intrinsic 3D nature of skin lesions and the importance of incorporating additional contextual information about the patient (e.g., skin type, nearby lesions, etc.), this approach falls short of providing a comprehensive and scalable learning experience. A potential solution is the use of Virtual Reality (VR) scenarios, which can offer an effective strategy to train skin cancer screenings in a realistic 3D setting, thereby enhancing medical students' awareness of early melanoma detection. In this paper, we present a comprehensive pipeline and models for generating malignant melanomas and benign nevi, which can be utilized in VR-based medical training. We use diffusion models for the generation of skin lesions, which we have enhanced with various guiding strategies to give educators maximum flexibility in designing scenarios and seamlessly placing lesions on virtual agents. Additionally, we have developed a tool which comprises a graphical user interface (GUI) enabling the generation of new lesions and adapting existing ones using an intuitive and interactive inpainting strategy. The tool also offers a novel custom upsampling strategy to achieve a sufficient resolution required for diagnostic purposes. The generated skin lesions have been validated in a user study with trained dermatologists, confirming the overall high quality of the generated lesions and the utility for educational purposes. © 2025 Copyright held by the owner/author(s).},
keywords = {Deep learning, Dermatology, Diffusion Model, diffusion models, Digital elevation model, Generative AI, Graphical user interfaces, Guidance Strategies, Guidance strategy, Image generation, Image generations, Inpainting, Interactive Generation, Medical education, Medical Imaging, Simulation training, Skin lesion, Upsampling, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Alibrahim, Y.; Ibrahim, M.; Gurdayal, D.; Munshi, M.
AI speechbots and 3D segmentations in virtual reality improve radiology on-call training in resource-limited settings Journal Article
In: Intelligence-Based Medicine, vol. 11, 2025, ISSN: 2666-5212.
@article{alibrahim_ai_2025,
title = {AI speechbots and 3D segmentations in virtual reality improve radiology on-call training in resource-limited settings},
author = {Y. Alibrahim and M. Ibrahim and D. Gurdayal and M. Munshi},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001472313&doi=10.1016%2fj.ibmed.2025.100245&partnerID=40&md5=623a0ceaa07e5516a296420d25c3033b},
doi = {10.1016/j.ibmed.2025.100245},
issn = {2666-5212},
year = {2025},
date = {2025-01-01},
journal = {Intelligence-Based Medicine},
volume = {11},
abstract = {Objective: Evaluate the use of large-language model (LLM) speechbot tools and deep learning-assisted generation of 3D reconstructions when integrated in a virtual reality (VR) setting to teach radiology on-call topics to radiology residents. Methods: Three first-year radiology residents in Guyana were enrolled in an 8-week radiology course that focused on preparation for on-call duties. The course was delivered via VR headsets with custom software integrating LLM-powered speechbots trained on imaging reports and 3D reconstructions segmented with the help of a deep learning model. Each session focused on a specific radiology area, employing a didactic and case-based learning approach, enhanced with 3D reconstructions and an LLM-powered speechbot. Post-session, residents reassessed their knowledge and provided feedback on their VR and LLM-powered speechbot experiences. Results/discussion: Residents found that the 3D reconstructions segmented semi-automatically by deep learning algorithms and AI-driven self-learning via speechbot were highly valuable. The 3D reconstructions, especially in the interventional radiology session, were helpful and the benefit is augmented by VR where navigating the models is seamless and perception of depth is pronounced. Residents also found conversing with the AI speechbot seamless and valuable in their post-session self-learning. The major drawback of VR was motion sickness, which was mild and improved over time. Conclusion: AI-assisted VR radiology education could be used to develop new and accessible ways of teaching a variety of radiology topics in a seamless and cost-effective way. This could be especially useful in supporting radiology education remotely in regions which lack local radiology expertise. © 2025},
keywords = {3D segmentation, AI speechbots, Article, artificial intelligence chatbot, ChatGPT, computer assisted tomography, Deep learning, headache, human, Image segmentation, interventional radiology, Large language model, Likert scale, nausea, Proof of concept, prospective study, radiology, radiology on call training, resource limited setting, Teaching, Training, ultrasound, Virtual Reality, voice recognition},
pubstate = {published},
tppubtype = {article}
}
Zhou, J.; Weber, R.; Wen, E.; Lottridge, D.
Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance Proceedings Article
In: Int Conf Intell User Interfaces Proc IUI, pp. 1177–1187, Association for Computing Machinery, 2025, ISBN: 979-840071306-4 (ISBN).
@inproceedings{zhou_real-time_2025,
title = {Real-Time Full-body Interaction with AI Dance Models: Responsiveness to Contemporary Dance},
author = {J. Zhou and R. Weber and E. Wen and D. Lottridge},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001922427&doi=10.1145%2f3708359.3712077&partnerID=40&md5=cea9213198220480b80b7a4840d26ccc},
doi = {10.1145/3708359.3712077},
isbn = {979-840071306-4 (ISBN)},
year = {2025},
date = {2025-01-01},
booktitle = {Int Conf Intell User Interfaces Proc IUI},
pages = {1177–1187},
publisher = {Association for Computing Machinery},
abstract = {Interactive AI chatbots put the power of Large-Language Models (LLMs) into people's hands; it is this interactivity that fueled explosive worldwide influence. In the generative dance space, however, there are few deep-learning-based generative dance models built with interactivity in mind. The release of the AIST++ dance dataset in 2021 led to an uptick of capabilities in generative dance models. Whether these models could be adapted to support interactivity and how well this approach will work is not known. In this study, we explore the capabilities of existing generative dance models for motion-to-motion synthesis on real-time, full-body motion-captured contemporary dance data. We identify an existing model that we adapted to support interactivity: the Bailando++ model, which is trained on the AIST++ dataset and was modified to take music and a motion sequence as input parameters in an interactive loop. We worked with two professional contemporary choreographers and dancers to record and curate a diverse set of 203 motion-captured dance sequences as a set of "user inputs" captured through the Optitrack high-precision motion capture 3D tracking system. We extracted 17 quantitative movement features from the motion data using the well-established Laban Movement Analysis theory, which allowed for quantitative comparisons of inter-movement correlations, which we used for clustering input data and comparing input and output sequences. A total of 10 pieces of music were used to generate a variety of outputs using the adapted Bailando++ model. We found that, on average, the generated output motion achieved only moderate correlations to the user input, with some exceptions of movement and music pairs achieving high correlation. The high-correlation generated output sequences were deemed responsive and relevant co-creations in relation to the input sequences. We discuss implications for interactive generative dance agents, where 3D joint coordinate data should be used over SMPL parameters for ease of real-time generation, and how the use of Laban Movement Analysis could be used to extract useful features and fine-tune deep-learning models. © 2025 Copyright held by the owner/author(s).},
keywords = {3D modeling, Chatbots, Computer interaction, Deep learning, Deep-Learning Dance Model, Design of Human-Computer Interaction, Digital elevation model, Generative AI, Input output programs, Input sequence, Interactivity, Motion capture, Motion tracking, Movement analysis, Output sequences, Problem oriented languages, Real- time, Text mining, Three dimensional computer graphics, User input, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Banafa, A.
Artificial intelligence in action: Real-world applications and innovations Book
River Publishers, 2025, ISBN: 978-877004619-0 (ISBN); 978-877004620-6 (ISBN).
@book{banafa_artificial_2025,
title = {Artificial intelligence in action: Real-world applications and innovations},
author = {A. Banafa},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105000403587&partnerID=40&md5=4b0d94be48194a942b22bef63f36d3bf},
isbn = {978-877004619-0 (ISBN); 978-877004620-6 (ISBN)},
year = {2025},
date = {2025-01-01},
publisher = {River Publishers},
series = {Artificial Intelligence in Action: Real-World Applications and Innovations},
abstract = {This comprehensive book dives deep into the current landscape of AI, exploring its fundamental principles, development challenges, potential risks, and the cutting-edge breakthroughs that are propelling it forward. Artificial intelligence (AI) is rapidly transforming industries and societies worldwide through groundbreaking innovations and real-world applications. Starting with the core concepts, the book examines the various types of AI systems, generative AI models, and the complexities of machine learning. It delves into the programming languages driving AI development, data pipelines, model creation and deployment processes, while shedding light on issues like AI hallucinations and the intricate path of machine unlearning. The book then showcases the remarkable real-world applications of AI across diverse domains. From preventing job displacement and promoting environmental sustainability, to enhancing disaster response, drone technology, and even nuclear energy innovation, it highlights how AI is tackling complex challenges and driving positive change. The book also explores the double-edged nature of AI, recognizing its tremendous potential while cautioning about the risks of misuse, unintended consequences, and the urgent need for responsible development practices. It examines the intersection of AI and fields like operating system design, warfare, and semiconductor technology, underscoring the wide-ranging implications of this transformative force. As the quest for artificial general intelligence (AGI) and superintelligent AI systems intensifies, the book delves into cutting-edge research, emerging trends, and the pursuit of multimodal, explainable, and causally aware AI systems. It explores the symbiotic relationship between AI and human creativity, the rise of user-friendly "casual AI," and the potential of AI to tackle open-ended tasks. This is an essential guide for understanding the profound impact of AI on our world today and its potential to shape our future. From the frontiers of innovation to the challenges of responsible development, this book offers a comprehensive and insightful exploration of the remarkable real-world applications and innovations driving the AI revolution. © 2025 River Publishers. All rights reserved.},
keywords = {5G, Affective Computing, AGI, AI, AI alignments, AI Ethics, AI hallucinations, AI hype, AI models, Alexa, ANI, ASI, Augmented Reality, Autoencoders, Autonomic computing, Autonomous Cars, Autoregressive models, Big Data, Big Data Analytics, Bitcoin, Blockchain, C3PO, Casual AI, Causal reasoning, ChatGPT, Cloud computing, Collective AI, Compression engines, Computer vision, Conditional Automation, Convolutional neural networks (CNNs), Cryptocurrency, Cybersecurity, Deceptive AI, Deep learning, Digital transformation, Driver Assistance, Driverless Cars, Drones, Elon Musk, Entanglement, Environment and sustainability, Ethereum, Explainable AI, Facebook, Facial Recognition, Feedforward Neural Networks, Fog Computing, Full Automation, Future of AI, General AI, Generative Adversarial Networks (GANs), Generative AI, Google, Green AI, High Automation, Hybrid Blockchain, IEEE, Industrial Internet of Things (IIoT), Internet of things (IoT), Jarvis, Java, JavaScript, Long Short-Term Memory Networks, LTE, machine learning, Microsoft, MultiModal AI, Narrow AI, Natural disasters, Natural Language Generation (NLG), Natural Language Processing (NLP), NetFlix, Network Security, Neural Networks, Nuclear, Nuclear AI, NYTimes, Objective-driven AI, Open Source, Partial Automation, PayPal, Perfect AI, Private Blockchain, Private Cloud Computing, Programming languages, Python, Quantum Communications, Quantum Computing, Quantum Cryptography, Quantum internet, Quantum Machine Learning (QML), R2D2, Reactive machines, limited memory, Recurrent Neural Networks, Responsible AI, Robots, Sci-Fi movies, Self-Aware, Semiconductor's, Sensate AI, Siri, Small Data, Smart Contracts, Hybrid Cloud Computing, Smart Devices, Sovereign AI, Super AI, Superposition, TensorFlow, Theory of Mind, Thick Data, Twitter, Variational Autoencoders (VAEs), Virtual Reality, Voice user interface (VUI), Wearable computing devices (WCD), Wearable Technology, Wi-Fi, XAI, Zero-Trust Model},
pubstate = {published},
tppubtype = {book}
}
Fernandez, J. A. V.; Lee, J. J.; Vacca, S. A. S.; Magana, A.; Peša, R.; Benes, B.; Popescu, V.
Hands-Free VR Proceedings Article
In: Bashford-Rogers, T.; Meneveaux, D.; Ammi, M.; Ziat, M.; Jänicke, S.; Purchase, H.; Radeva, P.; Furnari, A.; Bouatouch, K.; Sousa, A. A. (Ed.): Proc. Int. Jt. Conf. Comput. Vis. Imaging Comput. Graph. Theory Appl., pp. 533–542, Science and Technology Publications, Lda, 2025, ISSN: 2184-5921.
@inproceedings{fernandez_hands-free_2025,
title = {Hands-Free VR},
author = {J. A. V. Fernandez and J. J. Lee and S. A. S. Vacca and A. Magana and R. Peša and B. Benes and V. Popescu},
editor = {Bashford-Rogers T. and Meneveaux D. and Ammi M. and Ziat M. and Jänicke S. and Purchase H. and Radeva P. and Furnari A. and Bouatouch K. and Sousa A.A.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001963646&doi=10.5220%2f0013115100003912&partnerID=40&md5=a3f2f4e16bcd5e0579b38e062c987eab},
doi = {10.5220/0013115100003912},
issn = {2184-5921},
year = {2025},
date = {2025-01-01},
booktitle = {Proc. Int. Jt. Conf. Comput. Vis. Imaging Comput. Graph. Theory Appl.},
volume = {1},
pages = {533–542},
publisher = {Science and Technology Publications, Lda},
abstract = {We introduce Hands-Free VR, a voice-based natural-language interface for VR that allows interaction without additional hardware, using voice alone. The user's voice command is converted into text using a fine-tuned speech-to-text deep-learning model. Then, the text is mapped to an executable VR command using an LLM, which is robust to natural language diversity. Hands-Free VR was evaluated in a within-subjects study (N = 22) where participants arranged objects using either a conventional VR interface or Hands-Free VR. The results confirm that Hands-Free VR is: (1) significantly more efficient than conventional VR interfaces in task completion time and user motion metrics; (2) highly rated for ease of use, intuitiveness, ergonomics, reliability, and desirability; (3) robust to English accents (20 participants were non-native speakers) and phonetic similarity, accurately transcribing 96.7% of voice commands; and (4) robust to natural language diversity, mapping 97.83% of transcriptions to executable commands. © 2025 by SCITEPRESS–Science and Technology Publications, Lda.},
keywords = {Deep learning, Large language model, Retrieval-Augmented Generation, Speech-to-Text, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
2024
Williams, R.
Deep HoriXons - 3D Virtual Generative AI Assisted Campus for Deep Learning AI and Cybersecurity Proceedings Article
In: Blowers, M.; Wysocki, B. T. (Ed.): Proc SPIE Int Soc Opt Eng, SPIE, 2024, ISSN: 0277-786X; ISBN: 978-151067434-9.
@inproceedings{williams_deep_2024,
title = {Deep HoriXons - 3D Virtual Generative AI Assisted Campus for Deep Learning AI and Cybersecurity},
author = {R. Williams},
editor = {Blowers M. and Wysocki B.T.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85196555361&doi=10.1117%2f12.3011374&partnerID=40&md5=ff7392a37a51044c79d4d2824c9cf46b},
doi = {10.1117/12.3011374},
issn = {0277-786X},
isbn = {978-151067434-9},
year = {2024},
date = {2024-01-01},
booktitle = {Proc SPIE Int Soc Opt Eng},
volume = {13058},
publisher = {SPIE},
abstract = {This abstract outlines two significant innovations in AI and cybersecurity education within the "Deep HoriXons" 3D virtual campus, addressing the urgent need for skilled professionals in these domains. First, the paper introduces "Deep HoriXons," an immersive 3D virtual learning environment designed to democratize and enhance the educational experience for AI and cybersecurity. This innovation is notable for its global accessibility and ability to simulate real-world scenarios, providing an interactive platform for experiential learning, which is a marked departure from traditional educational models. The second innovation discussed is the strategic integration of ChatGPT as a digital educator and tutor within this virtual environment. ChatGPT's role is pivotal in offering tailored, real-time educational support, making complex AI and cybersecurity concepts more accessible and engaging for learners. This application of ChatGPT is an innovation worth noting for its ability to adapt to individual learning styles, provide interactive scenario-based learning, and support a deeper understanding of technical subjects through dynamic, responsive interaction. Together, these innovations represent a significant advancement in the field of AI and cybersecurity education, addressing the critical talent shortage by making high-quality, interactive learning experiences accessible on a global scale. The paper highlights the importance of these innovations in creating a skilled workforce capable of tackling the evolving challenges in AI and cybersecurity, underscoring the need for ongoing research and development in this area. © 2024 SPIE.},
keywords = {3D virtual campus, AI and cybersecurity education, AI talent pipeline, ChatGPT digital tutor, CompTIA Security+, Computer aided instruction, Cyber security, Cyber-security educations, Cybersecurity, Deep learning, E-Learning, Immersive, Learning systems, Virtual campus, Virtual learning environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Clocchiatti, A.; Fumero, N.; Soccini, A. M.
Character Animation Pipeline based on Latent Diffusion and Large Language Models Proceedings Article
In: Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR, pp. 398–405, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037202-1 (ISBN).
@inproceedings{clocchiatti_character_2024,
title = {Character Animation Pipeline based on Latent Diffusion and Large Language Models},
author = {A. Clocchiatti and N. Fumero and A. M. Soccini},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85187217072&doi=10.1109%2fAIxVR59861.2024.00067&partnerID=40&md5=d88b9ba7c80d49b60fd0d7acd5e7c4f0},
doi = {10.1109/AIxVR59861.2024.00067},
isbn = {979-835037202-1 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - IEEE Int. Conf. Artif. Intell. Ext. Virtual Real., AIxVR},
pages = {398–405},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Artificial intelligence and deep learning techniques are revolutionizing the film production pipeline. The majority of the current screenplay-to-animation pipelines focus on understanding the screenplay through natural language processing techniques, and on the generation of the animation through custom engines, missing the possibility to customize the characters. To address these issues, we propose a high-level pipeline for generating 2D characters and animations starting from screenplays, through a combination of Latent Diffusion Models and Large Language Models. Our approach uses ChatGPT to generate character descriptions starting from the screenplay. Then, using that data, it generates images of custom characters with Stable Diffusion and animates them according to their actions in different scenes. The proposed approach avoids well-known problems in generative AI tools such as temporal inconsistency and lack of control on the outcome. The results suggest that the pipeline is consistent and reliable, benefiting industries ranging from film production to virtual, augmented and extended reality content creation. © 2024 IEEE.},
keywords = {Animation, Animation pipeline, Artificial intelligence, Augmented Reality, Character animation, Computational Linguistics, Computer animation, Deep learning, Diffusion, E-Learning, Extended reality, Film production, Generative art, Language Model, Learning systems, Learning techniques, Natural language processing systems, Pipelines, Production pipelines, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Jayaraman, S.; Bhavya, R.; Srihari, V.; Rajam, V. Mary Anita
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions Proceedings Article
In: IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835037687-6 (ISBN).
@inproceedings{jayaraman_texavi_2024,
title = {TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions},
author = {S. Jayaraman and R. Bhavya and V. Srihari and V. Mary Anita Rajam},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215265234&doi=10.1109%2fCVMI61877.2024.10782691&partnerID=40&md5=8e20576af67b917ecfad83873a87ef29},
doi = {10.1109/CVMI61877.2024.10782691},
isbn = {979-835037687-6 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Comput. Vis. Mach. Intell., CVMI},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {While generative models such as text-to-image, large language models and text-to-video have seen significant progress, the extension to text-to-virtual-reality remains largely unexplored, due to a deficit in training data and the complexity of achieving realistic depth and motion in virtual environments. This paper proposes an approach to coalesce existing generative systems to form a stereoscopic virtual reality video from text. Carried out in three main stages, we start with a base text-to-image model that captures context from an input text. We then employ Stable Diffusion on the rudimentary image produced, to generate frames with enhanced realism and overall quality. These frames are processed with depth estimation algorithms to create left-eye and right-eye views, which are stitched side-by-side to create an immersive viewing experience. Such systems would be highly beneficial in virtual reality production, since filming and scene building often require extensive hours of work and post-production effort. We utilize image evaluation techniques, specifically Fréchet Inception Distance and CLIP Score, to assess the visual quality of frames produced for the video. These quantitative measures establish the proficiency of the proposed method. Our work highlights the exciting possibilities of using natural language-driven graphics in fields like virtual reality simulations. © 2024 IEEE.},
keywords = {Adversarial networks, Computer simulation languages, Deep learning, Depth Estimation, Depth perception, Diffusion Model, diffusion models, Digital elevation model, Generative adversarial networks, Generative model, Generative systems, Language Model, Motion capture, Stereo image processing, Text-to-image, Training data, Video analysis, Video-clips, Virtual environments, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Si, J.; Yang, S.; Song, J.; Son, S.; Lee, S.; Kim, D.; Kim, S.
Generating and Integrating Diffusion Model-Based Panoramic Views for Virtual Interview Platform Proceedings Article
In: IEEE Int. Conf. Artif. Intell. Eng. Technol., IICAIET, pp. 343–348, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 979-835038969-2 (ISBN).
@inproceedings{si_generating_2024,
title = {Generating and Integrating Diffusion Model-Based Panoramic Views for Virtual Interview Platform},
author = {J. Si and S. Yang and J. Song and S. Son and S. Lee and D. Kim and S. Kim},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209663031&doi=10.1109%2fIICAIET62352.2024.10730450&partnerID=40&md5=a52689715ec912c54696948c34fc0263},
doi = {10.1109/IICAIET62352.2024.10730450},
isbn = {979-835038969-2 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {IEEE Int. Conf. Artif. Intell. Eng. Technol., IICAIET},
pages = {343–348},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This paper presents a new approach to improve virtual interview platforms in education, which are gaining significant attention. This study aims to simplify the complex manual process of equipment setup to enhance the realism and reliability of virtual interviews. To this end, this study proposes a method for automatically constructing 3D virtual interview environments using diffusion technology in generative AI. In this research, we exploit a diffusion model capable of generating high-quality panoramic images. We generate images of interview rooms capable of delivering immersive interview experiences via refined text prompts. The resulting imagery is then reconstituted as 3D VR content using the Unity engine, facilitating enhanced interaction and engagement within virtual environments. This research compares and analyzes various methods presented in related research and proposes a new process for efficiently constructing 360-degree virtual environments. When wearing Oculus Quest 2 and experiencing the virtual environment created using the proposed method, a high sense of immersion was experienced, similar to the actual interview environment. © 2024 IEEE.},
keywords = {AI, Deep learning, Diffusion, Diffusion Model, Diffusion technology, Digital elevation model, High quality, Manual process, Model-based OPC, New approaches, Panorama, Panoramic views, Virtual environments, Virtual Interview, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
Zheng, P.; Li, C.; Fan, J.; Wang, L.
A vision-language-guided and deep reinforcement learning-enabled approach for unstructured human-robot collaborative manufacturing task fulfilment Journal Article
In: CIRP Annals, vol. 73, no. 1, pp. 341–344, 2024, ISSN: 0007-8506.
@article{zheng_vision-language-guided_2024,
title = {A vision-language-guided and deep reinforcement learning-enabled approach for unstructured human-robot collaborative manufacturing task fulfilment},
author = {P. Zheng and C. Li and J. Fan and L. Wang},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85190754943&doi=10.1016%2fj.cirp.2024.04.003&partnerID=40&md5=59c453e1931e912472e76b86b77a881b},
doi = {10.1016/j.cirp.2024.04.003},
issn = {0007-8506},
year = {2024},
date = {2024-01-01},
journal = {CIRP Annals},
volume = {73},
number = {1},
pages = {341–344},
abstract = {Human-Robot Collaboration (HRC) has emerged as a pivot in contemporary human-centric smart manufacturing scenarios. However, the fulfilment of HRC tasks in unstructured scenes brings many challenges to be overcome. In this work, mixed reality head-mounted display is modelled as an effective data collection, communication, and state representation interface/tool for HRC task settings. By integrating vision-language cues with large language model, a vision-language-guided HRC task planning approach is firstly proposed. Then, a deep reinforcement learning-enabled mobile manipulator motion control policy is generated to fulfil HRC task primitives. Its feasibility is demonstrated in several HRC unstructured manufacturing tasks with comparative results. © 2024 The Author(s)},
keywords = {Collaboration task, Collaborative manufacturing, Deep learning, Helmet mounted displays, Human robots, Human-centric, Human-guided robot learning, Human-Robot Collaboration, Interface states, Manipulators, Manufacturing system, Manufacturing tasks, Mixed reality, Mixed reality head-mounted displays, Reinforcement Learning, Reinforcement learnings, Robot vision, Smart manufacturing},
pubstate = {published},
tppubtype = {article}
}
Federico, G.; Carrara, F.; Amato, G.; Benedetto, M. Di
Spatio-Temporal 3D Reconstruction from Frame Sequences and Feature Points Proceedings Article
In: ACM Int. Conf. Proc. Ser., pp. 52–64, Association for Computing Machinery, 2024, ISBN: 979-840071794-9 (ISBN).
@inproceedings{federico_spatio-temporal_2024,
title = {Spatio-Temporal 3D Reconstruction from Frame Sequences and Feature Points},
author = {G. Federico and F. Carrara and G. Amato and M. Di Benedetto},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203128613&doi=10.1145%2f3672406.3672415&partnerID=40&md5=2a0dc51baa15f0dcd7f9d2cca708ec15},
doi = {10.1145/3672406.3672415},
isbn = {979-840071794-9 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {ACM Int. Conf. Proc. Ser.},
pages = {52–64},
publisher = {Association for Computing Machinery},
abstract = {Reconstructing a large real environment is a fundamental task to promote eXtended Reality adoption in industrial and entertainment fields. However, the short range of depth cameras, the sparsity of LiDAR sensors, and the huge computational cost of Structure-from-Motion pipelines prevent scene replication in near real time. To overcome these limitations, we introduce a spatio-temporal diffusion neural architecture, a generative AI technique that fuses temporal information (i.e., a short temporally-ordered list of color photographs, like sparse frames of a video stream) with an approximate spatial resemblance of the explored environment. Our aim is to modify an existing 3D diffusion neural model to produce a Signed Distance Field volume from which a 3D mesh representation can be extracted. Our results show that the hallucination approach of diffusion models is an effective methodology where a fast reconstruction is a crucial target. © 2024 Owner/Author.},
keywords = {3D reconstruction, Adversarial machine learning, Artificial intelligence, Color motion pictures, Color photography, Contrastive Learning, De-noising, Deep learning, Denoising Diffusion Probabilistic Model, Frame features, machine learning, Machine-learning, Probabilistic models, Signed Distance Field, Signed distance fields, Spatio-temporal, Video Reconstruction, Video streaming},
pubstate = {published},
tppubtype = {inproceedings}
}
He, K.; Lapham, A.; Li, Z.
Enhancing Narratives with SayMotion's text-to-3D animation and LLMs Proceedings Article
In: Spencer, S. N. (Ed.): Proc. - SIGGRAPH Real-Time Live!, Association for Computing Machinery, Inc, 2024, ISBN: 979-840070526-7.
@inproceedings{he_enhancing_2024,
title = {Enhancing Narratives with SayMotion's text-to-3D animation and LLMs},
author = {K. He and A. Lapham and Z. Li},
editor = {Spencer S.N.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200655076&doi=10.1145%2f3641520.3665309&partnerID=40&md5=458f935043e3372e633ed5fc13bf6cd7},
doi = {10.1145/3641520.3665309},
isbn = {979-840070526-7 (ISBN)},
year = {2024},
date = {2024-01-01},
booktitle = {Proc. - SIGGRAPH Real-Time Live!},
publisher = {Association for Computing Machinery, Inc},
abstract = {SayMotion, a generative AI text-to-3D animation platform, utilizes deep generative learning and advanced physics simulation to transform text descriptions into realistic 3D human motions for applications in gaming, extended reality (XR), film production, education and interactive media. SayMotion addresses challenges due to the complexities of animation creation by employing a Large Language Model (LLM) fine-tuned to human motion with further AI-based animation editing components including spatial-temporal Inpainting via a proprietary Large Motion Model (LMM). SayMotion is a pioneer in the animation market by offering a comprehensive set of AI generation and AI editing functions for creating 3D animations efficiently and intuitively. With an LMM at its core, SayMotion aims to democratize 3D animations for everyone through language and generative motion. © 2024 Owner/Author.},
keywords = {3D animation, AI-based animation, Animation, Animation editing, Deep learning, Film production, Human motions, Interactive computer graphics, Interactive media, Language Model, Motion models, Physics simulation, Production medium, Simulation platform, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Park, J.; Choi, J.; Kim, S. -L.; Bennis, M.
Enabling the Wireless Metaverse via Semantic Multiverse Communication Proceedings Article
In: Annu. IEEE Commun.Soc. Conf. Sens., Mesh Ad Hoc Commun. Netw. workshops, pp. 85–90, IEEE Computer Society, 2023, ISSN: 2155-5486; ISBN: 979-835030052-9.
@inproceedings{park_enabling_2023,
title = {Enabling the Wireless Metaverse via Semantic Multiverse Communication},
author = {J. Park and J. Choi and S. -L. Kim and M. Bennis},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85177465286&doi=10.1109%2fSECON58729.2023.10287438&partnerID=40&md5=b052572fb2f78ce0694c7ae5726c8daf},
doi = {10.1109/SECON58729.2023.10287438},
issn = {2155-5486},
isbn = {979-835030052-9},
year = {2023},
date = {2023-01-01},
booktitle = {Annu. IEEE Commun.Soc. Conf. Sens., Mesh Ad Hoc Commun. Netw. workshops},
volume = {2023-September},
pages = {85–90},
publisher = {IEEE Computer Society},
abstract = {Metaverse over wireless networks is an emerging use case of the sixth generation (6G) wireless systems, posing unprecedented challenges in terms of its multi-modal data transmissions with stringent latency and reliability requirements. Towards enabling this wireless metaverse, in this article we propose a novel semantic communication (SC) framework by decomposing the metaverse into human/machine agent-specific semantic multiverses (SMs). An SM stored at each agent comprises a semantic encoder and a generator, leveraging recent advances in generative artificial intelligence (AI). To improve communication efficiency, the encoder learns the semantic representations (SRs) of multi-modal data, while the generator learns how to manipulate them for locally rendering scenes and interactions in the metaverse. Since these learned SMs are biased towards local environments, their success hinges on synchronizing heterogeneous SMs in the background while communicating SRs in the foreground, turning the wireless metaverse problem into the problem of semantic multiverse communication (SMC). Based on this SMC architecture, we propose several promising algorithmic and analytic tools for modeling and designing SMC, ranging from distributed learning and multi-agent reinforcement learning (MARL) to signaling games and symbolic AI. © 2023 IEEE.},
keywords = {Deep learning, Extended reality (XR), Federated learning, Fertilizers, Learn+, Learning systems, Metaverse, Metaverses, Modal analysis, Multi agent systems, Multi-agent reinforcement learning, Multi-modal data, Reinforcement Learning, Semantic communication, Semantics, Signal encoding, Signaling game, Split learning, Symbolic artificial intelligence},
pubstate = {published},
tppubtype = {inproceedings}
}
Banafa, A.
Transformative AI: Responsible, Transparent, and Trustworthy AI Systems Book
River Publishers, 2023, ISBN: 978-877004018-1 (ISBN); 978-877004019-8 (ISBN).
@book{banafa_transformative_2023,
title = {Transformative AI: Responsible, Transparent, and Trustworthy AI Systems},
author = {A. Banafa},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85180544759&partnerID=40&md5=c1fcd00f4b40e16156d9877185f66554},
isbn = {978-877004018-1 (ISBN); 978-877004019-8 (ISBN)},
year = {2023},
date = {2023-01-01},
publisher = {River Publishers},
series = {Transformative AI: Responsible, Transparent, and Trustworthy AI Systems},
abstract = {Transformative AI provides a comprehensive overview of the latest trends, challenges, applications, and opportunities in the field of Artificial Intelligence. The book covers the state of the art in AI research, including machine learning, natural language processing, computer vision, and robotics, and explores how these technologies are transforming various industries and domains, such as healthcare, finance, education, and entertainment. The book also addresses the challenges that come with the widespread adoption of AI, including ethical concerns, bias, and the impact on jobs and society. It provides insights into how to mitigate these challenges and how to design AI systems that are responsible, transparent, and trustworthy. The book offers a forward-looking perspective on the future of AI, exploring the emerging trends and applications that are likely to shape the next decade of AI innovation. It also provides practical guidance for businesses and individuals on how to leverage the power of AI to create new products, services, and opportunities. Overall, the book is an essential read for anyone who wants to stay ahead of the curve in the rapidly evolving field of Artificial Intelligence and understand the impact that this transformative technology will have on our lives in the coming years. © 2024 River Publishers. All rights reserved.},
keywords = {5G, Affective Computing, AI, AI Ethics, Alexa, Augmented Reality, Autoencoders, Autonomous Cars, Autoregressive models, Big Data, Big Data Analytics, Bitcoin, Blockchain, C3PO, ChatGPT, Cloud computing, CNN, Computer vision, Conditional Automation, Convolutional Neural Networks, Cryptocurrency, Cybersecurity, Deep learning, Digital transformation, Driver Assistance, Driverless Cars, Entanglement, Ethereum, Explainable AI, Environment and sustainability, Facebook, Facial Recognition, Feedforward Neural Networks, Fog Computing, Full Automation, General AI, Generative Adversarial Networks (GANs), Generative AI, Google, High Automation, Hybrid Blockchain, IEEE, IIoT, Industrial Internet of Things, Internet of Things, IoT, Jarvis, Long Short-Term Memory Networks, LTE, Machine Learning, Microsoft, Narrow AI, Natural Language Generation (NLG), Natural Language Processing (NLP), NetFlix, Network Security, Neural Networks, NYTimes, Open Source, Partial Automation, PayPal, Private Blockchain, Private Cloud Computing, Quantum Communications, Quantum Computing, Quantum Cryptography, Quantum Internet, Wearable Computing Devices (WCD), Autonomic Computing, Quantum Machine Learning (QML), R2D2, Reactive Machines, Limited Memory, Recurrent Neural Networks, Robots, Sci-Fi movies, Self-Aware, Siri, Small Data, Smart Contracts, Hybrid Cloud Computing, Smart Devices, Super AI, Superposition, Theory of Mind, Thick Data, Twitter, Variational Autoencoders (VAEs), Virtual Reality, Voice User Interface, VUI, Wearable Technology, Wi-Fi, Zero-Trust Model},
pubstate = {published},
tppubtype = {book}
}
Wang, J.; Chen, S.; Liu, Y.; Lau, R.
Intelligent Metaverse Scene Content Construction Journal Article
In: IEEE Access, vol. 11, pp. 76222–76241, 2023, ISSN: 21693536 (ISSN).
Abstract | Links | BibTeX | Tags: Bridges, Content generation, Contents constructions, Current situation, Deep learning, immersive visualization, Intelligent Agents, Metaverse, Metaverses, Solid modelling, Three dimensional computer graphics, Three dimensional displays, Three-dimensional display, Virtual Reality, Visual content, Visualization
@article{wang_intelligent_2023,
title = {Intelligent Metaverse Scene Content Construction},
author = {J. Wang and S. Chen and Y. Liu and R. Lau},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85165350593&doi=10.1109%2fACCESS.2023.3297873&partnerID=40&md5=6004d639bc6313f19a1276588c6d092c},
doi = {10.1109/ACCESS.2023.3297873},
issn = {21693536 (ISSN)},
year = {2023},
date = {2023-01-01},
journal = {IEEE Access},
volume = {11},
pages = {76222–76241},
abstract = {The integration of artificial intelligence (AI) and virtual reality (VR) has revolutionized research across various scientific fields, with AI-driven VR simulations finding applications in education, healthcare, and entertainment. However, existing literature lacks a comprehensive investigation that systematically summarizes the fundamental characteristics and development trajectory of AI-generated visual content in the metaverse. This survey focuses on intelligent metaverse scene content construction, aiming to address this gap by exploring the application of AI in content generation. It investigates scene content generation, simulation biology, personalized content, and intelligent agents. Analyzing the current state and identifying common features, this survey provides a detailed description of methods for constructing intelligent metaverse scenes. The primary contribution is a comprehensive analysis of the current landscape of intelligent visual content production in the metaverse, highlighting emerging trends. The discussion on methods for constructing intelligent scene content in the metaverse suggests that in the era of intelligence, it has the potential to become the dominant approach for content creation in metaverse scenes. © 2013 IEEE.},
keywords = {Bridges, Content generation, Contents constructions, Current situation, Deep learning, immersive visualization, Intelligent Agents, Metaverse, Metaverses, Solid modelling, Three dimensional computer graphics, Three dimensional displays, Three-dimensional display, Virtual Reality, Visual content, Visualization},
pubstate = {published},
tppubtype = {article}
}
Stacchio, L.; Scorolli, C.; Marfia, G.
Evaluating Human Aesthetic and Emotional Aspects of 3D generated content through eXtended Reality Proceedings Article
In: A., De Filippo; M., Milano; V., Presutti; A., Saffiotti (Ed.): CEUR Workshop Proc., pp. 38–49, CEUR-WS, 2023, ISSN: 16130073 (ISSN).
Abstract | Links | BibTeX | Tags: aesthetic evaluation, Creative industries, Deep learning, Effective tool, Emotional aspect, Entertainment industry, Esthetic evaluation, Extended reality, generative artificial intelligence, Human-in-the-loop, Learning systems, Metaverses, Multimedia contents, Production efficiency, Three dimensional computer graphics, Virtual Reality
@inproceedings{stacchio_evaluating_2023,
title = {Evaluating Human Aesthetic and Emotional Aspects of 3D generated content through eXtended Reality},
author = {L. Stacchio and C. Scorolli and G. Marfia},
editor = {De Filippo A. and Milano M. and Presutti V. and Saffiotti A.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85176617276&partnerID=40&md5=14d9d23320d6ed236cbb4b0c562bec06},
issn = {16130073 (ISSN)},
year = {2023},
date = {2023-01-01},
booktitle = {CEUR Workshop Proc.},
volume = {3519},
pages = {38–49},
publisher = {CEUR-WS},
abstract = {The Metaverse era is rapidly shaping novel and effective tools particularly useful in the entertainment and creative industry. A fundamental role is played by modern generative deep learning models, which can be used to provide varied and high-quality multimedia content, considerably lowering costs while increasing production efficiency. The quality of such models is usually evaluated quantitatively with established metrics on data and with humans using simple constructs such as the Mean Opinion Score. However, these scales and scores do not take into account the aesthetic and emotional components, which could play a role in positively controlling the automatic generation of multimedia content while at the same time introducing novel forms of human-in-the-loop in generative deep learning. Furthermore, considering data such as 3D models/scenes and 360° panorama images and videos, conventional display hardware may not be the most effective means for human evaluation. A first solution to this problem could consist of employing eXtended Reality paradigms and devices. Considering all these aspects, we here discuss a recent contribution that adopted a well-known scale to evaluate the aesthetic and emotional experience of watching a 360° video of a musical concert in Virtual Reality (VR) compared to a classical 2D webstream, showing that adopting a fully immersive VR experience could be a possible path to follow. © 2023 CEUR-WS. All rights reserved.},
keywords = {aesthetic evaluation, Creative industries, Deep learning, Effective tool, Emotional aspect, Entertainment industry, Esthetic evaluation, Extended reality, generative artificial intelligence, Human-in-the-loop, Learning systems, Metaverses, Multimedia contents, Production efficiency, Three dimensional computer graphics, Virtual Reality},
pubstate = {published},
tppubtype = {inproceedings}
}
2021
Franchini, Silvia; Vitabile, Salvatore
Geometric Calculus Applications to Medical Imaging: Status and Perspectives Proceedings Article
In: Xambó-Descamps, Sebastià (Ed.): Systems, Patterns and Data Engineering with Geometric Calculi, pp. 31–46, Springer International Publishing, Cham, 2021, ISBN: 978-3-030-74486-1.
Abstract | Links | BibTeX | Tags: 3D modeling, Clifford algebra, Deep learning, Geometric algebra, Geometric Calculus, Medical image classification, Medical image registration, Medical image segmentation, Medical Imaging, radiomics
@inproceedings{franchini_geometric_2021,
title = {Geometric Calculus Applications to Medical Imaging: Status and Perspectives},
author = {Silvia Franchini and Salvatore Vitabile},
editor = {Sebastià Xambó-Descamps},
doi = {10.1007/978-3-030-74486-1_3},
isbn = {978-3-030-74486-1},
year = {2021},
date = {2021-01-01},
booktitle = {Systems, Patterns and Data Engineering with Geometric Calculi},
pages = {31–46},
publisher = {Springer International Publishing},
address = {Cham},
series = {SEMA SIMAI Springer Series},
abstract = {Medical imaging data coming from different acquisition modalities requires automatic tools to extract useful information and support clinicians in the formulation of accurate diagnoses. Geometric Calculus (GC) offers a powerful mathematical and computational model for the development of effective medical imaging algorithms. The practical use of GC-based methods in medical imaging requires fast and efficient implementations to meet real-time processing constraints as well as accuracy and robustness requirements. The purpose of this article is to present the state of the art of GC-based techniques for medical image analysis and processing. The use of GC-based paradigms in Radiomics and Deep Learning, i.e., the comprehensive quantification of tumor phenotypes through a large number of quantitative image features and their classification, is also outlined.},
keywords = {3D modeling, Clifford algebra, Deep learning, Geometric algebra, Geometric Calculus, Medical image classification, Medical image registration, Medical image segmentation, Medical Imaging, radiomics},
pubstate = {published},
tppubtype = {inproceedings}
}
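As a side note for readers unfamiliar with Geometric Calculus, the sketch below (not code from the chapter) shows the kind of primitive such methods build on: rotating a 3D point with a rotor via the sandwich product, as used for example in rotor-based rigid registration. It relies on the open-source clifford Python package; the point coordinates and the rotation angle are invented for illustration.
# Illustrative sketch only: rotor-based rotation in 3D geometric algebra,
# a basic building block of GC-based medical image registration.
import math
from clifford.g3 import e1, e2, e3  # orthonormal basis vectors of the 3D geometric algebra

def make_rotor(plane_bivector, angle):
    # Rotor R = cos(angle/2) - sin(angle/2) * B for a unit bivector B (the rotation plane).
    return math.cos(angle / 2) - math.sin(angle / 2) * plane_bivector

def rotate(point, rotor):
    # Sandwich product R v ~R applies the rotation; ~R is the reverse of the rotor.
    return rotor * point * ~rotor

p = 1.0 * e1 + 2.0 * e2 + 0.5 * e3    # a hypothetical landmark point (invented coordinates)
R = make_rotor(e1 ^ e2, math.pi / 2)  # 90-degree rotation in the e1-e2 plane
print(rotate(p, R))                   # approximately -2.0*e1 + 1.0*e2 + 0.5*e3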
2017
Augello, Agnese; Cipolla, Emanuele; Infantino, Ignazio; Manfrè, Adriano; Pilato, Giovanni; Vella, Filippo
Creative Robot Dance with Variational Encoder Proceedings Article
In: A., Jordanous; A., Pease; A., Goel (Ed.): Proceedings of the 8th International Conference on Computational Creativity, ICCC 2017, Georgia Institute of Technology, 2017, ISBN: 978-0-692-89564-1.
Abstract | BibTeX | Tags: Anthropomorphic Robots, Computational Creativity, Creative Agents, Deep learning, Robotics
@inproceedings{augelloCreativeRobotDance2017,
title = {Creative Robot Dance with Variational Encoder},
author = {Agnese Augello and Emanuele Cipolla and Ignazio Infantino and Adriano Manfrè and Giovanni Pilato and Filippo Vella},
editor = {Jordanous A. and Pease A. and Goel A.},
isbn = {978-0-692-89564-1},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 8th International Conference on Computational Creativity, ICCC 2017},
publisher = {Georgia Institute of Technology},
abstract = {What we appreciate in dance is the ability of people to spontaneously improvise new movements and choreographies, surrendering to the music rhythm, inspired by current perceptions and sensations and by previous experiences deeply stored in their memory. Like other human abilities, this is, of course, challenging to reproduce in an artificial entity such as a robot. Recent generations of anthropomorphic robots, the so-called humanoids, however, exhibit more and more sophisticated skills and have raised interest in the robotics community in designing and experimenting with systems devoted to automatic dance generation. In this work, we highlight the importance of modeling computational creativity in dancing robots to avoid the mere execution of preprogrammed dances. In particular, we exploit a deep learning approach that allows a robot to generate new dancing movements in real time according to the music it listens to. © ICCC 2017.},
keywords = {Anthropomorphic Robots, Computational Creativity, Creative Agents, Deep learning, Robotics},
pubstate = {published},
tppubtype = {inproceedings}
}
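The generation step described in the entry above relies on a variational encoder. The sketch below is not the authors' implementation; it is a generic PyTorch variational autoencoder over pose vectors, meant only to illustrate how sampling a learned latent space yields new movement frames. The pose dimensionality, layer widths, and latent size are arbitrary assumptions.
# Generic VAE sketch (PyTorch), not the authors' system: encode pose vectors into a
# latent space, then sample that space to generate new movement frames.
import torch
import torch.nn as nn

class PoseVAE(nn.Module):
    def __init__(self, pose_dim=25, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, pose_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the standard normal prior.
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# After training on recorded poses, novel movements come from decoding random latents,
# e.g. model.decoder(torch.randn(1, 8)).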
Vella, Filippo; Augello, Agnese; Maniscalco, Umberto; Bentivenga, Vincenzo; Gaglio, Salvatore
Classification of Indoor Actions through Deep Neural Networks Proceedings Article
In: R., Chbeir; A., Dipanda; G., De Pietro (Ed.): Proceedings - 12th International Conference on Signal Image Technology and Internet-Based Systems, SITIS 2016, pp. 82–87, Institute of Electrical and Electronics Engineers Inc., 2017, ISBN: 978-1-5090-5698-9.
Abstract | Links | BibTeX | Tags: Action Recognition, Convolutional Neural Networks, Deep learning, RGB-D
@inproceedings{vella_classification_2017,
title = {Classification of Indoor Actions through Deep Neural Networks},
author = {Filippo Vella and Agnese Augello and Umberto Maniscalco and Vincenzo Bentivenga and Salvatore Gaglio},
editor = {Chbeir R. and Dipanda A. and De Pietro G.},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85019213644&doi=10.1109%2fSITIS.2016.22&partnerID=40&md5=329d35941a322add5df73469e33e0f07},
doi = {10.1109/SITIS.2016.22},
isbn = {978-1-5090-5698-9},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings - 12th International Conference on Signal Image Technology and Internet-Based Systems, SITIS 2016},
pages = {82–87},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {The rising number of elderly people calls for systems able to monitor and support them inside their domestic environment. An automatic system that captures data about a person's position in the house, through accelerometers and RGB-D cameras, can monitor the person's activities and produce outputs associating the movements with given tasks or predicting the set of activities that will be executed. For the task of classifying these activities, we considered a Deep Convolutional Neural Network. We compared two different deep networks and analyzed their outputs. © 2016 IEEE.},
keywords = {Action Recognition, Convolutional Neural Networks, Deep learning, RGB-D},
pubstate = {published},
tppubtype = {inproceedings}
}
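For readers who want a concrete picture of the setup in the entry above, the following is a minimal sketch (not one of the networks compared in the paper) of a convolutional classifier over 4-channel RGB-D frames in PyTorch. The input resolution, channel widths, and number of activity classes are illustrative assumptions.
# Minimal sketch: a small CNN that maps a 4-channel RGB-D frame to indoor activity classes.
import torch
import torch.nn as nn

class RGBDActionNet(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                      # x: (batch, 4, 64, 64) RGB-D frames
        h = self.features(x)
        return self.classifier(h.flatten(1))   # class logits

net = RGBDActionNet()
frames = torch.randn(2, 4, 64, 64)             # two dummy RGB-D frames
print(net(frames).shape)                       # torch.Size([2, 6])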