AHCI RESEARCH GROUP
Publications
Papers published in international journals, proceedings of conferences, workshops, and books.
2025
Tracy, K.; Spantidi, O.
Impact of GPT-Driven Teaching Assistants in VR Learning Environments Journal Article
In: IEEE Transactions on Learning Technologies, vol. 18, pp. 192–205, 2025, ISSN: 1939-1382.
@article{tracy_impact_2025,
title = {Impact of GPT-Driven Teaching Assistants in VR Learning Environments},
author = {K. Tracy and O. Spantidi},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001083336&doi=10.1109%2fTLT.2025.3539179&partnerID=40&md5=34fea4ea8517a061fe83b8294e1a9a87},
doi = {10.1109/TLT.2025.3539179},
issn = {1939-1382},
year = {2025},
date = {2025-01-01},
journal = {IEEE Transactions on Learning Technologies},
volume = {18},
pages = {192–205},
abstract = {Virtual reality (VR) has emerged as a transformative educational tool, enabling immersive learning environments that promote student engagement and understanding of complex concepts. However, despite the growing adoption of VR in education, there remains a significant gap in research exploring how generative artificial intelligence (AI), such as generative pretrained transformers (GPTs), can further enhance these experiences by reducing cognitive load and improving learning outcomes. This study examines the impact of an AI-driven instructor assistant in VR classrooms on student engagement, cognitive load, knowledge retention, and performance. A total of 52 participants were divided into two groups experiencing a VR lesson on the bubble sort algorithm, one with only a prescripted virtual instructor (control group), and the other with the addition of an AI instructor assistant (experimental group). Statistical analysis of postlesson quizzes and cognitive load assessments was conducted using independent t-tests and analysis of variance (ANOVA), with the cognitive load being measured through a postexperiment questionnaire. The study results indicate that the experimental group reported significantly higher engagement compared to the control group. While the AI assistant did not significantly improve postlesson assessment scores, it enhanced conceptual knowledge transfer. The experimental group also demonstrated lower intrinsic cognitive load, suggesting the assistant reduced the perceived complexity of the material. Higher germane and general cognitive loads indicated that students were more invested in meaningful learning without feeling overwhelmed.},
keywords = {Adversarial machine learning, Cognitive loads, Computer interaction, Contrastive Learning, Control groups, Experimental groups, Federated learning, Generative AI, Generative artificial intelligence (GenAI), human–computer interaction, Interactive learning environment, interactive learning environments, Learning efficacy, Learning outcome, learning outcomes, Student engagement, Teaching assistants, Virtual environments, Virtual Reality (VR)},
pubstate = {published},
tppubtype = {article}
}
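The group comparison described in the abstract above (independent t-tests on postlesson quiz scores, ANOVA across cognitive-load measures) follows a standard pattern. The Python sketch below illustrates that pattern with SciPy; the scores and load ratings are synthetic placeholders, not the study's data, and this is not the authors' analysis pipeline.

    # Illustrative sketch of the between-group comparison described in the
    # abstract: an independent t-test on quiz scores and a one-way ANOVA on
    # cognitive-load subscales. All numbers are made-up placeholders.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical postlesson quiz scores, 26 per group (52 participants
    # split into control and experimental, as in the study design).
    control = rng.normal(loc=72.0, scale=8.0, size=26)       # scripted instructor only
    experimental = rng.normal(loc=75.0, scale=8.0, size=26)  # + GPT-driven assistant

    # Independent two-sample t-test on quiz performance.
    t_stat, p_value = stats.ttest_ind(experimental, control)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

    # One-way ANOVA across hypothetical cognitive-load subscales
    # (intrinsic, extraneous, germane) for the experimental group.
    intrinsic = rng.normal(3.0, 0.8, 26)
    extraneous = rng.normal(2.5, 0.8, 26)
    germane = rng.normal(4.1, 0.8, 26)
    f_stat, p_anova = stats.f_oneway(intrinsic, extraneous, germane)
    print(f"F = {f_stat:.2f}, p = {p_anova:.3f}")

With 26 participants per group, ttest_ind reports whether the observed mean difference exceeds what chance variation would explain, and f_oneway plays the same role across the three load subscales.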
Oskooei, A. Rafiei; Aktaş, M. S.; Keleş, M.
Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation † Journal Article
In: Computers, vol. 14, no. 1, 2025, ISSN: 2073-431X.
@article{rafiei_oskooei_seeing_2025,
title = {Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation †},
author = {A. Rafiei Oskooei and M. S. Aktaş and M. Keleş},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85215974883&doi=10.3390%2fcomputers14010007&partnerID=40&md5=f4d244e3e1cba572d2a3beb9c0895d32},
doi = {10.3390/computers14010007},
issn = {2073-431X},
year = {2025},
date = {2025-01-01},
journal = {Computers},
volume = {14},
number = {1},
abstract = {Imagine a future where language is no longer a barrier to real-time conversations, enabling instant and lifelike communication across the globe. As cultural boundaries blur, the demand for seamless multilingual communication has become a critical technological challenge. This paper addresses the lack of robust solutions for real-time face-to-face translation, particularly for low-resource languages, by introducing a comprehensive framework that not only translates language but also replicates voice nuances and synchronized facial expressions. Our research tackles the primary challenge of achieving accurate lip synchronization across culturally diverse languages, filling a significant gap in the literature by evaluating the generalizability of lip sync models beyond English. Specifically, we develop a novel evaluation framework combining quantitative lip sync error metrics and qualitative assessments by human observers. This framework is applied to assess two state-of-the-art lip sync models with different architectures for Turkish, Persian, and Arabic languages, using a newly collected dataset. Based on these findings, we propose and implement a modular system that integrates language-agnostic lip sync models with neural networks to deliver a fully functional face-to-face translation experience. Inference Time Analysis shows this system achieves highly realistic, face-translated talking heads in real time, with a throughput as low as 0.381 s. This transformative framework is primed for deployment in immersive environments such as VR/AR, Metaverse ecosystems, and advanced video conferencing platforms. It offers substantial benefits to developers and businesses aiming to build next-generation multilingual communication systems for diverse applications. While this work focuses on three languages, its modular design allows scalability to additional languages. However, further testing in broader linguistic and cultural contexts is required to confirm its universal applicability, paving the way for a more interconnected and inclusive world where language ceases to hinder human connection.},
keywords = {Computer vision, Deep learning, face-to-face translation, Generative AI, human–computer interaction, lip synchronization, talking head generation},
pubstate = {published},
tppubtype = {article}
}
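The inference-time analysis mentioned in the abstract above amounts to timing each stage of a translate, synthesize, lip-sync pipeline as well as the end-to-end pass. The Python sketch below shows a minimal version of that measurement; the three stage functions (translate, synthesize_speech, lip_sync) are hypothetical stand-ins, not the authors' system or any real model API.

    # Minimal timing harness for a face-to-face translation pipeline of the
    # kind the abstract describes. The stage bodies are placeholders; a real
    # system would call a speech-translation model, a voice-cloning TTS
    # model, and a lip sync model here.
    import time

    def translate(audio_chunk: bytes) -> str:
        return "translated text"   # placeholder for a speech-translation model

    def synthesize_speech(text: str) -> bytes:
        return b"speech"           # placeholder for a TTS / voice-cloning model

    def lip_sync(frames: list, speech: bytes) -> list:
        return frames              # placeholder for a lip sync model

    def timed(name, fn, *args):
        """Run one pipeline stage and print its wall-clock latency."""
        start = time.perf_counter()
        out = fn(*args)
        print(f"{name}: {time.perf_counter() - start:.3f} s")
        return out

    if __name__ == "__main__":
        frames, audio = [object()] * 25, b"input"   # ~1 s of 25 fps video
        start = time.perf_counter()
        text = timed("translate", translate, audio)
        speech = timed("tts", synthesize_speech, text)
        video = timed("lip_sync", lip_sync, frames, speech)
        # Compare against the ~0.381 s end-to-end figure the paper reports.
        print(f"end-to-end: {time.perf_counter() - start:.3f} s")

Using time.perf_counter rather than time.time keeps the measurement monotonic and high-resolution, which matters when individual stages run in tens of milliseconds.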