AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
2025
Kai, W.-H.; Xing, K.-X.
Video-driven musical composition using large language model with memory-augmented state space Journal Article
In: Visual Computer, vol. 41, no. 5, pp. 3345–3357, 2025, ISSN: 0178-2789.
Abstract | Links | BibTeX | Tags: current, Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space
@article{kai_video-driven_2025,
title = {Video-driven musical composition using large language model with memory-augmented state space},
author = {W.-H. Kai and K.-X. Xing},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105001073242&doi=10.1007%2fs00371-024-03606-w&partnerID=40&md5=7ea24f13614a9a24caf418c37a10bd8c},
doi = {10.1007/s00371-024-03606-w},
issn = {0178-2789},
year = {2025},
date = {2025-01-01},
journal = {Visual Computer},
volume = {41},
number = {5},
pages = {3345–3357},
abstract = {The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. However, research on LLMs for music inspiration is still in its infancy. To fill the gap in this field and break through the dilemma that LLMs can only understand short videos with limited frames, we propose a large language model with state space for long-term video-to-music generation. To capture long-range dependencies and maintain high performance while further decreasing the computing cost, our overall network includes the Enhanced Video Mamba, which incorporates continuous moving window partitioning and local feature augmentation, and a long-term memory bank that captures and aggregates historical video information to mitigate information loss in long sequences. This framework achieves both subquadratic-time computation and near-linear memory complexity, enabling effective long-term video-to-music generation. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models. Our code is released at https://github.com/kai211233/S2L2-V2M. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.},
keywords = {current, Associative storage, Augmented Reality, Augmented state space, Computer simulation languages, Computer system recovery, Distributed computer systems, HTTP, Language Model, Large language model, Long-term video-to-music generation, Mamba, Memory architecture, Memory-augmented, Modeling languages, Music, Musical composition, Natural language processing systems, Object oriented programming, Performance, Problem oriented languages, State space, State-space},
pubstate = {published},
tppubtype = {article}
}
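The abstract above describes a memory-augmented state-space design: a recurrent state-space pass over per-clip video features combined with a long-term memory bank that aggregates historical context. Below is a minimal sketch of that general pattern, assuming PyTorch; the module name, dimensions, FIFO memory policy, and attention-based memory read are illustrative assumptions, not the authors' implementation (their released code is linked above).

import torch
import torch.nn as nn

class MemoryAugmentedSSM(nn.Module):
    """Illustrative stand-in for a memory-augmented state-space video encoder."""
    def __init__(self, dim: int, state_dim: int = 64, memory_slots: int = 32):
        super().__init__()
        self.A = nn.Parameter(torch.randn(state_dim, state_dim) * 0.01)  # state transition
        self.B = nn.Linear(dim, state_dim)                               # input projection
        self.C = nn.Linear(state_dim, dim)                               # output projection
        self.memory_slots = memory_slots
        # Attention read over the memory bank (assumed mechanism, not from the paper).
        self.read = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, num_clips, dim) -- one pooled feature vector per video clip.
        batch, num_clips, _ = clips.shape
        h = clips.new_zeros(batch, self.A.shape[0])
        memory, outputs = [], []
        for t in range(num_clips):
            h = torch.tanh(h @ self.A.T + self.B(clips[:, t]))   # recurrent state-space step
            y_t = self.C(h)
            if memory:                                            # aggregate historical context
                bank = torch.stack(memory, dim=1)                 # (batch, slots, dim)
                ctx, _ = self.read(y_t.unsqueeze(1), bank, bank)
                y_t = y_t + ctx.squeeze(1)
            memory = (memory + [y_t.detach()])[-self.memory_slots:]  # bounded FIFO bank
            outputs.append(y_t)
        return torch.stack(outputs, dim=1)                        # (batch, num_clips, dim)

# Example: encode a long video as 100 clip features of size 256.
model = MemoryAugmentedSSM(dim=256)
features = model(torch.randn(2, 100, 256))   # -> shape (2, 100, 256)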
2024
Sehgal, V.; Sekaran, N.
Virtual Recording Generation Using Generative AI and Carla Simulator Proceedings Article
In: SAE Technical Papers, SAE International, 2024, ISSN: 0148-7191.
Abstract | Links | BibTeX | Tags: Access control, Air cushion vehicles, Associative storage, Augmented Reality, Automobile driver simulators, Automobile drivers, Automobile simulators, Automobile testing, Autonomous Vehicles, benchmarking, Computer testing, Condition, Continuous functions, Dynamic random access storage, Formal concept analysis, HDCP, Language Model, Luminescent devices, Network Security, Operational test, Operational use, Problem oriented languages, Randomisation, Real-world drivings, Sailing vessels, Ships, Test condition, UNIX, Vehicle modelling, Virtual addresses
@inproceedings{sehgal_virtual_2024,
title = {Virtual Recording Generation Using Generative AI and Carla Simulator},
author = {V. Sehgal and N. Sekaran},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85213320680&doi=10.4271%2f2024-28-0261&partnerID=40&md5=37a924cf9beda31f2c23b3a2cdf575d2},
doi = {10.4271/2024-28-0261},
issn = {0148-7191},
year = {2024},
date = {2024-01-01},
booktitle = {SAE Technical Papers},
publisher = {SAE International},
abstract = {To establish and validate new systems incorporated into next-generation vehicles, it is important to understand the actual scenarios that autonomous vehicles will likely encounter. To do this, it is important to run Field Operational Tests (FOT). FOT is undertaken with many vehicles over large acquisition areas, ensuring the capability and suitability of a continuous function and guaranteeing the randomization of test conditions. Capturing FOT and use-case (a software testing technique designed to ensure that the system under test meets and exceeds the stakeholders' expectations) scenario recordings is very expensive due to the amount of necessary material (vehicles, measurement equipment/objectives, headcount, data storage capacity/complexity, trained drivers/professionals); a robust working vehicle setup is not always available; mileage is directly proportional to time; and the process cannot be scaled up due to physical limitations. During the early development phase, ground-truth data is not available, and data reused from other projects may not fully match current project requirements. Not all event scenarios/weather conditions can be ensured during recording capture; in such cases synthetic/virtual recordings, which can accurately mimic real conditions on a test bench, address the aforementioned constraints. Car Learning to Act (CARLA) [1], an open-source driving simulator used for the development, training, and validation of autonomous driving systems, is extended for the generation of synthetic/virtual data/recordings by integrating Generative Artificial Intelligence (Gen AI), particularly Generative Adversarial Networks (GANs) [2] and Retrieval Augmented Generation (RAG) [3], which are deep learning models. The process of creating synthetic data using vehicle models becomes more efficient and reliable, as Gen AI can hold and reproduce much more data in scenario development than a developer or tester. A Large Language Model (LLM) [4] takes user input in the form of prompts and generates scenarios that are used to produce a vast amount of high-quality, distinct, and realistic driving scenarios closely resembling real-world driving data. Gen AI [5] empowers the user to generate not only dynamic environment conditions (such as different weather and lighting conditions) but also dynamic elements like the behavior of other vehicles and pedestrians. Synthetic/virtual recordings [6] generated using Gen AI can be used to train and validate virtual vehicle models and to provide FOT/use-case data that indirectly proves the real-world performance of tasks such as object detection, object recognition, image segmentation, and decision-making algorithms in autonomous vehicles. Augmenting the LLM with CARLA involves training generative models on real-world driving data using RAG, which allows the model to generate new, synthetic instances that resemble real-world conditions/scenarios. © 2024 SAE International. All Rights Reserved.},
keywords = {Access control, Air cushion vehicles, Associative storage, Augmented Reality, Automobile driver simulators, Automobile drivers, Automobile simulators, Automobile testing, Autonomous Vehicles, benchmarking, Computer testing, Condition, Continuous functions, Dynamic random access storage, Formal concept analysis, HDCP, Language Model, Luminescent devices, Network Security, Operational test, Operational use, Problem oriented languages, Randomisation, Real-world drivings, Sailing vessels, Ships, Test condition, UNIX, Vehicle modelling, Virtual addresses},
pubstate = {published},
tppubtype = {inproceedings}
}
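The pipeline described in this abstract, where an LLM turns a user prompt into a scenario specification that is then realized in the CARLA simulator, can be sketched with CARLA's public Python API. The scenario dict below is a hypothetical stand-in for LLM output, and the GAN/RAG components from the paper are omitted; only the CARLA calls (carla.Client, WeatherParameters, try_spawn_actor, set_autopilot) are existing API.

import random
import carla  # CARLA Python client library; a simulator server must be running

# Hypothetical LLM output for a prompt such as "rainy dusk with moderate traffic".
scenario = {
    "weather": {"cloudiness": 90.0, "precipitation": 70.0, "sun_altitude_angle": 10.0},
    "num_traffic_vehicles": 20,
}

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Apply the generated environment conditions.
world.set_weather(carla.WeatherParameters(**scenario["weather"]))

# Spawn background traffic on autopilot so a virtual recording can be captured.
blueprints = list(world.get_blueprint_library().filter("vehicle.*"))
spawn_points = world.get_map().get_spawn_points()
random.shuffle(spawn_points)

vehicles = []
for spawn_point in spawn_points[: scenario["num_traffic_vehicles"]]:
    vehicle = world.try_spawn_actor(random.choice(blueprints), spawn_point)
    if vehicle is not None:           # spawning can fail on an occupied spot
        vehicle.set_autopilot(True)
        vehicles.append(vehicle)

print(f"Spawned {len(vehicles)} vehicles under the generated scenario.")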