AHCI RESEARCH GROUP
Publications
Papers published in international journals, in the proceedings of conferences and workshops, and in books.
OUR RESEARCH
Scientific Publications
How to
You can use the tag cloud to filter the papers by specific research topics.
You can expand the Abstract, Links, and BibTeX record of each paper.
2025
Monjoree, U.; Yan, W.
Assessing AI Models' Spatial Visualization in PSVT:R and Augmented Reality: Towards Enhancing AI's Spatial Intelligence Proceedings Article
In: pp. 727–734, Institute of Electrical and Electronics Engineers Inc., 2025, ISBN: 9798331524005.
Abstract | Links | BibTeX | Tags: 3D modeling, Architecture engineering, Artificial intelligence, Augmented Reality, Construction science, Engineering education, Engineering science, Generative AI, generative artificial intelligence, Image processing, Intelligence models, Linear transformations, Medicine, Rotation, Rotation process, Spatial Intelligence, Spatial rotation, Spatial visualization, Three dimensional computer graphics, Three dimensional space, Visualization
@inproceedings{monjoree_assessing_2025,
title = {Assessing AI Models' Spatial Visualization in PSVT:R and Augmented Reality: Towards Enhancing AI's Spatial Intelligence},
author = {U. Monjoree and W. Yan},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105011255775&doi=10.1109%2FCAI64502.2025.00131&partnerID=40&md5=0bd551863839b3025898e55265403969},
doi = {10.1109/CAI64502.2025.00131},
isbn = {9798331524005},
year = {2025},
date = {2025-01-01},
pages = {727–734},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {Spatial intelligence is important in many fields, such as Architecture, Engineering, and Construction (AEC), Science, Technology, Engineering, and Mathematics (STEM), and Medicine. Understanding three-dimensional (3D) spatial rotations can involve verbal descriptions and visual or interactive examples, illustrating how objects move and change orientation in 3D space. Recent studies show that artificial intelligence (AI) with language and vision capabilities still faces limitations in spatial reasoning. In this paper, we study the capability of advanced generative AI to understand the rotations of objects in 3D space, utilizing its image-processing and language-processing features. We examined the spatial intelligence of three generative AI models (GPT-4, Gemini 1.5 Pro, and Llama 3.2) in understanding the spatial rotation process with rotation diagrams based on the revised Purdue Spatial Visualization Test: Visualization of Rotations (Revised PSVT:R). Furthermore, we added a layer of coordinate-system axes to the Revised PSVT:R diagrams to study variations in the models' performance. We additionally examined the models' understanding of 3D rotations in Augmented Reality (AR) scene images that visualize spatial rotations of a physical object in 3D space, and observed increased accuracy when additional textual information depicting the rotation process, or mathematical representations of the rotation (e.g., matrices), was superimposed on the object. The results indicate that while GPT-4, Gemini 1.5 Pro, and Llama 3.2, as leading current generative AI models, lack an understanding of the spatial rotation process, they have the potential to understand it given additional information that can be provided by methods such as AR. AR can superimpose textual information or mathematical representations of rotations on spatial transformation diagrams and create a more intelligible input for AI to comprehend or for training AI's spatial intelligence. Furthermore, by combining AI's potential for spatial intelligence with AR's interactive visualization abilities, we expect to offer enhanced guidance for students' spatial learning activities. Such spatial guidance can greatly benefit the understanding of spatial transformations and additionally support processes such as assembly, construction, and manufacturing, as well as learning in AEC, STEM, and Medicine, all of which require precise 3D spatial understanding.},
keywords = {3D modeling, Architecture engineering, Artificial intelligence, Augmented Reality, Construction science, Engineering education, Engineering science, Generative AI, generative artificial intelligence, Image processing, Intelligence models, Linear transformations, Medicine, Rotation, Rotation process, Spatial Intelligence, Spatial rotation, Spatial visualization, Three dimensional computer graphics, Three dimensional space, Visualization},
pubstate = {published},
tppubtype = {inproceedings}
}
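The AR condition in this study superimposes mathematical representations of rotations, such as matrices, on the object. As a minimal Python sketch of what such a matrix encodes (the axis and angle below are arbitrary examples, not items from the Revised PSVT:R):

import numpy as np

def rotation_matrix_z(theta_deg: float) -> np.ndarray:
    """3x3 matrix for a rotation about the z-axis by theta_deg degrees."""
    t = np.radians(theta_deg)
    return np.array([
        [np.cos(t), -np.sin(t), 0.0],
        [np.sin(t),  np.cos(t), 0.0],
        [0.0,        0.0,       1.0],
    ])

# A PSVT:R-style task applies the rotation shown on one object to another;
# the matrix makes that rotation explicit for the model instead of leaving
# it implicit in the diagram.
R = rotation_matrix_z(90.0)
print(R @ np.array([1.0, 0.0, 0.0]))  # -> [0. 1. 0.]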
Salinas, C. S.; Magudia, K.; Sangal, A.; Ren, L.; Segars, W. P.
In-silico CT simulations of deep learning generated heterogeneous phantoms Journal Article
In: Biomedical Physics and Engineering Express, vol. 11, no. 4, 2025, ISSN: 2057-1976, (Publisher: Institute of Physics).
Abstract | Links | BibTeX | Tags: adult, algorithm, Algorithms, anatomical concepts, anatomical location, anatomical variation, Article, Biological organs, bladder, Bone, bone marrow, CGAN, colon, comparative study, computer assisted tomography, Computer graphics, computer model, Computer Simulation, Computer-Assisted, Computerized tomography, CT organ texture, CT organ textures, CT scanners, CT synthesis, CT-scan, Deep learning, fluorodeoxyglucose f 18, Generative Adversarial Network, Generative AI, histogram, human, human tissue, Humans, III-V semiconductors, image analysis, Image processing, Image segmentation, Image texture, Imaging, imaging phantom, intra-abdominal fat, kidney blood vessel, Learning systems, liver, lung, major clinical study, male, mean absolute error, Medical Imaging, neoplasm, Phantoms, procedures, prostate muscle, radiological parameters, signal noise ratio, Signal to noise ratio, Signal-To-Noise Ratio, simulation, Simulation platform, small intestine, Statistical tests, stomach, structural similarity index, subcutaneous fat, Textures, three dimensional double u net conditional generative adversarial network, Three-Dimensional, three-dimensional imaging, Tomography, Virtual CT scanner, Virtual Reality, Virtual trial, virtual trials, whole body CT, X-Ray Computed, x-ray computed tomography
@article{salinas_-silico_2025,
title = {In-silico CT simulations of deep learning generated heterogeneous phantoms},
author = {C. S. Salinas and K. Magudia and A. Sangal and L. Ren and W. P. Segars},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105010297226&doi=10.1088%2F2057-1976%2Fade9c9&partnerID=40&md5=47f211fd93f80e407dcd7e4c490976c2},
doi = {10.1088/2057-1976/ade9c9},
issn = {2057-1976},
year = {2025},
date = {2025-01-01},
journal = {Biomedical Physics and Engineering Express},
volume = {11},
number = {4},
abstract = {Current virtual imaging phantoms primarily emphasize geometric accuracy of anatomical structures. However, to enhance realism, it is also important to incorporate intra-organ detail. Because biological tissues are heterogeneous in composition, virtual phantoms should reflect this by including realistic intra-organ texture and material variation. We propose training two 3D Double U-Net conditional generative adversarial networks (3D DUC-GAN) to generate sixteen unique textures that encompass organs found within the torso. The model was trained on 378 CT image-segmentation pairs taken from a publicly available dataset with 18 additional pairs reserved for testing. Textured phantoms were generated and imaged using DukeSim, a virtual CT simulation platform. Results showed that the deep learning model was able to synthesize realistic heterogeneous phantoms from a set of homogeneous phantoms. These phantoms were compared with original CT scans and had a mean absolute difference of 46.15 ± 1.06 HU. The structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) were 0.86 ± 0.004 and 28.62 ± 0.14, respectively. The maximum mean discrepancy between the generated and actual distribution was 0.0016. These metrics marked an improvement of 27%, 5.9%, 6.2%, and 28% respectively, compared to current homogeneous texture methods. The generated phantoms that underwent a virtual CT scan had a closer visual resemblance to the true CT scan compared to the previous method. The resulting heterogeneous phantoms offer a significant step toward more realistic in silico trials, enabling enhanced simulation of imaging procedures with greater fidelity to true anatomical variation.},
note = {Publisher: Institute of Physics},
keywords = {adult, algorithm, Algorithms, anatomical concepts, anatomical location, anatomical variation, Article, Biological organs, bladder, Bone, bone marrow, CGAN, colon, comparative study, computer assisted tomography, Computer graphics, computer model, Computer Simulation, Computer-Assisted, Computerized tomography, CT organ texture, CT organ textures, CT scanners, CT synthesis, CT-scan, Deep learning, fluorodeoxyglucose f 18, Generative Adversarial Network, Generative AI, histogram, human, human tissue, Humans, III-V semiconductors, image analysis, Image processing, Image segmentation, Image texture, Imaging, imaging phantom, intra-abdominal fat, kidney blood vessel, Learning systems, liver, lung, major clinical study, male, mean absolute error, Medical Imaging, neoplasm, Phantoms, procedures, prostate muscle, radiological parameters, signal noise ratio, Signal to noise ratio, Signal-To-Noise Ratio, simulation, Simulation platform, small intestine, Statistical tests, stomach, structural similarity index, subcutaneous fat, Textures, three dimensional double u net conditional generative adversarial network, Three-Dimensional, three-dimensional imaging, Tomography, Virtual CT scanner, Virtual Reality, Virtual trial, virtual trials, whole body CT, X-Ray Computed, x-ray computed tomography},
pubstate = {published},
tppubtype = {article}
}
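The fidelity metrics reported in this abstract (mean absolute difference in HU, SSIM, PSNR) are standard and straightforward to reproduce. A sketch of one plausible way to compute them with numpy and scikit-image; the data_range value and the synthetic volumes are illustrative assumptions, not the study's data:

import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def phantom_fidelity(real_ct, generated_ct, data_range=2000.0):
    """Compare a generated CT volume with a real one (both in HU).
    data_range is an assumed HU span used to normalize SSIM/PSNR."""
    mae = np.mean(np.abs(real_ct - generated_ct))  # mean absolute difference, HU
    ssim = structural_similarity(real_ct, generated_ct, data_range=data_range)
    psnr = peak_signal_noise_ratio(real_ct, generated_ct, data_range=data_range)
    return mae, ssim, psnr

# Synthetic stand-ins for a real/generated pair:
rng = np.random.default_rng(0)
real = rng.normal(0.0, 300.0, size=(64, 64, 64))
fake = real + rng.normal(0.0, 40.0, size=real.shape)
print(phantom_fidelity(real, fake))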
2024
Venkatachalam, N.; Rayana, M.; Vignesh, S. Bala; Prathamesh, S.
Voice-Driven Panoramic Imagery: Real-Time Generative AI for Immersive Experiences Proceedings Article
In: Int. Conf. Intell. Data Commun. Technol. Internet Things, IDCIoT, pp. 1133–1138, Institute of Electrical and Electronics Engineers Inc., 2024, ISBN: 9798350327533.
Abstract | Links | BibTeX | Tags: Adaptive Visual Experience, First person, First-Person view, generative artificial intelligence, Generative Artificial Intelligence (AI), Image processing, Immersive, Immersive visual scene, Immersive Visual Scenes, Language processing, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Panoramic Images, Patient treatment, Personalized environment, Personalized Environments, Phobia Treatment, Prompt, prompts, Psychological intervention, Psychological Interventions, Real-Time Synthesis, User interaction, User interfaces, Virtual experience, Virtual Experiences, Virtual Reality, Virtual Reality (VR), Virtual-reality headsets, Visual experiences, Visual languages, Visual scene, Voice command, Voice commands, VR Headsets
@inproceedings{venkatachalam_voice-driven_2024,
title = {Voice-Driven Panoramic Imagery: Real-Time Generative AI for Immersive Experiences},
author = {N. Venkatachalam and M. Rayana and S. Bala Vignesh and S. Prathamesh},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85190121845&doi=10.1109%2FIDCIoT59759.2024.10467441&partnerID=40&md5=867e723b20fb9fead7d1c55926af9642},
doi = {10.1109/IDCIoT59759.2024.10467441},
isbn = {9798350327533},
year = {2024},
date = {2024-01-01},
booktitle = {Int. Conf. Intell. Data Commun. Technol. Internet Things, IDCIoT},
pages = {1133–1138},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
abstract = {This research study introduces an innovative system that aims to synthesize 360-degree panoramic images in real time based on vocal prompts from the user, leveraging state-of-the-art Generative AI with a combination of advanced NLP models. The primary objective of this system is to transform spoken descriptions into immersive and interactive visual scenes, specifically designed to provide users with first-person views. This cutting-edge technology has the potential to revolutionize the realm of virtual reality (VR) experiences, enabling users to effortlessly create and navigate through personalized environments. The fundamental goal of this system is to enable the generation of real-time images that are seamlessly compatible with VR headsets, offering a truly immersive and adaptive visual experience. Beyond its technological advancements, this research also highlights its significant potential for creating a positive social impact. One notable application lies in psychological interventions, particularly in the context of phobia treatment and therapeutic settings. Here, patients can safely confront and work through their fears within these synthesized environments, potentially offering new avenues for therapy. Furthermore, the system serves educational and entertainment purposes by bringing users' imaginations to life, providing an unparalleled platform for exploring the boundaries of virtual experiences. Overall, this research represents a promising stride towards a more immersive and adaptable future in VR technology, with the potential to enhance various aspects of human lives, from mental health treatment to entertainment and education.},
keywords = {Adaptive Visual Experience, First person, First-Person view, generative artificial intelligence, Generative Artificial Intelligence (AI), Image processing, Immersive, Immersive visual scene, Immersive Visual Scenes, Language processing, Natural Language Processing, Natural Language Processing (NLP), Natural language processing systems, Natural languages, Panoramic Images, Patient treatment, Personalized environment, Personalized Environments, Phobia Treatment, Prompt, prompts, Psychological intervention, Psychological Interventions, Real-Time Synthesis, User interaction, User interfaces, Virtual experience, Virtual Experiences, Virtual Reality, Virtual Reality (VR), Virtual-reality headsets, Visual experiences, Visual languages, Visual scene, Voice command, Voice commands, VR Headsets},
pubstate = {published},
tppubtype = {inproceedings}
}
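The abstract describes a speech-to-panorama loop but names neither its speech-recognition nor its text-to-image backend, so the following Python sketch only fixes the shape of the pipeline; transcribe() and generate_panorama() are hypothetical placeholders, not the authors' components:

def transcribe(audio_chunk: bytes) -> str:
    """Hypothetical speech-to-text call; swap in any ASR backend."""
    raise NotImplementedError

def generate_panorama(prompt: str):
    """Hypothetical text-to-image call returning an equirectangular
    360-degree frame suitable for a VR headset."""
    raise NotImplementedError

def voice_driven_session(audio_stream):
    """One panoramic frame per spoken prompt, as the paper describes."""
    for chunk in audio_stream:
        prompt = transcribe(chunk)       # vocal prompt -> text description
        yield generate_panorama(prompt)  # text -> immersive 360-degree scene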
2023
Dipanda, Albert; Gallo, Luigi; Yetongnon, Kokou (Ed.)
2023 17th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) Proceedings
IEEE Computer Society, 2023, ISBN: 979-8-3503-7091-1.
Abstract | Links | BibTeX | Tags: Computer graphics, Image processing
@proceedings{dipanda202317thInternational2023,
title = {2023 17th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)},
editor = {Albert Dipanda and Luigi Gallo and Kokou Yetongnon},
url = {https://ieeexplore.ieee.org/servlet/opac?punumber=10472709},
doi = {10.1109/SITIS61268.2023},
isbn = {979-8-3503-7091-1},
year = {2023},
date = {2023-11-10},
urldate = {2024-03-21},
publisher = {IEEE Computer Society},
abstract = {We are pleased to welcome you to SITIS 2023, the seventeenth edition of the IEEE International Conference on Signal-Image Technology & Internet-Based Systems. We thank the authors for their valuable contributions to the conference. SITIS 2023 aims to bring together researchers from the major communities of signal/image processing and information modeling and analysis, and to foster cross-disciplinary collaborations. The conference consists of two tracks: SIVT (Signal & Image and Vision Technology), which focuses on recent developments and evolutions in signal processing, image analysis, vision, coding & authentication, and retrieval techniques; and ISSA (Intelligent Systems Services and Applications), which covers emerging concepts, architectures, protocols, and methodologies for data management on the Web and the Internet of Things technologies that connect unlimited numbers of smart objects. In addition to these tracks, SITIS 2023 also features workshops that address a wide range of related but more specific topics.},
keywords = {Computer graphics, Image processing},
pubstate = {published},
tppubtype = {proceedings}
}
Vlasov, A. V.
GALA Inspired by Klimt's Art: Text-to-image Processing with Implementation in Interaction and Perception Studies: Library and Case Examples Journal Article
In: Annual Review of CyberTherapy and Telemedicine, vol. 21, pp. 200–205, 2023, ISSN: 1554-8716, (Publisher: Interactive Media Institute).
Abstract | Links | BibTeX | Tags: AIGC, applied research, art library, Article, Artificial intelligence, benchmarking, dataset, GALA, human, Human computer interaction, Image processing, Klimt, library, life satisfaction, neuropoem, Text-to-image, Virtual Reality, Wellbeing
@article{vlasov_gala_2023,
title = {GALA Inspired by Klimt's Art: Text-to-image Processing with Implementation in Interaction and Perception Studies: Library and Case Examples},
author = {A. V. Vlasov},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85182461798&partnerID=40&md5=0c3f5f4214a46db51f46f0092495eb2b},
issn = {1554-8716},
year = {2023},
date = {2023-01-01},
journal = {Annual Review of CyberTherapy and Telemedicine},
volume = {21},
pages = {200–205},
abstract = {Objectives: (a) to develop a library of AI-generated content (AIGC) based on a combinatorial scheme of prompting for interaction and perception research; (b) to show examples of AIGC implementation. The result is a public library for applied research in the cyber-psychological community (CYPSY). The Generative Art Library Abstractions (GALA) includes images (Figures 1-2) based on a text-to-image model and inspired by the artwork of Gustav Klimt. They can be used for comparative analysis (benchmarking), end-to-end evaluation, and advanced design. This allows experimentation with complex human-computer interaction (HCI) architectures and visual communication systems, and provides creative design support for experimenting. Examples include: interactive perception of positively colored generative images; HCI dialogues using visual language; generated moods in a VR environment; and brain-computer interfaces for HCI. These visualization resources are a valuable example of AIGC for next-generation R&D. Any suggestions from the CYPSY community are welcome.},
note = {Publisher: Interactive Media Institute},
keywords = {AIGC, applied research, art library, Article, Artificial intelligence, benchmarking, dataset, GALA, human, Human computer interaction, Image processing, Klimt, library, life satisfaction, neuropoem, Text-to-image, Virtual Reality, Wellbeing},
pubstate = {published},
tppubtype = {article}
}
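The abstract does not spell out GALA's combinatorial scheme of prompting, but a library built that way plausibly enumerates the Cartesian product of a few prompt facets. A Python sketch with invented facets (the subjects, moods, and styles are illustrative, not GALA's actual dimensions):

from itertools import product

subjects = ["a portrait of a woman", "a flowering garden"]
moods = ["serene", "melancholic", "jubilant"]
styles = ["in the style of Gustav Klimt", "gilded Art Nouveau mosaic"]

# One prompt per cell of the grid yields a systematically indexed image
# library, which is what makes benchmarking across conditions possible.
library = [f"{s}, {m}, {st}" for s, m, st in product(subjects, moods, styles)]
print(len(library))   # 2 * 3 * 2 = 12 prompts
print(library[0])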
2013
Franchini, Silvia; Gentile, Antonio; Vassallo, Giorgio; Sorbello, Filippo; Vitabile, Salvatore
A specialized architecture for color image edge detection based on Clifford algebra Proceedings Article
In: pp. 128–135, 2013, ISBN: 978-0-7695-4992-7.
Abstract | Links | BibTeX | Tags: Application-specific processors, Clifford algebra, Color image edge detection, Embedded coprocessors, Field Programmable Gate Arrays, FPGA prototyping, Geometric algebra, Image processing, Medical Imaging, Multispectral Magnetic Resonance images
@inproceedings{franchini_specialized_2013,
title = {A specialized architecture for color image edge detection based on Clifford algebra},
author = {Silvia Franchini and Antonio Gentile and Giorgio Vassallo and Filippo Sorbello and Salvatore Vitabile},
doi = {10.1109/CISIS.2013.29},
isbn = {978-0-7695-4992-7},
year = {2013},
date = {2013-01-01},
pages = {128–135},
abstract = {Edge detection of color images is usually performed by applying the traditional techniques for gray-scale images to the three color channels separately. However, human visual perception does not differentiate colors and processes the image as a whole. Recently, new methods have been proposed that treat RGB color triples as vectors and color images as vector fields. In these approaches, edge detection is obtained extending the classical pattern matching and convolution techniques to vector fields. This paper proposes a hardware implementation of an edge detection method for color images that exploits the definition of geometric product of vectors given in the Clifford algebra framework to extend the convolution operator and the Fourier transform to vector fields. The proposed architecture has been prototyped on the Celoxica RC203E Field Programmable Gate Array (FPGA) board. Experimental tests on the FPGA prototype show that the proposed hardware architecture allows for an average speedup ranging between 6x and 18x for different image sizes against the execution on a conventional general-purpose processor. Clifford algebra based edge detector can be exploited to process not only color images but also multispectral gray-scale images. The proposed hardware architecture has been successfully used for feature extraction of multispectral magnetic resonance (MR) images. © 2013 IEEE.},
keywords = {Application-specific processors, Clifford algebra, Color image edge detection, Embedded coprocessors, Field Programmable Gate Arrays, FPGA prototyping, Geometric algebra, Image processing, Medical Imaging, Multispectral Magnetic Resonance images},
pubstate = {published},
tppubtype = {inproceedings}
}
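The geometric product at the core of this architecture combines a scalar (inner) part with a bivector (wedge) part. A minimal numpy illustration for two RGB pixels treated as 3D vectors, with the bivector represented by its dual vector via the cross product; this is a simplification of the full Cl(3) formulation, for intuition only:

import numpy as np

def geometric_product(a, b):
    """Scalar and (dualized) bivector parts of the geometric product ab
    of 3D vectors: a.b + a^b, with a^b shown here as the cross product."""
    return np.dot(a, b), np.cross(a, b)

p = np.array([0.9, 0.2, 0.1])   # reddish pixel
q = np.array([0.1, 0.2, 0.9])   # bluish pixel
scalar, bivector = geometric_product(p, q)
# Similar colors -> large scalar, small bivector; dissimilar -> the opposite.
print(scalar, np.linalg.norm(bivector))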
2012
Franchini, Silvia; Gentile, Antonio; Sorbello, Filippo; Vassallo, Giorgio; Vitabile, Salvatore
Clifford Algebra based edge detector for color images Proceedings Article
In: pp. 84–91, 2012, ISBN: 978-0-7695-4687-2.
Abstract | Links | BibTeX | Tags: Clifford algebra, Clifford convolution, Clifford Fourier transform, Color image edge detection, Edge detection, Geometric algebra, Image processing, Segmentation
@inproceedings{franchini_clifford_2012,
title = {Clifford Algebra based edge detector for color images},
author = {Silvia Franchini and Antonio Gentile and Filippo Sorbello and Giorgio Vassallo and Salvatore Vitabile},
doi = {10.1109/CISIS.2012.128},
isbn = {978-0-7695-4687-2},
year = {2012},
date = {2012-01-01},
pages = {84–91},
abstract = {Edge detection is one of the most used methods for feature extraction in computer vision applications. Feature extraction is traditionally founded on pattern recognition methods exploiting the basic concepts of convolution and Fourier transform. For color image edge detection the traditional methods used for gray-scale images are usually extended and applied to the three color channels separately. This leads to increased computational requirements and long execution times. In this paper we propose a new, enhanced version of an edge detection algorithm that treats color value triples as vectors and exploits the geometric product of vectors defined in the Clifford algebra framework to extend the traditional concepts of convolution and Fourier transform to vector fields. Experimental results presented in the paper show that the proposed algorithm achieves detection performance comparable to the classical edge detection methods allowing at the same time for a significant reduction (about 33%) of computational times. © 2012 Crown Copyright.},
keywords = {Clifford algebra, Clifford convolution, Clifford Fourier transform, Color image edge detection, Edge detection, Geometric algebra, Image processing, Segmentation},
pubstate = {published},
tppubtype = {inproceedings}
}
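As a rough illustration of how that product can drive color edge detection, the Python sketch below compares each pixel's RGB vector with its right-hand neighbor and uses the bivector magnitude as the edge response. It is a deliberately reduced stand-in, not the authors' Clifford convolution and Fourier formulation:

import numpy as np

def clifford_style_edges(img):
    """img: H x W x 3 float array. Edge response from the bivector
    (cross-product) part of the geometric product between horizontally
    adjacent RGB vectors; it grows where the color direction changes."""
    bivector = np.cross(img[:, :-1, :], img[:, 1:, :])
    return np.linalg.norm(bivector, axis=-1)  # H x (W-1) edge strengths

# A vertical red/blue boundary:
img = np.zeros((4, 8, 3))
img[:, :4] = [1.0, 0.0, 0.0]
img[:, 4:] = [0.0, 0.0, 1.0]
print(clifford_style_edges(img).round(2))  # peaks where red meets blue

Note that the bivector part alone vanishes for parallel color vectors, so intensity-only edges between like-hued colors also need the scalar part; the full method retains both.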