AHCI RESEARCH GROUP
Publications
Papers published in international journals,
proceedings of conferences, workshops and books.
2025
Li, C.; Da, F.
Refined dense face alignment through image matching Journal Article
In: Visual Computer, vol. 41, no. 1, pp. 157–171, 2025, ISSN: 0178-2789.
@article{li_refined_2025,
title = {Refined dense face alignment through image matching},
author = {C. Li and F. Da},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85187924785&doi=10.1007%2fs00371-024-03316-3&partnerID=40&md5=839834c6ff3320398d5ef75b055947cb},
doi = {10.1007/s00371-024-03316-3},
issn = {0178-2789},
year = {2025},
date = {2025-01-01},
journal = {Visual Computer},
volume = {41},
number = {1},
pages = {157--171},
abstract = {Face alignment is the foundation of building 3D avatars for virtual communication in the metaverse, human-computer interaction, AI-generated content, etc., and therefore, it is critical that face deformation is reflected precisely to better convey expression, pose and identity. However, misalignment exists in the currently best methods that fit a face model to a target image and can be easily captured by human perception, thus degrading the reconstruction quality. The main reason is that the widely used metrics for training, including the landmark re-projection loss, pixel-wise loss and perception-level loss, are insufficient to address the misalignment and suffer from ambiguity and local minima. To address misalignment, we propose an image MAtchinG-driveN dEnse geomeTrIC supervision (MAGNETIC). Specifically, we treat face alignment as a matching problem and establish pixel-wise correspondences between the target and rendered images. Then reconstructed facial points are guided towards their corresponding points on the target image, thus improving reconstruction. Synthesized image pairs are mixed up with face outliers to simulate the target and rendered images with ground-truth pixel-wise correspondences, enabling the training of a robust prediction network. Compared with existing methods that turn to 3D scans for dense geometric supervision, our method reaches comparable shape reconstruction results with much lower effort. Experimental results on the NoW testset show that we reach the state-of-the-art among all self-supervised methods and even outperform methods using photo-realistic images. We also achieve comparable results with the state-of-the-art on the benchmark of Feng et al. Code will be available at: github.com/ChunLLee/ReconstructionFromMatching.},
keywords = {3D Avatars, Alignment, Dense geometric supervision, Face alignment, Face deformations, Face reconstruction, Geometry, Human computer interaction, Image enhancement, Image matching, Image Reconstruction, Metaverses, Outlier mixup, Pixels, Rendered images, Rendering (computer graphics), State of the art, Statistics, Target images, Three dimensional computer graphics},
pubstate = {published},
tppubtype = {article}
}
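
Note: the sketch below is a minimal illustration, not the authors' implementation (their code is at the repository linked in the abstract). It shows the core idea the abstract describes, i.e. using pixel-wise matches between the rendered face and the target image as dense geometric supervision; all names and the NumPy formulation are our own assumptions.

import numpy as np

def dense_matching_loss(rendered_xy, matched_xy, valid):
    """Pull each projected facial point toward its matched pixel on the
    target image; `valid` masks matches rejected as outliers."""
    diff = rendered_xy - matched_xy            # (N, 2) pixel offsets
    per_point = np.sum(diff ** 2, axis=-1)     # squared distance per point
    return np.sum(per_point * valid) / max(valid.sum(), 1.0)

# Toy usage: five projected mesh vertices vs. their matched target pixels.
rendered = np.array([[10.0, 12.0], [40.0, 42.0], [70.0, 68.0],
                     [25.0, 30.0], [55.0, 60.0]])
matched = rendered + np.random.randn(5, 2)     # matcher-estimated locations
valid = np.array([1.0, 1.0, 1.0, 0.0, 1.0])    # one unreliable match masked
print(dense_matching_loss(rendered, matched, valid))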
2024
Li, K.; Gulati, M.; Shah, D.; Waskito, S.; Chakrabarty, S.; Varshney, A.
PixelGen: Rethinking Embedded Cameras for Mixed-Reality Proceedings Article
In: Proceedings of the International Conference on Mobile Computing and Networking (ACM MobiCom), pp. 2128–2135, Association for Computing Machinery, 2024, ISBN: 979-8-4007-0489-5.
@inproceedings{li_pixelgen_2024,
title = {PixelGen: Rethinking Embedded Cameras for Mixed-Reality},
author = {K. Li and M. Gulati and D. Shah and S. Waskito and S. Chakrabarty and A. Varshney},
url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-105002721208&doi=10.1145%2f3636534.3696216&partnerID=40&md5=97ee680318c72552b3e642aa57aaeca5},
doi = {10.1145/3636534.3696216},
isbn = {979-8-4007-0489-5},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the International Conference on Mobile Computing and Networking (ACM MobiCom)},
pages = {2128--2135},
publisher = {Association for Computing Machinery},
abstract = {Mixed-reality headsets offer new ways to perceive our environment. They employ visible-spectrum cameras to capture and display the environment on screens in front of the user's eyes. However, these cameras lead to limitations. First, they capture only a partial view of the environment: they are positioned to capture whatever is in front of the user, creating blind spots during complete immersion and failing to detect events outside the restricted field of view. Second, they capture only visible light fields, ignoring other fields, such as acoustics and radio, that are also present in the environment. Finally, these power-hungry cameras rapidly deplete the mixed-reality headset's battery. We introduce PixelGen to rethink embedded cameras for mixed-reality headsets. PixelGen decouples cameras from the mixed-reality headset and balances resolution and fidelity to minimize power consumption. It employs low-resolution, monochrome image sensors and environmental sensors to capture the surroundings of the headset. This approach reduces the system's communication bandwidth and power consumption. A transformer-based language and image model processes this information to overcome resolution trade-offs, generating a higher-resolution representation of the environment. We present initial experiments that show PixelGen's viability.},
keywords = {Blind spots, embedded systems, Embedded-system, Field of views, Language Model, Large language model, large language models, Mixed reality, Networking, Partial views, Pixels, Power, Visible spectrums},
pubstate = {published},
tppubtype = {inproceedings}
}
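
Note: the sketch below only illustrates the pipeline shape the abstract describes, i.e. capture decoupled from the headset, with a low-resolution monochrome frame and environmental sensor readings fused into a higher-resolution view. It is not the PixelGen implementation: every name is hypothetical, and the transformer-based language-and-image model is stubbed out with naive upsampling to keep the example runnable.

import numpy as np

def capture_low_power_frame(h=64, w=64):
    """Stand-in for a low-resolution, monochrome embedded sensor readout."""
    return np.random.randint(0, 256, (h, w), dtype=np.uint8)

def read_environmental_sensors():
    """Stand-in for non-visual fields (e.g., acoustic and radio levels)."""
    return {"acoustic_db": 42.0, "radio_rssi": -60.0}

def reconstruct_high_res(frame, env, scale=4):
    """Placeholder for the model that fuses the low-res frame with sensor
    context; naive nearest-neighbour upsampling keeps the sketch runnable
    (a real model would condition on `env`)."""
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

frame = capture_low_power_frame()
high_res = reconstruct_high_res(frame, read_environmental_sensors())
print(frame.shape, "->", high_res.shape)  # (64, 64) -> (256, 256)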