SMNS event
Advancing Cooperative Perception Systems in Real-World Deployment: Challenges, Solutions, and Frontiers
In this presentation, I will explore state-of-the-art advancements and practical solutions in vehicle-to-everything (V2X) cooperative perception for intelligent transportation systems. I begin by highlighting the limitations of single-agent perception and showing how connectivity can address critical challenges faced by automated vehicles in real-world settings. Despite its potential, deploying cooperative perception systems introduces significant hurdles. This talk presents several contributions that address them, including V2X-ViT (ECCV’22), a unified transformer architecture for robust perception in noisy environments, and CoBEVT (CoRL’22), an efficient vision transformer architecture tailored for BEV semantic segmentation using cost-effective camera-only strategies. I will also introduce our latest work on using Mamba, an emerging vision architecture, for real-time onboard detection. Furthermore, I will present our recent framework, accepted to ICLR’25, for designing a scalable and task-agnostic collaborative perception protocol that facilitates heterogeneous and secure mobility systems for future networks. Lastly, I will discuss our latest efforts on generative foundation models for autonomous driving: 1) OpenEMMA, a large-scale end-to-end multimodal language model for driving, to be presented at the LLVM-AD workshop at WACV’25, and 2) AutoTrust, a benchmark of the trustworthiness of large vision-language models (VLMs) for autonomous driving, paving the way for future safe, robust, and private foundation models for autonomy. These advancements collectively push the boundaries of cooperative perception, offering scalable, efficient, and safety-critical solutions for autonomous systems in complex real-world environments.