Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis

¹ Xi’an Jiaotong University
² National University of Singapore
³ Nanyang Technological University
⁴ Cleveland State University
ICCV 2025

Abstract

Understanding the causes and effects of car accidents from an egocentric view is crucial for the safety of self-driving cars, and synthesizing causal-entity reflected accident videos can facilitate testing the capability to respond to accidents that are unaffordable to reproduce in reality. However, incorporating causal relations as seen in real-world videos into synthetic videos remains challenging. This work argues that precisely identifying the accident participants and capturing their related behaviors are of critical importance. In this regard, we propose a novel diffusion model, Causal-VidSyn, for synthesizing egocentric traffic accident videos. To enable causal entity grounding in video diffusion, Causal-VidSyn leverages the cause descriptions and driver fixations to identify the accident participants and behaviors, facilitated by accident reason answering and gaze-conditioned selection modules. To support Causal-VidSyn, we further construct Drive-Gaze, the largest driver gaze dataset (with 1.54M frames of fixations) in driving accident scenarios. Extensive experiments show that Causal-VidSyn surpasses state-of-the-art video diffusion models in terms of frame quality and causal sensitivity in various tasks, including accident video editing, normal-to-accident video diffusion, and text-to-video generation.

Some text-to-video samples

AEdit and N2A samples are large. We will share them via external links on the GitHub page.

Ego-car drives too fast and the braking distance is short, resulting in that the ego-car hitting a crossing car.

The vehicle drives too fast and the braking distance is short, resulting in that the car hitting a crossing pedestrian

Ego-car drives too fast and the braking distance is short, resulting in that the ego-car hitting a car.

Pedestrian does not notice the coming vehicles when crossing the street, resulting in that the ego-car hitting a crossing pedestrian

Cyclist does not notice the coming vehicles when crossing the road, resulting in that the ego-car hitting a crossing cyclist

Motorcycle driver is inattentive, resulting in that the ego-car hitting a motorbike

The training schema of Causal-VidSyn.

Sample visualizations of the N2A task by Latte*, Latte-T, CogV-X*, CogV-X-T, MotionClone, A-OAVD, LAMP, and our Causal-VidSyn (best viewed in zoom mode).

Performance on N2A and T2V tasks (bold font: the best).

We visualize AEdit results of one crossing situation by LAMP, A-OAVD, and our Causal-VidSyn.

Afd is the fraction of all checks for which IOU(, ) > 0. IOU: the intersection over union of two bounding boxes.
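
As an illustration of the quantity in this caption, the following is a minimal sketch of an intersection-over-union computation and of a ratio of checks with non-zero overlap. It is not the authors' evaluation code. The box layout `(x_min, y_min, x_max, y_max)`, the function names `iou` and `nonzero_overlap_ratio`, and the idea that each check is one pair of boxes are assumptions made for this example; the page does not state which two boxes are compared.

```python
from typing import Sequence, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nonzero_overlap_ratio(pairs: Sequence[Tuple[Box, Box]]) -> float:
    """Fraction of box pairs (hypothetical 'checks') with IoU > 0."""
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if iou(a, b) > 0)
    return hits / len(pairs)


if __name__ == "__main__":
    overlapping = ((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
    disjoint = ((0.0, 0.0, 1.0, 1.0), (5.0, 5.0, 6.0, 6.0))
    print(iou(*overlapping))
    print(nonzero_overlap_ratio([overlapping, disjoint]))
```

Under these assumptions, a check counts as a hit as soon as the two boxes share any area, so the ratio measures how often the boxes overlap rather than how tightly they are aligned.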
