Alibaba's New AI Video Generation Framework Tora with Trajectory-oriented DiT Technology

TapTechNews, August 6th news: the Alibaba team has launched Tora, a new AI video generation framework built on trajectory-oriented Diffusion Transformer (DiT) technology that integrates text, visual, and trajectory conditions for video generation.

Tora is composed of three parts: a trajectory extractor (TE), a spatiotemporal DiT, and a motion-guided fuser (MGF):

The TE encodes arbitrary trajectories into hierarchical spatiotemporal motion patches using a 3D video compression network.

The MGF integrates the motion patches into the DiT blocks to generate coherent videos that follow the given trajectory; a rough illustrative sketch of this pipeline follows below.
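The article does not include any code, so the following is only a minimal, hypothetical sketch of how a trajectory extractor and a motion-guided fuser could plug into a DiT-style backbone. It assumes PyTorch; all module names, shapes, and hyperparameters (TrajectoryExtractor, MotionGuidedFuser, the toy 3D compressor, the gated additive fusion) are illustrative assumptions, not Alibaba's actual implementation.

```python
# Hypothetical sketch of a Tora-style trajectory-conditioning pipeline.
# Names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class TrajectoryExtractor(nn.Module):
    """Compresses a dense trajectory map into spatiotemporal motion tokens
    with a small 3D convolutional network (a stand-in for the 3D video
    compression network mentioned in the article)."""

    def __init__(self, in_channels=2, dim=64):
        super().__init__()
        self.compress = nn.Sequential(
            nn.Conv3d(in_channels, dim, kernel_size=3, stride=(2, 4, 4), padding=1),
            nn.SiLU(),
            nn.Conv3d(dim, dim, kernel_size=3, stride=(2, 4, 4), padding=1),
        )

    def forward(self, traj):
        # traj: (B, 2, T, H, W) flow-like trajectory field
        patches = self.compress(traj)               # (B, dim, T', H', W')
        return patches.flatten(2).transpose(1, 2)   # (B, N, dim) motion tokens


class MotionGuidedFuser(nn.Module):
    """Injects motion tokens into a DiT block's hidden states.
    A simple zero-initialized gated additive fusion is assumed here."""

    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Parameter(torch.zeros(1))  # starts with no motion influence

    def forward(self, hidden, motion):
        # hidden, motion: (B, N, dim)
        return hidden + self.gate * self.proj(motion)


if __name__ == "__main__":
    B, T, H, W, dim = 1, 8, 64, 64, 64
    traj = torch.randn(B, 2, T, H, W)                       # toy trajectory field
    motion_tokens = TrajectoryExtractor(dim=dim)(traj)      # (1, N, 64)
    hidden = torch.randn(B, motion_tokens.shape[1], dim)    # stand-in DiT tokens
    fused = MotionGuidedFuser(dim=dim)(hidden, motion_tokens)
    print(fused.shape)                                      # torch.Size([1, N, 64])
```

In this sketch the fused tokens would simply replace the block's hidden states before the next DiT layer; how Tora actually conditions each block is not specified in the article.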


Tora fits seamlessly into the DiT design and supports generating videos of up to 204 frames at 720p resolution, with precise control over video content across different durations, aspect ratios, and resolutions. Extensive experiments show that Tora achieves high motion fidelity while also convincingly simulating the motion of the physical world.


Its design integrates text, visual, and trajectory conditions to precisely control video content and simulate the motion of the physical world, opening up new possibilities for film visual effects and virtual reality.

TapTechNews attaches the reference address.
