Research

The art of
intelligent video creation

Applying a decade of VFX pipeline engineering to AI - building systems that plan, generate, evaluate, and refine autonomously.

MSc Dissertation - 2025/26

"A Scene-Aware, Feedback-Driven Approach to Intelligent Video Generation Using Multi-Agent Planning"

A fully algorithmic, closed-loop pipeline for physics-aware video generation. Multi-agent planning decomposes scenes, Houdini constructs 3D environments, Wan2.1 I2V synthesises video, six weighted metrics evaluate quality, and adaptive refinement iterates - all without human intervention.

Supervised by Dr. Saber Farag - Northumbria University

Fully algorithmic - Closed-loop refinement - Physics-aware 3D generation - Six evaluation metrics - No VLM (future work)

Overview

Pipeline Explainer

A walkthrough of the end-to-end system.

Layer 1 - Planning

Multi-Agent Planning

Three GPT-4o-mini agents - Director, Storyboard, and Cinematographer - collaborate to decompose a scene description into structured parameters via Chain of Thought reasoning and Pydantic outputs.

Multi-agent planning diagram
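
As an illustration of what "Pydantic outputs" can look like at the planning layer, here is a minimal sketch that validates one agent's JSON reply against a schema. The `ShotPlan` model, its field names, and the `parse_plan` helper are hypothetical and chosen for this example; the dissertation's actual schemas are not shown on this page.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ShotPlan(BaseModel):
    """Hypothetical schema for one planned shot (field names are illustrative)."""

    scene_id: int = Field(..., ge=1)
    motion_type: str                      # e.g. "straight", "curved", "collision"
    camera_height_m: float = Field(..., gt=0)
    num_frames: int = Field(25, gt=0)     # 25 frames, as stated for the generation layer
    objects: List[str]


def parse_plan(raw_json: str) -> ShotPlan:
    """Parse an agent's JSON reply; raises ValidationError if it breaks the schema."""
    return ShotPlan(**json.loads(raw_json))


# Example reply, written by hand for this sketch (not real agent output).
raw = (
    '{"scene_id": 3, "motion_type": "collision", '
    '"camera_height_m": 1.6, "objects": ["ball", "wall"]}'
)
plan = parse_plan(raw)
print(plan.motion_type, plan.num_frames)
```

Validating at the boundary means a malformed plan fails before it reaches scene construction, which is one plausible reason to prefer structured outputs over free text.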
Layer 2 - Construction

Houdini 3D Scene Construction

Agent plans become procedural 3D environments in Houdini - geometry, depth maps, collision detection, physics simulation. The same pipeline techniques used on feature films, now driven by AI.

Layer 3 - Generation

Video Generation - Wan2.1 I2V 14B

Rendered frames from Houdini feed into Wan2.1 at 624x352, 25 frames, 8fps. Three test scenes evaluated the pipeline across different motion types:

Scene 1 - Straight Motion
Scene 2 - Curved Motion
Scene 3 - Collision Motion
Layer 4 - Evaluation

Automated Evaluation

Six weighted metrics assess physical plausibility, visual fidelity, temporal consistency, and scene accuracy. Entirely algorithmic - no VLM.

Six weighted evaluation metrics
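
The combination step can be sketched as a weighted sum of per-metric scores. The metric keys below follow the six metrics named on this page, but the individual weights are placeholders: the page only states that they total 1.00. The sketch also assumes each metric has already been normalised to the range 0 to 1.

```python
import math
from typing import Dict

# Placeholder weights (assumption): only the total of 1.00 comes from the source.
WEIGHTS: Dict[str, float] = {
    "ssim_psnr": 0.20,
    "reference_similarity": 0.20,
    "optical_flow": 0.15,
    "edge_iou": 0.15,
    "flow_vectors": 0.15,
    "centroid_path": 0.15,
}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-metric scores in [0, 1] into a single score in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same metrics")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up scores for illustration only.
example = {name: 0.8 for name in WEIGHTS}
print(round(weighted_score(example), 4))
```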
Layer 5 - Refinement

Adaptive Refinement Loop

Scores feed back into the system. The pipeline identifies the weakest metric, adjusts a single generation parameter, and iterates until the weighted score reaches the 0.75 threshold. Scene 3 Config B converged at 0.7525 on iteration 3.

Refinement loop flowchart
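
The loop described above can be sketched roughly as follows. Only the 0.75 threshold and the "one parameter per iteration" rule come from this page. The metric-to-parameter mapping, the step sizes, the three-metric weights, and the `toy_evaluate` stand-in (which replaces real generation and scoring) are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

THRESHOLD = 0.75  # early-stopping threshold stated on this page

# Hypothetical mapping: weakest metric -> (parameter to nudge, step size).
ADJUSTMENTS: Dict[str, Tuple[str, float]] = {
    "optical_flow": ("motion_bucket", 10),
    "edge_iou": ("guidance_scale", 0.5),
    "ssim_psnr": ("noise_augmentation", -0.01),
}


def refine(
    params: Dict[str, float],
    evaluate: Callable[[Dict[str, float]], Dict[str, float]],
    weights: Dict[str, float],
    max_iters: int = 5,
) -> Tuple[Dict[str, float], float, int]:
    """Iterate until the weighted score reaches THRESHOLD or max_iters is hit."""
    params = dict(params)  # do not mutate the caller's dict
    total = 0.0
    for iteration in range(1, max_iters + 1):
        scores = evaluate(params)
        total = sum(weights[m] * scores[m] for m in weights)
        if total >= THRESHOLD:
            return params, total, iteration
        weakest = min(scores, key=scores.get)
        # Change exactly one parameter; fall back to a new seed if unmapped.
        name, delta = ADJUSTMENTS.get(weakest, ("seed", 1))
        params[name] += delta
    return params, total, max_iters


def toy_evaluate(p: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for generate + evaluate; not a model of real generator behaviour."""
    return {
        "ssim_psnr": 0.80,
        "optical_flow": 0.78,
        "edge_iou": min(1.0, 0.50 + 0.20 * (p["guidance_scale"] - 5.0)),
    }


toy_weights = {"ssim_psnr": 0.4, "optical_flow": 0.3, "edge_iou": 0.3}
start = {"guidance_scale": 5.0, "motion_bucket": 127, "noise_augmentation": 0.02, "seed": 0}
final_params, final_score, iterations = refine(start, toy_evaluate, toy_weights)
print(iterations, round(final_score, 3), final_params["guidance_scale"])
```

Changing a single parameter per pass keeps each score change attributable to one adjustment, at the cost of needing more iterations than a joint update might.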
Detailed Breakdown
01
Multi-Agent Planning

Director, Storyboard, and Cinematographer agents via GPT-4o-mini with CoT reasoning and Pydantic outputs.

GPT-4o-mini - Structured output - Multi-agent
02
Houdini Scene Generation

Procedural 3D environments with depth maps, collision geometry, and physics simulation.

Houdini - Python/VEX - Procedural modelling
03
Wan2.1 I2V Synthesis

Image-to-video at 624x352, 25 frames, 8fps, 30-step joint inference. 3D-rendered input provides geometry and depth priors.

Wan2.1 I2V 14B - Diffusion model
04
Six-Metric Evaluation

SSIM/PSNR, reference sequence similarity, optical flow, Canny edge IoU, flow vectors, centroid path vs Houdini ground truth.

6 metrics - Total weight 1.00
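
To make one of these metrics concrete, here is a minimal sketch of an intersection-over-union score between two binary edge maps. It assumes the edge maps already exist (for example from an edge detector such as OpenCV's `cv2.Canny`) and are given as nested lists of 0/1 values; the function name and the convention of returning 1.0 when both maps are empty are choices made for this example, not details taken from the dissertation.

```python
from typing import Sequence


def edge_iou(generated: Sequence[Sequence[int]], reference: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two same-sized binary edge maps."""
    if len(generated) != len(reference) or any(
        len(g_row) != len(r_row) for g_row, r_row in zip(generated, reference)
    ):
        raise ValueError("edge maps must have the same shape")
    intersection = 0
    union = 0
    for g_row, r_row in zip(generated, reference):
        for g, r in zip(g_row, r_row):
            intersection += bool(g) and bool(r)  # edge pixel in both maps
            union += bool(g) or bool(r)          # edge pixel in either map
    # Assumption: two empty edge maps count as a perfect match.
    return intersection / union if union else 1.0


# Tiny hand-made maps: 2 shared edge pixels out of 4 edge pixels overall.
generated_edges = [[0, 1, 1], [0, 1, 0]]
reference_edges = [[0, 1, 0], [0, 1, 1]]
print(edge_iou(generated_edges, reference_edges))  # 0.5
```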
05
Adaptive Refinement

Closed-loop feedback adjusts guidance scale, motion bucket, noise augmentation, or seed. Early stopping at 0.75 threshold.

Autonomous iteration - Score-driven
Future Work

VLM-based semantic evaluation, AnimateDiff with ControlNet, video-to-video refinement, and depth map integration as a direct conditioning signal.