Edit-As-Act | CVPR 2026

Abstract

Editing a 3D indoor scene from natural language is conceptually straightforward but technically challenging. Existing open-vocabulary systems often regenerate large portions of a scene or rely on image-space edits that disrupt spatial structure, resulting in unintended global changes or physically inconsistent layouts. These limitations stem from treating editing primarily as a generative task.

We take a different view. A user instruction defines a desired world state, and editing should be the minimal sequence of actions that makes this state true while preserving everything else. This perspective motivates Edit-As-Act, a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space.

Given a source scene and a free-form instruction, Edit-As-Act predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language that we design with explicit preconditions and effects encoding support, contact, collision, and other geometric relations. A language-driven planner proposes actions, and a validator enforces goal-directedness, monotonicity, and physical feasibility, producing interpretable and physically coherent transformations.
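To make the idea of goal regression over actions with preconditions and effects concrete, the following is a minimal sketch in Python. It is not EditLang and not the Edit-As-Act planner or validator: the `Action` class, the `regress` and `plan_backward` functions, the predicate names (`on`, `clear`), and the toy scene are all hypothetical, and the search is plain STRIPS-style backward chaining rather than a language-driven planner. The two checks in `regress` are only loosely analogous to goal-directedness and to not undoing goal predicates; physical feasibility is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Predicate = Tuple[str, ...]  # hypothetical encoding, e.g. ("on", "lamp", "desk")


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[Predicate]     # must hold before the action runs
    add: FrozenSet[Predicate]     # predicates the action makes true
    delete: FrozenSet[Predicate]  # predicates the action makes false


def regress(goal: FrozenSet[Predicate], action: Action) -> Optional[FrozenSet[Predicate]]:
    """Return the subgoal that must hold before `action` so `goal` holds after it."""
    if not (action.add & goal):   # action must achieve at least one goal predicate
        return None
    if action.delete & goal:      # action must not destroy a goal predicate
        return None
    return (goal - action.add) | action.pre


def plan_backward(state, goal, actions, max_depth=6) -> Optional[List[Action]]:
    """Iterative-deepening backward search; returns a shortest plan or None."""
    def search(g, depth, seen):
        if g <= state:            # every remaining subgoal already holds
            return []
        if depth == 0:
            return None
        for a in actions:
            sub = regress(g, a)
            if sub is None or sub in seen:
                continue
            rest = search(sub, depth - 1, seen | {sub})
            if rest is not None:
                return rest + [a]  # `a` runs last, after the plan for `sub`
        return None

    for limit in range(max_depth + 1):
        result = search(goal, limit, frozenset([goal]))
        if result is not None:
            return result
    return None


# Toy scene (assumed for illustration): the shelf is occupied by a book.
state = frozenset({("on", "lamp", "desk"), ("on", "book", "shelf")})
goal = frozenset({("on", "lamp", "shelf")})
actions = [
    Action(
        name="move_lamp_to_shelf",
        pre=frozenset({("on", "lamp", "desk"), ("clear", "shelf")}),
        add=frozenset({("on", "lamp", "shelf")}),
        delete=frozenset({("on", "lamp", "desk"), ("clear", "shelf")}),
    ),
    Action(
        name="move_book_to_floor",
        pre=frozenset({("on", "book", "shelf")}),
        add=frozenset({("on", "book", "floor"), ("clear", "shelf")}),
        delete=frozenset({("on", "book", "shelf")}),
    ),
]

plan = plan_backward(state, goal, actions)
print([a.name for a in plan])  # ['move_book_to_floor', 'move_lamp_to_shelf']
```

In this toy case the search starts from the goal, finds that placing the lamp requires a clear shelf, and regresses that requirement to moving the book first. Predicates the plan never touches are never mentioned, which is the sense in which regression keeps an edit minimal.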

By separating reasoning from low-level generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility, three criteria that existing paradigms cannot satisfy together. On E2A-Bench, our benchmark of 63 editing tasks across 9 indoor environments, Edit-As-Act significantly outperforms prior approaches across all edit types and scene categories.

Method Overview

Edit-As-Act formulates open-vocabulary 3D scene editing as goal-regressive planning: it translates free-form instructions into symbolic goal predicates and reasons over executable actions in EditLang.

Experiment Results

Quantitative comparisons and human evaluations demonstrate that Edit-As-Act consistently outperforms existing paradigms in instruction fidelity, semantic consistency, and physical plausibility across diverse 3D indoor editing tasks.

Quantitative Results

Performance on E2A-Bench.

User Study

Human evaluation results.

BibTeX

Coming soon

Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

Edit-As-Act transforms a free-form editing instruction into symbolic goals and executable actions for precise and physically coherent 3D indoor scene editing.
