OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

ByteDance Inc.
*Equal Contribution · Project Lead

Overall evaluation of Seedream 3.0 Fill across four image editing tasks; text rendering is included under image fill.

Abstract

In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances a model's generative capabilities across multiple tasks, each with its own evaluation criteria, using only one reward model.

By employing a single vision-language model (VLM) as the generative reward model, one that can distinguish the winner from the loser for a given task and a given evaluation criterion, OneReward can be applied effectively to diffusion-based models, particularly in settings with varied data and diverse task objectives. We apply OneReward to mask-guided image generation, which can be further divided into several sub-tasks, such as image fill, image extend, object removal, and text rendering, each specifying the edit area with a binary mask. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in their underlying data distributions and evaluation metrics.
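The core idea above, a single VLM judge that compares two candidates under a task-specific criterion and yields a scalar preference, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, function names, and the assumption that the judge exposes logits for two answer tokens ("A" vs. "B") are all hypothetical.

```python
import math

def pairwise_reward(logit_a: float, logit_b: float) -> float:
    """Turn the judge's two answer-token logits into a scalar reward:
    the softmax probability that candidate A beats candidate B.
    (Hypothetical interface; a real VLM judge would supply these logits.)"""
    return 1.0 / (1.0 + math.exp(logit_b - logit_a))

def judge_prompt(task: str, criterion: str) -> str:
    """One shared judge-prompt template for all sub-tasks; only the task
    name and the evaluation criterion change (illustrative wording)."""
    return (f"Task: {task}. Evaluation criterion: {criterion}. "
            "Which result is better, image A or image B? "
            "Answer with a single letter.")

# Example: the same judge serves different tasks and criteria.
p = judge_prompt("object removal", "absence of residual artifacts")
r = pairwise_reward(logit_a=2.0, logit_b=0.0)  # A preferred: r > 0.5
```

Because the reward is a probability, it is symmetric by construction (swapping A and B gives 1 - r), which is the property that lets one judge rank winner/loser pairs consistently across heterogeneous sub-tasks.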

Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified editing model consistently outperforms both commercial and open-source competitors, including Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions.

Methods

Results for Seedream 3.0 Fill

Image Fill

Image Extend without prompt

Image Extend with prompt

Object Removal

Text Rendering

Results for Flux Fill [dev][OneReward] (Open Source)

Under the approval process; coming soon.