CVPR 2026 · DIS · SOTA · 429M Params

PDFNet: High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy

Xianjie Liu Keren Fu Qijun Zhao
Sichuan University
Depth Integrity-Prior · Fine-Grained Patches · <50% Params

TL;DR

We discover a novel Depth Integrity-Prior: in pseudo depth maps, foreground objects consistently convey stable depth values with much lower variances than the chaotic background patterns. We propose PDFNet, a network that deeply fuses RGB and pseudo depth features with a novel depth integrity-prior loss and a fine-grained patch strategy. PDFNet achieves state-of-the-art results with 429M parameters (94M for the segmentation network plus 335M for the pseudo depth generator, under 50% of the parameter count of diffusion-based models).

01 Abstract

High-precision dichotomous image segmentation (DIS) is the task of extracting fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods are efficient but suffer from false or missed detections due to weak semantics and less robust spatial priors, while diffusion methods leverage strong generative priors for high accuracy but incur heavy computational costs.

As a solution, we find that pseudo depth maps from monocular depth estimation models provide essential semantic understanding, quickly revealing spatial differences between target objects and backgrounds. Inspired by this phenomenon, we discover a novel insight we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently convey stable depth values with much lower variances than chaotic background patterns.

To exploit such a prior, we propose a Prior of Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss to explicitly enforce depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement.

02 Key Insight: Depth Integrity-Prior

Depth Variance Analysis

Figure 1: Depth variance analysis showing that foreground objects have significantly lower variance compared to background regions in pseudo depth maps.

The Challenge

Non-diffusion methods suffer from false/missed detections due to weak semantics. Diffusion methods are accurate but computationally expensive.

Our Discovery

Depth Integrity-Prior: Foreground objects in pseudo depth maps have stable depth values with much lower variance than chaotic backgrounds!

Depth Prior Visualization

Figure 2: Visualization of the depth integrity-prior concept across different scenarios.
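The variance gap that Figure 1 illustrates can be checked in a few lines of NumPy. The depth map and masks below are synthetic stand-ins built to mimic the phenomenon, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pseudo depth map: a foreground disk at near-constant depth
# in front of a cluttered background (illustration only).
h = w = 128
yy, xx = np.mgrid[:h, :w]
fg_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 < (h // 4) ** 2

depth = rng.uniform(0.0, 1.0, size=(h, w))                  # chaotic background
depth[fg_mask] = 0.8 + rng.normal(0, 0.01, fg_mask.sum())   # stable foreground

fg_var = depth[fg_mask].var()
bg_var = depth[~fg_mask].var()
print(f"foreground depth variance: {fg_var:.4f}")
print(f"background depth variance: {bg_var:.4f}")
assert fg_var < bg_var  # the depth integrity-prior in miniature
```

On real images the same statistic is computed from the monocular depth estimator's output under the ground-truth (or predicted) mask.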

03 Method: PDFNet Architecture

PDFNet Architecture Overview

Figure 3: Overview of the PDFNet Framework. The network deeply fuses RGB and pseudo depth features through a shared Swin-B backbone with multi-scale feature enhancement.

Detailed Architecture

Figure 4: Detailed architecture showing the multi-modal feature fusion and patch-based processing strategy.

Multi-Modal Feature Fusion

PDFNet uses a shared Swin-B Transformer backbone to process both RGB and pseudo depth inputs, enabling cross-modal feature learning and depth-guided structural perception.

  • Shared encoder for visual and depth modalities
  • Multi-scale feature extraction (F₁ to F₅)
  • Cross-modal feature mixing at each scale
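As a toy illustration of the two ingredients above, the snippet shares one set of encoder weights across both modalities and mixes the resulting features at a single scale. The 1x1 projections and shapes are stand-ins for the Swin-B stages, not PDFNet's actual layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "encoder" weights applied to both modalities: weight sharing stands
# in for PDFNet's shared Swin-B backbone (shapes are illustrative).
W_shared = rng.normal(size=(3, 16))   # 3 input channels -> 16 features
W_fuse = rng.normal(size=(32, 16))    # concat(16 + 16) -> 16 fused features

def encode(x):
    # x: (H, W, 3) -> (H, W, 16); the same weights serve RGB and depth
    return x @ W_shared

rgb = rng.normal(size=(64, 64, 3))
depth = np.repeat(rng.normal(size=(64, 64, 1)), 3, axis=-1)  # depth tiled to 3 channels

f_rgb, f_depth = encode(rgb), encode(depth)
# Cross-modal mixing: concatenate per-pixel features and project back down.
fused = np.concatenate([f_rgb, f_depth], axis=-1) @ W_fuse
print(fused.shape)  # (64, 64, 16)
```

In the actual network this mixing happens at every scale F₁ to F₅ rather than once.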

Depth Integrity-Prior Loss

A novel loss function that explicitly enforces depth consistency in segmentation results, leveraging the discovered prior that foreground objects have stable depth values.

  • Enforces low depth variance in foreground
  • Improves segmentation uniformity
  • Enhances object integrity
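One way such a loss can be realized is a mask-weighted variance penalty on depth inside the predicted foreground. This NumPy sketch shows the idea; it is not the paper's exact formulation:

```python
import numpy as np

def depth_integrity_loss(pred_mask, depth, eps=1e-6):
    """Mask-weighted variance of pseudo depth inside the predicted foreground.

    pred_mask: (H, W) soft segmentation in [0, 1]
    depth:     (H, W) pseudo depth map
    Low values mean the predicted foreground covers a region of near-constant
    depth, matching the depth integrity-prior.
    """
    w = pred_mask / (pred_mask.sum() + eps)
    mean = (w * depth).sum()
    return (w * (depth - mean) ** 2).sum()

rng = np.random.default_rng(0)
depth = rng.uniform(size=(32, 32))
depth[8:24, 8:24] = 0.7                              # flat-depth foreground block

good = np.zeros((32, 32)); good[8:24, 8:24] = 1.0    # mask on the flat region
bad = np.ones((32, 32))                              # mask leaking into clutter

print(depth_integrity_loss(good, depth))   # near zero: foreground depth is constant
print(depth_integrity_loss(bad, depth))    # large: background variance leaks in
```

A prediction that bleeds into the background picks up high-variance depth and is penalized, which pushes the network toward whole-object, boundary-tight masks.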

Fine-Grained Patch Strategy

An adaptive patch selection module that performs boundary-sensitive detail refinement, improving sensitivity to fine details in high-resolution images.

  • Adaptive patch selection
  • Boundary-sensitive refinement
  • High-resolution detail preservation
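A plausible instance of adaptive patch selection is to score patches by how much boundary content they contain in a coarse mask and refine only the top-scoring ones. The gradient-magnitude scoring rule here is illustrative, not PDFNet's actual criterion:

```python
import numpy as np

def select_patches(coarse_mask, patch=16, k=4):
    """Pick the k patches with the most boundary content in a coarse mask.

    Patches whose mask values change the most (gradient magnitude) are the
    boundary-sensitive ones worth sending through high-resolution refinement.
    Returns the (row, col) origins of the selected patches.
    """
    gy, gx = np.gradient(coarse_mask.astype(float))
    edges = np.hypot(gy, gx)
    h, w = coarse_mask.shape
    scores = {}
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            scores[(i, j)] = edges[i:i + patch, j:j + patch].sum()
    # Top-k patch origins by boundary score
    return sorted(scores, key=scores.get, reverse=True)[:k]

mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0          # square object -> edges along its border
print(select_patches(mask))       # patches covering the object boundary
```

Flat interior and background patches score near zero and are skipped, so refinement compute is concentrated where fine detail actually lives.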

Framework Overview

RGB Input + Pseudo Depth → PDFNet Feature Fusion → Fine-Grained Patch Refinement → High-Precision Segmentation

04 Results

PDFNet achieves state-of-the-art performance with an efficient architecture.

Scores on the four DIS-5K test subsets (higher is better):

Method          Type           Params   DIS-TE1   DIS-TE2   DIS-TE3   DIS-TE4
IS-Net          Non-Diffusion  173M     0.881     0.846     0.821     0.864
BiRefNet        Non-Diffusion  176M     0.902     0.863     0.842     0.883
DiffDIS         Diffusion      891M     0.908     0.872     0.849     0.891
PDFNet (Ours)   Non-Diffusion  429M     0.915     0.879     0.856     0.897
  • 429M parameters in total (94M segmentation network + 335M pseudo depth generator)
  • SOTA: best results among all non-diffusion methods
  • <50% of the parameters of diffusion-based models, far more efficient

05 Visual Results

Qualitative comparison with other methods

Visual Comparison Results

Comparison with Other Methods

PDFNet produces more accurate, complete, and robust segmentations across diverse object categories including complex structures, thin parts, and reflective surfaces.

More Visual Cases

Figure 5: PDFNet with uncertainty-aware segmentation outputs showing probabilistic predictions.

06 Get Started

Installation
# Clone the repository
git clone https://github.com/Tennine2077/PDFNet.git
cd PDFNet

# Create conda environment
conda create -n PDFNet python=3.11.4
conda activate PDFNet

# Install dependencies
pip install -r requirements.txt

Training
# Prepare DIS-5K dataset and DAM-V2 depth maps
# See README for data preparation details

# Train PDFNet
python Train_PDFNet.py

Inference
# Run inference
cd metric_tools
python Test.py

# Or use the demo notebook
# Open demo.ipynb in Jupyter

07 Citation

If you find PDFNet helpful in your research, please consider citing:

@misc{liu2025patchdepthfusiondichotomousimage,
      title={Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior}, 
      author={Xianjie Liu and Keren Fu and Qijun Zhao},
      year={2025},
      eprint={2503.06100},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06100}, 
}