We discover a novel Depth Integrity-Prior: in pseudo depth maps, foreground objects consistently exhibit stable depth values with much lower variance than chaotic background patterns. We propose PDFNet, a network that deeply fuses RGB and pseudo depth features with a novel depth integrity-prior loss and a fine-grained patch strategy. PDFNet achieves SOTA with 429M parameters (94M segmentation network + 335M pseudo depth generator), less than half the size of diffusion-based models.
High-precision dichotomous image segmentation (DIS) aims to extract fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods are efficient but suffer from false or missed detections due to weak semantics and less robust spatial priors, while diffusion methods leverage strong generative priors for high accuracy but incur heavy computational costs.
As a solution, we find that pseudo depth maps produced by monocular depth estimation models provide essential semantic understanding, quickly revealing spatial differences between target objects and backgrounds. Inspired by this phenomenon, we discover a novel insight we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently exhibit stable depth values with much lower variance than chaotic background patterns.
To exploit this prior, we propose a Prior-of-Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss that explicitly enforces depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement.
Figure 1: Depth variance analysis showing that foreground objects have significantly lower variance compared to background regions in pseudo depth maps.
Non-diffusion methods suffer from false/missed detections due to weak semantics. Diffusion methods are accurate but computationally expensive.
Depth Integrity-Prior: Foreground objects in pseudo depth maps have stable depth values with much lower variance than chaotic backgrounds!
Figure 2: Visualization of the depth integrity-prior concept across different scenarios.
Figure 3: Overview of the PDFNet Framework. The network deeply fuses RGB and pseudo depth features through a shared Swin-B backbone with multi-scale feature enhancement.
Figure 4: Detailed architecture showing the multi-modal feature fusion and patch-based processing strategy.
PDFNet uses a shared Swin-B Transformer backbone to process both RGB and pseudo depth inputs, enabling cross-modal feature learning and depth-guided structural perception.
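As a minimal sketch of what depth-guided fusion can look like (an assumed form for illustration; the paper's actual fusion module differs), a gate derived from the depth features can modulate the RGB features before the two streams are combined:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(rgb_feat: np.ndarray, depth_feat: np.ndarray, w: np.ndarray) -> np.ndarray:
    """rgb_feat, depth_feat: (C, H, W) feature maps; w: (C, C) gate projection."""
    gate = sigmoid(np.einsum('dc,chw->dhw', w, depth_feat))  # depth-derived gate in (0, 1)
    return gate * rgb_feat + depth_feat                      # gated residual fusion

rng = np.random.default_rng(0)
rgb, dep = rng.standard_normal((2, 8, 4, 4))  # toy RGB and depth features
w = rng.standard_normal((8, 8)) * 0.1
print(fuse(rgb, dep, w).shape)  # (8, 4, 4)
```

The key design point the sketch shares with PDFNet is that depth information steers which RGB features are emphasized, rather than the two modalities being naively concatenated.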
A novel loss function that explicitly enforces depth consistency in segmentation results, leveraging the discovered prior that foreground objects have stable depth values.
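One way to turn the prior into a training signal (a sketch in the spirit of the integrity-prior loss; the paper's exact formulation may differ) is to penalize the variance of pseudo depth values under the predicted soft foreground mask:

```python
import numpy as np

def integrity_prior_loss(pred: np.ndarray, depth: np.ndarray, eps: float = 1e-8) -> float:
    """pred: soft mask in [0, 1]; depth: pseudo depth map of the same shape."""
    w = pred / (pred.sum() + eps)           # normalize mask to weights
    mu = (w * depth).sum()                  # mask-weighted mean foreground depth
    var = (w * (depth - mu) ** 2).sum()     # mask-weighted depth variance
    return float(var)

depth = np.zeros((8, 8)); depth[:, 4:] = 1.0   # two flat depth regions
good = np.zeros((8, 8)); good[:, 4:] = 1.0     # mask aligned with one depth region
bad = np.full((8, 8), 0.5)                     # mask that mixes both regions
print(integrity_prior_loss(good, depth) < integrity_prior_loss(bad, depth))  # True
```

A mask that straddles two depth layers accumulates high weighted variance, so minimizing this term pushes predictions toward depth-consistent objects.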
An adaptive patch selection module that performs boundary-sensitive detail refinement, improving sensitivity to fine details in high-resolution images.
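A hedged sketch of the idea behind adaptive patch selection (PDFNet's actual module is learned and more elaborate): rank fixed-size patches by boundary density in a coarse mask and keep the top-k for fine-grained refinement.

```python
import numpy as np

def select_patches(coarse_mask: np.ndarray, patch: int = 16, k: int = 4):
    """Return the top-left corners of the k patches with the most boundary pixels."""
    gy, gx = np.gradient(coarse_mask.astype(float))
    edges = np.hypot(gx, gy)                       # boundary-strength map
    h, w = coarse_mask.shape
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append((edges[y:y + patch, x:x + patch].sum(), (y, x)))
    scores.sort(key=lambda s: s[0], reverse=True)  # most boundary-heavy first
    return [yx for _, yx in scores[:k]]

mask = np.zeros((64, 64)); mask[20:44, 20:44] = 1
print(select_patches(mask))  # the four patches straddling the object boundary
```

Spending refinement compute only on boundary-heavy patches is what keeps high-resolution detail recovery affordable.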
PDFNet achieves state-of-the-art performance with an efficient architecture
| Method | Type | Params | DIS-TE1 | DIS-TE2 | DIS-TE3 | DIS-TE4 |
|---|---|---|---|---|---|---|
| IS-Net | Non-Diffusion | 173M | 0.881 | 0.846 | 0.821 | 0.864 |
| BiRefNet | Non-Diffusion | 176M | 0.902 | 0.863 | 0.842 | 0.883 |
| PDFNet (Ours) | Non-Diffusion | 429M | 0.915 | 0.879 | 0.856 | 0.897 |
| DiffDIS | Diffusion | 891M | 0.908 | 0.872 | 0.849 | 0.891 |
Qualitative comparison with other methods
PDFNet produces more accurate, complete, and robust segmentations across diverse object categories including complex structures, thin parts, and reflective surfaces.
Figure 5: PDFNet with uncertainty-aware segmentation outputs showing probabilistic predictions.
```shell
# Clone the repository
git clone https://github.com/Tennine2077/PDFNet.git
cd PDFNet

# Create and activate the conda environment
conda create -n PDFNet python=3.11.4
conda activate PDFNet

# Install dependencies
pip install -r requirements.txt
```

```shell
# Prepare the DIS-5K dataset and DAM-V2 depth maps
# (see the README for data preparation details)

# Train PDFNet
python Train_PDFNet.py

# Run inference
cd metric_tools
python Test.py

# Or open demo.ipynb in Jupyter to use the demo notebook
```
If you find PDFNet helpful in your research, please consider citing:
```bibtex
@misc{liu2025patchdepthfusiondichotomousimage,
  title={Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior},
  author={Xianjie Liu and Keren Fu and Qijun Zhao},
  year={2025},
  eprint={2503.06100},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.06100},
}
```