We discover a novel Depth Integrity-Prior: in pseudo depth maps, foreground objects consistently exhibit stable depth values with much lower variance than chaotic background patterns. We propose PDFNet, a network that deeply fuses RGB and pseudo depth features with a novel depth integrity-prior loss and a fine-grained patch strategy. PDFNet achieves SOTA with 429M parameters (94M segmentation network + 335M pseudo depth generator), less than half the size of diffusion-based models.
High-precision dichotomous image segmentation (DIS) aims to extract fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods are efficient but suffer from false or missed detections due to weak semantics and less robust spatial priors, while diffusion methods leverage strong generative priors for high accuracy but incur heavy computational costs.
As a solution, we find that pseudo depth maps produced by monocular depth estimation models provide essential semantic understanding, quickly revealing spatial differences between target objects and backgrounds. Inspired by this phenomenon, we discover a novel insight we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently exhibit stable depth values with much lower variance than chaotic background patterns.
To exploit this prior, we propose a Prior-of-Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss that explicitly enforces depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement.
Figure 1: Depth variance analysis showing that foreground objects have significantly lower variance compared to background regions in pseudo depth maps.
Non-diffusion methods suffer from false/missed detections due to weak semantics. Diffusion methods are accurate but computationally expensive.
Depth Integrity-Prior: Foreground objects in pseudo depth maps have stable depth values with much lower variance than chaotic backgrounds!
Figure 2: Visualization of the depth integrity-prior concept across different scenarios.
Figure 3: Overview of the PDFNet Framework. The network deeply fuses RGB and pseudo depth features through a shared Swin-B backbone with multi-scale feature enhancement.
Figure 4: Detailed architecture showing the multi-modal feature fusion and patch-based processing strategy.
PDFNet uses a shared Swin-B Transformer backbone to process both RGB and pseudo depth inputs, enabling cross-modal feature learning and depth-guided structural perception.
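As a minimal sketch of what depth-guided fusion can look like (an assumed form for illustration; the paper's actual fusion module differs), a gate derived from the depth features can modulate the RGB features before the two streams are combined:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(rgb_feat: np.ndarray, depth_feat: np.ndarray, w: np.ndarray) -> np.ndarray:
    """rgb_feat, depth_feat: (C, H, W) feature maps; w: (C, C) gate projection."""
    gate = sigmoid(np.einsum('dc,chw->dhw', w, depth_feat))  # depth-derived gate in (0, 1)
    return gate * rgb_feat + depth_feat                      # gated residual fusion

rng = np.random.default_rng(0)
rgb, dep = rng.standard_normal((2, 8, 4, 4))  # toy RGB and depth features
w = rng.standard_normal((8, 8)) * 0.1
print(fuse(rgb, dep, w).shape)  # (8, 4, 4)
```

The key design point the sketch shares with PDFNet is that depth information steers which RGB features are emphasized, rather than the two modalities being naively concatenated.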
A novel loss function that explicitly enforces depth consistency in segmentation results, leveraging the discovered prior that foreground objects have stable depth values.
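One way to turn the prior into a training signal (a sketch in the spirit of the integrity-prior loss; the paper's exact formulation may differ) is to penalize the variance of pseudo depth values under the predicted soft foreground mask:

```python
import numpy as np

def integrity_prior_loss(pred: np.ndarray, depth: np.ndarray, eps: float = 1e-8) -> float:
    """pred: soft mask in [0, 1]; depth: pseudo depth map of the same shape."""
    w = pred / (pred.sum() + eps)           # normalize mask to weights
    mu = (w * depth).sum()                  # mask-weighted mean foreground depth
    var = (w * (depth - mu) ** 2).sum()     # mask-weighted depth variance
    return float(var)

depth = np.zeros((8, 8)); depth[:, 4:] = 1.0   # two flat depth regions
good = np.zeros((8, 8)); good[:, 4:] = 1.0     # mask aligned with one depth region
bad = np.full((8, 8), 0.5)                     # mask that mixes both regions
print(integrity_prior_loss(good, depth) < integrity_prior_loss(bad, depth))  # True
```

A mask that straddles two depth layers accumulates high weighted variance, so minimizing this term pushes predictions toward depth-consistent objects.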
An adaptive patch selection module that performs boundary-sensitive detail refinement, improving sensitivity to fine details in high-resolution images.
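A hedged sketch of the idea behind adaptive patch selection (PDFNet's actual module is learned and more elaborate): rank fixed-size patches by boundary density in a coarse mask and keep the top-k for fine-grained refinement.

```python
import numpy as np

def select_patches(coarse_mask: np.ndarray, patch: int = 16, k: int = 4):
    """Return the top-left corners of the k patches with the most boundary pixels."""
    gy, gx = np.gradient(coarse_mask.astype(float))
    edges = np.hypot(gx, gy)                       # boundary-strength map
    h, w = coarse_mask.shape
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append((edges[y:y + patch, x:x + patch].sum(), (y, x)))
    scores.sort(key=lambda s: s[0], reverse=True)  # most boundary-heavy first
    return [yx for _, yx in scores[:k]]

mask = np.zeros((64, 64)); mask[20:44, 20:44] = 1
print(select_patches(mask))  # the four patches straddling the object boundary
```

Spending refinement compute only on boundary-heavy patches is what keeps high-resolution detail recovery affordable.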
PDFNet achieves state-of-the-art performance with an efficient architecture
| Method | Type | Params | DIS-TE1 | DIS-TE2 | DIS-TE3 | DIS-TE4 |
|---|---|---|---|---|---|---|
| IS-Net | Non-Diffusion | 173M | 0.881 | 0.846 | 0.821 | 0.864 |
| BiRefNet | Non-Diffusion | 176M | 0.902 | 0.863 | 0.842 | 0.883 |
| PDFNet (Ours) | Non-Diffusion | 429M | 0.915 | 0.879 | 0.856 | 0.897 |
| DiffDIS | Diffusion | 891M | 0.908 | 0.872 | 0.849 | 0.891 |
Qualitative comparison with other methods
PDFNet produces more accurate, complete, and robust segmentations across diverse object categories including complex structures, thin parts, and reflective surfaces.
Figure 5: PDFNet with uncertainty-aware segmentation outputs showing probabilistic predictions.
```shell
# Clone the repository
git clone https://github.com/Tennine2077/PDFNet.git
cd PDFNet

# Create and activate the conda environment
conda create -n PDFNet python=3.11.4
conda activate PDFNet

# Install dependencies
pip install -r requirements.txt
```

```shell
# Prepare the DIS-5K dataset and DAM-V2 depth maps
# (see the README for data preparation details)

# Train PDFNet
python Train_PDFNet.py

# Run inference
cd metric_tools
python Test.py

# Or open demo.ipynb in Jupyter to use the demo notebook
```
If you find PDFNet helpful in your research, please consider citing:
```bibtex
@misc{liu2025patchdepthfusiondichotomousimage,
  title={Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior},
  author={Xianjie Liu and Keren Fu and Qijun Zhao},
  year={2025},
  eprint={2503.06100},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.06100},
}
```