Feature Pyramid Networks for Object Detection

Abstract

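Feature pyramids are a basic component of recognition systems that detect objects at different scales, but recent deep learning object detectors have avoided pyramid representations because of their compute and memory cost.
This paper exploits the multi-scale pyramid structure inherent in a convolutional network to build a feature pyramid at limited extra cost.
Applying FPN to the Faster R-CNN architecture gives strong detection performance.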

Introduction

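Detecting objects at multiple scales is a fundamental problem in computer vision.
- Several approaches have been used to address this problem.
  - (a) Featurized image pyramid
    - Scans the image at several scales to detect objects -> requires a large amount of computation, so it is slow
  - (b) Single feature map
    - Recent detection systems that run a ConvNet and predict from a single feature map -> fast, but less accurate
  - (c) Pyramidal feature hierarchy
    - Uses the ConvNet's pyramid structure and extracts a feature map at each layer -> the maps differ in resolution, and because they come from different depths there is a semantic gap between them
  - (d) Feature Pyramid Network
    - The structure proposed in the paper
    - Uses the ConvNet's pyramid structure, but combines the low-resolution feature maps with the high-resolution feature maps
    - Top: an existing top-down approach with skip connections, which produces a single high-level feature map for prediction
    - Bottom: the structure used in this paper, which makes a prediction at each level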

Feature Pyramid Networks

FPN takes a single-scale image of arbitrary size as input to a convolutional network and outputs feature maps at multiple scales. The process consists of a bottom-up pathway, a top-down pathway, and lateral connections.

Bottom-up pathway
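- The bottom-up pathway is the forward pass of the image through the ConvNet, which extracts feature maps that shrink by a factor of 2 at each step.
- Layers in the ConvNet that output feature maps of the same size are defined as belonging to the same stage, and one pyramid level is defined per stage.
- The feature map output by the last layer of each stage is used.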
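The factor-of-2 shrinking described above can be illustrated with a small shape calculation. This is only a sketch: the function name `bottom_up_shapes` is hypothetical, and the strides 4, 8, 16, 32 are an assumption matching a typical ResNet-style backbone, not something stated in this post. Real convolution layers may also round sizes differently when the input is not divisible by the stride.

```python
def bottom_up_shapes(height, width, strides=(4, 8, 16, 32)):
    """Return the (height, width) of one feature map per stage.

    The strides are assumed values for illustration; each stage
    halves the spatial size of the previous one.
    """
    return [(height // s, width // s) for s in strides]


# One pyramid level per stage, from the finest map to the coarsest.
shapes = bottom_up_shapes(224, 224)
print(shapes)  # [(56, 56), (28, 28), (14, 14), (7, 7)]
```

Each entry corresponds to the output of the last layer of one stage, so the list has one element per pyramid level.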
Top-down pathway and lateral connections
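- The top-down pathway upsamples the feature map from the higher pyramid level (lower resolution, but semantically stronger) by a factor of 2. A 1x1 conv is applied to the corresponding bottom-up feature map so that the channel counts match.
- Nearest neighbor upsampling is used for the upsampling step.
- The lateral connection merges the upsampled feature map with the feature map of the level just below by element-wise addition.
- A 3x3 conv is then applied to each merged feature map to obtain the feature map used for prediction.
- The highest-level feature map is processed as shown in the figure above.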
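The merge step above can be sketched in plain Python on tiny single-channel maps. This is a simplified illustration, not the paper's implementation: the helper names are hypothetical, the maps are nested lists with one channel, and the 1x1 and 3x3 convolutions are left out so that only the nearest neighbor upsampling and the element-wise addition remain. It also assumes the coarsest map starts the top-down pathway unchanged.

```python
def upsample_nearest_2x(fmap):
    """Nearest neighbor 2x upsampling: each value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_elementwise(a, b):
    """Element-wise addition of two maps with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals):
    """Merge maps ordered from finest to coarsest.

    `laterals` stands in for the bottom-up maps after the 1x1 conv
    (the conv itself is omitted in this sketch).
    """
    merged = [laterals[-1]]  # assumption: coarsest map is used as-is
    for lateral in reversed(laterals[:-1]):
        upsampled = upsample_nearest_2x(merged[-1])
        merged.append(add_elementwise(upsampled, lateral))
    return merged[::-1]      # back to finest-to-coarsest order


coarse = [[1, 2], [3, 4]]               # 2x2 map from the higher level
fine = [[10] * 4 for _ in range(4)]     # 4x4 map from the level below
pyramid = top_down_merge([fine, coarse])
print(pyramid[0])
# [[11, 11, 12, 12], [11, 11, 12, 12], [13, 13, 14, 14], [13, 13, 14, 14]]
```

In a full model each merged map would then pass through a 3x3 conv before being used for prediction, as described in the list above.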

Paper source

[1612.03144] Feature Pyramid Networks for Object Detection (arxiv.org)

Image references

The paper and one other source

https://herbwood.tistory.com/18
