Graph R-CNN for Scene Graph Generation



오모짱_ 2023. 11. 2. 09:09

[Paper]. [Github]

✏️ Contributions

1. Graph R-CNN: detects the objects in an image and the relationships between them
➞ Relation Proposal Network (RePN)

2. Attentional Graph Convolutional Network (aGCN)

➞ captures contextual information between objects and relations

3. A new evaluation metric (SGGen+)

➞ more holistic and realistic than existing metrics ^^

(a) Object node extraction (b) Relationship edge pruning (c) Graph context integration

Visual Scene

➞ most prior work focuses on the objects in an image:

image classification | object detection | segmentation

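Representing a scene as just a collection of objects makes it hard to capture how they relate to each other

➞ instead, understand the scene as a graph whose nodes are the objects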

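Graph representation:

a fully connected graph with node = object, edge = relationship

But as the number of objects grows, such a graph stops being practical: n objects give O(n²) candidate edges, most of which are not real relationships.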

Graph R-CNN (for scene graph generation)

1. Object node extraction ➞ uses a standard object detection pipeline [32], Faster R-CNN

2. Relationship edge pruning

3. Graph context integration

Scene Graph

* I: the input image

* Scene graph S = (V, E, O, R):

- V: set of nodes corresponding to localized object regions in I

- E: relationships between objects

- O: object labels, R: relationship labels

The model can then be factorized as:

$$P(S \mid I) = P(V \mid I)\, P(E \mid V, I)\, P(R, O \mid V, E, I)$$

i.e. object region proposal, relationship proposal, and graph labeling, in that order.

1. Object Region Proposal

Object regions are proposed with Faster R-CNN.

For each object i, it extracts a spatial region $r_i^o = [x_i, y_i, w_i, h_i]$, a pooled feature vector $x_i^o \in \mathbb{R}^d$, and a probability distribution over the classes $p_i^o \in \mathbb{R}^k$.

(classes = { 1, ..., k })
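A minimal sketch of what the detector hands to the next stage for one object, assuming the classification head produces raw logits that a softmax turns into $p_i^o$. The record layout and the function name `make_object_node` are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax: turns raw class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_object_node(box_xywh, feature, class_logits):
    """Bundle the three per-object outputs into one record (hypothetical layout).

    box_xywh:     r_i^o = [x_i, y_i, w_i, h_i]
    feature:      x_i^o, the pooled feature vector of length d
    class_logits: raw scores for the k classes, mapped to p_i^o by a softmax
    """
    return {
        "box": list(box_xywh),
        "feature": list(feature),
        "class_probs": softmax(class_logits),
    }

# Hypothetical detector output for one object, with d = 4 and k = 3.
node = make_object_node([10, 20, 50, 40], [0.1, 0.5, -0.2, 0.9], [2.0, 1.0, 0.1])
print(node["class_probs"])  # sums to 1; index 0 is the most likely class
```

The class distribution $p_i^o$ (not the box) is what RePN later scores pairs on.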

2. Relationship Proposal Network

: previously, candidate edges were obtained by uniform random sampling from all potential edges between vertices

: this paper instead proposes RePN, which lets the whole generation process be trained end-to-end

With n objects, there are O(n²) possible relations,

so RePN prunes them down to the likely ones.
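To make the O(n²) growth concrete, here is a short count of candidate edges, assuming relations are directed (subject, object) pairs with no self-pairs. `candidate_pairs` is a hypothetical helper name.

```python
from itertools import permutations

def candidate_pairs(num_objects):
    """All ordered (subject, object) index pairs with i != j.

    Order matters ("person rides horse" is not "horse rides person"),
    so n objects give n * (n - 1) candidates, i.e. O(n^2).
    """
    return list(permutations(range(num_objects), 2))

for n in (5, 20, 100):
    print(n, len(candidate_pairs(n)))
# prints: 5 20, then 20 380, then 100 9900
```

Going from 20 to 100 objects multiplies the candidates by about 26, which is why pruning is needed before any per-edge computation.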

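Each ordered pair gets a relatedness score from an asymmetric kernel over the class distributions:

$$f(p_i^o, p_j^o) = \langle \Phi(p_i^o), \psi(p_j^o) \rangle, \quad i \neq j$$

(Φ and ψ are projection functions that distinguish the subject and object roles in a relationship)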

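> Concatenating [p_i^o, p_j^o] and feeding it to an MLP also works, but it is terrible in memory and computation, since the MLP has to run on all O(n²) pairs

>>> so an asymmetric kernel function is used instead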
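A sketch of the asymmetric kernel score, assuming Φ and ψ are small 2-layer MLPs. Here they have random weights as stand-ins for trained ones, and all layer sizes are hypothetical. The sigmoid maps each score to (0, 1).

```python
import math
import random

def linear(x, weight, bias):
    """y = W x + b for one vector x (weight stored as a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical 2-layer MLP with random weights (stands in for a trained one)."""
    def init(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    w1, b1 = init(hidden_dim, in_dim), [0.0] * hidden_dim
    w2, b2 = init(out_dim, hidden_dim), [0.0] * out_dim
    return lambda x: linear(relu(linear(x, w1, b1)), w2, b2)

def sigmoid(v):
    """Overflow-safe logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def relatedness_scores(class_probs, phi, psi):
    """s_ij = sigmoid(<phi(p_i), psi(p_j)>) for every ordered pair i != j.

    phi projects a node in its subject role, psi in its object role; since
    phi != psi the kernel is asymmetric, so s_ij and s_ji can differ.
    """
    subj = [phi(p) for p in class_probs]   # n MLP calls
    obj = [psi(p) for p in class_probs]    # n MLP calls
    n = len(class_probs)
    return {
        (i, j): sigmoid(sum(a * b for a, b in zip(subj[i], obj[j])))
        for i in range(n) for j in range(n) if i != j
    }

rng = random.Random(0)
k, hidden, proj = 4, 8, 6  # hypothetical sizes
phi = make_mlp(k, hidden, proj, rng)
psi = make_mlp(k, hidden, proj, rng)
probs = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.8, 0.05, 0.05], [0.25, 0.25, 0.25, 0.25]]
scores = relatedness_scores(probs, phi, psi)
print(scores[(0, 1)], scores[(1, 0)])
```

Each node is projected once (2n MLP calls) and every pair then costs only a dot product, whereas the concatenation approach needs an MLP call per pair.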

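Overlap computation

For two object pairs {u, v} and {p, q}:

$$\mathrm{IoU}(\{u, v\}, \{p, q\}) = \frac{I(r_u^o, r_p^o) + I(r_v^o, r_q^o)}{U(r_u^o, r_p^o) + U(r_v^o, r_q^o)}$$

I: intersection area between two boxes

U: union area between two boxes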
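The pair overlap compares subject with subject (u vs p) and object with object (v vs q), summing intersections and unions before dividing. A sketch, assuming boxes are `[x, y, w, h]` with (x, y) the top-left corner; the helper names are hypothetical.

```python
def to_corners(box):
    """[x, y, w, h] with (x, y) the top-left corner -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x, y, x + w, y + h

def intersection(a, b):
    """Overlapping area of two boxes (0 if they do not overlap)."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    return iw * ih

def union(a, b):
    """Area covered by either box."""
    return a[2] * a[3] + b[2] * b[3] - intersection(a, b)

def pair_iou(u, v, p, q):
    """Overlap between pairs {u, v} and {p, q} (subjects u, p; objects v, q)."""
    return (intersection(u, p) + intersection(v, q)) / (union(u, p) + union(v, q))

u, v = [0, 0, 10, 10], [20, 0, 10, 10]
p, q = [5, 0, 10, 10], [20, 0, 10, 10]
print(pair_iou(u, v, p, q))  # (50 + 100) / (150 + 100) = 0.6
```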

Every pair's score passes through a sigmoid, giving a value between 0 and 1

Sort the pairs in descending order of score, keep the top K, and run NMS on them

The result is a sparse graph G = (V, E)
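A sketch of the pruning step: rank by score, keep the top K, then greedy NMS using the pair overlap. The threshold, K, and all helper names are hypothetical, and boxes are assumed to be `[x, y, w, h]` with a top-left origin.

```python
def _inter(a, b):
    """Intersection area of two [x, y, w, h] boxes ((x, y) = top-left corner)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return iw * ih

def _union(a, b):
    return a[2] * a[3] + b[2] * b[3] - _inter(a, b)

def pair_iou(boxes, pair_a, pair_b):
    """Subject vs subject and object vs object overlap of two index pairs."""
    (u, v), (p, q) = pair_a, pair_b
    num = _inter(boxes[u], boxes[p]) + _inter(boxes[v], boxes[q])
    den = _union(boxes[u], boxes[p]) + _union(boxes[v], boxes[q])
    return num / den

def prune_pairs(boxes, scores, top_k, iou_thresh):
    """Keep the top_k highest-scoring pairs, then greedily drop any pair that
    overlaps an already-kept pair by more than iou_thresh."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    kept = []
    for pair in ranked:
        if all(pair_iou(boxes, pair, other) <= iou_thresh for other in kept):
            kept.append(pair)
    return kept  # the m surviving pairs become the edges E of G = (V, E)

boxes = [[0, 0, 10, 10], [1, 0, 10, 10], [30, 0, 10, 10]]
scores = {(0, 2): 0.9, (1, 2): 0.85, (2, 0): 0.3, (0, 1): 0.6}
print(prune_pairs(boxes, scores, top_k=3, iou_thresh=0.5))  # [(0, 2), (0, 1)]
```

Here (2, 0) falls outside the top 3, and (1, 2) is suppressed because boxes 0 and 1 nearly coincide, so it almost duplicates (0, 2).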

Each edge carries a visual representation: the feature extracted from the union box of its object pair

(m: number of relations)
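The union box is the smallest box covering both the subject and object regions. A sketch, assuming `[x, y, w, h]` boxes with a top-left origin. The feature pooling from this region (e.g. RoI pooling on the backbone feature map) is not shown.

```python
def union_box(a, b):
    """Smallest [x, y, w, h] box covering both boxes ((x, y) = top-left corner)."""
    x1 = min(a[0], b[0])
    y1 = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return [x1, y1, x2 - x1, y2 - y1]

subject_box = [10, 20, 30, 40]  # e.g. "person"
object_box = [25, 50, 40, 30]   # e.g. "horse"
print(union_box(subject_box, object_box))  # [10, 20, 55, 60]
```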

~~The remaining m object pairs are treated as candidates with relationships E, giving the graph G = (V, E)?~~

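Vanilla GCN

The representation $z_i$ of node i is updated from all its neighbours $z_j$, each linearly transformed by W and weighted by $\alpha_{ij}$:

$$z_i^{(l+1)} = \sigma\Big(z_i^{(l)} + \sum_{j \in \mathcal{N}(i)} \alpha_{ij} W z_j^{(l)}\Big)$$

$\alpha_{ij}$ lies between 0 and 1 and is precomputed from the symmetrically normalized adjacency matrix.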
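A sketch of one vanilla GCN layer, assuming σ = ReLU and $\alpha_{ij} = A_{ij} / \sqrt{d_i d_j}$ (the entries of $D^{-1/2} A D^{-1/2}$), with the node's own representation added as a separate self term. W must be square here so the self term and the message have the same size.

```python
import math

def matvec(weight, x):
    """W x for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def sym_norm_weights(adj):
    """alpha_ij = A_ij / sqrt(d_i * d_j): fixed before training, values in [0, 1]."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(z, adj, weight):
    """z_i' = ReLU(z_i + sum_j alpha_ij * W z_j) over the neighbours j of i."""
    alpha = sym_norm_weights(adj)
    wz = [matvec(weight, z_j) for z_j in z]  # W z_j for every node
    n, dim = len(z), len(weight)
    out = []
    for i in range(n):
        msg = [sum(alpha[i][j] * wz[j][d] for j in range(n)) for d in range(dim)]
        out.append([max(0.0, zi + m) for zi, m in zip(z[i], msg)])
    return out

adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]                    # path graph 0 - 1 - 2
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # W = I keeps the example easy to check by hand
out = gcn_layer(z, adj, identity)
print(out)
```

The weights depend only on the graph structure, so every neighbour of node 1 counts equally no matter what its features are; attention replaces exactly this part.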

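Introducing attention instead, α is learned:

$$u_{ij} = w_h^\top \sigma\big(W_a [z_i^{(l)}, z_j^{(l)}]\big), \qquad \alpha_i = \mathrm{softmax}(u_i)$$

[·, ·]: concatenation. The concatenated node features go through a 2-layer MLP, followed by a softmax

== this gives α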
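A sketch of the attention coefficients: concatenate $z_i$ and $z_j$, apply a 2-layer MLP to get a scalar $u_{ij}$, then softmax over i's neighbours. The weights are random stand-ins, σ is assumed to be ReLU, and the sizes are hypothetical. Every node must have at least one neighbour.

```python
import math
import random

def matvec(weight, x):
    """W x for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def attention_weights(z, neighbours, w_a, w_h):
    """alpha_ij for every node i over its (non-empty) neighbour list.

    u_ij    = w_h . ReLU(W_a [z_i, z_j])   (2-layer MLP on the concatenation)
    alpha_i = softmax_j(u_ij)              (normalised over i's neighbours only)
    """
    alpha = {}
    for i, nbrs in neighbours.items():
        u = []
        for j in nbrs:
            hidden = [max(0.0, h) for h in matvec(w_a, z[i] + z[j])]  # list + list = concat
            u.append(sum(a * b for a, b in zip(w_h, hidden)))
        m = max(u)                       # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in u]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha

rng = random.Random(0)
dim, hidden_dim = 3, 5  # hypothetical sizes
w_a = [[rng.uniform(-1, 1) for _ in range(2 * dim)] for _ in range(hidden_dim)]
w_h = [rng.uniform(-1, 1) for _ in range(hidden_dim)]
z = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.4]]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
alpha = attention_weights(z, neighbours, w_a, w_h)
print(alpha[0])
```

Unlike the fixed normalized-adjacency weights, these α values depend on the node features, so the model can learn which neighbours matter.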

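* object node: updated from the other object nodes (via skip connections) and from the relationship nodes in which it is the subject or the object

* relationship node: updated from its own representation and from its subject and object nodes

With N object nodes and m relationship nodes, plus skip-connect edges between all object nodes, each representation is updated with global context taken into account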
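A heavily simplified sketch of one update over the two node types. Learned attention is replaced by plain means, all weight matrices are dropped (identity), and each object also keeps its own representation as a self term. These are simplifications for illustration, not the paper's exact update. `rel_pairs[r]` is a hypothetical `(subject, object)` index pair.

```python
def mean(vectors, dim):
    """Element-wise mean; a zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def relu_sum(*vectors):
    """ReLU of the element-wise sum of equally sized vectors."""
    return [max(0.0, sum(vals)) for vals in zip(*vectors)]

def agcn_step(obj_feats, rel_feats, rel_pairs):
    """One simplified update; both node types read the *old* features."""
    dim = len(obj_feats[0])
    new_obj = []
    for i, z_i in enumerate(obj_feats):
        others = [z for k, z in enumerate(obj_feats) if k != i]  # skip edges
        as_subj = [rel_feats[r] for r, (s, _) in enumerate(rel_pairs) if s == i]
        as_obj = [rel_feats[r] for r, (_, o) in enumerate(rel_pairs) if o == i]
        new_obj.append(relu_sum(z_i, mean(others, dim),
                                mean(as_subj, dim), mean(as_obj, dim)))
    new_rel = [relu_sum(z_r, obj_feats[s], obj_feats[o])
               for z_r, (s, o) in zip(rel_feats, rel_pairs)]
    return new_obj, new_rel

obj_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel_pairs = [(0, 1)]   # object 0 is the subject, object 1 the object
rel_feats = [[0.5, 0.5]]
new_obj, new_rel = agcn_step(obj_feats, rel_feats, rel_pairs)
print(new_obj, new_rel)
```

Object 2 takes part in no relationship, yet it still receives information through the skip edges to the other objects.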

3. Graph labeling

: iterative refinement process

Loss function

P(V | I)
- same loss as the RPN: binary cross-entropy
P(E | V, I)
- another binary cross-entropy loss on the relation proposals
P(R, O | V, E, I)
- two multi-class cross-entropy losses
- object classification & predicate classification
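A sketch of how the four terms could be combined, assuming equal weights and inputs that are already probabilities (sigmoid or softmax outputs). The function names are hypothetical, and the box-regression terms that Faster R-CNN also uses are not shown.

```python
import math

def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; probs are sigmoid outputs in [0, 1]."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(probs)

def cross_entropy(prob_rows, targets, eps=1e-12):
    """Mean multi-class cross-entropy; each row is a softmax distribution."""
    return -sum(math.log(row[t] + eps) for row, t in zip(prob_rows, targets)) / len(targets)

def scene_graph_loss(obj_prop, rel_prop, obj_cls, pred_cls):
    """Sum of the four terms with equal weights (an assumption).

    Each argument is a (predictions, ground-truth labels) tuple.
    """
    return (bce(*obj_prop)               # P(V | I): objectness of region proposals
            + bce(*rel_prop)             # P(E | V, I): relatedness of object pairs
            + cross_entropy(*obj_cls)    # object classes
            + cross_entropy(*pred_cls))  # predicate classes

loss = scene_graph_loss(
    obj_prop=([0.9, 0.2], [1, 0]),
    rel_prop=([0.8, 0.1, 0.4], [1, 0, 0]),
    obj_cls=([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1]),
    pred_cls=([[0.6, 0.4]], [0]),
)
print(round(loss, 4))
```

Because every term is differentiable, the whole pipeline, RePN included, can be trained end-to-end.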

Evaluating Scene Graph Generation

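SGGen+ is formulated as:

$$\mathrm{SGGen+} = \frac{C(O) + C(P) + C(T)}{N}$$

C(O): number of object nodes correctly localized and recognized
C(P): number of correctly recognized predicates; since a predicate links two nodes, it also depends on the subject and object being correctly localized
C(T): number of correctly recognized triplets
N: number of entries (sum of objects, predicates, and relationships)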
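A toy SGGen+ computation, assuming the predictions have already been matched to ground-truth objects (e.g. by an IoU ≥ 0.5 rule, which is an assumption here). Each ground-truth relation contributes one predicate and one triplet entry, so N = objects + 2 × relations and the score is at most 1. All argument names are hypothetical.

```python
def sggen_plus(gt_labels, gt_relations, pred_labels, localized, pred_predicates):
    """Toy SGGen+ = (C(O) + C(P) + C(T)) / N on pre-matched predictions.

    gt_labels[i]       ground-truth class of object i
    gt_relations[r]    (subject index, predicate, object index)
    pred_labels[i]     predicted class of the box matched to object i
    localized[i]       True if that box counts as correctly localized
    pred_predicates[r] predicted predicate for relation r
    """
    def obj_ok(i):
        return localized[i] and pred_labels[i] == gt_labels[i]

    c_o = sum(obj_ok(i) for i in range(len(gt_labels)))
    c_p = sum(localized[s] and localized[o] and pred_predicates[r] == p
              for r, (s, p, o) in enumerate(gt_relations))
    c_t = sum(obj_ok(s) and obj_ok(o) and pred_predicates[r] == p
              for r, (s, p, o) in enumerate(gt_relations))
    n = len(gt_labels) + 2 * len(gt_relations)  # objects + predicates + triplets
    return (c_o + c_p + c_t) / n

score = sggen_plus(
    gt_labels=["person", "horse", "hat"],
    gt_relations=[(0, "rides", 1), (0, "wears", 2)],
    pred_labels=["person", "dog", "hat"],
    localized=[True, True, False],
    pred_predicates=["rides", "wears"],
)
print(score)  # (1 + 1 + 0) / 7
```

No triplet is fully correct here (the horse is labeled "dog", the hat is not localized), so a purely triplet-based count would give 0. SGGen+ still credits the correctly recognized person and the correct "rides" predicate.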