[Pytorch] Implement GEDGNN

👩‍💻 도비는 공부중/📋 연구과제(2023.7 ~ )

[Pytorch] Implement GEDGNN

오모짱_ 2023. 8. 8. 22:58

차근차근

GED: G1 -> G2 변환하기 위해 필요한 기본 작업의 최소 횟수 결정하기

= A* beam sesarch

= greedy algorithms: bipartite matching

GED problem을 0과 1 사이의 SIMILARITY SCORE 로 변환하는 방식으로 접근.

> model 은 편집 경로를 생성할 능력이 X

=====그럼 어떻게 해결?

1. charge of predicting the GED Value and matching matrix

2. post-processing algorithm based on k-best matching

- k 개의 가능한 노드 매칭 도출

===== 비교는?

기존 GNN 기반 모델들과 비교해 MAE

SOTA searching algoritm: Noah 와 비교

A* beam-search

- uses an extra parameter beamsize to control the maximum number of nodes

beam size를 사용해 a* 알고리즘과 비교해 큐 내의 노드의 최대 개수 제어

Greedy function

- invoke the greedy function to estimate the cost of matching one node in the src graph to target graph

- 일치한 노드 쌍의 추정 비교를 더해 GED 계산 단순화

GNN 기반

- graph 쌍의 embedding 생성

- MLP 입력해 GED 예측 값 얻어

- [limitation] cannot gnerate edit path

===== PROPOSE A NOVEL DEEP LEARNING FRAMEWORK: SOLVE GED problem

1. model: GEDGNN

- predicts a GED value

- predicts a matching matrix by the cross matrix module

= matching matrix의 값들은 0과 1사이의 값, 노드 쌍이 얼마나 일치해야 하는지 정도

2. post-processing algorithm

- GEDGNN이 생성한 matching matrix > 여러 개의 노드 매칭 도출 > edit path 생성

- k-best 이용

DEFINITIONS

★ G = (V, E, L): G1, G2 에 대한 operation

1. edge 추가 / 제거

2. node 추가 / 제거

3. node의 label 변경

★ Graph Edit Path: G1 ➡ G2 로 변환하는 operation (o(1), ... o(k))

k는 GED(G1, G2) 보다 크거나 같으며, 최소 길이가 결국 GED

★ Bipartite Graph

main

┖ Trainer(args)

┖ load_data()

┖ utils.py: load_all_graphs() | train, val, test, graphs 에 대한 정보 load

┖ label에 대한 정보 load

┖ ged_dict

┖ utils.py: load_ged()

┖ged = [(id_1, id_2, ged_value, ged_nc, ged_in, ged_ie, [best_node_mapping])]

ged_value = nc+ in + ed

┖ transfer_data_to_torch()

┖ edge에 대한 index[], feature tensor

┖ 그래프 mapping matrix (n x n)

┖ features = [torch.tensor(x) x를 tensor로 copy]

┖ ~~delta_graph : node 수가 10개 이상인 그래프에 대한 처리로 생략~~

┖ init_graph_pairs()

┖ train / val / test 에 대한 그래프 정보

Generate 465 training graph pairs.
Generate 10 * 100 val graph pairs.
Generate 10 * 100 testing graph pairs.
Generate 10 * 100 testing2 graph pairs.

┖ setup_model()

┖ self.model = NNN(self.args, self.number_of_labels).to(self.device)

┖ trainer.fit()

┖ optimizer = Adam

┖ model.train()

┖ epoch에 대해서

┖ batches = training 그래프에 대해 random shuffle

┖ loss 계산

┖ prediction analysis : (pre_ged - ground-truth) array가 parameter로 주어진다

┖ result append(저장)

┖ trainer.save()

┖ torch.save(): model 상태, tensor를 파일로 저장

┖ trainer.score()

┖ moedl.eval()

┖ mse, mae, number_of_exact_prediciton, number_of_feasible_prediction

┖ rho, tau, pk10, pk20

┖ 10개의 testing graph에 대해

┖ pack_graph_pair( (pair_type, i, j) ) # id_1, id_2

┖ 새로운 tensor dict return

Model

class NNN(torch.nn.Module):

__init__

┖ agrs

┖ number_labels

┖ setup_layers()

❍ setup_layers(): gin

┖ nn1

┖ Linear

┖ ReLU

┖ Linear

┖ BatchNorm1d

┖ nn2

┖ nn3

┖ convolution_1 = GINConv(nn1)

┖ convolution_2

┖ convolution_3

"""

┖ energy value

┖ attention value

"""

mapMatrix, costMatrix 계산:

┖ GedMatrixModule

┖ d: input feature, k: hidden layer size

┖ init_weight_matrix()

┖ torch.nn.Parameter(torch.Tensor()) 학습 가능한 파라미터

┖ torch.nn.init.xavier_uniform: 가중치 행렬 초기화

┖ init_mlp(): k > 2k > k > 1

┖ layers.append( torch.nn.Linear(k, k * 2) )

┖ layers.append( torch.nn.ReLU )

┖ layers.append( torch.nn.Linear(k * 2, k) )

┖ layers.append( torch.nn.ReLU )

┖ layers.append( torch.nn.Linear(k, 1) )

┖ mlp = torch.nn.Sequential(*layers)

┖ forward(embedding_1, embedding_2):

┖ return similar matrix

❍ forward(data = dictionary):

┖ abstrac_features_1 = convolutional_pass(index_1, features_1) # 노드 임베딩 결과

┖ convolutional_pass(edge_index, features):

┖ features = self.convolution_1(features, edge_index)

┖ = torch.nn.fucntional.relu

┖ = torch.nn.fucntional.dropout

┖ = self.convolution_2

┖ = self.convolution_3

┖ cost_matrix

┖ map_matrix 계산

"""

calculate ged using map_matrix

Softmax(dim=1)

bias_value

┖ pooled_features = attention

┖ scores = fully_connected XXX

score = sigmoid

"""

return score, pre_ged.item(), map_matrix