Introduction

Authors' affiliations: New York University and Facebook AI Research

  • Proposed CommNet, which enables cooperating agents to learn to communicate through a continuous channel before taking an action.
  • According to the experiments, agents prefer not to communicate with the others unless necessary.
  • Tested on the Lever Pulling task and the Traffic Junction environment.
  • paper
  • code

Approach

Setting: partial observability and a global, continuous communication channel. The state input is $s=[s_1,s_2,...,s_J]$, and the goal is to learn a mapping $a=\phi(s)$.

Init: $h_j^0$ is obtained by encoding the state $s_j$ with the first layer, and $c_j^0=0$

for every communication step $i$ from $0$ to $T-1$:
  compute $h_j^{i+1}=f^i(h_j^i,c_j^i)$ for every agent $j$
  compute the message $c_j^{i+1}=\frac{1}{J-1}\sum_{j'\neq j}h_{j'}^{i+1}$ for every agent $j$
end for
compute the action distribution $q(h_j^T)$ for every agent $j$
sample an action from the distribution and execute it.
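
A runnable sketch of this forward pass in PyTorch; the module layout, hidden size, and tanh nonlinearity are my assumptions rather than details from the paper:

```python
import torch
import torch.nn as nn

class CommNet(nn.Module):
    """Minimal sketch: J agents, T communication steps (not the paper's exact setup)."""
    def __init__(self, state_dim, hidden_dim, num_actions, steps):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Linear(state_dim, hidden_dim)    # h_j^0 = encode(s_j)
        # One module f^i per communication step, each taking [h, c] as input.
        self.f = nn.ModuleList(
            nn.Linear(2 * hidden_dim, hidden_dim) for _ in range(steps)
        )
        self.decoder = nn.Linear(hidden_dim, num_actions)  # action scores q(h_j^T)

    def forward(self, s):                  # s: (J, state_dim), one row per agent
        J = s.size(0)
        h = torch.tanh(self.encoder(s))    # h^0
        c = torch.zeros_like(h)            # c^0 = 0
        for i in range(self.steps):
            h = torch.tanh(self.f[i](torch.cat([h, c], dim=-1)))  # h^{i+1} = f^i(h^i, c^i)
            # c_j^{i+1} = mean over the *other* agents: (sum of all - own) / (J - 1)
            c = (h.sum(dim=0, keepdim=True) - h) / max(J - 1, 1)
        return torch.log_softmax(self.decoder(h), dim=-1)  # log action distribution per agent
```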

Update the parameters with a policy gradient (REINFORCE) using a baseline to compute $\Delta\theta$.
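
A hedged sketch of that update, building on the `CommNet` sketch above; `baseline`, `states`, and the reward `r` are hypothetical placeholders, and the paper's exact baseline formulation may differ:

```python
# REINFORCE with a baseline (illustrative; `baseline`, `states`, and `r` are
# hypothetical placeholders for a value network, observations, and the reward).
log_probs = model(states)                           # (J, num_actions)
dist = torch.distributions.Categorical(logits=log_probs)
actions = dist.sample()                             # one action per agent
# ... execute `actions` in the environment and observe the episode reward r ...
adv = r - baseline(states).squeeze(-1)              # advantage = reward - baseline
policy_loss = -(dist.log_prob(actions) * adv.detach()).mean()
baseline_loss = adv.pow(2).mean()                   # regress the baseline toward r
(policy_loss + baseline_loss).backward()            # then take an optimizer step
```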

Extensions

Local Connectivity

We can restrict the global channel to local communication by replacing the computation of $c_j$ with an average over only the agents within communication range: $c_j^{i+1}=\frac{1}{|N(j)|}\sum_{j'\in N(j)}h_{j'}^{i+1}$, where $N(j)$ is the set of neighbors of agent $j$.
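
Continuing the sketch above, only the message computation changes; the adjacency-mask formulation here is my own illustration:

```python
import torch

# Local connectivity: average only over the neighbor set N(j) instead of all
# other agents. `adj` is a (J, J) float tensor with adj[j, k] = 1 iff agent k
# is currently within communication range of agent j.
def local_messages(h, adj):
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # |N(j)|, avoiding division by zero
    return (adj @ h) / deg                           # c_j = (1/|N(j)|) * sum_{j' in N(j)} h_{j'}
```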

Skip Connections

Because $h_j^0$ contains information about the agent's own observation, in some cases it helps to feed it into every communication step as well: $h_j^{i+1}=f^i(h_j^i,c_j^i,h_j^0)$.
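
In terms of the earlier sketch, the change is just a wider input to each $f^i$ (a sketch, keeping the encoder output around as `h0`):

```python
# Skip connection: also feed the first-layer embedding h^0 into every step,
# i.e. h^{i+1} = f^i(h^i, c^i, h^0). Each f^i must now map 3*hidden_dim -> hidden_dim.
h0 = h.clone()                        # keep the encoder output
for i in range(self.steps):
    h = torch.tanh(self.f[i](torch.cat([h, c, h0], dim=-1)))
    c = (h.sum(dim=0, keepdim=True) - h) / max(J - 1, 1)
```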

Temporal Recurrence

We can replace the MLP used above with an RNN or LSTM so that agents can remember important information across time steps.
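
A minimal sketch of the recurrent variant, assuming a single LSTM cell shared across time steps (the sizes and the random stand-in observations are illustrative):

```python
import torch
import torch.nn as nn

# Temporal recurrence: one shared recurrent cell replaces the per-step MLPs,
# so the hidden state (and hence memory) persists across time steps.
J, T, state_dim, hidden_dim = 4, 10, 32, 64         # illustrative sizes
cell = nn.LSTMCell(input_size=state_dim + hidden_dim, hidden_size=hidden_dim)

h = torch.zeros(J, hidden_dim)                      # hidden state, carried over time
m = torch.zeros(J, hidden_dim)                      # LSTM memory cell
c = torch.zeros(J, hidden_dim)                      # communication vector, c^0 = 0
for t in range(T):
    s_t = torch.randn(J, state_dim)                 # stand-in for the real observation at time t
    h, m = cell(torch.cat([s_t, c], dim=-1), (h, m))
    c = (h.sum(dim=0, keepdim=True) - h) / max(J - 1, 1)
```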

Experiments

I won't go through every experiment listed in the paper, but here are some of the important results.

  • Agents prefer not to communicate with the others unless necessary.
  • CommNet with an LSTM achieves better performance than the other models.
  • A continuous channel performs better than a discrete channel.

Notes

  • Agents prefer not to communicate unless necessary.
  • A single network with a communication channel at each layer is not easy to scale up.