Author of the paper: Anonymous, from ICLR 2019 (under review).

  • Motivation: sending different messages to different agents enables a more flexible strategy; actively selecting the message recipients can reduce the cost of communication.
  • Uses soft attention in the communication architecture so that agents can send agent- and goal-specific messages, adapt to variable team sizes, and stay interpretable as to which message goes to whom.
  • Uses multi-stage communication to exchange information.
  • Tested in 2D and even 3D environments.
  • paper


Every agent receives its local observation and the message aggregated for it. All agents share one policy network built on a GRU, and this network outputs both the message to send and the action to take.
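To make the data flow concrete, here is a minimal sketch of one time step of a shared recurrent policy, written with the Python standard library only. It is an illustration under my own assumptions, not the paper's implementation: the class name `SharedGRUPolicy`, the layer sizes, the bias-free gates, the linear action and message heads, and the greedy action choice are all hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class SharedGRUPolicy:
    """One set of weights reused by every agent (sizes and heads are assumptions)."""

    def __init__(self, obs_dim, msg_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # Gate input is [observation, aggregated message, hidden state].
        in_dim = obs_dim + msg_dim + hidden_dim

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_z = init(hidden_dim, in_dim)       # update gate
        self.W_r = init(hidden_dim, in_dim)       # reset gate
        self.W_h = init(hidden_dim, in_dim)       # candidate state
        self.W_act = init(n_actions, hidden_dim)  # action head
        self.W_msg = init(msg_dim, hidden_dim)    # message head

    def step(self, obs, agg_msg, h):
        """Consume one agent's inputs; return (action, outgoing message, new hidden)."""
        x = obs + agg_msg  # list concatenation
        z = [sigmoid(v) for v in matvec(self.W_z, x + h)]
        r = [sigmoid(v) for v in matvec(self.W_r, x + h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.W_h, x + reset_h)]
        h_new = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        logits = matvec(self.W_act, h_new)
        action = max(range(len(logits)), key=logits.__getitem__)  # greedy, for illustration
        message = matvec(self.W_msg, h_new)
        return action, message, h_new


# Two agents share the same policy object but keep separate hidden states.
policy = SharedGRUPolicy(obs_dim=3, msg_dim=2, hidden_dim=4, n_actions=5)
hidden = [[0.0] * 4 for _ in range(2)]
observations = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
aggregated = [[0.0, 0.0], [0.0, 0.0]]  # nothing received yet at t = 0
outputs = []
for i in range(2):
    action, message, hidden[i] = policy.step(observations[i], aggregated[i], hidden[i])
    outputs.append((action, message))
```

Sharing the weights while keeping a separate hidden state per agent is what lets the same network handle a variable number of agents.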


The message output by the policy network consists of two parts: $m_i^t=[k_i^t,v_i^t]$. Here $k_i^t$ is the signature that encodes the intended recipients, and $v_i^t$ is the value to be transmitted. Each receiver generates a query and applies it to all the messages it receives.
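The receiver side can be sketched as soft attention over the incoming messages. The snippet below assumes scaled dot-product attention: the receiver's query is matched against each sender's signature, the scores are normalized with a softmax, and the values are averaged with those weights. The function name `aggregate_for_receiver` and the toy vectors are mine, and the notes above do not spell out the exact scoring function, so treat this as an illustration rather than the paper's definition.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_for_receiver(query, keys, values):
    """Return (attention weights, aggregated message) for one receiver.

    query  : the receiver's query vector
    keys   : one signature k_i per sender
    values : one value v_i per sender
    """
    d_k = len(query)
    # Assumed scoring rule: dot product scaled by sqrt(d_k).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    dim_v = len(values[0])
    aggregated = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(dim_v)
    ]
    return weights, aggregated


# Toy example with three senders; the query lines up with senders 0 and 2.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [4.0, 0.0]
weights, c_j = aggregate_for_receiver(query, keys, values)
```

In this toy case sender 1 receives the smallest weight because its signature is orthogonal to the query. Inspecting `weights` is also where the interpretability comes from: it shows how much each receiver listens to each sender.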

The multi-stage communication was not clear to me; I am waiting for the paper to be updated.


What we can take away

  • Choosing how messages are routed is a good idea, but we should focus on the listener.
  • Multi-stage communication can be useful in complex scenes for communicating action intentions before making the final decision.
  • Using a GRU to process the messages and the local observations is useful for POMDPs.


  • The multi-stage communication is not clearly explained.
  • It still transmits all the information to all other agents, which does not seem to match the paper's intuition.