Introduction

Authors' affiliations: OpenAI and UC Berkeley

  • Investigates whether and how a grounded, compositional language can emerge as a means of achieving goals in multi-agent reinforcement learning (MARL)
  • Language: represented as streams of abstract discrete symbols uttered by agents over time, with a coherent structure (a defined vocabulary and syntax)
  • Grounded: words are tied to something in the agents' environment
  • Compositional: words can be assembled into sentences
  • paper
  • blog

Approach

POMDP environment: each agent can observe all agents and landmarks, but cannot observe the other agents' internal goals or memory banks.

  1. $N$ agents.
  2. $M$ landmarks.
  3. $a$: the action space, which includes "look at" and "go to" actions.
  4. $g$: every agent has an internal goal $g$ that specifies which agent should perform which action on which landmark.
  5. $x_{ij}$ denotes the physical state of entity $j$ (an agent or a landmark) in agent $i$'s reference frame.
  6. $c_i$ is agent $i$'s communication message.
  7. $m_i$ is agent $i$'s memory bank associated with $c$.
  8. $obs_i = [x_{i,1}, \dots, x_{i,N+M}, c_1, \dots, c_N, m_i, g_i]$
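The observation in item 8 is a concatenation of per-entity states, all agents' messages, and the agent's own private memory and goal. A minimal sketch of that layout follows; the `AgentView` container, the field names, and the vector sizes in the usage note are hypothetical and chosen only for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentView:
    """What agent i has access to at one time step (hypothetical container)."""
    rel_states: List[List[float]]  # x_{i,1} .. x_{i,N+M}, in agent i's frame
    messages: List[List[float]]    # c_1 .. c_N, one vector per agent
    memory: List[float]            # m_i, private to agent i
    goal: List[float]              # g_i, private to agent i


def build_observation(view: AgentView) -> List[float]:
    """Flatten the view into obs_i = [x_i1..x_i(N+M), c_1..c_N, m_i, g_i]."""
    obs: List[float] = []
    for x in view.rel_states:
        obs.extend(x)
    for c in view.messages:
        obs.extend(c)
    obs.extend(view.memory)
    obs.extend(view.goal)
    return obs
```

For example, with $N=2$ agents, $M=1$ landmark, 2-D states, a 3-symbol one-hot message, a 4-D memory and a 3-D goal, the flattened observation has $3 \cdot 2 + 2 \cdot 3 + 4 + 3 = 19$ entries. Note that the other agents' $m_j$ and $g_j$ never appear in it, which is what makes the environment partially observable.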

Discrete communication

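$c \sim G(\log p)$, where $G$ is the Gumbel-softmax distribution and $p$ are the symbol probabilities produced by the policy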
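To make the sampling step concrete, here is a minimal sketch of drawing one relaxed sample from a Gumbel-softmax distribution over symbols, assuming that is what $G$ denotes. The function names and the `temperature` parameter are illustrative choices; this is plain Python with no automatic differentiation, so it shows only the forward sampling, not the gradient path that makes the relaxation useful in training.

```python
import math
import random
from typing import List


def gumbel_softmax_sample(log_p: List[float], temperature: float = 1.0,
                          rng=random) -> List[float]:
    """Draw one relaxed one-hot sample from symbol log-probabilities log_p."""
    noisy = []
    for lp in log_p:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((lp + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(soft: List[float]) -> List[float]:
    """Discretise a relaxed sample by taking its argmax."""
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]
```

Lower temperatures push the relaxed sample closer to a one-hot vector, while higher temperatures make it smoother. Taking the argmax of the perturbed logits is the Gumbel-max trick, which yields an exact categorical sample from $p$.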

Policy architecture

  • The pooling operation lets the policy handle a varying number of agents and messages
  • Recurrent memory captures the message stream over time
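The reason pooling copes with a varying number of inputs is that it collapses any number of per-entity feature vectors into a single fixed-size vector, independent of their order. The sketch below uses mean pooling as a stand-in; the specific pooling function is an assumption made here for simplicity and may differ from the one used in the paper.

```python
from typing import List


def mean_pool(features: List[List[float]]) -> List[float]:
    """Average a variable-length list of equal-sized feature vectors."""
    if not features:
        raise ValueError("need at least one feature vector to pool")
    dim = len(features[0])
    count = len(features)
    # Output has length `dim` no matter how many vectors came in.
    return [sum(f[d] for f in features) / count for d in range(dim)]
```

Whether two or twenty feature vectors are passed in, the result has the same dimensionality, so the downstream layers need no change when agents or messages are added or removed.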

Reward

  • external reward from the environment:
  • auxiliary reward that makes the prediction better:
  • make already popular words more likely to survive:
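One way to picture the "popular words survive" idea is a rich-get-richer bonus: the log-probability of a symbol under a Chinese-restaurant-process-style prior built from past usage counts. The sketch below is an illustration of that general technique under my own assumptions, not the paper's exact reward; the function name and the concentration parameter `alpha` are hypothetical.

```python
import math
from typing import Dict


def popularity_bonus(symbol: str, counts: Dict[str, int],
                     alpha: float = 1.0) -> float:
    """Log-probability of `symbol` under a CRP-style prior over past usage.

    Seen symbols get probability n_k / (alpha + n); a never-used symbol
    gets alpha / (alpha + n), where n is the total number of utterances.
    """
    total = sum(counts.values())
    n_k = counts.get(symbol, 0)
    prob = (n_k if n_k > 0 else alpha) / (alpha + total)
    return math.log(prob)
```

With counts `{"a": 8, "b": 2}` and `alpha = 1`, symbol `a` receives the largest bonus, `b` a smaller one, and a brand-new symbol the smallest, so frequently used symbols are reinforced and rarely used ones tend to fall out of use.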

Experiment

  • The learned discrete symbols are mapped to natural-language words so the emergent language can be interpreted

  • The vocabulary size adapts to what the environment requires

Notes

  • Limitation: landmarks are used to denote targets, but the agents do not learn to refer to abstract target positions (e.g. "south-east")