A simple module that consistently outperforms self-attention and the full Transformer model on standard NMT datasets, achieving state-of-the-art (SoTA) performance.