
Transformers and Attention Mechanism

In the Transformer architecture figure, the Encoder is on the left and the Decoder is on the right.

Both the Encoder and the Decoder are built from modules that can be stacked on top of each other multiple times; the figure denotes this repetition by Nx.
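To make the stacking idea concrete, here is a minimal sketch in plain Python. It is not a real Transformer layer: `make_toy_layer` and `stack` are hypothetical names, and the "layer" only rescales its input so that the effect of applying it N times is easy to follow. The value N = 6 is chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]
Sequence = List[Vector]
Layer = Callable[[Sequence], Sequence]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for one module: rescales every value, keeps the shape."""
    def layer(sequence: Sequence) -> Sequence:
        return [[scale * x for x in vector] for vector in sequence]
    return layer


def stack(layers: List[Layer]) -> Layer:
    """Chain layers so the output of one becomes the input of the next."""
    def stacked(sequence: Sequence) -> Sequence:
        for layer in layers:
            sequence = layer(sequence)
        return sequence
    return stacked


N = 6  # number of stacked modules (the "Nx" in the figure); illustrative value
encoder = stack([make_toy_layer(0.5) for _ in range(N)])

# Two token vectors go in, two token vectors of the same size come out.
out = encoder([[64.0, 128.0], [32.0, 0.0]])
print(out)
```

The point of the sketch is that every module takes a sequence of vectors and returns a sequence of the same shape, which is what allows the modules to be stacked an arbitrary number of times.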

The modules consist mainly of Multi-Head Attention and Feed Forward layers. The inputs and outputs (target sentences) are first embedded into an n-dimensional space, since the model cannot operate on strings directly.
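The embedding step can be sketched as a lookup table that maps each token to a vector. The example below is a simplified illustration under several assumptions: tokens are obtained by splitting on whitespace, the table is filled with random values (in a trained model these values are learned), and the helper names `build_vocab`, `init_embeddings`, and `embed` are hypothetical.

```python
import random
from typing import Dict, List


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def init_embeddings(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Create one random n-dimensional vector per token id (assumption: learned in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(sentence: str, vocab: Dict[str, int], table: List[List[float]]) -> List[List[float]]:
    """Replace each token of the sentence with its vector from the table."""
    return [table[vocab[token]] for token in sentence.lower().split()]


sentences = ["the cat sat", "the dog sat"]
vocab = build_vocab(sentences)
table = init_embeddings(len(vocab), dim=4)  # n = 4 here, chosen for illustration

embedded = embed("the dog sat", vocab, table)
print(len(embedded), "tokens, each a vector of length", len(embedded[0]))
```

After this step a sentence is a list of numeric vectors, one per token, which is the form the attention and feed-forward layers operate on.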
