# Era 3 · Attention Era (2017-2019)
Beginning with the Transformer paper, self-attention rewrote the entire NLP, CV, and multimodal landscape.
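The core mechanism behind this era can be sketched in a few lines. The following is a minimal scaled dot-product self-attention toy in NumPy; all names, shapes, and weight matrices are illustrative assumptions, not taken from any paper's reference code.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (toy sketch).

    X:  (seq_len, d_model) input token embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices (assumed shapes)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Row-wise softmax: each token distributes attention over all tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one output row per input token
```

Real Transformers add multiple heads, masking, and learned projections trained end-to-end; this sketch only shows the attention arithmetic itself.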
## Collected Notes
- AlphaZero — Erasing Human Go Knowledge via Pure Self-Play RL · 2017 · Silver et al.
- Capsule Networks — Routing Parts into Wholes · 2017 · Sabour, Frosst & Hinton
- CycleGAN — Unlocking Unpaired Image Translation via Cycle Consistency Loss · 2017 · Zhu et al.
- GCN — Founding Graph Neural Networks via Semi-supervised Node Classification · 2017 · Kipf & Welling
- Mask R-CNN — Unifying Instance Segmentation by Adding One Branch to Faster R-CNN · 2017 · He et al.
- MobileNet — Bringing Deep Learning to Mobile Devices via Depthwise Separable Convolutions · 2017 · Howard et al.
- PointNet — Permutation-Invariant Deep Networks for Unordered Point Clouds · 2017 · Qi et al.
- PPO — How Clipping Finally Made Policy Gradient Tunable and Usable · 2017 · Schulman et al.
- Transformer — Burying Recurrence with Attention · 2017 · Vaswani et al.
- WGAN — Curing GAN Training Instability with Wasserstein Distance · 2017 · Arjovsky et al.
- BERT — Ushering NLP into the Pretraining Era via Masked Language Modeling · 2018 · Devlin et al.
- ELMo — Bringing Contextual Embeddings Mainstream via Bidirectional LSTM Language Models · 2018 · Peters et al.
- Graph Attention Networks (GAT) — Attention as a Learnable Graph Edge · 2018 · Veličković et al.
- GPT-1 — Igniting the Pre-training Revolution with Decoder-only Transformer · 2018 · Radford et al.
- Group Normalization — Freeing Normalization from Batch Size · 2018 · Wu & He
- PGD Adversarial Training — Robustness as Min-Max Optimization · 2018 · Madry et al.
- SE-Net — Channel Attention Crowning the ILSVRC 2017 Champion · 2018 · Hu et al.
- StyleGAN — Pushing GAN to Photorealistic Face Generation via Style Modulation · 2018 · Karras et al.
- ULMFiT — Making Language Model Fine-tuning Work · 2018 · Howard & Ruder
- EfficientNet — Redefining CNN Efficiency via Compound Scaling · 2019 · Tan & Le
- GPT-2 — Announcing the LLM Era with Scale and Zero-shot · 2019 · Radford et al.
- RoBERTa — The Engineering Audit That Re-trained BERT Properly · 2019 · Liu et al.
- Sentence-BERT — Turning BERT into a Sentence Embedding Engine · 2019 · Reimers & Gurevych
- T5 — Unifying All NLP Tasks as Text-to-Text · 2019 · Raffel et al.