# Era 3 · Attention Era (2017-2019)
Beginning with the Transformer paper, self-attention rewrote the entire NLP, CV, and multimodal landscape.
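The core mechanism behind this era can be sketched in a few lines. The following is a minimal scaled dot-product self-attention toy in NumPy; all names, shapes, and weight matrices are illustrative assumptions, not taken from any paper's reference code.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (toy sketch).

    X:  (seq_len, d_model) input token embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices (assumed shapes)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Row-wise softmax: each token distributes attention over all tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one output row per input token
```

Real Transformers add multiple heads, masking, and learned projections trained end-to-end; this sketch only shows the attention arithmetic itself.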
## Collected Notes
- AlphaZero — Erasing Human Go Knowledge via Pure Self-Play RL · 2017 · Silver et al.
- Capsule Networks — Routing Parts into Wholes · 2017 · Sabour, Frosst & Hinton
- CycleGAN — Unlocking Unpaired Image Translation via Cycle Consistency Loss · 2017 · Zhu et al.
- GCN — Founding Graph Neural Networks via Semi-supervised Node Classification · 2017 · Kipf & Welling
- Mask R-CNN — Unifying Instance Segmentation by Adding One Branch to Faster R-CNN · 2017 · He et al.
- MobileNet — Bringing Deep Learning to Mobile Devices via Depthwise Separable Convolutions · 2017 · Howard et al.
- PointNet — Permutation-Invariant Deep Networks for Unordered Point Clouds · 2017 · Qi et al.
- PPO — How Clipping Finally Made Policy Gradient Tunable and Usable · 2017 · Schulman et al.
- Transformer — Burying Recurrence with Attention · 2017 · Vaswani et al.
- WGAN — Curing GAN Training Instability with Wasserstein Distance · 2017 · Arjovsky et al.
- BERT — Ushering NLP into the Pretraining Era via Masked Language Modeling · 2018 · Devlin et al.
- ELMo — Bringing Contextual Embeddings Mainstream via Bidirectional LSTM Language Models · 2018 · Peters et al.
- Graph Attention Networks (GAT) — Attention as a Learnable Graph Edge · 2018 · Veličković et al.
- GPT-1 — Igniting the Pre-training Revolution with Decoder-only Transformer · 2018 · Radford et al.
- Group Normalization — Freeing Normalization from Batch Size · 2018 · Wu & He
- PGD Adversarial Training — Robustness as Min-Max Optimization · 2018 · Madry et al.
- SE-Net — Channel Attention Crowning the ILSVRC 2017 Champion · 2018 · Hu et al.
- StyleGAN — Pushing GAN to Photorealistic Face Generation via Style Modulation · 2018 · Karras et al.
- ULMFiT — Making Language Model Fine-tuning Work · 2018 · Howard & Ruder
- EfficientNet — Redefining CNN Efficiency via Compound Scaling · 2019 · Tan & Le
- GPT-2 — Announcing the LLM Era with Scale and Zero-shot · 2019 · Radford et al.
- RoBERTa — The Engineering Audit That Re-trained BERT Properly · 2019 · Liu et al.
- Sentence-BERT — Turning BERT into a Sentence Embedding Engine · 2019 · Reimers & Gurevych
- T5 — Unifying All NLP Tasks as Text-to-Text · 2019 · Raffel et al.