Awesome AI Papers

Foundation Models (2020-2022)

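From GPT-3 igniting faith in scaling to ChatGPT reshaping what an AI product looks like: the three pivotal years in which large models went from a research paradigm to a commercial one.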

Notes in this collection

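GPT-3: once a language model reaches 175B parameters, prompting becomes a new programming paradigm · 2020 · Brown et al. (OpenAI)
DDPM: recasting generative modeling as a denoising process · 2020 · Ho, Jain, Abbeel (UC Berkeley)
ViT: an image is worth 16×16 words, and the Transformer enters vision · 2020 · Dosovitskiy et al. (Google Brain)
CLIP: learning to match images with text from 400 million image-text pairs · 2021 · Radford et al. (OpenAI)
AlphaFold 2: cracking the 50-year-old protein folding problem with attention · 2021 · Jumper et al. (DeepMind)
Stable Diffusion: democratizing text-to-image generation with latent diffusion · 2022 · Rombach et al. (CompVis)
InstructGPT: RLHF turns GPT-3 into the precursor of ChatGPT · 2022 · Ouyang et al. (OpenAI)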
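Chain-of-Thought Prompting: few-shot exemplars with worked-out intermediate steps unlock reasoning (the zero-shot "Let's think step by step" trigger comes from the follow-up work by Kojima et al.) · 2022 · Wei et al. (Google Brain)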
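Chinchilla: rewriting the scaling laws with compute-optimal scaling · 2022 · Hoffmann et al. (DeepMind)
Flamingo: frozen LM + Perceiver Resampler + gated cross-attention, the starting point for multimodal few-shot learning · 2022 · Alayrac et al. (DeepMind)
LoRA: low-rank decomposition puts fine-tuning of 175B-scale models within everyone's reach · 2021 · Hu et al. (Microsoft Research)
MAE: masking 75% of patches gives ViT its own BERT moment · 2022 · He et al. (FAIR)
NeRF: an 8-layer MLP compresses a scene into a differentiable 5D radiance field · 2020 · Mildenhall et al. (UC Berkeley)
Scaling Laws: experiments spanning 7 orders of magnitude turn LLMs into predictable engineering · 2020 · Kaplan et al. (OpenAI)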