Commit 8ad50a3

update gpt-2 paper link

1 parent 1e48c13 commit 8ad50a3

File tree

1 file changed: +2 -2 lines changed


ch04/01_main-chapter-code/ch04.ipynb

Lines changed: 2 additions & 2 deletions
@@ -106,7 +106,7 @@
 "source": [
 "- In previous chapters, we used small embedding dimensions for token inputs and outputs for ease of illustration, ensuring they fit on a single page\n",
 "- In this chapter, we consider embedding and model sizes akin to a small GPT-2 model\n",
-"- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n",
+"- We'll specifically code the architecture of the smallest GPT-2 model (124 million parameters), as outlined in Radford et al.'s [Language Models are Unsupervised Multitask Learners](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=dOad5HoAAAAJ&citation_for_view=dOad5HoAAAAJ:YsMSGLbcyi4C) (note that the initial report lists it as 117M parameters, but this was later corrected in the model weight repository)\n",
 "- Chapter 6 will show how to load pretrained weights into our implementation, which will be compatible with model sizes of 345, 762, and 1542 million parameters"
 ]
},
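
For reference, the 124M configuration this cell refers to can be summarized as a small Python dictionary. The sketch below is illustrative, not code taken from the notebook: only emb_dim = 768 appears in this diff, the dictionary name GPT_CONFIG_124M is an assumption, and the remaining values are the standard GPT-2 small hyperparameters (50,257-token vocabulary, 1,024-token context, 12 layers, 12 heads).

```python
# Minimal sketch of a GPT-2 small (124M) configuration dictionary.
# The name GPT_CONFIG_124M and the drop_rate/qkv_bias entries are
# assumptions for illustration; only emb_dim = 768 comes from this diff.
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size used by GPT-2
    "context_length": 1024,  # maximum number of input tokens
    "emb_dim": 768,          # embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout rate (assumed value)
    "qkv_bias": False,       # bias in query/key/value projections (assumed)
}
```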
@@ -1271,7 +1271,7 @@
 "id": "309a3be4-c20a-4657-b4e0-77c97510b47c",
 "metadata": {},
 "source": [
-"- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://www.semanticscholar.org/paper/Language-Models-are-Unsupervised-Multitask-Learners-Radford-Wu/9405cc0d6169988371b2755e573cc28650d14dfe), as well.\n",
+"- Exercise: you can try the following other configurations, which are referenced in the [GPT-2 paper](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=dOad5HoAAAAJ&citation_for_view=dOad5HoAAAAJ:YsMSGLbcyi4C), as well.\n",
 "\n",
 " - **GPT2-small** (the 124M configuration we already implemented):\n",
 " - \"emb_dim\" = 768\n",

0 commit comments