fix several typo/formulations in explanatory text

pull/15/head
Nimeas 2017-10-12 20:52:59 +02:00
parent b240e20413
commit 3c910dde8b
1 changed file with 3 additions and 3 deletions


@@ -199,7 +199,7 @@
"source": [
"We can choose the dimensionality (the number of nodes) of the hidden layer. The more nodes we put into the hidden layer the more complex functions we will be able fit. But higher dimensionality comes at a cost. First, more computation is required to make predictions and learn the network parameters. A bigger number of parameters also means we become more prone to overfitting our data. \n",
"\n",
"How to choose the size of the hidden layer? While there are some general guidelines and recommendations, it always depends on your specific problem and is more of an art than a science. We will play with the number of nodes in the hidden later later on and see how it affects our output."
"How to choose the size of the hidden layer? While there are some general guidelines and recommendations, it always depends on your specific problem and is more of an art than a science. We will play with the number of nodes in the hidden layer later on and see how it affects our output."
]
},
{
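For reference, the cost in parameters that this passage mentions can be made concrete with a short sketch. The helper below is hypothetical and assumes the 2-dimensional input and output whose weight shapes are quoted in the next hunk.

    # Minimal sketch: parameter count of a one-hidden-layer network with
    # 2-dimensional input and output (the sizes quoted in the next hunk).
    # The helper name and default sizes are illustrative assumptions.
    def count_parameters(hidden_dim, input_dim=2, output_dim=2):
        w1 = input_dim * hidden_dim    # W1 is input_dim x hidden_dim
        b1 = hidden_dim                # b1 has one entry per hidden node
        w2 = hidden_dim * output_dim   # W2 is hidden_dim x output_dim
        b2 = output_dim                # b2 has one entry per output node
        return w1 + b1 + w2 + b2

    print(count_parameters(3))    # 17 parameters
    print(count_parameters(500))  # 2502 parameters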
@@ -243,7 +243,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"$z_i$ is the input of layer $i$ and $a_i$ is the output of layer $i$ after applying the activation function. $W_1, b_1, W_2, b_2$ are parameters of our network, which we need to learn from our training data. You can think of them as matrices transforming data between layers of the network. Looking at the matrix multiplications above we can figure out the dimensionality of these matrices. If we use 500 nodes for our hidden layer then $W_1 \\in \\mathbb{R}^{2\\times500}$, $b_1 \\in \\mathbb{R}^{500}$, $W_2 \\in \\mathbb{R}^{500\\times2}$, $b_2 \\in \\mathbb{R}^{2}$. Now you see why we have more parameters if we increase the size of the hidden layer."
"$z_i$ is the weighted sum of inputs of layer $i$ (bias included) and $a_i$ is the output of layer $i$ after applying the activation function. $W_1, b_1, W_2, b_2$ are parameters of our network, which we need to learn from our training data. You can think of them as matrices transforming data between layers of the network. Looking at the matrix multiplications above we can figure out the dimensionality of these matrices. If we use 500 nodes for our hidden layer then $W_1 \\in \\mathbb{R}^{2\\times500}$, $b_1 \\in \\mathbb{R}^{500}$, $W_2 \\in \\mathbb{R}^{500\\times2}$, $b_2 \\in \\mathbb{R}^{2}$. Now you see why we have more parameters if we increase the size of the hidden layer."
]
},
{
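The matrix multiplications this passage refers to are not shown in this excerpt. A minimal numpy sketch consistent with the shapes given above could look like the following, where the tanh hidden activation and softmax output are assumptions rather than details taken from the notebook.

    import numpy as np

    def forward(X, W1, b1, W2, b2):
        # X has shape (num_examples, 2); W1: (2, hidden_dim), b1: (hidden_dim,)
        z1 = X.dot(W1) + b1    # weighted sum of inputs to the hidden layer (bias included)
        a1 = np.tanh(z1)       # hidden-layer output after the activation function (tanh assumed)
        z2 = a1.dot(W2) + b2   # W2: (hidden_dim, 2), b2: (2,)
        exp_scores = np.exp(z2)
        a2 = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # softmax output (assumed)
        return a2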
@@ -555,7 +555,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that while a hidden layer of low dimensionality nicely capture the general trend of our data, but higher dimensionalities are prone to overfitting. They are \"memorizing\" the data as opposed to fitting the general shape. If we were to evaluate our model on a separate test set (and you should!) the model with a smaller hidden layer size would likely perform better because it generalizes better. We could counteract overfitting with stronger regularization, but picking the a correct size for hidden layer is a much more \"economical\" solution."
"We can see that while a hidden layer of low dimensionality nicely capture the general trend of our data, but higher dimensionalities are prone to overfitting. They are \"memorizing\" the data as opposed to fitting the general shape. If we were to evaluate our model on a separate test set (and you should!) the model with a smaller hidden layer size would likely perform better because it generalizes better. We could counteract overfitting with stronger regularization, but picking the correct size for hidden layer is a much more \"economical\" solution."
]
},
{
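The stronger regularization mentioned above usually means an L2 penalty on the weight matrices. A minimal sketch, with reg_lambda as a hypothetical hyperparameter and an already-computed data loss, might be:

    import numpy as np

    def regularized_loss(data_loss, W1, W2, reg_lambda=0.01):
        # Penalize large weights (biases are typically left out of the penalty).
        l2_penalty = reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
        return data_loss + l2_penalty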