# Build a DeepLearning algorithm from scratch in F# 03 – Building A Deep Neural Network

Welcome to the third post of my “Build a DeepLearning algorithm from scratch in F#” series.

In the first article, we described how to implement logistic regression with a single neuron. In the second article, we saw how to implement a neural network made of a single hidden layer.

In today’s article, we will see how to implement and use an $$L$$-layer deep neural network.

## Used Notation

• Superscript $$[l]$$ denotes a quantity associated with the $$l^{th}$$ layer.
• Example: $$a^{[L]}$$ is the $$L^{th}$$ layer activation. $$W^{[L]}$$ and $$b^{[L]}$$ are the $$L^{th}$$ layer parameters.
• Superscript $$(i)$$ denotes a quantity associated with the $$i^{th}$$ example.
• Example: $$x^{(i)}$$ is the $$i^{th}$$ training example.
• Subscript $$i$$ denotes the $$i^{th}$$ entry of a vector.
• Example: $$a^{[l]}_i$$ denotes the $$i^{th}$$ entry of the $$l^{th}$$ layer’s activations.

Let’s get started!

## Algorithm description

• 1 – Initialize the parameters for an $$L$$-layer neural network.
• 2 – Implement the forward propagation module (shown in purple in the figure below).
• 2.1 – First, the LINEAR part of a layer’s forward propagation step (resulting in $$Z^{[l]}$$).
• 2.2 – Then apply the different kinds of ACTIVATION functions (relu/sigmoid).
• 2.3 – Combine the previous two steps into a new [LINEAR->ACTIVATION] forward function.
• 2.4 – Stack the [LINEAR->RELU] forward function L-1 times (for layers 1 through L-1) and add a [LINEAR->SIGMOID] at the end (for the final layer $$L$$).
• 3 – Compute the loss.
• 4 – Implement the backward propagation module (denoted in red in the figure below).
• 4.1 First, LINEAR part of a layer’s backward propagation step.
• 4.2 Then apply the different kinds of ACTIVATION functions (relu_backward/sigmoid_backward).
• 4.3 Combine the previous two steps into a new [LINEAR->ACTIVATION] backward function.
• 4.4 Stack [LINEAR->RELU] backward L-1 times and add [LINEAR->SIGMOID] backward in a new L_model_backward function.
• 5 – Finally, update the parameters.

Figure 1
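Taken together, these steps form the training loop that we will assemble in section 6. As a rough sketch only, using the function names that the rest of this post defines step by step:

```fsharp
// Sketch: the functions referenced here are implemented in the sections below,
// and assembled for real in section 6.
let train X Y layers_dims learning_rate num_iterations =
    let mutable parameters = initialize_parameters_deep layers_dims |> Seq.toList      // 1 - initialize
    for i in 0..num_iterations - 1 do
        let caches = L_model_forward X parameters                                      // 2 - forward propagation
        let AL = (caches |> Seq.last).A.Row(0)
        let cost = compute_cost AL Y                                                   // 3 - compute the loss
        let grads = L_model_backward AL Y caches                                       // 4 - backward propagation
        parameters <- update_parameters parameters grads learning_rate |> Seq.toList   // 5 - update the parameters
    parameters
```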

## 1 – Initialization

The initialization for a deeper L-layer neural network is more complicated than for a single neuron or a single-hidden-layer neural net because there are many more weight matrices and bias vectors. $$n^{[l]}$$ is the number of units in layer $$l$$. Thus, for example, if the size of our input $$X$$ is $$(12288, 209)$$ (with $$m=209$$ examples) then:

| | Shape of W | Shape of b | Activation | Shape of Activation |
|---|---|---|---|---|
| Layer 1 | $$(n^{[1]}, 12288)$$ | $$(n^{[1]}, 1)$$ | $$Z^{[1]} = W^{[1]} X + b^{[1]}$$ | $$(n^{[1]}, 209)$$ |
| Layer 2 | $$(n^{[2]}, n^{[1]})$$ | $$(n^{[2]}, 1)$$ | $$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$ | $$(n^{[2]}, 209)$$ |
| $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ |
| Layer L-1 | $$(n^{[L-1]}, n^{[L-2]})$$ | $$(n^{[L-1]}, 1)$$ | $$Z^{[L-1]} = W^{[L-1]} A^{[L-2]} + b^{[L-1]}$$ | $$(n^{[L-1]}, 209)$$ |
| Layer L | $$(n^{[L]}, n^{[L-1]})$$ | $$(n^{[L]}, 1)$$ | $$Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$$ | $$(n^{[L]}, 209)$$ |

### 1.1 – Importing NuGet packages

Same as usual, we will import the NuGet packages that allow us to load the data from CSV files, do the matrix operations, and display results using nice charts.

```fsharp
#load "Paket.fsx"
Paket.Package
    [
        "FsLab"
    ]
```

```fsharp
open System
open FSharp.Data
open FSharp.Data.CsvExtensions
open MathNet.Numerics.LinearAlgebra
open MathNet.Numerics.LinearAlgebra.Double
```

```fsharp
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
```

### 1.2 – Helper methods

```fsharp
let createVector size value =
    seq {
        for _ in 0..size-1 do
            yield value
    }

let initialize_with_zeros dim =
    let w = createVector dim 0.0 |> DenseVector.ofSeq
    let b = 0.0
    w, b
```

```fsharp
type MatrixVectorDU =
    | Vector of Vector
    | Matrix of Matrix

let shape x =
    match x with
    | Vector n -> (1, n.Count)
    | Matrix n -> (n.RowCount, n.ColumnCount)
```

```fsharp
type public NnParameter = {
    W: Matrix;
    b: Vector;
    }
```

### 1.3 – Initialize implementation

```fsharp
let initialize_parameters_deep layer_dims =
    let layer_count = layer_dims |> List.length
    seq {
        for l in 1..layer_count-1 do
            let sqrtPrevious = sqrt (double layer_dims.[l-1])
            let W = (Matrix.Build.Random(layer_dims.[l], layer_dims.[l-1])).Map(fun x -> x / sqrtPrevious)
            let b = createVector layer_dims.[l] 0.0 |> DenseVector.ofSeq
            yield {
                W = W;
                b = b;
            }
    }
```

```fsharp
initialize_parameters_deep [5; 4; 3]
```

## 2 – Forward propagation module

### 2.1 – Linear Forward

Now that we have initialized our parameters, we will do the forward propagation module. We will implement three functions in this order:

• LINEAR
• LINEAR -> ACTIVATION where ACTIVATION will be either ReLU or Sigmoid.
• [LINEAR -> RELU] $$\times$$ (L-1) -> LINEAR -> SIGMOID (whole model)

The linear forward module computes the following equations:

$$Z^{[l]} = W^{[l]}A^{[l-1]} +b^{[l]}$$

where $$A^{[0]} = X$$.

Reminder:
The mathematical representation of this unit is $$Z^{[l]} = W^{[l]}A^{[l-1]} +b^{[l]}$$.

```fsharp
type public LinearForwardResult = {
    A: Matrix;
    W: Matrix;
    b: Vector;
    Z: Matrix;
    }

let linear_forward (A:Matrix) (W:Matrix) (b:Vector) =
    // Helper: multiply X by the weights, row by row, and add the bias b
    let multiply_matrix_by_matrix_of_another_size_and_add_b (X:Matrix) (weights:Matrix) (b:Vector) =
        let weight_shape = shape (MatrixVectorDU.Matrix weights)
        let mutable weights_vectors = Seq.empty
        for i in 0..(snd weight_shape) - 1 do
            let ax_plus_b = X.LeftMultiply (weights.Column(i) |> DenseVector.ofSeq) |> Vector.map (fun x -> x + b.[i])
            weights_vectors <- Seq.append weights_vectors [ax_plus_b]
        weights_vectors |> DenseMatrix.ofRowSeq

    let Z = multiply_matrix_by_matrix_of_another_size_and_add_b A (W.Transpose()) b
    {
        A = A;
        W = W;
        b = b;
        Z = Z;
    }
```

Test case:

```fsharp
let A_testcase = matrix [
                    [ 1.62434536; -0.61175641]
                    [-0.52817175; -1.07296862]
                    [ 0.86540763; -2.3015387 ]
                 ]

let W_testcase = matrix [
                    [ 1.74481176; -0.7612069; 0.3190391]
                 ]

let b_testcase = vector [-0.24937038]

let linear_testcase_result = linear_forward A_testcase W_testcase b_testcase
printfn "%s" (linear_testcase_result.Z |> string)
```

Output:

Z = [[ 3.26295337 -1.23429987]]

### 2.2 – Linear-Activation Forward

We will use two activation functions:

• Sigmoid: $$\sigma(Z) = \sigma(W A + b) = \frac{1}{ 1 + e^{-(W A + b)}}$$.
• ReLU: The mathematical formula for ReLU is $$A = RELU(Z) = max(0, Z)$$.
```fsharp
let sigmoid (z:Matrix) =
    z.Map(fun x -> -x)
    |> (fun x -> x.PointwiseExp())
    |> (fun x -> 1.0 + x)
    |> (fun x -> 1.0 / x)
```

```fsharp
let relu (z:Matrix) =
    let z_shape = shape (MatrixVectorDU.Matrix z)
    let other_matrix = DenseMatrix.Build.Dense(fst z_shape, snd z_shape)
    z.Map2((fun x y -> Math.Max(x, y)), other_matrix)
```

For more convenience, we are going to group two functions (Linear and Activation) into one function (LINEAR->ACTIVATION). Hence, we will implement a function that does the LINEAR forward step followed by an ACTIVATION forward step.

```fsharp
type public LinearActivationForwardResult = {
    A_prev: Matrix;
    W: Matrix;
    b: Vector;
    Z: Matrix;
    A: Matrix;
    }

let linear_activation_forward (a_prev:Matrix) (W:Matrix) (b:Vector) (activation:string) =
    let linear_forward_result = linear_forward a_prev W b
    let Z = linear_forward_result.Z
    let A =
        if activation = "sigmoid" then sigmoid Z
        elif activation = "tanh" then tanh Z
        else relu Z

    let W_shape = shape (MatrixVectorDU.Matrix W)
    let A_shape = shape (MatrixVectorDU.Matrix A)
    let A_prev_shape = shape (MatrixVectorDU.Matrix a_prev)
    assert (A_shape = ((W_shape |> fst), (A_prev_shape |> snd)))
    {
        A_prev = a_prev;
        W = W;
        b = b;
        Z = Z;
        A = A;
    }
```

Test case:

```fsharp
let A_prev_testcase = matrix [
                        [-0.41675785; -0.05626683]
                        [-2.1361961 ;  1.64027081]
                        [-1.79343559; -0.84174737]
                      ]

let W_testcase = matrix [
                    [ 0.50288142; -1.24528809; -1.05795222]
                 ]

let b_testcase = vector [-0.90900761]

let sigmoid_activation_testcase = linear_activation_forward A_prev_testcase W_testcase b_testcase "sigmoid"
printfn "With sigmoid: A = %s" (string sigmoid_activation_testcase.A)

let relu_activation_testcase = linear_activation_forward A_prev_testcase W_testcase b_testcase "relu"
printfn "With relu: A = %s" (string relu_activation_testcase.A)

let tanh_activation_testcase = linear_activation_forward A_prev_testcase W_testcase b_testcase "tanh"
printfn "With tanh: A = %s" (string tanh_activation_testcase.A)
```

Output:

With sigmoid: A = [[ 0.96890023 0.11013289]]
With relu: A = [[ 3.43896131 0. ]]
With tanh: A = [[ 0.99794156 -0.96982745]]

### 2.3 – Forward propagation loop

For even more convenience when implementing the $$L$$-layer Neural Net, we will need a function that replicates the previous one (linear_activation_forward with RELU) $$L-1$$ times, then follows that with one linear_activation_forward with SIGMOID.

We will keep a “cache” of every activation forward result; these will be used to compute the derivatives in the backward propagation step.

Figure 2: *[LINEAR -> RELU] $$\times$$ (L-1) -> LINEAR -> SIGMOID* model

Note: In the code below, the variable AL will denote $$A^{[L]} = \sigma(Z^{[L]}) = \sigma(W^{[L]} A^{[L-1]} + b^{[L]})$$. (This is sometimes also called Yhat, i.e., this is $$\hat{Y}$$.)

```fsharp
let L_model_forward (X:Matrix) (parameters:NnParameter list) =
    let mutable caches = Seq.empty
    let mutable A = X
    let L = parameters |> Seq.length

    // Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in 1..L-1 do
        let A_prev = A
        let forward_activation_result = linear_activation_forward A_prev parameters.[l-1].W parameters.[l-1].b "relu"
        A <- forward_activation_result.A
        caches <- Seq.append caches [forward_activation_result]

    // Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    let AL = linear_activation_forward A parameters.[L-1].W parameters.[L-1].b "sigmoid"
    caches <- Seq.append caches [AL]
    caches
```

Test case:

```fsharp
let parameters_testcase = [
        {
            W = matrix [
                    [ 0.35480861;  1.81259031; -1.3564758 ; -0.46363197;  0.82465384]
                    [-1.17643148;  1.56448966;  0.71270509; -0.1810066 ;  0.53419953]
                    [-0.58661296; -1.48185327;  0.85724762;  0.94309899;  0.11444143]
                    [-0.02195668; -2.12714455; -0.83440747; -0.46550831;  0.23371059]
                ];
            b = vector [ 1.38503523; -0.51962709; -0.78015214; 0.95560959];
        };
        {
            W = matrix [
                    [-0.12673638; -1.36861282;  1.21848065; -0.85750144]
                    [-0.56147088; -1.0335199 ;  0.35877096;  1.07368134]
                    [-0.37550472;  0.39636757; -0.47144628;  2.33660781]
                ];
            b = vector [ 1.50278553; -0.59545972; 0.52834106];
        };
        {
            W = matrix [
                    [ 0.9398248;  0.42628539; -0.75815703]
                ];
            b = vector [-0.16236698];
        }
    ]

let X_testcase = matrix [
        [-0.31178367;  0.72900392;  0.21782079; -0.8990918 ]
        [-2.48678065;  0.91325152;  1.12706373; -1.51409323]
        [ 1.63929108; -0.4298936 ;  2.63128056;  0.60182225]
        [-0.33588161;  1.23773784;  0.11112817;  0.12915125]
        [ 0.07612761; -0.15512816;  0.63422534;  0.810655  ]
    ]

let lmodel_forward_result = L_model_forward X_testcase parameters_testcase

let last_testcase = Seq.last lmodel_forward_result
printfn "AL = %s" (string last_testcase.A)
printfn "Length of caches list = %s" (string (lmodel_forward_result |> Seq.length))
```
Output:

AL = [[ 0.03921668 0.70498921 0.19734387 0.04728177]]
Length of caches list = 3

Great! Now we have a full forward propagation that takes the input X and outputs a row vector $$A^{[L]}$$ containing our predictions. It also records all intermediate values in “caches”. Using $$A^{[L]}$$, we can compute the cost of our predictions.

## 3 – Cost function

Before we implement backward propagation, we need to compute the cost between iterations, because we want to check if our model is actually learning.

We will compute the cross-entropy cost $$J$$, using the following formula: $$-\frac{1}{m} \sum\limits_{i = 1}^{m} (y^{(i)}\log\left(a^{[L] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right))$$

```fsharp
let compute_cost (AL:Vector) (Y:Vector) =
    let m = shape (MatrixVectorDU.Vector Y) |> snd |> float
    let logprobs =
        AL.PointwiseLog().PointwiseMultiply(Y)
        + AL.Map(fun a -> 1.0 - a).PointwiseLog().PointwiseMultiply(Y.Map(fun y -> 1.0 - y))
    -(logprobs.Sum()) / m
```

Test case:

```fsharp
let AL_testcase = vector [ 0.8; 0.9; 0.4]
let Y_testcase = vector [ 1.0; 1.0; 1.0]

let cost = compute_cost AL_testcase Y_testcase
printfn "cost = %s" (string cost)
```

Expected Output:

cost = 0.414932

## 4 – Backward propagation module

Back propagation is used to calculate the gradient of the loss function with respect to the parameters.

Reminder:

Figure 3: Forward and Backward propagation for *LINEAR->RELU->LINEAR->SIGMOID*
*The purple blocks represent the forward propagation, and the red blocks represent the backward propagation.*

Now, similar to forward propagation, we are going to build the backward propagation in three steps:

• LINEAR backward
• LINEAR -> ACTIVATION backward where ACTIVATION computes the derivative of either the ReLU or sigmoid activation
• [LINEAR -> RELU] $$\times$$ (L-1) -> LINEAR -> SIGMOID backward (whole model)

For layer $$l$$, the linear part is: $$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$$ (followed by an activation).

Suppose we have already calculated the derivative $$dZ^{[l]} = \frac{\partial \mathcal{L} }{\partial Z^{[l]}}$$. We want to get $$(dW^{[l]}, db^{[l]}, dA^{[l-1]})$$.

Figure 4

The three outputs $$(dW^{[l]}, db^{[l]}, dA^{[l-1]})$$ are computed using the input $$dZ^{[l]}$$. Here are the formulas we need:
$$dW^{[l]} = \frac{\partial \mathcal{L} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T}$$
$$db^{[l]} = \frac{\partial \mathcal{L} }{\partial b^{[l]}} = \frac{1}{m} \sum_{i = 1}^{m} dZ^{[l](i)}$$
$$dA^{[l-1]} = \frac{\partial \mathcal{L} }{\partial A^{[l-1]}} = W^{[l] T} dZ^{[l]}$$

### 4.1 – Linear Backward

```fsharp
type public LinearBackwardResult = {
    dA_prev: Matrix;
    dW: Matrix;
    db: Vector;
    }

let multiply_matrix_by_matrix_of_another_size (X:Matrix) (weights:Matrix) =
    let weight_shape = shape (MatrixVectorDU.Matrix weights)
    let mutable weights_vectors = Seq.empty
    for i in 0..(snd weight_shape) - 1 do
        let product = X.LeftMultiply (weights.Column(i) |> DenseVector.ofSeq)
        weights_vectors <- Seq.append weights_vectors [product]
    weights_vectors |> DenseMatrix.ofRowSeq

let linear_backward (dZ:Matrix) (cache:LinearActivationForwardResult) =
    let A_prev = cache.A_prev
    let W = cache.W
    let b = cache.b
    let m = shape (MatrixVectorDU.Matrix A_prev) |> snd |> float
    let dW = 1.0 / m * (dZ * A_prev.Transpose())
    let db = 1.0 / m * (dZ |> Matrix.sumRows)
    let dA_prev = multiply_matrix_by_matrix_of_another_size dZ W
    {
        dA_prev = dA_prev;
        dW = dW;
        db = db;
    }
```

Test case:

```fsharp
let linear_cache_testcase = {
    A_prev = matrix [
                [-0.52817175; -1.07296862]
                [ 0.86540763; -2.3015387 ]
                [ 1.74481176; -0.7612069 ]
             ];
    W = matrix [
            [ 0.3190391; -0.24937038;  1.46210794]
        ];
    b = vector [-2.06014071];
    Z = matrix [ [ 0.0 ] ];
    A = matrix [ [ 0.0 ] ];
}

let dZ_testcase = matrix [ [ 1.62434536; -0.61175641] ]

linear_backward dZ_testcase linear_cache_testcase
```

Output:

dA_prev = [[ 0.51822968 -0.19517421]
           [-0.40506361  0.15255393]
           [ 2.37496825 -0.89445391]]
dW = [[-0.10076895  1.40685096  1.64992505]]
db = [[ 0.50629448]]

### 4.2 – Linear-Activation backward

Next, we will create linear_activation_backward, a function that merges the two helper steps: linear_backward and the backward step for the activation.

We will implement two backward functions:

• sigmoid_backward: implements the backward propagation for a SIGMOID unit, using

$$\sigma'(Z^{[l]}) = \sigma(Z^{[l]})\cdot (1-\sigma(Z^{[l]})).$$

• relu_backward: implements the backward propagation for a RELU unit: dZ = relu_backward dA activation_cache.

If $$g(.)$$ is the activation function, sigmoid_backward and relu_backward compute $$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$$.

```fsharp
let sigmoid_backward (dA:Matrix) (cache:LinearActivationForwardResult) =
    let Z = cache.Z
    // s = sigmoid(Z)
    let s = Z |> Matrix.map (fun x -> -x)
              |> (fun x -> x.PointwiseExp())
              |> Matrix.map (fun x -> 1.0 + x)
              |> Matrix.map (fun x -> 1.0 / x)

    // dZ = dA * s * (1 - s)
    let dZ = dA.PointwiseMultiply(s).PointwiseMultiply(s |> Matrix.map (fun x -> 1.0 - x))
    dZ
```

```fsharp
let relu_backward (dA:Matrix) (cache:LinearActivationForwardResult) =
    let Z = cache.Z
    let dZ = dA.Clone()
    for i in 0..dA.RowCount - 1 do
        for j in 0..dA.ColumnCount - 1 do
            let value = if Z.Item(i, j) <= 0.0 then 0.0 else dA.Item(i, j)
            dZ.Item(i, j) <- value
    dZ
```

Implementation: Backpropagation for the LINEAR->ACTIVATION layer.

```fsharp
let linear_activation_backward (dA:Matrix) (cache:LinearActivationForwardResult) (activation:string) =
    let dZ =
        if activation = "sigmoid" then sigmoid_backward dA cache
        else relu_backward dA cache

    let linear_backward_result = linear_backward dZ cache
    {
        dA_prev = linear_backward_result.dA_prev;
        dW = linear_backward_result.dW;
        db = linear_backward_result.db;
    }
```

Test case:

```fsharp
let linear_cache_testcase = {
    A_prev = matrix [
                [-2.1361961 ;  1.64027081]
                [-1.79343559; -0.84174737]
                [ 0.50288142; -1.24528809]
             ];
    W = matrix [
            [-1.05795222; -0.90900761;  0.55145404]
        ];
    b = vector [ 2.29220801 ];
    Z = matrix [ [ 0.04153939; -1.11792545] ];
    A = matrix [ [ 0.0 ] ];
}

let AL_sigmoid_back_testcase = matrix [ [-0.41675785; -0.05626683] ]

linear_activation_backward AL_sigmoid_back_testcase linear_cache_testcase "sigmoid"
```

Output with sigmoid:

dA_prev = [[ 0.11017994  0.01105339]
           [ 0.09466817  0.00949723]
           [-0.05743092 -0.00576154]]
dW = [[ 0.10266786  0.09778551 -0.01968084]]
db = [[-0.05729622]]
```fsharp
linear_activation_backward AL_sigmoid_back_testcase linear_cache_testcase "relu"
```

Output with relu:

dA_prev = [[ 0.44090989  0.        ]
           [ 0.37883606  0.        ]
           [-0.2298228   0.        ]]
dW = [[ 0.44513824  0.37371418 -0.10478989]]
db = [[-0.20837892]]

### 4.3 – Backpropagation loop

We will implement the backward function for the whole network. Recall that when we implemented the L_model_forward function, at each iteration we stored a cache containing (A_prev, W, b, and Z). In the back propagation module, we will use those variables to compute the gradients. Therefore, in the L_model_backward function, we will iterate through all the hidden layers backward, starting from layer $$L$$. On each step, we will use the cached values for layer $$l$$ to backpropagate through layer $$l$$. Figure 5 below shows the backward pass.

Figure 5: Backward pass
Initializing backpropagation:
To backpropagate through this network, we know that the output is
$$A^{[L]} = \sigma(Z^{[L]})$$. Our code thus needs to compute dAL $$= \frac{\partial \mathcal{L}}{\partial A^{[L]}}$$; for the cross-entropy cost, this derivative is $$dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1-A^{[L]}}\right)$$, applied element-wise.

We can then use this post-activation gradient dAL to keep going backward. As seen in Figure 5, we can now feed dAL into the LINEAR->SIGMOID backward function we implemented (which will use the cached values stored by the L_model_forward function). After that, we will have to use a for loop to iterate through all the other layers using the LINEAR->RELU backward function. We will store each dA, dW, and db in the grads dictionary. To do so, we will use this formula:

$$grads[“dW” + str(l)] = dW^{[l]}$$

For example, for $$l=3$$ this would store $$dW^{[3]}$$ in grads[“dW3”]. (In our F# implementation, we will store each layer’s gradients as a LinearBackwardResult record in a list instead of a dictionary.)

Implementation: Backpropagation for the [LINEAR->RELU] $$\times$$ (L-1) -> LINEAR -> SIGMOID model.
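The original listing for L_model_backward appears to be missing here, so the following is a sketch reconstructed from the description above and the helpers already defined; treat it as an assumption rather than the original code. Note that the gradients are collected from layer $$L$$ down to layer 1, which matches the reversed indexing used by update_parameters in section 5.

```fsharp
// Sketch of the whole-model backward pass, reconstructed from the description
// above. It relies on the types and helpers defined earlier in this post.
let L_model_backward (AL:Vector) (Y:Vector) (caches:LinearActivationForwardResult seq) =
    let cachesList = caches |> Seq.toList
    let L = cachesList |> List.length

    // Initialize backpropagation with the derivative of the cross-entropy cost:
    // dAL = -(Y / AL - (1 - Y) / (1 - AL))
    let dAL = Vector.map2 (fun y a -> -(y / a - (1.0 - y) / (1.0 - a))) Y AL
    let dAL_matrix = matrix [ dAL |> Vector.toList ]

    // Layer L: LINEAR -> SIGMOID backward
    let mutable grads = [ linear_activation_backward dAL_matrix cachesList.[L - 1] "sigmoid" ]

    // Layers L-1 down to 1: LINEAR -> RELU backward.
    // The resulting list is ordered from layer L down to layer 1.
    for l in (L - 2) .. -1 .. 0 do
        let previous = grads |> List.last
        grads <- grads @ [ linear_activation_backward previous.dA_prev cachesList.[l] "relu" ]
    grads
```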

Test case:

```fsharp
let AL_testcase = vector [ 1.78862847; 0.43650985]
let Y_testcase = vector [ 1.0; 0.0 ]

let forward_caches_testcase = [|
    {
        A = matrix [ [ 0.0; 0.0; 0.0 ] ];
        A_prev = matrix [
                    [ 0.09649747; -1.8634927 ]
                    [-0.2773882 ; -0.35475898]
                    [-0.08274148; -0.62700068]
                    [-0.04381817; -0.47721803]
                 ];
        W = matrix [
                [-1.31386475;  0.88462238;  0.88131804;  1.70957306]
                [ 0.05003364; -0.40467741; -0.54535995; -1.54647732]
                [ 0.98236743; -1.10106763; -1.18504653; -0.2056499 ]
            ];
        b = vector [ 1.48614836; 0.23671627; -1.02378514];
        Z = matrix [
                [-0.7129932 ;  0.62524497]
                [-0.16051336; -0.76883635]
                [-0.23003072;  0.74505627]
            ];
    };
    {
        A = matrix [ [ 0.0 ] ];
        A_prev = matrix [
                    [ 1.97611078; -1.24412333]
                    [-0.62641691; -0.80376609]
                    [-2.41908317; -0.92379202]
                 ];
        W = matrix [
                [-1.02387576;  1.12397796; -0.13191423]
            ];
        b = vector [ -1.62328545 ];
        Z = matrix [ [ 0.64667545; -0.35627076] ];
    };
|]

let L_model_backward_testcase_result = L_model_backward AL_testcase Y_testcase forward_caches_testcase
L_model_backward_testcase_result
```
```fsharp
// The gradients are ordered from layer L down to layer 1,
// so the last element holds the layer-1 gradients.
let layer_1_grads = L_model_backward_testcase_result |> Seq.last
let dA1 = layer_1_grads.dA_prev
let dW1 = layer_1_grads.dW
let db1 = layer_1_grads.db

printfn "dA1 = %s" (string dA1)
printfn "dW1 = %s" (string dW1)
printfn "db1 = %s" (string db1)
```

Output:

dW1 = [[ 0.41010002  0.07807203  0.13798444  0.10502167]
       [ 0.          0.          0.          0.        ]
       [ 0.05283652  0.01005865  0.01777766  0.0135308 ]]
db1 = [[-0.22007063]
       [ 0.        ]
       [-0.02835349]]
dA1 = [[ 0.12913162 -0.44014127]
       [-0.14175655  0.48317296]
       [ 0.01663708 -0.05670698]]

## 5 – Update Parameters

In this section we will update the parameters of the model, using gradient descent:

$$W^{[l]} = W^{[l]} - \alpha \text{ } dW^{[l]}$$
$$b^{[l]} = b^{[l]} - \alpha \text{ } db^{[l]}$$

where $$\alpha$$ is the learning rate.

```fsharp
let update_parameters (parameters:NnParameter list) (grads:LinearBackwardResult seq) (learning_rate:float) =
    let L = parameters |> Seq.length // the number of layers
    let gradsList = grads |> Seq.toList

    // grads are stored from layer L down to layer 1, hence the reversed indexing
    seq {
        for l in 0..L-1 do
            yield {
                W = parameters.[l].W - (gradsList.[L - 1 - l].dW |> Matrix.map (fun x -> learning_rate * x))
                b = parameters.[l].b - (gradsList.[L - 1 - l].db |> Vector.map (fun x -> learning_rate * x))
            }
    }
```

Test case:

```fsharp
let parameters = [
        {
            W = matrix [
                    [-0.41675785; -0.05626683; -2.1361961 ;  1.64027081]
                    [-1.79343559; -0.84174737;  0.50288142; -1.24528809]
                    [-1.05795222; -0.90900761;  0.55145404;  2.29220801]
                ];
            b = vector [ 0.04153939; -1.11792545; 0.53905832];
        };
        {
            W = matrix [
                    [-0.5961597; -0.0191305; 1.17500122]
                ];
            b = vector [-0.74787095];
        };
    ]

let grads = [|
        {
            dA_prev = matrix [ [ 0.0 ] ];
            dW = matrix [
                    [-0.40467741; -0.54535995; -1.54647732]
                ];
            db = vector [ 0.98236743 ];
        };
        {
            dA_prev = matrix [ [ 0.0 ] ];
            dW = matrix [
                    [ 1.78862847;  0.43650985;  0.09649747; -1.8634927 ]
                    [-0.2773882 ; -0.35475898; -0.08274148; -0.62700068]
                    [-0.04381817; -0.47721803; -1.31386475;  0.88462238]
                ];
            db = vector [ 0.88131804; 1.70957306; 0.05003364];
        }
    |]

update_parameters parameters grads 0.1
```

Output:

W1 = [[-0.59562069 -0.09991781 -2.14584584  1.82662008]
      [-1.76569676 -0.80627147  0.51115557 -1.18258802]
      [-1.0535704  -0.86128581  0.68284052  2.20374577]]
b1 = [[-0.04659241]
      [-1.28888275]
      [ 0.53405496]]
W2 = [[-0.55569196  0.0354055   1.32964895]]
b2 = [[-0.84610769]]

## 6 – Assemble our L-layer Neural Network

We will use the functions we’ve implemented so far to build a deep network, and apply it to cat vs non-cat classification. Hopefully, we will see an improvement in accuracy relative to our logistic regression implementation.

Let’s get started!

```fsharp
let L_layer_model (X:Matrix) (Y:Vector) (layers_dims:int list)
                  (learning_rate:float) (num_iterations:int) (print_cost:bool) =

    // Parameters initialization.
    let mutable parameters = initialize_parameters_deep layers_dims |> Seq.toList
    let mutable costs = Seq.empty

    // Loop (gradient descent)
    for i in 0..num_iterations - 1 do
        let forward_result = L_model_forward X parameters
        let AL = (forward_result |> Seq.last).A.Row(0)
        let cost = compute_cost AL Y
        let grads = L_model_backward AL Y forward_result

        // Update parameters.
        parameters <- update_parameters parameters grads learning_rate |> Seq.toList

        // Print the cost every 100 training iterations
        if print_cost && i % 100 = 0 then
            printfn "Cost after iteration %i: %f" i cost
            costs <- Seq.append costs [cost]

    parameters, costs
```
```fsharp
let predict (X:Matrix) (Y:Vector) (parameters:NnParameter list) =
    let propagationResult = L_model_forward X parameters
    let AL = (propagationResult |> Seq.last).A.Row(0)
    let predictions = AL |> Vector.map (fun x -> if x > 0.5 then 1.0 else 0.0)
    printfn "Accuracy: {%f}" (100.0 * (1.0 - (predictions - Y |> Vector.map abs |> Vector.toArray |> Array.average)))
    predictions
```

## 7 – See our model in action

We will use the same “Cat vs non-Cat” dataset as in our first post. The model we built there had 70% test accuracy on classifying cat vs non-cat images. Hopefully, our new model will perform better!

### 7.1 – Load Dataset

```fsharp
// Returns a sequence of arrays: index 0 is a pixel array, index 1 is a one-item label array
let parse_csv (x:CsvFile) =
    seq {
        for row in x.Rows do
            let rowValues = row.Columns
                            |> Seq.map (fun c -> int c)
                            |> Seq.toArray

            let pixelsValues = rowValues.[1..]
            let labelValues = rowValues.[..0]
            yield [| pixelsValues; labelValues |]
    }

// Extract and split parsed values from CSV into train/test x matrices and y vectors
let load_dataset (train:CsvFile) (test:CsvFile) =
    let parsed_train_rows = parse_csv train
    let parsed_test_rows = parse_csv test

    let extract_y (x:seq<int[][]>) =
        seq {
            for row in x do
                yield float row.[1].[0]
        }
    let extract_x (x:seq<int[][]>) =
        seq {
            for row in x do
                yield row.[0] |> Seq.map float |> Seq.toArray
        }

    let train_x = extract_x parsed_train_rows |> Seq.toArray |> DenseMatrix.ofColumnArrays
    let train_y = extract_y parsed_train_rows |> DenseVector.ofSeq
    let test_x = extract_x parsed_test_rows |> Seq.toArray |> DenseMatrix.ofColumnArrays
    let test_y = extract_y parsed_test_rows |> DenseVector.ofSeq

    train_x, train_y, test_x, test_y

// Building our datasets
let train_ds = CsvFile.Load("C:\\Users\\Mathieu\\Dropbox\\ML\\Blog\\03_Deep_Neural_Network\\datasets\\train.csv", ",", '"', false, true, 0)
let test_ds = CsvFile.Load("C:\\Users\\Mathieu\\Dropbox\\ML\\Blog\\03_Deep_Neural_Network\\datasets\\test.csv", ",", '"', false, true, 0)
let train_x, train_y, test_x, test_y = load_dataset train_ds test_ds
```
```fsharp
let normalize_pixels (matrix:Matrix) =
    let columnCount = matrix.ColumnCount - 1
    let rowCount = matrix.RowCount - 1
    for i in 0..rowCount do
        for j in 0..columnCount do
            matrix.Item(i, j) <- matrix.Item(i, j) / 255.0
    matrix
```

### 7.2 – Explore Dataset

```fsharp
open System.Drawing

let showPicture (matrix:Matrix) (index:int) (filename:string) =
    let pixelVector = matrix.Column(index)
    let mutable i = 0
    let mutable line = -1
    let mutable vectorIndex = 0
    let bitmap = new Bitmap(64, 64)
    while vectorIndex < pixelVector.Count - 3 do
        if i % 64 = 0 then
            i <- 0
            line <- line + 1

        bitmap.SetPixel(i, line, Color.FromArgb(
                                    int pixelVector.[vectorIndex],
                                    int pixelVector.[vectorIndex + 1],
                                    int pixelVector.[vectorIndex + 2]))
        vectorIndex <- vectorIndex + 3
        i <- i + 1
    bitmap.Save("C:\\Users\\Mathieu\\Dropbox\\ML\\Blog\\03_Deep_Neural_Network\\generated\\" + filename)
    "" |> Util.Html |> Display
```
```fsharp
showPicture train_x 2 "image_2.bmp"
```

As usual, we reshape and standardize the images before feeding them to the network.

```fsharp
normalize_pixels train_x
normalize_pixels test_x
```

### 7.3 – Train our model

```fsharp
let layers_dims = [12288; 20; 7; 5; 1] // 4-layer model
let startedTime = System.DateTime.Now
let llayer_output = L_layer_model train_x train_y layers_dims 0.0075 1800 true
let endedTime = System.DateTime.Now
```

Cost after iteration 0: 0.666255
Cost after iteration 100: 0.520911
Cost after iteration 200: 0.494446
Cost after iteration 300: 0.423527
Cost after iteration 400: 0.352914
Cost after iteration 500: 0.341232
Cost after iteration 600: 0.290970
Cost after iteration 700: 0.108312
Cost after iteration 800: 0.065780
Cost after iteration 900: 0.042299
Cost after iteration 1000: 0.032148
Cost after iteration 1100: 0.025922
Cost after iteration 1200: 0.021531
Cost after iteration 1300: 0.018271
Cost after iteration 1400: 0.016059
Cost after iteration 1500: 0.014428
Cost after iteration 1600: 0.013200
Cost after iteration 1700: 0.012193

```fsharp
llayer_output
|> snd
|> Seq.toArray
|> Chart.Line
|> Chart.WithLayout (Layout(title = "Learning rate=0.0075"))
|> Chart.WithXTitle("iterations (per hundreds)")
|> Chart.WithYTitle("cost")
```

We can see the cost going down after each iteration batch, meaning our model is learning well! Now let’s see how accurate the predictions are.

```fsharp
predict train_x train_y (llayer_output |> fst)
```
seq [0.0; 0.0; 1.0; 0.0; ...]
Accuracy: {100.000000}
```fsharp
predict test_x test_y (llayer_output |> fst)
```
seq [1.0; 1.0; 1.0; 1.0; ...]
Accuracy: {80.000000}

We achieved 80% accuracy on predicting cat vs non-cat images. In the first article, we had 70% accuracy, so this is a good improvement.
