
Two More Simple Neural Networks

Rockwell Anyoha

A rolling stone gathers no moss. In a previous post, we walked forwards and backwards through a simple neural network. In brief, information in a neural network flows through neurons. A neuron receives an input, which is the weighted sum of the preceding outputs, transforms that sum into a new value via an activation function, and passes the result to another neuron, or to an error function if it sits in the final layer. Training a neural network is an iterative process in which the weights of the network are updated, via gradient descent, to minimize the error of the network (as measured by the error function). The gradient is calculated using a technique called back-propagation, which takes advantage of the chain rule, and the weights are then nudged against it. And so, with the goal being to learn some function, we start with inputs, perform a forward pass in which neurons pass on the activations of the weighted sums of preceding outputs, calculate the error (a function of target value minus output value), perform a backward pass in which we use back-propagation to calculate the gradient of the error function and update the weights, and keep repeating until the error reaches zero or stabilizes. To hone our abilities and sharpen our understanding of these concepts, we will continue with two more simple neural networks.


The first network (really the second of this series) showcases two weighted inputs converging onto an output at the final layer, L (for a brief summary of the notation I'm using, refer to the previous post).



In order to calculate the output, we take the weighted sum of the two inputs. Because we are using the identity function as the activation (for now), we can eliminate some terms in the network for simplicity.
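
The equation image isn't reproduced here, but with the identity activation the forward pass collapses to a single weighted sum. Writing $w_1$ and $w_2$ for the top and bottom weights and $x$ and $y$ for the inputs (my labels, not necessarily the original figure's), the output is

$$o = w_1 x + w_2 y$$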


So already, we've done the "forward pass". Now, in order to update each weight, we must calculate the gradient of an error function. There are many error functions that we will discuss in the future, but for now we will use the mean squared error (MSE) as we've done before. In order to calculate the gradient, we apply the chain rule.
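
The derivation was originally an image; here is a reconstruction consistent with the numbers in the training log at the end of the post, taking the error to be $E = (t - o)^2$ for target $t$:

$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_1} = -2(t - o)\,x, \qquad \frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_2} = -2(t - o)\,y$$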



Now that we have the gradient for each weight, we can update each weight based on a learning rate, η.
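
In symbols, that's the standard gradient-descent step (the training log at the end of the post is consistent with $\eta = 0.1$):

$$w_1 \leftarrow w_1 - \eta\,\frac{\partial E}{\partial w_1}, \qquad w_2 \leftarrow w_2 - \eta\,\frac{\partial E}{\partial w_2}$$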



Finally, we iterate this process until the error is minimized.


Below is a demonstration of the network (learning t = -3x - 2y, inputs: 1.3, -2.5, weights: .5, .3).

If you want to practice, the updated weights after each iteration are at the end of the post!
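
If you'd rather let Python do the arithmetic, here is a minimal sketch of the loop above. It's not the code used for the demonstration, but assuming a learning rate of 0.1 it reproduces the iteration log at the end of the post:

```python
# Minimal sketch of training network 2 (two inputs, one output,
# identity activation). Assumes learning rate eta = 0.1, which
# matches the iteration log at the end of the post.
x, y = 1.3, -2.5        # fixed inputs
w1, w2 = 0.5, 0.3       # initial top and bottom weights
t = -3 * x - 2 * y      # target from t = -3x - 2y (here, 1.1)
eta = 0.1               # learning rate

for i in range(1, 31):
    o = w1 * x + w2 * y       # forward pass
    error = (t - o) ** 2      # mean squared error
    grad_o = -2 * (t - o)     # dE/do
    w1 -= eta * grad_o * x    # dE/dw1 = dE/do * do/dw1
    w2 -= eta * grad_o * y    # dE/dw2 = dE/do * do/dw2
    print(f"iteration {i}: output = {o}, "
          f"weights = {w1}, {w2}, error = {error}")
```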


Notice how the final weights are around .7 and -.08. This network is overtrained to the input examples! If we randomize the inputs, the network is more likely to find the general solution (a sketch of the idea is shown below): the weights actually approach -3 and -2 respectively. We'll talk about proper training etiquette in the future!
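
The figure for the randomized run isn't reproduced here, but here is a minimal sketch of the idea, which is my own variation on the loop above that draws fresh random inputs each iteration:

```python
import random

# Same network as above, but with fresh random inputs each iteration.
# Assumes eta = 0.1 and inputs drawn from [-1, 1]; with varied inputs
# the weights head toward the general solution w1 = -3, w2 = -2
# instead of overfitting a single example.
w1, w2 = 0.5, 0.3
eta = 0.1

for i in range(5000):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    t = -3 * x - 2 * y        # target function to learn
    o = w1 * x + w2 * y       # forward pass
    grad_o = -2 * (t - o)     # dE/do
    w1 -= eta * grad_o * x
    w2 -= eta * grad_o * y

print(round(w1, 3), round(w2, 3))  # approximately -3.0 and -2.0
```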



So far, the length of the longest path in the two neural networks we've explored has been two neurons. Now, we will introduce an "inner layer". Along with non-linear activation functions, deep neural networks also have inner layers between the input and the output. But, because we've thoroughly derived the general form of the forward and backward pass, we can apply what we know to train such a network.




For the forward pass, we sequentially calculate neuron outputs.
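
The figure isn't reproduced here; writing $w_1$ for the input-to-hidden weight and $w_2$ for the hidden-to-output weight (my labels, matching "first weight" and "second weight" in the log at the end of the post), the forward pass with identity activations is

$$h = w_1 x, \qquad o = w_2 h = w_2 w_1 x$$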



As usual, we'll make the error function the MSE. First, we can calculate the gradient for the weight between the output layer and the hidden layer; then, we can calculate the gradient for the weight between the input layer and the hidden layer (note that the order of these calculations doesn't actually matter!).


For the weight between the hidden layer and the output layer:
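
The original derivation was an image; in the notation above, with $E = (t - o)^2$, the chain rule gives

$$\frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial w_2} = -2(t - o)\,h$$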



And for the weight between the input layer and the hidden layer:
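
Extending the chain one link further back:

$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial h}\,\frac{\partial h}{\partial w_1} = -2(t - o)\,w_2\,x$$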



And again, we iterate the forward and backward passes until the error is minimized!


Here is a demonstration of the training (learning t = x/3, input: 1.5, first weight: .7, second weight: -.8). Again, the updated weights are posted below for those who want to get their hands dirty!
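
As before, here is a minimal Python sketch (not the original demonstration code) that, assuming $\eta = 0.1$, reproduces the iteration log at the end of the post:

```python
# Minimal sketch of training network 3 (one hidden neuron, identity
# activations). Assumes learning rate eta = 0.1, matching the log below.
x = 1.5               # fixed input
w1, w2 = 0.7, -0.8    # first (input->hidden) and second (hidden->output) weights
t = x / 3             # target from t = x/3 (here, 0.5)
eta = 0.1

for i in range(1, 31):
    h = w1 * x                 # hidden neuron output
    o = w2 * h                 # network output
    error = (t - o) ** 2       # mean squared error
    grad_o = -2 * (t - o)      # dE/do
    g1 = grad_o * w2 * x       # dE/dw1 (computed with the old w2)
    g2 = grad_o * h            # dE/dw2
    w1 -= eta * g1             # both gradients are computed before
    w2 -= eta * g2             # either weight is updated
    print(f"iteration {i}: output = {o}, "
          f"weights = {w1}, {w2}, error = {error}")
```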




As a bonus, here is the general formula for updating a weight any distance from the output given that the activation function is the identity function!
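
The formula image isn't reproduced here. One way to write it that is consistent with the derivations above (my notation, assuming the network is a single chain of neurons): if the output is $o = w_k w_{k-1} \cdots w_1 x$ and $a_{j-1} = w_{j-1} \cdots w_1 x$ is the value flowing into weight $w_j$ (with $a_0 = x$), then

$$\frac{\partial E}{\partial w_j} = -2(t - o)\left(\prod_{i=j+1}^{k} w_i\right) a_{j-1}, \qquad w_j \leftarrow w_j - \eta\,\frac{\partial E}{\partial w_j}$$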



So that's enough for today! I'm hoping these examples will help reinforce the skills and concepts learned thus far. We will continue to push further into the depth of neural networks. So brace yourselves!


-R


# Training Network 2

iteration: 0
top weight = 0.5  :   bottom weight = 0.3



iteration: 1
inputs = 1.3 , -2.5  :  output = -0.09999999999999998
top weight = 0.8119999999999999  :   bottom weight = -0.2999999999999999  :  error = 1.4399999999999993



iteration: 2
inputs = 1.3 , -2.5  :  output = 1.8055999999999996
top weight = 0.628544  :   bottom weight = 0.052800000000000125  :  error = 0.49787136



iteration: 3
inputs = 1.3 , -2.5  :  output = 0.6851071999999997
top weight = 0.736416128  :   bottom weight = -0.15464639999999988  :  error = 0.17213603549183995



iteration: 4
inputs = 1.3 , -2.5  :  output = 1.3439569663999997
top weight = 0.672987316736  :   bottom weight = -0.032667916799999835  :  error = 0.05951500145509077



iteration: 5
inputs = 1.3 , -2.5  :  output = 0.9565533037567996
top weight = 0.710283457759232  :   bottom weight = -0.10439126492159986  :  error = 0.020576954663088905



iteration: 6
inputs = 1.3 , -2.5  :  output = 1.1843466573910013
top weight = 0.6883533268375716  :   bottom weight = -0.06221793622609903  :  error = 0.007114358613035016



iteration: 7
inputs = 1.3 , -2.5  :  output = 1.0504041654540905
top weight = 0.7012482438195079  :   bottom weight = -0.08701585349905358  :  error = 0.0024597468043051907



iteration: 8
inputs = 1.3 , -2.5  :  output = 1.1291623507129942
top weight = 0.6936660326341293  :   bottom weight = -0.0724346781425563  :  error = 0.0008504426991076941



iteration: 9
inputs = 1.3 , -2.5  :  output = 1.082852537780759
top weight = 0.6981243728111318  :   bottom weight = -0.08100840925217664  :  error = 0.00029403546056028663



iteration: 10
inputs = 1.3 , -2.5  :  output = 1.1100827077849131
top weight = 0.6955028687870544  :   bottom weight = -0.0759670553597199  :  error = 0.00010166099627595513



iteration: 11
inputs = 1.3 , -2.5  :  output = 1.0940713678224705
top weight = 0.6970443131532119  :   bottom weight = -0.07893137144848447  :  error = 3.5148679496433996e-05



iteration: 12
inputs = 1.3 , -2.5  :  output = 1.1034860357203866
top weight = 0.6961379438659113  :   bottom weight = -0.07718835358829099  :  error = 1.2152445043813906e-05



iteration: 13
inputs = 1.3 , -2.5  :  output = 1.0979502109964123
top weight = 0.696670889006844  :   bottom weight = -0.07821324809008465  :  error = 4.201634959227503e-06



iteration: 14
inputs = 1.3 , -2.5  :  output = 1.1012052759341089
top weight = 0.6963575172639757  :   bottom weight = -0.07761061012303003  :  error = 1.4526900773429012e-06



iteration: 15
inputs = 1.3 , -2.5  :  output = 1.0992912977507434
top weight = 0.6965417798487823  :   bottom weight = -0.07796496124765814  :  error = 5.022588781008427e-07



iteration: 16
inputs = 1.3 , -2.5  :  output = 1.1004167169225623
top weight = 0.6964334334489161  :   bottom weight = -0.07775660278637683  :  error = 1.736529935500615e-07



iteration: 17
inputs = 1.3 , -2.5  :  output = 1.099754970449533
top weight = 0.6964971411320374  :   bottom weight = -0.07787911756161012  :  error = 6.003948060184885e-08



iteration: 18
inputs = 1.3 , -2.5  :  output = 1.100144077375674
top weight = 0.6964596810143621  :   bottom weight = -0.07780707887377289  :  error = 2.075829018123839e-08



iteration: 19
inputs = 1.3 , -2.5  :  output = 1.0999152825031029
top weight = 0.6964817075635552  :   bottom weight = -0.07784943762222127  :  error = 7.177054280452183e-09



iteration: 20
inputs = 1.3 , -2.5  :  output = 1.100049813888175
top weight = 0.6964687559526297  :   bottom weight = -0.07782453067813364  :  error = 2.4814234551378282e-09



iteration: 21
inputs = 1.3 , -2.5  :  output = 1.0999707094337527
top weight = 0.6964763714998539  :   bottom weight = -0.07783917596125708  :  error = 8.579372710639118e-10



iteration: 22
inputs = 1.3 , -2.5  :  output = 1.1000172228529528
top weight = 0.696471893558086  :   bottom weight = -0.07783056453478052  :  error = 2.966266638453444e-10



iteration: 23
inputs = 1.3 , -2.5  :  output = 1.099989872962463
top weight = 0.6964745265878456  :   bottom weight = -0.0778356280535488  :  error = 1.025568892668114e-10



iteration: 24
inputs = 1.3 , -2.5  :  output = 1.1000059546980714
top weight = 0.6964729783663469  :   bottom weight = -0.07783265070451292  :  error = 3.5458429125605034e-11



iteration: 25
inputs = 1.3 , -2.5  :  output = 1.0999964986375335
top weight = 0.6964738887205881  :   bottom weight = -0.07783440138574602  :  error = 1.2259539119609406e-11



iteration: 26
inputs = 1.3 , -2.5  :  output = 1.1000020588011297
top weight = 0.6964733534322943  :   bottom weight = -0.077833371985181  :  error = 4.238662092982575e-12



iteration: 27
inputs = 1.3 , -2.5  :  output = 1.0999987894249352
top weight = 0.6964736681818111  :   bottom weight = -0.07783397727271321  :  error = 1.4654919865815492e-12



iteration: 28
inputs = 1.3 , -2.5  :  output = 1.1000007118181374
top weight = 0.6964734831090953  :   bottom weight = -0.07783362136364433  :  error = 5.066850612531246e-13



iteration: 29
inputs = 1.3 , -2.5  :  output = 1.0999995814509347
top weight = 0.6964735919318522  :   bottom weight = -0.07783383063817681  :  error = 1.751833197807257e-13



iteration: 30
inputs = 1.3 , -2.5  :  output = 1.10000024610685
top weight = 0.6964735279440711  :   bottom weight = -0.07783370758475158  :  error = 6.056858184061039e-14

# Training Network 3

iteration: 0
first weight = 0.7  :   second weight = -0.8



iteration: 1
input = 1.5  :  output = -0.8400000000000001
first weight = 0.37839999999999985  :   second weight = -0.5186000000000001  :  error = 1.7956000000000003



iteration: 2
input = 1.5  :  output = -0.2943573599999999
first weight = 0.25481388193119986  :   second weight = -0.4284245524928001  :  error = 0.6310036153861694



iteration: 3
input = 1.5  :  output = -0.16375278500299126
first weight = 0.16950348496697296  :   second weight = -0.37768452535602287  :  error = 0.44056775959922717



iteration: 4
input = 1.5  :  output = -0.09602826489891442
first weight = 0.10197028926883685  :   second weight = -0.34737586494426764  :  error = 0.35524969255841043



iteration: 5
input = 1.5  :  output = -0.053133026150069054
first weight = 0.044326770252400706  :   second weight = -0.33045492454006675  :  error = 0.305956144617933



iteration: 6
input = 1.5  :  output = -0.021971999278292926
first weight = -0.007419695037660122  :   second weight = -0.3235137246730082  :  error = 0.2724547680305783



iteration: 7
input = 1.5  :  output = 0.003600559766356893
first weight = -0.05559730458833479  :   second weight = -0.32461866441202786  :  error = 0.24641240426427424



iteration: 8
input = 1.5  :  output = 0.02707188414056092
first weight = -0.10165369258829116  :   second weight = -0.3325067229637752  :  error = 0.22366100277035902



iteration: 9
input = 1.5  :  output = 0.05070080429954954
first weight = -0.14647219354607616  :   second weight = -0.3462085996597452  :  error = 0.20186976725707168



iteration: 10
input = 1.5  :  output = 0.07606489952501728
first weight = -0.1905031867906933  :   second weight = -0.3648370108860693  :  error = 0.1797209694147337



iteration: 11
input = 1.5  :  output = 0.1042539198494806
first weight = -0.23381803187629158  :   second weight = -0.387454277714649  :  error = 0.1566149599545013



iteration: 12
input = 1.5  :  output = 0.135890694985934
first weight = -0.2761407442113139  :   second weight = -0.412994874040519  :  error = 0.13257558599782615



iteration: 13
input = 1.5  :  output = 0.17106706780951014
first weight = -0.3168950286806509  :   second weight = -0.44024440944772647  :  error = 0.1081968738794334



iteration: 14
input = 1.5  :  output = 0.20926689713765023
first weight = -0.355293115633613  :   second weight = -0.46788397193872017  :  error = 0.08452573709996965



iteration: 15
input = 1.5  :  output = 0.24935393121770674
first weight = -0.3904750990974185  :   second weight = -0.4945998187484135  :  error = 0.06282345179601807



iteration: 16
input = 1.5  :  output = 0.289693369859028
first weight = -0.4216803854422128  :   second weight = -0.5192356694219555  :  error = 0.044228878681251596



iteration: 17
input = 1.5  :  output = 0.3284272458257934
first weight = -0.44840639360267665  :   second weight = -0.540940328955424  :  error = 0.029437209974922725



iteration: 18
input = 1.5  :  output = 0.36384165309172084
first weight = -0.47050245589265405  :   second weight = -0.5592566109442371  :  error = 0.018539095432795295



iteration: 19
input = 1.5  :  output = 0.39469741338519915
first weight = -0.4881698062068107  :   second weight = -0.5741201486284708  :  error = 0.011088634747767636



iteration: 20
input = 1.5  :  output = 0.420402182543079
first weight = -0.5018794194434716  :   second weight = -0.5857773239651999  :  error = 0.00633581254390532



iteration: 21
input = 1.5  :  output = 0.4409843749122074
first weight = -0.5122504239242898  :   second weight = -0.5946629422623464  :  error = 0.003482844004502898



iteration: 22
input = 1.5  :  output = 0.45692451639892856
first weight = -0.5199350420695658  :   second weight = -0.6012825726829641  :  error = 0.0018554972874661746



iteration: 23
input = 1.5  :  output = 0.4689418195854206
first weight = -0.5255374648563248  :   second weight = -0.6061270435851017  :  error = 0.0009646105706645645



iteration: 24
input = 1.5  :  output = 0.4778137047998602
first weight = -0.529571778911655  :   second weight = -0.6096249623853124  :  error = 0.000492231694707747



iteration: 25
input = 1.5  :  output = 0.48426026369901093
first weight = -0.5324503797567885  :   second weight = -0.612125558431067  :  error = 0.00024773929882467314



iteration: 26
input = 1.5  :  output = 0.4888897290681868
first weight = -0.5344906439963255  :   second weight = -0.6139002588241204  :  error = 0.0001234381201782937



iteration: 27
input = 1.5  :  output = 0.4921859170326225
first weight = -0.5359297642631694  :   second weight = -0.6151532250953626  :  error = 6.105989262105933e-05



iteration: 28
input = 1.5  :  output = 0.49451838436662915
first weight = -0.5369413743238498  :   second weight = -0.6160345533876147  :  error = 3.0048109952015718e-05



iteration: 29
input = 1.5  :  output = 0.49616165959038727
first weight = -0.5376507394198454  :   second weight = -0.6166528425200127  :  error = 1.4732857100066007e-05



iteration: 30
input = 1.5  :  output = 0.49731578511935154
first weight = -0.5381473080406713  :   second weight = -0.6170857935546155  :  error = 7.205009525494633e-06

