Commit b4440b05 authored by Petr Kubeš

finishing touches

parent cbe39ded
@@ -47,7 +47,7 @@
<body>
<div class="container mt-3">
<h1>Neural network playground</h1>
<p class="lead">Visualization of a simple neural network for strictly educational purposes.</p>
<hr>
<div class="container-fluid" id="controls">
@@ -161,9 +161,10 @@
<canvas id="content" width="1200" height="600" style="border: 5px solid #2E282A; margin-bottom: 1em">Sorry, you need a better browser to see the neural net.</canvas>
<div>
<h2>What is this?</h2>
<p>This is an implementation of a neural network with back-propagation. There aren't any special tricks; it's as simple a neural
network as it gets.</p>
<h5>Cost function</h5>
<p>The cost is defined as \(C = \frac{1}{2 \times sampleCnt}\sum^{sampleCnt}_{m=1}(\sum^{outputSize}_{n=1}(neuron_n-target_n)^2)\).
In words: the error of one output neuron is defined as \((value - target)^2\). To get the error of the neural network for one training sample, you
simply add the errors of all output neurons. The total cost is then the average error over all training samples (the extra factor of 2 in the denominator just makes the derivative cleaner).
</p>
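<p>To make that concrete, here is a minimal sketch of the cost computation (illustration only; the names <code>samples</code>, <code>output</code> and <code>target</code> are assumptions, not this project's actual code):</p>
<pre><code>// Halved mean squared error, matching the formula above.
// `samples` is a hypothetical array of { output: number[], target: number[] }.
function cost(samples) {
  let total = 0;
  for (const { output, target } of samples) {
    // Error of one sample: sum of squared errors over all output neurons.
    for (let n = 0; n < output.length; n++) {
      total += (output[n] - target[n]) ** 2;
    }
  }
  return total / (2 * samples.length);
}
</code></pre>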
@@ -174,13 +175,15 @@
<a target="_blank" href="https://www.desmos.com/calculator/dw9fmqwlmn">sigmoid</a> function to that sum. Other activation functions are possible, but I have not implemented them yet.
</p>
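<p>As a sketch (the function and variable names are made up for illustration, not taken from this project), one neuron's activation could be computed like this:</p>
<pre><code>// Logistic sigmoid: squashes any real number into (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Activation of one neuron: the sigmoid of the weighted sum of its inputs.
// `inputs` and `weights` are plain arrays of the same length.
function activate(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sigmoid(sum);
}
</code></pre>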
<h5>Back propagation</h5>
<p>
In the simplest way possible: the cost function defined above depends on the weights of the connections in the same
way as \(f(x, y) = x^2 + y^2\) depends on x and y. You take the derivative of C with respect
to each of the weights. Each of these derivatives tells you in which direction (up/down) the cost will change if
you increase/decrease that weight. So you take the old weight and subtract a small step in the direction that decreases
the cost. As an equation: \(w_{new} = w_{old} - rate \times \frac{\partial C}{\partial w_{old}}\).</p>
<p>For example, say the weights start at random values x = 5 and y = 3. The cost at
this point is 25 + 9 = 34, which we want to get to 0. Taking the derivative with respect to each of these
weights tells us how to adjust them to minimize the function: \(\frac{\partial f(x, y)}{\partial x}
= 2x\) and \(\frac{\partial f(x, y)}{\partial y} = 2y\). Now that we have the derivatives, we know the "direction" in
which to change the weights: \(x_{new} = x_{old} - rate \times 2x = 5 - 0.1 \times 2 \times 5 = 4\), and that's a
little bit closer to the desired result of 0 for f(x, y). The rate is necessary to avoid stepping over the minimum.</p>
<p>In practice, the computation of the derivatives is a little bit harder, but all you need to know is the
<i>chain rule</i>. I highly recommend
<a target="_blank" href="https://www.youtube.com/watch?v=aircAruvnKk">3blue1brown's series</a> and
<a target="_blank" href="https://web.archive.org/web/20150317210621/https://www4.rgu.ac.uk/files/chapter3%20-%20bp.pdf">this paper</a> for better understanding.
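<p>To see the descent rule in action, here is a tiny sketch (illustration only) that runs gradient descent on the toy function \(f(x, y) = x^2 + y^2\) from the example above:</p>
<pre><code>// Gradient descent on f(x, y) = x^2 + y^2, starting from the example's values.
const rate = 0.1; // learning rate; too large a rate steps over the minimum
let x = 5;
let y = 3;

for (let step = 0; step < 20; step++) {
  const dx = 2 * x; // ∂f/∂x
  const dy = 2 * y; // ∂f/∂y
  x -= rate * dx;   // w_new = w_old - rate * dC/dw
  y -= rate * dy;
}

// Both weights, and therefore the cost x^2 + y^2, approach 0.
console.log(x, y, x * x + y * y);
</code></pre>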