How to Build a Neural Network from Scratch
Published: 2019-05-11

Neural Networks are like the workhorses of Deep learning. With enough data and computational power, they can be used to solve most of the problems in deep learning. It is very easy to use a Python or R library to create a neural network and train it on any dataset and get a great accuracy.

We can treat neural networks as just some black box and use them without any difficulty. But even though it seems very easy to go that way, it's much more exciting to learn what lies behind these algorithms and how they work.

In this article we will get into some of the details of building a neural network. I am going to use Python to write code for the network. I will also use Python's numpy library to perform numerical computations. I will try to avoid some of the complicated mathematical details, but I will point you to some brilliant resources at the end if you want to know more about them.

So let's get started.

Idea

Before we start writing code for our neural network, let's pause and understand what exactly a neural network is.

In the image above you can see a very informal diagram of a neural network. It has some colored circles connected to each other with arrows pointing in a particular direction. These colored circles are sometimes referred to as neurons.

These neurons are nothing but mathematical functions which, when given some input, generate an output. The output of neurons depends on the input and the parameters of the neurons. We can update these parameters to get a desired value out of the network.

Each of these neurons is defined using the sigmoid function. A sigmoid function gives an output between zero and one for every input it gets. These sigmoid units are connected to each other to form a neural network.

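For reference, the sigmoid function used throughout this article is

\sigma(z) = \frac{1}{1 + e^{-z}}

which maps any real-valued input z to an output strictly between zero and one.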
By connection here we mean that the output of one layer of sigmoid units is given as input to each sigmoid unit of the next layer. In this way our neural network produces an output for any given input. The process continues until we have reached the final layer. The final layer generates its output.

This process of a neural network generating an output for a given input is called Forward Propagation. The output of the final layer is also called the prediction of the neural network. Later in this article we will discuss how we evaluate these predictions. These evaluations can be used to tell whether our neural network needs improvement or not.

Right after the final layer generates its output, we calculate the cost function. The cost function computes how far our neural network is from making its desired predictions. The value of the cost function shows the difference between the predicted value and the truth value.

Our objective here is to minimize the value of the cost function. The process of minimization of the cost function requires an algorithm which can update the values of the parameters in the network in such a way that the cost function achieves its minimum value.

Algorithms such as gradient descent and stochastic gradient descent are used to update the parameters of the neural network. These algorithms update the values of the weights and biases of each layer in the network depending on how the update will affect the minimization of the cost function. The effect that each weight and bias of each neuron in the network has on the minimization of the cost function is computed by backpropagation.

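Concretely, once the gradients are known, gradient descent nudges every weight matrix and bias vector a small step against its gradient,

W^{[l]} := W^{[l]} - \alpha \frac{\partial J}{\partial W^{[l]}}, \qquad b^{[l]} := b^{[l]} - \alpha \frac{\partial J}{\partial b^{[l]}},

where \alpha is the learning rate and J is the cost function. This is exactly the rule implemented later in the update_parameters function.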
Code

So, we now know the main ideas behind the neural networks. Let us start implementing these ideas into code. We will start by importing all the required libraries.

import numpy as np
import matplotlib.pyplot as plt

As I mentioned we are not going to use any of the deep learning libraries. So, we will mostly use numpy for performing mathematical computations efficiently.

The first step in building our neural network will be to initialize the parameters. We need to initialize two parameters for each of the neurons in each layer: 1) Weight and 2) Bias.

These weights and biases are declared in vectorized form. That means that instead of initializing weights and biases for each individual neuron in every single layer, we will create a vector (or a matrix) for weights and another one for biases, for each layer.

These weights and bias vectors will be combined with the input to the layer. Then we will apply the sigmoid function over that combination and send that as the input to the next layer.

layer_dims holds the dimensions of each layer. We will pass these layer dimensions to the init_params function, which will use them to initialize the parameters. These parameters will be stored in a dictionary called params. So in the params dictionary, params['W1'] will represent the weight matrix for layer 1.

def init_params(layer_dims):
    np.random.seed(3)
    params = {}
    L = len(layer_dims)

    for l in range(1, L):
        params['W'+str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*0.01
        params['b'+str(l)] = np.zeros((layer_dims[l], 1))

    return params
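As a quick sanity check, here is how init_params could be called. The layer sizes below (2 inputs, 4 hidden units, 1 output) are just example values, not something prescribed by the article:

layer_dims = [2, 4, 1]             # example sizes: 2 inputs, 4 hidden units, 1 output
params = init_params(layer_dims)

print(params['W1'].shape)          # (4, 2) - weights connecting the input layer to layer 1
print(params['b1'].shape)          # (4, 1)
print(params['W2'].shape)          # (1, 4)
print(params['b2'].shape)          # (1, 1)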

Great! We have initialized the weights and biases and now we will define the sigmoid function. It will compute the value of the sigmoid function for any given value of Z and will also store this value as a cache. We will store cache values because we need them for implementing backpropagation. The Z here is the linear hypothesis.

Note that the sigmoid function falls under the class of activation functions in the neural network terminology. The job of an activation function is to shape the output of a neuron.

For example, the sigmoid function takes any real-valued input and gives a value which lies between zero and one. Its purpose is to convert linear outputs into non-linear outputs. There are different types of activation functions that can be used for better performance, but we will stick to sigmoid for the sake of simplicity.

# Z (linear hypothesis) - Z = W*X + b
# W - weight matrix, b - bias vector, X - input
def sigmoid(Z):
    A = 1/(1 + np.exp(-Z))
    cache = Z

    return A, cache
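Just for comparison, an alternative activation such as ReLU, which this article does not use, could be written in the same style:

def relu(Z):
    # ReLU clips negative values to zero, element-wise
    A = np.maximum(0, Z)
    cache = Z

    return A, cache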

Now, let's start writing code for forward propagation. We have discussed earlier that forward propagation will take the values from the previous layer and give it as input to the next layer. The function below will take the training data and parameters as inputs and will generate output for one layer and then it will feed that output to the next layer and so on.

def forward_prop(X, params):
    A = X  # input to the first layer, i.e. the training data
    caches = []
    L = len(params)//2

    for l in range(1, L+1):
        A_prev = A

        # Linear hypothesis
        Z = np.dot(params['W'+str(l)], A_prev) + params['b'+str(l)]

        # Storing the linear cache
        linear_cache = (A_prev, params['W'+str(l)], params['b'+str(l)])

        # Applying sigmoid on the linear hypothesis
        A, activation_cache = sigmoid(Z)

        # Storing both the linear and activation cache
        cache = (linear_cache, activation_cache)
        caches.append(cache)

    return A, caches

A_prev starts out as A, which is set to X, the input to the first layer. We will loop through all the layers of the network and compute the linear hypothesis for each one. After that, the value of Z (the linear hypothesis) is given to the sigmoid activation function. Cache values are stored along the way and accumulated in caches. Finally, the function returns the value generated by the last layer and the stored caches.
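As an illustration, a single forward pass on some made-up data might look like this; the feature count, batch size and layer sizes are arbitrary example values:

X = np.random.randn(2, 5)            # 2 features, 5 training examples (made up)
params = init_params([2, 4, 1])      # same example layer sizes as before
A_final, caches = forward_prop(X, params)

print(A_final.shape)                 # (1, 5) - one prediction per example
print(len(caches))                   # 2      - one cache per layer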

Let's now define our cost function.

def cost_function(A, Y):
    m = Y.shape[1]

    cost = (-1/m)*(np.dot(np.log(A), Y.T) + np.dot(np.log(1-A), (1-Y).T))

    return cost
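In case the vectorized NumPy expression is hard to read, the quantity being computed is the binary cross-entropy cost

J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log\left(1 - \hat{y}^{(i)}\right) \right]

where m is the number of training examples, y^{(i)} is the truth value for example i and \hat{y}^{(i)} (A in the code) is the prediction.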

As the value of the cost function decreases, the performance of our model becomes better. The value of the cost function can be minimized by updating the values of the parameters of each of the layers in the neural network. Algorithms such as Gradient Descent are used to update these values in such a way that the cost function is minimized.

Gradient Descent updates the values with the help of some updating terms. These updating terms, called gradients, are calculated using backpropagation. Gradient values are calculated for each neuron in the network, and each one represents the change in the final output with respect to a change in the parameters of that particular neuron.

def one_layer_backward(dA, cache):
    linear_cache, activation_cache = cache

    Z = activation_cache
    A, _ = sigmoid(Z)
    dZ = dA*A*(1-A)  # the derivative of the sigmoid is sigmoid(Z)*(1-sigmoid(Z))

    A_prev, W, b = linear_cache
    m = A_prev.shape[1]

    dW = (1/m)*np.dot(dZ, A_prev.T)
    db = (1/m)*np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)

    return dA_prev, dW, db

The code above runs the backpropagation step for one single layer. It calculates the gradient values for sigmoid units of one layer using the cache values we stored previously. In the activation cache we have stored the value of Z for that layer. Using this value we will calculate the dZ, which is the derivative of the cost function with respect to the linear output of the given neuron.

Once we have calculated all of that, we can calculate dW, db and dA_prev, which are the derivatives of the cost function with respect to the weights, the biases and the previous activation respectively. I have used the formulae directly in the code. If you are not familiar with calculus, it might seem too complicated at first, but for now just think of it as any other math formula.

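For reference, these are the formulae the function above relies on for a layer l, where m is the number of training examples and \sigma is the sigmoid defined earlier:

dZ^{[l]} = dA^{[l]} \odot \sigma(Z^{[l]}) \odot (1 - \sigma(Z^{[l]}))
dW^{[l]} = \frac{1}{m} \, dZ^{[l]} A^{[l-1]T}
db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}
dA^{[l-1]} = W^{[l]T} dZ^{[l]}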
After that we will use this code to implement backpropagation for the entire neural network. The function backprop implements the code for that. Here, we have created a dictionary for mapping gradients to each layer. We will loop through the model in a backwards direction and compute the gradient.

def backprop(AL, Y, caches):
    grads = {}
    L = len(caches)
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)

    # Derivative of the cost with respect to the final activation AL
    dAL = -(np.divide(Y, AL) - np.divide(1-Y, 1-AL))

    # Backward step for the final layer
    current_cache = caches[L-1]
    grads['dA'+str(L-1)], grads['dW'+str(L)], grads['db'+str(L)] = one_layer_backward(dAL, current_cache)

    # Backward steps for the remaining layers
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = one_layer_backward(grads['dA'+str(l+1)], current_cache)
        grads['dA'+str(l)] = dA_prev_temp
        grads['dW'+str(l+1)] = dW_temp
        grads['db'+str(l+1)] = db_temp

    return grads

Once we have looped through all the layers and computed the gradients, we will store those values in the grads dictionary and return it.

Finally, using these gradient values we will update the parameters for each layer. The function update_parameters goes through all the layers, updates the parameters, and returns them.

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2

    for l in range(L):
        parameters['W'+str(l+1)] = parameters['W'+str(l+1)] - learning_rate*grads['dW'+str(l+1)]
        parameters['b'+str(l+1)] = parameters['b'+str(l+1)] - learning_rate*grads['db'+str(l+1)]

    return parameters

Finally, it's time to put it all together. We will create a function called train for training our neural network.

def train(X, Y, layer_dims, epochs, lr):
    params = init_params(layer_dims)
    cost_history = []

    for i in range(epochs):
        Y_hat, caches = forward_prop(X, params)
        cost = cost_function(Y_hat, Y)
        cost_history.append(cost)
        grads = backprop(Y_hat, Y, caches)

        params = update_parameters(params, grads, lr)

    return params, cost_history

This function will go through all the functions step by step for a given number of epochs. After finishing that, it will return the final updated parameters and the cost history. Cost history can be used to evaluate the performance of your network architecture.

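As a final illustration, here is one way everything could be wired together on a small made-up dataset. The data, layer sizes, number of epochs and learning rate below are arbitrary placeholders chosen for this sketch, not values from the article:

np.random.seed(1)
X = np.random.randn(2, 200)                                # 2 features, 200 made-up examples
Y = (X[0, :] + X[1, :] > 0).astype(int).reshape(1, 200)    # toy labels: 1 if the two features sum to a positive number

params, cost_history = train(X, Y, layer_dims=[2, 4, 1], epochs=1000, lr=0.1)

# The cost should trend downwards if training works
plt.plot([c.squeeze() for c in cost_history])
plt.xlabel('epoch')
plt.ylabel('cost')
plt.show()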
Conclusion

If you are still reading this, thanks! This article was a little complicated, so what I suggest you do is try playing around with the code. You might get some more insights out of it, and maybe you will find some errors in the code too. If that is the case, or if you have some questions, feel free to hit me up. I will do my best to help you.

Resources

  • Neural networks video series, by 3Blue1Brown

  • Neural Networks and Deep Learning, by Michael A. Nielsen

Translated from: How to Build a Neural Network from Scratch
