trainingnns.github.io

Getting Started with Neural Networks

by Mariia Seleznova, Stefan Kolek and Hector Andrade

This guide is for anyone who wants to learn how to use neural networks but has little to no prior experience and does not know where to start. We will cover basic concepts, as well as programming tools, that will help you to get started. You can find this guide useful if you are a math student, a life scientist or anyone else interested in data analysis.

The guide is organized into sections as in the map above and you can read it in the order that is most convenient for you, as well as skip some sections altogether if you are familiar with concepts covered in them. Here is some advice on how to navigate the guide:

Enjoy!

Table of Contents

  1. Math Background
  2. Neural Network Basics
  3. Command Line Basics
  4. Hardware and OS
  5. Python
  6. Deep Learning Frameworks
  7. Training Neural Nets
  8. Research Experiments with Neural Nets
  9. Data Analysis and Image Processing
  10. Deep Learning Courses Online
  11. Contact

1. Math Background

Back to table of contents

Areas of math that are most commonly used in neural networks are (more or less in the order of importance):

If you are familiar with the basics of these fields, especially matrices and gradients (e.g. you took Linear Algebra and Multivariable Calculus courses), you can skip this section. Otherwise, we recommend that you obtain a basic understanding of matrices and gradients.

You can start with videos by 3blue1brown, who created a series on the basics of linear algebra, calculus and multivariable calculus. You can either watch the entire series or just stop watching after feeling comfortable with matrices and gradients.

You can also have a look at the following free online courses from Khan Academy:

If you prefer written or more in-depth sources, you can have a look at the following:

If you are a mathematician, you may now think that the math in deep learning is not very involved, but this is far from true. Take a look at the Modern Mathematics of Deep Learning paper, which is a great introduction to the exciting and growing field of the mathematics of deep learning.

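For the next section, you should understand how matrix multiplication works and why the gradient points in the direction of steepest ascent, so that stepping against it (along the negative gradient) gives the direction of steepest descent.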
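To make these two prerequisites concrete, here is a minimal sketch in plain Python. The toy function f(x, y) = x^2 + 3*y^2, the starting point and the learning rate are our own illustrative choices, not something prescribed by this guide. The sketch shows that repeatedly stepping against the gradient lowers the function value, and it spells out matrix multiplication as sums over rows and columns.

```python
def f(x, y):
    """Toy function f(x, y) = x^2 + 3*y^2 with its minimum at (0, 0)."""
    return x ** 2 + 3 * y ** 2


def grad_f(x, y):
    """Analytic gradient (df/dx, df/dy); it points towards steepest ascent."""
    return 2 * x, 6 * y


def gradient_descent(x, y, learning_rate=0.1, steps=50):
    """Step against the gradient and record the value of f after each step."""
    history = [f(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
        history.append(f(x, y))
    return x, y, history


def matmul(A, B):
    """Multiply matrices stored as lists of rows: C[i][j] = sum_k A[i][k] * B[k][j]."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("columns of A must match rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


if __name__ == "__main__":
    x, y, history = gradient_descent(2.0, 1.0)
    print("f went from", history[0], "to", history[-1])
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In practice you would use a library such as numpy for matrix products; the explicit loops here are only meant to show what the operation computes.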

2. Neural Network Basics

You can skip this section if you already know what a neural network, a loss function and the backpropagation algorithm are. Before training neural networks, you should understand these concepts. The following video series by 3blue1brown provides an excellent intuitive introduction to the basics of neural networks.

https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
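As a companion to the videos, the three concepts named in this section can be sketched with the smallest possible network: a single neuron. Everything specific here (the sigmoid activation, the squared-error loss, the function names and the numbers) is an assumption chosen for illustration. `forward` is the network, `loss` measures the error, and `backward` applies the chain rule, which is what backpropagation does layer by layer in larger networks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """One neuron: weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def loss(w, b, x, target):
    """Squared error between the prediction and the target."""
    return 0.5 * (forward(w, b, x) - target) ** 2


def backward(w, b, x, target):
    """Gradient of the loss with respect to w and b, obtained via the chain rule."""
    a = forward(w, b, x)
    dloss_da = a - target          # derivative of the loss w.r.t. the activation
    da_dz = a * (1.0 - a)          # derivative of the sigmoid
    delta = dloss_da * da_dz       # derivative of the loss w.r.t. the weighted sum
    grad_w = [delta * xi for xi in x]
    grad_b = delta
    return grad_w, grad_b


def train(w, b, x, target, learning_rate=0.5, steps=200):
    """Plain gradient descent on a single training example."""
    for _ in range(steps):
        grad_w, grad_b = backward(w, b, x, target)
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    w, b, x, target = [0.1, -0.2], 0.0, [1.0, 2.0], 1.0
    print("loss before training:", loss(w, b, x, target))
    w, b = train(w, b, x, target)
    print("loss after training: ", loss(w, b, x, target))
```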

3. Command Line Basics

To run code on your computer or work with remote machines, you often need to use the command line. The command languages (or shells) available to you depend on your OS. The most commonly used one is Bash, which is the default on most Linux systems and was the default on macOS prior to 2019. You can start learning about it with this tutorial: https://ubuntu.com/tutorials/command-line-for-beginners#1-overview

Other common shells for Linux or macOS (e.g. Zsh) are quite similar to Bash, so switching between them is easy.

Windows, on the other hand, uses Command Prompt (cmd), which is rather different from Bash. If you are interested in learning cmd, you can take a look at this tutorial: https://www.cs.princeton.edu/courses/archive/spr05/cos126/cmd-prompt.html

4. Hardware and OS

Training neural networks is usually computationally expensive. Modern deep learning frameworks can train a network on different architectures and devices, in particular GPUs. Although you rarely need a GPU for your first experiments, a real-world application sometimes needs more than one GPU for training. An advantage of frameworks like PyTorch or TensorFlow (covered in Section 6) is that the same code runs on almost any device. Since deep learning software is normally written in Python, you can train neural networks on any operating system, e.g. Windows, OS X or Linux. We recommend using Windows or Linux, since OS X currently does not offer a stable GPU version.

To use GPUs for training, you need a CUDA-compatible GPU from NVIDIA. If you use Ubuntu Linux, you can follow this post. Windows 10 users can consult this guide.

If you are affiliated with a university or a research center, you normally have access to computing clusters with GPUs. The main tool you need to know to run your training remotely is ssh. If you are interested in having your own deep learning rig, there are affordable ways to build one; this video provides a detailed guide on how to do it. There are also commercially available pre-built deep learning workstations (high-performance PCs); in Germany, for example, there is AIME.

5. Python

Python is a very popular programming language! It is fair to say that most deep learning research and applications require Python. The following chart illustrates the rise in popularity of Python. (Source: https://stackoverflow.blog/2017/09/06/incredible-growth-python/ )

There is an abundance of tutorials for Python. We recommend https://www.python-course.eu/python3_course.php

The main advantage of Python is its optimized libraries for scientific computing, for example numpy and scipy. For visualization, matplotlib is typically used. We recommend creating a local environment so that you can install all your libraries without affecting the global system. This can be done using conda, which is also useful for installing GPU-capable libraries. Within a conda environment you can use pip for easy installation of libraries.

In scientific computing, you typically also want to interact with the results of your computations and visualize them in real time. Jupyter Notebook is the best tool for doing that in Python. It allows you to run individual pieces of code and visualize their results in real time, which is ideal for prototyping. Jupyter Notebook is also available for remote computation; you can learn to run notebooks remotely by following this guide. This also gives you a graphical interface on remote servers. Another great tool similar to Jupyter is Colab. As with Jupyter, in Colab you can use interactive notebooks to run Python code and train neural networks. The big advantage of this tool is that it performs all computations in the cloud. This also means that you can run your notebook wherever you have internet access. Google also allows Colab users to train and deploy models with GPUs and TPUs, both for free and in paid tiers.

6. Deep Learning Frameworks

There are a great number of Python libraries that provide implementations of neural networks, but the most popular ones are TensorFlow and PyTorch:

Both libraries offer similar functionality and are well documented. They are also compatible with many kinds of hardware, such as CPUs, GPUs and TPUs. The choice between them either depends on your project's needs or is simply subjective. You can consult some recent blog posts (e.g. this one) to make your choice.

7. Training Neural Nets

There are many code examples online that run the entire pipeline of building and training neural network models. Check out, for example, these Colab notebooks, which you can simply run without changes to train a neural network on a given dataset:

The models above are specifically designed and tuned to solve a particular problem, which may differ from the application you are interested in. To find a model that is well suited to your problem, it is always useful to look for more examples online. You can find a great variety of short and simple code examples for different data types (images, video, audio, time series, etc.) and different problems (classification, object recognition, denoising, generation, etc.) in the Keras library of code examples. Learning which models are popular in your domain will give you an idea of what is and is not likely to work for you. You can even look up pre-trained models (see e.g. TensorFlow Hub) that have already been trained on large domain-specific datasets and are ready to apply without training.

However, you may still need to design your own model from scratch, e.g. because your problem is very specific or you need more control over and understanding of the results. In this case, there are numerous choices you have to make while building and training a neural network model. They can be categorized as follows:

Architecture: First you need to choose the very structure of a network. How many layers should it have? What kind of layers in what order? How many neurons/filters should be in each layer? The number of particular architectures published in deep learning research is enormous and it’s impossible to cover all of them. But to understand more complex architectures, it is important to consider at least these basic classes:

Optimization method: There are a number of optimization methods beyond plain gradient descent that are commonly used in deep learning, and you need to choose one of them to train your network. You can find a good survey of optimization methods for deep learning in this blogpost. Adaptive optimization methods and methods with momentum often yield better results than simple gradient descent, and the Adam algorithm is a very popular choice.
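To illustrate how these optimizers differ, here is a small sketch of the three update rules in plain Python. The toy objective, the starting point and the default hyperparameter values are assumptions made for this example; deep learning frameworks ship tuned implementations that you should use in practice.

```python
import math


def objective(p):
    """Toy objective f(x, y) = x^2 + 10*y^2 with its minimum at (0, 0)."""
    x, y = p
    return x ** 2 + 10 * y ** 2


def grad(p):
    x, y = p
    return [2 * x, 20 * y]


def gradient_descent(p, steps=200, lr=0.05):
    """Plain gradient descent: move against the current gradient."""
    for _ in range(steps):
        p = [pi - lr * gi for pi, gi in zip(p, grad(p))]
    return p


def momentum(p, steps=200, lr=0.05, beta=0.9):
    """Gradient descent with momentum: accumulate a running sum of gradients."""
    velocity = [0.0] * len(p)
    for _ in range(steps):
        velocity = [beta * vi + gi for vi, gi in zip(velocity, grad(p))]
        p = [pi - lr * vi for pi, vi in zip(p, velocity)]
    return p


def adam(p, steps=200, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum plus a per-parameter step size based on squared gradients."""
    m = [0.0] * len(p)  # running mean of gradients
    v = [0.0] * len(p)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(p)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi ** 2 for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        p = [pi - lr * mh / (math.sqrt(vh) + eps) for pi, mh, vh in zip(p, m_hat, v_hat)]
    return p


if __name__ == "__main__":
    start = [3.0, 1.0]
    for optimizer in (gradient_descent, momentum, adam):
        print(optimizer.__name__, objective(optimizer(start)))
```

Which method works best depends on the problem and on the hyperparameters, so a comparison on one toy objective should not be read as a general ranking.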

Hyperparameters: You need to understand and reasonably choose the hyperparameters involved in training, such as the learning rate, the batch size or the number of training epochs. To understand what hyperparameters are in general and which kinds of hyperparameters you will encounter when training neural networks, you can check this blogpost. In practice, the choice (or tuning) of hyperparameters can also be performed algorithmically. This post is an up-to-date study of available hyperparameter tuning algorithms and implementations.
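The simplest algorithmic tuning strategy is a grid search: train once for every combination of candidate values and keep the combination with the lowest error on held-out validation data. The sketch below does this for a toy linear model; the model, the function names and the candidate values are assumptions chosen for illustration, not a recommendation for real networks.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_linear(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b with full-batch gradient descent, starting from zero."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def grid_search(train_x, train_y, val_x, val_y, learning_rates, epoch_options):
    """Try every (learning rate, epochs) pair; return the best (val_mse, lr, epochs)."""
    best = None
    for lr in learning_rates:
        for epochs in epoch_options:
            w, b = train_linear(train_x, train_y, lr, epochs)
            score = mse(w, b, val_x, val_y)
            if best is None or score < best[0]:
                best = (score, lr, epochs)
    return best


if __name__ == "__main__":
    # Noise-free toy data generated from y = 2*x + 1.
    train_x, train_y = [0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0]
    val_x, val_y = [5.0, 6.0], [11.0, 13.0]
    print(grid_search(train_x, train_y, val_x, val_y, [0.001, 0.01, 0.1], [10, 100]))
```

Grid search becomes expensive as the number of hyperparameters grows, which is why random search and more sophisticated tuning algorithms exist.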

Initialization: The initialization of your weights can determine whether your network converges successfully to a good minimum. This article contains a detailed discussion of the commonly used initialization procedures.
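As an illustration, here is a sketch of two widely used schemes, Glorot (Xavier) uniform and He normal initialization. Picking these two is our assumption about what "commonly used" covers, and the function names are hypothetical. Both scale the random weights by the layer size so that signals neither blow up nor vanish as they pass through many layers.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He normal: N(0, std^2) with std = sqrt(2 / fan_in), often paired with ReLU."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are reproducible
    weights = glorot_uniform(fan_in=64, fan_out=32, rng=rng)
    print(len(weights), "rows of", len(weights[0]), "weights each")
```

Deep learning frameworks provide these initializers ready-made, so you rarely need to write them yourself.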

Layers: A neural network architecture is defined by its basic components, the layers. The most commonly used layers include:

These layers and many variations of them are implemented in the deep learning frameworks that we covered.
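To give a feeling for what such a layer computes, here is a minimal sketch of a fully connected (dense) layer and a ReLU activation in plain Python. The function names and the example weights are hypothetical; the frameworks' own implementations are far more efficient and also handle batches of inputs.

```python
def relu(values):
    """ReLU activation: replace negative entries by zero."""
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Affine map with one output per row of `weights`: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def tiny_network(x, layers):
    """Apply dense layers in order, with ReLU after every layer except the last."""
    for index, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if index < len(layers) - 1:
            x = relu(x)
    return x


if __name__ == "__main__":
    layers = [
        ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),  # 2 inputs -> 2 hidden units
        ([[1.0, 1.0]], [0.5]),                     # 2 hidden units -> 1 output
    ]
    print(tiny_network([2.0, 1.0], layers))
```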

8. Research Experiments with Neural Nets

Deep learning has an important empirical side. To train a neural network successfully, merely choosing the design is not enough: you need trial-and-error iterations to tune its different elements. You can evaluate the performance of your neural network using different metrics, such as accuracy and mean squared error. Software like TensorBoard allows you to monitor the performance of different runs simultaneously. Tools like Keras Tuner (for TensorFlow) and Ray Tune (commonly used with PyTorch) are handy for optimizing and fine-tuning the hyperparameters of your model.
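The two metrics mentioned above are simple to compute by hand, as the following sketch shows. The function names are our own; frameworks and libraries provide equivalent built-in metrics.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the true labels (for classification)."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets (for regression)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


if __name__ == "__main__":
    print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
    print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Accuracy suits classification problems, while mean squared error suits regression problems where the outputs are continuous values.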

9. Data Analysis and Image Processing

Data analysis and image processing tools are also very handy when training neural networks. In Python, the main library for data processing and analysis is pandas, inspired by the statistical programming language R. For image processing, we recommend the scikit-image library, which contains plenty of algorithms for image filtering, resizing, cropping, rotating, etc.

10. Deep Learning Courses Online

Much work has gone into several online deep learning courses. We recommend checking out the following two sites:

11. Contact

If you have comments, questions or cool ideas regarding the content of this webpage, please feel free to contact the authors: