Purpose: This notebook covers several common deep learning architectures. In particular, we will see architectures for:

- linear regression and logistic regression expressed as neural networks
- deep networks built from fully connected (dense) hidden layers
- recurrent layers for working with time series data
- convolutional layers for working on image classification problems

We’ll also discuss use-cases for each of these types of network.

About this Notebook

I intend for this topic (and the next one) to be in-class discussions. As such, the prepared notes here will be quite limited.

Generating Toy Data

We’ll generate a toy data set that includes both a numerical and categorical response variable along with three predictors.
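The generating code isn’t shown in this rendering, but a sketch along the following lines could produce data of this shape. The coefficients, sample size, and seed are assumptions, chosen only to roughly match the fitted models that appear later in the notebook.

library(tidyverse)

set.seed(42)  # arbitrary; the original seed isn't shown
n <- 10000

toy_data <- tibble(
  # three predictors, roughly uniform on [0, 10] per the rows below
  x1 = runif(n, 0, 10),
  x2 = runif(n, 0, 10),
  x3 = runif(n, 0, 10),
  # numerical response driven mainly by x2 (assumed form)
  num_response = 25 + 5 * x2 + rnorm(n, sd = 10),
  # categorical response driven by x1 and x2 (assumed form)
  cat_response = rbinom(n, 1, plogis(0.22 * x1 - 0.33 * x2))
)

The first few rows of the data set: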

       x1       x2       x3 num_response cat_response
2.3081028 7.366398 6.275836     74.77727            0
4.9467855 2.757787 4.292854     28.64206            0
9.2832849 2.346415 6.118673     51.60359            1
1.1983868 7.885522 1.904544     82.59721            0
9.4410272 1.517296 0.182168     52.30417            1
0.9818792 4.189278 3.793543     47.82695            0

Linear Regression as a Neural Network

Linear regression can be expressed as a neural network with no hidden layers. It will have an input layer with a node for each of the available predictors and a single output node. Each input node is linked to the output node via a weighted edge whose weight is the linear regression coefficient for that predictor. There is also a bias term, which serves as the linear regression intercept.
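Concretely, the output node computes the bias plus the weighted sum of the inputs, which is exactly a linear regression prediction. A minimal sketch:

# A no-hidden-layer network's output: bias plus the weighted
# sum of the inputs; identical to a linear regression prediction
predict_network <- function(x, weights, bias) {
  bias + sum(weights * x)
}

Plugging in the fitted coefficients from the table below as the weights (and the intercept as the bias) reproduces the linear model’s predictions.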

Notice that the weightings on the edges of the network are the same as the learned coefficients for the corresponding linear regression model. Indeed, we’ll fit the linear regressor using {tidymodels} and examine its coefficients.
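That fitting chunk isn’t shown in this rendering; a fit along these lines (assuming the data frame is named toy_data) produces the coefficient table below.

library(tidymodels)

# Fit the linear regression with the default lm engine
lin_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(num_response ~ x1 + x2 + x3, data = toy_data)

# Examine the fitted coefficients
tidy(lin_fit)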

term          estimate std.error   statistic   p.value
(Intercept) 24.9898479 0.3028626  82.5121574 0.0000000
x1          -0.0061934 0.0332745  -0.1861294 0.8523471
x2           4.9890067 0.0331808 150.3582597 0.0000000
x3           0.0094713 0.0330095   0.2869271 0.7741741

Logistic Regression as a Neural Network

Similar to linear regression, logistic regression can be expressed as a neural network with no hidden layers. Each predictor is assigned a weight (coefficient), and there is a bias term (the intercept).

Again, this network is the same as the corresponding logistic regression model. We can verify this by fitting that model using {tidymodels} and viewing the fitted coefficients.
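As before, the fitting chunk isn’t shown; a sketch like this (again assuming a data frame named toy_data) yields the coefficient table below. Note that {parsnip} expects a factor outcome for classification, hence the conversion.

# Fit the logistic regression with the default glm engine
log_fit <- logistic_reg() |>
  set_engine("glm") |>
  fit(cat_response ~ x1 + x2 + x3,
      data = toy_data |> mutate(cat_response = factor(cat_response)))

# Examine the fitted coefficients
tidy(log_fit)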

term          estimate std.error   statistic   p.value
(Intercept) -0.0331366 0.0704517  -0.4703453 0.6381084
x1           0.2224139 0.0084865  26.2078822 0.0000000
x2          -0.3326046 0.0088907 -37.4104458 0.0000000
x3          -0.0038466 0.0079976  -0.4809730 0.6305357

Here the coefficients of the logistic regression model are similar, but not identical, to the weights in the neural network. There is one additional intricacy in how the weighted predictor values and bias are converted into a response for a neural network: the presence of an activation function, which may perform an additional transformation prior to prediction (indeed, activation functions will appear throughout the networks in this notebook). We’ll discuss activation functions in the next notebook.
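For this network the relevant activation is the sigmoid, which squashes the weighted sum into a probability. A quick sketch, using the fitted coefficients above and the first row of the toy data:

# Sigmoid activation: maps any real number into (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Weighted sum (bias plus weighted inputs) for the first toy-data row
z <- -0.0331 + 0.2224 * 2.3081 - 0.3326 * 7.3664 - 0.0038 * 6.2758

sigmoid(z)  # about 0.12, consistent with that row's class of 0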

Deep Learning Networks

The previous examples of linear regression and logistic regression as networks are useful in gaining some intuition as to what neural networks are, but they aren’t practical. Neural networks typically have at least one hidden layer – that is, a transformation layer between the input layer and the output layer. These layers can take lots of forms and they’ll largely dictate the type of network you are constructing. We’ll discuss types of layers shortly, but for now we’ll simply state that deep learning networks typically have at least two hidden layers.

Below are two neural networks. The first has a single hidden layer with four neurons, while the second network has two hidden layers with four neurons each. Generally these hidden layers may have many neurons (32 or more). Powers of two seem like common choices for the number of neurons in a hidden layer.

The networks above use fully connected (dense) hidden layers. This means that every neuron from the previous layer is connected to every neuron in the current layer. These dense layers are only one type of layer which we have access to when constructing these networks.
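In {keras}, the second of the networks above could be sketched roughly as follows. The layer sizes follow the description given earlier; the activation and optimizer choices are assumptions, since the original chunks aren’t shown.

library(keras)

# Two fully connected (dense) hidden layers with four neurons each,
# and a single output node for the numerical response
model <- keras_model_sequential() |>
  layer_dense(units = 4, activation = "relu", input_shape = 3) |>
  layer_dense(units = 4, activation = "relu") |>
  layer_dense(units = 1)

model |>
  compile(optimizer = "adam", loss = "mse")

Dropping the second layer_dense(units = 4, ...) line gives the first network, with a single hidden layer.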

Note: The networks shown here have not been optimized. Their parameters were set so that training stops very early, allowing the notebook to render relatively quickly.

Special Classes of Network

What has been described up to this point are basic neural networks for use in regression and classification applications. We can build more purposeful architectures with special classes of hidden layer to perform more nuanced tasks. In particular, we may want to build a network that includes recurrent layers for working with time series data or we may want to use convolutional layers for working on image classification problems.
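As a preview, here are hedged {keras} sketches of both layer classes. The shapes and sizes are illustrative assumptions, not values from this notebook.

# Recurrent: an LSTM layer over sequences of 20 time steps, 1 feature each
rnn <- keras_model_sequential() |>
  layer_lstm(units = 8, input_shape = c(20, 1)) |>
  layer_dense(units = 1)

# Convolutional: a small stack for 28x28 grayscale images, 10 classes
cnn <- keras_model_sequential() |>
  layer_conv_2d(filters = 8, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) |>
  layer_max_pooling_2d(pool_size = c(2, 2)) |>
  layer_flatten() |>
  layer_dense(units = 10, activation = "softmax")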

We’ll discuss those specific layer classes and architectures in class.

Summary

We’ve gotten a basic introduction to architectures for deep learning networks in this notebook. Hopefully the discussions here have been helpful for building your intuition regarding what a large-scale neural network may look like.