Purpose: The previous two notebooks discussed theoretical and foundational aspects of deep learning models, in particular which architectures and activation functions exist and, to a lesser extent, how to choose among them. In this notebook our goal is to actually build, assess, and use a deep learning network for image classification.

Data and Modeling

Since you set up TensorFlow in an earlier notebook, let’s load the {tidyverse}, {tensorflow}, {keras}, and {reticulate} libraries and get some data. We’ll use the Fashion MNIST data set. You can learn more about that data set from its official repository here.

library(tidyverse)
library(tensorflow)
library(keras)
library(reticulate)
use_virtualenv("mat434")

c(c(x_train, y_train), c(x_test, y_test)) %<-% keras::dataset_fashion_mnist()

x_train <- x_train/255
x_test <- x_test/255

labels_df <- tibble(label = seq(0, 9, 1),
                    item = c("Tshirt",
                             "Trousers",
                             "Pullover",
                             "Dress",
                             "Coat",
                             "Sandal",
                             "Shirt",
                             "Sneaker",
                             "Bag",
                             "AnkleBoot"))

rotate_img <- function(x){
  return(t(apply(x, 2, rev)))
}

In the code block above, we loaded the Fashion MNIST data, which comes already split into training and test sets. We then rescaled the pixel intensities from integers between 0 and 255 to floating-point values between 0 and 1. We created a data frame of labels for convenience, since the labels in y_train and y_test are numeric only. Finally, we wrote a function to rotate the matrix of pixel intensities so that the images appear upright when we plot them. This matters for us humans but makes no difference to the neural network we’ll be training.
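To make the rescaling and the rotation concrete, here is a minimal sketch on made-up values (the vector `pixels` and the toy matrix `m` are illustrative and not part of the data set). It suggests that rotate_img() turns a matrix 90 degrees clockwise, which compensates for the way image() draws matrix rows along the horizontal axis starting from the bottom-left corner.

```r
# Same helper as defined earlier in the notebook
rotate_img <- function(x) {
  t(apply(x, 2, rev))
}

# Rescaling: dividing 0-255 integers by 255 gives values in [0, 1]
pixels <- c(0, 51, 255)   # illustrative pixel values
scaled <- pixels / 255
scaled

# Rotation on a toy 2 x 3 matrix
m <- matrix(1:6, nrow = 2)
m
rotated <- rotate_img(m)
rotated   # 3 x 2: the first column of m, read bottom to top, becomes the first row
```

Applying rotate_img() to a 28 x 28 image works the same way, only on a larger matrix.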

Let’s take a look at a few items and their labels.

item_num <- 4
image(rotate_img(x_train[item_num, , ]))

labels_df %>%
  filter(label == y_train[item_num])
## # A tibble: 1 × 2
##   label item 
##   <dbl> <chr>
## 1     3 Dress

Okay – I’m having a difficult time identifying these items. Can we train a sequential neural network to learn the classes?

model <- keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%
  layer_dense(128, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(10)

model
## Model: "sequential"
## ________________________________________________________________________________
##  Layer (type)                       Output Shape                    Param #     
## ================================================================================
##  flatten (Flatten)                  (None, 784)                     0           
##  dense_1 (Dense)                    (None, 128)                     100480      
##  dropout (Dropout)                  (None, 128)                     0           
##  dense (Dense)                      (None, 10)                      1290        
## ================================================================================
## Total params: 101770 (397.54 KB)
## Trainable params: 101770 (397.54 KB)
## Non-trainable params: 0 (0.00 Byte)
## ________________________________________________________________________________
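The parameter counts in the summary above can be checked by hand. A dense layer has one weight per input-unit pair plus one bias per unit, while the flatten and dropout layers have no parameters. The helper `dense_params()` below is a hypothetical name introduced only for this sketch.

```r
# Dense layer parameters = inputs * units (weights) + units (biases)
dense_params <- function(n_inputs, n_units) {
  n_inputs * n_units + n_units
}

n_pixels <- 28 * 28                       # flatten: 784 values, no parameters
hidden   <- dense_params(n_pixels, 128)   # 100480
output   <- dense_params(128, 10)         # 1290
total    <- hidden + output               # 101770

c(flattened = n_pixels, hidden = hidden, output = output, total = total)
```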

We have a model with over 100,000 parameters! Because random weights are initially set for each of these, we can use the model straight “out of the box” for prediction. We shouldn’t expect the network to perform very well though.

predictions <- predict(model, x_train[1:2, , ])
## 1/1 - 0s - 116ms/epoch - 116ms/step
#Predictions as a vector of log-odds
predictions
##           [,1]      [,2]       [,3]       [,4]      [,5]        [,6]      [,7]
## [1,] 0.8857768 0.5447693 -0.1722238 -0.5263715 0.4813388  0.17510255 0.7504405
## [2,] 0.2289805 1.3005552  0.1945747 -0.1561988 0.6584888 -0.05388436 0.2360167
##            [,8]       [,9]    [,10]
## [1,] -0.7732127 -1.3050996 0.599178
## [2,] -0.6014200 -0.9070991 1.621396
#Predictions as class-membership probabilities
tf$nn$softmax(predictions)
## tf.Tensor(
## [[0.1856365  0.13199751 0.06444356 0.04522465 0.12388484 0.09120559
##   0.16213903 0.03533242 0.02075763 0.13937827]
##  [0.07328597 0.21399312 0.07080738 0.04985854 0.11260402 0.05522989
##   0.07380344 0.0319435  0.02353031 0.29494383]], shape=(2, 10), dtype=float64)
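The conversion from raw model outputs to probabilities can be reproduced in base R. The sketch below writes softmax by hand (the function name `softmax` and the vector `logits_1` are introduced here for illustration) and applies it to the first row of raw outputs printed above. Up to rounding, it should recover the first row of the tf$nn$softmax() result.

```r
# Softmax: exponentiate the logits, then normalize so they sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))   # subtracting the max avoids overflow; the result is unchanged
  e / sum(e)
}

# Raw outputs for the first training image, copied from the printed output above
logits_1 <- c(0.8857768, 0.5447693, -0.1722238, -0.5263715, 0.4813388,
              0.17510255, 0.7504405, -0.7732127, -1.3050996, 0.599178)

probs_1 <- softmax(logits_1)
round(probs_1, 4)
sum(probs_1)             # sums to 1 (up to floating-point error)
which.max(probs_1) - 1   # most likely label (0-based) for this untrained model
```

Because softmax preserves the ordering of its inputs, the most likely class can be read from either the raw outputs or the probabilities.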

Let’s define a loss function so that we can train the model by optimizing the loss.

loss_fn <- loss_sparse_categorical_crossentropy(from_logits = TRUE)
loss_fn(y_train[1:2], predictions)
## tf.Tensor(2.2919748936865756, shape=(), dtype=float64)
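The loss value above can be reproduced by hand. Sparse categorical cross-entropy is the average negative log-probability that the model assigns to the true class of each image. The sketch below reuses the softmax probabilities printed earlier and assumes that the first two training labels are 9 and 0. Those labels are not printed in this notebook, so treat them as an assumption and confirm with y_train[1:2].

```r
# Softmax probabilities for the first two training images (from the output above)
probs <- rbind(
  c(0.1856365, 0.13199751, 0.06444356, 0.04522465, 0.12388484,
    0.09120559, 0.16213903, 0.03533242, 0.02075763, 0.13937827),
  c(0.07328597, 0.21399312, 0.07080738, 0.04985854, 0.11260402,
    0.05522989, 0.07380344, 0.0319435, 0.02353031, 0.29494383)
)

# ASSUMPTION: true labels of the first two images (0-based class ids)
labels <- c(9, 0)

# Probability given to the true class of each image (+1 for R's 1-based indexing)
p_true <- probs[cbind(seq_along(labels), labels + 1)]

# Cross-entropy: mean negative log-probability of the true class
loss <- mean(-log(p_true))
loss   # approximately 2.292
```

A value near log(10), roughly 2.30, is what we would expect from a model that spreads its probability almost evenly over 10 classes.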

Before training, we’ll need to set the optimizer, assign the loss function, and define the performance metric. We’ll then compile the model with these attributes.

model %>%
  compile(
    optimizer = "adam",
    loss = loss_fn,
    metrics = "accuracy"
  )

Note that, unlike most operations in R, compile() updates the model object here without us explicitly overwriting it. This is because the R object is a reference to a model that lives in the Python environment (accessed via reticulate), and that underlying Python object is modified in place.
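As a loose analogy in base R (this is not keras code), environments are also modified in place, while ordinary lists are copied. The helpers `configure_env()` and `configure_list()` are hypothetical names used only for this sketch.

```r
# Analogy only: environments are changed in place, lists are copied on modification
configure_env <- function(obj) {
  obj$compiled <- TRUE   # changes the caller's environment
  invisible(obj)
}
configure_list <- function(obj) {
  obj$compiled <- TRUE   # changes only a local copy
  invisible(obj)
}

env_model <- new.env()
env_model$compiled <- FALSE
list_model <- list(compiled = FALSE)

configure_env(env_model)     # result not assigned, yet env_model changes
configure_list(list_model)   # result not assigned, so list_model is unchanged

env_model$compiled    # TRUE
list_model$compiled   # FALSE
```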

Since the model has been compiled, we are ready to train it. Again, we won’t have to explicitly overwrite the model, since fit() also updates the underlying Python object in place.

model %>% fit(x_train, 
              y_train, 
              epochs = 5)
## Epoch 1/5
## 1875/1875 - 5s - loss: 0.5307 - accuracy: 0.8128 - 5s/epoch - 3ms/step
## Epoch 2/5
## 1875/1875 - 5s - loss: 0.3988 - accuracy: 0.8558 - 5s/epoch - 3ms/step
## Epoch 3/5
## 1875/1875 - 5s - loss: 0.3652 - accuracy: 0.8666 - 5s/epoch - 2ms/step
## Epoch 4/5
## 1875/1875 - 5s - loss: 0.3429 - accuracy: 0.8747 - 5s/epoch - 2ms/step
## Epoch 5/5
## 1875/1875 - 5s - loss: 0.3308 - accuracy: 0.8781 - 5s/epoch - 2ms/step
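The 1875 at the start of each log line is the number of mini-batches processed per epoch, not the number of images. Assuming the default batch size of 32 (we did not set batch_size ourselves), the arithmetic can be sketched as follows. The same calculation gives the 313 steps reported when the model is evaluated on the 10,000 test images.

```r
n_train    <- 60000   # Fashion MNIST training images
n_test     <- 10000   # Fashion MNIST test images
batch_size <- 32      # assumed default, since batch_size was not specified

steps_train <- ceiling(n_train / batch_size)   # 1875
steps_test  <- ceiling(n_test / batch_size)    # 313

c(train = steps_train, test = steps_test)
```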

Now let’s evaluate our model performance.

model %>%
  evaluate(x_test, y_test, verbose = 2)
## 313/313 - 1s - loss: 0.3537 - accuracy: 0.8727 - 861ms/epoch - 3ms/step
##      loss  accuracy 
## 0.3536686 0.8727000

We got about 87% accuracy on the test set (and about 88% on the training data in the final epoch) with a pretty vanilla and shallow neural network. There was only one hidden layer here, with 20% dropout. We didn’t tune any model hyperparameters and only trained over 5 epochs. We can see that loss was still decreasing and accuracy was still climbing from one epoch to the next, which suggests that training for more epochs could help.

Since our model has been trained, we can use it to make predictions again.

predictions <- model %>%
  predict(x_test[1:5, , ])
## 1/1 - 0s - 53ms/epoch - 53ms/step
tf$nn$softmax(predictions)
## tf.Tensor(
## [[2.09221329e-06 8.33838061e-07 9.79384697e-08 1.32954037e-07
##   2.26237622e-06 2.55798039e-03 7.94836655e-07 2.35516503e-02
##   1.96899985e-04 9.73687255e-01]
##  [1.49911500e-04 6.05002471e-10 9.85528367e-01 2.09960876e-06
##   1.50449845e-03 1.20536686e-10 1.28150711e-02 6.92770693e-12
##   5.09815078e-08 7.54979788e-10]
##  [1.11819469e-09 9.99999999e-01 3.48303600e-14 1.15882631e-10
##   3.07909090e-11 1.88697026e-17 1.47505442e-11 5.14895254e-20
##   3.99406911e-12 9.20255718e-17]
##  [3.11891263e-09 9.99999925e-01 6.16965867e-12 7.04174525e-08
##   4.94652047e-10 2.53157468e-15 4.62908491e-10 4.14518466e-17
##   2.86259153e-11 5.24266162e-14]
##  [1.80683811e-01 1.33260846e-05 5.05038832e-02 5.35679100e-03
##   2.01916857e-02 7.39189483e-06 7.38544857e-01 7.56533411e-06
##   4.68116116e-03 9.52778394e-06]], shape=(5, 10), dtype=float64)
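To read these probabilities as class labels, take the position of the largest value in each row and subtract 1, since the labels run from 0 to 9. The sketch below does this in base R on the probabilities above, rounded to four decimals for readability. The names `probs`, `items`, and `pred_label` are introduced here for illustration.

```r
# Probabilities from the output above, rounded to 4 decimals
probs <- rbind(
  c(0, 0, 0, 0, 0, 0.0026, 0, 0.0236, 0.0002, 0.9737),
  c(0.0001, 0, 0.9855, 0, 0.0015, 0, 0.0128, 0, 0, 0),
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0.1807, 0, 0.0505, 0.0054, 0.0202, 0, 0.7385, 0, 0.0047, 0)
)

# Item names in label order, as in labels_df
items <- c("Tshirt", "Trousers", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "AnkleBoot")

# Position of the largest probability in each row; subtract 1 for 0-based labels
pred_label <- apply(probs, 1, which.max) - 1
pred_label              # 9 2 1 1 6
items[pred_label + 1]   # item names for the five test images
```

Note that the maximum is taken across each row (one row per image). Taking it down each column would instead report, for every class, which image scored highest.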

We can update our model so that it will provide class predictions rather than just the class membership probabilities.

class_model <- keras_model_sequential() %>%
  model() %>%
  layer_activation_softmax() %>%
  # Take the argmax across the 10 classes (axis 1), not across the batch (axis 0)
  layer_lambda(function(x) tf$argmax(x, axis = 1L))

class_model %>%
  predict(x_test[1:5, , ])
## 1/1 - 0s - 68ms/epoch - 68ms/step
## [1] 9 2 1 1 6

Now that we’ve trained and assessed one neural network, go back and change your model. Add hidden layers to make it a true deep learning model. Experiment with the dropout rate or activation functions. Just remember that you’ll need 10 neurons in your output layer since we have 10 classes, and that the output layer should keep returning raw logits (no activation function) as long as the loss is defined with from_logits = TRUE. Softmax is then applied to those logits to obtain class probabilities, since we are working on a multiclass classification problem. Everything else (other than the input shape) is fair game to change though!

Summary

In this notebook we used TensorFlow from R to build and assess a shallow neural network to classify clothing items from very pixelated images. The images were \(28\times 28\). We saw that even a “simple” neural network was much better at predicting the class of an item from its pixelated image than we are as humans.