Build Your Own Convolutional Neural Network

Coral reefs are among the most diverse and valuable ecosystems on Earth, providing habitat for 25% of all marine species and supporting the livelihoods of over half a billion people worldwide. However, these ecosystems face unprecedented threats from climate change, ocean acidification, and other human activities, with many species now endangered.

We are given a dataset containing images of three different coral species:

Acropora cervicornis (Staghorn Coral)
Colpophyllia natans (Boulder Brain Coral)
Montastraea cavernosa (Greater Star Coral)

Our task is to build a Convolutional Neural Network (CNN) that can classify the coral images into the correct species. This technology can help automate coral reef monitoring efforts and support conservation initiatives by enabling rapid, large-scale species identification.

By the end of this exercise, you should be able to:

Import, explore, organize, visualize, and pre-process input image data
Build and train a CNN from scratch
Build and train a CNN based on the VGG19 architecture (transfer learning)
Use callbacks to optimize the model performance during training
Assess model accuracy numerically (e.g. precision, recall, F1-score) and visually (e.g. confusion matrix)

Note

In this section, long blocks of Python code are shown with line numbers on the left-hand side. The Python code blocks can be run in a Jupyter Notebook, interactive Python interpreter, or in a stand-alone Python script:

print('Hello, world!')
# Python code here

Linux commands are shown with a host name in square brackets followed by a dollar sign. Linux commands should be entered into a Terminal:

[frontera]$ echo "Hello, world!"
Hello, world!
[frontera]$ # Linux commands here

Tutorial Setup and Materials

All materials and instructions for running this tutorial in the TACC Analysis Portal are available in our GitHub repository: TACC Deep Learning Tutorials.

Part 1: Building a CNN Model from Scratch

Step 0: Check GPU Availability and TensorFlow Version

Before training deep learning models, it’s important to check whether TensorFlow can access the GPU on your machine. Training on a GPU is significantly faster than on a CPU, especially for large image datasets.

If you’ve followed the setup instructions from Introduction to Systems and Data: Set up for Tutorials, you should now be running this notebook inside a containerized Jupyer kernel that includes:

TensorFlow v. 2.13.0 with GPU support

CUDA libraries compatible with the system

All required Python packages pre-installed

This cell will confirm that your environment is correctly configured (TIP: Make sure you change your kernel to Day3-tf-213).

import tensorflow as tf

# Check if TensorFlow can detect a GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Print TensorFlow version
print(tf.__version__)

# Set random seed for reproducibility
tf.random.set_seed(123)

You should see the following output:

Num GPUs Available:  4
2.13.0

Step 1: Data Loading and Organization

In this step, we load all coral images from the dataset directory and organize them into a DataFrame. Each image is assigned a label based on the name of the directory it’s stored in (i.e., ‘ACER’ - Acropora cervicornis, ‘CNAT’ - Colpophyllia natans, ‘MCAV’ - Montastraea cavernosa).

This DataFrame will serve as the foundation for splitting our data into training, validation, and test sets later in the tutorial.

1.1 List Dataset Directory Contents

Before loading the images, we first want to inspect the directory structure to make sure everything is in the right place.

Finding Your SCRATCH Directory Path:

On TACC systems, your scratch directory is a temporary storage space for computational work. To find the path to your scratch directory:

After logging into Frontera, run this command to see your SCRATCH path:

[frontera]$ echo $SCRATCH
# This will output something like: /scratch1/12345/username

Verify that the coral-species dataset is in the correct location:

[frontera]$ ls $SCRATCH/tacc-deep-learning-tutorials/data/coral-species
# You should see three directories: ACER, CNAT, and MCAV

Use the full path shown by these commands in the code below.

Now that you know your SCRATCH path, let’s list the contents of the coral-species data directory to verify that the subdirectories for each coral species are present and correctly named:

from pathlib import Path

# Define the path to the dataset directory
# NOTE: Replace the path below with the full path to your scratch directory containing the training materials
dataset_dir = Path('/scratch1/12345/username/tacc-deep-learning-tutorials/data/coral-species')

# List the contents of the data directory
print(list(dataset_dir.iterdir()))

You should see something like this:

[PosixPath('/scratch1/12345/username/tacc-deep-learning-tutorials/data/coral-species/CNAT'), PosixPath('/scratch1/12345/username/tacc-deep-learning-tutorials/data/coral-species/MCAV'), PosixPath('/scratch1/12345/username/tacc-deep-learning-tutorials/data/coral-species/ACER')]

1.2 Check File Extensions

Next, we scan the dataset directory and all its subdirectories to find out what types of image files are present. This helps us catch unexpected or unsupported file types (e.g., GIFs, txt files, etc.), which could cause problems later when loading images.

This also allows us to see if the images are all in the same format or not.

# Recursively list all files under the dataset directory
image_files = list(dataset_dir.rglob("*"))

# Extract and print the unique file extensions
# This helps us confirm that only valid image files are present
extensions = set(p.suffix.lower() for p in image_files if p.is_file())
print("File extensions found:", extensions)

Question: What file extensions are present in the dataset? Write down your answer.

1.3 Explore Image Dimensions and Color Modes

Before feeding images into a CNN, it’s important to understand the basic properties of the dataset. In this step, we examine the dimensions (width x height) as well as the color mode (e.g., RGB, RGBA, grayscale) of each image. This helps us decide if we need to resize or convert images before we begin training our CNN.

The script below prints a summary and gives recommendations if inconsistencies are found.

from PIL import Image
from pathlib import Path
from collections import Counter

def explore_image_dataset(dataset_dir):
    """
    Explore basic properties of images: size and color mode.
    """
    print("Starting image dataset exploration...\n")

    # Gather all .jpg files in the dataset
    image_files = list(Path(dataset_dir).rglob('*.jpg'))
    print(f"Found {len(image_files)} image files\n")

    # Track sizes and color modes
    image_sizes = []
    color_modes = []

    print("Checking image dimensions and color modes...\n")
    for img_path in image_files:
        with Image.open(img_path) as img:
            image_sizes.append(img.size)
            color_modes.append(img.mode)

    # Summarize image sizes
    size_counts = Counter(image_sizes)
    print("=== Image Sizes ===")
    print(f"Found {len(size_counts)} unique image sizes:")
    for size, count in size_counts.most_common():
        print(f"- {size}: {count} images")

    # Summarize color modes
    mode_counts = Counter(color_modes)
    print("\n=== Color Modes ===")
    print(f"Found {len(mode_counts)} unique color modes:")
    for mode, count in mode_counts.most_common():
        print(f"- {mode}: {count} images")

    # Simple recommendations
    print("\n=== Recommendations ===")
    if len(size_counts) > 1:
        print(f"Images have different sizes. Consider resizing.")
    else:
        print("All images are the same size.")

    if len(mode_counts) > 1:
        print("Images have different color modes. Consider converting to RGB.")
    else:
        print("All images share the same color mode.")

# Run the function
explore_image_dataset(dataset_dir)

Our dataset analysis reveals some important characteristics that we’ll need to keep in mind as we proceed with the tutorial:

Image Size Variation: We have 417 total images in our dataset, with 63 different image sizes (dimensions). Also notice that some images are in portrait orientation (height > width) while others are landscape (width > height). CNNs expect all images to have the same dimensions, so we’ll need to resize them to a standard size before training our model.

Color Mode: All images share the same color mode. Great!

We will address these issues in Step 4 when we prepare our data for input into the CNN.

1.4 Check for Corrupted Images

Before continuing, we want to make sure that all images files are readable. Corrupted files can break your model training or cause unexpected errors during preprocessing.

In this step, we:

Attempt to open each ‘.jpg’ file using PIL

Discard any files that fail to load

This ensures we only keep clean, valid images for training.

from PIL import Image
from tqdm import tqdm

# Find all .jpg files in the dataset
# NOTE: add the correct file extension(s) for your image dataset in the space indicated below
# TIP: see Step 1.2
image_paths = list(dataset_dir.rglob('*.___'))

# Create lists to store valid and corrupted files
valid_images = []
bad_images = []

print("Checking for corrupted images...\n")

# tqdm adds a progress bar to show how long the process will take
for path in tqdm(image_paths):
    try:
        # Try to open and verify the image
        with Image.open(path) as img:
            img.verify()
        # If the image is valid, add it to valid_images
        valid_images.append(path)

    except Exception:
        # If any error occurs while opening/verifying the image, add it to bad_images
        bad_images.append(path)

print(f"Valid images: {len(valid_images)}")
print(f"Corrupted images removed: {len(bad_images)}")

If there are any corrupted images in your dataset, this code will automatically remove them.

1.5 Create a DataFrame of Image Paths and Labels

Now that we have taken a peek at the format of our data and have removed any corrupted images, we can start setting up our data for training. In this step, we build a pandas.DataFrame that organizes all the image data into two columns:

filepath: The full path to each image file

label: The class label for each image, taken from the directory name

This structured DataFrame is essential for training with Keras’ flow_from_dataframe method that we’ll use later in the tutorial.

import pandas as pd

# Set pandas to display full column content (no truncation)
pd.set_option('display.max_colwidth', None)

# Build (filepath, label) pairs from valid image paths
data = []
for path in valid_images:
    label = path.parent.name # Extract label from directory name
    data.append((str(path), label))

# Create a DataFrame with columns for filepath and label
df = pd.DataFrame(data, columns=["filepath", "label"])

# Show a preview of the DataFrame
df.head()

Step 2: Visualize the Data

2.1 Visualize the Class Distribution

Before training our CNN, it’s important to understand how many images we have for each class (i.e., coral species in this case).

In this step we:

Count how many images belong to each class

Plot the class distribution as a pie chart and bar graph

If the dataset is imbalanced (i.e., some classes have far more images than others), we may need to account for this later using class weights.

import matplotlib.pyplot as plt

# Count class distribution (counts how many times each unique value appears in the 'label' column of your DataFrame)
counts = df['label'].value_counts()

# Create a figure with two plots side-by-side (1-row, 2-columns; 12 inches wide, 5 inches tall)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Define a color palette
colors = ['#8158ff', '#ff9423', '#7fcdbb']

# Create a Pie chart in the first plot position (axes[0])
## counts.values: The number of images for each class
## counts.index: The class labels (e.g., 'ACER', 'CNAT', 'MCAV')
## autopct='%1.1f%%': Display the percentage of images for each class
## colors: The colors to use for each class (defined earlier)
axes[0].pie(counts.values, labels=counts.index, autopct='%1.1f%%', colors=colors)
axes[0].set_title('Class Distribution (Percentage)')

# Creates a Bar chart in the second plot position (axes[1])
axes[1].bar(counts.index, counts.values, color=colors)
axes[1].set_title('Class Distribution (Values)')
axes[1].set_ylabel('Number of Images')

# Display the figure with both charts
plt.show()

# Print label counts and percentages
for label, count in counts.items():
    print(f"{label}: {count} images ({count/len(df)*100:1.2f}%)")

Thought Challenge: Describe the class distribution in your own words. How much of the dataset is made up by the largest class? The smallest class? Is there anything that we need to address before continuing?

2.2 Visualize Images from the Dataset

It’s helpful to look at a few images from each class to get a better understanding of the dataset. This will give us a better sense of:

What each coral species looks like
How much visual variation exists within each class (e.g., different angles, lighting, etc.)
Whether the dataset includes noise, blur, or other artifacts

We’ll display a grid of randomly selected images, grouped by class.

from tensorflow.keras.preprocessing.image import load_img
import random

# Set the number of images to display per class
samples_per_class = 3

# Get list of unique coral species names (classes)
classes = df['label'].unique()

# Create a figure with appropriate size
# The height (2.5 * len(classes)) ensures enough space for all images
plt.figure(figsize=(12, len(classes) * 2.5))

# Loop through each class to create a grid of images
for i, label in enumerate(sorted(classes)):
    # Filter DataFrame to get only images from the current class
    class_df = df[df['label'] == label]

    # Randomly select 3 images from the current class
    sample_paths = random.sample(list(class_df['filepath']), samples_per_class)

    # Create subplot for each image
    for j, img_path in enumerate(sample_paths):

        # Calculate position in grid: (row * width) + column + 1
        plt.subplot(len(classes), samples_per_class, i * samples_per_class + j + 1)

        # Load and display the image
        img = load_img(img_path)        # Load the image
        plt.imshow(img)                 # Display the image
        plt.title(label)                # Add species name as title
        plt.axis('off')

plt.show()

Remember: the quality of a machine learning model is decided largely by the quality of the dataset it was trained on!

Step 3: Split the Dataset and Handle Class Imbalance

3.1 Split the Dataset into Training, Validation, and Test Sets

We are now ready to split our labeled image dataset into three parts:

Training Set: Used to train the model

Validation Set: Used to tune hyperparameters and monitor model performance during training

Test Set: Used to evaluate the final model’s performance after training is complete

We will use the train_test_split function from sklearn in two stages:

First, we split the original dataset into training + test sets

Then, we split the training set again into training + validation

This approach ensures that our CNN never sees the test set during training, which is important for obtaining an unbiased estimate of the model’s performance.

To preserve the class distribution across splits, we use stratify=df["label"] to ensure each split has the same proportion of each class as in the original dataset. This is called stratified sampling.

# NOTE: Replace the spaces indicated below with your code
from sklearn.model_selection import ____

# First, split the original dataset into training + test sets
train_df, test_df = train_test_split(
    df,                            # This is our DataFrame from step 1.5
    test_size=____,                # Keep 20% of the data in the test set
    stratify=df["label"],          # Ensure each split maintains original class distribution
    random_state=123               # Set random seed for reproducibility
)

# Then, split the training set into training + validation sets
____, ____ = train_test_split(
    ____,                          # What goes here?
    test_size=____,                # Keep 20% of the training data in the validation set
    stratify=____,                 # Ensure each split maintains original class distribution
    random_state=123               # Set random seed for reproducibility
)

# Print split sizes
total = len(df)
print(f"\nDataset splits:")
print(f"Train: {len(train_df)} images ({len(train_df)/total:.2%})")
print(f"Validation: {len(val_df)} images ({len(val_df)/total:.2%})")
print(f"Test: {len(test_df)} images ({len(test_df)/total:.2%})")

3.2 Compute Class Weights

If our dataset is imbalanced (i.e., some classes have many more images than others), the model may learn to favor those majority classes. To address this, we can compute class weights based on the training data using the compute_class_weight function from sklearn.

These weights:

Assign higher importance to underrepresented classes

Are passed into model.fit() using the class_weight argument

Adjust how the loss is calculated during training

This technique helps the model give balanced attention to all classes during training.

While our dataset is quite balanced, we provide the code for computing class weights below:

from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# Get unique class labels
class_labels = np.unique(train_df['label'])

# Compute class weights based on training labels
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=class_labels,
    y=train_df['label']
)

# Convert to a dictionary: {index: weight}
class_weight_dict = dict(zip(range(len(class_labels)), class_weights))

# Preview the result
print("Computed class weights:")
for index, weight in class_weight_dict.items():
    print(f"{index}: {weight:.2f}")

Computed class weights:
1.02
1.08
0.91

In the above output, 0 corresponds to ACER, 1 corresponds to CNAT, and 2 corresponds to MCAV. The class weights are inversely proportional to the number of samples in each class: classes with fewer samples get higher weights to compensate for their lower representation in the training data.

We need to convert the string labels (like ACER, CNAT, and MCAV) to integers (0, 1, 2) because the model expects numeric class indices. The class_weight_dict is a dictionary that maps each class index to its corresponding weight.

Thought Challenge: Look back at the pie chart and bar chart that we generated above. Do the class weights make sense? Why or why not?

The class weights make sense because the class with the fewest samples (CNAT) has the highest weight (1.08), while the class with the most samples (MCAV) has the lowest weight (0.91). This means that the model will pay more attention to the CNAT class during training, which has fewer samples.

Step 4: Image Preprocessing and Data Generators

As we discovered in Step 1.3, we need to prepare our images before feeding them into the CNN. This step involves two key concepts: Data Generators and Data Augmentation.

Data generators are special tools that help us efficiently load and preprocess image data in small batches (instead of all at once). Keras provides a built-in data generator called ImageDataGenerator that can:

Resize all images to a consistent size

Normalize pixel values (e.g., from [0-255] to [0-1])

Augment the training data with random transformations to improve generalization

Data generators can be used with Keras model methods like fit(), evaluate(), and predict(), which is particularly useful when dealing with large datasets that don’t all fit into memory at once.

Data augmentation is a powerful technique that helps our model learn more robust features by creating variations of our training images. Augmentation techniques not only expand the size of our training set, but also help prevent overfitting by exposing our model to different variations of our images.

Conveniently, ImageDataGenerator also provides a number of built-in augmentation techniques that we can use to augment our training data, such as:

Random rotations

Zooming in or out

Shifting the image left or right

Flipping the image horizontally

Each of these modifications creates a new, slightly different version of our training images, helping our model learn to recognize the same features in different orientations.

4.1 Define Image Preprocessing and Augmentation

We will define three separate ImageDataGenerator objects, one for each dataset split (train, val, test):

train_datagen will apply both normalization and augmentation to the training data

val_datagen and test_datagen will only apply normalization (no augmentation)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define training data generator
train_datagen = ImageDataGenerator(
    rescale=1./255,             # Normalize pixel values to [0, 1]
    rotation_range=30,          # Augment: Random rotation
    width_shift_range=0.2,      # Augment: Random horizontal shift
    height_shift_range=0.2,     # Augment: Random vertical shift
    zoom_range=0.2,             # Augment: Random zoom
    horizontal_flip=True,       # Augment: Random horizontal flip
    fill_mode='nearest'         # Augment: After random transformations, fill in missing pixels with nearest neighbor
)

# Validation and test data generators only need normalization – do not augment
val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

4.2 Load Images Using `flow_from_dataframe()`

Now that our preprocessing methods are defined, we can use flow_from_dataframe() to load images in batches directly from our labeled Dataframes (train_df, val_df, and test_df).

All generators return batches of preprocessed image tensors and their corresponding labels.

# Set image size and batch size
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

# Training generator
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_df,         # Our training DataFrame
    x_col='filepath',           # Column containing image paths
    y_col='label',              # Column containing labels
    target_size=IMAGE_SIZE,     # Resize images to this size
    batch_size=BATCH_SIZE,      # Number of images per batch
    class_mode='categorical',   # One-hot encode the labels
    color_mode='rgb',           # Use RGB color channels
    shuffle=True,                # Randomize order of images
    seed=123                    # Set random seed for reproducibility
)

# Validation generator
val_generator = val_datagen.flow_from_dataframe(
    # ... same parameters as above ...
    shuffle=False,               # Keep original order for validation
    seed=123                    # Set random seed for reproducibility
)

# Test generator
test_generator = test_datagen.flow_from_dataframe(
    # ... same parameters as above ...
    shuffle=False,               # Keep original order for testing
    seed=123                    # Set random seed for reproducibility
)

Sanity Check: Inspect a Batch from the Training Generator

Let’s inspect the output of the train_generator to make sure it’s working as expected.

In the code below, we:

Retrieve one batch of images and labels from the training generator

Check the shape of the batch

Display a few image-label pairs to confirm the generator is working

# Get one batch from the training generator
images, labels = next(train_generator)

# Check the shape of the batch
print("Image batch shape:", images.shape)    # Should be (BATCH_SIZE, height, width, channels)
print("Label batch shape:", labels.shape)    # Should be (BATCH_SIZE, num_classes)

# Preview the first 5 label vectors
print("\nFirst 5 labels (one-hot encoded):")
print(labels[:5])

Visualize a Few Images from the Training Generator

Let’s display a few images from the training geneator along with their decoded class labels.

# Get a fresh batch of images
images, labels = next(train_generator)

# Display 6 images in a grid
plt.figure(figsize=(12, 6))

# Show each image
for i in range(6):
    plt.subplot(2, 3, i + 1)

    # Get the species name
    species_names = list(train_generator.class_indices.keys())
    species = species_names[np.argmax(labels[i])]

    # Show the image
    plt.imshow(images[i])
    plt.title(f"Species: {species}")
    plt.axis("off")

plt.show()

Thought Challenge: Look carefully at the images displayed above. Try running the code cell multiple times and changing the code to display images from the validation and test generators. What do you notice about the images that you didn’t see before (in Step 3)? Do you notice any differences in the images each time you run the cell? Think about why this might be happening.

Step 5: Define Your CNN Model Architecture

Congratulations! Our data is now ready to be used to train a Convolutional Neural Network to classify our coral images.

In this step, we will define the architecture of our CNN model. Below, we define a model that consists of three main parts:

Convolutional Blocks (Feature Extraction):

Block 1: 32 filters (3x3 kernels), followed by Average Pooling

Block 2: 64 filters (3x3 kernels), followed by Average Pooling

Block 3: 128 filters (3x3 kernels), followed by Average Pooling

Each block increases the number of filters, allowing the model to learn increasingly complex features.

Flatten Layer: Converts the 3D feature maps into a 1D vector for the dense layers

Dense Layers (Classification):

First dense layer: 128 perceptrons

Second dense layer: 64 perceptrons

Output layer: How many perceptrons should our output layer have? Which activation function should we use?

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense

# Build a custom CNN architecture
cnn_model = Sequential([
    # Input layer: matches image shape (height, width, channels)
    Input(shape=(___, ___, __)),

    # Convolution Block 1
    Conv2D(32, (3, 3), padding='same', activation='relu'),
    AveragePooling2D((2, 2), padding='same'),

    # Convolution Block 2
    # ...
    # ...

    # Convolution Block 3
    # ...
    # ...

    # Flatten to convert 2D feature maps into a 1D vector
    Flatten(),

    # Fully connected layers
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(___, activation='___')
])

Once you have filled in the blanks and defined your model, let’s compile it:

from tensorflow.keras.optimizers import RMSprop

cnn_model.compile(
    optimizer=RMSprop(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

In the code above, we use the RMSprop optimizer, which adapts the learning rate based on recent gradients, and is a popular choice for image classification tasks. We also set the learning rate to 1e-4, which sets the initial learning rate for the optimizer.

Note: While these are good starting choices, you might want to experiment with different optimizers or learning rates based on your model’s performance.

Finally, let’s display our model architecture and parameter count:

cnn_model.summary()

Model: "sequential"

+--------------------------------+----------------------+-------------+
| Layer (type)                   | Output Shape         | Param #     |
+================================+======================+=============+
| conv2d (Conv2D)                | (None, 224, 224, 32) | 896         |
+--------------------------------+----------------------+-------------+
| average_pooling2d              | (None, 112, 112, 32) | 0           |
| (AveragePooling2D)             |                      |             |
+--------------------------------+----------------------+-------------+
| conv2d_1 (Conv2D)              | (None, 112, 112, 64) | 18,496      |
+--------------------------------+----------------------+-------------+
| average_pooling2d_1            | (None, 56, 56, 64)   | 0           |
| (AveragePooling2D)             |                      |             |
+--------------------------------+----------------------+-------------+
| conv2d_2 (Conv2D)              | (None, 56, 56, 128)  | 73,856      |
+--------------------------------+----------------------+-------------+
| average_pooling2d_2            | (None, 28, 28, 128)  | 0           |
| (AveragePooling2D)             |                      |             |
+--------------------------------+----------------------+-------------+
| flatten (Flatten)              | (None, 100352)       | 0           |
+--------------------------------+----------------------+-------------+
| dense (Dense)                  | (None, 128)          | 12,845,184  |
+--------------------------------+----------------------+-------------+
| dense_1 (Dense)                | (None, 64)           | 8,256       |
+--------------------------------+----------------------+-------------+
| dense_2 (Dense)                | (None, 3)            | 195         |
+--------------------------------+----------------------+-------------+

Total params: 12,946,883 (49.39 MB)

Trainable params: 12,946,883 (49.39 MB)

Non-trainable params: 0 (0.00 B)

Thought Challenge: Describe our CNN model architecture layer-by-layer.

First Convolutional Block
- Input: 224 x 224 RGB images
- conv2d: Creates 32 feature maps using 3x3 kernels -> Output shape maintains input size due to padding (224, 224, 32)
- average_pooling2d: Reduces spatial dimensions by half -> Output shape (112, 112, 32)
Second Convolutional Block
- conv2d_1: Creates 64 feature maps using 3x3 kernels -> Output shape maintains input size due to padding (112, 112, 64)
- average_pooling2d_1: Reduces spatial dimensions by half -> Output shape (56, 56, 64)
Third Convolutional Block
- conv2d_2: Creates 128 feature maps using 3x3 kernels -> Output shape maintains input size due to padding (56, 56, 128)
- average_pooling2d_2: Reduces spatial dimensions by half -> Output shape (28, 28, 128)
Classification Layers
- flatten: Converts 3D feature maps into a 1D vector -> Output shape (100352)
- dense: First dense layer with 128 perceptrons
- dense_1: Second dense layer with 64 perceptrons
- dense_2: Output layer with 3 perceptrons (one for each coral species)

Calculating Parameters in CNNs

For a convolutional layer, the number of parameters is determined by the size of the kernels, the number of input channels, and the number of kernels:

\[\text{Parameters} = (\text{Kernel width} \times \text{Kernel height} \times \text{Input channels} + 1) \times \text{Kernels}\]

Kernel width x Kernel height: The size of each kernel (e.g., a 3x3 kernel has 9 weights per input channel)
Input channels: The number of channels in the input image (e.g., 3 for RGB images)
+1: Each kernel also has 1 bias term.
kernels: The number of kernels you want to use. This is how many output channels you’ll get from this layer. Each kernel is applied independently and learns different features.

Thought Challenge: What is the formula for calculating the number of parameters in a dense layer? Can you correctly calculate the total number of parameters in our model? Write down each step of your calculation.

Convolutional Layers

First Conv2D:

3x3 kernel, 3 input channels (RGB), 32 kernels

(3 x 3 x 3 + 1) x 32 = 896 parameters

Second Conv2D:

3x3 kernel, 32 input channels, 64 kernels

(3 x 3 x 32 + 1) x 64 = 18,496 parameters

Third Conv2D:

3x3 kernel, 64 input channels, 128 kernels

(3 x 3 x 64 + 1) x 128 = 73,856 parameters

Dense Layers

Formula: (Input units x Output units) + Output units

First Dense:

100352 Input units (flattened), 128 Output units (one per perceptron)

(100352 x 128) + 128 = 12,845,184 parameters

Second Dense:

128 Input units, 64 Output units

(128 x 64) + 64 = 8,256 parameters

Output Dense:

64 Input units, 3 Output units

(64 x 3) + 3 = 195 parameters

Step 6: Train the CNN Model

Now that our CNN architecture is defined, we can train the model using the fit() method.

During training, the model will learn patterns in the training data and adjust its parameters to minimize the loss function. After each epoch, the model’s performance is evaluated on the validation set.

Here, we will also pass in class_weight to demonstrate how to handle imbalanced data.

We also track the training history, which we’ll use later to visualize performance over time.

cnn_history = cnn_model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=15,
    class_weight=class_weight_dict # Computed in Step 3.2
)

Example output:

Epoch 1/15
9/9 [==============================] - 5s 391ms/step - loss: 1.1136 - accuracy: 0.3195 - val_loss: 1.0908 - val_accuracy: 0.3731
Epoch 2/15
9/9 [==============================] - 4s 468ms/step - loss: 1.1017 - accuracy: 0.2932 - val_loss: 1.0911 - val_accuracy: 0.3582
Epoch 3/15
9/9 [==============================] - 4s 464ms/step - loss: 1.0936 - accuracy: 0.4023 - val_loss: 1.0823 - val_accuracy: 0.4627
Epoch 4/15
9/9 [==============================] - 4s 468ms/step - loss: 1.0937 - accuracy: 0.3534 - val_loss: 1.0762 - val_accuracy: 0.4328
Epoch 5/15
9/9 [==============================] - 4s 464ms/step - loss: 1.0869 - accuracy: 0.3985 - val_loss: 1.0671 - val_accuracy: 0.4627
...
Epoch 15/15
9/9 [==============================] - 4s 472ms/step - loss: 1.0374 - accuracy: 0.4662 - val_loss: 0.9806 - val_accuracy: 0.5672

Visualizing Training History

After training the model, we can visualize the accuracy and loss over time to better understand how the model is learning. These plots can help us identify overfitting, underfitting, or confirm that the model is learning as expected.

We use the cnn_history object returned by the fit() method to plot the training and validation accuracy and loss:

def plot_training_history(history, title_prefix="CNN"):
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)

    # Set color palette
    training_color = '#fc8d59'
    validation_color = '#91bfdb'

    # Plot accuracy
    plt.figure(figsize=(14, 5))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, color=training_color, linestyle='-', marker='o',
            label='Training Accuracy', linewidth=2)
    plt.plot(epochs, val_acc, color=validation_color, linestyle='-', marker='s',
            label='Validation Accuracy', linewidth=2)
    plt.title(f'{title_prefix} Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True, alpha=0.3)

    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, color=training_color, linestyle='-', marker='o',
            label='Training Loss', linewidth=2)
    plt.plot(epochs, val_loss, color=validation_color, linestyle='-', marker='s',
            label='Validation Loss', linewidth=2)
    plt.title(f'{title_prefix} Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

# Call the plotting function
plot_training_history(cnn_history)

The plots above show the training and validation accuracy/loss over 15 epochs.

Thought Challenge: What do you notice about the training and validation accuracy and loss? What does this tell you about the model’s learning performance (i.e. overfitting, underfitting, healthy learning)?

Step 7: Evaluate the Model on the Test Set

Now that we’ve trained our model, it’s time to evaluate its performance on the test set. This step is crucial because it helps us understand how well the model generalizes to new, unseen data, which is a good indicator of its real-world performance.

Evaluate Test Accuracy and Loss

We use model.evaluate() to calculate the test accuracy and loss. These metrics give us a quick overview of the model’s performance.

# Evaluate test accuracy and loss
test_loss, test_acc = cnn_model.evaluate(test_generator, verbose=0)
print(f"Test Accuracy: {test_acc:.2%}")
print(f"Test Loss: {test_loss:.4f}")

Example output:

Test Accuracy: 44.05%
Test Loss: 1.0581

Our model correctly classifies the test images about 35% of the time, and our loss is still quite high. While these numbers provide a snapshot of performance, they don’t tell the whole story. Let’s dig deeper with a confusion matrix.

Visualize Predictions with a Confusion Matrix

A confusion matrix provides a detailed breakdown of the model’s predictions compared to the true labels. It helps identify which classes are being confused with each other.

from sklearn.metrics import confusion_matrix
import seaborn as sns

# Get predicted probabilities for each class
pred_probs = cnn_model.predict(test_generator)

# Convert to predicted class labels
y_pred = np.argmax(pred_probs, axis=1)

# Get true labels
y_true = test_generator.classes

# Create confusion matrix
cm = confusion_matrix(y_true, y_pred)

# Map class indices back to names
class_names = list(test_generator.class_indices.keys())

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names,
            yticklabels=class_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.tight_layout()
plt.show()

Detailed Performance with a Classification Report

The classification report provides precision, recall, and F1-scores for each class, offering a more nuanced view of model performance.

from sklearn.metrics import classification_report

# Print classification report
print("Classification Report:")
print(classification_report(y_true, y_pred, target_names=class_names))

Example output:

Classification Report:
              precision    recall  f1-score   support

        ACER       0.50      0.52      0.51        27
        CNAT       0.42      0.50      0.46        26
        MCAV       0.40      0.32      0.36        31

    accuracy                           0.44        84
   macro avg       0.44      0.45      0.44        84
weighted avg       0.44      0.44      0.44        84

Click below to see a brief explanation of the metrics in the classification report.

Precision: The ratio of correctly predicted positive observations to the total predicted positives.

Formula: \(\frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}\)

Interpretation: High precision indicates a low false positive rate, which is useful when the cost of false positives is high.

Recall: The ratio of correctly predicted positive observations to all actual positives.

Formula: \(\frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}\)

Interpretation: High recall indicates a low false negative rate, which is useful when the cost of false negatives is high.

F1-score: The weighted average of precision and recall. It considers both false positives and false negatives.

Formula: \(2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)

Interpretation: The F1-score is useful when you need to balance precision and recall. It provides a single score that considers both false positives and false negatives.

Support: The number of actual occurrences of the class in the test data.

Thought Challenge: Critically assess the performance of our model based on the accuracy/loss values, confusion matrix, and classification report. Are there any classes that the model is particularly good or bad at predicting? Think about the data and why the model might be performing better or worse for certain classes.

Part 2: Transfer Learning with VGG19

In this section, we apply a technique called transfer learning to improve model performance on our coral species classification task.

Transfer learning is a deep learning technique where we reuse a model that has already been trained on a large dataset for a different but related task. Instead of starting from scratch, we “transfer” the knowledge learned by the pre-trained model to our new task.

This is especially useful when you have a limited dataset, you want to train a model faster, or you want to achieve better accuracy with less computational effort.

We will use the VGG19 model, a classic convolutional neural network architecture developed by researchers at Oxford University. It was trained on the ImageNet dataset, which contains over 14 million images across 1000 classes.

Step 1: Prepare Data for VGG19

1.1 Define Image Preprocessing and Augmentation

VGG19 expects input images to be preprocessed in a very specific way because of the way it was trained. We use the preprocess_input() function from tensorflow.keras.applications.vgg19 to preprocess our images. Specifically, this function converts RGB pixel values to the format VGG19 was originally trained on (i.e., channels in BGR order, zero-centered with respect to ImageNet).

Let’s create new data generators for VGG19 using ImageDataGenerator with:

preprocess_input for normalization

Augmentation on the training set

No augmentation on the validation and test sets

from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Constraints
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

# Define new ImageDataGenerators for VGG19
vgg19_train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

vgg19_val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
vgg19_test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

1.2 Load Images Using `flow_from_dataframe()`

Just like we did for our CNN model, we can use flow_from_dataframe() to load images in batches directly from our labeled Dataframes (train_df, val_df, and test_df).

# Assuming train_df, val_df, and test_df are defined
# Create training generator below
train_generator_vgg19 = _____

# Create validation generator below
val_generator_vgg19 = _____

# Create test generator below
test_generator_vgg19 = _____

Step 2: Define and Train the VGG19 Model

2.1 Load VGG19 Base Model and Stack a Custom Classifier

We now load the VGG19 base model, which has been pre-trained on ImageNet. We exclude the original classification head (include_top=False) and freeze all convolutional layers.

Next, we stack a custom classifier on top using Keras’ Sequential API:

Flatten the output of VGG19’s last convolutional layer
Add the same fully connected (dense) layers that we had in our original CNN built from scratch

from tensorflow.keras.applications import VGG19
from tensorflow.keras import Sequential
from tensorflow.keras.layers import ___, ___   # Import the necessary layers
from tensorflow.keras.optimizers import RMSprop

# Load VGG19 base (without top classifier)
vgg_base = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
vgg_base.trainable = False  # Freeze all pretrained layers

# Build the full model
VGG19_model = Sequential([
    vgg_base,
    # Add a flatten layer:
    # ... your code here ...

    # Then add our three dense layers:
    # ... your code here ...
    # ... your code here ...
    # ... your code here ...
])

Now, let’s compile the model with the same optimizer and loss function as our previous model.

# Compile with a low learning rate optimizer
VGG19_model.compile(
    # ... your code here ...
    # ... your code here ...
    # ... your code here ...
)

2.2 Define Training Callbacks

Next, let’s define some training callbacks. Callbacks are functions executed during training that allow the training process to change its behavior dynamically.

Some common callbacks include:

EarlyStopping: This callback stops training when a monitored metric (e.g., validation accuracy) stops improving. It helps prevent overfitting by halting training once the model’s performance plateaus.
ReduceLROnPlateau: This callback reduces the learning rate when a monitored metric (e.g., validation loss) stops improving. By lowering the learning rate, the model can converge to a better local minimum (preventing it from getting stuck in a suboptimal solution).

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Define callbacks
callbacks = [
    EarlyStopping(
        monitor='val_accuracy',      # Monitor validation accuracy
        patience=5,                  # Number of epochs to wait before stopping
        restore_best_weights=True    # Restore the best weights from the epoch with the highest validation accuracy
    ),
    ReduceLROnPlateau(
        monitor='val_loss',          # Monitor validation loss
        factor=0.5,                  # Reduce learning rate by 50%
        patience=3,                  # Number of epochs to wait before reducing learning rate
        min_lr=1e-6                  # Minimum learning rate
    )
]

# Train the model with callbacks
VGG19_history = VGG19_model.fit(
    train_generator_vgg19,
    validation_data=val_generator_vgg19,
    epochs=15,
    class_weight=class_weight_dict,
    callbacks=callbacks             # Pass the callbacks to the fit method
)

Example Output:

Epoch 1/15
9/9 [==============================] - 7s 379ms/step - loss: 4.3120 - accuracy: 0.5263 - val_loss: 0.4430 - val_accuracy: 0.8657 - lr: 1.0000e-04
Epoch 2/15
9/9 [==============================] - 5s 503ms/step - loss: 0.7271 - accuracy: 0.8346 - val_loss: 0.3540 - val_accuracy: 0.9104 - lr: 1.0000e-04
Epoch 3/15
9/9 [==============================] - 4s 479ms/step - loss: 0.7825 - accuracy: 0.8120 - val_loss: 0.4158 - val_accuracy: 0.8507 - lr: 1.0000e-04
Epoch 4/15
9/9 [==============================] - 4s 480ms/step - loss: 0.4168 - accuracy: 0.8684 - val_loss: 0.3532 - val_accuracy: 0.8806 - lr: 1.0000e-04
Epoch 5/15
9/9 [==============================] - 4s 474ms/step - loss: 0.5239 - accuracy: 0.8835 - val_loss: 0.4578 - val_accuracy: 0.8806 - lr: 1.0000e-04
Epoch 6/15
9/9 [==============================] - 4s 470ms/step - loss: 0.4025 - accuracy: 0.8835 - val_loss: 1.0588 - val_accuracy: 0.8209 - lr: 1.0000e-04
Epoch 7/15
9/9 [==============================] - 4s 488ms/step - loss: 0.5346 - accuracy: 0.8496 - val_loss: 0.2890 - val_accuracy: 0.9104 - lr: 1.0000e-04

Visualizing Training History

Just like we did for our first CNN model, let’s plot the training and validation performance over time.

Refer back to Section 1: Step 6 – Visualizing Training History for a refresher on how to do this.

# Plot for VGG19
plot_training_history(VGG19_history, title_prefix='VGG19')

Thought Challenge: Compare the performance of our VGG19 model to our previous CNN model. What are some major differences in the training curves?

Step 3: Evaluate the VGG19 Model on the Test Set

Just like we did for our first CNN model, let’s evaluate the VGG19 model on the test set.

Evaluate Test Accuracy and Loss

First, let’s calculate the test accuracy and loss. Can you recall how to do this?

# Evaluate test accuracy and loss
# ... your code here ...
# ... your code here ...
# ... your code here ...

Example output:

Test Accuracy: 86.90%
Test Loss: 0.9072

Our model correctly classifies the test images about 87% of the time. What an improvement!

Visualize Predictions with a Confusion Matrix

Now, let’s visualize the predictions of our VGG19 model on the test set with a confusion matrix.

Refer back to Section 1: Step 7 – Visualize Predictions with a Confusion Matrix for a refresher on how to do this.

# Get predicted probabilities for each class
vgg19_pred_probs = # ... your code here ...

# Convert to predicted class labels
vgg19_y_pred = np.argmax(vgg19_pred_probs, axis=1)

# Get true labels
vgg19_y_true = # ... your code here ...

# Create confusion matrix
cm = # ... your code here ...

# Map class indices back to names
class_names = # ... your code here ...

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names,
            yticklabels=class_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.tight_layout()
plt.show()

Notice how the confusion matrix shows a distinct diagonal pattern, where the true and predicted labels are the same more often than not? This indicates that our model is performing well on all classes. Nice!

Detailed Performance with a Classification Report

Finally, let’s print out the full classification report.

# Print the full classification report
# ... your code here ...
# ... your code here ...
# ... your code here ...

Example output:

   Classification Report (VGG19):
              precision    recall  f1-score   support

        ACER       1.00      0.89      0.94        27
        CNAT       0.76      0.96      0.85        26
        MCAV       0.89      0.77      0.83        31

    accuracy                           0.87        84
   macro avg       0.88      0.87      0.87        84
weighted avg       0.88      0.87      0.87        84

Thought Challenge: Compare the performance of our VGG19 model to our previous CNN model. What are some major differences in the classification report? Are there still any problematic classes that the model is struggling with? If so, what do you think is causing this?

Step 4: Visualize Predictions from the Test Set

First, let’s take the raw predictions from our VGG19 model and organize them into a pandas DataFrame with four columns:

Filepath: Where each image is located

True Label: The actual species of coral in the image

Predicted Label: What our model thinks the species is

Confidence: How confident our model is in its prediction (0-1)

This organized DataFrame makes it easy to save our model’s predictions and create visualizations of the results.

import os

# Create a mapping from class indices to class names
idx_to_class = {v: k for k, v in test_generator_vgg19.class_indices.items()}

# The filenames already contain the full paths, so we can use them directly
file_paths = test_generator_vgg19.filenames

# Convert class indices to class names
true_class_names = [idx_to_class[idx] for idx in vgg19_y_true]
pred_class_names = [idx_to_class[idx] for idx in vgg19_y_pred]

# Get the confidence scores for the predicted classes
confidence_scores = [vgg19_pred_probs[i][pred_idx] for i, pred_idx in enumerate(vgg19_y_pred)]

# Create the results DataFrame
vgg19_results_df = pd.DataFrame({
    'Filepath': file_paths,
    'True Label': true_class_names,
    'Predicted Label': pred_class_names,
    'Confidence': confidence_scores
})

# Display first few rows
print(vgg19_results_df.head())

Let’s display a few test images along with their predicted labels, true labels, and the model’s confidence scores.

This helps visually confirm whether predictions make sense – and helps identify patterns in misclassifications.

from tensorflow.keras.preprocessing.image import load_img

# Number of test images to show
num_images = 8

# Sample a few random rows from the test results
sample_df = vgg19_results_df.sample(n=num_images, random_state=123).reset_index(drop=True)

# Set up the plot grid
plt.figure(figsize=(16, 8))
for i in range(num_images):
    row = sample_df.iloc[i]
    img = load_img(row['Filepath'], target_size=(224, 224))

    plt.subplot(2, num_images // 2, i + 1)
    plt.imshow(img)
    plt.axis('off')

    # Determine color based on prediction accuracy
    is_correct = row['Predicted Label'] == row['True Label']
    color = 'green' if is_correct else 'red'

    # Create title with colored text
    title = f"Pred: {row['Predicted Label']}\nTrue: {row['True Label']}\nConf: {row['Confidence']:.2f}"
    plt.title(title, fontsize=10, color=color)

plt.suptitle("Sample Predictions", fontsize=12)
plt.show()

Final Thoughts and Wrap-Up

Congratulations!

You have successfully built and trained a Convolutional Neural Network (CNN) using the VGG19 architecture to classify coral species.
You learned how to implement and utilize training callbacks to optimize the model’s performance.
You explored the importance of data preprocessing and augmentation in improving model accuracy.
You gained insights into the practical application of deep learning in biological data analysis.

To further enhance your model, consider the following ideas:

Fine-tune VGG19: Unfreeze some of the deeper convolutional layers and retrain the model to better adapt to your specific dataset.
Explore Other Architectures: Experiment with different pre-trained models like ResNet or Inception to compare their performance with VGG19.
Enhance Data Augmentation: Implement more aggressive data augmentation techniques such as color jitter, brightness shifts, cropping, and noise addition to increase model robustness.
Improve Image Quality: Apply image cleaning or filtering techniques to enhance the quality of your dataset.
Optimize Model Architecture: Consider adding Batch Normalization, Dropout, or other regularization techniques to improve model generalization.

—

Thank you for following along! You’ve made significant progress in understanding how deep learning can be applied to real-world biological data. Keep experimenting and learning!

Additional Resources

Adapted from: COE 379L: Software Design For Responsible Intelligent Systems

Build Your Own Convolutional Neural Network

Tutorial Setup and Materials

Part 1: Building a CNN Model from Scratch

Step 0: Check GPU Availability and TensorFlow Version

Step 1: Data Loading and Organization

1.1 List Dataset Directory Contents

1.2 Check File Extensions

1.3 Explore Image Dimensions and Color Modes

1.4 Check for Corrupted Images

1.5 Create a DataFrame of Image Paths and Labels

Step 2: Visualize the Data

2.1 Visualize the Class Distribution

2.2 Visualize Images from the Dataset

Step 3: Split the Dataset and Handle Class Imbalance

3.1 Split the Dataset into Training, Validation, and Test Sets

3.2 Compute Class Weights

Step 4: Image Preprocessing and Data Generators

4.1 Define Image Preprocessing and Augmentation

4.2 Load Images Using flow_from_dataframe()

Sanity Check: Inspect a Batch from the Training Generator

Visualize a Few Images from the Training Generator

Step 5: Define Your CNN Model Architecture

Calculating Parameters in CNNs

Step 6: Train the CNN Model

Visualizing Training History

Step 7: Evaluate the Model on the Test Set

Evaluate Test Accuracy and Loss

Visualize Predictions with a Confusion Matrix

Detailed Performance with a Classification Report

Part 2: Transfer Learning with VGG19

Step 1: Prepare Data for VGG19

1.1 Define Image Preprocessing and Augmentation

1.2 Load Images Using flow_from_dataframe()

Step 2: Define and Train the VGG19 Model

2.1 Load VGG19 Base Model and Stack a Custom Classifier

2.2 Define Training Callbacks

Visualizing Training History

Step 3: Evaluate the VGG19 Model on the Test Set

Evaluate Test Accuracy and Loss

Visualize Predictions with a Confusion Matrix

Detailed Performance with a Classification Report

Step 4: Visualize Predictions from the Test Set

Final Thoughts and Wrap-Up

Additional Resources

4.2 Load Images Using `flow_from_dataframe()`

1.2 Load Images Using `flow_from_dataframe()`