Week 47: Recurrent neural networks and Autoencoders

Morten Hjorth-Jensen
Department of Physics, University of Oslo, Norway

November 17-21, 2025












Plan for week 47

Plans for the lecture Monday 17 November, with video suggestions etc.

  1. Recurrent neural networks, code examples and long short-term memory (LSTM)
  2. Readings and Videos:
  3. These lecture notes at https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week47/ipynb/week47.ipynb
  4. See also the lecture notes from week 46 at https://github.com/CompPhysics/MachineLearning/blob/master/doc/pub/week46/ipynb/week46.ipynb. The lecture on Monday starts with a recap of recurrent neural networks. The second lecture starts with the basics of autoencoders.
Lab sessions on Tuesday and Wednesday

  1. Work on and discussion of project 3
  2. Last weekly exercise










Reading recommendations

  1. For RNNs, see Goodfellow et al., chapter 10, at https://www.deeplearningbook.org/contents/rnn.html.
  2. Reading suggestions for the implementation of RNNs in PyTorch: see Raschka et al., chapter 15, and the accompanying GitHub site at https://github.com/rasbt/machine-learning-book/tree/main/ch15.










TensorFlow examples

For TensorFlow (using Keras) implementations, we recommend

  1. David Foster, Generative Deep Learning with TensorFlow, see chapter 5 at https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/ch05.html
  2. Joseph Babcock and Raghav Bali, Generative AI with Python, chapters 2 and 3; see their GitHub repository at https://github.com/PacktPublishing/Hands-On-Generative-AI-with-Python-and-TensorFlow-2










What is a recurrent NN?

A recurrent neural network (RNN), as opposed to a regular fully connected neural network (FCNN) or just neural network (NN), has layers that are connected to themselves.

In an FCNN there are no connections between nodes in a single layer. For instance, \( h_1^1 \) is not connected to \( h_2^1 \). In addition, the input and output are always of a fixed length.

In an RNN, however, this is no longer the case. Nodes in the hidden layers are connected to themselves.
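
To make the recurrence concrete, here is a minimal NumPy sketch (with randomly chosen weights and data, and hypothetical weight names \( W_{xh} \), \( W_{hh} \), purely for illustration) of a vanilla RNN step \( \mathbf{h}^{(t)} = \tanh(W_{xh}\mathbf{x}^{(t)} + W_{hh}\mathbf{h}^{(t-1)} + \mathbf{b}_h) \), where the same weights are reused and the hidden state is fed back at every time step.

import numpy as np

# Minimal vanilla RNN step: the hidden state feeds back into itself
# h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4
W_xh = rng.normal(size=(n_hidden, n_features))   # input-to-hidden weights
W_hh = rng.normal(size=(n_hidden, n_hidden))     # hidden-to-hidden (recurrent) weights
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unroll over a short sequence: the same weights are reused at every time step
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_features)):     # sequence of 5 time steps
    h = rnn_step(x_t, h)
print(h)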











Why RNNs?

Recurrent neural networks work very well with sequential data, that is, data where the order matters. In a regular fully connected network, the order of the input does not really matter.

Another property of RNNs is that they can handle input and output of variable length. Consider again the simplified breast cancer dataset. If you have trained a regular FCNN on the dataset with the two features, it makes no sense to suddenly add a third feature. The network would not know what to do with it, and would reject inputs with three features (or any other number of features that is not two, for that matter).
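
To illustrate the variable-length property, the short PyTorch sketch below (our own example, not part of the breast cancer discussion) feeds sequences of different lengths to one and the same recurrent layer; a fully connected layer with a fixed input dimension could not do this.

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=2, hidden_size=8, batch_first=True)

# The same recurrent layer handles sequences of different lengths,
# because the weights are shared across time steps.
for seq_len in (3, 10, 57):
    x = torch.randn(1, seq_len, 2)      # (batch, time steps, features)
    output, h_n = rnn(x)                # output: (1, seq_len, 8), h_n: (1, 1, 8)
    print(seq_len, output.shape, h_n.shape)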











RNNs in more detail















RNNs in more detail, part 2















RNNs in more detail, part 3















RNNs in more detail, part 4















RNNs in more detail, part 5















RNNs in more detail, part 6















RNNs in more detail, part 7















The mathematics of RNNs, the basic architecture

See notebook at https://github.com/CompPhysics/AdvancedMachineLearning/blob/main/doc/pub/week7/ipynb/rnnmath.ipynb











Gating mechanism: Long Short Term Memory (LSTM)

Besides the simple recurrent neural network layer discussed above, there are two other commonly used types of recurrent neural network layers: the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). For a short introduction to these layers see https://medium.com/mindboard/lstm-vs-gru-experimental-comparison-955820c21e8b.

An LSTM uses a memory cell to model long-range dependencies and to avoid vanishing-gradient problems. It is capable of modeling longer-term dependencies through memory cells and gates that control the information flow along the memory cells.

  1. Introduced by Hochreiter and Schmidhuber (1997) who solved the problem of getting an RNN to remember things for a long time (like hundreds of time steps).
  2. They designed a memory cell using logistic and linear units with multiplicative interactions.
  3. Information gets into the cell whenever its “write” gate is on.
  4. The information stays in the cell so long as its keep gate is on.
  5. Information can be read from the cell by turning on its read gate.

LSTMs were first introduced to overcome the vanishing gradient problem.











Implementing a memory cell in a neural network

To preserve information for a long time in the activities of an RNN, we use a circuit that implements an analog memory cell.

  1. A linear unit that has a self-link with a weight of 1 will maintain its state.
  2. Information is stored in the cell by activating its write gate.
  3. Information is retrieved by activating the read gate.
  4. We can backpropagate through this circuit because the logistic units have nice derivatives (see the toy sketch below).
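
A toy NumPy sketch of such an analog memory cell (our own illustration, with the gate values set by hand rather than learned) shows how the self-link with weight 1 preserves the stored value until the gates change.

import numpy as np

def memory_cell(prev_state, new_input, write_gate, keep_gate, read_gate):
    """One step of a toy analog memory cell with multiplicative gates."""
    state = keep_gate * prev_state + write_gate * new_input  # self-link with weight 1 when keep_gate = 1
    output = read_gate * state
    return state, output

state = 0.0
# write 0.7 into the cell, keep it for two steps, then read it out
for write, keep, read, x in [(1, 0, 0, 0.7), (0, 1, 0, 0.0), (0, 1, 0, 0.0), (0, 1, 1, 0.0)]:
    state, out = memory_cell(state, x, write, keep, read)
    print(f"state = {state:.1f}, output = {out:.1f}")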










LSTM details

The LSTM is a unit cell that is made of three gates:

  1. the input gate,
  2. the forget gate,
  3. and the output gate.

It also introduces a cell state \( c \), which can be thought of as the long-term memory, and a hidden state \( h \) which can be thought of as the short-term memory.
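
In PyTorch this structure is mirrored by torch.nn.LSTMCell, which takes the previous short-term memory \( h \) and long-term memory \( c \) and returns the updated pair. The small sketch below only illustrates the interface; the sizes and data are arbitrary.

import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=4, hidden_size=16)

x_t = torch.randn(8, 4)        # batch of 8 inputs at time t
h_t = torch.zeros(8, 16)       # short-term memory (hidden state)
c_t = torch.zeros(8, 16)       # long-term memory (cell state)

h_t, c_t = cell(x_t, (h_t, c_t))   # one time step updates both memories
print(h_t.shape, c_t.shape)        # torch.Size([8, 16]) torch.Size([8, 16])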











Basic layout (all figures from Raschka et al.)















LSTM details

The first stage is called the forget gate. Here we combine the input at, say, time \( t \) with the hidden state at \( t-1 \), pass the result through the sigmoid activation function, and then perform an element-wise multiplication, denoted by \( \odot \).

Mathematically we have (see also figure below)

$$ \mathbf{f}^{(t)} = \sigma(W_{fx}\mathbf{x}^{(t)} + W_{fh}\mathbf{h}^{(t-1)} + \mathbf{b}_f) $$

where the \( W \)s are the weights to be trained.











Comparing with a standard RNN















LSTM details I















LSTM details II















LSTM details III















Forget gate















The forget gate

The naming forget gate stems from the fact that the Sigmoid activation function's outputs are very close to \( 0 \) if the argument for the function is very negative, and \( 1 \) if the argument is very positive. Hence we can control the amount of information we want to take from the long-term memory.

$$ \mathbf{f}^{(t)} = \sigma(W_{fx}\mathbf{x}^{(t)} + W_{fh}\mathbf{h}^{(t-1)} + \mathbf{b}_f) $$

where the \( W \)s are the weights to be trained.
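
The saturation of the sigmoid is easy to check numerically (a small sanity check we add here, not part of the derivation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Very negative arguments give a gate value close to 0 (forget),
# very positive arguments give a value close to 1 (keep).
print(sigmoid(np.array([-10.0, -2.0, 0.0, 2.0, 10.0])))
# approximately [0.00005, 0.12, 0.5, 0.88, 0.99995]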











Basic layout















Input gate

The next stage is the input gate, which consists of a sigmoid function (\( \sigma_i \)) that decides what fraction of the input will be stored in the long-term memory, and a \( \tanh \) function that determines the candidate memory that could be stored in the long-term memory. These two results are multiplied together, and the product is added to the cell state, that is, stored in the long-term memory; the addition is denoted by \( \oplus \).

We have

$$ \mathbf{i}^{(t)} = \sigma_g(W_{ix}\mathbf{x}^{(t)} + W_{ih}\mathbf{h}^{(t-1)} + \mathbf{b}_i), $$

and

$$ \mathbf{g}^{(t)} = \tanh(W_{gx}\mathbf{x}^{(t)} + W_{gh}\mathbf{h}^{(t-1)} + \mathbf{b}_g), $$

where again the \( W \)s are the weights to be trained.











Short summary















Forget and input

The forget gate and the input gate together also update the cell state with the following equation,

$$ \mathbf{c}^{(t)} = \mathbf{f}^{(t)} \odot \mathbf{c}^{(t-1)} + \mathbf{i}^{(t)} \odot \mathbf{g}^{(t)}, $$

where \( \mathbf{f}^{(t)} \) and \( \mathbf{i}^{(t)} \) are the outputs of the forget gate and the input gate, respectively.











Basic layout















Output gate

The final stage of the LSTM is the output gate, whose purpose is to update the short-term memory. To achieve this, we take the newly generated long-term memory and process it through a hyperbolic tangent (\( \tanh \)) function, creating a potential new short-term memory. We then multiply this potential memory by the output of the sigmoid function (\( \sigma_o \)). This multiplication generates the final output as well as the hidden state (\( \mathbf{h}^{(t)} \)) that is passed on to the next time step of the LSTM cell.

We have

$$ \begin{aligned} \mathbf{o}^{(t)} &= \sigma_g(W_o\mathbf{x}^{(t)} + U_o\mathbf{h}^{(t-1)} + \mathbf{b}_o), \\ \mathbf{h}^{(t)} &= \mathbf{o}^{(t)} \odot \sigma_h(\mathbf{c}^{(t)}), \end{aligned} $$

where \( W_o \) and \( U_o \) are the weights of the output gate and \( \mathbf{b}_o \) is its bias.
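
Putting the four gate equations together, a single LSTM time step can be written out in a few lines of NumPy. The sketch below uses randomly initialized weights and the same symbols as above; it is only meant to make the data flow explicit, not to serve as a training-ready implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations above."""
    W_fx, W_fh, b_f, W_ix, W_ih, b_i, W_gx, W_gh, b_g, W_ox, W_oh, b_o = params
    f_t = sigmoid(W_fx @ x_t + W_fh @ h_prev + b_f)    # forget gate
    i_t = sigmoid(W_ix @ x_t + W_ih @ h_prev + b_i)    # input gate
    g_t = np.tanh(W_gx @ x_t + W_gh @ h_prev + b_g)    # candidate memory
    c_t = f_t * c_prev + i_t * g_t                     # update long-term memory
    o_t = sigmoid(W_ox @ x_t + W_oh @ h_prev + b_o)    # output gate
    h_t = o_t * np.tanh(c_t)                           # new short-term memory
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
params = []
for _ in range(4):                       # forget, input, candidate, output
    params += [rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)), np.zeros(n_hid)]

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(10, n_in)):  # run the cell over a sequence of 10 steps
    h, c = lstm_step(x_t, h, c, params)
print(h, c)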











Summary of LSTM

LSTMs provide a basic approach for modeling long-range dependencies in sequences. If you wish to read more, see the article An Empirical Exploration of Recurrent Network Architectures by Rafal Jozefowicz et al., Proceedings of ICML, 2342-2350 (2015).

An important more recent development is the so-called gated recurrent unit (GRU), see for example the article by Junyoung Chung et al. at https://arxiv.org/abs/1412.3555. This article is an excellent read if you are interested in learning more about these modern RNN architectures.

The GRUs have a simpler architecture than LSTMs. This leads to computationally more efficient methods, while their performance in some tasks, such as polyphonic music modeling, is comparable to LSTMs.
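
As a quick practical comparison (our own addition), the PyTorch sketch below builds an LSTM layer and a GRU layer of identical size and prints their parameter counts; the GRU is smaller because it uses three gate-like transformations instead of four.

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=10, hidden_size=64, batch_first=True)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# The GRU has roughly 3/4 of the LSTM's parameters (three gates instead of four),
# which is one reason it is computationally cheaper.
print("LSTM parameters:", n_params(lstm))
print("GRU parameters: ", n_params(gru))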











LSTM implementation using TensorFlow

"""
Key points:
1. The input images (28x28 pixels) are treated as sequences of 28 timesteps with 28 features each
2. The LSTM layer processes this sequential data
3. A final dense layer with softmax activation handles the classification
4. Typical accuracy ranges between 95-98% (lower than CNNs but reasonable for demonstration)

Note: LSTMs are not typically used for image classification (CNNs are more efficient), but this demonstrates how to adapt them for such tasks. Training might take longer compared to CNN architectures.

To improve performance, you could:
1. Add more LSTM layers
2. Use Bidirectional LSTMs
3. Increase the number of units
4. Add dropout for regularization
5. Use learning rate scheduling
"""

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.utils import to_categorical

# Load and preprocess data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape data for LSTM (samples, timesteps, features)
# MNIST images are 28x28, so we treat each image as 28 timesteps of 28 features
x_train = x_train.reshape((-1, 28, 28))
x_test = x_test.reshape((-1, 28, 28))

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build LSTM model
model = Sequential()
model.add(LSTM(128, input_shape=(28, 28)))  # 128 LSTM units
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])

# Display model summary
model.summary()

# Train the model
history = model.fit(x_train, y_train,
                   batch_size=64,
                   epochs=10,
                   validation_split=0.2)

# Evaluate on test data
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f'\nTest accuracy: {test_acc:.4f}')










And the corresponding one with PyTorch

"""
Key components:
1. **Data Handling**: Uses PyTorch DataLoader with MNIST dataset
2. **LSTM Architecture**:
  - Input sequence of 28 timesteps (image rows)
  - 128 hidden units in LSTM layer
  - Fully connected layer for classification
3. **Training**:
  - Cross-entropy loss
  - Adam optimizer
  - Automatic GPU utilization if available

This implementation typically achieves **97-98% accuracy** after 10 epochs. The main differences from the TensorFlow/Keras version:
- Explicit device management (CPU/GPU)
- Manual training loop
- Different data loading pipeline
- More explicit tensor reshaping

To improve performance, you could:
1. Add dropout regularization
2. Use bidirectional LSTM
3. Implement learning rate scheduling
4. Add batch normalization
5. Increase model capacity (more layers/units)
"""

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Hyperparameters
input_size = 28     # Number of features (pixels per row)
hidden_size = 128   # LSTM hidden state size
num_classes = 10    # Digits 0-9
num_epochs = 10     # Training iterations
batch_size = 64     # Batch size
learning_rate = 0.001

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# MNIST dataset
transform = transforms.Compose([
   transforms.ToTensor(),
   transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

train_dataset = datasets.MNIST(root='./data',
                              train=True,
                              transform=transform,
                              download=True)

test_dataset = datasets.MNIST(root='./data',
                             train=False,
                             transform=transform)

train_loader = DataLoader(dataset=train_dataset,
                         batch_size=batch_size,
                         shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                        batch_size=batch_size,
                        shuffle=False)

# LSTM model
class LSTMModel(nn.Module):
   def __init__(self, input_size, hidden_size, num_classes):
       super(LSTMModel, self).__init__()
       self.hidden_size = hidden_size
       self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
       self.fc = nn.Linear(hidden_size, num_classes)

   def forward(self, x):
       # Reshape input to (batch_size, sequence_length, input_size)
       x = x.reshape(-1, 28, 28)

       # Forward propagate LSTM
       out, _ = self.lstm(x)  # out: (batch_size, seq_length, hidden_size)

       # Decode the hidden state of the last time step
       out = out[:, -1, :]
       out = self.fc(out)
       return out

# Initialize model
model = LSTMModel(input_size, hidden_size, num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
total_step = len(train_loader)
for epoch in range(num_epochs):
   model.train()
   for i, (images, labels) in enumerate(train_loader):
       images = images.to(device)
       labels = labels.to(device)

       # Forward pass
       outputs = model(images)
       loss = criterion(outputs, labels)

       # Backward and optimize
       optimizer.zero_grad()
       loss.backward()
       optimizer.step()

       if (i+1) % 100 == 0:
           print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')

   # Test the model
   model.eval()
   with torch.no_grad():
       correct = 0
       total = 0
       for images, labels in test_loader:
           images = images.to(device)
           labels = labels.to(device)
           outputs = model(images)
           _, predicted = torch.max(outputs.data, 1)
           total += labels.size(0)
           correct += (predicted == labels).sum().item()

       print(f'Test Accuracy: {100 * correct / total:.2f}%')

print('Training finished.')










Dynamical ordinary differential equation

Let us illustrate how we could train an RNN using data from the solution of a well-known differential equation, namely Newton's equation for oscillatory motion for an object being forced into harmonic oscillations by an applied external force.

We will start with the basic algorithm for solving this type of equation, the Runge-Kutta-4 (RK4) approach. The first code example is a standalone differential equation solver. It yields positions and velocities as functions of time, starting at an initial time \( t_0 \) and ending at a final time.

The data the program produces will in turn be used to train an RNN on a selected set of training data. With a trained RNN, we will then use the network to make predictions for data not included in the training. That is, we will train a model that should be able to reproduce velocities and positions not included in the training data.











The Runge-Kutta-4 code

import numpy as np
import pandas as pd
from math import *
import matplotlib.pyplot as plt
import os

# Where to save the figures and data files
PROJECT_ROOT_DIR = "Results"
FIGURE_ID = "Results/FigureFiles"
DATA_ID = "DataFiles/"

if not os.path.exists(PROJECT_ROOT_DIR):
    os.mkdir(PROJECT_ROOT_DIR)

if not os.path.exists(FIGURE_ID):
    os.makedirs(FIGURE_ID)

if not os.path.exists(DATA_ID):
    os.makedirs(DATA_ID)

def image_path(fig_id):
    return os.path.join(FIGURE_ID, fig_id)

def data_path(dat_id):
    return os.path.join(DATA_ID, dat_id)

def save_fig(fig_id):
    plt.savefig(image_path(fig_id) + ".png", format='png')


def SpringForce(v,x,t):
#   note here that we have divided by mass and we return the acceleration
    return  -2*gamma*v-x+Ftilde*cos(t*Omegatilde)


def RK4(v,x,t,n,Force):
    for i in range(n-1):
# Setting up k1
        k1x = DeltaT*v[i]
        k1v = DeltaT*Force(v[i],x[i],t[i])
# Setting up k2
        vv = v[i]+k1v*0.5
        xx = x[i]+k1x*0.5
        k2x = DeltaT*vv
        k2v = DeltaT*Force(vv,xx,t[i]+DeltaT*0.5)
# Setting up k3
        vv = v[i]+k2v*0.5
        xx = x[i]+k2x*0.5
        k3x = DeltaT*vv
        k3v = DeltaT*Force(vv,xx,t[i]+DeltaT*0.5)
# Setting up k4
        vv = v[i]+k3v
        xx = x[i]+k3x
        k4x = DeltaT*vv
        k4v = DeltaT*Force(vv,xx,t[i]+DeltaT)
# Final result
        x[i+1] = x[i]+(k1x+2*k2x+2*k3x+k4x)/6.
        v[i+1] = v[i]+(k1v+2*k2v+2*k3v+k4v)/6.
        t[i+1] = t[i] + DeltaT


# Main part begins here

DeltaT = 0.001
#set up arrays 
tfinal = 20 # in dimensionless time
n = ceil(tfinal/DeltaT)
# set up arrays for t, v, and x
t = np.zeros(n)
v = np.zeros(n)
x = np.zeros(n)
# Initial conditions (can change to more than one dim)
x0 =  1.0 
v0 = 0.0
x[0] = x0
v[0] = v0
gamma = 0.2
Omegatilde = 0.5
Ftilde = 1.0
# Start integrating using the RK4 method
# Note that we define the force function as SpringForce
RK4(v,x,t,n,SpringForce)

# Plot position as function of time    
fig, ax = plt.subplots()
ax.set_ylabel('x[m]')
ax.set_xlabel('t[s]')
ax.plot(t, x)
fig.tight_layout()
save_fig("ForcedBlockRK4")
plt.show()










Using the above data to train an RNN

In the code below we have reworked the previous example in order to generate data that can be handled by a recurrent neural network and then used to train our model.

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt


# Newton's equation for harmonic oscillations with external force

# Global parameters
gamma = 0.2        # damping
Omegatilde = 0.5   # driving frequency
Ftilde = 1.0       # driving amplitude

def spring_force(v, x, t):
    """
    SpringForce:
    note: divided by mass => returns acceleration
    a = -2*gamma*v - x + Ftilde*cos(Omegatilde * t)
    """
    return -2.0 * gamma * v - x + Ftilde * np.cos(Omegatilde * t)


def rk4_trajectory(DeltaT=0.001, tfinal=20.0, x0=1.0, v0=0.0):
    """
    Returns t, x, v arrays.
    """
    n = int(np.ceil(tfinal / DeltaT))

    t = np.zeros(n, dtype=np.float32)
    x = np.zeros(n, dtype=np.float32)
    v = np.zeros(n, dtype=np.float32)

    x[0] = x0
    v[0] = v0

    for i in range(n - 1):
        # k1
        k1x = DeltaT * v[i]
        k1v = DeltaT * spring_force(v[i], x[i], t[i])

        # k2
        vv = v[i] + 0.5 * k1v
        xx = x[i] + 0.5 * k1x
        k2x = DeltaT * vv
        k2v = DeltaT * spring_force(vv, xx, t[i] + 0.5 * DeltaT)

        # k3
        vv = v[i] + 0.5 * k2v
        xx = x[i] + 0.5 * k2x
        k3x = DeltaT * vv
        k3v = DeltaT * spring_force(vv, xx, t[i] + 0.5 * DeltaT)

        # k4
        vv = v[i] + k3v
        xx = x[i] + k3x
        k4x = DeltaT * vv
        k4v = DeltaT * spring_force(vv, xx, t[i] + DeltaT)

        # Update
        x[i + 1] = x[i] + (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v[i + 1] = v[i] + (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t[i + 1] = t[i] + DeltaT

    return t, x, v


# Sequence generation for RNN training

def create_sequences(x, seq_len):
    """
    Given a 1D array x (e.g., position as a function of time),
    create input/target sequences for next-step prediction.

    Inputs:  [x_i, x_{i+1}, ..., x_{i+seq_len-1}]
    Targets: [x_{i+1}, ..., x_{i+seq_len}]
    """
    xs = []
    ys = []
    for i in range(len(x) - seq_len):
        seq_x = x[i : i + seq_len]
        seq_y = x[i + 1 : i + seq_len + 1]  # shifted by one step
        xs.append(seq_x)
        ys.append(seq_y)

    xs = np.array(xs, dtype=np.float32)      # shape: (num_samples, seq_len)
    ys = np.array(ys, dtype=np.float32)      # shape: (num_samples, seq_len)
    # Add feature dimension (1 feature: the position x)
    xs = np.expand_dims(xs, axis=-1)         # (num_samples, seq_len, 1)
    ys = np.expand_dims(ys, axis=-1)         # (num_samples, seq_len, 1)
    return xs, ys


class OscillatorDataset(Dataset):
    def __init__(self, seq_len=50, DeltaT=0.001, tfinal=20.0, x0=1.0, v0=0.0):
        t, x, v = rk4_trajectory(DeltaT=DeltaT, tfinal=tfinal, x0=x0, v0=v0)
        self.t = t
        self.x = x
        self.v = v
        xs, ys = create_sequences(x, seq_len=seq_len)
        self.inputs = torch.from_numpy(xs)  # (N, seq_len, 1)
        self.targets = torch.from_numpy(ys) # (N, seq_len, 1)

    def __len__(self):
        return self.inputs.shape[0]

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]


# RNN model (LSTM-based in this example)

class RNNPredictor(nn.Module):
    def __init__(self, input_size=1, hidden_size=32, num_layers=1, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size=input_size,
                            hidden_size=hidden_size,
                            num_layers=num_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)   # out: (batch, seq_len, hidden_size)
        out = self.fc(out)      # (batch, seq_len, output_size)
        return out


# Training part

def train_model(
    seq_len=50,
    DeltaT=0.001,
    tfinal=20.0,
    batch_size=64,
    num_epochs=10,
    hidden_size=64,
    lr=1e-3,
    device=None,
):
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Dataset & DataLoader
    dataset = OscillatorDataset(seq_len=seq_len, DeltaT=DeltaT, tfinal=tfinal)
    train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Model, loss, optimizer
    model = RNNPredictor(input_size=1, hidden_size=hidden_size, output_size=1)
    model.to(device)

    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Training loop
    model.train()
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        for batch_x, batch_y in train_loader:
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device)

            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()

            epoch_loss += loss.item() * batch_x.size(0)

        epoch_loss /= len(train_loader.dataset)
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.6f}")

    return model, dataset


# Evaluation / visualization

def evaluate_and_plot(model, dataset, seq_len=50, device=None):
    if device is None:
        device = next(model.parameters()).device

    model.eval()
    with torch.no_grad():
        # Take a single sequence from the dataset
        x_seq, y_seq = dataset[0]  # shapes: (seq_len, 1), (seq_len, 1)
        x_input = x_seq.unsqueeze(0).to(device)  # (1, seq_len, 1)
        # Model prediction (next-step for whole sequence)
        y_pred = model(x_input).cpu().numpy().squeeze(-1).squeeze(0)  # (seq_len,)
        # True target
        y_true = y_seq.numpy().squeeze(-1)  # (seq_len,)
        # Plot comparison
        plt.figure()
        plt.plot(y_true, label="True x(t+Δt)", linestyle="-")
        plt.plot(y_pred, label="Predicted x(t+Δt)", linestyle="--")
        plt.xlabel("Time step in sequence")
        plt.ylabel("Position")
        plt.legend()
        plt.title("RNN next-step prediction on oscillator trajectory")
        plt.tight_layout()
        plt.show()

# This is the main part of the code where we define the network

if __name__ == "__main__":
    # Hyperparameters can be tweaked as you like
    seq_len = 50
    DeltaT = 0.001
    tfinal = 20.0
    num_epochs = 10
    batch_size = 64
    hidden_size = 64
    lr = 1e-3

    model, dataset = train_model(
        seq_len=seq_len,
        DeltaT=DeltaT,
        tfinal=tfinal,
        batch_size=batch_size,
        num_epochs=num_epochs,
        hidden_size=hidden_size,
        lr=lr,
    )

    evaluate_and_plot(model, dataset, seq_len=seq_len)
© 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license