Discrete-Event Simulation Combined with RL
There are primarily three types of system simulation methods: Discrete-Event Simulation (DES), Agent-Based Modeling (ABM), and System Dynamics (SD). This time, let's discuss DES.
What is DES?
DES is a method of simulating the behavior and performance of a system. It models the operation of a system as a sequence of discrete events in time. Each event occurs at a particular instant and marks a change of state in the system.
It involves scheduling and executing events that change the state of the system. The simulation progresses by processing these events one at a time, and it may include random elements to model variability in system behavior. DES also maintains a simulation clock that records the current time: the clock jumps from one scheduled event to the next, and when it reaches an event's trigger time, that event occurs.
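To make the clock-and-event-queue idea concrete, here is a minimal sketch of a DES core in plain Python. The event list, state dictionary, and `run_des` helper are illustrative, not part of any real library:

```python
import heapq

def run_des(events, horizon):
    # Minimal DES core: a simulation clock plus a time-ordered event queue.
    # Each event is a (time, name, state_change) tuple; state_change is a
    # function applied to the shared state dict when the event fires.
    clock = 0.0
    state = {"in_system": 0}
    heapq.heapify(events)
    log = []
    while events and events[0][0] <= horizon:
        time, name, change = heapq.heappop(events)  # next scheduled event
        clock = time                                # the clock jumps forward
        change(state)                               # the event changes system state
        log.append((clock, name, state["in_system"]))
    return log

# Two arrivals and one departure, deliberately scheduled out of order:
log = run_des(
    [(5.0, "departure", lambda s: s.update(in_system=s["in_system"] - 1)),
     (1.0, "arrival", lambda s: s.update(in_system=s["in_system"] + 1)),
     (2.0, "arrival", lambda s: s.update(in_system=s["in_system"] + 1))],
    horizon=10.0,
)
print(log)  # events are processed in time order, not insertion order
```

Note how the clock never ticks through "irrelevant" periods; it only visits the instants at which events fire.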
An "event" refers to any occurrence that alters the state of a system. The term "discrete" indicates that our focus is solely on specific moments when these events take place, disregarding other periods as irrelevant. Examples of such events include the arrival and departure of customers, the allocation and release of resources, and sudden, unexpected occurrences like earthquakes that impact operations.
Given that resources are limited, a frequent scenario in discrete event simulation involves "waiting" for resources. For instance, this can be observed when a patient waits in a reception area for a medical consultation or when individuals queue at a service counter to make a purchase.
Combining DES with RL
With the definition of discrete event simulation in hand, another term comes to mind: Reinforcement Learning.
Reinforcement Learning is a machine learning approach in which an agent learns to make decisions through interactions with its environment. The agent develops a strategy to optimize the total reward received based on the actions it executes.
In Reinforcement Learning (RL), an agent learns to make the best decisions by trying different actions and seeing how they affect rewards over time. Unlike supervised learning, where the correct answers are provided, RL uses rewards to show how good or bad the agent’s actions are. The agent has to figure things out on its own based on these rewards. In RL, the agent’s actions impact both immediate and future rewards. Because the environment gives limited information, the agent learns from its own experiences. It gradually improves its actions to better adapt to the environment.
Training an RL agent can be time-consuming, often requiring hundreds of thousands of steps to find a nearly optimal strategy. In model-free Markov Decision Processes, this becomes even more complex because the agent has to estimate probabilities of different outcomes from its interactions with the environment, rather than having predefined probabilities.
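As a sketch, the tabular Q-learning update such a model-free agent performs on each sampled transition looks like this (the learning rate, discount factor, epsilon, and state/action names are all illustrative):

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def q_update(q, state, action, reward, next_state, actions):
    # Estimate action values from sampled transitions instead of
    # known transition probabilities (model-free learning).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def epsilon_greedy(q, state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)  # explore: try a random action
    return max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit

q = {}
q_update(q, "s0", "a0", reward=10.0, next_state="s1", actions=["a0", "a1"])
print(q[("s0", "a0")])  # 0 + 0.1 * (10 + 0.9 * 0 - 0) = 1.0
```

Each update nudges the value estimate toward the observed reward plus the discounted value of the best next action, which is exactly why so many interactions are needed before the estimates converge.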
Simulation becomes an essential tool in reinforcement learning (RL) by providing a controlled and repeatable environment for the RL agent to interact with. This reduces the risks and costs of letting the agent act directly in complex or potentially hazardous real-world scenarios. Discrete Event Simulation (DES) is particularly valuable for modeling environments where events happen at specific, separate times, offering a structured yet adaptable framework for RL to operate within and making it a natural fit for applications such as queueing and scheduling.
Scenario Setting
Consider a hospital scenario: Different departments have their queues of patients, with varying service times and levels of urgency for their visits. The objective here is to minimize the overall waiting time while ensuring that urgent cases receive priority attention. In this context, using DES for simulation allows the RL agent to experiment with different strategies for managing patient flow, optimizing service sequences, and prioritizing patients effectively in a risk-free, virtual setting.
DES Hospital Queue Model
Patient Arrivals: Simulate patient arrivals with key attributes like urgency, department, and estimated service time.
Queue Management: Each department has a separate queue. Traditional queue management might be first-come-first-serve or based on fixed priority rules.
Service: Simulate the service process where patients are treated by the department staff.
RL Integration
State: Define the system's state, which could include the number of patients in each queue, the current patient being served, and the urgency levels of waiting patients.
Actions: At each decision point (e.g., when a patient is served and the next patient needs to be selected), the action could be choosing which patient to serve next from any queue.
Reward: Design a reward function that penalizes long wait times, particularly for urgent cases, and possibly rewards short wait times for less urgent cases.
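One possible encoding of this reward, mirroring the urgency-weighted wait penalty used in the full example later (a base reward of 100, with urgency 1 weighted most heavily), is:

```python
def reward(wait_time: float, urgency: int, base: float = 100.0) -> float:
    # Urgency 1 (most urgent) weighs the waiting penalty most heavily;
    # urgency 3 (least urgent) weighs it least.
    urgency_weight = 4 - urgency  # 3 for urgency 1, 1 for urgency 3
    return base - urgency_weight * wait_time

print(reward(wait_time=10, urgency=1))  # 100 - 3 * 10 = 70.0
print(reward(wait_time=10, urgency=3))  # 100 - 1 * 10 = 90.0
```

The same ten-minute wait costs an urgent patient three times as much reward as a non-urgent one, which is what steers the agent toward prioritizing urgent cases.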
Implementation Steps
Simulate the Environment using DES: Use DES to model the patient flow and service mechanisms. The DES handles the dynamics of patient arrivals, waiting, and service.
Apply RL for Decision Making: Use an RL agent to learn the best strategies for selecting patients from the queue. The agent observes the state of the queues and receives rewards based on the outcomes of its actions (selection of patients).
Training: The RL agent continuously learns from each interaction (each patient service completion and selection of the next patient). Over time, it identifies optimal patterns and strategies to improve queue management.
Integration: Integrate the RL decision-making process into the DES so that at every decision point, the RL agent is consulted to choose the next patient to serve.
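The four steps above can be wired together in one loop. The tiny environment and agent below are hypothetical stand-ins for the DES and RL components, reduced to a single queue so the glue is visible:

```python
import random

class TinyQueueEnv:
    # Hypothetical one-queue environment standing in for the DES.
    def __init__(self, service_times):
        self.waiting = list(service_times)  # service time per waiting patient
        self.clock = 0.0                    # simulation clock

    def observe(self):
        return len(self.waiting)            # state: queue length

    def step(self, action):
        service = self.waiting.pop(action)  # serve the chosen patient
        self.clock += service               # the DES advances the clock
        reward = -service                   # shorter service => higher reward
        return reward, self.observe()

class TinyAgent:
    # Hypothetical epsilon-greedy agent with a dict-backed Q-table.
    def __init__(self):
        self.q = {}

    def select_action(self, state):
        if random.random() < 0.1:
            return random.randrange(state)  # explore
        return max(range(state), key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + 0.1 * (reward - old)

env, agent = TinyQueueEnv([3.0, 1.0, 2.0]), TinyAgent()
while env.waiting:                          # at every decision point,
    s = env.observe()                       # the DES consults the agent
    a = agent.select_action(s)
    r, s2 = env.step(a)
    agent.update(s, a, r, s2)               # and the agent learns from it
print(env.clock)  # all patients served; clock equals total service time: 6.0
```

The full hospital example below follows the same pattern, with multiple departments, SimPy processes, and a richer reward.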
Unlike static rules, an RL agent can adapt to changing conditions, such as sudden increases in patient arrivals or changes in department availability. RL can balance multiple objectives such as minimizing waiting times while maximizing the care of urgent cases. The system can continue to improve as more data is collected, potentially adapting to new patterns of patient arrivals or changes in hospital operations.
Experimental Design
To simulate a discrete event system (DES) such as a hospital queuing problem in Python, you can use the SimPy library. SimPy is a process-based discrete-event simulation framework built on standard Python.
Create an environment with Anaconda (or use your local environment), then install the dependencies:
conda create -n RL-DES python=3.10
conda activate RL-DES
pip install simpy numpy
For our simple example, let's go:
import simpy
import random
import numpy as np
from collections import deque
# Global counter for patient IDs
patient_counter = 0
# Define the Patient class, representing a patient in the system
class Patient:
    def __init__(self, env, urgency, department, service_time):
        global patient_counter
        self.id = patient_counter  # Assign a unique ID to each patient
        patient_counter += 1
        self.env = env
        self.urgency = urgency  # Urgency level of the patient (1 is most urgent, 3 is least)
        self.department = department  # Department the patient is assigned to
        self.service_time = service_time  # Time required to serve the patient
        self.arrival_time = env.now  # Time when the patient arrives
        self.wait_time = None  # Time the patient spends waiting before being served
# Define the Department class, representing a department in the hospital
class Department:
    def __init__(self, env, name, rl_agent):
        self.env = env
        self.name = name  # Name of the department
        self.queue = deque()  # Queue to hold patients waiting in the department
        self.rl_agent = rl_agent  # Reference to the RL agent for decision-making
        self.action = env.process(self.run())  # Start the department's main process
        self.served_patients = []  # List to store wait times of served patients
        self.patient_ids = []  # List to store IDs of patients waiting in the department

    def run(self):
        # Main loop for serving patients in the department
        while True:
            if self.queue:
                # Get the current state and select the patient to serve next
                state = self.rl_agent.get_state(self)
                patient = self.select_patient(state)
                if patient:
                    # Record the wait time before service starts, then serve the patient
                    patient.wait_time = self.env.now - patient.arrival_time
                    action = self.queue.index(patient)
                    yield self.env.timeout(patient.service_time)
                    self.served_patients.append(patient.wait_time)
                    # Update the RL agent's Q-table with the observed reward
                    next_state = self.rl_agent.get_state(self)
                    reward = self.calculate_reward(patient)
                    self.rl_agent.update_q_table(state, action, reward, next_state)
                    # Log patient service information
                    print(f"Patient {patient.id} with urgency {patient.urgency} served in {self.name} "
                          f"at time {self.env.now} after waiting {patient.wait_time}")
                    # Remove the patient from the queue and the patient ID list
                    self.queue.remove(patient)
                    self.patient_ids.remove(patient.id)
            else:
                # If the queue is empty, wait for a short time before checking again
                yield self.env.timeout(1)

    def add_patient(self, patient):
        # Add a new patient to the department's queue and track their ID
        self.queue.append(patient)
        self.patient_ids.append(patient.id)
        print(f"Patient {patient.id} with urgency {patient.urgency} arrives at {self.name} at time {self.env.now}")

    def calculate_reward(self, patient):
        # Calculate the reward based on the patient's urgency and wait time:
        # the reward is higher for shorter wait times and more urgent patients
        base_reward = 100  # A base reward for serving the patient
        wait_time_penalty = -patient.wait_time  # Negative impact of waiting time
        urgency_multiplier = 4 - patient.urgency  # Urgency multiplier (higher for more urgent patients)
        # Final reward calculation
        reward = base_reward + (urgency_multiplier * wait_time_penalty)
        return reward

    def get_average_wait_time(self):
        # Calculate the average wait time for all served patients
        if self.served_patients:
            return np.mean(self.served_patients)
        return 0.0

    def select_patient(self, state):
        # Select the patient to serve next: by urgency, then FCFS within the same urgency
        sorted_queue = sorted(self.queue, key=lambda p: (p.urgency, p.arrival_time))
        return sorted_queue[0] if sorted_queue else None
# Define the RL Agent class, responsible for decision-making and learning
class RLAgent:
    def __init__(self, departments):
        self.q_table = {}  # Q-table for storing state-action values
        self.departments = departments  # List of departments in the system

    def get_state(self, department):
        # Get the current state, represented by the length of the queue in each department
        return tuple(len(d.queue) for d in self.departments)

    def select_action(self, state, queue_length):
        # Select an action (which patient to serve) using an epsilon-greedy strategy
        if random.random() < 0.1:
            # Exploration: choose a random action
            return random.choice(range(queue_length))
        # Exploitation: choose the best action based on the Q-values
        q_values = [self.q_table.get((state, a), 0) for a in range(queue_length)]
        return int(np.argmax(q_values))

    def update_q_table(self, state, action, reward, next_state):
        # Update the Q-value for the given state-action pair using the reward and future rewards
        old_value = self.q_table.get((state, action), 0)
        future_rewards = max([self.q_table.get((next_state, a), 0) for a in range(len(self.departments))], default=0)
        self.q_table[(state, action)] = old_value + 0.1 * (reward + 0.9 * future_rewards - old_value)
# Function to generate patients and assign them to departments
def patient_generator(env, departments, arrival_rate):
    while True:
        # Stop generating new patients after time 500
        if env.now > 500:
            break
        # Generate a new patient with weighted urgency levels
        urgency_distribution = [0.1, 0.3, 0.6]  # Probability distribution for urgency levels [1, 2, 3]
        urgency = random.choices([1, 2, 3], weights=urgency_distribution, k=1)[0]
        department = random.choice(departments)
        # Determine service time based on the department
        if department.name == "Department 1":
            service_time = max(1, np.random.normal(10, 3))  # Mean 10, SD 3
        elif department.name == "Department 2":
            service_time = max(1, np.random.normal(8, 3))  # Mean 8, SD 3
        else:  # Department 3
            service_time = 11  # Fixed service time of 11 minutes
        patient = Patient(env, urgency, department, service_time)
        department.add_patient(patient)
        # Wait for the next patient to arrive based on the arrival rate
        yield env.timeout(random.expovariate(arrival_rate))
# Function to reset the departments' state between simulation episodes
def reset_departments(departments):
    for department in departments:
        department.queue.clear()
        department.patient_ids.clear()
        department.served_patients.clear()

# Function to run one episode of the simulation and return the average wait time
def run_episode(env, rl_agent, departments, simulation_time):
    # Start the patient generator process
    env.process(patient_generator(env, departments, arrival_rate=0.3))
    # Run the simulation until simulation_time
    env.run(until=simulation_time)
    # Afterwards, keep running until all remaining patients are served
    while any(len(dept.queue) > 0 for dept in departments):
        env.run(until=env.now + 1)
    # Calculate the average wait time across all departments
    average_wait_times = [dept.get_average_wait_time() for dept in departments]
    return np.mean(average_wait_times)
# Function to set up the simulation environment and run multiple episodes
def run_simulation(episodes=100, simulation_time=540):
    global patient_counter
    first_round_avg = 0
    last_round_avg = 0
    for episode in range(episodes):
        print(f"\nStarting episode {episode + 1}")
        # Reset the patient counter at the start of each episode
        patient_counter = 0
        env = simpy.Environment()
        rl_agent = RLAgent(departments=[])
        departments = [Department(env, f"Department {i + 1}", rl_agent) for i in range(3)]
        rl_agent.departments = departments
        # Run one episode and record the average wait time
        avg_wait_time = run_episode(env, rl_agent, departments, simulation_time)
        print(f"Average wait time for episode {episode + 1}: {avg_wait_time}")
        # Store the average wait time for the first and last episodes
        if episode == 0:
            first_round_avg = avg_wait_time
        if episode == episodes - 1:
            last_round_avg = avg_wait_time
        # Reset the departments for the next episode
        reset_departments(departments)
    # Print the average wait times for the first and last episodes
    print(f"\nAverage wait time in the first episode: {first_round_avg}")
    print(f"Average wait time in the last episode: {last_round_avg}")

# Run the simulation
if __name__ == "__main__":
    run_simulation(episodes=100, simulation_time=540)
Similar combinations of simulation and reinforcement learning are gradually being tested in many domains. This article only briefly introduces the two concepts; more details will follow in subsequent articles.
Join X-Lab on Discord!
We've launched a new discussion channel called "X-Lab" for anyone interested in exploring related topics. This session focuses on Discrete-Event Simulation Combined with Reinforcement Learning.
Click the link to join our Discord and participate in the discussion: https://discord.gg/EDdmCKuPkb
We look forward to seeing you there and diving into exciting discussions!