Multi-category patient scheduling in a diagnostic facility using Markov decision process


A properly designed multi-category patient scheduling strategy can boost service quality, operational efficiency, health outcomes, and profit. Our goal is to maximize the expected net revenue, minimize the average waiting time of patients, and minimize the average number of patients who are not scanned by the end of the work-day.

Categories of patients

The facility serves several categories of patients: Critical Emergency Patients (CEPs), Inpatients (IPs), Outpatients (OPs), and Unscheduled Outpatients (UOPs).


Scheduling challenges

In many hospitals nationwide, a limited number of CT scanners and other equipment can lead to longer patient stays and patient dissatisfaction.

Let's consider the following scenarios: if a Critical Emergency Patient (CEP) needs a CT scan, they will be scheduled immediately, thus causing delays for all other waiting patients. In the absence of a CEP, other Inpatients (IPs) and Outpatients (OPs) will be scanned based entirely on their allotted time-slot. If an OP is scheduled first, this may result in an extra day of hospitalization for an IP who may not be scanned by the end of the work day. If an IP is scheduled first instead, this may cause perceived loss of service-quality for an OP. 

We have described the challenges that may arise in scheduling multi-category patients. Such challenges become even more complicated with the rapid progress of imaging technology and the growing demand for imaging services. Multi-category patient scheduling plays an important role not only for hospitals but for all service-related businesses.

Data sets

The data set we will use is from a hospital with 2 CT scanners, staffed between 10 am and 8 pm, with an average time-slot of 30 minutes.
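From these schedule parameters alone we can derive the number of decision epochs in a work-day. A minimal sketch (the constants come straight from the data set described above):

```python
from datetime import timedelta

# Facility parameters from the data set: 2 CT scanners,
# staffed 10am-8pm, with an average time-slot of 30 minutes.
NUM_SCANNERS = 2
WORKDAY = timedelta(hours=10)   # 10am to 8pm
SLOT = timedelta(minutes=30)

slots_per_scanner = WORKDAY // SLOT          # decision epochs per scanner
total_slots = NUM_SCANNERS * slots_per_scanner

print(slots_per_scanner, total_slots)  # 20 slots per scanner, 40 in total
```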

Resource Schedule. 

The costs and revenues attributed to this data set are summarized in Table 2.

Table 2: Monetary values of parameters for different patient categories.


Markov Decision Process

We can model this data set with a Markov Decision Process (MDP). The working procedure of an MDP is shown in the subsequent figure.

MDP: Markov Decision Process Tree

An MDP contains an agent that learns about an environment by taking actions. Actions transition the agent from one state to another, and each action generates a reward or cost. The MDP learns a policy that maximizes not just the immediate reward but the total expected reward.

In our example, an action is applied at each CT scanner time-slot in the work-day in order to find an optimal policy for the multi-category patient scheduling problem.
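One way to encode this is to make the state the current time-slot plus the queue lengths of each waiting category, and an action the assignment of a category (or idle) to each of the two scanners. This encoding is an assumption for illustration, not taken from the data set; CEPs are excluded from the decision because, as described above, they preempt the schedule immediately.

```python
from itertools import product
from typing import NamedTuple

class State(NamedTuple):
    slot: int   # current time-slot index in the work-day
    ips: int    # in-patients waiting
    uops: int   # out-patients waiting

# An action assigns a patient category (or idle) to each of the two scanners.
CATEGORIES = ("IP", "UOP", "idle")

def feasible_actions(s: State):
    """Enumerate (scanner1, scanner2) assignments that don't exceed the queue."""
    acts = []
    for a in product(CATEGORIES, repeat=2):
        if a.count("IP") <= s.ips and a.count("UOP") <= s.uops:
            acts.append(a)
    return acts
```

With an empty queue the only feasible action is to leave both scanners idle; with one IP and one UOP waiting, seven assignments are feasible.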

To do this we use a value iteration algorithm, which assigns an action to each state in a way that maximizes the total expected net revenue over a work-day.

This is solved by iteratively calculating the following:

V_{i+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V_i(s') \right]

where P(s' | s, a) is the probability of transitioning from state s to s' under action a, R is the reward, gamma is a discount factor (greater than 0 but less than 1), and V_i(s) is the value of state s in iteration i. The values eventually converge to a fixed point of the Bellman optimality equation, from which the value of each state can be read off.
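The iterative update above can be sketched in a few lines. This is a generic value iteration routine, not the paper's implementation; the dict-based layout of P and R is an assumption for illustration.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Generic value iteration.

    P[s][a] -> list of (s_next, prob) transition pairs.
    R[s][a] -> expected reward for taking action a in state s.
    actions(s) -> iterable of actions feasible in state s.
    Returns the converged value function and a greedy policy.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best action value using current estimates.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {
        s: max(actions(s),
               key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
        for s in states
    }
    return V, policy
```

On a toy two-state chain (one rewarding transition into an absorbing state), the routine recovers the expected values exactly.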

For our data set we already have the probabilities and costs of the particular actions (corresponding to who to schedule on the two scanners), so we just need to feed these into the algorithm. The agent in this case corresponds to the person scheduling patients on both scanners.


The optimal policy for the varying possible numbers of IPs and UOPs is shown below. The value in the ith row and jth column of the table represents the optimal schedule for the two CT scanners, given i IPs and j UOPs.

Table3: Solved output of the possible decisions. 

For example, if there are 2 UOPs and 1 IP in the queue, you would schedule a UOP and an IP for the two scanners.

Following this policy, we ran a simulation of 5 working days for net revenue and wait-times, with random numbers of UOPs and IPs generated over the course of the time period. This leads to an optimized solution for the wait time, shown below.
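A minimal sketch of such a simulation is shown below: arrivals are drawn at random each slot, the policy table is consulted for the current queue lengths, and waits are tallied. The per-slot arrival probabilities and the policy fallback are illustrative assumptions, not values from the data set.

```python
import random

def simulate_day(policy, slots=20, p_ip=0.3, p_uop=0.5, seed=None):
    """Simulate one work-day under a policy table.

    policy[(ips, uops)] -> (scanner1, scanner2) assignment, as in Table 3.
    Arrival probabilities p_ip / p_uop per slot are illustrative only.
    """
    rng = random.Random(seed)
    ips = uops = 0
    total_wait = 0
    for _ in range(slots):
        # Random arrivals at the start of the slot.
        ips += rng.random() < p_ip
        uops += rng.random() < p_uop
        # Look up the policy for the current queue; idle if unspecified.
        a = policy.get((ips, uops), ("idle", "idle"))
        ips -= a.count("IP")
        uops -= a.count("UOP")
        total_wait += ips + uops    # patients still waiting this slot
    return total_wait, ips, uops   # accumulated waits, patients unscanned
```

Running this for 5 days with the policy from Table 3 would reproduce the kind of wait-time curve shown below; with one scanner kept busy per arriving IP, for instance, no patient ever carries over to the next slot.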

Simple representation of outcome. 

Final Thoughts

Even such a simple scheduling algorithm is applicable to a wide variety of fields and industries. In this article we showed the power of a Markov decision process for finding optimal policies for scheduling multi-category patients. A very similar analysis, with minimal adjustments, could be used to solve many other scheduling problems, such as job and appointment scheduling.