## Probabilities, distributions, measurements and expected errors

First we assume that the possible outcomes of an observation form a finite set of values. For instance, let us assume that the time the prof takes to mark an exam is either 1, 2, 3 or 4 hours (s/he does not count minutes). If the probability of each value is the same (namely 0.25 = 1/4), we say that the marking time has a uniform distribution over the four values 1 through 4.

What would be the distribution of the total time required by this prof to mark two exams? - Answer: the values 2 and 8 have probability (1/16), the values 3 and 7 have probability 2 * (1/16), the values 4 and 6 have probability 3 * (1/16), and the value 5 has probability 4 * (1/16). As more exams are added, the distribution of the total approaches a Normal distribution.
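These probabilities can be checked by enumerating all 16 equally likely pairs of marking times, as in this minimal Python sketch:

```python
from collections import Counter
from fractions import Fraction

# Enumerate all equally likely pairs of marking times (1..4 hours each)
# and count how often each total occurs.
counts = Counter(a + b for a in range(1, 5) for b in range(1, 5))
probs = {total: Fraction(n, 16) for total, n in sorted(counts.items())}
print(probs)  # {2: 1/16, 3: 1/8, 4: 3/16, 5: 1/4, 6: 3/16, 7: 1/8, 8: 1/16}
```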

The average of a distribution over a set of values Vi (i = 1 ... N) is equal to the Sum over all i of Vi * Pi, where Pi is the probability that Vi occurs. Note that the values of the probabilities Pi are always normalized such that their sum is equal to 1 (one).

In the case that the set of values is infinite, for instance all real values x between zero and infinity, the average is defined as the integral over x * Px * dx where Px * dx is the probability that a value between x and (x + dx) occurs (dx is a differential - very small).

Two important numbers are defined by a distribution: the average Av and the standard deviation SD. The standard deviation is the square root of (Sum over all i of Pi * (Av - Vi)**2).
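For the four-hour marking example above, the average and standard deviation can be computed directly from these weighted sums (a minimal sketch):

```python
import math

values = [1, 2, 3, 4]              # possible marking times (hours)
probs = [0.25, 0.25, 0.25, 0.25]   # uniform distribution, sums to 1

# Average: sum of Vi * Pi
avg = sum(v * p for v, p in zip(values, probs))
# Standard deviation: square root of sum of Pi * (Av - Vi)**2
sd = math.sqrt(sum(p * (avg - v) ** 2 for v, p in zip(values, probs)))
print(avg, sd)  # 2.5 and sqrt(1.25) ≈ 1.118
```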

Performance models: Performance models are introduced to model the performance aspects of a system. Often one is interested in the response time of a system. Assume that the response time does not depend on the particular request coming from the user, and that the next request is sent only once the answer to the previous request has been given by the system. Then the performance of the system can be characterized by a distribution over the real values between zero and infinity which indicates the probability of obtaining a response time between a value x and (x + dx).

Expected errors of measurements: The average response time (in the limit of a very large number of experiments) will be equal to the average of the distribution. However, we would like to determine the average response time by making only a few experiments, say N. Since the N response-time values that we measure are randomly selected (with a probability corresponding to the distribution), we can expect that the average calculated from these N values will not be exactly equal to the "real" average of the distribution. The difference is the expected error of the calculated average. This means that, if we do several experiments with N measurements each, we will get different values for the average. The average from N measurements therefore has a certain distribution. It has been shown that the average (over very many experiments of N measurements) of the measured averages is equal to the average of the "real" distribution, and that the standard deviation of the measured averages is equal to the standard deviation of the "real" distribution divided by the square root of N (see Wikipedia - "Standard error of the mean" - see also Standard Deviation). This means that the expected error of the average is smaller than the standard deviation (the expected error of a single measurement) by a factor of 1 over the square root of N.
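The 1 over square-root-of-N behaviour of the expected error can be observed in a small simulation. The sketch below assumes exponentially distributed response times with a "real" average of 1 (an arbitrary choice for illustration) and measures how the spread of the calculated averages shrinks as N grows:

```python
import random
import statistics

random.seed(42)

def measured_average(n):
    """Average of n simulated response times (exponential, true average 1)."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# Repeat the N-measurement experiment many times and look at the
# spread (standard deviation) of the resulting averages.
for n in (10, 100):
    averages = [measured_average(n) for _ in range(2000)]
    print(n, statistics.stdev(averages))  # shrinks roughly like 1/sqrt(n)
```

Going from N = 10 to N = 100 reduces the spread by roughly a factor of sqrt(10) ≈ 3.16, as the rule predicts.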

Expected statistical errors in histograms: A histogram is an array that accumulates, for each bin, the number of event occurrences whose value falls into that bin. If the values of the events depend on some statistically varying process, then the number of events falling into a particular bin also has statistical variations. The expected amount of variation of this number is equal to the square root of the expected average number for that bin. This is confirmed by the following experiment.

Experience with errors due to statistical variations can be obtained through simulations. Consider a simulation program that builds a histogram of inter-arrival times for a Poisson arrival process. The measured number of occurrences in a given bin of inter-arrival time can be compared with the calculated expected number of occurrences. The statistical variations of the measured values correspond to the expected variations according to the above rule. Different statistical variations are obtained when different seed values are used for the random number generator.
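A minimal sketch of such a simulation (the rate, bin width, and event count are arbitrary illustrative choices):

```python
import math
import random

random.seed(1)  # a different seed gives different statistical variations

RATE = 1.0        # arrival rate alpha of the Poisson process
N_EVENTS = 10000  # number of inter-arrival times to sample
BIN_WIDTH = 0.5
N_BINS = 10

# Build the histogram of inter-arrival times.
histogram = [0] * N_BINS
for _ in range(N_EVENTS):
    t = random.expovariate(RATE)   # exponential inter-arrival time
    b = int(t / BIN_WIDTH)
    if b < N_BINS:
        histogram[b] += 1

# Compare each bin with its expected count; the deviation should be
# of the order of sqrt(expected).
for b, measured in enumerate(histogram):
    lo, hi = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
    expected = N_EVENTS * (math.exp(-RATE * lo) - math.exp(-RATE * hi))
    print(f"bin {b}: measured {measured}, "
          f"expected {expected:.0f} +/- {math.sqrt(expected):.0f}")
```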

## Queuing models

### Distribution of request inter-arrival times

The problem: Let us assume that we want to model a system consisting of many users and a server, and we are interested in the response time of the server, which is the time from the moment that a user sends a request to the server to the moment when the server has completed its answer (we ignore any communication delays).

Now, let us assume that the service time of the server is a constant (1 second), that is, it always takes exactly 1 second to provide an answer to a request. It is assumed that the server works on one request at a time (no multi-threading). This means the distribution of the service time is a delta-function with a peak at t = 1 second.

If the users organize themselves in such a way that the next request will be sent to the server only after the last answer was received, then the response time will always be equal to the service time. However, if the users do not coordinate their requests, it may happen that a user sends a request before the last answer was produced. This means that the server is busy when the request arrives and the request has to wait (in some kind of input queue) before it will be processed. Therefore the response time, in this case, will be larger than the service time (service time plus waiting time). We see that the arrival pattern of the requests has an impact on the average response time of the system.

Poisson inter-arrival pattern: It can be shown that, if there are very many users without any coordination among them, the inter-arrival time (that is, the time between the arrival of one request up to the arrival of the next request) has an exponential distribution of the form Pt = alpha * exp(- alpha * t) where alpha is the arrival rate. In fact, the average of this inter-arrival distribution is 1/alpha which is the average time between two consecutive requests. We see that in this case there is a good probability that the inter-arrival time is much smaller than the average. Therefore it would be interesting to determine what the average waiting time of requests in the server input queue would be. An answer is given by the queuing models.
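The claim that there is a good probability of inter-arrival times much smaller than the average can be checked numerically: for an exponential distribution, the probability of a value below the average 1/alpha is 1 - exp(-1), about 63%. A minimal sketch (alpha = 2 is an arbitrary illustrative choice):

```python
import random

random.seed(7)
ALPHA = 2.0   # arrival rate: average inter-arrival time is 1/ALPHA = 0.5

samples = [random.expovariate(ALPHA) for _ in range(100000)]

mean = sum(samples) / len(samples)
below = sum(1 for t in samples if t < 1 / ALPHA) / len(samples)
print(mean)   # close to 1/alpha = 0.5
print(below)  # close to 1 - exp(-1) ≈ 0.632
```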

Important conclusion: One cannot say what the response time of a server is without knowing the inter-arrival pattern of the requests. The performance guarantee of the server depends on the hypothesis about its environment (which determines the distribution of incoming requests).

### Queuing models

In the simplest case, a queuing model represents a system where service requests arrive randomly (Poisson arrival process - with an exponential inter-arrival time distribution) and a shared resource is needed for performing the service; if the resource is occupied serving one request, subsequent requests must wait in a queue; the time the resource is required for serving a request (the service time) is supposed to be known (either fixed value or a given time distribution). Simple analytical formulas exist that may be used to calculate the average waiting time in the queue, the total time a service request remains in the system (this is the response time), the average length of the queue, etc. (for an overview, see "slides on Queuing Models").

For the case of a single server with a queue and with Poisson arrival and exponential service time (so-called M/M/1 queue), we have the following formulas:

• average time spent in the system (service plus waiting time) Tr = Ts / (1- ρ) where Ts is the average service time and ρ is the utilization (average service time over average inter-arrival time = percentage of time that the server is in use);
• average number of requests in the system (waiting plus being served) r = ρ / (1- ρ)
• average number of requests waiting in the queue w = ρ**2 / (1- ρ) (see simulation results on actual histories of queue lengths)
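These formulas are straightforward to evaluate; here is a minimal helper (the function name and return structure are illustrative choices):

```python
def mm1_metrics(service_time, inter_arrival_time):
    """Average M/M/1 metrics for given average service and inter-arrival times."""
    rho = service_time / inter_arrival_time   # utilization, must be < 1
    if rho >= 1:
        raise ValueError("server overloaded: utilization >= 1")
    return {
        "utilization": rho,
        "response_time": service_time / (1 - rho),  # Tr = Ts / (1 - rho)
        "in_system": rho / (1 - rho),               # r  = rho / (1 - rho)
        "in_queue": rho ** 2 / (1 - rho),           # w  = rho^2 / (1 - rho)
    }

# Example: service time 1 s, requests every 2 s on average -> rho = 0.5,
# so the average response time is 2 s (twice the service time).
print(mm1_metrics(1.0, 2.0))
```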

The factor 1 / (1- ρ) is typical of most queuing formulas. It shows that the response time and queue length blow up when the service time approaches the inter-arrival time. If it reaches the inter-arrival time or becomes larger, the system is completely overloaded and the average queue length increases without bound.

Assumptions: The main assumptions for the formulas of the queuing models are the following (they are not always satisfied - but the formulas often still provide good approximations):

1. independent arrival of individual requests (Poisson arrival) - not satisfied when bursty arrivals (partly self-similar) are considered, or arrivals at regular intervals;
2. the distribution of service times may not be exponential - various other distributions can be considered.

Example: Assume a constant service time equal to 90% of the average inter-arrival time (utilization ρ = 0.9). If the requests arrive at regular time intervals, each request will be processed before the next one arrives - there will be no queuing delay. If the requests arrive independently (according to a Poisson arrival process), then the above formula shows that the average response time will be 10 times larger than the service time - a queuing delay 9 times larger than the service time.

## Simulation programs: (a) one thread per active object - or (b) a program with future event list

Simulation models: When queuing models are not easily applicable (because the underlying assumptions are not met - or the system is more complicated), one may build simulation models. They can be quite detailed, depending on what aspects are important for the modeling. However, they may also require much CPU power for obtaining statistically valid results. See Lab 11.

There are basically two approaches to writing a simulation program:

1. (Intuitive) The program contains multiple threads (one for each active object being simulated) - each thread will execute a sleep operation whenever the simulated object performs some action that takes some time in real life (but does not need to be simulated in detail). One normally uses a certain acceleration factor x: if the time to perform an action in the real world is T, then the sleep time would be t = T / x.
2. (Professional) The simulation program is a sequential program (a single thread) using a queue of future events - the events are ordered according to the future time when they should be performed. See the description of the program and the program itself (zip-file) given in Lab 11.

These two approaches use two very different programming structures. Here are some comments:

• The first approach uses a structure where actions are performed by threads, and when an action takes some time the thread calls the sleep primitive to obtain a corresponding delay in real time. The real time of the program execution corresponds to the real time in the modeled system, up to a certain factor (the acceleration factor x). So one second of execution time may correspond to a one-hour time period in the real (simulated) world. The simulation program must be written such that the execution of the actions by the threads corresponds to the scheduling of actions in the simulated world. Semaphores may be used to control the access to shared resources with mutual exclusion.
• In the second approach, there is no relationship between the real time of the execution of the simulation program and the time of the simulated system (called simulated time). The simulated time is stored in a variable (sometimes called now). When no event in the future event list remains to be processed at the current value of the simulated time, the now variable is advanced to the simulated time at which the next event in the future event list should be processed. Since there are no concurrent threads in the simulation program, there are no problems of concurrency. One can use normal variables to store the status of shared resources, since each action in the simulated system is associated with an event, and this action is executed in mutual exclusion with any other event of the simulation.
• In both programs, special attention must be given to the collection of statistical information in order to obtain enough information about the performance of the simulated system.
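The future-event-list approach can be illustrated with a minimal single-threaded simulation of an M/M/1 queue (a sketch, not the Lab 11 program): events are kept in a priority queue ordered by simulated time, and the now variable jumps from one event to the next. The collected statistic (the time-average number of requests in the system) can be compared with the formula r = ρ / (1 - ρ).

```python
import heapq
import random

random.seed(3)

ARRIVAL_RATE = 0.5    # requests per time unit -> utilization rho = 0.5
SERVICE_RATE = 1.0    # services per time unit
END_TIME = 100000.0   # length of the simulated period

now = 0.0
queue_len = 0         # requests in the system (waiting + in service)
events = [(random.expovariate(ARRIVAL_RATE), "arrival")]  # future event list
area = 0.0            # time-integral of queue_len, for the time average
last = 0.0            # simulated time of the previously processed event

while events:
    t, kind = heapq.heappop(events)      # next event in simulated time order
    if t > END_TIME:
        break
    area += queue_len * (t - last)       # accumulate statistics
    now = last = t                       # advance the simulated time
    if kind == "arrival":
        queue_len += 1
        # Schedule the next arrival (Poisson process).
        heapq.heappush(events, (now + random.expovariate(ARRIVAL_RATE), "arrival"))
        if queue_len == 1:               # server was idle: start service now
            heapq.heappush(events, (now + random.expovariate(SERVICE_RATE), "departure"))
    else:                                # departure: service completed
        queue_len -= 1
        if queue_len > 0:                # start serving the next waiting request
            heapq.heappush(events, (now + random.expovariate(SERVICE_RATE), "departure"))

print(area / last)   # average number in system, close to rho/(1-rho) = 1
```

Note that shared state (queue_len) is read and updated with plain variables: each event is processed to completion before the next one, so no locking is needed.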


## Other implementation choices (more general view)

Last update: March 24, 2015