SEG-2106 "Software Construction"  
 
 
  Chapter 3-1: Concurrency 
 
Additional readings: 
  - [Sebesta] chapter 12 (see printed course notes): please read Sebesta 12.1 (Introduction), 12.2 (Introduction to subprogram-level concurrency), 12.3 (Semaphores), 12.4 (Monitors), and 12.7 (Java threads).
Hardware and software concurrency
Physical concurrency
A single CPU of a computer executes one instruction after the
  other. There is no concurrency. If we have several CPUs running
  in parallel (either within the same computer or in separate
  computers) we have some form of concurrency (also called
  parallelism). Concurrency realized at the hardware level is
  called physical concurrency. Physical concurrency is related to
  parallel computer architectures: 
  - A normal computer contains at least one CPU and several peripheral processing units that look after input/output with the external world, such as video display, sound input and output, disk access, etc. The computer clock, too, is a separate hardware unit that runs concurrently with the CPU and is used to control the scheduler of the operating system, which allocates time slots to the different programs running on the computer (see below).
- Various schemes have been
    designed to build hardware architectures with multiple CPUs. The purpose is to 
    efficiently execute various computing tasks in parallel. They
    can be classified as follows:
    
      - SIMD (single instruction - multiple data): These are
        computers where multiple CPUs work on different data, but
        all execute the same sequence of instructions. Vector
        processors belong to this class.
- MIMD (multiple instructions - multiple data): Here the
        different CPUs execute different sequences of
        instructions (and work on different sets of data). They
      can be further classified as    
        
          - shared memory architectures: Here all CPUs
            access the same memory (it is easy to share
            data between the different processes executed
            by the CPUs)
- distributed architectures: Here each CPU has
            its own (local) memory (in a sense, each CPU
            is a more or less complete computer); sharing
            of data between the processes executed by the
            different CPUs is less efficient (usually
            realized over a network connecting the
            different CPUs).
 
 
- Concurrency at the level of instruction execution: This is realized when a single
    CPU, which consists of several components operating in parallel,
    is organized in such a way that several
    instructions are processed in parallel,
    usually in some form of pipeline processing where each
    instruction goes through the different components, each
  doing a specific aspect of the instruction processing, such as (1) fetching the next instruction, (2) fetching the operands, (3) performing an arithmetic operation.
Concurrency in software
The book by Sebesta mentions the following  levels of
  concurrency:
  - Instruction concurrency: this is actually concurrency at the hardware level, called "Concurrency at the level of instruction execution" above.
- Statement concurrency: This means that several statements
    of a program are executed concurrently on some physical
    parallel computer architecture. Usually, one considers a
    conventional (sequential - non-concurrent) programming
    language, such as Fortran or C. Certain optimizing
    compilers can automatically determine which statements
    are independent of one another and can be executed
    concurrently without changing the result of the program
    execution. They can also take the existing parallel
    hardware architecture into account (on which the program
    is supposed to be executed) in order to generate code for
    this hardware architecture which provides for concurrent
    statement execution. Note: The programmer is normally not much involved in this kind of concurrency, since it is handled automatically by the optimizing compiler.
- Unit (or sub-program) concurrency: Here one talks about concurrent
    "units", which are defined by the programming
    language and are intended to be executed concurrently.
    Programming languages with concurrent "units"
    allow the programmer to design programs that include a
    number of concurrent activities; the concurrency
    structure of a program depends very much on the
    application. A typical application area where concurrent
    "units" are very useful is simulation of
    process control or transportation systems, where the
    simulated objects evolve in real time concurrently. This
    level of concurrency is also called logical concurrency.
    In contrast to physical concurrency, it does not imply
    that the logically concurrent "units" are
    really executed in physical concurrency. Often they are
    executed in an interleaved manner (time-sharing, see
    below). Note: Instead of "unit", one often uses
    the words process, thread, task, or simply concurrent
    statements. Note: This kind of concurrency must be understood by the programmer that uses the programming language that supports concurrent threads (e.g. Java). In this course, we are mainly interested in this kind of concurrency.
- Program (or task) concurrency: Often several programs are executed
    concurrently (in an interleaved manner) on a computer
    within a time-sharing operating system. The CPU of the
    computer is shared among the programs by allocating the
    CPU to each program for a short time slice at a time. The
    allocation is done by a scheduler (see Sebesta's book).
    The same technique of time sharing is usually also used
    for realizing concurrent "units" within a given
    program (see above).
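As an illustration of unit-level (logical) concurrency, here is a minimal Java sketch (the class name TwoUnits and the printed messages are just illustrative): two threads are started, and their printed lines may be interleaved differently from one run to the next.

    // Minimal sketch: two concurrent "units" realized as Java threads.
    // The interleaving of the printed lines is not determined in advance.
    public class TwoUnits {
        public static void main(String[] args) throws InterruptedException {
            Runnable unit = () -> {
                for (int i = 0; i < 3; i++) {
                    System.out.println(Thread.currentThread().getName() + " step " + i);
                }
            };
            Thread a = new Thread(unit, "unit-A");
            Thread b = new Thread(unit, "unit-B");
            a.start();   // from here on, the two units run logically concurrently
            b.start();
            a.join();    // the main thread waits for both units to terminate
            b.join();
        }
    }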
The sharing of a CPU among several concurrent threads or programs: Each operating system has a scheduler which allocates short time periods of the CPU to the different active programs running on the computer. At the end of such a time period, it determines which program should obtain the CPU in the next time period. A similar scheduler is included in each program that supports several concurrent threads; this thread scheduler determines how the time period allocated by the operating system scheduler to the program is shared among the threads within that program. For instance, the Java Virtual Machine (JVM) running a Java program has a scheduler to share the CPU among the threads of the Java program (if it contains threads in addition to the main thread). 
Note that the behavior of a scheduler is not always defined in all details; e.g. for the thread scheduler in a Java virtual machine, not all aspects are defined. Normally, one distinguishes the following states for a process (thread, task, ...) that is scheduled by the scheduler (see diagram below; a small Java sketch showing some of these states follows the list):
- New: the process was just created 
- Running: the process is allocated to a processor (e.g. CPU) that executes the instructions of the process 
- Ready: the process is ready to be executed, but no processor is available. When a processor terminates the execution of a process (either because the time slice has elapsed, or because the process has to wait for some input/output operation to terminate), the processor will select another process from the "ready queue" to be executed. If there are several processes waiting in the ready state, it may not be defined which one will be selected, although priorities (if defined) must be taken into account. 
- Blocked: the process waits for an input/output operation to be completed before getting into the Ready state. 
- Terminated: the process has no more instructions to be executed. 
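The following small Java sketch shows how some of these states can be observed for a Java thread. Note that Java's Thread.State does not correspond one-to-one to the states above: Ready and Running are both reported as RUNNABLE, while BLOCKED, WAITING and TIMED_WAITING correspond to different kinds of waiting (the class name StateDemo is illustrative).

    // Sketch: observing some scheduling states of a Java thread.
    public class StateDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(100);        // the thread is TIMED_WAITING here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.println(t.getState()); // NEW: just created, not yet started
            t.start();
            System.out.println(t.getState()); // typically RUNNABLE or TIMED_WAITING
            t.join();                         // wait for the thread to terminate
            System.out.println(t.getState()); // TERMINATED
        }
    }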

Dependencies between concurrent processes
Different kinds of dependencies
In the simplest case, concurrent processes perform their
  processing independently from one another. But often they have to
  coordinate their work, which is done through interprocess
  communication and synchronization. One may classify their
  interrelationship as follows:
  - Disjoint processes: They are independent, no
    synchronization (except, possibly, through their
    time-shared execution by the same CPU; but this is not
    seen at the logical level of the program which contains
    these processes).
- Competitive synchronization: They are logically
    independent of one another, but they share some common
    resource (which is also described by the concurrent
    program which contains these processes). Shared resources
    usually require some relative synchronization of the
    accesses performed by different processes (typical
    example: readers and writers accessing a shared database). 
- Cooperative synchronization: The logical structure of the
    processing of one process depends on information obtained
    from another process. A typical example is the
    "producer - consumer" system, which often
    contains a queue which passes objects (messages) from the
    producer to the consumer (see the "Monitors and
    concurrency in Java" topic explained in sections
    12.4 and 12.7 of Sebesta's book).
Note: Competitive and cooperative synchronization and
  communication can be realized through message passing primitives
  or through shared variables (including some basic synchronization
  primitive that performs, as one atomic operation, the testing and
  (conditional) updating of a variable). Such primitives can be
  provided by an operating system for communication between
  concurrently executing programs, or by the hardware of a
  concurrent computer architecture. Examples of synchronization
  primitives for shared variables are a machine instruction
  "test and set" and the semaphore concept described in
  Section 12.3 in Sebesta's book. 
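In Java, the effect of a "test and set" instruction can be obtained with the atomic compareAndSet operation of java.util.concurrent.atomic.AtomicBoolean. The following is a minimal sketch of a spin lock built on such an atomic test-and-update (not the mechanism recommended for Java programs, which is discussed further below):

    import java.util.concurrent.atomic.AtomicBoolean;

    // Sketch of a spin lock based on an atomic "test and set"-like operation.
    // compareAndSet(false, true) atomically tests the flag and sets it, which is
    // the essential property of the hardware "test and set" instruction.
    public class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void lock() {
            while (!locked.compareAndSet(false, true)) {
                // busy-wait until the flag was false and we managed to set it to true
            }
        }

        public void unlock() {
            locked.set(false);   // release: another process may now succeed
        }
    }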
The non-deterministic nature of concurrent systems (remembering what we discussed in Chapter 1-3)
Concurrent processes give rise to a large number of possible interaction sequences because the execution sequences of the different processes may be interleaved in many ways. In the case of independent concurrency (as for instance defined by parallel regions of UML State diagrams) one assumes by default that there is no interrelationship between the concurrent processes (no communication, no shared resources, no shared variables). This has the advantage that the behavior of each process can be analysed independently of the other processes. However, the number of possible overall execution sequences is the largest, because the actions of the different processes may be interleaved arbitrarily in the global execution trace.
An example is shown in the following diagrams, where (a) and (b) represent two independent processes, defined in the form of LTSs, and (c) shows the resulting global behavior (obtained by reachability analysis, which is simple in this case). In this example, one assumes that the LTS transitions take no time; therefore the execution of two transitions of two different LTSs can be assumed to be sequential, one after the other, and when the LTSs are independent and two transitions of different LTSs are enabled, their order of execution is not determined. There is therefore some non-determinism introduced by this form of concurrency. This is called interleaving semantics.
 
       
The main point is that one cannot predict which execution path will be taken by the interleaving concurrent processes. If, let us assume, there is some dependency between the two processes which leads to some possible difficulty when the transition z is taken from the global state (3, C), then it is very difficult to test the system to ensure that this dependency is correctly implemented: we cannot force the system to go through a particular path. It is worse if one does not know where a problem may reside in an implementation; the chance of executing a transition path that encounters the problem during normal testing is very small indeed. Conclusion: Concurrent systems are very difficult to test for correctness. Therefore one should try to verify their correctness by logical reasoning.
 Some notes on the
  "message passing" concept 
"Message passing" means that one process sends a
  message to another process and then continues its local
  processing. The message may take some time to get to the other
  process and may be stored in the input queue of the destination
  process if the latter is not immediately ready to receive the
  message. Then the message is received by the destination process,
  when the latter arrives at a point in its local processing where
  it is ready to receive messages. This is called asynchronous
  message passing (because sending and receiving do not occur at the same
  time, although the reception necessarily occurs after the sending). 
Asynchronous message passing implies the following
  synchronization between the two processes (both styles of operations are sketched in Java after this list): 
  - Blocking send and receive operations: The receiving process will be
    blocked if it arrives at the point where it may receive
    messages and no message is waiting. (The sender may get
    blocked if there is no room in the message queue between
    the sender and the receiver; however, in many cases, one
    assumes arbitrarily long queues, which means that the
    sender will never be blocked.)
- Non-blocking send and receive operations: In this case,
    the send and receive operations always return
    immediately, returning a status value which could
    indicate that no message has arrived at the receiver.
    Therefore the receiver may test whether a message is
    waiting and possibly do some other processing.
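Both styles can be seen with Java's java.util.concurrent.BlockingQueue, which can play the role of the message queue between the sender and the receiver: take() is a blocking receive, while poll() is a non-blocking receive that returns null when no message is waiting. A minimal sketch (class and message names are illustrative):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Sketch: blocking vs. non-blocking reception of messages with a Java queue.
    public class ReceiveStyles {
        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> inbox = new ArrayBlockingQueue<>(10); // bounded message queue

            // Non-blocking receive before anything was sent: returns immediately with null.
            String maybe = inbox.poll();
            System.out.println("poll() returned: " + maybe);            // null

            // Asynchronous send performed by another process (thread).
            new Thread(() -> inbox.offer("hello")).start();

            // Blocking receive: the receiver waits until a message is available.
            String msg = inbox.take();
            System.out.println("take() returned: " + msg);              // hello
        }
    }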
In the case of synchronous message passing (as for instance
  realized in the Ada language, see Sebesta's book, Sections 12.5.1
  through 12.5.3), one assumes that sending and receiving takes
  place at the same time (there is no need for an intermediate
  buffer). This is also called rendezvous and implies closer
  synchronization: the combined send-and-receive operation can only
  occur if both parties (the sending and receiving processes) are
  ready to do their part. Therefore the sending process may have to
  wait for the receiving process, or the receiving process may have
  to wait for the sending one (as in the case of asynchronous
  message passing). 
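A rendezvous-like handoff can be imitated in Java with java.util.concurrent.SynchronousQueue, a queue without any buffer: a put() only completes when another thread performs the matching take(). A minimal sketch (class and message names are illustrative):

    import java.util.concurrent.SynchronousQueue;

    // Sketch: a rendezvous-like handoff; put() and take() complete together.
    public class Rendezvous {
        public static void main(String[] args) throws InterruptedException {
            SynchronousQueue<String> channel = new SynchronousQueue<>();

            Thread sender = new Thread(() -> {
                try {
                    channel.put("report");    // blocks until the receiver is ready
                    System.out.println("sender: handoff completed");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            sender.start();

            Thread.sleep(200);                // the sender has to wait during this time
            String msg = channel.take();      // the rendezvous takes place here
            System.out.println("receiver got: " + msg);
            sender.join();
        }
    }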
Resource sharing
Need for mutual exclusion 
  - The example of two users simultaneously updating the salaries of employees (a Java sketch of this race is given after this list):
    
      - One administrator increments all salaries by 5% : salary = salary * 1.05;
- The other administrator adds a 100$ bonus to employee X: salary = salary + 100;
- What are the possible outcomes for employee X, if initially he had a salary of 5000$?
 
- Notion of transactions in database systems (possibly distributed). A transaction must be executed completely, or not at all. Several transactions may be executed concurrently, but must follow certain scheduling rules that ensure that the net result is equivalent to serial execution (serializable schedule). A transaction must leave the database in a consistent state. In case of a system failure, the effect of partially executed transactions must be "rolled back". These properties of transactions are called ACID properties (atomicity, consistency, isolation, durability).
- Example of money transfer: Transfer of 1000$ from account A in bank BankA to account B in BankB. The transaction requires access to both accounts, residing on two different computers. It should be atomic: execute completely or be aborted (even if one of the computers fails during the operations). The mutual exclusion should include both accounts; they both must be reserved before they can be used, and later freed. The resource for which mutually exclusive access is reserved may include many units, e.g. a whole database, or several databases.
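Here is a sketch of the salary example as a Java program without any mutual exclusion (the class name and the use of a static variable are just illustrative). Since each update is a non-atomic read-modify-write, the final value depends on the interleaving: 5350$ (raise then bonus), 5355$ (bonus then raise), or, if one update overwrites the other, 5250$ or 5100$.

    // Sketch of the salary update race: two administrators updating the same salary
    // concurrently, without mutual exclusion.
    public class SalaryRace {
        static double salary = 5000.0;

        public static void main(String[] args) throws InterruptedException {
            Thread raise = new Thread(() -> { salary = salary * 1.05; }); // +5%
            Thread bonus = new Thread(() -> { salary = salary + 100; });  // +100$
            raise.start();
            bonus.start();
            raise.join();
            bonus.join();
            System.out.println("final salary: " + salary);
        }
    }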
Resource reservation
Resource reservation is a scheme for realizing mutual exclusion; it may be realized through semaphores or monitors (both can be programmed in Java)
  
  - Historical note: Dijkstra (at the University of Eindhoven in Holland) constructed in 1965 THE operating system (probably the first multi-tasking operating system), and he had problems sharing the different computer resources (CPU, memory, disk, etc.) among the different concurrent applications. He programmed in machine language and noticed that, given the list of instructions available on the computer, it was impossible to realize reliable resource sharing among concurrent processes. He then proposed a new instruction, sometimes called "test and set", which he introduced as part of the concept of a semaphore. The semaphore is an object that supports two operations: P (for reserve: it tests whether the value of the semaphore is > 0; if that is the case, the value is decremented by one, otherwise the process has to wait) and V (for liberate: the value of the semaphore is incremented and one of the waiting processes (if there is one) is allowed to try its P operation again). An application must call the P operation before accessing the resource associated with the semaphore, and must execute the V operation when it does not need the resource any more. (A small sketch using Java's java.util.concurrent.Semaphore is given after the conclusions below.)
    
      - Conclusions:
        
        
          - A semaphore is a very simple primitive for programming and controlling concurrency
- One cannot control concurrency with operations (machine instructions) that are solely conceived for sequential programs
- However, much later, Lamport described the Bakery algorithm (see Wikipedia) that provides mutual exclusion based on individual read and write operations on shared variables (where the write operations must be atomic).
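Java provides the semaphore concept directly in its standard library: java.util.concurrent.Semaphore offers acquire() (corresponding to P) and release() (corresponding to V). A minimal sketch of mutual exclusion with a binary semaphore (class and method names other than those of Semaphore are illustrative):

    import java.util.concurrent.Semaphore;

    // Sketch: mutual exclusion with a binary semaphore (one permit).
    public class CriticalSection {
        private static final Semaphore mutex = new Semaphore(1);

        static void useSharedResource(String who) throws InterruptedException {
            mutex.acquire();                 // P: wait until the permit is available
            try {
                System.out.println(who + " is inside the critical section");
            } finally {
                mutex.release();             // V: give the permit back
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < 3; i++) {
                final String name = "process-" + i;
                new Thread(() -> {
                    try { useSharedResource(name); }
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }).start();
            }
        }
    }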
 
 
- More complex concurrency rules: Often one wants to describe more complex concurrency rules, beyond simple mutual exclusion. Examples are: (a) the readers-writers problem (where several readers may access the resource concurrently), and (b) the tutorial example of the dining philosophers proposed by Dijkstra (each philosopher needs to reserve the left and right fork in order to eat; there may be conflicts and the possibility of deadlocks due to a cyclic "wait-for" graph). A sketch of the code for a philosopher, who is initialized with the parameter i indicating his position at the table, is given below. 
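The original listing from the course page is not reproduced here; the following is a minimal sketch under the assumption that each fork is represented by a binary java.util.concurrent.Semaphore shared by the two neighbouring philosophers. This naive version reserves the left fork first and then the right one, and can therefore deadlock when every philosopher holds his left fork at the same time (a cyclic wait-for graph).

    import java.util.concurrent.Semaphore;

    // Sketch of a philosopher thread; forks[i] is the left fork of philosopher i.
    public class Philosopher extends Thread {
        private final int i;               // position at the table
        private final Semaphore[] forks;   // one binary semaphore per fork

        public Philosopher(int i, Semaphore[] forks) {
            this.i = i;
            this.forks = forks;
        }

        @Override
        public void run() {
            int left = i;
            int right = (i + 1) % forks.length;
            try {
                while (true) {
                    // think ...
                    forks[left].acquire();    // reserve the left fork
                    forks[right].acquire();   // reserve the right fork (possible deadlock here)
                    // eat ...
                    forks[right].release();
                    forks[left].release();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }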
- Deadlocks
    
      - Necessary conditions:
        
          - mutual exclusion
- hold and wait: while holding a given resource, a process may ask for another resource
- no preemption
- possibility of circular waiting chains
 
- Strategies for dealing with deadlocks
          
            - prevent: prevent the above conditions (at least one)
- avoid: organize the system such that no circular waiting will occur. One example of such a scheme is the so-called two-phase resource reservation: the resources are ordered, and each process must first reserve all resources required for its task, requesting the reservations in the order of the resources; afterwards, all resources must be released. (A small Java sketch of reserving resources in a fixed order is given after this list.)
- deadlock detection and recovery: this requires a mechanism to detect the occurrence of a deadlock and a scheme for undoing the actions of one of the processes involved in the deadlock (which may retry its actions later). In the case of transaction management, this means that one of the transactions involved in the deadlock cycle must be aborted.
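As a small illustration of avoiding circular waits by ordering the resources, here is a sketch of the money-transfer example in Java where the two accounts are always locked in the order of their (assumed) identifiers, whichever direction the transfer goes; two concurrent transfers can then never wait for each other in a cycle. The Account class and its fields are illustrative, not taken from the course notes.

    // Sketch: deadlock avoidance by reserving resources (accounts) in a fixed order.
    public class Transfer {
        static class Account {
            final int id;           // the global ordering criterion
            double balance;
            Account(int id, double balance) { this.id = id; this.balance = balance; }
        }

        static void transfer(Account from, Account to, double amount) {
            Account first  = (from.id < to.id) ? from : to;   // lock the smaller id first
            Account second = (from.id < to.id) ? to : from;
            synchronized (first) {
                synchronized (second) {
                    from.balance -= amount;
                    to.balance   += amount;
                }
            }
        }
    }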
 
 
The monitor concept
  - Historical note: 1967 Simula (the first OO language, defined by Dahl in Norway); 1972 Pascal (defined by Wirth in
    Zurich, Switzerland); 1976 Concurrent Pascal (defined by Brinch-Hansen,
    a Danish emigrant to the US): this language combines the simplicity of Pascal with the concept of Class from Simula and introduces the concept of controlling resource sharing for concurrent processes by monitors. 
- A monitor is an instance of a Class with the following properties:
    
      - The executions of concurrent calls of methods are mutually excluded in time (in Java, this is indicated by using the synchronized keyword) 
- The possibility of using within the body of methods the operations wait (temporarily suspending the mutual exclusion until some programmer-defined condition, depending on internal monitor variables and method parameters, is satisfied) and notify (executed just before terminating a method call; used to activate a waiting process, which will again check its waiting condition, which may have been modified by the process that executed the notify operation).
 
- A semaphore is a special case of a monitor (a sketch of a semaphore programmed as a Java monitor is given below).
- In Java, each object instance (which inherits from the Object class) can be used as a monitor; the keyword synchronized indicates the need for mutual exclusion for a given method, and the operations wait and notify (or notifyAll) are defined on any instance of Object. 
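Since a semaphore is a special case of a monitor, it can be programmed in Java with synchronized, wait and notify. A minimal sketch of a counting semaphore written as a Java monitor (the class and method names are illustrative):

    // Sketch: a counting semaphore programmed as a Java monitor.
    public class MonitorSemaphore {
        private int value;   // number of available permits

        public MonitorSemaphore(int initial) {
            this.value = initial;
        }

        // P operation: wait until the value is positive, then decrement it.
        public synchronized void p() throws InterruptedException {
            while (value == 0) {
                wait();      // suspends the mutual exclusion while waiting
            }
            value--;
        }

        // V operation: increment the value and wake up a waiting process.
        public synchronized void v() {
            value++;
            notify();        // a waiting process re-checks its condition
        }
    }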
Concurrency in Java (Please read in the book by Sebesta)
Examples: 
  - A FIFO queue with producer and consumer processes is given in the book by Sebesta (see copy (a) of the Queue class (note: there is a bug in the incrementation of the pointer variables, such as nextIn) and (b) of the producer and consumer classes that call an instance of Queue). At the end, there is the main program that creates instances of these objects and passes the Queue instance to the producer and consumer when they are constructed. (A corrected sketch of such a queue is given after this list.) 
- Note: A very similar example is provided by SUN's Java tutorial under the title "Guarded Blocks". 
- Here are some additional Java programs using semaphores and other kinds of monitors.
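Since Sebesta's Queue class is only referenced above, here is a corrected minimal sketch of such a bounded FIFO queue used as a monitor by a producer and a consumer; the nextIn/nextOut pointers are incremented modulo the buffer length (the detail that is buggy in the printed copy). The class names and buffer size are illustrative.

    // Sketch: a bounded FIFO queue (monitor) with a producer and a consumer thread.
    public class BoundedQueue {
        private final int[] buffer;
        private int nextIn = 0, nextOut = 0, count = 0;

        public BoundedQueue(int capacity) {
            buffer = new int[capacity];
        }

        public synchronized void deposit(int item) throws InterruptedException {
            while (count == buffer.length) {
                wait();                              // queue full: the producer waits
            }
            buffer[nextIn] = item;
            nextIn = (nextIn + 1) % buffer.length;   // wrap around correctly
            count++;
            notifyAll();                             // wake up a possibly waiting consumer
        }

        public synchronized int fetch() throws InterruptedException {
            while (count == 0) {
                wait();                              // queue empty: the consumer waits
            }
            int item = buffer[nextOut];
            nextOut = (nextOut + 1) % buffer.length;
            count--;
            notifyAll();                             // wake up a possibly waiting producer
            return item;
        }

        public static void main(String[] args) {
            BoundedQueue q = new BoundedQueue(5);
            new Thread(() -> {                       // producer
                try { for (int i = 0; i < 10; i++) q.deposit(i); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }).start();
            new Thread(() -> {                       // consumer
                try { for (int i = 0; i < 10; i++) System.out.println(q.fetch()); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }).start();
        }
    }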
Some additional questions:  
  
    - How could one implement the monitor concept of Java in the Java Virtual Machine (suppose you had to design such a machine)?
- Program the example of readers and writers in different versions (different versions of priorities).
   
Originally written: March 22, 2003; 
updated: March 19, 2004; typographical revision: April 4, 2005; translated to English: March 3, 2008; revised 2009, 2010, 2011, 2013