Introduction to distributed algorithms

CS version: SZZ - introduction to distributed algorithms

Basic notions, problems and solutions; distributed vs. centralized architecture and their differences. Petri nets – basic concepts. Basic principles of modeling with Petri nets. Relation between distributed and service systems. How was this mirrored in the lessons learnt from the Interim Project?

Distributed computing
[source: Wikipedia – Distributed computing, Distributed algorithms] The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area. The terms are nowadays used in a much wider sense, even when referring to autonomous processes that run on the same physical computer and interact with each other by message passing. While there is no single definition of a distributed system, the following defining properties are commonly used:
 * There are several autonomous computational entities, each of which has its own local memory.
 * The entities communicate with each other by message passing.
Other typical properties of distributed systems include the following:
 * The system has to tolerate failures in individual computers.
 * The structure of the system (network topology, network latency, number of computers) is not known in advance; the system may consist of different kinds of computers and network links, and it may change during the execution of a distributed program.
 * Each computer has only a limited, incomplete view of the system and may know only one part of the input.
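The two defining properties (private local memory, communication only by message passing) can be illustrated with a minimal sketch. All names (node_a, node_b, inbox_b) are made up for this example; the queues stand in for network links.

```python
import queue
import threading

# Two "nodes", each with private local memory; the only way to
# exchange information is by passing messages through a queue.
inbox_b = queue.Queue()
result = []

def node_a():
    local_x = 21            # local memory, invisible to node_b
    inbox_b.put(local_x)    # send a message to node_b

def node_b():
    msg = inbox_b.get()     # receive (blocks until a message arrives)
    result.append(msg * 2)  # node_b's own local computation

threads = [threading.Thread(target=node_a), threading.Thread(target=node_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)  # [42]
```

Note that node_b never reads node_a's variables directly; it can only learn about them through messages, which is the essential difference from shared-memory parallelism.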

Applications
There are two main reasons for using distributed systems: (1) the nature of the application may require the use of a communication network, for example when data is produced in one physical location and needed in another; (2) a distributed system can be beneficial for practical reasons: cost-efficient for a high level of performance (a cluster of low-end computers), reliable (no single point of failure), and easy to expand. Examples of distributed systems and applications of distributed computing include the following:
Telecommunication networks:
 * Telephone networks and cellular networks.
 * Computer networks such as the Internet.
 * Wireless sensor networks.
 * Routing algorithms.
Network applications:
 * World wide web and peer-to-peer networks.
 * Massively multiplayer online games and virtual reality communities.
 * Distributed databases and distributed database management systems.
 * Network file systems.
 * Distributed information processing systems such as banking systems and airline reservation systems.
Real-time process control:
 * Aircraft control systems.
 * Industrial control systems.
Parallel computation:
 * Scientific computing, including cluster computing, grid computing and various volunteer computing projects.
 * Distributed rendering in computer graphics.

Parallel or distributed computing?
The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them. The same system may be characterised both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel. Parallel computing may be seen as a particular tightly-coupled form of distributed computing, and distributed computing may be seen as a loosely-coupled form of parallel computing. Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
 * In parallel computing, all processors have access to a shared memory, which can be used to exchange information between processors.
 * In distributed computing, each processor has its own private memory (distributed memory); information is exchanged by passing messages between the processors.
 * Figure (a) is a schematic view of a typical distributed system; a graph in which each node (vertex) is a computer and each edge (line between two nodes) is a communication link.
 * Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links.
 * Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.

The complexity of distributed computations
Depending on the type of computation one is considering (sequential, parallel, distributed), the complexity may include the number of processor cycles (time), the number of processors, the number of messages that are sent, and various other quantities that relate to the resources upon which the computation's demands are heaviest. Most often, the measures are as follows:
 * Sequential: time and memory cells employed.
 * Parallel: time, shared memory, number of processors.
 * Distributed: time, communication demands (number of messages sent or number of bits transmitted), diameter of the network.

Distributed algorithms
Distributed algorithms are typically executed concurrently, with separate parts of the algorithm being run simultaneously on independent processors, and having limited information about what the other parts of the algorithm are doing. One of the major challenges in developing and implementing distributed algorithms is successfully coordinating the behavior of the independent parts of the algorithm in the face of processor failures and unreliable communications links. The choice of an appropriate distributed algorithm to solve a given problem depends on both the characteristics of the problem, and characteristics of the system the algorithm will run on such as the type and probability of processor or link failures, the kind of inter-process communication that can be performed, and the level of timing synchronization between separate processes.

Problems and Solutions
[sources: Wikipedia – Distributed computing, Distributed algorithms; Valmir C. Barbosa: An Introduction to Distributed Algorithms. The MIT Press. http://fi.muny.cz/data/IV100/An Introduction to Distributed Processing.pdf] Traditional computational problems take the perspective that we ask a question, a computer (or a distributed system) processes the question for a while, and then produces an answer and stops. However, there are also problems where we do not want the system to ever stop. Examples of such problems include the dining philosophers problem and other similar mutual exclusion problems, in which the distributed system is supposed to continuously coordinate the use of shared resources so that no conflicts or deadlocks occur. There are also fundamental challenges that are unique to distributed computing. The first example is challenges related to fault tolerance, such as consensus problems, Byzantine fault tolerance, and self-stabilisation.

Atomic commit
An atomic commit is an operation in which a set of distinct changes is applied as a single operation. If the atomic commit succeeds, all the changes have been applied; if there is a failure before the atomic commit can be completed, the "commit" is aborted and no changes are applied. Algorithms for solving the atomic commit problem include the two-phase commit protocol and the three-phase commit protocol.
Two-phase commit protocol
In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC) is a type of atomic commitment protocol. It is a distributed algorithm that coordinates all the processes participating in a distributed atomic transaction on whether to commit or abort (roll back) the transaction (it is a specialized type of consensus protocol). One node is designated the coordinator (the master site), and the rest of the nodes in the network are the cohorts (also called participants, or workers). The two phases of the protocol are:
 1. The commit-request phase: the coordinator sends a query-to-commit message to all cohorts; each cohort tries to execute the transaction and replies with an agreement message (votes Yes to commit) if it succeeded, or an abort message (votes No) if it failed.
 2. The commit phase: based on the cohorts' votes, the coordinator decides to commit the transaction (only if all voted Yes) or abort it (otherwise), and notifies all cohorts of the result. The cohorts then follow with the needed actions (commit or rollback) on their transactional resources.
Disadvantage: 2PC is a blocking protocol. If a cohort has sent an agreement message to the coordinator, it blocks until a commit or rollback is received; when the coordinator has sent the query-to-commit message to the cohorts, it blocks until all cohorts have sent their local decisions.
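The 2PC decision rule can be sketched in a few lines. This is a toy single-process simulation, not a real networked implementation; the names Cohort and coordinator_decide are made up for the example, and failure handling and blocking are deliberately omitted.

```python
# Toy sketch of the two-phase commit decision rule (no networking,
# no failures): commit only if every cohort votes Yes.

class Cohort:
    def __init__(self, can_commit):
        self.can_commit = can_commit   # whether the local transaction succeeded
        self.state = "working"

    def vote(self):
        # Phase 1: reply to the coordinator's query-to-commit.
        return "Yes" if self.can_commit else "No"

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if decision == "commit" else "aborted"

def coordinator_decide(cohorts):
    # Commit only if *all* cohorts voted Yes; otherwise abort.
    decision = "commit" if all(c.vote() == "Yes" for c in cohorts) else "abort"
    for c in cohorts:
        c.finish(decision)             # notify every cohort of the result
    return decision

cohorts = [Cohort(True), Cohort(True), Cohort(False)]
print(coordinator_decide(cohorts))  # abort (one cohort voted No)
```

A single No vote forces a global abort, which is exactly why 2PC guarantees atomicity: either all cohorts commit or none do.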

Consensus
Consensus is a problem in distributed computing that encapsulates the task of group agreement in the presence of faults: any process in the group may crash at any time. A process is called "correct" if it does not fail at any point during its execution. Every process "proposes" a value; the goal of the protocol is for all correct processes to choose a single value from among those proposed. A process must eventually "decide" a value by passing it to the application on that process that invoked the consensus protocol. More precisely, a consensus protocol must satisfy the four formal properties below:
 * Termination: every correct process decides some value.
 * Validity: if all processes propose the same value v, then every correct process decides v.
 * Integrity: every correct process decides at most one value, and if it decides some value v, then v must have been proposed by some process.
 * Agreement: if a correct process decides v, then every correct process decides v.
Impossibility: in an asynchronous system, where processes have no common clock and run at arbitrarily varying speeds, the problem is impossible to solve if even one process may crash, whether processes communicate by sending messages to one another or by reading and writing shared variables. However, there exist algorithms, even under the asynchronous model, that reach consensus with probability one. A typical algorithm for solving consensus is the Paxos algorithm (too long and too detailed to describe here).
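The safety properties above (not Termination, which is a liveness property) can be expressed as a small checker. This is not a consensus algorithm, only an illustrative predicate over a finished run; the function name check_consensus is made up for this sketch.

```python
# Checker for the consensus safety properties: given the values proposed
# by all processes and the values decided by the correct processes,
# verify Agreement, Validity and Integrity for that run.

def check_consensus(proposed, decided):
    agreement = len(set(decided)) <= 1                   # all decide the same value
    validity = (len(set(proposed)) != 1 or
                all(d == proposed[0] for d in decided))  # unanimity must be respected
    integrity = all(d in proposed for d in decided)      # decided value was proposed
    return agreement and validity and integrity

print(check_consensus(proposed=[1, 1, 2], decided=[1, 1, 1]))  # True
print(check_consensus(proposed=[1, 1, 1], decided=[1, 2, 1]))  # False (agreement violated)
```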

Leader election
In distributed computing, leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task begins, all network nodes are unaware which node will serve as the "leader" (coordinator) of the task. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader. The leader election problem can only be solved if every node's identification is unique. Most approaches to leader election take the leader to be the candidate with the greatest identification. The leader election problem is very closely related to another problem, namely establishing a minimum spanning tree on a graph (i.e. a spanning subgraph, which in the case of a connected undirected graph is always a tree); the leader is then the core of the tree. How to find a spanning tree efficiently is described here: http://courses.csail.mit.edu/6.852/05/papers/p66-gallager.pdf (too complex to describe here).
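The "leader = node with the greatest identification" idea can be sketched with a simple round-based simulation on a unidirectional ring. This is a simplified, message-inefficient variant written for clarity (roughly the flooding-on-a-ring idea, not the optimal Gallager-style algorithm); ring_election is a name made up here.

```python
# Toy ring election: in every round, each node receives its left
# neighbour's best-known identifier and keeps the maximum. After n
# rounds on a ring of n nodes, every node knows the global maximum,
# which becomes the leader.

def ring_election(ids):
    n = len(ids)
    known = list(ids)    # each node's current best candidate (its own id at first)
    for _ in range(n):   # n rounds suffice for the maximum to travel the whole ring
        received = [known[(i - 1) % n] for i in range(n)]   # message from left neighbour
        known = [max(known[i], received[i]) for i in range(n)]
    return known

print(ring_election([3, 7, 1, 5]))  # [7, 7, 7, 7] - every node elects node 7
```

Note this needs unique identifiers, matching the statement above that leader election is only solvable when every node's identification is unique.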

Mutual exclusion
The aim is to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource. There are both hardware and software solutions for enforcing mutual exclusion:
 * Hardware solutions: on a uniprocessor system, a common way to achieve mutual exclusion inside kernels is to disable interrupts; with several processors sharing memory, an indivisible test-and-set of a flag can be used in a tight loop to wait until the other processor clears the flag.
 * Software solutions: locks, reentrant mutexes, semaphores, monitors, message passing, tuple space.
Many forms of mutual exclusion have side-effects. For example, classic semaphores permit deadlocks, in which one process gets a semaphore, another process gets a second semaphore, and then both wait forever for the other semaphore to be released. No perfect scheme is known.
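A minimal sketch of the lock-based software solution: several threads increment a shared counter, and the lock makes the increment a critical section so the updates cannot interleave. The worker function is made up for this example.

```python
import threading

# Mutual exclusion with a lock: `counter += 1` is the critical section.
# Without the lock, concurrent read-modify-write steps could interleave
# and lose updates.

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # only one thread at a time may enter
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 - no updates were lost
```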

Reliable Broadcast
Reliable broadcast is a communication primitive in distributed systems, defined by the following properties:
 * Validity: if a correct process sends a message, then some correct process will eventually deliver that message.
 * Agreement: if a correct process delivers a message, then all correct processes eventually deliver that message.
 * Integrity: every correct process delivers the same message at most once, and only if that message has been sent by a process.
A reliable broadcast can have sequential, causal or total ordering.
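One standard way to obtain the Agreement property is eager relaying: on first delivery of a message, a process forwards it to everyone before acting on it. The sketch below simulates this among correct (non-crashing) processes only; the Process class and direct receive calls are simplifications made up for the example.

```python
# Toy eager reliable broadcast: relay on first delivery, so if any
# correct process delivers a message, all correct processes do.

class Process:
    def __init__(self, pid, system):
        self.pid, self.system = pid, system
        self.delivered = set()

    def receive(self, msg):
        if msg in self.delivered:      # Integrity: deliver each message at most once
            return
        self.delivered.add(msg)
        for p in self.system:          # relay to all other processes
            if p is not self:
                p.receive(msg)

system = []
system.extend(Process(i, system) for i in range(3))
system[0].receive("m1")                # the sender delivers and relays m1
print(all("m1" in p.delivered for p in system))  # True
```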

Replication
Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault tolerance or load balancing. For example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if the primary controller fails, the backup can take over its functions. The underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states, and hence any replica can respond to queries.
Replication models in distributed systems. A number of widely cited models exist for data replication, each having its own properties and performance:
 1. Transactional replication. This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case, which defines legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.
 2. State machine replication. This model assumes that the replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on the distributed consensus problem and has a great deal in common with the transactional replication model. It is sometimes mistakenly used as a synonym of active replication. State machine replication is usually implemented by a replicated log consisting of multiple subsequent rounds of the Paxos algorithm. This was popularized by Google's Chubby system, and it is the core behind the open-source Keyspace data store.
 3. Virtual synchrony. This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group, but is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the identical order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group.
Levels of performance vary widely depending on the model selected. Transactional replication is slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but its handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between: faster than transactions, but much slower than virtual synchrony.
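The core idea of state machine replication can be sketched directly: deterministic replicas that apply the same log of commands in the same order end up in the same state. The hard part, agreeing on the log order (what Paxos provides), is omitted; apply_log and the operation names are made up for this example.

```python
# State machine replication in miniature: the replica is a deterministic
# state machine, so feeding every replica the same agreed-upon log
# yields identical states on all of them.

def apply_log(log):
    state = 0
    for op, arg in log:          # deterministic transition function
        if op == "add":
            state += arg
        elif op == "mul":
            state *= arg
    return state

agreed_log = [("add", 2), ("mul", 10), ("add", 1)]   # order fixed by consensus
replicas = [apply_log(agreed_log) for _ in range(3)]
print(replicas)  # [21, 21, 21] - all replicas are consistent
```

Determinism is essential here: if the transition function depended on local time or randomness, identical logs would no longer guarantee identical states.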

Resource allocation
Resource allocation is used to assign the available resources in an economical way. Example solutions: wireless channel allocation using an auction algorithm, proportional-share scheduling.

Spanning tree generation
The classic spanning tree algorithms are depth-first search (DFS) and breadth-first search (BFS).
Spanning Tree Protocol (STP): the most common distributed algorithm used by OSI link-layer devices to create a spanning tree, using the existing links as the source graph, in order to avoid broadcast storms. Steps:
 * Select a root bridge.
 * Determine the least-cost paths to the root bridge.
 * Disable all other root paths.
 * Apply modifications in case of ties.
Evolution: the Rapid Spanning Tree Protocol provides faster spanning tree convergence after a topology change; the Multiple Spanning Tree Protocol configures a separate spanning tree for each VLAN (virtual LAN) group and blocks all but one of the possible alternate paths within each spanning tree.
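A BFS spanning tree can be computed with a few lines; the parent map mirrors how distributed constructions record the tree (each node remembers the neighbour it first heard from). The function name bfs_spanning_tree and the example graph are made up for this sketch.

```python
from collections import deque

# BFS spanning tree rooted at `root`: returns a parent map; the root's
# parent is None, and every other node's parent is the neighbour from
# which it was first reached ("first message wins").

def bfs_spanning_tree(adj, root):
    parent = {root: None}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:    # v joins the tree under u
                parent[v] = u
                q.append(v)
    return parent

adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2], 4: [2]}
print(bfs_spanning_tree(adj, 1))  # {1: None, 2: 1, 3: 1, 4: 2}
```

The edge 2-3 is left out of the tree, which is exactly the STP idea of disabling redundant paths to avoid broadcast storms.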

Other Problems
Distributed search. Distributed snapshots: a technique for recording global states during the execution of an asynchronous algorithm. Network synchronization: how to synchronize distributed computations given the asynchronous nature of distributed systems? Synchronizers can be used to run synchronous algorithms in asynchronous systems. Logical clocks provide a causal happened-before ordering of events. Clock synchronization algorithms provide globally consistent physical time stamps. Self-stabilization and stability detection (for recovery from faults, termination and deadlock detection). Finding a maximum flow in a graph (for the operation of distributed-memory systems). Deterministic re-execution of distributed algorithms in an asynchronous setting and detecting breakpoints during these executions (for program debugging). Distributed simulation of physical systems (for modelling natural systems occurring in various scientific fields).
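The logical-clock idea mentioned above can be sketched with Lamport clocks: each process keeps a counter, increments it on local events, attaches it to outgoing messages, and on receipt takes the maximum of its own and the received timestamp plus one. The class name LamportClock and the method names are made up for this example.

```python
# Lamport logical clocks: timestamps that respect the causal
# happened-before ordering of events (if event a happened before b,
# then timestamp(a) < timestamp(b); the converse does not hold).

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                     # local event
        self.time += 1
        return self.time

    def send(self):                     # timestamp attached to an outgoing message
        return self.tick()

    def recv(self, ts):                 # merge with the sender's timestamp
        self.time = max(self.time, ts) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t1 = a.tick()       # local event on a: 1
t2 = a.send()       # a sends a message carrying timestamp 2
t3 = b.recv(t2)     # b receives: max(0, 2) + 1 = 3
print(t1, t2, t3)   # 1 2 3 - the receive is ordered after the send
```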

Basic algorithms
The Propagation of Information problem (PI problem) and the Propagation of Information with Feedback problem (PIF problem) concern broadcasting information originally held by only a subset of nodes. Many algorithms exist for this problem, e.g. "wave" algorithms, in which every node sends the information to all of its neighbours, or more sophisticated algorithms that first decompose the graph into a tree (or trees), after which nodes forward the information only along the branches leaving them. Another basic task is computing the shortest distances (in terms of numbers of edges) between all pairs of nodes.
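The simple "wave" (flooding) approach can be sketched as follows; the function flood and the message counter are made up for this example, and the simulation is round-based for clarity.

```python
# Flooding/"wave" algorithm for Propagation of Information: every node
# that learns the information forwards it once to all its neighbours,
# so it eventually reaches every node of a connected graph.

def flood(adj, sources):
    informed = set(sources)
    frontier = list(sources)
    messages = 0
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                messages += 1            # one message per edge traversal
                if v not in informed:
                    informed.add(v)
                    nxt.append(v)        # v will forward in the next round
        frontier = nxt
    return informed, messages

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
informed, msgs = flood(adj, [1])
print(sorted(informed))  # [1, 2, 3, 4] - the whole path graph is informed
```

The message count illustrates why tree-based variants exist: flooding sends a message over every edge in both directions, while a tree forwards the information along each tree edge only once.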

Distributed and centralized architecture and their differences
Distributed systems differ from centralized systems in a number of essential aspects:
 * Lack of knowledge of the global state of the system. Collecting state information may be possible, but the information may not be up to date.
 * Lack of a global time frame: no total order of events.
 * Nondeterminism, for example in the order of arrival of requests to a server.
Advantages and disadvantages of distributed systems:
 * A desired level of performance and price can be achieved by using a cluster of several low-end computers.
 * More reliable than a non-distributed system, as there is no single point of failure.
 * Easier to expand.
 * Coordination of the use of shared resources is a problem.
 * Communication demands might be high.
 * See the above-mentioned problems.

Architectures
Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system. Distributed programming typically falls into one of several basic architectures or categories: client–server, 3-tier architecture, n-tier architecture, distributed objects, loose coupling, or tight coupling.
 * Client–server: smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
 * 3-tier architecture: three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
 * n-tier architecture: n-tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
 * Tightly coupled (clustered): refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided into parts that are worked on individually by each machine and then put back together to produce the final result.
 * Peer-to-peer: an architecture in which there is no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
 * Space-based: refers to an infrastructure that creates the illusion (virtualization) of one single address space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.

Petri nets
Modeling languages can be used to specify:
 * system requirements,
 * structures and
 * behaviors.
Executable modeling languages applied with proper tool support are expected to automate system verification, validation, simulation and code generation from the same representations (in the future). Petri nets were invented in August 1939 by Carl Adam Petri – at the age of 13 – for the purpose of describing chemical processes. Like industry standards such as UML activity diagrams, BPMN and EPCs (event-driven process chains), Petri nets offer a graphical notation for stepwise processes that include choice, iteration, and concurrent execution. Unlike these standards, Petri nets have an exact mathematical definition of their execution semantics, with a well-developed mathematical theory for process analysis.
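The exact execution semantics mentioned above (the "token game" of basic place/transition nets) can be sketched directly. The marking is a dict from place names to token counts; the function names and place/transition labels are made up for this example, and arc weights are assumed to be 1.

```python
# Minimal Petri-net execution semantics: a transition is enabled when
# every input place holds at least one token; firing it consumes one
# token from each input place and produces one in each output place.

def enabled(marking, transition):
    inputs, _ = transition
    return all(marking.get(p, 0) >= 1 for p in inputs)

def fire(marking, transition):
    inputs, outputs = transition
    m = dict(marking)                  # new marking; the old one is unchanged
    for p in inputs:
        m[p] -= 1                      # consume one token per input place
    for p in outputs:
        m[p] = m.get(p, 0) + 1         # produce one token per output place
    return m

t = (["p1", "p2"], ["p3"])             # transition with inputs p1, p2 and output p3
m0 = {"p1": 1, "p2": 1, "p3": 0}
print(enabled(m0, t))                  # True: both input places are marked
print(fire(m0, t))                     # {'p1': 0, 'p2': 0, 'p3': 1}
```

This is exactly the "exact mathematical definition of execution semantics" that distinguishes Petri nets from notations like UML activity diagrams: the behaviour of a net is fully determined by the marking and the firing rule.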

Modeling with Petri nets


Relation between distributed and service systems
