PhD Thesis 2015

Verification of Message Passing Concurrent Systems

  • E. D'Osualdo

Winner of the 2016 CPHC/BCS Distinguished Dissertation award

Short Abstract

As our world keeps delegating more and more critical tasks to networks of interconnected machines, developing a strong theoretical framework for their design and analysis is of paramount importance. This dissertation is concerned with the development of fully-automatic methods of verification, for message-passing based concurrent systems.

First, we define a sound parametric analysis for Erlang, an industrial strength programming language. Thanks to a combination of abstraction and infinite-state model checking, our prototype implementation, called Soter, is able to prove properties of Erlang programs such as unreachability of error states, mutual exclusion, or bounds on mailboxes.

The resulting analysis, however, has a blind spot: it is not able to precisely represent reconfigurable systems, i.e. systems where the communication network changes over time. To fix this, the second part of the thesis develops a novel type system for the analysis of the communication topology of pi-calculus processes.

Tools

I developed two tools as a result of my research:

Soter (Web) James Bound

The first is Soter, a static analyser for Erlang written in collaboration with Jonathan Kochems. The second is James Bound, an Haskell library to manipulate π-calculus programs and which implements my type system for Hierarchical systems.

Summary

This dissertation is concerned with the development of fully-automatic methods of verification, for message-passing based concurrent systems.

In the first part of the thesis we focus on an industrial strength programming language: Erlang. Its concurrency model is a particularly clean and succinct concurrent extension of a functional language. We propose an automatic abstraction technique which is able to exploit infinite state model checking to prove safety properties of Erlang programs. In devising a verification method for Erlang programs we face numerous challenges. The sequential core language is a dynamically typed, higher-order functional language with pattern-matching algebraic data types. These features alone pose serious challenges for automatic analysis tools and in fact many of the efforts in the Erlang analysis literature concentrate on taming the complexity of the sequential fragment. In addition, we aim at modelling and accurately analysing the concurrency model of Erlang, which is based on asynchronous message-passing. This adds two highly non-trivial features: unbounded dynamic spawning of processes and unbounded communication buffers (called mailboxes, each associated with a process).

We focus on the core features of Erlang, which we formalise by defining the LambdaActor language. We give semantics to LambdaActor in a form that supports the application of abstract interpretation and systematically derive a provably sound parametric control-flow analysis. The analysis is then used to bootstrap the construction of an abstract model of the semantics that we call Actor Communicating System (ACS). ACS are given semantics by means of Vector Addition System (VAS), also known as Petri nets, which have rich decidable properties. We exploit a class of properties called coverability queries to prove properties of Erlang programs such as unreachability of error states, mutual exclusion, or bounds on mailboxes. After showing how the ACS model is extracted from the analysis and establishing the relation between the program’s and the ACS’ semantics, we discuss limitations and possible extensions of the analysis.

To assess the empirical effectiveness of the approach, we constructed Soter, a tool implementation of the verification method, thereby obtaining the first fully-automatic, infinite-state model checker for a core concurrent fragment of Erlang.

Then, in the second part of the thesis, we analyse one of the major shortcomings of the proposed abstraction: the representation of the identity of unboundedly many processes participating in interactions is too imprecise. To remedy this problem, we formalise the issue using the pi-calculus, a well-known process algebra which directly and succinctly models name-passing features. In contrast to those of simpler process calculi such as CCS, the communication topology of a pi-calculus system, i.e., the graph linking processes that share channels, is dynamically evolving. From a verification point of view, proving properties of pi-calculus terms is challenging: the full pi-calculus is Turing-complete. As a consequence, a lot of research effort has been devoted to defining fragments of pi-calculus that could be verified automatically while retaining as much expressivity as possible. To date, the most expressive fragment that has decidable verification problems is the depth-bounded pi-calculus, introduced by Roland Meyer. Roughly speaking, the depth of a pi-calculus term can be understood as the maximum length of the simple (i.e. non looping) paths in the communication topology of the term. A term is depth-bounded if there exists a natural number k such that the maximal nested depth of restriction of each reachable term is bounded by k.

Unfortunately, depth boundedness is a semantic property of a Turing-powerful language, so it is undecidable whether a given arbitrary pi-calculus term is depth-bounded. It has recently been proven that the problem becomes decidable if the bound k is fixed but the complexity is very high (the known lower bound is non-primitive recursive). It is only safe to apply algorithms for the verification of depth-bounded systems on programs that can be proven depth-bounded. Two approaches are possible:

  1. identify a well-behaved fragment of depth-bounded systems with decidable membership
  2. make the analysis capable of abstracting away sources of unboundedness in depth so that the abstract model is guaranteed to have a depth bounded by some statically determined measure.

In Part II of the thesis we show a possible attack to this problem following the first approach, and hint at how one can realise the second.

The main contribution presented in Part II is a novel fragment of pi-calculus which we call typably hierarchical, which is a proper subset of the depth-bounded pi-calculus. This fragment is defined by means of a type system with decidable checking and inference.

The type system itself is based on the novel notion of T-compatibility, where T is a finite forest. Consider a simple depth-bounded system implementing a client-server pattern. The system consists of a number of servers, each with its cluster of clients; each client may in turn be waiting to receive a number of private replies from its server. This description suggests a hierarchical relationship between the names representing servers, clients and data, although the communication topology itself does not form a tree. The concept of T-compatibility formalises this view of the system by associating to each name a base type from a finite set; a finite forest T of base types specifies the hierarchical relationship between them. A term is said to be T-compatible when it respects the hierarchy. If each configuration reachable from a term P is T-compatible (i.e. T-compatibility is an invariant) then P is depth-bounded.

While invariance of T-compatibility is undecidable, the forest structure enables the design of a type system that guarantees the invariance of T-compatibility under reduction. We define such type system and prove it correct. Furthermore type inference is decidable, so that the type system can be used to infer a hierarchy T, with respect to which the input term is hierarchical. To the best of our knowledge, our type system is the first that can infer a shape invariant of the communication topology of a system. The typing rules that ensure invariance of T-compatibility under reduction, arise from a new perspective of the pi-calculus reaction, one that allows compatibility to a given hierarchy to be tracked more readily.

After presenting the type system and proving its soundness, the thesis concludes with a study of the expressivity of hierarchical systems in relation to other known models of computation including a class of automata on infinite alphabets.