Transaction commit protocols play a pivotal role in supporting the scalability and availability guarantees of today's large-scale transactional databases. Their theoretical properties are traditionally captured and analysed through the Atomic Commitment Problem (ACP), introduced by Gray in the early 70s. Roughly, ACP is concerned with ensuring all-or-nothing (atomicity) semantics for transactions (the 'A' in the famous ACID acronym). It is formulated as a one-shot agreement problem in which a single COMMIT or ABORT decision must be output for a given transaction, depending on the COMMIT or ABORT votes provided by a collection of fail-prone sites holding the data objects involved in the transaction. We argue that ACP is too simplistic to adequately capture the complexities of transaction commit in modern transactional data stores. In particular, its one-shot nature ignores the fact that the decision to commit or abort an individual transaction is not taken in isolation, but is influenced by how conflicts with other simultaneously executing transactions are handled by the concurrency control mechanism, which ensures the isolation of transactions - the 'I' in ACID. The lack of a formal framework capturing such intricate interdependencies between the mechanisms for atomicity and isolation in real-world databases makes them difficult to understand and to prove correct. We address this by proposing a new problem, called Transaction Certification Service (TCS), which formalises how the classical Two-Phase Commit (2PC) protocol interacts with concurrency control in many practical transaction processing systems, in a way that is parametric in the isolation level provided. In contrast to ACP, TCS is a multi-shot problem, in the sense that the outcome of a transaction must be justified by the entire history of past transactions rather than a single set of input votes. We then use TCS to identify core algorithmic insights underlying transaction commit in several widely used transactional data stores (such as Google Spanner), and leverage these to develop a new fault-tolerant multi-shot transaction commit protocol. This protocol is theoretically optimal in terms of its time complexity, and can serve as a reference implementation for future systems.
Joint work with Alexey Gotsman, IMDEA Software Institute, Spain.
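As a rough illustration of what "multi-shot" means in the abstract above, the following Haskell sketch (ours, not from the talk) shows a hypothetical certification function whose commit/abort decision for a new transaction is justified by the history of previously committed transactions rather than by a single set of votes; the write-conflict check is only a stand-in for whatever the chosen isolation level would actually require.

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- A transaction summarised by its read and write sets (illustrative names).
data Tx = Tx { readSet :: Set String, writeSet :: Set String }

data Decision = Commit | Abort deriving Show

-- A hypothetical multi-shot certification function: the decision for a new
-- transaction depends on the history of previously committed transactions,
-- not on a single set of votes. The conflict check below is a toy stand-in
-- for a concrete isolation level.
certify :: [Tx] -> Tx -> Decision
certify history tx
  | any conflictsWith history = Abort
  | otherwise                 = Commit
  where
    conflictsWith prev =
      not (Set.null (writeSet prev `Set.intersection`
                     (readSet tx `Set.union` writeSet tx)))
```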
The π-calculus, viewed as a core concurrent programming language, has been used as the target of much research on type systems for concurrency. In this paper we propose a new type system for deadlock-free session-typed π-calculus processes, by integrating two separate lines of work. The first is the propositions-as-types approach by Caires and Pfenning, which provides a linear logic foundation for session types and guarantees deadlock-freedom by forbidding cyclic process connections. The second is Kobayashi’s approach in which types are annotated with priorities so that the type system can check whether or not processes contain genuine cyclic dependencies between communication operations. We combine these two techniques for the first time, and define a new and more expressive variant of classical linear logic with a proof assignment that gives a session type system with Kobayashi-style priorities. This can be seen in three ways: (i) as a new linear logic in which cyclic structures can be derived and a Cycle-elimination theorem generalises Cut-elimination; (ii) as a logically-based session type system, which is more expressive than Caires and Pfenning’s; (iii) as a logical foundation for Kobayashi’s system, bringing it into the sphere of the propositions-as-types paradigm.
The Yoneda Lemma has been called “the most important result in category theory”. Its formal presentation is impressively concise, but can also be daunting. In fact, it has a number of very practical applications, such as: proofs by indirect inequality; accumulating parameters in list functions; closure conversion and continuation-passing style; and even artificial intelligence, philosophy, and art. I will sketch these applications, without getting into too much category theory. (This talk is based on the introduction to my ICFP 2018 paper with Guillaume Boisseau, which is mostly about another application, to profunctor optics; but I won’t discuss that application here.)
Link to paper and video presentation.
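To give a concrete taste of one of the applications mentioned in the abstract above, here is the standard folklore Haskell encoding of the Yoneda construction (not code from the paper): mapping over a Yoneda-wrapped value accumulates the functions to be applied, which is the "accumulating parameters" idea.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Yoneda f a is isomorphic to f a for any Functor f:
-- it packages a value of type f a together with a pending mapping.
newtype Yoneda f a = Yoneda { runYoneda :: forall b. (a -> b) -> f b }

toYoneda :: Functor f => f a -> Yoneda f a
toYoneda fa = Yoneda (\k -> fmap k fa)

fromYoneda :: Yoneda f a -> f a
fromYoneda (Yoneda y) = y id

-- Mapping over Yoneda composes functions instead of traversing the
-- underlying structure, so a chain of fmaps is fused into a single pass
-- when fromYoneda is eventually applied.
instance Functor (Yoneda f) where
  fmap g (Yoneda y) = Yoneda (\k -> y (k . g))
```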
Linear types, derived from Girard's Linear Logic, provide a means to expose safe interfaces to stateful protocols and language features, e.g. channels and file handles. Data is delineated into two camps: unconstrained values, which can be consumed or discarded arbitrarily, and 'resources', which must be used exactly once. Bounded Linear Logic (BLL) [1] allows tracking a more nuanced notion of nonlinearity via the natural numbers semiring, which is baked into its proof rules. Our system of Graded Modal Types generalises BLL by parameterising over the resource algebra, thus allowing a wide array of analyses to be captured in the type system. In this talk we will explore how graded modal types and linearity conveniently extend our typical toolkit of parametric polymorphism and indexed types, allowing us to reason about pure and effectful programs in a novel, resource-oriented, manner. Session-typed channels and mutable arrays are just two examples of programming idioms that can be provided in such a language without having to give up safety and compositionality. We will see this in action via Granule [2], our implementation of a functional language with a type system which supports all these features.
[1] Girard, Scedrov and Scott, Bounded Linear Logic (1992)
[2] Granule project page
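For a flavour of the "used exactly once" discipline that graded modal types generalise, here is a rough sketch using GHC Haskell's LinearTypes extension (Granule's own syntax and its grade-polymorphic modalities differ; all names below are illustrative only).

```haskell
{-# LANGUAGE LinearTypes #-}

-- A toy resource that must be consumed exactly once.
data Handle = Handle

-- The linear arrow (%1 ->) records that the function uses its argument
-- exactly once; matching on the constructor counts as consuming it.
close :: Handle %1 -> ()
close Handle = ()

-- Accepted: the handle is used exactly once.
useOnce :: Handle %1 -> ()
useOnce h = close h

-- Rejected by the type checker if uncommented: the handle is discarded.
-- leak :: Handle %1 -> ()
-- leak _ = ()

-- Graded modal types generalise this binary linear/unrestricted split by
-- indexing usage with a resource algebra (e.g. exact counts, intervals,
-- security levels), which is what Granule tracks natively.
```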
We present the first modular semantics for weak memory concurrency that avoids thin-air reads, and provides data-race free programs with sequentially consistent semantics (DRF-SC). We remove a major limitation on the usefulness of the standard DRF-SC result: race-free components behave according to sequential consistency, even when composed with racy code. Like others, we interpret programs as event structures with a set of relations. In contrast, our relations are not defined (only) for whole-program event structures, but instead are built up compositionally, following the structure of the program. We evaluate our semantics by running it on a set of key litmus test examples drawn from the literature on memory models for Java and C-like languages. These examples indicate that thin-air reads are forbidden, that locks provide usable synchronisation, and that our semantics supports compiler optimisations: one forbidden by a prior state-of-the-art model, and another performed by the Hotspot compiler yet (erroneously) forbidden by the Java memory model.
This talk presents work in progress, comprising the modelling in CSP, and verification using FDR, of designs for selected concurrent datatypes. The case studies are a list-based stack and queue, and an array-based stack, whose implementations rely on stamped items (pointers or data values) for correctness. The models exploit techniques from earlier security protocol verification work by others, which allow an infinite supply of distinct stamps to be modelled, and hence verified, using a finite set.
Transient gradual typing imposes run-time type tests that typically cause a linear slowdown in programs' performance. This performance impact discourages the use of type annotations because adding types to a program makes the program slower. A virtual machine can employ standard just-in-time optimizations to reduce the overhead of transient checks to near zero. These optimizations can give gradually-typed languages performance comparable to state-of-the-art dynamic languages, so programmers can add types to their code without affecting their programs' performance.
The problem of giving a formal semantics for a weak memory model of a high-level programming language is known to be challenging. Recently, Chakraborty & Vafeiadis proposed a new solution to this problem [1]. Their model, called WeakestMO, defines the semantics of a concurrent program as an event structure: a graph that encodes all possible executions of the program. For a weak memory model of a programming language to be useful, several results about the model should be established. One of them is the existence of an effective compilation scheme from the high-level language to the low-level assembly languages of modern hardware architectures. Chakraborty & Vafeiadis have provided only pen-and-paper proofs of compilation correctness, for a weaker version of their model. Our work is dedicated to the mechanisation of the WeakestMO model, together with compilation correctness arguments, using the Coq proof assistant. We have proven the correctness of compilation to the x86, ARMv8 and POWER hardware memory models by using IMM (intermediate memory model) [2], a recently proposed model that abstracts over the details of the hardware models.
I will talk about Effpi: an experimental toolkit for strongly-typed concurrent and distributed programming in Dotty (a.k.a. the future Scala 3 programming language), with verification capabilities based on type-level model checking. Effpi's key ingredient is a novel blend of behavioural types (from pi-calculus theory) infused with dependent function types (from Dotty). Effpi addresses one of the main challenges in developing and maintaining concurrent programs: many concurrency errors (like protocol violations, deadlocks and livelocks) are often spotted late, at run-time, when applications are tested or (worse) deployed in production. Effpi aims at finding such problems early, when programs are written and compiled, through a lightweight verification approach that integrates with Dotty and its toolchain. Effpi provides: (1) a set of Dotty classes for describing communication protocols as types; (2) an embedded DSL for concurrent programming, with process-based and actor-based abstractions; (3) a Dotty compiler plugin to verify whether protocols and programs enjoy desirable properties, such as deadlock-freedom; and (4) an efficient run-time system for executing Effpi’s DSL-based programs. The combination of (1) and (2) allows the Dotty compiler to check whether an Effpi program implements a desired protocol/type; and this, together with (3), means that many typical concurrent programming errors are found and ruled out at compile-time. Further, (4) makes it possible to run highly concurrent Effpi programs with millions of interacting processes/actors, by scheduling them on a limited number of CPU cores. In this talk, I will provide an overview of Effpi and its theoretical foundations; then, I will discuss its future developments.
Joint work with Nobuko Yoshida and Elias Benussi.
It is still unclear what the semantics of C11's sequentially consistent (SC) operations should be. We examine the SC semantics of Lahav et al. in the light of local data-race freedom (LDRF), a generalisation of data-race freedom recently proposed by Dolan et al. Roughly, an LDRF semantics guarantees that race-free regions of a program exhibit sequentially consistent semantics, even when occurring within a larger racy program. We show that the existing C11 SC semantics are too weak to satisfy a desirable LDRF property. We present a natural strengthening of the existing semantics that does provide such a guarantee. We define a compilation strategy for this semantics that uses the IMM intermediate memory model of Podkopaev et al., and prove this strategy correct. This compilation strategy is stronger, and likely less efficient, than the standard proposals. Therefore, we explore more efficient compilation strategies for the widely deployed TSO and Arm v8 memory models.
In this talk, we will look at one approach to make the model checking of certain privacy-encoding properties in simple programs more efficient. Concretely, we target knowledge-intrinsic properties over the states of a program, such as “a program-observer knows that variable x is equal to y + 5”. To formalise these, we introduce a “program-epistemic” logic, in which we can express statements like: if command/program C starts in a state satisfying φ, then in all states where the execution of C finishes, agent A is epistemically unaware of π. In the latter, π is a formula which can contain several knowledge/epistemic operators scoping quantifier-free first-order formulae. We show that, in some cases and for simple programs, model checking this logic can be reduced to SMT solving. Lastly, we report experimental results showing that this technique significantly outperforms epistemic model checking.
Joint work with Nikos Gorogiannis and Franco Raimondi, published at IJCAI 2017.
Scalable reasoning for fine-grained concurrent programs interacting with shared memory is a fundamental, open research problem. The line of work on concurrent separation logics has achieved great compositionality for partial correctness. We have comparatively little understanding of compositional reasoning about progress (liveness) properties, which state that something good eventually happens (e.g. termination, deadlock-freedom). Yet the intricacies of the design of concurrent programs often arise precisely from the need to make the program correct with respect to progress properties.
We introduce TaDA Live, a compositional separation logic for reasoning about the termination of fine-grained concurrent programs. In this talk, we will explain the design of TaDA Live total specifications and verification through examples. We will demonstrate that our specifications are expressive, concise, abstract, and atomic. To show they are also useful, we will give an overview of a verification system that uses these specifications. The proof system achieves compositionality by introducing a number of innovations: ghost state for liveness, modular deadlock detection and thread-local liveness reasoning.
Joint work with Julian Sutherland, Azadeh Farzan and Philippa Gardner.
Threads and shared memory have been introduced to JavaScript, and the JavaScript specification now includes an axiomatic relaxed memory model. We discuss interesting characteristics of the JavaScript model in comparison to C/C++, such as its mixed-size nature and lack of undefined behaviour. We detail our discovery and correction of several flaws in the model, including a violation of the SC-DRF property promised by the specification. We also discuss our verification efforts with respect to the corrected model.
Modern distributed databases weaken data consistency guarantees to allow for faster transaction processing. This poses several challenges: formalising the user-observable behaviour, verifying database protocols, and reasoning about client applications. We abstract such databases to centralised multi-version key-value stores and client views, which provide partial information about such stores. We propose an operational semantics that is parametric in the notion of an execution test, which determines whether a client with a given view is allowed to commit a transaction. Different execution tests give rise to different consistency models, which are equivalent to the declarative consistency models defined in the literature. Using our semantics, we prove the correctness of distributed protocols and analyse the robustness of simple client applications.
Joint work with Andrea Cerone, Azalea Raad and Philippa Gardner.
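As a rough illustration of what being parametric in the execution test means, the following Haskell sketch (ours, not the formal semantics from this work) fixes a toy multi-version store and client-view representation, and plugs in an execution test as a predicate; the write-conflict test shown is only an example in the spirit of snapshot isolation.

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map

-- A centralised multi-version store: each key maps to its versions,
-- newest first (all names here are illustrative, not from the paper).
type Key     = String
type Version = Int
type Store   = Map Key [Version]

-- A client view records, per key, the newest version the client has seen.
type View = Map Key Version

-- A transaction's footprint, summarised by the keys it wrote.
newtype Tx = Tx { writes :: [Key] }

-- An execution test decides whether a client with a given view may commit
-- a transaction against the current store. Plugging in different tests
-- yields different consistency models.
type ExecutionTest = Store -> View -> Tx -> Bool

-- A toy test in the spirit of snapshot isolation's write-conflict check:
-- the client must have seen the latest version of every key it writes.
writeConflictFree :: ExecutionTest
writeConflictFree store view tx =
  all upToDate (writes tx)
  where
    upToDate k =
      case (Map.lookup k store, Map.lookup k view) of
        (Just (latest : _), Just seen) -> seen >= latest
        (Nothing, _)                   -> True   -- key never written before
        _                              -> False  -- stale or missing view entry
```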
EasyCrypt is an interactive proof assistant for a higher-order logic with built-in support for the formalisation of code-based game-based cryptographic proofs. In this talk, I will give a brief introduction to the discipline of code-based game-based proofs, and explain how EasyCrypt leverages programming language techniques to formalise them. I will then discuss some of the current and upcoming challenges, in particular those related to the shift of cryptographic discipline towards concurrent models of computing, hoping to spark some progress in those directions.
In this talk I will present a work-in-progress tool for simulating relaxed memory models over the full ARMv8-A architecture. The tool is built upon Sail, a custom language for specifying the semantics of instruction set architectures. We combine a complete Sail specification of ARMv8.5-A, derived from authoritative vendor-provided pseudocode, with a custom Sail-to-SMT translation, allowing us to simulate the concurrent behaviour of litmus tests spanning the entire architecture. As an illustrative example, one can consider how instruction cache maintenance instructions interact with self-modifying code in an axiomatic setting, among many other interesting cases that have not been explored previously.
We consider the problem of verifying the safety of data invariants for highly-available distributed objects. This is difficult because availability precludes strong consistency when network failures may occur. We propose a proof methodology for distributed objects that propagate state. The methodology is: (i) modular: one can reason about each individual operation separately, and (ii) sequential: one can reason about a distributed object as if it were a sequential one. We automate the methodology using Boogie, an SMT-based tool. We illustrate the methodology with representative examples.
Replicated tree data structures are found in distributed filesystems. These systems need to support a move operation that allows a subtree to be moved to a new location within the tree. Such a move operation is easy to implement in a centralised system, e.g. with a designated primary replica, but it is difficult in settings where different replicas can concurrently perform arbitrary move operations. We present an algorithm (with a highly-available move operation) that handles arbitrary concurrent modifications on trees, while ensuring that the tree structure remains valid (in particular, no cycles are introduced), and guaranteeing that all replicas eventually converge towards the same consistent state. We formally prove the correctness of our algorithm using the Isabelle/HOL interactive proof assistant.
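As a rough illustration of the central safety concern in the abstract above, the following Haskell sketch (much simplified, not the algorithm from the paper) represents the tree as a child-to-parent map and skips any move that would make a node an ancestor of itself; the full algorithm additionally orders concurrent moves and undoes/reapplies them as needed so that all replicas converge.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- A tree represented as a child -> parent map (node names are illustrative).
type Node = String
type Tree = Map Node Node

-- Is `a` an ancestor of `n` (or equal to it)? Assumes the map is acyclic,
-- which is exactly the invariant the move operation preserves.
isAncestor :: Tree -> Node -> Node -> Bool
isAncestor tree a n
  | a == n    = True
  | otherwise = case Map.lookup n tree of
      Just p  -> isAncestor tree a p
      Nothing -> False

-- Apply a move only if it keeps the structure a tree: moving `child` under
-- `newParent` is skipped when `child` is an ancestor of `newParent`, since
-- performing it would introduce a cycle.
applyMove :: Node -> Node -> Tree -> Tree
applyMove child newParent tree
  | isAncestor tree child newParent = tree          -- unsafe: skip the move
  | otherwise = Map.insert child newParent tree
```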
We develop powerful and general techniques to mechanically verify realistic programs that manipulate heap-represented graphs. These graphs can exhibit well-known organization principles, such as being a DAG (acyclic) or a disjoint-forest; alternatively, these graphs can be totally unstructured. The common thread for such structures is that they exhibit deep intrinsic sharing and can be expressed using the language of graph theory. We construct a modular and general setup for reasoning about abstract mathematical graphs and use separation logic to define how such abstract graphs are represented concretely in the heap. We develop a Localize rule that enables modular reasoning about such programs, and show how this rule can support existential quantifiers in postconditions and smoothly handle modified program variables. We demonstrate the generality and power of our techniques by integrating them into the Verified Software Toolchain and certifying the correctness of six graph-manipulating programs written in CompCert C, including a 400-line generational garbage collector for the CertiCoq project. While doing so, we identify two places where the semantics of C is too weak to define generational garbage collectors of the sort used in the OCaml runtime. Our proofs are entirely machine-checked in Coq.
C remains central to our infrastructure, making verification of C code an essential and much-researched topic, but the semantics of C is remarkably complex, and important aspects of it are still unsettled, leaving programmers and verification tool builders on shaky ground. This paper describes a tool, Cerberus-BMC, that for the first time provides a principled reference semantics that simultaneously supports (1) a choice of concurrency memory model (including substantial fragments of the C11, RC11, and Linux Kernel memory models), (2) a modern memory object model, and (3) a well-validated thread-local semantics for a large fragment of the language. The tool should be useful for C programmers, compiler writers, verification tool builders, and members of the C/C++ standards committees.