Concurrency in Clojure
Concurrency is not parallelism

The terms are often used as if synonymous, and they are often confused. The clearest, most workable definition I have seen comes from Rob Pike (one of the creators of the Go programming language):

Concurrency is the composition of independently executing 'processes'

Parallelism is the simultaneous execution of (possibly related) computations

So:

Concurrency is about the way we structure programs

Parallelism is about the way they run

The nice thing about this definition is that it's then easy to find examples that are concurrent but not parallel (multiple threads multiplexed onto a single core), parallel but not concurrent (instruction-level parallelism, data parallelism), and cases where both are present but to different extents (50 threads multiplexed across 4 cores).

This talk from Rob is worth watching.

Time in Clojure

Reference types

All of the reference types hold a value and are updated by functions with the same signature:

(<changer> reference f [args*])

All can be derefed with @reference or (deref reference)

The main difference between them is whether they are synchronous or asynchronous, and whether they are coordinated or uncoordinated.

  • Synchronisation - Does the operation block?
  • Coordination - Can multiple operations be combined in a transaction?

Atoms (Synchronous,Uncoordinated)

(swap! atom f)
(swap! atom f x)
(swap! atom f x y & args)
(compare-and-set! atom oldval newval)
(reset! atom newval)
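
As a minimal sketch (the counter atom here is hypothetical), each call returns the resulting value:

(def counter (atom 0))

(swap! counter inc)              ;; => 1, applies inc atomically to the current value
(swap! counter + 10)             ;; => 11, extra args are passed after the current value
(compare-and-set! counter 11 0)  ;; => true, sets only if the current value is 11
(reset! counter 42)              ;; => 42, sets unconditionally
@counter                         ;; => 42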

Refs (Synchronous,Coordinated)

(Must be inside a dosync transaction)

(ref-set ref val)
(alter ref fun & args)
(commute ref fun & args)
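
A minimal sketch (the account refs are hypothetical): the classic transfer, where both alters commit together or not at all:

(def account-a (ref 100))
(def account-b (ref 0))

;; move 50 from a to b atomically
(dosync
 (alter account-a - 50)
 (alter account-b + 50))

[@account-a @account-b] ;; => [50 50]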

Agents (Asynchronous,Uncoordinated)

(send a f & args)
;; (executed on a bounded thread pool)
(send-off a f & args)
;; (executed on an unbounded thread pool)
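
As a minimal sketch (the counter agent is hypothetical), send returns immediately and await blocks until the actions dispatched so far have run:

(def counter (agent 0))

(send counter inc) ;; returns immediately, inc runs later on a pool thread
(await counter)    ;; blocks until actions sent so far have completed
@counter           ;; => 1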

Using agents to serialise access to non-threadsafe resources

If we start multiple threads that write to the console with println, we very quickly get overlapping writes.

(defn start-thread
  [f]
  (.start
   (Thread. f)))

(defn loop-print
  [n]
  (let [line (str n ":**********")]
    (println line)
    (Thread/sleep (rand-int 5)) ;; sleep for 0-4 ms
    (recur n)))

(defn -main []
  (dotimes [n 50]
    (start-thread #(loop-print n))))

One way around this is to wrap the contended resource in an agent:

(defn write [w content]
  (doto w
    (.write content)
    (.write "\n")
    .flush))

(def console (agent *out*))

;; assumes (require '[clojure.java.io :as io])
(def log-file (agent (io/writer "LOG" :append true)))

(defn loop-print
  [n]
  (let [line (str n ":**********")
        sleep-time (rand-int 5)]
    (Thread/sleep sleep-time)
    (send-off console write line)
    (when (= n 50)
      ;; thread 50 also logs its sleep times to a separate file
      (send-off log-file write (str "sleeping for " sleep-time)))
    (recur n)))

(defn -main []
  (dotimes [n 100]
    (start-thread #(loop-print n))))

Here the write function takes something writable and a string, writes the string to it followed by a newline, and then (importantly) returns the object (this is the behaviour of doto). That makes it suitable for sending to an agent: the agent's value is always the writer, and because the agent serialises the functions sent to it, the writer is only written to by one function at a time.

The guidance on whether to send or send-off to an agent is based on whether the function does any IO: send-off is for IO-bound operations and runs on an unbounded thread pool, whereas send is for CPU-bound operations and runs on a fixed-size pool. Either way, the agent's message queue is unbounded, so you may eventually suffer memory issues (this is one of the reasons core.async and CSP are better here: they provide back-pressure and make you think up front about buffering and queue sizes). Run for long enough, the example above eventually dies like this:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.RuntimeException: Agent is failed, needs restart
    at clojure.lang.Util.runtimeException(Util.java:223)
    at clojure.lang.Agent.dispatch(Agent.java:238)
    at clojure.core$send_via.doInvoke(core.clj:1915)
    at clojure.lang.RestFn.applyTo(RestFn.java:146)
    at clojure.core$apply.invoke(core.clj:623)
    at clojure.core$send_off.doInvoke(core.clj:1937)
    at clojure.lang.RestFn.invoke(RestFn.java:442)
    at clojure_exchange.agents$loop_print.invoke(agents.clj:24)
    at clojure_exchange.agents$_main$fn__34.invoke(agents.clj:31)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Agents and STM

Agents are integrated with the STM: any dispatches made in a transaction are held until it commits, and are discarded if it retries or aborts, so you can use them to safely trigger side effects from within transactions. See the docs for more.
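
For example (a sketch reusing the log-file agent and write function from the previous section, with a hypothetical balance ref), the send-off below only happens if the transaction commits:

(def balance (ref 100))

(dosync
 (alter balance - 50)
 ;; held until the transaction commits, discarded if it retries or aborts
 (send-off log-file write "withdrew 50"))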

Delays and futures

> (defn slow-inc [n]
    (Thread/sleep 1000)
    (inc n))

Delays

When a delay is created, the call to delay returns immediately. The captured computation is only executed when the delay is dereferenced.

> (def d (delay (slow-inc 5)))
;; Returns immediately

> @d
;; Will take a second as this is when 'slow-inc' will run
6
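
The captured computation runs at most once; the result is cached, so dereferencing again returns immediately:

> @d
;; Instant this time, the cached result is returned
6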

Futures

A future also returns immediately on creation, but it runs the code on another thread (from a pool). Dereferencing a future blocks until the value is ready.

> (def f (future (slow-inc 5)))
;; Returns immediately as delay did but is already running slow-inc

> @f
;; Depends how quickly you can type
;; but if you took longer than a second it will be ready to return,
;; if not it will block till it is ready
6

So now we have a chance at some parallelism!

We can use future as a way to kick off new threads, so we can use it where we had start-thread (and the threads now come from a managed pool, the same unbounded pool that send-off uses, rather than being created by hand).
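
For example, the earlier -main could simply become (a sketch, reusing loop-print from above):

(defn -main []
  (dotimes [n 50]
    (future (loop-print n))))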

pmap and friends

If we make a sequence of futures as follows

> (def futures (doall (map #(future (slow-inc %)) [1 2 3 4 5])))
;; Returns immediately (we have to use doall to force evaluation as map is lazy)

If we deref them all, it should take less than a second in total, as they have all been running since they were created:

> (map deref futures)
(2 3 4 5 6)
;; Again should return pretty rapidly but at most a second

pmap

Here is a simple version of pmap that captures the idea (note that it shadows clojure.core/pmap):

(defn pmap [f coll]
  (let [futures (doall (map #(future (f %)) coll))]
    (map deref futures)))

So we can do

> (pmap slow-inc [1 2 3 4 5])

and it will take 1s or thereabouts.

The actual pmap looks like this; notice the warning in the docstring that f needs to be expensive enough to justify the coordination overhead:

(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"
   :static true}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

It also has a few friends

(defn pcalls
  "Executes the no-arg fns in parallel, returning a lazy sequence of
  their values"
  {:added "1.0"
   :static true}
  [& fns] (pmap #(%) fns))

> (pcalls function-1 function-2 ...)

(defmacro pvalues
  "Returns a lazy sequence of the values of the exprs, which are
  evaluated in parallel"
  {:added "1.0"
   :static true}
  [& exprs]
  `(pcalls ~@(map #(list `fn [] %) exprs)))

> (pvalues (expensive-calc-1) (expensive-calc-2))

Waiting for futures

You need to call shutdown-agents for the process to exit promptly once your agents/futures are done.

(taken from clojuredocs)

;; Note: If you leave out the call to (shutdown-agents), the program will on
;; most (all?) OS/JVM combinations "hang" for 1 minute before the process exits.
;; It is simply waiting for background threads, created by the future call, to
;; be shut down.  shutdown-agents will shut them down immediately, or
;; (System/exit <exit-status>) will exit immediately without waiting for them
;; to shut down.

;; This wait occurs even if you use futures indirectly through some other Clojure
;; functions that use them internally, such as pmap or clojure.java.shell/sh
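
As a minimal sketch (reusing slow-inc from above), here is a -main that exits as soon as its future has been used:

(defn -main []
  (let [f (future (slow-inc 5))]
    (println @f)        ;; blocks until the future completes
    (shutdown-agents))) ;; shut down the pools so the JVM exits immediately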