
Shootout 0001

ToC

  1. Introduction
    1. Motivation
    2. Ring Problem
    3. Fan Problem
  2. Observations
    1. Observation Conditions
    2. Platform 1
    3. Platform 2
    4. Names
      1. Columns Explained
      2. Some Program Names Explained
  3. Reactions
  4. Future Work
    1. Machinate
    2. Shootout
      1. More Platforms
      2. More Problems
      3. More Programs

Introduction

Motivation

This shootout is a collection of programs that can be run and whose performance can be observed in some way. Its purpose is to provide a mechanism for comparing machinate's performance to itself over time, and also against other similar libraries.

The programs are grouped by problem: each group contains different implementations of the "same" problem.

Ring Problem

The ring problem involves setting up some number of threads (logical threads, virtual threads, etc.), each of which copies messages from an input channel/stream/queue to an output; the threads are arranged so that each one's output is the next one's input, forming a ring. Performance is measured as the number of laps around the ring made in a fixed time period.
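As an illustration only, here is a minimal Java sketch of the ring problem with one TransferQueue per node, roughly in the spirit of transfer-queue-ring. It is not the shootout's code: the names RingSketch and runRing are hypothetical, a single token circulates, the last node counts a lap each time it hands the token back to node 0, and plain platform threads are used for portability (on Java 21+, Thread.ofVirtual().start(...) could be swapped in to get virtual threads).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TransferQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class RingSketch {
    // Hypothetical sketch: builds a ring of `nodes` threads, passes one token
    // around it for about `millis` milliseconds, and returns completed laps.
    static long runRing(int nodes, long millis) throws InterruptedException {
        // With one node, transfer() would wait on itself forever.
        if (nodes < 2) throw new IllegalArgumentException("need at least 2 nodes");
        List<TransferQueue<Long>> queues = new ArrayList<>();
        for (int i = 0; i < nodes; i++) queues.add(new LinkedTransferQueue<>());

        AtomicLong laps = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            TransferQueue<Long> in = queues.get(i);
            TransferQueue<Long> out = queues.get((i + 1) % nodes); // last node feeds node 0
            boolean closesLap = (i == nodes - 1);
            Thread t = new Thread(() -> {
                try {
                    while (running.get()) {
                        Long token = in.take();   // wait for the token
                        out.transfer(token);      // block until the next node has taken it
                        if (closesLap) laps.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    // interrupted during shutdown: just exit
                }
            });
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }

        queues.get(0).put(0L); // inject a single token at node 0
        Thread.sleep(millis);  // the fixed measurement period
        long result = laps.get();
        running.set(false);
        for (Thread t : threads) t.interrupt();
        for (Thread t : threads) t.join(1000);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("laps: " + runRing(8, 1000));
    }
}
```

Using transfer() rather than put() makes each hop a direct handoff, so the lap count reflects how quickly one thread can wake the next.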

Fan Problem

The fan problem is a sort of "fan out" followed by a "fan in". A thread writes N messages to a channel; N threads each read one message from that channel and write it to a single shared channel; the original thread then reads N messages from that shared channel and loops. Performance is again measured as the number of iterations of that process completed in a fixed time period.
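A similar Java sketch of the fan problem follows, again with hypothetical names (FanSketch, runFan) and platform threads. It assumes the N reader threads are long-lived and loop, so any worker may pick up any message in a given iteration; the driver thread writes N messages to one queue, reads N messages back from a second shared queue, and counts completed iterations until the time period runs out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FanSketch {
    // Hypothetical sketch: runs the fan-out/fan-in loop with `n` worker threads
    // for about `millis` milliseconds and returns the number of iterations.
    static long runFan(int n, long millis) throws InterruptedException {
        BlockingQueue<Integer> fanOut = new LinkedBlockingQueue<>(); // written by the driver
        BlockingQueue<Integer> fanIn = new LinkedBlockingQueue<>();  // shared by all workers
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        Integer msg = fanOut.take(); // read one message...
                        fanIn.put(msg);              // ...and forward it to the shared queue
                    }
                } catch (InterruptedException e) {
                    // interrupted during shutdown: just exit
                }
            });
            w.setDaemon(true);
            workers.add(w);
            w.start();
        }

        long deadline = System.nanoTime() + millis * 1_000_000L;
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            for (int i = 0; i < n; i++) fanOut.put(i); // fan out: write n messages
            for (int i = 0; i < n; i++) fanIn.take();  // fan in: read n messages back
            iterations++;
        }
        for (Thread w : workers) w.interrupt();
        for (Thread w : workers) w.join(1000);
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("iterations: " + runFan(16, 1000));
    }
}
```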

Observations

Observation Conditions

This iteration of the shootout was run on two different platforms. Platform 1 is a dedicated computer with a very modest 4 cores. Platform 2 is a cloud VM with 48 "vCPUs".

The shootout code attempts to determine how many virtual threads the JVM allows to execute concurrently (how many can be mounted at once) and sets the size of core.async's dispatch thread pool (used by go) to that number. This is an attempt to provide something of an apples-to-apples comparison, but in some cases, such as using core.async's blocking API with virtual threads, it may cause some contention for resources between the virtual threads and core.async's thread pool.
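How the shootout actually probes the JVM isn't described here, so the following Java sketch is only an assumption about one way to get a comparable number. It reads the JDK's jdk.virtualThreadScheduler.parallelism property (which defaults to the number of available processors) and uses the result to set clojure.core.async.pool-size, the system property core.async consults when sizing its go dispatch pool. The helper name mountableVirtualThreads is hypothetical, and the property only takes effect if it is set before core.async creates that pool.

```java
public class PoolSizeSketch {
    // Hypothetical helper: treats the virtual-thread scheduler's parallelism as
    // "how many virtual threads can be mounted at once". The JDK reads the
    // jdk.virtualThreadScheduler.parallelism property for this and otherwise
    // defaults to the number of available processors.
    static int mountableVirtualThreads() {
        return Integer.getInteger("jdk.virtualThreadScheduler.parallelism",
                Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        int size = mountableVirtualThreads();
        // core.async sizes its go dispatch pool from this property, so it has to
        // be set before core.async creates that pool.
        System.setProperty("clojure.core.async.pool-size", Integer.toString(size));
        System.out.println("clojure.core.async.pool-size = " + size);
    }
}
```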

Platform 1 (4 cores)

tag   name                                  min(measure)         avg(measure)        max(measure)       count(*)  version                   range
fan   machinate-fan                         1.16125              1.23849077453366    1.29788333333333   822       0.0.79-35-g7f85ba3-dirty  0.136633333333333
fan   core-async-go-loop-fan                0.175966666666667    0.208628200692042   0.2257             867       0.0.79-35-g7f85ba3-dirty  0.0497333333333334
fan   machinate0-0-79-fan                   0.00483333333333333  0.0824708084428514  0.169633333333333  837       0.0.79-35-g7f85ba3-dirty  0.1648
ring  transfer-queue-ring                   0.729683333333333    1.05268213317847    1.49736666666667   861       0.0.79-35-g7f85ba3-dirty  0.767683333333333
ring  core-async-go-loop-ring               0.812433333333333    0.910689824360443   1.13861666666667   873       0.0.79-35-g7f85ba3-dirty  0.326183333333333
ring  machinate-ring                        0.5916               0.883672948717949   0.9794             780       0.0.79-35-g7f85ba3-dirty  0.3878
ring  thread-parking-locking-strawman-ring  0.5606               0.843746034482759   0.944816666666667  870       0.0.79-35-g7f85ba3-dirty  0.384216666666667
ring  core-async-virtual-thread-ring        0.395116666666667    0.817322968490879   1.07066666666667   804       0.0.79-35-g7f85ba3-dirty  0.67555
ring  thread-parking-single-queue-ring      0.591833333333333    0.815828427895981   1.05895            846       0.0.79-35-g7f85ba3-dirty  0.467116666666667
ring  thread-parking-strawman-ring          0.599983333333333    0.761403076582559   0.874516666666667  753       0.0.79-35-g7f85ba3-dirty  0.274533333333333
ring  machinate-go-loop-ring                0.213                0.237484157832744   0.270433333333333  849       0.0.79-35-g7f85ba3-dirty  0.0574333333333334
ring  machinate0-0-79-ring                  0.0399833333333333   0.0832531461569096  0.114733333333333  837       0.0.79-35-g7f85ba3-dirty  0.07475

Platform 2 (48 vcpus)

tag   name                                  min(measure)         avg(measure)        max(measure)       count(*)  version                   range
fan   machinate-fan                         0.45685              0.469560222222222   0.482666666666667  75        0.0.79-35-g7f85ba3-dirty  0.0258166666666667
fan   machinate0-0-79-fan                   0.136983333333333    0.152629111111111   0.168433333333333  75        0.0.79-35-g7f85ba3-dirty  0.03145
fan   core-async-go-loop-fan                0.0835               0.14226994949495    0.165816666666667  66        0.0.79-35-g7f85ba3-dirty  0.0823166666666667
ring  core-async-go-loop-ring               1.16768333333333     1.26121047008547    1.34861666666667   78        0.0.79-35-g7f85ba3-dirty  0.180933333333333
ring  transfer-queue-ring                   0.4999               0.956492831541219   1.03095            93        0.0.79-35-g7f85ba3-dirty  0.53105
ring  thread-parking-single-queue-ring      0.653516666666667    0.702432592592593   0.7236             90        0.0.79-35-g7f85ba3-dirty  0.0700833333333334
ring  core-async-virtual-thread-ring        0.581516666666667    0.610411428571428   0.62765            105       0.0.79-35-g7f85ba3-dirty  0.0461333333333334
ring  thread-parking-strawman-ring          0.423933333333333    0.599186257309941   0.6382             114       0.0.79-35-g7f85ba3-dirty  0.214266666666667
ring  thread-parking-locking-strawman-ring  0.463966666666667    0.586507026143791   0.60825            102       0.0.79-35-g7f85ba3-dirty  0.144283333333333
ring  machinate-ring                        0.401783333333333    0.483170899470899   0.499083333333333  63        0.0.79-35-g7f85ba3-dirty  0.0973
ring  machinate0-0-79-ring                  0.118783333333333    0.132741555555556   0.145783333333333  75        0.0.79-35-g7f85ba3-dirty  0.027
ring  machinate-go-loop-ring                0.0755333333333333   0.0997511494252874  0.139383333333333  87        0.0.79-35-g7f85ba3-dirty  0.06385

Names

Columns Explained

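  • tag indicates which problem a program belongs to
  • name is the name of the program
  • min(measure), avg(measure), and max(measure) are the minimum, mean, and maximum of the scores (measure) recorded for the program
  • count(*) is the number of scores recorded for the given program
  • version is the output of git describe --tags --dirty in the git tree where the shootout was run
  • range is max(measure) minus min(measure)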
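The summary columns can be checked by hand: for machinate-fan on Platform 1, 1.29788333333333 - 1.16125 is about 0.13663, the value in its range column. Below is a small Java sketch of the same per-program summary; the class name ScoreSummary is hypothetical, and it assumes the raw scores for one program are available as a list of doubles.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

public class ScoreSummary {
    // Returns {min, avg, max, count, range} for one program's recorded scores,
    // mirroring the columns of the result tables.
    static double[] summarize(List<Double> scores) {
        DoubleSummaryStatistics s = scores.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
        double range = s.getMax() - s.getMin(); // range = max(measure) - min(measure)
        return new double[] {s.getMin(), s.getAverage(), s.getMax(), s.getCount(), range};
    }

    public static void main(String[] args) {
        double[] r = summarize(List.of(1.0, 2.0, 4.0));
        System.out.println(java.util.Arrays.toString(r));
    }
}
```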

Some Program Names Explained

  • machinate-fan is the current version of machinate running the fan program using virtual threads
  • machinate0-0-79-fan is machinate version 0.0.79 running the fan program
  • transfer-queue-ring is the ring problem implemented using Java's TransferQueue
  • thread-parking-single-queue-ring, thread-parking-strawman-ring, and thread-parking-locking-strawman-ring are channel implementations that exist only within the shootout suite, as explorations of the channel implementation space.
  • Except where the program name indicates otherwise, virtual threads are used as the lightweight thread mechanism.

Reactions

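  1. Machinate performance has improved a fair bit between 0.0.79 and the upcoming release. On Platform 1 the average ring score rose from about 0.083 (machinate0-0-79-ring) to 0.884 (machinate-ring), and the average fan score from about 0.082 to 1.238.
  2. Machinate performance still trails core.async's channels in the ring benchmark.
    • On the 4 core machine performance appears to be fairly close (average ring score 0.884 for machinate-ring vs 0.911 for core-async-go-loop-ring)
    • On the 48 vCPU VM core.async is smoking machinate in the ring benchmark (1.261 for core-async-go-loop-ring vs 0.483 for machinate-ring)
    • Why?
      1. machinate may still be allocating more
      2. core.async elides some locks
      3. machinate may be looping too aggressively in the channel event implementation, causing contention.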
      4. virtual threads are new :/
  3. These programs are large enough and complicated enough that it doesn't take that many runs to get a fairly stable average score.

Future Work

Machinate

Performance is not the only goal of machinate, and machinate provides more than just message passing over channels. That said, the next round of work on channel message-passing performance will likely look at limiting the looping the channel event does while it tries to ensure that everything queued gets processed.

Shootout

More Platforms

  1. A single core/vCPU platform might be interesting.
  2. It was very disappointing to see machinate's standing improve a lot on the 4 core platform over the course of development, only to see a smaller improvement on the 48 vCPU platform. I should figure out a way to test on a beefier machine more regularly.

More Problems

  1. core.async and machinate both provide a pubsub implementation. Add a problem based on that.
  2. The fan problem is implemented kind of weirdly; maybe it should be using mults or the equivalent.

More Programs

  1. manifold is glaringly absent from the shootout.
  2. it would be interesting to see how a transfer queue based fan implementation does.
