Writing a Concurrent Job Queue

I recently saw a review of the Doom 3 BFG Edition source code. In particular, the multithreaded job system caught my attention because it’s something I’ve written in the past, and I wanted to revisit it for my current game projects. So naturally, I had to roll another one, using an approach similar to id Tech 5’s, which puts strong emphasis on avoiding locks. Keep in mind that in many scenarios, the performance cost of locking may be negligible, or avoiding it may not be worth the hassle of writing lockless code.

I’d also read about the LMAX Disruptor last year, the architecture behind a high-traffic financial platform. The novelty it brought was heavily optimizing a single business-logic thread to run all requests, rather than splitting the work among separate threads and synchronizing. The system moved data to and from the business-logic thread using ring buffer pipelines. These are great for moving data across threads without involving locks, and I’ve made generous use of them in my design. LMAX also included an event sourcing model to support failover scenarios, and the model of a single thread processing requests serially works well for deterministic behavior and replays.

Threading

For anyone unfamiliar with threading, locking is a way of ensuring that only a single thread of execution runs in a block of code (a “critical section”) at any given time. Locking involves calls to the operating system and causes other threads to halt if they attempt to enter the critical section while another thread holds it. Lockless approaches, on the other hand, try to keep threads running by writing to variables using operations that are atomic at the hardware level. When a write attempt fails, the thread may “spin” and try again, or it may go off and perform some other work. This is great for avoiding blocking, but unfortunately lockless approaches can quickly become nontrivial and difficult to understand. One approach or the other may be more appropriate depending on the expected levels of contention over shared resources.
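The spin-and-retry pattern above can be sketched with a lock-free counter. This is a minimal illustration, not code from the job system; it uses .NET’s `Interlocked.CompareExchange` as the hardware-level atomic operation:

```csharp
using System.Threading;

// Minimal sketch of the lockless pattern described above: a thread tries to
// publish a new value with an atomic compare-and-swap, and "spins" (retries)
// if another thread got there first.
static class LockFreeCounter
{
    static int _value;

    public static int Increment()
    {
        while (true)
        {
            int current = _value;          // read the current value
            int desired = current + 1;     // compute the new value
            // Atomically set _value to desired only if it still equals current.
            if (Interlocked.CompareExchange(ref _value, desired, current) == current)
                return desired;            // success: no other thread interfered
            // Otherwise another thread won the race; loop and try again.
        }
    }
}
```

Under low contention the loop almost always succeeds on the first pass, which is exactly the case where a lock would be pure overhead.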

Besides contention, another big concern with multithreaded programming is passing data from one thread to another. Immutable objects are great because we can pass them around without ever worrying about how they’ll be used, but depending on garbage collection concerns and object usage, mutability may be more appropriate. For example, we may have large buffers passed around and employ pooling to reuse the space to avoid memory allocations or garbage collection. In this case, we need to consider whether there are other references to the object and decide whether or not to transfer ownership, and thus responsibility for destroying or recycling the object, to the called method. We could also clone the object and guarantee that nothing else references the new object, but this may have its own overhead depending on the cost of copying.
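The ownership-transfer idea can be made concrete with a buffer pool. The names here are assumptions for illustration, not the actual code: renting a buffer transfers ownership to the caller, and returning it transfers ownership back to the pool.

```csharp
using System.Collections.Generic;

// Illustrative sketch (assumed names): a pool of reusable buffers where
// handing a buffer out transfers ownership, and the receiver is responsible
// for returning it rather than letting it be garbage collected.
class BufferPool
{
    readonly Stack<byte[]> _free = new Stack<byte[]>();
    readonly int _size;

    public BufferPool(int bufferSize) { _size = bufferSize; }

    // Caller takes ownership of the returned buffer.
    public byte[] Rent() => _free.Count > 0 ? _free.Pop() : new byte[_size];

    // Ownership returns to the pool; the caller must not touch the buffer afterward.
    public void Return(byte[] buffer) => _free.Push(buffer);
}
```

The discipline is entirely by convention: nothing stops a caller from keeping a reference after `Return`, which is precisely why ownership needs careful thought when mutable objects cross threads.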

In C#, an obvious choice is to use the Task Parallel Library and TPL Dataflow, but I needed a solution to work in Unity3D, which is still at the equivalent of .NET 3.5 and does not include these APIs, although I’ve seen TPL backported to 3.5. I’m also slightly concerned about how a large number of Task objects (which cannot be reused) would affect garbage collection times.

Approach

We want to avoid processing spikes on the main thread that cause frames to be dropped. One way to mitigate spikes is to smooth them out by limiting the amount of work allowed in a single frame. This requires that the task can be partially completed, and continuations are a convenient means of creating a state machine to support this. Another way is to offload tasks to background threads, and the purpose of the job/task system is to make that easy.
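One way to picture the continuation/state-machine idea: a task object that remembers how far it got and processes at most a fixed number of items per frame. This is a hedged sketch with assumed names, not the system’s actual code:

```csharp
using System;
using System.Collections.Generic;

// Sketch of frame-sliced work: the task saves its position (_index) as state
// and the main loop calls RunSlice once per frame until Done.
class SlicedTask
{
    readonly List<int> _items;
    int _index;                        // state carried across frames

    public SlicedTask(List<int> items) { _items = items; }

    public bool Done => _index >= _items.Count;

    // Process at most maxPerFrame items, then yield back to the caller.
    public void RunSlice(int maxPerFrame, Action<int> process)
    {
        int end = Math.Min(_index + maxPerFrame, _items.Count);
        for (; _index < end; _index++)
            process(_items[_index]);
    }
}
```

Tuning `maxPerFrame` trades total latency for smoothness: a spike of work spreads across several frames instead of blowing one frame’s budget.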

The basic steps:

  1. Partition app/game into subsystems that support N threads in parallel with little or no synchronization. For example, a map generation subsystem might be stateless and support any number of threads, but a world simulation subsystem may have many complex internal interactions and only support a single thread at a time.
  2. On the main thread, queue up jobs for each subsystem. While this step only involves appending a job callback and argument to a list, careful thought needs to go into the ownership of any mutable arguments passed to the job system.
  3. Once per frame on the main thread, submit the batch of queued jobs to another set of shared pending queues accessible by the background worker threads.
  4. As new batches become available, the worker threads copy them into their own thread-local storage, and from there the jobs can be ordered, traversed, and selected freely without worrying about modifying a shared collection of jobs.
  5. While jobs remain unfinished, the worker threads attempt to acquire a lock on a particular job (using atomic primitives), and if the job is already taken, they continue on to search for another.
  6. Once a job is obtained, the worker thread runs it and posts its output value to an output queue readable by the main thread.
  7. When no jobs remain in local storage or in the pending queues, the worker threads sleep and wait for a signal to notify them that jobs are available again.
  8. Once per frame on the main thread, output queue entries are consumed and completion callbacks are called if the job was not canceled.
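Step 5 above (claiming a job with atomic primitives rather than an OS lock) can be sketched like this. Names are assumptions for illustration; the real system’s job representation will differ:

```csharp
using System.Threading;

// Workers claim jobs by atomically flipping a per-job flag and skip jobs
// another worker already took, so no thread ever blocks on a claimed job.
class Job
{
    int _taken;                        // 0 = available, 1 = claimed
    public System.Action Work;

    // Returns true only for the single thread that wins the claim.
    public bool TryClaim() =>
        Interlocked.CompareExchange(ref _taken, 1, 0) == 0;
}

static class Worker
{
    // Scan a thread-local job list and run any job we manage to claim.
    public static void RunAvailable(Job[] jobs)
    {
        foreach (var job in jobs)
            if (job.TryClaim())
                job.Work();            // results would go to an output queue here
    }
}
```

A thread that loses the race simply moves on to the next job, which is the “go off and perform some other work” behavior described earlier.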

Many of the queues are implemented using a ring buffer with an overflow list. Ring buffers are allocated to a fixed power-of-two size and have head and tail pointers, with simple atomic operations to enqueue and dequeue that support concurrent access by a single producer and single consumer. Once capacity is reached, entries go into the overflow list, which does involve OS locks. Alternately, this could be implemented using a resizing buffer, but the code would be more complex and the intent is that a particular ring buffer size is chosen large enough to avoid reaching capacity under nominal loads.
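A stripped-down single-producer/single-consumer ring buffer along those lines might look like this. It’s a sketch: the overflow list is omitted, and memory-ordering subtleties (e.g. volatile reads of the counters) are glossed over for brevity:

```csharp
using System.Threading;

// SPSC ring buffer: power-of-two capacity, monotonically increasing head/tail
// counters, index by masking. The producer writes only _tail, the consumer
// writes only _head, so a single atomic publish per operation suffices.
class RingBuffer<T>
{
    readonly T[] _items;
    readonly int _mask;                // capacity - 1, for cheap modulo

    int _head;                         // advanced only by the consumer
    int _tail;                         // advanced only by the producer

    public RingBuffer(int capacityPowerOfTwo)
    {
        _items = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    public bool TryEnqueue(T item)     // producer thread only
    {
        if (_tail - _head >= _items.Length) return false;  // full
        _items[_tail & _mask] = item;
        Interlocked.Increment(ref _tail);  // publish after the slot is written
        return true;
    }

    public bool TryDequeue(out T item) // consumer thread only
    {
        if (_head == _tail) { item = default(T); return false; }  // empty
        item = _items[_head & _mask];
        Interlocked.Increment(ref _head);
        return true;
    }
}
```

The power-of-two size is what makes `index & _mask` a valid substitute for the modulo operation, which keeps the hot path to a couple of instructions.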

There’s plenty more to talk about, but I’ll save it for another post.

Eval2D F# Interop

Working toward releasing a version of Eval2D, I chose for some inexplicable reason to port core parts of it to an F# library (actually, it’s because I already had a dependency on F# code, and figured I’d roll more functionality into it). My C# library is written in a somewhat functional way already, so it should be easy, right?

Of course it wasn’t. Not because of changes in method bodies, but because I wanted to maintain the same fluent function-chaining library design as the C# version, available to C# code. As a result, most difficulties came from trying to make F# work in ways it really wasn’t designed for:

  • Type extension limitations: In F#, extension methods (type extensions) cannot be defined on generic types with concrete type parameters specified. For example, I can extend IEnumerable<T>, but not IEnumerable<int>. In my case, I use extension methods to allow dot notation for chaining functions together in the same way that the F# operator |> operates. Without the ability to specify concrete type parameters, many of my extension methods are unavailable.
  • FSharpFunc: Functions defined in F# (of type FSharpFunc<>) are not implicitly convertible to C# delegates, so I had to define a cast function.
  • Less concise default parameter values: It’s an additional line per parameter. Not so bad, and I’m not sure that every feature like this needs to be part of the language.

Some good points:

  • Indentation-sensitive: No curly brackets needed to identify scope. In some cases, I haven’t been sure how to indent, but I can figure it out through Intellisense.
  • Type inference: This cuts down on a lot of type specifications, but it’s occasionally tricky to trace through the code to determine why the compiler inferred a certain type.
  • Type abbreviations: I’m working with Func<> a lot, and was able to abbreviate certain Func<> types that appear often.

In the end, the core code is F# and extension methods are in a separate C# assembly. I don’t like having two assemblies like this, but I don’t see alternatives at the moment. Overall, I’m happy with the results and hope to release a version soon.

Eval2D: Procedural Content Editor

I’m always trying to cut time off my feedback loop, especially when the work involves tweaking parameters without a clear idea of their impact. This has been especially true for procedural content, where much of my time is spent discovering how to compose functions to produce interesting effects.

For a cheap solution, I’ve used NUnit to define tests that output images, then tweaked parameters and re-run the tests. In some cases, I generate multiple images for ranges of parameters. This process works fairly well, especially with test running integrated into VS 2012 or through an addon like Resharper.

I finally decided it might be worth investing some time into a dedicated solution that allows me to write my code and get instant feedback. Building a small utility also makes for a fun distraction justified with the notion that I’ll be working more effectively with it.

So here’s Eval2D, a procedural texture tool where your C# script is evaluated as you type for instant visual feedback:

It’s a preview pane, a code editor, and a results/errors pane. Autocomplete works in simple cases. Evaluation of highlighted code sections is supported (as seen in many SQL editors). The layout changes when the window widens:

Components

  • AvalonEdit: This is the same editor used for SharpDevelop, and gives me most of what I need in a code editor.
  • Roslyn CTP: Works well for executing the scripts, but I had a bit of trouble querying compilations for type information to support autocomplete.
  • MahApps.Metro: Window styling.
  • Noise library: This is my own code, which has a fluent interface around noise/math functions that makes for easy function composition. It’s very flexible, but could be much faster with GPU support.

Planned Features

It works well enough for now, but I’d really like to have fully working autocompletion, popup sliders for numeric literals (as might be seen in shader/material editors or LightTable), and a color picker for hex-formatted uints. Access to a time variable within the script would help in visualizing the effect of changing parameters. For my own use, I’m not interested at all in any kind of modular GUI system like most 3D modeling apps have. I intend to keep a minimalist design and add features only as needed.

Serializing Ships

I’m a big fan of data in transparent formats that allow or encourage hand editing by humans. For Triverse, I’ve imposed the constraint that all ships and maps have a textual representation that captures all information about them through the arrangement of parts.

Other metadata might be useful, such as weapon group definitions or forward direction, but I’m hoping to generate reasonable defaults or possibly to cache user-customized weapon groups for layouts on the client. (Actually, since I first wrote this, I’ve been thinking more in the direction of disallowing arbitrary forward direction and forcing it to be a function of layout.)

A small ship:

  /.\p/.\
  \i/.\i/
\u/.\o/.\u/
  \u/u\u/

This form is also acceptable (slash required):

   . p .
   i . i
 u . o . u
   u u u/

And the resulting ship in-game:

A larger ship:

      /.\p/.\./.\p/.\
    /.\./o\./u\./o\./.\
  /i\./.\u/     \u/.\./i\
/.\./o\u/ \g/o\g/ \u/o\./.\
\./.\./u\ /u\./u\ /u\./.\./
  \i/.\o/.\./p\./.\o/.\i/
    \./.\i/     \i/.\./

And the result:

Each letter represents a type of part. Only one slash must be present to define cell orientation, but I find it helpful in visualizing the ship to include them throughout. This format was not obvious; I went through several other variations before settling on it. I’ve also considered using images to store large maps, which are stored the same as ships internally but would be wasteful in a text format without compression. Margins are irrelevant; they can be trimmed and would have no significant performance impact.

Part types are read in and mapped to part definitions using a table. This table defines character codes (‘i’, ‘u’, ‘.’, ‘o’, etc.) and attributes of parts, including energy requirements, weapon behavior, and any other information related purely to gameplay logic. Other info, such as visuals and audio, is decoupled from these definitions and placed in a separate table. I can imagine modding scenarios where one or both tables could be easily swapped out or added to. However, this use of character codes does impose a convention on custom tables if a modder wants to support existing ships: if a thruster is suddenly mapped to a weapon, the ships wouldn’t work as intended.
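A sketch of the two-table idea, with made-up part names, energy values, and sprite paths purely for illustration (the game’s actual definitions are not shown in the post):

```csharp
using System.Collections.Generic;

// Gameplay-only part definitions, keyed by character code. Visual/audio info
// lives in a separate table keyed the same way, so mods can swap either
// independently.
struct PartDef
{
    public string Name;
    public int EnergyCost;
}

static class PartTables
{
    public static readonly Dictionary<char, PartDef> Gameplay =
        new Dictionary<char, PartDef>
        {
            { '.', new PartDef { Name = "Hull",      EnergyCost = 0 } },
            { 'u', new PartDef { Name = "Thruster",  EnergyCost = 2 } },
            { 'i', new PartDef { Name = "Weapon",    EnergyCost = 3 } },
            { 'o', new PartDef { Name = "Generator", EnergyCost = 0 } },
            { 'p', new PartDef { Name = "Cockpit",   EnergyCost = 1 } },
        };

    // Presentation table, decoupled from gameplay logic.
    public static readonly Dictionary<char, string> Sprites =
        new Dictionary<char, string>
        {
            { '.', "hull.png" },
            { 'u', "thruster.png" },
        };
}
```

The character code is the shared key, which is exactly where the modding convention bites: any custom table must agree on what each code means for existing ships to keep working.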

Creating this visual serialization is one of the best things I’ve done for testing and development in general. I implemented it early in the project, and it makes defining test cases incredibly easy and allows for quick comparison of results. For example, code to rotate a grid (I’ll cover the math in another post):

let cg = grid @"
    /i\./u\         /u\./i\
    \./.\o/.\i/u\i/.\o/.\./
      \i/.\./u\ /u\./.\i/
        \u/.\./o\./.\u/
          \./p\./p\./ "

for i in 0 .. 6 do
    let rot = GridVector.Rotation i
    cg.RotateTriGrid rot |> formatTriGrid |> printf "%s\n\n"

And the output:

/i\./u\         /u\./i\
\./.\o/.\i/u\i/.\o/.\./
  \i/.\./u\ /u\./.\i/
    \u/.\./o\./.\u/
      \./p\./p\./

        /.\i/.\
        \u/o\./i\
        /i\./.\./u\
      /i\u/ \u/.\./.\
/.\u/o\./.\u/.\o/.\p/
\i/.\./i\./u\./.\p/

/i\./.\i/.\u/.\./p\
\./u\o/.\./u\./o\./p\
      \i/u\ /u\./.\./
        \i/.\./.\u/
        /u\o/.\i/
        \./i\./

      /.\p/.\p/.\
    /u\./.\o/.\./u\
  /i\./.\u/ \u/.\./i\
/.\./o\./i\u/i\./o\./.\
\i/.\u/         \u/.\i/

  /p\./.\u/.\i/.\./i\
/p\./o\./u\./.\o/u\./
\./.\./u\ /u\i/
  \u/.\./.\i/
    \i/.\o/u\
      \./i\./

      /.\i/.\
    /i\./o\u/
  /u\./.\./i\
/.\./.\u/ \u/i\
\p/.\o/.\u/.\./o\u/.\
  \p/.\./u\./i\./.\i/

/i\./u\         /u\./i\
\./.\o/.\i/u\i/.\o/.\./
  \i/.\./u\ /u\./.\i/
    \u/.\./o\./.\u/
      \./p\./p\./

I imagine copy+paste as a useful way to get ships in and out of the game in a sandbox mode, but Unity3D does not expose any cross-platform way to access the clipboard. Dragging + dropping files into a game window would also be convenient. For a browser deployment, I’ll probably have to use a text box outside the game and use Unity’s interop to access it.

Working with text this way, and especially getting rotation working, makes me ponder roguelike potential here. Making time and space discrete would avoid a lot of work, and there might be a better overall game along those lines. Might be fun to prototype, but for now I’m still going down the original path of a real-time continuous world.

2D Physics: Unity3D vs Farseer

The problem: colliding large, concave, moving 2D objects using small quads or circles as colliders to define the overall object boundaries. The small colliders are dynamic: they can be added, removed, or their shapes and positions changed. This doesn’t appear to be a typical physics scenario, so I wrote two implementations for Unity3D: one using its own 3D physics and another integrating Farseer physics.

Unity3D

Pros

  • Built-in: It’s already designed to work with the transform system, and has gizmos for colliders.
  • Decent API: Unity has plenty of API oddities and gotchas, but the physics fit in sensibly with the rest of the system.

Cons

  • Black box: It’s not always clear what’s happening as a result of changes or calls I make. In one instance, I ended up with odd “popping” effects when adding or removing colliders while rotation was enabled on objects and I wanted to use my own moment of inertia value. I did not pursue the problem further.
  • 3D: This adds a slight inconvenience in regard to constraining dynamics. It may also mean reduced performance because of greater complexity compared with 2D, although for my purposes I have yet to see problems.
  • Overhead: Each collider requires its own GameObject, which worries me when it comes to performance at scale. It comes back to the black box issue; it may be performant enough, but it’s an unknown that could change.

Farseer

Farseer is a C# port of Box2D. It was originally developed for XNA, but written to work as a stand-alone library as well.

Pros

  • Open-source: It’s great that I can jump in and see exactly what the library is doing. This also makes it easier to test; I can run the simulation in a test environment, allowing me to build automated physics integration tests.
  • 2D: Potential for better performance compared with a 3D library.

Cons

  • Integration effort: This means synchronizing between Farseer bodies and Unity transforms. Gizmos needed to be implemented, but fortunately much of the work was done in Farseer’s debug view and in this port.
  • API: It felt like the functionality was all over the place, with a lot of internal details exposed (possibly for good reasons). It’s probably difficult to make changes at a point where it’s so widely used, and I’m sure much of it is the result of performance tuning and its history.
  • Code changes needed: Dynamic fixtures did not work well out of the box: Farseer does not appear to be designed for rapid fixture/shape changes. I had to write my own methods to create/pool fixtures and update shapes, which relied on internal details of the library. I also noticed a number of places where fixtures are removed from array-based lists by shifting the following elements, which could conceivably result in poor performance with a large number of objects.

For both, I’d like to have better control over when body properties are automatically recalculated. In my case, I don’t want them to ever be recalculated from the shapes composing them because I already maintain those calculations and because the shapes do not represent the entire object. I haven’t stress-tested them both for a performance comparison, but I’d have a hard time believing Farseer would fare worse.

Another option I’d like to consider is Chipmunk, a 2D C library which appears to have a fairly clean API. Unfortunately, I don’t see any C# ports maintained, and I’m not about to roll my own (and maintain it..) without a compelling reason.

Conclusion

At this point I’ve decided to go with Farseer. It appears to work, I’ve gotten past some of the messy details, and it allows me to test physics-dependent functionality independently from Unity. That last point is the most compelling, and probably won’t change even if Unity gets a 2D solution that otherwise fits my needs.

This comparison came with some side benefits to the codebase:

  • Isolating coordinate mapping: Unity has a left-handed coordinate system, and in the past I’ve had the camera looking in the -Y direction, with the X-Z plane containing the 2D world. I took this opportunity to instead map to X-Y with the camera looking in the +Z direction, defining a common set of methods to support either.
  • Decoupling physics: During the transition, I could swap back and forth between physics implementations to compare results. I’d like to maintain this ability to mock out or swap physics libraries in the future.
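The coordinate-mapping isolation can be sketched as a single conversion point that supports both conventions. The struct and method names are assumptions for illustration (in Unity this would produce a `Vector3`):

```csharp
// One place that converts 2D world coordinates into 3D space, so the rest of
// the code never hard-codes which plane the world lives in.
struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

static class WorldMap
{
    // Old convention: camera looks along -Y, world lives in the X-Z plane.
    public static Vec3 ToXZ(float x, float y) => new Vec3(x, 0f, y);

    // New convention: camera looks along +Z, world lives in the X-Y plane.
    public static Vec3 ToXY(float x, float y) => new Vec3(x, y, 0f);
}
```

With all conversions funneled through methods like these, switching planes (or swapping physics backends that assume different conventions) becomes a localized change instead of a codebase-wide hunt.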