Enums, discriminated unions, and boxing

I like the enum types in Swift. They have a lot of the look and feel of classical C enums: at minimum they are a set of named labels that can be matched in a familiar-looking switch statement. For example:

enum Fruit {
  case Orange
  case Apple
  case Pear
}

func tellFruit(fruit: Fruit) -> String {
  switch fruit {
    case Fruit.Orange: return "Orange"
    case Fruit.Apple: return "Apple"
    case Fruit.Pear: return "Pear"
  }
}

But Swift's enums can do much more than this. The thing I want to talk about is what Swift calls associated values. Each label can have a different datatype associated with it, which can be bound and matched against within the switch statement. This makes a Swift enum a discriminated union. Discriminated unions, and pattern matching on them, are something to which functional programming languages owe a great deal of their power. Here's how that looks in Swift:

class MyClass { ... }
enum Fruit2 {
  case Orange(Int,String)
  case Apple(String)
  case Pear(MyClass)
}

func tellFruit2(fruit: Fruit2) -> String {
  switch fruit {
    case Fruit2.Orange(let size, let variety):
      return "A \(variety) orange of size \(size)."
    case Fruit2.Apple(let variety):
      return "A \(variety) apple."
    case Fruit2.Pear(let pearInfo):
      return pearInfo.complicatedPearDescription()
  }
}

While this preserves the basic layout of a C enum and switch, you can see that we are now binding values in the case labels, according to the associated values held by each member of the enum. This is a succinct and immutable (i.e., safe) way of doing something that would probably have involved either a lot of inheritance or template metaprogramming in C++.
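
To see the binding in action, here is a small self-contained sketch that builds a few values and matches on them. PearInfo and its describe() method are hypothetical stand-ins for MyClass, whose body is elided above, and the where clause is an extra pattern-matching feature included only for illustration. It assumes a recent Swift toolchain, where the first argument label is written at the call site.

```swift
// Hypothetical stand-in for MyClass, whose body is elided in the text.
final class PearInfo {
  let name: String
  init(name: String) { self.name = name }
  func describe() -> String { return "A \(name) pear." }
}

enum Fruit2 {
  case Orange(Int, String)
  case Apple(String)
  case Pear(PearInfo)
}

func tellFruit2(fruit: Fruit2) -> String {
  switch fruit {
    // A where clause narrows a case using the values it has just bound.
    case .Orange(let size, let variety) where size > 10:
      return "A big \(variety) orange of size \(size)."
    case .Orange(let size, let variety):
      return "A \(variety) orange of size \(size)."
    case .Apple(let variety):
      return "A \(variety) apple."
    case .Pear(let pearInfo):
      return pearInfo.describe()
  }
}

print(tellFruit2(fruit: .Orange(12, "navel")))
print(tellFruit2(fruit: .Apple("Bramley")))
print(tellFruit2(fruit: .Pear(PearInfo(name: "Conference"))))
```

Because the switch must be exhaustive, removing any one of the unconditional cases makes the compiler complain, which is a large part of why this style feels safe.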

The only fly in the ointment with enum in Swift 1.1 is that the compiler wants an enum to be statically-sized out of context. An enum like Fruit2 is just fine, because we know how big (in memory) all of the associated values are going to be without looking at how it is used. MyClass is a reference type, so we know that it will be stored as a pointer. But if we want to use generics (and the way I program, this is going to happen a lot), e.g.:

// Won't compile
enum Fruit<T> {
  case Orange(Int, T)
  case Apple(T)
  case Pear(String, T)
}

we'll get a compiler error. This is because we don't know—without looking at how Fruit<T> gets used—how big a T is. If it's a reference object (i.e., a class instance), then it will just be a pointer, so that's OK. But, in Swift, T could be a value type with arbitrary size. Fruit<Int32> will need a different amount of storage to Fruit<Int64>, but the compiler wants them to be the same.
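
The size argument can be checked directly. This is only a sketch: it assumes a later Swift toolchain that provides MemoryLayout (not available in Swift 1.1), and Ref is a hypothetical class used purely to stand in for "some reference type".

```swift
// Hypothetical reference type, used only to illustrate pointer-sized storage.
final class Ref<T> {
  let item: T
  init(item: T) { self.item = item }
}

// Value types: the storage needed depends on what is substituted for T.
let int32Size = MemoryLayout<Int32>.size   // 4
let int64Size = MemoryLayout<Int64>.size   // 8

// References: one pointer wide, whatever T happens to be.
let ref32Size = MemoryLayout<Ref<Int32>>.size
let ref64Size = MemoryLayout<Ref<Int64>>.size

print(int32Size, int64Size, ref32Size, ref64Size)
```

The two value-type sizes differ, while the two reference sizes are identical, which is exactly the property the compiler is asking for.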

I'm sure in the fullness of time the compiler will get to the point where it delays compiling these kinds of enums until it knows enough about T. Until then, I need a work-around. I have done it by boxing the generic types so that—whatever they are intended to be used as—they get turned into a reference type in the enum:

// Classes are reference types
class box<T> {
  let item: T
  init(item: T) {
    self.item = item
  }
}

enum Fruit<T> {
  case Orange(Int, box<T>)
  case Apple(box<T>)
  case Pear(String, box<T>)
}

This will now compile: even without knowing what T is, we know how big each member of the discriminated union is because, under the hood, box<T> is always a pointer, regardless of T.

This is only a work-around. For instance, consider if T were a 32-bit integer. It's probably going to be inefficient to allocate a whole object on the heap just for this; but that is just what will happen with box<Int32>. Even worse, whenever T is a reference type, we're going to be using a pointer to an object that contains just one pointer to another heap object which contains the data we actually wanted. Besides the inefficiencies, there's also the additional typing we incur by having to construct all these box<T>s. This can be relieved slightly by using another great feature of Swift's enums, which is that we can add methods, including constructors, to them:

enum Fruit<T> {
  ...
  init(size: Int, variety: T) {
    self = .Orange(size, box<T>(item: variety))
  }
  init(variety: T) {
    self = .Apple(box<T>(item: variety))
  }
  init(name: String, variety: T) {
    self = .Pear(name, box<T>(item: variety))
  }

  var variety: T {
    get {
      switch self {
        case .Orange(_, let v): return v.item
        case .Apple(let v): return v.item
        case .Pear(_, let v): return v.item
      }
    }
  }
}
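
As a usage sketch, here is how calling code might look once those helpers exist. The definitions are repeated so the snippet compiles on its own; the variable names and sample values are made up for illustration, and a recent Swift toolchain is assumed.

```swift
// Classes are reference types, so box<T> is always stored as a pointer.
final class box<T> {
  let item: T
  init(item: T) { self.item = item }
}

enum Fruit<T> {
  case Orange(Int, box<T>)
  case Apple(box<T>)
  case Pear(String, box<T>)

  // The initializers hide the boxing from callers.
  init(size: Int, variety: T) { self = .Orange(size, box<T>(item: variety)) }
  init(variety: T) { self = .Apple(box<T>(item: variety)) }
  init(name: String, variety: T) { self = .Pear(name, box<T>(item: variety)) }

  // The accessor hides the unboxing.
  var variety: T {
    switch self {
      case .Orange(_, let v): return v.item
      case .Apple(let v): return v.item
      case .Pear(_, let v): return v.item
    }
  }
}

// Callers never mention box<T>.
let orange = Fruit(size: 3, variety: "navel")   // T is inferred as String
let apple = Fruit<Int32>(variety: 7)
let pear = Fruit(name: "Conference", variety: 2.5)

print(orange.variety, apple.variety, pear.variety)
```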

Adding these helpers means that code using Fruit<T> can try to be as ignorant as possible of box<T> so that, when the compiler is upgraded and can handle generic enums better, we can remove it with, hopefully, minimal editing and bug creation.
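
For comparison, this is roughly what the unboxed version could look like, assuming a later compiler that accepts generic associated values directly. It is a sketch of the intended migration rather than something Swift 1.1 will build; the sample values are again made up.

```swift
// Assumes a compiler that can lay out generic payloads without boxing.
enum Fruit<T> {
  case Orange(Int, T)
  case Apple(T)
  case Pear(String, T)

  // Same signatures as the boxed version; only the bodies change.
  init(size: Int, variety: T) { self = .Orange(size, variety) }
  init(variety: T) { self = .Apple(variety) }
  init(name: String, variety: T) { self = .Pear(name, variety) }

  var variety: T {
    switch self {
      case .Orange(_, let v): return v
      case .Apple(let v): return v
      case .Pear(_, let v): return v
    }
  }
}

// Call sites written against the helpers stay exactly as they were.
let orange = Fruit(size: 3, variety: "navel")
let apple = Fruit<Int32>(variety: 7)

print(orange.variety, apple.variety)
```

Only code that pattern-matches on the cases directly, and so touches .item, would need editing.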