Monday, August 1, 2011

Refactoring to monadic C# - part 2

This is a follow-up to my previous article "Refactoring to monadic C#", in which I described how to refactor a typical imperative piece of code into functional style, and introduced and applied the Maybe monad.

To quickly recap, the code I refactored was a parser for NuGet's version ranges, e.g. [1.0] or (1.0,2.0), similar to math interval notation.

It's probably a good idea to keep both part 1 and part 2 of this article open side by side for reference and comparison.

Nesting options

After the refactoring we did in part 1, the function that parsed the upper bound of the interval was:

var maxVersion = L.F((string[] parts) =>  {
    var p = parts.Length == 2 ? parts[1] : parts[0];
    if (string.IsNullOrWhiteSpace(p))
        return FSharpOption<Version>.Some(null);
    return ParseVersion(p);
});

But... isn't Some(null) kind of awkward? Aren't option types supposed to save us from nulls?

A useful thing about option types I haven't mentioned is that they can be nested, that is, unlike Nullable<T> it's perfectly valid (and useful) to have an option of an option. Using nested options the code looks like this:

var maxVersion = L.F((string[] parts) =>  {
    var p = parts.Length == 2 ? parts[1] : parts[0];
    if (string.IsNullOrWhiteSpace(p))
        return FSharpOption<FSharpOption<Version>>.Some(FSharpOption<Version>.None);
    return ParseVersion(p).Select(v => v.ToOption());
});

Of course, now we have to change the monadic code in LINQ to handle this nested option (changes in bold):

var versionRange = L.F(() => from x in checkLength(value)
                             from isMin in minInclusive(value)
                             from isMax in maxInclusive(value)
                             let val = value.Substring(1, value.Length - 2)
                             let parts = val.Split(',')
                             from y in checkParts(parts)
                             from min in minVersion(parts)
                             from max in maxVersion(parts)
                             select (IVersionSpec)new VersionSpec {
                                 IsMinInclusive = isMin, 
                                 MinVersion = min.HasValue() ? min.Value : null, 
                                 IsMaxInclusive = isMax, 
                                 MaxVersion = max.HasValue() ? max.Value : null,
                             });

In this particular case, Some(null) wasn't so bad because we're not doing anything else with that value, but more generally, if you need to further process the value, nesting options is the better, cleaner way.

Error messages

Now let's see a different but closely related example: we want to parse some values (perhaps from a web request) and show an error if any parsing fails. If the values are parsed successfully, we do "something" with them (e.g. look them up in a database or whatever).

A typical imperative solution might look like this:

int doSomething(int userID, int id) {
    // fetch some other entity, do "stuff"
    return userID + id;
}

void setError(string e) {}

const string req_userID = "123";
const string req_otherID = "999";
int userID;
var userID_ok = int.TryParse(req_userID, out userID);
if (!userID_ok) {
    setError("Invalid User ID");
} else {
    int id;
    var id_ok = int.TryParse(req_otherID, out id);
    if (!id_ok) {
        setError("Invalid ID");
    } else {
        Console.WriteLine(doSomething(userID, id));
    }
}

If you try to refactor this to Options, like the previous example, you'll quickly see that Options are not quite enough. Options contain either a value or nothing, but in this case we want something that contains either a value or an error message. Such a type is precisely called Either in Haskell. More generally, both Option (called Maybe in Haskell) and Either are sum types (also called tagged union or discriminated union). The most commonly used sum type is bool. Sum types are in turn a special case of algebraic data types.

But I digress. Back to our example, we want to map from Option to Either, by attaching the corresponding error message instead of None where the parsing fails. Just like Option, writing an Either is not hard, but since we're already using F#'s option type we might as well also use F#'s generic sum type (called Choice). C# doesn't have the notion of algebraic data types so we'll add some syntax sugar around Choice to make it more usable.

Defining a monad over Choice is trivial once you understand how the Option monad works. We only have to choose by convention that one of the branches will work as the "error" branch, and we treat it like None in Option. So when we hit an error in the computation, we skip the rest of the computation.

Once we have everything in place, we can translate the code above to:

var somethingOrError =
    from userID in FSharpOption.TryParseInt(req_userID).ToFSharpChoice("Invalid User ID")
    from id in FSharpOption.TryParseInt(req_otherID).ToFSharpChoice("Invalid ID")
    select doSomething(userID, id);

where somethingOrError is of type FSharpChoice<int,string> , that is, it contains either the result of doSomething(...) if the values were parsed correctly, or a string with the error message for the first failed parsing.

But we're not done yet! We still need to either write the result to console or call setError if something went wrong.

if (somethingOrError.IsChoice1Of2) {
    var r = (FSharpChoice<int, string>.Choice1Of2)somethingOrError;
    Console.WriteLine(r.Item);
} else {
    var r = (FSharpChoice<int, string>.Choice2Of2)somethingOrError;
    setError(r.Item);
}

Ewwww. In languages with first-class support for sum types, extracting values is done through pattern matching, not ifs and downcasting (which are leaking implementation details here). Pattern matching, as explained in the Haskell wikibook, "is a convenient way to bind variables to different parts of a given value".

Of course, we don't have pattern matching in C#, but we can easily write an ad-hoc match, by abstracting the actions from the last snippet, so we can now write:

somethingOrError.Match(Console.WriteLine, setError);

Much cleaner, and it also ensures that both branches are covered.

More generic pattern matching can be encoded in C# (there are numerous examples on the web) but it's just too awkward, complex, unwieldy for my taste.

Conclusion

Even though C# lacks first-class support for sum types and pattern matching, they can still be gainfully encoded and used. Using FSharpChoice for sum types is convenient because it already implements equality semantics and you only have to wrap pattern matching once.

Of course, it's not without downsides: FSharpChoice is esentially an anonymous sum type (Choice1Of2 and Choice2Of2 have no real meaning), so you can't tell one FSharpChoice<int,int> from another FSharpChoice<int,int>, even though they may be representing widely different things. Also, as you add more "choices" to the type, it gets increasingly unwieldy, i.e. what does the third type represent in a FSharpChoice<int,string,int,bool> ? Without the ability to name things you have to keep track of this in your head (or add suitable functions, at the cost of convenience and conciseness). However, as shown in this post, the two-type generic Choice is quite simple and useful.

As with the first part of this post, I specifically tried to avoid getting into too much detail about the implementation or how it works under the covers, instead focusing on the end result, at the cost of some handwaving. You can see the entire implementation (it's almost trivial really) on github.

3 comments:

jackfoxy said...

Hi Maricio!

I started following your blog a while ago, and find your posts very informative, thoughtful, and they really advance the cause of practical functional language for .NET. However, other than an academic excercise, I do not understand the interest in trying to functionalize C# (and that goes for Tomas Petricek's fine work too). I'm inclined to let C# be the imperative OO language it is (with LINQ extensions) and concentrate on pushing the frontiers of functional development in F#.

Mauricio Scheffer said...

@jackfoxy: thanks for your comments! I want to do both: push functional techniques in F#, and also apply functional techniques in C# (since most people can't use F# in their dayjob, myself included) without alienating the language.
The point of this article was showing that it's possible to use certain FP techniques in C# without alienating the language while being much more concise, more clear, and pure than its imperative counterpart.

Mauricio Scheffer said...

@jackfoxy: by the way, this is not "demo code" or an academic exercise (I'm an engineer, not a scientist). I'm using this code in production right now.