The right way to handle YAML in Go

Go's standard library includes the ability to serialize and deserialize structs as JSON. While Go doesn't have a native library for YAML, go-yaml can be used to accomplish the same thing. However, if you are trying to support both JSON and YAML, several issues arise when using go-yaml and Go's JSON library together. There's a counter-intuitive way to use both libraries that ends up being much safer and easier to maintain.

But first, let's step back and review JSON serialization in Go.

Serialization in Go

Imagine you have a Go struct:

type Person struct {
     Name string
     Age int
}

And you want to turn this into JSON to output it to web clients:

// Create a Person struct.
p := Person{"John", 30}
// Convert it to JSON.
j, err := json.Marshal(p)
if err != nil {
    return err
}
fmt.Println(string(j))
/* Output:
{"Name":"John","Age":30}
*/

Looks great - except our field names aren't quite right. To make a field public in Go, you capitalize the first letter. But in JSON, it's customary to have the first letter of fields be lowercase. We solve this problem by using struct "tags," a Go construct to add metadata to a struct's fields.

type Person struct {
     Name string `json:"name"` // Use "name" instead of "Name" for the JSON
     Age int `json:"age"`      // field name.
}

Now our output is correct:

fmt.Println(string(j))
/* Output:
{"name":"John","age":30}
*/

Struct tags can be used for other things, like indicating whether a field should be omitted when it's equivalent to a zero value (0 for numbers, empty string for strings, etc.). Creating a struct from JSON is as simple as calling json.Unmarshal:

j := []byte(`{"name":"John", "age": 30}`)
// Initialize the variable p as type Person.
var p Person
err := json.Unmarshal(j, &p)
if err != nil {
     return err
}
// Using the awesome go-spew library to pretty-print the struct.
spew.Dump(p)
/* Output:
(main.Person) {
 Name: (string) (len=4) "John",
 Age: (int) 30
}
*/
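
The zero-value omission mentioned above is controlled by the omitempty tag option. Here is a minimal sketch of it; the Email field is a hypothetical addition that is not part of the Person struct used elsewhere in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person mirrors the struct used above. Email is a hypothetical field added
// only to illustrate omitempty.
type Person struct {
	Name  string `json:"name"`
	Age   int    `json:"age"`
	Email string `json:"email,omitempty"` // Left out of the JSON when empty.
}

// encode returns the JSON encoding of p as a string.
func encode(p Person) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	people := []Person{
		{Name: "John", Age: 30},
		{Name: "Mary", Age: 35, Email: "mary@example.com"},
	}
	for _, p := range people {
		s, err := encode(p)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(s)
	}
}

/* Output:
{"name":"John","age":30}
{"name":"Mary","age":35,"email":"mary@example.com"}
*/
```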

There's one more thing we should cover before discussing YAML. If you have an object that has a custom way of being serialized and deserialized, you just define some methods on the struct and Go's JSON library will use those.

// checksum and checkChecksum are helpers assumed to be defined elsewhere.
type ChecksumPerson struct {
    Name string
    Age int
    Checksum string
}

func (p *Person) MarshalJSON() ([]byte, error) {
    // Automatically add a checksum field.
    return []byte(fmt.Sprintf(`{"name": %#v, "age": %d, "checksum": %#v}`,
        p.Name, p.Age, checksum(p))), nil
}

func (p *Person) UnmarshalJSON(b []byte) error {
    var cp ChecksumPerson
    err := json.Unmarshal(b, &cp)
    if err != nil {
        return err
    }

    // Reuse err here; a second := would not compile.
    err = checkChecksum(cp)
    if err != nil {
        return err
    }

    p.Name = cp.Name
    p.Age = cp.Age
    return nil
}

Note that this is likely not the best way to implement an actual checksum on your JSON, but it serves here for demonstration purposes.
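
For readers who want to run the idea end to end, here is a self-contained sketch of the same pattern. The wirePerson type, the checksum helper, and the choice of CRC-32 are all assumptions made for illustration, not something the snippet above prescribes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"hash/crc32"
)

type Person struct {
	Name string
	Age  int
}

// wirePerson is the shape that goes over the wire. It has no custom methods,
// so encoding it does not recurse back into Person's methods.
type wirePerson struct {
	Name     string `json:"name"`
	Age      int    `json:"age"`
	Checksum string `json:"checksum"`
}

// checksum is a hypothetical helper: a hex-encoded CRC-32 over name and age.
func checksum(name string, age int) string {
	sum := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s|%d", name, age)))
	return fmt.Sprintf("%08x", sum)
}

// MarshalJSON adds the checksum field automatically.
func (p Person) MarshalJSON() ([]byte, error) {
	return json.Marshal(wirePerson{p.Name, p.Age, checksum(p.Name, p.Age)})
}

// UnmarshalJSON rejects input whose checksum does not match its fields.
func (p *Person) UnmarshalJSON(b []byte) error {
	var w wirePerson
	if err := json.Unmarshal(b, &w); err != nil {
		return err
	}
	if w.Checksum != checksum(w.Name, w.Age) {
		return errors.New("person: checksum mismatch")
	}
	p.Name, p.Age = w.Name, w.Age
	return nil
}

// roundTrip encodes p to JSON and decodes it again.
func roundTrip(p Person) (Person, error) {
	var out Person
	b, err := json.Marshal(p)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(b, &out)
	return out, err
}

func main() {
	p, err := roundTrip(Person{Name: "John", Age: 30})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)

	// A tampered document fails to decode.
	var bad Person
	err = json.Unmarshal([]byte(`{"name":"John","age":30,"checksum":"bogus"}`), &bad)
	fmt.Println(err != nil)
}

/* Output:
{Name:John Age:30}
true
*/
```

Building the output with json.Marshal on a plain helper struct, rather than with fmt.Sprintf, leaves string escaping to the JSON library.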

Now what if we want to do the same with YAML?

How go-yaml works

go-yaml works virtually identically to the JSON library, except instead of JSON struct tags, you use YAML struct tags, and instead of MarshalJSON and UnmarshalJSON, you use MarshalYAML and UnmarshalYAML.

type Person struct {
    Name string `json:"name" yaml:"name"` // Supporting both JSON and YAML.
    Age int `json:"age" yaml:"age"`
}
p := Person{"John", 30}
// Convert the Person struct to YAML.
y, err := yaml.Marshal(p)
if err != nil {
    return err
}
fmt.Println(string(y))
/* Output:
name: John
age: 30
*/

And similarly for deserialization.

The interesting bit is that JSON is actually a subset of YAML. Yes, you read that correctly - all JSON is actually fully valid YAML. "How is that possible?" you ask. "YAML doesn't have any of those ugly brackets or braces or quotation marks..."

Well, the YAML syntax we're all familiar with is only one way of writing YAML. For example, usually we write YAML collections like this:

- name: John
  age: 30
- name: Mary
  age: 35

But it's actually also valid YAML to write it like this:

- {name: John, age: 30}
- {name: Mary, age: 35}

Or like this:

[{name: John, age: 30}, {name: Mary, age: 35}]

Throw in some double quotes (also valid YAML) and you've essentially arrived at JSON.

So, let's say you're writing a web service that accepts both YAML and JSON but serializes out as JSON. This is easy because you don't have to force the user to indicate their choice - you just use yaml.Unmarshal on the way in, and json.Marshal on the way out.

But don't do that.

The problem with using both go-yaml and Go's JSON library

The first problem is obvious: you have to maintain duplicates of all your struct tags and custom Marshal/Unmarshal methods, which is a non-trivial task for a sufficiently large project.

The second problem is a little more subtle. Serialization and deserialization of structs can be a tricky thing with edge cases that you do not want handled inconsistently. One example is the handling of null values. Remember the custom MarshalXXXX and UnmarshalXXXX methods above? Well, if you have a null value in place of the struct, the JSON library still calls those methods with the null value to give you an opportunity to handle the null properly. But in go-yaml, they aren't called at all - go-yaml tries to instantiate a blank struct on its own. This can lead to some very confusing and difficult to find bugs.
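
The JSON half of that claim can be checked with the standard library alone. In this rough sketch, Recorder and Doc are hypothetical types that exist only to record what UnmarshalJSON receives. It also shows a related subtlety: for a pointer field, encoding/json sets the pointer to nil and does not call the method.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Recorder is a hypothetical type that remembers the raw bytes passed to its
// UnmarshalJSON method.
type Recorder struct {
	Called bool
	Raw    string
}

func (r *Recorder) UnmarshalJSON(b []byte) error {
	r.Called = true
	r.Raw = string(b)
	return nil
}

type Doc struct {
	Value   Recorder  `json:"value"`   // Struct value: UnmarshalJSON sees the null.
	Pointer *Recorder `json:"pointer"` // Pointer: a null simply leaves it nil.
}

// decode unmarshals s into a fresh Doc.
func decode(s string) (Doc, error) {
	var d Doc
	err := json.Unmarshal([]byte(s), &d)
	return d, err
}

func main() {
	d, err := decode(`{"value": null, "pointer": null}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Value.Called, d.Value.Raw, d.Pointer == nil)
}

/* Output:
true null true
*/
```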

To demonstrate this issue, let's define a struct in Go that we want to serialize and deserialize as a string in JSON:

type DateOfBirth struct {
     Time *time.Time  // Pointer to a time.Time struct.
}

And let's add it to our Person struct:

type Person struct {
    Name string `json:"name" yaml:"name"`
    Age int `json:"age" yaml:"age"`
    DOB DateOfBirth `json:"dob" yaml:"dob"`
}

But as we said, we want DateOfBirth to serialize and deserialize as a string in JSON, so we add the proper MarshalJSON and UnmarshalJSON methods.

// Called to convert a DateOfBirth struct into a JSON string value.
// A value receiver is used so the method is also invoked when a Person
// is marshaled by value.
func (dob DateOfBirth) MarshalJSON() ([]byte, error) {
    // If the time in the DateOfBirth struct is unset or zero, encode the
    // object as JSON's null.
    if dob.Time == nil || dob.Time.IsZero() {
        return []byte("null"), nil
    }
    // Otherwise return the RFC 3339 string representation of dob.Time,
    // e.g. "2014-01-01T10:01:01Z"
    return json.Marshal(dob.Time.Format(time.RFC3339))
}

// Called to convert the JSON value into a DateOfBirth struct.
func (dob *DateOfBirth) UnmarshalJSON(b []byte) error {
    // If the string null is passed in, set dob.Time to a valid
    // blank time.Time struct.
    if string(b) == "null" {
        dob.Time = &time.Time{}
        return nil
    }

    // Otherwise, set dob.Time to the passed in time.
    var timeStr string
    // Basically removes the double quotes from b and converts
    // it to a string.
    if err := json.Unmarshal(b, &timeStr); err != nil {
        return err
    }

    // Parse the string to produce a proper time.Time struct.
    pt, err := time.Parse(time.RFC3339, timeStr)
    if err != nil {
        return err
    }
    dob.Time = &pt
    return nil
}

Which works like this:

j := []byte(`{"name": "John", "age": 30, "checksum": "1abcf", "dob": null}`)
var p Person
err := json.Unmarshal(j, &p)
if err != nil {
     return err
}
spew.Dump(p)
/* Output:
(main.Person) {
 Name: (string) (len=4) "John",
 Age: (int) 30,
 DOB: (main.DateOfBirth) {
  Time: (*time.Time)(0x20828a020)(0001-01-01 00:00:00 +0000 UTC)
 }
}
*/

See how *p.DOB.Time was set to a zeroed time.Time struct? That's because null was passed into DateOfBirth.UnmarshalJSON, which then converted that to a blank Time struct. What happens if we try to do this with go-yaml?

y := []byte("name: John\nage: 30\ndob: null")
var p Person
err := yaml.Unmarshal(y, &p)
if err != nil {
     return err
}
spew.Dump(p)
/* Output:
(main.Person) {
 Name: (string) (len=4) "John",
 Age: (int) 30,
 DOB: (main.DateOfBirth) {
  Time: (*time.Time)(<nil>)   // <-- different than above
 }
}
*/

The go-yaml library intercepts the null and doesn't even bother to call UnmarshalYAML - it just creates a zeroed object itself and assigns it to DOB. This is not only not the behavior we want, but it is inconsistent with how the JSON library works, requiring you to handle subtly different flows and results.

This is particularly bad because go-yaml is not entirely consistent about what it even considers a null value. In YAML you can have value: null, value: "null", or even value: \n, and currently there's a bug where even value: "" is interpreted as null and UnmarshalYAML is not called. This led to some really insidious bugs in a project I was recently working on, where we were unmarshaling data types back and forth from native data structures and were getting nils where we shouldn't have been.

The above is just one example of an inconsistency. There are a few others and, even if they all get fixed, there is the opportunity for more to show up in the future. So you're probably wondering at this point, is there a better way?

The right (or at least, better) way to handle YAML in Go

Turns out, most YAML can be represented just fine as JSON. There are a few exceptions, like maps having maps as keys in YAML, but those exceptions generally aren't relevant to using YAML as a serialization format for structs since, for example, map keys must strictly be strings to be able to map onto struct field names. As a result, a completely acceptable way to handle YAML here is to convert it to JSON first and then use json.Unmarshal to turn it into a struct. To serialize into YAML, do the reverse: use json.Marshal to convert the struct to JSON, then convert that JSON to YAML. Fortunately, the go-yaml library works just fine for this task; it only needs a little code around it so that it converts between YAML and JSON instead of serializing and deserializing structs directly.
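
To make the conversion step concrete, here is a rough sketch that uses only the standard library. It assumes that a YAML decoder hands back generic values whose maps are keyed by interface{} (which, as far as I know, is what go-yaml v2 produces), and it uses a hand-built value in place of real decoder output. toJSONValue and decodePerson are hypothetical names, and this is not the code of any particular library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSONValue recursively rewrites a generic decoded value so that every map
// has string keys, which is what encoding/json needs in order to marshal it.
func toJSONValue(v interface{}) (interface{}, error) {
	switch t := v.(type) {
	case map[interface{}]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			ks, ok := k.(string)
			if !ok {
				// This sketch is strict; a real library may choose to
				// stringify scalar keys instead.
				return nil, fmt.Errorf("unsupported map key %v (%T)", k, k)
			}
			cv, err := toJSONValue(val)
			if err != nil {
				return nil, err
			}
			out[ks] = cv
		}
		return out, nil
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			cv, err := toJSONValue(val)
			if err != nil {
				return nil, err
			}
			out[i] = cv
		}
		return out, nil
	default:
		return v, nil
	}
}

// Person needs only JSON tags in this approach.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// decodePerson converts a generic value to JSON and lets encoding/json
// populate the struct, so custom UnmarshalJSON methods would apply too.
func decodePerson(generic interface{}) (Person, error) {
	var p Person
	v, err := toJSONValue(generic)
	if err != nil {
		return p, err
	}
	j, err := json.Marshal(v)
	if err != nil {
		return p, err
	}
	err = json.Unmarshal(j, &p)
	return p, err
}

func main() {
	// Stand-in for decoding "name: John\nage: 30" into an interface{}.
	generic := map[interface{}]interface{}{"name": "John", "age": 30}
	p, err := decodePerson(generic)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}

/* Output:
{Name:John Age:30}
*/
```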

This approach is ideal not only because of the current bugs in go-yaml, which may be fixed, but because having two different code paths doing serialization and deserialization opens you up to an entire class of bugs that doesn't exist if you reuse the JSON library. Plus you only need to maintain one set of struct tags and one set of custom marshal/unmarshal methods.

https://github.com/ghodss/yaml is a library that implements this pattern. It implements yaml.Unmarshal and yaml.Marshal in the way described above, and also conveniently exports yaml.YAMLToJSON and yaml.JSONToYAML methods if you'd like to convert back and forth on your own.

With regard to the null cases above, to make sure you always handle them correctly you just need to make sure that value: \n, value: null, value: "null" in YAML always convert to {"value": null} in JSON (and that value: "" never does). This is the case, as confirmed by tests.

Note that there are a few other uses for YAML beyond converting it to and from Go structs. gosexy/yaml, for example, uses YAML to store and access configuration files. In those cases, this may not be the right approach, and you may want the full power of YAML. But when serializing and deserializing structs, the above method is more robust and maintainable in the long run.

And as a final note to anyone who might be writing a Go serializer to a different format, first see if the approach outlined above will work for your format. If not, my only advice would be to not try to be clever and implement a different interface to serializing and deserializing structs. Just follow the JSON behavior exactly (including the signature for the custom Marshal and Unmarshal methods). That will greatly help the users of your library implement your library alongside other serializers.