C++ killed the get & set accessors

Created 27th May, 2006 04:53 (UTC), last edited 28th June, 2008 12:41 (UTC)

A simple way to get started on the principles behind writing C++ libraries by seeing how to get rid of all of those accessor functions; all to the tune of Video Killed the Radio Star [1: Of course the meter doesn't quite work, but it's close.].

One of the basic rules of encapsulation [2: Why encapsulation is a Good Thing™ is beyond the scope of this article.] is that we should hide the implementation of our classes. We define the interfaces and not the data structures so that we are free to alter our implementations without having to worry quite so much about how this will affect the client code.

This leads to a style of implementation where every attribute ends up with get and set accessor functions. In turn this leads to classes that look a lot like this:

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        double getLat() const;
        void setLat( double newValue );
        double getLon() const;
        void setLon( double newValue );
    private:
        double lat, lon;
};

I'm sure that I don't need to labour the point by showing the blindingly obvious implementation here. Compare this with the following code:

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        double lat, lon;
};

That's a huge saving in the declaration alone, and the implementation shrinks by a good deal too. The problem is that it breaks the encapsulation. If we ever need to do anything more complex with the attribute we will have to re-write a lot of client code, which is going to be very time-consuming. What sort of thing are we likely to want? The obvious thing is a range check.

Why the range check? Well, both latitudes and longitudes are constrained to a range of ±180° [3: There are lots of reasons why you may not actually want to constrain them in an implementation, but for our purposes we'll go with this specification. Of course the best reason not to constrain them like this was pointed out to me on Reddit. The longitude is ±180°, but the latitude is of course ±90°. I will update this eventually with an example that does work and still illustrates the point I'm trying to make. In the meantime I hope you bear with my idiocy.]. This means that our accessors aren't quite as brain-dead as we might first imagine. The set members should really look something like this [4: In production code I wouldn't use this exception. Consider this an illustration.]:

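#include <stdexcept>  // for std::out_of_range

void GeoPosition::setLat( double nLat ) {
    // Half-open range: -180 is accepted, +180 is rejected
    if ( nLat < -180. || nLat >= 180. )
        throw std::out_of_range( "New latitude is out of allowed range of +-180 degrees" );
    lat = nLat;
}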

Note that I've followed the standard C++ convention of allowing the lower bound but excluding the upper bound. The get is still fairly brain-dead though, looking like this:

double GeoPosition::getLat() const {
    return lat;
}
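To see the range check doing its job, here is a minimal self-contained sketch of the whole class. The default position of (0, 0), the constructor that routes through the setters, and the helper setLatRejects are my assumptions for illustration; none of them are part of the specification above.

```cpp
#include <cassert>
#include <stdexcept>

class GeoPosition {
    public:
        // Assumption: a default-constructed position sits at (0, 0).
        GeoPosition() : lat( 0. ), lon( 0. ) {}
        // Assumption: construct through the setters so the same checks apply.
        GeoPosition( double nLat, double nLon ) : lat( 0. ), lon( 0. ) {
            setLat( nLat );
            setLon( nLon );
        }

        double getLat() const { return lat; }
        void setLat( double nLat ) {
            if ( nLat < -180. || nLat >= 180. )
                throw std::out_of_range( "New latitude is out of allowed range of +-180 degrees" );
            lat = nLat;
        }
        double getLon() const { return lon; }
        void setLon( double nLon ) {
            if ( nLon < -180. || nLon >= 180. )
                throw std::out_of_range( "New longitude is out of allowed range of +-180 degrees" );
            lon = nLon;
        }
    private:
        double lat, lon;
};

// Hypothetical helper: reports whether setLat throws for the given value.
bool setLatRejects( double value ) {
    GeoPosition place;
    try {
        place.setLat( value );
    } catch ( const std::out_of_range & ) {
        return true;
    }
    return false;
}

// Usage sketch: the lower bound is allowed, the upper bound is not.
void demoRangeCheck() {
    assert( !setLatRejects( -180. ) );
    assert( setLatRejects( 180. ) );
}
```

Notice how much of this is the same few lines written out twice, once per attribute.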

A slightly more idiomatic accessor

The first thing we're going to do is to change the accessors so that they're slightly more idiomatic. Because we can overload the same name with different parameters in C++ we can just chop the get and set from the names. This gives us a class that looks like this:

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        double lat() const;
        void lat( double newValue );
        double lon() const;
        void lon( double newValue );
    private:
        double m_lat, m_lon;
};

This means that we can now replace anything that looks like:

place.setLat( place.getLat() + 10. );
return place.getLat();

With this:

place.lat( place.lat() + 10. );
return place.lat();

It may seem like a small matter, but again it saves us some typing. More importantly it also gives us a lot more flexibility on how to implement the attribute which at the end of the day is one of the reasons for using encapsulation.
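For completeness, here is one way the overloaded accessors might be filled in. This is only a sketch: the inline definitions, the default of (0, 0) and the free function moveNorth are my assumptions, added so that the snippet compiles by itself.

```cpp
#include <cassert>
#include <stdexcept>

class GeoPosition {
    public:
        // Assumption: default to (0, 0).
        GeoPosition() : m_lat( 0. ), m_lon( 0. ) {}
        GeoPosition( double nLat, double nLon ) : m_lat( 0. ), m_lon( 0. ) {
            lat( nLat );  // one argument, so this is the setter overload
            lon( nLon );
        }

        // Same name twice: the compiler picks the overload from the arguments.
        double lat() const { return m_lat; }
        void lat( double newValue ) {
            if ( newValue < -180. || newValue >= 180. )
                throw std::out_of_range( "New latitude is out of allowed range of +-180 degrees" );
            m_lat = newValue;
        }
        double lon() const { return m_lon; }
        void lon( double newValue ) {
            if ( newValue < -180. || newValue >= 180. )
                throw std::out_of_range( "New longitude is out of allowed range of +-180 degrees" );
            m_lon = newValue;
        }
    private:
        double m_lat, m_lon;
};

// Hypothetical client code, using the call syntax from the text.
double moveNorth( GeoPosition &place ) {
    place.lat( place.lat() + 10. );
    return place.lat();
}
```

The data members need a different name from the accessors now (hence m_lat and m_lon), because a class can't have a data member and a member function with the same name.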

Even if you don't read any further than this you should take away that in C++ you should never use get and set in accessor names because it limits what you can do later on. What those things are is what the rest of this article is about.

Objects to the rescue

We found that we could get rid of the get and set parts of the names, but there's still a lot of boring, repetitive code to write. Boring and repetitive tasks are one of the things that classes help us to deal with, so maybe we can write a class to handle it all for us.

Let's see what happens if we implement the attribute as a class.

class Latitude {
public:
    Latitude();
    Latitude( double );

    double get() const {
        return m_lat;
    }
    void set( double nLat ) {
        if ( nLat < -180. || nLat >= 180. )
            throw std::out_of_range( "New latitude is outside of allowed range of +-180 degrees" );
        m_lat = nLat;
    }
private:
    double m_lat;
};

Hmmm… What was I going to call the accessors? I can't leave them with no name so I've brought back in get and set. Not ideal.

It turns out that C++ provides us with a way of not giving them a name at all. We can actually overload the () operator [5: Classes which implement this operator are normally called functor classes because instances of them can be used with the same syntax as function calls.]. Now this sounds odd and in a way it is, but the reason that it's there is for exactly this sort of eventuality. We overload an operator by defining a method named operator X, where X is the operator. For our purposes we want (), so this gives us operator ().

Now watch carefully when we use it though. The syntax is pretty obvious when you think about it, but looks really strange at first.

class Latitude {
public:
    Latitude();
    Latitude( double );

    double operator ()() const {
        return m_lat;
    }
    void operator ()( double nLat ) {
        if ( nLat < -180. || nLat >= 180. )
            throw std::out_of_range( "New latitude is outside of allowed range of +-180 degrees" );
        m_lat = nLat;
    }
private:
    double m_lat;
};

If you think about it you'll see that the first () is the operator name and the second set is for the operator's parameters. The first one of course doesn't have any parameters [6: We could have written operator()( void ) if we'd wanted to, but the void is optional and normally left out in C++.] and the second one takes a double. Just to double underline this (ahem), here are the members written as prototypes:

double operator ()() const;
void operator ()( double nLat );

The whole operator () is the name of the function.
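Before dropping it into GeoPosition, it may help to see the functor used by itself. The sketch below assumes a default value of zero and a converting constructor that reuses the checked setter; neither is spelled out in the class above, and demoLatitude is a name I've made up for the example.

```cpp
#include <cassert>
#include <stdexcept>

class Latitude {
public:
    // Assumption: default to zero degrees.
    Latitude() : m_lat( 0. ) {}
    // Assumption: the constructor goes through the range-checked setter.
    Latitude( double nLat ) : m_lat( 0. ) {
        (*this)( nLat );
    }

    double operator ()() const {
        return m_lat;
    }
    void operator ()( double nLat ) {
        if ( nLat < -180. || nLat >= 180. )
            throw std::out_of_range( "New latitude is outside of allowed range of +-180 degrees" );
        m_lat = nLat;
    }
private:
    double m_lat;
};

// Usage sketch: the object is "called" like a function.
void demoLatitude() {
    Latitude lat( 45. );
    assert( lat() == 45. );   // calls double operator ()() const
    lat( lat() + 10. );       // calls void operator ()( double )
    assert( lat() == 55. );
    // The long-hand spelling does the same thing, it's just uglier.
    assert( lat.operator ()() == 55. );
}
```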

We can now use this in our first class like this:

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        Latitude lat;
        double lon() const;
        void lon( double newValue );
    private:
        double m_lon;
};

Note that I've only changed the latitude member so far. To manipulate it we will have code that looks like this:

place.lat( place.lat() + 10. );
return place.lat();

Notice that this looks exactly like the accessors we had earlier after we removed the get and set. This is the first part of the reason why leaving the get and set out of accessor names is a pretty good idea. It gives us the possibility of switching between writing accessor members in a class and using a separate helper class to implement them [7: You may need to think a little about this. The reason is that if we use the get and set in the names then we will be trying to replace getLat and setLat which isn't possible. Try it and you'll see what I mean.].
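To make the point concrete, here is a sketch that puts the two implementations side by side. The namespaces with_members and with_helper and the function moveNorth are names I've invented for the comparison, and I've left the longitude out to keep it short. The body of the client function is character for character the same in both cases.

```cpp
#include <cassert>
#include <stdexcept>

namespace with_members {
    // Latitude implemented as a pair of overloaded member functions.
    class GeoPosition {
        public:
            GeoPosition() : m_lat( 0. ) {}
            double lat() const { return m_lat; }
            void lat( double newValue ) {
                if ( newValue < -180. || newValue >= 180. )
                    throw std::out_of_range( "New latitude is outside of allowed range of +-180 degrees" );
                m_lat = newValue;
            }
        private:
            double m_lat;
    };
}

namespace with_helper {
    // Latitude implemented as a public functor member.
    class Latitude {
    public:
        Latitude() : m_lat( 0. ) {}
        double operator ()() const { return m_lat; }
        void operator ()( double nLat ) {
            if ( nLat < -180. || nLat >= 180. )
                throw std::out_of_range( "New latitude is outside of allowed range of +-180 degrees" );
            m_lat = nLat;
        }
    private:
        double m_lat;
    };
    class GeoPosition {
        public:
            Latitude lat;
    };
}

// Hypothetical client code, written once against each implementation.
double moveNorth( with_members::GeoPosition &place ) {
    place.lat( place.lat() + 10. );
    return place.lat();
}
double moveNorth( with_helper::GeoPosition &place ) {
    place.lat( place.lat() + 10. );
    return place.lat();
}
```

Had the accessors been called getLat and setLat, the second version of moveNorth would have needed different source text from the first.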

We can imagine that we can of course do exactly the same thing with a class Longitude which would reduce the size of our class even more, but we don't actually have to. Looking at the class you can see that it doesn't really need to be limited to latitude at all. If we rename it we can use it for both:

class SphericalCoordinate {
public:
    SphericalCoordinate();
    SphericalCoordinate( double );

    double operator ()() const {
        return m_coord;
    }
    void operator ()( double nCoord ) {
        if ( nCoord < -180. || nCoord >= 180. )
            throw std::out_of_range( "New co-ordinate value is outside of allowed range of +-180 degrees" );
        m_coord = nCoord;
    }
private:
    double m_coord;
};

Just by renaming it we have something that is suitable for use as either latitude or longitude.

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        SphericalCoordinate lat, lon;
};

This is clearly much better:

  • The class is shorter and much more clearly expresses our intent without any excessive verbosity.
  • Instead of writing two sets of accessors which are nearly identical we've written one in a helper class.
  • We have two classes that each do half the job that one was doing before. This means that each class is smaller and easier to understand.
  • We're still free to change the underlying implementation if we want to because we haven't changed the syntax used to access the latitude and longitude.
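Here is how the two classes might fit together once the constructors are filled in. Treat it as a sketch: defaulting to zero, having the converting constructor call the checked setter, and the helper constructionRejects are all my assumptions rather than anything fixed by the declarations above.

```cpp
#include <cassert>
#include <stdexcept>

class SphericalCoordinate {
public:
    // Assumption: default to zero degrees.
    SphericalCoordinate() : m_coord( 0. ) {}
    // Assumption: construction is range-checked in the same way as setting.
    SphericalCoordinate( double nCoord ) : m_coord( 0. ) {
        (*this)( nCoord );
    }

    double operator ()() const {
        return m_coord;
    }
    void operator ()( double nCoord ) {
        if ( nCoord < -180. || nCoord >= 180. )
            throw std::out_of_range( "New co-ordinate value is outside of allowed range of +-180 degrees" );
        m_coord = nCoord;
    }
private:
    double m_coord;
};

class GeoPosition {
    public:
        GeoPosition() {}
        // Each double is converted by SphericalCoordinate( double ), so bad
        // values are rejected before a GeoPosition ever exists.
        GeoPosition( double nLat, double nLon ) : lat( nLat ), lon( nLon ) {}

        SphericalCoordinate lat, lon;
};

// Hypothetical helper: reports whether construction throws.
bool constructionRejects( double nLat, double nLon ) {
    try {
        GeoPosition place( nLat, nLon );
    } catch ( const std::out_of_range & ) {
        return true;
    }
    return false;
}
```

GeoPosition itself now contains no checking code at all; everything it knows about valid values comes from the type of its members.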

If you stop reading now you have a technique that you can use to help you write classes more quickly and with fewer errors. But we can do even better.

Then convert it to a template

What is a template and what is it meant to do for us? A template is a way to write a class so that we can configure some things that we would otherwise have to specify in our code. What do we mean?

In our class we've decided that we want our co-ordinate to be stored as a double. This is fine, but what if there are some places that we need to store the location as an int and other places where we need to store it as a double? Do we need to write two versions of our class? Do we need a SphericalCoordinateInt and a SphericalCoordinateDouble?

It's exactly this that the templates do for us. We can write the class without having to decide what type we use to store the actual co-ordinate. The syntax is actually fairly straightforward. To change the class into a template we simply tell it that it is going to be a template and tell it that we want to specify a type to use. The first part of the class declaration changes to this:

template< typename t_coordinate >
class SphericalCoordinate ...

Now we just need to replace the occurrences of double with t_coordinate in our complete implementation:

template< typename t_coordinate >
class SphericalCoordinate {
public:
    SphericalCoordinate();
    SphericalCoordinate( t_coordinate );

    t_coordinate operator ()() const {
        return m_coord;
    }
    void operator ()( t_coordinate nCoord ) {
        if ( nCoord < -180. || nCoord >= 180. )
            throw std::out_of_range( "New co-ordinate value is outside of allowed range of +-180 degrees" );
        m_coord = nCoord;
    }
private:
    t_coordinate m_coord;
};

In order to use it in our class we now have:

class GeoPosition {
    public:
        GeoPosition();
        GeoPosition( double lat, double lon );

        SphericalCoordinate< double > lat, lon;
};

This now looks a little more complex, but actually it has one important benefit over the previous version: the precision of the latitude and longitude is now explicitly stated in our GeoPosition class and can just as easily be changed there [8: I'm guessing that you will also notice that we could templatise GeoPosition too; and in some situations that may be the right thing to do.].

This last point is more important than it at first seems. When somebody else is going through this code the fact that the actual underlying type is in GeoPosition will make it easier for them to understand the class. This means that although we have used much more complex techniques the code is actually easier to maintain. And even better, because we're not writing the same accessors time and again we have something that is much more likely to be correct.
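As a quick check that one template really does cover both storage types, here is a sketch that instantiates it with double and with int. The constructor bodies and the demoStorageTypes function are my assumptions; the accessors are the ones from the template above.

```cpp
#include <cassert>
#include <stdexcept>

template< typename t_coordinate >
class SphericalCoordinate {
public:
    // Assumption: value-initialise, which gives zero for arithmetic types.
    SphericalCoordinate() : m_coord() {}
    // Assumption: construction goes through the range-checked setter.
    SphericalCoordinate( t_coordinate nCoord ) : m_coord() {
        (*this)( nCoord );
    }

    t_coordinate operator ()() const {
        return m_coord;
    }
    void operator ()( t_coordinate nCoord ) {
        // For an int the comparison simply promotes the value to double.
        if ( nCoord < -180. || nCoord >= 180. )
            throw std::out_of_range( "New co-ordinate value is outside of allowed range of +-180 degrees" );
        m_coord = nCoord;
    }
private:
    t_coordinate m_coord;
};

// Usage sketch: the same source, two storage types.
void demoStorageTypes() {
    SphericalCoordinate< double > precise( 51.5 );
    SphericalCoordinate< int > whole( 51 );
    assert( precise() == 51.5 );
    assert( whole() == 51 );
    whole( whole() + 10 );
    assert( whole() == 61 );
}
```

The compiler generates a separate class for each type we ask for, so there is still only one copy of the range check for us to maintain.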

Some final thoughts

I'm going to just bullet these out. There are some other considerations that you should be aware of, but nothing that should stop you from using the technique:

  • It is possible to templatise the range check, but not without using even more complex template syntax in the template implementation. We'll leave that for another day.
  • The set accessor (the void operator()( t_coordinate ) in our case) at the moment doesn't return anything. There is a good argument that it should return the new value (or even the old value). This depends on the context that it is being used in though. If you suspect that there is any chance of confusion between the return value being the old or the new one then don't do it. To me the principle of least surprise suggests that it should return the new value, but what one person finds surprising is often very different from what another finds surprising.
  • You may be wondering why we don't just use some of the other operators (assignment and casts) in order to make the template behave exactly like the plain data type. The reason here is that it changes the syntax and means that if we ever need to go back to using accessor functions then we have to re-write all the client code again. A laborious task.
  • There is one aspect of encapsulation that this technique doesn't help with. If you retain the accessors as members of the original class then it is possible to update the accessor without forcing a recompile. I don't worry too much about this as in practice it's pretty rare not to have to recompile. Obviously there will be some situations where this may be more important.
  • Because our template class members are defined inline, the compiler can normally remove any run-time overhead from using the technique. In fact, if the compiler inlines the accessors properly the result may even be faster than the original out-of-line accessors.
  • The templates that you use can be extended to handle other common idioms and patterns. For example:
    • Managing strings by enforcing maximum lengths, and range enforcement for any other data type.
    • Generation of SQL for UPDATE or INSERT statements or even WHERE clauses (for example adding single quotes where appropriate).
    • Converting to and from COM types (for example to and from BSTR or VARIANT).
    • Memoization (or caching of complex results).
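To give a flavour of the first point in the list, here is one possible way to templatise the range check. This is a sketch under some loud assumptions: the class name Ranged is mine, the bounds are restricted to whole numbers so that they can be passed as int template arguments, and zero is assumed to lie inside every range so that it can serve as the default value.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical generalisation: the half-open range [t_lower, t_upper) is
// part of the type. Assumes zero lies inside the range.
template< typename t_value, int t_lower, int t_upper >
class Ranged {
public:
    Ranged() : m_value() {}
    Ranged( t_value nValue ) : m_value() {
        (*this)( nValue );
    }

    t_value operator ()() const {
        return m_value;
    }
    void operator ()( t_value nValue ) {
        if ( nValue < t_lower || nValue >= t_upper )
            throw std::out_of_range( "New value is outside of the allowed range" );
        m_value = nValue;
    }
private:
    t_value m_value;
};

class GeoPosition {
    public:
        GeoPosition() {}
        GeoPosition( double nLat, double nLon ) : lat( nLat ), lon( nLon ) {}

        // Each attribute now states its own type and its own limits.
        Ranged< double, -90, 90 > lat;
        Ranged< double, -180, 180 > lon;
};
```

Client code is untouched, because place.lat() and place.lat( 10. ) still mean what they always did. Note that I've kept the convention of excluding the upper bound, which means a latitude of exactly +90 is rejected here; a real implementation would probably want an inclusive upper limit for latitude.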

Conclusions

We've seen what we should call accessors and why. We've also seen that we can use helper classes to implement the attributes, which reduces the amount of code we need to write, and that is good for all sorts of reasons. Finally, we've seen that some simple template syntax lets us start getting used to templates and puts control of the types closest to where they're needed.


As ever there is an interesting discussion on Reddit.
