Vector fun for bowling

Created 22nd May, 2006 06:20 (UTC), last edited 18th December, 2007 14:57 (UTC)

I came across an interesting piece by Ron Jeffries about implementing a scoring algorithm for bowling only using vectors. At the end of the article he says:

I'm not sure what would happen if we tried to build a vector-oriented solution in Java or C#. I'm sure it would be larger, and that we'd have to put a bunch of code in “our” classes instead of in Array and Object. On the other hand, I'm confident that we could have our score method look very much like the one we have here, processing the vectors “all at once” in a very similar way.

Might be worth doing. Any takers?

Well, I'm not going to play with C# or Java, but I thought it looked like an interesting thing to look at in C++.

Specification

There are a number of things that we have to decide before we even start on the solution.

We're looking at scoring a bowling game. Ron seems to know the rules pretty well, but it's a long time since I last played so a basic recap is probably in order.

  • The game consists of ten frames.
  • A frame consists of one or two throws.
  • At the beginning of a frame there are ten pins standing up.
  • For each frame you score points equal to the number of pins you knock down.
  • If you knock down all of the pins on your first ball that frame ends and you include in the score for that frame the next two balls.
  • If you knock down all of the pins on your second ball you include in the score for that frame the next ball.
  • If you knock down all the pins in the first ball of the last frame you get two extra throws.
  • If you knock down all the pins with the second ball of the last frame you get one extra throw.

There are all sorts of other rules, but these are the ones that I think are important for calculating the score.

The original algorithm in J

The algorithm that Ron had from Henry Rich (via June Kim) was written in J and the entire program is:

NB. Index of each ball that starts a frame
framex =: 10 {. 0 {~^:a:~  _1 _1  ,~  i.@# > :@:+  10&~:
NB. Score for each ball, assuming that ball starts a frame
scoreifframe =: 3   +^:(9<])`+/@|.\   ,&0
NB. Pick the balls that actually start frames & add em up
gamescore =: [: +/ framex { scoreifframe

I must admit that this looks like a random collection of symbols to me, but Ron does at least explain how he's interpreted it which is all I need (I am after all copying his version of the algorithm, not the original).

Approach

Ron's solution is delivered as a single Ruby class (not counting the test class), but although it masquerades as an object-oriented solution it is actually a functional programming solution wrapped in a class. I expect that the solution in a language like Miranda would be as short as the J implementation. I know that I would find it a lot easier to read (but that's only because I've written Miranda programs and not J ones).

Another part of the solution is of course validation of the input. As I don't know Ruby I'm not 100% sure, but it looks to me that he is not validating that the scores are correct, i.e. that they conform to a complete sequence of scores that are actually possible given the rules of bowling.

Normally I would baggage everything up with belts-and-braces checking of everything. For this exercise though I'm gong to assume that the scores are at least possible ones for knocking pins down in frames, but I won't assume that the game is complete and nor will I assume that they haven't carried on for too long.

Testing

I've never programmed in Ruby, but it looks like Ron didn't have to do too much boilerplate code in setting up a test harness — just repeatedly ran the program against his inputs. C++ isn't gong to let me do that (or at least not without a lot of irritating code and compile cycles); there's going to be an overhead in structuring the program so that I can easily specify the inputs and check the outputs.

There are three ways of handling this¹ [1There are of course any number of other ways of doing this, but these three are the ones that seemed reasonable to me at the time]:

  1. Embed the test cases in the C++ and just run a single command line program to do all the testing.
  2. Pass the scores in via the command line and return the score as an exit code.
  3. Use a scripting language to specify the test cases and wrap the C++ function into a COM wrapper.
The COM class we will use to test the C++ implementation—click for details

Solution one is the most obvious one, but I'm not going to use it. The problem is that it doesn't afford changes to the tests. The ability to be able to test a software system without access to the code and a compiler is vital for pretty much all of the real programming that I do so the test harness must function as a separate entity where new test cases can easily be configured.

The second solution allows me to do this, but the batch file that would run it would be pretty cumbersome. This leaves the third option which is the one I'm going to use. Making the COM object and all of the handling that this requires is going to take a lot of code, but the MSVC code wizards will do all of that for me, so, I'm not going to worry about it.

Test harness

The test harness is simply a Windows scripting file which sets up the ball scores and then requests the score.

One important consequence of testing in this way is that I can't test the partial results. As I break the problem down I can only see if the final score is what I expect, not that the different parts of the calculation do what I expect them to.

This may mean that I will have a harder time debugging the scoring. In the worst case it means that I may end up with two bugs cancelling each other out. In practice I can use the test cases to ensure that the partial results I get from a partially complete solution gives me information on the underlying building blocks.

This is really white box testing. I'll talk a little bit about black box testing near the end.

<?xml version="1.0" encoding="utf-8"?>
<package><job>
	<runtime>
		<description>Test the vector bowling code.</description>
	</runtime>

<script language="javascript"><![CDATA[


Test( 0, [] );
Test( 0, [ 0, 0, 0 ] );
Test( 10, [ 10 ] );
Test( 10, [ 10, 0 ] );
Test( 30, [ 10, 10 ] );
Test( 90, [ 8,1, 7,2, 6,3, 5,4, 4,5, 3,6, 2,7, 1,8, 2,7, 3,6 ] );
Test( 24, [ 6,4, 6,2 ] );
Test( 24, [ 6,4, 6,2, 0,0, 0,0, 0,0, 0,0, 0,0, 0,0, 0,0, 0,0 ] );
Test( 300, [ 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 ] );
Test( 200, [ 10, 5,5, 10, 6,4, 10, 7,3, 10, 4,6, 10, 3,7, 10, 9,1 ] );
Test( 200, [ 9,1, 10, 5,5, 10, 6,4, 10, 7,3, 10, 4,6, 10, 3,7, 10 ] );


function Test( score, balls ) {
	var game = WScript.CreateObject( "VectorBowling.Game" );
	for ( var ball in balls )
		game.addBall( balls[ ball ] );
	var calculated1 = game.score, calculated2 = game.score;
	if ( calculated1 == score && calculated2 == score )
		WScript.Echo( "Pass" );
	else
		WScript.Echo( "Failed: " + calculated1 + "/" + calculated2 + " not " + score + ": " + balls.toString() );
}


]]></script>
</job></package>

Ron has picked what looks like some pretty good test cases and I've added a couple to give me some extra information to use as I build up the first solution.

Vector calculation

The vector version turned out to be much simpler than I'd initially thought. I was also surprised at how terse it was:

int score( const std::vector< int > &array, std::vector< int >::size_type pos ) {
	if ( pos < array.size() ) return array.at( pos );
	else return 0;
}


std::vector< std::vector< int >::size_type > frame_starts( const std::vector< int > &balls ) {
	std::vector< std::vector< int >::size_type > ret;
	for ( std::vector< int >::size_type pos( 0 ); pos < balls.size(); balls[ pos ] == 10 ? ( pos += 1 ) : ( pos += 2 ) )
		ret.push_back( pos );
	if ( ret.size() > 10 ) ret.resize( 10 );
	return ret;
}


std::vector< int > ball_scores( const std::vector< int > &balls ) {
	std::vector< int > ret;
	for ( std::vector< int >::size_type pos( 0 ); pos < balls.size(); ++pos ) {
		int part_score( score( balls, pos ) + score( balls, pos + 1 ) );
		if ( score( balls, pos ) == 10 || part_score == 10 )
			ret.push_back( part_score + score( balls, pos + 2 ) );
		else
			ret.push_back( part_score );
	}
	return ret;
}


int sum_indexes( const std::vector< int > &scores, const std::vector< std::vector< int >:: size_type > &indexes ) {
	int sum( 0 );
	std::vector< std::vector< int >:: size_type >::const_iterator index( indexes.begin() );
	for ( std::vector< int >::size_type pos( 0 ); pos < scores.size() && index != indexes.end(); ++pos ) {
		if ( pos == *index ) {
			++index;
			sum += scores[ pos ];
		}
	}
	return sum;
}


int calculate_score( const std::vector< int > &balls ) {
	return sum_indexes( ball_scores( balls ), frame_starts( balls ) );
}

It isn't the most elegant of solutions though. The code is very tightly coupled with the std::vector and there are a lot of for loops. The solution is an imperative one which is horribly inefficient. It does however map fairly well to the three lines of the original J solution:

  • frame_starts()—Finds the indexes of the first balls in each frame.
  • ball_scores()—Gives a vector of scores for each ball as if it was the first in a frame.
  • sum_indexes()—Takes the two partial results and calculates the final score.

The only other function, score(), is just to make the fetching of a ball score convenient given that we don't want to run off the end of the vector.

The way that we dealt with our knowledge that a complete game is ten frames is really horrid. The use of resize() to chop off the frame numbers we don't need is just grotesque. If we didn't want them we shouldn't have calculated them. This could be tidied up, but it wouldn't alter my basic dislike of the final result by much.

This separation into a function for each of the three parts of the algorithm feels good though, even though it doesn't make up for the fact that I'm building a lot of partial results in expensive data structures. I make heavy use of the the fact that the vector is random access, but I also create them piecemeal which is about as inefficient as I can get. Thankfully they are very small in this case, but it's still not a good habit to get into.

The vector implementation of the scoring is a cool calculation, but in C++ it just doesn't feel right. I'm sure that I could spend more time on it and get something that looks more elegant, but at the end of the day passing vectors around like that just feels wrong (and there are no bonus points for passing pointers to vectors around either).

In Javascript it wouldn't worry me at all, but C++ to me imposes greater constraints that I then take advantage of to make more efficient code. Copying all of these vectors all over the place sends shudders down my spine.

Ron remarks about all of the single line functions he has:

Did you notice that every method is only one line long? That's a characteristic of good Smalltalk code, and good Ruby code, in my opinion. You can't get there in Java or C# — much less C++ — and programmers who work in those languages are often troubled by there being so many tiny methods. Smalltalkers and Rubyists love it.

It's true that C++ is not going to fit a local declaration, a loop and a for into one line of code (well it can actually—look at calculate_score() in the iterator solution next), but his point isn't really about one line or multi-line functions. His point is rather that you should strive to work within the idiom of the community that you share your tools with, and this vector solution just isn't idiomatic C++ and lots of one line functions isn't C++ either.

Iterator calculation

An idiom that feels much more at home is to use iterators to handle the calculation. Of course there aren't any iterators that we can use as the problem is not something I'd expect to find in a general purpose library, but that doesn't mean we can't write our own.

template< typename const_iterator >
class BowlingScore : private const_iterator {
public:
	BowlingScore( const_iterator b, const_iterator e )
	: const_iterator( b ), frame( 0 ), end( e ) {
	}
	BowlingScore( const_iterator e )
	: const_iterator( e ), frame( 10 ), end( e ) {
	}

	static typename const_iterator::value_type calculate_score( const_iterator begin, const_iterator end ) {
		return std::accumulate( BowlingScore( begin, end ), BowlingScore( end ), const_iterator::value_type() );
	}

	void operator ++() {
		++frame;
		if ( const_iterator::operator *() != 10 ) const_iterator::operator ++();
		if ( *this != end ) const_iterator::operator ++();
	}
	bool operator !=( const BowlingScore &f ) const {
		return const_iterator::operator !=( f ) && frame != f.frame;
	}
	typename const_iterator::value_type operator *() const {
		const_iterator pos( *this );
		value_type score( *pos );
		if ( ++pos == end ) return score;
		bool strike( score == 10 );
		score += *pos;
		if ( ( strike || score == 10 ) && ++pos != end ) score += *pos;
		return score;
	}
private:
	int frame;
	const const_iterator end;
};


int calculate_iterator( const std::vector< int > &balls ) {
	return BowlingScore< std::vector< int >::const_iterator >::calculate_score( balls.begin(), balls.end() );
}

This solution feels much more elegant than the purely vector based one. One major benefit is that it doesn't build a lot of vector instances that are only used for partial results. I expect that this solution will execute much faster (I haven't profiled anything though). It certainly has a much lower memory overhead.

It also exhibits a slightly different seperation of the essentials of the algorithm into three members of the iterator class:

  • operator ++—Takes us to the beginning of the next frame.
  • operator *—Calculates the score for the current ball as if it where the start of a frame.
  • operator !=—Uses knowledge about the number of frames in the game to make sure we don't sum past the end.

We then drive this to get our result:

  • calculate_score() is now just a simple accumulator which skips from one frame to the next summing the scores as it goes.

Note that this pulls out the end of game test into a much more explicit form. I think this is probably a Good Thing™.

Comparison and testing

All in I think the iterator version is more elegant. I'm not sure how any maintenance programmer coming to either would view them. The iterator is at least idiomatic and the idea of skipping between frames seems (at least to me) a bit more obvious than the vector solution. The vector solution steps outside the normal C++ idioms, but at least the multiple for loops are fairly easy to analyse.

I can't help but think that the iterator solution much more closely matches how a person calculates the score of a game. This fact alone makes it a better solution from a maintenance point of view, especially if we add a short comment to each of the three main operators in the solution.

  • operator ++Move to the next frame
  • operator !=Stop at frame ten or the last throw we have
  • operator *How many points the frame starting here is worth

The iterator solution also keeps the basics of the functional version, but expressed in C++. I can use the iterator to go through and find the starts of each frame without scoring them (by using just operator ++). I can also use it to find the score for any ball (assuming that it is the first of a frame) irrespective of its actual position in the game by using operator *.

The solution also has the ability to calculate everything in the way that the original J version does, but is more spare in that it only actually asks for the result of a given ball's value to the final score if (and only if) it actually is at the start of a frame—this lazy evaluation is another feature I expect from functional programming languages.

This feels much more like the sort of solution I would end up with in Miranda and as such the iterator approach feels much more functional than the vector approach.

In the same way that Ron found all_roll_scores.take(frame_starts).sum marvelous I think the single line accumulator call, std::accumulate( BowlingScore( begin, end ), BowlingScore( end ), const_iterator::value_type() ) is truly elegant.

And as an added bonus it is completely type neutral whilst still being completely type safe, again just as I would expect in a Miranda solution.

Black box testing

The tests that I've used are not very plentyful and this makes me nervous. Are my solutions actually correct? In a real application there are two things that I would do to make myself feel better:

  1. Write a game generator as part of the test harness. This would create random games and score them. This could play the two versions of the algorithm off against each other to ensure they were consistent.
  2. Find a domain expert, i.e. a bowling referee or a champion player, and ask a few questions:
    • Can I get some scores for recent games you've played/refereed?
    • Did I miss any rules to do with scoring?
    • When somebody first scores games what are the mistakes they normally make?
    • What scores do you think are hard?
    • What else should I know?

I tend to like randomised tests as they often bring up interesting cases. For many (most?) applications the rules for building test data can be expressed seperately from the rules for dealing with it. Here the script would follow a number of simulated throws and only need worry about how many pins are available and which frame number it is on. A small change to the COM test object would allow the two algorithms to be compared. This won't of course spot problems that are caused by my lack of understanding of the game. That's what I'd need the domain expert for.

Another way of doing it

Looking back at the beginning of the iterator based solution I can now see that when I realised that std::for_each() was not going to work I was actually at a fork in the road. I chose to implement a more complex iterator and reduce the functor object to a mere summation.

An alternative of course would be to have a very simple iterator and a much more complex functor. It would need to hold internal state in order to know how many times to count the next score it is given:

  • The first ball always counts once.
  • If a ball at the beginning of a frame is a ten then the next two balls are counted an extra time each.
  • If the second ball in a frame gives a total of ten for the frame then the next ball is counted an additional time.

So, for example, if the first three balls are all strikes (ten pins down) then the first ball is counted once. The next ball is counted twice (once for itself and once because it is the second of three counted an additional time from the first ball) and the third ball is counted three times (once for itself; once because it is the first of the next two balls from the previous one; and once because it is the second of two balls from the first).

Clearly this will be a much more complex functor, but it could just be fed scores as they occur. This has a very nice usage pattern as it makes the streaming of the results much easier and only needs to visit each ball score exactly once.

So, does anybody want to tackle this other way of doing it? Maybe this is the most natural way of doing it in a state based language? Would this be the natural way of doing it on a Turing Machine?

How about SQL

I have a suspicion that it would be really interesting to do in SQL using a trigger² [2I guess not every database engine supports the trigger actions needed for this. For those that don't I think replacing the INSERTs and UPDATEs below with stored procedures should be acceptable so long as the only state storage is a single database row.].

First ball would be done through (1234 is just some unique id for the game, 7 is the number of pins knocked down):

INSERT INTO PlayerGame ( playerGame_id, pins ) VALUES ( 1234, 7 )

And then subsequent balls would be (3 is the number of pins knocked down—a spare):

UPDATE PlayerGame SET pins=3 WHERE playerGame_id=1234

At any time the following would return the current score:

SELECT gameScore FROM PlayerGame WHERE playerGame_id=1234

I guess the table would be something like (first three columns—all others whatever you need for the implementation):

  • playerGame_id—int, not null, primary key—A single game for a single player
  • pins—int, not null—The number of pins knocked down on the last ball
  • gameScore—int, not null, default 0—The total score for the game so far for this player

Of course only a single table can be used with one row per player/game. Any takers on this implementation?

Pages

  1. Game class
  2. Test harness
  3. Vector calculation
  4. Iterator calculation

Discussion for this page