Wednesday, 12 October, 2005

A bug solved and one lurking

My primary task on the graphics project is to update our 3D world editor to work with the new graphics engine.  We've been delivering monthly milestones to the customer so that they can see our progress and test the product.  We try to make fixing bugs the highest priority.  Fortunately, we're working with a mature tool that has its idiosyncrasies but is in most cases reasonably solid.  The drawback to working with such a tool is that, when a problem does crop up, it's usually tough to track down and repair.  I ran into such a problem today.

The Timeline Editor is a subsystem that lets you script motions of objects or cameras in the world.  It's a pretty simple concept:  set the object's positions at particular times and the underlying animation infrastructure will move the object smoothly between those positions.  The dialog looks like this:

The numbers are seconds.  In this example, I have set the camera's position at start (0 seconds) and at several other points in time.  This particular example is a 30 second level fly-through that returns to the starting point and then repeats.  The long vertical bar at the 4 second mark is the current position in the timeline.  You can use the mouse to move the time bar to any position and the objects being controlled will move to the proper position.
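To make the idea of time marks concrete, here is a minimal sketch of how a position could be looked up for the current time bar.  It assumes plain linear interpolation on a single axis; the editor's real animation infrastructure is not shown in this post and may well use something smoother.  The names TimeMark and position_at are made up for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical time mark: a position on one axis at a given time.
struct TimeMark {
    double seconds;
    double position;
};

// Position at time t, linearly interpolated between the surrounding marks.
// Assumes marks is non-empty and sorted by ascending seconds.
// Times before the first mark or after the last one clamp to that mark.
double position_at(const std::vector<TimeMark>& marks, double t) {
    if (t <= marks.front().seconds) return marks.front().position;
    if (t >= marks.back().seconds) return marks.back().position;

    // First mark whose time is strictly greater than t.
    auto next = std::upper_bound(
        marks.begin(), marks.end(), t,
        [](double value, const TimeMark& m) { return value < m.seconds; });
    auto prev = next - 1;

    double fraction = (t - prev->seconds) / (next->seconds - prev->seconds);
    return prev->position + fraction * (next->position - prev->position);
}
```

Dragging the time bar then amounts to calling something like position_at with the bar's time for each controlled object.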

The important things to note here are the existence of a horizontal scrollbar and zoom capability.  In theory, the timeline can be any length.  You can use the scrollbar to scroll through the existing time marks, or drag the current time bar to the right in order to set a time mark far into the future.  This all works great, and I'm able to script some very long animations.  But I'm getting ahead of myself.

The problem I ran into today was that if I attempted to drag the time bar to the left of the zero mark, it would jump forward a few seconds.  And if I moved the mouse outside the window on the left, the thing would start scrolling forward in a big hurry.  It took a while to track down, but I finally traced the problem to this line of code in the WM_MOUSEMOVE handler:

int MouseX = LOWORD(lparam);

Makes sense, right?  WM_MOUSEMOVE passes the current mouse position as two 16-bit words in the lparam:  horizontal position in the low word and vertical position in the high word.  This is standard Windows API stuff.  The problem is that the mouse position is actually a signed quantity, while LOWORD hands back an unsigned word.  If you drag the mouse just outside the left border of the window, the horizontal position is -1.  But due to the vagaries of binary representation and C++ conversion rules, converting an unsigned 16-bit word to a 32-bit integer zero-extends it, so that -1 arrives as 65535.  The above code worked as expected in Windows 3.1 when an int was 16 bits long.  Once I figured out what was happening, it was easy enough to change the code to this:

int MouseX = (short)(LOWORD(lparam));

That forces a signed conversion and then a sign extension.  Problem solved.
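The difference is easy to reproduce outside of Windows.  The sketch below uses a stand-in for LOWORD (low_word is a hypothetical helper, as are the two mouse_x functions) and a hand-built lparam value, so it shows the conversion rules rather than the editor's actual handler.

```cpp
#include <cstdint>

// Portable stand-in for the Windows LOWORD macro: the low 16 bits, unsigned.
constexpr std::uint16_t low_word(std::uint32_t packed) {
    return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

// Original behavior: the unsigned word is zero-extended into the int,
// so an x of -1 (bit pattern 0xFFFF) comes out as 65535.
constexpr int mouse_x_zero_extended(std::uint32_t lparam) {
    return low_word(lparam);
}

// Fixed behavior: reinterpret the word as a signed 16-bit value first,
// then let the conversion to int sign-extend it.
constexpr int mouse_x_sign_extended(std::uint32_t lparam) {
    return static_cast<std::int16_t>(low_word(lparam));
}

// Hypothetical lparam for a mouse at x = -1, y = 10.
constexpr std::uint32_t kSampleLparam = 0x000AFFFFu;
static_assert(mouse_x_zero_extended(kSampleLparam) == 65535, "zero-extended");
static_assert(mouse_x_sign_extended(kSampleLparam) == -1, "sign-extended");
```

The Windows SDK header windowsx.h provides a GET_X_LPARAM macro that performs the same cast to short, so that is another way to spell the fix.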

Except that there might be a much bigger problem lurking in that code.  If the scrollbar code is using the old WM_HSCROLL notifications, which carry only a 16-bit scroll position, then the virtual window is limited to 32768 pixels, or about 655 seconds of timeline at 50 pixels per second.  The zoom capability limits that further, cutting the timeline length in half for each zoom level.  If I limit the zoom to three levels (400 pixels per second), giving the user a resolution of about 2.5 milliseconds, then the timeline is limited to only about 82 seconds in length.  That's not long enough.
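Here is a quick way to check that arithmetic.  It is only a sketch, and it assumes that each zoom level doubles the number of pixels drawn per second, starting from a base of 50; max_timeline_seconds is a name invented for the example.

```cpp
// Seconds of timeline that fit in a scroll range of max_scroll_pixels,
// assuming each zoom level doubles the pixels drawn per second.
double max_timeline_seconds(double max_scroll_pixels,
                            double base_pixels_per_second,
                            int zoom_levels) {
    double pixels_per_second = base_pixels_per_second;
    for (int i = 0; i < zoom_levels; ++i) {
        pixels_per_second *= 2.0;
    }
    return max_scroll_pixels / pixels_per_second;
}
```

With a 32768-pixel range, max_timeline_seconds(32768, 50, 0) is 655.36 and max_timeline_seconds(32768, 50, 3) is 81.92, so under these assumptions three zoom levels leave well under a minute and a half of timeline.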

There are two possible solutions:  ignore the values passed with WM_HSCROLL and use the GetScrollPos function (or GetScrollInfo, which also reports the 32-bit position while the thumb is being dragged) to read the scroll position, or get rid of the scrollbars and use a world-to-screen transformation.  For this application, a 32-bit scroll position would be plenty (giving me about 43 million seconds or roughly 500 days at 50 pixels per second).  For other applications, even a 32-bit scroll position is inadequate.
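The second option might look something like the sketch below.  This is a guess at the shape of such a transformation, not code from the editor: TimelineView and its members are hypothetical, and the view is described by the time at the window's left edge plus the current pixels per second, both kept as doubles so that no pixel-based scroll range is involved.

```cpp
#include <cmath>

// Hypothetical view state: which timeline second sits at the left edge of
// the window, and how many pixels represent one second at the current zoom.
struct TimelineView {
    double left_edge_seconds;
    double pixels_per_second;

    // Timeline time -> window x coordinate, rounded to the nearest pixel.
    // The result may be negative or beyond the right edge of the window.
    long time_to_screen_x(double seconds) const {
        return std::lround((seconds - left_edge_seconds) * pixels_per_second);
    }

    // Window x coordinate -> timeline time.  A negative x (mouse dragged
    // past the left border) simply maps to an earlier time.
    double screen_x_to_time(int x) const {
        return left_edge_seconds + x / pixels_per_second;
    }
};
```

Scrolling then means changing left_edge_seconds and zooming means changing pixels_per_second, so the length of the timeline is bounded by the precision of a double rather than by the width of a scroll position.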

These kinds of bugs keep programmers up nights.