Playing With Strings

Take the following code:

	CString str1("Startt"),
			str2("Start\0");
	str1.SetAt(str1.GetLength()-1, '\0');

	str1 += "End";
	str2 += "End";

What would you see when watching the resulting strings? Probably not what you expect: the watch window shows str1 as "Start" and str2 as "StartEnd".

This is a simplified version of a much dirtier, very real bug I dealt with recently. Several string and debugger features joined forces to cause this behaviour.

First – the debugger: it apparently watches CStrings as c-strings – displaying their essentially-LPTSTR member m_pszData. Thus, any null embedded in the string (well, the first null, really) is treated as a terminating null – anything past it is not displayed. When we force a watch on the full CString buffer, a fuller picture is revealed: str1 actually holds "Start\0End".

So the ‘End’ suffix was added to str1 after all – but why the difference between str1 and str2? How can initializing a string with an embedded null be any different than setting that null in the next line? The next clue is obtained by observing GetLength() for both strings. Note that GetLength returns the character count stored in the CString itself (embedded nulls included), not the strlen of the underlying c-string. (It is utterly unimaginable that such a basic behaviour goes undocumented.)
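The gap between what a c-string viewer shows and what the string object stores is easy to reproduce outside MFC. The following is a minimal sketch that uses std::string as a stand-in for CString (an assumption made only so the example is portable; the helper names visible_length and overwrite_last_then_append are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as a c-string viewer (such as a debugger watch) would see it:
// counting stops at the first null character.
std::size_t visible_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Mimics the str1 case: overwrite the last character with a null,
// then append a suffix. The stored length keeps counting past the null.
std::string overwrite_last_then_append(std::string s, const char* suffix) {
    assert(suffix != nullptr);
    if (!s.empty()) {
        s[s.size() - 1] = '\0';
    }
    s += suffix;
    return s;
}
```

Calling overwrite_last_then_append("Startt", "End") yields a string whose size() is 9 while visible_length reports 5: the suffix is there, just hidden behind the embedded null.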


So, str1 and str2 are indeed somehow different before adding the ‘End’ suffix. In fact, they are fundamentally different even before manually setting the null in str1:

	CString str1("Startt"),
			str2("Start\0");

	int len1 = str1.GetLength(),	// gives 6
		len2 = str2.GetLength();	// gives 5 !

The issue now has nowhere left to hide. Stepping with the debugger into the CString ctors reveals the root cause: the constructor used for both CStrings accepts a char*-type as argument (in retrospect – how could it be otherwise?). So, just like in the debugger itself, the first embedded null is treated as a terminating null – anything past it would never make it into the CString. Try the following and see for yourself:

	CString str3("First\0Second"); // str3 now contains only "First" !

Once this root cause was understood, the bug was a half-line fix.
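For the record, here is a small sketch of the two construction paths, again with std::string standing in for CString (an assumption for portability; from_c_string and from_buffer are hypothetical helper names). I believe CString offers a comparable pointer-plus-length constructor, but this sketch does not exercise it, and it is not necessarily the fix I applied:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Construction from a bare const char*: the length is discovered by
// scanning for a null, so everything after the first null is dropped.
std::string from_c_string(const char* text) {
    assert(text != nullptr);
    return std::string(text);
}

// Construction from a pointer plus an explicit length: embedded nulls
// are copied like any other character.
std::string from_buffer(const char* text, std::size_t length) {
    assert(text != nullptr);
    return std::string(text, length);
}
```

Given const char raw[] = "First\0Second", from_c_string(raw) holds 5 characters, while from_buffer(raw, sizeof(raw) - 1) holds all 12, null included.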

Thanks and kudos go to Alexander M. of WordPress support, who found and fixed within 1 hour (!) a WordPress bug that I reported, to make this post possible: until yesterday, WordPress would ignore explicit nulls (backslash + zero) between quotes in a sourcecode section.

OptimizedMesh DirectX Sample Having Issues With Large Meshes

The DirectX SDK comes with quite a few nice samples, neatly organized in a sample browser. Quoting the documentation from the OptimizedMesh sample:

This OptimizedMesh Sample sample demonstrates the different types of meshes D3DX can load and optimize, as well as the different types of underlying primitives it can render. An optimized mesh has its vertices and faces reordered so that rendering performance can be improved.

Sadly, it turns out the code as is cannot load meshes with more than 64K vertices (much less optimize them). Now I’m sure somewhere in the SDK a disclaimer is buried, saying there’s no warranty, this isn’t production code, the usual yadda yadda. Still, it seemed to me that optimizing meshes is of interest mostly to an audience dealing with large meshes (I certainly was), so this really deserves a fix.

The sample browser comes with neat ‘feedback’ links, and I did communicate this to MS a while ago. They never did get back to me, so I thought someone out there might benefit from the fix online.

In the main source file, OptimizedMesh.cpp, make the following addition:

...
// Load the mesh from the specified file
hr = D3DXLoadMeshFromX( strMesh, D3DXMESH_SYSTEMMEM, pd3dDevice,
      ppAdjacencyBuffer, &pD3DXMtrlBuffer, NULL,
      &g_dwNumMaterials, &pMeshSysMem );

if( FAILED( hr ) )
   goto End;

// Bitwise test: carry the 32-bit index flag over only if the loaded mesh has it set
if(pMeshSysMem->GetOptions() & D3DXMESH_32BIT)
   g_dwMemoryOptions |= D3DXMESH_32BIT;

// Get the array of materials out of the returned buffer, and allocate a texture array
d3dxMaterials = (D3DXMATERIAL*) pD3DXMtrlBuffer->GetBufferPointer();
...

In a nutshell, the culprit is a tragic legacy of DirectX mesh files: by default, meshes allocate only 16 bits per vertex index in the stored index buffer. Thus, meshes with more than 2^16 vertices require some explicit treatment – as listed here.
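The arithmetic and the flag handling can be sketched without the SDK. Everything below is a stand-in: the constants are illustrative values I picked rather than values read from the D3DX headers, the function names are hypothetical, and the limit assumes every 16-bit value is usable as an index.

```cpp
#include <cstdint>

// Stand-in option flags; illustrative values, not taken from the D3DX headers.
constexpr std::uint32_t kOption32BitIndices = 0x001;
constexpr std::uint32_t kOptionSystemMem    = 0x110;

// A 16-bit index can take 2^16 distinct values (0..65535).
constexpr std::uint32_t kMax16BitVertexCount = 1u << 16;

// True when the vertex count no longer fits 16-bit indices.
constexpr bool needs_32bit_indices(std::uint32_t vertex_count) {
    return vertex_count > kMax16BitVertexCount;
}

// Bitwise flag test: true only if every bit of `flag` is set in `options`.
constexpr bool has_option(std::uint32_t options, std::uint32_t flag) {
    return (options & flag) == flag;
}

// Carries the 32-bit index flag from the loaded mesh into the options
// used for later copies of the mesh; other bits are left untouched.
constexpr std::uint32_t propagate_index_width(std::uint32_t loaded_options,
                                              std::uint32_t memory_options) {
    return has_option(loaded_options, kOption32BitIndices)
               ? (memory_options | kOption32BitIndices)
               : memory_options;
}
```

The test has to be a bitwise one: a logical test on the options would fire for any mesh whose options are non-zero, so propagate_index_width(kOptionSystemMem, 0) must stay 0.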

Coders at Work

I started reading Coders at Work, and it is just as good as Jeff and Joel say. The Jamie Zawinski chapter is brilliant. Brad Fitzpatrick  – while he may be an exceptional developer, he’s a ‘wow, like, dude!’  kind of speaker, and not much fun to read. The real highlight for me (so far) is Peter Norvig.

So far I’ve successfully avoided the temptation of rehashing stuff in this blog, but the Norvig interview is just too good. Every single paragraph in his interview is worth hanging as an office poster. (Plus, unlike the Zawinski interview, I haven’t seen it quoted around that much yet.) Here are a few of his words, which are a real lesson to live by:

Seibel: How do you avoid over-generalization and building more than you need and consequently wasting resources that way?
Norvig: It’s a battle. There are lots of battles around that. And, I’m probably not the best person to ask because I still like having elegant solutions rather than practical solutions. So I have to sort of fight with myself and say, “In my day job I can’t afford to think that way.” I have to say, “We’re out here to provide the solution that makes the most sense and if there’s a perfect solution out there, probably we can’t afford to do it.” We have to give up on that and say, “We’re just going to do what’s the most important now.” And I have to instill that upon myself and on the people I work with. There’s some saying in German about the perfect being the enemy of the good; I forget exactly where it comes from—every practical engineer has to learn that lesson.

Seibel: Why is it so tempting to solve a problem we don’t really have?
Norvig: You want to be clever and you want closure; you want to complete something and move on to something else. I think people are built to only handle a certain amount of stuff and you want to say, “This is completely done; I can put it out of my mind and then I can go on.” But you have to calculate, well, what’s the return on investment for solving it completely? [My emph - OS] There’s always this sort of S-shaped curve and by the time you get up to 80 or 90 percent completion, you’re starting to get diminishing returns. There are 100 other things you could be doing that are just at the bottom of the curve where you get much better returns. And at some point you have to say, “Enough is enough, let’s stop and go do something where we get a better return.”