RGB vs. BGR in Bitmaps

What is the channel order in bitmap pixels? Does the R channel occupy the least or the most significant byte?

Ah, the answer to this simple question held me back for nearly a day while debugging some image processing code. MSDN, while being excruciatingly verbose on obsolete legacies of bitmaps (device-dependent bitmaps, anyone?) is remarkably mysterious here.

First, they note that:

The system and applications use parameters and variables having the COLORREF type to pass and store color values … Applications can create a color value from individual component values by using the RGB macro.

This implies that R occupies the least significant byte (as in COLORREF). This drove the original design of my code, but external viewers kept showing the colors wrong in images I produced.

Reversing the buffer contents from RGB to BGR solved the problem – the cleanest way is just using RGBQUAD instead of COLORREF. Still, to this moment I cannot find an explicit mention of this being the expected channel order in bitmap storage.  There’s this quote:

When creating or examining a logical palette, an application uses the RGBQUAD structure to define color values and to examine individual component values.

And this:

The RGBQUAD structures specify the RGB intensity values for each of the colors in the device’s palette.

Both docs indicate that RGBQUAD is the expected channel order for indexed bitmaps: those where pixel colors are not stored explicitly, but as indices into a central color-table (or palette). This hasn’t really been useful since devices started to show more than 256 colors, but it feels like the docs still consider it more important than non-indexed bitmaps.

Turns out that for regular, non-indexed bitmaps – i.e., biBitCount equals 16, 24 or 32 – the expected channel order is the same as for the palette: B comes first (least significant), and R is the most significant.
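To make the two layouts concrete, here is a minimal sketch. ColorRef, RgbQuad, MakeColorRef, ToRgbQuad and AsDword are stand-ins I made up so the snippet compiles without windows.h; they only mirror the layouts of COLORREF, RGBQUAD and the RGB macro. AsDword assumes a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-ins for the Win32 types, so the sketch compiles anywhere.
using ColorRef = std::uint32_t;            // COLORREF layout: 0x00bbggrr

struct RgbQuad                             // same field order as RGBQUAD
{
    std::uint8_t blue, green, red, reserved;
};

// What the RGB() macro computes: red lands in the least significant byte.
constexpr ColorRef MakeColorRef(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return ColorRef(r) | (ColorRef(g) << 8) | (ColorRef(b) << 16);
}

// Reorders a COLORREF into the blue, green, red byte sequence used by bitmap pixels.
inline RgbQuad ToRgbQuad(ColorRef c)
{
    assert((c >> 24) == 0);                // explicit RGB values keep the high byte zero
    return RgbQuad{ std::uint8_t(c >> 16), std::uint8_t(c >> 8), std::uint8_t(c), 0 };
}

// Reads the four RgbQuad bytes back as one 32-bit value (little-endian machines).
inline std::uint32_t AsDword(const RgbQuad& q)
{
    std::uint32_t v;
    static_assert(sizeof(v) == sizeof(q), "RgbQuad must be 4 bytes");
    std::memcpy(&v, &q, sizeof(v));
    return v;
}
```

For r=0x11, g=0x22, b=0x33 the COLORREF-style value is 0x00332211, while the same color stored as an RGBQUAD reads back as 0x00112233: same channels, opposite ends of the DWORD.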

Extra Chatter

If you insisted you could force a different channel ordering on the stored bitmap, using color masks (it’s a wikipedia link – by far more readable than MSDN). You’d have to -

  1. Use either BITMAPV4HEADER or BITMAPV5HEADER.
  2. Set the bV4V4Compression (historical typo?) or bV5Compression field to BI_BITFIELDS.
  3. Set bVxRedMask/bVxGreenMask/bVxBlueMask to indicate your desired channel masks (x being 4 or 5).
  4. Expect the rest of the world to have massive trouble using your images.
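To see what a reader of such a file has to do with those masks, here is a small sketch. ExtractChannel is a hypothetical helper of mine, not a Win32 function, and the mask values in the note below are only illustrations.

```cpp
#include <cassert>
#include <cstdint>

// Extracts one channel from a packed pixel, given its BI_BITFIELDS-style mask,
// and shifts it down so the result starts at bit 0.
inline std::uint32_t ExtractChannel(std::uint32_t pixel, std::uint32_t mask)
{
    assert(mask != 0);                     // a zero mask would never terminate below
    std::uint32_t value = pixel & mask;
    while ((mask & 1u) == 0)               // drop the mask's trailing zero bits
    {
        mask  >>= 1;
        value >>= 1;
    }
    return value;
}
```

With the usual red mask 0x00FF0000, pixel 0x00112233 yields red 0x11. Declare the red mask as 0x000000FF instead and the very same pixel yields red 0x33, which is exactly the trouble item 4 warns about when a viewer ignores the masks.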

Three F-keys Gotchas

We recently did a small internal app that had to use all 12 F-keys – which turned out to be surprisingly cumbersome. I hadn’t found this stuff concentrated in a single place, and it certainly could have saved me some trouble.

F1 Gotcha

Quote:

On Win32 systems, the operating system will generate the WM_HELP message when F1 is pressed.

This isn’t much trouble if you don’t handle the message, as VK_F1 key message is still being sent. On MFC (and other) wizard generated apps, you might have to explicitly disable OnHelp on message maps:

BEGIN_MESSAGE_MAP(CMyWinApp, CWinApp)
…
  //ON_COMMAND(ID_HELP, CWinApp::OnHelp)   // <-- comment this line out
…
END_MESSAGE_MAP()

F10 Gotcha

Quote:

If the F10 key is pressed, the DefWindowProc function sets an internal flag. When DefWindowProc receives the WM_KEYUP message, the function checks whether the internal flag is set and, if so, sends a WM_SYSCOMMAND message to the top-level window. The WM_SYSCOMMAND parameter of the message is set to SC_KEYMENU.

So to get proper VK_F10 notifications, you can either bypass DefWindowProc completely – which is unfeasible – or specifically handle the WM_SYSCOMMAND message. On MFC apps, that amounts to something like:

void MyWnd::OnSysCommand( UINT nID, LPARAM lParam )
{
  // The four low-order bits of nID are used internally by the system, so mask them off.
  if ((nID & 0xFFF0) == SC_KEYMENU) // F10 pressed (Alt-key menu activation arrives here too)
  {
    // The 0 in LPARAM is kinda sloppy. If you use key-message nuances, invest here a bit further.
    ::SendMessage(m_Child->GetHwnd(), WM_KEYDOWN, VK_F10, 0);
  }
  else
    __super::OnSysCommand(nID, lParam);
}

F12 Gotcha

This one is the most obscure – on some developer machines, pressing F12 seems to give a weird error message:

…This may be due to a corruption of the heap…  This may also be due to the user pressing F12…

To cut a long search short, this is a built in Win32 debugging feature. It can indeed be disabled through this registry key:

HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug\UserDebuggerHotkey

But contrary to what the connect page says, I wouldn’t advise changing it to just any nonzero value: this value is the scan code of the key that would force a breakpoint! If, for example, you change it to 0x9, you’d have the same problem while using the Tab key (on standard keyboards). I suggest setting the value to 0xFF – looking at some scan code tables around, I saw none that map it to an actual key.

Of course this issue would never manifest itself on customer machines, but solving it can make dev work much easier.

Afterthought

It seems an odd design choice to bake something as basic as F-keys handling so deep into the OS. I’m guessing this was done back when the wildest applications imaginable were word processors and spreadsheets, and the main design goals were to save what was perceived as boilerplate code rather than allow for flexibility. I still think modern app wizards should generate code that lets you opt-in, rather than opt out of these old key handling mechanisms.

Reading Monitor Physical Dimensions, or: Getting the EDID, the Right Way

We recently needed to know the physical size of monitors on customer machines. Getting it right took a surprisingly tedious bit of research, and it definitely deserves more web presence, so the results are below.

1. GetDeviceCaps

- is the immediate answer. The argument flags HORZSIZE / VERTSIZE are advertised to give the -

Width/Height, in millimeters, of the physical screen.

Alas, as many have discovered, GetDeviceCaps just does not work as advertised with these flags.

2. GetMonitorDisplayAreaSize

- is the next obvious guess. The documentation doesn’t state whether the obtained values are in pixels or physical units – I suspect it’s vendor specific, but didn’t get to check it myself since I kept getting the dreadful LastError 0xc0262582: “An error occurred while transmitting data to the device on the I2C bus.” Gotta say I didn’t insist too much, since the entire Monitor Configuration API set is both new to Vista and already ‘legacy graphics’, which are explicitly described as-

Technologies that are obsolete and should not be used in new applications.

3. WMI

There’s a good chance that this Windows Management Instrumentation code gets the job done. I didn’t get to test it, since

(1) It is exceptionally complicated (CoSetProxyBlanket anyone? How about some nice IWbemClassObjects to go with that?),

(2) WMI supports monitor classes only since Vista, which makes it irrelevant to most of the world (40%-50% as of Sep 2011).

4. Spelunking the Registry

Unlike what many, many, say, the physical display information is in fact available to the OS, via Extended Display Identification Data (EDID). A copy of the EDID block is kept in the registry, and bytes 21/22 of it contain the width/height of the monitor, in cm. Some have tried digging into the registry directly, searching for the EDID block, but the code in the link didn’t work for me and worked (I guess) for the poster by pure accident: the exact registry path to the EDID is not only undocumented, but does in practice vary from one vendor to another.

This is, however, a step in the right direction – which turned out to be:

5. SetupAPI !

Finally, here’s some code that works almost perfectly, courtesy of Calvin Guan. Turns out there is a documented way of obtaining the correct registry key for a device:

  1. Call SetupDiGetClassDevsEx to get an HDEVINFO handle.
  2. Use this HDEVINFO in a call to SetupDiEnumDeviceInfo to populate an SP_DEVINFO_DATA struct.
  3. Use both the HDEVINFO and the SP_DEVINFO_DATA in a call to SetupDiOpenDevRegKey, to finally get an HKEY to the desired registry key – the one that holds the EDID block.

Below is a (larger than usual) code snippet. Beyond some general cleanup, a few fixes were applied to Calvin’s original code:

(1) the REGSAM argument in SetupDiOpenDevRegKey is set to KEY_READ and not KEY_ALL_ACCESS to allow non-admins to run it, (2) Fix a small memory leak due to a missing SetupDiDestroyDeviceInfoList call (thanks @Anonymous!), (3) the monitor size is extracted from the EDID with millimeter precision, and not cm (thanks other @Anonymous!)

#include <atlstr.h>
#include <SetupApi.h>
#pragma comment(lib, "setupapi.lib")

#define NAME_SIZE 128

const GUID GUID_CLASS_MONITOR = {0x4d36e96e, 0xe325, 0x11ce, 0xbf, 0xc1, 0x08, 0x00, 0x2b, 0xe1, 0x03, 0x18};

// Assumes hDevRegKey is valid
bool GetMonitorSizeFromEDID(const HKEY hDevRegKey, short& WidthMm, short& HeightMm)
{
	TCHAR valueName[NAME_SIZE];
	BYTE EDIDdata[1024];

	for (DWORD i = 0; ; ++i)
	{
		// RegEnumValue overwrites both sizes, so reset them on every iteration
		DWORD dwType = 0;
		DWORD valueNameLength = NAME_SIZE;
		DWORD edidSize = sizeof(EDIDdata);

		LONG retValue = RegEnumValue ( hDevRegKey, i, valueName,
			&valueNameLength, NULL, &dwType,
			EDIDdata, // buffer
			&edidSize); // buffer size

		if (retValue == ERROR_MORE_DATA)
			continue; // some other value, too large for our buffers
		if (retValue != ERROR_SUCCESS)
			break; // ERROR_NO_MORE_ITEMS, or a real failure
		if (0 != _tcscmp(valueName,_T("EDID")) || edidSize < 69)
			continue; // not the EDID, or too short to hold bytes 66-68

		// Bytes 66/67 hold the low 8 bits, byte 68 the two high nibbles
		WidthMm  = ((EDIDdata[68] & 0xF0) << 4) + EDIDdata[66];
		HeightMm = ((EDIDdata[68] & 0x0F) << 8) + EDIDdata[67];

		return true; // valid EDID found
	}

	return false; // EDID not found
}

bool GetSizeForDevID(const CString& TargetDevID, short& WidthMm, short& HeightMm)
{
	HDEVINFO devInfo = SetupDiGetClassDevsEx(
		&GUID_CLASS_MONITOR, //class GUID
		NULL, //enumerator
		NULL, //HWND
		DIGCF_PRESENT, // Flags //DIGCF_ALLCLASSES|
		NULL, // device info, create a new one.
		NULL, // machine name, local machine
		NULL);// reserved

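	if (INVALID_HANDLE_VALUE == devInfo) // failure is reported this way, not as NULL
		return false;

	bool bRes = false;

	for (DWORD i = 0; !bRes; ++i) // stop at the first monitor that yields a size
	{
		SP_DEVINFO_DATA devInfoData;
		memset(&devInfoData,0,sizeof(devInfoData));
		devInfoData.cbSize = sizeof(devInfoData);

		if (!SetupDiEnumDeviceInfo(devInfo,i,&devInfoData))
			break; // ERROR_NO_MORE_ITEMS, or a real failure

		// Skip devices other than the requested monitor
		TCHAR instanceId[MAX_PATH];
		if (!SetupDiGetDeviceInstanceId(devInfo, &devInfoData, instanceId, MAX_PATH, NULL) ||
			-1 == CString(instanceId).Find(TargetDevID))
			continue;

		HKEY hDevRegKey = SetupDiOpenDevRegKey(devInfo,&devInfoData,
			DICS_FLAG_GLOBAL, 0, DIREG_DEV, KEY_READ);

		if(!hDevRegKey || (hDevRegKey == INVALID_HANDLE_VALUE))
			continue;

		bRes = GetMonitorSizeFromEDID(hDevRegKey, WidthMm, HeightMm);

		RegCloseKey(hDevRegKey);
	}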
	SetupDiDestroyDeviceInfoList(devInfo);
	return bRes;
}

int _tmain(int argc, _TCHAR* argv[])
{
	short WidthMm, HeightMm;

	DISPLAY_DEVICE dd;
	dd.cb = sizeof(dd);
	DWORD dev = 0; // device index
	int id = 1; // monitor number, as used by Display Properties > Settings

	CString DeviceID;
	bool bFoundDevice = false;
	while (EnumDisplayDevices(0, dev, &dd, 0) && !bFoundDevice)
	{
		DISPLAY_DEVICE ddMon;
		ZeroMemory(&ddMon, sizeof(ddMon));
		ddMon.cb = sizeof(ddMon);
		DWORD devMon = 0;

		while (EnumDisplayDevices(dd.DeviceName, devMon, &ddMon, 0) && !bFoundDevice)
		{
			if (ddMon.StateFlags & DISPLAY_DEVICE_ACTIVE &&
				!(ddMon.StateFlags & DISPLAY_DEVICE_MIRRORING_DRIVER))
			{
				DeviceID = ddMon.DeviceID;
				DeviceID = DeviceID.Mid (8, DeviceID.Find (_T("\\"), 9) - 8);

				bFoundDevice = GetSizeForDevID(DeviceID, WidthMm, HeightMm);
			}
			devMon++;

			ZeroMemory(&ddMon, sizeof(ddMon));
			ddMon.cb = sizeof(ddMon);
		}

		ZeroMemory(&dd, sizeof(dd));
		dd.cb = sizeof(dd);
		dev++;
	}

	return 0;
}
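The EDID arithmetic itself is easy to check away from the registry. Below is a minimal, Windows-free sketch of the two size encodings mentioned above: whole centimeters in bytes 21/22, and millimeters in bytes 66-68 (the first detailed timing descriptor). MonitorSize and DecodeEdidSize are names I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MonitorSize
{
    int widthCm, heightCm;   // bytes 21/22, whole centimeters
    int widthMm, heightMm;   // first detailed timing descriptor, millimeters
};

// Decodes the physical size fields from a raw EDID block (at least 128 bytes).
inline MonitorSize DecodeEdidSize(const std::uint8_t* edid, std::size_t size)
{
    assert(edid != nullptr && size >= 128);
    MonitorSize s;
    s.widthCm  = edid[21];
    s.heightCm = edid[22];
    // Bytes 66/67 hold the low 8 bits; byte 68 packs the two high nibbles.
    s.widthMm  = ((edid[68] & 0xF0) << 4) | edid[66];
    s.heightMm = ((edid[68] & 0x0F) << 8) | edid[67];
    return s;
}
```

For example, bytes 66, 67, 68 equal to 0x06, 0x44, 0x21 decode to 518 mm by 324 mm, where the cm fields of the same monitor could only say 52 by 32.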

SetupAPI is still not the most pleasant of API sets around, but as MSFT’s Doron Holan replied to a user preferring to dig in the registry himself:

Programming is hard. Plain and simple. Some problems are simple, some are hard. Some APIs you like, some you don’t. Going behind the back of those APIs and getting at the data yourself will only cause problems for you and your customers.

I actually had to query the dimensions of a specific monitor (specified by HMONITOR). This was an even nastier problem, and frankly I’m just not confident yet that I got it right. If I ever get to code worth sharing – I’ll certainly share it here.

Breaking on Data Read

You’re probably familiar with Data Breakpoints, and rightfully so: It’s extremely useful to know where a value changes. But did you know that with a little help VS can break when a value is used?

Usage

By ‘little help’ I mean some code. Plenty of free implementations are available: 1, 2, 3, 4. All are very similar (one notable difference discussed below), but I’m used to the first and will use it below.

First, a toy example:

#include "Breakpoint.h"
...
CBreakpoint g_bp;
...
void Whateva()
{
  int a = 3, b;
  g_bp.Set(&a, 4, CBreakpoint::Read);
  b = a; // g_bp breaks!
  g_bp.Clear();
}

The link does mention that you can call CBreakpoint::Clear() from a QuickWatch window (== from anywhere the Expression Evaluator lives, for that matter). What's even more useful - you can call CBreakpoint::Set() from the debugger with a minor additional cast. While debugging the code above, evaluate the following in any watch window:

g_bp.Set(&a, 4, (CBreakpoint::Condition)3)


Internals (well, some, anyway)

Both read and write breakpoints are implemented via debug registers: special registers on an x86 CPU which trigger an 'int 1' interrupt ('debug step') whenever a pre-specified virtual address is accessed. Dr0-Dr3 hold the watched addresses; Dr7 enables each of the four breakpoints and determines its type (11b means read/write) and length.
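As a rough sketch of the Dr7 bookkeeping (not lifted from any of the linked implementations), the helper below computes a new Dr7 value for one breakpoint slot. BreakOn and EnableBreakpoint are hypothetical names; the watched address would go into the matching Dr0-Dr3 register, and actually loading the registers still takes SetThreadContext, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

enum class BreakOn : std::uint32_t
{
    Write     = 1,  // 01b: data writes only
    ReadWrite = 3   // 11b: data reads or writes
};

// Returns dr7 with breakpoint 'slot' (0..3) locally enabled for the given
// access type and watched length in bytes (1, 2, 4 or 8).
inline std::uint32_t EnableBreakpoint(std::uint32_t dr7, unsigned slot,
                                      BreakOn type, unsigned lengthBytes)
{
    assert(slot < 4);
    assert(lengthBytes == 1 || lengthBytes == 2 || lengthBytes == 4 || lengthBytes == 8);

    // LEN encoding: 00b = 1 byte, 01b = 2 bytes, 11b = 4 bytes, 10b = 8 bytes.
    const std::uint32_t len = (lengthBytes == 1) ? 0u
                            : (lengthBytes == 2) ? 1u
                            : (lengthBytes == 4) ? 3u : 2u;

    const unsigned fieldShift = 16 + slot * 4;   // R/Wn sits at bit 16+4n, LENn right above it
    dr7 &= ~(0xFu << fieldShift);                // clear the old R/Wn and LENn fields
    dr7 |= (static_cast<std::uint32_t>(type) | (len << 2)) << fieldShift;
    dr7 |= 1u << (slot * 2);                     // Ln: the local enable bit
    return dr7;
}
```

A 4-byte read/write breakpoint in slot 0, starting from a zero register, gives 0x000F0001: the enable bit at the bottom, and the type and length fields at bits 16-19.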

All implementations linked above modify the debug registers via SetThreadContext. The documentation includes a grave warning, that only implementations 3 & 4 seem to respect:

Do not try to set the context for a running thread; the results are unpredictable. Use the SuspendThread function to suspend the thread before calling SetThreadContext.

However I've never had issues with implementation 1, so I assume in practice usage of SetThreadContext with this particular mask (CONTEXT_DEBUG_REGISTERS) is safe.

This usage makes one wonder – are debug registers indeed part of a thread context? Are they reset on every context switch?

The Intel manuals, vol 3A, section 16.4.2, detail the contents of DR7:

The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions ... The flags and fields in this register control the following things:

• L0 through L3 (local breakpoint enable) flags (bits 0, 2, 4, and 6) — Enables (when set) the breakpoint condition for the associated breakpoint for the current task. When a breakpoint condition is detected and its associated Ln flag is set, a debug exception is generated. The processor automatically clears these flags on every task switch to avoid unwanted breakpoint conditions in the new task.

Oh dear. Are hardware breakpoints indeed that useless? Are they indeed blind to reads/writes by other threads?

Well, obviously, no. It's a 1 minute test to set a HW-Bp, write to the watched address from a different thread and watch it trigger.

It all boils down to a nuance in x86 terminology: tasks are not threads. Windows does not use the task context switching hardware apparatus that x86 offers, so it really is an OS decision whether to store debug registers per thread – and the obvious choice seems to store them per process. That is probably the reason calling SetThreadContext with CONTEXT_DEBUG_REGISTERS mask is safe also for non-suspended threads.

g_dwLastErrorToBreakOn: Watching Errors on VS Revisited

Raymond Chen posted about SetLastError recently, and an interesting discussion ensued. One comment in particular caught my eye:

The easiest way to catch a specific last error value in debugger is to set ntdll!g_dwLastErrorToBreakOn to that value.

A good while back I needed to break when such a LastError is set, and dug up all sorts of hacks to do so – breaking at SetLastError, setting a data breakpoint on the thread-env-block-error, and the like. Beyond being cumbersome and plain ugly, such breakpoint tricks can be very slow, and give a lot of false positives.

Seems the Win32 folks had similar needs and I was glad to discover they formed a better (undocumented, but still) solution. The authoritative source seems to be a 2007 post from Microsoft’s Daniel Pearson:

..Hiding inside of kernel32’s address space is a global variable called g_dwLastErrorToBreakOn. It turns out that SetLastError checks the value of this variable and if it’s non-zero, calls DbgBreakPoint if the two [values] match.

It’s a zero-overhead trick, and is very easy to do in Visual Studio: make sure kernel32.dll symbols are loaded, then type in a watch window –

(int*){,,kernel32.dll}_g_dwLastErrorToBreakOn

- and edit the referenced int.

Pearson notes two changes introduced in Vista:

(1) Up until XP, only Win32 API implemented in KERNEL32.DLL actually used SetLastError (and so tested g_dwLastErrorToBreakOn) – other dll’s used to set the error value via RtlSetLastWin32Error. Since Vista all Win32 API which set an error do so with SetLastError, so the g_dwLastErrorToBreakOn is much more reliable.

(2) Since Vista, g_dwLastErrorToBreakOn moved to NTDLL.DLL, so the VS usage should be changed to -

(int*){,,ntdll.dll}_g_dwLastErrorToBreakOn

It’s interesting to note that ntdll.dll does contain a separate instance of g_dwLastErrorToBreakOn on XP machines too.

But I verified that this value is never read, on calls into both kernel32 and ntdll.

AfxIsValidAddress (and Others) Don’t Work as Advertised

MFC exposes some memory debugging facilities such as AfxIsValidAddress, which (for debug builds) supposedly -

Tests any memory address to ensure that it is contained entirely within the program’s memory space.

Or does it?   AfxIsValidAddress only delegates the call to the undocumented ATL::AtlIsValidAddress, which reads:

// Verify that a pointer points to valid memory
inline BOOL AtlIsValidAddress(const void* p,
                  size_t nBytes, BOOL bReadWrite = TRUE)
{
      (bReadWrite);
      (nBytes);
      return (p != NULL);
}

The first two lines are just no-ops to silence compiler warnings about unused parameters. The promised verification that p is-contained-entirely-within-the-program’s-memory-space, amounts to the test that p is not null.

AfxIsValidString likewise does not keep its word, and only tests that the input string is non-null:

inline BOOL AtlIsValidString(LPCSTR psz,
                          size_t nMaxLength = UINT_MAX)
{
      (nMaxLength);
      return (psz != NULL);
}

AfxAssertValidObject is undocumented, and runs a bunch of redundant AfxIsValidAddress calls. CObject::AssertValid is documented (‘performs a validity check on this object by checking its internal state’!!), and is just as disappointing:

void CObject::AssertValid() const
{
     ASSERT(this != NULL);
}

The reason is probably that somewhere along the way MS realized there is no sane way to keep such promises.

There are several Win32 APIs with similar functionality: IsBadWritePtr, IsBadHugeWritePtr, IsBadReadPtr, IsBadHugeReadPtr, IsBadCodePtr, IsBadStringPtr. It has been known since at least 2004 that these functions are broken beyond repair and should never be used. The almighty Raymond Chen and Larry Osterman both discuss the reasons in detail, so just a short rehash: IsBad*Ptr all work by accessing the tested address and catching any thrown exceptions. Problem is that a certain few of these access violations (namely, those on stack guard pages) should never be caught – the OS uses them to properly enlarge thread stacks. In the words of Michael Howard, who first realized this (or so I think – both Larry and Raymond attribute the CrashMyApplication nicknames to him):

You should also not catch all exceptions, but only types that you know about. Catching all exceptions is just as bad as using IsBad*Ptr.

I’m guessing that AfxIsValidAddress – and sisters – worked the same way, until someone realized they too were probably generating more debugging effort than they were saving. However, while the Win32 guys decided to leave their API semantics as are and clearly document them as obsolete and dangerous, the MFC guys turned their API into no-ops and did cleanup work that cannot be described as anything but sloppy. Not only is the documentation wrong, source comments are too:

// AfxIsValidAddress() returns TRUE if the passed parameter points
// to at least nBytes of accessible memory. If bReadWrite is TRUE,
// the memory must be writeable; if bReadWrite is FALSE, the memory
// may be const.

BOOL AFXAPI AfxIsValidAddress( ...

– and even the MFC sources themselves still contain many naive uses of these no-op tests (search around, you’ll find plenty).

Bonus: What If You Really Have To Test Memory for Validity?

This is after all not so far fetched. Years ago I had to work around a bug in nVidia GPU drivers, where occasionally some API (namely IDirect3DVolumeTexture9::LockBox, applied on a huuuuuuuge texture) returned success but gave a bogus memory buffer. The workaround was to VirtualQuery the address, and if not accessible – retry the process on a smaller volume texture. It went something like -

...
D3DLOCKED_BOX box;
HRESULT res = pVolumeTexture->LockBox(0, &box, NULL,
                                     D3DLOCK_DISCARD);
_ASSERT(res==D3D_OK  && box.pBits);
// this should have sufficed, but...

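MEMORY_BASIC_INFORMATION meminf;
SIZE_T queried = VirtualQuery(box.pBits, &meminf, sizeof(meminf));

// VirtualQuery returns 0 on failure, in which case meminf is not filled in
BOOL  bOk = (queried != 0) && (meminf.State == MEM_COMMIT)  &&
      ( 0 !=  (meminf.Protect &
      ( PAGE_READWRITE | PAGE_WRITECOPY | PAGE_EXECUTE_READWRITE)));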

if (!bOk)
 //fallback
...

Of course VirtualQuery is innately sluggish, so this is completely unacceptable as mere parameter validation. Use this only when you know you have to, not just to be on the safe side.

Deleting Folders

RemoveDirectory requires the input folder to be empty. That typically requires repeatedly FileFind’ing the folder contents (either with the MFC wrapper or directly with the Win32 API) and DeleteFile‘ing. Things soon get interesting when you discover you need more code to detect subfolders and recursively empty and delete them – the code for a simple task seems to get out of hand.

Sarath suggests a seemingly more pleasant way, SHFileOperation.  A quick rehash:

bool DeleteDirectory( CString strPath )
{
  strPath += _T( '\0' ); // pFrom must be double-null terminated

  SHFILEOPSTRUCT strOper = { 0 };
  strOper.hwnd = NULL;
  strOper.wFunc = FO_DELETE;
  strOper.pFrom = strPath;
  strOper.fFlags = FOF_SILENT | FOF_NOCONFIRMATION;

  if ( 0 == SHFileOperation ( &strOper ))
  {
    return true;
  }
  return false;
}

This is an attractive alternative indeed, but turns out it has a quasi-bug, a hidden gotcha, bizarre error-reporting and some lacking capabilities.

1. Quasi-bug

MSDN mentions that this sort of SHFileOperation usage must be followed by an SHChangeNotify call. The code should read:

...
if ( 0 == SHFileOperation ( &strOper ))
{
  SHChangeNotify(SHCNE_RMDIR, SHCNF_PATH, strOper.pFrom, NULL);
  return true;
}
...

I call this a quasi-bug because (a) I’ve no idea how SHChangeNotify affects the shell, (b) a toy-test I just did shows that Windows Explorer immediately picks up this SHFileOperation change without an explicit SHChangeNotify, (c) not only Sarath but also Jonathan Wood omits the SHChangeNotify call right there on MSDN, and finally (d) this just seems a silly API design. SHFileOperation is a shell API – it can easily (probably has no choice but to-) notify the shell itself of whatever needs notifying, and I cannot imagine a scenario where a user might prefer to skip such a notification. Gotta ask about this at Raymond’s some day.

2. Hidden Gotcha

You should never use relative paths as an input to SHFileOperation, as (quoting MSDN) “Using it with relative path names is not thread safe”. Apparently the implementation somehow is thread safe for absolute paths. Must be some arcane file system issue buried deep inside.

3. Bizarre Error Reporting

Quoting again:

Do not use GetLastError with the return values of this function.

To examine the nonzero values for troubleshooting purposes, they largely map to those defined in Winerror.h. However, several of its possible return values are based on pre-Win32 error codes, which in some cases overlap the later Winerror.h values without matching their meaning. …   for these specific values only these meanings should be accepted over the Winerror.h codes. However, these values are provided with these warnings:

  • These are pre-Win32 error codes and are no longer supported or defined in any public header file. To use them, you must either define them yourself or compare against the numerical value.
  • These error codes are subject to change and have historically done so.
  • These values are provided only as an aid in debugging. They should not be regarded as definitive…

Feels like an all-but-deprecated API, and it is indeed superseded by IFileOperation since Vista.

4. Lacking Capabilities

Things get even more interesting when your folder contains read-only or hidden files. If you do enumerate and delete the folder contents yourself this is easily rectifiable by a SetFileAttributes call. The shell API has no way (that I know of) to achieve similar functionality.

Bottom Line

For any real production code I wholeheartedly recommend against SHFileOperation calls. Using it has real potential of dooming your code’s users and maintainers to weird, time consuming bugs.

It’s really not that terrible a bullet to bite – just a few dozen more code lines. Even better, you can find them here (MFC version):


VOID MakeWritable(CONST CString& filename)
{
  DWORD dwAttrs = ::GetFileAttributes(filename);
  if (dwAttrs==INVALID_FILE_ATTRIBUTES) return;

  if (dwAttrs & FILE_ATTRIBUTE_READONLY)
  {
    ::SetFileAttributes(filename,
    dwAttrs & (~FILE_ATTRIBUTE_READONLY));
  }
}

BOOL DeleteDirectory(CONST CString& sFolder)
{
  CFileFind   ff;
  CString     sCurFile;
  BOOL bMore = ff.FindFile(sFolder + _T("\\*.*"));

  // Empty the folder, before removing it
  while (bMore)
  {
    bMore = ff.FindNextFile();
    if (ff.IsDirectory())
    {
      if (!ff.IsDots())
        DeleteDirectory(ff.GetFilePath());
    }
    else
    {
      sCurFile = ff.GetFilePath();
      MakeWritable(sCurFile);

      if (!::DeleteFile(sCurFile))
      {
        LogLastError(); // just a placeholder - recover whichever way you want
        return FALSE;
      }
    }
  }

  // RemoveDirectory fails without this one!  CFileFind locks file system resources.
  ff.Close();

  if(! ::RemoveDirectory(sFolder))
  {
    LogLastError();
    return FALSE;
  }
  return TRUE;
}
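If your code base can use C++17, roughly the same walk can be sketched without MFC or Win32 calls. This is not from the original code: DeleteDirectoryPortable is a made-up name, std::filesystem is swapped in for CFileFind/DeleteFile/RemoveDirectory, and symbolic links are not handled specially (fs::permissions follows them), so treat it as a starting point only.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Recursively deletes 'folder', first making every entry under it writable.
// Returns false, with 'ec' holding the reason, on the first failure.
inline bool DeleteDirectoryPortable(const fs::path& folder, std::error_code& ec)
{
    assert(!folder.empty());
    ec.clear();

    fs::recursive_directory_iterator it(folder, ec), end;
    for (; !ec && it != end; it.increment(ec))
    {
        // Same idea as MakeWritable: add write permission before deleting.
        fs::permissions(it->path(), fs::perms::owner_write, fs::perm_options::add, ec);
        if (ec)
            return false;
    }
    if (ec)
        return false;       // the folder is missing, or the walk itself failed

    fs::remove_all(folder, ec);
    return !ec;
}
```

Error reporting goes through std::error_code rather than GetLastError, so the LogLastError placeholder would have to log ec.message() instead.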

Duplicate Volume Serial Numbers

We recently released a product version, with yearly licenses attached to the machine’s Volume Serial Number.  Now it is called a ‘serial number’, and it seems as meaningless and as random as a UID (mine is 34EE-10A0), so it must be a UID. Right?

Well, not quite. This ID characterizes a volume, not a disk. If you have a partitioned disk, just type ‘dir c:’ and ‘dir d:’ (or whatever) at a command prompt and watch your partitions’ different VSNs. As the link teaches, the VSN data is part of the partition’s extended boot sector, and is no more than a hash of the partition-creation date & time (i.e., disk formatting date & time). So, it’s not technically unique – if any two disks are formatted (or partitions created) at the exact same time, they’d have identical VSNs. Also, since it’s only 4 bytes, the chances of a random hash-duplication are very real. Just for sport: if it’s evenly distributed and the world has, say, 1 billion computers, the chance of a duplicate-free distribution of VSNs is around 0.88^(1 billion), i.e. zero for all practical purposes. So there are in fact quite a few duplicate VSNs out there. But hey – unless you’re Microsoft, such global-scale stuff really shouldn’t trouble you. I mean, c’mon – say you have – what, 1000 clients? 10,000? Make it a hundred-thousand clients. You should never worry about the chance of a duplicate VSN. Now should you?
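For a feel of the numbers at client-base scale, here is a small birthday-problem sketch. It assumes serial numbers are uniformly random 32-bit values, which is an idealization (they are derived from a timestamp), and DuplicateVsnProbability is a name I made up for it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Probability that at least two of 'clients' volumes share a serial number,
// assuming serials are uniformly random 32-bit values (an idealization).
inline double DuplicateVsnProbability(std::uint32_t clients)
{
    const double space = 4294967296.0;          // 2^32 possible serial numbers
    double logNoDuplicate = 0.0;                // log of P(all serials distinct)
    for (std::uint32_t k = 1; k < clients; ++k)
        logNoDuplicate += std::log1p(-static_cast<double>(k) / space);
    return -std::expm1(logNoDuplicate);         // 1 - P(all distinct)
}
```

Under that assumption the chance of at least one duplicate is roughly 0.01% for 1,000 clients, a bit over 1% for 10,000, and about 69% for 100,000. The loop is linear in the number of clients, so don't feed it the billion.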

The real and sad answer, as I recently discovered, is that if you have two clients who use an identical computer model (at least by Dell, but probably true for all other major vendors), the chance of them having identical VSN is exactly ONE.

Dell do not separately format and install every hard drive of the kajillion they deploy. They make some master copy, then deep-copy it around (as us home users do with Acronis, Norton Ghost or whatever). As noted, the VSN is part of the data on the disk, and so is copied as well.

We tried to confirm this officially with Dell, so far without success. The issue has very sparse web presence too, hence – this post. Hope it helps someone.

Memory Fragmentation Trouble

We recently had some weird issues that turned out to emanate from a failure to allocate a large consecutive chunk of heap memory.  (It was an exceptional pain to nail the cause there – maybe more on that in a future post).  The desired allocation was to be  ~400M, and since machines today ship more-or-less-by-default with 2G-4G RAM, there shouldn’t be a real justification for such allocations to fail.  Or should there?

First of all, regardless of your available physical RAM, your real memory playground size is 2G – the bottom half of your process’ address space, its user-mode portion. Yes, I’m well aware of the /3GB boot.ini switch, and trust me – you don’t want to go there in a 3D application. I was badly burnt there already. PAE/AWE have downright hostile API sets too – you’d just have to make do with 2G.

The real issue here is memory fragmentation.

An obvious solution would be migrating to Win64, and forgetting about fragmentation issues for the near century. Sadly, this was not a feasible option for us: we have a legacy stash of in-house 32-bit custom hardware drivers, and migrating those would be the absolute last resort.

Happily, a  surprisingly short online research gave quite a few constructive 32-bit directions. Here are some.

  1. Low Fragmentation Heap is a nice built in feature, on by default since Vista. You should apply LFH to the CRT heap, retrieved by _get_heap_handle (just try the sample code). Even better – try applying it to all process heaps. There should be no reason not to apply this to all projects, except (screeeeeeeeeeeeech..) it seems the magic doesn’t work on standard debug builds. Which, well, err, makes it kinda useless.
  2. HeapDecommitFreeBlockThreshold is a magical registry key that is advertised to make a noticeable difference. It does so by causing the heap to hold on to small allocations just a bit longer. Such increase of the HeapManager jurisdiction can potentially prevent page ‘theft’ for non-heap usage, thereby reducing some fragmentation factors.
  3. Typically a lot of fragmentation (at the 100Megs scale) is caused by sparse mapping of binary images to the process address space, at load time.
    In simpler English, say your process uses forty 1-Meg DLLs, and maps them to memory at regular 50Meg intervals. They now sparsely occupy just 40Megs of your available 2G, leaving no consecutive memory chunk larger than 49M!
    To counter that, first map your virtual address usage. Until recently you’d have to use either vadump or direct code instrumentation, but since this summer you have the incredible (as always) SysInternals tool VMMap. When you spot some dll’s that are just teasingly smiling at you from the middle of your address space, use editbin.exe to ruthlessly rebase them away.
  4. Pre-designate a large heap (say 500M) at link time, thus giving the heap a head start in the race for consecutive pages.
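The forty-DLL scenario from item 3 is easy to play with in a toy model. The sketch below only does arithmetic on a list of regions; Region and LargestFreeBlock are made-up names, nothing here queries a real process (VMMap does that), and the layout in the note below is my assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Region { std::uint64_t base, size; };    // one mapped image or allocation

// Returns the size of the largest unmapped hole in [0, spaceSize).
// Regions are assumed not to overlap and to lie inside the address space.
inline std::uint64_t LargestFreeBlock(std::vector<Region> regions, std::uint64_t spaceSize)
{
    std::sort(regions.begin(), regions.end(),
              [](const Region& a, const Region& b) { return a.base < b.base; });

    std::uint64_t largest = 0, cursor = 0;      // cursor: end of the last region seen
    for (const Region& r : regions)
    {
        assert(r.base >= cursor && r.base + r.size <= spaceSize);
        largest = std::max(largest, r.base - cursor);
        cursor  = r.base + r.size;
    }
    return std::max(largest, spaceSize - cursor);
}
```

Place forty 1M regions so that each one ends on a 50M boundary of a 2048M space, and the largest hole is 49M although only 40M is actually in use. Pack the same forty regions together at the top of the space, which is what rebasing aims for, and the largest hole grows to 2008M.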

I decided to try the steps in order of increasing effort, and am overjoyed to say (2) & (4) sufficed. We now successfully allocate 400M chunks.

We did peek into the process with VMMap, though, and it did surface some interesting finds. For one, Babylon translator, installed on all our development machines, has the HUTZPA to inject captlib.dll into the very middle of our precious address space.

My hunch says rebasing could indeed hold the highest impact. We may have to try that too eventually – I hope to post with some findings.

EnumDisplayMonitors Troubles – NvCpl solutions.

Recently I had EnumDisplayMonitors presenting some strange behaviour, exposing 2 monitors when in fact only one was connected.

The nVidia Control panel API eventually came to the rescue – it exposes the entire functionality available via the nVidia control panel, and implemented in NvCpl.dll in the Windows\system32 folder. Specifically NvCplRefreshConnectedDevices(), quote from the manual, ‘refreshes the connection state cache for all display outputs on the selected GPU’.

Never heard of the ‘connection state cache’ – indeed, by the documentation another API (NvCplGetActiveDevicesString) retrieves the ‘connected device state that was cached by the driver during such system events as bootup, logon, or opening of the display properties control panel’.

Bottom line, the following code solved the problem:

typedef BOOL (APIENTRY *NvRefreshProc)( IN DWORD );

VOID RefreshDisplayCache()
{
   HMODULE hNvCpl = LoadLibrary(_T("NvCpl")); // GetProcAddress expects an HMODULE

   // The graphics driver caches monitors setup, and sometimes gets out of sync.
   // Here we force a refresh.
   if (hNvCpl)
   {
      NvRefreshProc ProcRefresh = (NvRefreshProc)
            GetProcAddress(hNvCpl, "NvCplRefreshConnectedDevices");

      if (ProcRefresh)
            (ProcRefresh) (1);  // input flag - NVREFRESH_NONINTRUSIVE
   }
}

As an added bonus, I’m now able to programmatically set ‘fixed aspect-ratio scaling’, which I couldn’t until now.


typedef DWORD (APIENTRY *dtcfgexPROC)( LPSTR );

...

// change display to fixed aspect-ratio scaling, so as not to distort the
// picture on wide screens.
if (hNvCpl)
{
	dtcfgexPROC ProcAdd = (dtcfgexPROC) GetProcAddress(hNvCpl, "dtcfgex");

	if (ProcAdd)
	{
		// set scaling on display #1 to mode 5 (fixed aspect ratio)
		char cmd[] = "setscaling 1 5"; // dtcfgex takes a non-const LPSTR
		(ProcAdd) (cmd);
	}
}

(check out dtcfgex in the manual for details.)

The API contains lots of other low level goodies. If you make extensive use of it, you might prefer to statically link against NvCpl.dll and include the header NvCpl.h (contained in this nVidia sample download).

I have no idea how to achieve both tasks on ATI cards (and whether similar cache or scaling mode even exists there) but we ship only nVidia’s – so no worries there, for now.