Forcing Construction of Global Objects in Static Libraries

Suppose you have a global object whose constructor does useful stuff – say, registration somewhere or initialization of global resources. Suppose further this object isn’t directly accessed anywhere – you just need the functionality in its ctor (which might trigger other functionality eventually). All is fine, until we add the last assumption: suppose this object lies in a static library.

This seems to be a long-lasting pain, ultimately arising from the old (‘broken’? let’s just say ‘outdated’) C++ compiler-linker model. The way the linker works is by repeatedly searching for implementations of yet-unresolved referenced symbols, and including only the obj files with such implementations – thereby dropping entirely any obj file that nothing else references, such as the one containing the global object whose ctor you need to run.

To make things concrete, take the following toy example:


//main.cpp
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
	return 0;
}

//GlobalInLib.cpp – compile as static lib
#include <stdio.h>
#include <tchar.h>

struct UsefulCtor
{
	UsefulCtor()  { _tprintf(_T("ThereIsNoSpoon")); }
};

UsefulCtor MyGlobalObj;

Under normal linkage, MyGlobalObj would be ignored. You can verify this either by putting a breakpoint in its constructor and seeing that it is never hit, or by inspecting the output console window and seeing that it is empty.

<Aside> An interesting discussion arose a while ago in MS forums on whether this behavior violates the standard. Here, einros writes:

The C++ standard, section 3.7.1, specifies:

“If an object of static storage duration has initialization or a destructor with side effects, it shall not be eliminated even if it appears to be unused, [...]”

But MS’ Holger Grund clarifies –

[Your quote of the standard] only holds if the corresponding translation unit is part of the program. In my definition and the one of at least four major toolchain implementators, it is not.

</Aside>

Enter ‘Use Library Dependency Inputs’.

This arcane combo box in the project references dialog has the sole documented effect of enabling incremental linking for static libs, but the interesting part is how it does it:

When this property is set to Yes, the project system links in the .obj files for .libs produced by dependent projects, thus enabling incremental linking.

And indeed, setting this option to True causes construction of MyGlobalObj in the example above.
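If flipping a project setting is not an option, a source-level workaround is sometimes used instead: reference any symbol defined in the same obj file as the global, so the linker pulls in the whole obj, ctor and all. Below is a minimal single-file sketch of that idea. The names `Registry`, `ForceLinkGlobalInLib` and `UseLibrary` are hypothetical, invented for illustration, and the comments mark which part would live in the static lib and which in the executable.

```cpp
#include <string>
#include <vector>

// ---- would live in GlobalInLib.cpp (compiled into the static lib) ----

// Hypothetical registry, standing in for "registration somewhere".
// A function-local static sidesteps initialization-order issues.
std::vector<std::string>& Registry()
{
	static std::vector<std::string> entries;
	return entries;
}

struct UsefulCtor
{
	UsefulCtor() { Registry().push_back("ThereIsNoSpoon"); }
};

UsefulCtor MyGlobalObj;

// Hypothetical anchor symbol: it does nothing useful by itself, but it is
// defined in the same obj file as MyGlobalObj.
int ForceLinkGlobalInLib() { return 0; }

// ---- would live in the executable ----

// Referencing the anchor leaves an unresolved symbol that only the obj
// above can satisfy, so the linker has to include that obj.
int UseLibrary()
{
	return ForceLinkGlobalInLib();
}
```

The price is that the executable must know about the anchor symbol, which is exactly the coupling the project setting avoids.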

Turns out you can force construction of globals in static libs after all.

Addendum: Only after writing this post did I come across this excellent 2005->2012 thread, which mentions this setting as a solution. Still, this effect of the linker is all but undocumented, and qualifies as deserving-more-web-presence.

My Guest Post on the VC Team blog

I answered a public invitation by Eric Battalio of the VC team – and just now published an article on the VC blog, introducing the native Expression Evaluator:

Every time you use the Watch window, a lot is going on behind the scenes. Whenever you type a variable name, something needs to map that name to the memory address and type of the named variable, then display that variable, properly formatted based on its type. Conversely, when you modify the contents of a variable – something needs to take your text input, convert it to the right type and correctly update the memory at the right address.

That something is the Expression Evaluator. It is an impressive and often overlooked piece of technology and once familiar with it, you can put it to good use, sometimes in surprising ways!

Check it out!

Geometric Inverse Application 1: Barycentric Coordinates

Last time I jotted down some equations suggesting how you should understand 3d matrix inverses, or how to solve 3×3 equations. Below is a first application, for obtaining barycentric coordinates.

Barycentric coordinates are the canonical way of describing a point within a triangle (or more generally, within a polygon, or just any convex point set). Briefly put, suppose you’re given a triangle with vertices A, B & C, and an interior point P:

P’s position relative to A, B and C can be described by a set of 3 scalars, say α, β & γ, called its barycentric coordinates:

P = αA + βB + γC

For P to lie in the plane formed by A, B & C these ingredients must satisfy –

α + β + γ = 1

And for P to lie within the triangle, they must satisfy –

0 ≤ α, β, γ ≤ 1

You can think of these equations as describing a recipe for cooking up P from the ingredients A, B & C: α is the amount of A you need to put in, β the amount of B, and γ the amount of C. These coordinates are very useful, for example, for interpolation: quantities that are stored for A, B & C can be mixed with the same coefficients and applied to P.

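Now how do you actually find barycentric coordinates? Well, the equation defining them can be rewritten in matrix form, with M the 3×3 matrix whose columns are A, B & C:

M (α, β, γ)ᵀ = P

Which gives a still-not-very-explicit expression for the coordinates:

(α, β, γ)ᵀ = M⁻¹ P

The derivation in the previous post gives a way to deduce expressions for each coordinate. Say, for α:

α = [P∙(B×C)] / [A∙(B×C)]

And similarly:

β = [P∙(C×A)] / [B∙(C×A)],   γ = [P∙(A×B)] / [C∙(A×B)]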

For some extra geometric flavour, note that these quotients can be understood as ratios of areas: α is the ratio of the area of the triangle P-B-C to the area of the full triangle A-B-C:

Finally, a correction of an apparently common misconception. I’ve heard a few times the interpretation of barycentric coordinates as expressing distances of some sort – it is indeed tempting to think that if α is close to 1 then P’s distance from A is small. That just isn’t true. For example in this setup –

A is the triangle vertex closest to P, and still the α coordinate is zero – as low as it can get. When ‘cooking up’ P, we have to mix in only B and C – with no A at all.

Geometric Interpretation of a 3D Matrix Inverse

I work a lot with 3D calculations, and every so often a non-trivial 3D tidbit comes along. Some of these might be of use to others – and so, by the power vested in me as absolute monarch of this blog, I hereby extend its scope to some light 3D geometry. I’ll try to keep posts in this category less rigorous and yet more verbose than regular math write-ups.

Take a 3×3 matrix that you wish to invert, say M. Think of its columns as 3-dimensional vectors, A, B & C:

M = [A B C]

Now take its sought-after inverse, M⁻¹, and think of its rows as 3D vectors, say v, u & w. That means essentially:

M⁻¹ M = [v; u; w] [A B C] = I   (v, u & w stacked as rows)

Next focus just on v, the first row of M⁻¹. What can be said of it, in terms of A, B & C?  Looking in the first row of the multiplication result – the identity matrix – we see:

v∙A = 1,   v∙B = 0,   v∙C = 0

Which means in particular that v is orthogonal to both B and C. Assuming B and C aren’t collinear (otherwise M wouldn’t be invertible in the first place) there is but a single direction in 3D space which is perpendicular to both, and it can be written as B×C  – vector-product or cross-product of B and C.  Hence v must be a multiple, by some scalar – say α – of this direction:

v = α (B×C)

To deduce α remember that v must be scaled so that its dot product with A gives 1.  And so:

α = 1 / [A∙(B×C)],   hence   v = (B×C) / [A∙(B×C)]

The triple product in the denominator, A∙(B×C), should look familiar: that is in fact det(M) – the determinant of the original matrix. Had we inverted M with a more traditional apparatus, say Cramer’s rule, we would have divided by this determinant directly.

Naturally similar expressions are obtained for the other rows, u & w :

u = (C×A) / [B∙(C×A)],   w = (A×B) / [C∙(A×B)]

All the denominators are in fact equal, to each other and to det(M).

Why all the hassle?

First, for the fun of it. Personally I find it much easier to understand – and thus remember – geometric stories than algebraic ones.

Second, this formulation exposes several optimization opportunities.

  1. After computing B×C you can obtain the first of (and so all of) the denominators, by simply taking a dot product with A.
  2. If you need just a single row of the inverse matrix, you can calculate it directly – without having to invert the entire matrix.
    This is not as far-fetched as it might seem: say you formulate a 3×3 linear equation set, M x = b, but you’re actually interested only in the 1st solution coordinate, x₁. Just take the 1st row of M’s inverse, as outlined above, and dot-product it with b:
    x₁ = v∙b = [b∙(B×C)] / [A∙(B×C)]

Third, using analytical expressions as above for solving linear equations is generally preferable to numeric solvers. For matrices as small as 3×3, solving numerically would probably be a bad idea anyway – even traditional, tedious inverses (with adjoint matrices and all) would be preferable to numeric solutions.

BTW, higher dimensional analogues do exist – and are as easy to derive – but the main added value, namely direct geometric insight, is lost beyond three dimensions.

VS Support Policy

As far back as this MS support page goes, Visual Studio editions had a 5-year mainstream support period, and since VS .NET 2003 – a 10-year extended support period. In particular, VS2010 mainstream support is advertised to end on Jul 14, 2015.

Now given that MS releases major new VS versions roughly once every 2 years, such a support period can be quite a burden. In my (much, much smaller) organization we don’t bother with backward support at all: we just ask those pesky cry-baby customers to upgrade to our latest version before we even consider checking their bug reports. The logistics of testing and patching multiple versions can admittedly get exhausting – so I would have had great respect for the magnitude of the task that DevDiv took upon themselves when they chose to support 2.5 versions backwards.

Had they actually done so.

As of ~July 2012, the VS bug submission form in Connect no longer enables even reporting issues with VS versions prior to 2012 (note, I think that was before VS2012 even reached RTM):

image

Long before that, bugs I filed against VS2010, along with complete, consistent repros, were closed as either not reproducible or fixed – if they happened to be resolved in VS2012.

For all practical purposes, support in VS2010 ended less than 2 years after its release – and less than 1 year after its first service pack release!  In our organization (and I suspect in many others) we don’t even consider upgrading VS before the newest version has had its run, and reached a level of maturity attainable probably only at a service pack release. That leaves us with less than a year of practical support, which is beyond annoying – it borders on fraud.

I should probably post here more details about specific unresolved bugs I reported. Beyond that and the much needed venting, I don’t see much that can be done.

Template Meta Programming is Still Evil

I won’t include a meta-programming intro paragraph here, since if you’re not familiar with it – I sincerely hope you stay that way. If you insist, get an idea online or read the book (it’s a good read, but can’t say I recommend it since the entire purpose of this post is to persuade you to not use what it teaches).

I don’t like meta-programming. Passionately so. What’s worse, I seem to be pretty much the only one: I can’t really find any anti-MP texts around!  So either

(a) There’s a community of MP-bashers lurking somewhere out of my reach,

(b) I’m waaaay off, and comments to this post would make me see my wrongdoing and shy away to a dark cave for a while,

(c) The world really misses an anti-MP manifesto.

However the case turns out, it would do me good to try and articulate these thoughts. Here goes.

Meta-Programming is Hacking, not Engineering

One might describe as hacking things like pouring orange juice using chopsticks, cutting toothpaste tubes open, or amplifying phones with glass pitchers.  On the less-adorable side, we also call chewing-gum car fixes ‘hacks’. Here’s a suggested definition that tries to encompass the creative, improvisational, and often lazy aspects of hacking:

Hacking: Achieving a goal by using something in a way it wasn’t designed for.

Defining the other side of the scale is much easier:

Engineering: the discipline, skill, and profession of acquiring and applying scientific … and practical knowledge, in order to design and build  …  systems and processes.

Now how would you classify meta-programming?   Its inception, for one, gives a clear hint:

Historically TMP is something of an accident; it was discovered during the process of standardizing the C++ language that its template system happens to be Turing-complete, i.e., capable in principle of computing anything that is computable.

C++ types were never designed to perform compile time calculations. The notion of using types to achieve computational goals is very clearly a hack – and moreover, one that was never sought but rather stumbled upon.

The Price

Don’t take it from me, take it from two guys who know an itsy bit more about C++.

Herb Sutter, former secretary of the C++ standardization committee, is one.

Herb: Boost.Lambda, is a marvel of engineering… and it worked very well if … if you spelled it exactly right the first time, and didn’t mind a 4-page error spew that told you almost nothing about what you did wrong if you spelled it a little wrong. …

Charles: talk about giant error messages… you could have templates inside templates, you could have these error messages that make absolutely no sense at all.

Herb: oh, they are baroque.

Jim Radigan, MSVC compiler lead developer, probably understands a thing or two himself.

Jim: We’ve been able to use templates, we’ve been able to do a whole bunch of things.

Charles: Do you use advanced sort of meta-programming at the compiler level?

Jim: We try to steer away from really complex things like that because what happens is.. the tire hits the road when it’s two o’clock in the morning and somebody sends you a pri-zero bug, say, windows doesn’t boot.  …

so what we engineered for, clearly, is maintainability. You want somebody to come up to speed, be able to go in, binary search windows and step through the debugger in the compiler and find out where we did the illegal sequence.  …

One of the other things that happen when we go to check code into the compiler is we do peer code review. So if you survive that, it’s probably ok, it’s not too complex. But if you try to check in meta-programming constructs with 4-5 different include files and virtual methods that wind up taking you places you can’t see unless you’re in a debugger – no one is going to let you check that in.

… We do use STL, but we don’t go really abstract because we want to be able to quickly debug.

So, bottom line: using meta-programming you end up pouring substantially more effort into writing code that even builds.  Your code maintainers end up pouring substantially more effort into understanding and debugging that code.

The Benefit

Concise, elegant code.

As far as I can say, that is the sole benefit of this ordeal.

Think about that for a second. The very real reward for using MP in your code is the moment of satisfaction of having solved a hard riddle. You did stuff in 100 lines that would have otherwise taken 200.  You ground your way through incomprehensible error messages to get to a point where if you needed to extend the code to a new case, you would know the exact 3-line template function to overload. Your maintainers, of course, would have to invest infinitely more to achieve the same.

I wholeheartedly empathize – I can get lost for days in such riddles (in and out of programming), and I still remember the joy of having first deciphered a SFINAE construct in code.

It might be a necessary stage in every developer’s professional path, but one must mature out of it. You have to return and think of your tools as exactly that, tools: unless you’re a standard committee member, C++ is a means to an end, not a goal by itself. The geeky pleasure of having mastered the esoteric side effects of some language features is completely understandable, but engineering-wise the price can be formidable – so please, please, fight this temptation valiantly.

Perhaps some day..

The original post title was the way-more-catchy ‘MP is evil’. I modified it to ‘Still Evil’ because I have high hopes: C++11 seems to be very aware of the desire to make compile-time programming a designed language feature, and not just a collection of library hacks.
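To make the contrast concrete, here is a small sketch of the same compile-time computation written twice: first in the classic types-as-computation style, then with C++11’s `constexpr`. Factorial is just the customary toy example, and the names `Factorial` and `factorial` are chosen for illustration only.

```cpp
// Template meta-programming: the "loop" is recursive template
// instantiation, and the "return value" is a static member of a type.
template <unsigned N>
struct Factorial
{
	static const unsigned long long value = N * Factorial<N - 1>::value;
};

// The recursion is terminated by an explicit specialization.
template <>
struct Factorial<0>
{
	static const unsigned long long value = 1;
};

// C++11 constexpr: an ordinary-looking function that the compiler may
// evaluate at compile time (a single return statement, per C++11 rules).
constexpr unsigned long long factorial(unsigned n)
{
	return n == 0 ? 1 : n * factorial(n - 1);
}

// Both are usable where a compile-time constant is required.
static_assert(Factorial<10>::value == 3628800ULL, "template version");
static_assert(factorial(10) == 3628800ULL, "constexpr version");
```

The second version reads like the function it is, and the same `factorial` can also be called with a run-time argument.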

Let’s talk again in the future. I’ll be very open to revising my opinions when concepts are standardized and any compiler implements constexpr.

_DllMain@12 already defined

We recently faced this linkage error:

error LNK2005: _DllMain@12 already defined in MSVCRT.lib(dllmain.obj)

Searching gives ~36K results as of July 2012, many of which seem high quality (StackOverflow, MS, CodeProject etc.), and I was certain it would be a simple matter of finding a fix online and blindly applying it. However it seems the root cause in our particular case wasn’t covered yet (AFAIK), and it seems worthwhile to document.

The MS KB article teaches that this is a linkage order problem – MFC libs must be linked before the CRT ones – but none of the fixes the article proposes worked. We did have one build configuration which was successful and one which failed with the above LNK2005 (Release – but it really doesn’t matter) so I dumped two /VERBOSE linker outputs for the two configurations and diffed them. After some admittedly tedious inspection, an interesting difference came up – these lines were dumped only in the successful build:

Found __afxForceUSRDLL

Referenced in Stdafx.obj
Loaded mfcs100d.lib(dllmodul.obj)

The symbol name implies that it is intended to force some linkage, and including it seems to have the beneficial effect of loading the mfc lib mfcs100d.lib.  Indeed, searching reveals the following lines in dllmodul.cpp:

#ifdef _X86_
extern "C" { int _afxForceUSRDLL; }
#else
extern "C" { int __afxForceUSRDLL; }
#endif

and the following in afx.h:

// force inclusion of DLLMODUL.OBJ for _USRDLL
#ifdef _USRDLL
#pragma comment(linker, "/include:__afxForceUSRDLL")
#endif

So it turns out there’s a single condition that governs the linkage to the MFC library mfcs100/d (the one containing DllModul.obj, which exports _afxForceUSRDLL), and that condition is – _USRDLL being defined.   Our linking project was indeed a dll and somehow the default _USRDLL preprocessor macro was missing from it – restoring the definition fixed the linkage.

So bottom line, if you get a ‘DllMain@12 already defined’ linkage error for a dll, here’s another thing to try: make sure _USRDLL is defined in your project C++ property sheets.

A Day with VS11 Beta – part 2.5: Auto Vectorizer, done right

Start at the end: the main example analyzed in the previous post is plain wrong. This loop:

for (int i=0; i<1000; ++i)   sum += a[i];

Vectorizes perfectly.

Even after I wrongfully accused his team of this fictitious vectorization miss, Jim Hogg was kind enough to (1) test it and report this reduction loop is indeed vectorized, (2) link to my post, and worse yet, (3) say he enjoyed this blog.   What can I say, I’m embarrassed and humbled.   Thanks Jim.

My mistake was not – as Jim suspected – omitting /fp:fast. Rather, the problem was I coded multiple simple tests into a single console app main function, and debugged the resulting binaries from ICC/MSVC in disassembly mode.   From a more thorough inspection it seems both ICC and MSVC now do an aggressive interleaving of computations, and if, as I suspect, the aging PDB format still maps a consecutive range of instruction addresses to each source line – the debugger has a hard time matching a location in disassembly to a source line. All in all, most probably I drew the right conclusions on the wrong loops.

I did similar tests again – this time checking a single loop in every test.  A different case quickly turned up where ICC vectorizes and MSVC doesn’t:

double a[2] = { 1., 2. };
double b[20000];
double S = 0;
for (int i = 0; i < 20000; i += 2)
	S += a[0]*b[i] + a[1]*b[i+1];

And just to make extra sure, here’s some disassembly:

MSVC:

image

ICC:

image

ICC does some loop unrolling too so the code is harder to follow – but for skimming purposes it suffices to note the ‘packed double’ mul version (mulpd) in ICC, contrasted with the ‘scalar double’ mul version (mulsd) in MSVC. Similar results are seen in single precision floats too.

As in the previous post, this is simplified code that aims to capture the essence of real vectorizable scenarios. Suppose, for example, you need to transform a 3D mesh by a fixed rotation and translation. This amounts to a large loop with computations of the above type: one argument constant, the other scanning an array.  Such code might benefit considerably from auto vectorization.
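For reference, the mesh scenario just mentioned might look roughly like the sketch below. It only illustrates the shape of such a loop (constant rotation and translation, one pass over a vertex array) and makes no claim about what either compiler does with it; `Vec3`, `TransformMesh` and the row-major layout of `R` are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Applies p -> R*p + t to every vertex, in place.
// R is a 3x3 rotation matrix stored row-major in 9 floats.
void TransformMesh(std::vector<Vec3>& vertices, const float R[9], const Vec3& t)
{
	for (std::size_t i = 0; i < vertices.size(); ++i)
	{
		const Vec3 p = vertices[i];   // copy, since we overwrite in place
		vertices[i].x = R[0] * p.x + R[1] * p.y + R[2] * p.z + t.x;
		vertices[i].y = R[3] * p.x + R[4] * p.y + R[5] * p.z + t.y;
		vertices[i].z = R[6] * p.x + R[7] * p.y + R[8] * p.z + t.z;
	}
}
```

As in the toy loop, one operand (R and t) is constant throughout while the other scans a large array.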

The real test was the last one to be described at the blog: build and measure some real life computationally intensive code.  I did just that, and the results were – as noted – no measurable improvement over VC10.  So either my code has less to benefit from vectorization than I hoped, or – the gaps remaining in the vectorizer hold more promise than the gaps already filled.

I gotta try and measure performance with ICC one day – if I ever have the patience. Our code builds for nearly half an hour on MSVC, so I’m guessing ICC builds would have to be done either nightly or over-weekendly.