Software Development in C++

This section contains articles relating to software development in C++ in general terms: development tools, the software process and discussions about the good, the bad and the ugly in C++.

In this issue, our survey of C++ compilers continues and several contributors make critical comments about the cost and complexity of using C++. I have held the second instalment of my compiler-writing series over to Overload 8 for various reasons that will, I hope, become clear in that issue!

C++ compilers - mainly for OS/2

by Francis Glassborow

Before I tackle the primary subject of OS/2 C/C++ compilers I'd like to take a little space to expand on my column in the last issue.

For some reason I completely forgot to mention the most outstanding feature of Salford Software's C and C++ compilers - their debug support and in particular, runtime debugging. Anyone who has written more than the most trivial of programs will have tripped over memory problems (though they may not realise it yet).

Memory problems

There are four major categories of memory problem:

1. Dangling pointers (and references in C++), i.e., using a pointer variable that is no longer attached to underlying memory. A common example is returning a pointer or reference to a local (auto) object. Often such abuse actually appears to work because the associated memory has not been reused yet. This makes matters worse because the defect will only manifest rarely, and will get past many programmer-contrived test suites.

2. Writing beyond the end of an object - or sometimes before the beginning but this is a far less frequent problem. C's mechanisms for handling array parameters (and dynamic arrays) make this problem particularly vicious and frequently impossible (or effectively so) to detect statically (at compile time).

3. Memory leaks - the commonest form of resource leakage. This is another problem that is difficult or impossible to detect statically and which needs special tools to detect dynamically. In a way, it is the exact reverse of `dangling pointers' because it occurs when all pointers and references to dynamically allocated memory are lost before that memory is freed. The real sting in this problem is that it often only manifests as a serious problem when a program has been running for hours, days or possibly weeks. Virtual memory resources make it worse by delaying ultimate collapse.

4. Reading uninitialised memory. Any attempt to read from memory that your program has not previously written to will exhibit undefined behaviour. Unfortunately, undefined behaviour often manifests by doing exactly what you expected. That makes it rather difficult to detect.

I recently had an instance in my training room where a programmer was puzzled because his program always ran correctly the first time and failed the second. The first time the program ran, his assumption that a variable was zero agreed with the memory provided. On running the program again he got the same storage with the results he had written to it during the first execution.

Clever operating systems such as Windows NT make this kind of problem harder to detect because they clean up storage before re-allocating it to a new task.

I believe NT stamps 0xDEADBEEF all over freed memory? Clearly not a vegetarian operating system... - Ed.

There are tools available to tackle these problems and I hope readers will write in to describe anything that helps them reduce the incidence of these problems.
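As a rough illustration, the first, second and fourth categories can be sketched in present-day standard C++ (which post-dates this column). The sketch shows language and library defences rather than a run-time checking tool, and the function names are invented for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Category 1: returning a reference to a local object leaves the caller
// with a dangling reference. Shown only as a comment, because calling it
// would be undefined behaviour:
//   const std::string& bad_name() { std::string s = "local"; return s; }

// Safer alternative: return by value, so the caller owns its own copy.
std::string good_name()
{
    std::string s = "local";
    return s;
}

// Category 2: std::vector::at() checks the index and throws
// std::out_of_range rather than writing beyond the end of the object.
bool write_checked(std::vector<int>& v, std::size_t i, int value)
{
    try
    {
        v.at(i) = value;
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}

// Category 4: value-initialisation gives a known starting value instead
// of whatever happened to be left in that storage.
int initial_count()
{
    int n{};  // guaranteed to be zero
    return n;
}
```

None of this detects a leak (category 3); that still needs tool support of the kind described next.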

Salford Software C & C++

The item that I completely forgot to mention in my notes last time was that these compilers provide special support for detecting all the above categories of memory problem. This is not the place to go into details except to say that it is probably the cheapest tool for this kind of debugging, certainly much cheaper than such commercial products as `Purify'.

Symantec C++ 7.0

Note that this release requires substantial resources including 16 Mbytes of RAM (not the 8 that EXE magazine mentions). The full requirements as specified by Symantec are in the March issue of C Vu.

I was being a bit optimistic when I wrote my last column but my latest information is that version 7 will ship on March 25th (I guess those who had the sense to get to our AGM may already know that).

...if Symantec's team had turned up! - Ed.

The upgrade price is GBP89 and the special offer price (until the end of May) is GBP149 - at these prices it must be worth considering upgrading your machine to 16 Mbytes. The way things are going you are likely to need that much for sensible performance with the next generation of OS's and development tools anyway.

I mentioned last time that the parser had been disconnected from the rest of the compiler. I can now be more precise and tell you that it has been wired into the editor so that your code is being parsed while you write it. This is not intrusive in that it lets you write whatever you want to and ignores what it does not understand. Maybe we could persuade Symantec to provide an option which was a bit more intrusive (i.e., howls when you write code that will not parse).

The environment (IDE) is one of the best that I have used, though it will take those of you used to the cruder early-1990s PC IDEs some time to get to grips with it. Those coming from powerful workstation environments may wonder what is so special.

The MSWindows support is based on MFC so at least you will have a few familiar bugs and defects to work round. By the way, if your code is in any way critical you should only use MFC (and code generated using it) if you are familiar with the details of MFC. It is far too late to discover an MFC problem when running mission critical code. I, like many others, can live with the bugs in such Microsoft products as Word for Windows 6 because I do not multi-task critical software alongside it.

Those of you who think you can just install a product and jump into using it at once will find this a difficult product, but that is because such attitudes are unrealistic. If you want something better you must expect to invest some effort in learning to use the new product.

Compilers for other OS's

Before tackling OS/2 compilers, a few words about all the others available. What I need from you, the readers, is short descriptions of the compilers you use in VMS, UNIX etc. I do not have the hardware to tackle most of these and even where I do, I lack the time to master yet more operating systems before looking at the relevant development tools.

There is one exception: GNU C and G++. Sean added a comment about UNIX users expecting these to be free. This is fundamentally true but - and in a PC world it is a big but - you will still have to get a copy, as well as copies of all the other tools you will need such as debuggers, profilers etc. The cost of this in a UNIX context (remembering that UNIX was designed with programmers in mind) is very low. In addition the tools will work just about straight out of the box (well, for UNIX gurus they will).

The cost in a PC environment is quite different. Here we expect the basic commercial tools to cost a few (very few) hundred pounds. For the novice, the tools must work directly without any fiddling. It was in this context that I was suggesting that the GNU development tools were not suitable and not that low cost: the actual delivery of the free software will cost close to the price of a low-end PC C/C++ IDE such as Turbo C++.

I would still argue that the entire GNU development environment, debuggers and all, is free. However, Francis' point is well taken - GNU software does not always run "out of the box" and can therefore prove expensive to get running - Ed.

How about one of the Linux specialists writing a series on using G++ with Linux. Such a series could be at one of two levels. That for experienced UNIX users and professional programmers could focus on quality programming and tool support. On the other hand there is a place for a series for inexperienced UNIX users and part time programmers aiming at leading the reader from the start. The former would seem appropriate for publication here while the latter would, I think, better fit C Vu.

It would also be nice to see the Macintosh specialists report on the compilers available for their system. I find people often assume that everyone else will know as much as they do about what is available. It isn't true: others know more, less, or as much but about different things.

Anyone out there use Symantec, MPW or Code Warrior on the Mac? Write it up and send it to Francis to collate! - Ed.

C++ for OS/2

I wonder how many of you realised that there was some sort of order (though not a complete ordering) to the list I presented last time? Well I'm using similar criteria this time.

Free Software Foundation G++

Let me be entirely truthful about this product: I have never used it. I assume it must exist because I cannot believe that it does not. If any reader has used it, would they please write in about their experience of it? I would be particularly interested in any support given to the OS/2 GUIs.

Metaware High C/C++

The problem I have with this one is that Metaware, with extreme promptness, shipped the wrong compiler across the Atlantic. They sent a Windows NT version, which is fine, and when I get to do a round-up of Windows NT tools I'll have some relevant experience; in the meantime I can only make general comments about their compilers.

One point that is well worth keeping in mind is that there is a strong relationship between Metaware and IBM. Metaware wrote the SOM compiler for IBM and also provide a direct C++ to SOM compilation system.

What is SOM? Well that is a little complicated to answer in the current context but I'll give a brief (and I hope not too inaccurate) answer. One of the growth areas in current computing is DLLs and forms of object linking. The problem from the C++ point of view is that any change in a class declaration changes the object module so that relinking is often not enough. This is particularly problematical when your program utilises a DLL. If the DLL version does not have the same layout for classes that your code expects there will be a horrible crunch.

SOM tackles this problem by providing an extra layer of indirection in a language independent way. This means that for a relatively small overhead (less than 15%) in performance your program can use both current and future versions of other SOM conforming software.
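SOM itself is not shown here. As a rough, C++-only analogue of "an extra layer of indirection", the following sketch uses the pointer-to-implementation (pimpl) idiom: clients see only a pointer, so adding data members to the hidden part does not change the layout they were compiled against. Unlike SOM this works within a single language and says nothing about run-time version negotiation. The class and member names are hypothetical.

```cpp
#include <memory>
#include <string>

// --- What client code sees (normally a header) ---
// Widget holds only a pointer to a hidden implementation, so its size and
// layout stay the same when the implementation gains new data members.
class Widget
{
public:
    Widget();
    ~Widget();                     // defined where Impl is complete
    std::string name() const;
private:
    struct Impl;                   // declared but not defined here
    std::unique_ptr<Impl> impl_;
};

// --- The library side (normally a separate source file or DLL) ---
struct Widget::Impl
{
    std::string name = "widget";
    int added_in_version_2 = 0;    // hypothetical later addition
};

Widget::Widget() : impl_(new Impl) {}
Widget::~Widget() = default;
std::string Widget::name() const { return impl_->name; }
```

On typical implementations sizeof(Widget) is just the size of one pointer, whatever Impl contains; the cost is an extra dereference on every access, which is the same kind of trade-off Francis describes for SOM.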

The real fun starts as we move into distributed systems and support via DSOM.

Er, yes, but what does SOM actually stand for? - Ed.

Watcom C++ 10

The major advantage of this product is that you get the OS/2 version along with the MSDOS / Windows / Windows NT varieties.

As you would expect from a high quality compiler specialist, this is an excellent compiler. The IDE is pretty rudimentary, which is less significant for those who already have OS/2 development tools from which they can, to a large extent, build their own IDE.

I wish Watcom would go out and negotiate with companies such as Blue Sky and Kaseworks. Add products from these companies to Watcom compiler technology and you have something really special. The problem is that full products from these companies are expensive to buy for any but the specialist developer. Once you have tried special versions attached to a compiler you are likely to want the full product if your work merits it.

If you need to support more than one platform on an Intel x86 based machine this is a compiler you should consider very seriously.

Borland C++ 2.0 for OS/2

This is the only compiler product that I know of that supports both OS/2 GUIs and Microsoft ones.

Watcom, above, notwithstanding? - Ed.

With this release Borland includes OWL for OS/2. This is not a perfect match for OWL for MSWindows but it is a pretty good one. The product is well up to Borland's normal standard.

The down side is that it is a separate set of tools at a separate purchase price. What we really need is an x86 platform developer's CD with both these tools and the MSWindows ones together.

In the meantime, if you need to develop for both Microsoft and IBM GUIs on an Intel x86 platform this has got to be worth serious consideration. The pity is that other priorities at both Borland and Novell (I think they are still responsible) have delayed the development of OWL for Appware.

IBM C Set ++ 2.01

This is IBM's package of development tools. It is the latest release version though by the time this is published we will not be that far from the next release.

As always with products from IBM this is a solid well constructed product. I don't mean that it is entirely bug free - I don't think that there are any products of this complexity for which you can say that. However if your code does not behave the way you expect the chances are pretty high that your expectations were wrong.

Of course, with a language still under development and refinement it may be that you know about the current state of the language while this compiler is still implementing the 1992 version but even the best of firms has this kind of problem.

The development environment is among the best that I have used and the bundled KASE:Set from Kaseworks puts all the other code generators for AFXs to shame.

If you program solely for IBM platforms (OS/2 etc.) then by all means look at the other products but this is the one that you will buy. I can hardly wait to get my hands on the next version.

Conclusion

Well that's my lot. If you want to know about other compilers you will have to hope that your fellow members will send me reports to collate and publish in future issues.

I hope you can understand why I get so irritated by those who ask me what is the best C++ compiler. There is no such thing and anyone who tries to give you an answer without first checking what you want to do is too ignorant to be worth listening to.

People who answer questions without asking any of their own are unlikely to provide useful answers.

Francis Glassborow

No such thing as a free lunch

by Alan Griffiths

Introduction

C++ is a wonderfully expressive language but it places stringent demands upon the developer's competence. In doing this it imposes a cost on any development using C++ which has to be balanced against the benefits offered by the capabilities of the language. Expressive power and skill are often linked - a violin is harder to play than a Stylophone but can, in the hands of a virtuoso, produce music that is in a different class. However some of the difficulties associated with C++ are not caused by its capabilities, they are caused by the way in which the language has evolved. In particular: the need for compatibility with the past has brought such baggage as the C declaration syntax; while the "don't pay for things unless they're used" principle has led to such costly default options as static linkage of member functions.

I have used a wide range of programming languages over the last twenty years; C++ is unique both in the facilities it offers and in the continuing effort required to use it competently. I don't mind the effort needed to use the expressive power of the language but the effort required to circumvent soluble problems is a continual irritation. In short C++ programming is not only hard, but also harder than it needs to be.

I am not saying that programming in C++ is wrong; far from it - I frequently need its power of expression, but this power often comes at an excessive cost. It takes considerable practice on the violin to play a tune (I can't), but anyone can play one on a Stylophone (at least I can). The other difference is that there are many more ways to play the tune on the violin - the results may be much better but the cost is higher. It is always necessary to consider the costs, and C++ is pricing itself out of the market. If I have a program to be written and a choice between a trainee programmer using Visual Basic for a couple of weeks and an experienced C++ programmer for a couple of weeks (or an inexperienced one for a few months), which route am I going to take? The Visual Basic program may not be as elegant or efficient, but it is far cheaper.

Having just made some claims about the unnecessary cost of using C++ I should come up with some justifications! A continual problem for me is the unhelpful defaults of many features of the language, for instance:

* member functions don't default to virtual;

* default constructors, copy constructors, and assignment operators are generated automatically.

Other problems for the developer are caused by:

* the lack of a syntax for referring to classes by their relationships ("my base class"),

* with the addition of "exception handling" C++ is no longer a "better C", and

* constraints on the program that cannot be checked automatically (e.g., the "one definition rule").

Allow me to elucidate...

Non-virtual default for member functions

The static linkage of member functions (and destructors) is really an optimisation, and any optimisation choices really belong to the latter stages of the development cycle (that is, they should not be imposed as a cost throughout the whole of program development). If member functions were declared "virtual" by default then, when it becomes apparent that a function needs to be overridden by a derived class, there would be no need to amend the original class and recompile it and everything that references the class declaration.
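A minimal sketch of what the default means in practice (the class names are invented for the example): a derived class can redefine a non-virtual function, but a call through a base-class reference still binds statically to the base version. Getting overriding behaviour requires editing the base class to add virtual, with the recompilation Alan describes.

```cpp
#include <string>

struct Shape
{
    std::string plain() const { return "Shape"; }            // statically bound: the default
    virtual std::string dynamic() const { return "Shape"; }  // dynamically bound
    virtual ~Shape() {}
};

struct Circle : Shape
{
    std::string plain() const { return "Circle"; }            // hides Shape::plain, does not override it
    std::string dynamic() const override { return "Circle"; }
};

// Both calls go through a reference to the base class.
std::string call_plain(const Shape& s)   { return s.plain(); }
std::string call_dynamic(const Shape& s) { return s.dynamic(); }
```

Passing a Circle to call_plain yields "Shape", while call_dynamic yields "Circle".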

The default is "justified" on the basis that the overhead of a virtual function call is avoided except where explicitly requested. However, I cannot believe that the cost of dynamic binding is significant in the majority of cases. In speed terms, suppose that dynamic binding adds 20% to the function call overhead and that 10% of the program's execution time is spent in function call overhead: this is almost certainly an overestimate and still only gives a 2% performance hit (20% of 10%). Of more relevance are small classes that have large numbers of instances. These may not be able to stand the overhead of a vtable reference in the memory layout of the class.
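The space cost for small classes can be seen directly. The language does not specify how dynamic binding is implemented, but mainstream compilers add a hidden vtable pointer to every object of a class with virtual functions. This sketch assumes that common implementation and uses hypothetical class names.

```cpp
#include <cstddef>

// Two small classes that differ only in whether a member function is virtual.
struct PlainPoint
{
    int x, y;
    int get_x() const { return x; }
};

struct VirtualPoint
{
    int x, y;
    virtual int get_x() const { return x; }
};

// Extra storage, in bytes, for n objects when the virtual version is used.
std::size_t extra_bytes(std::size_t n)
{
    return n * (sizeof(VirtualPoint) - sizeof(PlainPoint));
}
```

On a typical 64-bit compiler the virtual version is twice the size of the plain one, so a million points carry several megabytes of vtable pointers.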

Before anyone writes in and tells me that I should just put virtual before almost all member function declarations, let me point out that this is exactly my argument: it is the need to know that this is desirable, and the time spent overriding the language default, that are the unnecessary costs.

In addition, (and this is common to a number of the other points) it is impossible to override the defaults in library code that is outside my control. To cite a particular example of a problem library: there are a number of classes in the MFC library that should (allegedly :-) have virtual destructors but don't. If the default were "correct" then this would be very unlikely to have happened. It is not just Microsoft that make this error - it is also a problem with the current draft of the proposed "Standard Library".
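The reason a missing virtual destructor matters is that deleting a derived object through a base-class pointer is only well defined when the base destructor is virtual. A minimal sketch with hypothetical names, using a counter to show that the derived destructor actually runs:

```cpp
// Counts how many times ~Derived has run.
static int derived_destructions = 0;

struct Base
{
    virtual ~Base() {}   // virtual, so deleting through Base* also runs ~Derived
};

struct Derived : Base
{
    ~Derived() { ++derived_destructions; }
};

void destroy_through_base()
{
    Base* p = new Derived;
    delete p;            // without the virtual destructor this would be undefined behaviour
}
```

Remove the virtual from ~Base and the same code compiles cleanly but no longer has defined behaviour; that is the trap in a library whose classes you cannot edit.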

The "big three"

There are many classes for which the automatic generation of the "big three" (the default constructor, the copy constructor, and the copy assignment operator) is a positive menace. If, for example, a pointer to dynamic memory is not initialised (generated default constructor), or is "bit copied" (generated copy constructor or assignment operator) and then "deleted" in the destructor, then memory management is compromised and there are no guarantees of subsequent program behaviour.

The committee recently clarified that the generated copy constructor and copy assignment operator perform memberwise copy and memberwise assignment respectively. Such copying or assigning of an uninitialised value causes undefined behaviour so you may not even get to your destructor - Ed.

Any class that manages a resource needs to declare the "big three" to avoid problems. Of course, changing the language to prevent automatic generation for classes that contain pointers (or members or base classes without the corresponding functions) would leave the programmer to code copy constructors and assignment operators by hand, which brings problems of its own.
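A minimal sketch of a resource-managing class that declares the "big three" (plus the destructor) itself. The class name Buffer is hypothetical. The generated versions would copy the pointer, leaving two objects that both delete the same array.

```cpp
#include <algorithm>
#include <cstddef>

class Buffer
{
public:
    // Default constructor: always leaves data_ pointing at owned, zeroed memory.
    explicit Buffer(std::size_t n = 0) : size_(n), data_(new char[n]()) {}

    // Copy constructor: deep copy rather than copying the pointer.
    Buffer(const Buffer& other) : size_(other.size_), data_(new char[other.size_])
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Copy assignment: allocate and copy first, so a failed new leaves *this intact.
    Buffer& operator=(const Buffer& other)
    {
        if (this != &other)
        {
            char* fresh = new char[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    char& operator[](std::size_t i) { return data_[i]; }
    const char& operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;   // declared before data_, so initialised first
    char* data_;
};
```

Because every copy owns its own array, modifying a copy leaves the original untouched and each destructor deletes a different block.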

Naturally, tools like "lint" can be used to check for these functions (and some of the other problems mentioned). However, the need for such aids complicates the development process and (as mentioned above) does not help if it is library code in error.

Referring to related classes

Coding copy constructors and assignment operators "by hand" is difficult because there is no syntax for navigating the network of base classes. The lack of a syntax for "base class of this class" also leads to problems with maintaining inheritance trees in cases where derived classes supplement the behaviour of virtual functions by explicitly calling the corresponding function in the base class.

It would be nice to say, for instance, "the direct base class with this function", but instead one must identify the specific base class whose member function is to be called and hope that anyone adding a class between them in the inheritance graph updates the reference. C++ would be simpler to use if this process were automated. (Of course, if one gets the design right first time...)
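A minimal sketch of the maintenance hazard, with hypothetical class names: Derived was written when it inherited directly from Base, then Middle was inserted between them. The hard-coded Base:: call still compiles, but it silently bypasses Middle's contribution.

```cpp
#include <string>

struct Base
{
    virtual std::string describe() const { return "Base"; }
    virtual ~Base() {}
};

// Added later, between Base and Derived in the inheritance graph.
struct Middle : Base
{
    std::string describe() const override { return Base::describe() + "+Middle"; }
};

// Re-parented onto Middle, but the explicit Base:: call was not updated.
struct Derived : Middle
{
    std::string describe() const override { return Base::describe() + "+Derived"; }
};
```

Derived::describe returns "Base+Derived" rather than the intended "Base+Middle+Derived", and nothing in the language flags the mistake.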

No longer a better C

For a large part of its development period it has been possible to treat C++ as "a better C", which provides a pool of programming resources. Although ex-C programmers may not produce ideal C++, they could be productive and be gently introduced to C++ programming constructs during the course of a development. (One such programmer, after a few days spent coding some functions with "C++" names such as AClass::AClass and AClass::method, was asking how one went about writing a class. He took some convincing that he had already written most of one.)

The advent of "exception handling" changed all that. This flow control mechanism affects every piece of code and needs to be understood by the programmer. As indicated above it is possible to produce correct code without a clear understanding of the "class" mechanism. However, a lack of understanding of "exception handling" is far too likely to lead to problem code like the following:

void f()
{
    char* buf1 = new char[100];
    char* buf2 = new char[100];

    if (buf1 && buf2)
    {
        // Something
    }

    delete [] buf2;
    delete [] buf1;
}
This is now badly broken - if an exception is thrown anywhere between initialising buf1 and deleting it, then the memory that it references will "leak". Of course, on many platforms losing a few bytes like this may not be an issue, but the same problem exists with more complex objects and other types of resource.

Some other languages that use exception handling also include "garbage collection" which trades these problems for another, more intractable set (when you find you have insufficient control over the "garbage collection" process you have no options). In C++ the code can be fixed (below) but the style seems less natural to those moving from C or early C++ implementations:

#include <cstddef>  // NULL

void g()
{
    char* buf1 = NULL;
    char* buf2 = NULL;
    try
    {
        buf1 = new char[100];
        buf2 = new char[100];

        if (buf1 && buf2)
        {
            // Something
        }

        delete [] buf2;
        delete [] buf1;
    }
    catch (...)
    {
        delete [] buf2;
        delete [] buf1;

        throw;
    }
}
Naturally, this is not the only solution, but unless you wish to obscure meaning by avoiding the direct use of pointers in this type of code, the alternatives are equally long-winded.

The "One Definition Rule"

I'm not sure of the current phrasing of the "One Definition Rule" - the draft Standard makes it clear that "clarification" is taking place. It says something to the effect that there may only be one definition of any entity within a program, and if not the behaviour of the program is undefined. It also adds the helpful information that the development environment need not offer any diagnostic message.

This means that if both you and the developer of a library you are using decide to define the same "entity" then there need be no diagnostic and the program could do anything! Just imagine what trying to police such a requirement without diagnostic aids does to your development costs.

In conclusion

As I said at the beginning, "C++ is a wonderfully expressive language" - it is; it allows a wider range of programming idioms and algorithms than any other language that I've encountered. The downside of C++ is the need for a much higher level of competence in using it. If C++ had a different history, or there were less focus on "don't break existing code" these problems could be addressed.

At the time of writing the language standardisation process has reached a stage where the chance of fixing any of these problems is remote. The cost will now fall on the developer.

Alan Griffiths

Subsidising lunch? - a reply

by Sean A. Corfield

First of all, let me say that I think Alan makes an excellent point about the demands that C++ places on developers. There is no doubt that the learning curve for a language as complex as C++ is much steeper than for, say, C. It may not be so clear-cut that the benefits are correspondingly higher too and so I shall not attempt to argue that point. I shall, however, put on my compiler-writer / X3J16 hat and respond to several of Alan's more specific points.

A non-virtual cost

Alan argues against the non-virtual default for member functions and estimates a 2% performance penalty for using virtual everywhere instead. Typically, 1 in 5 instructions in generated code are function calls. Even assuming calls are no more expensive than ordinary instructions (and they often are), a program will spend about 20% of its time calling functions. On a particular machine, a function call instruction takes 2 cycles - what overhead does a virtual call add? First, you have to load the address of the vtable from the object, which takes 3 cycles. Then you have to load the address of the function from that table - another 3 cycles. Plus the call. This quadruples the cost of the call (2 + 3 + 3 = 8 cycles instead of 2). If half of all the function calls were virtual, this would add 30% to program execution time. Moving to another machine, the call takes 3 cycles compared to a load (average 6 cycles) and an indirect call (average 10 cycles) - a factor of more than 5 on the call, and an overall penalty of about 40% on the program. Of course, in these days of faster processors, even factors such as these should not matter too much...

My thanks to Derek Jones for providing typical execution times on two very different architectures - Ed.
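Both estimates can be read as instances of one simple model: only the virtual calls get slower, each by (cost ratio - 1) times its original cost. The function below is a sketch of that reading, an assumption rather than either author's stated formula; for the second machine it takes the virtual call as 6 + 10 = 16 cycles against 3.

```cpp
// Fractional slowdown when some calls become virtual.
//   call_share    - fraction of run time spent in function calls
//   virtual_share - fraction of those calls that are virtual
//   cost_ratio    - cost of a virtual call divided by a plain call
double virtual_overhead(double call_share, double virtual_share, double cost_ratio)
{
    // Only the virtual calls slow down, each by (cost_ratio - 1) times its old cost.
    return call_share * virtual_share * (cost_ratio - 1.0);
}
```

Alan's figures (0.10, 1.0, 1.2) give about 0.02; Sean's first machine (0.20, 0.5, 8.0 / 2.0) gives 0.30; his second (0.20, 0.5, 16.0 / 3.0) gives about 0.43, which the text rounds to 40%. The gap between 2% and 30% comes almost entirely from the assumed call share and cost ratio.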

As for the draft Standard Library making the mistake of using non-virtual destructors - I can't think of any library classes that are intended to be used as base classes, with the exception (sic) of the exception class hierarchy which does have virtual destructors.
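Sean's observation can be checked mechanically against a modern library with std::has_virtual_destructor from <type_traits>, a C++11 facility that post-dates this discussion. A minimal sketch:

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Classes designed as bases declare virtual destructors...
constexpr bool exception_has_virtual_dtor = std::has_virtual_destructor<std::exception>::value;
constexpr bool runtime_error_has_virtual_dtor = std::has_virtual_destructor<std::runtime_error>::value;

// ...while the containers and std::string are not meant to be derived from.
constexpr bool vector_has_virtual_dtor = std::has_virtual_destructor<std::vector<int>>::value;
constexpr bool string_has_virtual_dtor = std::has_virtual_destructor<std::string>::value;
```

The exception classes report true and the container and string classes report false, which fits the view that only the exception hierarchy is intended for use as a base.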

Base class names

In one OO-language, you can refer to a base class with the keyword inherited. This was proposed for C++ by Dag Brück some years ago (see Stroustrup's Design and Evolution book for details). The proposal was not accepted for two reasons. Firstly, what happens if you have multiple base classes? Secondly, there was already a way to do this within the language:

class Base
{
public:
    void f() {}
};

class Derived : public Base     // #1
{
public:
    typedef Base inherited;     // #2
    void f() { inherited::f(); }
};

Admittedly, this suffers from the multiple base class problem too, and if you change #1 without changing #2...

Exceptions break everything

I'd love to be able to argue with Alan on the negative impact of exception handling but, unfortunately, it's even worse than he indicated! Let's look again at the "fixed" version of his example:

#include <cstddef>  // NULL

void g()
{
    char* buf1 = NULL;
    char* buf2 = NULL;
    try
    {
        buf1 = new char[100];
        buf2 = new char[100];

        if (buf1 && buf2)
        {
            // Something
        }

        delete [] buf2;
        delete [] buf1;
    }
    catch (...)
    {
        delete [] buf2;
        delete [] buf1;

        throw;
    }
}

Is this fixed? Not quite! What happens if new fails? It throws an exception and does not return. In the example above, testing that buf1 and buf2 are not null pointers is redundant. In fact, it makes no difference in the above case but the fact that new throws bad_alloc instead of returning zero will "break" almost every program written before exception handling. One common trick in use today is to add the statement:

set_new_handler(0);

near the beginning of main(), which on many implementations sets the behaviour of global operator new back to the "old" behaviour. This was not portable and in Austin (March '95) the committee voted to remove this "hack" and provide a standard way to use new without having to deal with exceptions - see The Casting Vote in this issue for more details.
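For comparison, here is a sketch of the two styles as they appear in standard C++ as eventually published: plain new reporting failure by throwing std::bad_alloc, and the non-throwing form new (std::nothrow), which returns a null pointer instead. Whether this is exactly the form voted in at Austin is a matter for The Casting Vote rather than this sketch; the function names are hypothetical.

```cpp
#include <cstddef>
#include <new>

// Exception style: a failed new-expression throws std::bad_alloc,
// so a null-pointer test after plain new can never fire.
char* allocate_or_null_throwing(std::size_t n)
{
    try
    {
        return new char[n];
    }
    catch (const std::bad_alloc&)
    {
        return nullptr;   // translate the exception back into the old convention
    }
}

// Non-throwing style: the nothrow form returns a null pointer on failure.
char* allocate_or_null_nothrow(std::size_t n)
{
    return new (std::nothrow) char[n];
}
```

With the nothrow form, tests such as if (buf1 && buf2) become meaningful again; with plain new they are redundant.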

One solution to this problem is to embrace the "resource acquisition is initialisation" idiom, where the "resource", in this case memory, is "acquired" by a constructor and released by the corresponding destructor. The draft Standard Library provides several ways to do this - for the example above, it would be more "natural" to use the vector template class:

#include <vector>

void g()
{
    std::vector<char> buf1(100);
    std::vector<char> buf2(100);
    // Something
}

This does mean, of course, that you need to "know" even more about C++ and its library but the benefits are more maintainable programs since you no longer clutter up functions with error-prone housekeeping code.
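To see that the vector version really is exception safe, here is a sketch that stores a hypothetical counting type instead of char and throws part-way through. Stack unwinding runs both vectors' destructors, so every element is destroyed and the count returns to zero.

```cpp
#include <stdexcept>
#include <vector>

static int live = 0;  // number of Tracked objects currently alive

struct Tracked
{
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};

// Both vectors own their elements; if anything throws, their destructors
// still run during stack unwinding, so nothing leaks.
void g_with_vectors(bool fail)
{
    std::vector<Tracked> buf1(100);
    std::vector<Tracked> buf2(100);
    if (fail)
        throw std::runtime_error("something went wrong");
}
```

No try block or catch (...) clean-up is needed; the housekeeping lives in the destructors.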

Just One Definition?

Alan complains that no diagnostic is required for a violation of the "One Definition Rule" which is a reasonable complaint, but let us look back at C first. The ODR corresponds roughly to the link-model used in C: if you provide more than one definition of a function or object at link-time, it causes undefined behaviour. So we appear to have made no progress over C. Wait a minute though - what about C++'s "type-safe" linkage, you ask? Consider the following:

/* file1.c */
int a;

/* file2.c */
void a();

int main()
{
    a();
}

On some systems, a C compiler will successfully link this program because it uses only names for linkage, not types. Some systems might give a link-time message - I once saw the very mysterious "too far to jump" message from a linker presented with the above code. Now consider a C++ system: it typically encodes a function's calling sequence into the name. This means that the link-names of the variable a and the function a will be different, so the linker cannot quietly match the call to the variable and will instead report the function as undefined. So C++ has actually helped us here!

My conclusion

At the end of the day, I basically agree with Alan - C++ is harder to use than C - and I think his comparison between a Stylophone and a violin is well-drawn. I don't blame the language (and I don't really think Alan does either) - I blame IT management for giving everyone a violin and saying "right, now play a tune!" What C++ highlights is the need for better training, better tools and more realistic expectations.

Sean A. Corfield

Operators - an overloaded menace

by George Wendle

I have received some private comments about George's last column so I feel compelled to explain my position: like Francis for CVu, I do not edit George's column (other than to correct typos) which means he may well be more controversial than you care for - he also might be completely wrong! That is for you, dear reader, to decide. I hope that George's columns will encourage several of you to respond - in the past, a particularly barbed attack on the C++ standards committee (CVu5.6) caused me to write a somewhat outraged response (CVu6.1) - Ed.

I like C++; it has the potential to be a great language, but it is also exceptionally complicated, almost, I think, to a degree where the designers themselves do not understand the implications of their decisions.

What I would like to see is a concerted effort to simplify the language itself and make it easier to use with predictable results - predictable, that is, to the ordinary working programmer not just to balding whiz kids.

The language designers seem prone to introducing things that make their lives easier, often by allowing compilers to implicitly support something which would otherwise have to be made explicit.

One area that is a minefield of unwanted complexity is that of overloading. What is so wrong about forcing programmers to disambiguate close decisions? Doing so might persuade them to look more carefully at their designs and reconsider the degree to which they overload things. By the way, it would be no bad thing if the designers reversed their habit of overloading new, subtly different, meanings onto keywords like static. Actually that keyword is a complete disaster akin to the term chosen for new style function declarations: "prototypes". Both words are already in active use in computer science for other purposes.
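To make George's complaint concrete, here is a sketch of three of the distinct meanings static carries in C++ (static member functions add a fourth). All names are invented for the example.

```cpp
// 1. At namespace scope: internal linkage (visible only in this file).
static int file_local = 1;

// 2. Inside a function: a single object that persists between calls.
int next_ticket()
{
    static int ticket = 0;   // initialised once, not on every call
    return ++ticket;
}

// 3. In a class: one member shared by the whole class, not one per object.
struct Counter
{
    static int instances;    // declaration
    Counter() { ++instances; }
};
int Counter::instances = 0;  // the single definition
```

The first is about linkage, the second about lifetime, and the third about class-wide sharing: three unrelated ideas under one keyword.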

Enough of this preamble. Let me come to the point of this article: overloading, and specifically operator overloading. Before dealing with the latter, let me take a quick look at function overloading.

Function overloading - a harmless convenience

I must admit that I think the idea of function overloading is quite elegant, even if it is generally unnecessary. Bjarne Stroustrup writes in his book "The Design and Evolution of C++" that the idea arose from the need to provide multiple versions of a class's constructor function. There are other solutions to this problem but I agree that function overloading is a `nice' answer. Once you introduce it for that reason you might as well make it a general facility.
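As a concrete illustration of the constructor case, here is a minimal sketch. The class `Temperature`, its members and the `'C'` scale flag are all hypothetical. The three constructors share one name and are told apart only by their parameter lists.

```cpp
#include <cassert>

// Hypothetical class: three constructors, distinguished purely by parameters.
class Temperature {
public:
    Temperature() : kelvin_(273.15) {}                        // default: freezing point of water
    explicit Temperature(double kelvin) : kelvin_(kelvin) {}  // a value in kelvin
    Temperature(double value, char scale)                     // 'C' means Celsius, anything else kelvin
        : kelvin_(scale == 'C' ? value + 273.15 : value) {}

    double kelvin() const { return kelvin_; }

private:
    double kelvin_;
};
```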

Once you have function overloading you need a method to resolve uses of an overloaded function name. The first part is to collect all the candidates for the decision.

The rule is currently simple (I say `currently' because I do not understand namespaces well enough to be sure that it will remain simple in future).

Start by examining the current scope, remembering that where the call is to a member function - always identifiable because an object or pointer to object will decorate the call - the initial scope is that class's scope.

Search that scope for all declarations of the required identifier. If any are found, they form your complete candidate set.

Otherwise repeat the process for each scope containing that scope.

Keep going until you either obtain a candidate set or have failed while searching the global scope.
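Here is a small sketch of the "first scope that declares the name wins" rule. It assumes that a namespace stands in for the inner scope; the same hiding applies to block and class scopes. The names `describe`, `inner` and `call_with_int` are hypothetical. Because `inner` declares its own `describe`, lookup from inside `inner` stops there, even though a global overload would be an exact match.

```cpp
#include <cassert>
#include <string>

// Two overloads at global scope.
std::string describe(int)    { return "global describe(int)"; }
std::string describe(double) { return "global describe(double)"; }

namespace inner {
    // 'describe' is declared in this scope, so lookup stops here: this single
    // declaration is the whole candidate set for calls made inside 'inner'.
    std::string describe(double) { return "inner::describe(double)"; }

    std::string call_with_int() {
        return describe(42);  // 42 is converted to double; ::describe(int) is never considered
    }
}
```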

In the next stage trim the candidate set to those that have the right number of parameters (being careful to leave in appropriate versions of declarations that fit by using default parameters.)

Now look to see whether one of the candidates has parameter types exactly matching those of the arguments in the call. If so, use it. (If two match at this stage, take the programmer out and shoot him/her - it's probably an acne-ridden male teenage bedroom whiz-kid hacker, but to say so would make me guilty of so many -isms that the PC world would put out a contract on me.)

Don't worry George, the PC police do not roam the pages of Overload - Ed.
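The trimming and exact-match stages can be seen in a sketch like the following; the `pick` overloads are hypothetical. A two-argument call keeps the three-parameter version in the running because its last parameter has a default. Overload resolution then prefers that version because both arguments match exactly. The closing comment shows one way two exact matches can collide.

```cpp
#include <cassert>
#include <string>

std::string pick(int)               { return "pick(int)"; }
std::string pick(int, int, int = 0) { return "pick(int, int, int = 0)"; }  // viable for 2 or 3 arguments
std::string pick(double, double)    { return "pick(double, double)"; }

// Two candidates can tie even though both match exactly, for example:
//   void h(int);  void h(int&);  int x = 0;  h(x);   // error: ambiguous
```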

If not, you will have to go into best-fit mode and start playing games with type conversions. This stage needs drastic simplification because the rules are just too fine-grained for good sense. They may mean that ambiguity rarely arises, but they also mean that the resolution is sometimes not the one you expected, leaving a subtle defect in your work. I much prefer a compiler that requires me to be more precise to one that second-guesses me. Now that we have a range of new-style casts, disambiguating by casting an argument is much less dangerous.
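Two small cases illustrate both halves of the complaint; the overloads `g` and `h` are hypothetical. A string literal prefers `bool` over `std::string`, because the pointer-to-`bool` standard conversion outranks the user-defined conversion to `std::string`. That is rarely what the author intended. Conversely, an `int` argument to `h` is ambiguous, and a new-style `static_cast` settles it explicitly.

```cpp
#include <cassert>
#include <string>

std::string g(bool)               { return "g(bool)"; }
std::string g(const std::string&) { return "g(string)"; }

std::string h(long)   { return "h(long)"; }
std::string h(double) { return "h(double)"; }

// g("hello")               picks g(bool): a standard conversion beats a user-defined one
// h(1)                     error: int->long and int->double are equally ranked conversions
// h(static_cast<long>(1))  resolves the ambiguity by making the argument an exact match
```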

The end result is that function overloading is fine. You only have to use it for constructors. If we shout loud enough the granularity of resolution might be coarsened or one of the providers of support tools might provide a tool that would warn of close calls.

Noted :-) - Ed.

Good programmers (usually those whose employers have supported them with training and time to develop skills) will use function overloading with care. Bad programmers, well, I doubt that anything will make them better (but see my column in CVu7.3).

Operator overloading

I bet you thought this was just a variety of function overloading. You could not be more mistaken. It is completely different, it is in the language for different reasons and it has its own overloading rules. These are so complicated that I am not sure that I fully understand them myself, so feel free to write in tearing the following to shreds. With Sean Corfield as editor I am sure he will act as referee and prevent any actual spilling of blood.

Before we start providing any overloading on operators, the language has a fully defined set of operators, each appropriately overloaded (or not provided if inappropriate) in the context of the built-in types. Whatever mechanism implementors use to support these operators, it is inaccessible.

On the other hand, programmers who wish to overload an operator must do so by providing a function to do the work. Despite the slightly eccentric form of such an operator function, it is a function and is subject to exactly the overloading rules that pertain to other functions.

This can lead to some weird behaviour. Consider the following:

void fn()
{
    int i;
    i = 1 + 2;              // the RHS will, I think,
                            // be statically evaluated
                            // by the compiler.
    i = operator+ (1,2);    // does what?
}

Well, that explicit call to the operator+ function won't be able to call the normal `+' for ints because no such function exists (well, it may be an implicit function provided by the compiler implementor, but we cannot use that). Instead it will have to search the global scope for any available user-provided versions. These certainly will not be for two int arguments, because the language rules explicitly forbid users from providing their own versions for parameter lists that do not include any user-defined types.

That rule is, in itself, an error because it prevents users from providing their own mixed mode arithmetic via operators. One of the eccentricities of C++ is the automatic type conversion rules it inherited from C and this rule prevents me from fixing that.

Actually, the language doesn't forbid this - but only when at least one operand is a user-defined type is the full search performed, otherwise only built-in operators are considered - Ed.
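The restriction can be seen directly in a sketch like this one; the enumeration `Flags` and its enumerators are hypothetical. An operator function over two `int`s is rejected, but one taking an enumeration is accepted. Once declared, it can also be called explicitly by its function name.

```cpp
#include <cassert>

// int operator+(int, int);  // ill-formed: an operator function needs at least
//                           // one parameter of class or enumeration type

enum Flags { flag_none = 0, flag_read = 1, flag_write = 2 };

// Allowed: both operands are of a user-defined (enumeration) type.
Flags operator|(Flags a, Flags b) {
    return static_cast<Flags>(static_cast<int>(a) | static_cast<int>(b));
}
```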

Next case. Consider:

void fn(){
    MyType m(...);          // initialised with
                            // appropriate values
    int i;
    i = m + 1;              //A
    i = operator + (m, 1);  //B
    i = m.operator + (1);   //C
}

At line A the compiler first looks in the scope of MyType to see if I have provided an operator+ function.

If I have, it starts the normal process of overload resolution, but what is the candidate set? Only those in the current scope? Those in the current scope and the built-in ones? Those in the current scope, built-ins and globals? All those from the current scope outwards through all enclosing scopes to global scope?

If you truly know the answer to this question, I take my hat off to you. I don't. Of one thing I am certain: the normal name-hiding rules for nested scopes do not apply to operators. They cannot, or else declaring an operator would hide and inhibit the use of all versions in enclosing scopes.

Now suppose that as well as an in-class definition of MyType::operator+(MyType) there is a file scope (or wider) definition of operator+(MyType, int). Under what circumstances will this exact match be found? Only if no resolution (however bad) can be found in class? Never (i.e., the in-class version hides the other)? Always?
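For what it is worth, the rule eventually adopted in standard C++ answers the last question with "always". For an expression such as `m + 1`, member candidates, non-member candidates and built-in candidates are pooled and ranked together, so an in-class operator does not hide an exact non-member match. A sketch with a hypothetical `MyType` shows this. The explicit member call, by contrast, sees only members.

```cpp
#include <cassert>
#include <string>

struct MyType {
    int value;
    MyType(int v) : value(v) {}  // converting constructor: int -> MyType
    // In-class operator taking a MyType right operand.
    std::string operator+(const MyType&) const { return "member operator+(MyType)"; }
};

// Namespace-scope operator: an exact match for m + 1.
std::string operator+(const MyType&, int) { return "non-member operator+(MyType, int)"; }
```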

Suppose that MyType provides a conversion to YourType. When will versions of operator+ with YourType as the left operand be considered?

Now let me turn to line B above (explicit call to operator+). I assume that this can only consider versions provided in the scope where it is used or in some outer containing scope. However I have to confess that I am not entirely sure of this.

Whether I am right or wrong, it is certainly the case that the explicit use of an operator function will result in quite different overload rules from those that are used when I use the operator itself.
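The difference shows up even in the simplest case. In this sketch, the class is a hypothetical stand-in for `MyType` whose only `operator+` is a member. The operator syntax of line A and the explicit member call of line C both compile. The explicit non-member call of line B does not, because nothing named `operator+` is visible outside the class.

```cpp
#include <cassert>

// Hypothetical type with only an in-class operator+.
struct MyType {
    int value;
    explicit MyType(int v) : value(v) {}
    int operator+(int rhs) const { return value + rhs; }
};

int line_a(const MyType& m) { return m + 1; }             // A: operator syntax finds the member
// int line_b(const MyType& m) { return operator+(m, 1); }  // B: error, no non-member operator+ exists
int line_c(const MyType& m) { return m.operator+(1); }    // C: explicit member call
```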

Obviously line C only searches within the scope of MyType and its enclosing scopes. Obviously? What about the case where MyType contains an operator YourType() function? Of course you already know the rules for this situation. You do, don't you? Oh well, perhaps I oversimplified the rules for overloading functions, or did I?
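Under the standard rules, a conversion operator brings non-member operators on the target type into play for the operator syntax, but not for an explicit member call. In this hypothetical sketch, `m + 1` reaches `operator+(const YourType&, int)` through `MyType::operator YourType()`. If that operator had been declared only in a namespace associated with `YourType`, and so were not visible by ordinary lookup, argument-dependent lookup on a `MyType` argument would not have found it.

```cpp
#include <cassert>
#include <string>

struct YourType { int v; };

// Namespace-scope operator whose left operand is a YourType.
std::string operator+(const YourType&, int) { return "operator+(YourType, int)"; }

struct MyType {
    // User-defined conversion: MyType -> YourType.
    operator YourType() const { return YourType{7}; }
};

// m + 1          calls operator+(const YourType&, int) via the conversion
// m.operator+(1) error: MyType has no member named operator+
```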

Questions, questions, everywhere a question

Have you noticed how many questions I have asked above? Some I know the answer to, some I don't, but my knowledge is irrelevant. The important thing is how much we can expect from a competent programmer like yourself. I bet that if I gave my questions to two C++ experts I would get two sets of answers that differed in at least one instance.

Even those who can answer all the above consistently may find that they are not so sure when we throw namespaces and templates into the mixture. When we get operators defined in template classes, or worse still are offered template operator functions, we really do need a very clear understanding of the overloading rules for both functions and operators.
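Templates add a further twist, sketched here with a hypothetical `Box` template. Template argument deduction does not consider user-defined conversions. As a result, a template `operator+` that happily accepts `Box<int>(2)` rejects a plain `2`, even though `Box<int>` has a converting constructor from `int`.

```cpp
#include <cassert>

// Hypothetical class template with a converting constructor.
template <typename T>
struct Box {
    T value;
    Box(T v) : value(v) {}
};

// A template operator: T must be deduced from both operands.
template <typename T>
Box<T> operator+(const Box<T>& a, const Box<T>& b) { return Box<T>(a.value + b.value); }

// Box<int>(1) + 2 does not compile: T cannot be deduced from the int argument,
// and deduction never tries the converting constructor.
```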

Conclusion

In the meantime I think we should all be very wary of overloading operators. I think we can just about live with the provision of in-class operators as long as they really do represent the natural expectations of naive users of that class.

On the other hand, I think that any global provision of operators is highly dangerous. Frankly, I would like to see producers of class libraries completely avoid the provision of out-of-class operators. If they must provide them, please do so by providing the functionality in-class and wrapping it up in an inline function (see Francis Glassborow's article in Overload 6). Such inline operator functions should be in a separate header file so that the user determines their availability not the library provider.
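Here is a minimal sketch of that arrangement, assuming a hypothetical `Money` class. The arithmetic is an ordinary member function. The operator is a one-line inline wrapper that would live in its own optional header; the header name in the comment is invented.

```cpp
#include <cassert>

class Money {
public:
    explicit Money(long pence = 0) : pence_(pence) {}
    // The real work lives in-class, under an ordinary member-function name.
    Money& add(const Money& other) { pence_ += other.pence_; return *this; }
    long pence() const { return pence_; }
private:
    long pence_;
};

// --- would go in a separate, optional header (e.g. a hypothetical "money_ops.h") ---
// A thin inline wrapper: the client, not the library, decides whether it is visible.
inline Money operator+(Money lhs, const Money& rhs) { return lhs.add(rhs); }
```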

The rules for operator overloading need to be cleaned up and made comprehensible to mere mortals such as me. Until they are, the best advice is `do not use them; they will introduce unexpected behaviour into your work and that of your clients'.

Finally, could our new editor (congratulations on your first issue) either write a detailed explanation of overloading or commission some other expert to do so. I guess it might even take several issues.

George Wendle

Thank you, George. A detailed explanation of overloading would be very likely to fill several issues of Overload! Perhaps I'll take up the challenge after I finish my cOOmpiler series, or maybe I can persuade someone else to write a series on overloading? Just to add more spice to the issue, the Standards committee have been making changes to operator name lookup too - see my Casting Vote column in this issue - Ed.

Overloading on const and other stories

by Kevlin Henney

I read George Wendle's article "Overloading on const is wrong" in Overload 6 with great interest. I have always been a keen advocate of const and the idea of const-correctness in code: it permits the visible expression of certain design level decisions in code for the benefit of both the compiler and the human. So where should we draw the line: why should some member functions be const and not others, what are the exceptions to the rule, and would you like biscuits with your const?

I thought that was "would you like fries with your const?" - Ed.

Sean also raised an issue in reply to an old letter of mine. Why should the assignment operator return a non-const reference to its left hand operand?

Overloading on const

George cited a few examples where overloading on const arguments appeared to be a bad idea. The only problem I have with these is that they did not appear to be real examples:

void fn(D&);
void fn(const D&);

Looking over my code, I only ever use const overloading in the context of a class and I have been unable to find any functions overloaded on const that do not differ in either return type or argument count. Clearly something interesting, and hopefully useful, is going on if I feel the accessibility of the current object should dictate the result type. George cites the classic example of operator[]. Providing a subscript operator for a vector, string or map class is practically a fundamental requirement:

string motd = "hello";
motd[0] = 'j';

What such an operator must also ensure is the preservation of const-ness. Consider a string class with only one subscript operator:

char& operator[](size_t) const;

If it did not return a reference, the change to motd above would not be possible. However, not declaring it const would prevent routines that are passed references to const strings from reading through the string a character at a time. There is a problem with this one-size-fits-all approach:

const string greeting = "hey";
greeting[2] = 'p';

This is legal, but is clearly a violation of the expected semantics. The solution is to overload on const-ness to determine the level of access the user should have:

char operator[](size_t) const;
char& operator[](size_t);
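The pair of declarations above can be fleshed out into a runnable sketch. `Text` is a hypothetical minimal class wrapping `std::string`, not the proposed standard `string`. A function taking a reference to const can still read characters, but an attempt to write through a const object no longer compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal string class, just enough to show the overload pair.
class Text {
public:
    Text(const char* s) : data_(s) {}
    // Const objects get a copy of the character: read-only access.
    char operator[](std::size_t i) const { return data_[i]; }
    // Non-const objects get a reference: read and write access.
    char& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }
private:
    std::string data_;
};

// Works through a reference to const, so only the const overload is used.
std::size_t count_char(const Text& t, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < t.size(); ++i)
        if (t[i] == c) ++n;
    return n;
}

// const Text greeting = "hey";
// greeting[2] = 'p';  // error: the const overload returns a char by value
```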

Beyond subscription

If this were the only example of this technique I might be inclined to agree with George that it is an exception and should be catered for separately, but it is not. This example outlines a general principle related to member access. In search of concrete examples you need go no further than the STL. Each container may be iterated over. An iterator is defined to have pointer-like semantics and may be initialised to the beginning of a container, its end or to the result of a search.

One recurring difficulty with iterator classes is that they often fail to preserve the const-ness of what they are iterating over, i.e., through an iterator I can gain writable access to const objects. Alternatively, the iterator provides only lowest-common-denominator access -- but it is frustrating being given read-only access to a writable object! The STL addresses this problem in a disarmingly simple manner by requiring both const and non-const iterators. For example, for access from the first element a container class would include the declarations

iterator       begin();
const_iterator begin() const;
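Here is the same idea in miniature, assuming a hypothetical `IntBuffer` container whose iterators are plain pointers. The const-ness of the object an algorithm receives decides whether `begin()` hands back a writable `iterator` or a read-only `const_iterator`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container; plain pointers serve as its iterators.
class IntBuffer {
public:
    typedef int*       iterator;
    typedef const int* const_iterator;

    explicit IntBuffer(std::size_t n) : data_(n, 0) {}

    iterator       begin()       { return data_.data(); }
    const_iterator begin() const { return data_.data(); }
    iterator       end()         { return data_.data() + data_.size(); }
    const_iterator end() const   { return data_.data() + data_.size(); }

private:
    std::vector<int> data_;
};

// Read-only traversal: a const object yields const_iterators.
int sum_of(const IntBuffer& b) {
    int total = 0;
    for (IntBuffer::const_iterator it = b.begin(); it != b.end(); ++it) total += *it;
    return total;
}

// Writable traversal: a non-const object yields plain iterators.
void fill_with(IntBuffer& b, int value) {
    for (IntBuffer::iterator it = b.begin(); it != b.end(); ++it) *it = value;
}
```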

Overloading should only be used to give similar concepts similar names, and this is clearly the case here. Suggesting that the const version should be renamed begin_const breaks with this, causing the programmer to do the name mangling instead of the compiler.

Same thing, only different

All the discussion so far has centred on function pairs that differ in return value but are behaviourally identical. There are a few cases where the semantics and mechanism can also differ. An example of this is a create-on-demand awk-like array for which the non-const subscript operator creates the indexed element with a default value if it does not already exist. The const version would throw an exception:

const Type& operator[](const Key&) const
throw(out_of_range);
Type& operator[](const Key&);
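One way to sketch the awk-like array is to use `std::map` as the underlying store; the class name `AwkArray` is hypothetical. The non-const subscript relies on `std::map::operator[]`, which inserts a value-initialised element for a missing key. The const subscript uses `find` and throws `std::out_of_range`. The sketch drops the `throw(out_of_range)` exception specification shown above, because dynamic exception specifications were deprecated in C++11 and removed in C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical create-on-demand associative array.
template <typename Key, typename Type>
class AwkArray {
public:
    // Non-const access: a missing key is created with a value-initialised element.
    Type& operator[](const Key& key) { return items_[key]; }

    // Const access: a missing key is an error rather than a silent insertion.
    const Type& operator[](const Key& key) const {
        typename std::map<Key, Type>::const_iterator it = items_.find(key);
        if (it == items_.end())
            throw std::out_of_range("AwkArray: no such key");
        return it->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::map<Key, Type> items_;
};
```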

On the whole, behavioural differences between const and non-const versions of an overloaded pair should be either non-existent or minimal.

I agree, and the STL gets around this by simply not defining a const version of the subscript operator for map (STL's associative array template class) - Ed.

However, there is one example that I feel would be useful and that breaks with this requirement. One of the few areas in which the C standard I/O library wins out over its C++ counterpart is pattern matching on input. As its name suggests, the scanf function implements a simple generic scanner, albeit a somewhat insecure and idiosyncratic one. Taking advantage of the difference between non-const references and const references or values, it is not hard to imagine an equivalent facility for C++:

cin >> day >> '/' >> month >> '/' >>
year;
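Here is a rough sketch of how that distinction could be exploited. It assumes a global overload `operator>>(std::istream&, const char&)`, which is not part of any standard library. A character literal such as `'/'` cannot bind to the `char&` parameter of the standard character extractor, so it selects this overload instead. The overload treats the character as something to match rather than somewhere to store. An ordinary `char` variable still binds to `char&`, which is the better match, and is read as usual. This is, of course, exactly the kind of global operator George Wendle warns about in his column.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Sketch: a char *value* (such as the literal '/') selects this overload,
// because it cannot bind to the char& taken by the standard extractor.
std::istream& operator>>(std::istream& in, const char& expected) {
    char c;
    in >> std::ws;                        // skip leading whitespace, as >> normally does
    if (in.get(c) && c != expected)
        in.setstate(std::ios::failbit);   // mismatch: fail the stream, so later extractions do nothing
    return in;
}
```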

For such a scheme to work well, the type of literal strings would have to be const char* rather than char*. Sean made a proposal to rid C++ of this irksome piece of C heritage; sadly it was not accepted by the powers that be.

And I haven't yet discovered why the Core WG did not adopt this proposal - Ed.

The functionality described could be implemented using manipulators (see "Writing your own stream manipulators", Overload 5):

cin >> day >> match('/') >> month >>
match('/') >> year;
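Here is a minimal sketch of such a manipulator; `match` is a hypothetical class, not a standard facility. It skips leading whitespace, reads one character and sets `failbit` on a mismatch. The rest of the chain is then skipped, much as `scanf` stops at the first literal that does not match.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical manipulator-like type: holds the character expected next.
struct match {
    explicit match(char c) : expected(c) {}
    char expected;
};

// Extracting into a match consumes one non-whitespace character and
// fails the stream if it is not the expected one.
std::istream& operator>>(std::istream& in, const match& m) {
    char c;
    in >> std::ws;
    if (in.get(c) && c != m.expected)
        in.setstate(std::ios::failbit);
    return in;
}
```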

These could take advantage of templates and template specialisation. However, I do not believe there are any proposals to standardise such a cluster of classes and it would be good to have a simple version already in place that echoed the versatility of scanf in softer, safer tones. Perhaps const-ness in C++ has not been taken far enough?

Back on the chain gang

Method chaining, also known as cascading, is a useful technique for grouping a sequence of related operations together in a single statement. The result of a function that would otherwise return void can be used for further operations on the object of interest. A primitive form of this is available with many of the C string functions. In C++ the most conspicuous example of chaining is in the I/O library:

cout << "The temperature at " << time
<< " on " << date
<< " is " << temperature
<< '.' << endl;

The result of each call to operator<< is a reference to the ostream that was used for output. Chaining is also present in the C language itself; it is not just restricted to the library:

a = b = c;

The result of each assignment is a modifiable lvalue of the left hand side and not a copy of that value.

Only in C++ I'm afraid! In C, the result of an assignment is not an lvalue - Ed.

The proposed standard library, and much of my own code, follows this idiom. Non-const member functions that might otherwise return void often return *this.

coord.radius(new_r).radians(new_theta);
motd.assign(subject).append(" is ").append(opinion);
dir_list.sort().reverse();
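Here is a sketch of the idiom with a hypothetical `Polar` class standing in for `coord`. Each modifier returns `*this` by non-const reference, so the calls can be strung together. The `motd` line works unchanged with `std::string`, whose `assign` and `append` members already return a reference to the string; the helper `make_motd` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical polar-coordinate class whose modifiers return *this.
class Polar {
public:
    Polar() : r_(0.0), theta_(0.0) {}
    Polar& radius(double r)  { r_ = r; return *this; }
    Polar& radians(double t) { theta_ = t; return *this; }
    double radius() const    { return r_; }
    double radians() const   { return theta_; }
private:
    double r_, theta_;
};

// std::string's assign and append also return a reference to *this.
std::string make_motd(const std::string& subject, const std::string& opinion) {
    std::string motd;
    motd.assign(subject).append(" is ").append(opinion);
    return motd;
}
```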

The last example is, for some reason, currently not possible with the STL. It appears to be an oversight that hopefully will be rectified by the library committee: first, it is clearly useful; second, it is important that all library components are written to a common style which, in this case, is that of chainability.

Assigns and wonders

All well and good, but what about the assignment operator? This is the issue that Sean raised in response to my criticism of one recommendation in the Ellemtel Programming in C++: Rules and Recommendations document (included on the disk that came with Overload 4). The discussion above suggests that because the assignment operator is a non-const member function it should return a non-const reference to the assignee. Many other sources support this view:

* The definition of assignment for the built-in types;

* Compiler generated assignment operators return a non-const reference;

* Assignment operators in the fledgling C++ standard library return non-const references;

* Many of the good authors in the C++ community support this as a standard idiom (e.g., Stroustrup, Coplien, Meyers, etc.).

These are, to say the least, quite persuasive reasons. This is clearly standard form, yet the Ellemtel guide suggests that returning a const reference is better. To probe this decision we must better understand what coding rules and recommendations might help us to achieve:

1. readability, e.g., indentation, identifier names;

2. defined-ness, e.g., the result of a[i++] = i++ is not well defined;

3. security, e.g., use of gets can seriously affect the health of your program;

4. insurance against accident, e.g., declaring without definition a private copy constructor and assignment operator prevents accidental copying of certain classes of objects;

5. conformance to expectation, i.e., preservation of the Principle of least astonishment;

6. interoperability, i.e., the ability to mix with other components written to a standard form.

In other words, rules and recommendations are a response to, and a preventative cure for, possible problems. What are the problems that the Ellemtel guide is trying to lay to rest? Unfortunately only one example is given:

(a = b) = c;

This is a pointless and pathological piece of code, but how does it measure up against the criteria for a problem seeking a solution:

1. This is quite readable -- pointless, yes, but with parentheses forcing the precedence it is easy to see what is going on. Indeed, it might be argued that the chained assignment without parentheses offers more scope for confusion.

2. This is well defined: a is assigned the value of b, and then a is overwritten by an assignment from c. Again, pointless, but certainly well defined.

3. It is also secure -- no problems with dangling pointers, corrupting memory, etc.

4. You have to force the precedence to get this code fragment, so such code is unlikely to be produced by accident. I don't know about you, but my typos are normally quite simple: I have yet to accidentally enclose a well formed expression with balanced parentheses -- and not notice!

5. In the light of what I mentioned earlier I would expect this example to compile cleanly.

6. If a, b and c are iterators or containers, this code conforms to the signature requirements for assignment laid out by the STL for containers and assignable iterators.
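To make points 2 and 6 concrete, here is a sketch of the canonical form using a hypothetical `Counter` class. Its `operator=` is written out only to show the signature; it does what the compiler-generated one would. Because a non-const reference is returned, `(a = b) = c` compiles and simply leaves `a` equal to `c`, exactly as it does for `int`.

```cpp
#include <cassert>

// Hypothetical value class with the canonical assignment operator,
// returning a non-const reference to the assignee.
class Counter {
public:
    explicit Counter(int n = 0) : n_(n) {}
    Counter& operator=(const Counter& other) { n_ = other.n_; return *this; }
    int value() const { return n_; }
private:
    int n_;
};
```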

The only problem I was able to make out was that the authors of the guide were uncomfortable with C and C++! If they wish to break a de facto (bordering on de jure) standard, they will have to do better than one contrived and weak example. By this, I do not mean that many weak and contrived examples will strengthen their case ;-)

The Ellemtel guide even states, inadvertently, why you should ignore their recommendation:

Designing a class library is like designing a language! If you use operator overloading, use it in a uniform manner; do not use it if it can easily give rise to misunderstanding.

I have already described the uniform manner above. In other words, a non-const reference returned from an assignment is not a problem but an expectation: the absence of a problem does not require a solution, but expectations should be met.

Kevlin Henney

operator= and const - a reply

by Mats Henricson and Erik Nyquist

We are pleased to see Kevlin Henney so thoroughly scrutinising one of the recommendations in our public domain document. We are prepared to change this in our forthcoming book "Industrial Strength C++".

The document was last updated in 1992, and at that time there were quite a few writers who advocated returning a const reference to *this. In fact, we got the idea from Scott Meyers after a talk at USENIX C++ 1991 in Washington. Rob Murray's widely acknowledged book "C++ Strategies and Tactics" also recommends this (page 32, 2.2.1 Return value of operator=):

Assignment operators should return a constant reference to the assigned-to object.

One reason why a const reference might actually cause the least astonishment is that this is the way it works in C. Try this in your favourite C compiler:

int main()
{
    int x = 1;
    int y = 2;
    int z = 3;

    (z = y) = x;  /* From Sun C compiler:
                     illegal lhs of assignment operator
                  */

    return 0;
}

In C++, on the other hand, this code is legal, since by default the result of an assignment expression is a non-const reference to the object assigned to. This is why a non-const reference is appropriate as the return value of overloaded assignment operators.

Why have this incompatibility between C and C++? We really don't know! Maybe Bjarne had a bad day in the early eighties when he decided to change this? ;-)

Mats Henricson

Erik Nyquist

I asked Bjarne Stroustrup about this gratuitous difference between C and C++ and got the following response - Ed.

Why make the change? Why not? The value of:

(a = b)

is a, which is an lvalue. Also, we have found real examples of the general form:

T& f(T& a, const T& b) { return a=b; }

Bjarne Stroustrup
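The form Bjarne Stroustrup describes can be written as a small function template; the name `assign_to` is hypothetical. Its `return a = b;` compiles only because the assignment yields a non-const lvalue. If `T::operator=` were declared to return `const T&`, the result could not bind to the `T&` return type.

```cpp
#include <cassert>
#include <string>

// Assign, then hand back the assignee. This relies on a = b yielding a
// non-const lvalue of type T.
template <typename T>
T& assign_to(T& a, const T& b) { return a = b; }
```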

Whilst putting this issue together, I was reading Scott Meyers' column in The C++ Report, January 1995, where he talks about writing max and min functions. He notes that maintaining const-correctness is very difficult with templates and I can now see a parallel between that and the assignment operator. Like Mats and Erik above, I may well change my view on this - Ed.
Mirrored from http://www.accu.org/