The macro processor that compilers for CeeLanguage and CeePlusPlus normally invoke as the first stage of input processing. It performs the following tasks:
- Comment stripping and broken line (backslash-newline) concatenation
- File inclusion (via #include directive; how inter-module dependencies are expressed and implemented in C/C++)
- Conditional compilation (#ifdef, #if, etc.)
- Macros and manifest constants
Commonly known as "cpp", which can be confusing because that abbreviation often refers to C++ (the language proper) as well. It can also be invoked as a standalone preprocessor; there is (was) a legacy build configuration tool (imake) which used cpp to generate Makefiles from templates (Imakefiles).
cpp directives are easily recognizable in a C/C++ file; they're the lines that start with a # followed by a word (define, include, if, ifdef, etc.). Whitespace may come before or after the #, making nested preprocessor directives a bit easier to read.
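A minimal sketch of the tasks listed above, for readers who want to see them side by side. The names BUFFER_SIZE, MAX, DEMO_VERBOSE, LOG and at_least_buffer_size are made up for illustration and come from no particular code base.

```c
#include <stdio.h>              /* file inclusion */

/* Manifest constant: every later BUFFER_SIZE token is replaced by 64. */
#define BUFFER_SIZE 64

/* Function-like macro, spread over two lines with backslash-newline. */
#define MAX(a, b) \
    ((a) > (b) ? (a) : (b))

/* Conditional compilation. DEMO_VERBOSE is a hypothetical switch that
   could be set with -DDEMO_VERBOSE=1 on the compiler command line.
   Note the whitespace after each #, used here to show the nesting. */
#ifndef DEMO_VERBOSE
#  define DEMO_VERBOSE 0
#endif

#if DEMO_VERBOSE
#  define LOG(msg) puts(msg)
#else
#  define LOG(msg) ((void)0)
#endif

/* Returns n, but never less than BUFFER_SIZE. */
static int at_least_buffer_size(int n)
{
    LOG("at_least_buffer_size called");
    return MAX(n, BUFFER_SIZE);
}
```

Running the file through the preprocessor alone (for example with the -E option that GCC and Clang accept) shows the text the compiler proper actually sees.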
C/C++ (CeeCeePlusPlus) without the preprocessor is virtually useless. (Some say that they are even with cpp, but that's a LanguagePissingMatch for another day).
See CeePreprocessorStatements for more info on how to use it.
CeePreprocessor was added to C as a sort of afterthought, and not by C's creator (DennisRitchie). It was advocated by AlanSnyder, designed by MikeLesk, and implemented by JohnReiser. See Ritchie's history of C http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
DougMcIlroy probably had some input also, since he was an early and important figure in the history of macro processing.
cpp is widely considered, and with a fair amount of good reason, to be one of the worst macro processors in history - both general purpose and language specific. Of course, that's not entirely fair - it wasn't designed for the purposes it is put to today (and there is so much stuff using it that a redesign isn't practical). However, the flaws and weaknesses of cpp are well-known. They include, but are not limited to:
- The inability for macro expansions to contain further preprocessor directives
- No ability to form lists or other aggregate data structures. "#define foo x,foo" doesn't prepend x to a list of tokens called foo; it loops endlessly.
- No, foo, as defined in the previous sentence, doesn't loop. Instead, the preprocessor disables the name foo from macro expansion. The preprocessor goes to extreme lengths to prevent recursion. --VesaKarvonen
- I wouldn't say "extreme lengths", it's just careful. Painting things blue is a pretty straightforward algorithm.
- Only really understands two datatypes (ints and strings), and cannot do any meaningful processing on strings other than comparing them for equality.
- No looping or iteration capability. (Is CeePreprocessor TuringComplete?)
- Exclusive CallByName semantics make macros a poor replacement for functions in many instances. (Of course, if C/C++ could treat blocks as expressions, this would be a moot point… GCC now has this as an extension, using ({ …code… }) syntax.)
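To make the CallByName complaint concrete, here is a small sketch. SQUARE_MACRO, square_fn, next_value and the two counting helpers are hypothetical names invented for this example.

```c
/* Counts how many times next_value() has been called. */
static int calls = 0;

static int next_value(void)
{
    return ++calls;
}

/* The argument text is pasted in wherever x appears, so it is
   evaluated once per appearance (here: twice). */
#define SQUARE_MACRO(x) ((x) * (x))

/* A real function evaluates its argument exactly once. */
static int square_fn(int x)
{
    return x * x;
}

/* How many times does next_value() run when passed to the macro? */
static int calls_made_by_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* ...and when passed to the function? */
static int calls_made_by_function(void)
{
    calls = 0;
    (void)square_fn(next_value());
    return calls;
}
```

The macro version calls next_value() twice and the function version calls it once. With an argument such as i++ instead of a function call, the macro version would not merely be surprising but undefined.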
No, it DOES have "looping or iteration capability" (see the second bullet). It's just backwards and inside out from the way any sane person thinks about a problem. See #ifdef and #ifndef. Can I see some LoopsForCeePreprocessor? --AdamBerger
- This comment doesn't really stand on its own without further explanation. The second bullet points out a limitation on recursion. More general recursion can be done with #include and the #if directives you mention, but the utility of that approach is completely opaque to someone who hasn't already done it.
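The disagreement about the second bullet can be checked directly. The following is only a sketch: STR, XSTR and the two helper functions are illustrative names, and the variadic macros assume a C99 (or later) preprocessor.

```c
#include <string.h>

/* Two-level stringification: XSTR expands its argument first,
   STR then turns the resulting tokens into a string literal.
   Variadic because the expansion of foo contains a comma. */
#define STR(...)  #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

/* The self-referential definition from the second bullet. */
#define foo x, foo

/* The text that foo expands to, as a string. */
static const char *foo_expansion(void)
{
    return XSTR(foo);
}

/* Nonzero if the expansion still contains the unexpanded name foo,
   i.e. the preprocessor stopped after one step instead of looping. */
static int expansion_still_mentions_foo(void)
{
    return strstr(foo_expansion(), "foo") != NULL;
}
```

The file preprocesses and compiles without hanging, and the expansion is x followed by a comma and the name foo left untouched. While foo is being expanded, the name foo is not eligible for further replacement, which is the "painted blue" behaviour mentioned above.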
And some things about it are just plain annoying:
- A macro definition must consist of only one line. Backslash-newline can be used, of course, to spread it across multiple lines, but that's annoying, especially when backslash-newline is inadvertently replaced with backslash-whitespace-newline.
- The need for RedundantIncludeGuards
- Whitespace is significant in some contexts. "#define foo(x) (x+5)" defines foo to be a one-argument macro with formal parameter x; "#define foo (x) (x+5)" defines foo to be the string "(x) (x+5)"
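The whitespace trap in the last bullet, as a small sketch. The object-like variant is renamed bar here so both definitions can live in one file; STR, XSTR and the helper functions are likewise names made up for the illustration.

```c
#include <string.h>

#define foo(x) (x+5)    /* function-like: one parameter, x       */
#define bar (x) (x+5)   /* object-like: the body is "(x) (x+5)"  */

/* Two-level stringification so that bar is expanded before
   being turned into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* foo(1) becomes (1+5). */
static int foo_of_one(void)
{
    return foo(1);
}

/* bar is replaced by its whole body, parentheses and all. */
static const char *bar_expansion(void)
{
    return XSTR(bar);
}

/* Nonzero if bar expanded to the literal text of its body. */
static int bar_is_plain_text(void)
{
    return strcmp(bar_expansion(), "(x) (x+5)") == 0;
}
```

Writing bar(1) in real code would therefore expand to (x) (x+5)(1), which is usually a compile error far away from the definition that caused it.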
BjarneStroustrup has stated on numerous occasions that many features in CeePlusPlus were put in to reduce the dependence on the preprocessor - he doesn't like it one bit.
It is often speculated that many programmers dislike macro processors in general due to bad experiences with cpp. At least two noted language designers (JamesGosling, of JavaLanguage fame, and BertrandMeyer, the designer of EiffelLanguage) have publicly stated a disdain for macro preprocessors of any sort. AlanKay doesn't like them either, but for a different reason - he believes that all binding should be late binding; and a preprocessor that only runs at read time is diametrically opposed to that philosophy.
For someone who knows the C preprocessor, it might come as a surprise that the macro subset of the C preprocessor (ignoring the #include-mechanism) is actually about as computationally complete as any finite computer.
Formally, the C preprocessor macro subset isn't strictly TuringComplete, because the macro expansion mechanism can't get into an infinite loop, but it is capable of executing arbitrary programs that terminate in a finite number of computational steps (requiring an evaluator whose size is a logarithm of the number of steps). Thus the C preprocessor is formally capable of executing any algorithm (a finite program that always terminates) on any specific finite input acceptable to the algorithm.
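For a taste of how macros alone can compute, here is a tiny sketch built on token pasting. It is not taken from Order or any other library; CAT, IF, NOT, INC and the demo functions are names invented here, and the "arithmetic" is just a finite lookup table.

```c
/* Paste two tokens after both have been fully macro-expanded. */
#define CAT(a, b)  CAT_(a, b)
#define CAT_(a, b) a##b

/* Boolean negation by table lookup: NOT(0) -> NOT_0 -> 1. */
#define NOT(b) CAT(NOT_, b)
#define NOT_0 1
#define NOT_1 0

/* Conditional: IF(1)(t, f) -> IF_1(t, f) -> t. */
#define IF(c) CAT(IF_, c)
#define IF_0(t, f) f
#define IF_1(t, f) t

/* Successor on a small, finite range of numbers. */
#define INC(n) CAT(INC_, n)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4

/* Everything below is decided before the compiler proper runs. */
static int demo_if(void)
{
    return IF(NOT(0))(10, 20);   /* preprocesses to: return 10; */
}

static int demo_inc(void)
{
    return INC(INC(1));          /* preprocesses to: return 3; */
}
```

Each step is a bounded table lookup, which fits the point made above: any one such program terminates, and handling more steps means writing a larger set of macros.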
I have designed and implemented a functional programming language, called Order, using the C preprocessor macro mechanism. The language is basically a complete PurelyFunctional programming language that has features such as CallByValue semantics, FirstClassFunctions, LambdaExpressions, PartialApplication of functions (CurryingSchonfinkelling), LexicalScoping and even FirstClassContinuations as well as FirstClassEnvironment. An Order program can output, or generate, an arbitrary sequence of preprocessing tokens using non-mutating SideEffects (which is one of the reasons why the language is CallByValue). The standard prelude of the language provides arithmetic on arbitrary precision natural numbers and an extensive set of first order and HigherOrderFunctions on sequences. The Order language can be used as a C preprocessor MetaLanguage to generate, for example, C program code. Here are some documented examples:
- http://cvs.sf.net/viewcvs.py/chaos-pp/order-pp/example/lambda/lambda.c?view=markup
- http://cvs.sf.net/viewcvs.py/chaos-pp/order-pp/example/array_ops.c?view=markup
- http://cvs.sf.net/viewcvs.py/chaos-pp/order-pp/example/is_function.hpp?view=markup
- http://cvs.sf.net/viewcvs.py/chaos-pp/order-pp/example/duffs_device.c?view=markup
Don't get me wrong. As you should realize, I understand the limitations and capabilities of the C preprocessor rather intimately. (I'm also familiar with the syntax-case and syntax-rules macro systems of SchemeLanguage, for example, as well as other syntactic extension facilities (e.g. Camlp4 of ObjectiveCaml) and program transformation systems (e.g. IntentionalProgramming).) I've implemented the Order interpreter mostly due to my interest in programming languages and for fun. However, I also think that the Order interpreter could potentially be very useful in C and C++ programming for (even non-trivial) CodeGeneration and SyntacticAbstraction. For example, the lambda-example above uses datatype (variant record) definition and deconstruction macros as well as a simple parser generator macro that have been implemented with the aid of the Order interpreter.
I should also mention the Chaos library, designed by PaulMensonides, which demonstrates a wide range of advanced C preprocessor programming techniques, including techniques based on the #include-mechanism.
DavidAbrahams has written an introduction to preprocessor metaprogramming using the Boost Preprocessor library. The chapter can be read at http://boost-consulting.com/tmpbook/preprocessor.html.
Sounds interesting -- formalizing a complete approach, where I think individuals have in the past reinvented bits of the wheel over and over.
Are there any credible alternatives to cpp when it comes to syntax extension? Some have mentioned EmFour, which is frankly WORSE than cpp when it comes to quoting conventions.
The usability of m4 is directly proportional to the ease of distinguishing the quote ' and backquote ` characters on your terminal. (With modern windowing systems, a GoodProgrammerTypeface would help.) Of course, m4 was not designed to preprocess C code, and m4 doesn't know how to follow cpp #include directives, if that's important. I've found m4 occasionally useful (with a lot of work) for certain kinds of meta-programming (often, a Perl script that generates C code is more useful though); but as a replacement for cpp--as a front-end to C--it leaves a lot to be desired.
One person claimed long ago to have written lisp macros to produce C for all of C's syntax, resulting in what amounts to C-in-sexps (I saw some examples... it was interesting, but it would never mix in with regular C, so it was basically just a new language). Is there anything on the order of camlp4, or some other syntax extension facility, that actually understands C syntax?