[C] Hidden features of C - Programming On Unix

benwaffle
Members
yrmt
Grey Hair Nixers
Nice! I like the for comma thing
bottomy
Registered
Don't really like calling these things "hidden" features, because they aren't hidden at all. Just most people don't read the standard or what language extensions and builtins their compiler offers.

Anyway, here are a few things that I don't see used much and that weren't mentioned in that link. Some of these are new features added in C11. They could already be achieved using compiler extensions, but it's particularly useful having them as part of the standard.

The first is alignment (added in C11). This example uses alignment to make sure that the values are stored at addresses that have (at least) their lowest bit unused, so that bit can then be used to store a flag.
Code:
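#include <stdio.h>
#include <stdint.h>

typedef uintptr_t generic_t;

//float's own alignment is assumed to already leave the lowest bit free
_Static_assert(_Alignof(float) >= 2, "float needs at least 2 byte alignment");

void PrintGeneric(generic_t);


int main(int argc, char *argv[])
{
    //standard C11 doesn't allow _Alignas on a typedef, so it's applied to the object instead
    _Alignas(2) char Character = 'e';
    float Number = 3.14f;
    generic_t a = (generic_t)&Character, b = (generic_t)&Number | 1;
    
    PrintGeneric(a);
    PrintGeneric(b);
    
    
    return 0;
}

void PrintGeneric(generic_t GenericValue)
{
    //the mask is built from a generic_t so it covers the full width of the pointer
    if (GenericValue & 1) printf("%.2f\n", *(float*)(GenericValue & ~(generic_t)1));
    else  printf("%c\n", *(char*)(GenericValue & ~(generic_t)1));
}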
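
The same trick can be wrapped in a few helpers so the masking lives in one place. This is only a sketch: TagPointer, PointerTag and UntagPointer are made up names, and it assumes (like the example above) that converting a pointer to uintptr_t exposes the address bits, which is implementation defined.

```c
#include <assert.h>
#include <stdint.h>

//hypothetical helpers: keep a 1 bit flag in the unused lowest bit of a pointer
static inline uintptr_t TagPointer(void *Ptr, unsigned int Flag)
{
    const uintptr_t Bits = (uintptr_t)Ptr;
    assert((Bits & 1) == 0); //only works if the object is at least 2 byte aligned
    return Bits | (Flag & 1);
}

static inline unsigned int PointerTag(uintptr_t Tagged)
{
    return (unsigned int)(Tagged & 1);
}

static inline void *UntagPointer(uintptr_t Tagged)
{
    return (void*)(Tagged & ~(uintptr_t)1);
}
```

Anything passed to TagPointer needs to be at least 2 byte aligned, e.g. an object declared with _Alignas(2), otherwise the assert fires.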


The next is generic selection (added in C11). The example uses it as a way of getting the format specifier for the current type.
Code:
#include <stdio.h>

#define FORMAT_SPECIFIER(x) \
_Generic((x), \
int: "%d", unsigned int: "%u", \
long: "%ld", unsigned long: "%lu", \
long long: "%lld", unsigned long long: "%llu", \
float: "%f", \
double: "%f", \
long double: "%Lf",\
char *: "%s", char [sizeof(x)]: "%s", \
default: "" \
)

int main(int argc, char *argv[])
{
    printf(FORMAT_SPECIFIER("blah"), "blah");
    return 0;
}

Now that isn't a particularly useful example of generics. One thing I found it really useful for was a modifiable constant value system I made for my game engine (so constant values could be modified at runtime in an editor, which is really useful when you know you want a constant value but are unsure what value specifically). It let me simply do RUNTIME_CONSTANT(value), e.g. int a = RUNTIME_CONSTANT(34);.
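
A more typical use of generic selection is picking a function based on the type of an argument. The sketch below is not the RUNTIME_CONSTANT system mentioned above; CLAMP and the three Clamp functions are names made up for illustration.

```c
#include <assert.h>

//hypothetical per-type implementations; the macro below picks one from the type of its first argument
static inline int ClampInt(int Value, int Min, int Max)
{
    assert(Min <= Max);
    return Value < Min ? Min : (Value > Max ? Max : Value);
}

static inline float ClampFloat(float Value, float Min, float Max)
{
    assert(Min <= Max);
    return Value < Min ? Min : (Value > Max ? Max : Value);
}

static inline double ClampDouble(double Value, double Min, double Max)
{
    assert(Min <= Max);
    return Value < Min ? Min : (Value > Max ? Max : Value);
}

//the selection yields a function, which is then called with the arguments
#define CLAMP(value, min, max) \
_Generic((value), \
int: ClampInt, \
float: ClampFloat, \
double: ClampDouble \
)(value, min, max)
```

So CLAMP(5, 0, 3) calls ClampInt and CLAMP(-1.5, 0.0, 1.0) calls ClampDouble. As there's no default association, passing any other type is a compile error rather than a silent conversion.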


The next thing is format specifiers for fscanf (or variants of it). The same can be said for the format specifiers of fprintf (or variants of it), but it has different specifiers and I can't be bothered with an example. There are a few things to cover here. One thing you'll often hear about fscanf is that it's not secure when dealing with strings (and as a result many avoid it), but it's perfectly safe to use for strings when done properly (by giving the conversion a maximum field width). The other thing is that it's more flexible than most assume, so it can be used for more complex input, though there is a limit: it's still rather restrictive compared to the string processing other languages offer, and it can be less efficient than handling the conversion yourself, as it needs to parse the format string.

A fscanf conversion specification (something that begins with %) can set whether the result should be ignored (a * means it won't be assigned to one of the arguments), a maximum field width (limits the number of characters included in the conversion), a length modifier, and the conversion specifier. Some of the less common conversion specifiers are scansets (%[] or %[^], the latter matching anything but the characters between the brackets; these are useful when you want only particular input or want to include whitespace characters), as well as specifiers and length modifiers for particular types (size_t, intmax_t/uintmax_t, ptrdiff_t; the stdint types don't have their own specifiers, but inttypes.h defines macros with the appropriate specifier for each type).

The first part of the example uses a scanset to accept only digits, stopping after 15 characters (as determined by the maximum field width) or at the first non-digit. The next part uses the maximum field width to limit the sizes of the numbers we want, as they're packed tightly together.
Code:
#include <stdio.h>

int main(int argc, char *argv[])
{
    char a[16] = ""; //stays empty if the input doesn't start with a digit
    scanf("%15[0123456789]", a); //could alternatively be %15[0-9] though it's implementation defined
    printf("%s\n", a);
    
    int b, c, d, e;
    //assume packing is as follows: [3 digit number][1 digit number][4 digit number][2 digit number]
    sscanf("2179444442", "%3d%1d%4d%2d", &b, &c, &d, &e);
    printf("%d, %d, %d, %d\n", b, c, d, e);
    
    return 0;
}
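
The other pieces mentioned above (assignment suppression, a negated scanset, the inttypes.h macros and the size_t specifier) can be combined in one format string. A minimal sketch, where record_t, ParseRecord and the "name,number,id,size" layout are all made up for the example:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char Name[16];
    uint32_t ID;
    size_t Size;
} record_t; //hypothetical record, made up for this example

//parses "name,ignored number,id,size"; returns 1 on success and 0 otherwise
static int ParseRecord(const char *Line, record_t *Record)
{
    assert(Line && Record);
    
    //%15[^,]    up to 15 characters that aren't a comma (spaces included)
    //%*d        an int that's converted but not stored (and not counted in the result)
    //%" SCNu32 "  the macro from inttypes.h with the right specifier for uint32_t
    //%zu        the specifier for size_t
    return sscanf(Line, "%15[^,],%*d,%" SCNu32 ",%zu", Record->Name, &Record->ID, &Record->Size) == 3;
}
```

Checking the return value against the number of stored conversions is what catches malformed input. A name longer than 15 characters also fails here, since the comma that's expected next won't be there.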


A final example (can't be bothered doing any more) is floating point environment access. With it you're able to set different options for how floating point values and operations should be handled (anything from changing what should cause exceptions to changing how values should behave, e.g. how they should be rounded). It's mostly implementation defined, and this example is not portable. It assumes that you're compiling for an x86 architecture that supports SSE, on a platform that provides FE_DFL_DISABLE_SSE_DENORMS_ENV (such as OS X), and that SSE will be used over x87 for the floating point operations.

The example uses the environment access to disable denormals for SSE (which causes all denormal values to be treated as 0.0). The alternatives would normally be either using asm to set the bits in the MXCSR register, or using intrinsics or builtins.
Code:
#include <stdio.h>
#include <float.h>
#include <math.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

_Bool isDenormal(float a);
float GetDenormal(void);

int main(int argc, char *argv[])
{
    float Value = GetDenormal();
    
    if (isDenormal(Value)) printf("is denormal: ");
    printf("%e %.1f\n", Value, FLT_MAX * Value);
    
    fenv_t Env;
    fegetenv(&Env);
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    
    Value = GetDenormal();
    
    if (isDenormal(Value)) printf("is denormal: ");
    printf("%e %.1f\n", Value, FLT_MAX * Value); //if using SSE it should be 0
    
    fesetenv(&Env);
    
    return 0;
}

_Bool isDenormal(float a)
{
    return ((a != 0.0f) && (fabsf(a) < FLT_MIN));
}

float GetDenormal(void)
{
    return FLT_MIN * 0.1f; //alternatively could return FLT_TRUE_MIN though for the example the result won't be as intuitive. As doing this you can see the result is FLT_MAX * FLT_MIN but pushed back one decimal place when denormals are available as you would expect.
}
bottomy
Registered
One thing I didn't mention before, because it's not a feature of C in the same sense as the others but is standard and definitely worth understanding, is floating point numbers. Far too few people actually understand how they work.

There can be performance implications depending on the type of number (e.g. NaN, inf, and in particular denormals). Depending on the hardware these can be slow to operate on, so it's usually best to check whether they affect performance on the platforms you're targeting, and if so restructure your algorithms to avoid them. More recent hardware typically handles all of them in hardware, so it's less of a concern now.
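
One cheap way to check whether denormals show up in your data at all is to count them. A small sketch (CountDenormals is a made up name) that only uses comparisons against FLT_MIN:

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

//hypothetical helper: counts how many of the values are denormals (subnormals)
static size_t CountDenormals(const float *Values, size_t Count)
{
    assert(Values || !Count);
    
    size_t Denormals = 0;
    for (size_t Loop = 0; Loop < Count; Loop++)
    {
        const float Value = Values[Loop];
        //non-zero but smaller in magnitude than the smallest normal float (NaN fails every comparison, so isn't counted)
        if ((Value != 0.0f) && (Value > -FLT_MIN) && (Value < FLT_MIN)) Denormals++;
    }
    
    return Denormals;
}
```

If the count is zero for representative input, denormals are unlikely to be what's slowing things down.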


Internally a float (binary32) is structured as a 1 bit sign flag, an 8 bit exponent (the exponent is encoded, and depending on its value different rules have to be applied), and a 23 bit mantissa or significand, totalling 32 bits. The mantissa also has an implicit 24th bit when an exponent in the range 0 - 254 is used: it is set for an exponent in the range 1 - 254 and not set for 0. The other types (double or binary64, and long double, which may be an extended format or just double) are structured the same way, typically differing only in the number of bits. An example of working out the value of a float from its internal structure is as follows (for convenience it still relies on the hardware for the arithmetic):

Code:
//uses a compiler extension for ranges in switch statement cases, supported by GCC and Clang, maybe others?
#include <stdio.h>
#include <stdint.h>
#include <math.h>

typedef union {
    float f;
    uint32_t u;
    int32_t i;
    struct {
        //should be noted that this isn't portable at all, so don't be surprised if it doesn't work
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign : 1;
    } internal;
} binary32_t;

void PrintAsFloat(binary32_t Val)
{
    switch (Val.internal.exponent)
    {
        case 0:
            printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, -126) * (((float)Val.internal.mantissa / (float)0x800000)));
            break;
            
        case 1 ... 254:
            printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, Val.internal.exponent - 127) * (((float)Val.internal.mantissa / (float)0x800000) + 1.0f));
            break;
            
        case 255:
            if (Val.internal.mantissa != 0)
            {
                printf("%e\n", NAN);
            }
            
            else
            {
                printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, Val.internal.exponent - 127) * (((float)Val.internal.mantissa / (float)0x800000) + 1.0f));
                //or
                //printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * INFINITY);
            }
            break;
    }
}

int main(int argc, char *argv[])
{
    PrintAsFloat((binary32_t){ .f = 1.123456f }); //a cast to the union type would be yet another compiler extension
    return 0;
}
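
If you only need the three fields, shifting and masking the bits avoids relying on how the compiler lays out bitfields. A sketch assuming float is an IEEE 754 binary32; binary32_fields_t and SplitFloat are made up names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes float is a 32 bit type");

typedef struct {
    uint32_t Sign;     //1 bit
    uint32_t Exponent; //8 bits, still encoded (biased by 127)
    uint32_t Mantissa; //23 bits, without the implicit bit
} binary32_fields_t; //hypothetical type for this sketch

static binary32_fields_t SplitFloat(float Value)
{
    uint32_t Bits;
    memcpy(&Bits, &Value, sizeof(Bits)); //copying the bytes avoids the union and bitfield layout questions
    
    return (binary32_fields_t){
        .Sign = Bits >> 31,
        .Exponent = (Bits >> 23) & 0xff,
        .Mantissa = Bits & 0x7fffff
    };
}
```

For 1.0f that gives a sign of 0, an exponent of 127 and a mantissa of 0; for 1.5f only the top mantissa bit (0x400000) is set.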


There are also important rules to understand when using them, such as how precision works. Precision is not constant; it's affected by the value. The larger the value, the lower the precision. Above 2^24 (16777216) and until UINT32_MAX, a float has less precision than a uint32_t. From that point on it can't represent odd numbers, but it can of course go above UINT32_MAX.
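
A quick way to see where the precision runs out is to check whether an integer survives a trip through a float. A sketch assuming float is an IEEE 754 binary32 (24 bits of significand); FitsInFloat is a made up helper:

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

static_assert(FLT_MANT_DIG == 24, "assumes float is an IEEE 754 binary32");

//hypothetical helper: returns 1 if Value survives being stored in a float unchanged
static int FitsInFloat(uint32_t Value)
{
    volatile float Stored = (float)Value; //volatile so it's really rounded to float precision
    return (double)Stored == (double)Value; //a double can hold every uint32_t and every float exactly
}
```

16777216 (2^24) and 16777218 fit, but 16777217 in between doesn't, and neither does UINT32_MAX.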

Because of the precision, and the rounding that happens as a result, there can be a number of problems with converting between double and float, as well as with equality comparisons. That's the reason you'll often see ranges being used to compare for equality. E.g. one value that can't be represented exactly is 0.1, so a float 0.1 and a double 0.1 won't be equal (the double will be more accurate but still not exactly 0.1). So, assuming the compiler doesn't do any trickery, (0.1f == 0.1) should be false.
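
A range comparison can look something like the following. It's only one of several possible approaches (a relative tolerance), and NearlyEqual is a made up name:

```c
#include <assert.h>

//hypothetical helper: treats A and B as equal if they differ by at most Tolerance, relative to the larger of the two magnitudes
static int NearlyEqual(double A, double B, double Tolerance)
{
    assert(Tolerance >= 0.0);
    
    const double Difference = A > B ? A - B : B - A;
    const double MagnitudeA = A < 0.0 ? -A : A, MagnitudeB = B < 0.0 ? -B : B;
    const double Largest = MagnitudeA > MagnitudeB ? MagnitudeA : MagnitudeB;
    
    return Difference <= Largest * Tolerance;
}
```

With that, NearlyEqual(0.1f, 0.1, 1e-6) is true even though (0.1f == 0.1) isn't. A purely relative tolerance gets very strict close to zero, so what tolerance (and whether to add an absolute one) depends on the values you're working with.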

There are also numerous "tricks" you can do with floating point numbers. Some could potentially be an optimization, though it depends on the use and platform (e.g. on x86, checking the signedness of a float by checking its sign bit could be an optimization if the float is currently stored in memory and not in an x87 or SSE register). Some tricks are:

Checking the signedness of the float (simply by checking the sign bit).

The same goes for changing the signedness, so it can be very quick to perform an absolute.
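
Both of those only touch the top bit. A sketch, assuming float is an IEEE 754 binary32 and using memcpy to get at the bits (IsNegative and Absolute are made up names):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes float is a 32 bit type");

//hypothetical helper: checks the sign bit, so it's also true for -0.0f
static inline int IsNegative(float Value)
{
    uint32_t Bits;
    memcpy(&Bits, &Value, sizeof(Bits));
    return (int)(Bits >> 31);
}

//hypothetical helper: absolute value by clearing the sign bit
static inline float Absolute(float Value)
{
    uint32_t Bits;
    memcpy(&Bits, &Value, sizeof(Bits));
    Bits &= 0x7fffffff;
    memcpy(&Value, &Bits, sizeof(Value));
    return Value;
}
```

The standard library already covers these with signbit, fabsf and copysignf, which is normally what you'd reach for first.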

You can efficiently work out a simple 2 to the power of n by effectively only using the exponent field (accommodating the encoding of that field) of a float (that float will then be the result).
Code:
static inline float powOf2(int8_t Exponent)
{
    return (binary32_t){ .internal = { 0, 127 + Exponent, 0 } }.f;
}

//a little bit more portable but still not, as there is still endianness which could be a problem
static inline float powOf2_(int8_t Exponent)
{
    return (binary32_t){ .u = ((uint32_t)(127 + Exponent) << 23) }.f;
}


Working out an integer base 2 logarithm.
Code:
typedef union {
    double f;
    uint64_t u;
    int64_t i;
    struct {
        uint64_t mantissa : 52;
        uint64_t exponent : 11;
        uint64_t sign : 1;
    };
} binary64_t;

unsigned int Log2(unsigned int x)
{
    binary64_t IntLog2 = { .sign = 0, .exponent = 1023 + 52, .mantissa = x };
    IntLog2.f -= 4503599627370496.0;
    
    return IntLog2.exponent - 1023;
}


There are even some tricks that aren't so relevant today (hardware support for those operations is now common, and is usually faster and more accurate than these old hacks).

Such as casting from float to int.
Code:
#define F2I(fp) (*(int32_t*)&(double){ (double)(fp) + 103079215104.0 } >> 16)

printf("%d\n", F2I(15.572f));
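
The macro reads the first 4 bytes of the double through an int32_t pointer, so it assumes little endian and breaks the aliasing rules. A sketch of the same trick using memcpy, assuming an IEEE 754 binary64 double (FloatToInt is a made up name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "assumes double is a 64 bit type");

//same trick as F2I, valid for roughly -32768 <= Value < 32768
static inline int32_t FloatToInt(float Value)
{
    //adding 1.5 * 2^36 leaves the value as 16.16 fixed point in the low 32 bits of the mantissa
    const double Biased = (double)Value + 103079215104.0;
    
    uint64_t Bits;
    memcpy(&Bits, &Biased, sizeof(Bits));
    
    //relies on the usual wrapping conversion to int32_t and an arithmetic right shift
    return (int32_t)(uint32_t)Bits >> 16;
}
```

Note that this rounds down rather than towards zero, so FloatToInt(-1.5f) is -2 where a plain cast gives -1.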


Or performing a reciprocal square root (fast inverse square root).
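
No code was given for this one, so here's a sketch of the widely circulated version (the 0x5f3759df constant followed by a single Newton-Raphson step), rewritten with memcpy. It assumes float is an IEEE 754 binary32 and a positive, normal input; FastInverseSqrt is just the name used here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes float is a 32 bit type");

//approximates 1 / sqrt(Value) for positive, normal values
static inline float FastInverseSqrt(float Value)
{
    uint32_t Bits;
    memcpy(&Bits, &Value, sizeof(Bits));
    Bits = 0x5f3759df - (Bits >> 1); //initial guess from the bit pattern
    
    float Guess;
    memcpy(&Guess, &Bits, sizeof(Guess));
    
    return Guess * (1.5f - (0.5f * Value * Guess * Guess)); //one Newton-Raphson step to refine the guess
}
```

It's an approximation, so expect a small relative error compared to 1.0f / sqrtf(Value).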



(19-06-2013, 04:31 PM)shix Wrote: Here's a really good slide presentation called "Some dark corners of C".

https://docs.google.com/presentation/d/1...slide=id.p

Thought I'd just go into a little more detail on some of the things they mention there (since some things deserve more of an explanation). Also, not all the things listed there are appropriate for proper code; some things basically just belong in the hacks/obfuscated group.

Where they're using array subscripting and switching the placement of the index and the object: subscripting p[i] is actually the equivalent of (*((p)+(i))), and since addition is commutative i[p] means the same thing. For multidimensional arrays the subscripts are effectively scaled and summed together, e.g. for an array declared as a[2][3][4], a[i0][i1][i2] is *((p) + ((i0) * (3 * 4)) + ((i1) * 4) + (i2)) where p points to the first element. To do the same access by hand in C (typeof being a compiler extension) you could write *((typeof(***a)*)a + ((i0) * (sizeof(typeof(*a)) / sizeof(typeof(***a)))) + ((i1) * (sizeof(typeof(**a)) / sizeof(typeof(***a)))) + (i2)). If there are fewer subscripts than in the definition, the result is instead a pointer (adjusted according to the indices).
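
The index arithmetic for the a[2][3][4] case can be written out as a function. A small sketch; DIM0, DIM1, DIM2 and FlatIndex are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

enum { DIM0 = 2, DIM1 = 3, DIM2 = 4 }; //made up dimensions, matching a[2][3][4]

//the offset (in elements) that a[i0][i1][i2] ends up at for an array declared as a[DIM0][DIM1][DIM2]
static inline size_t FlatIndex(size_t i0, size_t i1, size_t i2)
{
    assert((i0 < DIM0) && (i1 < DIM1) && (i2 < DIM2));
    return (i0 * (DIM1 * DIM2)) + (i1 * DIM2) + i2;
}
```

So for int a[2][3][4], &a[1][2][3] is the same address as (int*)a + FlatIndex(1, 2, 3), i.e. 23 elements in. And for a plain array b, b[2], 2[b] and *(b + 2) are all the same element.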

On the example of pointer aliasing and the use of the restrict qualifier: the three examples in there are all technically different, but no one of them is necessarily wrong, as it just depends on the intended behaviour. In the first one x, y and z could all hold the same pointer (or overlap). The second one takes a copy of the value z points to; x, y and z could still alias as in the first one, but the result could then differ from the first one because of the copy. The third one restricts z from being the same as (or part of) x and y, but x and y could still be aliases of one another.
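
The slides aren't reproduced here, so the following is only a guess at the shape of the three variants being described, with made up function names:

```c
#include <assert.h>

//1: x, y and z may all alias, so *z has to be read again after *x is written
static void AddAliased(int *x, int *y, int *z)
{
    assert(x && y && z);
    *x += *z;
    *y += *z;
}

//2: works on a copy of *z, so writing through x can't change what gets added to *y
static void AddCopied(int *x, int *y, int *z)
{
    assert(x && y && z);
    const int Value = *z;
    *x += Value;
    *y += Value;
}

//3: the caller promises that what z points to isn't modified through x or y (x and y may still alias each other)
static void AddRestricted(int *x, int *y, int *restrict z)
{
    assert(x && y && z);
    *x += *z;
    *y += *z;
}
```

Calling the first two with x and z pointing at the same int shows the difference: starting from a = 1 and b = 1, AddAliased(&a, &b, &a) leaves b as 3 while AddCopied(&a, &b, &a) leaves it as 2. Doing the same with AddRestricted would break the promise and be undefined behaviour.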

The smallest compilable C program (though it's actually just the smallest in terms of the length of the source code, not the size of the binary) could be modified to be the smallest (again source-wise) compilable and runnable program, depending on the particular target platform, as some may store the global variable in a memory segment with execute rights. So for instance on x86, at least for the most common platforms, you can achieve a runnable program by doing one of the following (0xc3 = ret instruction):
int main=0xc3;
or
const int main=0xc3;
or specifying through the compiler to make the segment executable.
e.g. on Mac OS X use the -segprot option in the linker, so an example to make the (int main=0xc3;) runnable:
clang blah.c -o blah -Xlinker -segprot -Xlinker __DATA -Xlinker rwx -Xlinker rwx

You also can make it smaller by following the rules of legacy C and opting for implicit int declaration, when you omit a type.


Of the books they recommend, I've only read one (Expert C Programming: Deep C Secrets). It's a pretty good book, but I suggest reading it early on rather than waiting to read it. If you're already experienced with C you probably won't get that much out of it. I read it a long time ago, but because of the mention of expert I thought it was going to be very advanced and so waited till much later to read it, which ended up not making it that beneficial.
venam
Administrators
Takes me back to my asm course where we studied IEEE 754 in a chapter. It's nice to know that the things I've learned are useful.
Hans Hackett
Members
Learning C... And I knew this is gonna be useful :p
benwaffle
Members
__FUNC__ in the preprocessor expands to the function being called
__LINE__ expands to the line number of the

example:
`#define ERROR fprintf(stderr, "error at line %d in function %d", __LINE__, __FUNC__)`
bottomy
Registered
(08-01-2014, 09:56 PM)benwaffle Wrote: __FUNC__ in the preprocessor expands to the function being called
__LINE__ expands to the line number of the

example:
#define ERROR fprintf(stderr, "error at line %d in function %d", __LINE__, __FUNC__)

Also can't forget __FILE__, __DATE__, __TIME__, etc. For more standard predefined macros you can check the standard. For all the predefined macros in your environment (assuming you're using GCC or Clang) you can use -dM together with -E, or -dD to keep the macro definitions in the preprocessed output.

Your ERROR has a mistake, you meant for the second specifier to be %s not %d :) It should also be noted that __func__ is the actual standard one, and that it's an actual identifier (it's not a macro). Though some compilers wrap __FUNC__ around __func__ (they also often provide others like __FUNCTION__, __PRETTY_FUNCTION__).
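
Put together, a version of the macro using %s and __func__ could look like this. It's only a sketch; the message parameter and the Divide function are made up to have something to report from:

```c
#include <assert.h>
#include <stdio.h>

//__FILE__ and __LINE__ are macros, __func__ is an identifier that's implicitly declared in every function
#define ERROR(message) fprintf(stderr, "%s:%d: error in function %s: %s\n", __FILE__, __LINE__, __func__, (message))

//hypothetical function, only here to have somewhere to report an error from
static int Divide(int Numerator, int Denominator, int *Result)
{
    assert(Result);
    
    if (Denominator == 0)
    {
        ERROR("division by zero"); //reports this file, this line and "Divide"
        return 0;
    }
    
    *Result = Numerator / Denominator;
    return 1;
}
```

Because the macro is expanded where it's used, the file, line and function are those of the call site rather than of the macro definition.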
crshd
Registered
(25-06-2013, 01:23 PM)NeoTerra Wrote: I nominate that bottomy should be the UH professor. Seriously, those are some extremely well written posts!

I second that motion. Dude is more autistic than I. Somebody give him a custom user title already.

Most of his postings end up being gibberish to me, mostly because I don't C. But I can still tell that it's some HQ content.


bottomy
Registered
(09-01-2014, 06:18 AM)crshd Wrote:
(25-06-2013, 01:23 PM)NeoTerra Wrote: I nominate that bottomy should be the UH professor. Seriously, those are some extremely well written posts!

I second that motion. Dude is more autistic than I. Somebody give him a custom user title already.

Most of his postings end up being gibberish to me, mostly because I don't C. But I can still tell that it's some HQ content.

It's probably because most of my posts are (or at least end up as) gibberish :p
dami0
Long time nixers
It's really good to have people like you guys around. I always had more of a learning by doing approach rather than spending countless hours prowling through books. Much appreciated.
jmbi
Long time nixers
(24-07-2014, 07:23 PM)dami0 Wrote: It's really good to have people like you guys around. I always had more of a learning by doing approach rather than spending countless hours prowling through books. Much appreciated.

This. Bottomy please come back. :'(