[C] Hidden features of C - Programming On Unix

bottomy
Registered
One thing I didn't mention before, because it's not a feature of C in quite the same sense as the other features that have been mentioned, is floating point numbers. They're standard and definitely worth understanding, yet far too few people actually understand how they work.

There can be performance implications depending on the kind of value involved (e.g. NaN, inf and denormals, the latter in particular). Depending on the hardware these can be noticeably slower to process, so it's usually worth checking whether they affect performance on the platforms you're targeting and, if they do, restructuring your algorithms to avoid them. More recent hardware typically handles all of them in hardware, though, so it's less of a concern now.
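If you want to find out whether your data actually hits those cases, something like the following could be dropped into a debug build. It's only a rough sketch: is_special_value is a name I made up, and it relies on the standard fpclassify macro from math.h.

```c
#include <math.h>

/* Hypothetical helper: returns 1 for the kinds of values that may be
   slow on some hardware (NaN, infinity, denormal/subnormal), else 0. */
static int is_special_value(float x)
{
    switch (fpclassify(x)) {
        case FP_NAN:
        case FP_INFINITE:
        case FP_SUBNORMAL:
            return 1;
        default:            /* FP_NORMAL and FP_ZERO */
            return 0;
    }
}
```

Counting how often it returns 1 inside a hot loop gives a quick idea of whether it's worth restructuring anything.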


Internally a binary32 (float) is structured as a 1 bit sign flag, an 8 bit exponent (the exponent is stored encoded, and depending on its value different rules have to be applied), and a 23 bit mantissa or significand, totaling 32 bits. There is also an implicit 24th mantissa bit for exponents in the range 0 - 254: it is set for an exponent in the range 1 - 254 and not set for 0. A double (binary64) and a long double (extended, some non-standard extended format, or just double again) are structured the same way, typically differing only in the number of bits. An example of working out the value of a float from its internal structure is as follows (for convenience it still relies on the hardware for the arithmetic):

Code:
//uses a compiler extension for ranges in switch statement cases, supported by GCC and Clang, maybe others?
#include <stdio.h>
#include <stdint.h>
#include <math.h>

typedef union {
    float f;
    uint32_t u;
    int32_t i;
    struct {
        //should be noted that this isn't portable at all, so don't be surprised if it doesn't work
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign : 1;
    } internal;
} binary32_t;

void PrintAsFloat(binary32_t Val)
{
    switch (Val.internal.exponent)
    {
        case 0:
            printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, -126) * (((float)Val.internal.mantissa / (float)0x800000)));
            break;
            
        case 1 ... 254:
            printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, Val.internal.exponent - 127) * (((float)Val.internal.mantissa / (float)0x800000) + 1.0f));
            break;
            
        case 255:
            if (Val.internal.mantissa != 0)
            {
                printf("%e\n", NAN);
            }
            
            else
            {
                printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * powf(2.0f, Val.internal.exponent - 127) * (((float)Val.internal.mantissa / (float)0x800000) + 1.0f));
                //or
                //printf("%e\n", (Val.internal.sign? -1.0f : 1.0f) * INFINITY);
            }
            break;
    }
}

int main(void)
{
    //compound literal rather than a cast to a union type (which is another GCC extension)
    PrintAsFloat((binary32_t){ .f = 1.123456f });
    return 0;
}
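The same three fields can also be pulled out without bit-fields, using memcpy and shifts. This is just a sketch: float_fields and split_float are names I made up, and it still assumes float is an IEEE 754 binary32 stored with the same byte order as uint32_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a binary32. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, still biased by 127 */
    uint32_t mantissa;  /* the 23 stored bits, without the implicit bit */
} float_fields;

static float_fields split_float(float f)
{
    uint32_t bits;
    float_fields r;

    /* memcpy avoids both the bit-field layout question and aliasing issues */
    memcpy(&bits, &f, sizeof bits);

    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For example split_float(-2.5f) gives sign 1, exponent 128 and mantissa 0x200000, i.e. -1.25 * 2^(128 - 127).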


There are also important rules to understand when using them, such as how precision works. Precision is not constant; it depends on the value. The larger the value, the lower the precision. Above 2^24 (16777216) a float has less precision than a uint32_t: from that point on it can no longer represent odd numbers, and the gaps between representable values keep widening (up to UINT32_MAX and beyond), although a float can of course go far above UINT32_MAX.
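A quick way to see where the precision runs out is to round-trip integers through a float. A minimal sketch (survives_float is a made-up name, and it assumes float is binary32):

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if n is exactly representable as a float. */
static int survives_float(uint32_t n)
{
    /* volatile makes sure the value really gets rounded to float precision */
    volatile float f = (float)n;

    /* compare in 64 bits, since the float may have rounded up to 2^32 */
    return (uint64_t)f == (uint64_t)n;
}
```

With this, 16777216 and 16777218 survive the trip but 16777217 doesn't, and neither does UINT32_MAX (it rounds up to 2^32).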

Because of the precision, and subsequently because of the rounding that happens, there can be a number of problems with converting to and from double and float, as well as with equality comparisons. That's the reason you'll often see ranges being used to compare for equality. For example, one value that can't be represented exactly is 0.1, so a float 0.1 and a double 0.1 won't compare equal (the double will be more accurate, but still not exactly 0.1), so (assuming the compiler doesn't do any trickery) (0.1f == 0.1) should return false.
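A range comparison could look something like this. It's only a sketch with a relative tolerance; nearly_equal is a made-up name, and picking a sensible tolerance (and dealing with values near zero) depends on what you're doing.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if a and b differ by no more than
   rel_tol times the larger of their magnitudes. */
static int nearly_equal(double a, double b, double rel_tol)
{
    double diff    = a > b ? a - b : b - a;
    double mag_a   = a < 0 ? -a : a;
    double mag_b   = b < 0 ? -b : b;
    double largest = mag_a > mag_b ? mag_a : mag_b;

    return diff <= largest * rel_tol;
}
```

So nearly_equal(0.1f, 0.1, FLT_EPSILON) is true even though (0.1f == 0.1) isn't, while tightening the tolerance to DBL_EPSILON makes it false again.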

There are also numerous "tricks" you can do with floating point numbers. Some could potentially be an optimization, but it depends on the use and the platform (e.g. on x86, checking the signedness of a float by checking its sign bit could be an optimization if the float is currently stored in memory and not in an x87 or SSE register). Some tricks are:

Checking the signedness of the float (simply by checking the sign bit).

The same goes for changing the signedness, and so it can be very quick to compute an absolute value.
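Both sign tricks could be sketched like this. The function names are made up, memcpy is used instead of a union to sidestep layout questions, and it assumes float is binary32 with the sign in bit 31.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the sign bit is set (also true for -0.0f). */
static int sign_bit(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (int)(bits >> 31);
}

/* Absolute value: clear the sign bit. */
static float abs_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0x7FFFFFFFu;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Negation: flip the sign bit. */
static float negate_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits ^= 0x80000000u;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Whether any of this beats what the compiler generates for fabsf or a plain unary minus is something you'd have to measure.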

You can efficiently work out a simple 2 to the power of n by effectively only using the exponent field (accommodating the encoding of that field) of a float; that float will then be the result. Note this only works for n in the range -126 to 127, the exponents of normal numbers.
Code:
static inline float powOf2(int8_t Exponent)
{
    return (binary32_t){ .internal = { 0, 127 + Exponent, 0 } }.f;
}

//a little more portable, but still not fully, as endianness could still be a problem
static inline float powOf2_(int8_t Exponent)
{
    return (binary32_t){ .u = ((uint32_t)(127 + Exponent) << 23) }.f;
}


Working out an integer base 2 logarithm.
Code:
typedef union {
    double f;
    uint64_t u;
    int64_t i;
    struct {
        uint64_t mantissa : 52;
        uint64_t exponent : 11;
        uint64_t sign : 1;
    };
} binary64_t;

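//integer part of log2(x); x must be non-zero (0 has no logarithm and gives a garbage result here)
unsigned int Log2(unsigned int x)
{
    //build the double 2^52 + x by placing x directly in the low mantissa bits
    binary64_t IntLog2 = { .sign = 0, .exponent = 1023 + 52, .mantissa = x };
    //subtracting 2^52 leaves x as a normalized double
    IntLog2.f -= 4503599627370496.0;
    
    //the unbiased exponent of x is floor(log2(x))
    return IntLog2.exponent - 1023;
}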


There are even some tricks that aren't so relevant today (since hardware support for those operations is now common, and is usually faster and more accurate than these old hacks).

Such as casting from float to int.
Code:
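//adding 1.5 * 2^36 leaves the value as 16.16 fixed point in the low 32 bits of the double's mantissa
//assumes little-endian (the int32_t read picks up the low word) and breaks strict aliasing; rounds down rather than truncating
#define F2I(fp) (*(int32_t*)&(double){ (double)(fp) + 103079215104.0 } >> 16)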

printf("%d\n", F2I(15.572f));


Or performing a reciprocal square root (fast inverse square root).
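For reference, the well known version of that trick looks roughly like the following. This is a sketch from memory of the commonly circulated form, with the usual magic constant 0x5f3759df and one Newton-Raphson step; I've used memcpy instead of the original pointer cast, and fast_rsqrt is just a name I picked. It assumes float is binary32 and x is positive and normal.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1 / sqrt(x) for positive, normal x. */
static float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    /* reinterpret the float's bits as an integer */
    memcpy(&bits, &x, sizeof bits);

    /* halving and negating the exponent gives a rough first guess */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* one Newton-Raphson step to refine the guess */
    return y * (1.5f - 0.5f * x * y * y);
}
```

The result is only good to a fraction of a percent, e.g. fast_rsqrt(4.0f) comes out close to, but not exactly, 0.5.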



(19-06-2013, 04:31 PM)shix Wrote: Here's a really good slide presentation called "Some dark corners of C".

https://docs.google.com/presentation/d/1...slide=id.p

Thought I'd just go into a little more detail on some of the things they mention there (since some things deserve more of an explanation). Also, not all the things listed there are very appropriate for proper code; some things basically just belong in the hacks/obfuscated group.

Where they use array subscripting with the index and the object swapped: subscripting a[i] is actually the equivalent of (*((a)+(i))), and since addition is commutative i[a] works just as well. For multidimensional arrays the subscripts are scaled and summed together, e.g. for an array declared as a[2][3][4], a[i0][i1][i2] = *((p) + ((i0) * (3 * 4)) + ((i1) * 4) + (i2)) where p points to the first element. Written generically (typeof being a GCC/Clang extension) that is *((typeof(***a)*)a + ((i0) * (sizeof(typeof(*a)) / sizeof(typeof(***a)))) + ((i1) * (sizeof(typeof(**a)) / sizeof(typeof(***a)))) + (i2)). If fewer subscripts are used than the definition has, the result is instead a pointer (adjusted according to the indices).
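A small self-check of both points, as a sketch (subscripts_agree is a made-up name). Strictly speaking, walking the whole array through a single int pointer is a grey area in the standard, but it behaves as described on the mainstream compilers.

```c
/* Returns 1 if normal subscripting, swapped subscripting and the manual
   flattened offset all read the same element, for every element. */
static int subscripts_agree(void)
{
    int a[2][3][4];
    const int *p = &a[0][0][0];
    int i0, i1, i2, n = 0;

    /* give every element a distinct value */
    for (i0 = 0; i0 < 2; i0++)
        for (i1 = 0; i1 < 3; i1++)
            for (i2 = 0; i2 < 4; i2++)
                a[i0][i1][i2] = n++;

    for (i0 = 0; i0 < 2; i0++)
        for (i1 = 0; i1 < 3; i1++)
            for (i2 = 0; i2 < 4; i2++) {
                /* index and object swapped at every level */
                if (i2[i1[i0[a]]] != a[i0][i1][i2])
                    return 0;
                /* offsets scaled by the sizes of the inner dimensions */
                if (*(p + i0 * (3 * 4) + i1 * 4 + i2) != a[i0][i1][i2])
                    return 0;
            }

    return 1;
}
```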

On the example of pointer aliasing and the use of the restrict qualifier: the three examples in there are all technically different, but none of them is necessarily wrong, as it just depends on the intended behaviour. In the first one x, y and z could all hold the same pointer (or overlap). In the second one they take a copy of the value z points to; x, y and z could still alias as in the first, but the result could then differ from the first one because of the copy. The third one restricts z from being the same as (or overlapping) x and y, but x and y could still be aliases of one another.
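To make the difference concrete, here's a sketch of what three such functions might look like. These are my own reconstruction for illustration, not copied from the slides, and all the names are made up.

```c
/* 1: x, y and z may all alias; *z is re-read after *x is written. */
static void add_plain(int *x, int *y, int *z)
{
    *x += *z;
    *y += *z;
}

/* 2: the value z points to is copied first, so writing *x can't change it. */
static void add_copy(int *x, int *y, int *z)
{
    int tmp = *z;
    *x += tmp;
    *y += tmp;
}

/* 3: the caller promises z doesn't alias x or y (x and y still may alias). */
static void add_restrict(int *x, int *y, int *restrict z)
{
    *x += *z;
    *y += *z;
}

/* Calls f with x and z aliased and returns what ends up in *y. */
static int aliased_result(void (*f)(int *, int *, int *))
{
    int a = 1, b = 1;
    f(&a, &b, &a);
    return b;
}

/* Calls f with three distinct objects and returns *x + *y. */
static int distinct_result(void (*f)(int *, int *, int *))
{
    int a = 1, b = 1, c = 1;
    f(&a, &b, &c);
    return a + b;
}
```

With x and z aliased, aliased_result(add_plain) gives 3 whereas aliased_result(add_copy) gives 2. Passing add_restrict to aliased_result would be undefined behaviour, so it should only be called the way distinct_result does.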

The smallest compilable C program (smallest in terms of source length, not binary size) could be modified to be the smallest (again source-wise) program that both compiles and runs, depending on the target platform, as some platforms store the global variable in a memory segment with execute permission. So on the x86 architecture, at least for the most common platforms, you can achieve a runnable program by doing one of the following (0xc3 = ret instruction):
int main=0xc3;
or
const int main=0xc3;
or by telling the toolchain to make the segment executable.
e.g. on Mac OS X use the linker's -segprot option, so an example to make (int main=0xc3;) runnable:
clang blah.c -o blah -Xlinker -segprot -Xlinker __DATA -Xlinker rwx -Xlinker rwx

You can also make it smaller by following the rules of legacy C and relying on implicit int, which applies when you omit the type.


Of the books they recommend, I've only read one (Expert C Programming: Deep C Secrets). It's a pretty good book, but I suggest reading it early on rather than waiting. If you're already experienced with C you probably won't get that much out of it. I read it a long time ago; because of the word "expert" in the title I thought it was going to be very advanced and so put it off until much later, by which point it wasn't that beneficial.

