techex 2017: November 2014

Sunday, November 30, 2014

Trust

When UNIX co-progenitor and super-smarty-pants Ken Thompson was given a Turing Award, he provided a warning to those within earshot. Admins and developers often find it satisfactory to review the source code of applications to determine maliciousness. And to a certain extent, this works out all right. Over time we have built a series of expectations of where to find naughty code based on our experience. We have also chosen to trust other types of tools that we use during this process. We discriminate.

But there's no reason that bad stuff *has* to be in the applications that we expect to find it in. Yes, the clever among us know that compilers can be bad. But we check the source of our compilers and find no bad stuff, and so we assume we are safe.

We do, though, compile the compiler, don't we?

Well, all right then. Suppose some megalomaniac at Intel or somewhere far upstream decided to embed badness in the distro's compilation software. We can still look at the binary of compiled programs to determine What Is Really Going On.

We do, though, tend to use applications to help make machine code human readable, don't we?

The point to the thought experiment isn't to stop the unbelievable interchange of ideas and applications that has brought us to where we are today in modern computing. It would be impossible to manually read the machine code of all of the applications that we use and still function in the workplace and our other communities as we are expected to.

Rather, we should be aware of the trust that we place in our tools. We should be aware that when we set out to solve or review problems, we take certain things for granted.

The point is not that we should never trust. The point is that we should make trust decisions willfully, based on reasonable deductions and facts, not impulse or ease of use.

I came across Thompson's lecture today in a CS class, and it's still as relevant today as it ever was. You can read the whole thing here.

Saturday, November 29, 2014

Chess, Encryption and Comic Books (Mind MGMT)

Lately, I've been hooked on a brilliant comic book from genius Matt Kindt, called Mind MGMT. In a nutshell, Mind MGMT follows a cold war era intelligence service based on the conceit that Men Who Stare at Goats-style ESP spook tactics work, and have silently and secretly played a role in the machinations of world politics throughout the 20th century. Mind MGMT is really clever, the art is striking and the whole business is worth a read on its own.

Part of the fun of the comic book is that the creators seamlessly weave the sort of subliminal messaging they use in the plot, into the layout of the comic itself. Fake advertisements in the back of issues contain hidden text, while the margins themselves are formatted like Scantron documents with little limericks where the dotted "fold here" lines usually go.

Just today I read through issue 23, which opens with a tale of a man gifted with the aforementioned spying super-powers; a reclusive Bobby Fischer type who communicates with the world through messages encoded in the notation of championship chess games. Have a look for yourself:

MIND MGMT, Josh Wieder, comic book, chess, encryption

The story fills in the picture a bit, while also providing a series of six chess boards with notation beneath each one. I don't want to spoil the fun of decoding the image for you - what do you think the chess boards spell out? Some things to consider - does each board, or does each notation spell a unique character? Does every board / notation spell the same character every time?

Anyway, seeing this inspired me. I don't have much in the way of formal education in cryptography, but even I know that chess boards have been used for cryptography before. What I think would be cool is creating a simple program that would allow you to export a chess board from a computer chess game and use it as part of a cipher for an encryption system. There's even been some more recent publishing being done with chess board cryptosystems (which I have yet to read ... I've got a lot on my plate lately).

Not necessarily the most practical project but IMO a fun distraction / way to sharpen development skills for integrating ciphers into applications.

Friday, November 28, 2014

Programming in C Chapter V - Typecasting

In its simplest sense, typecasting is altering a computer's interpretation of data by implicitly or explicitly changing its data type; for example, by changing an `int` to a `float` or vice versa.

To better understand typecasting, we must start with data types themselves. In programming languages like C, every variable has a `type` that determines how the computer and the user interpret that variable. Data types such as `int`, `long long`, `float` and `double` each have their own characteristics and are used to handle values of various ranges and precision.

Typecasting allows us to take a floating point number, like 3.14, and keep only the part before the decimal point - 3 - by casting it to an `int`.

Let's use an example from the English language to clarify what we mean.

example.

WIND

Each carefully manipulated line in the example above forms a unique symbol. However, these symbols are immediately identifiable as letters to anyone who reads a language written in the Latin alphabet. We implicitly understand the data type `letter`.

Even more interesting, reviewing the string of `letter` data type symbols composing the example above, we can see that two very different, specific data types are formed. Each of the two words that are formed has a completely different meaning, connotation, pronunciation and history.

There is the noun wind, as in: "The wind blows outside". Yet there is also the verb wind, as in: "Wind up that spool".

This is a valuable analogy inasmuch as it leads us to understand that how we type the data determines how we use that data. The `noun` data type of WIND ought to be used in very different circumstances than the `verb` data type of WIND.

Setting aside more advanced topics such as Natural Language Processing for a moment, let's take for granted that computers do not care about English grammar. Computer programming languages, such as C, rely on the same idea - taking the same bit of data, and using it very differently based on how we cast the `type` of that data.

Here are the most common data types and their typical sizes on a 32 bit operating system:

1 byte : char
4 bytes : int, float
8 bytes : long long, double

Each byte represents 8 bits of memory storage. Thus, a variable of type `int` will use 32 bits of memory when it is stored. As long as that variable remains of type `int`, the processor will always be able to convert those bits back to the relevant number. However, we could in theory treat those same 32 bits as a series of 32 Boolean values. As a result, the computer would no longer see a number in that address space, but an altogether different sequence of binary values. We could also try to read that data as a different numeric type, or even as a string of four characters.

When dealing with numbers and type casting, it is vital to understand how the *precision* of your value will be affected. Keep in mind that the precision can stay the same, or you can lose precision - as in our float 3.14 to int 3 example at the very beginning of our discussion. You cannot, however, gain precision. The data to do so simply does not exist in the addressed memory space you would be attempting to pull it from.

Let's review the 3 most common ways you can lose precision.

Casting a float to an int would cause truncation of everything after the decimal point, leaving us with a whole number.

To perform a float to int conversion, we can perform the following simple operation:

example.

float -> int
float x = 3.7;
(int)x;

In the scenario above, (int)x = 3, because we will have truncated all values after the decimal point.

We can also convert a long long to an int:

long long -> int

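Unlike the float example, this loses the higher-order bits rather than the fractional part. A long long takes up 8 bytes or 64 bits in memory, while an int takes up only 4 bytes or 32 bits, so any value too large for an int cannot survive the conversion intact.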

Similarly a double can be cast to a float:

double -> float

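This will give you a float close to the double's value; on most systems it is rounded to the nearest float that can be represented. A double will allow you to store 53 significant bits, or roughly 15 to 16 significant decimal digits, while a float has 24 significant bits, or roughly 7 decimal digits.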

Because floats can only store 24 significant bits, they can only represent every whole number exactly up to the value of 2^24 AKA two to the power of twenty-four AKA 16777216. The next whole number, 16777217, cannot be stored in a float.

EXPLICIT VS IMPLICIT CASTING

Explicit casting is when we write the data type in parentheses before the variable name, like so:

(int)x -> explicit casting

Implicit casting is when the compiler automatically changes similar types to a "super-type", or performs some other form of casting without requiring any additional code from the user to perform the operation.

For example when we write the following:

5 + 1.1 -> implicit casting

The values already have types associated with them. 5 is an `int`, while 1.1 is a `double` (in C, a floating point literal without an `f` suffix is a `double`). In order to add the two of them together, the computer implicitly casts the `int` 5 into a `double`:

(double)5 + 1.1 -> implicit casting

Implicit casting also allows us to assign variables of different types to each other. We can always assign a less precise type into a more precise one. For instance:

example.

double x;
int y;

In this example, `x = y` is safe, because a `double` has more precision than a 32 bit `int` and can hold any value stored in `y`. On the other hand, it would be problematic to say `y = x`, because `x` might have a fractional part or a larger value than `y` can represent, so `y` may not be able to hold all of the information stored in the double.

Type casting is also used in comparison operators such as:

< LESS THAN
> GREATER THAN
== EQUAL TO

example.

if (5.1 > 5)

The condition above evaluates to 1 (true), because the compiler will implicitly cast 5 to a `double` in order to compare the two numbers. The same would be true of this example as well:

if(2.0 == 2)

Also, don't forget that `int`s can be cast to `char`s and back. A `char` is stored in binary as a small number, its ASCII value, which is why you can easily convert between `char`s and their respective ASCII values.

Programming in C Chapter IV - Precedence

Precedence is how we answer the question: What operations should we perform first? Whether in solving mathematical equations or writing source code, strict procedural rules of precedence allow the same operations to produce the same results every time.

The first rule of precedence in the C programming language (and many others) is that we always work from the innermost parentheses outward. This is particularly important to remember during bug-testing. Adding parentheses can be a good debugging tactic, but it is bad form to litter your code with unneeded parentheses.

The second rule is that when arithmetic operators have equal priority, we simply solve from left to right.

With simple arithmetic, precedence or order of operations conforms to PEMDAS - from first to last, in pairs: parentheses and exponents, multiplication and division, and finally addition and subtraction. Multiplication and division share the same precedence in this scenario because, functionally, they are the same operation. After all, division is merely multiplying by the inverse of the denominator. Similarly subtraction can be seen as merely adding a negative value.

example.
3 + 10 / 2 * (9 - 1) - 1
|                          --> we start with the parentheses
|
3 + 10 / 2 * 8 - 1         --> * and / are equal, so we do
|                              both but from left to right
|                              starting with /
3 + 5 * 8 - 1
|                          --> then we multiply
|
3 + 40 - 1                 --> + and - are equal, so again
|                              we work from left to right
|
43 - 1
|                          --> last we subtract the two
|                              remaining numbers
42

In C, there are two types of increment/decrement operators:

*Prefix Form*: ++i or --i
*Suffix Form*: i++ or i--

The *Suffix Form* is commonly used in "For" loops. It represents the idea that the current value is used first, and then that value is incremented. The value will only be different the NEXT time the variable is used.

By contrast, using the *Prefix Form* the variable is incremented FIRST, and then it is used in the expression.

Let us consider an example using the integer `x`, which for the purposes of this example we will set to `5`.

example.

int x = 5;

x++;   /* the expression x++ evaluates to the old value:
          if we were to print x++ itself
          we would get the value 5 */
x;     // here, on this line, x = 6
++x;   /* using the prefix increment, the value is
          incremented first - so on this line the
          value would be 7 */

POINTER NOTATION

The last precedence issue that we will consider deals with pointer notation. The Dereference Operator - * - has priority over basic math operators, but not over the suffix increment and decrement operators. This leads us to our final example.

example.

int x = 7;     // we create an integer, x, and set it to 7
int *y = &x;   /* we create a pointer, y, and assign it
                  the address of x. When we dereference
                  y we should get the number assigned to x
                  which in this case is 7 */
*y++;          /* this situation appears to be ambiguous.
                  do we dereference the pointer to 7, and
                  then increment? Or do we increment first
                  and then dereference? In reality, because
                  the dereference operator does not have
                  precedence over suffix increments, the
                  pointer address itself is incremented -
                  leading to y being pointed away from x
                  and into a completely different segment
                  of memory entirely! */
y = &x;        // point y back at x before going on
(*y)++;        /* this is the solution to the conundrum
                  presented by *y++. (*y) dereferences
                  first to the value of x, which is 7
                  ONLY THEN does it increment, to 8 */

To recap, our Precedence can take list form as follows:

1. Follow Inner Parentheses Outward
2. i++, i-- Suffix operators
3. *x, &x Dereference and Address operators,
   together with ++i, --i Prefix operators
4. *, /, % Simple math operations mult, div, remainder
5. +, - Simple math operations add, subtract

Programming in C Chapter III - Boolean Values & Operators

Today we will learn a bit about Boolean values, operators and expressions.

Boolean values and conditions are named after the 19th century mathematician and logician George Boole, who pioneered a field of logic now referred to as Boolean logic, which is based upon grouping and comparing *Boolean values*.

*Boolean Value* - a variable that has two possible conditions; TRUE and FALSE.
Similar to a light switch that can be either on or off, or how binary
numbers can be either 1 or 0.

Boolean values seem fairly simple on the surface. However, they can be combined into expressions of nearly unlimited complexity.

*Boolean Operator* - A Boolean Operator combines one or two Boolean values into a single value. The two most common such operators are AND and OR, but there are additional operators we will explore as well.

*AND* - results in a value of TRUE ONLY if BOTH input values
are TRUE.

examples.
false AND true = false
false AND false = false
true AND true = true

*OR* - results in a value of TRUE if EITHER of the
operator's input values are TRUE

example.
false OR true = true
true OR true = true
false OR false = false

*NOT* - accepts a Boolean Variable and provides the
opposite of that variable.

example.
NOT true = false
NOT false = true

Combining variables and operators into a single statement results in a *Boolean Expression*. Boolean Expressions can be *nested* to provide more complex outcomes. As numbers are prioritized and grouped using the arithmetical order of operations (PEMDAS), Boolean expressions can be grouped using parentheses.

example.

x=TRUE
y=TRUE
z=TRUE
x AND (y OR (NOT z)) ---> this is 3 nested expressions
|
|
x AND (y OR FALSE) ---> evaluate the innermost expression 1st
|
|
x AND TRUE ---> then the next one out: TRUE OR FALSE is TRUE
|
|
TRUE AND TRUE = TRUE ---> finally the whole expression resolves to
TRUE (once its variables are set, every Boolean
expression resolves to either TRUE or FALSE)

Next, let us review how these nested expressions can be used in a programming language. In C, the syntax for Boolean expressions is different than simply using the words AND, OR and NOT. The translation is as follows:

AND = &&
OR = ||
NOT = !

The nested expressions from our previous example can be translated into C rather simply:

initial example.

x AND (y OR (NOT z))

translated to C.

x && (y || (!z))

Now that we know how to translate Boolean expressions into C, how do we use it? Consider the example of a script that should execute only if a given variable is TRUE. For this purpose, nearly all programming languages provide support for the IF condition.

example in pseudocode.

IF x=true
THEN do Y

example in C.

bool x = true; // bool and true need #include <stdbool.h>
if (x)
{
// do Y
}

What if we wish to provide an alternative branch, to assign the script a set of instructions for when x IS NOT true? In these circumstances, we would append ELSE to our IF condition (forming an IF...ELSE condition)

example in C.
bool x=true;
if (x)
{
// do Y
}
else
{
// do Z
}

Another important function is the ELSE IF condition. Using else if, we can branch application behavior in more than two directions.

Suppose you have two Boolean values you would like your script to consider, called X and Y. We want our program to do one thing if X and Y are both TRUE. We want the program to do another thing if only one of X or Y is TRUE. And finally, we want the program to do a third thing for every other combination of truth statements for X and Y not covered in the first two conditions.

We can express such an application in C using the following code and relying on ELSE IF (in this example we declare X and Y to be TRUE and FALSE, respectively)

example.

bool x = true;
bool y = false;
if (x && y)
{
//code
}
else if (x || y)
{
//code
}
else
{
//code
}

Working with variables in this manner is useful, but we appear to be limited to only a few conditions. Booleans become much more powerful when we introduce *comparisons*. *Comparisons* are ways to evaluate values that are not originally declared as Boolean.

To see if two values are the same, we use: ==

== returns TRUE if the values are equal and FALSE if they are not.

Other common comparisons are:

LESS THAN : <
GREATER THAN : >
LESS THAN OR EQUAL TO : <=
GREATER THAN OR EQUAL TO : >=

Using one last concrete example, let's combine ALL of the topics related to Boolean values we have covered so far, as well as a few topics from our previous discussions as well.

Suppose we have two variables: `temperature` and `hungry`. `temperature` is a floating point number that can have decimal places, and `hungry` is a Boolean. This example is a very simple program to tell someone what to eat based on the temperature.

example.

if (hungry && (temperature >= 100))
{
printf("eat ice cream");
}
else if (hungry && (temperature <= 0))
{
printf("eat spicy food");
}
else if (!hungry)
{
printf("don't eat anything");
}

That's all for today's lesson on Booleans. Try writing your own program using a Boolean value in a loop, such that when the Boolean value changes, the loop terminates.

Thursday, November 27, 2014

Tamir Rice Video Casts Doubt on Statements from Police

There seems to be a great deal of confusion about what happened between Tamir Rice, a 12 year old who was playing in a park with a BB gun, and the police officer who killed him.

Take, for example, this:

Quite a few members of the "public at large" seem to be convinced that young Tamir Rice was brandishing a convincing pistol replica at the police. The police, after begging Rice to lay down his weapon multiple times, were forced to open fire when young Tamir made some sort of furtive movement toward his waist band, in which this make-believe pistol was ensconced.

While I find it quite troubling that so many of our fellow citizens find it reasonable to leap to the defense of today's police force immediately after they gun down a pre-pubescent child, perhaps in this instance the Public can be forgiven. After all, the narrative described above has largely been formed from police statements of what happened.

Here's the police version: A man calls 911, informing them that someone is brandishing a pistol in the park. We also know this man told 911 dispatchers that he believed the gun was fake. Police claim that they were not informed of this key detail - which, frankly, should be a controversy in and of itself. As we will see, though, it's not the worst of what happened.

Shortly after the call, police responded on the scene. The police said the officer yelled at Tamir three times to show his hands, but the boy instead reached to his waistband for the object, which turned out to be a fake gun.

It's this last detail that is really the clincher. Based on the police description of events, their response was still tragic, but reasonable. Police told Tamir to raise his hands three times. The boy failed to respond, and instead reached for what the police had been told was a gun. They then killed him.

The issue is that the officer's description of events is, at worst, an outright lie and at best an intentionally misleading misrepresentation of events. Fortunately, the Rice family was able to receive and make public a video of the park that completely captures the events. This video is below. I would encourage readers to watch the video; the shooting begins at time code 7:00.

As can be clearly seen in the video, police arrive on the scene by nearly bowling over Tamir Rice with their car. The officer on the right opens his car door and immediately opens fire. The New York Times has reported that the length of time that elapsed during this period is 2 seconds.

2 seconds is enough time to open a car door and shoot someone. 2 seconds is not long enough to order someone to raise their hands and for that person to respond to even one such warning. Particularly when the person responding is a 12 year old child who has nearly been struck by a car and is no doubt completely shocked and terrified about what is occurring. There is no way that police told Rice to raise his hands 3 times; 2 seconds is simply not enough time for that to occur.

It is my hope that those who currently support the police narrative of events watch this video. It is my hope that people will question why the police statement is completely irreconcilable with the events in this video. It is my hope that people will question why the image of a child with a toy pop gun is an event worthy of calling the police, when in the recent past such an image was as American as Apple Pie. Does the boy below strike you as being a legitimate threat to law enforcement?

The Justice system in the United States is broken. In our fear we have built a machine that destroys lives and devours children. We ought to pause, now, to consider what can be done to stop it.

Tuesday, November 25, 2014

How To Find Files Over a Certain Size Using Redhat/CentOS/Fedora Linux

Here is a quick tip for all of those Redhat/CentOS/Fedora users out there. Do you need to find all files over a certain size, either in a specific directory, your current directory, or in your entire computer/server?

No problem, just execute the following:

find / -type f -size +500000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

In the example above, I am looking for all files over roughly 500MB in size (500000k, where k = kilobytes of 1024 bytes). The place where I have typed "/" in the above command indicates the path to search in. By selecting "/" I am searching in the entire filesystem; I could easily indicate a specific directory by changing my command as follows:

find /path/to/my/directory -type f -size +500000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

Alternatively, I could search in my current directory by replacing "/" with "." like so:

find . -type f -size +500000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

Easy!

Thursday, November 13, 2014

The FBI's Letter to Martin Luther King Jr - Uncensored for the First Time

The vile letter above speaks for itself. The letter, published for the first time fully unredacted by the New York Times yesterday, was sent by the FBI to Martin Luther King Jr. in order to compel him to commit suicide. The letter was apparently enclosed with a recording that the FBI believed could prove King was cheating on his wife; the impetus for their suicide demand.

The sections that have for decades remained redacted explicitly reference the tape - it becomes apparent that the only possible reason for censoring this material is that it contains proof that the FBI was conducting warrantless surveillance of US citizens for purely political reasons.

Fortunately we live in an enlightened age where such warrantless eavesdropping is merely a curio of the distant past. Oh, wait ...

I Ran Windows 7 Updates and My Desktop Went Completely Black! What Do I Do?!

So last night (11-12-14) or this morning you ran a package of `Important` Windows Cumulative Security Updates. Gee those do sound important! There were about 11 or so - specifically, the ones most likely to give you trouble are these:

Update for Windows 7 for x64-based Systems (KB3008627)
Security Update for Windows 7 for x64-based Systems (KB3003743)
Security Update for Windows 7 for x64-based Systems (KB2993958)
Security Update for Windows 7 for x64-based Systems (KB2991963)
Security Update for Windows 7 for x64-based Systems (KB3005607)
Security Update for Windows 7 for x64-based Systems (KB2992611)
Security Update for Windows 7 for x64-based Systems (KB3010788)
Security Update for Windows 7 for x64-based Systems (KB3002885)
Security Update for Windows 7 for x64-based Systems (KB3006226)

After diligently downloading and installing these updates, you allow your computer to reboot. The boot process goes smoothly, you log into your computer, only to find a stark black screen greeting you. Your entire intricately-designed array of desktop icons is gone. Your Desktop Image is replaced by an inky black nothing. Worse even than the blackness of space - even space has stars.

The frank obituary to your beautiful desktop's demise is the following:

C:\windows\system32\config\systemprofile\Desktop refers to a location that is unavailable. It could be on a hard drive on this computer or on a network. Check to make sure the disk is properly inserted.

And it's not just the icons or the desktop. Trying to search for an item from the Start menu will produce an error along these lines (where searchstring is whatever you typed in the taskbar):

"Windows cannot find `search:query=searchstring` Make sure you typed the name correctly, and then try again."

Microsoft's Mouse and Keyboard Center failed to load completely; despite this, my laptop's USB mouse and embedded touch pad functioned properly.

Even non-Windows related applications will have problems. When I encountered this error, I had to launch Google Chrome as an Administrator in order to get it to run. I also use an incredibly handy text editor in Windows called Notepad++. Notepad++ is an ingeniously formatted gem of a Windows text editor; it can open text files that are sized well into the tens of megabytes without crashing, and it color-codes scripted text for programming. It's awesome. Use it; it's free. Anyway, launching Notepad++ also produced errors; the application was unable to find a variety of XML configuration files.

We've established that this problem sucks. So how do we fix it?

First, you may have problems opening a Command Prompt due to the `search:query` error mentioned above. If you have PowerShell installed, use that - it will save time and headaches and can function exactly as a normal command prompt would. If installed, you can typically find PowerShell in the Start Menu by navigating to All Programs -> Accessories -> Windows PowerShell.

If you do not have Powershell installed you will have to suffer through by opening a window from `My Computer` by navigating through the Start Menu: click Start and then Computer. It is likely that each time you open a window in this manner a new error message will be produced telling you that your systemprofile\Desktop is missing. You can ignore the error, clicking OK to remove it and proceed.

From here on we will be using an example in which the username is Josh. On your computer you will of course replace Josh with your own username.

Whichever method you use (PowerShell or a window), navigate to the User directory, which for the purposes of this tutorial will be C:\Users\Josh

We are here to check first and foremost that your actual Desktop folder and files still exist. Click or `cd` to the Desktop folder and take a quick look to ensure that everything is still there. If it is, proceed with the tutorial. If your Desktop folder is missing, then stop here - the issue I am describing should not have caused the deletion of your entire Desktop. You will need to restore these files from backup before continuing to troubleshoot; I hope you kept a backup!

Anyway, for those of us who found that C:\Users\Josh\Desktop exists and is populated with files, we will then navigate to the root of the problem: C:\windows\system32\config\systemprofile\

In this directory you are likely to find three items: Two folders, one named "AppData" and the other named "Contacts". The third item will likely be ntuser.dat - although it may be hidden if your Windows folder settings are configured to "hide protected operating system files".

FYI don't be a wimp - BE A POWER USER and go to Organize -> Folder and Search Options -> View tab. From there UNclick "Hide protected operating system files" and select the radio button next to "Show hidden files, folders, and drives". Once you have done this you will notice that a new universe of system files is now available for your perusal. I offer less experienced users this tidbit with the explicit promise that they will refrain from two things:

1. DO NOT Delete Files Because You "Don't Know What They Do".
Only Delete Files That You Fully Understand.

2. When You Encounter an Esoteric File DO NOT Search the Name of That File in Google.
All of the Websites in Google Will Tell you it is a Virus and Compel You to Purchase Their
Magical Program to Remove Said Virus. To Understand System Files, You Must RTFM and
Other Actual Books. Like From a Library Books.

Anyway back to the fix. Within the directory C:\windows\system32\config\systemprofile\ you must create a new directory (by right-clicking and selecting New -> Folder or issuing the command mkdir in PowerShell) and name it "Desktop".

Immediately after creating this directory you may notice that some icons have appeared on the black-as-death desktop. However, these won't include your normal icons, and your desktop image as well as any items you have stuck to the taskbar will not have reappeared. That's okay - right now you are relying on a broken copy of the "Public" Desktop, and the appearance of icons is a signal that the Public profile is getting better.

To finally get all of your profile settings back, along with the precious icons and desktop doo-dads, simply go to Start, hover your mouse over the arrow next to "shut down" (not shut down, just the arrow) and click "Log off" from the resulting contextual menu.

You will be prompted to Log back into your account. Do so, and you will find that everything in your desktop is back to normal. Enjoy!

Wednesday, November 12, 2014

Programming in C - Chapter II - It Really IS Rocket Science

Problems arise with numerical expression in computing. In reality, there are an infinite number of real numbers. However there is clearly not an infinite amount of memory even in the largest of super-computers, and memory that is addressable by an application is only a fraction of the total finite available memory. How do we deal with these obstacles? We will explain more in a moment.

First let's review in more detail how the C compiler handles numeric types. Consider the application below:

#include <stdio.h>

int main (void)
{
    float f = 1 / 10;
    printf("%.2f\n", f);
    return 0;
}

Here we declare a float and assign it 1/10, which should clearly resolve to 0.1 (or 0.10, since I am telling printf to print a float with two digits after the decimal point). However, upon compilation and execution the program will stubbornly print a value of "0.00".

Why?

The issue is that I am initializing the float with an operation on two integers - 1 and 10. Because both operands are of type int, the compiler performs integer division, which throws out EVERYTHING after the decimal point (without so much as performing a round function) and produces the int value 0. The truncation occurs before the result is stored in memory. Only afterwards does the compiler perform an *implicit type conversion* of that int 0 into the float 0.0 so that it can be stored in "f". Converting an int to a float is perfectly legal, so the compiler presents no error about a type conflict; by the time the value reaches "f", the fraction is already gone.

How do we resolve this? The easiest solution is to write the operands as floating point values instead of integers, like so:


#include <stdio.h>

int main (void)
{
    float f = 1.0 / 10.0;
    printf("%.2f\n", f);
    return 0;
}

Which produces:

#: ./float0
0.10

That is not the only resolution, however. We can explicitly cast the input integers into floats by providing the preferred variable type in parentheses () directly in front of the input to be "cast" into the correct type. This is called *explicit typecasting*:


#include <stdio.h>

int main (void)
{
    float f = (float) 1 / (float) 10;
    printf("%.2f\n", f);
    return 0;
}

Note that 1 and 10 could just as easily be actual variables in the above example.

Let's return to an earlier example, in which we changed the int inputs 1 and 10 into the floating point inputs 1.0 and 10.0. What happens when we look more closely at the output - for instance, by drawing out the decimal point to 20 places instead of 2, like so?


#include <stdio.h>

int main (void)
{
    float f = 1.0 / 10.0;
    printf("%.20f\n", f);
    return 0;
}

The output becomes imprecise:

#: ./float1
0.10000000149011611938

Why? Floats, like all of our other variable types, are stored in a finite amount of memory, not an arbitrary amount. Thus only a finite set of real numbers can be represented exactly by a float, as we discussed above; every other value is rounded to the nearest number the float can represent. The imprecision that we see here is a result of that limitation.

Put another way, we begin to see a core paradox in computing limitation:

There is an INFINITE amount of numbers that computers CANNOT represent with a FINITE amount of bits.

This limitation has real world consequences. The lesson proceeds with a video from the show "Modern Marvels: Engineering Disasters". The video described the June 4th, 1996 launch of an unmanned Ariane 5 rocket carrying satellites designed to determine precisely how the Earth's magnetic field interacts with solar winds. The rocket was built for the European Space Agency, and was launched from a facility in French Guiana.

37 seconds into the flight, engineers responsible for the launch first determined that something was wrong - the rocket's nozzles were swivelling in a way that they should not have been.

40 seconds into the flight it became clear that the vehicle was in trouble and might not survive the launch. Mission control at that point made a decision to destroy the rocket completely. The rocket, if it had been allowed to fail on its own, could have become a hazard to public safety as parts and components rained down onto homes and onlookers below.

This was the very first launch of the Ariane 5 class rocket, and the failure that occurred was the result of a software issue.

The Ariane 5's software contained a number that required 64 bits to express correctly: a 64 bit floating point value. The developers intended to convert this into a 16 bit signed integer. They assumed that the number would never be very big; most of the digits in that 64 bit number would be zeroes. The assumptions about the size of the 64 bit number were wrong, and the value no longer fit in 16 bits.

Most of the software from the Ariane 5 was originally designed for the Ariane 4, in which the software had been successful. The software was carried over to the new model as it had posed no problems in the old model. However, there was a key difference between the two rockets. The Ariane 5 accelerated much faster than the Ariane 4. The 64 bit number described above was a function of acceleration; numbers that remained "mostly zeroes" in the older model were no longer so in the new, faster model.

The Ariane 5 was not the first rocket in which data conversion (or "type casting") errors played a role. In 1991, with the start of the first Gulf War, the famous American PATRIOT Missile manufactured by Raytheon Corporation experienced a failure similar to that of the Ariane 5. As a result of the PATRIOT Missile failure, 28 American soldiers were killed and approximately 100 others wounded when a PATRIOT Missile designed to target and destroy incoming Iraqi SCUD missiles did not fire.

The PATRIOT Interceptor was deployed to protect Saudi Arabia and Israel from Iraqi SCUDs in 1991. The PATRIOT is a medium range Surface to Air rocket. The Orwellian acronym stands for Phased Array TRacking Intercept Of Target. The missile is often loaded into a carriage delivery system that is mounted onto the back of a truck. It is designed to be portable; the PATRIOT is 20 feet long and weighs 2000 pounds - while the payload itself is a mere 150 pounds. That payload is a high explosive fragmentation device; the casing of the warhead is designed to act as buckshot. The missiles are packed in a container that holds four and loads onto the back of a semi trailer.

PATRIOTs have a long history, stretching back 20 years before they obtained world-fame through the marketing efforts of George Bush Sr. Originally they were thought of as an air-defense battery, to shoot down planes. The anti-missile capability was a new feature, one implemented to fulfill the military's unique mandates in the Gulf during Round 1 of the Hussein v Bush saga.

SCUDs fly much faster than the average plane: Mach 5. But reaching the right speed was only part of the problem of the sort of upgrade that Raytheon needed to pull off to keep their Pentagon paymasters happy. When the PATRIOT was rushed into service, Raytheon was unaware that the Iraqi military had modified their fleet of SCUDs, making interception much more difficult; close to impossible. Ironically enough, the Iraqi modifications were not intended to defeat interception, but to increase the range of the missiles from their original 300km to 600km.

The modified SCUDs would `wobble` as they flew inbound to their target, maintaining an unstable trajectory. To increase the range of the rockets, the Iraqis had taken weight out of the warhead, which was loaded onto the tip of the SCUD. This unstable trajectory meant that in the overwhelming majority of cases, PATRIOT missiles would fly right past the SCUD, missing it entirely.

Once operators of the PATRIOT realized they had missed their mark (which was close to 100% of the time), they detonated the payload remotely as did the mission control for the Ariane 5 - except in this case the operators could not care less about the safety of those below the PATRIOT; they were concerned that allowing the PATRIOT to land would allow the Iraqis to salvage components. Some might view this concern as strange, given that the PATRIOT was at this point completely worthless. Perhaps the Pentagon feared that Iraqi military engineers could succeed where Raytheon Corporation had failed? Not quite - the Pentagon filmed these remote detonations and provided them to the credulous Press, who released the footage of giant airborne fireballs while declaring breathlessly that the film represented yet another victorious interception by the PATRIOT missile of one of Saddam Hussein's diabolical SCUD warheads. These clips, filmed exclusively in night-vision, had a blinding glare during the PATRIOT remote detonation; it was during that moment, when the screen was all greenish-white, that the SCUD streaked past, completely undamaged. When the film began again, viewers saw the night sky, empty except for a shower of sparks and debris that they assumed was the mingled destruction of both missiles but contained only the destroyed PATRIOT.

What was in Iraq a failure rate of 100% became a success rate of 100% for the TV audience back in the United States. Support soared for the glorious campaign and its fearless leader, the commander-in-chief.

But you can only fool all of the people some of the time. For Gulf War I, the fakery came to a startling and sudden end one night in the desert of Dhahran in Saudi Arabia. There, even the Pentagon spin doctors were not bold enough to replace the uniforms of the dead. There could be no mis-understanding; the PATRIOT missile was a complete and utter failure, a failure that left 28 young Americans dead.

In Dhahran a PATRIOT battery's radar system lost track of an incoming SCUD. The PATRIOT interceptor did worse than miss the incoming missile; it never launched at all. Because of a programming error.

It was clear to anyone paying attention and with the proper access that the Pentagon was completely un-bothered by the apparent madness of conducting a military campaign in which the primary armament was essentially a plastic pistol that when fired produced a flag printed with the cartoon word "BANG!". So it should be no surprise to us that it was not the Pentagon who pointed out and resolved the software glitch killing American soldiers, but the Israelis. The Israeli military first caught on that the longer a PATRIOT missile system remained on, the larger the time discrepancies in the targeting systems became. The time discrepancies were the result of a clock application in the targeting computer.

Two weeks before the debacle in Dhahran, the Israelis reported to the Defense Department that the PATRIOT computers "lost" time. After about 8 hours of operation, the system became significantly less accurate. DoD, in their infinite wisdom, responded to the Israeli warning by telling all PATRIOT operators to regularly reboot their targeting computers. The DoD failed to specify how long the PATRIOT should remain online before a reboot; this crucial detail was left as an exercise for the reader. 8 hours? 10 hours? 1000 hours? The operators did not know what the Israelis knew - that there was a very specific window of operation after which the PATRIOT would fail to even appear functional.

The PATRIOT missile battery in Dhahran had been online continuously for 100 hours on the night of February 25th, 1991.

The targeting computer clock was designed to track time to an accuracy of 1/10th of a second. Unfortunately, the fraction 1/10 cannot be expressed exactly in binary; just as some fractions repeat forever in decimal, 1/10 repeats forever in base 2.

For example, let's consider 1/3 (or one third). As a decimal number, we cannot exactly express 1/3 - it is a repeating decimal, 0.333333[...], where the 3 repeats infinitely. Because computers have a finite amount of memory, an infinitely repeating value cannot be represented exactly, whether it is 1/3 written in decimal or 1/10 written in binary.

It is this issue that impacted the PATRIOT as it attempted to calculate values represented by fragments of a second. And due to the design of the PATRIOT's flawed application, the errors compounded over time.

After 100 hours of continuous operation, the time measurement flaw had compounded to a misrepresentation of time values by one third of a second. Not long under ordinary circumstances - but a lifetime for a SCUD missile travelling at Mach 5. At Mach 5 a targeting error of 1/3 of 1 second resulted in a trajectory miscalculation of 600 meters.

Immediately preceding the Dhahran catastrophe, a SCUD launch was detected by early warning satellites orbiting the Earth. The satellites were able to predict a general trajectory of Dhahran for the location of the SCUD's impact, but not exactly where it would hit. The PATRIOT's radar system was designed to calculate the missing part of the trajectory and fire an interception missile. The PATRIOT radar system would send a radar pulse, and based on that pulse was designed to form a prediction of the location of the SCUD at the time of the next radar pulse. Such a prediction simulation is referred to as a "range gate". The PATRIOT would then correct its next prediction based on the results, and do so several times; this prediction information would form the basis of the PATRIOT missile's own targeting solution. After all, the PATRIOT missile had to be pointed at where the SCUD *would* be after a given interval of time based on the speed of both the PATRIOT and SCUD; not where the SCUD *was* at the time of the launch of the PATRIOT missile.

It is not difficult to see how absolutely vital an accurate clock is to the proper function of such a system. Even worse, there was an additional "feature" of the PATRIOT radar system that ensured that the malfunctioning clock would result in disaster at Dhahran.

At any given time, there are a multitude of objects flying through the air. Birds. Clouds. In a war zone, there are friendly military aircraft. On top of this is the fact that radar is not, in itself, a flawless data input system. Radar systems are known to produce blips that do not correspond to physical objects for all sorts of reasons.

As such, the PATRIOT system had to have a function for error detection - it just wouldn't do to launch a 2,000 pound missile at a flock of geese. To this end, the targeting computer had a window during which trajectory calculation would occur. If an object remained in this window, trajectory calculation would continue. If, however, an object did not appear within the window, then the PATRIOT would dismiss further calculation and label prior input as a false positive. With such a function, again, correct time calculation is absolutely critical. At Dhahran, the broken time calculator led the PATRIOT battery to look in the wrong areas to calculate its range gate. When the SCUD failed to appear within the range gate, the PATRIOT dismissed prior input as a false positive; when in fact the SCUD was still on its way, but one third of one second outside of where the PATRIOT was looking for it.

The incoming SCUD bypassed the PATRIOT defense battery, slamming directly into its intended target - a military barracks housing hundreds of American soldiers.

All because of incompetent type casting within the PATRIOT targeting application; and furthermore due to the absolutely unethical handling of those software issues by the Defense Department, who knew long in advance of the critical failures of the PATRIOT system but did not take proper action to inform the soldiers whose lives depended on the PATRIOT. Perhaps Pentagon officials feared that explicit orders to replace the software would cause a publicity disaster for those involved with the procurement of the PATRIOT system and their friends at Raytheon. There is no other explanation for the DoD's actions, besides perhaps acknowledging that the DoD cannot function in even the most basic and vital of bureaucratic tasks - that the Pentagon excels in buying war toys and marketing wars but is completely incompetent in the waging of war. Both options, as of the date of this writing, strike the author as equally feasible.

The SCUD missile that destroyed the Dhahran military barracks in Saudi Arabia was the last SCUD successfully fired by Saddam Hussein during the first Gulf War.

Perhaps I am being too harsh on the Pentagon and their pals at Raytheon. A software patch for the PATRIOT missile targeting system was in fact built and delivered to the front. It arrived in Dhahran on February 26th, 1991 - the day after the barracks were destroyed.

***NOTE: This is the Second Chapter of an ongoing series published here on Programming in C which is based upon the Harvard Class CS50. Content here from that class is published solely for educational purposes; I see no profit through the publication of this website including advertising. A good portion of today's post includes content from the show "Modern Marvels", parts of which are transcribed here - again only for the fair use of educational purpose. All barbs, jokes, political statements and assignments of responsibility for the Dhahran disaster are entirely my own and not the opinions of Harvard or "Modern Marvels".

"The Box" - New Short Film Shows NY Kids in Solitary

A 5 minute animated short film, called "The Box", recently won a well-deserved award from the New Orleans Film Festival. "The Box" is directed by Michael Schiller and produced in part by the Center for Investigative Journalism.

The film follows Ismael “Izzy” Nazario, a 16 year old child who spends 300 days in solitary confinement while imprisoned in Rikers Island. This time was done before Izzy was convicted of a crime. Izzy's mother had fallen victim to cancer before his arrest, leading Izzy to become less engaged in school and try to escape a suffocating situation at home by falling in with friends in the street. This led to an arrest for theft.

"The Box" uses powerful animation, scrawled in black and white like a sketching on a concrete wall. The images are reinforced by a voice over from Izzy, who describes confrontations with older prisoners who try to steal his shoes, how the ink on letters he received would run and smudge from sweat caused by the insufferable heat in the box.

Viewers see the damage caused by putting a child into a 6' x 8' metal cage and keeping him there for a year. Black dots start to trail through Izzy's vision. Voices from prisoners across the hall seem to echo inside Izzy's head. As time goes on, Izzy makes the time pass by having full-blown conversations with himself. It is hard to understand how forcing a child into these circumstances will provide any benefit to society or the victim of that child's mild crimes.

"The Box" is a film worth watching. You can view the whole thing by following this link.

And Izzy? Despite the awful circumstances he contended with, he's doing quite well. Mr Nazario is now a case worker for children coming out of Rikers Island.

Monday, November 10, 2014

How To Enable CLR on a Microsoft SQL 2005 Server

A while back I worked for a small hosting firm that focused on Microsoft products. As part of my responsibilities I wrote a great deal of documentation for them for a variety of tasks - some basic, some more advanced and problematic.

Anyway I was pleased to see today that these tutorials are still published on their site. Follow this link, for instance, to read an instructional guide on how to enable CLR with MSSQL 2005.

Saturday, November 8, 2014

C Programming Tutorial Part 1 - Compiling C using clang

Part 1 of our C Programming Tutorial covers the basics of compiling. What is a compiler? How does it work? How do I use a compiler to write programs in C?

Every application that you write in C will have to be compiled. Furthermore, compilation errors and failures will be your first indication that you have made a mistake in your program somewhere. Understanding your compiler inside and out will help you to write code much more efficiently.

For the purposes of our tutorial today, we will be discussing the clang compiler. clang is widely used - iOS developers should recognize it as the compiler used for developing iPhone apps as part of Xcode and Apple's LLVM. I will also use a number of demonstrations; these demonstrations will include source code written in C, assembler and some garbage ASCII that is representative of machine code viewed through a text editor. For my part, I am using a Fedora Linux virtual machine for these demonstrations. That said, as I discussed initially in the introduction to this series of C Programming tutorials, you can follow along using an operating system of your choice, and a compiler of your choice. For years I have used gcc as my compiler-suite-of-choice; I have opted for clang here because it is what is used for the Harvard CS classes that form the basis for this tutorial and many of the included materials. Whether you use clang, gcc or another option for your compiler, nearly all of the principles we discuss here today will remain applicable. You may need to alter the examples just a bit based on the documentation for your own compiler, but that is all. Using a different operating system, like Windows instead of Linux or Ubuntu instead of Fedora, should have little impact on how you follow along; however if you are using MinGW keep in mind that by default your compiler will be gcc and not clang.

If, though, you would like to use *exactly* the same environment I am using, I recommend downloading and installing a virtualization platform like VMware Player and loading the Harvard Computer Science Appliance. This will give you a very pared down version of Fedora that has C and clang installed and ready to go. You may download the Appliance at no charge here. The same goes for VMware Player, which is also free and available here.

From here on out, I will be assuming you have C and a compiler installed.

This brings us to an important point that I must address before diving into this first tutorial. Source code and materials for this tutorial are based on Harvard University's CS50 as taught by David Malan. Reproductions of any such material are for purely academic and educational purposes only; I have profited in no way from their reproduction here in any form, particularly as blogspot.joshwieder.com has always and continues to decline the introduction of advertising to protect the privacy of my readers. Doctor Malan and Harvard University have been kind enough to offer these materials under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License, which readers can verify directly on the course's website here. Allowing for the free sharing and circulation of the course materials of one of the finest universities in the world at absolutely no charge to anyone is in keeping with the highest ideals and principles of education and open access. I can think of no higher praise for Harvard and David than what I am attempting to do here, on this site - spreading the information that they have so generously offered to me as a student.

With all that said, let's get to it, shall we?

COMPILERS

What does it mean to *compile* something? In the most general sense, it means transforming code written in one programming language into another. But usually when a developer refers to compiling their code, they mean they have converted their *higher level* programming language (for example, C) to a *lower level* programming language (for example, assembly and machine code).

We will be relying on the compiler "clang" for the purposes of this document. However, many of the core concepts described here apply to a variety of other compilers.

There are FOUR STAGES to the compilation process:

Preprocessing (performed by the Preprocessor)
Compilation (performed by the Compiler)
Assembling (performed by the Assembler)
Linking (performed by the Linker)

PREPROCESSING

Preprocessing is controlled by directives in the source code: lines that begin with a hash sign (#). For example:

example.

    #include <stdio.h>  
    #define NAME "Josh"

Commands preceded by a hash sign in the source code in this manner are referred to as *Preprocessor Directives*. Two of the most common Preprocessor Directives are "include" and "define".

Using the clang flag "-E", you can run only the Preprocessor and stop before the later stages.

example.

$ clang -E hello.c

By default, the results of "clang -E" print to standard output (i.e. the monitor). However, the results can easily be saved to a new file:

example.

$ clang -E hello.c > hello2.c

In the example above, we have instructed clang to process only the preprocessor directives in the file `hello.c`, and save the resulting output to a file named `hello2.c`.

Let us assume that in the example above, our file hello.c begins with the preprocessor directive `#include <stdio.h>`. In the resulting output of `hello2.c`, we would see the entire contents of stdio.h copied to the beginning of the file, replacing the #include directive. This is an example of what makes #include directives so useful. Instead of having to manually copy all of the function declarations in the stdio.h header file that you need to use for a program, the preprocessor takes care of that chore for you, so long as you provide a single #include line.

COMPILATION

This second stage is where clang actually transforms C source code into assembly code. In order to have clang convert your C into assembly, but proceed no further, use the -S flag.

example.

$ clang -S hello2.c

Using `clang -S` will result in a *.s file (in the example above, hello2.s). The resulting assembly language is processor-specific. In this tutorial, the code has been compiled on a Virtual Machine with an x86 vCPU; as a result, the assembly code transformation is as follows (white space and carriage returns have been included as originally formatted with no additional indentation):

example.

application written in C.

-------------------------------------------------------
#include <stdio.h>

#define NAME "Josh"

int main(int argc, char *argv[])
{
    printf("Hello, world! My name is %s!\n", NAME);
}
-------------------------------------------------------

example.
application written in x86 vCPU specific assembly.
-------------------------------------------------------
.file "hello2.c"
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
# BB#0:
pushl %ebp
movl %esp, %ebp
pushl %esi
subl $20, %esp
movl 12(%ebp), %eax
movl 8(%ebp), %ecx
leal .L.str, %edx
leal .L.str1, %esi
movl %ecx, -8(%ebp)
movl %eax, -12(%ebp)
movl %edx, (%esp)
movl %esi, 4(%esp)
calll printf
movl $0, %ecx
movl %eax, -16(%ebp) # 4-byte Spill
movl %ecx, %eax
addl $20, %esp
popl %esi
popl %ebp
ret
.Ltmp0:

    .size main, .Ltmp0-main

.type .L.str,@object # @.str
.section .rodata.str1.1,"aMS",@progbits,1
.L.str:
.asciz "Hello, world! My name is %s!\n"

    .size .L.str, 30

.type .L.str1,@object # @.str1
.L.str1:
.asciz "Josh"

    .size .L.str1, 5

.section ".note.GNU-stack","",@progbits
-------------------------------------------------------

Very few modern developers write their applications in assembly. However, every application written in C is converted into assembly during the compile process.

Earlier, we described the *Compilation* process as transforming a higher level language into a lower level language; in this case, the higher level language C is transformed into the lower level language x86 assembly. What makes assembly a "lower level" language than C? In assembly, we are very limited in what we can do. There are no loops of any kind. Yet, you can construct the same operations that loops and other missing constructs offer through the limited control structures that assembly does provide, such as comparisons and jumps.

ASSEMBLING

It is the Assembler's job during this third stage of clang's compile process to convert the assembly code we just viewed and discussed into "machine code". To wit, the Assembler does not provide Assembly as output. Assembly code is provided to the Assembler as Input, and the Assembler provides *machine code* as Output.

*Machine Code* - the actual 1's and 0's that a CPU can understand

Using `clang -c` we can use the Assembler to parse code that has been translated into assembly. This will result in a file with the *.o extension. Note that if you have been following along, you will be performing this step on `hello2.s`, and not `hello2.c`.

example.

$ clang -c hello2.s

If we view the resulting file, hello2.o, using vi the output will resemble this:

$ vi hello2.o

^?ELF^A^A^A^C^@^@^@^@^@^@^@^@^A^@^C^@^A^@^@^@^@^@^@^@^@^@^@^@�^@^@^@^@^@^@^@4^@^@^@^@^@(^@   ^@^G^@^@^@^@^@^@^@^@^@^@^@^@^@U~I�V~C�^T~KE^L~KM^H~M^U^@^@^@^@~M5^^^@^@^@~IM�~IE�~I^T$~It$^D������^@^@^@^@~IE�~I�~C�^T^]�^@Hello, world! My name is %s!   ^@Josh^@^@.rel.text^@.bss^@.note.GNU-stack^@.shstrtab^@.strtab^@.symtab^@.data^@.rodata.str1.1^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^E^@^@^@^A^@^@^@^F^@^@^@^@^@^@^@@^@^@^@;^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^@^@^@^@^A^@^@^@      ^@^@^@^@^@^@^@^@^@^@^@(^C^@^@^X^@^@^@^H^@^@^@^A^@^@^@^D^@^@^@^H^@^@^@:^@^@^@^A^@^@^@^C^@^@^@^@^@^@^@|^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^D^@^@^@^@^@^@^@^K^@^@^@^H^@^@^@^C^@^@^@^@^@^@^@|^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^D^@^@^@^@^@^@^@@^@^@^@^A^@^@^@2^@^@^@^@^@^@^@|^@^@^@#^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^A^@^@^@^P^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@~_^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@ ^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@~_^@^@^@O^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@2^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@~@^B^@^@~P^@^@^@     ^@^@^@^G^@^@^@^D^@^@^@^P^@^@^@*^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^P^C^@^@^V^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^D^@��^@^@^@^@^@^@^@^@^@^@^@^@^C^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^D^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^F^@   ^@^@^@^@^@^@^@;^@^@^@^R^@^A^@^O^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^@hello2.c^@main^@printf^@^@^@^O^@^@^@^A^E^@^@^U^@^@^@^A^E^@^@'^@^@^@^B^H^@^@  ~                                                                                 ~                                                                                 
 -- VISUAL --                                                  4,80-139      All

-------------------------------------------------------

Note that the Assembler transforms assembly code into code that is readable to the CPU; but not easily readable to human beings.

Translating assembly into machine/object code is not a difficult task. Lines in the assembly code we reviewed, such as "pushl %ebp" correspond to simplistic hexadecimal values, which themselves are easily translated to binary. With a straight-forward translation chart, converting assembly into machine/object code is so easy a human could do it! This ease of process is part of the reason why Assembling is considered a distinct and different process than compiling.

LINKING

Linking is the fourth and final step of the clang compile process. Linking combines a multitude of object files into one big file that you can actually execute. Linking is very system-dependent, so the easiest way to link object files together is to call clang on all of the different files that you wish to link together. If you specify *.o files, then clang will know that it can skip Preprocessing, Compiling and Assembling and get straight to the Linking.

Let us return to the example application that we have been using throughout this tutorial; but this time, we will be making a few small changes to our source code. First, we will add an additional Preprocessor Directive to include the file <math.h>, whose library we will be linking to our application. math.h is the header file for the C math library - the library is usually included with a C installation, but unlike the standard library behind stdio.h, it is often not linked in by default.

We don't just want to link math.h though - we want to make sure that we can actually use the math.h in our application! So, we will make our program perform a simple math problem and print the result to the screen along with the hello world statement.

example.
#include <stdio.h>
#include <math.h> //this is our new directive

#define NAME "Josh"

int main(int argc, char *argv[])
{
    printf("Hello, world! My name is %s! 3 to the 3rd power is %f \n", NAME,
           pow(3, 3)); //here we have included our math problem
}

Looking back at the beginning of our tutorial, we can see that my very first version of this application was called `hello.c`. I have made these changes above to this original `hello.c` file. I then skip ahead to using `clang -c`, which when given a C source file runs the Preprocessor, Compiler and Assembler in one go and produces machine code; this time, I am using `hello.c` and not `hello2.s`.

example.

$ clang -c hello.c

As a result of this operation, I now have an object file `hello.o`. I should be able to run clang on this object file one last time and have my application all ready to go. Right?

example.
$ clang hello.o
hello.o: In function `main':
hello.c:(.text+0x27): undefined reference to `pow'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Let's examine this error message a bit to figure out what I did wrong.

The first part of the error reads: "In function `main'". Looking at my source code, I can tell that my program only has one function. So that doesn't help me at all!

Let's go to the next line: "(.text+0x27): undefined reference to `pow'". The first part, "(.text+0x27):", is pretty cryptic. We don't really know what that means yet. 0x27 looks like a hexadecimal number though - in decimal, 0x27 is 39, while in binary it is 100111. If we look back at the assembly code of our first version of the hello application, we can also see that .text seems to be relevant to the assembly language:

.file "hello2.c"
.text
.globl main

Still, the second part of the error message is more helpful to us since we still aren't all that familiar with assembly code. "undefined reference to `pow'" tells us that something went wrong with our new math function.

The last line "error: linker command failed[...]" reminds us that clang was running the linker when the error was generated. If we put this together with the information we gathered from the previous line, we have a pretty good idea of what just happened - the linker could not find a definition for our math function, `pow`.

BINGO! That's it. I forgot to LINK IN the math library correctly. As we touched on before, some C libraries are linked in by default, like the one behind stdio.h, while others, like the math library behind math.h, may be included with C but are not linked in. Adding the Preprocessor Directive copies in the function declarations contained in the math.h header file, but it does nothing to tell the linker where to locate the math library's compiled code. To do this, we use `clang -l` (FYI that's a lower-case "L"), like so:

example.

$ clang hello.o -lm

Wait a minute, didn't I say that we use clang -l? What's all this about -lm? Well, the -l flag takes the name of the library with the "lib" prefix and the file extension left off. On Linux, the math library is typically a file named libm.so or libm.a, so the name we pass is simply "m". So instead of running `clang hello.o -lmath` we run `clang hello.o -lm` and call it a day.

Successfully completing the link of the math library will finish the compilation process, leaving us with a newly created fully executable binary application named "a.out". I am using a Fedora Linux virtual machine, so to run my executable I use "./" - if you are using Windows, you should be running your executable using the Command Prompt (cmd.exe), or perhaps a shell included with Cygwin or MinGW if you went that route.

example.

$ ./a.out
Hello, world! My name is Josh! 3 to the 3rd power is 27.000000

Success!

If we wanted to link a number of object files that we have created ourselves, we would specify each object file directly, like so:

example.

$ clang hello.o hello2.o hello3.o [...]

So there are two distinct ways to hand the linker its inputs using clang. One, we can use the "-l" flag, which looks up a library by name in the system's library directories. And two, we can simply list object files on the command line as illustrated above. Either way, the linker does the same job of resolving references; using -l is more convenient when linking in libraries that are installed on the system, so that you can use the short name to call the library as we did with "-lm".

There are a couple of important provisos here, however. Only ONE of these object files can define a `main` function. The `main` function is where the program begins executing. If it is defined in more than one object file, linking will fail, and rightfully so.

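Also, "-l" flags should be placed at the END of the clang command, after the object files that use the library, because the linker may resolve references in the order its inputs are listed.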

Which brings us to the end of today's tutorial. As always, if you have any questions or concerns, leave a comment below, reach me on Twitter or Google+, or find additional options on the Contact Page for this website.

Happy coding!

P.S. I can't stress enough that I deserve absolutely no credit for this tutorial - it is based almost *entirely* on my notes from Harvard. If you like what you see - give props to David Malan and the large team of students (course Heads, TFs, etc) responsible for producing the Intro CS program at Harvard.