Introduction to PPC assembly


Preface

The following notes show how to start from simple C code and, step by step, arrive at assembly code.

When I started writing these notes I had a PPC 7457; today I have some 7410s. This is not a big issue: the code that follows can be built with most of the available compilers, linkers and assemblers, possibly after a few changes. For the sake of generality, I try not to make explicit reference to any specific compiler, assembler or linker.

By the way, the processors mentioned above are derivatives of other processors. The three main progenitors referenced in this document are the MPC750 (derivatives include the MPC740 and MPC755), the MPC7400 (for example, the MPC7410) and the MPC7450 (for example, the MPC7457).


STEP 1: A Plain C Example in one module

Let's start with a simple example in C
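
A minimal sketch of such a starting point, assuming that dosomething() (the function name used later in these notes) simply adds two integers. The signature, the body and the show_result() helper are assumptions made for illustration only:

```c
#include <stdio.h>

/* Assumed body: the notes only name the function, so this
   addition is a stand-in chosen for illustration. */
int dosomething(int a, int b)
{
    return a + b;
}

/* Hypothetical helper that a main() could call to print the result. */
void show_result(int a, int b)
{
    printf("dosomething(%d, %d) = %d\n", a, b, dosomething(a, b));
}
```

A main() in the same file would only need to call show_result(2, 3).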



Now I can build the code



I used ${CC} so that everyone can substitute their own compiler.
Running the application, the output is



STEP 2: A Plain C Example in two modules

Now we split the code into two modules. In the first one I place the following


While in the second module I place


Building the application



and running the application, the result doesn't change. What is worth pointing out is that there is now a piece of code that can be written in assembly while leaving the main module in plain C.



STEP 3: A Mixed C-Assembly Example in two modules

I will leave the main module of the previous example in plain C. For the dosomething() function I will create a dosomething.s (or .asm) file with the following code



I will come back to the meaning of the code in a while. For now, it is important to understand that this code can be built and that it works in the same way.



Now it's time to understand what we wrote, and what the difference is between the executable of the second step and this last one.


STEP 4: Understanding the differences

Let me start with test02.ppc, which is the test built from two plain C modules. I'm going to use a debugger, let's say ${GDB}, which can be any gdb-like tool.

The command is something like:



After some version and copyright information we are ready to disassemble our function with the following command:



Then we do the same for the one written in Assembly.



The registers used differ, but the code is the same. Interestingly, however, the executable sizes are different. I will come back to this later. For the moment, we need to understand the code.


EABI

To understand how to use the registers we need to keep in mind that we are dealing with a RISC architecture, so we have a lot of registers available. To be able to write portable code, we need conventions for register usage, parameter passing, stack organization, small data areas, and other things. This set of conventions is known as the Embedded Application Binary Interface (EABI).

First of all, let's see which data types are available


Data Types
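
As a hedged sketch, the sizes usually associated with the 32-bit PowerPC EABI can be kept in a small C lookup table. The values below are an assumption based on the common 32-bit ABI conventions and should be checked against your own toolchain. The names eabi_types and eabi_sizeof are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct eabi_type {
    const char *name;  /* C type name */
    size_t      size;  /* size in bytes under the assumed 32-bit EABI */
};

/* Assumed sizes; verify them with sizeof on the real target. */
static const struct eabi_type eabi_types[] = {
    { "char",      1 },
    { "short",     2 },
    { "int",       4 },
    { "long",      4 },
    { "long long", 8 },
    { "pointer",   4 },
    { "float",     4 },
    { "double",    8 },
};

/* Returns the assumed size in bytes, or 0 if the name is not in the table. */
size_t eabi_sizeof(const char *name)
{
    size_t count = sizeof eabi_types / sizeof eabi_types[0];
    for (size_t i = 0; i < count; i++)
        if (strcmp(eabi_types[i].name, name) == 0)
            return eabi_types[i].size;
    return 0;
}
```

On the target itself, printing sizeof for each type is the quickest way to confirm these numbers.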




Register Usage


There are mainly two classes of registers: volatile registers and nonvolatile ones. Volatile registers don't have to be preserved across function calls, while nonvolatile registers should be preserved. Among the nonvolatile registers there is a set of dedicated registers.

These classes apply to the following kinds of registers: we have 32 general purpose registers (GPRs) and 32 floating point registers (FPRs). There are also special purpose registers (LR, CTR, XER), the condition register (CR) fields, and the floating point status and control register (FPSCR). All of them are 32 bits wide, with the exception of the floating point registers (64 bits), each CR field (4 bits), and some of the special purpose registers (32 or 64 bits, depending on the implementation).

The following table refers to the EABI, but care must be taken because other ABIs exist as well. For instance, IBM has defined three ABIs for the PowerPC architecture: the AIX ABI for big-endian 32-bit PowerPC processors, which is nearly the same as the PowerOpen ABI, and the Windows NT and Workplace ABIs for little-endian 32-bit PowerPC processors. Other ABIs have been defined for other operating systems.



All the others are volatile registers.
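
The GPR split can be sketched as a small C classifier. It assumes the usual EABI assignment (r1 as stack pointer, r2 and r13 as small data area anchors, r14 to r31 nonvolatile). This is an assumption about the convention, and the names gpr_class and classify_gpr are hypothetical.

```c
typedef enum {
    GPR_INVALID,      /* not a GPR number */
    GPR_VOLATILE,     /* caller must not rely on the value after a call */
    GPR_DEDICATED,    /* reserved for a fixed role */
    GPR_NONVOLATILE   /* callee must restore the value before returning */
} gpr_class;

/* Classifies GPR number n (0..31) under the assumed EABI convention. */
gpr_class classify_gpr(int n)
{
    if (n < 0 || n > 31)
        return GPR_INVALID;
    if (n == 1 || n == 2 || n == 13)
        return GPR_DEDICATED;    /* stack pointer and small data anchors */
    if (n >= 14)
        return GPR_NONVOLATILE;  /* r14..r31 */
    return GPR_VOLATILE;         /* r0 and r3..r12 */
}
```

For example, classify_gpr(3) reports a volatile register, which is why r3 can carry the first argument and the return value without being saved by the callee.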


Stack Frame


There are no push/pop instructions for the stack. Each function that calls another function (i.e. is not a leaf function) or that modifies a nonvolatile register should create a stack frame in memory. The stack frame is created by the function's prologue code and destroyed by its epilogue code. An example of a function's prologue could be the following one



And here its epilogue
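
Both the prologue and the epilogue hinge on a single number, the frame size, since the epilogue has to release exactly what the prologue allocated. The following C sketch computes it under simplifying assumptions: an 8-byte header (back chain word plus LR save word), 8-byte alignment as usually required by the EABI, and no parameter area, CR save word or padding between areas. The name frame_size is hypothetical, and the instructions in the comment show a typical pattern whose syntax may differ with your assembler.

```c
#include <stddef.h>

#define FRAME_HEADER 8u  /* back chain word + LR save word */
#define FRAME_ALIGN  8u  /* assumed EABI alignment (SVR4 uses 16) */

/* Returns the frame size in bytes for the given local storage and the
   number of nonvolatile GPRs (4 bytes each) and FPRs (8 bytes each)
   that the function saves. */
size_t frame_size(size_t locals, unsigned saved_gprs, unsigned saved_fprs)
{
    size_t raw = FRAME_HEADER + locals + 4u * saved_gprs + 8u * saved_fprs;
    return (raw + FRAME_ALIGN - 1u) & ~(size_t)(FRAME_ALIGN - 1u);
}

/* With N = frame_size(...), a typical non-leaf function might use:
 *
 *   prologue:  stwu r1,-N(r1)    allocate the frame, store the back chain
 *              mflr r0
 *              stw  r0,N+4(r1)   save LR in the caller's LR save word
 *
 *   epilogue:  lwz  r0,N+4(r1)
 *              mtlr r0
 *              addi r1,r1,N      release the frame
 *              blr
 */
```

A function with no locals and no saved registers still needs the 8-byte header if it calls another function.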


Another Example

Let's see how the next C function, which swaps two floating point values passed as pointers to float, is translated into Assembly.
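
A minimal sketch of such a function follows. The name swap_floats is an assumption, and the instructions in the comment are only a rough idea of what a compiler may produce, not actual compiler output.

```c
/* Swaps the two floats that a and b point to.
   Under the EABI the two pointers would arrive in r3 and r4. */
void swap_floats(float *a, float *b)
{
    float tmp = *a;
    *a = *b;
    *b = tmp;
}

/* A hand-written equivalent could look roughly like this
 * (register names depend on the assembler):
 *
 *   lfs  f0,0(r3)    load *a
 *   lfs  f1,0(r4)    load *b
 *   stfs f0,0(r4)    *b = old *a
 *   stfs f1,0(r3)    *a = old *b
 *   blr
 */
```

No stack frame is needed here because this is a leaf function that touches only volatile registers.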



Looking at the assembly code generated by the compiler for this plain C function we have



Now I can build a two-module application to run the above function. All it takes is a main that builds an array of floating point values to be swapped. The main should measure the time spent swapping a fixed amount of floating point data. Doing so, I can then compare the times with the same swapping function written in assembly.

The main will have an extern declaration for the swapping function and a prototype for a function needed to show the time.



Then a main body that creates two arrays to be swapped.
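
As a hedged sketch of this timing harness: the names swap_floats, time_swaps and show_time are hypothetical, clock() from the standard library stands in for whatever timer the target provides, and a C stand-in for the swap is included only so that the sketch builds on its own.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* In the two-module build this is the only line main needs;
   the definition lives in the other module (C or assembly). */
extern void swap_floats(float *a, float *b);

/* Stand-in definition so that this sketch links by itself. */
void swap_floats(float *a, float *b)
{
    float tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Swaps x[i] with y[i] for every i, repeats that `rounds` times,
   and returns the CPU time spent, in seconds. */
double time_swaps(float *x, float *y, size_t n, unsigned rounds)
{
    clock_t start = clock();
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < n; i++)
            swap_floats(&x[i], &y[i]);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Prints the measured time. */
void show_time(double seconds)
{
    printf("elapsed: %.3f s\n", seconds);
}
```

A main() would fill two arrays, call time_swaps() with a large number of rounds so that the elapsed time is measurable, and pass the result to show_time(). Using the same harness for both the C and the assembly version keeps the comparison fair.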



Running the application the output is



Now, just replacing the function performing the swap with the following one coded in assembly



Running this latter application, the output is



Again the same results, with the same timing. Now I try to reduce the code to see whether I can speed up its execution. The following assembly code is a quick and dirty version of the same function.



Indeed, the only thing I did was to remove the copy of the arguments, i.e. the pointers to float, into the registers r8 and r9. I did it just to see whether performance improves. Here are the results.


Notes

There are some notes worth mentioning here. First, the names of the registers may differ depending on the assembler used. For instance, you may find fp or fr as the prefix for the floating point registers. To find out how your compiler and assembler work, a quick way is to build a simple function in C and then look at the disassembled code using gdb or any other debugger. By the way, remember that assembly is a language, while the assembler is the tool that translates assembly code into machine code.

A note just for those who were used to working with the Motorola 68K: here the operand order is more like the Intel style, where the destination comes first and then the source.

A note about optimization. The processor contains independent execution units. For instance, the integer units and the floating point units are independent. In case you have some code such as



you can do some optimization, considering that one part of the code (the first 4 lines) handles floating point values while the last three lines work on general purpose registers. Moreover, the results of the first 4 lines are not needed by the other lines. Since the floating point unit works independently of the integer unit, I can interleave the lines so that the two units work in parallel. In this way I speed up the code.



...TO BE FINISHED: WORK IN PROGRESS...





D. Allegri

March 2006  Ed. 1.0