Month: August 2015

MIPS compiler support completed

For the last three weeks, we have been working towards completing the MIPS target architecture library. Finally this week, all the IL to ASM conversions have been added to the compiler. The implementations have been tested and we can confirm that we have compiler support for the MIPS32 processor architecture. So how did the testing go?

Testing

Some operations are fairly simple to test along the way, such as Add, Sub, Mul, Div, Shl etc., i.e. operations that use value types, since all we need to check is the result produced by a particular IL operation. The correctness of these operations can easily be verified without the need for a complete library. This is not the case for more complex operations, such as the ones related to objects, arrays and strings, i.e. reference types.

In order to check whether reference types function correctly, many different IL operations have to be implemented (NewObj, InitObj, Ldobj, Isinst etc), so we did not start testing these until the whole library was completed. The functionality of objects can be tested by creating a test class with some methods and fields within it. If these types initialise as expected then we can try to create a new instance of the class. If all these have been a success then we can test whether values passed to the class from outside return to the calling environment correctly. The testing of arrays and strings is a little more straight forward. In order to test arrays and strings, we can try to declare a new instance of these types and apply some basic array or string related manipulation to them and inspect the results.

During the testing process, some issues were uncovered. These issues were mainly related to the differences between the x86 and the MIPS architectures. On MIPS, only values contained in registers can be pushed onto the stack, while x86 lets the programmer push labels and immediate values directly, this is not available on MIPS. This is due to the CISC vs RISC architecture differences between x86 and MIPS. On MIPS, if we want to push a label onto the stack, we must first load the address of that label into a register, using the La instruction and then push that value from the register onto the stack. The same is true for immediate values; first we must move the value into a register then finally push that register onto the stack.

In many cases it is required to push the value zero onto the stack, but to do this we do not need to move zero to a general purpose register because MIPS has a $zero register which is fixed to the value zero. On MIPS there isn’t a Push or Pop assembly instruction as on x86, we need to simulate these operations with store/load instructions followed by adding to the stack pointer register using the Addi instruction.

There is another important thing to keep in mind when programming on MIPS. Data stored in memory must be aligned in such a way that an object’s address must start at a value that is some multiple of their own size in bytes, e.g. a 4-byte word must be 4-byte aligned but a single byte needs no alignment (a.k.a. single byte alignment). So if we want to load from/store to memory using an offset, we need to either make sure the data is aligned or access the data in multiple parts to avoid the alignment issues.

We never need to worry about this when accessing the stack through $sp or $fp since the compiler guarantees these are always 4-byte aligned. For other memory accesses, the FlingOS Compiler does not require memory to be aligned (for compatibility with other architectures). Instead, it uses runtime checks to load memory from or store memory to unaligned addresses. This means C# code can be written more generically, with less for the programmer to worry about but does create a small performance hit at runtime.

What’s next?

What we would like to do during the remainder of this summer is to add a unit testing framework to the project that formalises test cases for current and possible future architectures. This way we can ensure that the current supports for x86 and MIPS are stable and also, if the project is to be extended with a new target architecture library, then implementing that library can be done in an efficient way which is easily testable.

Thank you for reading this and if you are interested in the upcoming Launch Event then please subscribe on the FlingOS homepage.

See you soon…

Roland

Adding 64-bit values using 32-bit registers

Last week, I published a post discussing how to left shift 64-bit values using the available 32-bit sized registers on MIPS. This week, we thought I should do something similar which is to provide you with another article that describes a technique that solves another interesting problem. This time I will discuss how to add two 64-bit values. This may not seem difficult at first, however we must remember that MIPS processors do not have carry flags. We need the carry flag (or something equivalent) to preserve the result of an intermediate computational step. What we have to do is somehow simulate ‘Add with carry’ without the carry to allow for the addition of 64-bit integers using 32-bit registers.

Adding two 32-bit values is trivial since the operands and the result can be contained in single registers, so here we can just use a simple ‘add’ instruction, therefore carry is not needed. The sizes of the operands for the addition instruction are either 32 or 64 bits but in order to perform the addition, the sizes must match. An exception should be raised if the sizes are not the same. Before I discuss the method for adding two 64-bit values, let’s remind ourselves how signed and unsigned values are represented in 32-bit binary. The sign extension does not have an effect on the result of the computation due to the way two’s complement works, I only include it for completeness.

In the case of the unsigned values, the number of possible integer values goes from 0 to 2³² and they are all positive, however if we want to represent signed integers in 32 bits then we need to use one bit, the most significant bit, to represent the sign, so we can represent both positive and negative integers in the range from 0 to 2³¹. A negative value is derived such that you take a positive integer then you apply the method of two’s complement to it. Let’s see an example how to convert a positive number to its negative equivalent in 8-bit binary representation.

Two’s complement:

0000 1010   # = 10
1111 0101   # invert bits
0000 0001   # add 1
1111 0110   # = -10

First you take the integer as a bit string then you invert all the bits and finally you add one to the result. Two’s complement can also be used to convert a negative integer to positive.

Adding 64-bit operands

The two 64-bit integer operands have to be stored in two registers each. The first operand is stored as $t1:$t0, while the second operand is loaded as $t3:$t2.

1. Add low bytes

The first step of this algorithm is to add the two low bytes into $t4 which we will use as a temporary storage for this partial sum.

addu $t4, $t0, $t2

We use the unsigned operation because the low-bytes do not have a sign-bit as the upper bit! We also need a way to know if there is a carry from this partial calculation. For example if we add the following two binary numbers we get a result that cannot be represented in a single register (here I assume that the registers are 4 bits in size), i.e. there is a carry bit.

1111 + 0001 = 10000

What we can do here is to store the carry bit (the 5^th bit if you like) in a temporary register in the next step.

2. Simulate the carry flag

At this point we are going to use the ‘sltu’ instruction to store the carry bit in $t5.

sltu $t5, $t4, $t0

(Alternatively: sltu $t5, $t4, $t2 : either will do)

Here the ‘sltu’ instruction compares $t4 (sum of low bytes) with either of the individual set of low bytes ($t0 or $t2). The computation generated by the ‘sltu’ instruction goes like this. If $t4 < $t0 or if $t4 < $t2 then $t5 = 1, else $t5 = 0. It does not actually matter which set of low bytes we are using for the comparison because $t4 will always be smaller than any of the individual low bytes if there is a carry. But let me show you with some examples:

$t0 = 0000
$t2 = 0000
$t4 = 0000 $t4 = $t0 = $t2 -> $t5 = 0

Here the computation does not generate a carry. We can see that $t5 has been set to 0 as required since $t4 is not less than either $t0 or $t2.

$t0 = 0000
$t2 = 1111
$t4 = 1111 $t4 > $t0, $t4 = $t2 -> $t5 = 0

There isn’t any carry in this example either because $t4 is not less than either $t0 or $t2.

$t0 = 0011
$t2 = 0110
$t4 = 1001 $t4 > $t0, $t4 > $t2 -> $t5 = 0

No carry because $t4 is not less than $t0 or $t2.

$t0 = 0001
$t2 = 1111
$t4 =(1)0000 $t4 < $t0, $t4 < $t2 -> $t5 = 1

Now, in this example we do generate a carry, so let’s see if the method works. Remembering that $t4 is (for these examples) 4-bits in size, so the 5-th bit (the ‘1’) is lost: $t4 = 0000 because the carry bit is lost so, so $t4 < $t0 and $t4 < $t2 therefore $t5 is set to 1.

$t0 = 1111
$t2 = 1111
$t4 =(1)1110 $t4 < $t0, $t4 < $t2 -> $t5 = 1

The last example works for the same reason as the previous one did.

Using my diagram above to visualise how numbers work, it is possible to see that no two numbers added together can ever wrap round over themselves, so the result will only ever be greater than either of the operands if there was no carry. The number system forms a cycle.

3. Add high bytes with carry

Now, we need to add the two high bytes ($t1 and $t3) plus any carry that is left from the addition of the low bytes and store the result in $t1.

addu $t5, $t5, $t1 # $t5 = carry + $t1(high bytes of operand 1)
addu $t1, $t5, $t3 # $t1 = $t5 + $t3(high bytes of operand 2)

Finally we move the sum of the low bytes into $t0.
move $t0, $t4 # move partial sum to $t0

At the end of the process we have a result as a 64-bit value stored as $t1:$t0.

I hope you found this article helpful. See you soon…

Roland

Left shifting 64-bit values using 32-bit registers

Yesterday, while working on the implementation of the left shift IL operation (shl) for MIPS, we came across some challenges related to shifting 64-bit values using 32-bit registers. The solution is a bit tricky, so we thought it would be useful if I produced an article discussing how it can be done.

In order to shift a string of bits, we need two operands; the data itself and a value specifying the distance we want the data to be shifted. The shift value can either be a constant or a value stored in a register. Although in this article we assume that the values are in registers, the same technique can be adapted to be used with the assembly instructions that use constant values. There are four different cases we need to consider in terms of operand sizes:

4-byte data, 4-byte shift value,
4-byte data, 8-byte shift value (this is unsupported by C#),
8-byte data, 4-byte shift value,
8-byte data, 8-byte shift value.

The 4-8 case is unsupported by C#, so in this scenario the compiler throws an exception. The 4-4 case uses a single left shift assembly instruction (sllv) so there isn’t much explanation needed there.

8 – 4 case

In the 8-4 case, the 8-byte sized data is shifted by a value that is represented by a 4-byte binary number. Since we only have 32-bit (4-byte) registers, we need to store the data in two separate registers. One register contains the low bytes ($t0) while the other contains the high bytes ($t1). In this example, I store the shift value in $t2.

Consider a 1-byte left shift, i.e. $t2 contains the value of 8. Now we have a problem because the top byte of $t0 must replace the bottom byte of $t1 as well as the rest of the data must be shifted correctly. So how do we shift an 8-byte value using 4-byte registers? I will show you how by going through some examples. There are two variations of the 8-4 case; one where $t2 < 32 and another where $t2 >= 32.

The method ($t2 < 32)

Let’s say we want to left shift this data by 1 byte (1 byte can be represented by two hexadecimal digits):

Somehow we want the result to end up looking like this:

1. Left shift high bytes by $t2

First, we want to left shift the high 4 bytes ($t1) of the original data by the value carried by $t2, which is 1-byte. The low 4 bytes remain unchanged for now. So we have:

2. Right shift low bytes by (32 – $t2) into temporary

The next step is to logical right shift (not arithmetic right shift!) $t0 by (32 – $t2) into a register which we will use as a temporary storage, say $t4. Since $t2’s value in bits is 8, we right shift $t0 by 24 into $t4. This way we get the proportion of $t0 which we then want to copy into $t1. $t0 and $t1 remain unchanged at this point.

3. OR temporary with high bytes

Here we can combine $t1 and $t4 using the logical OR operation to get the correct result for the high 4 bytes which we store back to $t1.

4. Left shift low bytes by $t2

There is only one thing to do, left shifting the low 4 bytes ($t0) by $t2.

Now the algorithm is complete. If we compare this result with the desired result above, we can see that they are identical.

The method ($t2 >= 32)

If $t2 >= 32 then we are left shifting the data by 32 or more bits which means that the least significant bit (little endian) of the data is pushed beyond the low bytes into the high bytes. In all cases where the shift value is greater than or equal to 32, $t0 ends up filled with zeros and the content of $t1 are lost completely. But in what form does $t0 take over $t1? Let me show you step-by-step as before. Now let’s assume that we are left shifting by 40 bits and we have the same original data as before.

But this time we want the result to be this:

1. Move low bytes into high bytes

We copy the contents of the low bytes into the high bytes. We do this to save the contents of $t0 into $t1. The original data becomes:

2. Fill low bytes with zeros

Since the data is pushed all the way beyond the low bytes, we can fill the $t0 with zeros.

3. Left shift high bytes by ($t2 – 32)

The final step is to left shift $t1 by ($t2 – 32) which is 8 in our case. So the desired result is achieved as expected.

8 – 8 case

In the 8-8 case, both the data and the shift values are 8 bytes in size. There is one important observation we must make; shifting a 64-bit value by 64 bits or more is pointless since the result will always be zero. We also know that the number 64 is represented by this binary number: 0b0100 0000 which can easily be contained in the low bytes of the shift value ($t2). Actually any non-zero value beyond the 6^th bit would yield a zero result but let’s just consider 4 bytes to be the smallest size we can manage.

To conclude, if $t3 is non-zero then the result of the left shift will definitely be zero, while if $t3 is zero then we can proceed the same way as in the 8-4 case by simply ignoring $t3.

I hope you found this article helpful. Please leave a comment if you think I missed something or if you have anything to add and I will do my best to respond.

See you around.

Roland

World first: C# kernel running on MIPS

Exciting news today as we announce our working kernel for MIPS based on the Creator CI20 board.

To the best of our knowledge, this is the first time in the world that anyone has got a C# kernel or operating system (of any form) working on a MIPS processor.

How did we do it?

We started the week by setting up the environment in which we were able to send a kernel binary file to the CI20. To see how this setup process is done, please read Ed’s post published earlier this week. After the connections were established we started to implement the necessary IL operations for the MIPS architecture to get the test kernel working.

The conversion of IL implementations from x86 to MIPS has been relatively smooth, except for the fact that in the MIPS architecture data stored in the memory must be half- or full-word aligned. This caused some initial headaches, however, Ed was determined to get to the bottom of the issue and in little time the problem was solved.

Other differences that required a lot of study and thought were related to the assembly code syntax for MIPS (GNU assembler – GAS) and the instructions available (reduced instruction set for MIPS). Although MIPS is a RISC architecture, meaning that more instructions must be used to perform the equivalent computation compared to x86, there are 16 general purpose registers available (as opposed to 4 on the x86) which makes implementation much easier. Furthermore, the programmer can also make use of pseudo-instructions which speed up implementation.

What does it do?

So what does the test kernel actually do? The answer is both not that much and quite a lot.

Although the functionality of the kernel is limited to changing the colour of the on-board LED and reading/writing characters from/to the UART ports, the fact that the kernel does that much proves that the FlingOS compiler is in a stable and solid state. It is a great proof of concept and initial step from which the new MIPS kernel can progress.

This has been a very exciting week and we are looking forward to completing the target architecture library and expanding the test kernel. In a few weeks time we will have a fully functioning IL compiler for MIPS.

Can I try it?

We’ll be releasing a compiler package and stable copy of the test kernel in the next month. We’d like to expand our compiler and proof-of-concept, test kernel a bit before releasing it to the wild.

Keep an eye on this space! Please ask questions below.

Ed and Roland

Stage 2 : Boot a custom OS

Earlier this week I tweeted that “Stage1: Boot a different OS – complete” meaning that I had successfully booted an alternative OS on the Creator CI20s (which were kindly provided by our sponsor Imagination Technologies^®. Well, today I succeeded in booting a very basic custom operating system on the CI20s, so here’s how I did it.

For starters, I downloaded and installed the current compiler toolchain for Windows – Sourcey Codebench for MIPS available here. I installed mine to a non-standard directory but it works just fine. We’ll come on to how to use it later.

I also downloaded and install Putty and Serva – both of which are necessary tools. Putty provides a console interface to the serial connection required to talk to the CI20’s U-Boot bootloader. Serva provides an easy way to set up a TFTP server on Windows. Both of these tools are free. Again, we’ll come on to how to use them later.

Lastly, I had to buy one small (cheap) bit of extra hardware – a USB to UART converter. Be aware that there are two chips widely used to produce these devices. One of them only supports Linux and version of Windows 7 and earlier. The other chip supports Linux and all current versions of Windows – so make sure you get the right one! I bought one for Windows 8.1 (i.e. the second type of chip) from Amazon (with one day delivery on a Sunday no less!). I ordered from 3C4u who use Amazon. (The first time I tried to order two of these the package never arrived. I re-ordered and they were delivered fine. Amazon gave me a refund and Prime subscription extension for the first order so I’m not complaining too much!)

For the custom OS I wanted to try out something which I knew worked. So I went online and found Lardcave.net’s great series of tutorials on writing a custom CI20 OS. I cloned the Git repo and started following the instructions. For the USB to TTL chip I have, the tutorial is correct, you need to connect RXD on the converter to TXD on the CI20, and visa-versa for TXD on the converter and RXD on the CI20. To avoid having to use the power cable, you can also attach the 5V pin to abny of the CI20’s 5V_IN pins on the primary expansion header. The board will power on as soon as you connect the 5V pin so hold off until later for that! You’ll also need to connect an Ethernet cable to the same hub or switch your PC/laptop/WiFi hub is connected to – this will be so the CI20 can connect to the TFTP server.

After cloning the Git repo I ran across a few issues. The Lardcave.net tutorials were written for Max/Linux and for if you compile GCC yourself. Since I installed Sourcery CodeBench , the Makefile was not set up to compile properly. I eventually worked out how to it so it works properly. Here is a copy of my make file:

AS=mips-linux-gnu-as -mips32
CC=mips-linux-gnu-gcc
LD=mips-linux-gnu-ld
OBJCOPY=mips-linux-gnu-objcopy
CFLAGS=-Os

OBJS=start.o main.o

hello.bin: hello.elf
 $(OBJCOPY) -O binary $< $@

hello.elf: $(OBJS)
 $(LD) -EL -T linker.lds -o $@ $+

%.o: %.[Sc]
 $(CC) $(CFLAGS) -EL -c -o $@ $<

clean:
 rm -f *.o *.elf *.bin

I combined this with a simple batch script called build.bat in the same directory as the Makefile which allows me to specify the path to Sourcery CodeBench’s bin folder, instead of other version of GCC which I have installed. The batch script looked like this:

@echo off
SET BD=C:\Users\Ed\Documents\Coding\C\2015\MIPS\Compiler\bin
%BD%\cs-make
pause
@echo on

Where BD is set to the path to Sourcery CodeBench’s “bin” folder.

Having compiled the “hello.bin” file I proceeded to set up Putty and the TFTP server. Here are a series of images I took showing the process. By the time I was done entering the commands into Putty (also shown below), the CI20 showed the nice, purple LED as expected.

2015-08-02 - Putty Config — Putty configuration

For the Putty configuration shown above, remember to update the COM port name to the name of the COM port device on your computer. This can be found by opening Device Manager and looking under the Ports node of the tree.

For the Serva configuration shown above, remember to update the “TFTP Server root directory” to the same folder as your “hello.bin” file is in.

2015-08-02 - Putty Console — Putty console

For the commands to U-Boot, remember to replace the serverip “192.168.0.6” and the ipaddr “192.168.0.20” with values for your network. The IP should be the IP address of the computer which is running Serva (which can be looked up by doing “ipconfig /all” in a Windows command prompt on the server computer). The “ipaddr” can be any value you like but the first three parts must match the IP address of your server.

If the boot completes successfully, you should see a Serva log similar to the one above. The LED on the CI20 should turn purple as shown below.