Reviewing Computer System

date

Nov 12, 2023

slug

reviewing-computer-system

author

status

Public

Data and number system (binary, hex, decimal)

ANALOGUE AND DIGITAL

Analogue - Data that represented on a continuous scale (real numbers)

e.g. voltage, height, distance, amount of magnetisation on audio tapes.

Good for real world measurements, transients (spikes), transistors.

Digital - Data that is represented as whole numbers (count)

e.g. class size, coins in your pocket, sound samples on CD recordings,

Good for noise rejection, computers

DIGITAL ADVANTAGES

Digital systems are easier to:

design and modify

store data

maintain accuracy and precision

program operations

to protect from noise

to create on an IC chip

However the real world is mostly analogue

Translation is inaccurate.

The more bits the more precise.

DIGITAL CODE - 1

Data in computer is stored digitally as 0s and 1s in a binary code.

The 1s and 0s are represented in many ways:

by a voltage e.g. 0 or 5 volts (actually we don't use exact voltages) eg < 0.8 volts == 0 > 2.4 volts == 1 (TTL)

by the absence or presence of a pit (CD)

by the absence or presence of a magnetic field (disks)

Why binary?

Simple, robust, universal laws (Shannon-Hartley), flexible

c.f. sending decimal using voltages.... long cables... interference.

DIGITAL CODE - 2

groups of bits can represent numbers, characters, computer instructions.

How we interpret a 32-bit word depends on what we expect! (Exercise)

Hardware has to be able to manipulate groups of bits differently depending on what they represent.

Amazingly, we can use combinations of Gates to do this

8-bit example

Possible interpretations:

Character ‘1/2’ assuming extended ASCII/ISO

The number 171

The number -85

NUMBER SYSTEMS

We use a "positional number system" #big-endian

in decimal numbers, the further left a digit is, the greater the power of 10 it is multiplied by.
e.g., 348 = 3 x 10^2 + 4 x 10^1 + 8 x 10^0

This system applied to other Radix (or Bases)

eg. Decimal radix = 10 digits 0..9
Binary radix = 2 digits 0..1
Octal radix = 8 digits 0..7
Hexadecimal radix = 16 digits 0..F

Binary numbers

Computers work in binary number

Convert to decimal exercise:

notation: in maths use 01110_2 to indicate a binary number

some systems use other ways such as %01110 or b01110

counting table

Decimal	Binary
0	0000
1	0001
2	0010
3	0011
4	0100
5	0101
6	0110
7	0111
8	1000
9	1001
10	1010
11	1011
12	1100
13	1101
14	1110
15	1111
16	10000
17	10001

Note: In binary, each digit represents an increasing power of 2, starting from the rightmost digit which is $ 2^0 $. The leftmost digit filled in for 16 and 17 represents $ 2^4 $, or 16 in decimal, which is why we move to a new column for these numbers.

HEXADECIMAL NUMBERS.

Hex is another abbreviation for binary numbers.

Each byte represented by 2 hex digits; each hex digit represents 4 bits in an 8-bit byte.

Hex to binary can be done one digit at a time: e.g., what is

in binary?

It then shows the conversion of hexadecimal 4E to its decimal equivalent:

HEXADECIMAL NUMBERS

0=0, 1=1, 2=2, 3=3, 4=4, 5=5, 6=6, 7=7, 8=8, 9=9

Memorise: A=10, B=11, C=12, D=13, E=14, F=15

mathematically use

to indicate a hexadecimal number

some systems use other ways such as 0x1110 or 01110h

Or sometimes hex is default (not decimal)

leading 0x means hex

HEX - BINARY (4 BITS TO A HEX DIGIT)

BIT-WISE OPERATIONS

Can base decisions on a single bit #button state

e.g. if (bit represents True) then...
Usually 1 means True, 0 False.

Unary operation: NOT

Complement ("invert", "flip") the value

Binary* operations: AND, NAND, OR, NOR, XOR

N-bit operations: e.g. 4-input AND, OR

All can be implemented using binary building blocks ("2-input gates")

THE GATE SYMBOLS REPRESENT BOOLEAN ARITHMETIC:

The symbol for AND gate followed by "> 1 AND 1" with the truth table "1.1=1, 1.0=0, 0.0=0, 0.1=0" and a red dot followed by "AND, &"

The symbol for OR gate followed by "> 1 OR 1" with the truth table "1+1=1, 1+0=1, 0+0=0, 0+1=1" and a red plus sign followed by "OR, |"

The symbol for XOR gate followed by "> 1 XOR 1" with the truth table "1^1=0, 1^0=1, 0^0=0, 0^1=1" and a red circled plus sign followed by "XOR, ^"

INPUTS, OUTPUTS

Wires (just drag from one gate pin to another) can connect pins on the gates to each other and to switches and lights.

The power supply is universal and invisible. We don't have to worry about it.

In real life there will be a +ve rail (Vcc) and a -ve rail (GND) with a connection of each to every chip on the board.

COMBINATIONAL CIRCUIT

A combinational circuit is a connected arrangement of logic gates with the set of inputs and outputs.

A combinational circuit can be described by a truth table showing the binary relationship between n input variables and m output variables.

Two examples: half adder, full adder

These circuits serve as basic building blocks for the construction of more complicated arithmetic circuits.

HALF-ADDER

The most basic arithmetic circuit is the addition of two binary digits.

A combinational circuit that performs the arithmetic addition of two bits is called a half-adder.

It consists of two inputs and two outputs:

Two inputs get summed
Two outputs: sum and carry bits

It is necessary to have two output bits because the sum 1 + 1 is binary 10, which has two digits.

FULL-ADDER

A full-adder is a combination circuit that forms the arithmetic sum of three input bits.

It consists of three inputs and two outputs.

Two of the inputs represent the two bits to be added.

The third input represents the carry from the previous lower significant position.

One of the outputs will be the carry into the next higher significant position, i.e. into the next full adder.

Of course two output bits are necessary because the arithmetic sum of three binary digits ranges from 0 to 3, and 2 and 3 need two bits.

Gates are not instantaneous

The image contains the following text:

Ci and the output of X XOR Y arrive at different times!!!

The circuit is unstable.

If only we could synchronise things !!

Clock

Could be something simple like a 555 timer (astable multivibrator using an RC timing element), a crystal oscillator, a phase-locked loop or an atomic clock. Probably just a chip

Clock feeds into the ALU

Choose which circuit forms the output

At the top, it says "(two logic controlled switches)"

There are two input labels "X" and "Y" connected to the inputs of the gates.

There is a label "Ci" circled in red, likely indicating the "Carry in" input.

The text near the bottom reads "AND output selected if Ci is low, OR output selected if Ci is high."

There is a label "Co" circled in blue, likely indicating the "Carry out" output.

Building a Computer? Things we need:

memory within the CPU = banks of flip-flops Instruction Pointer_ (counter) full adder + shift operations Building a computer? Things we need: Fast memory (RAM)

program and data Central processing unit (CPU): registers, IP, ALU Input / Output Channels accel cards Persistent memory (Disks) Motherboard Screen Keyboard Mouse

D-TYPE FLIP-FLOP

> Set data to high - Q goes high on next clock pulse. Stays high.

Set presets Q to high when pulled high

Reset clears Q when pulled high D Flip Flop clock data

D-FLIP-FLOPS AS A REGISTER OR LATCH

A register (many bits) or latch (usually one bit) can be made up from a series of D-Flip-Flops driven by a common clock. The transfer from the D side to the Q side for all D flip flops occurs simultaneously as this clock changes.

BIG ENDIAN AND LITTLE ENDIAN DATA FORMAT

> Technical definition for the order of bytes:

Big Endian:

The most significant byte resides in the smallest memory address

Little Endian:

The least significant byte resides in the smallest memory address

SUMMARY Registers

D Flip Flops generally form the building blocks of registers

Registers:

allow us to store bit strings that represent data

reside close to the CPU, allowing fast and easy access

> Bit Endianness determines how we interpret bit strings:

Big Endian: most significant bit comes first

Lile Endian: least signifcant bit comes frst

> Next Lecture: ripple counters with JK Flip Flops

SHIFT Registers

A shift register takes input from one end, and at each clock change this value is moved to the next D-Flip-Flop.

This is used in serial data transfer when a byte (say) of data sent on a cable one bit after another can be collected in a series of D Flip-Flops to rebuild the whole data byte. This is called serial-to-parallel conversion.

SHIFT REGISTERS Here is how a high travels through a 3 bit shift register. For this example we assume that each of the shift register bits is cleared at the start.

SHIFT LEFT - SHIFT RIGHT Registers

The shift registers shown so far shift data to the right. A simple rewiring gives a shift register that can move data left. Of course these may also have the ability to parallel load.

WHAT DOES A SHIFT REGISTER DO?

Moves a state (number such as 0, 1) from a low order bit to a higher order bit.

Multiplies a (binary) number by two.

Number of "shifts" depends on number of clock cycles.

Level counter (e.g. volume control)

Can use a counter to enable/disable clock, thereby programming the amount of shift.

Can shift in the other direction - divide numbers by 2.

Can have two clocks - one for left-shift, one for right-shift or use gates to determine the shift direction

SERIAL-TO-PARALLEL CONVERSION

Some shift registers allow all flip-flops to load at once,

This gives parallel-to-serial conversion

RIGHT-SHIFT SERIAL OR PARALLEL INPUT: 4-BIT

Flip-flops are connected (output to input) with a common clock to cascade input from high bytes to low bytes.

WHAT ABOUT BI-DIRECTIONAL SHIFTING (SELECTABLE)?

This takes some thinking!

You need a pin to select which direction

You need to allow inputs to any D Flip Flop to come from either direction

Try and design a 2 bit selectable direction Shift Register!

Serial input only
You will probably need some OR gates, AND gates, and a NOT gate

THE LOGISIM SHIFT REGISTER

COUNTERS IN DIGITAL LOGIC

> A circuit that stores and increments the number of occurrences of an event:

Most commonly clock pulses

> Two types we will consider:

Asynchronous (ripple) counter

Synchronous (common clock) counter

> J-K Flip Flops central to both

RIPPlE COUNTER

Ripple counters utilise the toggle setting of J-K Flip Flops:

Consider this circuit

What happens to Q, each clock pulse ?:

Ripple counters utilise the toggle setting of J-K Flip Flops:

And finally, four FFS

Q, Q,, Q2, Q,?

> The ripple effect of FF toggles exactly matches binary counts:

each oscillates at half the frequency of the FF before

The bits stored in the FFs thus represent incrementing binary values

BUT - on which edge of the clock pulse?

COUNTING DOWN?

What if I want to count down instead of up?

Requires nothing more than a change of FF trigger:

Change FF to trigger on rising edge of clock pulse

Lets take a look

SUMMARY Ripple Counter

Ripple counter achieves a simple binary counter

It utilises the J-K toggle setting:

J= 1, K=1,

> It is asynchronous because:

No common clock

State changes ripple through from least significant FF

› We can change directionality of counter using FF trigger:

Falling edge = increment

Rising edge = decrement

MODULO 6 COUNTER

> What do we need to do ?

An incrementing counter: falling edge trigger - easy!

A forced reset of all Flip Flops to zero when the counter reaches 6:*

some extra logic gates to monitor things, and trigger a FF reset - not too tricky

Detect the illegal state (6)

Reset the FFs to wrap around to zero

MODULO 6 COUNTER WITH A MOMENTARY ILLEGAL STATE

Detecting the first illegal state (6 in this case) and immediately resetting to 0 (don't wait for the clock) by using the asynchronous master reset (MR) or CLR'

This circuit uses a cascading clock

Summary Modulo 6 counter

Counters don't have to wrap around at powers of 2

Can use outputs of FFs and approrpaite logic gates to detect illegal state"

Slight problem though:

Master reset is asynchronous
The illegal state momentarily still happens !

ALTERNATIVE - COUNTER WITH COMMON CLOCK

We can avoid the illegal state by detecting the last legal state (eg., 5 in a modulo 6 counter), and then set to 0 on next clock pulse.

This requires a non-cascading counter.

We need a common clock

Lets build it up from 1 to 3 bits .....

MOD 6 COUNTER WITH COMMON CLOCK

First a 1 bit counter

1-bit (counts 0...1...0...1...) Enable (Set)

Set J and K to make it toggle

2-BIT COUNTER

2-bit (counts 00...01...10...11...00...)

Connect Q1 to J2 and K2

Now Q2 only changes state if Q1 is set (halves the frequency).

but there is no Co (when state = 11)

Because we have a common clock

COUNTING WITH 3 BITS (LITTLE-ENDIAN)

The issue is that the gates are all using the same clock, so Q3 will toggle at 3, and then the counter will reset to O Try it, You will get the number sequence: 0, 1, 2, 7, 0...

> Add an AND gate to only toggle Q3 when both Q1 and Q2 are set. That's our Carry Out (Co).

NOW TO RESET WHEN IT GETS TO 6 (ILLEGAL STATE)...

When the output is 110 (Q1=0, Q2=1, Q3=1), use a 2-input AND gate to trigger the Reset on all of the Flip-Flops.

Issues The synchronous (common clock) counter

Currently the illegal state (6) still momentarily occurs

Solution:

add D-Flip-Flops as a buffer before displaying

> Adds a delay of 1 clock pulse

3 bit Common Clock - Little Endian - No illegal state

SUMMARY The synchronous (common clock) counter

> Common clock counter synchronises FF state transitions

One clock to rule them all!

> Requires explicit logic gates check FF transitions

Not quite as elegant as ripple counters!

> D FFs can buffer output for a clock pulse:

Allow circuit to stabilise

illegal state cannot "escape"

Building a computer? Things we need:

TYPES OF MEMORY - 1

ROM (read only memory): All reading at full speed. Just got the address and go there.

-contents built-in at time of manufacture.

PROM - Programmable read only memory. Programmed by a once-only irreversible operation, e.g. in factory.

EPROM - Erasable Programmable read only memory. Can be removed from the computer and erased and programmed (slowly) by using special apparatus (e.g. UV light).

Sometimes called "field-programmable", i.e. "in the field"
Bulk erased: every byte erased at the same time
Byte programmable: write bytes one by one.

EEPROM - Electrically erasable read only memory. Can be erased and reprogrammed byte by byte in situ, but writing is slower than normal reading.

TYPES OF MEMORY - 2

RAM random access memory.

A misnomer it should be RWM (read write memory) - both ROM and RAM

allow random access

Static RAM - retains information until power removed. Fast, larger area of silicon per byte, modest power requirement.

Dynamic RAM - retains information as long as the contents are refreshed frequently enough. Smaller area of silicon per byte, low power requirement.

Does not use flip-flops.
Uses tiny capacitors to store electric charge.
Because the charge leaks away have to rewrite ("refresh") every few milliseconds.

TYPES OF MEMORY - 3

> SDRAM (Synchronous DRAM)

Hybrid of dynamic and static technology

"clocked" by the main CPU clock

> DDR (double data rate) SDRAM

Chips produce data on rising and falling edges of the clock

Higher data rates, eg 64 bits / nsec

TYPES OF MEMORY - 3

FRAM - Ferroelectric RAM

• Uses atom position in unit cell (in theory). Actual density about the same as DRAM. Non-volatile (no power / refresh to keep state).

Used in specialist devices (where you never want to "turn it off and on again" to fix it. Random access

Flash memory - (EEPROM)

Charge stored between insulators. Write bit by injecting electrons through a barrier layer (physically damaging it). Used in USB drives. Good for about 30,000 writes. Random access

Core memory

Magnetic "core" memory: each bit is stored as the direction of magnetisation of tiny rings (like doughnuts). 1955-1975. 1000 x slower than DRAM.

Memory Addressing

RAM consists of one or many chips

RAM is organised into words (of e.g. 32 bits)

Words are grouped into pages.

Words are selected by an address

A number of m bits

Control bits specify whether to read or write

Bits stored in the selected word go to or from the data bus

A byte is typically the smallest addressable chunk of memory. We can address the 4 bytes in a 32-bit word, but we can't address individual bits in each byte. Exceptions include Micron 3D-XPoint

RAM ROM SUMMARY

> Many different types of memory:

ROMS variants

RAM variants

Trade offs of speed, space, expense, longevity

> All slower to access than registers

STACKS

Random access memory requires knowing the address of every byte/word you want to access

Stacks offer a way of organising and accessing memory without random (indexed) access:

There are hardware stacks and software stacks.

Hardware stacks created out of dedicated shift registers

Software stacks typically defined in RAM using conventions (we'll come back to this):

IN SIMPLE TERMS...

A stack allows us to mothball/backup/hibernate a process/ task at will on the receipt of an interrupt or code invocation.

To do this, we

push instructions/data that we will need later onto the stack;

do the task;

and then pop the stored data back off the stack and

continue as before.

Stacks (6 deep) Example: a hardware stack with a depth of 6. Let it hold one item (5) as shown If 7 is now pushed onto the stack it becomes Push on a 0. Pop off the O Pop off the 7 Push a 2 Pop off the 2 If you remove another item you get the 5 and the stack is now empty.

SOFTWARE STACKS

We can also use ordinary RAM to implement a stack.

Stack space only limited by the amount of addressable RAM

The "top" of the stack moves "up" and "down" as things are pushed or popped, and the location of the "top" is stored in a register (Stack pointer).

SUMMARY of Stacks

> Stacks are a fundamental data:

Simple and efficient storage/recalling of data

Reduces need for storing memory addresses

Can only access data at the top of the stack

We store data by "pushing" data onto the stack

We recall data by "popping" data off of the stack

Hardware stacks utilise dedicated shift registers:

soon we are going to build one ourselves!

> Software stacks use RAM

Build a stack

Start with a bidirectional shift register....

1. Start off with 4 D flip-flops:

2. ADD THE STANDARD SIGNALS

Clock

> CLR (resets flip flops when up in Logisim)

ADD THE SERIAL INPUT (D)

> Data is the serial input (SI)

CONNECT EACH Q OUTPUT TO THE NEXT DATA INPUT

ADD SOME LEDS TO SEE THE PARALLEL OUTPUT / REGISTER STATE

> We can now modulate the Data input and see the on state propagate through the register (left to right).

HOW DO WE MAKE IT GO BACKWARDS?

Wire-up the cascade backwards

Data goes in the far end, each Q outputs to the D input of the previous Flip-Flop.

HOW DO WE MAKE THE DIRECTION SELECTABLE?

> A controlled gate

Controlled input gate: Selects A if high, B if low

PROGRAMMABLE WIRING

We can use one of these circuits for each input to a Flip-Flop, and use a common control signal to determine direction.

SELECTING THE OUTPUT WITH GATES

> We also need to determine which direction output from a Flip Flop flows!

> This circuit has a common input, and selectable output.

> We can use one of these for each output from a Flip-Flop, and use a common control signal to determine direction.

LET'S PUT THIS ALL TOGETHER (JUST 2 FFS TO START WITH)

SELECT INPUT

ADD THE OUTPUT DIRECTION SELECTION > R circuit enabled (red)

WHAT COULD POSSIBLY GO WRONG?

Can't short the outputs of two gates together.

Have to Add with an OR aate.

Conflict

THE FIX

The OR gates combine the outputs of the controlling AND gates and pass through the signal from whichever one is enabled... to the D inputs on each Flip-Flop.

Generally need about 3 gates per flip-flop per bit

NEED MORE DEPTH?

This is a 4-deep shift register ( http://www.ee.usyd.edu.au/tutorials/digital_tutorial/part2/ register06.html).

As you can see, there are alternative ways of wiring it up, with different logic gates.

TO MAKE A STACK...

So far we have made a 2-stage bi-directional 1-bit shift register.

To make a proper stack:

Add depth (flip-flops with associated control logic)

Add width (bits) in parallel (common clock, control signals).

identical shift-registers - one for each bit.

SUMMARY build a stack

Hardware stacks formed using banks of shift registers

Simple input selective circuits allow direction selectability:

Programmable gates!

To make a proper stack:

Add depth (flip-flops with associated control logic)

Add width (bits) in parallel (common clock, control signals).

identical shift-registers - one for each bit.

Assembly

int answer = 10 + 4;

Step 1 – the values 10 and 4 need to be represented somewhere in memory – probably registers

int answer = 10 + 4;

Step 2 – this plus sign needs to be mapped to some sort of instruction that tells the CPU/ALU to use it’s Adder (remember the adder circuit!) on these values

int answer = 10 + 4;

Step 3 – the output of this “adder” must be stored somewhere – again, probably a register.

int answer = 10 + 4;

Step 4 – define a location in memory big enough to hold an integer, and give it a name.

int answer = 10 + 4;

Step 5 – store the result of the addition, currently in a register, in the location associated with the symbolic name “answer”

int answer = 10 + 4;


MOV R0, #10 ; Step 1a:
MOV R1, #4 ; Step 1b:
ADD R3, R0, R1 ; Step 2 and Step 3
STR R3, answer ; Step 5
HALT ; STOP!
answer: .word 0 ; Step 4


mov r1, #1 ;r1 = register 1, #: delimiter, #1: literal value(not a pointer)
;1: value in decimal

Add v1


add r2, #1 ;r2: to register 2

add 1 to r2

r2 = r2+1

r2++

Add v2


add r2, r6; r2: to register 2

add r6 to r2

r2 = r2 + r6

Add r3


add r2, r5, r6

add r6 and r5 and put into r2

r2 = r5 + r6

SUB (works the same as ADD)


sub r2, #1; sub: subtract, r2: to register 2

subtract 1 from r2

‘dec’ in some asm

LABELS

Labels are used to give meaningful names for:

the location of instructions for branching,

specify locations of data for loading and storing.

Labels can be made up of any upper or lower case alpha-numeric characters

They must start with a letter

Labels essentially allow programmers to assign meaningful names to memory address locations that will be used often.

This is how we define variables (amongst other things we'll talk about later) in assembly code.

Labels definitions are looked at by the assembler before the executable code


someplace: .word 0

: ->Colon indicates it’s a label definition

someplace → label name (whatever you want)

.word → Type (i.e. size) of data to be stored

0 →Value to initialise memory location with

Define “someplace” as a label for a memory location holding a word value of 0

Store Register


str r1, someplace

, → delimiter, reference (a pointer)

someplace: offset value in decimal

write value in r1 into the memory location associated with the label “someplace”

Summary ARM Basic

ASM programming represents the lowest level of human readable programming

Near 1-to-1 match with machine instructions

RISC versus CISC

ASM programming is useful for:

Hardware specific programming

Optimising code for performance

Reverse engineering and analysing code behaviour

if (x < 2): (3)

x = 0 ← (1)

else: (4)

x = 3 ← (2)

(1) (2) ⇒ First thing to notice: we have two mutually exclusive computations to perform!

(3) (4) ⇒ Second thing to notice: we have a condition that determines which instruction to branch to ← This requires the ability to branch only when a specific condition is true

IF Tests: CMP

In ARM assembly, the CMP (Compare) instruction allows values to be compared:

CMP R1, R2 subtracts the 2nd value (R2) from the 15t (R1)

The result of this subtraction is then used to update the Application Program Status Register (APSR)

Performed by the ALU

Specifically, 4 flag bits are updated within the APSR:

ALU result was Negative

Z ALU result was Zero

C ALU set the Carry bit

V ALU result caused oVerflow

The Program Status Register (PSR) combines:

Application Program Status Register (APSR)

Interrupt Program Status Register (IPSR)

Execution Program Status Register (EPSR).

These registers are mutually exclusive bitfields in the 32-bit PSR. The bit assignments are:

Table 2.4. APSR bit assignments

Bits	Name	Function
[31]	N	Negative flag
[30]	Z	Zero flag
[29]	C	Carry or borrow flag
[28]	V	Overflow flag
[27]	Q	Saturation flag
[26:0]	-	Reserved

Access these registers individually or as a combination of any two or all three registers, using the register name as an argument to the MSR or MRS instructions. For example:

read all of the registers using PSR with the MRS instruction

write to the APSR N, Z, C, V, and Q bits using APSR_nzcvq with the MSR instruction.

inspect the outcome of a CMP here

Details


cmp r2, #1234 ;cmp: compare, r2: r2: register 2

compare r2 and #1234

r2 - #1234 set APSR with ALU flags, store the ALU flags in the APSR

Conditional Branching using the APSR

Branch (B) reads the APSR and jumps according to the flags and the relevant suffix you give it.

In ARMlite, there are five variants of Branch:

B - unconditional branch

BEQ 'Branch if EQual'

BGT 'Branch if Greater Than'

BLT 'Branch if Less Than'

'Branch if Not Equal'

In the ARM instruction set more generally, there are many others

Determining the comparison from the flags

In ARM assembly, the condition code suffix can be added to many operations. e.g. movne r1,#12

Suffix	Flags	Meaning
EQ	Z set	Equal
NE	Z clear	Not equal
CS or HS	C set	Higher or same (unsigned >= )
CC or LO	C clear	Lower (unsigned < )
MI	N set	Negative
PL	N clear	Positive or zero
VS	V set	Overflow
VC	V clear	No overflow
HI	C set and Z clear	Higher (unsigned >)
LS	C clear or Z set	Lower or same (unsigned <=)
GE	N and V the same	Signed >=
LT	N and V differ	Signed <
GT	Z clear, N and V the same	Signed >
LE	Z set, N and V differ	Signed <=

how does this translate to assembly code


if (x <= 2): 
x = 0


LDR R0,X // assume X is holding a value 
CMP R0,#2 // compare contents of R0 with #2 
BGT skip // if R0 > 2 then jump to skip 
MOV R0,#0 // if R0 <= 2 then assign R0 #0 
skip: // continue program from here ...


if (x <= 2): 
x = 0 
else: 
x = 3


LDR R0,X // assume X is holding a value
CMP R0,#2 // compare contents of R0 with #2
BGT else // if R0 > 2 then jump to else
MOV R0,#0 // case for R0 <= 2
B cont // Branch to label cont
else:
MOV R0,#3 // case for R0 > 2
cont:


if (x <= 2):
x = 0
else if (x <= 4):
x = 2
else
x = 4


LDR R0,X // assume X is holding a value
CMP R0,#2 // compare contents of R0 with #2
BGT else1 // if R0 > 2 then jump to label else
MOV R0,#0 // case for R0 <= 2
B cont // Branch to label cont
else1: // if R0 > 2
CMP R0,#4 // compare contents of R0 with #4
BGT else2 // if R0 > 4 then jump to label else2
MOV R0, #2 // otherwise, handled case for R0 > 2 <= 4
B cont // Branch to label cont
else2: // if R0 > 4
MOV R0,#4 // case for R0 > 4
cont: // exit point of conditionals. Continue with program from here

Looping and Branching

Recall Labels

Labels are used to give meaningful names to locations in memory

Previously we saw this used to define variables: Eg someplace: . word O

Defines "someplace" as a label for a memory location holding a word of value O

But we can also use labels to mark the location of instructions in our programs !

Instructions have addresses too !

Recall that before a program is executed, it is loaded into memory

Every instruction therefore occupies a word of memory, and has an address like any other data

Recall also that this is a core principle of the Von Neumann architecture

We can therefore use the address of specific instructions to jump from one location to another

often referred to as branching

The Program Counter

• To understand labels and branching we need to understand how the CPU keeps track of which instruction to execute next • It uses the special register, the Program Counter (PC) register

Counter

After each instruction is executed, the PC register is updated with the address of the next instruction to execute.

This update is performed by a hardware counter within the ALU

Labels as branch points


MOV R0, #0
Loop: ;Label referring to the address of the next instruction
ADD R0,R0,#1
B Loop;A Branch instruction, telling the program counter that the next 
;instruction to execute is at address of label ‘Loop’
HALT

Unconditional Branch


B someplace; This can be any word aligned address, 
;but is almost always given as a label, defined elsewhere in the program

Branch to the address represented by the label someplace

Bit-wise operators

Instructions that manipulate individual bits given to them in their operands

Generally best understood by considering values in their binary form

Many of these will feel familiar from the first half of this unit

AND – Logical bit-wise AND

Syntax:

AND R3, R1, R2

Performs a bit-wise logical AND on the two inputs given in the second and third operands

Stores the result in the destination, (first operand)

R1 "AND" R2":

ORR — Logical bit-wise OR

Syntax:

ORR R3, R1, R2

Performs a bit-wise logical OR on the two inputs given in the second and third operands

Stores the result in the destination, (first operand)

R1 "ORR" R2":

EOR – Logical bit-wise Exclusive OR

Syntax:

EOR R3, R1, R2

Performs a bit-wise logical XOR on the two inputs given in the second and third operands

Stores the result in the destination, (first operand)

Eg, H

R1 "EOR" R2":

Observations on AND,ORR,EOR

Have clear analogies to the logical gates we have seen previously

Generally useful for specific bit manipulations to control hardware, as well as bit masking (e.g, to extract part but not all of a 32 bit string).

Also for performing comparisons between, or merging bit strings.

LSL - Logical Shift Left

Syntax:

LSL R1, R1, #3

Shifts each bit of the input value to the left, by the number of places specified in the third operand, losing the leftmost bits, and adding zeros on the right.

Stores the result in the destination, (first operand)

Eg,

If R1 = #0b01101110, and shift = #3

R1 becomes #0b01110000

LS - Logical Shift Left

Syntax:

LSR R1, R1, #3

Shifts each bit of the input value to the right, by the number of places specified in the third operand, losing the rightmost bits, and adding zeros on the left.

Stores the result in the destination, (first operand)

Eg,

If R1 = #0b01101110, and shift = #3

R1 becomes #0b00001101

LSL and LSR Observations

Assuming Little Endian notation:

A single LSL doubles the value represented by the bit string

A single LSR halves the value represented by the bit string

Both instructions invoke a shift register to perform task in hardware

As with all bit-wise operators, they can be performed in one CPU cycle.

Form the backbone for implementing multiplication and division operations, which typically require multiple CPU cycles to calculate (we'll come back to this later)

Memory Addressing - LDR and STR

A common workflow

The CPU and ALU work with values stored locally in registers

A standard workflow for operating on values in memory:

Load value(s) into registers

Perform operations on values in registers

Write result to register(s)

Store value(s) back to memory

LDR and STR

LDR ('Load Register') copies a value from a memory location into a register

STR ('Store Register) copies a value from a register into a memory location

E.g.:

LDR RO, Oxfc loads the contents of the word starting at memory address 0x000fc into RO.

STR R1,120 stores the contents of R1 into the word starting at memory address

(decimal) 120.

STR R3, myLabel stores the contents of R3 into the word at the address represented by label myLabel.

LDR R4, myLabel loads the contents of the word at the address represented by label myLabel

LDR and STR (cont)

The address passed to LDR/STR must be 'word aligned'

that is, it must be divisible by 4

Therefore:

LDR R3, 0x81 is an invalid instruction because address 0x00081 is not a valid word address (it is not divisible by four).

The Assembler will trap this.

LDRB and STRB

LDR and STR have byte addressable variants!

LDRB will load just a single byte of memory

into the least significant 8 bits of a register - setting the other bits to 0

STRB will store the least significant 8-bits of a register into a specified

byte of memory

not altering any neighbouring bytes

LDRB and STRB (cont)

LDRB and STRB do not require the specified memory address to be word aligned (divisible by four).

For example

LDRB R3,0x81 is a valid instruction, and will load the contents of the single byte at address 0x00081 into the least significant 8 bits of R3.

Indirect Addressing

LDR and STR also support 'indirect addressing'

Indirect addressing allows memory to be addressed via registers holding the address

In some programming languages (e.g., C), this is the basis of 'pointers'

a memory word that holds the address of another memory word

In ARM assembly:

LDR R0,[R1] will load into RO, the contents of the memory address that is currently held in R1.
STR R3,[R4] will store the value in R3 to the memory word with address currently held in R4

Indirect and Indexed Memory Addressing

LDR and STR memory addressing

We have previously discussed and seen how LDR and STR support indirect addressing

allows memory to be addressed via registers holding the address

For example:

LDR RO, [R1] will load into RO, the contents of the memory address that is currently held in R1.

STR R3, [R4] will store the value in R3 to the memory word with address currently held in R4

LDR Indirect example

MOV R1, #0x00040 LDR RO, [R1]

R1 = 0x00040

R0 = 0xffffffff

STR Indirect example

MOV RO, #Oxffffffff MOV R1, #0×00040 STR RO, [R1]

STR: taking value into R0 and storing it into memory location,

Indexed Memory Addressing

It is often desirable to address specific words/bytes of memory as an offset from some base address.

For example:

Accessing dedicated memory locations that control/interact with external hardware/systems
Pixel display
Arrays (blocks of memory used to collect values of the same type)
Structs (blocks of memory used to collect values of different types)

In such examples, there is typically a base address indicating the start of addresses dedicated to the specific use-case.

We can therefore index specific memory locations within as an offset from this base address

Indexed Addressing Syntax

In ARM assembly


LDR R0, [R1 + #4] ; Load the value at memory address [R1] + 4 bytes into R0
STR R2, [R1 + #16] ; Store the value in R2 in memory at address [R1] + 16 bytes
LDR R0, [R1 + R3] ; Load the value at memory address [R1] + R3 (no. of bytes) 
;into R0
STR R2, [R1 + R4] ; Store the value in R2 in memory at address [R1] + R4 
;(no. of bytes)

Indexed Addressing LDR Example


MOV R1, #0x00040
LDR R0, [R1 + #4]

R1 = ?

R0 = ?


MOV R1, #0x00040

⇒ R1 = 0x00040

LDR R0, [R1 + #4]

R1 = 0x00040 → to 0x0004

LDR R0, [R1 + #4]

#4] → 0x00040 + 4 = 0x00044

Indexed Addressing STR Example


MOV R0, #0xffffffff
MOV R1, #0x00040
MOV R2, #16
STR R0, [R1 + R2]

Indexed Addressing LDRB Example


MOV R1, #0x00040
LDRB R0, [R1 + #2]

[R1 + #2]

Indexed Addressing STRB Example

Some Observations and Notes

For word index addressing (ie STR, LDR):

the offset number of bytes must be a multiple of 4 (ie., 4 bytes per word)

For byte index addressing (ie STRB and LDRB):

individual bytes can be addressed so no need for offset to be multiple of 4

Remember ARMlite uses little endian byte addressing

If a byte offset is larger than 4 then it will be addressing bytes in adjacent words

Busy Wait Timer (Flashing LED)

Let's simulate a flashing LED

We can simulate a flashing LED by drawing a single pixel in ARMLite's graphical display

ARMlite has three modes of pixel graphics display:

Low resolution (32 × 24 pixels)
Medium resolution (64 × 48 pixels)
High resolution (128 × 96 pixels)

For this we will use the low resolution modality

ARMlite Low Res Pixel Graphics

In ARMLite low-res modality, there are 32 × 24 = 768 pixels.

The display window is mapped to a contiguous block of memory.

Pixel colours are specified as 24 bit Red-Green-Blue values (8 bits per colour component)

Eg Bright Green is 0x000FF000, White is Ox00FFFFFF
In ARMlite only the least significant 24 bits of each word are used

Each pixel therefore has an address, and can be written to (STR) or read from (LDR) like any memory location

ARMlite Low Res Pixel Graphics

In low-res mode, each pixel is assigned a label, starting from top left and moving across each row:

Pixelo, Pixel1, Pixel2 .....Pixel767

For convenience, labels pre-exist for standard colour codes:

Eg., the colour green can be represented by the predefined label green
green == #0×00008000

ARMlite Low Res Pixel Graphics

Let's paint a green pixel in the middle of the display: The index of the middle of the display will be 11*32 + 15 = 367


MOV RO, #.green // set RO to RGB for green
STR RO, •Pixel367 // write RO to address of Pixel 367

simulate a flashing LED

Issue - the flashing is too fast !

In fact, in a real system this would probably not produce a flash at all, just a dimmer LED


MOV R0, #.green
MOV R1, #.white
flash:
STR R0, .Pixel367 ;(1)
STR R1, .Pixel367 ;(2)
;The issue is between (1) and (2) -> we need to insert a delay 
;between the “ON” and the “OFF" ->we need a timer!
B flash

A "Busy Wait" Timer

Pseudocode:

Set up pixels for display
Loop1:

Turn LED on (paint green pixel)
Busy wait
Turn LED off (paint white pixel)
Busy wait

branch to loop1

A "Dumb" Busy Wait Timer

Busy wait:

a loop that executes until it reaches a certain value
keep CPU occupied

set limit value // some large number

initialise counter to 0

timer:

Add 1 to counter

Compare counter to limit value

branch to timer if counter < limit

So we need a conditional branch as part of the timer loop !


MOV R2, #5000000 // init number of iterations for timer
MOV R2,R3 // MOV iterations into working register R3
timer1:
SUB R3,R3,#1 // subtract 1 from R3
CMP R3, #0 // compare R3 with #0
BNE timer1 // keep looping until R3 reaches zero

Incorporating timer into LED flash • Recall pseudo code: Set up pixels for display Loop1: Turn LED on (paint green pixel) Busy wait Turn LED off (paint white pixel) Busy wait

branch to loop1

Incorporating timer into LED flash


1   mov r0, #.green
2   mov r1, #.white
3   mov r2, #5000000
4 flash:
5   str r0, Pixel367    ; flash on
6   mov r3, r2
7 timer1:
8   sub r3, r3, #1
9   CMP r3, #0
10  BNE timer1
11  str r1, Pixel367    ; flash off
12  mov r3, r2
13 timer2:
14  sub r3, r3, #1
15  CMP r3, #0
16  BNE timer2
17  B flash
18  halt

Busy Wait

Busy Wait v1

Last video we applied the “dumb” busy wait timer


MOV R2, #5000000 // init number of iterations for timer
MOV R2,R3 // MOV iterations into working register R3
timer1:
SUB R3,R3,#1 // subtract 1 from R3
CMP R3, #0 // compare R3 with #0
BNE timer1 // keep looping until R3 reaches zero

Imagine you executed the code on two different CPUs?

RPi 2B (900MHz quad-core ARM Cortex-A7 processor)
RPi 4B (1.5GHz quad core ARM Cortex-A72 processor)

Will the delay be the same on both?

No !

A Better Busy Wait

Most CPUs have access to a register that maintains a real time count.

On the Raspberry Pi for example, a Timer register counts in 1 microsecond intervals (106 per second)

In ARMlite, a register maintains a 1 second counter

accessible via the pre-defined .Time label

Eg LDR RO, •Time loads the current time in seconds (since Jan 1 2000) into RO

We can utilise this to produce a hardware independent real-time delay

A Better Busy Wait - psuedocode

Set desired delay time in seconds start_time = current time in seconds loop: now = current time in seconds remaining_time = (now - start_time) compare remaining_time, delay loop if remaining_time <= delay

A Better Busy Wait

Let's allocate some variables to registers

R2 - desired delay
R3 - now (where the current time will be stored)
R4 - start time
R5 - elapsed time (R3-R4)

Solution


1   mov r0, #.green
2   mov r1, #.white
3   mov r2, #1      ; 1 sec delay time
4 flash:
5   str r0, Pixel367 ; flash on
6   LDR r3, .Time   ; start time
7 timer1:
8   LDR r4, .Time   ; current time
9   sub r5, r4, r3  ; elapsed time = current time - start time
10  CMP r5, r2
11  BLT timer1
12  str r1, Pixel367 ; flash off
13  LDR r3, .Time   ; start time
14 timer2:
15  LDR r4, .Time   ; current time
16  sub r5, r4, r3  ; elapsed time = current time - start time
17  CMP r5, r2
18  BLT timer2
19  B flash
20  halt

A Better Busy Wait

Provides an absolute timer (no longer dependent on processor speed!)

However, our approach is still a form of busy wait!

CPU still occupied for duration of delay

Essentially this is polling the timer

Is there a potentially even better approach ?

Well yeah! We could implement an interrupt-based timer

We will come back to this later in semester!

Revisiting our Flashing LED Example

Recall our flashing LED program from last week, and in particular the delay function


;; *****************************************
;; delay function
;; inputs R0 - time delay in seconds
delay:
    push {R3,R4,R5,R6}
    MOV R3, R0          ; move delay time param into R3
    LDR R4, .Time       ; get start time
timer:
    LDR R5, .Time       ; update time
    SUB R6, R5, R4      ; calc elapsed time
    CMP R6, R3          ; compare elapsed to delay time
    BLT timer
    pop {R3,R4,R5,R6}
    RET
;; *****************************************

This is a busy-wait timer, or in other words, an example of polling

The CPU constants checks for a state – in this case, that the time exceeds a delay time

There is a better way ! Using Interrupts

Now back to our Flashing LED

Lets use ARMlite's Clock Interrupt for this !

To set it up the interrupt handling:

The handler will be called everytime the clock raises an interrupt
So each time it will draw either a green ("LED on") pixel or a white ("LED off") depending on what state the LED is currently in

This is quite different to previously where we inserted a delay between each state change!

Now we're waiting to be told when to change state:

Event driven !

Flashing LED with INT: Interrupt Handler


///////////////////////////////// 
/// interrupt handler: toggleLED
// toggles state of LED (on "green" or off "white")
// 
toggleLED:
		PUSH {RO}
		CMP R3, #O // check state 
		BNE off
		MOV R3, #1 // if R3 0, then 1
		MOV RO, #.green
		B drawpixel
off:
		MOV R3, #0 // if R3 1, then O
		MOV RO, #.white 
drawpixel:
		STR RO, .Pixel367 // draw the pixel
		POP {RO}
		RFE

Flashing LED with INT: Interrupt Handler


CMP R3, #O ; check state -> Check next state to enact (on or off)


BNE off
;next state is “off” ?
;Then turn LED off
off:
MOV R3, #0 ; if R3 1, then 0
MOV R0, #.white

MOV R3, #1 ; if R3 0, then 1 - next state is “ON” ?
;Then turn LED ON
MOV R0, #.green

drawpixel:
STR R0, .Pixel367 ; draw the pixel - Draw the pixel (R0 will be set as
;either green or white)


PUSH {R0}
POP {R0}

We push and pop R0 so we can restore it after the function is finished.

Flashing LED with INT: Full Program


1|// Set up Interrupt handling
2| MOV R0,#toggleLED
3| STR R0,.ClockISR
4| MOV R0,#1000
5| STR R0,.ClockInterruptFrequency
6| MOV R0,#1
7| STR R0,.InterruptRegister //Enable all interrupts
8| MOV R3, #1 // R3 keeps track of state of LED
9|mainProgram:
10| B mainProgram //Here, just an empty loop!
11|/////////////////////////////////
12|/// interrupt handler: toggleLED
13|// toggles state of LED (on "green" or off "white")
14|//
15|toggleLED:
16| PUSH {R0}
17| CMP R3, #0 // check state
18| BNE off
19| MOV R3, #1 // if R3 0, then 1
20| MOV R0, #.green
21| B drawpixel
22|off:
23| MOV R3, #0 // if R3 1, then 0
24| MOV R0, #.white
25|drawpixel:
26| STR R0, .Pixel367 // draw the pixel
27| POP {R0}
28| RFE

Line 9 and 10: This is just an infinite loop because we are only flashing LED and there is nothing else to do but wait for clock interrupts, however this could potentially be doing other useful things

FUNCTIONS

Functions/methods/procedures/sub-routines:

A callable block of organised, re-usable code
Typically single action
accepts arguments (ie parameters)

Eg in C:


int add(int x, inty)
{
int sum = x + y;
return(sum);
}

FUNCTION BASICS

When a function is called:

Arguments need to be placed somewhere the function can access
program control shifts to the function's instructions

When a function completes:

Return value needs to placed somewhere for the calling function to retrieve
program control shifts back to the instruction immediately after where it was called from

Managing this requires a some house keeping needed !

High level programming languages hide most of this !
Not ASM!

FUNCTIONS IN ASM

Not 'native' to assembly

We need to do a lot of the management ourselves

Argument passing:

How do we pass arguments from one function to another

Storing and recalling register values

each function we call will want to use the same registers (only 13 general purpose registers !)
How do we manage this?

Managing the program control

Jumping from one function to another, and then returning back !

RegIster MANAGEMENT

Application Binary Interface (ABI) sets standard way of using ARM registers.

ro-r3 used for function arguments and return values
r4-r12 promised not to be altered by functions
Ir and sp used for stack management
pc is the next instruction - we can use it to exit a function call

ABI

Register	Brief	Preserved	Rules
ro	Argument and result	No	r0 and r1 are used for passing the first two arguments to functions, and returning the results
r1	Argument and result	No	of functions. If a function does not use them for a return value, they can take any value after a function.
r2	Argument	No	r2 and r3 are used for passing the second two arguments to functions. There values after a
r3	Argument	No	function is called can be anything.

CALLING FUNCTIONS

By convention, the first two function arguments are loaded into r0 and r1.

The next two are put into r2 and r3.

The return value of the function is written into r0 and r1 (lowest word in ro).

The function promises not to alter r4-r12.

… but suppose the function needs to use many registers to do calculations...

SUMMARY FUNCTION

Functions are the building blocks of programs:

Organised, re-usable blocks of code

Higher level programming languages have built in support for functions:

Not ASM!

One thing we need to manage is register use

Application Binary Interface (ABI) defines conventions for the use of registers

Next lecture:

How do we store and recall register values ? With a stack of course !

FUNCTIONS IN ASM

Not 'native' to assembly

We need to do a lot of the management ourselves

Argument passing:

How do we pass arguments from one function to another

Storing and recalling register values

each function we call will want to use the same registers (only 13 general purpose registers !)
How do we manage this?

Managing the program control

Jumping from one function to another, and then returning back !

PUSH, POP AND THE STACK

ARM computers have a software stack*.

A separate area of RAM is available for temporary values.

A value in a register can be pushed onto the stack to preserve it for later.

It can be popped off later (in LIFO order).

We can get the memory location (a pointer to it) by checking the SP (R13) register.

SOFTWARE STACK?

A section of RAM managed by the SP (stack pointer) register.

A sort of 32-bit (64-bit in ARM8) wide array which starts (element 0) high in RAM and grows down as values are added to it.

The stack pointer stores the memory location of the last value added (pushed) to the stack.

Each push decrements SP by 4 (4 bytes per word).

A pop operation removes the last value in the stack and increments the SP by 4 (4 bytes per word)

Software stack (depth only limited by RAM)

Example:

x = don’t care.

2.push 5 – SP decremented by 4. [SP] points to 5

3.push 7 - SP decremented by 4, [SP] points to 7

4.push 0 – SP decremented by 4. [SP] points to 0

5.pop 0 – SP incremented by 4. [SP] points to 7

6.pop 7 – SP incremented by 4. [SP] points to 5

7.push 2 – SP decremented by 4. [SP] points to 2.

8.pop 2 – SP incremented by 4. [SP] points to 5.

pop 5 - SP incremented back to it’s starting value.

The stack is now empty. [SP] points to end of stack.

EXAMPLE SYNTAX

Push and pop accept multiple registers if in a 1, ,,...y list push {r4,r5} ;back them up onto the stack ;use r4 and r5 for something else pop {r4,r5} ;restore them from the stack

Alternatively, do one at a time (but pop in reverse order)

stack: Correct order is preserved for {lists}

RECALL THE ABI

Application Binary Interface (ABI) sets standard way of using ARM registers.

r0-r3 used for function arguments and return values
r4-r12 promised not to be altered by functions
Ir and sp used for stack management
pc is the next instruction - we can use it to exit a function call

ABI CONVENTIONS

ABI compliant functions:

Use ro-r3 for passing and returning values to functions
Promise not to alter r4-r12

... but suppose the function needs to use many registers to do calculations ??

We can use the stack to store and recall register values !

PassIng ARGUMENTS TO FUNCTIONS

To re-use the registers we need to:

Back up registers we need to re-use in a function
Store arguments for the function in r0-r3
Call the function
Read the return values from r0-r1 (optional)
Restore the registers we backed up.

SUMMARY

Software Stack:

Dedicated RAM used to store values FILO

Special register "sp" used to store address of start of the stack

Stacks allow us to store and recall register values efficiently

Stacks integral to functions:

We need to store and recall register values so we don't run out of registers to use!

FUNCTIONS IN ASM

Not 'native' to assembly

We need to do a lot of the management ourselves

Argument passing:

How do we pass arguments from one function to another

Storing and recalling register values

each function we call will want to use the same registers (only 13 general purpose registers !)
How do we manage this?

Managing the program control

Jumping from one function to another, and then returning back !

RECALL OUR “FLASHING LED” PROGRAM


1   mov r0, #.green
2   mov r1, #.white
3   mov r2, #1      ; 1sec delay time
4 flash:
5   str r0, Pixel367 ; flash on
6   LDR r3, .Time   ; start time
7 timer1:
8   LDR r4, .Time   ; current time
9   sub r5, r4, r3  ; elapsed time = current time - start time
10  CMP r5, r2
11  BLT timer1
12  str r1, Pixel367 ; flash off
13  LDR r3, .Time   ; start time
14 timer2:
15  LDR r4, .Time   ; current time
16  sub r5, r4, r3  ; elapsed time = current time - start time
17  CMP r5, r2
18  BLT timer2
19  B flash
20  halt

DEFINING FUNCTIONS

We're obviously duplicating code in this example, which in general we want to avoid

Why not write a function to do this !

Imagine we had already written a function called “delay”:


1   mov r0, #.green
2   mov r1, #.white
3   mov r2, #1      ; 1sec delay time
4 flash:
5   str r0, Pixel367 ; flash on
6   LDR r3, .Time   ; start time
;BL delay: Calling function “delay” Program control jumps to Instruction 
;address represented by the label delay
7   BL delay        ; call delay function
8   str r1, Pixel367 ; flash off - str: Once the function is complete,
;program control returns to instruction after function call
9   LDR r3, .Time   ; start time
10  BL delay        ; call delay function
11  B flash
12  halt

Now we have replaced the duplicated code with a function call

KEY REGISTERS

Program counter (pc, also r15):

Holds the address of the next instruction to execute

Link Register (Ir, also r14):

Holds the address of instruction to return to after a function is complete

HOW ARE THEY USED FOR FUNCTION CALLS?

Program counter (рс):

Is updated when a branch to label (BL) is encountered

Link Register (Ir):

holds what was in po register before it was changed
i.e., address of the next instruction after the function call
brings us back to where we came from (we'd be lost otherwise!

HELPFUL INSTRUCTION - BL

BL label$:

causes program control to jump to labelS,

but also ....

copies next instruction to Ir so we know how to get back!

DEFINING FUNCTIONS

Imagine we had already written a function called “delay”:


7 BL delay

10 BL delay

BL delay sets: • the PC register to be address of delay • the LR to register to the address of next instruction

DEFINING DELAY FUNCTION

So far we have only talked about how we might call a delay function

Lets now think about writing the actual function

Lets the write the function so that it takes a single argument:

The number of seconds to delay

DELAY FUNCTION


delay: ;Label defining start of function in memory
push {R3,R4,R5,R6} ;begin Push all registers we are about to use onto stack 
;so we can restore them at the completion of function
MOV R3, R0 ; move delay time param into R3
LDR R4, .Time ; get start time
timer:
LDR R5, .Time ; update time
SUB R6, R5, R4 ; calc elapsed time
CMP R6, R3 ; check elapsed time
BLT timer
pop {R3,R4,R5,R6} ;end Push all registers we are about to use onto stack 
;so we can restore them at the completion of function
RET

MOV R3, R0: This is where we accept the parameter passed in via R0. We move it into a register to work with locally


LDR R4, .Time ; get start time
timer:
LDR R5, .Time ; update time
SUB R6, R5, R4 ; calc elapsed time
CMP R6, R3 ; check elapsed time
BLT timer

This implements the Delay just like we saw in (the “better dumb timer”)

RET

An ARMlite instruction specifically for returning from Functions. It makes sure the Program Counter register is updated using the address in Link Register

BL AND RET

BL and RET instructions work as a pair

BL (Branch to Label) essentially does this:


MOV LR,PC; copy current next instruction to LR 

MOV PC, #delay; set PC to make "delay" the next instruction


MOV LR,PC; copy current next instruction to LR

We do this so we know where to return to after the function is finished


MOV PC, #delay; set PC to make "delay" the next instruction

We do this so we know where to jump to next to execute the function

BL AND RET

RET (ie., RETurn), is how we exit functions, and return where we came from.

RET essentially does this:

MOV PC, LR

It copies the original next instruction (ie., the one we first copied into LR before we called the function) back to PC so that it is the next instruction executed.

WHOLE PROGRAM Program Counter and Link Register


1   mov R0, #.green
2   mov R1, #.white
3   mov R2, #1      ; 1sec delay time
4 flash:
5   str R0, Pixel367 ; flash on
6   LDR R3, .Time   ; start time
7   push {R0}
8   MOV R0, R2
9   BL delay        ; call delay function
10  pop {R0}
11  str R1, Pixel367 ; flash off
12  LDR R3, .Time   ; start time
13  push {R0}
14  MOV R0, R2
15  BL delay        ; call delay function
16  pop {R0}
17  B flash
18  halt
19 delay:
20  push {R3,R4,R5,R6}
21  MOV R3, R0      ; move delay time param into R3
22  LDR R4, .Time   ; get start time
23 timer:
24  LDR R5, .Time   ; update time
25  SUB R6, R5, R4  ; calc elapsed time
26  CMP R6, R3
27  BLT timer
28  pop {R3,R4,R5,R6}
29  RET

SUMMARY Program Counter and Link Register

Function calls require branching to a different instruction address

Use bl to branch to a label
Use RET to return back to calling code

Program counter (pc) and Link Registers:

pc: address of the next instruction to execute
Ir: address of instruction to return to after a function is complete

FUNCTIONS IN ASM

Managing the program control

Jumping from one function to another, and then returning back!

RECALL OUR “FLASHING LED” PROGRAM


1   mov R0, #.green
2   mov R1, #.white
3   mov R2, #1      ; 1sec delay time
4 flash:
5   str R0, .Pixel367 ; flash on
6   LDR R3, .Time    ; start time
7   push {R0}
8   MOV R0, R2
9   BL delay         ; call delay function
10  pop {R0}
11  str R1, .Pixel367 ; flash off
12  LDR R3, .Time    ; start time
13  push {R0}
14  MOV R0, R2
15  BL delay         ; call delay function
16  pop {R0}
17  B flash
18  halt
19 delay:
20  push {R3,R4,R5,R6}
21  MOV R3, R0       ; move delay time param into R3
22  LDR R4, .Time    ; get start time
23 timer:
24  LDR R5, .Time    ; update time
25  SUB R6, R5, R4   ; calc elapsed time
26  CMP R6, R3       ; compare elapsed to delay time
27  BLT timer
28  pop {R3,R4,R5,R6}
29  RET

Last lecture we created a function to call for delays

Line 9 and 11, between line 19 and 29

Between 4 and 18 and inside that: between 5 and 10 and between 11 and 16

But we can decompose this further!

Let's write a flash function, and pass it a colour to flash, and a time delay between flashes.

The only difference between these blocks is the colour of the pixel being drawn.

We can write this as a function and pass the colour (and delay time) as parameters

DRAW PIXEL FUNCTION


drawpixel:
PUSH (R3, R4}
MOV R3, RO ; copy pixel colour to R3
MOV R4, R1 ; copy delay time to R4 STR R3, .Pixel367; draw pixel with colour
PUSH {RO}
MOV RO, R4 ; pass delay time to delay function
BL delay ; call delay function
Pop {RO} RET ; job done

FLASH LOOP (NOW CALLING DRAWPIXEL)


mov R2, #1 ; 1sec delay time
flash:
MOV R0, #.green ; pass green to drawpixel
MOV R1, R2 ; pass time delay
BL drawpixel ; "flash on“
MOV R0, #.white ; pass white to drawpixel
MOV R1, R2 ; pass time delay
BL drawpixel ; "flash off"
B flash
HALT

DRAW PIXEL FUNCTION (FIXED)


drawpixel:
PUSH {R3,R4}
MOV R3, R0 ; copy pixel colour to R3
MOV R4, R1 ; copy delay time to R4
STR R3, .Pixel367 ; draw pixel with colour
PUSH {R0}
MOV R0, R4 ; pass delay time to delay function
PUSH {LR} ; backup LR before BL delay overwrites it!
BL delay ; call delay function
POP {LR} ; restore LR after we’ve returned from delay
Pop {R0}
RET ; job done

SUMMARY - Nested Function Calls

All of this is interesting because this is exactly what compilers have to handle when translating your code in higher level languages

Critical components include LR and PC registers,

But also the stack !

The stack allows functions to do what they do using all registers, and then restores what was there previously

This is also critical to interrupt handling
we'll come back to this

Array Basics

Array basics

Arrays are dedicated blocks of memory for storing lists of values of the same type (ie size)

Almost all programming languages support arrays:

E.G in C programming:

int somearray|10];

Defines a block of memory referred to as "somearray", which holds 10 integer (32 bit)

Array basics int somearray[10]; Using memory in ARMlite, the above could translate to a block of memory like this.

“somearray” refers to an address, marking the start of the allocated block

How does this translate to ARM assembly ?

We simply define a label!


int somearray[10];


somearray: .Word
0
0
0
0
0
0
0
0
0
0


somearray[3] = 5;

We have the address of the array which needs to be stored in a register


MOV R0, #somearray


somearray[3] = 5;

We have an index to a 32 bit integer within the array.

Address of this value will be 3 x 4 = 12 bytes from start of array

We can store this in a register


MOV R0, #somearray
MOV R1, #12

somearray[3] = 5;

We can keep the value to store in another register


MOV RO, #somearray
MOV R1, #12
MOV R2, #5
somearray: .Word
0
0
0
0
0
0
0
0
0
0

somearray[3] = 5;

Finally, we can write the value to the array using STR


MOV R0, #somearray
MOV R1, #12
MOV R2, #5
STR R2, [R0 + R1]
somearray: .Word
0
0
0
0
0
0
0
0
0
0

HALT: make sure program is halted so it does not try and execute anything in the array

Some array definition for single byte arrays

For a generic array of single bytes


somebytearray: .Byte 1 3 221 56

For an array of ASCII characters (ie., a string):


somestring: .ASCIZ “hello world\n”

This directive ensures a character per byte, and the appending of a NULL character at the end of the string.

An alternative is .ASCII, which does not append the NULL character

Memory Alignment

Memory Alignment is a common requirement:

hardware often imposed alignment requirements to ensure integrity

When defining byte addressable arrays, precede the array definitions with Align.

For example:


.ALIGN 128
somestring: .ASCIZ “hello world\n”

.ALIGN N ensures the next instruction is aligned with a word address divisible by N (in general, and multiple of 4 is generally fine).

Iterating though an array


numarray[10]: {3, 2, 5, 3, 5, 6, 1, 2, 4, 9}
sum = 0
i = 0
while(i < 10)do
sum = sum + numarray[i]
i = i + 1;
end while

Lets do this in Assembly code

First, define the array with some labels:


arraysize: 40
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Lets do this in Assembly code

Next, initialise some registers:


MOV R0, #numarray
MOV R1, #0 // index
MOV R2, #0 // sum
HALT
arraysize: 40
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Lets do this in Assembly code

Next, setup a label and conditional branch for looping


MOV R0, #numarray
MOV R1, #0 // index
MOV R2, #0 // sum
arrayloop:
LDR R4,arraysize
CMP R1,R4
BLT arrayloop
HALT
arraysize: 40 // 10 * 4 bytes
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Next, insert code to access each item in array and increment index


MOV R0, #numarray
MOV R1, #0 // index
MOV R2, #0 // sum
arrayloop:
LDR R3, [R0 + R1] // access array item at current index (R1)
ADD R1, R1, #4 // increment index to next 32 bit word
LDR R4, arraysize
CMP R1, R4
BLT arrayloop
HALT
arraysize: 40 // 10 * 4 bytes
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Next, insert code to access each item in array and increment index


MOV R0, #numarray
MOV R1, #0 // index
MOV R2, #0 // sum
arrayloop:
LDR R3, [R0 + R1] // access array item at current index (R1)
ADD R1, R1, #4 // increment index to next 32 bit word
LDR R4, arraysize
CMP R1, R4
BLT arrayloop
HALT
arraysize: 40 // 10 * 4 bytes
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Finally, add each value to the accumulating “sum”


MOV R0, #numarray
MOV R1, #0 // index
MOV R2, #0 // sum accumulator
arrayloop:
LDR R3, [R0 + R1] // access array item at current index (R1)
ADD R2, R2, R3 // add value to accumulator (R2)
ADD R1, R1, #4 // increment index to next 32 bit word
LDR R4, arraysize
CMP R1, R4
BLT arrayloop
HALT
arraysize: 40 // 10 * 4 bytes
numarray: .Word 3
2
5
3
5
6
1
2
4
9

Extended Array Example - lower case conversion

Array (and ASCIl) example - convert to lowercase

Let's write an ASM program that reads a string from input and converts all alphabetic characters to lower case

Algorithm:

Store input string in array
Iterate through the array and check each character

If character needs converting, convert to lowercase
If not, leave as is

Once complete, write string to output

Using ASCII to do conversion

Recall every character has an ASCIl code (0 - 255)

Uppercase alphabetic characters are within the range: 65 - 90

Using ASCII to do conversion

Recall every character has an

ASCIl code (0 - 255)

Uppercase alphabetic characters are within the range: 65 - 90

Lowercase alphabetic characters are within the range: 97-122

Conversion of uppercase to lowercase requires subtracting 32 from ASCII value

• E.g. “A” (65) -> (65 + 32) -> “a” (97)

Uppercase to lowercase

First we need to define an array to store the string


.Align 256
stringData: .BLOCK 256

We then need to read an input string into the array


MOV R0, #stringData
STR R0, .ReadString
.Align 256
stringData: .BLOCK 256

We now need to implement a loop that inspects each character.

Lets start with the basic loop structures first. We need:

A register to hold the array base address
A register to hold the index of the current character
A register to store the actual character value once
A condition to determine when we have finished scanning each character

We now need to implement a loop that inspects each character.

Lets start with the basic loop structures first. We need:

A Label to mark the start of the loop (so we can branch back)


tolower :

A register to hold the array base address (we already have this in RO)


MOV RO, #stringData

A register to hold the index of the current character (let's use R1)


MOV R1, #0 // initialise index to O

A register to store the actual character value once (let's use R2)


LDRB R2, [RO + R1] // reads the byte at address RO+R1

A condition to determine when we have finished scanning each character

we can look for the "NULL" character, which is always appended to the end of any input string.

**The NULL character has ASCII value 0**

A condition to determine when we have finished scanning each character

we can look for the "NULL" character, which is always appended to the end of any input string
So.


CMP R2, #0
BNE toLower

This will keep iterating through the loop “toLower” until the NULL character is reached

putting loop together


MOV R0, #stringData
STR R0, .ReadString
MOV R1, #0 //index
toLower: // start of loop
LDRB R2, [R0+R1] // read byte from array
ADD R1,R1,#1 // increment index
CMP R2, #0
BNE toLower
.Align 256
stringData: .BLOCK 256

conversion code

We now need to consider the conversion:

Only need to convert if read character is actually uppercase alphabetic

Convert only if ASCIl value is between 65 and 90

If character is uppercase alphabetic:

Add 32 to the value
Write the new value back to the array

If character is not uppercase alphabetic:

Leave as is (i.e. skip the above)

Determine if its uppercase alphabetic


toLower: // start of loop
LDRB R2, [R0+R1] // read byte from array
CMP R2, #65 // check lower value limit
BLT skipConversion
CMP R2, #90 // check upper value limit
BGT skipConversion
skipConversion: // jump to here if value not uppercase
ADD R1,R1,#1 // increment index
CMP R2, #0
BNE toLower

if it’s uppercase, convert to lowercase


toLower: // start of loop
LDRB R2, [R0+R1] // read byte from array
CMP R2, #65 // check lower value limit
BLT skipConversion
CMP R2, #90 // check upper value limit
BGT skipConversion
ADD R2,R2,#32 // convert to lowercase value
STRB R2, [R0+R1] // write it back to the array
skipConversion: // jump to here if value not uppercase
ADD R1,R1,#1 // increment index
CMP R2, #0
BNE toLower

All together


MOV R0, #stringData // store array base address
STR R0, .ReadString // read in input
MOV R1, #0 // initialise index
toLower: // start of loop
LDRB R2, [R0+R1] // read byte from array
CMP R2, #65 // check lower value limit
BLT skipConversion // if lower than uppercase range, skip conversion
CMP R2, #90 // check upper value limit
BGT skipConversion // if greater than uppercase range, skip conversion
ADD R2,R2,#32 // convert to lowercase value
STRB R2, [R0+R1] // write it back to the array
skipConversion: // branch destination if value was not uppercase
ADD R1,R1,#1 // increment index
CMP R2, #0 // check if NULL character
BNE toLower // Keep looping if not NULL character
STR R0, .WriteString // write converted string to output
HALT
.Align 256
stringData: .BLOCK 128