Computer system
Reviewing Computer System
date
Nov 12, 2023
slug
reviewing-computer-system
author
status
Public
tags
revision
summary
All from week 1 to week 2 - may be some quizzes
type
Post
thumbnail
category
Computer system
updatedAt
Nov 12, 2023 09:03 AM
Data and number system (binary, hex, decimal)ANALOGUE AND DIGITALDIGITAL ADVANTAGESDIGITAL CODE - 1DIGITAL CODE - 28-bit exampleNUMBER SYSTEMSBinary numberscounting tableHEXADECIMAL NUMBERS.HEXADECIMAL NUMBERSHEX - BINARY (4 BITS TO A HEX DIGIT)BIT-WISE OPERATIONSTHE GATE SYMBOLS REPRESENT BOOLEAN ARITHMETIC:INPUTS, OUTPUTSCOMBINATIONAL CIRCUITHALF-ADDERFULL-ADDERGates are not instantaneousClockClock feeds into the ALUChoose which circuit forms the outputBuilding a Computer? Things we need:D-TYPE FLIP-FLOPD-FLIP-FLOPS AS A REGISTER OR LATCHBIG ENDIAN AND LITTLE ENDIAN DATA FORMATSUMMARY Registers SHIFT RegistersSHIFT LEFT - SHIFT RIGHT RegistersSERIAL-TO-PARALLEL CONVERSIONRIGHT-SHIFT SERIAL OR PARALLEL INPUT: 4-BITWHAT ABOUT BI-DIRECTIONAL SHIFTING (SELECTABLE)?THE LOGISIM SHIFT REGISTERCOUNTERS IN DIGITAL LOGICRIPPlE COUNTERCOUNTING DOWN?SUMMARY Ripple CounterMODULO 6 COUNTERSummary Modulo 6 counterALTERNATIVE - COUNTER WITH COMMON CLOCKMOD 6 COUNTER WITH COMMON CLOCK2-BIT COUNTERCOUNTING WITH 3 BITS (LITTLE-ENDIAN)NOW TO RESET WHEN IT GETS TO 6 (ILLEGAL STATE)...Issues The synchronous (common clock) counter3 bit Common Clock - Little Endian - No illegal stateSUMMARY The synchronous (common clock) counterBuilding a computer? Things we need:TYPES OF MEMORY - 1TYPES OF MEMORY - 2TYPES OF MEMORY - 3Memory AddressingRAM ROM SUMMARYSTACKSIN SIMPLE TERMS...SOFTWARE STACKSSUMMARY of StacksBuild a stack1. Start off with 4 D flip-flops:2. ADD THE STANDARD SIGNALSClock> CLR (resets flip flops when up in Logisim)ADD THE SERIAL INPUT (D)CONNECT EACH Q OUTPUT TO THE NEXT DATA INPUTADD SOME LEDS TO SEE THE PARALLEL OUTPUT / REGISTER
STATEHOW DO WE MAKE IT GO BACKWARDS?HOW DO WE MAKE THE DIRECTION SELECTABLE?PROGRAMMABLE WIRINGSELECTING THE OUTPUT WITH GATESLET'S PUT THIS ALL TOGETHER (JUST 2 FFS TO START WITH)SELECT INPUTADD THE OUTPUT DIRECTION SELECTION
> R circuit enabled (red)WHAT COULD POSSIBLY GO WRONG?THE FIXNEED MORE DEPTH?TO MAKE A STACK...SUMMARY build a stackAssemblyint answer = 10 + 4;Add v1Add v2Add r3SUB (works the same as ADD)LABELSStore RegisterSummary ARM BasicIF Tests: CMPDetailsConditional Branching using the APSRDetermining the comparison from the flagshow does this translate to assembly codeLooping and BranchingInstructions have addresses too !The Program Counter
CounterLabels as branch pointsUnconditional BranchBit-wise operatorsAND – Logical bit-wise ANDORR — Logical bit-wise OREOR – Logical bit-wise Exclusive ORObservations on AND,ORR,EORLSL - Logical Shift LeftLS - Logical Shift LeftLSL and LSR ObservationsMemory Addressing - LDR and STRLDR and STRLDR and STR (cont)LDRB and STRBLDRB and STRB (cont)Indirect AddressingIndirect and Indexed Memory AddressingLDR and STR memory addressingLDR Indirect exampleSTR Indirect exampleIndexed Memory AddressingIndexed Addressing SyntaxIndexed Addressing LDR ExampleIndexed Addressing STR ExampleIndexed Addressing LDRB ExampleIndexed Addressing STRB ExampleSome Observations and NotesBusy Wait Timer (Flashing LED)ARMlite Low Res Pixel Graphicssimulate a flashing LEDA "Dumb" Busy Wait TimerIncorporating timer into LED flashBusy WaitBusy Wait v1A Better Busy WaitA Better Busy Wait - psuedocodeA Better Busy WaitSolutionA Better Busy WaitRevisiting our Flashing LED ExampleFlashing LED with INT: Interrupt HandlerFlashing LED with INT: Interrupt HandlerFlashing LED with INT: Full ProgramFUNCTIONSFUNCTION BASICSFUNCTIONS IN ASMRegIster MANAGEMENTABICALLING FUNCTIONSSUMMARY FUNCTIONFUNCTIONS IN ASMPUSH, POP AND THE STACKSOFTWARE STACK?Software stack (depth only limited by RAM) RECALL THE ABIABI CONVENTIONSPassIng ARGUMENTS TO FUNCTIONSSUMMARYFUNCTIONS IN ASMRECALL OUR “FLASHING LED” PROGRAMDEFINING FUNCTIONSKEY REGISTERSHOW ARE THEY USED FOR FUNCTION CALLS?HELPFUL INSTRUCTION - BLDEFINING FUNCTIONSDEFINING DELAY FUNCTIONDELAY FUNCTIONBL AND RETWHOLE PROGRAM Program Counter and Link RegisterSUMMARY Program Counter and Link RegisterFUNCTIONS IN ASMRECALL OUR “FLASHING LED” PROGRAMBetween 4 and 18 and inside that: between 5 and 10 and between 11 and 16DRAW PIXEL FUNCTIONFLASH LOOP (NOW CALLING DRAWPIXEL)DRAW PIXEL FUNCTION (FIXED)SUMMARY - Nested Function CallsArray BasicsSome array definition for single byte arraysMemory AlignmentIterating though an arrayExtended Array Example - lower case conversionUsing ASCII to do conversionUsing ASCII to do conversion
Data and number system (binary, hex, decimal)
ANALOGUE AND DIGITAL
Analogue - Data that represented on a continuous scale (real numbers)
- e.g. voltage, height, distance, amount of magnetisation on audio tapes.
- Good for real world measurements, transients (spikes), transistors.
Digital - Data that is represented as whole numbers (count)
- e.g. class size, coins in your pocket, sound samples on CD recordings,
- Good for noise rejection, computers
DIGITAL ADVANTAGES
Digital systems are easier to:
- design and modify
- store data
- maintain accuracy and precision
- program operations
- to protect from noise
- to create on an IC chip
However the real world is mostly analogue
Translation is inaccurate.
The more bits the more precise.
DIGITAL CODE - 1
Data in computer is stored digitally as 0s and 1s in a binary code.
The 1s and 0s are represented in many ways:
- by a voltage e.g. 0 or 5 volts (actually we don't use exact voltages) eg < 0.8 volts == 0 > 2.4 volts == 1 (TTL)
- by the absence or presence of a pit (CD)
- by the absence or presence of a magnetic field (disks)
Why binary?
- Simple, robust, universal laws (Shannon-Hartley), flexible
- c.f. sending decimal using voltages.... long cables... interference.
DIGITAL CODE - 2
groups of bits can represent numbers, characters, computer instructions.
How we interpret a 32-bit word depends on what we expect! (Exercise)
- Hardware has to be able to manipulate groups of bits differently depending on what they represent.
- Amazingly, we can use combinations of Gates to do this
8-bit example
Possible interpretations:
- Character ‘1/2’ assuming extended ASCII/ISO
- The number 171
- The number -85
NUMBER SYSTEMS
- We use a "positional number system" #big-endian
- in decimal numbers, the further left a digit is, the greater the power of 10 it is multiplied by.
- e.g., 348 = 3 x 10^2 + 4 x 10^1 + 8 x 10^0
- This system applied to other Radix (or Bases)
- eg. Decimal radix = 10 digits 0..9
- Binary radix = 2 digits 0..1
- Octal radix = 8 digits 0..7
- Hexadecimal radix = 16 digits 0..F
Binary numbers
Computers work in binary number
Convert to decimal exercise:
- notation: in maths use
01110_2
to indicate a binary number
- some systems use other ways such as
%01110
orb01110
counting table
Decimal | Binary |
0 | 0000 |
1 | 0001 |
2 | 0010 |
3 | 0011 |
4 | 0100 |
5 | 0101 |
6 | 0110 |
7 | 0111 |
8 | 1000 |
9 | 1001 |
10 | 1010 |
11 | 1011 |
12 | 1100 |
13 | 1101 |
14 | 1110 |
15 | 1111 |
16 | 10000 |
17 | 10001 |
Note: In binary, each digit represents an increasing power of 2, starting from the rightmost digit which is \( 2^0 \). The leftmost digit filled in for 16 and 17 represents \( 2^4 \), or 16 in decimal, which is why we move to a new column for these numbers.
HEXADECIMAL NUMBERS.
- Hex is another abbreviation for binary numbers.
- Each byte represented by 2 hex digits; each hex digit represents 4 bits in an 8-bit byte.
- Hex to binary can be done one digit at a time: e.g., what is
in binary?
It then shows the conversion of hexadecimal 4E to its decimal equivalent:
HEXADECIMAL NUMBERS
- 0=0, 1=1, 2=2, 3=3, 4=4, 5=5, 6=6, 7=7, 8=8, 9=9
- Memorise: A=10, B=11, C=12, D=13, E=14, F=15
- mathematically use
- some systems use other ways such as
0x1110
or01110h
- Or sometimes hex is default (not decimal)
to indicate a hexadecimal number
leading 0x means hex
HEX - BINARY (4 BITS TO A HEX DIGIT)
BIT-WISE OPERATIONS
- Can base decisions on a single bit #button state
- e.g. if (bit represents True) then...
- Usually 1 means True, 0 False.
- Unary operation: NOT
- Complement ("invert", "flip") the value
- Binary* operations: AND, NAND, OR, NOR, XOR
- N-bit operations: e.g. 4-input AND, OR
- All can be implemented using binary building blocks ("2-input gates")
THE GATE SYMBOLS REPRESENT BOOLEAN ARITHMETIC:
- The symbol for AND gate followed by "> 1 AND 1" with the truth table "1.1=1, 1.0=0, 0.0=0, 0.1=0" and a red dot followed by "AND, &"
- The symbol for OR gate followed by "> 1 OR 1" with the truth table "1+1=1, 1+0=1, 0+0=0, 0+1=1" and a red plus sign followed by "OR, |"
- The symbol for XOR gate followed by "> 1 XOR 1" with the truth table "1^1=0, 1^0=1, 0^0=0, 0^1=1" and a red circled plus sign followed by "XOR, ^"
INPUTS, OUTPUTS
- Wires (just drag from one gate pin to another) can connect pins on the gates to each other and to switches and lights.
- The power supply is universal and invisible. We don't have to worry about it.
- In real life there will be a +ve rail (Vcc) and a -ve rail (GND) with a connection of each to every chip on the board.
COMBINATIONAL CIRCUIT
- A combinational circuit is a connected arrangement of logic gates with the set of inputs and outputs.
- A combinational circuit can be described by a truth table showing the binary relationship between n input variables and m output variables.
- Two examples: half adder, full adder
- These circuits serve as basic building blocks for the construction of more complicated arithmetic circuits.
HALF-ADDER
- The most basic arithmetic circuit is the addition of two binary digits.
- A combinational circuit that performs the arithmetic addition of two bits is called a half-adder.
- It consists of two inputs and two outputs:
- Two inputs get summed
- Two outputs: sum and carry bits
- It is necessary to have two output bits because the sum 1 + 1 is binary 10, which has two digits.
FULL-ADDER
- A full-adder is a combination circuit that forms the arithmetic sum of three input bits.
- It consists of three inputs and two outputs.
- Two of the inputs represent the two bits to be added.
- The third input represents the carry from the previous lower significant position.
- One of the outputs will be the carry into the next higher significant position, i.e. into the next full adder.
- Of course two output bits are necessary because the arithmetic sum of three binary digits ranges from 0 to 3, and 2 and 3 need two bits.
Gates are not instantaneous
The image contains the following text:
Ci and the output of X XOR Y arrive at different times!!!
The circuit is unstable.
If only we could synchronise things !!
Clock
Could be something simple like a 555 timer (astable multivibrator using an RC timing element), a crystal oscillator, a phase-locked loop or an atomic clock. Probably just a chip
Clock feeds into the ALU
Choose which circuit forms the output
- At the top, it says "(two logic controlled switches)"
- There are two input labels "X" and "Y" connected to the inputs of the gates.
- There is a label "Ci" circled in red, likely indicating the "Carry in" input.
- The text near the bottom reads "AND output selected if Ci is low, OR output selected if Ci is high."
- There is a label "Co" circled in blue, likely indicating the "Carry out" output.
Building a Computer? Things we need:
memory within the CPU = banks of flip-flops
Instruction Pointer_ (counter)
full adder + shift operations
Building a computer? Things we need:
Fast memory (RAM)
program and data
Central processing unit (CPU): registers, IP, ALU
Input / Output Channels
accel cards
Persistent memory (Disks)
Motherboard
Screen
Keyboard
Mouse
D-TYPE FLIP-FLOP
> Set data to high - Q goes high on next clock pulse. Stays high.
- Set presets Q to high when pulled high
- Reset clears Q when pulled high D Flip Flop clock data
D-FLIP-FLOPS AS A REGISTER OR LATCH
A register (many bits) or latch (usually one bit) can be made up from a series of D-Flip-Flops driven by a common clock.
The transfer from the D side to the Q side for all D flip flops occurs simultaneously as this clock changes.
BIG ENDIAN AND LITTLE ENDIAN DATA FORMAT
> Technical definition for the order of bytes:
- Big Endian:
- The most significant byte resides in the smallest memory address
- Little Endian:
- The least significant byte resides in the smallest memory address
SUMMARY Registers
- D Flip Flops generally form the building blocks of registers
- Registers:
- allow us to store bit strings that represent data
- reside close to the CPU, allowing fast and easy access
> Bit Endianness determines how we interpret bit strings:
- Big Endian: most significant bit comes first
- Lile Endian: least signifcant bit comes frst
> Next Lecture: ripple counters with JK Flip Flops
SHIFT Registers
A shift register takes input from one end, and at each clock change this value is moved to the next D-Flip-Flop.
This is used in serial data transfer when a byte (say) of data sent on a cable one bit after another can be collected in a series of D Flip-Flops to rebuild the whole data byte. This is called serial-to-parallel conversion.
SHIFT REGISTERS
Here is how a high travels through a 3 bit shift register. For this example we assume that each of the shift register bits is cleared at the start.
SHIFT LEFT - SHIFT RIGHT Registers
The shift registers shown so far shift data to the right. A simple rewiring gives a shift register that can move data left. Of course these may also have the ability to parallel load.
WHAT DOES A SHIFT REGISTER DO?
- Moves a state (number such as 0, 1) from a low order bit to a higher order bit.
- Multiplies a (binary) number by two.
- Number of "shifts" depends on number of clock cycles.
- Level counter (e.g. volume control)
- Can use a counter to enable/disable clock, thereby programming the amount of shift.
- Can shift in the other direction - divide numbers by 2.
- Can have two clocks - one for left-shift, one for right-shift or use gates to determine the shift direction
SERIAL-TO-PARALLEL CONVERSION
Some shift registers allow all flip-flops to load at once,
This gives parallel-to-serial conversion
RIGHT-SHIFT SERIAL OR PARALLEL INPUT: 4-BIT
Flip-flops are connected (output to input) with a common clock to cascade input from high bytes to low bytes.
WHAT ABOUT BI-DIRECTIONAL SHIFTING (SELECTABLE)?
- This takes some thinking!
- You need a pin to select which direction
- You need to allow inputs to any D Flip Flop to come from either direction
- Try and design a 2 bit selectable direction Shift Register!
- Serial input only
- You will probably need some OR gates, AND gates, and a NOT gate
THE LOGISIM SHIFT REGISTER
COUNTERS IN DIGITAL LOGIC
> A circuit that stores and increments the number of occurrences of an event:
- Most commonly clock pulses
> Two types we will consider:
- Asynchronous (ripple) counter
- Synchronous (common clock) counter
> J-K Flip Flops central to both
RIPPlE COUNTER
- Ripple counters utilise the toggle setting of J-K Flip Flops:
- Consider this circuit
- What happens to Q, each clock pulse ?:
- Ripple counters utilise the toggle setting of J-K Flip Flops:
- And finally, four FFS
- Q, Q,, Q2, Q,?
> The ripple effect of FF toggles exactly matches binary counts:
- each oscillates at half the frequency of the FF before
- The bits stored in the FFs thus represent incrementing binary values
- BUT - on which edge of the clock pulse?
COUNTING DOWN?
- What if I want to count down instead of up?
- Requires nothing more than a change of FF trigger:
- Change FF to trigger on rising edge of clock pulse
- Lets take a look
SUMMARY Ripple Counter
- Ripple counter achieves a simple binary counter
- It utilises the J-K toggle setting:
- J= 1, K=1,
> It is asynchronous because:
- No common clock
- State changes ripple through from least significant FF
› We can change directionality of counter using FF trigger:
- Falling edge = increment
- Rising edge = decrement
MODULO 6 COUNTER
> What do we need to do ?
- An incrementing counter: falling edge trigger - easy!
- A forced reset of all Flip Flops to zero when the counter reaches 6:*
some extra logic gates to monitor things, and trigger a FF reset - not too tricky
- Detect the illegal state (6)
- Reset the FFs to wrap around to zero
MODULO 6 COUNTER WITH A MOMENTARY ILLEGAL STATE
Detecting the first illegal state (6 in this case) and immediately resetting to 0 (don't wait for the clock) by using the asynchronous master reset (MR) or CLR'
- This circuit uses a cascading clock
Summary Modulo 6 counter
- Counters don't have to wrap around at powers of 2
- Can use outputs of FFs and approrpaite logic gates to detect illegal state"
- Slight problem though:
- Master reset is asynchronous
- The illegal state momentarily still happens !
ALTERNATIVE - COUNTER WITH COMMON CLOCK
- We can avoid the illegal state by detecting the last legal state (eg., 5 in a modulo 6 counter), and then set to 0 on next clock pulse.
- This requires a non-cascading counter.
- We need a common clock
- Lets build it up from 1 to 3 bits .....
MOD 6 COUNTER WITH COMMON CLOCK
- First a 1 bit counter
- 1-bit (counts 0...1...0...1...) Enable (Set)
- Set J and K to make it toggle
2-BIT COUNTER
- 2-bit (counts 00...01...10...11...00...)
- Connect Q1 to J2 and K2
- Now Q2 only changes state if Q1 is set (halves the frequency).
- but there is no Co (when state = 11)
- Because we have a common clock
COUNTING WITH 3 BITS (LITTLE-ENDIAN)
The issue is that the gates are all using the same clock, so Q3 will toggle at 3, and then the counter will reset to O Try it, You will get the number sequence: 0, 1, 2, 7, 0...
> Add an AND gate to only toggle Q3 when both Q1 and Q2 are set. That's our Carry Out (Co).
NOW TO RESET WHEN IT GETS TO 6 (ILLEGAL STATE)...
When the output is 110 (Q1=0, Q2=1, Q3=1), use a 2-input AND gate to trigger the Reset on all of the Flip-Flops.
Issues The synchronous (common clock) counter
- Currently the illegal state (6) still momentarily occurs
- Solution:
- add D-Flip-Flops as a buffer before displaying
> Adds a delay of 1 clock pulse
3 bit Common Clock - Little Endian - No illegal state
SUMMARY The synchronous (common clock) counter
> Common clock counter synchronises FF state transitions
- One clock to rule them all!
> Requires explicit logic gates check FF transitions
- Not quite as elegant as ripple counters!
> D FFs can buffer output for a clock pulse:
- Allow circuit to stabilise
- illegal state cannot "escape"
Building a computer? Things we need:
TYPES OF MEMORY - 1
ROM (read only memory): All reading at full speed. Just got the address and go there.
-contents built-in at time of manufacture.
- PROM - Programmable read only memory. Programmed by a once-only irreversible operation, e.g. in factory.
- EPROM - Erasable Programmable read only memory. Can be removed from the computer and erased and programmed (slowly) by using special apparatus (e.g. UV light).
- Sometimes called "field-programmable", i.e. "in the field"
- Bulk erased: every byte erased at the same time
- Byte programmable: write bytes one by one.
- EEPROM - Electrically erasable read only memory. Can be erased and reprogrammed byte by byte in situ, but writing is slower than normal reading.
TYPES OF MEMORY - 2
RAM random access memory.
- A misnomer it should be RWM (read write memory) - both ROM and RAM
allow random access
- Static RAM - retains information until power removed. Fast, larger area of silicon per byte, modest power requirement.
- Dynamic RAM - retains information as long as the contents are refreshed frequently enough. Smaller area of silicon per byte, low power requirement.
- Does not use flip-flops.
- Uses tiny capacitors to store electric charge.
- Because the charge leaks away have to rewrite ("refresh") every few milliseconds.
TYPES OF MEMORY - 3
> SDRAM (Synchronous DRAM)
- Hybrid of dynamic and static technology
- "clocked" by the main CPU clock
> DDR (double data rate) SDRAM
- Chips produce data on rising and falling edges of the clock
- Higher data rates, eg 64 bits / nsec
TYPES OF MEMORY - 3
FRAM - Ferroelectric RAM
- • Uses atom position in unit cell (in theory). Actual density about the same as DRAM. Non-volatile (no power / refresh to keep state).
- Used in specialist devices (where you never want to "turn it off and on again" to fix it. Random access
Flash memory - (EEPROM)
- Charge stored between insulators. Write bit by injecting electrons through a barrier layer (physically damaging it). Used in USB drives. Good for about 30,000 writes. Random access
Core memory
- Magnetic "core" memory: each bit is stored as the direction of magnetisation of tiny rings (like doughnuts). 1955-1975. 1000 x slower than DRAM.
Memory Addressing
- RAM consists of one or many chips
- RAM is organised into words (of e.g. 32 bits)
- Words are grouped into pages.
- Words are selected by an address
- A number of m bits
- Control bits specify whether to read or write
- Bits stored in the selected word go to or from the data bus
A byte is typically the smallest addressable chunk of memory. We can address the 4 bytes in a 32-bit word, but we can't address individual bits in each byte. Exceptions include Micron 3D-XPoint
RAM ROM SUMMARY
> Many different types of memory:
- ROMS variants
- RAM variants
- Trade offs of speed, space, expense, longevity
> All slower to access than registers
STACKS
- Random access memory requires knowing the address of every byte/word you want to access
- Stacks offer a way of organising and accessing memory without random (indexed) access:
- There are hardware stacks and software stacks.
- Hardware stacks created out of dedicated shift registers
- Software stacks typically defined in RAM using conventions (we'll come back to this):
IN SIMPLE TERMS...
- A stack allows us to mothball/backup/hibernate a process/ task at will on the receipt of an interrupt or code invocation.
- To do this, we
- push instructions/data that we will need later onto the stack;
- do the task;
- and then pop the stored data back off the stack and
- continue as before.
Stacks (6 deep)
Example:
a hardware stack with a depth of 6.
Let it hold one item (5) as shown
If 7 is now pushed onto the stack it becomes Push on a 0.
Pop off the O
Pop off the 7
Push a 2
Pop off the 2
If you remove another item you get the 5 and the stack is now empty.
SOFTWARE STACKS
- We can also use ordinary RAM to implement a stack.
- Stack space only limited by the amount of addressable RAM
- The "top" of the stack moves "up" and "down" as things are pushed or popped, and the location of the "top" is stored in a register (Stack pointer).
SUMMARY of Stacks
> Stacks are a fundamental data:
- Simple and efficient storage/recalling of data
- Reduces need for storing memory addresses
- Can only access data at the top of the stack
- We store data by "pushing" data onto the stack
- We recall data by "popping" data off of the stack
- Hardware stacks utilise dedicated shift registers:
- soon we are going to build one ourselves!
> Software stacks use RAM
Build a stack
- Start with a bidirectional shift register....
1. Start off with 4 D flip-flops:
2. ADD THE STANDARD SIGNALS
Clock
> CLR (resets flip flops when up in Logisim)
ADD THE SERIAL INPUT (D)
> Data is the serial input (SI)
CONNECT EACH Q OUTPUT TO THE NEXT DATA INPUT
ADD SOME LEDS TO SEE THE PARALLEL OUTPUT / REGISTER STATE
> We can now modulate the Data input and see the on state propagate through the register (left to right).
HOW DO WE MAKE IT GO BACKWARDS?
- Wire-up the cascade backwards
- Data goes in the far end, each Q outputs to the D input of the previous Flip-Flop.
HOW DO WE MAKE THE DIRECTION SELECTABLE?
> A controlled gate
Controlled input gate: Selects A if high, B if low
PROGRAMMABLE WIRING
We can use one of these circuits for each input to a Flip-Flop, and use a common control signal to determine direction.
SELECTING THE OUTPUT WITH GATES
> We also need to determine which direction output from a
Flip Flop flows!
> This circuit has a common input, and selectable output.
> We can use one of these for each output from a Flip-Flop, and use a common control signal to determine direction.
LET'S PUT THIS ALL TOGETHER (JUST 2 FFS TO START WITH)
SELECT INPUT
ADD THE OUTPUT DIRECTION SELECTION > R circuit enabled (red)
WHAT COULD POSSIBLY GO WRONG?
- Can't short the outputs of two gates together.
- Have to Add with an OR aate.
Conflict
THE FIX
The OR gates combine the outputs of the controlling AND gates and pass through the signal from whichever one is enabled... to the D inputs on each Flip-Flop.
NEED MORE DEPTH?
- This is a 4-deep shift register ( http://www.ee.usyd.edu.au/tutorials/digital_tutorial/part2/ register06.html).
- As you can see, there are alternative ways of wiring it up, with different logic gates.
TO MAKE A STACK...
- So far we have made a 2-stage bi-directional 1-bit shift register.
- To make a proper stack:
- Add depth (flip-flops with associated control logic)
- Add width (bits) in parallel (common clock, control signals).
- identical shift-registers - one for each bit.
SUMMARY build a stack
- Hardware stacks formed using banks of shift registers
- Simple input selective circuits allow direction selectability:
- Programmable gates!
- To make a proper stack:
- Add depth (flip-flops with associated control logic)
- Add width (bits) in parallel (common clock, control signals).
- identical shift-registers - one for each bit.
Assembly
int answer = 10 + 4;
Step 1 – the values 10 and 4 need to be represented somewhere in memory – probably registers
int answer = 10 + 4;
Step 2 – this plus sign needs to be mapped to some sort of instruction that tells the CPU/ALU to use it’s Adder (remember the adder circuit!) on these values
int answer = 10 + 4;
Step 3 – the output of this “adder” must be stored somewhere – again, probably a register.
int answer = 10 + 4;
Step 4 – define a location in memory big enough to hold an integer, and give it a name.
int answer = 10 + 4;
Step 5 – store the result of the addition, currently in a register, in the location associated with the symbolic name “answer”
int answer = 10 + 4;
MOV R0, #10 ; Step 1a: MOV R1, #4 ; Step 1b: ADD R3, R0, R1 ; Step 2 and Step 3 STR R3, answer ; Step 5 HALT ; STOP! answer: .word 0 ; Step 4
mov r1, #1 ;r1 = register 1, #: delimiter, #1: literal value(not a pointer) ;1: value in decimal
Add v1
add r2, #1 ;r2: to register 2
r2 = r2+1
r2++
Add v2
add r2, r6; r2: to register 2
r2 = r2 + r6
Add r3
add r2, r5, r6
r2 = r5 + r6
SUB (works the same as ADD)
sub r2, #1; sub: subtract, r2: to register 2
‘dec’ in some asm
LABELS
Labels are used to give meaningful names for:
- the location of instructions for branching,
- specify locations of data for loading and storing.
Labels can be made up of any upper or lower case alpha-numeric characters
They must start with a letter
Labels essentially allow programmers to assign meaningful names to memory address locations that will be used often.
This is how we define variables (amongst other things we'll talk about later) in assembly code.
Labels definitions are looked at by the assembler before the executable code
someplace: .word 0
: ->Colon indicates it’s a label definition
someplace → label name (whatever you want)
.word → Type (i.e. size) of data to be stored
0 →Value to initialise memory location with
Define “someplace” as a label for a memory location holding a word value of 0
Store Register
str r1, someplace
, → delimiter, reference (a pointer)
someplace: offset value in decimal
write value in r1 into the memory location associated with the label “someplace”
Summary ARM Basic
- ASM programming represents the lowest level of human readable programming
- Near 1-to-1 match with machine instructions
- RISC versus CISC
- ASM programming is useful for:
- Hardware specific programming
- Optimising code for performance
- Reverse engineering and analysing code behaviour
if (x < 2): (3)
x = 0 ← (1)
else: (4)
x = 3 ← (2)
(1) (2) ⇒ First thing to notice: we have two mutually exclusive computations to perform!
(3) (4) ⇒ Second thing to notice: we have a condition that determines which instruction to branch to ← This requires the ability to branch only when a specific condition is true
IF Tests: CMP
- In ARM assembly, the CMP (Compare) instruction allows values to be compared:
- CMP R1, R2 subtracts the 2nd value (R2) from the 15t (R1)
- The result of this subtraction is then used to update the Application Program Status Register (APSR)
- Performed by the ALU
- Specifically, 4 flag bits are updated within the APSR:
- N
ALU result was Negative
- Z ALU result was Zero
- C ALU set the Carry bit
- V ALU result caused oVerflow
The Program Status Register (PSR) combines:
- Application Program Status Register (APSR)
- Interrupt Program Status Register (IPSR)
- Execution Program Status Register (EPSR).
These registers are mutually exclusive bitfields in the 32-bit PSR. The bit assignments are:
Table 2.4. APSR bit assignments
Bits | Name | Function |
[31] | N | Negative flag |
[30] | Z | Zero flag |
[29] | C | Carry or borrow flag |
[28] | V | Overflow flag |
[27] | Q | Saturation flag |
[26:0] | - | Reserved |
Access these registers individually or as a combination of any two or all three registers, using the register name as an argument to the MSR or MRS instructions. For example:
- read all of the registers using PSR with the MRS instruction
- write to the APSR N, Z, C, V, and Q bits using APSR_nzcvq with the MSR instruction.
inspect the outcome of a CMP here
Details
cmp r2, #1234 ;cmp: compare, r2: r2: register 2
r2 - #1234 set APSR with ALU flags, store the ALU flags in the APSR
Conditional Branching using the APSR
- Branch (B) reads the APSR and jumps according to the flags and the relevant suffix you give it.
- In ARMlite, there are five variants of Branch:
- B - unconditional branch
- BEQ 'Branch if EQual'
- BGT 'Branch if Greater Than'
- BLT 'Branch if Less Than'
- BNE
'Branch if Not Equal'
- In the ARM instruction set more generally, there are many others
Determining the comparison from the flags
In ARM assembly, the condition code suffix can be added to many operations. e.g. movne r1,#12
Suffix | Flags | Meaning |
EQ | Z set | Equal |
NE | Z clear | Not equal |
CS or HS | C set | Higher or same (unsigned >= ) |
CC or LO | C clear | Lower (unsigned < ) |
MI | N set | Negative |
PL | N clear | Positive or zero |
VS | V set | Overflow |
VC | V clear | No overflow |
HI | C set and Z clear | Higher (unsigned >) |
LS | C clear or Z set | Lower or same (unsigned <=) |
GE | N and V the same | Signed >= |
LT | N and V differ | Signed < |
GT | Z clear, N and V the same | Signed > |
LE | Z set, N and V differ | Signed <= |
how does this translate to assembly code
if (x <= 2): x = 0
LDR R0,X // assume X is holding a value CMP R0,#2 // compare contents of R0 with #2 BGT skip // if R0 > 2 then jump to skip MOV R0,#0 // if R0 <= 2 then assign R0 #0 skip: // continue program from here ...
if (x <= 2): x = 0 else: x = 3
LDR R0,X // assume X is holding a value CMP R0,#2 // compare contents of R0 with #2 BGT else // if R0 > 2 then jump to else MOV R0,#0 // case for R0 <= 2 B cont // Branch to label cont else: MOV R0,#3 // case for R0 > 2 cont:
if (x <= 2): x = 0 else if (x <= 4): x = 2 else x = 4
LDR R0,X // assume X is holding a value CMP R0,#2 // compare contents of R0 with #2 BGT else1 // if R0 > 2 then jump to label else MOV R0,#0 // case for R0 <= 2 B cont // Branch to label cont else1: // if R0 > 2 CMP R0,#4 // compare contents of R0 with #4 BGT else2 // if R0 > 4 then jump to label else2 MOV R0, #2 // otherwise, handled case for R0 > 2 <= 4 B cont // Branch to label cont else2: // if R0 > 4 MOV R0,#4 // case for R0 > 4 cont: // exit point of conditionals. Continue with program from here
Looping and Branching
Recall Labels
- Labels are used to give meaningful names to locations in memory
- Previously we saw this used to define variables: Eg someplace: . word O
Defines "someplace" as a label for a memory location holding a word of value O
- But we can also use labels to mark the location of instructions in our programs !
Instructions have addresses too !
- Recall that before a program is executed, it is loaded into memory
- Every instruction therefore occupies a word of memory, and has an address like any other data
- Recall also that this is a core principle of the Von Neumann architecture
- We can therefore use the address of specific instructions to jump from one location to another
- often referred to as branching
The Program Counter
• To understand labels and branching we need to understand how the
CPU keeps track of which instruction to execute next
• It uses the special register, the Program Counter (PC) register
Counter
- After each instruction is executed, the PC register is updated with the address of the next instruction to execute.
- This update is performed by a hardware counter within the ALU
Labels as branch points
MOV R0, #0 Loop: ;Label referring to the address of the next instruction ADD R0,R0,#1 B Loop;A Branch instruction, telling the program counter that the next ;instruction to execute is at address of label ‘Loop’ HALT
Unconditional Branch
B someplace; This can be any word aligned address, ;but is almost always given as a label, defined elsewhere in the program
Branch to the address represented by the label someplace
Bit-wise operators
- Instructions that manipulate individual bits given to them in their operands
- Generally best understood by considering values in their binary form
- Many of these will feel familiar from the first half of this unit
AND – Logical bit-wise AND
- Syntax:
AND R3, R1, R2
- Performs a bit-wise logical AND on the two inputs given in the second and third operands
- Stores the result in the destination, (first operand)
- Eg,
R1 "AND" R2":
ORR — Logical bit-wise OR
- Syntax:
ORR R3, R1, R2
- Performs a bit-wise logical OR on the two inputs given in the second and third operands
- Stores the result in the destination, (first operand)
- Eg,
R1 "ORR" R2":
EOR – Logical bit-wise Exclusive OR
- Syntax:
EOR R3, R1, R2
- Performs a bit-wise logical XOR on the two inputs given in the second and third operands
- Stores the result in the destination, (first operand)
- Eg, H
R1 "EOR" R2":
Observations on AND,ORR,EOR
- Have clear analogies to the logical gates we have seen previously
- Generally useful for specific bit manipulations to control hardware, as well as bit masking (e.g, to extract part but not all of a 32 bit string).
- Also for performing comparisons between, or merging bit strings.
LSL - Logical Shift Left
- Syntax:
LSL R1, R1, #3
- Shifts each bit of the input value to the left, by the number of places specified in the third operand, losing the leftmost bits, and adding zeros on the right.
- Stores the result in the destination, (first operand)
Eg,
If R1 = #0b01101110, and shift = #3
R1 becomes #0b01110000
LS - Logical Shift Left
- Syntax:
LSR R1, R1, #3
- Shifts each bit of the input value to the right, by the number of places specified in the third operand, losing the rightmost bits, and adding zeros on the left.
- Stores the result in the destination, (first operand)
Eg,
If R1 = #0b01101110, and shift = #3
R1 becomes #0b00001101
LSL and LSR Observations
- Assuming Little Endian notation:
- A single LSL doubles the value represented by the bit string
- A single LSR halves the value represented by the bit string
- Both instructions invoke a shift register to perform task in hardware
- As with all bit-wise operators, they can be performed in one CPU cycle.
- Form the backbone for implementing multiplication and division operations, which typically require multiple CPU cycles to calculate (we'll come back to this later)
Memory Addressing - LDR and STR
A common workflow
- The CPU and ALU work with values stored locally in registers
- A standard workflow for operating on values in memory:
- Load value(s) into registers
- Perform operations on values in registers
- Write result to register(s)
- Store value(s) back to memory
LDR and STR
- LDR ('Load Register') copies a value from a memory location into a register
- STR ('Store Register) copies a value from a register into a memory location
- E.g.:
- LDR RO, Oxfc loads the contents of the word starting at memory address 0x000fc into RO.
- STR R1,120 stores the contents of R1 into the word starting at memory address
- (decimal) 120.
- STR R3, myLabel stores the contents of R3 into the word at the address represented by label myLabel.
- LDR R4, myLabel loads the contents of the word at the address represented by label myLabel
LDR and STR (cont)
- The address passed to LDR/STR must be 'word aligned'
- that is, it must be divisible by 4
- Therefore:
- LDR R3, 0x81 is an invalid instruction because address 0x00081 is not a valid word address (it is not divisible by four).
- The Assembler will trap this.
LDRB and STRB
- LDR and STR have byte addressable variants!
- LDRB will load just a single byte of memory
- into the least significant 8 bits of a register - setting the other bits to 0
- STRB will store the least significant 8-bits of a register into a specified
- byte of memory
- not altering any neighbouring bytes
LDRB and STRB (cont)
- LDRB and STRB do not require the specified memory address to be word aligned (divisible by four).
- For example
- LDRB R3,0x81 is a valid instruction, and will load the contents of the single byte at address 0x00081 into the least significant 8 bits of R3.
Indirect Addressing
- LDR and STR also support 'indirect addressing'
- Indirect addressing allows memory to be addressed via registers holding the address
- In some programming languages (e.g., C), this is the basis of 'pointers'
- a memory word that holds the address of another memory word
- In ARM assembly:
- LDR R0,[R1] will load into RO, the contents of the memory address that is currently held in R1.
- STR R3,[R4] will store the value in R3 to the memory word with address currently held in R4
Indirect and Indexed Memory Addressing
LDR and STR memory addressing
- We have previously discussed and seen how LDR and STR support indirect addressing
- allows memory to be addressed via registers holding the address
- For example:
- LDR RO, [R1] will load into RO, the contents of the memory address that is currently held in R1.
- STR R3, [R4] will store the value in R3 to the memory word with address currently held in R4
LDR Indirect example
MOV R1, #0x00040
LDR RO, [R1]
R1 = 0x00040
R0 = 0xffffffff
STR Indirect example
MOV RO, #Oxffffffff
MOV R1, #0×00040
STR RO, [R1]
STR: taking value into R0 and storing it into memory location,
Indexed Memory Addressing
- It is often desirable to address specific words/bytes of memory as an offset from some base address.
- For example:
- Accessing dedicated memory locations that control/interact with external hardware/systems
- Pixel display
- Arrays (blocks of memory used to collect values of the same type)
- Structs (blocks of memory used to collect values of different types)
- In such examples, there is typically a base address indicating the start of addresses dedicated to the specific use-case.
- We can therefore index specific memory locations within as an offset from this base address
Indexed Addressing Syntax
In ARM assembly
LDR R0, [R1 + #4] ; Load the value at memory address [R1] + 4 bytes into R0 STR R2, [R1 + #16] ; Store the value in R2 in memory at address [R1] + 16 bytes LDR R0, [R1 + R3] ; Load the value at memory address [R1] + R3 (no. of bytes) ;into R0 STR R2, [R1 + R4] ; Store the value in R2 in memory at address [R1] + R4 ;(no. of bytes)
Indexed Addressing LDR Example
MOV R1, #0x00040 LDR R0, [R1 + #4]
R1 = ?
R0 = ?
MOV R1, #0x00040
LDR R0, [R1 + #4]
R1 = 0x00040 → to 0x0004
LDR R0, [R1 + #4]
- #4] → 0x00040 + 4 = 0x00044
Indexed Addressing STR Example
MOV R0, #0xffffffff MOV R1, #0x00040 MOV R2, #16 STR R0, [R1 + R2]
Indexed Addressing LDRB Example
MOV R1, #0x00040 LDRB R0, [R1 + #2]
[R1 + #2]
Indexed Addressing STRB Example
Some Observations and Notes
- For word index addressing (ie STR, LDR):
- the offset number of bytes must be a multiple of 4 (ie., 4 bytes per word)
- For byte index addressing (ie STRB and LDRB):
- individual bytes can be addressed so no need for offset to be multiple of 4
- Remember ARMlite uses little endian byte addressing
- If a byte offset is larger than 4 then it will be addressing bytes in adjacent words
Busy Wait Timer (Flashing LED)
Let's simulate a flashing LED
- We can simulate a flashing LED by drawing a single pixel in ARMLite's graphical display
- ARMlite has three modes of pixel graphics display:
- Low resolution (32 × 24 pixels)
- Medium resolution (64 × 48 pixels)
- High resolution (128 × 96 pixels)
- For this we will use the low resolution modality
ARMlite Low Res Pixel Graphics
- In ARMLite low-res modality, there are 32 × 24 = 768 pixels.
- The display window is mapped to a contiguous block of memory.
- Pixel colours are specified as 24 bit Red-Green-Blue values (8 bits per colour component)
- Eg Bright Green is 0x000FF000, White is Ox00FFFFFF
- In ARMlite only the least significant 24 bits of each word are used
- Each pixel therefore has an address, and can be written to (STR) or read from (LDR) like any memory location
ARMlite Low Res Pixel Graphics
- In low-res mode, each pixel is assigned a label, starting from top left and moving across each row:
- Pixelo, Pixel1, Pixel2 .....Pixel767
- For convenience, labels pre-exist for standard colour codes:
MOV RO, #.green // set RO to RGB for green STR RO, •Pixel367 // write RO to address of Pixel 367
simulate a flashing LED
- Issue - the flashing is too fast !
- In fact, in a real system this would probably not produce a flash at all, just a dimmer LED
MOV R0, #.green MOV R1, #.white flash: STR R0, .Pixel367 ;(1) STR R1, .Pixel367 ;(2) ;The issue is between (1) and (2) -> we need to insert a delay ;between the “ON” and the “OFF" ->we need a timer! B flash
A "Busy Wait" Timer
- Pseudocode:
- Set up pixels for display
- Loop1:
- Turn LED on (paint green pixel)
- Busy wait
- Turn LED off (paint white pixel)
- Busy wait
- branch to loop1
A "Dumb" Busy Wait Timer
- Busy wait:
- a loop that executes until it reaches a certain value
- keep CPU occupied
set limit value // some large number
initialise counter to 0
timer:
- Add 1 to counter
- Compare counter to limit value
- branch to timer if counter < limit
So we need a conditional branch as part of the timer loop !
MOV R2, #5000000 // init number of iterations for timer MOV R2,R3 // MOV iterations into working register R3 timer1: SUB R3,R3,#1 // subtract 1 from R3 CMP R3, #0 // compare R3 with #0 BNE timer1 // keep looping until R3 reaches zero
Incorporating timer into LED flash
• Recall pseudo code:
Set up pixels for display
Loop1:
Turn LED on (paint green pixel)
Busy wait
Turn LED off (paint white pixel)
Busy wait
branch to loop1
Incorporating timer into LED flash
1 mov r0, #.green 2 mov r1, #.white 3 mov r2, #5000000 4 flash: 5 str r0, Pixel367 ; flash on 6 mov r3, r2 7 timer1: 8 sub r3, r3, #1 9 CMP r3, #0 10 BNE timer1 11 str r1, Pixel367 ; flash off 12 mov r3, r2 13 timer2: 14 sub r3, r3, #1 15 CMP r3, #0 16 BNE timer2 17 B flash 18 halt
Busy Wait
Busy Wait v1
- Last video we applied the “dumb” busy wait timer
MOV R2, #5000000 // init number of iterations for timer MOV R2,R3 // MOV iterations into working register R3 timer1: SUB R3,R3,#1 // subtract 1 from R3 CMP R3, #0 // compare R3 with #0 BNE timer1 // keep looping until R3 reaches zero
- Imagine you executed the code on two different CPUs?
- EG
- RPi 2B (900MHz quad-core ARM Cortex-A7 processor)
- RPi 4B (1.5GHz quad core ARM Cortex-A72 processor)
- Will the delay be the same on both?
- No !
A Better Busy Wait
- Most CPUs have access to a register that maintains a real time count.
- On the Raspberry Pi for example, a Timer register counts in 1 microsecond intervals (106 per second)
- In ARMlite, a register maintains a 1 second counter
- accessible via the pre-defined .Time label
- Eg LDR RO, •Time loads the current time in seconds (since Jan 1 2000) into RO
- We can utilise this to produce a hardware independent real-time delay
A Better Busy Wait - psuedocode
Set desired delay time in seconds
start_time = current time in seconds
loop:
now = current time in seconds
remaining_time = (now - start_time)
compare remaining_time, delay loop if remaining_time <= delay
A Better Busy Wait
- Let's allocate some variables to registers
- R2 - desired delay
- R3 - now (where the current time will be stored)
- R4 - start time
- R5 - elapsed time (R3-R4)
Solution
1 mov r0, #.green 2 mov r1, #.white 3 mov r2, #1 ; 1 sec delay time 4 flash: 5 str r0, Pixel367 ; flash on 6 LDR r3, .Time ; start time 7 timer1: 8 LDR r4, .Time ; current time 9 sub r5, r4, r3 ; elapsed time = current time - start time 10 CMP r5, r2 11 BLT timer1 12 str r1, Pixel367 ; flash off 13 LDR r3, .Time ; start time 14 timer2: 15 LDR r4, .Time ; current time 16 sub r5, r4, r3 ; elapsed time = current time - start time 17 CMP r5, r2 18 BLT timer2 19 B flash 20 halt
A Better Busy Wait
- Provides an absolute timer (no longer dependent on processor speed!)
- However, our approach is still a form of busy wait!
- CPU still occupied for duration of delay
- Essentially this is polling the timer
- Is there a potentially even better approach ?
- Well yeah! We could implement an interrupt-based timer
- We will come back to this later in semester!
Revisiting our Flashing LED Example
Recall our flashing LED program from last week, and in particular the delay function
;; ***************************************** ;; delay function ;; inputs R0 - time delay in seconds delay: push {R3,R4,R5,R6} MOV R3, R0 ; move delay time param into R3 LDR R4, .Time ; get start time timer: LDR R5, .Time ; update time SUB R6, R5, R4 ; calc elapsed time CMP R6, R3 ; compare elapsed to delay time BLT timer pop {R3,R4,R5,R6} RET ;; *****************************************
This is a busy-wait timer, or in other words, an example of polling
There is a better way ! Using Interrupts
Now back to our Flashing LED
- Lets use ARMlite's Clock Interrupt for this !
- To set it up the interrupt handling:
- The handler will be called everytime the clock raises an interrupt
- So each time it will draw either a green ("LED on") pixel or a white ("LED off") depending on what state the LED is currently in
- This is quite different to previously where we inserted a delay between each state change!
- Now we're waiting to be told when to change state:
- Event driven !
Flashing LED with INT: Interrupt Handler
///////////////////////////////// /// interrupt handler: toggleLED // toggles state of LED (on "green" or off "white") // toggleLED: PUSH {RO} CMP R3, #O // check state BNE off MOV R3, #1 // if R3 0, then 1 MOV RO, #.green B drawpixel off: MOV R3, #0 // if R3 1, then O MOV RO, #.white drawpixel: STR RO, .Pixel367 // draw the pixel POP {RO} RFE
Flashing LED with INT: Interrupt Handler
CMP R3, #O ; check state -> Check next state to enact (on or off) BNE off ;next state is “off” ? ;Then turn LED off off: MOV R3, #0 ; if R3 1, then 0 MOV R0, #.white MOV R3, #1 ; if R3 0, then 1 - next state is “ON” ? ;Then turn LED ON MOV R0, #.green drawpixel: STR R0, .Pixel367 ; draw the pixel - Draw the pixel (R0 will be set as ;either green or white)
PUSH {R0} POP {R0}
Flashing LED with INT: Full Program
1|// Set up Interrupt handling 2| MOV R0,#toggleLED 3| STR R0,.ClockISR 4| MOV R0,#1000 5| STR R0,.ClockInterruptFrequency 6| MOV R0,#1 7| STR R0,.InterruptRegister //Enable all interrupts 8| MOV R3, #1 // R3 keeps track of state of LED 9|mainProgram: 10| B mainProgram //Here, just an empty loop! 11|///////////////////////////////// 12|/// interrupt handler: toggleLED 13|// toggles state of LED (on "green" or off "white") 14|// 15|toggleLED: 16| PUSH {R0} 17| CMP R3, #0 // check state 18| BNE off 19| MOV R3, #1 // if R3 0, then 1 20| MOV R0, #.green 21| B drawpixel 22|off: 23| MOV R3, #0 // if R3 1, then 0 24| MOV R0, #.white 25|drawpixel: 26| STR R0, .Pixel367 // draw the pixel 27| POP {R0} 28| RFE
Line 9 and 10: This is just an infinite loop because we are only flashing LED and there is nothing else to do but wait for clock interrupts, however this could potentially be doing other useful things
FUNCTIONS
- Functions/methods/procedures/sub-routines:
- A callable block of organised, re-usable code
- Typically single action
- accepts arguments (ie parameters)
- Eg in C:
int add(int x, inty) { int sum = x + y; return(sum); }
FUNCTION BASICS
- When a function is called:
- Arguments need to be placed somewhere the function can access
- program control shifts to the function's instructions
- When a function completes:
- Return value needs to placed somewhere for the calling function to retrieve
- program control shifts back to the instruction immediately after where it was called from
- Managing this requires a some house keeping needed !
- High level programming languages hide most of this !
- Not ASM!
FUNCTIONS IN ASM
- Not 'native' to assembly
- We need to do a lot of the management ourselves
- Argument passing:
- How do we pass arguments from one function to another
- Storing and recalling register values
- each function we call will want to use the same registers (only 13 general purpose registers !)
- How do we manage this?
- Managing the program control
- Jumping from one function to another, and then returning back !
RegIster MANAGEMENT
- Application Binary Interface (ABI) sets standard way of using ARM registers.
- ro-r3 used for function arguments and return values
- r4-r12 promised not to be altered by functions
- Ir and sp used for stack management
- pc is the next instruction - we can use it to exit a function call
ABI
Register | Brief | Preserved | Rules |
ro | Argument and result | No | r0 and r1 are used for passing the first two arguments to functions, and returning the results |
r1 | Argument and result | No | of functions. If a function does not use them for a return value, they can take any value after a function. |
r2 | Argument | No | r2 and r3 are used for passing the second two arguments to functions. There values after a |
r3 | Argument | No | function is called can be anything. |
CALLING FUNCTIONS
- By convention, the first two function arguments are loaded into r0 and r1.
- The next two are put into r2 and r3.
- The return value of the function is written into r0 and r1 (lowest word in ro).
- The function promises not to alter r4-r12.
- … but suppose the function needs to use many registers to do calculations...
SUMMARY FUNCTION
- Functions are the building blocks of programs:
- Organised, re-usable blocks of code
- Higher level programming languages have built in support for functions:
- Not ASM!
- One thing we need to manage is register use
- Application Binary Interface (ABI) defines conventions for the use of registers
- Next lecture:
- How do we store and recall register values ? With a stack of course !
FUNCTIONS IN ASM
- Not 'native' to assembly
- We need to do a lot of the management ourselves
- Argument passing:
- How do we pass arguments from one function to another
- Storing and recalling register values
- each function we call will want to use the same registers (only 13 general purpose registers !)
- How do we manage this?
- Managing the program control
- Jumping from one function to another, and then returning back !
PUSH, POP AND THE STACK
- ARM computers have a software stack*.
- A separate area of RAM is available for temporary values.
- A value in a register can be pushed onto the stack to preserve it for later.
- It can be popped off later (in LIFO order).
- We can get the memory location (a pointer to it) by checking the SP (R13) register.
SOFTWARE STACK?
- A section of RAM managed by the SP (stack pointer) register.
- A sort of 32-bit (64-bit in ARM8) wide array which starts (element 0) high in RAM and grows down as values are added to it.
- The stack pointer stores the memory location of the last value added (pushed) to the stack.
- Each push decrements SP by 4 (4 bytes per word).
- A pop operation removes the last value in the stack and increments the SP by 4 (4 bytes per word)
Software stack (depth only limited by RAM)
Example:
x = don’t care.
The stack is now empty. [SP] points to end of stack.
EXAMPLE SYNTAX
- Push and pop accept multiple registers if in a 1, ,,...y list push {r4,r5} ;back them up onto the stack ;use r4 and r5 for something else pop {r4,r5} ;restore them from the stack
- Alternatively, do one at a time (but pop in reverse order)
stack: Correct order is preserved for {lists}
RECALL THE ABI
- Application Binary Interface (ABI) sets standard way of using ARM registers.
- r0-r3 used for function arguments and return values
- r4-r12 promised not to be altered by functions
- Ir and sp used for stack management
- pc is the next instruction - we can use it to exit a function call
ABI CONVENTIONS
- ABI compliant functions:
- Use ro-r3 for passing and returning values to functions
- Promise not to alter r4-r12
- ... but suppose the function needs to use many registers to do calculations ??
- We can use the stack to store and recall register values !
PassIng ARGUMENTS TO FUNCTIONS
- To re-use the registers we need to:
- Back up registers we need to re-use in a function
- Store arguments for the function in r0-r3
- Call the function
- Read the return values from r0-r1 (optional)
- Restore the registers we backed up.
SUMMARY
- Software Stack:
- Dedicated RAM used to store values FILO
- Special register "sp" used to store address of start of the stack
- Stacks allow us to store and recall register values efficiently
- Stacks integral to functions:
- We need to store and recall register values so we don't run out of registers to use!
FUNCTIONS IN ASM
- Not 'native' to assembly
- We need to do a lot of the management ourselves
- Argument passing:
- How do we pass arguments from one function to another
- Storing and recalling register values
- each function we call will want to use the same registers (only 13 general purpose registers !)
- How do we manage this?
- Managing the program control
- Jumping from one function to another, and then returning back !
RECALL OUR “FLASHING LED” PROGRAM
1 mov r0, #.green 2 mov r1, #.white 3 mov r2, #1 ; 1sec delay time 4 flash: 5 str r0, Pixel367 ; flash on 6 LDR r3, .Time ; start time 7 timer1: 8 LDR r4, .Time ; current time 9 sub r5, r4, r3 ; elapsed time = current time - start time 10 CMP r5, r2 11 BLT timer1 12 str r1, Pixel367 ; flash off 13 LDR r3, .Time ; start time 14 timer2: 15 LDR r4, .Time ; current time 16 sub r5, r4, r3 ; elapsed time = current time - start time 17 CMP r5, r2 18 BLT timer2 19 B flash 20 halt
DEFINING FUNCTIONS
- We're obviously duplicating code in this example, which in general we want to avoid
- Why not write a function to do this !
- Imagine we had already written a function called “delay”:
1 mov r0, #.green 2 mov r1, #.white 3 mov r2, #1 ; 1sec delay time 4 flash: 5 str r0, Pixel367 ; flash on 6 LDR r3, .Time ; start time ;BL delay: Calling function “delay” Program control jumps to Instruction ;address represented by the label delay 7 BL delay ; call delay function 8 str r1, Pixel367 ; flash off - str: Once the function is complete, ;program control returns to instruction after function call 9 LDR r3, .Time ; start time 10 BL delay ; call delay function 11 B flash 12 halt
KEY REGISTERS
- Program counter (pc, also r15):
- Holds the address of the next instruction to execute
- Link Register (Ir, also r14):
- Holds the address of instruction to return to after a function is complete
HOW ARE THEY USED FOR FUNCTION CALLS?
- Program counter (рс):
- Is updated when a branch to label (BL) is encountered
- Link Register (Ir):
- holds what was in po register before it was changed
- i.e., address of the next instruction after the function call
- brings us back to where we came from (we'd be lost otherwise!
HELPFUL INSTRUCTION - BL
- BL label$:
- causes program control to jump to labelS,
- copies next instruction to Ir so we know how to get back!
but also ....
DEFINING FUNCTIONS
Imagine we had already written a function called “delay”:
7 BL delay 10 BL delay
DEFINING DELAY FUNCTION
- So far we have only talked about how we might call a delay function
- Lets now think about writing the actual function
- Lets the write the function so that it takes a single argument:
- The number of seconds to delay
DELAY FUNCTION
delay: ;Label defining start of function in memory push {R3,R4,R5,R6} ;begin Push all registers we are about to use onto stack ;so we can restore them at the completion of function MOV R3, R0 ; move delay time param into R3 LDR R4, .Time ; get start time timer: LDR R5, .Time ; update time SUB R6, R5, R4 ; calc elapsed time CMP R6, R3 ; check elapsed time BLT timer pop {R3,R4,R5,R6} ;end Push all registers we are about to use onto stack ;so we can restore them at the completion of function RET
MOV R3, R0: This is where we accept the parameter passed in via R0. We move it into a register to work with locally
LDR R4, .Time ; get start time timer: LDR R5, .Time ; update time SUB R6, R5, R4 ; calc elapsed time CMP R6, R3 ; check elapsed time BLT timer
RET
BL AND RET
- BL and RET instructions work as a pair
- BL (Branch to Label) essentially does this:
MOV LR,PC; copy current next instruction to LR MOV PC, #delay; set PC to make "delay" the next instruction
MOV LR,PC; copy current next instruction to LR
MOV PC, #delay; set PC to make "delay" the next instruction
BL AND RET
- RET (ie., RETurn), is how we exit functions, and return where we came from.
- RET essentially does this:
- MOV PC, LR
- It copies the original next instruction (ie., the one we first copied into LR before we called the function) back to PC so that it is the next instruction executed.
WHOLE PROGRAM Program Counter and Link Register
1 mov R0, #.green 2 mov R1, #.white 3 mov R2, #1 ; 1sec delay time 4 flash: 5 str R0, Pixel367 ; flash on 6 LDR R3, .Time ; start time 7 push {R0} 8 MOV R0, R2 9 BL delay ; call delay function 10 pop {R0} 11 str R1, Pixel367 ; flash off 12 LDR R3, .Time ; start time 13 push {R0} 14 MOV R0, R2 15 BL delay ; call delay function 16 pop {R0} 17 B flash 18 halt 19 delay: 20 push {R3,R4,R5,R6} 21 MOV R3, R0 ; move delay time param into R3 22 LDR R4, .Time ; get start time 23 timer: 24 LDR R5, .Time ; update time 25 SUB R6, R5, R4 ; calc elapsed time 26 CMP R6, R3 27 BLT timer 28 pop {R3,R4,R5,R6} 29 RET
SUMMARY Program Counter and Link Register
- Function calls require branching to a different instruction address
- Use bl to branch to a label
- Use RET to return back to calling code
- Program counter (pc) and Link Registers:
- pc: address of the next instruction to execute
- Ir: address of instruction to return to after a function is complete
FUNCTIONS IN ASM
- Managing the program control
- Jumping from one function to another, and then returning back!
RECALL OUR “FLASHING LED” PROGRAM
1 mov R0, #.green 2 mov R1, #.white 3 mov R2, #1 ; 1sec delay time 4 flash: 5 str R0, .Pixel367 ; flash on 6 LDR R3, .Time ; start time 7 push {R0} 8 MOV R0, R2 9 BL delay ; call delay function 10 pop {R0} 11 str R1, .Pixel367 ; flash off 12 LDR R3, .Time ; start time 13 push {R0} 14 MOV R0, R2 15 BL delay ; call delay function 16 pop {R0} 17 B flash 18 halt 19 delay: 20 push {R3,R4,R5,R6} 21 MOV R3, R0 ; move delay time param into R3 22 LDR R4, .Time ; get start time 23 timer: 24 LDR R5, .Time ; update time 25 SUB R6, R5, R4 ; calc elapsed time 26 CMP R6, R3 ; compare elapsed to delay time 27 BLT timer 28 pop {R3,R4,R5,R6} 29 RET
Last lecture we created a function to call for delays
Line 9 and 11, between line 19 and 29
Between 4 and 18 and inside that: between 5 and 10 and between 11 and 16
- But we can decompose this further!
- Let's write a flash function, and pass it a colour to flash, and a time delay between flashes.
- The only difference between these blocks is the colour of the pixel being drawn.
- We can write this as a function and pass the colour (and delay time) as parameters
DRAW PIXEL FUNCTION
drawpixel: PUSH (R3, R4} MOV R3, RO ; copy pixel colour to R3 MOV R4, R1 ; copy delay time to R4 STR R3, .Pixel367; draw pixel with colour PUSH {RO} MOV RO, R4 ; pass delay time to delay function BL delay ; call delay function Pop {RO} RET ; job done
FLASH LOOP (NOW CALLING DRAWPIXEL)
mov R2, #1 ; 1sec delay time flash: MOV R0, #.green ; pass green to drawpixel MOV R1, R2 ; pass time delay BL drawpixel ; "flash on“ MOV R0, #.white ; pass white to drawpixel MOV R1, R2 ; pass time delay BL drawpixel ; "flash off" B flash HALT
DRAW PIXEL FUNCTION (FIXED)
drawpixel: PUSH {R3,R4} MOV R3, R0 ; copy pixel colour to R3 MOV R4, R1 ; copy delay time to R4 STR R3, .Pixel367 ; draw pixel with colour PUSH {R0} MOV R0, R4 ; pass delay time to delay function PUSH {LR} ; backup LR before BL delay overwrites it! BL delay ; call delay function POP {LR} ; restore LR after we’ve returned from delay Pop {R0} RET ; job done
SUMMARY - Nested Function Calls
- All of this is interesting because this is exactly what compilers have to handle when translating your code in higher level languages
- Critical components include LR and PC registers,
- But also the stack !
- The stack allows functions to do what they do using all registers, and then restores what was there previously
- This is also critical to interrupt handling
- we'll come back to this
Array Basics
Array basics
- Arrays are dedicated blocks of memory for storing lists of values of the same type (ie size)
- Almost all programming languages support arrays:
- E.G in C programming:
- int somearray|10];
Defines a block of memory referred to as "somearray", which holds 10 integer (32 bit)
Array basics
int somearray[10];
Using memory in ARMlite, the above could translate to a block of memory like this.
How does this translate to ARM assembly ?
We simply define a label!
int somearray[10];
somearray: .Word 0 0 0 0 0 0 0 0 0 0
somearray[3] = 5;
We have the address of the array which needs to be stored in a register
MOV R0, #somearray
somearray[3] = 5;
We have an index to a 32 bit integer within the array.
Address of this value will be 3 x 4 = 12 bytes from start of array
We can store this in a register
MOV R0, #somearray MOV R1, #12
somearray[3] = 5;
We can keep the value to store in another register
MOV RO, #somearray MOV R1, #12 MOV R2, #5 somearray: .Word 0 0 0 0 0 0 0 0 0 0
somearray[3] = 5;
Finally, we can write the value to the array using STR
MOV R0, #somearray MOV R1, #12 MOV R2, #5 STR R2, [R0 + R1] somearray: .Word 0 0 0 0 0 0 0 0 0 0
HALT: make sure program is halted so it does not try and execute anything in the array
Some array definition for single byte arrays
For a generic array of single bytes
somebytearray: .Byte 1 3 221 56
For an array of ASCII characters (ie., a string):
somestring: .ASCIZ “hello world\n”
This directive ensures a character per byte, and the appending of a NULL character at the end of the string.
An alternative is .ASCII, which does not append the NULL character
Memory Alignment
- Memory Alignment is a common requirement:
- hardware often imposed alignment requirements to ensure integrity
- When defining byte addressable arrays, precede the array definitions with Align.
- For example:
.ALIGN 128 somestring: .ASCIZ “hello world\n”
.ALIGN N ensures the next instruction is aligned with a word address divisible by N (in general, and multiple of 4 is generally fine).
Iterating though an array
numarray[10]: {3, 2, 5, 3, 5, 6, 1, 2, 4, 9} sum = 0 i = 0 while(i < 10)do sum = sum + numarray[i] i = i + 1; end while
- Lets do this in Assembly code
- First, define the array with some labels:
arraysize: 40 numarray: .Word 3 2 5 3 5 6 1 2 4 9
- Lets do this in Assembly code
- Next, initialise some registers:
MOV R0, #numarray MOV R1, #0 // index MOV R2, #0 // sum HALT arraysize: 40 numarray: .Word 3 2 5 3 5 6 1 2 4 9
- Lets do this in Assembly code
- Next, setup a label and conditional branch for looping
MOV R0, #numarray MOV R1, #0 // index MOV R2, #0 // sum arrayloop: LDR R4,arraysize CMP R1,R4 BLT arrayloop HALT arraysize: 40 // 10 * 4 bytes numarray: .Word 3 2 5 3 5 6 1 2 4 9
- Next, insert code to access each item in array and increment index
MOV R0, #numarray MOV R1, #0 // index MOV R2, #0 // sum arrayloop: LDR R3, [R0 + R1] // access array item at current index (R1) ADD R1, R1, #4 // increment index to next 32 bit word LDR R4, arraysize CMP R1, R4 BLT arrayloop HALT arraysize: 40 // 10 * 4 bytes numarray: .Word 3 2 5 3 5 6 1 2 4 9
- Next, insert code to access each item in array and increment index
MOV R0, #numarray MOV R1, #0 // index MOV R2, #0 // sum arrayloop: LDR R3, [R0 + R1] // access array item at current index (R1) ADD R1, R1, #4 // increment index to next 32 bit word LDR R4, arraysize CMP R1, R4 BLT arrayloop HALT arraysize: 40 // 10 * 4 bytes numarray: .Word 3 2 5 3 5 6 1 2 4 9
- Finally, add each value to the accumulating “sum”
MOV R0, #numarray MOV R1, #0 // index MOV R2, #0 // sum accumulator arrayloop: LDR R3, [R0 + R1] // access array item at current index (R1) ADD R2, R2, R3 // add value to accumulator (R2) ADD R1, R1, #4 // increment index to next 32 bit word LDR R4, arraysize CMP R1, R4 BLT arrayloop HALT arraysize: 40 // 10 * 4 bytes numarray: .Word 3 2 5 3 5 6 1 2 4 9
Extended Array Example - lower case conversion
Array (and ASCIl) example - convert to lowercase
- Let's write an ASM program that reads a string from input and converts all alphabetic characters to lower case
- Algorithm:
- Store input string in array
- Iterate through the array and check each character
- If character needs converting, convert to lowercase
- If not, leave as is
- Once complete, write string to output
Using ASCII to do conversion
- Recall every character has an ASCIl code (0 - 255)
- Uppercase alphabetic characters are within the range: 65 - 90
Using ASCII to do conversion
- Recall every character has an
- ASCIl code (0 - 255)
- Uppercase alphabetic characters are within the range: 65 - 90
- Lowercase alphabetic characters are within the range: 97-122
Conversion of uppercase to lowercase requires subtracting 32 from ASCII value This will keep iterating through the loop “toLower” until the NULL character is reached We now need to consider the conversion:
• E.g. “A” (65) -> (65 + 32) -> “a” (97)
Uppercase to lowercase
First we need to define an array to store the string
.Align 256 stringData: .BLOCK 256
We then need to read an input string into the array
MOV R0, #stringData STR R0, .ReadString .Align 256 stringData: .BLOCK 256
- We now need to implement a loop that inspects each character.
- Lets start with the basic loop structures first. We need:
- A register to hold the array base address
- A register to hold the index of the current character
- A register to store the actual character value once
- A condition to determine when we have finished scanning each character
- We now need to implement a loop that inspects each character.
- Lets start with the basic loop structures first. We need:
- A Label to mark the start of the loop (so we can branch back)
tolower :
- A register to hold the array base address (we already have this in RO)
MOV RO, #stringData
- A register to hold the index of the current character (let's use R1)
MOV R1, #0 // initialise index to O
- A register to store the actual character value once (let's use R2)
LDRB R2, [RO + R1] // reads the byte at address RO+R1
- A condition to determine when we have finished scanning each character
- we can look for the "NULL" character, which is always appended to the end of any input string.
- A condition to determine when we have finished scanning each character
- we can look for the "NULL" character, which is always appended to the end of any input string
- So.
CMP R2, #0 BNE toLower
putting loop together
MOV R0, #stringData STR R0, .ReadString MOV R1, #0 //index toLower: // start of loop LDRB R2, [R0+R1] // read byte from array ADD R1,R1,#1 // increment index CMP R2, #0 BNE toLower .Align 256 stringData: .BLOCK 256
conversion code
- Only need to convert if read character is actually uppercase alphabetic
- Convert only if ASCIl value is between 65 and 90
- If character is uppercase alphabetic:
- Add 32 to the value
- Write the new value back to the array
- If character is not uppercase alphabetic:
- Leave as is (i.e. skip the above)
Determine if its uppercase alphabetic
toLower: // start of loop LDRB R2, [R0+R1] // read byte from array CMP R2, #65 // check lower value limit BLT skipConversion CMP R2, #90 // check upper value limit BGT skipConversion skipConversion: // jump to here if value not uppercase ADD R1,R1,#1 // increment index CMP R2, #0 BNE toLower
if it’s uppercase, convert to lowercase
toLower: // start of loop LDRB R2, [R0+R1] // read byte from array CMP R2, #65 // check lower value limit BLT skipConversion CMP R2, #90 // check upper value limit BGT skipConversion ADD R2,R2,#32 // convert to lowercase value STRB R2, [R0+R1] // write it back to the array skipConversion: // jump to here if value not uppercase ADD R1,R1,#1 // increment index CMP R2, #0 BNE toLower
All together
MOV R0, #stringData // store array base address STR R0, .ReadString // read in input MOV R1, #0 // initialise index toLower: // start of loop LDRB R2, [R0+R1] // read byte from array CMP R2, #65 // check lower value limit BLT skipConversion // if lower than uppercase range, skip conversion CMP R2, #90 // check upper value limit BGT skipConversion // if greater than uppercase range, skip conversion ADD R2,R2,#32 // convert to lowercase value STRB R2, [R0+R1] // write it back to the array skipConversion: // branch destination if value was not uppercase ADD R1,R1,#1 // increment index CMP R2, #0 // check if NULL character BNE toLower // Keep looping if not NULL character STR R0, .WriteString // write converted string to output HALT .Align 256 stringData: .BLOCK 128