Computer Architecture
The specifications around which a computer’s organizational layout is defined.
- Microcontroller: an all-in-one embedded device (processor, memory, and peripherals on one chip). Used for specific tasks.
- Microprocessor: a processor connected to external memory and timers. More general-purpose.
RISC | CISC |
---|---|
simpler instructions | complex instructions |
fixed-length instructions (32 bit only) | variable-length instructions |
multiple register sets | single register set |
mostly single-cycle execution | multi-cycle execution |
hardwired control | microprogrammed control |
highly pipelined | less pipelined |
only LOAD/STORE access memory | many instructions access memory |
ARM
Advanced RISC Machine. It’s a family of instruction set architectures (ISAs) for computer processors. ARM processors are used in a variety of devices, including mobile phones, portable media players, and GPS navigation systems.
Features of ARM
- Conditional instructions
- Load/Store architecture
- 32 bit width
- A general shift/ALU op in a single clock-cycle
- 3 addr instruction format
Tradeoffs
- Most time goes into moving data from one place to another. A common misconception is that most time goes into ALU work.
- Part of that work is calculating the addresses of data and of where the program is stored.
- The RISC compiler bridges the gaps, but we should also design a good ISA.
Instruction type | Share |
---|---|
Data movement | 43% |
Control Flow (branching) | 23% |
ALU | 15% |
Comparison | 13% |
Logical | 5% |
Instead of waiting for one instruction to finish before starting the next, we can start a new fetch while the previous instruction is being decoded: 3-Stage Pipeline
Fetch | Decode | Execute |
---|---|---|
one → | by→ | one→ |
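The benefit of overlapping the three stages can be sketched with a small Python model. It assumes an ideal pipeline (one cycle per stage, no stalls or branches); the function names are made up for illustration.

```python
def cycles_unpipelined(n_instructions, n_stages=3):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages=3):
    """Ideal pipeline: fill it once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def schedule(n_instructions):
    """One row per instruction; each column is a cycle (F, D, E = stage)."""
    return ["." * i + "FDE" for i in range(n_instructions)]


if __name__ == "__main__":
    for row in schedule(4):
        print(row)
    print(cycles_unpipelined(4), "cycles without pipelining")
    print(cycles_pipelined(4), "cycles with a 3-stage pipeline")
```

Under these assumptions, 4 instructions take 12 cycles one by one but only 6 cycles when pipelined.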
- Concurrency: via Pipelining
- Caching: To reduce average time for frequently used data
- Superscalar execution → HPCA
ARM Instructions
- Mnemonic (short form): `ADD`, `SUB`
- Condition (modifier): `EQ`, `GE`, `MI`, `GT`, `LE`
- {S} optional suffix: sets the flags `N`, `Z`, `C`, `V`
- {Rd}: destination register
- Operand 1 and 2
  - Operand 1: a register
  - Operand 2 (flexible): can be an immediate value or a register with an optional shift
They can be classified as:
- Data processing: `MOV`, `ADD`, `SUB`
- Data transfer: `LDR`, `STR`
- Control flow: `B`, `BL`, `BEQ`, `BGT`
Program Structure
7 ARM modes
code | mode |
---|---|
10000 | user |
10001 | FIQ |
10010 | IRQ |
10011 | Supervisor (SVC) |
10111 | ABORT |
11011 | Undef |
11111 | System |
Register Windows
- Large number of registers
- Procedure entry/exit moves the visible window to give each procedure access to new registers.
- Saves state on the stack, and then branches
- This reduces traffic between processor ←> memory
Delayed Branches
A delayed branch takes effect only after the following instruction has executed, so a branch (which may or may not be taken) does not interrupt the smooth flow of the pipeline. Delayed branches do not work well on superscalar processors.
Status Registers (SR)
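The state of the CPSR is copied to the SPSR on every transition into an exception mode.
- N: the result was negative
- Z: the result was zero
- C: the operation produced a carry out
- V: the operation produced a signed overflow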
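As a rough illustration of how the four condition flags follow from a result, here is a Python model of a 32-bit flag-setting add. The function `adds` and the dictionary layout are assumptions made for this sketch, not ARM definitions.

```python
MASK32 = 0xFFFFFFFF


def sign(x):
    """Return bit 31 of a 32-bit value."""
    return (x >> 31) & 1


def adds(a, b):
    """Model a 32-bit add that sets flags; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                # unbounded Python integer
    result = full & MASK32      # what fits in a 32-bit register
    flags = {
        "N": sign(result),                     # result is negative
        "Z": int(result == 0),                 # result is zero
        "C": int(full > MASK32),               # carry out of bit 31
        # signed overflow: same-sign inputs, different-sign result
        "V": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
    return result, flags


if __name__ == "__main__":
    print(adds(0x7FFFFFFF, 1))
    print(adds(1, 0xFFFFFFFF))
```

Adding 1 to `0x7FFFFFFF` sets N and V (the signed result overflowed), while adding 1 to `0xFFFFFFFF` sets Z and C.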
Flags
- I = 1: disables IRQ
- F = 1: disables FIQ
- T bit (architectures with Thumb mode only): T = 0 (ARM state), T = 1 (Thumb state)
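The control bits and the mode codes listed earlier can be read out of a CPSR value with a few masks. This sketch assumes the classic 32-bit ARM CPSR layout (I in bit 7, F in bit 6, T in bit 5, mode in bits 4 to 0); `decode_cpsr` is a name made up for this example.

```python
MODES = {
    0b10000: "user",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def decode_cpsr(cpsr):
    """Split a CPSR value into its interrupt, state, and mode fields."""
    return {
        "irq_disabled": bool(cpsr & (1 << 7)),            # I bit
        "fiq_disabled": bool(cpsr & (1 << 6)),            # F bit
        "state": "thumb" if cpsr & (1 << 5) else "arm",   # T bit
        "mode": MODES.get(cpsr & 0x1F, "unknown"),        # low 5 bits
    }


if __name__ == "__main__":
    print(decode_cpsr(0xD3))  # IRQ and FIQ disabled, ARM state, supervisor
    print(decode_cpsr(0x30))  # interrupts enabled, Thumb state, user
```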
Thumb Mode: 16 bit
- Only the registers `r0`-`r7` are used by most instructions
- Improves performance when memory is reached over a narrow data bus
- Subset of the functionality of the ARM instruction set
Memory System
- 8-bit bytes, signed/unsigned
- 16-bit halfwords, signed/unsigned: aligned on 2-byte boundaries
- 1 word, signed/unsigned: aligned on 4-byte boundaries
- A word in ARM is 32 bit
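The alignment rules and the signed/unsigned distinction can be sketched in Python as follows. Both helper names are hypothetical and only model the arithmetic, not any real memory system.

```python
def is_aligned(address, size_bytes):
    """True if address is a multiple of the access size (2 or 4 bytes)."""
    return address % size_bytes == 0


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


if __name__ == "__main__":
    print(is_aligned(0x1002, 2))   # halfword access
    print(is_aligned(0x1002, 4))   # word access
    print(sign_extend(0xFF, 8))    # the byte 0xFF read as signed
```

The same byte `0xFF` is 255 when loaded as unsigned and -1 when loaded as signed, which is why loads come in signed and unsigned forms.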
Important
- LOAD: memory value → reg
- STORE: reg → memory
Warning
STORE [R1], [R2]: this is not allowed. There is no memory-to-memory transfer; the data must pass through a register.
Barrel Shifter
The barrel shifter in ARM assembly can be used to perform efficient multiplication by powers of two, sums, and differences.
- Multiplying by `2^n` → `Ra = Ra << n`
- Multiplying by `2^n + 1` → `Ra = Ra + (Ra << n)`
- Multiplying by `2^n - 1` → `Ra = (Ra << n) - Ra`
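The shift-and-add identities can be checked with a short Python model of a 32-bit register. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def mul_pow2(ra, n):
    """Ra * 2^n as a single left shift."""
    return (ra << n) & MASK32


def mul_pow2_plus_1(ra, n):
    """Ra * (2^n + 1) as Ra + (Ra << n)."""
    return (ra + (ra << n)) & MASK32


def mul_pow2_minus_1(ra, n):
    """Ra * (2^n - 1) as (Ra << n) - Ra."""
    return ((ra << n) - ra) & MASK32


if __name__ == "__main__":
    print(mul_pow2(7, 3), mul_pow2_plus_1(7, 3), mul_pow2_minus_1(7, 3))
```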
Cross Bar Switch
Multiplying by 6
We can calculate `6 * Ra` as:
- Multiply `Ra` by 2 using `MOV Ra, Ra, LSL #1`.
- Multiply `Ra` by 3 (which is `2^1 + 1`) using `ADD Ra, Ra, Ra, LSL #1`.
Multiplying by 45
We can calculate `45 * Ra` as `5 * 9 * Ra`, since 45 = (2^2 + 1) * (2^3 + 1):
- Multiply `Ra` by 5 (which is `2^2 + 1`) using `ADD Ra, Ra, Ra, LSL #2`.
- Multiply the result by 9 (which is `2^3 + 1`) using `ADD Ra, Ra, Ra, LSL #3`, giving `Ra * 45`.
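Constant multiplications like these can be checked by modelling each shifted-operand instruction in Python. The sketch assumes the factorisations 6 = 2 * 3 and 45 = 5 * 9, and the assembly in the comments is the instruction each line is meant to mirror.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def times_6(ra):
    ra = (ra << 1) & MASK32          # MOV Ra, Ra, LSL #1      -> 2 * Ra
    ra = (ra + (ra << 1)) & MASK32   # ADD Ra, Ra, Ra, LSL #1  -> 3 * (2 * Ra)
    return ra


def times_45(ra):
    ra = (ra + (ra << 2)) & MASK32   # ADD Ra, Ra, Ra, LSL #2  -> 5 * Ra
    ra = (ra + (ra << 3)) & MASK32   # ADD Ra, Ra, Ra, LSL #3  -> 9 * (5 * Ra)
    return ra


if __name__ == "__main__":
    print(times_6(7), times_45(3))
```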
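- `LSL`, `LSR`: logical shift left / logical shift right
- `ASR`: arithmetic shift right (preserves the MSB)
- `ROR`, `RRX`: rotate right / rotate right extended (through the carry flag)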
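The difference between the shift types is easiest to see on concrete bit patterns. The following Python functions are a sketch of the 32-bit behaviour for shift amounts from 0 to 31; carry-out handling is left out except for `rrx`, which needs the carry as an input.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: copies of the MSB enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32            # reinterpret as a negative Python integer
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry_in):
    """Rotate right by one through carry; returns (result, carry_out)."""
    x &= MASK32
    return ((carry_in & 1) << 31) | (x >> 1), x & 1


if __name__ == "__main__":
    print(hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)))
```

`lsr` and `asr` differ only for values with the top bit set: the first yields `0x08000000`, the second `0xF8000000`.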
Shifted Register Operands
It is possible to use a register to specify the number of bits to be shifted; only the bottom 8 bits are significant.
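A minimal Python sketch of a register-specified left shift follows. It assumes that shift amounts of 32 or more produce zero; `lsl_by_register` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def lsl_by_register(rm, rs):
    """Model Rm, LSL Rs: shift Rm left by the amount held in Rs."""
    amount = rs & 0xFF          # only the bottom 8 bits of Rs are significant
    if amount >= 32:
        return 0                # every bit has been shifted out
    return (rm << amount) & MASK32


if __name__ == "__main__":
    print(lsl_by_register(1, 4))
    print(lsl_by_register(1, 0x104))  # upper bits of Rs are ignored
```

Both calls shift by 4, because `0x104` and `0x04` share the same bottom 8 bits.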
C to ASM
- `A = B + C;` → `ADD R0, R1, R2 ; A = B + C`
- `D = A - C;` → `RSB R3, R2, R0 ; D = A - C`
- `F = (G + H) - (I + J)`: use the registers `R0` to `R4` as operands F to J respectively.
- `G = H + A[10]`
Branch Instructions & Addressing Modes
Flow control instructions
Instruction | Name | Effect |
---|---|---|
B | Branch | Program Counter = Label |
BL | Branch & Link | 1: The return address (the address of the instruction after the BL) is copied to R14, the Link Register (LR), before the branch is taken. 2: Program Counter = Label |
BX | Branch Exchange | Used for changing from ARM to Thumb mode or from Thumb to ARM mode. |
BLX | Branch Exchange with Link | As BX, and the return address is also copied to LR. |
Branch Instruction (Unconditional)
Conditional Branch Instruction
Ex: Add 2 numbers A,B
- LDR: Memory → Reg
- STR: Reg → Memory
Ex: Sum of N numbers
If A is a label in memory, R1 will hold the memory address of A,
not the value stored at A.
Addressing Half Words
Program to find the sum of N numbers using half word
Byte Data
Program to find the sum of N numbers using Byte Data
Addressing memory locations
- Memory is addressed by a register and an offset. There are 3 ways to specify the offset:
  - Immediate
  - Register
  - Scaled register
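The three offset forms differ only in how the offset is computed before it is added to the base register. In this Python sketch the function names are made up, and the assembly forms in the comments are the usual ARM syntax each one is meant to mirror.

```python
MASK32 = 0xFFFFFFFF


def ea_immediate(rn, imm):
    """[Rn, #imm]: base register plus a constant."""
    return (rn + imm) & MASK32


def ea_register(rn, rm):
    """[Rn, Rm]: base register plus another register."""
    return (rn + rm) & MASK32


def ea_scaled_register(rn, rm, shift):
    """[Rn, Rm, LSL #shift]: base plus a register shifted left."""
    return (rn + (rm << shift)) & MASK32


if __name__ == "__main__":
    base = 0x1000
    print(hex(ea_immediate(base, 8)))
    print(hex(ea_register(base, 0x20)))
    print(hex(ea_scaled_register(base, 3, 2)))  # element 3 of a word array
```

The scaled form is convenient for arrays: with 4-byte words, index 3 shifted left by 2 gives a byte offset of 12.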
Addressing Modes
Preindexing or Preindexing without writeback
LDR Rd, [Rn, OFFSET]
Preindexing with Writeback or Autoindexing
LDR Rd, [Rn, OFFSET]!
Post indexing
LDR Rd, [Rn], OFFSET
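The three indexing modes differ in which address is used for the load and whether the base register is updated afterwards. The Python model below is only a sketch: memory is a dictionary keyed by address, registers are a dictionary keyed by name, and both function names are hypothetical.

```python
def ldr_preindexed(memory, regs, rd, rn, offset, writeback=False):
    """LDR Rd, [Rn, OFFSET] or, with writeback, LDR Rd, [Rn, OFFSET]!"""
    address = regs[rn] + offset     # offset is applied before the access
    regs[rd] = memory[address]
    if writeback:
        regs[rn] = address          # base register keeps the new address


def ldr_postindexed(memory, regs, rd, rn, offset):
    """LDR Rd, [Rn], OFFSET"""
    regs[rd] = memory[regs[rn]]     # access uses the unmodified base
    regs[rn] += offset              # base is always updated afterwards


if __name__ == "__main__":
    memory = {0x100: 11, 0x104: 22}
    regs = {"R0": 0, "R1": 0x100}
    ldr_preindexed(memory, regs, "R0", "R1", 4)
    print(regs)
    ldr_postindexed(memory, regs, "R0", "R1", 4)
    print(regs)
```

Post-indexing suits walking through an array: each load reads the current element and leaves the base pointing at the next one.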
Load Multiple
write back