Computer Architecture
The specification around which a computer's organization is designed: the instruction set, registers, data types and addressing modes visible to the programmer.
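- Microcontroller: an all-in-one embedded device (processor, memory, timers and I/O on one chip) built for specific tasks.
- Microprocessor: only the processor; memory and timers are separate chips connected to it. More general-purpose.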
RISC | CISC |
---|---|
simpler instructions | complex instructions |
fixed length (32-bit) | variable length |
many registers (multiple register sets) | fewer registers (single register set) |
mostly single-cycle execution | multi-cycle execution |
hardwired control | microprogrammed control |
highly pipelined | less pipelining |
only LOAD/STORE access memory | many instructions access memory |
ARM
Advanced RISC Machine. It’s a family of instruction set architectures (ISAs) for computer processors. ARM processors are used in a variety of devices, including mobile phones, portable media players, and GPS navigation systems.
Features of ARM
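- Conditional execution of instructions
- Load/Store architecture
- 32-bit data path and 32-bit instructions (in ARM state)
- A shift and an ALU operation combined in a single instruction, executed in one clock cycle
- 3-address instruction format, e.g. `ADD r0, r1, r2` (r0 = r1 + r2)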
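To make the 3-address format, the combined shift/ALU operation and conditional execution concrete, here is a minimal C model of one instruction of the form `ADD{cond} Rd, Rn, Rm, LSL #shift`. The `Cpu` struct and `add_shifted` function are hypothetical names for illustration, and the condition is passed in as an already-evaluated boolean rather than decoded from the flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register file: r0..r15 as 32-bit values. */
typedef struct {
    uint32_t r[16];
} Cpu;

/* Model of ADD{cond} Rd, Rn, Rm, LSL #shift (shift assumed 0..31):
 * three addresses (Rd, Rn, Rm), and Rm goes through the barrel shifter
 * inside the same instruction. */
void add_shifted(Cpu *cpu, bool cond, int rd, int rn, int rm, unsigned shift) {
    if (!cond) {
        return; /* condition failed: the instruction behaves as a no-op */
    }
    cpu->r[rd] = cpu->r[rn] + (cpu->r[rm] << shift);
}
```

A failed condition turns the instruction into a no-op instead of requiring a branch around it, so short if/else sequences do not disturb the pipeline.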
Tradeoffs
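- Moving data from one place to another: contrary to a common misconception, most execution time goes into moving data, not into ALU work (see the instruction mix below).
- Much of the ALU work is itself spent calculating addresses, i.e. where the program's data is stored (1).
- In RISC, the compiler bridges the gap between high-level code and the simple instructions, so the ISA should be designed to be a good compiler target.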
Instruction type | Frequency |
---|---|
Data movement | 43% |
Control Flow (branching) | 23% |
ALU | 15% |
Comparison | 13% |
Logical | 5% |
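Instead of waiting for one instruction to finish before fetching the next, a new fetch can start while the previous instruction is still being decoded. This gives a 3-stage pipeline, where instructions move through the stages one by one:
Cycle | Fetch | Decode | Execute |
---|---|---|---|
1 | I1 | | |
2 | I2 | I1 | |
3 | I3 | I2 | I1 |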
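Assuming an idealized pipeline where every stage takes one cycle and there are no stalls, the gain can be worked out directly: without pipelining, n instructions on k stages take n·k cycles; with pipelining, the first instruction needs k cycles to fill the pipe and each later one completes one cycle after the previous. The function names below are hypothetical.

```c
/* Idealized model: one cycle per stage, no stalls or branch penalties. */
unsigned long cycles_unpipelined(unsigned long n, unsigned long k) {
    return n * k; /* each instruction runs all k stages on its own */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k) {
    if (n == 0) {
        return 0;
    }
    return k + (n - 1); /* k cycles to fill, then one result per cycle */
}
```

For the 3-stage pipeline, 100 instructions take 102 cycles instead of 300 under these assumptions; real pipelines lose some of that to branches and memory stalls.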
- Concurrency: via pipelining
- Caching: to reduce the average access time for frequently used data
- Superscalar execution: issuing more than one instruction per cycle → HPCA
ARM Instructions
- General form: `<opcode>{<cond>}{S} Rd, Rn, Operand2`
- Opcode (shortform mnemonic): ADD, SUB
- Condition (modifier): EQ, GE, MI, GT, LE
- {S}: optional suffix; sets the condition flags N, Z, C, V
- {Rd}: destination register
- Operand 1 (Rn): a register
- Operand 2: flexible; either an immediate value or a register with an optional shift
They can be classified as:
- Data processing: MOV, ADD, SUB
- Data transfer: LDR, STR
- Control flow: B, BL, BEQ, BGT
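The condition suffixes test the N, Z, C, V flags. A sketch in C using the standard ARM encodings of the 4-bit condition field; only the codes mentioned in these notes plus NE, PL, LT and AL are covered, and the `Cond` enum and `cond_passes` function are hypothetical names.

```c
#include <stdbool.h>

/* ARM condition-field encodings (subset). */
typedef enum {
    COND_EQ = 0x0, COND_NE = 0x1, COND_MI = 0x4, COND_PL = 0x5,
    COND_GE = 0xA, COND_LT = 0xB, COND_GT = 0xC, COND_LE = 0xD,
    COND_AL = 0xE
} Cond;

/* Does an instruction with condition `c` execute, given the flags? */
bool cond_passes(Cond c, bool n, bool z, bool cf, bool v) {
    (void)cf; /* carry is only needed by CS/CC/HI/LS, omitted here */
    switch (c) {
    case COND_EQ: return z;              /* equal: result was zero */
    case COND_NE: return !z;
    case COND_MI: return n;              /* minus: result was negative */
    case COND_PL: return !n;
    case COND_GE: return n == v;         /* signed >= */
    case COND_LT: return n != v;         /* signed <  */
    case COND_GT: return !z && n == v;   /* signed >  */
    case COND_LE: return z || n != v;    /* signed <= */
    case COND_AL: return true;           /* always */
    default:      return false;          /* codes not modeled in this sketch */
    }
}
```

For example, after `CMP r0, r1` with r0 < r1 (signed, no overflow), N = 1 and V = 0, so LT and LE pass while GT and GE fail.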
Program Structure
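ARM has 7 processor modes, selected by the mode bits M[4:0] of the CPSR:
Mode bits | Mode |
---|---|
10000 | User |
10001 | FIQ (fast interrupt) |
10010 | IRQ (interrupt) |
10011 | Supervisor |
10111 | Abort |
11011 | Undefined |
11111 | System |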
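Assuming the mode is held in bits 4:0 of the CPSR, as on classic ARM cores, decoding it is a mask and a lookup. A C sketch (the function name `arm_mode_name` is hypothetical) that returns `NULL` for encodings not in the table:

```c
#include <stddef.h>
#include <stdint.h>

/* Map the CPSR mode field M[4:0] to a mode name. */
const char *arm_mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {             /* keep only the low 5 bits */
    case 0x10: return "User";           /* 10000 */
    case 0x11: return "FIQ";            /* 10001 */
    case 0x12: return "IRQ";            /* 10010 */
    case 0x13: return "Supervisor";     /* 10011 */
    case 0x17: return "Abort";          /* 10111 */
    case 0x1B: return "Undefined";      /* 11011 */
    case 0x1F: return "System";         /* 11111 */
    default:   return NULL;             /* reserved encoding */
    }
}
```

For example, `0xD3` (Supervisor mode with IRQ and FIQ masked, the state a core enters on reset) decodes to Supervisor because its low five bits are 10011.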
Register Windows
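- A large number of registers, divided into overlapping windows
- On procedure entry/exit, the visible window moves, giving each procedure access to a fresh set of registers; the overlap passes parameters between caller and callee
- Without windows, a call must save state on the stack and then branch; with windows, registers are only spilled to the stack when all windows are in use
- This reduces traffic between processor and memory
- Register windows come from Berkeley RISC and SPARC; ARM does not use them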
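A simplified C model of overlapping windows, assuming 8 windows that each add 16 new physical registers (8 ins and 8 locals), with a window's 8 outs being the same physical registers as the next window's ins. Real designs such as SPARC also have global registers, may move the window in the opposite direction, and trap to spill windows to the stack on overflow; none of that is modeled, and all names here are hypothetical.

```c
#include <stdint.h>

#define NWINDOWS 8
#define PHYS_REGS (NWINDOWS * 16) /* 16 new registers per window */

enum { WIN_IN = 0, WIN_LOCAL = 1, WIN_OUT = 2 };

typedef struct {
    uint32_t phys[PHYS_REGS]; /* physical register file */
    int cwp;                  /* current window pointer */
} RegFile;

/* Address of register i (0..7) of the given kind in the current window.
 * Outs of window w land on index (w + 1) * 16 + i, i.e. the ins of w + 1. */
uint32_t *win_reg(RegFile *rf, int kind, int i) {
    int idx = rf->cwp * 16 + kind * 8 + i;
    return &rf->phys[idx % PHYS_REGS]; /* windows wrap around */
}

/* Procedure call/return just move the window; no stack traffic. */
void win_call(RegFile *rf)   { rf->cwp = (rf->cwp + 1) % NWINDOWS; }
void win_return(RegFile *rf) { rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS; }
```

A caller writes an argument into its out register, calls, and the callee reads the same value from its in register, while the callee's locals are untouched by the caller's.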
Delayed Branches
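Some RISC processors (e.g. MIPS, SPARC) use delayed branches: the instruction right after a branch (the delay slot) always executes, so the pipeline keeps doing useful work while the branch outcome (taken or not taken) is resolved. This does not suit superscalar processors well, because they issue several instructions per cycle and a single delay slot no longer hides the branch penalty.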
Status Registers (SR)
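On every exception, the CPSR is copied into the SPSR of the mode being entered, so it can be restored on return. The condition flags, set by the last flag-setting instruction, are:
- N: the result was negative (bit 31 set)
- Z: the result was zero
- C: carry out of the ALU (unsigned overflow)
- V: signed overflow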
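A sketch of how the four flags would be computed for a 32-bit addition such as `ADDS`, written in C with a 64-bit intermediate to capture the carry. `Flags` and `adds_flags` are hypothetical names; subtraction sets C and V differently and is not covered.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;
} Flags;

/* Flags produced by a 32-bit add of a and b. */
Flags adds_flags(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a + b; /* bit 32 holds the carry out */
    uint32_t r = (uint32_t)wide;     /* the 32-bit result */
    Flags f;
    f.n = (r >> 31) & 1u;            /* sign bit of the result */
    f.z = (r == 0);
    f.c = (wide >> 32) & 1u;         /* unsigned overflow */
    /* signed overflow: operands have the same sign, result's sign differs */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return f;
}
```

For example, `0x7FFFFFFF + 1` sets N and V (largest positive value wrapped to negative) but not C, while `0xFFFFFFFF + 1` sets Z and C but not V.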
Flags
- I = 1: disables IRQ
- F = 1: disables FIQ
- T bit (only on architectures with Thumb): T = 0 ARM state, T = 1 Thumb state
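A small C sketch reading these control bits, assuming the classic ARM CPSR layout where I is bit 7, F is bit 6 and T is bit 5 (function names hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Control bits in the low byte of the CPSR (classic ARM layout). */
bool irq_disabled(uint32_t cpsr) { return (cpsr >> 7) & 1u; } /* I bit */
bool fiq_disabled(uint32_t cpsr) { return (cpsr >> 6) & 1u; } /* F bit */
bool thumb_state(uint32_t cpsr)  { return (cpsr >> 5) & 1u; } /* T bit */
```

For example, `0xD3` (binary 1101 0011) has both interrupts disabled and is in ARM state, while `0x30` (binary 0011 0000) has interrupts enabled and is in Thumb state.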
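Thumb Mode: 16-bit instructions
- Most instructions can only use the low registers r0-r7
- The 16-bit encoding improves code density, and improves performance when memory has a narrow (16-bit) data bus
- Implements a subset of the functionality of the ARM instruction set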
Memory System
- Byte (8-bit), signed or unsigned: any address
- Halfword (16-bit), signed or unsigned: aligned on a 2-byte boundary
- Word (32-bit): aligned on a 4-byte boundary
- A word in ARM is 32 bits
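Alignment for these access sizes is a power-of-two check: an address is aligned to `size` bytes when its low bits are zero. A C sketch (hypothetical `is_aligned`, valid only for sizes 1, 2 and 4):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of size; size must be a power of two. */
bool is_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) == 0;
}
```

So a halfword access at `0x1002` is aligned, while a word access at the same address is not.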
Important
- LOAD (LDR): memory value → register
- STORE (STR): register → memory
Warning
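A memory-to-memory transfer such as `STR [R1], [R2]` is not allowed. In a load/store architecture, the data must first be loaded into a register with LDR and then stored with STR.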
Multiplication Using Barrel Shifter
The barrel shifter in ARM assembly can be used to perform efficient multiplication by powers of two, sums, and differences.
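- Multiplying by $2^n$: `Ra = Ra << n`, e.g. `MOV Ra, Ra, LSL #n`
- Multiplying by $2^n + 1$: `Ra = Ra + (Ra << n)`, e.g. `ADD Ra, Ra, Ra, LSL #n`
- Multiplying by $2^n - 1$: `Ra = (Ra << n) - Ra`, e.g. `RSB Ra, Ra, Ra, LSL #n`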
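The three rules can be checked in C, where each function mirrors one ARM instruction (noted in the comments). The function names are hypothetical, `n` is assumed to be in 0..31, and results wrap modulo 2^32 just as a 32-bit register would.

```c
#include <stdint.h>

/* Ra * 2^n         ~ MOV Ra, Ra, LSL #n */
uint32_t mul_pow2(uint32_t ra, unsigned n) { return ra << n; }

/* Ra * (2^n + 1)   ~ ADD Ra, Ra, Ra, LSL #n */
uint32_t mul_pow2_plus1(uint32_t ra, unsigned n) { return ra + (ra << n); }

/* Ra * (2^n - 1)   ~ RSB Ra, Ra, Ra, LSL #n (reverse subtract) */
uint32_t mul_pow2_minus1(uint32_t ra, unsigned n) { return (ra << n) - ra; }
```

With `Ra = 7` and `n = 3`, these give 56, 63 and 49, i.e. 7 × 8, 7 × 9 and 7 × 7.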
Multiplying by 6
We can calculate `6 * Ra` as `3 * (2 * Ra)`:
- Multiply Ra by 2 using `MOV Ra, Ra, LSL #1`.
- Multiply Ra by 3 (which is $2^1 + 1$) using `ADD Ra, Ra, Ra, LSL #1`.
Multiplying by 45
We can calculate `45 * Ra` as `9 * (5 * Ra)`, since $45 = (2^2 + 1)(2^3 + 1)$:
- Multiply Ra by 5 (which is $2^2 + 1$) using `ADD Ra, Ra, Ra, LSL #2`.
- Multiply the result by 9 (which is $2^3 + 1$) using `ADD Ra, Ra, Ra, LSL #3`, giving `Ra * 45`.
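A C sketch that mirrors the ×6 and ×45 shift-and-add sequences one statement per instruction, using the decompositions 6 = 2 × 3 and 45 = (2^2 + 1)(2^3 + 1). The function names are hypothetical.

```c
#include <stdint.h>

uint32_t times6(uint32_t ra) {
    ra = ra << 1;        /* MOV Ra, Ra, LSL #1     -> 2 * Ra       */
    ra = ra + (ra << 1); /* ADD Ra, Ra, Ra, LSL #1 -> 3 * (2 * Ra) */
    return ra;
}

uint32_t times45(uint32_t ra) {
    ra = ra + (ra << 2); /* ADD Ra, Ra, Ra, LSL #2 -> 5 * Ra       */
    ra = ra + (ra << 3); /* ADD Ra, Ra, Ra, LSL #3 -> 9 * (5 * Ra) */
    return ra;
}
```

Two single-cycle instructions replace a general multiply in each case, which is why compilers often emit such sequences for constant multipliers.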