Kasoft Software: ARM Instruction Set

ARM Instruction Set: Overview

Introduction to this Guide

The purpose of this guide is to provide you with the necessary information to allow you to:

Encode or decode ARM instructions, in order to write an assembler, disassembler or compiler
Improve your awareness of the features of the ARM instruction set
Know and understand the limitations of the current encoding

Because ARM assembler is often explained without any references to the encoding of ARM instructions, I felt it necessary to create this guide. In this overview, I also explain the features of the ARM processor in general terms, and their implications for the processor's performance and efficiency.

Kade Hansson
January 1994

Note About the Acronym ARM

In this guide, ARM stands for Advanced/Acorn RISC Machine/Microprocessor. For example, ARM2 stands for Acorn RISC Microprocessor (revision 2), because this processor was designed for Acorn. However, ARM610 stands for Advanced RISC Microprocessor (revision 6.10), as it was designed for Advanced RISC Machines (ARM) Limited and is used in the Apple Newton computer. Acorn also use the ARM acronym to stand for Acorn RISC Machine.

Format of this Guide

This guide is divided into separate text/HTML files, each of which contains information on a particular aspect of the encoding of the ARM instruction set.

Filename	Title
Base (txt)	Base Instructions
Condition (txt)	Conditions
Version1 (txt)	This overview
ALU/Instrs (txt)	Arithmetic, Logic and Comparative Base Instructions
ALU/Constants (txt)	Immediate Constants
ALU/ShftInstr (txt)	Barrel Shifter Subinstructions
ALU/AL1 (txt)	Unary Arithmetic and Logic Instructions
ALU/AL2 (txt)	Binary Arithmetic and Logic Instructions
ALU/CT (txt)	Comparative Instructions
ALU/MULMLA (txt)	Multiplication Instructions
Branch/BBL (txt)	Branch Instructions
Branch/SWI (txt)	SWI Instruction
FP/Constants (txt)	Floating Point Immediate Constants
FP/Precision (txt)	Floating Point Precision Codes
FP/Rounding (txt)	Floating Point Rounding Codes
FP/FPO1 (txt)	Floating Point Unary Operations
FP/FPO2 (txt)	Floating Point Binary Operations
FP/FPRT0 (txt)	Floating Point Flag Register Transfer Instructions
FP/FPRT1 (txt)	Floating Point Register Transfer Instructions
FP/FPST (txt)	Floating Point Status Transfer Instructions
FP/LDFSTF (txt)	Floating Point Load and Store Instructions
Data/LDMSTM (txt)	Multiple Register Load and Store Instructions
Data/LDRSTR (txt)	Single Register Load and Store Instructions

Where possible, a reference to a section of text which expands on information given will appear in parentheses after that information.

e.g.  0  MUL (MUL/MLA)
      1  MLA (MUL/MLA)

This file gives an overview of the ARM processor, its characteristics, its features and, most of all, its instruction set.

Introduction to the ARM

The ARM family of chips are arguably the most efficient microprocessors available at the present time (January 1994). They were among the first reduced instruction set microprocessors to be used in a microcomputer. Their most notable application to date is in innovative RISC microcomputer technology, such as the Acorn Archimedes series, Acorn UNIX workstations and the Apple Newton. This guide deals with the ARM chip set currently used by Acorn in RISC OS computers, although much of the information given is widely applicable across all ARM incarnations.

The latest incarnations of the ARM belong to the ARM7 series. But because Acorn have not yet incorporated this new technology into the present series of RISC OS machines, we will limit this description of the ARM instruction set to the ARM2, ARM250 and ARM3 incarnations. For the sake of compatibility, we will only discuss instructions common to all these ARM processors.

N.B. The ARM1 is now considered to be obsolete. The only major difference
     between ARM1 and ARM2 is the lack of MUL and MLA instructions on the ARM1.

Aliases for the ARM

ARM chips all have special names which are used by Very Large Scale Integration Technology Incorporated (VLSI) to refer to them. VLSI currently manufacture all ARM processors.

Common name 	Special name	Usage

ARM1	    	VL86C00?     	Archimedes prototypes
ARM2	    	VL86C010    	Acorn A3xx, A4xx, A3000, R140
ARM250	    	VL86C0?? (?)  	Acorn A30x0, A4000
ARM3	    	VL86C020    	Acorn A540, A5000, A4, R2xx
ARM610	    	VL86C051 (?) 	Apple Newton
ARM7	    	VL86C06? (?)   	Next generation Acorn machines?

MEMC1	    	VL86C110    	Acorn A3xx
MEMC1a	    	VL86C110    	A4xx, A3000, A30x0, A4000, A5000, A4, R140, R2xx

VIDC10/VIDC1a	VL86C310    	All Acorn machines up until the A5000 alpha
VIDC20	    	VL86C3?? (?)   	Next generation Acorn machines?

IOC1	    	VL86C410    	All Acorn machines up until the A5000 alpha

Word and instruction size

The ARM2/3 is a 32-bit RISC processor with a 32-bit data bus and a 26-bit address bus. The memory architecture is based around a 32-bit word, although byte access is permitted through the use of the LDRB and STRB instructions. ARM instructions are encoded into 32 bits, which contain the base instruction opcode, the condition code, any barrel shifter subinstruction opcodes, all register specifications, any address offsets, all immediate constants and any other data, such as SWI numbers.

Features

The ARM processor incorporates many innovative features which add greatly to its performance and functionality. Features such as multiple register load and store instructions, simultaneous execution, decoding and fetching of successive instructions, a compact and efficient instruction set, and a powerful barrel shifter ensure its efficiency. A versatile memory controller provides features which allow paged memory mapping to be employed, and a fully programmable video controller provides access to high quality audio and video facilities. Also, an impressive range of functions is provided by the input-output controller. Privileged processor modes with private register banks also improve efficiency and utility. Additionally, a floating point coprocessor is supported.

The compact and efficient RISC instruction set offers many advantages over the more traditional CISC instruction set. Because there are fewer instructions and their encoding is simpler, they are quicker to decode and typically execute many times faster than their CISC equivalents. A CISC processor spends most of its time executing a small, simple subset of its instructions, so the speed advantage of its rarely used complex instructions is outweighed by the penalty that the more complicated processor design imposes on the essential ones. RISC processors also use much less power than their CISC competitors, and are cheaper to produce.

The ARM has 27 32-bit registers, 16 of which are available at any one time. The program counter (and program status register, R15) and the link register (R14) are the only two registers which are bound by the processor hardware to specific purposes, and all other registers may be freely used. The high number of available registers means that the ARM is very well suited to complex tasks, and is often able to perform them without accessing memory as often as other processors. This is a particularly important consideration when the memory is clocked slower than the CPU, as it is in most ARM-based machines.

Pipelining allows the ARM to decode the next instruction while the current one is being executed and a third instruction is being fetched from memory. In straight-line code, where no branch forces the pipeline to be refilled, this lets the processor complete up to one instruction per cycle instead of one every three cycles, improving raw performance by up to a factor of three. It also improves the utilization of processor resources (reducing power consumption), as without pipelining the decoding circuitry would sit idle while execution is taking place.

Multiple load and store instructions allow very fast memory block transfers and efficient register stacking during subroutine calls. In combination with a callee register saving protocol, these instructions offer significant speed improvements over more traditional programming techniques.

A barrel shifter is located on one of the inputs to the ALU of the ARM, and provides the option of shift and rotate operations for all arithmetic and logic instructions. Because a shift can be combined with an arithmetic or logic operation in a single instruction, code which makes frequent use of shift operations can be optimized in ways that are not possible on most other processors.

In addition, the ARM3 chip has a 4 kilobyte fast memory cache, which reduces access times to frequently used memory locations. The caching techniques also dramatically improve loop timings, as the instructions in the loop are often copied into the cache, and so can be fetched much more quickly.

Addressing System

The addressing scheme is managed by the memory controller (MEMC), a separate chip which is essentially an integral part of the ARM CPU. MEMC also communicates with the input-output controller (IOC) and the video controller (VIDC), which make up the remainder of the ARM chip set. In this discussion we are not concerned with the functions of either IOC or VIDC.

MEMC1a (as used with most ARM2/3 machines) can address up to 64 megabytes of memory (16 million words), the full range of the 26-bit address bus. However, limitations currently imposed by RISC OS and the architecture of memory devices reduce the amount of physical memory that can actually be installed.

The following table of maximum physical memory sizes takes into account current limitations of standard Acorn machine hardware:

       Max.  Max.
       RAM   ROM

A3xx    8Mb   2Mb
A4      4Mb   2Mb
A4xx    8Mb   2Mb
A540   16Mb   2Mb
A3000   4Mb   2Mb
A30x0   4Mb   2Mb
A4000   4Mb   2Mb
A5000   8Mb   2Mb
R140    4Mb   2Mb
R2xx   16Mb   2Mb

These hardware limitations may be eliminated by future hardware upgrades (such as slave MEMC chips).

The software limitations imposed by RISC OS also limit the amount of physical memory which is supported. At present, an absolute 16Mb limit on physical RAM exists, due to RISC OS making intelligent use of the MEMC's logical to physical address translation mechanisms.

Physical memory is mapped into the present RISC OS memory map from address &2000000 onwards. The mappings from &3000000 onward include ROM, the input-output controllers, the video controller, the DMA address generators and the logical to physical address translator. These locations, with the exception of those mapped onto ROM, are only accessed by low level routines within RISC OS.

Below &2000000 on the memory map lies the logical memory. The physical memory is rarely addressed directly; RISC OS operates by addressing the logical memory. To set up the logical memory, RISC OS divides physical memory into pages, which it can then map in any order within the logical memory slot. Each page is typically 8, 16 or 32K in size. Which of these page sizes is chosen currently depends on the total amount of RAM available, although this may change. To read the page size in use, call the SWI OS_ReadMemMapInfo (SWI &51).

Dividing memory into pages which can be shuffled into any order is a powerful feature of MEMC as used under RISC OS. Shuffling of pages in the logical slot is achieved by manipulating 128 logical to physical memory mapping descriptors. These descriptors are held in content-addressable memory inside MEMC, and can be accessed quickly, maintaining short memory access times.
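The page mapping can be modelled in a few lines. The sketch below is not Acorn code; it assumes a single MEMC with 128 descriptors and that the page size is chosen as physical RAM divided by 128 (which gives the 8, 16 and 32K sizes quoted above for 1, 2 and 4Mb). The names page_size_for and translate are hypothetical, and the dictionary stands in for the descriptors held in MEMC's content-addressable memory.

```python
# Hypothetical model of MEMC-style paging (not Acorn code).
# Assumption: one MEMC, 128 page descriptors, page size = RAM size / 128.
NUM_DESCRIPTORS = 128
PHYS_BASE = 0x2000000           # physical RAM is mapped from &2000000
PAGE_SIZES = (4096, 8192, 16384, 32768)

def page_size_for(ram_bytes):
    """Page size that lets 128 descriptors cover all of physical RAM."""
    size = ram_bytes // NUM_DESCRIPTORS
    if size not in PAGE_SIZES:
        raise ValueError("RAM size not supported by a single MEMC")
    return size

def translate(logical, page_table, page_size):
    """Map a logical address (below &2000000) to its physical-space address.

    page_table maps logical page number -> physical page number; it stands
    in for the descriptors held in MEMC's content-addressable memory.
    """
    page, offset = divmod(logical, page_size)
    if page not in page_table:
        # Real hardware would signal an abort for an unmapped page.
        raise LookupError("logical page %d is not mapped" % page)
    return PHYS_BASE + page_table[page] * page_size + offset
```

On a 4Mb machine the pages are 32K, so mapping logical page 3 onto physical page 127 makes translate(3 * 32768 + &10, {3: 127}, 32768) return &23F8010.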

As an additional feature of the memory mapping scheme, three levels of memory protection are supported:

Protection level  Privileges

Supervisor        Privileged access to all memory
Operating System  Privileged access to logical memory (not used by RISC OS)
User              Access to unprotected pages of logical memory and read cycles
                  to addresses mapped onto the ROM

Exceptions are generated if an illegal memory access is attempted.

Processor Modes

There are four processor modes available when using ARM2/3:

Mode            Mnemonic  Normally entered             Private registers

User            USR       by default                   none (access user bank)
Supervisor      SVC       upon software interrupt      R13_svc and R14_svc
Interrupt       IRQ       upon interrupt request       R13_irq and R14_irq
Fast Interrupt  FIQ       upon fast interrupt request  R8_fiq to R14_fiq

The SVC, IRQ and FIQ modes are privileged, and provide more control over the system. These three modes also have their own private registers, which reduce the time spent stacking registers when interrupts are dealt with: a handler can use its private R13 and R14 (and, in FIQ mode, R8 to R12) without first saving the user bank copies.
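On the ARM2/3 the current mode is held in R15 together with the program counter and flags. The sketch below assumes the standard 26-bit R15 layout, which this overview does not spell out: flags N, Z, C and V in bits 31-28, the IRQ and FIQ disable bits in bits 27 and 26, the word-aligned PC in bits 25-2 and the mode in bits 1-0 (0 = USR, 1 = FIQ, 2 = IRQ, 3 = SVC). The function name decode_r15 is hypothetical.

```python
# Sketch: split an ARM2/ARM3 R15 value into its PSR and PC fields.
# Assumed 26-bit layout: N Z C V I F | PC (bits 25-2) | mode (bits 1-0)
MODES = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}

def decode_r15(r15):
    return {
        "N": (r15 >> 31) & 1,
        "Z": (r15 >> 30) & 1,
        "C": (r15 >> 29) & 1,
        "V": (r15 >> 28) & 1,
        "I": (r15 >> 27) & 1,    # 1 = IRQs disabled
        "F": (r15 >> 26) & 1,    # 1 = FIQs disabled
        "pc": r15 & 0x03FFFFFC,  # word-aligned 26-bit address
        "mode": MODES[r15 & 3],
    }
```

For example, &6C008003 decodes as Z and C set, both interrupt types disabled, PC = &8000, in SVC mode.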

Performance

The performance of each Acorn RISC computer is given in the following table. Some of these figures have been estimated, and should not be relied on.

              ARM     ARM     RAM     ROM     ROM     Speed*  Speed*  Speed*
              CPU     Clock   Timing  (std.)  (opm.)  (avg.)  (pk.)   (sus.)

A3xx series   ARM2     8MHz   125ns   450ns   375ns     4.0     7.1     3.4
A3000         ARM2     8MHz   125ns   450ns   375ns     5.6     7.1     4.0
A30x0 series  ARM250  12MHz    83ns   333ns   333ns     7.6    12.5     6.0
A4            ARM3    25MHz    83ns   333ns   333ns    16.0    25.0    12.0
A4xx series   ARM2     8MHz   125ns   450ns   375ns     5.6     7.1     4.0
A540          ARM3    25MHz    83ns   333ns   333ns    16.0    25.0    12.0
A4000         ARM250  12MHz    83ns   333ns   333ns     7.6    12.5     6.0
A5000         ARM3    25MHz    83ns   333ns   333ns    16.0    25.0    12.0
A5000 alpha   ARM3    33MHz    83ns   333ns   333ns    20.0    32.0    15.4
R140          ARM2     8MHz   125ns   450ns   375ns     5.6     7.1     4.0
R2xx          ARM3    25MHz    83ns   333ns   333ns    14.0    25.0    12.0

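* Speeds in millions of instructions per second (MIPS). Column abbreviations: std. = standard, opm. = optimal, avg. = average, pk. = peak, sus. = sustained.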

N.B. 1. Optimal ROM timings quoted for RISC OS 3.10 2Mb ROMs.
     2. Optimal timing for RISC OS 2.00 0.5Mb ROMs is 200ns.
     3. Future ROM chips may also have 166/200/250ns as optimal ROM timing.
     4. Future ARM chips may allow further memory access time optimization,
        at the user's risk.

The speed of instructions varies across the ARM instruction set, as indicated by the following table. However, compared with CISC processors, the ARM is extremely efficient.

Instruction type       Execution time

ALU (except multiply)  1 processor cycle
ALU (PC destination)   See branch
Single load/store      1 processor cycle plus 1 memory cycle
Branch                 1 memory cycle and up to 3 processor cycles
Multiple load/store    1 processor cycle, plus 1 memory cycle per register
Multiply               Up to 17 processor cycles

The ARM Instruction Set

The following table lists the assembler mnemonics of all instructions provided by the ARM2/3 processor.

Arithmetic and logic instructions

ADC ADd with Carry Rd=Rn+Rm+C
ADD ADD (without carry) Rd=Rn+Rm
SBC SuBtract with Carry Rd=Rn-Rm-(1-C)
SUB SUBtract (without carry) Rd=Rn-Rm
RSC Reverse Subtract with Carry Rd=Rm-Rn-(1-C)
RSB Reverse SuBtract (without carry) Rd=Rm-Rn
AND Bitwise AND Rd=Rn AND Rm
BIC BIt Clear Rd=Rn AND NOT Rm
ORR Bitwise OR Rd=Rn OR Rm
EOR Bitwise EOR (XOR) Rd=Rn EOR Rm

MOV MOVe Rd=Rm
MVN MOVe bitwise NOT Rd=NOT Rm
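As a taste of how these mnemonics map onto bits, the sketch below encodes the simplest register form of each instruction. It assumes the standard ARM data-processing layout (condition, 00, immediate flag, 4-bit opcode, S flag, Rn, Rd, operand 2) and the standard opcode numbers; only an unshifted register operand is modelled, and encode_alu is a hypothetical helper name. The files ALU/Instrs and ALU/ShftInstr cover the full format.

```python
# Sketch: encode register-form ALU instructions (no shift, no immediate).
# Opcode numbers are the standard ARM data-processing opcodes.
OPCODES = {
    "AND": 0x0, "EOR": 0x1, "SUB": 0x2, "RSB": 0x3,
    "ADD": 0x4, "ADC": 0x5, "SBC": 0x6, "RSC": 0x7,
    "TST": 0x8, "TEQ": 0x9, "CMP": 0xA, "CMN": 0xB,
    "ORR": 0xC, "MOV": 0xD, "BIC": 0xE, "MVN": 0xF,
}
COND_AL = 0xE  # "always" condition

def encode_alu(op, rd, rn, rm, s=False, cond=COND_AL):
    """cond | 00 | I=0 | opcode | S | Rn | Rd | operand 2 (= Rm, unshifted)."""
    return ((cond << 28) | (OPCODES[op] << 21) | (int(s) << 20)
            | (rn << 16) | (rd << 12) | rm)
```

MOV and MVN ignore Rn (pass 0), while comparisons have no destination (pass Rd = 0) and always set the flags, so call them with s=True: encode_alu("ADD", 0, 1, 2) gives &E0810002 and encode_alu("CMP", 0, 0, 1, s=True) gives &E1500001.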

Comparison instructions

CMP CoMPare Rn-Rm
CMN CoMpare Negative Rn+Rm
TEQ Test EQuivalence Rn EOR Rm
TST TeST bits Rn AND Rm

Multiply instructions

MUL Multiply Rd=Rm*Rs
MLA Multiply and Accumulate Rd=Rm*Rs+Rn
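The two multiply instructions share one format, differing only in an accumulate bit. The sketch below assumes the standard ARM2 layout (condition, 000000, A, S, Rd, Rn, Rs, 1001, Rm) and enforces the ARM2/3 restrictions that Rd must not be R15 and must differ from Rm. The helper name encode_mul is hypothetical.

```python
# Sketch: encode MUL/MLA. Assumed field layout:
# cond | 000000 | A | S | Rd | Rn | Rs | 1001 | Rm
def encode_mul(rd, rm, rs, rn=None, s=False, cond=0xE):
    """MUL Rd,Rm,Rs when rn is None; MLA Rd,Rm,Rs,Rn otherwise."""
    if rd == 15 or rd == rm:
        # On ARM2/3 the result is unpredictable in these cases.
        raise ValueError("Rd must not be R15 or the same as Rm")
    accumulate = rn is not None
    return ((cond << 28) | (int(accumulate) << 21) | (int(s) << 20)
            | (rd << 16) | ((rn or 0) << 12) | (rs << 8) | (0x9 << 4) | rm)
```

So MUL R0,R1,R2 encodes as &E0000291 and MLA R0,R1,R2,R3 as &E0203291.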

Branch instructions

B Branch
BL Branch with Link
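Branch offsets are where the pipeline becomes visible: because of the fetch-decode-execute overlap, R15 reads as the address of the branch plus 8 when the target is calculated. The sketch assumes the standard B/BL layout (condition, 101, link bit, signed 24-bit word offset); encode_branch and branch_target are hypothetical names.

```python
# Sketch: encode and decode B/BL.
# Layout assumed: cond | 101 | L | signed 24-bit word offset.
def encode_branch(addr, target, link=False, cond=0xE):
    delta = target - (addr + 8)      # PC is 8 bytes ahead due to pipelining
    if delta % 4:
        raise ValueError("target must be word aligned")
    offset = delta >> 2              # arithmetic shift keeps the sign
    if not -(1 << 23) <= offset < (1 << 23):
        raise ValueError("target out of range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (offset & 0xFFFFFF)

def branch_target(addr, word):
    offset = word & 0xFFFFFF
    if offset & 0x800000:            # sign-extend the 24-bit field
        offset -= 1 << 24
    return (addr + 8 + (offset << 2)) & 0x03FFFFFC  # 26-bit address space
```

A branch to itself (an infinite loop) at any address encodes as &EAFFFFFE, since the offset is -2 words.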

Register load and save

LDR LoaD Register
STR STore Register
LDM LoaD Multiple registers
STM STore Multiple registers

Software interrupt

SWI Perform SoftWare Interrupt

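The following condition suffixes may be appended to any ARM instruction. In the flag expressions, ~ means NOT, ^ means AND and v means OR; n1 and n2 are the two operands of the preceding comparison (as in CMP n1,n2):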

Flags
 AL  ALways           Always performed (the default)          TRUE
 NV  NeVer            Never use (reserved)                    undef.
 CS  Carry Set        Performed if Carry flag set             C
 CC  Carry Clear      Opposite to CS (Carry not set)          ~C
 EQ  EQual            Performed if Zero flag set       n1= n2 Z
 NE  Not Equal        Opposite to EQ (Zero flag unset) n1<>n2 ~Z
 VS  oVerflow Set     Performed if oVerflow flag set          V
 VC  oVerflow Clear   Opposite to VS (oVerflow unset)         ~V
 MI  MInus            Performed if Negative flag set          N
 PL  PLus             Opposite to MI (Negative unset)         ~N

Cardinal (unsigned)
 HS  Higher or Same   Same as CS (Carry set)           n1>=n2 C
 LO  LOwer            Same as CC (Carry clear)         n1< n2 ~C
 LS  Lower or Same    Performed when less or equal     n1<=n2 ~CvZ
 HI  Higher           Performed when greater than      n1> n2 C^~Z

2's-complement (signed)
 GE  Greater or Equal Performed when greater or equal  n1>=n2 (N^V)v(~N^~V)
 LT  Less Than        Performed when less              n1< n2 (N^~V)v(~N^V)
 LE  Less or Equal    Performed when less or equal     n1<=n2 (N^~V)v(~N^V)vZ
 GT  Greater Than     Performed when greater           n1> n2 ((N^V)v(~N^~V))^~Z
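The flag expressions in the table can be checked mechanically. This sketch computes the N, Z, C and V flags that CMP n1,n2 would set (from n1-n2 on 32-bit values, with C meaning "no borrow" as on the ARM) and then tests a condition against them; cmp_flags and passes are hypothetical names.

```python
# Sketch: flags from CMP n1,n2, and evaluation of the condition suffixes.
def cmp_flags(n1, n2):
    """Return (N, Z, C, V) as set by CMP n1,n2 on 32-bit values."""
    n1 &= 0xFFFFFFFF
    n2 &= 0xFFFFFFFF
    result = (n1 - n2) & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = int(n1 >= n2)  # carry set means no borrow occurred
    # Overflow: operands differ in sign and the result's sign differs from n1.
    v = ((n1 ^ n2) & (n1 ^ result)) >> 31
    return n, z, c, v

def passes(cond, n, z, c, v):
    return bool({
        "EQ": z, "NE": not z, "CS": c, "CC": not c, "HS": c, "LO": not c,
        "MI": n, "PL": not n, "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True, "NV": False,
    }[cond])
```

Comparing &FFFFFFFF with 1 shows why the cardinal and 2's-complement groups both exist: as unsigned values the first is higher (HI passes), but as signed values it is -1, so LT also passes.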

The shift and rotate mnemonics, available as an option on the second operand of all arithmetic and logic (AL) instructions, including the comparisons, are:

LSL Logical Shift Left
ASL Arithmetic Shift Left (same as LSL)
LSR Logical Shift Right
ASR Arithmetic Shift Right (bit 31, the sign bit, is copied into the vacated bits)

ROR ROtate Right
RRX Rotate Right one bit with eXtend (uses carry flag as bit 32)
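The behaviour of each shift, including the carry it produces, can be sketched as below. It models shift amounts 1 to 31 only; the special cases that the encoding gives to an amount of 0 are left to the file ALU/ShftInstr. The function name shift is hypothetical.

```python
# Sketch of the barrel shifter on 32-bit values, returning (result, carry_out).
MASK = 0xFFFFFFFF

def shift(kind, value, amount=1, carry_in=0):
    value &= MASK
    if kind == "RRX":
        # Always a one-bit rotate through the carry flag.
        return (carry_in << 31) | (value >> 1), value & 1
    if not 1 <= amount <= 31:
        raise ValueError("only shift amounts 1-31 are modelled")
    if kind in ("LSL", "ASL"):
        result = (value << amount) & MASK
        carry = (value >> (32 - amount)) & 1
    elif kind == "LSR":
        result = value >> amount
        carry = (value >> (amount - 1)) & 1
    elif kind == "ASR":
        result = value >> amount
        if value >> 31:
            # Copy the sign bit into the vacated top bits.
            result |= (MASK << (32 - amount)) & MASK
        carry = (value >> (amount - 1)) & 1
    elif kind == "ROR":
        result = ((value >> amount) | (value << (32 - amount))) & MASK
        carry = result >> 31
    else:
        raise ValueError("unknown shift " + kind)
    return result, carry
```

This is what makes tricks such as ADD R0,R0,R0,LSL #2 (R0 times 5 in one instruction) possible: with R0 = 7, 7 + shift("LSL", 7, 2)[0] is 35.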

The following table gives the mnemonics for the various addressing modes available as suffixes to LDM and STM instructions:

DA Decrement After each store/load (post-indexed decremental form)
DB Decrement Before each store/load (pre-indexed decremental form)
IA Increment After each store/load (post-indexed incremental form)
IB Increment Before each store/load (pre-indexed incremental form)

EA Empty Ascending stack (i.e. use LDMDB and STMIA)
ED Empty Descending stack (i.e. use LDMIB and STMDA)
FA Full Ascending stack (i.e. use LDMDA and STMIB)
FD Full Descending stack (i.e. use LDMIA and STMDB), as used by RISC OS
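The stack names are only assembler aliases for the four underlying forms. The sketch below resolves them as in the table above and builds the instruction word, assuming the standard layout (condition, 100, P, U, S for ^, W for !, L, Rn, 16-bit register list with bit n selecting Rn). The helper name encode_ldm_stm is hypothetical.

```python
# Sketch: encode LDM/STM, resolving stack suffixes to DA/DB/IA/IB.
# Assumed layout: cond | 100 | P | U | S(^) | W(!) | L | Rn | register list
PU = {"DA": (0, 0), "IA": (0, 1), "DB": (1, 0), "IB": (1, 1)}
STACK = {  # suffix -> (form used by LDM, form used by STM)
    "EA": ("DB", "IA"), "ED": ("IB", "DA"),
    "FA": ("DA", "IB"), "FD": ("IA", "DB"),
}

def encode_ldm_stm(load, mode, rn, regs, writeback=False, hat=False, cond=0xE):
    if mode in STACK:
        mode = STACK[mode][0 if load else 1]
    p, u = PU[mode]
    reglist = 0
    for r in regs:
        reglist |= 1 << r  # bit n set means Rn is transferred
    return ((cond << 28) | (0b100 << 25) | (p << 24) | (u << 23)
            | (int(hat) << 22) | (int(writeback) << 21) | (int(load) << 20)
            | (rn << 16) | reglist)
```

A typical RISC OS subroutine entry and exit, STMFD R13!,{R0-R3,R14} and LDMFD R13!,{R0-R3,PC}, encode as &E92D400F and &E8BD800F.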

The following suffixes may be applied to certain instructions in order to modify their normal operation:

Suffix  Applied to                           Meaning

  !     LDR/STR (address)                    Write back is used
        LDM/STM (address register)           Write back is used
  ^     LDM with R15 (register list)         Force update of PSR
        Other LDM/STM (register list)        Use user bank
  B     LDR/STR (mnemonic)                   Load/store byte value
  P     Comparative instructions (mnemonic)  Copy calculation result to PSR
  S     AL instructions with R15 dest.       Force update of PSR
        AL instructions (mnemonic)           Force update of flags
        MUL/MLA instructions (mnemonic)      Force update of flags
        Comparative instructions (mnemonic)  Force update of flags (implied)
  T     LDR/STR (mnemonic)                   Force address translation
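Several of these suffixes map directly onto bits of the single register transfer instructions. The sketch covers immediate offsets only and assumes the standard layout (condition, 01, I, P, U, B, W, L, Rn, Rd, 12-bit offset); encode_ldr_str is a hypothetical name. In post-indexed form the base is always written back, so the W bit there selects the T suffix instead.

```python
# Sketch: encode immediate-offset LDR/STR.
# Assumed layout: cond | 01 | I=0 | P | U | B | W | L | Rn | Rd | offset12
def encode_ldr_str(load, rd, rn, offset=0, pre=True, writeback=False,
                   byte=False, cond=0xE):
    """pre=True:  [Rn,#offset] (writeback=True gives the ! suffix)
    pre=False: [Rn],#offset (writeback=True here means the T suffix)"""
    if not -4095 <= offset <= 4095:
        raise ValueError("offset must fit in 12 bits")
    u = int(offset >= 0)  # U: add (1) or subtract (0) the offset
    return ((cond << 28) | (0b01 << 26) | (int(pre) << 24) | (u << 23)
            | (int(byte) << 22) | (int(writeback) << 21) | (int(load) << 20)
            | (rn << 16) | (rd << 12) | abs(offset))
```

For example, LDR R0,[R1,#4] is &E5910004, LDRB R0,[R1,#4] is &E5D10004 and STR R0,[R1,#-4]! is &E5210004.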

The Floating Point Coprocessor

In addition to the standard ARM instruction set, there is a group of floating point coprocessor instructions. These instructions are designed to be executed by a coprocessor chip attached to the ARM CPU, such as that provided in the Acorn FPA10 upgrade. Software emulation of the coprocessor is provided by the Acorn FPEmulator module. The following table gives information on the availability and form of the FPA10 upgrade for each machine in the Acorn range.

       FPA10 upgrade
       
A3xx   card
A3000  card
A3010  N/A
A4     card
A4xx   card
A540   chip/card
A4000  N/A
A5000  chip
R140   card
R2xx   chip/card

In the floating point coprocessor there are eight floating point registers designated by F0-F7. There is also a floating point status register whose bits are as follows:

Bits   Usage

31-21  Unused
20     INX interrupt mask
19     UFL interrupt mask
18     OFL interrupt mask
17     DVZ interrupt mask
16     IVO interrupt mask
15-05  Unused
04     INX cumulative flag
03     UFL cumulative flag
02     OFL cumulative flag
01     DVZ cumulative flag
00     IVO cumulative flag

The bottom five flag bits indicate the following exceptions:

IVO InValid Operation
DVZ DiVision by Zero
OFL OverFLow
UFL UnderFLow
INX INeXact value obtained due to rounding

The interrupt mask bits, when set, cause the corresponding exception to generate a fatal error instead of only setting its cumulative flag.
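Using the bit positions from the status register table above, a value read with RFS can be decoded as in this sketch; decode_fpsr is a hypothetical name. The same exception order applies to the masks (bits 16-20) and the cumulative flags (bits 0-4).

```python
# Sketch: list enabled exception traps and raised cumulative flags.
EXCEPTIONS = ["IVO", "DVZ", "OFL", "UFL", "INX"]  # flag bits 0-4, masks 16-20

def decode_fpsr(fpsr):
    enabled = [name for i, name in enumerate(EXCEPTIONS) if (fpsr >> (16 + i)) & 1]
    raised = [name for i, name in enumerate(EXCEPTIONS) if (fpsr >> i) & 1]
    return enabled, raised
```

A status value of &00070002 means that invalid operation, division by zero and overflow are trapped, and that a division by zero has occurred since the flags were last cleared.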

The Floating Point Instruction Set

The following table lists the assembler mnemonics of the instructions provided by the floating point coprocessor (or suitable emulation).

Data transfer instructions

LDF LoaD Floating point register
STF STore Floating point register

Register transfer instructions

FLT FLoat ARM register into an FP register
FIX FIX FP register into an ARM register
WFS Write Floating Status flags from an ARM register
RFS Read Floating Status flags to an ARM register
WFC Write Floating interrupt mask
RFC Read Floating interrupt mask

Comparison operations

CMF CoMpare Floating point numbers
CNF Compare (second argument Negated) Floating point numbers
CMFE CoMpare Floating point numbers and generate Error if unordered
CNFE Compare Negated Floating numbers and generate Error if unordered

Binary operations

ADF ADd Floating point registers Fd=Fn+Fm
MUF MUltiply Floating point registers Fd=Fn*Fm
SUF SUbtract Floating point registers Fd=Fn-Fm
RSF Reverse Subtract Floating registers Fd=Fm-Fn
DVF DiVide Floating point registers Fd=Fn/Fm
RDF Reverse Divide Floating registers Fd=Fm/Fn
POW POWer Fd=Fn^Fm
RPW Reverse POWer Fd=Fm^Fn
RMF ReMainder of Floating division Fd=remainder of Fn/Fm
FML Fast MuLtiply (Single precision) Fd=Fn*Fm
FDV Fast DiVide (Single precision) Fd=Fn/Fm
FRD Fast Reverse Divide (Single) Fd=Fm/Fn
POL POLar angle conversion Fd=ATN(Fn/Fm)

Unary operations

MVF MoVe Floating point register Fd=Fm
MNF Move Negated Floating register Fd=-Fm
ABS ABSolute value Fd=ABS(Fm)
RND RouND to integer Fd=INT(Fm)
SQT SQuare Root Fd=SQR(Fm)
LOG LOGarithm to the base 10 Fd=LOG(Fm)
LGN LoGarithm to the base e (Natural) Fd=LN(Fm)
EXP EXPonent of the base e (Natural) Fd=e^Fm
SIN SINe Fd=SIN(Fm)
COS COSine Fd=COS(Fm)
TAN TANgent Fd=TAN(Fm)
ASN Arc SiNe (Inverse sine) Fd=ASN(Fm)
ACS Arc CoSine (Inverse cosine) Fd=ACS(Fm)
ATN Arc TaNgent (Inverse tangent) Fd=ATN(Fm)

Each of the instructions listed above can be suffixed with any of the ARM's 16 condition codes. In addition, a precision suffix needs to be specified. The precision suffixes are:

                              Mantissa   Exponent

S  Single precision           23 bits     8 bits
D  Double precision           52 bits    11 bits
E  Double Extended precision  64 bits    15 bits
P  Packed BCD storage format  19 digits   4 digits
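The S and D field widths can be checked against any IEEE 754 implementation. The sketch below uses Python's struct module to split a value into sign, exponent and mantissa; it shows the field widths only, not the word order in which the FPA stores values in memory, and E and P are omitted because Python has no native type for them. The function name fields is hypothetical.

```python
import struct

def fields(value, precision):
    """Split value into (sign, exponent, mantissa) for S or D precision."""
    if precision == "S":   # 1 + 8 + 23 bits
        bits = struct.unpack(">I", struct.pack(">f", value))[0]
        return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if precision == "D":   # 1 + 11 + 52 bits
        bits = struct.unpack(">Q", struct.pack(">d", value))[0]
        return bits >> 63, (bits >> 52) & 0x7FF, bits & 0xFFFFFFFFFFFFF
    raise ValueError("only S and D are modelled here")
```

For instance, 1.5 in single precision has a biased exponent of 127 and a mantissa of &400000 (the single fraction bit for 0.5).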

A rounding suffix may also be added. These are:

P  Round up
M  Round down
Z  Round to zero

If no rounding suffix is specified then the number will be rounded to the nearest available in the current precision.
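The direction of each rounding suffix can be illustrated with Python's decimal module, used here purely as an analogy for what an instruction such as RND does in binary. The default mapping assumes IEEE 754 round-to-nearest with ties to even; round_to_int is a hypothetical name.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN, ROUND_HALF_EVEN

# Rounding suffix -> decimal rounding mode (analogy only)
MODES = {
    "P": ROUND_CEILING,    # round up, towards +infinity
    "M": ROUND_FLOOR,      # round down, towards -infinity
    "Z": ROUND_DOWN,       # round towards zero
    "":  ROUND_HALF_EVEN,  # no suffix: nearest (ties to even assumed)
}

def round_to_int(text, suffix=""):
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[suffix]))
```

Negative values show the difference between M and Z most clearly: -1.7 rounds to -2 with M but to -1 with Z.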

Breakdown of an Encoded ARM Instruction

All encoded ARM instructions occupy one 32-bit word. The common elements which constitute all instructions are given below.

Bits   Size    Usage

31-28  Nybble  Condition code (Condition)
27-24  Nybble  Base instruction code (Base)
00-23  24 bits Depends on base instruction code

Learning about ARM Instruction Encoding

The file Base is a good starting point for those interested in exploring the encoding of ARM instructions. This file refers to other files which give more specific information on particular instruction groups. The encoding of condition codes is explained in the file Condition.txt.

Encoding an ARM Instruction

The first step to encoding an ARM instruction is to specify the condition code. The condition code is stored in the high nybble, and a list of condition codes is given in the file Condition.
Next, it is necessary to obtain the base instruction code, stored in the seventh nybble (bits 27-24). These are listed in the file Base.
After selecting the instruction code, examine the file associated with the instruction (given in parentheses in the table of base instruction codes). This file will explain the various elements of the bottom six nybbles (bits 23-0) of the instruction.
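Once the three parts are known, assembling the word is a matter of shifting them into place. This sketch assumes the standard values AL = &E for the condition and &F for the SWI base code (the files Condition and Base give the full lists); assemble is a hypothetical name.

```python
def assemble(cond, base, low24):
    """Combine condition, base code and the bottom 24 bits into one word."""
    if not (0 <= cond <= 0xF and 0 <= base <= 0xF and 0 <= low24 <= 0xFFFFFF):
        raise ValueError("field out of range")
    return (cond << 28) | (base << 24) | low24
```

SWI &51 with the default AL condition therefore assembles to &EF000051, the SWI number occupying the bottom 24 bits.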

Decoding an ARM Instruction

The first step to decoding an ARM instruction is to discover the condition code. The condition code is stored in the high nybble, and a list of condition codes is given in the file Condition.
Next, it is necessary to find the base instruction code, stored in the seventh nybble (bits 27-24). These are listed in the file Base.
After selecting the instruction code, examine the file associated with the instruction (given in parentheses in the table of base instruction codes). This file will explain the various elements of the bottom six nybbles (bits 23-0) of the instruction.
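The first two decoding steps can be sketched as follows. The condition list assumes the standard 4-bit encoding order (EQ = 0 through NV = &F), which the file Condition sets out; split is a hypothetical name.

```python
# Condition mnemonics indexed by their 4-bit encoding (assumed standard order).
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]

def split(word):
    """Return (condition mnemonic, base nybble, bottom 24 bits)."""
    word &= 0xFFFFFFFF
    return CONDITIONS[word >> 28], (word >> 24) & 0xF, word & 0xFFFFFF
```

For &E0810002 (ADD R0,R1,R2) this yields AL, base code 0 and &810002; the bottom 24 bits are then interpreted using the file named for base code 0.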

This text may be freely distributed, provided that it is unmodified, it is not offered for sale at any price, published in a periodical or book or offered on a diffusion service without the author's written consent.

Kade Hansson retains copyright in all parts of this guide.

Maintained by Kade Hansson; e-mail: archer@kasoft.info; Updated on 23 February 2014.