ARM Instruction Set: Overview

Introduction to this Guide

The purpose of this guide is to provide you with the necessary information to allow you to:
Because ARM assembler is often explained without any reference to the encoding of ARM instructions, I felt it necessary to create this guide. In this overview, I also explain the features of the ARM processor in general terms, and their implications for the processor's performance and efficiency.

Kade Hansson

Note About the Acronym ARM

In this guide, ARM stands for Advanced/Acorn RISC Machine/Microprocessor. For example, ARM2 stands for Acorn RISC Microprocessor (revision 2), because this processor was designed for Acorn. However, ARM610 stands for Advanced RISC Microprocessor (revision 6.10), as it was designed for Advanced RISC Machines (ARM) Limited and is used in the Apple Newton computer. Acorn also use the ARM acronym to stand for Acorn RISC Machine.

Format of this Guide

This guide is divided into separate text/HTML files, each of which contains information on a particular aspect of the encoding of the ARM instruction set.
Where possible, a reference to a section of text which expands on information given will appear in parentheses after that information, e.g.

0   MUL (MUL/MLA)
1   MLA (MUL/MLA)

This file gives an overview of the ARM processor, its characteristics, its features and, most of all, its instruction set.

Introduction to the ARM

The ARM family of chips are arguably the most efficient microprocessors available at the present time (Jan 1994). They are the first reduced instruction set microprocessors to be used in a microcomputer. Their most notable application to date is their use in innovative RISC microcomputer technology, such as the Acorn Archimedes series, Acorn UNIX workstations and the Apple Newton.

This guide will deal with the ARM chip set currently used by Acorn in RISC OS computers, although much of the information given will have widespread applicability across all ARM incarnations.

The latest incarnations of the ARM belong to the ARM7 series. However, because Acorn have not yet incorporated this new technology into the present series of RISC OS machines, we will limit this description of the ARM instruction set to the ARM2, ARM250 and ARM3 incarnations. For the sake of compatibility, we will only discuss instructions common to all these ARM processors.

N.B. The ARM1 is now considered to be obsolete. The only major difference between ARM1 and ARM2 is the lack of MUL and MLA instructions on the ARM1.

Aliases for the ARM

ARM chips all have special names which are used by Very Large Scale Integration Technology Incorporated (VLSI) to refer to them. VLSI currently manufacture all ARM processors.

Common name     Special name    Usage
ARM1            VL86C00?        Archimedes prototypes
ARM2            VL86C010        Acorn A3xx, A4xx, A3000, R140
ARM250          VL86C0?? (?)    Acorn A30x0, A4000
ARM3            VL86C020        Acorn A540, A5000, A4, R2xx
ARM610          VL86C051 (?)    Apple Newton
ARM7            VL86C06? (?)    Next generation Acorn machines?
MEMC1           VL86C110        Acorn A3xx
MEMC1a          VL86C110        A4xx, A3000, A30x0, A4000, A5000, A4, R140, R2xx
VIDC10/VIDC1a   VL86C310        All Acorn machines up until the A5000 alpha
VIDC20          VL86C3?? (?)    Next generation Acorn machines?
IOC1            VL86C410        All Acorn machines up until the A5000 alpha

Word and instruction size

The ARM2/3 is a 32-bit RISC processor with a 32-bit data bus and a 26-bit address bus. The memory architecture is based around a 32-bit word, although byte access is permitted through the use of the LDRB and STRB instructions.

ARM instructions are encoded into 32 bits, which contain the base instruction opcode, the condition code, any barrel shifter subinstruction opcodes, all register specifications, any address offsets, all immediate constants and any other data, such as SWI numbers.

Features

The ARM processor incorporates many innovative features which add greatly to its performance and functionality. Features such as multiple register load and store instructions, simultaneous execution, decoding and fetching of successive instructions, a compact and efficient instruction set, and a powerful barrel shifter ensure its efficiency. A versatile memory controller provides features which allow paged memory mapping to be employed, and a fully programmable video controller provides access to high quality audio and video facilities. Also, an impressive range of functions is provided by the input-output controller. Privileged processor modes with private register banks also improve efficiency and utility. Additionally, a floating point coprocessor is supported.

The compact and efficient RISC instruction set offers many advantages over the more traditional CISC instruction set. Due to the reduced number of instructions and the simpler encoding techniques used to generate them, they typically execute many times faster than their CISC equivalents. A CISC processor spends most of its time executing a small, simple subset of its instructions.
As a result, the speed advantages of the rarely used complex instructions are outweighed by the speed penalty which an overly complicated processor design imposes on the essential instructions. RISC processors also use much less power than their CISC competitors, and are cheaper to produce.

The ARM has 27 32-bit registers, 16 of which are available at any one time. The program counter (and program status register, R15) and the link register (R14) are the only two registers which are bound by the processor hardware to specific purposes, and all other registers may be freely used. The high number of available registers means that the ARM is very well suited to complex tasks, and is often able to perform them without accessing memory as often as other processors. This is a particularly important consideration when the memory is clocked slower than the CPU, as it is in most ARM-based machines.

Pipelining is the process which allows the ARM to decode the next instruction whilst the current one is being executed and a third instruction is being fetched from memory. This process improves raw processor performance by a factor of up to three within a sequential body of code. It also improves the utilization of processor resources (reducing power consumption), as without pipelining the circuitry which decodes an instruction would sit idle while execution is taking place.

Multiple load and store instructions allow very fast memory block transfers and efficient register stacking during subroutine calls. In combination with a callee register saving protocol, these instructions offer significant speed improvements over more traditional programming techniques.

A barrel shifter is located on one of the inputs to the ALU of the ARM, and provides the option of shift and rotate operations for all arithmetic and logic instructions. This allows various optimizations to code which makes frequent use of shift operations, optimizations which would not be possible on many other processors.
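
As a rough illustration of the barrel shifter idea, the following Python sketch models 32-bit shift and rotate operations applied to the second operand of an addition. The function names (lsl, lsr, asr, ror, add_shifted) are made up for this example, shift amounts are assumed to lie between 0 and 31, and the model ignores the flags and the carry-out. It is only meant to suggest why a shift folded into an arithmetic instruction can stand in for a separate multiply.

```python
MASK32 = 0xFFFFFFFF  # keep every result within one 32-bit word


def lsl(value, amount):
    """Logical shift left: bits shifted out of the top are lost."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros are shifted in at the top."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        fill = (MASK32 << (32 - amount)) & MASK32  # ones for the vacated bits
        return (value >> amount) | fill
    return value >> amount


def ror(value, amount):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add_shifted(rn, rm, shift, amount):
    """Hypothetical model of an ADD whose second operand passes through the shifter."""
    return (rn + shift(rm, amount)) & MASK32


# Multiply by five in a single step: R1 + (R1 shifted left by 2)
print(add_shifted(7, 7, lsl, 2))
```

In this model, multiplying by five costs one addition because the shift is applied on the way into the adder rather than as a separate instruction.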
In addition, the ARM3 chip has a 4 kilobyte fast memory cache, which reduces access times to frequently used memory locations. The caching techniques also dramatically improve loop timings, as the instructions in the loop are often copied into the cache, and so can be fetched much more quickly.

Addressing System

The addressing scheme is managed by the memory controller (MEMC), a separate chip which is essentially an integral part of the ARM CPU. MEMC also communicates with the input-output controller (IOC) and the video controller (VIDC), which make up the remainder of the ARM chip set. In this discussion we are not concerned with the functions of either IOC or VIDC.

MEMC1a (as used with most ARM2/3 machines) can address up to 64 megabytes of memory (16 million words). However, limitations currently imposed by RISC OS and the architecture of memory devices reduce the amount of physical memory that can actually be installed.

The following table of maximum physical memory sizes takes into account current limitations of standard Acorn machine hardware:

          Max. RAM    Max. ROM
A3xx      8MB         2MB
A4        4MB         2MB
A4xx      8MB         2MB
A540      16MB        2MB
A3000     4MB         2MB
A30x0     4MB         2MB
A4000     4MB         2MB
A5000     8MB         2MB
R140      4MB         2MB
R2xx      16MB        2MB

These hardware limitations may be eliminated by future hardware upgrades (such as slave MEMC chips).

The software limitations imposed by RISC OS also limit the amount of physical memory which is supported. At the present moment, an absolute 16MB limit on physical RAM exists, due to RISC OS making intelligent use of the MEMC's logical to physical address translation mechanisms.

Physical memory is mapped into the present RISC OS memory map from address &2000000 onwards. The mappings from &3000000 onward include ROM, the input-output controllers, the video controller, the DMA address generators and the logical to physical address translator. These locations, with the exception of those mapped onto ROM, are only accessed by low level routines within RISC OS.
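
The address ranges quoted above can be summarised in a small sketch. The following Python function sorts a 26-bit address into three regions using only the boundaries given in this section (&2000000 and &3000000 within a 64 megabyte address space). The function name classify_address and the region labels are invented for this illustration; the region below &2000000 is the one RISC OS uses for logical memory.

```python
ADDRESS_SPACE = 1 << 26      # 26-bit address bus: 64 megabytes
PHYSICAL_BASE = 0x2000000    # &2000000: start of the physical memory mapping
UPPER_BASE = 0x3000000       # &3000000: ROM, controllers, address translator


def classify_address(address):
    """Return an illustrative label for the region holding this address."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address does not fit in 26 bits")
    if address < PHYSICAL_BASE:
        return "logical memory"
    if address < UPPER_BASE:
        return "physical memory"
    return "ROM and controllers"


# Sizes implied by the boundaries above
print(ADDRESS_SPACE // 4)           # number of 32-bit words in the address space
print(UPPER_BASE - PHYSICAL_BASE)   # bytes available to the physical mapping
```

The window between the two boundaries is 16 megabytes wide, which is at least consistent with the 16MB limit on physical RAM mentioned above.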
Below &2000000 on the memory map lies the logical memory. The physical memory is rarely addressed directly, and it is by addressing the logical memory that RISC OS operates.

To set up the logical memory, RISC OS divides physical memory into pages. It can then map these pages in any order within the logical memory slot. The size of each of these pages is typically 8, 16 or 32K. Which of these page sizes is chosen currently depends on the total amount of RAM available, although this may change. To read the page size in use, use the SWI OS_ReadMemMapInfo (SWI &51).

Dividing memory into pages which can be shuffled into any order is a powerful feature of MEMC as used under RISC OS. Shuffling of pages in the logical slot is achieved by manipulating 128 logical to physical memory mapping descriptors. These descriptors are held in content-addressable memory inside MEMC, and can be accessed quickly, maintaining short memory access times.

As an additional feature of the memory mapping scheme, three levels of memory protection are supported:

Protection level    Privileges
Supervisor          Privileged access to all memory
Operating System    Privileged access to logical memory (not used by RISC OS)
User                Access to unprotected pages of logical memory and read cycles to addresses mapped onto the ROM

Exceptions are generated if an illegal memory access is attempted.

Processor Modes

There are four processor modes available when using ARM2/3:

Mode            Mnemonic  Normally entered             Private registers
User            USR       by default                   none (access user bank)
Supervisor      SVC       upon software interrupt      R13_svc and R14_svc
Interrupt       IRQ       upon interrupt request       R13_irq and R14_irq
Fast Interrupt  FIQ       upon fast interrupt request  R8_fiq to R14_fiq

The SVC, IRQ and FIQ modes are privileged, and provide more control over the system. These three modes also have their own private registers, which reduce time overheads (due to stacking registers) when interrupts are dealt with.
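
The register banking described in the table above can be sketched as a simple lookup. This Python fragment is only an illustrative model: the names PRIVATE_REGISTERS, physical_register and total_registers are invented here, and the register labels follow the R13_svc style used in the table.

```python
# Private registers for each mode, taken from the processor modes table.
PRIVATE_REGISTERS = {
    "USR": range(0, 0),    # none: user mode sees only the user bank
    "SVC": range(13, 15),  # R13_svc and R14_svc
    "IRQ": range(13, 15),  # R13_irq and R14_irq
    "FIQ": range(8, 15),   # R8_fiq to R14_fiq
}


def physical_register(mode, number):
    """Name the register actually accessed when 'mode' refers to R<number>."""
    if mode not in PRIVATE_REGISTERS:
        raise ValueError("unknown mode: " + mode)
    if not 0 <= number <= 15:
        raise ValueError("register number must be 0 to 15")
    if number in PRIVATE_REGISTERS[mode]:
        return "R%d_%s" % (number, mode.lower())
    return "R%d" % number  # shared with the user bank


def total_registers():
    """Count the 16 user-visible registers plus every private one."""
    return 16 + sum(len(bank) for bank in PRIVATE_REGISTERS.values())


print(physical_register("FIQ", 10))
print(total_registers())
```

Counting the banks this way gives 16 + 2 + 2 + 7 registers, which agrees with the figure of 27 quoted in the Features section.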
Performance

The performance of each Acorn RISC computer is given in the following table. Some of these figures have been estimated, and should not be relied on.

              ARM     ARM     RAM     ROM     ROM     Speed*  Speed*  Speed*
              CPU     Clock   Timing  (std.)  (opm.)  (avg.)  (pk.)   (sus.)
A3xx series   ARM2    8MHz    125ns   450ns   375ns   4.0     7.1     3.4
A3000         ARM2    8MHz    125ns   450ns   375ns   5.6     7.1     4.0
A30x0 series  ARM250  12MHz   83ns    333ns   333ns   7.6     12.5    6.0
A4            ARM3    25MHz   83ns    333ns   333ns   16.0    25.0    12.0
A4xx series   ARM2    8MHz    125ns   450ns   375ns   5.6     7.1     4.0
A540          ARM3    25MHz   83ns    333ns   333ns   16.0    25.0    12.0
A4000         ARM250  12MHz   83ns    333ns   333ns   7.6     12.5    6.0
A5000         ARM3    25MHz   83ns    333ns   333ns   16.0    25.0    12.0
A5000 alpha   ARM3    33MHz   83ns    333ns   333ns   20.0    32.0    15.4
R140          ARM2    8MHz    125ns   450ns   375ns   5.6     7.1     4.0
R2xx          ARM3    25MHz   83ns    333ns   333ns   14.0    25.0    12.0

* Speeds in millions of instructions per second (MIPS)

N.B.
1. Optimal ROM timings quoted for RISC OS 3.10 2MB ROMs.
2. Optimal timing for RISC OS 2.00 0.5MB ROMs is 200ns.
3. Future ROM chips may also have 166/200/250ns as optimal ROM timing.
4. Future ARM chips may allow further memory access time optimization, at the user's risk.

The speed of instructions varies across the ARM instruction set, as indicated by the following table. However, compared with CISC processors, the ARM is extremely efficient.

Instruction type         Execution time
ALU (except multiply)    1 processor cycle
ALU (PC destination)     See branch
Single load/store        1 processor cycle plus 1 memory cycle
Branch                   1 memory cycle and up to 3 processor cycles
Multiple load/store      1 processor cycle, plus 1 memory cycle per register
Multiply                 Up to 17 processor cycles

The ARM Instruction Set

The following table lists the assembler mnemonics of all instructions provided by the ARM2/3 processor.
Arithmetic and logic instructions

ADC   ADd with Carry                      Rd=Rn+Rm+C
ADD   ADD (without carry)                 Rd=Rn+Rm
SBC   SuBtract with Carry                 Rd=Rn-Rm-(1-C)
SUB   SUBtract (without carry)            Rd=Rn-Rm
RSC   Reverse Subtract with Carry         Rd=Rm-Rn-(1-C)
RSB   Reverse SuBtract (without carry)    Rd=Rm-Rn
AND   Bitwise AND                         Rd=Rn AND Rm
BIC   BIt Clear (bitwise AND NOT)         Rd=Rn AND NOT Rm
ORR   Bitwise OR                          Rd=Rn OR Rm
EOR   Bitwise EOR (XOR)                   Rd=Rn EOR Rm
MOV   MOVe                                Rd=Rm
MVN   MOVe bitwise NOT                    Rd=NOT Rm

Comparison instructions

CMP   CoMPare                             Rn-Rm
CMN   CoMpare Negative                    Rn+Rm
TEQ   Test EQuivalence                    Rn EOR Rm
TST   TeST bits                           Rn AND Rm

Multiply instructions

MUL   Multiply                            Rd=Rm*Rs
MLA   Multiply and Accumulate             Rd=Rm*Rs+Rn

Branch instructions

B     Branch
BL    Branch with Link

Register load and save

LDR   LoaD Register
STR   STore Register
LDM   LoaD Multiple registers
STM   STore Multiple registers

Software interrupt

SWI   Perform SoftWare Interrupt

The following condition suffixes may be appended to any ARM instruction:

Suffix  Meaning           Description                       Relation  Flags
AL      ALways            Always performed (the default)              TRUE
NV      NeVer             Never use (reserved)                        undef.
CS      Carry Set         Performed if Carry flag set                 C
CC      Carry Clear       Opposite to CS (Carry not set)              ~C
EQ      EQual             Performed if Zero flag set        n1= n2    Z
NE      Not Equal         Opposite to EQ (Zero flag unset)  n1<>n2    ~Z
VS      oVerflow Set      Performed if oVerflow flag set              V
VC      oVerflow Clear    Opposite to VS (oVerflow unset)             ~V
MI      MInus             Performed if Negative flag set              N
PL      PLus              Opposite to MI (Negative unset)             ~N

Cardinal (unsigned) comparisons
HS      Higher or Same    Same as CS (Carry set)            n1>=n2    C
LO      LOwer             Same as CC (Carry clear)          n1< n2    ~C
LS      Lower or Same     Performed when less or equal      n1<=n2    ~CvZ
HI      Higher            Performed when greater than       n1> n2    C^~Z

2's-complement (signed) comparisons
GE      Greater or Equal  Performed when greater or equal   n1>=n2    (N^V)v(~N^~V)
LT      Less Than         Performed when less               n1< n2    (N^~V)v(~N^V)
LE      Less or Equal     Performed when less or equal      n1<=n2    (N^~V)v(~N^V)vZ
GT      Greater Than      Performed when greater            n1> n2    ((N^V)v(~N^~V))^~Z

In the Flags column, ~ means NOT, ^ means AND and v means OR.

The shift and rotate mnemonics, available as an option on all arithmetic and logic (AL) instructions, are:

LSL   Logical Shift Left

DA    Decrement After each store/load (post-indexed decremental form)

The following suffixes may be applied to certain instructions in order to modify their normal operation:

Suffix  Applied to                             Meaning
!       LDR/STR (address)                      Write back is used
        LDM/STM (address register)             Write back is used
^       LDM with R15 (register list)           Force update of PSR
        Other LDM/STM (register list)          Use user bank
B       LDR/STR (mnemonic)                     Load/store byte value
P       Comparative instructions (mnemonic)    Copy calculation result to PSR
S       AL instructions with R15 dest.         Force update of PSR
        AL instructions (mnemonic)             Force update of flags
        MUL/MLA instructions (mnemonic)        Force update of flags
        Comparative instructions (mnemonic)    Force update of flags (implied)
T       LDR/STR (mnemonic)                     Force address translation

The Floating Point Coprocessor

In addition to the standard ARM instruction set, there is a group of floating point coprocessor instructions.
These instructions are designed to be executed by a coprocessor chip attached to the ARM CPU, such as that provided in the Acorn FPA10 upgrade. Software emulation of the coprocessor is provided by the Acorn FPEmulator module.

The following table gives information on the availability and form of the FPA10 upgrade for each machine in the Acorn range.

          FPA10 upgrade
A3xx      card
A3000     card
A3010     N/A
A4        card
A4xx      card
A540      chip/card
A4000     N/A
A5000     chip
R140      card
R2xx      chip/card

In the floating point coprocessor there are eight floating point registers designated by F0-F7. There is also a floating point status register whose bits are as follows:

Bits    Usage
31-21   Unused
20      INX interrupt mask
19      UFL interrupt mask
18      OFL interrupt mask
17      DVZ interrupt mask
16      IVO interrupt mask
15-05   Unused
04      INX cumulative flag
03      UFL cumulative flag
02      OFL cumulative flag
01      DVZ cumulative flag
00      IVO cumulative flag

The bottom five flag bits indicate the following exceptions:

IVO   InValid Operation

The interrupt mask bits, when set, cause exceptions to generate fatal errors.

The Floating Point Instruction Set

The following table lists the assembler mnemonics of the instructions provided by the floating point coprocessor (or suitable emulation).
Data transfer instructions

LDF   LoaD Floating point register
STF   STore Floating point register

Register transfer instructions

FLT   FLoat ARM register into an FP register
FIX   FIX FP register into an ARM register
WFS   Write Floating Status flags from an ARM register
RFS   Read Floating Status flags to an ARM register
WFC   Write Floating interrupt mask
RFC   Read Floating interrupt mask

Comparison operations

CMF   CoMpare Floating point numbers
CNF   Compare (second argument Negated) Floating point numbers
CMFE  CoMpare Floating point numbers and generate Error if unordered
CNFE  Compare Negated Floating numbers and generate Error if unordered

Binary operations

ADF   ADd Floating point registers           Fd=Fn+Fm
MUF   MUltiply Floating point registers      Fd=Fn*Fm
SUF   SUbtract Floating point registers      Fd=Fn-Fm
RSF   Reverse Subtract Floating registers    Fd=Fm-Fn
DVF   DiVide Floating point registers        Fd=Fn/Fm
RDF   Reverse Divide Floating registers      Fd=Fm/Fn
POW   POWer                                  Fd=Fn^Fm
RPW   Reverse POWer                          Fd=Fm^Fn
RMF   ReMainder of Floating division         Fd=remainder of Fn/Fm
FML   Fast MuLtiply (Single precision)       Fd=Fn*Fm
FDV   Fast DiVide (Single precision)         Fd=Fn/Fm
FRD   Fast Reverse Divide (Single)           Fd=Fm/Fn
POL   POLar angle conversion                 Fd=ATN(Fn/Fm)

Unary operations

MVF   MoVe Floating point register           Fd=Fm
MNF   Move Negated Floating register         Fd=-Fm
ABS   ABSolute value                         Fd=ABS(Fm)
RND   RouND to integer                       Fd=INT(Fm)
SQT   SQuare Root                            Fd=SQR(Fm)
LOG   LOGarithm to the base 10               Fd=LOG(Fm)
LGN   LoGarithm to the base e (Natural)      Fd=LN(Fm)
EXP   EXPonent of the base e (Natural)       Fd=e^Fm
SIN   SINe                                   Fd=SIN(Fm)
COS   COSine                                 Fd=COS(Fm)
TAN   TANgent                                Fd=TAN(Fm)
ASN   Arc SiNe (Inverse sine)                Fd=ASN(Fm)
ACS   Arc CoSine (Inverse cosine)            Fd=ACS(Fm)
ATN   Arc TaNgent (Inverse tangent)          Fd=ATN(Fm)

Each of the instructions listed above can be suffixed with any of the ARM's 16 condition codes. In addition, a precision suffix needs to be specified.
The precision suffixes are:

                                   Mantissa     Exponent
S     Single precision             23 bits      8 bits
D     Double precision             52 bits      11 bits
E     Double Extended precision    64 bits      15 bits
P     Packed BCD storage format    19 digits    4 digits

A rounding suffix may also be added. These are:

P     Round up
M     Round down
Z     Round to zero

If no rounding suffix is specified then the number will be rounded to the nearest available in the current precision.

Breakdown of an Encoded ARM Instruction

All encoded ARM instructions occupy one 32-bit word. The common elements which constitute all instructions are given below.

Bits    Size      Usage
31-28   Nybble    Condition code (Condition)
27-24   Nybble    Base instruction code (Base)
23-00   24 bits   Depends on base instruction code

Learning about ARM Instruction Encoding

The file Base is a good starting point for those interested in exploring the encoding of ARM instructions. This file refers to other files which give more specific information on particular instruction groups. The encoding of condition codes is explained in the file Condition.txt.

Encoding an ARM Instruction
Decoding an ARM Instruction
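
The three-field breakdown given under "Breakdown of an Encoded ARM Instruction" can be tried out with a few lines of Python. This is only a minimal sketch: the function names split_instruction and join_instruction and the dictionary keys are invented for the example, the word 0xE1A00001 is simply a sample 32-bit value, and the meaning of each field is left to the Condition and Base files.

```python
def split_instruction(word):
    """Split a 32-bit word into the three common fields of an ARM instruction."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("an encoded instruction occupies exactly one 32-bit word")
    return {
        "condition": (word >> 28) & 0xF,  # bits 31-28
        "base": (word >> 24) & 0xF,       # bits 27-24
        "rest": word & 0xFFFFFF,          # bits 23-00, meaning depends on base
    }


def join_instruction(condition, base, rest):
    """Reassemble a 32-bit word from the three fields."""
    if not (0 <= condition <= 0xF and 0 <= base <= 0xF and 0 <= rest <= 0xFFFFFF):
        raise ValueError("field out of range")
    return (condition << 28) | (base << 24) | rest


fields = split_instruction(0xE1A00001)
print(fields)
print(hex(join_instruction(**fields)))
```

Splitting and rejoining should always give back the original word, which makes a handy check when experimenting with hand-assembled encodings.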
Text © 1994 Kade Hansson, HTML © 2014 Kade Hansson
This text may be freely distributed, provided that it is unmodified, it is not offered for sale at any price, published in a periodical or book or offered on a diffusion service without the author's written consent.
Kade Hansson retains copyright in all parts of this guide.
Maintained by Kade Hansson; e-mail: archer@kasoft.info; Updated on 23 February 2014; XHTML 1.0 Transitional.