Assembly-Guidebook

Assembly Language

An assembly (or assembler) language, often abbreviated asm, is a type of low-level programming language for a computer, or other programmable device, in which there is a very strong (but often not one-to-one) correspondence between the language and the architecture’s machine code instructions.
Each assembly language is specific to a particular computer architecture. In contrast, most high-level programming languages are generally portable across multiple architectures but require interpreting or compiling.
This Guidebook covers assembly language programming for the x86 architecture, specifically the 8086 microprocessor.

Is it easy to learn?

ABSOLUTELY.

Prerequisite Knowledge
Registers
The MOV Instruction
Variables, Arrays and constants
Some Important Instructions
Addressing Modes Of 8086
Sample Programs

Prerequisite knowledge

The 8086 is a 16-bit microprocessor. It has 20 address lines (16 of them multiplexed with the data lines), so it can address a maximum of 2^20 = 1,048,576 (about 1 million) memory locations. Memory is byte-addressable: every byte has its own address.

Inside the CPU

Registers

Registers are small, fast storage locations inside the CPU that hold data. The 8086 microprocessor has a total of 14 registers that are accessible to the programmer.
8 of these registers are known as general purpose registers, i.e. they can be used by a programmer for data manipulation.
Each of these registers is 16 bits long, i.e. it can hold a 16-bit binary number. The first 4 are referred to as data registers: AX, BX, CX and DX. The second 4 are referred to as index/pointer registers: SP, BP, SI and DI.

General purpose registers

The 8086 CPU has 8 general purpose registers. General purpose registers are available to store any transient data required by the program.
For example, a program can keep intermediate results, loop counters or memory addresses in these registers rather than in slower main memory. Note that when a program is interrupted, the 8086 itself saves only FLAGS, CS and IP on the stack; an interrupt handler must save (usually with PUSH) any general purpose registers it changes. In general, the more registers a CPU has available, the fewer memory accesses a program needs and the faster it can work.
Each register has its own name:

AX – The accumulator register of 16 bits (divided into two 8-bit halves, AH and AL)
BX – The base address register of 16 bits (divided into BH and BL)
CX – The counter register of 16 bits (divided into CH and CL)
DX – The data register of 16 bits (divided into DH and DL)
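The split into high and low halves is easiest to see with numbers. The following is a minimal Python model, not 8086 code; the helper names `split_register` and `join_register` are hypothetical. It shows AX and its halves, and the same arithmetic applies to BX, CX and DX.

```python
def split_register(value16):
    """Return (high_byte, low_byte) of a 16-bit register, e.g. (AH, AL) of AX."""
    value16 &= 0xFFFF                 # keep only 16 bits, as the hardware does
    return value16 >> 8, value16 & 0xFF

def join_register(high8, low8):
    """Rebuild a 16-bit register from its halves, e.g. AX from AH and AL."""
    return ((high8 & 0xFF) << 8) | (low8 & 0xFF)

ax = 0x1234
ah, al = split_register(ax)
print(f"AX={ax:04X}h AH={ah:02X}h AL={al:02X}h")   # AX=1234h AH=12h AL=34h

# Writing AL (like MOV AL, 0FFh) changes only the low byte of AX; AH is untouched.
ax = join_register(ah, 0xFF)
print(f"AX={ax:04X}h")                               # AX=12FFh
```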

SI – The source index register
DI – The destination index register
BP – The base pointer
SP – The stack pointer

Segment Registers

Each segment register has its own special use.

CS – The Code Segment register points at the segment containing the current program.
DS – The Data Segment register generally points at the segment where the variables are defined.
ES – The Extra Segment register also refers to a segment in memory, usually another data segment. It is up to the programmer to define it.
SS – The Stack segment register points at the segment containing the stack.
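The segment registers hold 16-bit values, yet the 8086 has a 20-bit address bus. The CPU bridges the gap by multiplying the segment value by 16 (shifting it left by 4 bits) and adding a 16-bit offset. Below is a small Python model of that arithmetic; the function name `physical_address` is hypothetical.

```python
def physical_address(segment, offset):
    """Model of the 8086 address calculation: segment * 16 + offset.

    The result is truncated to 20 bits because the 8086 has only
    20 address lines, so addresses past FFFFFh wrap around to 0.
    """
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF

# Two different segment:offset pairs can name the same physical byte.
print(f"{physical_address(0x2A12, 0x2122):05X}h")  # 2C242h
print(f"{physical_address(0x2C24, 0x0002):05X}h")  # 2C242h
# Wrap-around at the top of the 1 MB address space.
print(f"{physical_address(0xFFFF, 0x0010):05X}h")  # 00000h
```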

Status (Flag) Register

Nine individual bits of the 16-bit status register are used: 3 as control flags (TF, IF, DF) and 6 as status flags (CF, PF, AF, ZF, SF, OF). The remaining 7 bits are not used. A flag can only take the values 0 and 1; we say that a flag is set if it has the value 1 and cleared if it has the value 0.
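To make the status flags concrete, here is a Python sketch of how the six status flags would come out after an 8-bit ADD. It is a model of the documented flag meanings, not emulator output, and the function name `add8_flags` is hypothetical.

```python
def add8_flags(a, b):
    """Model of the result and six status flags after an 8-bit ADD of a and b."""
    a &= 0xFF
    b &= 0xFF
    full = a + b                 # true sum, may need 9 bits
    r = full & 0xFF              # what ends up in the 8-bit destination
    return r, {
        "CF": int(full > 0xFF),                       # carry out of bit 7
        "PF": int(bin(r).count("1") % 2 == 0),        # even number of 1 bits
        "AF": int(((a ^ b ^ r) & 0x10) != 0),         # carry from bit 3 into bit 4
        "ZF": int(r == 0),                            # result is zero
        "SF": (r >> 7) & 1,                           # copy of the sign bit
        "OF": int(((a ^ r) & (b ^ r) & 0x80) != 0),   # signed overflow
    }

print(add8_flags(0xFF, 0x01))   # unsigned wrap: result 0, so CF and ZF are set
print(add8_flags(0x7F, 0x01))   # 127 + 1 overflows a signed byte, so OF is set
```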

The MOV Instruction

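MOV copies the second operand (source) to the first operand (destination). For example, in MOV AX, 5 the value 5 is copied to AX.

MOV supports these operand combinations:

MOV REG, MEMORY
MOV MEMORY, REG
MOV REG, REG
MOV MEMORY, IMMEDIATE
MOV REG, IMMEDIATE

REG: any general purpose register (AX, BX, CX, DX, AH, AL, BH, BL, CH, CL, DH, DL, SI, DI, BP, SP). Memory: [BX], [BX+SI+5], a variable name, etc. Immediate: 5, 10, 1001b, 3Fh, etc. Note that MOV cannot copy memory directly to memory.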

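For Segment Registers

With segment registers (SREG: DS, ES, SS, and CS as a source only), MOV supports these operand combinations:
MOV SREG, MEMORY
MOV MEMORY, SREG
MOV REG, SREG
MOV SREG, REG
A segment register cannot be loaded with an immediate value or copied directly to another segment register; route the value through a general purpose register instead. For example,
```
MOV [BX], DS   ; store DS in the word at [BX]
MOV AX, DS     ; copy DS into AX
MOV DS, AX     ; load DS from AX
```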
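The operand rules above can be summarized as a few checks. This is a hypothetical Python checker written only to illustrate those rules (the names `kind` and `mov_allowed` are made up); it simplifies heavily: memory operands must be written in brackets, immediates must start with a digit or minus sign, and operand sizes are not checked.

```python
SEGMENT_REGS = {"CS", "DS", "ES", "SS"}
GENERAL_REGS = {"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP",
                "AH", "AL", "BH", "BL", "CH", "CL", "DH", "DL"}

def kind(operand):
    """Classify an operand as 'reg', 'sreg', 'mem' or 'imm' (simplified)."""
    op = operand.strip().upper()
    if op in GENERAL_REGS:
        return "reg"
    if op in SEGMENT_REGS:
        return "sreg"
    if op.startswith("[") and op.endswith("]"):
        return "mem"
    if op[:1].isdigit() or op[:1] == "-":
        return "imm"
    raise ValueError(f"unsupported operand: {operand}")

def mov_allowed(dst, src):
    """Return True if 'MOV dst, src' is a legal 8086 operand combination."""
    d, s = kind(dst), kind(src)
    if d == "imm":
        return False                  # cannot store into a constant
    if d == "mem" and s == "mem":
        return False                  # no memory-to-memory MOV
    if d == "sreg" and dst.strip().upper() == "CS":
        return False                  # CS cannot be a MOV destination
    if d == "sreg" and s in ("sreg", "imm"):
        return False                  # SREG loads only from REG or MEMORY
    return True

for dst, src in [("[BX]", "DS"), ("AX", "DS"), ("DS", "AX"),
                 ("DS", "ES"), ("DS", "1234h"), ("[BX]", "[SI]")]:
    print(f"MOV {dst}, {src}: {mov_allowed(dst, src)}")
```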

Assembler directives

DB (Define Byte)

DW (Define Word)

EQU (Equate: defines a constant)

Variables Arrays and constants

Variables

A variable is a memory location. For a programmer it is much easier to keep a value in a variable named "Some_Variable" than at an address such as 2A12:2122, especially when a program has many variables.
The assembler supports two types of variables: BYTE and WORD.

Syntax for variable declaration

name DB value
name DW value

DB - Define Byte
DW - Define Word
name - can be any combination of letters and digits, but it must start with a letter.
value - can be any numeric value in any supported numbering system (hexadecimal, binary or decimal).

ORG 100h is an assembler directive (it tells the assembler how to handle the source code).
This directive is very important when you work with variables: it tells the assembler the starting address of the program, so it can replace variable names with their correct offsets. A .COM program is loaded at offset 100h of its segment, which is why such programs begin with ORG 100h.
Directives are never converted into real machine code.
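As a rough illustration of why the origin matters, the Python sketch below assigns offsets to a list of declarations starting from an origin of 100h. The helper `assign_offsets` and the declaration names are hypothetical, and the model assumes the data sits right at the origin with no instructions before it (a real assembler also counts instruction bytes).

```python
SIZES = {"DB": 1, "DW": 2}   # bytes per element for each directive

def assign_offsets(declarations, origin=0x100):
    """declarations: list of (name, directive, element_count) tuples."""
    offsets = {}
    addr = origin
    for name, directive, count in declarations:
        offsets[name] = addr                  # name is replaced by this offset
        addr += SIZES[directive] * count      # next free byte
    return offsets

table = assign_offsets([("a", "DB", 1), ("b", "DW", 1), ("msg", "DB", 6)])
for name, off in table.items():
    print(f"{name}: {off:04X}h")              # a: 0100h, b: 0101h, msg: 0103h
```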

Arrays

Arrays can be seen as chains of variables. A text string is an example of a byte array, in which each character is represented by its ASCII code value.
Examples of array definitions:

```
arr1 DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
arr2 DB 'Hello',0
```

arr2 is an exact copy of arr1: when the assembler sees a string inside quotes, it automatically converts it into a sequence of bytes, one ASCII code per character.
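The equivalence of the two definitions can be checked directly in Python, which uses the same ASCII codes. This only models the bytes the assembler produces; it is not assembler output.

```python
arr1 = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00])  # arr1 DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h
arr2 = "Hello".encode("ascii") + b"\x00"            # arr2 DB 'Hello',0

print(arr1 == arr2)             # True
print(arr2.hex(" ").upper())    # 48 65 6C 6C 6F 00
```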

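Any element of an array can be accessed using square brackets. The number inside the brackets is a byte offset from the start of the array, so in a DW array element n is at offset n*2.
For example,

```
MOV AL, arr1[3]   ; AL = 6Ch, the fourth byte ('l')
; Alternatively
MOV SI, 3
MOV AL, arr1[SI]  ; same result, with the index held in SI
```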
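Here is a minimal Python model of the two accesses above, treating memory as a byte array. The names `memory`, `ARR1` and `load_byte` are hypothetical; `ARR1` stands in for whatever offset the assembler gives arr1.

```python
# arr1 DB 48h, 65h, 6Ch, 6Ch, 6Fh, 00h, modeled as raw bytes in memory.
memory = bytearray([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x00])
ARR1 = 0  # hypothetical offset of arr1 inside this tiny memory model

def load_byte(base, index):
    """Model of MOV AL, arr[index]: the index is a byte offset from arr."""
    return memory[base + index]

al = load_byte(ARR1, 3)           # MOV AL, arr1[3]
si = 3                            # MOV SI, 3
al_via_si = load_byte(ARR1, si)   # MOV AL, arr1[SI]
print(hex(al), chr(al))           # 0x6c l
```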

If you need to declare a large array you can use the DUP operator. The syntax for DUP is: number DUP ( value(s) )
number - number of duplicates to make (any constant value)

value - expression that DUP will duplicate

```
arr3 DB 5 DUP(9)
; it is equivalent to
arr3 DB 9, 9, 9, 9, 9
arr4 DB 5 DUP(1,2)
; it is equivalent to
arr4 DB 1, 2, 1, 2, 1, 2, 1, 2, 1, 2
```
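The expansion rule is simply "repeat the whole value list number times". A one-line Python model (the helper name `dup` is hypothetical) reproduces both examples:

```python
def dup(count, *values):
    """Model of 'count DUP(values)': repeat the value list count times."""
    return list(values) * count

print(dup(5, 9))      # [9, 9, 9, 9, 9]
print(dup(5, 1, 2))   # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
```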

You can use DW instead of DB if you need to keep values larger than 255 or smaller than -128. DW cannot be used to declare strings.
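A DB element holds one byte (-128 to 255), while a DW element holds two bytes (-32768 to 65535), and the 8086 stores words little-endian, low byte first. The Python sketch below models both facts; the helper names `db_fits` and `dw_bytes` are hypothetical, and it uses the standard `struct` module to produce the byte order.

```python
import struct

def db_fits(value):
    """True if DB can hold the value (as a signed or unsigned byte)."""
    return -128 <= value <= 255

def dw_bytes(value):
    """Bytes that DW value stores in memory: low byte first (little-endian)."""
    if not -32768 <= value <= 65535:
        raise ValueError("value does not fit in a 16-bit word")
    return struct.pack("<H", value & 0xFFFF)

print(db_fits(300))             # False: too large for DB
print(dw_bytes(300).hex(" "))   # 2c 01  (300 = 012Ch)
print(dw_bytes(-1).hex(" "))    # ff ff
```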

Byte or Word access – Data Type

In order to tell the assembler the size of a memory operand when it cannot be inferred from the other operand (for example, MOV [BX], 44h), these prefixes should be used:
BYTE PTR - for byte
WORD PTR - for word (2 bytes)
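The prefix matters because the same immediate value can be written as one byte or as two. Below is a Python model of the difference (the helper `store` is hypothetical). With WORD PTR, 44h is treated as 0044h and both bytes are overwritten, low byte first.

```python
def store(memory, address, value, ptr):
    """Model of MOV BYTE PTR [addr], value versus MOV WORD PTR [addr], value."""
    if ptr == "BYTE":
        memory[address] = value & 0xFF               # one byte written
    elif ptr == "WORD":
        memory[address] = value & 0xFF              # low byte first
        memory[address + 1] = (value >> 8) & 0xFF   # then high byte
    else:
        raise ValueError("ptr must be 'BYTE' or 'WORD'")

mem = bytearray([0xAA, 0xBB])
store(mem, 0, 0x44, "BYTE")
print(mem.hex(" "))   # 44 bb

mem = bytearray([0xAA, 0xBB])
store(mem, 0, 0x44, "WORD")
print(mem.hex(" "))   # 44 00
```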

To get the address of a variable - LEA and OFFSET

Program #1: Using LEA to get the address (LEA: Load Effective Address)

```
ORG 100h
MOV AL, VAR1            ; check value of VAR1 by moving it to AL (AL = 22h).
LEA BX, VAR1            ; get address of VAR1 in BX.
MOV BYTE PTR [BX], 44h  ; modify the contents of VAR1.
MOV AL, VAR1            ; check the value of VAR1 by moving it to AL (AL = 44h).
RET
VAR1 DB 22h
END
```

Program #2: Using OFFSET to get the address

```
ORG 100h
MOV AL, VAR1            ; check value of VAR1 by moving it to AL (AL = 22h).
MOV BX, OFFSET VAR1     ; get address of VAR1 in BX.
MOV BYTE PTR [BX], 44h  ; modify the contents of VAR1.
MOV AL, VAR1            ; check the value of VAR1 by moving it to AL (AL = 44h).
RET
VAR1 DB 22h
END
```

Both programs have the same effect: LEA BX, VAR1 and MOV BX, OFFSET VAR1 both load the offset of VAR1 into BX.
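What both programs do can be traced with a tiny Python memory model. `VAR1_OFFSET` is an assumed value chosen for illustration; the real offset depends on how many bytes of code precede VAR1. The point is that BX ends up holding an address, and writing through that address changes what MOV AL, VAR1 reads afterwards.

```python
memory = bytearray(0x10000)      # one 64 KB segment
VAR1_OFFSET = 0x010E             # hypothetical offset of VAR1
memory[VAR1_OFFSET] = 0x22       # VAR1 DB 22h

al_before = memory[VAR1_OFFSET]  # MOV AL, VAR1            (direct access)
bx = VAR1_OFFSET                 # LEA BX, VAR1  or  MOV BX, OFFSET VAR1
memory[bx] = 0x44                # MOV BYTE PTR [BX], 44h  (write through BX)
al_after = memory[VAR1_OFFSET]   # MOV AL, VAR1            (sees the new value)

print(f"before: {al_before:02X}h  after: {al_after:02X}h")   # before: 22h  after: 44h
```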

NOTE: Only BX, SI, DI and BP can be used inside square brackets (as memory pointers). In assembly language there are no strict data types, so any variable can be treated as an array.

Constants

Constants are just like variables, but they exist only until your program is compiled (assembled). After a constant is defined, its value cannot be changed. To define constants, the EQU directive is used:
Syntax: name EQU <any expression>

```
k EQU 5
MOV AX, k
; It is equivalent to
MOV AX, 5
```
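Because an EQU constant disappears at assembly time, its effect is a textual substitution. The Python sketch below models that idea under simplifying assumptions (the helper `expand_equ` is hypothetical, whole-word names only, no nested or numeric evaluation); it is not how any particular assembler is implemented.

```python
import re

def expand_equ(lines):
    """Model: replace EQU names with their values, then drop the EQU lines."""
    constants = {}
    output = []
    for line in lines:
        m = re.match(r"\s*(\w+)\s+EQU\s+(.+?)\s*$", line, re.IGNORECASE)
        if m:
            constants[m.group(1).upper()] = m.group(2)   # remember name -> value
            continue                                      # no code is emitted
        for name, value in constants.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v,
                          line, flags=re.IGNORECASE)
        output.append(line)
    return output

print(expand_equ(["k EQU 5", "MOV AX, k"]))   # ['MOV AX, 5']
```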

NOTE – A variable's value can be viewed in any of these numbering systems:

HEX
BIN
OCT
SIGNED
UNSIGNED
CHAR
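The same stored byte simply reads differently in each system. The Python model below (the helper `views` is hypothetical) prints one byte in all six forms; the h/b/o suffixes follow the assembler notation used earlier, and showing "." for non-printable characters is an assumption made for this sketch.

```python
def views(byte):
    """Show one byte in the numbering systems listed above."""
    byte &= 0xFF
    return {
        "HEX": f"{byte:02X}h",
        "BIN": f"{byte:08b}b",
        "OCT": f"{byte:o}o",
        "SIGNED": byte - 256 if byte > 127 else byte,   # two's complement
        "UNSIGNED": byte,
        "CHAR": chr(byte) if 32 <= byte < 127 else ".",  # printable ASCII only
    }

print(views(0x41))   # 'A': the same value in every form
print(views(0xC8))   # 200 unsigned, -56 signed
```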

Some Important Instructions

Before going further into assembly programming, there are some instructions that you must be aware of.

Addressing Modes Of 8086

Every instruction in a program operates on data. The method of specifying the data to be operated on by an instruction is called addressing.
The 8086 has 12 addressing modes, which can be classified into the following five groups.

| Group | Addressing modes |
|---|---|
| Group 1 | Registers and immediate data |
| Group 2 | Memory data |
| Group 3 | I/O ports |
| Group 4 | Relative addressing |
| Group 5 | Implied addressing |

Group 1
Register Addressing
Immediate Addressing

Group 2
Direct Addressing
Register Indirect Addressing
Based Addressing
Indexed Addressing
Based Indexed Addressing
String Addressing

Group 3
Direct I/O Port Addressing
Indirect I/O Port Addressing

Group 4
Relative Addressing

Group 5
Implied Addressing
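The Group 2 memory modes differ only in which parts (base register, index register, displacement) make up the 16-bit offset. The sketch below is a Python model of that calculation, not assembler code; the function `effective_address` and its arguments are hypothetical. It assumes the standard 8086 rule that operands using BP default to the SS segment and all others to DS, and it ignores segment override prefixes.

```python
def effective_address(base=None, index=None, disp=0, regs=None):
    """Model of an 8086 memory operand such as [BX+SI+5].

    base:  "BX", "BP" or None
    index: "SI", "DI" or None
    disp:  displacement (the constant part, possibly 0)
    regs:  current register values, e.g. {"BX": 0x1000}
    Returns (default segment register, 16-bit offset).
    """
    if base not in (None, "BX", "BP") or index not in (None, "SI", "DI"):
        raise ValueError("base must be BX or BP, index must be SI or DI")
    regs = regs or {}
    offset = disp
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index]
    segment = "SS" if base == "BP" else "DS"   # BP-based operands use SS by default
    return segment, offset & 0xFFFF            # offsets wrap within 64 KB

regs = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0020, "DI": 0x0030}
print(effective_address(disp=0x0200))                    # direct:            [0200h]
print(effective_address("BX", regs=regs))                # register indirect: [BX]
print(effective_address("BP", disp=4, regs=regs))        # based:             [BP+4]
print(effective_address(index="DI", disp=2, regs=regs))  # indexed:           [DI+2]
print(effective_address("BX", "SI", 5, regs))            # based indexed:     [BX+SI+5]
```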
