Int16h, 29th August 2006
Welcome to the first instalment of CyberArmy's Assembly Language classes. We will start at the very beginning, with 16-bit 8086 assembly code, to help you understand the basics and workings of assembly and the CPU without distracting you with things such as 32-bit registers, protected mode, and so on. From there, we'll get to modern usage of Assembly language, and show you that it's far from dead, and there's no reason for it not to be used and taught everywhere.
Asm is a language close to my heart, ever since the Amiga days, when apart from Blitz BASIC and AMOS it was all many people had to work with. Its decline began a few years ago, probably with the popularity of rapid application development tools such as Microsoft's Visual Basic and Borland's Delphi.
In addition to such easy-to-use IDEs, new languages and frameworks such as C# and .Net have been developed and are rapidly becoming a standard for software engineering on Microsoft operating systems. Although such technologies may make programmers' jobs easier, they also result in a larger number of poor-quality applications being released into the wild.
In an attempt to make Asm more palatable for CompSci students, efforts have been made to modernise Asm and make it easier to use by creating things such as HLA (High Level Assembly).
I can't really comment on HLA as I've never used it, though it seems to be Asm with a C-style approach. Another which probably isn't worth mentioning is Microsoft's ILASM for .Net. It's not something I'd personally use to develop .Net applications, though I'm not much of a .Net fan.
------------------------------------------------------------------------------------
The 8086 CPU was released by Intel in 1978 and has 14 accessible registers, each 16 bits long. This CPU is the foundation on which your current x86 CPU is built. The first 4 (and most commonly used) registers are the data registers, named AX, BX, CX and DX. Then there are the pointer/index registers: SP, BP, IP, SI and DI.
Although the data registers are 16 bits long, each can also be accessed as two 8-bit registers: AX (AH & AL), BX (BH & BL), CX (CH & CL) and DX (DH & DL). The 'H' and 'L' are the higher and lower halves of the register, respectively. The halves are not separate storage: DL is simply the lower 8 bits of DX, so writing to DL also changes the value in DX, and vice versa. For example, if DX contains 6944h, DH (the higher 8-bit register) contains 69h and DL contains 44h.
Here is a list of the registers we just covered, their full names, and a description of their use:
AX - Accumulator - Calculations and I/O
BX - Base - Can be used as an index
CX - Counter - Used for loops
DX - Data - I/O and for MUL & DIV
IP - Instruction Pointer - 16-bit offset (within CS) of the next instruction to execute
BP - Base Pointer - For accessing data (such as parameters) on the stack
SP - Stack Pointer - 16-bit offset (within SS) of the top of the stack
SI - Source Index - Used for String ops as a source
DI - Destination Index - Used for String ops as a destination
The next set of registers, which I haven't mentioned yet, are the segment registers. Each holds the segment address of the segment currently in use for its purpose:
CS - Code Segment - 16-bit number which points to the active code segment
DS - Data Segment - "
SS - Stack Segment - "
ES - Extra Segment - "
You've probably noticed that we only have 13 registers listed above. The final, unmentioned register is the Flag register - which requires a little more explanation than the others...
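The FLAGS register is 16 bits wide, but only 9 of its bits are used. These bits are called flags because each one is either "on" (1) or "off" (0), like a switch. Six of them are status flags (OF, SF, ZF, AF, PF, CF), which describe the result of the last calculation. The other three (DF, IF, TF) are control flags, which change how the CPU behaves. Here are the abbreviation, name, bit number and a description of each flag:
Flag  Name                  Bit  Description
OF    Overflow Flag         11   Signed result doesn't fit in the destination (overflow)
DF    Direction Flag        10   Direction of string ops (0 = SI/DI count up, 1 = count down)
IF    Interrupt Flag         9   Maskable (hardware) interrupts are enabled
TF    Trap Flag              8   CPU runs in single-step mode
SF    Sign Flag              7   Result of last calc is negative (top bit set)
ZF    Zero Flag              6   Result of last calc is zero
AF    Auxiliary Carry Flag   4   Carry/borrow out of bit 3 (a second carry flag, used for BCD)
PF    Parity Flag            2   Low byte of result has an even number of 1 bits (even parity)
CF    Carry Flag             0   Carry out of (or borrow into) the top bit after a calc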
Now that we know about the registers, we should look at memory as handled by MSDOS, since we are looking at 16-bit Asm first. When the CPU stores 16 bits of data, it stores them in reverse order (low byte first). For example, 6944h would be stored in memory as 44h 69h: two separate bytes, in reverse order. Memory in MSDOS (on an 8086) is divided into segments of up to 64KB, and each has a "reference" number, which is stored in the segment registers. When debugging memory, you will see that addresses are in the following format: SEGMENT:OFFSET (xxxx:xxxx). We will talk more about memory later when we start on 32-bit asm, as it's not extremely important right now.
Until recently, my favourite x86 Assembler was Borland's Turbo Assembler. I'm not sure if it's abandonware now or not, but it's a great package and quite easily available if you want to try it (Turbo Assembler 5.0). If I remember correctly, version 4 or 5 was the first version to include a 32-bit Assembler and linker so it's suitable for writing Win32 applications (Though, we may be using masm32 for the 32-bit part of this series - more on that later).
The following MSDOS examples use TASM syntax. If you can find a copy of TASM 5 (tasm5.zip), please try them out. They should work fine on Windows XP, but as I'm opposed to bloat and waste, I haven't tested them on Windows Vista. If you can't get a copy of TASM, you can still follow the rest of this article with ease.
The following is a typical "Hello, World" example written in Asm. After the code, I'll explain what each part of it does:
(ex01.asm)
.model tiny
.stack
.data
harro db "Harro CyberArmy - I rove you rong time!", "$"
.code
main proc
mov ax,seg harro
mov ds,ax
mov ah,09
lea dx,harro
int 21h
mov ax,4c00h
int 21h
main endp
end main
If you have TASM, you can build this app by running "tasm ex01", then "tlink ex01". If you don't, you can still follow what's happening by reading the explanations which follow.
.model tiny - Selects the "tiny" memory model: code and data share a single segment, so the whole app must fit in 64KB.
.stack - Reserves a stack segment. We don't push or pop anything ourselves,
- but the CPU still uses the stack (INT saves the flags and return address on it),
- and because we're creating an .EXE rather than a .COM, we need to declare one.
.data - Data segment starts here.
harro db - "Define Byte"; here it defines a byte-string identified as 'harro'.
- The trailing "$" marks the end of the string for DOS function 09h.
.code - Code section begins.
main proc - Like C/C++'s int main(), this marks out our main procedure;
- 'main endp' closes it, and 'end main' tells the assembler that execution starts at main.
mov ax, seg harro - "MOV" is an Asm instruction which moves data;
- here we are moving (or copying) the segment address of 'harro' into AX.
mov ds, ax - Copy the segment address from AX into DS. Unlike the data registers,
- a segment register can't be loaded with an immediate value, so we can't MOV 'seg harro' into DS directly.
mov ah, 09 - Move the number 9 into the higher-part of AX (AH).
lea dx, harro - Load the Effective Address of 'harro' into DX. In other words,
- we're copying the offset of 'harro' within the data segment into DX,
- so we now know that our byte-string is at DS:DX.
int 21h - Call DOS interrupt 21h. At this point DOS reads AH, and its value
- selects one of the many functions provided by int 21h.
- In this case, AH contains 09h, which prints the "$"-terminated string at DS:DX to stdout.
mov ax, 4c00h - Move 4C00h into AX (so AH = 4Ch and AL = 00h).
int 21h - Again, Interrupt 21h. This time, AX (AH + AL) contains 4C00h;
- AH = 4Ch selects the function "Terminate App",
- and the lower (AL) value is used as the exit code,
- which in this instance (00h) means "without error".
Using Microsoft's DEBUG.EXE, which thankfully is still included in Windows XP, we can see how this assembled and linked code looks in memory:
Run "debug.exe ex01".
Once at the prompt, you can see a hex dump of the app by typing "d [enter]". Now, if you type "u [enter]", you will be presented with a disassembly of the app in memory, similar to the following:
157E:0000 B87E15 MOV AX,157E
157E:0003 8ED8 MOV DS,AX
157E:0005 B409 MOV AH,09
157E:0007 BA1200 MOV DX,0012
157E:000A CD21 INT 21
157E:000C B8004C MOV AX,4C00
157E:000F CD21 INT 21
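As you can see, the instruction "mov ax, seg harro" has become "MOV AX,157E": 157E is the segment where our message is. DOS decides where to load the program, so your value will probably differ. In the machine code, B87E15, the value appears as 7E 15 because, as described earlier, 16-bit values are stored low byte first.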
In machine/op code, "mov ax, <16-bit number>" is encoded as "B8", followed by the number. The instruction "lea dx, harro" (which, you'll remember, loads the Effective Address of 'harro') has been assembled as "mov dx,0012". This tells us that our string is located in segment 157E, at offset 0012. If you return to the debug window, you can display what's there by typing: "d 157E:0012".
What you should now see is something like the following:
157E:0010 48 61 72 72 6F 20-43 79 62 65 72 41 72 6D Harro CyberArm
157E:0020 79 20 2D 20 49 20 72 6F-76 65 20 79 6F 75 20 72 y - I rove you r
157E:0030 6F 6E 67 20 74 69 6D 65-21 24 ong time!$
It should be noted that memory is handled differently under a 32-bit operating system than in a humble, 16-bit Disk Operating System, such as Microsoft's MS-DOS. More about that in a later part.
That's it for part 1, hope you weren't too bored.
To accompany part one of this beginner's guide, here are a couple of resources you may find useful:
HelpPC - An invaluable reference for DOS coders.
Borland Turbo Assembler 5.0 User's Guide
Programming from the Ground Up - Linux Asm book
Intel IA-32 Developer's Manuals - Vols. 1-3
680x0 Programming Manual (Motorola)
PowerPC Programmer's Manual
MOS 6510 Datasheet (Commodore, C64)
Recommended reading:
http://www.ctyme.com/rbrown.htm - Ralf Brown's Interrupt List
This article was originally published by CyberArmy.net in the CyberArmy Library.