Introduction To x86 Assembly (Review)
It's been a long time coming, but I decided to write an article. It's not finished yet (I foresee 6 or so chapters); I just wanted to get some feedback to make sure I'm going in the right direction (and that people can understand it):
Quote:
1. Preamble
Welcome,
As the title implies, this article has been written to give you a comprehensive introduction to the Assembly programming language. You need no prior experience in computer programming, although it would be beneficial to have an intermediate to advanced general ability working with computers, namely Windows.
From this article you will acquire the skills to develop your own software, learn some of the lean & mean aspects of how computers work, and most importantly gain a solid foundation for learning higher level languages (such as C).
For those of you who are technically inclined, please keep in mind that I've intentionally not covered a lot of the more advanced concepts of systems architecture (e.g., ALU, cache, gates, etc.), and have kept things quite abstract.
2. Preparation
Luckily, there are very few requirements for programming in assembly:
* An x86 (or x86-64) Intel (or AMD) processor.
* Any Windows operating system.
* A plain text editor (such as Notepad).
* A copy of Microsoft's freeware assembler (see steps below).
Step 1: Download MASM here: http://website.assemblercode.com/masm32/m32v9r.zip
Step 2: Run the installer. The steps required here are very self-explanatory (just continue with everything).
Step 3: Create a folder for your work, anywhere you like. From now on this folder will be referred to as your "project folder".
Step 4: Create a copy of the Windows shell (cmd.exe), which can be found in your \WINDOWS\System32 folder, and place it in your project folder.
Step 5: Last of all, add the path of MASM's "bin" folder to your system PATH variable. You can do this via Control Panel -> System -> Advanced -> Environment Variables (remember to append a semicolon to the end of the current PATH value, followed by the path of the bin directory). For example, your PATH variable may now look like: %SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\masm32\bin
Note: You may have to restart for this change to take effect.
3. Mr. Von Neumann
With the exception of some embedded systems, computers follow one very basic model: the Von Neumann architecture, which stipulates the generalized architecture of a computer as seen below: [TODO]
All the components of a computer communicate via the system bus. Why the bus metaphor? You can picture the system bus running in circles ("Bus route"), stopping at each component ("Bus stop"), picking up information ("Passenger pickup") and dropping that information off at another component ("Passenger dropoff"). This means that the frequency (typically measured in MHz) of the system bus determines how fast devices can be accessed, something you may want to keep in mind when purchasing memory. Memory can only be accessed as fast as the system bus allows; e.g., if you have a 600MHz bus yet the memory module operates at 1600MHz, there's going to be a bottleneck between the two, reducing practical memory bandwidth.
Memory consists of a linear array of 8-bit groupings called bytes, which we use to store data. Each byte is referenced using a unique numeric identifier (its address), which is where the term 'byte-addressable memory' comes from in relation to the majority of modern processors. Although the most common form of 'data' is numeric values, it is truly up to the programmer to define what their 'data' is, and subsequently how to interpret it. For example, ASCII characters are encoded as bytes. A better example would be bit flags, where each bit indicates a certain state. For example, you may have a byte which indicates various options of a user's account like so:
[Bit 0] - Account enabled?
[Bit 1] - Account banned?
[Bit 2] - Account muted?
[Bit 3] - Account pending deletion?
[Bit 4-7] - Unused.
We manipulate these options by setting the individual bits to either 1 or 0 (on or off, whatever). Note that in the above example we didn't use a 'whole' byte (we only used 4 bits, which is called a 'nibble'), which is perfectly legal. Also, when we refer to bits within a byte we label them numerically from least-significant to most-significant, starting at 0. Hence, in the above example the "Account enabled" bit would be the least-significant digit if we were to interpret the byte as an integer.
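To make the bit-flag idea above a bit more concrete, here is a minimal sketch in Python (used only for illustration, it is not assembly). The constant and function names are made up for this example; only the bit positions come from the list above.

```python
# Hypothetical names for the account-options byte described above.
ACCOUNT_ENABLED = 1 << 0           # bit 0
ACCOUNT_BANNED = 1 << 1            # bit 1
ACCOUNT_MUTED = 1 << 2             # bit 2
ACCOUNT_PENDING_DELETION = 1 << 3  # bit 3


def set_flag(options, flag):
    """Return the options byte with the given bit switched on."""
    return options | flag


def clear_flag(options, flag):
    """Return the options byte with the given bit switched off."""
    return options & ~flag & 0xFF  # the final mask keeps the result within one byte


def has_flag(options, flag):
    """Return True if the given bit is currently on."""
    return (options & flag) != 0


options = 0
options = set_flag(options, ACCOUNT_ENABLED)
options = set_flag(options, ACCOUNT_MUTED)
print(format(options, "08b"))  # 00000101

options = clear_flag(options, ACCOUNT_MUTED)
print(format(options, "08b"))  # 00000001
```

Bit 0 is printed last because the rightmost binary digit is the least-significant one.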
So far I've only spoken about bytes. Although you can use as much memory (within availability limits) as you like to store your data, the number of bytes that can be manipulated (or accessed) by a single instruction (which you'll learn about soon) depends primarily on the processor's architecture. For example, a 32-bit x86 processor can manipulate a 4 byte chunk of memory using a single instruction, while a 64-bit (x86-64) processor can manipulate 8 bytes. This is where the terms "32 bit" and "64 bit" processor come from. Of course, both can manipulate values smaller than 32 (or 64) bits, but the size must be a whole number of bytes and a power of two (so really 1, 2, 4, or 8 bytes).
Many naming conventions exist for byte groupings of different sizes. The following are what we will use:
Byte - 1 byte (8 bits.)
Word - 2 bytes (16 bits.)
Dword - 4 bytes (32 bits.)
One important thing to note is that when you want to refer to a chunk of data larger than a byte, whether it be a word or a dword, you use the address of its least-significant byte (the one with the lowest address). For example, if you are using memory addresses 10 - 13 as a dword then you would specify 10 as the address. How much data to access depends on the instruction, but it is always presumed to run from lower to higher memory addresses.
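The dword-at-address-10 example above can be sketched as follows. This is only an illustration in Python: the 16-byte "memory" and the helper names store_dword/load_dword are invented here, and the byte order shown (least-significant byte at the lowest address) is the little-endian order x86 uses.

```python
memory = bytearray(16)  # a tiny pretend memory with addresses 0..15


def store_dword(mem, address, value):
    """Write a 4-byte value starting at 'address', least-significant byte first."""
    mem[address:address + 4] = value.to_bytes(4, "little")


def load_dword(mem, address):
    """Read the 4 bytes starting at 'address' back into one integer."""
    return int.from_bytes(mem[address:address + 4], "little")


store_dword(memory, 10, 0x11223344)

# Addresses 10, 11, 12, 13 now hold the bytes from least to most significant.
print([hex(b) for b in memory[10:14]])  # ['0x44', '0x33', '0x22', '0x11']
print(hex(load_dword(memory, 10)))      # 0x11223344
```

Notice that we only ever name address 10; the size (4 bytes) is what tells us to also touch 11, 12 and 13.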
The processor is no doubt the most well-known component of a system. Often I've found people are daunted by computers due to their seemingly magical ways, but in reality, everything accomplished involves a series of very primitive steps which are called 'instructions'.
As you've most likely guessed, an instruction is what you use to order the processor to do some task, and like all other data it resides in memory. Although the format varies from one instruction to another, what is consistent is that every instruction has a unique identifier (a number, as far as we're concerned) which tells the processor what we want it to do. The rest of the instruction's contents largely depend on the instruction itself. For example, an instruction that tells the processor to set the value (integer) of a chunk of memory would contain the address of the memory to be set and the actual value to set it to, along with a few other tidbits (such as an indicator of what size the chunk is: byte, word or dword). So how do we get the processor to execute our instructions? On board the processor there are a handful of dedicated data containers called 'registers'. One of these registers is named EIP, or Instruction Pointer, which contains the memory address of the instruction currently being executed. After the processor has finished executing that instruction, it increments EIP by the size (in bytes) of that instruction and executes the next, so on and so forth, forever.
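If it helps, the fetch-and-execute cycle described above can be imitated with a toy machine. Everything about this sketch is an assumption made for illustration: the three opcodes and their byte layouts are invented and are NOT real x86 encodings, and a HALT instruction is added so the loop can stop (a real processor just keeps going).

```python
# Toy machine: opcodes and encodings are invented for illustration, NOT real x86.
HALT = 0      # 1 byte:  [HALT]
SET_BYTE = 1  # 3 bytes: [SET_BYTE, address, value]
INC_BYTE = 2  # 2 bytes: [INC_BYTE, address]


def run(memory):
    """Execute instructions from address 0 until HALT; return the final EIP."""
    eip = 0  # plays the role of the instruction pointer
    while True:
        opcode = memory[eip]  # fetch the identifier of the current instruction
        if opcode == HALT:
            return eip
        elif opcode == SET_BYTE:
            address, value = memory[eip + 1], memory[eip + 2]
            memory[address] = value
            eip += 3  # advance by the size of this instruction
        elif opcode == INC_BYTE:
            address = memory[eip + 1]
            memory[address] = (memory[address] + 1) & 0xFF
            eip += 2
        else:
            raise ValueError(f"unknown opcode {opcode} at address {eip}")


memory = bytearray(16)
program = [SET_BYTE, 15, 41, INC_BYTE, 15, HALT]
memory[0:len(program)] = bytes(program)  # instructions live in memory like any data

final_eip = run(memory)
print(memory[15], final_eip)  # 42 5
```

The program and the byte it modifies (address 15) sit in the same memory, which is the point made above about instructions residing in memory like everything else.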
The registers mentioned before are an important topic. Essentially they are like main memory, except they are much smaller, independent, and physically contained on the processor chip (making for much faster access). There are 8 general purpose registers at your disposal, all 32 bits in size:
[ EAX [ AX [ AH, AL ] ] ]
[ EDX [ DX [ DH, DL ] ] ]
[ ECX [ CX [ CH, CL ] ] ]
[ EBX [ BX [ BH, BL ] ] ]
[ ESI [SI] ]
[ EDI [DI] ]
[ EBP [BP] ]
[ ESP [SP] ]
The above may seem a little confusing, but don't worry, I'll explain. Since registers are technically not the same as main memory, they do not have a numerical address, therefore you refer to them by their actual name (like EAX). There are of course some instances where you do not wish to use a whole 32-bit register (since they are quite the commodity), and want to use only a portion. To accommodate this, the registers are broken down into multiple parts so you can reference a specific 'area' of the register. Take EAX for example: it is broken down into 4 separately addressable (although overlapping) registers. AX is the low-order word of EAX, AH is the high-order byte of AX and AL is the low-order byte of AX (there is no way to directly reference the high-order word of EAX). Don't forget, although they are referenced separately, EAX, AX, AH and AL all overlap, and thus if you were to modify AX then EAX would be modified as well (although only its low 16 bits). The same rule applies to the other registers, except for ESI/EDI/EBP/ESP, where you can only access either the whole 32 bits or the low 16 bits.
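The overlap between EAX, AX, AH and AL described above can be modelled with a few bit masks. This is just a sketch in Python; the Register32 class is made up for this example and only mimics how the four names share the same 32 bits.

```python
class Register32:
    """Toy model of EAX with its overlapping AX, AH and AL views."""

    def __init__(self, value=0):
        self.eax = value & 0xFFFFFFFF

    @property
    def ax(self):
        return self.eax & 0xFFFF  # low-order word of EAX

    @ax.setter
    def ax(self, value):
        self.eax = (self.eax & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def ah(self):
        return (self.eax >> 8) & 0xFF  # high-order byte of AX

    @ah.setter
    def ah(self, value):
        self.eax = (self.eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def al(self):
        return self.eax & 0xFF  # low-order byte of AX

    @al.setter
    def al(self, value):
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)


reg = Register32(0x12345678)
reg.ax = 0xABCD                   # only the low 16 bits of EAX change
print(hex(reg.eax))               # 0x1234abcd
print(hex(reg.ah), hex(reg.al))   # 0xab 0xcd

reg.al = 0xFF                     # changing AL also changes AX and EAX
print(hex(reg.ax), hex(reg.eax))  # 0xabff 0x1234abff
```

Note there is no property for the high-order word of EAX, matching the text above: the upper 16 bits are only reachable through the full 32-bit name.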
Last edited by MattBro; 02-13-2008 at 11:21 PM.
Reason: Formatting has screwed up.