How to use assembly language (tutorial up to hello world only)

How to set up assembly language

Local Environment Setup

Assembly language depends on the instruction set and architecture of the processor. This tutorial focuses on 32-bit Intel (IA-32) processors such as the Pentium. To follow it, you will need −

An IBM PC or any equivalent compatible computer
A copy of the Linux operating system
A copy of the NASM assembler program

There are many good assembler programs, such as −

Microsoft Assembler (MASM)
Borland Turbo Assembler (TASM)
The GNU assembler (GAS)

We will use the NASM assembler, as it is −

Free. You can download it from various web sources.
Well documented, with plenty of information available online.
Usable on both Linux and Windows.

Installing NASM

If you select "Development Tools" while installing Linux, you may get NASM installed along with the Linux operating system and you do not need to download and install it separately. To check whether you already have NASM installed, take the following steps −

Open a Linux terminal.
Type whereis nasm and press ENTER.
If it is already installed, a line like nasm: /usr/bin/nasm appears. Otherwise, you will see just nasm: and you need to install NASM.

To install NASM, take the following steps −

Check the Netwide Assembler (NASM) website for the latest version.
Download the Linux source archive nasm-X.XX.tar.gz, where X.XX is the NASM version number in the archive.
Unpack the archive into a directory; this creates a subdirectory nasm-X.XX.
cd to nasm-X.XX and type ./configure. This shell script will find the best C compiler to use and set up Makefiles accordingly.
Type make to build the NASM and ndisasm binaries.
Type make install to install nasm and ndisasm in /usr/local/bin and to install the man pages.

This should install NASM on your system. Alternatively, you can use an RPM distribution for Fedora Linux, which is simpler to install: just double-click the RPM file.

An assembly program can be divided into three sections −

The data section,
The bss section, and
The text section.

The data Section

The data section is where you declare initialized data and constants, that is, values that are known before the program runs. You can use this section to declare values such as file names, buffer sizes, and other constants. The syntax for declaring the data section is −

section .data
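
As a minimal sketch of what such declarations can look like, the program below defines a file name, a buffer size, and a counter. All three names and their values are assumptions made for illustration; db, dd, and equ are standard NASM pseudo-instructions.

```nasm
; Illustrative .data declarations; filename, bufsize and counter are
; hypothetical names chosen for this sketch.
section .data
filename  db   'notes.txt', 0    ; bytes of a file name, zero-terminated
bufsize   equ  512               ; assemble-time constant, occupies no memory
counter   dd   0                 ; one 32-bit doubleword, initialized to 0

section .text
   global _start
_start:
   mov  ebx, [counter]           ; load the initialized value (0) as exit status
   mov  eax, 1                   ; system call number (sys_exit)
   int  0x80                     ; call kernel
```

Note that equ only gives a name to a number at assembly time, while db and dd actually place bytes in the program's data.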

The bss Section

The bss section is used for declaring uninitialized variables, that is, for reserving space that the program fills in while it runs. The syntax for declaring the bss section is −

section .bss
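
A small sketch of reserving space in the bss section follows. The names buffer and total are hypothetical, and the program assumes 32-bit Linux system calls as used elsewhere in this tutorial; resb and resd are the NASM pseudo-instructions that reserve bytes and doublewords.

```nasm
; Sketch of .bss usage; buffer and total are hypothetical names.
section .bss
buffer  resb  64                 ; reserve 64 uninitialized bytes
total   resd  1                  ; reserve one 32-bit doubleword

section .text
   global _start
_start:
   mov  dword [total], 42        ; store a value in the reserved doubleword
   mov  ebx, [total]             ; load it back to use as the exit status
   mov  eax, 1                   ; system call number (sys_exit)
   int  0x80                     ; call kernel
```

Unlike the data section, nothing here has a value until the program stores one, so the reserved space takes no room in the executable file itself.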

The text section

The text section is used for keeping the actual code. This section must contain the declaration global _start, which makes the _start label visible to the linker so that it knows where program execution begins.

The syntax for declaring text section is −

section .text
   global _start
_start:
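
The skeleton above has no instructions after _start, so a program built from it would run past the end of its code and typically crash. A minimal complete program, sketched here under the assumption of the 32-bit Linux int 0x80 interface used later in this tutorial, at least has to ask the kernel to exit:

```nasm
; Smallest useful program: do nothing and exit with status 0.
section .text
   global _start                 ; export the entry point for the linker
_start:
   mov  eax, 1                   ; system call number (sys_exit)
   mov  ebx, 0                   ; exit status 0 (success)
   int  0x80                     ; call kernel
```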

Comments

An assembly language comment begins with a semicolon (;). It may contain any printable character, including blanks. It can appear on a line by itself, like −

; This program displays a message on screen

or, on the same line along with an instruction, like −

add eax, ebx     ; adds ebx to eax

Assembly Language Statements

Assembly language programs consist of three types of statements −

Executable instructions or instructions,
Assembler directives or pseudo-ops, and
Macros.

The executable instructions or simply instructions tell the processor what to do. Each instruction consists of an operation code (opcode). Each executable instruction generates one machine language instruction.

The assembler directives or pseudo-ops tell the assembler about the various aspects of the assembly process. These are non-executable and do not generate machine language instructions.

Macros are basically a text substitution mechanism.
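
To make the idea of text substitution concrete, here is a hedged sketch of a NASM multi-line macro. The macro name write_string and the labels msg1, len1, msg2, and len2 are hypothetical; %macro, %endmacro, and the %1/%2 parameter references are NASM's own preprocessor syntax.

```nasm
; Sketch of a NASM macro; write_string is a hypothetical name.
; The macro takes 2 parameters: the address and the length of a string.
%macro write_string 2
   mov  eax, 4                   ; system call number (sys_write)
   mov  ebx, 1                   ; file descriptor (stdout)
   mov  ecx, %1                  ; first parameter: address of the text
   mov  edx, %2                  ; second parameter: number of bytes
   int  0x80                     ; call kernel
%endmacro

section .text
   global _start
_start:
   write_string msg1, len1       ; expands to the five instructions above
   write_string msg2, len2       ; expands again with different arguments

   mov  eax, 1                   ; system call number (sys_exit)
   mov  ebx, 0                   ; exit status 0
   int  0x80                     ; call kernel

section .data
msg1  db   'Hello, '             ; first part of the message
len1  equ  $ - msg1              ; its length in bytes
msg2  db   'world!', 0xa         ; second part, ending with a newline
len2  equ  $ - msg2              ; its length in bytes
```

Each use of write_string is replaced by the macro body before assembly, so the resulting machine code is the same as if the five instructions had been typed out twice.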

Syntax of Assembly Language Statements

Assembly language statements are entered one per line. Each statement has the following format −

[label]   mnemonic   [operands]   [;comment]

The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the instruction.

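Following are some examples of typical assembly language statements. They are written in a generic style; NASM itself requires square brackets and a size keyword for memory operands, for example INC DWORD [COUNT] if COUNT is a 32-bit variable −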

INC COUNT        ; Increment the memory variable COUNT

MOV TOTAL, 48    ; Transfer the value 48 to the
                 ; memory variable TOTAL
					  
ADD AH, BH       ; Add the content of the 
                 ; BH register into the AH register
					  
AND MASK1, 128   ; Perform AND operation on the 
                 ; variable MASK1 and 128
					  
ADD MARKS, 10    ; Add 10 to the variable MARKS
MOV AL, 10       ; Transfer the value 10 to the AL register
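
For comparison, the same statements might look as follows in NASM syntax. This is only a sketch: the operand sizes (dword or byte) and the initial values in the data section are assumptions, since the examples above do not say how the variables are declared.

```nasm
; NASM-style versions of the statements above. Variable sizes and
; initial values are assumptions made for this sketch.
section .data
count  dd  0                     ; assumed 32-bit counter
total  dd  0                     ; assumed 32-bit total
mask1  db  0xff                  ; assumed 8-bit mask
marks  dd  90                    ; assumed 32-bit marks value

section .text
   global _start
_start:
   inc  dword [count]            ; increment the memory variable count
   mov  dword [total], 48        ; store the value 48 in total
   add  ah, bh                   ; add the BH register into AH
   and  byte [mask1], 128        ; AND mask1 with 128
   add  dword [marks], 10        ; add 10 to marks
   mov  al, 10                   ; load the value 10 into AL

   mov  eax, 1                   ; system call number (sys_exit)
   mov  ebx, 0                   ; exit status 0
   int  0x80                     ; call kernel
```

The brackets tell NASM to use the memory at a label rather than the label's address, and the size keyword is needed whenever no register operand reveals how many bytes to touch.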

The Hello World Program in Assembly

The following assembly language code displays the string 'Hello, world!' on the screen −

section .text
   global _start          ;must be declared for linker (ld)

_start:                   ;tells linker entry point
   mov edx,len            ;message length
   mov ecx,msg            ;message to write
   mov ebx,1              ;file descriptor (stdout)
   mov eax,4              ;system call number (sys_write)
   int 0x80               ;call kernel

   mov ebx,0              ;exit status 0 (success)
   mov eax,1              ;system call number (sys_exit)
   int 0x80               ;call kernel

section .data
msg db 'Hello, world!', 0xa  ;string to be printed
len equ $ - msg              ;length of the string

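To try it, save the code in a file (hello.asm is used here as an example name), assemble it with nasm -f elf32 hello.asm, link it with ld -m elf_i386 -o hello hello.o, and run ./hello. It produces the following result −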

Hello, world!

How did lower-level programming evolve into high-level programming

Are compilers for high-level languages still written in lower-level languages today? Early compilers were written in assembly language, but nowadays we usually use C or C++. Often, the initial development of a language is done in an existing programming language. Once the first compiler is relatively stable, it may be rewritten in the language that is being compiled.

Assembly Language

Assembly language, also informally called assembler, was a language in which programmers wrote mnemonics to represent machine code. This meant that they used more human-readable symbols to represent binary codes. The mapping between mnemonics and machine instructions was one-to-one. Before the code could be executed, the assembler converted it into machine code, which consisted of the binary digits 1 and 0. To illustrate this, let's use the example of computing the addition of two numbers represented by A = B + C, where the numbers (data) for B and C are sto...

The importance of lower-level programming languages
