Assembly
The assembly language has the rule "In one line - one instruction."
An instruction or statement is the smallest autonomous part of a programming language: a command or a set of commands. A program is usually a sequence of instructions.
Source: Statement - Wikipedia
Empty lines, comments, as well as extra tabs or spaces are ignored.
There is a constant COMMENT_CHAR in op.h. It determines which character indicates the beginning of the comment.
In the provided file it is the hash character, #.
That is, everything between the character '#' and the end of the line is considered a comment.
A comment can be located anywhere in the file.
Example #1:
# Codam
# is a programming school
Example #2:
ld %0, r2 # And it is located in the Marine Terrein
In the provided archive vm_champs.tar in directory champs/examples there is the file bee_gees.s with the champion code, which the original program asm translates into byte-code without errors.
There are two kinds of comments in the code of this champion:
- standard, which was discussed above;
- alternative, about which there is no information in the subject.
Instead of the octothorpe (#), the character ; is used here.
An example of using a comment of this type:
sti r1, %:live, %1 ; Marine Terrein is placed in Amsterdam, Netherlands
This type of comment is not described in the subject, but it is supported by the original translator. So we are most likely not required to support it.
But let's add it to the file op.h anyway:
# define ALT_COMMENT_CHAR ';'
Apart from changes required by the norme, it will be the only change we make to the file op.h.
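The comment rules above can be sketched as a small helper. This is only an illustration, not the original translator's code: the function name `strip_comment` is hypothetical, and skipping comment characters inside quoted strings (so that a champion name may contain `#`) is an assumption that the text does not confirm.

```c
#include <stddef.h>

#define COMMENT_CHAR     '#'
#define ALT_COMMENT_CHAR ';'

/* Cut a line at the first comment character found outside a quoted string.
** The line is modified in place; the same pointer is returned.
** Assumption: comment characters inside "..." are ordinary text. */
char *strip_comment(char *line)
{
    int in_string = 0;

    for (size_t i = 0; line[i] != '\0'; i++)
    {
        if (line[i] == '"')
            in_string = !in_string;
        else if (!in_string
            && (line[i] == COMMENT_CHAR || line[i] == ALT_COMMENT_CHAR))
        {
            line[i] = '\0';
            break;
        }
    }
    return (line);
}
```

Called on a mutable copy of `ld %0, r2 # note`, it leaves `ld %0, r2 ` in the buffer; trimming the trailing space is left to the caller.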
The name of the champion must be defined in the .s file. The command for it is stored in the constant NAME_CMD_STRING. In the provided file op.h it is defined as .name.
The command .name must be followed by a string containing the champion name:
.name "Batman"
The length of the name must not exceed the value defined in the constant PROG_NAME_LENGTH. In the provided file it equals 128.
An empty string is a valid champion name:
.name ""
But lack of the string is an error:
.name
The .s file must also contain the champion's comment.
The command for the comment is defined in the constant COMMENT_CMD_STRING in the file op.h as .comment.
The length of the comment is restricted by the value of the constant COMMENT_LENGTH and must not exceed 2048.
The command .comment is very similar to .name and behaves identically in the case of an empty string and in the absence of the string.
Some of the provided .s example files contain the command .extend.
This command, like all other commands except .name and .comment, is not described in the subject and is reported as an error by the original asm translator.
Champion's executable code consists of instructions.
The assembly language has the rule of one instruction per line. The newline character \n marks both the end of the line and the end of the instruction.
So where C uses ;, we use \n.
This rule means that even the last instruction must be followed by a newline character. Otherwise asm will display an error message.
Each instruction consists of several components:
A label consists of characters defined in the constant LABEL_CHARS. In op.h they are: abcdefghijklmnopqrstuvwxyz_0123456789.
A label cannot contain characters other than those defined in the constant LABEL_CHARS.
A label must be followed by the character defined in the constant LABEL_CHAR. In op.h it is :.
A label points to the instruction that immediately follows it. It points to one single instruction, not to a block of instructions.
.name "Batman"
.comment "This city needs me"
loop:
sti r1, %:live, %1 # <-- operation 'sti' is pointed to by the label 'loop'
live:
live %0 # <-- operation 'live' is pointed to by the label 'live'
ld %0, r2 # <-- and this operation is not pointed to by any labels
zjmp %:loop
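The character rules for label definitions described above can be checked with a few lines of C. This is a minimal sketch: the function name `is_label_definition` is hypothetical, and rejecting an empty label (a lone `:`) is an assumption.

```c
#include <string.h>

#define LABEL_CHARS "abcdefghijklmnopqrstuvwxyz_0123456789"
#define LABEL_CHAR  ':'

/* Return 1 if `token` is a label definition such as "loop:", else 0.
** Assumption: a label must contain at least one character. */
int is_label_definition(const char *token)
{
    /* Length of the leading run made only of allowed label characters. */
    size_t len = strspn(token, LABEL_CHARS);

    return (len > 0 && token[len] == LABEL_CHAR && token[len + 1] == '\0');
}
```

With this check `loop:` and `live:` are accepted, while `Loop:` (uppercase letter) and `loop` (no LABEL_CHAR) are rejected.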
Labels make our life easier by simplifying the coding process.
As we know, code written in assembly language is transformed into byte-code, and the virtual machine works with that byte-code.
Let's assume we want to create a loop in which operation live is performed. We have operation zjmp, which can send us a few bytes forward or backward from the current position.
To create a loop we have to give operation zjmp a value. But which value? To find out, we have to calculate how many bytes the operation code and its argument occupy in the byte-code.
As we know, the operation code is always 1 byte, and operation live has a single argument of 4 bytes.
It turns out we need to go 5 bytes back:
live %1
zjmp %-5
Not so difficult, but these calculations take time, and it would be much easier to just say "jump to the operation live". This is what labels are made for.
We simply create a label loop and pass it as the argument of operation zjmp:
loop: live %1
zjmp %:loop
Now it is the translator's job to calculate the distance between the operations. It calculates how many bytes to jump back and inserts this value into the byte-code.
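The distance calculation can be sketched as follows. The helper names and the idea of keeping instruction sizes in an array are assumptions made for this illustration; the offset is measured from the start of the instruction that uses the label, which matches the `zjmp %-5` example.

```c
#include <stddef.h>

/* sizes[i] is the encoded size in bytes of instruction number i.
** Return the byte address of instruction `index`, counted from the
** start of the executable code. */
int instruction_address(const int *sizes, size_t index)
{
    int addr = 0;

    for (size_t i = 0; i < index; i++)
        addr += sizes[i];
    return (addr);
}

/* Value written in place of a label: distance from the instruction that
** uses the label (`user`) to the instruction the label points to. */
int label_offset(const int *sizes, size_t label_target, size_t user)
{
    return (instruction_address(sizes, label_target)
        - instruction_address(sizes, user));
}
```

For the loop above, `live %1` occupies 5 bytes, so with `sizes = {5, 3}` the call `label_offset(sizes, 0, 1)` gives -5. The size of zjmp itself does not affect this result; the 3 is only a placeholder.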
Valid labels:
marker:
live %0
marker:
live %0
marker: live %0
All labels from the examples above have the same value and meaning. You may choose any style you want.
Another valid example:
marker:
label:
live %0
This means label marker and label label both point to the same operation.
Valid example:
marker:
# End of file
In this case the label points to the end of the executable code.
It is important to have a \n at the end of the line. Otherwise the translator will print an error message.
The assembly language has 16 operations. Each operation takes 1 to 3 arguments.
Information about the operation names and the arguments they accept is provided in the file op.c.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 1 | live | T_DIR | - | - |
| 2 | ld | T_DIR / T_IND | T_REG | - |
| 3 | st | T_REG | T_REG / T_IND | - |
| 4 | add | T_REG | T_REG | T_REG |
| 5 | sub | T_REG | T_REG | T_REG |
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
| 9 | zjmp | T_DIR | - | - |
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR |
| 12 | fork | T_DIR | - | - |
| 13 | lld | T_DIR / T_IND | T_REG | - |
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
| 15 | lfork | T_DIR | - | - |
| 16 | aff | T_REG | - | - |
Understanding the operations and how they work with arguments of different types is the basis for understanding "Corewar".
There are three types of arguments:
A registry is a variable where we can store some data. The size of a registry in octets is defined in the file op.h in the constant REG_SIZE and is equal to 4. A registry is a part of a cursor (process), but this will be discussed later.
An octet in computer science is eight binary digits.
Source: Octet — Wikipedia
The number of registries is defined in the constant REG_NUMBER with the value 16, so the available registries are r1, r2, r3 ... r16.
Registry values
During the startup of the virtual machine all the registries except r1 are initialized with the value 0.
Registry r1 contains the negated number of the champion. This number is unique and is required for the operation live to report the champion alive.
The cursor placed at the beginning of the code of player 2 will have the value -2 in its registry r1.
If operation live is executed with the argument -2, the virtual machine will know player 2 is alive:
live %-2
Direct argument consists of special character defined in the constant DIRECT_CHAR (%) and a number or label, which represents direct value.
If there is a label in the argument, there must be the character defined in the constant LABEL_CHAR (:) in front of it:
sti r1, %:marker, %1
What are direct and indirect values?
Let's consider a simple example.
We have a number 5. A direct value means we just use 5 as is. That is, 5 is 5.
If 5 is an indirect value, it represents a relative address pointing five bytes forward from the current position.
Direct and indirect labels
What is the difference between labels and numbers?
Actually there is no difference at all. The labels are transformed into their numerical equivalents by the translator at the compiling stage.
This means labels are numbers written in the form of words in the .s file and replaced with numerical values by the translator.
The process of label replacement is described in the chapter "Why do we need labels?".
An argument of type indirect can be a number or a label, which represents an indirect value.
If an argument of type T_IND contains a number, no auxiliary symbols are needed:
ld 5, r7
If indirect argument has a label as value, it must have LABEL_CHAR (:) in front of it:
ld :label, r7
To separate arguments a delimiter character is used. It is defined in SEPARATOR_CHAR constant as ,:
ld 21, r7
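The syntax of the three argument types can be summarized in a small classifier. This is a sketch under assumptions: the enum `t_arg_kind` and both function names are hypothetical, numbers are taken to be decimal with an optional minus sign, and registry numbers outside 1..REG_NUMBER are rejected, which the original asm may or may not do.

```c
#include <ctype.h>
#include <string.h>

#define DIRECT_CHAR '%'
#define LABEL_CHAR  ':'
#define LABEL_CHARS "abcdefghijklmnopqrstuvwxyz_0123456789"
#define REG_NUMBER  16

typedef enum { ARG_INVALID, ARG_REG, ARG_DIR, ARG_IND } t_arg_kind;

/* Return 1 if `s` is ":label" or a decimal number (assumed syntax). */
static int is_value(const char *s)
{
    if (*s == LABEL_CHAR)
        return (s[1] != '\0' && s[1 + strspn(s + 1, LABEL_CHARS)] == '\0');
    if (*s == '-')
        s++;
    if (!isdigit((unsigned char)*s))
        return (0);
    while (isdigit((unsigned char)*s))
        s++;
    return (*s == '\0');
}

/* Classify one argument token (already trimmed of spaces). */
t_arg_kind classify_argument(const char *arg)
{
    if (arg[0] == 'r')
    {
        const char *p = arg + 1;
        int n = 0;

        if (!isdigit((unsigned char)*p))
            return (ARG_INVALID);
        /* Stop accumulating once n is already out of range. */
        while (isdigit((unsigned char)*p) && n <= REG_NUMBER)
            n = n * 10 + (*p++ - '0');
        if (*p == '\0' && n >= 1 && n <= REG_NUMBER)
            return (ARG_REG);
        return (ARG_INVALID);
    }
    if (arg[0] == DIRECT_CHAR)
        return (is_value(arg + 1) ? ARG_DIR : ARG_INVALID);
    return (is_value(arg) ? ARG_IND : ARG_INVALID);
}
```

Splitting a line such as `ld 21, r7` on SEPARATOR_CHAR and trimming spaces would give the tokens `21` (indirect) and `r7` (registry).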
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 1 | live | T_DIR | - | - |
Description
Objectives of the operation:
- Reports that the cursor which performed this live operation is alive.
- If the argument of operation live is identical to the number of the player saved in the cursor's registry r1, it reports that the player is alive. For example, if registry r1 of the cursor executing live equals -2 and the argument of live equals -2, the virtual machine counts player 2 as alive.
What is a cursor?
Here is a brief explanation of the notion. In more detail it will be described in the chapter "Virtual machine".
A cursor is a process that executes the instruction on which it stands.
Let's assume we run our virtual machine with players loaded into its memory. Every player obtains a cursor (a process), which is placed at the beginning of that player's code.
3 champions. 3 sectors of memory with loaded executable code. 3 cursors (processes).
Each cursor contains some information:
- PC (Program Counter): position of the cursor (process) in memory.
- Registries (r1...r16): variables that store the cursor's data; their amount is defined by the constant REG_NUMBER.
- Flag carry: a special boolean variable that affects operation zjmp and can take one of two values, 1 or 0.
- Number of the cycle in which the cursor last reported live: this information is used to check whether the cursor is still alive.
In fact, a cursor contains more information, but it will be discussed later.
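The fields listed above could be grouped in a structure like the one below. It is only a sketch: the type `t_cursor`, its field names and the constructor `new_cursor` are hypothetical, and starting with carry and the last live cycle at 0 is an assumption.

```c
#include <stdint.h>
#include <string.h>

#define REG_NUMBER 16

typedef struct s_cursor
{
    int     pc;                 /* position of the cursor in memory */
    int32_t reg[REG_NUMBER];    /* reg[0] is r1 ... reg[15] is r16 */
    int     carry;              /* 0 or 1, checked by zjmp */
    int     last_live_cycle;    /* cycle of the most recent live */
}   t_cursor;

/* Create the initial cursor of a player: every registry is 0 except r1,
** which holds the negated player number. */
t_cursor new_cursor(int player_number, int start_pc)
{
    t_cursor cursor;

    memset(&cursor, 0, sizeof(cursor));
    cursor.pc = start_pc;
    cursor.reg[0] = -player_number;
    return (cursor);
}
```

For player 2, `new_cursor(2, start)` yields a cursor whose r1 holds -2, which is the value `live %-2` has to match.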
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 2 | ld | T_DIR / T_IND | T_REG | - |
This operation loads a value into a registry of the cursor. Its behavior depends on the type of the first argument:
- How it works with Argument #1 of type T_DIR
If the first argument is T_DIR, its value is loaded into the registry as is.
Actions performed:
- Write the number from the first argument into the registry received as the second argument.
- If the written value is 0, set carry to 1; if the written value is non-zero, set carry to 0.
- How it works with Argument #1 of type T_IND
If the first argument is of type T_IND, it represents an address.
An argument of this type must first be truncated with a modulo operation: <FIRST_ARGUMENT> % IDX_MOD.
What is IDX_MOD?
IDX_MOD is another constant from the file op.h. Its value is defined as (MEM_SIZE / 8), where MEM_SIZE is the size in bytes of the virtual machine's memory, in which the champions fight.
What is the constant IDX_MOD for? It limits the maximum distance a cursor can reach in memory. In the file op.h MEM_SIZE is initialized with the value (4 * 1024), so IDX_MOD equals 512. This means a cursor cannot reach 512 bytes or more away from its current position.
After the argument of type T_IND has been truncated, we use it as a relative address in memory: how many bytes forward or backward from the current position of the cursor the required position lies.
Actions performed:
- Calculate the address: current position of the cursor + <FIRST_ARGUMENT> % IDX_MOD.
- Read four bytes starting from the obtained address.
- Write the value from step 2 into the registry passed as the second argument.
- If the written value is 0, set carry to 1; if the written value is non-zero, set carry to 0.
Why do we read exactly four bytes?
The registry size and the direct value size are both defined as 4 in the file op.h:
# define REG_SIZE 4
# define DIR_SIZE REG_SIZE
We go to the address calculated from the argument of type T_IND to read a value "as is", that is, to read a value of type T_DIR. We then have to save this value into the registry. To guarantee the success of the operation, the size of the number read must match the size of the registry.
Also, later we will discover that it is possible to write values from a registry to an address in memory. That is why the size of a number read from memory and the size of the registry must be compatible in both directions.
So we read as many bytes as the registry can store.
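The T_IND branch of ld can be sketched in C as below. Several things here are assumptions not stated in the text: the memory is treated as circular (addresses wrap around MEM_SIZE), the four bytes are combined most significant byte first, and the names `wrap`, `read_int32` and `ld_indirect` are hypothetical. Note that in C the `%` operator keeps the sign of the left operand, so a negative argument stays negative after truncation.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)
#define REG_SIZE 4

/* Map any (possibly negative) address onto the circular memory. */
static int wrap(int addr)
{
    return (((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE);
}

/* Read REG_SIZE bytes starting at addr, most significant byte first
** (byte order is an assumption of this sketch). */
int32_t read_int32(const uint8_t *mem, int addr)
{
    uint32_t value = 0;

    for (int i = 0; i < REG_SIZE; i++)
        value = (value << 8) | mem[wrap(addr + i)];
    return ((int32_t)value);
}

/* ld with a T_IND first argument: load into *reg, return the new carry. */
int ld_indirect(const uint8_t *mem, int pc, int arg, int32_t *reg)
{
    *reg = read_int32(mem, pc + arg % IDX_MOD);
    return (*reg == 0);
}
```

With the cursor at position 5, the arguments 5 and 517 both read from address 10, because 517 % 512 is 5.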
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 3 | st | T_REG | T_REG / T_IND | - |
Description
This operation writes the value from the registry passed as the first argument. The destination depends on the type of the second argument:
- How it works with Argument #2 of type T_REG
st r7, r11
In this case the value from registry 7 is written to registry 11.
- How it works with Argument #2 of type T_IND
Type T_IND is related to memory addresses, so the st workflow is:
- Truncate the indirect value by modulo: <SECOND_ARGUMENT> % IDX_MOD.
- Calculate the address: current position of the cursor + <SECOND_ARGUMENT> % IDX_MOD.
- Write the value from the registry passed as the first argument into the memory address calculated in the previous step.
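The T_IND branch of st mirrors the read performed by ld. The sketch below makes the same assumptions as a typical implementation would need to: circular memory and most-significant-byte-first order, neither of which is stated in the text. The names `wrap` and `st_indirect` are hypothetical.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)
#define REG_SIZE 4

/* Map any (possibly negative) address onto the circular memory. */
static int wrap(int addr)
{
    return (((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE);
}

/* st with a T_IND second argument: write the registry value into memory
** at pc + arg % IDX_MOD, most significant byte first (assumed order). */
void st_indirect(uint8_t *mem, int pc, int32_t reg_value, int arg)
{
    int      addr = pc + arg % IDX_MOD;
    uint32_t bits = (uint32_t)reg_value;

    for (int i = 0; i < REG_SIZE; i++)
        mem[wrap(addr + i)] = (uint8_t)(bits >> (8 * (REG_SIZE - 1 - i)));
}
```

Writing with a negative argument near the start of memory shows the wrap-around: from position 0 with argument -2, the first two bytes land at the end of memory and the last two at addresses 0 and 1.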
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 4 | add | T_REG | T_REG | T_REG |
Description
All arguments of this operation are of the same type, T_REG.
- How it works:
- Add the value of the registry passed as the first argument to the value of the registry passed as the second argument.
- Write the sum into the registry passed as the third argument.
- If the written value is 0, set carry to 1; if the written value is non-zero, set carry to 0.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 5 | sub | T_REG | T_REG | T_REG |
Description
All arguments of this operation are of the same type, T_REG.
- How it works:
- Subtract the value of the registry passed as the second argument from the value of the registry passed as the first argument.
- Write the difference into the registry passed as the third argument.
- If the written value is 0, set carry to 1; if the written value is non-zero, set carry to 0.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
Description
- How it works:
It performs a bitwise AND on the values of the first two arguments and writes the result into the registry passed as the third argument.
If the written value is 0, set carry to 1; if the written value is non-zero, set carry to 0.
The first and second arguments can be of different types. Here is how to get the values we need:
- Argument #1 or Argument #2 of type T_REG
In this case we take the value from the registry passed as argument.
- Argument #1 or Argument #2 of type T_DIR
In this case we take the value passed as argument.
- Argument #1 or Argument #2 of type T_IND
Calculate the address to read the value from: current cursor position + <ARGUMENT> % IDX_MOD.
Read four bytes from memory starting at the address calculated in the previous step. This is the value we need.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
Description
- How it works:
It works like the operation and described above, but performs a bitwise OR.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG |
Description
- How it works:
It works like the operation and described above, but performs a bitwise XOR.
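Since and, or and xor differ only in the operator applied, one helper can cover all three once the argument values have been obtained. This is a sketch; the enum `t_bitwise_op` and the function `bitwise_op` are hypothetical names.

```c
#include <stdint.h>

typedef enum { OP_AND, OP_OR, OP_XOR } t_bitwise_op;

/* Combine two already-resolved argument values. The result is what gets
** written to the third-argument registry; *carry receives the new flag. */
int32_t bitwise_op(t_bitwise_op op, int32_t a, int32_t b, int *carry)
{
    int32_t result;

    if (op == OP_AND)
        result = a & b;
    else if (op == OP_OR)
        result = a | b;
    else
        result = a ^ b;
    *carry = (result == 0);    /* carry is 1 only for a zero result */
    return (result);
}
```

For example, xor of a value with itself always gives 0, so it sets carry to 1.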
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 9 | zjmp | T_DIR | - | - |
Description
This is the operation affected by the value of the flag carry.
- How it works:
If carry is equal to 1, this operation performs a jump: it moves the cursor to the address current position + <FIRST_ARGUMENT> % IDX_MOD.
This allows us to jump back and forth to different places in memory instead of executing everything in order.
If carry is equal to 0, the jump is not performed.
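A possible shape for zjmp is shown below. The function name is hypothetical, and wrapping the resulting address around MEM_SIZE is an assumption about circular memory that the text does not state. When no jump happens the position is left untouched, so the caller is expected to advance past the instruction as usual.

```c
#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)

/* Map any (possibly negative) address onto the circular memory. */
static int wrap(int addr)
{
    return (((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE);
}

/* Perform zjmp: if carry is 1, move *pc and return 1; otherwise leave
** *pc unchanged and return 0. */
int zjmp(int *pc, int carry, int arg)
{
    if (carry != 1)
        return (0);
    *pc = wrap(*pc + arg % IDX_MOD);
    return (1);
}
```

With carry set, `zjmp %-5` executed at position 100 moves the cursor to position 95.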
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
Description
This operation saves a value into the registry that was passed as the third argument.
- How it works:
To get the value, it reads four bytes from memory. The address of the bytes to be read is calculated as follows:
current position + (<VALUE_OF_FIRST_ARGUMENT> + <VALUE_OF_SECOND_ARGUMENT>) % IDX_MOD.
The first and second arguments can be of different types. Here is how to get the values we need:
- Argument #1 or Argument #2 of type T_REG
The value from the registry passed as argument.
- Argument #1 or Argument #2 of type T_DIR
The value passed as argument.
- Argument #1 of type T_IND
Read four bytes from the address: current position + <FIRST_ARGUMENT> % IDX_MOD.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR |
Description
This operation writes the value from the registry that was passed as the first argument.
- How it works:
Address in memory to write the value to: current position + (<VALUE_OF_SECOND_ARGUMENT> + <VALUE_OF_THIRD_ARGUMENT>) % IDX_MOD.
How to get the values of the arguments is described above, in the operation ldi.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 12 | fork | T_DIR | - | - |
Description
Operation fork creates a duplicate of the current cursor (process) and places it at the address current position + <FIRST_ARGUMENT> % IDX_MOD.
**The duplicate cursor is identical to the initial one except for its position.**
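Duplicating a cursor can be sketched as a plain structure copy followed by a change of position. The structure layout and all names here are hypothetical, and two assumptions are made: the target address is relative to the parent's current position, and memory is circular.

```c
#include <stdint.h>

#define MEM_SIZE   (4 * 1024)
#define IDX_MOD    (MEM_SIZE / 8)
#define REG_NUMBER 16

typedef struct s_cursor
{
    int     pc;
    int32_t reg[REG_NUMBER];
    int     carry;
    int     last_live_cycle;
}   t_cursor;

/* Map any (possibly negative) address onto the circular memory. */
static int wrap(int addr)
{
    return (((addr % MEM_SIZE) + MEM_SIZE) % MEM_SIZE);
}

/* fork: copy every field of the parent, then move only the copy. */
t_cursor fork_cursor(const t_cursor *parent, int arg)
{
    t_cursor child = *parent;

    child.pc = wrap(parent->pc + arg % IDX_MOD);
    return (child);
}
```

The parent is passed as a pointer to const, which makes it explicit that fork leaves the original cursor unchanged.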
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 13 | lld | T_DIR / T_IND | T_REG | - |
Description
It is a more powerful version of operation ld (see above).
The only difference between them is that, when the first argument is of type T_IND, we read four bytes from the address: current position + <FIRST_ARGUMENT>. No modulo truncation is applied.
The problems of the original virtual machine
The original VM corewar unfortunately malfunctions and reads two bytes instead of four. Perhaps this bug has the same explanation as the problems in the provided files: "... we might have mistaken a bottle of water for a bottle of vodka."
With an argument of type T_DIR it is identical to operation ld.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG |
Description
It is a more powerful version of operation ldi (see above).
We read four bytes from the address: current position + (<VALUE_OF_FIRST_ARGUMENT> + <VALUE_OF_SECOND_ARGUMENT>). No modulo truncation is applied.
To get the value of an argument of type T_IND we still apply the modulo: <VALUE_OF_FIRST_ARGUMENT> is the four bytes read from the address current position + <FIRST_ARGUMENT> % IDX_MOD.
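The only difference between the ldi and lldi address formulas is the truncation, which the two small functions below put side by side. The function names are hypothetical; both return the offset to add to the current position, and the sum is computed in 64 bits as a precaution against overflow.

```c
#include <stdint.h>

#define MEM_SIZE (4 * 1024)
#define IDX_MOD  (MEM_SIZE / 8)

/* ldi: the sum of the two argument values is truncated by IDX_MOD. */
int64_t ldi_offset(int32_t first, int32_t second)
{
    return (((int64_t)first + second) % IDX_MOD);
}

/* lldi: the same sum, used without truncation. */
int64_t lldi_offset(int32_t first, int32_t second)
{
    return ((int64_t)first + second);
}
```

For the argument values 500 and 100, ldi reads 88 bytes ahead of the cursor while lldi reads 600 bytes ahead.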
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 15 | lfork | T_DIR | - | - |
Description
It is a more powerful version of operation fork (see above).
No modulo truncation is applied: <FIRST_ARGUMENT> is used without % IDX_MOD.
| Operation code | Operation name | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 16 | aff | T_REG | - | - |
Description
This operation takes the value from the registry passed as argument, casts it to the type char and prints it to the standard output.
(char)(value)
aff in original corewar
In the original VM corewar, aff is switched off by default. To see its output we have to use the flag -a.
| Code | Name | Argument #1 | Argument #2 | Argument #3 | Changes carry | Description |
|---|---|---|---|---|---|---|
| 1 | live | T_DIR | - | - | No | alive |
| 2 | ld | T_DIR / T_IND | T_REG | - | Yes | load |
| 3 | st | T_REG | T_REG / T_IND | - | No | store |
| 4 | add | T_REG | T_REG | T_REG | Yes | addition |
| 5 | sub | T_REG | T_REG | T_REG | Yes | subtraction |
| 6 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise AND (&) |
| 7 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise OR (\|) |
| 8 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | Yes | bitwise XOR (^) |
| 9 | zjmp | T_DIR | - | - | No | jump if carry is 1 |
| 10 | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | No | load index |
| 11 | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR | No | store index |
| 12 | fork | T_DIR | - | - | No | fork |
| 13 | lld | T_DIR / T_IND | T_REG | - | Yes | long load |
| 14 | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | Yes | long load index |
| 15 | lfork | T_DIR | - | - | No | long fork |
| 16 | aff | T_REG | - | - | No | aff |
One more detail about operations.
Another important parameter is cycles to wait.
This is the number of cycles a process waits before it executes the operation.
For example, if a cursor (process) lands on the operation fork, it must stay idle for 800 cycles before it can actually execute the operation.
Operation ld, by contrast, stops the process for only five cycles.
This parameter shapes the game mechanics: the most effective and useful operations have the highest cost.
| Code | Name | Cycles to wait |
|---|---|---|
| 1 | live | 10 |
| 2 | ld | 5 |
| 3 | st | 5 |
| 4 | add | 10 |
| 5 | sub | 10 |
| 6 | and | 6 |
| 7 | or | 6 |
| 8 | xor | 6 |
| 9 | zjmp | 20 |
| 10 | ldi | 25 |
| 11 | sti | 25 |
| 12 | fork | 800 |
| 13 | lld | 10 |
| 14 | lldi | 50 |
| 15 | lfork | 1000 |
| 16 | aff | 2 |
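In code, the table above maps naturally onto an array indexed by operation code. The names `g_wait_cycles` and `cycles_to_wait` are hypothetical, and returning 0 for a byte that is not a valid operation code is an assumption of this sketch.

```c
#define OP_COUNT 16

/* Cycles to wait, indexed by operation code; index 0 is unused. */
static const int g_wait_cycles[OP_COUNT + 1] = {
    0,
    10,   /*  1 live  */
    5,    /*  2 ld    */
    5,    /*  3 st    */
    10,   /*  4 add   */
    10,   /*  5 sub   */
    6,    /*  6 and   */
    6,    /*  7 or    */
    6,    /*  8 xor   */
    20,   /*  9 zjmp  */
    25,   /* 10 ldi   */
    25,   /* 11 sti   */
    800,  /* 12 fork  */
    10,   /* 13 lld   */
    50,   /* 14 lldi  */
    1000, /* 15 lfork */
    2     /* 16 aff   */
};

/* Return the waiting time for an operation code, or 0 if the code is
** not one of the 16 operations (assumed behavior). */
int cycles_to_wait(int opcode)
{
    if (opcode < 1 || opcode > OP_COUNT)
        return (0);
    return (g_wait_cycles[opcode]);
}
```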