-
Notifications
You must be signed in to change notification settings - Fork 0
Lexical_analysis
Before the actual translation of the assembly code into the byte-code, program will perform a so called lexical analysis of the source code.
In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).
Source: Lexical analysis — Wikipedia
the campion's code will be parsed into components, each of them being assigned to a certain type.
For example, following string:
loop: sti r1, %:live, %1
... will be decomposed into tokens:
| Nr | Content | Type |
|---|---|---|
| 1 | loop: |
LABEL |
| 2 | sti |
INSTRUCTION |
| 3 | r1 |
REGISTER |
| 4 | , |
SEPARATOR |
| 5 | %:live |
DIRECT_LABEL |
| 6 | , |
SEPARATOR |
| 7 | %1 |
DIRECT |
It is possible to study in detail how the provided translator program identifies each type of token using the error messages it displays.
Brute-forcing original asm with wrong inputs and reading it's error messages we can find out the following:
Places of commands for name and comment definitions can be switched
.comment "This city needs me"
.name "Batman"
STRING token contains everything between opening and closing double quotation marks ", even if there are new line characters \n and comment characters (# or ;) in between.
The following example is absolutely correct:
.name "one
#two
three"
If translator can distinguish two components without whitespaces in between them, spaces can be omitted (valid examples):
live%42
loop:sti r1, %:live, %1
.name"Batman"
.comment"This city needs me"
But obviously there are cases where whitespaces are mandatory.
Arguments can contain -, but a + will lead to an error message.
There also can be present some leading zeroes:
live %0000042
The provided asm recognises as a registry a string containing character r with one or two trailing digits.
The following examples are valid:
r2
r01
r99
Last example deserves some attention.
Lack of error messages on registry 99 means asm doesn't know about virtual machines' configuration and is not connected to the constant REG_NUMBER.
This is absolutely logic. Translator asm and the virtual machine are different programs which are not aware of each other's configuration (or even existence).
Invalid byte-code may become valid (and vice versa) if we change value of the constant REG_NUMBER.
If value of
REG_NUMBERis16, argumentr42is incorrect. If value of constant is49argument becomes valid.
This two arguments are translated without error messages by the original asm. But this contradicts to the subject of the project, which clearly states:
Registry: (
r1<–>rxwithx=REG_NUMBER)
So if we can explain behaviour of translator with the top boundary (it's not connected to REG_NUMBER), arguments r0 and r00 are always invalid.
The size of executable code is not restricted by the translator. It will translate both files with one instruction and with one million instructions.
The size is limited in the virtual machine. The max size is defined in the constant CHAMP_MAX_SIZE in the file op.h:
# define MEM_SIZE (4 * 1024)
// ...
# define CHAMP_MAX_SIZE (MEM_SIZE / 6)
So max size is 682 bytes.
The min size is not limited by the virtual machine (it will run a champion without executable code).
But it is quiet difficult to create such a file. If .s file contains name and comment of champion and nothing more, it is invalid. But if we add at least one label to the end of file, it will be successfully translated.
.comment "This city needs me"
.name "Batman"
loop:
# EOF