
2: A Reduced Java Model

Because it is beyond the scope of this thesis to explore a completely reworked Java language system based on statement trees, it is necessary for us to reduce the Java language and its run-time environment to one much less burdened by complexity.

The Picojava model was the reduced language system developed and chosen to form the basis of both statement tree and code interpreters implemented to test this thesis.

There are two components of the Picojava model, just as there are two components of the Java system:

Many elements of the Java source language which were considered non-essential to this project were removed, and as a consequence many instructions corresponding to these language features in the Java VM were also removed.

Because the Java VM instruction set has many peculiarities in terms of unequal support for data types, and contains instructions which perform multiple language operations at once, other changes have been made to the Picojava instruction set to give it more symmetry and to make it more directly comparable with the statement-based approach.

2.1 Concrete Lexicon and Syntax

The Picojava source and assembly languages have no defined concrete lexicon or syntax. The source language has an abstract syntax given in Appendix A, and the assembly language has an abstract syntax given in Appendix B. The only fundamental lexical elements used in both these syntaxes are literals of the supported data types (§2.3) and identifiers (§2.2). A concrete representation of these elements is not formally defined, although a representation will be suggested so that Picojava programs may be more readily constructed and analysed. This will be the approach taken when describing all Picojava constructs.

In addition, the abstract syntax for the assembly language has an additional fundamental lexical element, an index. This index is a simple non-negative integer, and has two distinct purposes within the grammar:

When specifying Picojava source language programs in this thesis, we will use a concrete syntax based on that of the Java source language. It is the intention that programs written in Picojava form a subset of programs possible in Java. The one important difference which makes some non-trivial translation necessary is in the Picojava implementation of the return statement and corresponding instructions areturn, ireturn, and breturn. The assignment statement is extended to accommodate the semantics of return (§2.17.9) and these are carried through into new definitions for the return instructions (§2.15.3).

An example of the concrete syntax for the source language is:


class EmptyLoop
{
  EmptyLoop()
  {
  }

  void main()
  {
    int count;

    while (((count)<(1000))) {
      count=((count)+(1));
    }
  }
}

The Picojava assembly language programs used in this thesis will be specified in a syntax which approximates the output of the javap utility when disassembling class files. However, many elements are borrowed from the Java source language, such as the bracketing of classes and methods.

An example of this syntax, which has equivalent semantics to the previous example, is:


class EmptyLoop
{
  EmptyLoop()
  {
  }

  void main()
  {
    int count;

    0 goto 5
    1 iload (count)
    2 iconst (1)
    3 iadd
    4 istore (count)
    5 iconst (1000)
    6 iload (count)
    7 icmplt
    8 iftrue 1
  }
}

2.2 Identifiers

An identifier is a lexical element of the Picojava language used to identify a class, a method or a variable. When creating a Picojava program, one must also create the identifiers used in it. This is quite separate from the run-time concepts of a defining instance and a referring instance (which are explained in §3.1.1).

Once an identifier is created, it may be inserted into the Picojava program at many points to attain the required semantics. At run-time, the same identifier may refer to many different entities, but in any context, scoping rules will leave only one meaning relevant at each point it occurs with no ambiguity.

For the purposes of including the abstract identifiers in Picojava in a concrete listing, these identifiers are also often given a name in the same way as a Java identifier.

2.2.1 Qualified Identifiers

Because Picojava programs allow variables of primitive types, and even further objects, to exist as fields within objects, there is a necessity for the concept of a qualified identifier to refer to components of objects. Also, the language requires some way of specifying the object that contains the method to be called. Therefore, qualified identifiers are used in all contexts where a reference to a method, an existing variable of a primitive type or an existing object reference is required.

A qualified identifier consists of an ordered series of identifiers which, together, refer to a field of some arbitrarily nested object, which may be of a primitive type or an object reference itself. Each identifier is dereferenced until a single identifier remains, and the meaning of this is looked up at the point in the object tree thereby reached.

For the purposes of including the qualified identifiers in Picojava in a concrete listing, we use a syntax equivalent to that used to specify a Java qualified identifier. That is, the name given to each identifier in the qualified identifier is listed, each name separated from the preceding one by a period character.

Qualified identifiers may refer to the special object this in their initial component. This refers to the object containing the currently executing method. In Picojava, it is not possible to refer to a field without identifying both the object containing the field and the field itself. Therefore this should always be used explicitly within a class to refer to its own instance variables.
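The dereferencing of a qualified identifier can be sketched in Java as follows. This is only an illustrative model and not the interpreter developed for this thesis: the names PicoObject, fields and resolve are hypothetical, and objects are modelled simply as maps from field names to values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a Picojava object: a collection of named fields.
final class PicoObject {
    final Map<String, Object> fields = new HashMap<>();
}

final class QualifiedIdentifierDemo {
    // Resolves a qualified identifier such as "this.inner.value".
    // The first component is looked up in the current scope (assumed here to
    // hold "this" and any visible variables); each later component is looked
    // up as a field of the object reached so far.
    static Object resolve(Map<String, Object> scope, String qualified) {
        String[] parts = qualified.split("\\.");
        Object current = scope.get(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            if (!(current instanceof PicoObject)) {
                throw new IllegalStateException("cannot dereference " + parts[i - 1]);
            }
            current = ((PicoObject) current).fields.get(parts[i]);
        }
        return current;
    }

    public static void main(String[] args) {
        PicoObject inner = new PicoObject();
        inner.fields.put("value", 42);

        PicoObject outer = new PicoObject();
        outer.fields.put("inner", inner);

        Map<String, Object> scope = new HashMap<>();
        scope.put("this", outer);

        System.out.println(resolve(scope, "this.inner.value")); // prints 42
    }
}
```

The sketch treats this as an ordinary entry in the scope, which keeps the lookup uniform; a real interpreter might well handle it specially.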

2.3 Literals

A literal is a direct representation of a value of a primitive type (§2.4.1) or the null type (§2.4). Unlike identifiers (§2.2), you may create the same literal as many times as you like, because it is not the instance of the literal but the value which is important to semantics.

The null type has one value, the null reference, denoted in listings by the literal null. In the Picojava source language, which does not have a terminal representation of the null value or type, the statement Statement_Action_Evaluate_Constant_Structured is used, which will return the null reference. Similarly there is an instruction, Instruction_Action_Evaluate_Constant_Structured, in the Picojava assembly language which will return the null reference.

The boolean type has two values, denoted in listings by the literals true and false. The int type has 4 294 967 296 values, denoted in listings by numeric literals in the range -2 147 483 648 to -1, and 0 to 2 147 483 647.

2.4 Types And Values

Picojava, like Java, is a strongly typed language. However, typing errors will typically be reflected by exceptions occurring in the interpreter at run-time, and perhaps not even then. It is not Picojava's aim to provide a safe programming environment, and ultimate responsibility lies with the programmer to create programs which obey the precepts of strong typing.

Picojava types are divided into two categories: primitive types and structured types (called reference types in Java). The null type is a special structured type with a single value denoting the null structure. The null type cannot be referenced within Picojava programs, as is the case with this type in Java. The null type is compatible with all structured types[1].

Corresponding to the categories of Picojava types, there are two categories of Picojava values: primitive values and structured values. As for Java, these values can be stored in variables or formal parameters to a method, passed as actual parameters to a method, returned from methods, and be operated upon.

2.4.1 Primitive Types And Values

The primitive types Picojava defines are the boolean type and the int type and correspond to the respective primitive types of the same names defined within Java. Within the Picojava grammar, the collective name Algebraic is used to represent these primitive types. Also within this grammar, the int type is referred to as Algebraic_SignedInteger5 and the boolean type is Algebraic_Boolean.

As noted in §2.3, the boolean type has two values, true and false and the int type has 4 294 967 296 values, consisting of integers in the range -2 147 483 648 to -1, and 0 to 2 147 483 647.

For the sort of empirical benchmarks the Picojava interpreter will be running, there will be no need for larger or shorter integer types than the mid-range 32-bit int type. Integers will mainly be required as commodities for benchmarking the integer operations (where the actual values are irrelevant) and as counters for loops (where the values will be relatively small). Therefore, there will be no need to support further numeric types. The char type is unnecessary as the Picojava interpreter will not be running programs that will require any input or output.

2.4.2 Operators On Integral Values

Picojava reduces the wealth of operators available in Java to those essential for basic numeric comparison and arithmetic. As for Java, the integer operators do not indicate overflow or underflow, and the values simply wrap around in these situations. An attempt within the Picojava interpreter to divide by zero will result in an exception within the interpreter.

The Picojava source language supports the following operations on integer operands which return integer results:

Source language grammar production name        Concrete representation
Statement_Action_Evaluate_BinaryOp_Sum         (<augend>+<addend>)
Statement_Action_Evaluate_BinaryOp_Difference  (<minuend>-<subtrahend>)
Statement_Action_Evaluate_BinaryOp_Product     (<multiplicand>*<multiplier>)
Statement_Action_Evaluate_BinaryOp_Quotient    (<dividend>/<divisor>)

↑ Table 2.1 Statements For Integer Operations

The Picojava assembly language has the following corresponding operations:

Assembly language grammar production name        Concrete representation
Instruction_Action_Evaluate_BinaryOp_Sum         iadd
Instruction_Action_Evaluate_BinaryOp_Difference  isub
Instruction_Action_Evaluate_BinaryOp_Product     imul
Instruction_Action_Evaluate_BinaryOp_Quotient    idiv

↑ Table 2.2 Instructions For Integer Operations

There is no incentive to provide any further integer arithmetic operations over these, as all others provided by Java can either be emulated conveniently within this set (e.g. unary negation of value becomes ((0)-(value))), or are unlikely to be used by benchmarks (e.g. shift operations).
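The wrap-around behaviour and the emulation of unary negation can be illustrated with ordinary Java, whose int operators the text describes Picojava as following. This is a sketch only: the class and method names are hypothetical, and the Java ArithmeticException merely stands in for whatever exception the Picojava interpreter raises on division by zero.

```java
final class IntegerOpsDemo {
    // Sum: overflow is not signalled, the result simply wraps around.
    static int add(int a, int b) {
        return a + b;
    }

    // Unary negation emulated with the Difference operation: ((0)-(value)).
    static int negate(int value) {
        return 0 - value;
    }

    // Quotient: in Java, a zero divisor raises ArithmeticException.
    static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // prints true
        System.out.println(negate(5));                                      // prints -5
        try {
            divide(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("division by zero raised an exception");
        }
    }
}
```

Note that negating the most negative int wraps back to the same value, since 2 147 483 648 is not representable.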

The Picojava source language supports the following operations on integer operands which return boolean results:

Source language grammar production name   Concrete representation
Statement_Action_Evaluate_BinaryOp_Less   (<minuend><<subtrahend>)
Statement_Action_Evaluate_BinaryOp_Equal  (<minuend>=<subtrahend>)

↑ Table 2.3 Statements For Integer Comparison Operations

Corresponding to these, the Picojava assembly language has the following operations:

Assembly language grammar production name   Concrete representation
Instruction_Action_Evaluate_BinaryOp_Less   icmplt
Instruction_Action_Evaluate_BinaryOp_Equal  icmpeq

↑ Table 2.4 Instructions For Integer Comparison Operations

Again, it is not necessary to provide any further integer comparison operations over these, as all others provided by Java can be emulated conveniently within this set.
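As an illustration of such emulation, the following Java sketch derives Java's other four integer comparison operators from Less and Equal alone, together with the Invert operation of §2.4.3. The class and method names are hypothetical and do not come from the Picojava grammars.

```java
final class ComparisonEmulation {
    // The two comparison operations assumed to be available.
    static boolean less(int a, int b)  { return a < b; }
    static boolean equal(int a, int b) { return a == b; }

    // a > b   is   b < a
    static boolean greater(int a, int b)        { return less(b, a); }

    // a <= b  is   !(b < a)
    static boolean lessOrEqual(int a, int b)    { return !less(b, a); }

    // a >= b  is   !(a < b)
    static boolean greaterOrEqual(int a, int b) { return !less(a, b); }

    // a != b  is   !(a = b)
    static boolean notEqual(int a, int b)       { return !equal(a, b); }
}
```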

2.4.3 Operators On Boolean Values

The operators provided for the boolean type are restricted to those which are absolutely essential. Only Boolean expressions can be used in Picojava's flow control statements, as is the case for Java. If an attempt is made to use an expression which evaluates to something other than a boolean value, the Picojava interpreter will stop with an exception.

The Picojava source language supports the following operations on boolean operands (which return boolean results):

Source language grammar production name            Concrete representation
Statement_Action_Evaluate_UnaryOp_Invert           (!<operand>)
Statement_Action_Evaluate_BinaryOp_ConditionalOr   (<operand>||<operand>)
Statement_Action_Evaluate_BinaryOp_ConditionalAnd  (<operand>&&<operand>)

↑ Table 2.5 Statements For Boolean Operations

For details on how these statements are used to compose expressions, see §2.17.

The corresponding operations in the assembly language are as follows:

Assembly language grammar production name            Concrete representation
Instruction_Action_Evaluate_UnaryOp_Invert           bnot
Instruction_Action_Evaluate_BinaryOp_ConditionalOr   bor
Instruction_Action_Evaluate_BinaryOp_ConditionalAnd  band

↑ Table 2.6 Instructions For Boolean Operations

A precise definition of the action of these instructions is given in §2.16.6.

There is no need to provide any further Boolean operations, as these can be conveniently emulated as required, and they occur relatively rarely. It should be noted that Instruction_Action_Evaluate_BinaryOp_ConditionalOr and Instruction_Action_Evaluate_BinaryOp_ConditionalAnd are conditional only in the sense that the instruction will not consider the second operand if the first determines the result; both operands will still need to be evaluated. The behaviour of Statement_Action_Evaluate_BinaryOp_ConditionalOr and Statement_Action_Evaluate_BinaryOp_ConditionalAnd (see §2.17.4) would therefore need to be emulated by a code sequence including a conditional branch, and not merely through a single corresponding instruction.
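The distinction can be made concrete with a small Java sketch that counts operand evaluations. It assumes the source-level ConditionalAnd behaves like Java's && and skips its second operand when the first is false. All names here are hypothetical: band models the instruction, which receives two already-evaluated operands, while conditionalAnd models the branching code sequence.

```java
final class ConditionalAndDemo {
    static int evaluations = 0;

    // An operand whose evaluation is counted, so that skipped operands show up.
    static boolean operand(boolean value) {
        evaluations++;
        return value;
    }

    // Models the band instruction: both operands are already on the operand
    // stack, so both were evaluated before the instruction executes.
    static boolean band(boolean first, boolean second) {
        return first ? second : false;
    }

    // Models the statement: the second operand is evaluated only if the first
    // is true, which needs a conditional branch around its evaluation.
    static boolean conditionalAnd(boolean first, boolean second) {
        boolean result = operand(first);
        if (result) {
            result = operand(second);
        }
        return result;
    }

    // Number of operand evaluations when using the single instruction.
    static int countWithInstruction(boolean first, boolean second) {
        evaluations = 0;
        band(operand(first), operand(second));
        return evaluations;
    }

    // Number of operand evaluations when using the branching sequence.
    static int countWithBranch(boolean first, boolean second) {
        evaluations = 0;
        conditionalAnd(first, second);
        return evaluations;
    }

    public static void main(String[] args) {
        System.out.println(countWithInstruction(false, true)); // prints 2
        System.out.println(countWithBranch(false, true));      // prints 1
    }
}
```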

2.4.4 Structured Types, Objects And Structured Values

In Picojava there is one reference type (called a Structured type within Picojava), corresponding to Java's class type. There are no interface types or array types. These two forms of reference types increase the expressive power of the Java language (and in the case of arrays, also allow for an efficient machine level implementation) but Picojava has equivalent modelling power, in that it can do without these constructs and still allow the creation of semantically equivalent programs.

As in Java, an object (or Structured_Object, as it is called in the Picojava grammars) is a dynamically created class instance, created by a class instance creation expression. The reference values are pointers to these objects or the null reference, which refers to no object. Many references can refer to the same object, with semantics as would be expected for a Java-like language. There are no mechanisms which allow the explicit destruction of an object which has been previously created.

The main differences in Picojava are that reference types do not form a hierarchy (i.e. there is no inheritance[2]), and there are no predefined classes (hence there is no Object class). Inheritance can be removed because Picojava is not intended to be a programming language in which useful programs can be written, and benchmarking programs, unless testing the efficiency of inheritance itself, can do without it.

Under the source language, the object creation expression is defined by the production named Statement_Action_Create_Structured_Object. The assembly language instruction with corresponding semantics is Instruction_Action_Create_Structured_Object. The concrete representations are new <class>(<parameter>, ...) and new_<parameter count> (<class>) respectively.

2.4.5 Operators On Objects

The operators available for objects are field access, method call and comparison for equality (congruence).

Field access is achieved through Statement_Action_Reference_<field type> (in the Picojava source language) or through Instruction_Action_Load_<field type> and Instruction_Action_Store_<field type> (in the Picojava assembly language).

Method call is achieved by Statement_Action_Evaluate_Method (in the Picojava source language) or by Instruction_Action_Evaluate_Method (in the Picojava assembly language).

Comparison for equality is an operator which takes two structured operands and returns a boolean. In the source language it is defined by the production Statement_Action_Evaluate_BinaryOp_Congruence and in the assembly language the production is Instruction_Action_Evaluate_BinaryOp_Congruence. It is intended that this comparison be used in conjunction with the null literal to determine if a structure is equivalent to the null structure (i.e. is a reference to no object). Any other uses may have implementation-dependent behaviour.

2.5 Variables

A variable is a storage location which has a type (either a primitive type or a structured type, as discussed in §2.4) associated with it. A variable always contains a value which belongs to its type, primitive or structured.

There are four kinds of variables used by Picojava:

  1. An instance variable is a field declared within a class declaration. For each field declared in a class, a new instance variable is created and initialized to a default value (see §2.5.1) as part of each newly created object of that class. The instance variable is effectively destroyed when the object of which it is a field is no longer referenced.
  2. Method parameters name argument values passed to a method. For every parameter declared in a method declaration, a new parameter variable is created each time that method is invoked. The new variable is assigned with the corresponding actual parameter value from the method invocation. The variable is accessible until it ceases to exist or its identifier is reused by a subsequently invoked method, constructor or nested block. The method parameter ceases to exist when the execution of the method body is complete.
  3. Constructor parameters name argument values passed to a constructor. For every parameter declared in a constructor declaration, a new parameter variable is created each time a class instance creation expression referring to the parent class is invoked. The new variable is assigned with the corresponding actual parameter value from the creation expression. The variable is accessible until it ceases to exist or its identifier is reused by a subsequently invoked method, constructor or nested block. The constructor parameter ceases to exist when the execution of the constructor body is complete.
  4. Local variables are declared by local variable declaration statements. Whenever the flow of control enters a block, a new variable is created and initialized to a default value (see §2.5.1) for each local variable declared in a local variable declaration clause immediately inside that block. The variable is accessible until it ceases to exist or its identifier is reused by a subsequently invoked method, constructor or nested block. The local variable ceases to exist when the execution of the immediate parent block is complete.
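The four kinds of variable can be pointed out in a short Java class. This is an illustrative sketch in plain Java, not a Picojava listing: the class Counter and its members are hypothetical, and the return statement has ordinary Java semantics, not those of Picojava's extended assignment.

```java
final class Counter {
    int total;                      // 1. instance variable (field)

    Counter(int start) {            // 3. constructor parameter
        this.total = start;
    }

    int addTwice(int amount) {      // 2. method parameter
        // 4. local variable. Java insists on an explicit initial value here,
        //    whereas Picojava gives every local variable a default value.
        int doubled = 0;
        doubled = amount + amount;
        this.total = this.total + doubled;
        return this.total;
    }
}
```

Fields are accessed through this throughout, in keeping with the rule of §2.2.1 that a field cannot be named without also naming its containing object.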

2.5.1 Initial Values Of Variables

Every variable in a Picojava program has some value from the moment it is created. The initial value of each type of variable is defined as follows:

Note that constructor and method parameters immediately have this initial value replaced by the value of the actual parameter used in the invoking expression.

2.6 Conversions And Promotions

In Picojava, there is only one type of conversion, the identity conversion. An identity conversion is permitted for any type, and is a conversion from a type to that same type. This is actually consistent with Java in that there are only two primitive types, int and boolean (which are not convertible), and there is no inheritance (so no two classes are convertible to one another).

Conversion is performed in two contexts:

2.7 Names And Packages

2.7.1 Names

Picojava does not define the syntax of names, because names have no place in the abstract syntax of Picojava. Identifiers (§2.2) and qualified identifiers (§2.2.1) suffice to specify entities in the Picojava language.

Names are often given in concrete Picojava listings, and in these cases the names follow the ordinary Java conventions, as noted earlier.

2.7.2 Packages

There is no such thing as a package in Picojava. Packages are convenient for constructing modular programs, but Picojava is not intended to provide facilities for building anything but the most elementary of software systems.

Intuitively, the behaviour of Picojava is akin to that of Java where all classes belong to the unnamed package. However, the concept of this package can be successfully merged with the broader concept of a compilation unit (the root structure of the abstract Picojava tree, CompilationUnit). The key difference is that compilation units are completely disjoint and may not refer to each other.

2.7.3 Members

The compilation unit has members which are the classes (the only structured type declarations permissible in Picojava) of a program. Structured types also have internal structure, and elements of this structure are similarly referred to as members. The members of structured types are therefore either fields or methods.

2.7.4 The Members Of A Class Type

The members of a class type (called ClassBased_Class in the grammar) are simply the fields and methods belonging to the definition of that class. Classes may have a method and a field with the same identifier, as is the case in Java, but overloading of methods is not possible.

Constructors are considered to be special methods within a class which can only be executed once: when an instance of a class is instantiated. Constructors do not have identifiers, and so multiple constructors must be distinguished by a different means than ordinary methods. Typically this would be through the type signature of the parameters, but in a simple interpreter, an index to the correct constructor may be preferred.

2.8 Classes

A class declaration specifies a new structured type and provides details of its implementation. The relevant Picojava production in both the source and assembly language grammars is ClassBased_Class. Each class is disjoint, but a class may instantiate objects of another class.

The body of a class declares its members (fields and methods, including constructors). Depending on whether the Picojava source language or assembly language is being used, the way the methods are coded will be different.

One class, called the main class, must contain a method with no parameters called main. This class should be the first one declared in a Picojava program, unless the interpreter provides some external way of specifying the main class at run-time.

2.8.1 Class Identifiers

The identifier of a class is used within class instance creation expressions when creating an instance of that class, and within declarations to specify the type of a structured variable.

2.8.2 Class Modifiers

Picojava provides no class modifiers, for similar reasons to those for justifying the omission of packages (§2.7.2).

2.8.3 Superclasses And Subclasses

Picojava does not support inheritance, so the concepts of subclass and superclass are lost.

2.8.4 The Class Members

The production for ClassBased_Class in both grammars refers to components defined by the productions Declaration and Method. The definition of Method is different for each approach, with the Method production in the source language grammar referring to a Block consisting of Statements, whereas the assembly language grammar refers to a Block consisting of Instructions. Declarations immediately within a class simply declare the instance variables, or fields, of that class. A class must always contain at least one constructor.

2.9 Fields

Fields are members of the Picojava ClassBased_Class defined by the class of productions beginning with the prefix Declaration. Fields in Picojava are always instance variables, and thus the two main classes of Declarations are Declaration_Algebraic (consisting of Declaration_Algebraic_SignedInteger5 and Declaration_Algebraic_Boolean) and Declaration_Structured.

Each Declaration_Algebraic field has only one identifier that gives the name that will be used to refer to that variable within instances of the class (i.e. a defining instance of that identifier). The type is given by the suffix of the actual production used to declare the field.

Each Declaration_Structured field has two identifiers, one specifying the identifier which will be used to refer to that variable in future (i.e. a defining instance) and the identifier of the class which gives the type of that variable (i.e. a referring instance).

There is no difference between the specification of a field in the source language grammar and the assembly language grammar.

2.9.1 Field Modifiers

Picojava defines no field modifiers. Fields may be thought of intuitively as having public access, and are never shared across instances (i.e. they are instance variables: not static and therefore not class variables).

2.10 Methods And Constructors

Methods declare executable code that can be called upon to return a value (or no value) given a fixed number of arguments. Method declarations are members of a ClassBased_Class production. As Picojava does not support inheritance, there is no facility for overriding a method's behaviour.

Each method has an identifier which is used to distinguish it from other methods within a class. However, special Methods called constructors are given no name, and these are called only when a new instance of a class is created. It is necessary to specify in some other way which constructor should be called, perhaps by a simple index. Constructors cannot return a value; the instance creation expression which causes them to be invoked returns the new instance of the class.

Method declarations consist of a series of formal parameters and a method body, called the Block (§2.14). Under the source language grammar the executable part of a Block consists of Statements (§2.15) while under the assembly language grammar, Instructions (§2.16) compose the Block.

2.10.1 Formal Parameters

The formal parameters for a Method (constructor or otherwise) are given by a series of members with the prefix Parameter. The formal parameters are the variables to which the actual parameters are assigned for the duration of the method or constructor call (see §2.5), and thus the two main classes of Parameters are Parameter_Algebraic (consisting of Parameter_Algebraic_SignedInteger5 and Parameter_Algebraic_Boolean) and Parameter_Structured.

Each Parameter_Algebraic field has only one identifier that gives the name that will be used to refer to that parameter within the Block of the method (i.e. a defining instance). The type is given by the suffix of the actual production used to specify the formal parameter.

Each Parameter_Structured field has two identifiers, one specifying the identifier which will be used to refer to that parameter within the Block (i.e. a defining instance) and the identifier of the class which gives the type of that variable (i.e. a referring instance).

Once created, the formal parameter is immediately assigned with the corresponding actual parameter specified in the method call expression, or in the case of a constructor, the relevant actual parameter in the class instance creation expression.

There is no difference between the specification or semantic behaviour of a formal parameter in the source language grammar and the assembly language grammar.

2.10.2 Method Modifiers

Picojava defines no method modifiers. Methods may be thought of intuitively as having public access, and are never called without reference to an instance (i.e. they are instance methods: not static and therefore not class methods).

Even the main method of the main class has a specially created instance in which to run. This instance has no name, but can be referred to using the special this provision in qualified identifiers (§2.2.1).

2.11 Static Initializers

Picojava does not support static initializers in any form. All variables begin with the respective default values for their types.

2.12 Interfaces

As Picojava does not support inheritance, and interfaces provide for inheritance-like hierarchies of classes, there are no interfaces in Picojava.

2.13 Arrays

Arrays merely provide for a convenient and more efficient implementation of list-like structures at the machine level, and as such they are not necessary in a reduced language model like Picojava.
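To illustrate how a list-like structure can stand in for an array using only the features Picojava retains (classes, fields, while loops and integer arithmetic), here is a sketch in plain Java. The classes Cell and IntList are hypothetical, and return is used with its ordinary Java meaning.

```java
// One element of the list: a value plus a reference to the next element.
final class Cell {
    int value;
    Cell next;

    Cell() {
    }
}

// A singly linked list of ints offering indexed access.
final class IntList {
    Cell head;
    int length;

    IntList() {
    }

    // Adds a value at the front of the list.
    void prepend(int value) {
        Cell cell = new Cell();
        cell.value = value;
        cell.next = this.head;
        this.head = cell;
        this.length = this.length + 1;
    }

    // Emulates array indexing by walking index links from the head.
    // No bounds checking is attempted, in keeping with Picojava's
    // position that safety is the programmer's responsibility.
    int get(int index) {
        Cell current = this.head;
        int i = 0;
        while (i < index) {
            current = current.next;
            i = i + 1;
        }
        return current.value;
    }
}
```

Indexed access here costs time proportional to the index, which is the efficiency an array would otherwise provide.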

2.14 Blocks

A Block forms the executable component of a Method, or the body of a loop or conditional construct in the Picojava source language. In the Picojava source language, a Block is composed of Declarations and Statements (§2.15), while in the Picojava assembly language, a Block is composed of Declarations and Instructions (§2.16).

The Declarations specify the local variables to be created, and the ordered list of Statements or Instructions determine how execution should proceed.

2.14.1 Local Variable Declarations

The local variable declarations are common to the forms of Block used under both the source language and assembly language grammars. Local variables can be referred to throughout their immediately enclosing block, as well as within enclosed blocks (and even within method invocations[3]) which do not redefine them[4]. This domain of reference is referred to as the scope of the local variable.

Local variables are declared by the class of productions beginning with the prefix Declaration. The two main classes of Declarations are Declaration_Algebraic (consisting of Declaration_Algebraic_SignedInteger5 and Declaration_Algebraic_Boolean) and Declaration_Structured.

Each Declaration_Algebraic field has only one identifier that gives the name that will be used to refer to that variable within the scope of the immediately enclosing block (i.e. a defining instance of that identifier). The type is given by the suffix of the actual production used to declare the field.

Each Declaration_Structured field has two identifiers, one specifying the identifier which will be used to refer to that variable in the scope of the immediately enclosing block (i.e. a defining instance) and the identifier of the class which gives the type of that variable (i.e. a referring instance).

2.15 Statements

The defining component of the Picojava source language grammar is the Statement production, which is used to build up the executable portion of Blocks in that grammar. There are two broad classes of statements: those used for flow control (conditional and iteration statements) and those which actually compute (expression statements).

Statements are listed in order within a Block, and are executed in the same order. But as a Statement may typically be composed of further Blocks or Statements, the invocation pattern of the atomic statements is that of a top to bottom depth-first tree traversal. There is no explicit branching[5] from statement to statement, although there is the possibility that the component Block of a flow control statement will be executed once, many times or not at all.

2.15.1 Conditional Statement

The conditional statement in Java is the if statement. The corresponding Picojava statement is Statement_Conditional, which consists of an expression statement (Statement_Action) which evaluates to a boolean value and a Block.

The Block is executed if and only if the expression (called the condition) evaluates to true. There is no action if the expression evaluates to false (i.e. Picojava does not support an else clause).

2.15.2 Iteration Statement

The only Java iteration statement defined by Picojava is the while statement, with its counterpart in Picojava being called Statement_Iteration. Statement_Iteration consists of a conditional statement (Statement_Conditional) which is executed repeatedly while its condition holds.

The for and do statements available in Java are considered to be unnecessary luxuries in Picojava, and in all cases their behaviour can be emulated by a suitably constructed Statement_Iteration (and possibly also by prefixing it with replication of the Block comprising the Statement_Conditional).
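Both emulations can be sketched in plain Java using only while. The class and method names are hypothetical; the comments show the Java loop each method replaces.

```java
final class LoopEmulation {
    // Emulates: for (int i = 0; i < n; i = i + 1) { sum = sum + i; }
    static int sumBelow(int n) {
        int sum = 0;
        int i = 0;                  // initialiser hoisted before the loop
        while (i < n) {
            sum = sum + i;
            i = i + 1;              // update moved to the end of the body
        }
        return sum;
    }

    // Emulates: do { count = count + 1; } while (count < limit);
    static int countUpAtLeastOnce(int start, int limit) {
        int count = start;
        count = count + 1;          // body replicated once before the loop
        while (count < limit) {
            count = count + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(5));                 // prints 10
        System.out.println(countUpAtLeastOnce(10, 3));   // prints 11
    }
}
```

The second call shows why the body must be replicated: a do loop runs its body once even when the condition is false from the start.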

2.15.3 Expression Statements

The Picojava expression statements are those which actually perform computations, and potentially return results. All these forms of Statements within the Picojava source language grammar are prefixed by Statement_Action. To see how these statements may be composed to form expressions, see §2.17.

2.15.4 Empty Statement

Picojava defines no empty statement. It serves no purpose.

2.15.5 Escape Statements

Picojava provides no statement to prematurely exit from a Block. Such constructs are rarely used to great advantage in Java programs anyway. Even Picojava's equivalent of the return statement (see §2.17.9) does not cause premature exit from a Block, essentially for reasons of simplicity and for consistency with the overall philosophy of having no escape statements.

2.15.6 Managed Statement

There is no equivalent to Java's try statement in Picojava to allow the management of exceptions because there is no exception mechanism in Picojava.

2.15.7 Unreachable Statements

Picojava does not concern itself with unreachable statements.

2.16 Instructions

The defining component of the Picojava assembly language grammar is the Instruction production, which is used to build up the executable portion of Blocks in that grammar. There are two broad classes of instructions: those used for flow control (unconditional and conditional branch instructions) and those which actually compute (i.e. all other instructions).

Instructions cannot be embedded within each other, and each has either no parameter, or a single simple parameter. All instructions within a Block lie immediately within that Block. This contrasts with the way Statements are composed, as they are nested to produce desired program flows and computational effects. Therefore, for the purposes of flow control, all instructions are numbered, and these numbers are used when transferring control within a block using branch instructions. Similarly, to perform computations, an operand stack is maintained during the execution of the block. The operand stack is capable of storing values of all Picojava types (§2.4).

Instructions are executed in linear sequential fashion, until a branch is taken. At that point the execution resumes (in linear sequential fashion) with the instruction indexed by the branch. Each non-branch instruction modifies the stack in some way with some goal in mind. A typical behaviour is for operands to be placed on the stack by data transfer instructions, which are then removed and manipulated by arithmetic or logic instructions.
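The execution model just described (numbered instructions, a single optional parameter, an operand stack and branches by index) can be sketched as a small Java interpreter loop. This is a minimal illustration under stated assumptions, not the thesis implementation: the Insn encoding, numbering from 0, the use of plain names in place of qualified identifiers, and a default of 0 for unset int variables are all hypothetical. Only a handful of mnemonics from this section are handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a linear instruction interpreter with an operand stack.
class StackMachineSketch {
    // Assumed encoding: a mnemonic plus at most one simple parameter.
    static final class Insn {
        final String op;
        final Object arg;
        Insn(String op, Object arg) { this.op = op; this.arg = arg; }
    }

    static Insn insn(String op, Object arg) { return new Insn(op, arg); }
    static Insn insn(String op) { return new Insn(op, null); }

    // Executes one Block and returns the final values of its int variables.
    static Map<String, Integer> run(Insn[] block) {
        Deque<Object> stack = new ArrayDeque<>();
        Map<String, Integer> vars = new HashMap<>();
        int pc = 0;                               // index of the next instruction
        while (pc < block.length) {
            Insn in = block[pc];
            pc = pc + 1;                          // linear sequential by default
            switch (in.op) {
                case "iconst":
                    stack.push(in.arg);
                    break;
                case "iload":                     // assumed default value: 0
                    stack.push(vars.getOrDefault((String) in.arg, 0));
                    break;
                case "istore":
                    vars.put((String) in.arg, (Integer) stack.pop());
                    break;
                case "iadd": {
                    int first = (Integer) stack.pop();    // first operand is on top
                    int second = (Integer) stack.pop();
                    stack.push(first + second);
                    break;
                }
                case "icmplt": {
                    int first = (Integer) stack.pop();
                    int second = (Integer) stack.pop();
                    stack.push(first < second);
                    break;
                }
                case "goto":
                    pc = (Integer) in.arg;
                    break;
                case "iftrue": {
                    boolean condition = (Boolean) stack.pop();
                    if (condition) { pc = (Integer) in.arg; }
                    break;
                }
                case "iffalse": {
                    boolean condition = (Boolean) stack.pop();
                    if (!condition) { pc = (Integer) in.arg; }
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported: " + in.op);
            }
        }
        return vars;
    }

    // while (i < 3) { sum = sum + i; i = i + 1; } as a flat instruction list.
    static Insn[] sampleLoop() {
        return new Insn[] {
            insn("iconst", 3),      //  0
            insn("iload", "i"),     //  1
            insn("icmplt"),         //  2: i < 3, with i on top of the stack
            insn("iffalse", 13),    //  3: branch past the end of the Block
            insn("iload", "i"),     //  4
            insn("iload", "sum"),   //  5
            insn("iadd"),           //  6
            insn("istore", "sum"),  //  7
            insn("iconst", 1),      //  8
            insn("iload", "i"),     //  9
            insn("iadd"),           // 10
            insn("istore", "i"),    // 11
            insn("goto", 0),        // 12
        };
    }
}
```

Running run(sampleLoop()) leaves sum at 3 and i at 3. The nesting that a Statement_Iteration expresses directly has to be flattened here into indices and branches.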

2.16.1 Unconditional Branch Instruction

As in the Java VM, the unconditional branch instruction has the mnemonic goto in assembly listings. The production for this instruction is called Instruction_Branch. Its action is to cause execution to continue from the instruction in the same Block indexed by its single parameter.

2.16.2 Conditional Branch Instructions

There are two conditional branch instructions in Picojava, each of which removes the top value from the stack (which must be Algebraic_Boolean) and decides whether to branch based on the value. The first instruction is given by the production Instruction_Branch_ConditionalTrue (which is given the mnemonic iftrue in concrete listings) and the second is given by Instruction_Branch_ConditionalFalse (with mnemonic iffalse).

While there is no direct equivalent to these two instructions in the Java VM, they provide a behaviour which more closely matches the conditional and iteration statements of the source language interpreter. Allowing the assembly language to achieve a comparison and branch in one instruction, as is possible in the Java VM (using if_icmplt, for example), would have given the instruction-based approach an unfair advantage. In any case, a full abstract tree representation of Java could provide an "if-int-negative" statement, if such a thing might aid efficiency in real-world applications.

2.16.3 Data Transfer Instructions

The class of data transfer instructions encompasses all those which take the value of a variable and place it on the operand stack (Instruction_Action_Load) as well as all those which remove a value from the operand stack and insert it into a variable (Instruction_Action_Store). In the Picojava source grammar, the analogous operations are achieved by reference (Statement_Action_Reference) and assignment (Statement_Action_Assign) expression statements respectively, although neither of these operations needs to use an explicit operand stack.

There is one load instruction for each data type, viz:

Assembly language grammar production name           Concrete representation
Instruction_Action_Load_Algebraic_SignedInteger5    iload
Instruction_Action_Load_Algebraic_Boolean           bload
Instruction_Action_Load_Structured_Object           aload

↑ Table 2.7 Load Instructions

Similarly, there is one store instruction for each data type, viz:

Assembly language grammar production name            Concrete representation
Instruction_Action_Store_Algebraic_SignedInteger5    istore
Instruction_Action_Store_Algebraic_Boolean           bstore
Instruction_Action_Store_Structured_Object           astore

↑ Table 2.8 Store Instructions

Each store instruction, which typically takes as a parameter a qualified identifier that specifies the variable to be altered, has a degenerate form which does not reference a variable. These instructions are the return instructions ireturn, breturn and areturn, and are similar to the Java VM instructions with the same mnemonics, as they specify the value to be returned from the currently executing Method. When control returns from the current Method, the value on the stack when the last return instruction was executed is used as the return value.

The behaviour of these return instructions differs from the Java VM versions in that they do not cause premature termination of the method. This is for the sake of simplicity, and is in line with the behaviour of the statement interpreter (see §2.17.9...).

2.16.4 Class Instance Creation Instruction

The Picojava class instance creation instruction is given by the production Instruction_Action_Create_Object. The action of this instruction is different from the Java VM's new instruction in that it will also invoke the constructor. This does away with the need for following the new instruction with dup and invokespecial. As there are no semantic equivalents to these operations in the Picojava source language (Statement_Action_Create_Object creates the object and calls the constructor at once), and there is no need for there to be, the class instance creation instruction is directly equivalent in its action to the class instance creation statement.

Because Instruction_Action_Create_Object effectively combines the behaviour of two Java VM instructions, there needs to be an extra parameter to Instruction_Action_Create_Object which specifies which constructor to call. This is in addition to the identifier of the prototype class. A class instance creation expression can implicitly determine which constructor is appropriate from the parameters given, but an instruction needs to be given this information explicitly.

This extra parameter is an index specifying which constructor to call, with the first unidentified method in a class being numbered 0, the second 1, and so on. In concrete form, the new instruction is:

new_<parameter count> <identifier>

The parameter count (as opposed to the constructor index) is given because it is more useful in a block of instruction-based code to know how many stack operands will be consumed than to be able to identify which constructor has been called. The format also is more in line with the notation of the Java VM assembly language, where no instruction has more than a single operand.

In detail, the action of Instruction_Action_Create_Object is to create a new class instance and call the constructor indicated. Calling the constructor entails removing the required number of actual parameters from the stack and assigning them to formal parameters for use within the constructor. The first operand (if required) is the one on the top of the stack, the second operand (if required) is the one immediately below the first, and so on.
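The way actual parameters are taken from the stack can be sketched in Java as follows. The helper popActuals and the class name are hypothetical; the sketch only shows the ordering rule stated above (first operand on top) and does not model object creation or the constructor call itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: collecting actual parameters from the operand stack.
class ActualParameterSketch {
    // Removes parameterCount operands; index 0 of the result is the first
    // actual parameter, which was on top of the stack.
    static Object[] popActuals(Deque<Object> stack, int parameterCount) {
        Object[] actuals = new Object[parameterCount];
        for (int i = 0; i < parameterCount; i = i + 1) {
            actuals[i] = stack.pop();
        }
        return actuals;
    }

    // Pushes three values and consumes two of them as parameters.
    static Object[] sample() {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(30);   // unrelated value lower in the stack
        stack.push(20);   // second actual parameter
        stack.push(10);   // first actual parameter (top of stack)
        return popActuals(stack, 2);
    }
}
```

Because the parameter count appears in the mnemonic, a reader of a listing can tell how many operands are consumed without looking up the constructor.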

2.16.5 Arithmetic And Logic Instructions

The Picojava arithmetic and logic instructions (Instruction_Action_Evaluate) remove their operands from the operand stack and act upon them in some way before placing a result on the operand stack in their place. Note that the operands for Picojava instructions are in reverse order when compared with Java VM instructions (i.e. the first operand is on the top of the stack for all operations).
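The difference in operand order can be made concrete with a short Java sketch of subtraction under both conventions. The method names are hypothetical, and the sketch assumes that the first operand of a difference is the minuend.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch contrasting the two operand orders for subtraction.
class OperandOrderSketch {
    // Picojava order as described in the text: the first operand is on top.
    static void picoIsub(Deque<Integer> stack) {
        int first = stack.pop();    // assumed to be the minuend
        int second = stack.pop();   // assumed to be the subtrahend
        stack.push(first - second);
    }

    // Java VM order: the second operand (value2) is on top.
    static void jvmIsub(Deque<Integer> stack) {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(value1 - value2);
    }

    // Computes 7 - 2 using the Picojava convention.
    static int picoSevenMinusTwo() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(2);              // second operand pushed first
        stack.push(7);              // first operand ends up on top
        picoIsub(stack);
        return stack.pop();
    }

    // Computes 7 - 2 using the Java VM convention.
    static int jvmSevenMinusTwo() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(7);              // first operand pushed first
        stack.push(2);              // second operand ends up on top
        jvmIsub(stack);
        return stack.pop();
    }
}
```

Both methods yield 5; only the order in which the operands must be pushed differs.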

The productions for instructions beginning with Instruction_Action_Evaluate_Constant take no operand from the stack, and simply place the constant value specified in their parameter on the operand stack. The type of the constant returned is given by the suffix used in the production (and the initial letter of the concrete representation, in like fashion to the Java assembly language).

Assembly language grammar production name                        Concrete representation
Instruction_Action_Evaluate_Constant_Algebraic_SignedInteger5    iconst
Instruction_Action_Evaluate_Constant_Algebraic_Boolean           bconst
Instruction_Action_Evaluate_Constant_Structured_Object           aconst

↑ Table 2.9 Constant Generation Instructions

Instruction_Action_Evaluate_Constant_Structured_Object is degenerate, as the only valid constant value is the literal null, and therefore gives the only possible value of the parameter. For this reason, the abstract grammar considers Instruction_Action_Evaluate_Constant_Structured_Object to be a terminal production (i.e. having no parameter), even though concrete representations give the parameter (i.e. aconst (null)).

Productions beginning with Instruction_Action_Evaluate_UnaryOp take one operand from the stack, perform an operation on it, and place the result back on the stack. The only unary operation recognised by Picojava is Boolean inversion, specified by the production Instruction_Action_Evaluate_UnaryOp_Invert with concrete representation bnot. This operation takes a boolean value from the stack and replaces it with its opposite truth value (i.e. true becomes false, false becomes true).

Operations which take two operands from the stack, perform an operation, and place a result on the stack are specified by productions beginning with Instruction_Action_Evaluate_BinaryOp.

Assembly language grammar production name              Concrete representation
Instruction_Action_Evaluate_BinaryOp_Sum               iadd
Instruction_Action_Evaluate_BinaryOp_Difference        isub
Instruction_Action_Evaluate_BinaryOp_Product           imul
Instruction_Action_Evaluate_BinaryOp_Quotient          idiv
Instruction_Action_Evaluate_BinaryOp_ConditionalOr     bor
Instruction_Action_Evaluate_BinaryOp_ConditionalAnd    band
Instruction_Action_Evaluate_BinaryOp_Less              icmplt
Instruction_Action_Evaluate_BinaryOp_Equal             icmpeq
Instruction_Action_Evaluate_BinaryOp_Congruent         acmpeq

↑ Table 2.10 Instructions For Binary Operations

Sum, Difference, Product and Quotient take two int parameters from the operand stack and replace them with one int result.

Instruction_Action suffix Stack before Stack after
Evaluate_BinaryOp_Sum
Evaluate_BinaryOp_Difference
Evaluate_BinaryOp_Product
Evaluate_BinaryOp_Quotient

↑ Table 2.11 Behaviour Of Integer Binary Operation Instructions

If a sum overflows, the result value will wrap around to a negative value. If a difference underflows, the result will wrap around to a positive value. The product given will be the low 32 bits thereof, and will be interpreted as negative if bit 31 of that result is set. Quotient may underflow to zero, and division by zero will cause an exception in the interpreter. This is as per the behaviour for the corresponding Java integer operations.
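Since this behaviour is stated to follow the corresponding Java integer operations, it can be observed directly with Java's 32-bit int type. The class below is a hypothetical illustration only. In Java the division by zero surfaces as an ArithmeticException; how the Picojava interpreter reports it is not specified here.

```java
// Hypothetical sketch: Java int arithmetic showing the wrap-around cases.
class IntegerWrapSketch {
    static int sum(int a, int b) { return a + b; }
    static int difference(int a, int b) { return a - b; }
    static int product(int a, int b) { return a * b; }   // low 32 bits only
    static int quotient(int a, int b) { return a / b; }  // truncates toward zero

    // Reports whether dividing by zero raises ArithmeticException in Java.
    static boolean divisionByZeroThrows() {
        try {
            quotient(1, 0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

For example, sum(Integer.MAX_VALUE, 1) gives Integer.MIN_VALUE, difference(Integer.MIN_VALUE, 1) gives Integer.MAX_VALUE, product(65536, 32768) gives Integer.MIN_VALUE because bit 31 of the low 32 bits is set, and quotient(1, 2) gives 0.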

ConditionalOr and ConditionalAnd take two boolean parameters and replace them with a single boolean result.

Instruction_Action suffix Stack before Stack after
Evaluate_BinaryOp_ConditionalOr
Evaluate_BinaryOp_ConditionalAnd

↑ Table 2.12 Behaviour Of Boolean Binary Operation Instructions

Less and Equal take two int parameters and return a boolean result.

Instruction_Action suffix Condition Stack before Stack after
Evaluate_BinaryOp_Less
Evaluate_BinaryOp_Equal

↑ Table 2.13 Behaviour Of Integer Comparison Instructions

Congruent takes two Structured_Object parameters and returns a boolean result.

2.16.6 Method Invocation Instruction

The Picojava method invocation instruction is specified by the production Instruction_Action_Evaluate_Method. The parameter is the qualified identifier which identifies the method to be invoked and the object upon which the method will operate. This differs from the behaviour of the Java VM's invokevirtual instruction, which requires that the object be provided as the first parameter on the operand stack.

As was the case for the class instance creation instruction, the parameter count is given in the concrete representation of the Picojava invokevirtual instruction to indicate how many stack operands will be consumed. Therefore the concrete form of the invokevirtual instruction is:

invokevirtual_<parameter count> <qualified identifier>

In detail, the action of Instruction_Action_Evaluate_Method is to call the identified method for the identified object, removing the required number of actual parameters from the stack and assigning them to formal parameters for use within the method. The first operand (if required) is the one on the top of the stack, the second operand (if required) is the one immediately below the first, and so on.

2.17 Expressions

Expressions are a construct specific to the Picojava source language. Expressions are made up of productions from the Statement_Action class of statements. The behaviour that all forms of Statement_Action have in common is that they have the capacity to return a result to a Statement further up the code tree.

Unlike the assembly language, there is no need to specify an explicit operand stack in order to define the behaviour of statements. Statements effectively call their component statements and receive results directly from them. These results, in turn, can be operated upon and passed to the parent statement.

When a Statement_Action node is composed of further Statement_Action nodes, as is the case in unary and binary operations and method and constructor calls, these further nodes are referred to as subexpressions. Subexpressions are evaluated in the same way as all expressions, but this evaluation occurs in a particular order and only if it is required (notice the behaviour of the conditional Boolean binary operator statements), with their values being operated upon by parent superexpressions as required.

A Statement_Action which is a component of a Block has no parent Statement to return a value to. Therefore, the value is discarded. A Statement_Action which returns no value (e.g. a call to a method with no return parameter) cannot be used within a Statement which requires a value be returned.
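The idea that statements call their component statements and receive results directly, with no operand stack, can be sketched in Java as a tree of nodes. The interface Action and the factory methods below are hypothetical names, and only constants, Sum and Less are modelled.

```java
// Hypothetical sketch: expression nodes return results straight to their parent.
class ExpressionTreeSketch {
    interface Action {
        Object evaluate();
    }

    // Leaf node: returns its embedded literal.
    static Action constant(Object value) {
        return () -> value;
    }

    // Sum evaluates both subexpressions, left first, and returns an int result.
    static Action sum(Action augend, Action addend) {
        return () -> {
            int left = (Integer) augend.evaluate();
            int right = (Integer) addend.evaluate();
            return Integer.valueOf(left + right);
        };
    }

    // Less evaluates both subexpressions and returns a boolean result.
    static Action less(Action leftOperand, Action rightOperand) {
        return () -> {
            int left = (Integer) leftOperand.evaluate();
            int right = (Integer) rightOperand.evaluate();
            return Boolean.valueOf(left < right);
        };
    }

    // Builds and evaluates the tree for ((1+2)<4).
    static Object sample() {
        Action root = less(sum(constant(1), constant(2)), constant(4));
        return root.evaluate();
    }
}
```

Here sample() evaluates to true. The intermediate value 3 is never stored anywhere explicit; it is simply the return value passed from the Sum node to its parent.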

2.17.1 Variables As Values

One form of Statement_Action which can have no further Statement_Action components is Statement_Action_Reference. Statement_Action_Reference refers to a variable by its qualified identifier. When the reference does not form part of an assignment statement (§2.17.9), the result returned to any parent Statements is the value of the variable referenced.

2.17.2 Type Of An Expression

The type of an expression or subexpression depends solely on the type returned by the root Statement_Action node of the expression tree under consideration:

2.17.3 Evaluation Order

A statement's operands are evaluated in the order in which they appear in the source language grammar. This corresponds to Java's evaluation order. For method and constructor calls, this means that actual parameter expressions are evaluated from left to right. For binary operations, the left operand is evaluated first, and then the right is evaluated only if required.

It is the case that the Boolean binary operation statements in Picojava will not evaluate the right-hand operand in certain situations:

Source language grammar production name              Left-hand operand value
Statement_Action_Evaluate_BinaryOp_ConditionalAnd    false
Statement_Action_Evaluate_BinaryOp_ConditionalOr     true

↑ Table 2.14 Cases Where Boolean Binary Operation Statements Will Not Evaluate Their Right-Hand Operand
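The two cases in the table can be checked with a small Java sketch in which the right-hand operand counts how often it is evaluated. The names BoolAction, counted, conditionalAnd and conditionalOr are hypothetical and serve only to illustrate the rule.

```java
// Hypothetical sketch: the right operand is evaluated only when required.
class ShortCircuitSketch {
    interface BoolAction {
        boolean evaluate();
    }

    static BoolAction constant(boolean value) {
        return () -> value;
    }

    // Wraps a value and counts in counter[0] how often it is evaluated.
    static BoolAction counted(boolean value, int[] counter) {
        return () -> {
            counter[0] = counter[0] + 1;
            return value;
        };
    }

    static BoolAction conditionalAnd(BoolAction left, BoolAction right) {
        return () -> {
            if (!left.evaluate()) {
                return false;          // left is false: right is not evaluated
            }
            return right.evaluate();
        };
    }

    static BoolAction conditionalOr(BoolAction left, BoolAction right) {
        return () -> {
            if (left.evaluate()) {
                return true;           // left is true: right is not evaluated
            }
            return right.evaluate();
        };
    }
}
```

With a counter starting at 0, evaluating conditionalAnd(constant(false), counted(true, counter)) or conditionalOr(constant(true), counted(false, counter)) leaves the counter at 0, whereas conditionalAnd(constant(true), counted(true, counter)) raises it to 1.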

2.17.4 Constant Generation Statements

The following are the constant generation statements available in Picojava:

Source language grammar production name                        Concrete example
Statement_Action_Evaluate_Constant_Algebraic_SignedInteger5    10
Statement_Action_Evaluate_Constant_Algebraic_Boolean           true
Statement_Action_Evaluate_Constant_Structured_Object           null

↑ Table 2.15 Constant Generation Statements

Each returns a constant value given by the component literal embedded within it and of the type given by the suffix to the statement. Note that Statement_Action_Evaluate_Constant_Structured_Object has only one possible value, so it has no embedded data and is therefore a terminal production.

2.17.5 Class Instance Creation Statement

The Picojava class instance creation statement is Statement_Action_Create_Object. It is composed of an identifier which gives the prototype class for the object to be created, and zero or more parameters which will be passed to a matching constructor for that class. The behaviour of the statement is to evaluate the expressions which give values for the actual parameters, assign these values to the formal parameters and call the constructor. The result returned is the object thereby created.

2.17.6 Method Invocation Statement

The Picojava method invocation statement is Statement_Action_Evaluate_Method. It is composed of a qualified identifier which specifies both the object which is the subject of the method call and the method within that object which needs to be called. The behaviour of the statement is to evaluate the expressions which give values for the actual parameters, assign these values to the formal parameters and call the method. The result returned is the return parameter value returned by the method (if any).

2.17.7 Unary Operator Statements

Productions beginning with Statement_Action_Evaluate_UnaryOp are composed of a single operand. This is operated upon during execution, and the result of that operation is returned to the parent statement as required. As there is only one unary operation in Picojava, there is only one unary operator statement, being Statement_Action_Evaluate_UnaryOp_Invert with a concrete representation of the form (!<operand>). This operation takes a boolean value and replaces it with its opposite truth value (i.e. true becomes false, false becomes true).

2.17.8 Binary Operator Statements

Operator statements which are composed of two operands, and during execution perform an operation before returning the result of that operation to a parent statement (if required), are denoted by productions beginning with Statement_Action_Evaluate_BinaryOp.

Source language grammar production name              Concrete representation
Statement_Action_Evaluate_BinaryOp_Sum               (<augend>+<addend>)
Statement_Action_Evaluate_BinaryOp_Difference        (<minuend>-<subtrahend>)
Statement_Action_Evaluate_BinaryOp_Product           (<multiplicand>*<multiplier>)
Statement_Action_Evaluate_BinaryOp_Quotient          (<operand>/<operand>)
Statement_Action_Evaluate_BinaryOp_ConditionalOr     (<operand>||<operand>)
Statement_Action_Evaluate_BinaryOp_ConditionalAnd    (<operand>&&<operand>)
Statement_Action_Evaluate_BinaryOp_Less              (<operand><<operand>)
Statement_Action_Evaluate_BinaryOp_Equal             (<operand>=<operand>)
Statement_Action_Evaluate_BinaryOp_Congruent         (<operand>=<operand>)

↑ Table 2.16 Statements For Binary Operations

Sum, Difference, Product and Quotient are composed of two subexpressions which evaluate to the int type, and return an int type result value (i.e. the statement forms an expression of type int, as defined in §2.17.2).

Statement_Action suffix Evaluated components Result value
Evaluate_BinaryOp_Sum
Evaluate_BinaryOp_Difference
Evaluate_BinaryOp_Product
Evaluate_BinaryOp_Quotient

↑ Table 2.17 Behaviour Of Integer Binary Operation Statements

If a sum overflows, the result value will wrap around to a negative value. If a difference underflows, the result will wrap around to a positive value. The product given will be the low 32 bits thereof, and will be interpreted as negative if bit 31 of that result is set. Quotient may underflow to zero, and division by zero will cause an exception in the interpreter. This is as per the behaviour for the corresponding Java integer operations.

ConditionalOr and ConditionalAnd are composed of two boolean subexpressions and when evaluated return a single boolean result.

Statement_Action suffix Evaluated components Result value
Evaluate_BinaryOp_ConditionalOr
Evaluate_BinaryOp_ConditionalAnd

↑ Table 2.18 Behaviour Of Boolean Binary Operation Statements

These two statements take advantage of the optimization made possible by short-circuit evaluation: the result can sometimes be deduced after only the left subexpression has been evaluated. Note that false && b = false and true || b = true, meaning that the right subexpression b need not be evaluated in these situations.

Less and Equal are composed of two int subexpressions and return a boolean result.

Statement_Action suffix Evaluated components Condition Result value
Evaluate_BinaryOp_Less
Evaluate_BinaryOp_Equal

↑ Table 2.19 Behaviour Of Integer Comparison Statements

Congruent is composed of two subexpressions of type Structured_Object and, when evaluated, returns a boolean result.

2.17.9 Assignment Statement

Assignment statements in the Picojava source language are composed of a reference to a variable (the type of which determines the type expected by the assignment) and a subexpression which provides a new value for that variable. They provide the functional equivalence of the load and store instructions of the Picojava assembly language. The concrete representation for the assignment statement, composed of the Statement_Action_Reference node indicating which variable is to be updated and the root Statement_Action node of the expression which is to be evaluated, is <reference>=<expression>.

The action of an assignment statement is analogous to a store instruction in that a value is assigned to an identified variable. The key difference is that the expression is evaluated as part of the assignment statement, whereas a store instruction need only consider which variable was referenced and take the new value from the stack.

There is also a degenerate form of the assignment statement, equivalent to Java's return statement, which does not include a reference to a variable. In this case, one imagines a nameless variable of the result type associated with the currently executing Method. When control returns from the current Method[6], the last assigned value for the return variable is passed back as the result of the method call statement which caused it to be invoked. The concrete representation of this statement looks like Result'=<expression>.

The result returned from an assignment expression statement is the value of the evaluated expression.
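The degenerate (return) form of assignment can be compared with Java's return statement in a short sketch. The class and method names are hypothetical, and the local variable result stands in for the nameless return variable of the Method.

```java
// Hypothetical sketch: early return in Java versus a result variable that is
// assigned within the Block and handed back only when the Block completes.
class ResultVariableSketch {
    // Ordinary Java: return leaves the method immediately.
    static int signWithReturn(int x) {
        if (x < 0) {
            return -1;
        }
        if (0 < x) {
            return 1;
        }
        return 0;
    }

    // Picojava-like style: no premature exit, the last assigned value wins.
    static int signWithResult(int x) {
        int result = 0;        // stands in for the nameless return variable
        if (x < 0) {
            result = -1;
        }
        if (0 < x) {
            result = 1;
        }
        return result;         // single exit at the end of the Block
    }
}
```

Both methods agree for every int argument. The second form executes every statement of the Block, which is the behaviour described for Picojava.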

2.18 Definite Assignment

Picojava does not require definite assignment, as all variables are initialized to suitable default values (§2.5.1) upon their creation.

2.19 Exceptions

The Picojava source and assembly languages do not provide any support for the Java concept of an exception. Exceptions are, as their name would suggest, only necessary in exceptional circumstances, and primarily assist the diagnosis and recovery from error conditions. Therefore, in Picojava, which is not designed to support anything but the most basic of programs, exceptions are a luxury which can be removed.

2.20 Execution

The Picojava interpreter is the medium responsible for executing programs written using the Picojava model. The interpreter takes a Picojava program and a list of the identifiers used within and executes it. The Picojava interpreter might also be called the Picojava virtual machine.

Compared with the Java VM, the Picojava interpreter is very simple, owing to the fact that the Java language has been reduced to its most elementary components in the Picojava model.

2.20.1 Interpreter Start-Up

As the Picojava interpreter is effectively pre-loaded with the program it needs to execute, and is aware of all the identifiers used within it, there is relatively little that needs to be done before the program can start running.

When the interpreter starts up, the main class of the CompilationUnit passed to the interpreter is instantiated, and the main method is called with no parameters. The main class is typically the first class defined in the CompilationUnit, but an interpreter may provide an external way of specifying the main class.

2.20.2 Instantiation Of A Class

When a class is instantiated, a new object of that class is created, the instance variables are created and initialized to their default values (§2.5.1), and the constructor specified by the class instance creation expression is executed, with parameters passed as required. In the case of the instantiation of the main class at interpreter start-up (§2.20.1), the first constructor within the class is executed, which is expected to have no parameters.

Classes are not explicitly uninstantiated, although the interpreter may (or may not) reclaim the memory used by an object which is no longer reachable.

2.20.3 Interpreter Exit

The interpreter exits when the main method of the main class terminates, or if an exception occurs within the interpreter. There is no special behaviour.

2.21 Threads

Picojava does not support threads. Threads in Java, while a great programming tool, are not pertinent to the types of benchmarking analysis to which Picojava will be subjected.




[1]  In Java this is referred to by saying that "the null type is convertible to all reference types", but there are no conversions possible in Picojava.

[2]  However, there is nothing stopping the programmer from using tools to create the abstract tree which provide for inheritance external to the interpreter. In doing so, however, she will no doubt take advantage of implementation deficiencies of the type checking mechanisms of the underlying interpreter.

[3]  This behaviour is not Java-like, and is specified to allow naïve symbol table implementations to be used instead of Java's stack frame based approach. The scope of local variables in Java is limited to the immediately enclosing method as a result of this approach. Fortunately for Java, this just happens to coincide with the preferred behaviour for promoting the writing of side-effect free programs.

[4]  Java does not allow the redefinition of a local variable within its scope (according to its scoping rules), and produces an error at compile-time. However, because the Picojava scoping is different (and allows a local variable to be referred to within a method invocation), redefinition is allowable.

[5]  Here, "branching" is used to mean that action which might be undertaken by a branch instruction (§2.16.1 and §2.16.2).

[6]  In Picojava this is only possible through the complete execution of the Block associated with that Method, as there are no escape statements. See §2.15.5.





Java- Trees Versus Bytes is a BComp Honours thesis written by Kade Hansson.

Copyright 1998 Kade "Archer" Hansson; e-mail: archer@dialix.com.au

Last updated: Monday 12th February 2001