Iterators.pdf

(330 KB) Pobierz

Iterators

2.1

Chapter Overview

Chapter Two

This chapter discusses the low-level implementation of iterators. Earlier, this text brieﬂy discussed iter-

ators and the FOREACH loop in the chapter on intermediate procedures (see “Iterators and the FOREACH

Loop” on page 843). This chapter will review that information, discuss some uses for iterators, and then

present the low-level implementation of this interesting control structure.

2.2

Review of Iterators

An iterator is a cross between a control structure and a function. Although common high level languages

do not support iterators, they are present in some very high level languages

. Iterators provide a combination

state machine/function call mechanism that lets a function pick up where it last left off on each new call.

Iterators are also part of a loop control structure, with the iterator providing the value of the loop control

variable on each iteration.

To understand what an iterator is, consider the following for loop from Pascal:

for I := 1 to 10 do <some statement>;

When learning Pascal you were probably taught that this statement initializes

with one, compares

with 10, and executes the statement if

is less than or equal to 10. After executing the statement, the

FOR

statement increments

and compares it with 10 again, repeating the process over and over again until

greater than 10.

While this description is semantically correct, and indeed, it’s the way that most Pascal compilers

implement the FOR loop, this is not the only point of view that describes how the for loop operates. Sup-

pose, instead, that you were to treat the TO reserved word as an operator. An operator that expects two

parameters (one and ten in this case) and returns the range of values on each successive execution. That is,

on the ﬁrst call the TO operator would return one, on the second call it would return two, etc. After the tenth

call, the TO operator would fail which would terminate the loop. This is exactly the description of an itera-

tor.

In general, an iterator controls a loop. Different languages use different names for iterator controlled

loops, this text will just use the name FOREACH as follows:

foreach

iterator

() do

statements

;

endfor;

An iterator returns two values: a boolean success or failure value and a function result. As long as the

iterator returns success, the FOREACH statement executes the statements comprising the loop body. If the

iterator returns failure, the FOREACH loop terminates and executes the next sequential statement following

the FOREACH loop’s body.

Iterators are considerably more complex than normal functions. A typical function call involves two

basic operations: a call and a return. Iterator invocations involve four basic operations:

Initial iterator call

Yielding a value

Resumption of an iterator

1. Ada and PL/I support very limited forms of iterators, though they do not support the type of iterators found in CLU, SETL,

Icon, and other languages.

Beta Draft - Do not distribute

Page 1305

Chapter Two

Termination of an iterator.

Volume Five

To understand how an iterator operates, consider the following short example:

iterator range( start:int32; stop:int32 );

begin range;

forever

mov( start, eax );

breakif( eax > stop );

yield();

inc( start );

endfor;

end range;

In HLA, iterator calls may only appear in the FOREACH statement. With the exception of the "yield();"

statement above, anyone familiar with HLA should be able to ﬁgure out the basic logic of this iterator.

An iterator in HLA may return to its caller using one of two separate mechanisms, it can return to the

caller by exiting through the "end Range;" statement or it may yield a value by executing the "yield();" state-

ment. An iterator succeeds if it executes the "yield();" statement, it fails if it simply returns to the caller.

Therefore, the FOREACH statement will only execute its corresponding statement if you exit an iterator

with a "yield();". The FOREACH statement terminates if you simply return from the iterator. In the example

above, the iterator returns the values

start..stop

via a "yield();" statement and then the iterator terminates.

The loop

foreach Range(1,10) do

stdout.put( (type uns32 eax ), nl );

endfor;

is comparable to the code:

for( mov( 1, eax ); eax <= 10; inc( eax )) do

stdout.put( (type uns32 eax ), nl );

endfor;

When an HLA program ﬁrst executes the FOREACH statement, it makes an initial call to the iterator.

The iterator runs until it executes a "yield();" or it returns. If it executes the "yield();" statement, it returns the

value in EAX as the iterator result and it succeeds. If it simply returns, the iterator returns failure and no iter-

ator result. In the current example, the initial call to the iterator returns success and the value one.

Assuming a successful return (as in the current example), the FOREACH statement returns the current

result in EAX and executes the FOREACH loop body. After executing the loop body, the FOREACH state-

ment calls the iterator again. However, this time the FOREACH statement resumes the iterator rather than

making an initial call. An iterator resumption continues with the ﬁrst statement following the last "yield();" it

executes. In the

range

example, a resumption would continue execution at the "inc( start );" statement. On

the ﬁrst resumption, the

range

iterator would add one to

start,

producing the value two. Two is less than ten

(stop’s value) so the FOREACH loop would repeat and the iterator would yield the value two. This process

would repeat over and over again until the iterator yields ten. Upon resuming after yielding ten, the iterator

would increment start to eleven and then return, rather than yield, since this new value is not less than or

equal to ten. When the

Range

iterator returns (fails), the FOREACH loop terminates.

Page 1306

Version: 9/12/02

Iterators

2.2.1 Implementing Iterators Using In-Line Expansion

The implementation of an iterator is rather complex. To begin with, consider a ﬁrst attempt at an assem-

bly implementation of the FOREACH statement above:

push( 1 );

// Manually pass 1 and 10 as parameters.

push( 10 );

call Range_initial;

jc Failure;

ForLp: stdout.put( (type uns32 eax), nl );

call Range_Resume;

jnc ForLp;

Failure:

Although this looks like a straight-forward implementation project, there are several issues to consider.

First, the call to

Range_Resume

above looks simple enough, but there is no ﬁxed address that corresponds to

the resume address. While it is certainly true that this

Range

example has only one resume address, in gen-

eral you can have as many "yield();" statements as you like in an iterator. For example, the following iterator

returns the values 1, 2, 3, and 4:

iterator OneToFour;

begin OneToFour;

mov(

eax

);

yield();

end OneToFour;

The initial call would execute the "mov( 1, eax );" and "yield();" statements. The ﬁrst resumption would exe-

cute the "mov( 2, eax );" and "yield();" statements, the second resumption would execute "mov( 3, eax );"

and "yield();", etc. Obviously there is no single resume address the calling code can count on.

There are a couple of additional details left to consider. First, an iterator is free to call procedures and

functions. If such a procedure or function executes the "yield();" statement then resumption by the

FOREACH statement continues execution within the procedure or function that executed the "yield();"

Second, the semantics of an iterator require all local variables and parameters to maintain their values until

the iterator terminates. That is, yielding does not deallocate local variables and parameters. Likewise, any

return addresses left on the stack (e.g., the call to a procedure or function that executes the "yield();" state-

ment) must not be lost when a piece of code yields and the corresponding FOREACH statement resumes the

iterator. In general, this means you cannot use the standard call and return sequence to yield from or resume

to an iterator because you have to preserve the contents of the stack.

While there are several ways to implement iterators in assembly language, perhaps the most practical

method is to have the iterator call the loop controlled by the iterator and have the loop return back to the iter-

ator function. Of course, this is counter-intuitive. Normally, one thinks of the iterator as the function that the

loop calls on each iteration, not the other way around. However, given the structure of the stack during the

execution of an iterator, the counter-intuitive approach turns out to be easier to implement.

Some high level languages support iterators in exactly this fashion. For example, Metaware’s Profes-

sional Pascal Compiler for the PC supports iterators

. Were you to create a Professional Pascal code

sequence as follows:

iterator OneToFour:integer;

begin

yield 1;

2. This requires the use of nested procedures, a subject we will discuss in a later chapter.

3. Obviously, this is a non-standard extension to the Pascal programming language provided in Professional Pascal.

Beta Draft - Do not distribute

Page 1307

Chapter Two

yield 2;

yield 3;

yield 4;

end;

Volume Five

and call it in the main program as follows:

for i in OneToFour do writeln(i);

Professional Pascal would completely rearrange your code. Instead of turning the iterator into an assembly

language function and calling this function from within the FOR loop body, this code would turn the FOR

loop body into a function, expand the iterator in-line (much like a macro) and call the FOR loop body func-

tion on each yield. That is, Professional Pascal would probably produce assembly language that looks some-

thing like the following:

// The following procedure corresponds to the for loop body

// with a single parameter (I) corresponding to the loop

// control variable:

procedure ForLoopCode( i:int32 ); nodisplay;

begin ForLoopCode;

mov( i, eax );

stdout.put( i, nl );

end ForLoopCode;

The follow code would be emitted in-line upon encountering the

for loop in the main program, it corresponds to an in-line

expansion of the iterator as though it were a macro,

substituting a call for the yield instructions:

ForLoopCode(

);

This method for implementing iterators is convenient and produces relatively efﬁcient (fast) code. It

does, however, suffer from a couple drawbacks. First, since you must expand the iterator in-line wherever

you call it, much like a macro, your program could grow large if the iterator is not short and you use it often.

Second, this method of implementing the iterator completely hides the underlying logic of the code and

makes your assembly language programs difﬁcult to read and understand.

2.2.2 Implementing Iterators with Resume Frames

In-line expansion is not the only way to implement iterators. There is another method that preserves the

structure of your program at the expense of a slightly more complex implementation. Several high level lan-

guages, including Icon and CLU, use this implementation.

To start with, you will need another stack frame: the

resume frame.

A resume frame contains two

entries: a yield return address (that is, the address of the next instruction after the yield statement) and a

dynamic link,

that is a pointer to the iterator’s activation record. Typically the dynamic link is just the value

in the EBP register at the time you execute the yield statement. This version implements the four parts of an

iterator as follows:

A CALL instruction for the initial iterator call,

A CALL instruction for the YIELD statement,

A RET instruction for the resume operation, and

Page 1308

Version: 9/12/02

Iterators

A RET instruction to terminate the iterator.

To begin with, an iterator will require two return addresses rather than the single return address you

would normally expect. The ﬁrst return address appearing on the stack is the termination return address. The

second return address is where the subroutine transfers control on a

yield

operation. The calling code must

push these two return addresses upon initial invocation of the iterator. The stack, upon initial entry into the

iterator, should look something like Figure 2.1.

Previous Stack Contents

Parameters for Iterator

Termination Return Address

Yield Return Address

Figure 2.1

Iterator Activation Record

As an example, consider the

Range

iterator presented earlier. This iterator requires two parameters, a

starting value and an ending value:

foreach range(1,10) do

stdout.put( i, nl );

endfor;

The code to make the initial call to the

range

iterator, producing a stack like the one above, could be the

following:

push( 1 );

push( 10 );

push( &ForDone);

call range;

ForDone:

Push

Call

start parameter value.

stop parameter value.

termination address.

the iterator.

fordone

is the ﬁrst statement immediately following the FOREACH loop, that is, the instruction to execute

when the iterator returns failure. The FOREACH loop body must begin with the ﬁrst instruction following

the call to

range.

At the end of the FOREACH loop, rather than jumping back to the start of the loop, or call-

ing the iterator again, this code should just execute a RET instruction. The reason will become clear in a

moment. So the implementation of the above FOREACH statement could be the following:

push( 1 );

push( 10 );

push( &ForDone);

call range;

push( ebp );

mov( [esp+8], ebp );

stdout.put( i, nl );

Push start parameter value.

Push stop parameter value.

Push termination address.

Call the iterator.

Preserve iterator’s ebp value.

Get original EBP value passed to us by range.

Display i’s value.

Beta Draft - Do not distribute

Page 1309

Plik z chomika:

jezuss

Iterators.pdf

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: