Iterators.pdf
(
330 KB
)
Pobierz
Iterators
Iterators
2.1
Chapter Overview
Chapter Two
This chapter discusses the low-level implementation of iterators. Earlier, this text briefly discussed iter-
ators and the FOREACH loop in the chapter on intermediate procedures (see “Iterators and the FOREACH
Loop” on page 843). This chapter will review that information, discuss some uses for iterators, and then
present the low-level implementation of this interesting control structure.
2.2
Review of Iterators
An iterator is a cross between a control structure and a function. Although common high level languages
do not support iterators, they are present in some very high level languages
1
. Iterators provide a combination
state machine/function call mechanism that lets a function pick up where it last left off on each new call.
Iterators are also part of a loop control structure, with the iterator providing the value of the loop control
variable on each iteration.
To understand what an iterator is, consider the following for loop from Pascal:
for I := 1 to 10 do <some statement>;
When learning Pascal you were probably taught that this statement initializes
i
with one, compares
i
with 10, and executes the statement if
i
is less than or equal to 10. After executing the statement, the
FOR
statement increments
i
and compares it with 10 again, repeating the process over and over again until
i
is
greater than 10.
While this description is semantically correct, and indeed, it’s the way that most Pascal compilers
implement the FOR loop, this is not the only point of view that describes how the for loop operates. Sup-
pose, instead, that you were to treat the TO reserved word as an operator. An operator that expects two
parameters (one and ten in this case) and returns the range of values on each successive execution. That is,
on the first call the TO operator would return one, on the second call it would return two, etc. After the tenth
call, the TO operator would fail which would terminate the loop. This is exactly the description of an itera-
tor.
In general, an iterator controls a loop. Different languages use different names for iterator controlled
loops, this text will just use the name FOREACH as follows:
foreach
iterator
() do
statements
;
endfor;
An iterator returns two values: a boolean success or failure value and a function result. As long as the
iterator returns success, the FOREACH statement executes the statements comprising the loop body. If the
iterator returns failure, the FOREACH loop terminates and executes the next sequential statement following
the FOREACH loop’s body.
Iterators are considerably more complex than normal functions. A typical function call involves two
basic operations: a call and a return. Iterator invocations involve four basic operations:
1)
2)
3)
Initial iterator call
Yielding a value
Resumption of an iterator
1. Ada and PL/I support very limited forms of iterators, though they do not support the type of iterators found in CLU, SETL,
Icon, and other languages.
Beta Draft - Do not distribute
© 2001, By Randall Hyde
Page 1305
Chapter Two
4)
Termination of an iterator.
Volume Five
To understand how an iterator operates, consider the following short example:
iterator range( start:int32; stop:int32 );
begin range;
forever
mov( start, eax );
breakif( eax > stop );
yield();
inc( start );
endfor;
end range;
In HLA, iterator calls may only appear in the FOREACH statement. With the exception of the "yield();"
statement above, anyone familiar with HLA should be able to figure out the basic logic of this iterator.
An iterator in HLA may return to its caller using one of two separate mechanisms, it can return to the
caller by exiting through the "end Range;" statement or it may yield a value by executing the "yield();" state-
ment. An iterator succeeds if it executes the "yield();" statement, it fails if it simply returns to the caller.
Therefore, the FOREACH statement will only execute its corresponding statement if you exit an iterator
with a "yield();". The FOREACH statement terminates if you simply return from the iterator. In the example
above, the iterator returns the values
start..stop
via a "yield();" statement and then the iterator terminates.
The loop
foreach Range(1,10) do
stdout.put( (type uns32 eax ), nl );
endfor;
is comparable to the code:
for( mov( 1, eax ); eax <= 10; inc( eax )) do
stdout.put( (type uns32 eax ), nl );
endfor;
When an HLA program first executes the FOREACH statement, it makes an initial call to the iterator.
The iterator runs until it executes a "yield();" or it returns. If it executes the "yield();" statement, it returns the
value in EAX as the iterator result and it succeeds. If it simply returns, the iterator returns failure and no iter-
ator result. In the current example, the initial call to the iterator returns success and the value one.
Assuming a successful return (as in the current example), the FOREACH statement returns the current
result in EAX and executes the FOREACH loop body. After executing the loop body, the FOREACH state-
ment calls the iterator again. However, this time the FOREACH statement resumes the iterator rather than
making an initial call. An iterator resumption continues with the first statement following the last "yield();" it
executes. In the
range
example, a resumption would continue execution at the "inc( start );" statement. On
the first resumption, the
range
iterator would add one to
start,
producing the value two. Two is less than ten
(stop’s value) so the FOREACH loop would repeat and the iterator would yield the value two. This process
would repeat over and over again until the iterator yields ten. Upon resuming after yielding ten, the iterator
would increment start to eleven and then return, rather than yield, since this new value is not less than or
equal to ten. When the
Range
iterator returns (fails), the FOREACH loop terminates.
Page 1306
© 2001, By Randall Hyde
Version: 9/12/02
Iterators
2.2.1 Implementing Iterators Using In-Line Expansion
The implementation of an iterator is rather complex. To begin with, consider a first attempt at an assem-
bly implementation of the FOREACH statement above:
push( 1 );
// Manually pass 1 and 10 as parameters.
push( 10 );
call Range_initial;
jc Failure;
ForLp: stdout.put( (type uns32 eax), nl );
call Range_Resume;
jnc ForLp;
Failure:
Although this looks like a straight-forward implementation project, there are several issues to consider.
First, the call to
Range_Resume
above looks simple enough, but there is no fixed address that corresponds to
the resume address. While it is certainly true that this
Range
example has only one resume address, in gen-
eral you can have as many "yield();" statements as you like in an iterator. For example, the following iterator
returns the values 1, 2, 3, and 4:
iterator OneToFour;
begin OneToFour;
mov(
mov(
mov(
mov(
1,
2,
3,
4,
eax
eax
eax
eax
);
);
);
);
yield();
yield();
yield();
yield();
end OneToFour;
The initial call would execute the "mov( 1, eax );" and "yield();" statements. The first resumption would exe-
cute the "mov( 2, eax );" and "yield();" statements, the second resumption would execute "mov( 3, eax );"
and "yield();", etc. Obviously there is no single resume address the calling code can count on.
There are a couple of additional details left to consider. First, an iterator is free to call procedures and
functions. If such a procedure or function executes the "yield();" statement then resumption by the
FOREACH statement continues execution within the procedure or function that executed the "yield();"
2
.
Second, the semantics of an iterator require all local variables and parameters to maintain their values until
the iterator terminates. That is, yielding does not deallocate local variables and parameters. Likewise, any
return addresses left on the stack (e.g., the call to a procedure or function that executes the "yield();" state-
ment) must not be lost when a piece of code yields and the corresponding FOREACH statement resumes the
iterator. In general, this means you cannot use the standard call and return sequence to yield from or resume
to an iterator because you have to preserve the contents of the stack.
While there are several ways to implement iterators in assembly language, perhaps the most practical
method is to have the iterator call the loop controlled by the iterator and have the loop return back to the iter-
ator function. Of course, this is counter-intuitive. Normally, one thinks of the iterator as the function that the
loop calls on each iteration, not the other way around. However, given the structure of the stack during the
execution of an iterator, the counter-intuitive approach turns out to be easier to implement.
Some high level languages support iterators in exactly this fashion. For example, Metaware’s Profes-
sional Pascal Compiler for the PC supports iterators
3
. Were you to create a Professional Pascal code
sequence as follows:
iterator OneToFour:integer;
begin
yield 1;
2. This requires the use of nested procedures, a subject we will discuss in a later chapter.
3. Obviously, this is a non-standard extension to the Pascal programming language provided in Professional Pascal.
Beta Draft - Do not distribute
© 2001, By Randall Hyde
Page 1307
Chapter Two
yield 2;
yield 3;
yield 4;
end;
Volume Five
and call it in the main program as follows:
for i in OneToFour do writeln(i);
Professional Pascal would completely rearrange your code. Instead of turning the iterator into an assembly
language function and calling this function from within the FOR loop body, this code would turn the FOR
loop body into a function, expand the iterator in-line (much like a macro) and call the FOR loop body func-
tion on each yield. That is, Professional Pascal would probably produce assembly language that looks some-
thing like the following:
// The following procedure corresponds to the for loop body
// with a single parameter (I) corresponding to the loop
// control variable:
procedure ForLoopCode( i:int32 ); nodisplay;
begin ForLoopCode;
mov( i, eax );
stdout.put( i, nl );
end ForLoopCode;
//
//
//
//
The follow code would be emitted in-line upon encountering the
for loop in the main program, it corresponds to an in-line
expansion of the iterator as though it were a macro,
substituting a call for the yield instructions:
ForLoopCode(
ForLoopCode(
ForLoopCode(
ForLoopCode(
1
2
3
4
);
);
);
);
This method for implementing iterators is convenient and produces relatively efficient (fast) code. It
does, however, suffer from a couple drawbacks. First, since you must expand the iterator in-line wherever
you call it, much like a macro, your program could grow large if the iterator is not short and you use it often.
Second, this method of implementing the iterator completely hides the underlying logic of the code and
makes your assembly language programs difficult to read and understand.
2.2.2 Implementing Iterators with Resume Frames
In-line expansion is not the only way to implement iterators. There is another method that preserves the
structure of your program at the expense of a slightly more complex implementation. Several high level lan-
guages, including Icon and CLU, use this implementation.
To start with, you will need another stack frame: the
resume frame.
A resume frame contains two
entries: a yield return address (that is, the address of the next instruction after the yield statement) and a
dynamic link,
that is a pointer to the iterator’s activation record. Typically the dynamic link is just the value
in the EBP register at the time you execute the yield statement. This version implements the four parts of an
iterator as follows:
1)
2)
3)
A CALL instruction for the initial iterator call,
A CALL instruction for the YIELD statement,
A RET instruction for the resume operation, and
Page 1308
© 2001, By Randall Hyde
Version: 9/12/02
Iterators
4)
A RET instruction to terminate the iterator.
To begin with, an iterator will require two return addresses rather than the single return address you
would normally expect. The first return address appearing on the stack is the termination return address. The
second return address is where the subroutine transfers control on a
yield
operation. The calling code must
push these two return addresses upon initial invocation of the iterator. The stack, upon initial entry into the
iterator, should look something like Figure 2.1.
Previous Stack Contents
Parameters for Iterator
Termination Return Address
Yield Return Address
SP
Figure 2.1
Iterator Activation Record
As an example, consider the
Range
iterator presented earlier. This iterator requires two parameters, a
starting value and an ending value:
foreach range(1,10) do
stdout.put( i, nl );
endfor;
The code to make the initial call to the
range
iterator, producing a stack like the one above, could be the
following:
push( 1 );
push( 10 );
push( &ForDone);
call range;
.
.
.
ForDone:
//
//
//
//
Push
Push
Push
Call
start parameter value.
stop parameter value.
termination address.
the iterator.
fordone
is the first statement immediately following the FOREACH loop, that is, the instruction to execute
when the iterator returns failure. The FOREACH loop body must begin with the first instruction following
the call to
range.
At the end of the FOREACH loop, rather than jumping back to the start of the loop, or call-
ing the iterator again, this code should just execute a RET instruction. The reason will become clear in a
moment. So the implementation of the above FOREACH statement could be the following:
push( 1 );
push( 10 );
push( &ForDone);
call range;
push( ebp );
mov( [esp+8], ebp );
stdout.put( i, nl );
//
//
//
//
//
//
//
Push start parameter value.
Push stop parameter value.
Push termination address.
Call the iterator.
Preserve iterator’s ebp value.
Get original EBP value passed to us by range.
Display i’s value.
Beta Draft - Do not distribute
© 2001, By Randall Hyde
Page 1309
Plik z chomika:
jezuss
Inne pliki z tego folderu:
IntroductionToProcedures.pdf
(400 KB)
LexicalNesting.pdf
(297 KB)
Volume1.pdf
(29 KB)
DataRepresentation.pdf
(470 KB)
ClassesAndObjects.pdf
(472 KB)
Inne foldery tego chomika:
Linux - Administrator
Linux - net
Linux - Podrecznik Administratora Sieci
Linux podrecznik administratora - (PL) [PDF]
Pawel Krawczyk - Ruting IP w Linuxie 2.2
Zgłoś jeśli
naruszono regulamin