程序代做CS代考 compiler AI algorithm Compiler Design Week 4 – cscodehelp代写

Compiler Design Week 4

Detailed content
Weekly program
 Week  Week  Week
 Week
1 – Introduction to Compiler Design 2 – Lexical Analysis
3 – CD Programming Language
4 – Syntax Analysis
2
 Week
 Week
 Week
 Week
 Week
 Week
 Week 11 – Code Optimisation  Week 12 – Revision
5 – Top Down Parsing
6 – Symbol Table and Error Recovery 7 – Bottom-Up Parsing
8 – Stack Machine (SM) Architecture 9 – Semantic Analysis
10 – Code Generation
 Week 13 – Extra revision (if needed)
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au

Week 04 Lecture Outline
Syntax Analysis
 Language Classes and Chomsky Hierarchy  Parse Tree & Ambiguity
 Top Down VS Bottom Up Parsing
 Lookahead & LL(k) Grammar
 Eliminating of Ambiguity
 Left Factoring
 Left Recursion
 Operator Precedence and Associativity
3
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au

Basic Structure of a Compiler
Source Code
Tokens
Syntax Object Tree Code
Lexical Analyser
Syntax Analyser
Code Gen.
Lexical Error
Syntax Error
Symbol Table
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
4

Grammars (Recap)
• Context-free grammar is a 4-tuple G=(N,T,P,S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
– P is a finite set of productions of the form A
where A  N and   (NT)*
– S is a designated start symbol S  N
• Vocabulary of a CFG V = N  T
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
5

Notational Conventions Used
• Terminals
a,b,c,…  
specific terminals: 0, 1, id, if, +
• Nonterminals
A,B,C,…  N
specific nonterminals: expr, term, stmt
• Grammar symbols X,Y,Z  (NT)
• Strings of terminals u,v,w,x,y,z  T*
• Strings of grammar symbols ,,  (NT)*
• A-productions
A  1, A  2, …., A  k
A  1 | 2 | …. | k COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
10/8/2021
6

Language Classification
• A grammar G is said to be
– Regular if each production is of the form
Aa orAaBorA
– Context free if each production is of the form A
where A  N and   (NT)*
– Context sensitive if each production is of the form A
where A  N, ,,  (NT)*, || > 0
– Unrestricted
• 
where ,  (NT)*
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
7

Chomsky Hierarchy
L(regular)  L(context free)  L(context sensitive)  L(unrestricted)
Where L(T) = { L(G) | G is of type T } That is, the set of all languages generated by grammars G of type T
Examples:
Every finite language is regular
L1 = { anbn | n  1 } is context free
L2 = { anbncn | n  1 } is context sensitive L3 = {ai| i is a power of 2} is unrestricted
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
8

Derivations (Recap)

The one-step derivation is defined by S …   A     
where A   is a production in the grammar Example: ( L ) ( L )  ( ( L ) L ) ( L )
A sequence of one step derivations of the form: w0 w1 …wn
is a derivation of wn from w0 (w0 * wn) Example: L  ( L ) L  ( ) L  ( )
L * ()
If S * , where S is the start symbol of a grammar G then is 
referred to as sentential form.


10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Grammar: L(L)L L
9

Leftmost & Rightmost Derivations
• In the leftmost derivation ( lm ) the leftmost nonterminal is always chosen.
• In the rightmost derivation ( rm ) the rightmost nonterminal is always chosen.
Leftmost derivation
L
lm ( L) L
lm ( ( L ) L) L lm ( ( ) L) L lm ( () ) L
lm ( ( ) ) ( L) L lm ( () ) ( ) L lm ( () ) ( )
10/8/2021
Rightmost derivation
L
rm ( L) L
rm ( L) ( L) L rm ( L) ( L ) rm ( L) ( ) rm ( ( L) L ) ( ) rm ( ( L) ) ( ) rm ( ( )) ( )
Grammar: L(L)L L
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
10

Parse Tree
• The parse tree for some string w in a language that is defined by the grammar G as follows:
– The root is the start symbol of G
– The leaves are terminals or .
– The interior nodes are non-terminals of G
– For every non-terminal A in the tree with children B1 … Bk, there is some production A  B1 … Bk
• When visited from left to right, the leaves form the input string
Grammar: L(L)L L 
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
11

Parse Tree
• Filters out the order in which productions are applied in derivation
• E => -E => -(E) => -(E + E) => -(id + E) => -(id + id)
• E => -E => -(E) => -(E + E) => -(E + id) => -(id + id)
• Relationship between parse tree and derivation is one-to-many
• Relationship between parse tree and either leftmost or rightmost derivation is one-to-one
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
12

Ambiguity
• Two (or more) parse trees or leftmost derivations (rightmost derivation) for some string in the language
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
E E+ E EE*E E  (E) | id
13

Parsing
• Compilers must verify the syntactic validity of their inputs
– Given a grammar G and input string s, the compiler must determine
if s  L(G)?
• If a string s is in the given language L(G), a derivation (a parse tree)
must exist.
• Parsing is a process of finding a derivation for a string (program) s,
given a grammar G.
– Side Effect: Proves that the string s is a sentence in L(G).
• NOTE: CFGs are NOT capable of describing all syntax we use in programming languages
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
14

Task of Parsers
• Check whether the structure of the program,
– The juxtaposition of the tokens from the scanner is valid
• Recover a parse tree for the input string (program)
– In fact, we will create a more useful structure called “abstract syntax tree”
• Report errors if those tokens do not properly encode a structure
• Store tokens in symbol table for use in the next phase
• Main Goal: Unambiguous expansion of the start symbol to input string
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
15

Parsing Algorithms
• Universal (any CFG)
– Cocke-Younger-Kasami (O(n3|G|))
– Earley’s algorithm (O(n3))
• Top-down (CFG with restrictions)
– Recursive descent (predictive parsing)
– LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (CFG with restrictions)
– Operator precedence parsing
– LR (Left-to-right, Rightmost derivation) methods
• SLR, canonical LR, LALR
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
16

Top Down Parsing
begin with the start symbol repeat
find new sentential form by expanding a non-terminal using one of its productions.
The leftmost non-terminal is the one to expand. until sentential form = input string
• The derivation obtained is left-most derivation.
• Expands the parse tree by applying productions in a depth-first manner
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
17

Top Down Parsing
Grammar:
E T E’
E’ +T E’ | T  FT’
T’  *F T’ | F  ( E ) I id
Input String: id+id*id
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
18

Bottom Up Parsing
begin with input string repeat
find new sentential form by replacing a phrase with a phrase name that directly produces it.
until sentential form = start symbol
• The derivation obtained is right-most derivation (in reverse order).
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
19

Bottom Up Parsing
Grammar:
E E + T E T
T  T*F TF
F  ( E ) I id
Input String: id*id
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
20

Top Down Parsing
• • •
Simplicity is most attractive feature Top-down parsers are easy to write
Restriction: Grammar needs to satisfy certain conditions
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
21

Lookahead
• For certain class of grammars the top down construction of parse tree can be done by a single left-to-right scan of the input string
– with some lookahead (peeking in the input string)
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
22

Lookahead
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
23
stmt  expr ;
| if ( expr ) stmt
| for ( optexpr ; optexpr ; optexpr ) stmt
| other Optexpr  
| expr

LL(k) Grammars
• read left to right
• expand the leftmost non-terminal, produce a leftmost derivation
• look-ahead only k input symbol to choose a production
• LL(1): A grammar such that it is possible to choose the correct production with which to expand a given nonterminal, looking only at the next input symbol, is called LL(1).
– Predictive parsers, that is, recursive-descent parsers need no backtracking
– Rich enough to cover most programming constructs
– No left recursive or ambiguous grammar can be LL(1)
– Transformations can help us
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
24

Ambiguity
• Note that ambiguity is a property of the grammar, not the language. – There are non-ambiguous grammars for if-then-else.
• Some languages are inherently ambiguous,
– all grammars that describe the language are ambiguous.
Theory from COMP2270:
• Both of the following problems are undecidable:
 Given a context-free grammar G, is G ambiguous?
 Given a context-free language L, is L inherently ambiguous?
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
25

Eliminating Ambiguity
Dangling Else Problem (attachment of optional postfixes):
stmt  if expr then stmt
| if expr then stmt else stmt | other
• Input: if E1 then if E2 then S1 else S2
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
26

Eliminating Ambiguity
Re-write the grammar:

stmt  if expr then stmt
| if expr then stmt else stmt | other
IDEA: A statement appearing between a then and an else must be “matched” ; that is, the interior statement must not end with an unmatched or open then.
stmt  matched_stmt | open_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
open_stmt  if expr then stmt
| if expr then matched_stmt else open_stmt
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
27

Eliminating Ambiguity
stmt  matched_stmt | open_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
open_stmt  if expr then stmt
| if expr then matched_stmt else open_stmt
stmt
open_stmt
If expr then stmt
E1
matched_stmt
If expr then matched_stmt else matched_stmt E2 S1 S2
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
28

Eliminating Ambiguity
• Algol-60 and Simula-67 approach:
– Made then token sequence then-if illegal.
– As soon as you have to say then-begin-if, you have to put the matching end token somewhere, and this resolves the doubt.
• Check if you have dangling else problem in CD21
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
29

Grammar transformation: Left factoring
• Useful for producing a grammar suitable for top-down parsing
• When to use?
– the choice between two alternative productions (of a nonterminal) is not clear, we may be able to rewrite the productions
stmt  if expr then stmt
| if expr then stmt else stmt
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
30

Grammar transformation: Left factoring
• How?
– Defer the decision until enough of the input has been seen that we can make the right choice.
– Factor out the common part of the two options into a shared rule that both will use.
– Add a new rule that picks up where the tokens diverge
– Many non-LL(1) aspects of a grammar can be overcome this way
stmt  if expr then stmt
| if expr then stmt else stmt
stmt  if expr then stmt opt_else opt_else  else stmt
|
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
10/8/2021
31

Left Factoring
• When a nonterminal has two or more productions whose right- hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing
• Replace productions
A1 |2 |…|n |
with
AAR |
AR  1 | 2 | … | n
•  represents all alternative that does not start with 
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
32

Left factoring: CD21 Examples:
• Within CD21 we have various places where left factoring is required to make a section of the grammar LL(1).
• List of statements:
::= ; | ;
• If statement:
::= if ( ) end
| if ( ) else end
We see the first if, know its some sort of if statement, so the decision as to which one is delayed by left factoring.
• Variable specification:
x OR x[4] OR x[4].y
If we see x (an id), then we delay the decision among ε, [4] and [4].y.
• Procedure call: CD21 uses x(..) as a statement, we delay the decision on whether a statement is a procedure call or an assignment statement till later (by the use of left factoring)
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
33

Hidden Left Factors
• Common factors could be hidden and need to be first exposed via substitution.
A  da | acB
B  abB | daA | Af
• Substitute A in second production A  da | acB
B  abB | daA | daf | acBf
• Now apply left factoring A –> da | acB
B –> aM | daN M –> bB | cBf N –> A | f
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
34

Recursive Productions
• Productions of the form
AA |
|
are left recursive
• Productions of the form
AA |
|
are right recursive
• When one of the productions in a grammar is left recursive then a predictive parser may loop forever
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
35

Grammar Transformation: Eliminating Left Recursion
• Rewrite every left-recursive production
AA |
|
|A
into a right-recursive production:
A   AR |  AR
AR   AR |  AR
• Example:
|T
|
E  E +T
E TE E  + T E | -TE
|
|E-T
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
36

Left Recursion: CD21 Examples:
• Within CD21 we have various places where removal of left recursion is required to make a section of the grammar LL(1).
• Boolean statements:
::= |
• Arithmetic expressions:
::= + | |
::= * | / | % | ::= ^ |
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
37

Hidden Left Recursion
• A grammar is left recursive if it has a nonterminal A such that there is a derivation A + A for some string .
• Left recursion could be hidden and need to be first exposed via substitution.
S  Tu | wx
T  Sq | vvS
• Substituting S into T, the left-recursion comes to light:
S  Tu | wx
T  Tuq | wxq | vvS
• Eliminating left-recursion, we get:
S  Tu | wx
T  wxqT’ | vvST’ T’  uqT’ | ε
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
38

General Left Recursion Elimination
Arrange the nonterminals in some order A1, A2, …, An for i = 1, …, n do
for j = 1, …, i-1 do replace each
Ai  Aj  with
Ai  1  | 2  | … | k 
where
Aj  1 | 2 | … | k
enddo
eliminate the immediate left recursion in Ai enddo
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
39

Example Left Rec. Elimination
ABC|a BCA|Ab CAB|CC|a
Choose arrangement: A, B, C
i = 1:
i = 2, j = 1:
i = 3, j = 1: i = 3, j = 2:
CR  A BR C B CR | C CR |  COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
nothing to do
B  C A | A b
 BCA|BCb|ab (imm) BCABR |abBR
BR CbBR | C  A B | C C | a
 CBCB|aB|CC|a
C  B C B | a B | C C | a
 C  C A BR C B | a b BR C B | a B | C C | a (imm) C  a b BR C B CR | a B CR | a CR
10/8/2021
40

Grammar Transformation
• Operator Precedence/associativity
• Grammar for single-digit arithmetic expressions:
E → E+E | E-E | E*E | E/E | (E) | id id → 0 | 1 | 2 | … | 9
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
41

Operator Precedence

Parse tree for 2 * 3 + 5
E
E+E E * E id
id id 23
5
(2*3) +5
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
2 * (3 + 5 )
E *
id
2
E E+E
id id 35
E
10/8/2021
42
E → E+E | E-E | E*E | E/E | (E) | id id → 0 | 1 | 2 | … | 9

Operator Precedence
• Will leftmost – derivation solve the problem?
• E.g.2+3*5
E
E*E E + E id
id id 23
5
• Needs to take care of precedence/associativity of the operators
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
43

Associativity/Precedence of Operators
• The operator + associates to the left, because an operand with plus signs on both sides of it belongs to the operator to its left.
String a+b+c has the same meaning as (a+b)+c
• The operator = (assignment) associates to the right, because an operand with = signs on both sides of it belongs to the operator to its right.
String a=b=c has the same meaning as a=(b=c)
• We say that * has higher precedence than + if * takes its operands
before + does
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
44

Dealing with Operator Precedence
Choice I
• Leave the grammar as is
• Code into the parser knowledge of precedence and associativity to break the tie and force the parser to build the correct tree.
• Advantage:
– Grammar remains simple and less complicated.
– This allows for unambiguous parsing of ambiguous grammars
• Disadvantage:
– The syntactic structure of the language is no longer given by the grammar alone.
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
45

Dealing with Operator Precedence Choice II
• To change the grammar to only allow the one tree that correctly reflects our intention and eliminate the others
• Layering: For the expression grammar, separate expressions into multiplicative and additive subgroups and force them to be expanded in the desired order.
• Note that just left-factoring won’t take care of operator precedence!
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
46

Dealing with Operator Precedence Layering:
1. 2. 3.
Create a table of operators a row for each precedence level Create non-terminals for each level
Start with the lowest precedence level
E → E+E | E-E | E*E | E/E | (E) | id Rewrite to:
E → E+E | E-E | T T → T*T | T/T | F F → id | (E)
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
47

Layering
Example: id – id / id
E → E+E | E-E | T T → T*T | T/T | F F → id | (E)
E
E –
T F id
E
T
T/T
Can we construct parse tree for id-id/id that shows the wrong precedence?
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
F
F
id id
48

Associativity
• The grammar captures operator precedence, but it is still ambiguous!
– Fails to express that the operators (+,-,*,/) are left associative – 5-3-2 is equivalent to: ((5-3)-2) and not to: (5-(3- 2))
E
E E-E E-
E
T F id
T F id
E
E-E
TT FF
id id
E
E-E
TT FF
id id
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
49

Associativity of Operators
• Left-associative operators have left-recursive productions left  left + term | term
• Right-associative operators have right-recursive productions right  term = right | term
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
50

Fix for associativity
• •

E → E+E | E-E | T T→ T*T|T/T|F F → id | (E)
The grammar given above is both left and right recursive in nonterminals E and T
To correctly expresses operator associativity:
– For left associativity, use left recursion
– For right associativity, use right recursion
Here’s the correct grammar: E → E+T | E-T | T
T→ T*F|T/F|F F → id | (E)
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
51

Non-Context Free language Constructs Identifiers: Declare before use
• Identifiers are declared before their use in a program.
• L = {wcw | w is in (a|b)*}
– First w represents the declaration of a variable
– Second w refers to its use.
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
52

Non-Context Free language Constructs
Checking the number of formal parameters declared in a function is the same as the number of actual parameters when it is called
• L={anbmcndm | n, m >= 1}.
– anbm could represent formal parameter lists of two procedures with n and m arguments respectively
– cndm represent the actual parameter lists in the calls to these two functions.
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
53

References
• Compilers: Principles, Techniques, and Tools (2nd Edition)
• By A.V. Aho, Lam, R. Sethi, . Ullman
• Chapter 2, 4
10/8/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
54

Leave a Reply

Your email address will not be published. Required fields are marked *