Finite Automata Archives

Context Free Grammars

ComputeNow — Thu, 23 Aug 2018 18:21:25 +0000

Context-free grammars (CFGs) are used to describe context-free languages. A context-free grammar is a set of recursive rules used to generate patterns of strings. Context-free grammars can describe all regular languages and more, but they cannot describe all possible languages.

Context-free grammars are studied in theoretical computer science, compiler design, and linguistics. CFGs are used to describe programming languages, and parser programs in compilers can be generated automatically from context-free grammars.

Two parse trees for the string “x + y * z”, as generated by a CFG. Source: the Wikipedia page on context-free grammars.

Context Free Grammars:

Context-free grammars can generate context-free languages. They do this by taking a set of variables which are defined recursively, in terms of one another, by a set of production rules. Context-free grammars are named as such because any of the production rules in the grammar can be applied regardless of context—it does not depend on any other symbols that may or may not be around a given symbol that is having a rule applied to it.

Context-free grammars have the following components:

A set of terminal symbols which are the characters that appear in the language/strings generated by the grammar. Terminal symbols never appear on the left-hand side of the production rule and are always on the right-hand side.

A set of nonterminal symbols (or variables) which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols. These are the symbols that will always appear on the left-hand side of the production rules, though they can be included on the right-hand side. The strings that a CFG produces will contain only symbols from the set of terminal symbols.

A set of production rules which are the rules for replacing nonterminal symbols. Production rules have the following form: variable -> string of variables and terminals.

A start symbol which is a special nonterminal symbol that appears in the initial string generated by the grammar.

For comparison, a context-sensitive grammar can have production rules where both the left-hand and right-hand sides may be surrounded by a context of terminal and nonterminal symbols.

To create a string from a context-free grammar, follow these steps:

- Begin the string with a start symbol.

- Apply one of the production rules to the start symbol on the left-hand side by replacing the start symbol with the right-hand side of the production.

- Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols. Note, it could be that not all production rules are used.
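The steps above can be sketched in a few lines of Python. This is only a minimal sketch: the names `RULES` and `derive` are made up for illustration, the grammar is stored as a plain dictionary, and the rules are those of the matched-parentheses example further down in this post. At every step the leftmost nonterminal is replaced, and the caller chooses which production to use.

```python
# Hypothetical representation: each nonterminal maps to its right-hand sides.
RULES = {"S": ["()", "SS", "(S)"]}

def derive(start, choices, rules=RULES):
    """Apply one production per step to the leftmost nonterminal.

    `choices` lists, in order, the index of the production to use at each step.
    Returns every intermediate string, ending with the derived string.
    """
    current = start
    steps = [current]
    for choice in choices:
        # Find the leftmost nonterminal still present in the string.
        position = next(i for i, symbol in enumerate(current) if symbol in rules)
        replacement = rules[current[position]][choice]
        current = current[:position] + replacement + current[position + 1:]
        steps.append(current)
    # The derivation is complete only when no nonterminals remain.
    assert not any(symbol in rules for symbol in current), "nonterminals remain"
    return steps

steps = derive("S", [1, 2, 0, 0])
print(" => ".join(steps))
```

Here the choices pick S -> SS, then S -> (S), then S -> ( ) twice, so the derivation ends in the string (())().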

Formal Definition

A context-free grammar can be described by a four-element tuple (V, Σ, R, S) , where

V is a finite set of variables (which are non-terminal)
Σ is a finite set (disjoint from V) of terminal symbols
R is a finite set of production rules, where each production rule maps a variable to a string s ∈ (V ∪ Σ)*
S is the start symbol, which is an element of V.

Example:
Come up with a grammar that will generate the context-free (but not regular) language that contains all strings with matched parentheses.

There are many grammars that can do this task. This solution is one way to do it, and it should give you a good idea of whether your (possibly different) solution works too.

Start symbol: S
Nonterminal variables = {S}
Terminal symbols = {(, )}
Production rules:

- S -> ( )

- S -> SS

- S -> (S).

A way to condense production rules is as follows:

We can take

S->()
S->SS
S->(S)

and translate them into a single line: S -> ( ) | SS | (S). If the empty string should also count as matched, add one more alternative: S -> ( ) | SS | (S) | ε, where ε is the empty string.
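To get a feel for what these three rules generate, here is a rough Python sketch that enumerates every string of a bounded length derivable from S and checks each one with a simple depth counter. The function names `generate` and `is_balanced` are illustrative assumptions, and the ε alternative is left out on purpose.

```python
from collections import deque

RULES = ["()", "SS", "(S)"]  # right-hand sides for S, as in the text

def generate(max_length):
    """Return all terminal strings of length <= max_length derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    results = set()
    while queue:
        form = queue.popleft()
        if "S" not in form:
            results.add(form)  # only terminals left: a generated string
            continue
        position = form.index("S")  # expand the leftmost nonterminal
        for rhs in RULES:
            new_form = form[:position] + rhs + form[position + 1:]
            # No rule shrinks a string, so overlong forms can be discarded.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

def is_balanced(text):
    """Check matched parentheses with a depth counter."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

strings = generate(6)
print(sorted(strings, key=lambda s: (len(s), s)))
```

Every string produced this way passes `is_balanced`, and the empty string never appears, which is why ε needs its own alternative if it should be accepted.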

Derivations in a context-free grammar can be modeled as parse trees. The nodes of the tree represent the symbols and the edges represent the use of production rules. The leaves of the tree are the end result (terminal symbols) that make up the string the grammar is generating with that particular sequence of symbols and production rules.

The parse trees below represent two ways to generate the string “a + a – a” with the grammar

Example of an ambiguous grammar—one that can have multiple ways of generating the same string

Because the same string can be derived from this grammar with more than one parse tree, the grammar is said to be ambiguous.

Relationship with other Computation Models

Context-free languages are recognized by pushdown automata, just as regular languages are recognized by finite state machines. Since all regular languages can be generated by CFGs, all regular languages can also be recognized by pushdown automata.

Any language that can be generated using regular expressions can be generated by a context-free grammar.

The way to do this is to take the regular language, determine its finite state machine and write production rules that follow the transition functions.
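As a hedged illustration of that recipe, the Python sketch below turns a DFA into production rules: each transition from state p to state q on symbol a becomes the rule p -> a q, and each accepting state q gets the rule q -> ε. The two-state machine for "strings over {0, 1} ending in 0", the state names A and B, and the function names are all assumptions made for this example.

```python
def dfa_to_grammar(transitions, accepting):
    """Build right-linear productions from a DFA.

    Each transition delta(p, a) = q becomes the rule p -> a q, and each
    accepting state q gets the rule q -> "" (the empty string).
    """
    rules = {}
    for (state, symbol), target in transitions.items():
        rules.setdefault(state, []).append((symbol, target))
    for state in accepting:
        rules.setdefault(state, []).append(("", None))
    return rules

def generates(rules, start, text):
    """Check whether the right-linear grammar derives `text` from `start`."""
    current = {start}
    for symbol in text:
        current = {target for variable in current
                   for (terminal, target) in rules.get(variable, [])
                   if terminal == symbol and target is not None}
    # The derivation can finish only on a variable that has an empty rule.
    return any(("", None) in rules.get(variable, []) for variable in current)

# Assumed example DFA: A is the start state, B is the only accepting state.
transitions = {("A", "0"): "B", ("A", "1"): "A",
               ("B", "0"): "B", ("B", "1"): "A"}
grammar = dfa_to_grammar(transitions, accepting={"B"})
print(grammar)
print(generates(grammar, "A", "110"))
```

The resulting rules read A -> 0B | 1A and B -> 0B | 1A | ε, with one variable per state of the machine.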


Definition of Pushdown Automata

ComputeNow — Fri, 17 Aug 2018 19:52:10 +0000


A pushdown automaton is a finite automaton with extra memory called a stack, which helps it recognize context-free languages.

A pushdown automaton (PDA) can be defined as a 7-tuple (Q, ∑, Γ, δ, q0, Z, F), where:

Q is the set of states
∑ is the set of input symbols
Γ is the set of pushdown symbols (which can be pushed and popped from stack)
q0 is the initial state
Z is the initial pushdown symbol (which is initially present in stack)
F is the set of final states
δ is a transition function which maps Q x (∑ ∪ {ɛ}) x Γ into finite subsets of Q x Γ*. In a given state, the PDA reads an input symbol (or ɛ) and the stack symbol on top of the stack, moves to a new state, and replaces the top of the stack with a string of stack symbols.

Pushdown automata are nondeterministic finite state machines augmented with additional memory in the form of a stack, which is why the term “pushdown” is used, as elements are pushed down onto the stack.

Pushdown automata are computational models — theoretical computer-like machines — that can do more than a finite state machine, but less than a Turing machine.

Pushdown automata accept context-free languages, which include the set of regular languages. The language that describes strings that have matching parentheses is a context-free language.

Say that a programmer has written some code, and in order for the code to be valid, any parentheses must be matched.

One way to do this would be to feed the code (as strings) into a pushdown automaton programmed with transition functions that implement the context-free grammar for the language of balanced parentheses.

If the code is valid, and all parentheses are matched, the pushdown automaton will “accept” the code. If there are unbalanced parentheses, the pushdown automaton will be able to return to the programmer that the code is not valid. This is one of the more theoretical ideas behind computer parsers and compilers.
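A minimal Python sketch of that idea is shown below. It behaves like a pushdown automaton with a single state: an opening parenthesis is pushed, a closing parenthesis pops, and the input is accepted only if the stack is back to its bottom marker at the end. The function name `accepts_balanced` and the decision to skip every other character are assumptions made for this illustration.

```python
def accepts_balanced(code):
    """Simulate a one-state pushdown automaton for matched parentheses.

    Characters other than '(' and ')' are skipped, which is an assumption
    made here so that ordinary source code can be fed in directly.
    """
    stack = ["$"]  # '$' marks the bottom of the stack
    for ch in code:
        if ch == "(":
            stack.append("(")
        elif ch == ")":
            if stack[-1] != "(":
                return False  # nothing to match: no transition applies
            stack.pop()
    # Accept only if every '(' was matched and just the marker remains.
    return stack == ["$"]

print(accepts_balanced("f(a, g(b))"))
print(accepts_balanced("f(a, g(b)"))
```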

Pushdown automata can be useful when thinking about parser design and any area where context-free grammars are used, such as in computer language design.

Since pushdown automata are equal in power to context-free grammars, there are two ways of proving that a language is context-free: provide a context-free grammar or provide a pushdown automaton for the language.

δ represents the transition function (the program of the pushdown automaton), Γ is the set of stack symbols, ∑ is the set of input (tape) symbols, and Q is the set of states.

A stack can be thought of as a stack of plates: each plate is stacked on top of the last, and plates can only be taken off the top of the stack. To get to the bottom of the stack of plates, all others must be removed first. Stacks are a last-in-first-out (LIFO) data structure. In pushdown automata, state transitions process a symbol of the string, as in FSMs, but state transitions can also include instructions about pushing and popping elements to and from the stack.

One can walk through the pushdown automaton diagram to see what kinds of strings its transition functions allow, or feed it an input string and verify that there exists a sequence of transitions, consuming exactly that string, that ends in an accepting state.

At each transition, a pushdown automaton can push a symbol to the stack, pop a symbol from the stack, do both, or leave the stack unchanged. The symbol ɛ, which also represents the empty string, is used in a transition label to mean "nothing". If the instructions say that ɛ is the symbol read, the transition reads nothing from the input (or from the stack). If the instructions say to replace the symbol on top of the stack with an ɛ, this means to delete the symbol on top of the stack (this is popping).

The pushdown automaton starts with an empty stack and accepts if it is in an accepting state once the whole input has been read. The contents of the stack at the end do not matter unless the problem specifies that the stack must be empty at the end. If no transition from the current state can be made, reject. For example, if the transition from state A to state B requires popping an x from the stack and there is no x on the top of the stack to pop, reject.

Pushdown automata can be modeled as a state machine diagram with added instructions about the stack. Where in the finite state machine, the arrows between states were labeled with a symbol that represented the input symbol from the string, a pushdown automaton includes information in the form of input symbol followed by the symbol that is currently at the top of the stack, followed by the symbol to replace the top of the stack with. These instructions are sometimes separated by commas, slashes, or arrows.

The exception to the “replace with this symbol” command is the first step after we write the $ symbol: we do not overwrite (i.e. pop/delete) the $ symbol. We need to keep it so that as we reach the end of the string, we know when we’ve reached the bottom of our stack. Instead of overwriting this symbol, simply place the next stack symbol on top of the $.

For this example, assume that s5 and s6 are the accepting states. This pushdown automaton only shows instructions for the stack. Usually, pushdown automata diagrams also contain information about which input symbols are needed to move from one state to another, but let’s use this example to get a feel for how the stack works. Assume the stack starts off empty except for the symbol $, which indicates the bottom of the stack, so the stack is initially set to [$].

What does the stack look like after following these transitions: s1 to s2 to s3 ?

The pushdown automaton pushes “a”, pushes “b”, and then pushes another “b”, so the stack at this point is [$,a,b,b].

Starting with the empty stack, what does the stack look like after the transitions s1 to s2 to s3 to s3 to s4 to s4 ?

The pushdown automaton pushes “a”, pushes “b”, pushes “b”, pushes “b”, pops “b”, and pops “b”, so the stack looks like [$, a, b].
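The two walks above can be replayed with a short Python sketch. It only models the stack instructions described in the answers, not the diagram itself, and the helper name `run_stack_operations` is made up for this illustration.

```python
def run_stack_operations(operations):
    """Apply (action, symbol) pairs to a stack that starts as ['$']."""
    stack = ["$"]
    for action, symbol in operations:
        if action == "push":
            stack.append(symbol)
        elif action == "pop":
            # A pop is only possible when the named symbol is on top.
            if stack[-1] != symbol:
                raise ValueError(f"cannot pop {symbol!r}: top is {stack[-1]!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown action {action!r}")
    return stack

# s1 to s2 to s3
first = run_stack_operations([("push", "a"), ("push", "b"), ("push", "b")])

# s1 to s2 to s3 to s3 to s4 to s4
second = run_stack_operations([("push", "a"), ("push", "b"), ("push", "b"),
                               ("push", "b"), ("pop", "b"), ("pop", "b")])
print(first, second)
```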

Try It Yourself:

Given the following pushdown automaton and input string 00001111, what does the PDA’s stack look like once it has read the second 1 of the string (i.e. the substring 000011 has been read)?

It is easy; give your answer in the comments.

What’s the point of a stack?

A stack allows pushdown automata a limited amount of memory. A pushdown automaton can read from, push (add) to, or pop (remove) the stack. The transition functions that describe the pushdown automaton (usually represented by labels on the arrows between the state circles) tell the automaton what to do.

Pushdown automata accept context-free languages. This means that a context-free language can be represented by a pushdown automaton or a context-free grammar.

For example, the language containing all strings of 0’s followed by an equal number of 1’s is a context-free language, so it can be represented by a pushdown automaton. It was proved on the regular languages page that this language is not regular, so no finite state machine can represent it.

Here is a pushdown automaton that accepts strings in the language L = { 0^n 1^n | n >= 0 }.

Note: in the transition from A to B, do not overwrite the $ symbol with an empty string (i.e. don’t remove the $) just write the new symbol on top of that.
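Since the diagram is not reproduced here, the Python sketch below uses an assumed transition table for this language rather than the one from the figure. The state names A to D, the table `DELTA`, and the function `accepts` are all illustrative. Each entry reads an input symbol (or nothing, written as an empty string), looks at the top of the stack, and replaces that top with a string of stack symbols.

```python
from collections import deque

# Assumed transition table: (state, input symbol or "", stack top) ->
# list of (next state, symbols that replace the stack top, bottom first).
DELTA = {
    ("A", "0", "$"): [("B", "$0")],  # keep $ and write 0 on top of it
    ("B", "0", "0"): [("B", "00")],  # push another 0
    ("B", "1", "0"): [("C", "")],    # pop a 0 for the first 1
    ("C", "1", "0"): [("C", "")],    # pop a 0 for each further 1
    ("C", "", "$"): [("D", "$")],    # only $ left: move to the accepting state
}
ACCEPTING = {"A", "D"}  # A accepts the empty string (n = 0)

def accepts(text, delta=DELTA, start="A", accepting=ACCEPTING):
    """Search all reachable configurations (state, input position, stack)."""
    queue = deque([(start, 0, "$")])
    seen = set()
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, position, stack = config
        if position == len(text) and state in accepting:
            return True
        if not stack:
            continue
        top, rest = stack[-1], stack[:-1]
        # Try a move that reads nothing, then one that reads the next symbol.
        options = [("", position)]
        if position < len(text):
            options.append((text[position], position + 1))
        for symbol, next_position in options:
            for next_state, replacement in delta.get((state, symbol, top), []):
                queue.append((next_state, next_position, rest + replacement))
    return False

for sample in ["", "01", "0011", "001", "10"]:
    print(repr(sample), accepts(sample))
```

The first entry in the table is the case described in the note: the $ is written back and the 0 goes on top of it.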


Equivalence of Automata

ComputeNow — Wed, 15 Aug 2018 13:30:31 +0000


Two automata A and B are said to be equivalent if both accept exactly the same set of input strings. Formally, if two automata A and B are equivalent then

if there is a path from the start state of A to a final state of A labeled a1a2..ak, then there is a path from the start state of B to a final state of B labeled a1a2..ak.
if there is a path from the start state of B to a final state of B labeled b1b2..bj, then there is a path from the start state of A to a final state of A labeled b1b2..bj.
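For deterministic automata, one way to test this definition mechanically is to walk both machines in lockstep and look for a pair of states where one accepts and the other does not. The Python sketch below is an illustration under stated assumptions, not a method taken from this post: both DFAs must define a transition for every state and symbol, and the tuple layout and the name `equivalent` are made up here.

```python
from collections import deque

def equivalent(dfa_a, dfa_b, alphabet):
    """Each DFA is (start, accepting set, transitions keyed by (state, symbol))."""
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        a, b = queue.popleft()
        # Some string leads here; the machines must agree on accepting it.
        if (a in accept_a) != (b in accept_b):
            return False
        for symbol in alphabet:
            pair = (delta_a[(a, symbol)], delta_b[(b, symbol)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Two assumed machines for "ends in 0", and one for "ends in 1".
two_state = ("A", {"B"}, {("A", "0"): "B", ("A", "1"): "A",
                          ("B", "0"): "B", ("B", "1"): "A"})
three_state = ("p", {"q", "r"}, {("p", "0"): "q", ("p", "1"): "p",
                                 ("q", "0"): "r", ("q", "1"): "p",
                                 ("r", "0"): "r", ("r", "1"): "p"})
ends_in_1 = ("A", {"B"}, {("A", "1"): "B", ("A", "0"): "A",
                          ("B", "1"): "B", ("B", "0"): "A"})

print(equivalent(two_state, three_state, "01"))
print(equivalent(two_state, ends_in_1, "01"))
```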


Equivalence of Deterministic and Nondeterministic Automata

To show that there is a corresponding DFA for every NDFA, we will show how to remove nondeterminism from an NDFA, and thereby produce a DFA that accepts the same strings as the NDFA.

The basic technique is referred to as subset construction, because each state in the DFA corresponds to some subset of states of the NDFA.

The idea is this: as we trace the set of possible paths through an NDFA, we must record all possible states that we could be in as a result of the input seen so far. We create a DFA which encodes the set of states of the NDFA that we could be in within a single state of the DFA.

Subset Construction for NDFA

To create a DFA that accepts the same strings as this NDFA, we create a state to represent all the combinations of states that the NDFA can enter.

From the example of a 5-state NDFA that recognizes input strings containing the word “main”, we can create a corresponding DFA (with up to 2^5 = 32 states) whose states correspond to all possible combinations of states in the NDFA:

  {},
  {s0}, {s1}, {s2}, {s3}, {s4},
  {s0, s1}, {s0, s2}, {s0, s3}, {s0, s4},
  {s1, s2}, {s1, s3}, {s1, s4},
  {s2, s3}, {s2, s4},
  {s3, s4},
  {s0, s1, s2}, {s0, s1, s3}, {s0, s1, s4},
  {s0, s2, s3}, {s0, s2, s4},
  {s0, s3, s4},
  {s1, s2, s3}, {s1, s2, s4},
  {s1, s3, s4},
  {s2, s3, s4},
  {s0, s1, s2, s3}, {s0, s1, s2, s4},
  {s0, s1, s3, s4}, {s0, s2, s3, s4},
  {s1, s2, s3, s4},
  {s0, s1, s2, s3, s4}

Note that many of these states won’t be needed in our DFA because there is no way to enter that combination of states in the NDFA. However, in some cases, we might need all of these states in the DFA to capture all possible combinations of states in the NDFA.

Subset Construction for NDFA (cont)

A DFA accepting the same strings as our example NDFA has the following transitions:

  {s0} -m-> {s0,s1}
  {s0} -not m-> {s0}
  
  {s0,s1} -m-> {s0,s1}
  {s0,s1} -a-> {s0,s2}
  {s0,s1} -not m,a-> {s0}
  
  {s0,s2} -m-> {s0,s1}
  {s0,s2} -i-> {s0,s3}
  {s0,s2} -not m,i-> {s0}
  
  {s0,s3} -m-> {s0,s1}
  {s0,s3} -n-> {s0,s4}
  {s0,s3} -not m,n-> {s0}

The start state is {s0} and the final state is {s0,s4}, the only one containing a final state of the NDFA.
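The construction can be sketched in Python as follows. The NDFA itself is not shown in this post, so the one used here is an assumption chosen to agree with the transitions listed above: s0 loops on every symbol, the chain s0 to s4 spells "main", and s4 has no outgoing transitions. The alphabet is also cut down to lowercase letters to keep the sketch small.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet, kept small for the sketch

# Assumed NDFA: the chain s0 -m-> s1 -a-> s2 -i-> s3 -n-> s4.
CHAIN = {("s0", "m"): "s1", ("s1", "a"): "s2",
         ("s2", "i"): "s3", ("s3", "n"): "s4"}

def nfa_moves(state, symbol):
    """Return the set of NDFA states reachable from `state` on `symbol`."""
    targets = set()
    if state == "s0":
        targets.add("s0")  # s0 loops on every symbol
    if (state, symbol) in CHAIN:
        targets.add(CHAIN[(state, symbol)])
    return targets

def subset_construction(start, alphabet):
    """Build the reachable part of the DFA; each DFA state is a frozenset."""
    table = {}
    pending = [frozenset([start])]
    while pending:
        current = pending.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            target = frozenset(t for s in current for t in nfa_moves(s, symbol))
            table[current][symbol] = target
            pending.append(target)
    return table

dfa = subset_construction("s0", ALPHABET)
final_states = [state for state in dfa if "s4" in state]
print(len(dfa), "reachable DFA states")
print(sorted(sorted(state) for state in dfa))
```

Under these assumptions only 5 of the 32 possible subsets are ever reached, which illustrates the earlier remark that many combinations are not needed.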

Limitations of Finite Automata

The defining characteristic of finite automata is that they have only a finite number of states. Hence, a finite automaton can only “count” (that is, maintain a counter, where different states correspond to different values of the counter) up to a fixed, finite bound.

There is no finite automaton that recognizes these sets of strings:

The set of binary strings consisting of an equal number of 1’s and 0’s
The set of strings over ‘(‘ and ‘)’ that have “balanced” parentheses

The ‘pumping lemma’ can be used to prove that no such FA exists for these examples.


Deterministic Finite Automata (DFA)

ComputeNow — Mon, 13 Aug 2018 16:47:35 +0000

Definition of Deterministic Finite Automata

A deterministic finite automaton (DFA) is a 5-tuple (Q, ∑, q, F, δ).
Q : set of all states.
∑ : set of input symbols. (Symbols which the machine takes as input)
q : initial state. (Starting state of the machine)
F : set of final states.
δ : transition function, defined as δ : Q X ∑ --> Q.

In a DFA, for a particular input character, the machine goes to one state only. A transition function is defined on every state for every input symbol. Also, in a DFA a null (or ε) move is not allowed, i.e., a DFA cannot change state without reading an input character.

For example, the DFA below, with ∑ = {0, 1}, accepts all strings ending with 0.
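A DFA like this one can be written down as a transition table and run with a simple loop. The Python sketch below assumes a two-state machine with the made-up state names q0 and q1, where q1 means "the last symbol read was 0". The original diagram may use different names or more states.

```python
# Assumed transition function: delta maps (state, input symbol) to one state.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}
START = "q0"
FINAL = {"q1"}

def accepts(text):
    """Run the DFA: exactly one move per input symbol, no epsilon moves."""
    state = START
    for symbol in text:
        state = DELTA[(state, symbol)]
    return state in FINAL

for sample in ["0", "10", "1100", "1", "01", ""]:
    print(repr(sample), accepts(sample))
```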

One important thing to note is that there can be many possible DFAs for a pattern. A DFA with the minimum number of states is generally preferred.

Some Important Points:

Every DFA is an NFA, but not vice versa.
NFAs and DFAs have the same power, and each NFA can be translated into a DFA.
There can be multiple final states in both DFAs and NFAs.
The NFA is more of a theoretical concept.
DFAs are used in lexical analysis in compilers.



Definition of Finite Automata

ComputeNow — Wed, 08 Aug 2018 17:32:44 +0000

Definition of Finite Automata:

A finite automaton (FA) is a simple idealized machine used to recognize patterns within input taken from some character set (or alphabet) C. The job of an FA is to accept or reject an input depending on whether the pattern defined by the FA occurs in the input.

A finite automaton consists of:

a finite set S of N states
a special start state
a set of final (or accepting) states
a set of transitions T from one state to another, labeled with chars in C

As noted above, we can represent a FA graphically, with nodes for states, and arcs for transitions.

We execute our FA on an input sequence as follows:

Begin in the start state
If the next input char matches the label on a transition from the current state to a new state, go to that new state
Continue making transitions on each input char
- If no move is possible, then stop
- If in an accepting state, then accept

A finite automaton (FA) is the simplest machine for recognizing patterns.

A finite automaton consists of the following:

Q : Finite set of states.
∑ : set of Input Symbols.
q : Initial state.
F : set of Final States.
δ : Transition Function.

Formal specification of machine is
{ Q, ∑, q, F, δ }.

Example of Finite Automata in Terms of a Programming Language

Suppose you want to write a program to recognize the word “main” in an input program. Logically, your program will look something like this:

1. cin >> char
2. while (char != "m") cin >> char
3. if (cin >> char != "a") go to step 1
4. if (cin >> char != "i") go to step 1
5. if (cin >> char != "n") go to step 1
6. done

We can explain each step in this program as follows:

Initialization
Looking for “m”
Recognized “m”, looking for “a”
Recognized “ma”, looking for “i”
Recognized “mai”, looking for “n”
Recognized “main”

Each step in the program corresponds to a different place in the recognition process. We can capture this behavior in a graph

each node in the graph represents a step in the process
arcs in the graph represent movement from one step to another
labels on the arcs correspond to the input required to make a transition

It is fairly straightforward to translate an FA into a program. Consider a 5-state FA to recognize “main” in a program.

Let FA = {S,C,T,s0,F}
S = {s0, s1, s2, s3, s4}
C = {a,b,..z,A,B,..Z,0,1,..9,+,-,*,/}
F = {s4}

The transitions between states (T) are as follows:

T = { (s0,m,s1), (s0,C-m,s0), (s1,a,s2), (s1,m,s1), (s1,C-a-m,s0), (s2,i,s3), (s2,m,s1), (s2,C-i-m,s0), (s3,n,s4), (s3,m,s1), (s3,C-n-m,s0), (s4,C,s4) }
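Before turning this into goto-style code, it may help to see the same transition set T driven by a table. The Python sketch below is an illustration only: the names `EXPLICIT`, `DEFAULT`, and `recognizes_main` are made up, and arcs such as (s1,C-a-m,s0) are modeled as a per-state default that applies whenever no explicit arc matches.

```python
import string

# Alphabet C from the text: letters, digits, and the four operators.
C = set(string.ascii_letters + string.digits + "+-*/")

# Explicit arcs from T; every other character in C takes the default arc.
EXPLICIT = {
    ("s0", "m"): "s1",
    ("s1", "a"): "s2", ("s1", "m"): "s1",
    ("s2", "i"): "s3", ("s2", "m"): "s1",
    ("s3", "n"): "s4", ("s3", "m"): "s1",
}
DEFAULT = {"s0": "s0", "s1": "s0", "s2": "s0", "s3": "s0", "s4": "s4"}
FINAL = {"s4"}

def recognizes_main(text):
    """Run the FA over `text` and report whether it ends in a final state."""
    state = "s0"
    for ch in text:
        if ch not in C:
            raise ValueError(f"{ch!r} is not in the alphabet C")
        state = EXPLICIT.get((state, ch), DEFAULT[state])
    return state in FINAL

for sample in ["main", "x+main*2", "mmain", "maimn", "mai"]:
    print(sample, recognizes_main(sample))
```

Note that "mmain" is accepted because the arc (s1,m,s1) keeps the machine in s1 when a second "m" arrives.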

We can easily create a program from this description of the FA. We will use statement labels to represent states and goto’s to represent the meaning of an arc. (In general, goto’s are discouraged, but this is one case where their use is not only reasonable, it is quite common.) The variable “accept” is true if the FA accepts, and is false otherwise.

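#include <iostream>

int main() {
    bool accept = false;
    char c;
s:  accept = false;                  // state s0: nothing matched yet
    if (!(std::cin >> c)) goto end;  // end of input
    if (c == 'm') goto m;
    goto s;
m:  accept = false;                  // state s1: recognized "m"
    if (!(std::cin >> c)) goto end;
    if (c == 'm') goto m;
    if (c == 'a') goto a;
    goto s;
a:  accept = false;                  // state s2: recognized "ma"
    if (!(std::cin >> c)) goto end;
    if (c == 'm') goto m;
    if (c == 'i') goto i;
    goto s;
i:  accept = false;                  // state s3: recognized "mai"
    if (!(std::cin >> c)) goto end;
    if (c == 'm') goto m;
    if (c == 'n') goto n;
    goto s;
n:  accept = true;                   // state s4: recognized "main"
    while (std::cin >> c) {}         // consume the rest of the input
end:
    std::cout << accept;
    return 0;
}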

A finite automaton is the simplest class of machine for matching strings and deciding whether they are accepted. Lexical analysis, for example, can be viewed as running finite automata over the input to match tokens.
