Regular Expression Archives

Every regular expression describes a regular language

ComputeNow — Sat, 27 Jul 2019 08:51:39 +0000

Every regular expression describes a regular language. Let R be an arbitrary regular expression over the alphabet Σ. We will prove that the language described by R is a regular language. The proof is by induction on the structure of R.

The first base case of the induction: Assume that R = ε. Then R describes the language {ε}. To prove that this language is regular, it suffices, by Theorem 1 below, to construct an NFA that accepts it.

Theorem 1: Let A be a language. Then A is regular if and only if there exists a nondeterministic finite automaton that accepts A.

So, let us construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q = {q}, q is the start state, F = {q}, and δ(q,a) = ∅ for all a ∈ Σ_ε. The figure below gives the state diagram of M:

The second base case: Assume that R = ∅. Then R describes the empty language ∅. To prove that this language is regular, it suffices, by Theorem 1, to construct an NFA that accepts it.

So, let us construct the NFA M = (Q, Σ, δ, q, F) that accepts this language. This NFA is obtained by defining Q = {q}, q is the start state, F = ∅ (that is, there is no accepting state), and δ(q,a) = ∅ for all a ∈ Σ_ε. The figure below gives the state diagram of M:

The third base case: Let a ∈ Σ and assume that R = a. Then R describes the language {a}. To prove that this language is regular, it suffices, by Theorem 1, to construct an NFA that accepts it.

So, let us construct the NFA M = (Q, Σ, δ, q₁, F) that accepts this language. This NFA is obtained by defining Q = {q₁, q₂}, q₁ is the start state, F = {q₂}, and

δ(q₁,a) = {q₂},

δ(q₁,b) = ∅ for all b ∈ Σ_ε \ {a},

δ(q₂,b) = ∅ for all b ∈ Σ_ε.

The figure below gives the state diagram of M:
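The three base-case automata can also be written down concretely. The following is a minimal Python sketch, not part of the original proof: the tuple layout `(delta, start, finals)`, the use of `""` as the ε label, and the helper names `epsilon_closure`, `accepts` and `language` are all assumptions made for illustration.

```python
from itertools import product

def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-moves (label "")."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(nfa, word):
    """Simulate the NFA (delta, start, finals) on `word`."""
    delta, start, finals = nfa
    current = epsilon_closure({start}, delta)
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)

# A missing dictionary entry stands for δ(q, x) = ∅.
NFA_EPSILON = ({}, "q", {"q"})                 # R = ε: one state, accepting
NFA_EMPTY = ({}, "q", set())                   # R = ∅: one state, not accepting
NFA_A = ({("q1", "a"): {"q2"}}, "q1", {"q2"})  # R = a: q1 --a--> q2

def language(nfa, alphabet="ab", max_len=3):
    """Accepted strings over `alphabet` of length at most `max_len`."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if accepts(nfa, w)}

print(language(NFA_EPSILON), language(NFA_EMPTY), language(NFA_A))
```

Enumerating short strings only samples each language up to a length bound, but it is enough to see that the three automata accept {ε}, ∅ and {a} respectively.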

The first case of the induction step: Assume that R = R1 ∪ R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1 ∪ L2, which, by Theorem 2, is regular.

Theorem 2: The set of regular languages is closed under the union operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ, then A1 ∪ A2 is also a regular language.

The second case of the induction step: Assume that R = R1R2, where R1 and R2 are regular expressions. Let L1 and L2 be the languages described by R1 and R2, respectively, and assume that L1 and L2 are regular. Then R describes the language L1L2, which, by Theorem 3, is regular.

Theorem 3: The set of regular languages is closed under the concatenation operation, i.e., if A1 and A2 are regular languages over the same alphabet Σ, then A1A2 is also a regular language.

The third case of the induction step: Assume that R = (R1)*, where R1 is a regular expression. Let L1 be the language described by R1 and assume that L1 is regular. Then R describes the language (L1)*, which, by Theorem 4, is regular.

Theorem 4: The set of regular languages is closed under the star (Kleene) operation, i.e., if A is a regular language, then A* is also a regular language.

This concludes the proof of the claim that every regular expression describes a regular language.
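The structure of the proof can be mirrored by a recursive function that builds an NFA for each kind of regular expression. The sketch below is one possible rendering under stated assumptions: regular expressions are encoded as nested tuples such as `("union", r1, r2)`, ε-moves use the label `""`, and the ε-transition constructions used for union, concatenation and star are the standard textbook ones, which this post does not spell out. All names (`build`, `accepts`, `_fresh`) are hypothetical.

```python
from itertools import count

_fresh = count()  # supplies state names that are never reused

def _merge(d1, d2):
    """Union of two transition tables, copying the target sets."""
    merged = {k: set(v) for k, v in d1.items()}
    for k, v in d2.items():
        merged.setdefault(k, set()).update(v)
    return merged

def _add(delta, q, symbol, r):
    delta.setdefault((q, symbol), set()).add(r)

def build(regex):
    """Return (delta, start, finals) for a regex given as a nested tuple."""
    kind = regex[0]
    start = next(_fresh)
    if kind == "eps":                      # base case R = ε
        return {}, start, {start}
    if kind == "empty":                    # base case R = ∅
        return {}, start, set()
    if kind == "sym":                      # base case R = a
        end = next(_fresh)
        return {(start, regex[1]): {end}}, start, {end}
    if kind == "union":                    # new start with ε-moves to both parts
        d1, s1, f1 = build(regex[1])
        d2, s2, f2 = build(regex[2])
        delta = _merge(d1, d2)
        _add(delta, start, "", s1)
        _add(delta, start, "", s2)
        return delta, start, f1 | f2
    if kind == "concat":                   # ε-moves from finals of part 1 to part 2
        d1, s1, f1 = build(regex[1])
        d2, s2, f2 = build(regex[2])
        delta = _merge(d1, d2)
        for f in f1:
            _add(delta, f, "", s2)
        return delta, s1, f2
    if kind == "star":                     # accepting new start, finals loop back
        d1, s1, f1 = build(regex[1])
        delta = _merge(d1, {})
        _add(delta, start, "", s1)
        for f in f1:
            _add(delta, f, "", s1)
        return delta, start, f1 | {start}
    raise ValueError(f"unknown regex node: {kind!r}")

def accepts(nfa, word):
    """Simulate the NFA on `word`, following ε-moves after every step."""
    delta, start, finals = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ""), set()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = closure(moved)
    return bool(current & finals)

# (a ∪ b)* a : all strings over {a, b} that end in a
ENDS_IN_A = build(("concat",
                   ("star", ("union", ("sym", "a"), ("sym", "b"))),
                   ("sym", "a")))
print([w for w in ["", "a", "b", "ba", "ab", "abba"] if accepts(ENDS_IN_A, w)])
```

Each branch of `build` corresponds to one case of the induction: the three base cases return a fixed automaton, and the three inductive cases combine the automata returned for the subexpressions.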



Regular Language in Automata Theory

ComputeNow — Thu, 20 Sep 2018 17:22:27 +0000

Regular Languages : A language is regular if it can be expressed by a regular expression.

Closure Properties of Regular Languages

Union : If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example, L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1 ∪ L2 = {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} is also regular.
Intersection : If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example,
L1 = {a^m bⁿ | n ≥ 0 and m ≥ 0} and L2 = {a^m bⁿ | n ≥ 0 and m ≥ 0} ∪ {bⁿ a^m | n ≥ 0 and m ≥ 0}
L3 = L1 ∩ L2 = {a^m bⁿ | n ≥ 0 and m ≥ 0} is also regular.
Concatenation : If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular. For example,
L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}
L3 = L1.L2 = {a^m . bⁿ | m ≥ 0 and n ≥ 0} is also regular.
Kleene Closure : If L1 is a regular language, its Kleene closure L1* will also be regular. For example,
L1 = (a ∪ b)
L1* = (a ∪ b)*
Complement : If L(G) is a regular language, its complement L’(G) will also be regular. The complement of a language is found by removing the strings that are in L(G) from the set of all possible strings over the alphabet. For example, over the alphabet {a},
L(G) = {aⁿ | n > 3}
L’(G) = {aⁿ | n ≤ 3}
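The complement example can be made concrete with a small deterministic automaton. The sketch below assumes the alphabet is {a} and uses a hypothetical helper `make_dfa`: it counts the a's read so far, saturating at 4, and the complement is obtained simply by swapping accepting and non-accepting states.

```python
STATES = {0, 1, 2, 3, 4}  # state k means "k a's read so far"; 4 means "more than 3"

def make_dfa(accepting):
    """DFA over the alphabet {a} that counts a's, saturating at 4."""
    def accepts(word):
        state = 0
        for symbol in word:
            if symbol != "a":
                raise ValueError("this sketch only handles the alphabet {a}")
            state = min(state + 1, 4)
        return state in accepting
    return accepts

more_than_three = make_dfa({4})           # {a^n | n > 3}
at_most_three = make_dfa(STATES - {4})    # complement: same DFA, accepting set flipped

for n in range(6):
    print(n, more_than_three("a" * n), at_most_three("a" * n))
```

Because the automaton is deterministic, every input ends in exactly one state, so exactly one of the two automata accepts any given string.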

Note : Two regular expressions are equivalent if the languages they generate are the same. For example, (a+b*)* and (a+b)* generate the same language (here + denotes union): every string generated by (a+b*)* is also generated by (a+b)*, and vice versa.

How to solve problems on regular expressions and regular languages?

Question 1 : Which one of the following languages over the alphabet {0,1} is described by the regular expression below?
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.

Solution : Option (A) requires the substring 00. But 10101 is in the language and does not contain 00 as a substring, so (A) is not correct.
Option (B) allows at most two 0’s, but 00000 is also in the language, so (B) is not correct.
Option (C) requires at least two 0’s. The regular expression forces two 0’s to appear and allows any string of 0’s and 1’s before, between and after them, so (C) is correct.
Option (D) describes every non-empty string, since any non-empty string begins and ends with either 0 or 1. But strings such as 1 or 11 contain fewer than two 0’s and are not in the language, so (D) is not correct.
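An answer like this can be sanity-checked by brute force. The sketch below uses Python's `re` module, where union is written `|` instead of `+`, and compares the expression against the property "contains at least two 0's" on every binary string up to a chosen length. The length bound of 8 and the helper name `all_strings` are arbitrary choices for illustration, and a bounded check is evidence rather than a proof.

```python
import re
from itertools import product

# (0+1)*0(0+1)*0(0+1)* in Python syntax
PATTERN = re.compile(r"(0|1)*0(0|1)*0(0|1)*")

def all_strings(alphabet, max_len):
    """Every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# Strings on which the regex and option (C) disagree
mismatches = [w for w in all_strings("01", 8)
              if (PATTERN.fullmatch(w) is not None) != (w.count("0") >= 2)]
print(mismatches)

# The counterexamples used against options (A), (B) and (D)
print(PATTERN.fullmatch("10101") is not None, "00" in "10101")
print(PATTERN.fullmatch("00000") is not None)
print(PATTERN.fullmatch("11") is not None)
```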

Question 2 : Which of the following languages is generated by the given grammar?
S -> aS | bS | ∊
(A) {aⁿ b^m | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has equal number of a’s and b’s}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿ bⁿ | n ≥ 0}
(D) {a,b}*

Solution : Option (A) says that the string has 0 or more a’s followed by 0 or more b’s. But S => bS => baS => ba shows that ba is also in the language. So (A) is not correct.
Option (B) says that the string has an equal number of a’s and b’s. But S => bS => b shows that b is also in the language. So (B) is not correct.
Option (C) says that the string consists only of a’s, only of b’s, or of a’s followed by the same number of b’s. But as shown for option (A), ba is also in the language. So (C) is not correct.
Option (D) says that the string can have any number of a’s and any number of b’s in any order. So (D) is correct.
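The grammar is small enough to enumerate its derivations directly. In the sketch below (the function name `derive` and the length bound are assumptions for illustration), every sentential form has the shape wS for some terminal string w, so it is enough to track the prefix w and apply each of the three productions.

```python
from itertools import product

def derive(max_len):
    """Strings derivable from S -> aS | bS | ε with at most `max_len` letters."""
    results = set()
    frontier = [""]                       # terminal prefixes w of forms wS
    while frontier:
        prefix = frontier.pop()
        results.add(prefix)               # apply S -> ε
        if len(prefix) < max_len:
            frontier.append(prefix + "a")  # apply S -> aS
            frontier.append(prefix + "b")  # apply S -> bS
    return results

everything = {"".join(t) for n in range(5) for t in product("ab", repeat=n)}
print(derive(4) == everything)
print("ba" in derive(4), "b" in derive(4))
```

Up to the chosen length, the derivable strings coincide with all strings over {a, b}, which matches option (D).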

Question 3 : The regular expression 0*(10*)* denotes the same set as
(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)* 10(0 + 1)*
(D) none of these

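Solution : Two regular expressions are equivalent if the languages generated by them are the same. The expression 0*(10*)* generates every string over {0,1}: a leading block of 0’s, followed by any number of blocks that each consist of a 1 and the 0’s after it.
Option (A) also generates every string over {0,1}: any number of blocks that each end in a 0, followed by trailing 1’s. So they are equivalent, and (A) is correct.
Option (B) cannot generate 11, because every 1 in it must be followed by a 0, but 0*(10*)* can. So they are not equivalent.
Option (C) generates only strings that contain 10 as a substring, but 0*(10*)* also generates strings such as 0 that do not. So they are not equivalent.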
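Equivalence claims of this kind are easy to get wrong by hand, so a bounded brute-force comparison is a useful cross-check. The sketch below uses Python's `re` module (union written as `|`) and compares the sets of matched strings up to length 8. The helper name `language` and the bound are illustrative assumptions, and agreement up to a bound is not a proof of equivalence, although any disagreement is a genuine counterexample.

```python
import re
from itertools import product

def language(pattern, alphabet="01", max_len=8):
    """All strings over `alphabet` up to `max_len` that the pattern fully matches."""
    compiled = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n))
    return {w for w in words if compiled.fullmatch(w)}

target = language(r"0*(10*)*")
options = {
    "A": r"(1*0)*1*",
    "B": r"0|(0|10)*",
    "C": r"(0|1)*10(0|1)*",
}
same = {name: language(pattern) == target for name, pattern in options.items()}
print(same)

# Strings the target generates that each non-equivalent option misses
print(sorted(target - language(options["B"]), key=len)[:3])
print(sorted(target - language(options["C"]), key=len)[:3])
```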


Regular Expressions – (Regex) – Regular Expression

ComputeNow — Sun, 02 Sep 2018 04:35:46 +0000

The term regular expression was originally borrowed from automata theory in theoretical computer science. Broadly, it refers to a pattern against which a substring is matched.

The comic should have already given you an idea of what regular expressions could be useful for. It should not be surprising that many programming languages, text processing tools, data validation tools and search engines make extensive use of them.

The key idea is that a regular expression is a pattern which matches a set of target strings.

\w+@\w+\.(com|org|net|in) is a regex that matches many simple email addresses that end with .com, .net, .org or .in.

Regular Expressions Concepts

There are many forms of regex syntax, and they vary with the language. Here, we will examine Perl regex syntax, since most other regex flavors are variations on it.

Before we dive into the syntax, these are the kinds of things that the patterns consist of:

Literals: They are the simplest things to match. A literal matches itself, for example a or 1.
Meta characters: They do not mean what they look like and usually stand for something else. For example, \d stands for any digit.
Vertical Bar: The | is the symbol for boolean OR. It matches any one of the alternatives it separates.
Quantifiers: They specify how many times the preceding pattern needs to be matched.
Grouping and Capturing: Parentheses can be used to group parts of the regex or to capture parts for later use.

Regular Expression Syntax

Let’s look at what the meta characters do in a little more detail.

Meta character	Description
`^`	Start of a string
`$`	End of a string
`\t`	Tab
`\n`	Newline
`\r`	Carriage Return
`\s`	Any whitespace character
`\S`	Any non-whitespace character
`\d`	Any Digit
`\D`	Any non-digit
`\w`	Any word-character
`\W`	Any non-word character
`\b`	Any word boundary
`\B`	Any non-word-boundary
`.`	Any single character, usually barring a newline

By the way, if you want to match a metacharacter literally, you need to use \ to escape it. For example, \. would just match the . character.
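A few of these meta characters can be tried out directly. The sketch below uses Python's `re` module rather than Perl, on the assumption that the constructs shown behave the same way in both for this purpose; the sample text and variable names are made up for illustration.

```python
import re

text = "Order 66\tshipped on 2018-09-02.\nTotal: 3 items"

digit_runs = re.findall(r"\d+", text)     # \d: runs of digits
fields = re.split(r"\s+", text)           # \s: split on space, tab, newline
literal_dots = re.findall(r"\.", text)    # \. matches only a real dot
any_char_count = len(re.findall(r".", text))  # . matches everything but the newline

starts_with_order = re.search(r"^Order", text) is not None  # ^ anchors at the start
ends_with_items = re.search(r"items$", text) is not None    # $ anchors at the end
whole_word_on = re.search(r"\bon\b", text) is not None      # \b: word boundaries
whole_word_item = re.search(r"\bitem\b", text) is not None  # "items" is a longer word

print(digit_runs)
print(fields)
print(literal_dots, any_char_count, len(text))
print(starts_with_order, ends_with_items, whole_word_on, whole_word_item)
```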

Now, let us look at the constructs that offer more flexibility.

Expression	Meaning
`[abc]`	Matches any of `a`,`b`, or `c`
`[^abc]`	Matches anything other than `a`, `b`, or `c`
`[a-d]`	Matches any of the characters in the range `a-d`
`a*`	Matches `a` zero or more times
`a?`	Matches `a` zero or one time
`a+`	Matches `a` one or more times
`a|b`	Matches either `a` or `b`
`a{3}`	Matches exactly 3 of `a`
`a{3,}`	Matches 3 or more of `a`
`a{3,5}`	Matches 3, 4 or 5 of `a` (inclusive range)
`( )`	Captures everything inside the parentheses
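The entries in this table can be checked one by one. The sketch below again uses Python's `re` module as a stand-in for Perl, assuming the two agree on these basic constructs; the helper `full` and the sample strings are hypothetical.

```python
import re

def full(pattern, s):
    """True if the whole string `s` matches `pattern`."""
    return re.fullmatch(pattern, s) is not None

star_on_empty = full(r"a*", "")            # zero or more: the empty string is fine
plus_on_empty = full(r"a+", "")            # one or more: the empty string is not
optional_u = full(r"colou?r", "color") and full(r"colou?r", "colour")
exactly_three = full(r"a{3}", "aaa") and not full(r"a{3}", "aaaa")
three_to_five = full(r"a{3,5}", "aaaaa") and not full(r"a{3,5}", "aaaaaa")
three_or_more = full(r"a{3,}", "aaaaaa")

in_range = re.findall(r"[a-d]+", "abcdefabc")     # runs of characters a..d
not_abc = re.findall(r"[^abc]", "abcxyz")         # anything except a, b, c
either = re.findall(r"cat|dog", "cat dog bird")   # alternation
captured = re.search(r"(\d+)-(\d+)", "pages 10-20").groups()  # two capture groups

print(star_on_empty, plus_on_empty, optional_u)
print(exactly_three, three_to_five, three_or_more)
print(in_range, not_abc, either, captured)
```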

Example:

We are now ready to explain why \w+@\w+\.(com|org|net|in) does what it claims.

Firstly, what should an email look like? That's right, it should have a structure like user@domain.extension.

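The user and domain each consist of letters, digits or underscores, with at least one character. So, we use \w+ for each, with a literal @ between them.

The \. matches the literal dot before the extension, and we restrict the extension to org, com, net or in by using the | inside a group.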
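To see the pattern at work, here is a minimal sketch using Python's `re` module instead of Perl, on the assumption that this particular pattern behaves the same in both. The function name `looks_like_email` and the sample addresses are made up for illustration, and the whole string is required to match so that trailing text is rejected.

```python
import re

EMAIL = re.compile(r"\w+@\w+\.(com|org|net|in)")

def looks_like_email(candidate):
    """True if the entire string has the shape user@domain.extension."""
    return EMAIL.fullmatch(candidate) is not None

samples = [
    "alice@example.com",       # plain user and domain
    "bob_1@site.in",           # digits and underscores are word characters
    "alice@example.io",        # extension not in the allowed list
    "alice@example.info",      # "in" followed by extra characters
    "first.last@example.com",  # the dot in the user part is not matched by \w
    "alice@@example.com",      # only a single @ is allowed
]
for s in samples:
    print(s, looks_like_email(s))
```

The address with a dot in the user part is a valid one in practice, which shows why the pattern only covers simple addresses rather than all of them.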