ToC: Lesson 2 Accepting Languages.

Journal Nyarly's Journal: ToC: Lesson 2 Accepting Languages.

Journal by Nyarly on Thursday February 21, 2002 @06:06PM

Okay, so, you've read lesson 1, or you somehow understand how strings and languages are defined in this context. And there was something of a cliffhanger last lesson about accepting strings and languages.

Strictly speaking, a language is really the set of strings accepted by some means. But you could also define a means to accept a specific language of strings. Since usually the means are defined by the language it accepts, it seems to make more sense to me to define them in that order, but it really comes down to a toss-up.

"By some means" isn't very descriptive, and its the filling in of that blank that I'll spend most of this entry. I'm going to start with the idea of regular expressions, for two reasons. First, it's much easier for me to explain and illustrate a regular expression in a text based format than a state machine (the other common option), and I'm sure there's some who will be pleasantly surprised to see where regex (as they're usually called) come from.

Quick aside (for the regex literate, everyone else, skip on): this may not be exactly what you expect of regex (or is the plural regexes? regexi?) Specifically, these regex are anchored, and they aren't intended for replacement, so don't worry about /1 - /9 and whatnot.

Okay, a regex can accept a language of exactly one string by listing that string. So if the regex is "cow", then it accepts the language {cow} exactly. Simple, no?

Too simple. What if I want to accept {cow, chickens}? The first operation of regex is called alternation. Put two strings together with a vertical bar between them, like so: "cow|chickens" and the resulting regex accepts either element. In this case {cow,chickens}. Quick note: {cow,chickens,cowchickens} isn't accepted. Alternation is exclusive.

It seems like this would get really tedious really quickly. So far, are we really any better off than listing the accepted strings? Next few regex operands have to do with repetition.

Follow a single symbol with a '?', and it becomes optional; it can occur zero or once.
example: a?b accepts {ab, b}.

Follow a single symbol with a '*', and it can occur any number of times, including none.
example: a*b accepts {b, ab, aab, aaab, ...} (the language is infinite.)

Follow a single symbol with something like {n,m} means "this symbol can occur at least n times, and at most m times." {n} means "this symbol can occur exactly n times". {0,inf} (where by "inf" I mean the infinity symbol) is equivalent to *, and {0,1} is equivalent to ?.
example:a{3,4}b accepts {aaab, aaaab}.

The operand . stands for any symbol. We don't care what it is, just accept it.
example:.* accepts the language that consists of every possible string such that no string is a prefix of any other.

Last operand of importance: ( and ) are used for grouping. Surround anything with ( ) and it can be treated as a single symbol for purposes of the above operands.
example: (a|b)*(cow|chicken)?end. This accepts any string of a and b, possibly a "cow" or "chicken", so long as it ends with the three symbols e n d.

That being said, regex usually gets used to define languages, in the form "the language accepted by (foo|bar){2,17}a*b."

Quick quiz: how many strings are in the language accepted by a*?

What we really want to accept languages with are State Machines.

This discussion has been archived. No new comments can be posted.