World's most popular travel blog for travel bloggers.

What is the Relationship Between Programming Languages, Regular Expressions and Formal Languages

, , No Comments
Problem Detail: 

I've looked around the net for an answer to this question and it seems as if everybody implicitly knows the answer except me. Presumably this is because the only people who care are those who have had tertiary education on the subject. I, on the other hand, have been thrown in the deep end for a high school assignment.

My question is, how exactly are programming languages related to formal languages? Everywhere I read, something along the lines of "formal languages are used for defining the grammar of programming languages" is said.

Now from what I've been able to gather, a formal language is a series of production rules that apply to a specific set of symbols (the language's alphabet). These production rules define a set of transformations, such as:

b -> a

aaa->c

This can be applied such that:

abab->aaaa aaaa-> ca

Just as a side note, if we define that our formal language's alphabet as {a,b,c}, then a and b are non terminals and c is terminal as it can not be transformed (please correct me if I'm wrong about that).

So given all that, how on earth does this apply to programming languages? Often it is also stated that regex is used to parse a language in it's text form to ensure the grammar is correct. This makes sense. Then it is stated that regex are defined by formal languages. Regex return true or false (in my experience at least) depending on if the finite state automata that represents the regex reaches the goal point. As far as I can see, that has nothing to do with transformations*.

For the compiling of the program itself, I suppose a formal language would be able to transform code into consecutively lower level code, eventually reaching assembly via a complex set of rules, which the hardware could then understand.

So that's things from my confused point of view. There's probably a lot of things fundamentally wrong with what I have said, and that is why I'm asking for help.


*Unless you consider something like (a|b)*b*c->true to be a production rule, in which case the rule requires a finite state automata (ie: regex). This makes no sense as we just said that

Asked By : Zwander

Answered By : Yuval Filmus

Whoever told you that regular expressions are used to parse code was spreading disinformation. Classically (I don't know to what extent this is true in modern compilers), the parsing of code – conversion of code from text to a syntax tree – is composed of two stages:

  1. Lexical analysis: Processes the raw text into chunks such as keywords, numerical constants, strings, identifiers and so on. This is classically implemented using some sort of finite state machine, similar in spirit to a deterministic finite automaton (DFA).

  2. Parser: Run after lexical analysis, and converts the raw text into an annotated syntax tree. The grammar of programming languages is (to first approximation) context-free (actually, one needs an even stricter subset), and this allows certain efficient algorithms to parse the lexified code into a syntax tree. This is similar to the problem of recognizing whether a given string belongs to some context-free grammar, the difference being that we also want the proof in the form of a syntax tree.

Grammars for programming languages are written as context-free grammars, and this representation is used by parser generators to construct fast parsers for them. A simple example would have some non-terminal STATEMENT and then rules of the form STATEMENT$\to$IF-STATEMENT, where IF-STATEMENT$\to$if CONDITION then BLOCK endif, or the like (where BLOCK$\to$STATEMENT|BLOCK;STATEMENT, for example). Usually these grammars are specified in Backus-Naur form (BNF).

The actual specifications of programming languages are not context-free. For example, a variable cannot appear if it hadn't been declared in many languages, and languages with strict typing might not allow you to assign an integer to a string variable. The parser's job is only to convert the raw code into a form which is easier to process.

I should mention that there are other approaches such as recursive descent parsing which doesn't actually generate a parse tree, but processes your code as it parses it. Although it doesn't bother to generate the tree, in all other respects it operates at the same level as described above.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/30639

0 comments:

Post a Comment

Let us know your responses and feedback