[Solved]: Lexing and parsing a language with juxtaposition as an operator

Problem Detail: 

Normal human math notation treats juxtaposition as implied multiplication, e.g., $2x$ means $2$ multiplied by $x$. This does not seem to be a common feature of computer languages, although it was, for example, supposed to be included in the language Fortress (now a dead project).

Lexing and parsing such a language seems to me like it would be difficult within the usual lexer-parser framework. In a computer language, unlike human math notation, we want to have variable names that are more than one letter. So if we need to lex the string xyz, we have a problem, because this could, e.g., mean x*yz if x and yz are variables, but it could also mean xy*z if xy and z are variables. The lexer normally wouldn't have this type of information available, so it would have no way of resolving the ambiguity.
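To make the ambiguity concrete, here is a minimal Python sketch that lists every way a run of letters could be split into known variable names. The function name `segmentations` and the sets of names are hypothetical, chosen only for illustration; the sketch shows the problem, not a solution.

```python
def segmentations(text, names):
    """Return every way to split `text` into a sequence of known names."""
    if not text:
        return [[]]  # one way to split the empty string: no tokens
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in names:
            # Keep this prefix as one token and split the remainder.
            for rest in segmentations(text[end:], names):
                results.append([head] + rest)
    return results


# With all four names declared, the string has two readings.
print(segmentations("xyz", {"x", "yz", "xy", "z"}))
# [['x', 'yz'], ['xy', 'z']]

# With only x and yz declared, the reading is unique.
print(segmentations("xyz", {"x", "yz"}))
# [['x', 'yz']]
```

The split depends on which names are declared, which is exactly the information a conventional lexer does not have.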

Are there ways of reducing this problem to a more standard form that can be handled by a traditional lexer-parser pair, or does it require qualitatively different techniques?

Asked By : Ben Crowell

Answered By : rici

There is no issue parsing a language which uses juxtaposition as an operator provided that it is possible to identify lexical tokens. For example, awk uses juxtaposition for string concatenation, and the following expressions are completely valid:

last_name", "first_name
last_name ", " first_name
last_name comma space first_name
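As a rough illustration (in Python, not awk), once the tokens can be identified, evaluating a run of juxtaposed operands as concatenation is straightforward. The function `concat_eval` and the variable values are hypothetical, and the sketch handles only string literals and variable names; any other character is silently skipped by `re.findall`.

```python
import re

# A token is either a double-quoted string literal or an identifier.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*')


def concat_eval(src, env):
    """Evaluate `operand operand ...` as string concatenation."""
    parts = []
    for tok in TOKEN.findall(src):
        if tok.startswith('"'):
            parts.append(tok[1:-1])   # string literal: strip the quotes
        else:
            parts.append(env[tok])    # variable: look up its value
    return "".join(parts)


env = {"last_name": "Lovelace", "first_name": "Ada", "comma": ",", "space": " "}
for expr in ('last_name", "first_name',
             'last_name ", " first_name',
             'last_name comma space first_name'):
    print(concat_eval(expr, env))     # each line prints: Lovelace, Ada
```

The quotation marks delimit the string literal in the first expression, so no whitespace is needed there; in the third expression only the whitespace separates the four names.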

However, two juxtaposed variables must be separated by whitespace to satisfy the lexical constraint. Whitespace is also used to resolve the ambiguity:

last_name(", ") 

which might be a function call or a concatenation. Since awk does not require function definitions to precede use of the function, it cannot disambiguate based on the type of last_name; instead, it specifies that a function call cannot have whitespace between the function's name and the (, so that the above is a function call, while:

last_name (", ") 

is concatenation.
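A whitespace rule of this kind can be enforced entirely in the lexer. The following Python sketch assumes a toy token set (the token names and the `tokenize` function are illustrative, not awk's actual implementation): an identifier is classified as a function name only when a ( follows it immediately.

```python
import re

TOKEN_RE = re.compile(r'''
    (?P<FUNC_NAME>[A-Za-z_]\w*(?=\())   # identifier immediately followed by "("
  | (?P<NAME>[A-Za-z_]\w*)              # any other identifier
  | (?P<STRING>"[^"]*")
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<WS>\s+)
''', re.VERBOSE)


def tokenize(src):
    """Split `src` into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize('last_name(", ")')[0])    # ('FUNC_NAME', 'last_name')
print(tokenize('last_name (", ")')[0])   # ('NAME', 'last_name')
```

Because the two spellings produce different token kinds, the parser never sees an ambiguity: FUNC_NAME followed by ( starts a call, while NAME followed by a parenthesized expression is an ordinary juxtaposition.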

Fortress's syntax has (or had) a number of other issues which are harder to solve with a simple yacc/lex parser; if I recall correctly, the proof-of-concept implementation used a packrat parser. For example, the syntax allows the use of vertical bars either as operators or as brackets, so |a| might be the absolute value of a. Somewhat complicated whitespace rules were used to disambiguate, and there are other situations in which whitespace is significant, affecting operator precedence. In my opinion, the apparent convenience of having a language which resembles mathematics is outweighed by the possibility of an unintended parse changing the semantics of the program without any warning or error. (This sort of issue also plagues some more conventional languages.)

Another class of languages in which juxtaposition is common consists of unit-aware languages (or calculators), such as GNU Units or Frink. As far as I know neither of these requires any special parsing technology, although there is a general issue with the solidus (/) which can require careful juggling of operator precedence. It's clear what:

2 cm/sec 

means. But what about the following:

2/3 cm/sec
2 cm/3 sec
2 g/cm sec^2

(It's relevant that SI insists on parentheses and prohibits multiple solidus symbols, so that the last value must be written as 2 g/(cm sec^2); neither 2 g/cm sec^2 nor 2 g/cm/sec^2 is correct.)
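To see how much depends on the precedence chosen, here is a small recursive-descent sketch in Python that parses the same input under two hypothetical grammars: one where juxtaposition binds tighter than the solidus, and one where both have the same precedence and associate to the left. Neither is claimed to be the grammar GNU Units or Frink actually uses; the `parse` function and its flag are assumptions made for illustration, and error handling is minimal.

```python
import re


def parse(src, juxt_binds_tighter):
    """Return a fully parenthesized string showing how `src` groups."""
    tokens = re.findall(r"\d+|[A-Za-z]+|[/^()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def starts_operand(tok):
        return tok is not None and (tok == "(" or tok[0].isalnum())

    def atom():
        tok = advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            node = tok
        if peek() == "^":              # exponent binds tightest
            advance()
            node = f"({node}^{advance()})"
        return node

    def product():
        node = atom()
        while starts_operand(peek()):  # a run of juxtaposed atoms
            node = f"({node} {atom()})"
        return node

    def expr():
        if juxt_binds_tighter:
            node = product()
            while peek() == "/":
                advance()
                node = f"({node}/{product()})"
            return node
        # Same precedence for "/" and juxtaposition, left-associative.
        node = atom()
        while peek() == "/" or starts_operand(peek()):
            if peek() == "/":
                advance()
                node = f"({node}/{atom()})"
            else:
                node = f"({node} {atom()})"
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("2 g/cm sec^2", True))    # ((2 g)/(cm (sec^2)))
print(parse("2 g/cm sec^2", False))   # (((2 g)/cm) (sec^2))
```

Both grammars are easy to implement; they simply disagree about what the user meant, with sec^2 landing in the denominator in one reading and in the numerator in the other.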

Again, this is not so much a parsing issue as a user interface design problem.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/28408
