Given a language $L$, define the length set of $L$ as the set of lengths of words in $L$: $$\mathrm{LS}(L) = \{|u| \mid u \in L \}$$
Which sets of integers can be the length set of a regular language?
Asked By : Gilles
Answered By : Gilles
First, an observation which is not crucial but convenient: the set $\mathscr{S}$ of sets of integers that are $LS(L)$ for some regular language $L$ on a non-empty alphabet $\mathscr{A}$ does not depend on the choice of alphabet. To see that, consider a finite automaton that recognizes $L$; the lengths of the words that are in $L$ are the lengths of the paths on the automaton seen as an unlabeled graph from the start state to any accept state. In particular, you can relabel every arrow to $a$ and get a regular language with the same length set over the alphabet $\{a\}$. Conversely, if $L$ is a regular language over a one-element alphabet, it can be trivially injected into a larger alphabet, and the result is still a regular language.
Therefore we are looking for the possible length sets for words over a singleton alphabet. On a singleton alphabet, the language is the length set written out in unary: $\mathrm{LS}(L) = \{n\in\mathbb{N} \mid a^n \in L\}$.
Let $L$ be a regular language, and consider a deterministic finite automaton (DFA) that recognizes $L$. The set of lengths of words of $L$ is the set of lengths of paths in the DFA seen as a directed graph that start on the start state and end in one of the accept states. A DFA on a one-element alphabet is pretty tame (NFAs would be wilder): it's either a finite list or a circular list. If the list is finite, number the states from $0$ to $h$ following the list order; if it's circular, number the states from $0$ to $h$ following the head of the list, and $h$ to $h+r$ along the loop.
Let $F$ be the set of indices of accept states up to $h$, and $G$ be the set of indices of accept states from $h$ to $h+r$. Then
$$\mathrm{LS}(L) = F \cup \{ k \, r + x \mid x \in G, k\in\mathbb{N} \}$$
Conversely, let $h$ and $r$ be two integers and $F$ and $G$ be two finite sets of integers such that $\forall x \in F, x \le h$ and $\forall x \in G, h \le x \le h+r$. Then the set $L_{F,G,r} = \{ a^{k\,r+x} \mid x\in G, k\in\mathbb{N} \}$ is a regular language: it is the language recognized by the DFA described above. A regular expression that describes this language is $a^F \mid a^{G} (a^r)^*$.
To summarize in English, the length sets of regular languages are the sets of integers that are periodic¹ above a certain value.
¹ To hang on to a well-established notion, periodic means the characteristic function of the set (which is a function $\mathbb{N}\to\{\mathtt{false},\mathtt{true}\}$ which we lift to a function $\mathbb{Z}\to\{\mathtt{false},\mathtt{true}\}$) is periodic. Periodic above a certain value means that the function restricted to $[h,+\infty[$ can be prolonged into a periodic function.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/164
0 comments:
Post a Comment
Let us know your responses and feedback