World's most popular travel blog for travel bloggers.

# [Solved]: How similar are two DFAs? -not just binary equivalence-

, ,
Problem Detail:

Are there any measures to compute similarity (or distance) between two DFAs?

If yes, which are the main references?

I need a measure of similarity, not only a (binary) equivalence test.

"Similarity" is an intuitive concept. There are different ways to formalize it; i'm looking for ready-to-use formalizations with regard to DFAs (not NFAs, weighted automata, or general graphs!). Furthermore, good solutions may exploit recognized languages.

I found some studies about this problem, which help to clarify the question. They involve two approaches: one from model-testing field aiming at compare languages, a second defined by authors aiming at compare structure of DFAs.

• Bogdanov, Kirill, and Neil Walkinshaw. "Computing the structural difference between state-based models." Reverse Engineering, 2009. WCRE'09. 16th Working Conference on. IEEE, 2009.

• Walkinshaw, Neil, Kirill Bogdanov, and Ken Johnson. "Evaluation and comparison of inferred regular grammars." Grammatical Inference: Algorithms and Applications. Springer Berlin Heidelberg, 2008. 252-265.

• Walkinshaw, Neil, and Kirill Bogdanov. "Automated comparison of state-based software models in terms of their language and structure." ACM Transactions on Software Engineering and Methodology (TOSEM) 22.2 (2013): 13.

They compute exactly what I was looking for, a metrics among DFAs, adopting the techniques developed in the model-testing field.

Does there exist any other approaches, ready-to-use, to compute similarity between DFAs? (Both in terms of language both of structure).

This problem seems to ask for co-development of a reasonable distance function and an algorithm to compute it for a given pair of languages.

The paper "The Cost of Traveling between Languages" by Michael Benedikt, Gabriele Puppis, and Cristian Riveros defines one such notion, where we are searching for the largest value $d$ such that there exists a word $w$ in the language of the first automaton such that the edit distance of $w$ to any word in the second language is at least $|w| \cdot d$. The paper gives an algorithm to solve this problem, which involves distance automata.

The paper also has an interesting related work section, with a reference to the paper "Edit-distance of weighted automata: general definitions and algorithms" by Mohri, which seems to solve the same problem without the multiplication with the word length.

Both of these definitions are reasonable from a theoretical point of view - you now only have to evaluate whether they make sense for your particular application.

Both of the paper have author-archived versions available without pay-wall on the web which you should easily find with your favorite search engine.