
[Solved]: Correct term for “string consisting of words”

Problem Detail: 

In a paper I am writing, I want to distinguish between (1) a string consisting of arbitrary characters and (2) a string consisting of a chain of words from a known language, with possible delimiters. My intuitive idea is simply to use string for meaning (1) and text for meaning (2). It sounds a bit naive, but this terminology could work, provided that I define it properly in my paper.

Yet I have an uneasy feeling that meaning (2) has a fancier name in computer science or computational linguistics. So, what are the precise terms for distinguishing between the two types of strings?

UPDATE

Suppose we have an alphabet Σ = {a, b, c, ~}, where ~ is a delimiter symbol, and a language L = {aaa, bbb, abc}.

Now, the following strings satisfy definition (1), but not (2):

  • cba
  • a
  • aaaa
  • a~b~~

And the following strings satisfy both definitions (because they are made only of words of the language L and delimiters):

  • aaabbbabc
  • abc
  • aaa~bbb~aaa~~~aaa
  • ~
  • (an empty string)
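
The two lists above can be checked mechanically. The following is only a sketch, under the assumption that definition (2) means "any concatenation of words from L and delimiter symbols", which is how the examples read. The function name is_word_string and its parameters are made up for illustration and do not come from any library.

```python
def is_word_string(s, words, delimiters):
    """Return True if s is a concatenation of items taken from
    `words` and `delimiters` (assumed reading of definition (2))."""
    tokens = set(words) | set(delimiters)
    # reachable[i] is True when the prefix s[:i] splits into tokens.
    reachable = [True] + [False] * len(s)
    for i in range(len(s)):
        if not reachable[i]:
            continue
        for tok in tokens:
            # Skip empty tokens; they would never advance the position.
            if tok and s.startswith(tok, i):
                reachable[i + len(tok)] = True
    return reachable[len(s)]


L = {"aaa", "bbb", "abc"}
DELIMS = {"~"}

# Strings that satisfy definition (1) only.
for s in ["cba", "a", "aaaa", "a~b~~"]:
    assert not is_word_string(s, L, DELIMS)

# Strings that satisfy both definitions.
for s in ["aaabbbabc", "abc", "aaa~bbb~aaa~~~aaa", "~", ""]:
    assert is_word_string(s, L, DELIMS)
```

The prefix table is used instead of a greedy left-to-right scan so that the check also works for a language where one word is a prefix of another.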

In some applications my strings could be actual text in a human language like English, Lithuanian or Esperanto. But this is not required. It could also be a DNA chain, a binary file, or anything else. Also keep in mind that in practical applications the strings would most likely be long (like a journal article, or an entire corpus for that matter), so calling it a "sentence" would be a bit of an understatement. The meaning of the text is entirely irrelevant here.

So, regarding definition (1), all is clear: I just call it a string over the alphabet Σ. Now the core question is this: what do I call the strings from the second example to make them distinct from those in the first example? My initial idea is to call it a "text". One of the answers proposed "word string", which I like even better. Maybe you have seen other terms being used for this purpose in the literature?

It might seem that I'm in extreme hair-splitting mode here. Yet that term will be all over my PhD thesis, very likely including the title. Therefore I really want to get my terminology straight.

Asked By : Vilius Normantas

Answered By : babou

If definition (1) is intended for any sequence of characters, I would simply call it a string, as you suggest, but I would call it a word or lexeme if it is intended to be a word of a language.

Regarding definition (2), it depends again on what you are expecting to consider. If it is any sequence of words, usually meaningless, with a variety of separators, the name text would do fine, and I would not worry too much about computational linguistics, since the only meaningless piece of text that matters in CL is "Colorless green ideas sleep furiously".

If it is actually intended to be a sentence of a language, then you might call it a sentence. I feel that text would rather be used for larger pieces of discourse. You should be careful, though, that it is not confused with sentence meaning a string of words. Speech processing people may speak of an utterance, but that may be inappropriate for your use. They also use sentence, which they structure into a word lattice when the separators are not clearly identified, and which amounts to a word sequence or word string when they are clearly identified.

This distinction may also depend on whether you have one separator or many, and on whether the separators play a role.

In other words, it is hard to give you a precise answer without more details on what you are doing. I first tried, and then realized that it led me to make unwarranted assumptions about what you are doing.

The one thing that is really important is that you are very clear about your definitions. And if you can motivate your terminology choices, that may help the reader. I am still wondering why the borogoves had to be so mimsy.


Question Source : http://cs.stackexchange.com/questions/27984
