I understand the theory behind Bayesian networks, and am wondering what it takes to build one in practice. Let's say for this example, that I have a Bayesian (directed) network of 100 discrete random variables; each variable can take one of up to 10 values.
Do I store all the nodes in a DAG, and for each node store its Conditional Probability Table (CPT)? Are there other data structures I should make use of to ensure efficient computation of values when some CPTs change (apart from those used by a DAG)?
Asked By : ash
Answered By : lot
The "best" data structure probably depends on what particular problem you're trying to solve with it. Here's one approach that I've seen (and have used myself), which simply stores all the information and leaves it up to the algorithm what to do with it.
First you index the nodes by unique integers, 0 through n-1. Then you simply store, for each node, the list of its parents as an array of integers --- in C++, for instance, you could have a
std::vector<std::vector<int> >
: first vector over nodes, second vector lists the respective parents). That captures the entire DAG structure.Furthermore, since each node has exactly one conditional probability table associated with it, you could index those with the same integer IDs. For each probability table you need to store its scope, i.e. the set of random variables its defined over. Secondly you'd probably have one large list of floating point numbers that contains the actual conditional probabilities (and you'll want to make sure you get the indexing right). To give a C++ example again, something like this could do:
struct CondProbTable { std::vector<int> scope; // list of random variables the CPT is defined over std::vector<double> table; // appropriately sized and indexed table of // conditional probabilities };
With that, you could use a
std::vector<CondProbTable>
to store all your CPTs.
Again, this basically only stores the Bayes net, it does not assume anything about what you want to do with it. Including the CPT scope in CondProbTable is somewhat redundant, since it could be inferred from the list of parent nodes described under point 1.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/580
0 comments:
Post a Comment
Let us know your responses and feedback