Sergiy Illichevskyy
Postgraduate Student, Taras Shevchenko National University of Kyiv, Ukraine
Modeling of an Insurance Company by Bayesian Networks
Bayesian networks (BNs), also known as belief networks (or Bayes nets for short), belong to the family of probabilistic graphical models (GMs). These graphical structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies are often estimated using known statistical and computational methods. Hence, BNs combine principles from graph theory, probability theory, computer science, and statistics. GMs with undirected edges are generally called Markov random fields or Markov networks; they provide a simple definition of independence between any two distinct nodes based on the concept of a Markov blanket, and are popular in fields such as statistical physics and computer vision. BNs correspond to another GM structure, the directed acyclic graph (DAG), which is popular in the statistics, machine learning, and artificial intelligence communities. BNs are both mathematically rigorous and intuitively understandable. They enable an effective representation and computation of the joint probability distribution (JPD) over a set of random variables.
The structure of a DAG is defined by two sets: the set of nodes (vertices) and the set of directed edges. The nodes represent random variables and are drawn as circles labeled by the variable names. The edges represent direct dependence among the variables and are drawn as arrows between nodes. In particular, an edge from node Xi to node Xj represents a statistical dependence between the corresponding variables. Thus, the arrow indicates that a value taken by variable Xj depends on the value taken by variable Xi, or, roughly speaking, that variable Xi “influences” Xj. Node Xi is then referred to as a parent of Xj and, similarly, Xj is referred to as a child of Xi. An extension of these genealogical terms is often used to define the “descendants” of a node (the set of nodes that can be reached on a directed path from it) and its “ancestors” (the set of nodes from which it can be reached on a directed path). The acyclicity of the graph guarantees that no node can be its own ancestor or its own descendant. Such a condition is of vital importance to the factorization of the joint probability of a collection of nodes, as seen below.
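For completeness, the factorization alluded to here is the standard chain-rule decomposition of the JPD over the DAG, where Pa(Xi) denotes the set of parents of node Xi:

P(X1, …, Xn) = ∏i P(Xi | Pa(Xi)).

Each factor is exactly the local conditional distribution stored at node Xi.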
Note that although the arrows represent direct causal connections between the variables, the reasoning process can operate on BNs by propagating information in any direction. A BN reflects a simple conditional independence statement, namely, that each variable is independent of its non-descendants in the graph given the state of its parents. This property is used to reduce, sometimes significantly, the number of parameters required to characterize the JPD of the variables; for example, a general JPD over n binary variables requires 2^n − 1 independent parameters, whereas a BN in which each node has at most k parents requires at most n·2^k. This reduction provides an efficient way to compute the posterior probabilities given the evidence. In addition to the DAG structure, which is often considered the “qualitative” part of the model, one needs to specify the “quantitative” parameters of the model. The parameters are described in a manner consistent with a Markovian property, where the conditional probability distribution (CPD) at each node depends only on its parents.
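In symbols, this Markovian property (the local Markov condition) reads

P(Xi | ND(Xi), Pa(Xi)) = P(Xi | Pa(Xi)),

where ND(Xi) denotes the set of non-descendants of Xi; combined with the chain rule of probability, it yields the factorization displayed above.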
For discrete random variables, this conditional probability is often represented by a table, listing the local probability that a child node takes on each of its feasible values, for each combination of values of its parents. The joint distribution of a collection of variables is determined uniquely by these local conditional probability tables (CPTs). Following the above discussion, a more formal definition of a BN can be given. A Bayesian network B is an annotated acyclic graph that represents a JPD over a set of random variables. The network is defined by a pair B = (G, Θ), where G is the DAG whose nodes represent the random variables and whose edges represent their direct dependencies; the graph encodes independence assumptions, by which each variable Xi is independent of its non-descendants given its parents in G. The second component, Θ, denotes the set of parameters of the network, that is, the entries of the local CPTs. For simplicity of representation we omit the subscript B henceforth. If Xi has no parents, its local probability distribution is said to be unconditional; otherwise it is conditional. If the variable represented by a node is observed, then the node is said to be an evidence node; otherwise the node is said to be hidden or latent.
Consider the following example that illustrates some of the characteristics of BNs. The example shown in Figure 1 has a structure similar to the classical “earthquake” example in Pearl. It considers a person who might suffer from a back injury, an event represented by the variable Back (denoted by B). Such an injury can cause a backache, an event represented by the variable Ache (denoted by A). The back injury might result from a wrong sport activity, represented by the variable Sport (denoted by S), or from new uncomfortable chairs installed at the person’s office, represented by the variable Chair (denoted by C). In the latter case, it is reasonable to assume that a co-worker will suffer and report a similar backache syndrome, an event represented by the variable Worker (denoted by W). All variables are binary; thus, they are either true (denoted by “T”) or false (denoted by “F”).
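Because the paper does not list the CPT values for Figure 1, the following Python sketch assigns illustrative numbers (all probabilities below are invented for demonstration). It encodes the DAG S → B ← C, B → A, C → W, builds the JPD from the chain-rule factorization, and computes a posterior by brute-force enumeration:

```python
from itertools import product

# Hypothetical CPT entries for the back-injury example; the numbers are
# illustrative, not taken from the paper. Variables are True (T) / False (F).
p_s = 0.2                                        # P(S = T)
p_c = 0.3                                        # P(C = T)
p_b = {(True, True): 0.9, (True, False): 0.7,    # P(B = T | S, C)
       (False, True): 0.5, (False, False): 0.1}
p_a = {True: 0.8, False: 0.05}                   # P(A = T | B)
p_w = {True: 0.6, False: 0.05}                   # P(W = T | C)

def bern(p_true, value):
    """P(X = value) for a binary X with P(X = T) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(s, c, b, a, w):
    """One JPD entry via the chain-rule factorization over the DAG."""
    return (bern(p_s, s) * bern(p_c, c) * bern(p_b[(s, c)], b)
            * bern(p_a[b], a) * bern(p_w[c], w))

# Posterior P(B = T | A = T, W = T), enumerating the hidden variables S, C.
num = sum(joint(s, c, True, True, True)
          for s, c in product((True, False), repeat=2))
den = sum(joint(s, c, b, True, True)
          for s, c, b in product((True, False), repeat=3))
print("P(B=T | A=T, W=T) =", num / den)
```

Enumeration is exponential in the number of variables and is shown here only to make the factorization concrete; practical BN tools use more efficient inference schemes such as variable elimination or junction-tree propagation.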
In the simplest case of known structure and full observability, the parameters can be estimated directly from the observed frequencies; smoothing these frequencies with Dirichlet prior counts results in a maximum a posteriori estimate and is also known as the equivalent sample size (ESS) method. In general, the other learning cases are computationally intractable. In the second case, with known structure and partial observability, one can use the EM (expectation-maximization) algorithm to find a locally optimal maximum-likelihood estimate of the parameters [4]. Markov chain Monte Carlo (MCMC) is an alternative approach that has been used to estimate the parameters.
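To make the EM iteration concrete for the partial-observability case, here is a minimal sketch on a toy network Z → (X1, X2), where the binary variable Z is hidden and its two binary children are observed. The network, its parameter values, and the simulated data set are all assumptions made for this illustration, not part of the paper:

```python
import random

random.seed(0)

# Toy network: hidden binary Z with observed binary children X1, X2.
# True parameters are used only to simulate data; the learner never sees them.
TRUE_PI, TRUE_THETA = 0.4, {1: (0.9, 0.8), 0: (0.2, 0.1)}

def simulate(n):
    data = []
    for _ in range(n):
        z = 1 if random.random() < TRUE_PI else 0
        t1, t2 = TRUE_THETA[z]
        data.append((int(random.random() < t1), int(random.random() < t2)))
    return data  # only X1, X2 are recorded; Z stays hidden

def bern(p_true, x):
    return p_true if x else 1.0 - p_true

def em(data, iters=50):
    # Arbitrary start for pi = P(Z=1) and theta[z] = (P(X1=1|z), P(X2=1|z)).
    pi, theta = 0.5, {1: (0.6, 0.6), 0: (0.4, 0.4)}
    for _ in range(iters):
        # E-step: responsibility P(Z=1 | x1, x2) for each record.
        resp = []
        for x1, x2 in data:
            l1 = pi * bern(theta[1][0], x1) * bern(theta[1][1], x2)
            l0 = (1 - pi) * bern(theta[0][0], x1) * bern(theta[0][1], x2)
            resp.append(l1 / (l1 + l0))
        # M-step: re-estimate parameters from expected counts.
        n1 = sum(resp)
        n0 = len(data) - n1
        pi = n1 / len(data)
        theta[1] = (sum(r * x1 for r, (x1, _) in zip(resp, data)) / n1,
                    sum(r * x2 for r, (_, x2) in zip(resp, data)) / n1)
        theta[0] = (sum((1 - r) * x1 for r, (x1, _) in zip(resp, data)) / n0,
                    sum((1 - r) * x2 for r, (_, x2) in zip(resp, data)) / n0)
    return pi, theta

print(em(simulate(5000)))
```

As the text notes, EM guarantees only a locally optimal maximum-likelihood estimate; on this toy model it may also converge to a label-swapped but equivalent solution, which is the usual caveat for hidden-variable learning.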
References
1. Aksoy, S. (2006). Parametric Models: Bayesian Belief Networks, Lecture Notes, Department of Computer Engineering, Bilkent University, available at http://www.cs.bilkent.edu.tr/~saksoy/courses/cs551/slides/cs551_parametric4.pdf.
2. Boutilier, C., Friedman, N., Goldszmidt, M. & Koller, D. (1996). Context-specific independence in Bayesian networks, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, August 1–4, 1996, pp. 115–123.
3. Friedman, N. & Goldszmidt, M. (1996). Learning Bayesian networks with local structure, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, August 1–4, 1996.