Sergiy Illichevskyy
Postgraduate student at
the Taras Shevchenko National University of Kyiv, Ukraine
Bayesian Networks as a Tool for Modeling Insurance Companies
Bayesian networks (BNs) have become extremely popular
models over the last decade. They have been used for applications in various
areas, such as machine learning, text mining, natural language processing,
speech recognition, signal processing, bioinformatics, error-control codes,
medical diagnosis, weather forecasting, and cellular networks. The name BNs
might be misleading. Although the use of Bayesian statistics in conjunction
with BN provides an efficient approach for avoiding data over-fitting, the use
of BN models does not necessarily imply a commitment to Bayesian statistics. In
fact, practitioners often follow frequentist methods to estimate the
parameters of the BN.
On the other hand, in a general form of the graph, the
nodes can represent not only random variables but also hypotheses, beliefs, and
latent variables. Such a structure is intuitively appealing and convenient for
the representation of both causal and probabilistic semantics. This structure
is ideal for combining prior knowledge, which often comes in causal form, and
observed data. BNs can be used, even in the case of missing data, to learn
causal relationships, to gain an understanding of the various problem domains,
and to predict future events.
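As a minimal illustration of these semantics, the following Python sketch encodes a small hypothetical three-node DAG (the node names and probabilities are invented for illustration) as explicit conditional probability tables and computes a posterior probability by brute-force enumeration over the joint distribution given by the chain rule of the DAG.

# A minimal discrete Bayesian network with two root nodes and one child node,
# each described by an explicit conditional probability table (CPT).
# Joint distribution: P(HighRisk, Fraud, Claim) = P(HighRisk) P(Fraud) P(Claim | HighRisk, Fraud).
# All names and numbers below are hypothetical.

p_high_risk = {True: 0.3, False: 0.7}
p_fraud = {True: 0.05, False: 0.95}
p_claim = {  # keyed by the parent realization (high_risk, fraud)
    (True, True): {True: 0.90, False: 0.10},
    (True, False): {True: 0.40, False: 0.60},
    (False, True): {True: 0.70, False: 0.30},
    (False, False): {True: 0.10, False: 0.90},
}

def joint(high_risk, fraud, claim):
    """Probability of one full assignment, via the chain rule of the DAG."""
    return (p_high_risk[high_risk] * p_fraud[fraud]
            * p_claim[(high_risk, fraud)][claim])

def posterior_fraud_given_claim():
    """P(Fraud = True | Claim = True) by enumeration over the remaining variable."""
    numerator = sum(joint(hr, True, True) for hr in (True, False))
    evidence = sum(joint(hr, f, True) for hr in (True, False) for f in (True, False))
    return numerator / evidence

print(posterior_fraud_given_claim())

The same enumeration generalizes to any discrete BN, although exact inference becomes intractable for large networks.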
In the first and simplest learning case, with a known structure and fully
observed data, the parameters of the BN can be estimated directly from the
training dataset. To compensate for zero occurrences of some configurations in
the training dataset, one can use appropriate (mixtures of) conjugate prior
distributions, such as a Dirichlet prior in the multinomial case or the
corresponding conjugate prior in the Gaussian case. Such an approach results in
a maximum a posteriori estimate and is also known as the equivalent sample size
(ESS) method. The other learning cases, which involve partial observability, an
unknown structure, or both, are in general computationally intractable. In the
second case, with known structure and partial observability, one can use the EM
(expectation maximization) algorithm to find a locally optimal
maximum-likelihood estimate of the parameters; MCMC is an alternative approach
that has been used to estimate the parameters of the BN model. In the third
case, with unknown structure and full observability, the goal is to learn the
DAG that best explains the data. This problem is NP-hard, and the number of
DAGs on N variables grows superexponentially in N, so exhaustive search is
infeasible. One approach is to proceed with the simplest assumption that the
variables are conditionally independent given a class, represented by a single
common parent node of all the variable nodes.
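As a concrete, purely illustrative sketch of the ESS idea for a single multinomial distribution (such as one row of a CPT), the pseudocount estimate (N_k + ESS/K)/(N + ESS), with the equivalent sample size split uniformly over the K outcomes, never assigns zero probability to an unobserved outcome; the counts and the ESS value below are hypothetical.

def ess_estimate(counts, ess):
    """Pseudocount-smoothed estimate (N_k + ess/K) / (N + ess) for each outcome k."""
    k = len(counts)
    total = sum(counts)
    return [(n_k + ess / k) / (total + ess) for n_k in counts]

# Hypothetical nucleotide counts at one position (A, C, G, T); 'T' was never observed.
counts = [12, 5, 3, 0]

print(ess_estimate(counts, ess=1.0))        # every outcome receives positive probability
print([n / sum(counts) for n in counts])    # the plain MLE assigns zero probability to 'T'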
This common-parent structure corresponds to the naive BN,
which, perhaps surprisingly, is found to provide reasonably good results in some
practical problems. To compute the Bayesian score in the fourth case with
partial observability and unknown graph structure, one has to marginalize out
the hidden nodes as well as the parameters. Since this is usually intractable,
it is common to use an asymptotic approximation to the posterior called the
Bayesian information criterion (BIC), also known as the minimum description
length (MDL) approach. In this case one considers the trade-off between the
likelihood term and a penalty term associated with the model complexity. An
alternative is to perform local search steps inside the M step of the EM
algorithm, a procedure known as structural EM, which presumably converges to a
local maximum of the BIC score.
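This trade-off can be made explicit in a short sketch that scores a candidate structure as the fitted log-likelihood minus the penalty (d/2)·log N, where d is the number of free parameters and N the number of samples; the figures below are hypothetical.

import math

def bic_score(log_likelihood, num_free_params, num_samples):
    """BIC = fitted log-likelihood minus the complexity penalty (d/2) * log N."""
    return log_likelihood - 0.5 * num_free_params * math.log(num_samples)

# Hypothetical comparison of two candidate structures fitted to the same data:
# the denser DAG fits slightly better but pays a larger complexity penalty.
print(bic_score(log_likelihood=-1040.0, num_free_params=18, num_samples=500))
print(bic_score(log_likelihood=-1032.0, num_free_params=45, num_samples=500))

Here the sparser structure obtains the higher BIC score despite its lower likelihood.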
It
is well known that classic machine learning methods such as hidden Markov
models (HMMs), neural networks, and Kalman filters can be considered special
cases of BNs. Specific types of BN models have been developed to address
stochastic processes, known as dynamic BNs, and counterfactual information,
known as functional BNs.
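To make the special-case claim concrete, the sketch below evaluates the likelihood of an observation sequence under a two-state HMM with the standard forward recursion; this is ordinary inference on the corresponding chain-structured (dynamic) BN, and all probability tables are invented.

# A two-state HMM is a chain-structured (dynamic) BN; the forward recursion
# below is therefore ordinary BN inference.  All parameters are hypothetical.

initial    = [0.6, 0.4]            # P(state_0)
transition = [[0.7, 0.3],          # P(state_t | state_{t-1})
              [0.2, 0.8]]
emission   = [[0.9, 0.1],          # P(observation | state), observations in {0, 1}
              [0.3, 0.7]]

def forward_likelihood(observations):
    """P(observations), summing over all hidden state paths in O(T * S^2) time."""
    alpha = [initial[s] * emission[s][observations[0]] for s in range(2)]
    for obs in observations[1:]:
        alpha = [sum(alpha[prev] * transition[prev][s] for prev in range(2))
                 * emission[s][obs]
                 for s in range(2)]
    return sum(alpha)

print(forward_likelihood([0, 1, 1, 0]))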
The variable-order Bayesian network (VOBN) model has been introduced as an
extension of the position weight matrix (PWM) model,
the fixed-order Markov model (MM)
including HMMs, the variable order
Markov (VOM) model, and the BN model. The PWM model is presumably the
simplest and the most common context-independent model for DNA sequence
classification. The basic assumption of the PWM model is that the random
variables (e.g., nucleotides at different positions of the sequence) are
statistically independent. Since this model has no memory, it can be regarded as
a fixed-order MM of order 0. In contrast, higher fixed-order models, such as
MMs, HMMs, and interpolated MMs, rely on the statistical dependencies within
the data to indicate repeating motifs in the sequence. VOM models stand in
between the above two types of models with respect to the number of model
parameters. In fact, VOM models do not ignore statistical dependencies between
variables in the sequence, but they take into account only those dependencies that are
statistically significant. In contrast to fixed-order MMs, where the order is
the same for all positions and for all contexts, in VOM models the order may
vary for each position, based on its contexts. Unlike the VOM models, which are
homogeneous and allow statistical dependencies only between adjacent
variables in the sequence, VOBN models are inhomogeneous and allow statistical
dependencies between nonadjacent positions in a manner similar to BN models.
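The contrast between the order-0 assumption of the PWM model and a first-order MM can be illustrated by scoring the same short sequence under both models; the probability tables below are hypothetical placeholders.

import math

# Hypothetical PWM for a length-4 motif: an order-0, position-specific model in
# which the positions are assumed statistically independent.
pwm = [
    {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},
    {'A': 0.1, 'C': 0.6, 'G': 0.2, 'T': 0.1},
    {'A': 0.2, 'C': 0.2, 'G': 0.5, 'T': 0.1},
    {'A': 0.1, 'C': 0.1, 'G': 0.1, 'T': 0.7},
]

# Hypothetical homogeneous first-order MM: each symbol depends on its predecessor.
initial = {'A': 0.4, 'C': 0.2, 'G': 0.2, 'T': 0.2}
transition = {
    'A': {'A': 0.5, 'C': 0.2, 'G': 0.2, 'T': 0.1},
    'C': {'A': 0.2, 'C': 0.3, 'G': 0.4, 'T': 0.1},
    'G': {'A': 0.1, 'C': 0.3, 'G': 0.3, 'T': 0.3},
    'T': {'A': 0.3, 'C': 0.2, 'G': 0.2, 'T': 0.3},
}

def pwm_log_likelihood(seq):
    """Order-0 scoring: every position is treated as independent."""
    return sum(math.log(pwm[i][s]) for i, s in enumerate(seq))

def markov_log_likelihood(seq):
    """Order-1 scoring: each position is conditioned on the preceding symbol."""
    return math.log(initial[seq[0]]) + sum(
        math.log(transition[prev][cur]) for prev, cur in zip(seq, seq[1:]))

print(pwm_log_likelihood("ACGT"), markov_log_likelihood("ACGT"))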
Yet,
as opposed to BN models, where the order of the model at a given node depends
only on the size of the set of its parents, in VOBN models the order also
depends on the context, i.e. on the specific observed realization in each set
of parents. As a result, the number of parameters that need to be estimated in
VOBN models is potentially smaller than in BN models, yielding a smaller chance
for overfitting of the VOBN model to the training dataset. Context-specific BNs
are closely related to, yet constructed differently from, the VOBN models. To
summarize, the VOBN model can be regarded as an extension of the PWM, fixed-order
MM, VOM, and BN models. If statistical dependencies exist only between adjacent positions in
the sequence and the memory length is identical for all contexts, the VOBN
model degenerates to an inhomogeneous fixed-order MM.
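One way to picture the context-dependent order of a VOBN model is as a per-position lookup keyed by the realized values of the (possibly nonadjacent) parent positions, with a fall-back to a shorter context when the full realization has no dedicated table; the sketch below uses invented contexts and probabilities and illustrates only the representation, not the estimation of such a model.

# Context-dependent conditional probabilities in the spirit of a VOBN: the
# effective memory at a position depends on the observed realization of its
# parent positions, not only on the number of parents.  All values are hypothetical.

parents = {3: (0, 2)}   # parents of position 3 are positions 0 and 2 (nonadjacent allowed)

context_tables = {
    3: {
        ('A', 'G'): {'A': 0.10, 'C': 0.20, 'G': 0.60, 'T': 0.10},  # full-order context
        ('C',):     {'A': 0.40, 'C': 0.30, 'G': 0.20, 'T': 0.10},  # shorter context (first parent only)
        ():         {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25},  # order-0 fall-back
    }
}

def conditional_prob(position, symbol, sequence):
    """P(symbol | context), trying the longest stored context first."""
    realization = tuple(sequence[p] for p in parents[position])
    tables = context_tables[position]
    for length in range(len(realization), -1, -1):
        if realization[:length] in tables:
            return tables[realization[:length]][symbol]
    raise KeyError("no context table found")

print(conditional_prob(3, 'G', "ACGT"))   # uses the full ('A', 'G') context
print(conditional_prob(3, 'A', "CCTT"))   # falls back to the shorter ('C',) context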
References
1. Boutilier, C., Friedman, N., Goldszmidt, M. & Koller, D. (1996). Context-specific independence in Bayesian networks, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Portland, August 1–4, 1996, pp. 115–123.
2. Friedman, N., Geiger, D. & Goldszmidt, M. (1997). Bayesian network classifiers, Machine Learning 29, 131–163.
3. Spirtes, P., Glymour, C. & Scheines, R. (1993). Causation, Prediction, and Search, Springer-Verlag, New York.