O.G. Berestneva, D.V. Devjatyh, V.V. Parubets
Tomsk Polytechnic University, Russia
USING NVIDIA CUDA TECHNOLOGY FOR
NEURAL NETWORKS OPTIMIZATION
E-mail: ogb6@yandex.ru
Emerging technologies inevitably seek new fields of application, and the requirements of modern medicine drive its collaboration with nearly every other science. Mathematics and computer science, in particular, offer data analysis methods that are well suited to diagnostic problems. Generally speaking, from the point of view of computer technology, all problems that people solve can be classified into two groups:
1. Tasks with a specific set of conditions that allow a clear, precise answer to be obtained with a particular algorithm.
2. Tasks in which not all the circumstances affecting the answer can be taken into account. Only an approximate set of the most influential input data is used, so the answer is fuzzy.
Tasks from the first group can be solved with classical methods of data analysis. No matter how complex the algorithm is, a fixed set of input parameters still yields a precise answer.
Medicine, however, faces problems of the second group, for example identifying the nature and cause of a disease or assisting doctors in the interpretation of medical images. These needs led to the emergence of Computer Aided Diagnosis (CAD). CAD uses various algorithms, in particular Artificial Neural Networks (ANN), which are commonly used for classification and clustering. A known limitation of neural network algorithms is their high computational cost [3]. The traditional way to address this problem is to organize parallel and distributed computing in hardware.
The distinguishing feature of hardware that supports CUDA (Compute Unified Device Architecture) is a unified architecture that provides an order of magnitude higher memory bandwidth [4].
Starting with the eighth series, NVIDIA graphics accelerators implement the CUDA parallel computing architecture, which provides a specialized programming interface for non-graphics calculations [5].
A CUDA-capable GPU can be regarded as a set of multi-core processors. The basic computational units of the GPU are multiprocessors, each consisting of eight cores, several thousand 32-bit registers, 16 KB of shared memory, and texture and constant caches [6].
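The 16 KB of on-chip memory per multiprocessor bounds how much data a block of threads can keep close to the cores. The following is a rough sizing sketch, not a figure from the paper: it assumes 4-byte values and that the whole 16 KB is available to a single block, which is an idealization. The function name `max_square_tile` is hypothetical.

```python
import math

def max_square_tile(shared_bytes, bytes_per_value=4, tiles=1):
    """Largest n such that `tiles` square n-by-n tiles fit in shared_bytes."""
    # Number of values that each tile may hold at most.
    values_per_tile = shared_bytes // (bytes_per_value * tiles)
    # Side length of the largest square tile with that many values.
    return math.isqrt(values_per_tile)

one_tile = max_square_tile(16 * 1024, tiles=1)
two_tiles = max_square_tile(16 * 1024, tiles=2)
```

Under these assumptions a single tile can be at most 64 x 64 values, and two tiles held at the same time (for example a weight tile and an input tile) at most 45 x 45 each.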
We select a stream-processing kernel to run on the video card and restructure the computation so that data can be processed in blocks of the desired size while keeping the processing elements busy enough to hide the latency of access to the card's global memory. This also speeds up processing, because part of the work is carried out as vector computations on the graphics card.
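One way to read the block-size idea is as a tiled matrix-vector product, the core operation when a layer of neurons processes its input. The sketch below is a plain Python stand-in for a GPU kernel, written only to show the access pattern; it is not the authors' kernel, and the name `tiled_matvec` and the `tile` parameter are hypothetical.

```python
def tiled_matvec(weights, x, tile=4):
    """Multiply a row-major matrix by a vector, one tile of columns at a time."""
    rows = len(weights)
    cols = len(x)
    y = [0.0] * rows
    # Each outer iteration stands in for loading one tile of x into fast memory.
    for start in range(0, cols, tile):
        x_tile = x[start:start + tile]
        # On a GPU, each row would be handled by its own thread in parallel.
        for r in range(rows):
            w_tile = weights[r][start:start + tile]
            # Accumulate the partial dot product for this tile.
            y[r] += sum(w * v for w, v in zip(w_tile, x_tile))
    return y

result = tiled_matvec([[1, 2, 3], [4, 5, 6]], [1, 0, -1], tile=2)
```

Processing the input in tiles means each loaded value is reused by every row before the next tile is fetched, which is the property that lets a real kernel overlap arithmetic with global memory access.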
Thus, one learning step of the network consists of computing the following:
1. the signal fed to the hidden layer;
2. the strength of the output signal;
3. the elements of the weight matrix at the new step of network tuning.
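The three steps above can be sketched for a perceptron with one hidden layer. The formulas are not given in the text, so everything specific here is an assumption: sigmoid activations, a squared-error loss, plain gradient descent, and no bias terms. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hidden, w_out, x):
    # Step 1: the input signal is fed to the hidden layer.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Step 2: the output signal of the network.
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output

def train_step(w_hidden, w_out, x, target, rate=0.5):
    hidden, output = forward(w_hidden, w_out, x)
    # Output deltas for squared error with sigmoid units (assumed loss).
    d_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    # Hidden deltas, propagated back through the current output weights.
    d_hid = [h * (1.0 - h) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, h in enumerate(hidden)]
    # Step 3: elements of the weight matrices at the new step.
    new_w_out = [[w - rate * d_out[k] * hidden[j] for j, w in enumerate(row)]
                 for k, row in enumerate(w_out)]
    new_w_hidden = [[w - rate * d_hid[j] * x[i] for i, w in enumerate(row)]
                    for j, row in enumerate(w_hidden)]
    return new_w_hidden, new_w_out

def squared_error(w_hidden, w_out, x, target):
    _, output = forward(w_hidden, w_out, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))
```

Each list comprehension is an independent per-element computation, which is why this kind of update maps naturally onto one GPU thread per neuron or per weight.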
Treating the diagnosis problem as a classification problem, we created two artificial neural networks.
The first neural network was created using the NeuroPro package and was a multilayer perceptron with the following parameters:
The second neural network had the same characteristics but was implemented with optimized learning algorithms and with the network running in multithreaded form using CUDA.
References
1. Galushkin A.I. On a methodology for solving problems in the neural network logical basis // Neurocomputer. 2006. No. 2. pp. 49-70.
2. Myznikov A.V., Rossiev D.A., Lokhman V.F. A neural network expert system for optimizing the treatment of thromboangiitis obliterans and predicting its immediate outcomes // Angiology and Vascular Surgery. 1995. No. 2. p. 100.
3. Gorban A.N., Rossiev D.A. Neural Networks on a Personal Computer. Novosibirsk: Nauka, 1996. 276 p.
4. Harris M. Mapping Computational Concepts to GPUs // GPU Gems 2. Addison Wesley, 2006. pp. 493-508.
5. Hall J.D., Carr N.A., Hart J.C. Cache and Bandwidth Aware Matrix Multiplication on the GPU. Technical Report UIUCDCS-R-2003-2328. URL: http://graphics.cs.uiuc.edu/~jch/papers/UIUCDCS-R-2003-2328.pdf
6. Horn D. Stream Reduction Operations for GPGPU Applications // GPU Gems 2. Addison Wesley, 2006. pp. 573-589.
7. Haykin S. Neural Networks: A Complete Course, 2nd ed., rev.: transl. from English. Moscow: I.D. Williams LLC, 2006. 1104 p.
8. Lazarev I.A., Rusakov V.I., Tkacheva T.N. Immunopathological changes in patients with gastric and duodenal ulcer disease and their immunocorrection // 4th All-Union Congress of Gastroenterologists: Abstracts. Moscow, Leningrad, 1990. Vol. 1. pp. 702-703.
9. Gorban A.N., Dunin-Barkovsky V.L., Kardin A.N. Neuroinformatics. Ed. Novikov E.A. Russian Academy of Sciences, Siberian Branch, Institute of Computational Modeling. Novosibirsk: Nauka, 1998.
10. NVIDIA CUDA Programming Guide Version 2.3.1 // NVIDIA - World Leader in Visual Computing Technologies. 2011. URL: http://www.nvidia.com/object/cuda_develop.html