Ñîâðåìåííûå èíôîðìàöèîííûå òåõíîëîãèè/Âû÷èñëèòåëüíàÿ òåõíèêà è ïðîãðàììèðîâàíèå

Êðàâ÷óê Â.Ï.,Ìÿñ³ùåâ Î.À.

Õìåëüíèöüêèé íàö³îíàëüíèé óí³âåðñèòåò

Äîñë³äæåííÿ åôåêòèâíîñò³ êîìï³ëÿòîð³â òà á³áë³îòåê Intel ó ïàðàëåëüíèõ îá÷èñëåííÿõ â ãàëóç³ ë³í³éíî¿ àëãåáðè.

 

         Ç ðîçâèòêîì íàóêè ç'ÿâëÿºòüñÿ âñå á³ëüøå çàäà÷, ùî ïîòðåáóþòü ³íòåíñèâíèõ îá÷èñëåíü äëÿ ¿õ âèð³øåííÿ. Îäí³ºþ ç íàéá³ëüø øèðîêî çóñòð³÷àþ÷èõñÿ çàäà÷ ë³í³éíî¿ àëãåáðè º çàäà÷à âèð³øåííÿ ñèñòåìè ë³í³éíèõ ð³âíÿíü, ÿêèìè ìîæíà îïèñàòè äåÿê³ ïðîöåñè. Öÿ çàäà÷à ìຠê³ëüêà ï³äõîä³â äî âèð³øåííÿ, àëå ò³ ç ï³äõîä³â, ÿê³ ìîæóòü áóòè àâòîìàòèçîâàí³, º äîñèòü ñõîæèìè, à àëãîðèòìè ¿õ â³äîì³. Çâè÷àéíî, îäíèì ç îñíîâíèõ ïèòàíü, ùî ïîñòຠïðè äîñë³äæåíí³ ð³çíèõ øëÿõ³â âèð³øåííÿ çàäà÷³ º ïèòàííÿ åôåêòèâíîñò³, òîáòî øâèäêîñò³ îá÷èñëåíü.

         Øâèäê³ñòü âèð³øåííÿ çàäà÷ çàëåæèòü â³ä íàáîðó ôàêòîð³â, ÿê³ ìîæíà ðîçä³ëèòè íà äâ³ ãðóïè: åôåêòèâí³ñòü òåõí³÷íèõ çàñîá³â òà åôåêòèâí³ñòü ïðîãðàìíîãî çàáåçïå÷åííÿ. Ö³ äâà ôàêòîðè ò³ñíî ïîâ'ÿçàí³ îäíå ç îäíèì, àäæå íàëåæíå âèêîðèñòàííÿ òåõí³÷íèõ çàñîá³â çàáåçïå÷óºòüñÿ ñàìå ïðîãðàìíîþ ÷àñòèíîþ.  òîé æå ÷àñ, â êîìïåòåíö³¿ ðîçðîáíèêà º ëèøå ïðîãðàìí³ ³íñòðóìåíòè.

         Îñíîâí³ àñïåêòè ïðîãðàìíîãî çàáåçïå÷åííÿ, ùî âïëèâàþòü íà øâèäê³ñòü îá÷èñëåíü:

1.Êîìï³ëÿòîð, ùî âèêîðèñòîâóºòüñÿ.

2.Âèêîðèñòàííÿ ñïåö³àëüíèõ á³áë³îòåê, ÿê³ñòü âèõ³äíîãî êîäó ïðîãðàìè, âèêîðèñòàííÿ ñïåö³àëüíèõ á³áë³îòåê.

3.Çàñòîñóâàííÿ ïàðàëåë³çìó.

         Ñàìå äîñë³äæåííþ ìîæëèâîñò³ ïðèøâèäøåííÿ îá÷èñëåíü ñèñòåì ë³í³éíèõ ð³âíÿíü çà äîïîìîãîþ íàéîïòèìàëüí³øîãî âèêîðèñòàííÿ ïðîãðàìíèõ çàñîá³â ³ ïðèñâÿ÷åíà äàíà ðîáîòà.

 

Âèá³ð êîìï³ëÿòîðà.

         ßê â³äîìî, êîìï³ëÿòîð ïðèçíà÷åíèé, ãðóáî êàæó÷è, äëÿ ïåðåòâîðåííÿ ïðîãðàìè, íàïèñàíî¿ íà äåÿê³é ìîâ³ ïðîãðàìóâàííÿ âèñîêîãî ð³âíÿ ó ¿¿ åêâ³âàëåíò â ìàøèííîìó êîä³, ÿêèé ìîæå áóòè âèêîíàíèé áåçïîñåðåäíüî öåíòðàëüíèì ïðîöåñîðîì.

         ³äïîâ³äíî, ïåðåòâîðèòè âèõ³äíèé êîä ó âèêîíóâàíèé ìîæíà ð³çíèìè øëÿõàìè  ³ ðåçóëüòàò ç òî÷êè çîðó øâèäêî䳿 òàêîæ áóäå ð³çíèì. Âèðîáíèêè êîìï³ëÿòîð³â íàìàãàþòüñÿ âò³ëþâàòè íîâ³ ðîçðîáêè, ùî äîçâîëÿþòü îïòèì³çóâàòè âèêîíàííÿ, çàñòîñîâóþòü òåõíîëî㳿, ùî çäàòí³ ïðèøâèäøèòè âèêîíàííÿ ïðîãðàìè çà ðàõóíîê âèêîðèñòàííÿ ñïåö³àëüíèõ ³íñòðóêö³¿ ïðîöåñîð³â, áàãàòîïîòî÷íîñò³, òîùî.

         Îïåðàö³éí³ ñèñòåìè ñ³ìåéñòâà Linux äîñèòü øèðîêî âèêîðèñòîâóþòüñÿ äëÿ âèêîíàííÿ îá÷èñëåíü ð³çíèìè îðãàí³çàö³ÿìè, çàâäÿêè â³äêðèòîñò³ êîäó, áåçêîøòîâíîñò³ òà íàÿâíèõ ìîæëèâîñòåé äëÿ ñòâîðåííÿ âèñîêîïðîäóêòèâíèõ îá÷èñëþâàëüíèõ êëàñòåð³â.

          ÿêîñò³ ìîâè ïðîãðàìóâàííÿ, ùî âèêîðèñòîâóºòüñÿ äëÿ íàïèñàííÿ ñïåöèô³÷íèõ ïðîãðàì, ùî âèêîíóþòü îá÷èñëåííÿ â ãàëóç³ ë³í³éíî¿ àëãåáðè, íàáóëè øèðîêîãî âæèòêó Fortran òà C++. Íå äèâëÿ÷èñü íà òå, ùî ïåðøèé êîìï³ëÿòîð Fortran áóâ ðîçðîáëåíèé ó 1957 ðîö³, öÿ ìîâà ïðîäîâæóº ñëóæèòè ³íæåíåðíèì òà ìàòåìàòè÷íèì çàäà÷àì á³ëüø í³æ ï’ÿòäåñÿò ðîê³â ïîòîìó. C++ º íàáàãàòî íîâ³øîþ òà á³ëüø øèðîêî âèêîðèñòîâóâàíîþ ìîâîþ, ùî ïðîäîâæóº àêòèâíî ðîçâèâàòèñü.

         Ñåðåä êîìï³ëÿòîð³â Fortran äëÿ Linux º äâà îñíîâíèõ – GNU Fortran (gfortran) òà Intel Fortran. Îñíîâíèìè êîìï³ëÿòîðàìè C++ òàêîæ º êîìï³ëÿòîð GNU ùî âõîäèòü äî íàáîðó GCC òà Intel C++ Compiler.

         Êîìï³ëÿòîðè GNU º áåçêîøòîâíèìè òà ðîçïîâñþäæóºòüñÿ ï³ä ë³öåí糺þ GNU General Public License.

         Êîìï³ëÿòîðè Intel º êîìåðö³éíèìè ïðîäóêòàìè, ïðîòå º áåçêîøòîâíà âåðñ³ÿ êîìï³ëÿòîð³â ï³ä Linux äëÿ íåêîìåðö³éíîãî âèêîðèñòàííÿ. Êîìï³ëÿòîðè Intel º ö³êàâèìè ç ò³º¿ òî÷êè çîðó, ùî Intel îäíî÷àñíî º âèðîáíèêîì ïðîöåñîð³â òà íàìàãàºòüñÿ ï³äâèùèòè øâèäêîä³þ ñêîìï³ëüîâàíèõ ïðîãðàì íà ïðîöåñîðàõ ñâîãî âèðîáíèöòâà çà ðàõóíîê âèêîðèñòàííÿ ñïåö³àëüíèõ ³íñòðóêö³é.

 

Âèêîðèñòàííÿ ñïåö³àëüíèõ á³áë³îòåê.

         Íà äàíèé ìîìåíò â ïðîãðàìóâàíí³ â ö³ëîìó ïðîäîâæóº ðîçâèâàòèñü òåíäåíö³ÿ äî âèêîðèñòàííÿ á³áë³îòåê òà ôðåéìâîðê³â, ÿê³ ì³ñòÿòü íàáîðè êîäó äëÿ âèð³øåííÿ íàéá³ëüø ðîçïîâñþäæåíèõ çàäà÷. Öåé ï³äõ³ä â ö³ëîìó º âèïðàâäàíèì, òàê ÿê äîçâîëÿº çàîùàäèòè ÷àñ, âèòðà÷åíèé íà ðîçðîáêó ïðîãðàìè ³ ïðè öüîìó ÷àñòî äຠïðèð³ñò ó øâèäêî䳿 çà ðàõóíîê òîãî, ùî ôóíêö³¿ â á³áë³îòåêàõ ïîñò³éíî âäîñêîíàëþþòüñÿ, à íàä íèìè ïðàöþþòü êîìàíäè äîñâ³ä÷åíèõ ðîçðîáíèê³â. Òîìó, êîëè çàäà÷ó ìîæíà âèð³øèòè çà äîïîìîãîþ ôóíêö³é ç ãîòîâèõ á³áë³îòåê, òî öå ÷àñòî º á³ëüø äîö³ëüíèì àí³æ âëàñíîðó÷íå ñòâîðåííÿ ïðîãðàìè «ç íóëÿ».

         Äëÿ êîìï³ëÿòîð³â GNU, îñíîâíîþ á³áë³îòåêîþ, ùî ðåàë³çóº îïåðàö³¿ ë³í³éíî¿ àëãåáðè º á³áë³îòåêà LAPACK ³ ¿¿ âåðñ³ÿ äëÿ ñèñòåì ç ðîçïîä³ëåíîþ ïàì'ÿòòþ ScaLAPACK. Òàêîæ ³ñíóº á³áë³îòåêà PARDISO, ùî ïðèçíà÷åíà êîíêðåòíî äëÿ âèð³øåííÿ ñèñòåì ë³í³éíèõ ð³âíÿíü ³ äຠíàéêðàù³ ðåçóëüòàòè ïðè âèð³øåíí³ ñèñòåì, â ÿêèõ ìàòðèöÿ êîåô³ö³ºíò³â ì³ñòèòü äîñòàòíüî íóë³â. Êîä á³áë³îòåêè PARDISO º çàêðèòèì, ïðîòå º ìîæëèâ³ñòü áåçêîøòîâíî îòðèìàòè ë³öåíç³þ äëÿ àêàäåì³÷íîãî êîðèñòóâàííÿ.

         Äëÿ êîìï³ëÿòîð³â Intel ³ñíóº âëàñíà ðîçðîáêà — ³íòåãðîâàíà á³áë³îòåêà  MKL. Ïðè öüîìó, á³áë³îòåêà äîñòóïíà äëÿ íåêîìåðö³éíîãî âèêîðèñòàííÿ íà òàêèõ æå óìîâàõ, ùî ³ êîìï³ëÿòîð. Äî ñêëàäó MKL âõîäÿòü BLAS, LAPACK, ScaLAPACK, PARDISO òà äîäàòêîâ³ çàñîáè äëÿ âèêîíàííÿ ìàòåìàòè÷íèõ îïåðàö³é, ïðè ÷îìó âèùåçãàäàí³ á³áë³îòåêè âæå ìîäèô³êîâàí³ äëÿ íàäàííÿ îïòèìàëüíî¿ øâèäêî䳿, áàãàòî ç ôóíêö³é ó íèõ âæå âèêîðèñòîâóþòü ïàðàëåë³çì òà îïòèì³çîâàí³ äëÿ âèêîðèñòàííÿ ç ïðîöåñîðàìè Intel. MKL ìຠâáóäîâàíó ï³äòðèìêó OpenMP ³ âäàëî ³íòåãðóºòüñÿ ç òàêèìè IDE ÿê Eclipse, Microsoft Visual Studio, XCode.

 

Çàñòîñóâàííÿ ïàðàëåë³çìó.

         Áåçïåðå÷íî, íàéêðàù³ ïîêàçíèêè øâèäêî䳿 ìîæóòü áóòè äîñÿãíóò³ ëèøå ïðè ïîâíîìó âèêîðèñòàíí³ íàÿâíèõ â ñèñòåì³ ðåñóðñ³â, à ñàìå ÿäåð ïðîöåñîðà êîëè éäå ìîâà ïðî îäíó îá÷èñëþâàëüíó ìàøèíó ç áàãàòîÿäåðíèì ïðîöåñîðîì.

         Á³áë³îòåêà MKL àâòîìàòè÷íî íàìàãàºòüñÿ çàñòîñîâóâàòè ïàðàëåë³çì ³ ñàìîñò³éíî çàâàíòàæóâàòè íàÿâí³ â ñèñòåì³ ïðîöåñîðè òà ¿õ ÿäðà. Ó ðàç³ æ â³äìîâè â³ä âèêîðèñòàííÿ ãîòîâèõ á³áë³îòåê ïàðàëåë³çì ìîæíà äîñòèãíóòè çà äîïîìîãîþ MPI òà OpenMP.

 

Ðîçâ'ÿçóâàííÿ ñèñòåìè ë³í³éíèõ ð³âíÿíü ç âèêîðèñòàííÿì êîìï³ëÿòîðà Intel C++ òà á³áë³îòåêè PARDISO.

         Ðîçãëÿíåìî âàð³àíò ðîçâ'ÿçàííÿ ñèñòåìè ë³í³éíèõ àëãåáðà¿÷íèõ ð³âíÿíü çà äîïîìîãîþ á³áë³îòåêè PARDISO òà êîìï³ëÿòîðà Intel C++.

         Ó á³áë³îòåö³ PARDISO ³ñíóº ôóíêö³ÿ pardiso ùî ³ ïðîâîäèòü âèð³øåííÿ ñèñòåìè ð³âíÿíü, à òàêîæ êðîêè, ùî äîïîìàãàþòü îïòèì³çóâàòè ïðîöåñ âèð³øåííÿ øëÿõîì ïîïåðåäíüîãî âïîðÿäêóâàííÿ ìàòðèöü.

         Çàãàëüíèé ñèíòàêñèñ ôóíêö³¿ ìຠíàñòóïíå ïðåäñòàâëåííÿ:

         pardiso (pt, maxfct, mnum, mtype, phase, n, a, ia, ja, perm, nrhs, iparm, msglvl, b, x, error).

         Âàæëèâîþ äåòàëëþ º òå, ùî çíà÷åííÿ iparm[12] çà çàìîâ÷óâàííÿì ð³âíå 1 äëÿ íåñèìåòðè÷íèõ ìàòðèöü, àëå ÿê ñòàëî â³äîìî â ðåçóëüòàò³ äîñë³äæåíü, öåé ïàðàìåòð íàäçâè÷àéíî íåãàòèâíî âïëèâàâ íà øâèäê³ñòü îá÷èñëåíü çà ðàõóíîê ³íòåíñèâíîãî âèêîðèñòàííÿ ïàì'ÿò³. Çì³íà öüîãî ïàðàìåòðó íà 0 äîïîìãëà çá³ëüøèòè øâèäê³ñòü îá÷èñëåíü ïðèáëèçíî â 10 ðàç.

         Ðîçãëÿíåìî ïðèêëàä ðîçâ'ÿçàííÿ ñèñòåìè ë³í³éíèõ ð³âíÿíü:

         x1 – x2 – 3x4 = 1

         -2x1 + 5x2 = 1

         4x3 + 6x4 + 4x5 = 1

         -4x1 + 2x3 + 7x4 = 1

         8x2 – 5x5 = 1

        

         Ïðè òàêèõ óìîâàõ íåîáõ³äíî çàäàòè íàñòóïí³ âõ³äí³ çì³íí³:

         int n = 5;

         int ia[ 6] = { 1, 4, 6, 9, 12, 14 };

         int ja[13] = { 1, 2, 4, 1, 2, 3, 4, 5, 1, 3, 4, 2, 5 };

         double a[18] = { 1.0, -1.0, -3.0,         -2.0, 5.0, 4.0, 6.0, 4.0, -4.0, 2.0, 7.0,8.0, -5.0};

         int mtype = 11;

         nrhs = 1;

         b[5] = {1, 1, 1, 1, 1};

         iparm[0] = 1; /* No solver default */

         iparm[1] = 2; /* Fill-in reordering from METIS */

         /* Numbers of processors, value of OMP_NUM_THREADS */

         iparm[2] = 2;

         iparm[3] = 0; /* No iterative-direct algorithm */

         iparm[4] = 0; /* No user fill-in reducing permutation */

         iparm[5] = 0; /* Write solution into x */

         iparm[6] = 0; /* Not in use */

         iparm[7] = 2; /* Max numbers of iterative refinement steps */

         iparm[8] = 0; /* Not in use */

         iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */

         iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */

         iparm[11] = 0; /* Not in use */

         iparm[12] = 0; /* Maximum weighted matching algorithm is switched-on (default for non-symmetric) */

         iparm[13] = 0; /* Output: Number of perturbed pivots */

         iparm[14] = 0; /* Not in use */

         iparm[15] = 0; /* Not in use */

         iparm[16] = 0; /* Not in use */

         iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */

         iparm[18] = -1; /* Output: Mflops for LU factorization */

         iparm[19] = 0; /* Output: Numbers of CG Iterations */

         maxfct = 1; /* Maximum number of numerical factorizations. */

         mnum = 1; /* Which factorization to use. */

         msglvl = 1; /* Print statistical information in file */

         error = 0; /* Initialize error flag */

 

         Âèêîíóºìî ïåðåñòàíîâêó, ôàêòîðèçàö³þ, òà îá÷èñëåííÿ:

         phase = 13;

         PARDISO (pt, &maxfct, &mnum, &mtype, &phase,

                   &n, a, ia, ja, &idum, &nrhs,

                   iparm, &msglvl, b, x, &error);

 

          ê³íö³ î÷èùóºìî ïàì'ÿòü:

         phase = -1; /* Release internal memory. */

         PARDISO (pt, &maxfct, &mnum, &mtype, &phase,

                   &n, &ddum, ia, ja, &idum, &nrhs,

                   iparm, &msglvl, &ddum, &ddum, &error);

 

         Â ðåçóëüòàò³ âèêîíàííÿ îòðèìàíî:

         x [0] = -0.522321

         x [1] = -0.008929

         x [2] =  1.220982

         x [3] = -0.504464

         x [4] = -0.214286

 

Ðîçâ'ÿçóâàííÿ ñèñòåìè ë³í³éíèõ ð³âíÿíü ç âèêîðèñòàííÿì êîìï³ëÿòîðà Intel C++ òà á³áë³îòåêè LAPACK.

         Äî ñêëàäó á³áë³îòåêè LAPACK âõîäèòü íàá³ð ôóíêö³é

         ?gesv( int* n, int* nrhs, double* a, int* lda, int* ipiv, double* b, int* ldb, int* info );

          çàëåæíîñò³ â³ä òèïó äàíèõ, íàä ÿêèìè ïðîâîäÿòüñÿ îá÷èñëåííÿ, ï³äñòàâëÿºòüñÿ ïåðøà ë³òåðà â íàçâ³ ôóíêö³¿. Ó íàøîìó âèïàäêó äëÿ ÷èñåë ïîäâ³éíî¿ òî÷íîñò³ ôóíêö³ÿ íàçèâàºòüñÿ dgesv.

         Âàðòî çàóâàæèòè, ùî áåç ñïåö³àë³çîâàíèõ êëþ÷³â êîìï³ëÿòîðà, ïðîãðàìà íå ìîãëà ïðîâîäèòè îá÷èñëåííÿ ñèñòåì ð³âíÿíü ç á³ëüøå í³æ 1000 íåâ³äîìèìè òà ïîâåðòàëà ïîìèëêó segmentation fault. Äëÿ òîãî, ùîá çàïîá³ãòè ö³é ïîìèëö³ íåîáõ³äíî âèêîíàòè êîìàíäó ulimit -s unlimited ïåðåä êîìï³ëÿö³ºþ òà êîìï³ëþâàòè ç êëþ÷åì -std=c99.

         Ïðîãðàìà, ùî âèêîíóº îá÷èñëåííÿ çà äîïîìîãîþ á³áë³îòåêè LAPACK  íà áàç³ çãåíåðîâàíî¿ ñèñòåìè â 1000 ð³âíÿíü ïðèâåäåíà íèæ÷å:

 

#include <stdlib.h>                                                                                                                  

#include <stdio.h>                                                                                                                                                                                                                                         

#include <math.h>                                                                                                                                                                                                                                        

/* DGESV prototype */                                                                                                                 

extern void dgesv( int* n, int* nrhs, double* a, int* lda, int* ipiv,                                                                

                double* b, int* ldb, int* info );                                                                                                                                                                                                                                                                                

/* Parameters */                                                                                                                     

#define N 1000                                                                                                                       

#define NRHS 1                                                                                                                       

#define LDA N                                                                                                                         

#define LDB N                                                                                                                                                                                              

/* Main program */                                                                                                                   

int main() {                                                                                                                          

        /* Locals */                                                                                                                 

        int n = N, nrhs = NRHS, lda = LDA, ldb = LDB, info;                                                                           

        /* Local arrays */                                                                                                           

        int i,j,z;                                                                                                                    

        int *ipiv[N];                                                                                                                                                                                      

        double a[N][N], b[LDB*NRHS], o;                                                                                                                                                                                                                      

        z=0;                                                                                                                                                                                                                                      

        for(i=1; i<(n+1); i++) {                                                                                                     

                for(j=1; j<(n+1); j++) {                                                                                              

                        o = (i+j)/0.89;                                                                                              

                        if(i==j-1) o = 10;                                                                                            

                        if(i==j+1) o = 2;                                                                                                                                                          

                        a[i-1][j-1]=o;                                                                                               

                        z++;                                                                                                         

                }                                                                                                                    

        }                            

       for(i=1; i<(n+1); i++) b[i-1] = (i*21 + 101) / 1.3;                                                                                                                                                                                                              

        dgesv( &n, &nrhs, a, &lda, ipiv, b, &ldb, &info );                                                                                                                                                                                                                                            

        /* Check for the exact singularity */                                                                                         

        if( info > 0 ) {                                                                                                             

                printf( "The diagonal element of the triangular factor of A,\n" );                                                   

                printf( "U(%i,%i) is zero, so that A is singular;\n", info, info );                                                  

                printf( "the solution could not be computed.\n" );                                                                   

                exit( 1 );                                                                                                           

        }                                                                                                                                                                                                 

        exit( 0 );                                                                                                                    

}

 

Ðîçâ'ÿçóâàííÿ ñèñòåìè ë³í³éíèõ ð³âíÿíü ç âèêîðèñòàííÿì êîìï³ëÿòîðà Intel C++ òà ïðîãðàìè, íàïèñàíî¿ ç âèêîðèñòàííÿì OpenMP.

         Êîëè íåìຠìîæëèâîñò³ âèêîðèñòîâóâàòè á³áë³îòåêè ñòîðîíí³õ ðîçðîáíèê³â, ìîæíà âëàñíîðó÷ ñòâîðèòè ïðîãðàìó äëÿ âèð³øåííÿ çàäà÷³. Ðîçïàðàëåëþâàííÿ áóëî âèêîíàíå çà äîïîìîãîþ çàñîá³â ðîçïàðàëåëþâàííÿ öèêëó for â OpenMP. Òåêñò ïðîãðàìè ùî çíàõîäèòü êîðåí³ ñèñòåìè 1000 ð³âíÿíü  íàâåäåíî íèæ÷å:

 

#include <stdlib.h>                                                                                                                  

#include <stdio.h>                                                                                                                    

#include <omp.h>                                                                                                                     

#define ABS(x) (x < 0 ? -(x) : (x))                                                                                                   

#define EPS 0.00001                                                                                                                  

#define TRUE  1                                                                                                                       

#define FALSE 0                                                                                                                      

#define N 1000                                                                                                                                                                                                                             

int GSolve(double **a,int n,double *x)                                                                                                

{                                                                                                                                    

   int i,j,k,maxrow;                                                                                                                  

   double tmp;                                                                                                                                                                                                                                                

   for (i=0;i<n;i++) {                                                                                                               

                                                                                                                                      

      /* Find the row with the largest first value */                                                                                

      maxrow = i;                                                                                                                     

      for (j=i+1;j<n;j++) {                                                                                                          

         if (ABS(a[i][j]) > ABS(a[i][maxrow]))                                                                                        

            maxrow = j;                                                                                                              

      }                                                                                                                                                                                                                                                      

      /* Swap the maxrow and ith row */                                                                                               

      for (k=i;k<n+1;k++) {                                                                                                          

         tmp = a[k][i];                                                                                                               

         a[k][i] = a[k][maxrow];                                                                                                     

         a[k][maxrow] = tmp;                                                                                                         

      }                                                                                                                                                                                                                                                                                                                                                                              

      if (ABS(a[i][i]) < EPS)                                                                                                               return(FALSE);                                                                                                                                                                                                                                                

      /* Eliminate the ith element of the jth row */

#pragma omp parallel for                                                                                 

      for (j=i+1;j<n;j++) {                                                                                                                                                                                                                         

         for (k=n;k>=i;k--) {                                                                                                        

            a[k][j] -= a[k][i] * a[i][j] / a[i][i];                                                                                  

         }                      

      }                                                                                                                               

   }                                                                                                                                                                                                                   

   /* Do the back substitution */                                                                                                    

   for (j=n-1;j>=0;j--) {                                                                                                            

      tmp = 0;                                                                                                                       

      for (k=j+1;k<n;k++)                                                                                                             

         tmp += a[k][j] * x[k];                                                                                                      

      x[j] = (a[n][j] - tmp) / a[j][j];                                                                                               

   }                                                                                                                                                                                                                                                    

   return(TRUE);                                                                                                                     

}                                                                                                                                     

                                                                                                                                     

int main(int argc,char **argv)                                                                                                        

{                           

   int i,j,z,n = N;                                                                                                                  

   double x[N], o;                                                                                                                    

   double **a;                                                                                                                       

                                                                                                                                      

   a = (double **)malloc((n+1)*sizeof(double *));                                                                                    

   for (i=0;i<n+1;i++)                                                                                                               

      a[i] = (double *)malloc(n*sizeof(double));                                                                                                                                                                                                                           

    z = 0;                                                                                                                            

                                                                                                                                     

        for(i=1; i<(n+1); i++) {                                                                                                      

                for(j=1; j<(n+1); j++) {                                                                                             

                        o = (i+j)/0.89;                                                                                               

                        if(i==j-1) o = 10;                                                                                           

                        if(i==j+1) o = 2;                                                                                                                                                 

                        a[i-1][j-1]=o;                                                                                               

                        z++;                                                                                                         

                }                                                                                                                    

        }                                                                                                                            

                                                                                                                                      

        for(i=0; i<n; i++) {x[i] = 0; a[n][i] = ((i+1)*21 + 101) / 1.3;}                                                             

   GSolve(a,n,x);                                                                                                                                                                                                                                 

}                        

        

Ïîð³âíÿííÿ øâèäêî䳿.      

         Äëÿ îö³íêè øâèäêî䳿 îá÷èñëåíü ïðîãðàìè, çãàäàí³ âèùå áóëè çàïóùåí³ äëÿ ðîçâ'ÿçàííÿ ñèñòåì ð³âíÿíü ç ê³ëüê³ñòþ íåâ³äîìèõ â³ä 1000 äî 7000.  êîæíîìó âèïàäêó ð³âíÿííÿ áóëè îäíàêîâ³, òàê ÿê ãåíåðóâàëèñü çà îäíèì ³ òèì æå àëãîðèòìîì.

         ×àñ âèêîíàííÿ çàì³ðÿâñÿ çà äîïîìîãîþ âáóäîâàíèõ ôóíêö³é á³áë³îòåêè PARDISO òà ïðîãðàìè time äëÿ Ubuntu 9.10.

         Äëÿ îá÷èñëåíü âèêîðèñòîâóâàëàñü îá÷èñëþâàëüíà ìàøèíà íà áàç³ Intel(R) Core(TM)2 Duo CPU T5550  @ 1.83GHz, 3 GB RAM, Linux Ubuntu 9.10. Ðåçóëüòàòè íàâåäåí³ ó òàáëèö³.

 

PARDISO:

ʳëüê³ñòü ð³âíÿíü

×àñ âèêîíàííÿ íà îäíîìó ÿäð³, ñåê.

×àñ âèêîíàííÿ íà äâîõ ÿäðàõ, ñåê.

Ïðèñêîðåííÿ

1000

0.45

0.37

1.22

2000

2.25

1.7

1.32

3000

6.13

4.42

1.39

4000

12.84

8.83

1.45

5000

23.06

15.3

1.51

6000

37.6

24.89

1.51

7000

56.75

36.7

1.55

 

LAPACK:

ʳëüê³ñòü ð³âíÿíü

×àñ âèêîíàííÿ íà îäíîìó ÿäð³, ñåê.

×àñ âèêîíàííÿ íà äâîõ ÿäðàõ, ñåê.

Ïðèñêîðåííÿ

1000

0.24

0.45

0.53

2000

1.34

0.95

1.41

3000

3.54

2.57

1.38

4000

8.04

5.86

1.37

5000

14.99

10.63

1.41

6000

25.29

17.27

1.46

7000

44.89

27.79

1.62

 

Âëàñíà ïðîãðàìà ç OpenMP:

ʳëüê³ñòü ð³âíÿíü

×àñ âèêîíàííÿ íà îäíîìó ÿäð³, ñåê.

×àñ âèêîíàííÿ íà äâîõ ÿäðàõ, ñåê.

Ïðèñêîðåííÿ

1000

7.66

4.95

1.55

2000

61.71

35.12

1.76

3000

209.55

114.19

1.84

4000

509.1

279.93

1.82

5000

1021.32

608.93

1.68

 

Âèñíîâêè.

1.Âèêîðèñòàííÿ ñïåö³àë³çîâàíèõ á³áë³îòåê º äîö³ëüíèì òà äຠïðèð³ñò øâèäêî䳿 äî 60 ðàç â ïîð³âíÿíí³ ç ïðîñòîþ ïðîãðàìîþ, ùî âèêîðèñòîâóº OpenMP.

2.Á³áë³îòåêà PARDISO äåùî ïîñòóïàºòüñÿ á³áë³îòåö³ LAPACK ó øâèäêî䳿, àëå öå ìîæíà ïîÿñíèòè îïòèì³çàö³ºþ ö³º¿ á³áë³îòåêè ï³ä ñèñòåìè ð³âíÿíü, äå º çíà÷íà ê³ëüê³ñòü íóë³â ó ìàòðèö³ êîåô³ö³ºíò³â.

3.Ñïåö³àë³çîâàí³ á³áë³îòåêè âñå ùå äàþòü ïðèñêîðåííÿ ïðè âèêîðèñòàíí³ ê³ëüêîõ ÿäåð. Öå ïðèñêîðåííÿ º ìåíøèì í³æ ïðèñêîðåííÿ â ïðîñò³é ïðîãðàì³ ç âèêîðèñòàííÿì OpenMP, ùî ïîÿñíþºòüñÿ çíà÷íîþ îïòèì³çàö³ºþ àëãîðèòì³â.

4.Ïðèñêîðåííÿ ó ïðîñò³é ïðîãðàì³ ç âèêîðèñòàííÿì OpenMP º çíà÷íèì, íàéêðàùèé ïîêàçíèê äîñÿãàºòüñÿ ïðè îá÷èñëåíí³ ñèñòåì ç 3000-4000 ð³âíÿíü.

 

˳òåðàòóðà.

1.Intel® Math Kernel Library 10.2 Reference Manual. http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/win/mkl/refman/index.htm — 2010

2.Intel® Math Kernel Library 10.2 User's Guide http://software.intel.com/sites/products/documentation/hpc/mkl/lin/index.htm — 2010

3.Intel® Math Kernel Library 9.0 for Linux Technical User Notes http://www.intel.com/software/products/mkl/docs/mklusel.htm — 2006

4.Ôîðóìè Intel MKL http://software.intel.com/en-us/forums/intel-math-kernel-library/

5.PARDISO Solver Project http://www.pardiso-project.org/index.php?p=manual

6.LAPACK — Linear Algebra Package http://www.netlib.org/lapack/