Parallel Computing in India
Introduction to Parallel Computing

Traditionally, software has been written for serial computation:
• It runs on a single computer with a single Central Processing Unit (CPU).
• A problem is broken into a discrete series of instructions.
• Instructions are executed one after another.
• Only one instruction may execute at any moment in time.

In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
• It runs on multiple CPUs.
• A problem is broken into discrete parts that can be solved concurrently.
• Each part is further broken down into a series of instructions.
• Instructions from each part execute simultaneously on different CPUs.

The compute resources can include:
• A single computer with multiple processors;
• An arbitrary number of computers connected by a network;
• A combination of both.

Introduction to Parallel Computing in India

Although the performance of single processors has been steadily increasing over the years, the only way to build the next generation of teraflop-architecture supercomputers seems to be through parallel processing technology.
Even with today’s workstation-class high-performance processors exceeding 100 megaflops, thousands of processors are required to build a teraflop-architecture machine. Further, the fastest special-purpose vector processors have a peak performance of a few gigaflops, so they too need to be used in parallel to achieve teraflop levels of performance.

In 1987, India decided to launch a national initiative in supercomputing in the form of a time-bound mission to design, develop and deliver a supercomputer in the gigaflops range. The major motivation came from (political) delays in obtaining a CRAY X-MP for weather forecasting. A decision was made to support the development of indigenous parallel processing technology. The Center for Development of Advanced Computing (C-DAC) was set up in August 1988 with a three-year budget of Rs. 375 million (approximately US$ 12 million). C-DAC’s First Mission was to deliver a 1000 MFlops (1 GFlops) parallel supercomputer by 1991.
Simultaneously, several complementary projects were initiated to develop high-performance parallel computers at the National Aerospace Laboratories (NAL) of the Council of Scientific and Industrial Research (CSIR), the Center for Development of Telematics (C-DOT), the Advanced Numerical Research & Analysis Group (ANURAG) of the Defense Research and Development Organization (DRDO), and the Bhabha Atomic Research Center (BARC). India’s first-generation parallel computers were delivered starting in 1991.

PARALLEL PROCESSING
Silicon-based chips are approaching a physical limit on processing speed, because they are constrained by the speed of electrical signals, the speed of light and certain thermodynamic laws. A viable way around this limitation is to connect multiple processors that work in coordination with each other to solve grand challenge problems. Hence, high-performance computing requires Massively Parallel Processing (MPP) systems containing thousands of powerful CPUs. Processing multiple tasks simultaneously on multiple processors is called parallel processing.
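As a minimal sketch of this idea, the Python example below divides one task (a sum of squares) into sub-tasks, hands each sub-task to a separate worker, and combines the partial results. The function names and the choice of problem are illustrative assumptions, not taken from any of the systems described here. Threads are used only to keep the sketch self-contained; in CPython they do not speed up CPU-bound work, and a real parallel machine would run the sub-tasks on separate processors, for example through processes or message passing.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_parts):
    """Divide the problem into n_parts roughly equal sub-tasks."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done independently by one worker on its own sub-task."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Divide, solve the parts concurrently, then combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


numbers = list(range(1000))
serial_result = sum(x * x for x in numbers)
parallel_result = parallel_sum_of_squares(numbers, n_workers=4)
```

Both results are the same number; only the way the work is organized differs. The combine step (the final `sum`) is the part that requires communication between workers on a real parallel machine.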
A parallel program consists of multiple active processes simultaneously solving a given problem. A given task is divided into multiple sub-tasks using the divide-and-conquer technique, and each sub-task is processed on a different CPU. Programming a multiprocessor system using the divide-and-conquer technique is called parallel programming.

The development of parallel processing is being influenced by many factors. The prominent ones include the following:
• Computational requirements are ever increasing, in both scientific and business computing. The technical computing problems that require high-speed computational power are related to life sciences, aerospace, geographical information systems, mechanical design and analysis, etc.
• Sequential architectures are reaching physical limits, as they are constrained by the speed of light and the laws of thermodynamics. The speed at which sequential CPUs can operate is reaching saturation point (no more vertical growth), and hence an alternative way to get high computational speed is to connect multiple CPUs (an opportunity for horizontal growth).
• Hardware improvements such as pipelining and superscalar execution are not scalable and require sophisticated compiler technology. Developing such compiler technology is a difficult task.
• Vector processing works well only for certain kinds of problems. It is suitable mainly for scientific problems (involving many matrix operations) and is not useful in other areas such as databases.
• The technology of parallel processing is mature and can be exploited commercially; significant research and development work on development tools and environments has already been done.
• Significant developments in networking technology are paving the way for heterogeneous computing.

PROJECTS IN INDIA

India launched a major initiative in parallel computing in 1988. There are five or six independent projects to construct parallel processing systems. They were motivated by the need for advanced computing, a vision of developing indigenous technology, and difficulties (political and economic) in obtaining commercial products.
The creation of the Center for Development of Advanced Computing (C-DAC), together with concurrent efforts at the National Aerospace Laboratories (NAL), Bangalore; the Advanced Numerical Research & Analysis Group (ANURAG), Hyderabad; the Bhabha Atomic Research Center (BARC), Bombay; and the Center for Development of Telematics (C-DOT), Bangalore, marked the beginning of high-performance computing in India. Today, India has designed its own high-performance computers in the form of:
• PARAM by C-DAC
• ANUPAM by BARC
• PACE by ANURAG
• FLOSOLVER by NAL
• CHIPPS by C-DOT
• MTPPS by BARC

PARAM SERIES

First Mission

C-DAC formally launched its first mission in August 1988 to deliver a 1 GFlops parallel machine. The effort started almost from scratch but produced its first 64-node prototype within two years. C-DAC’s computers were named PARAM, which means “supreme” in Sanskrit and also makes a convenient acronym for PARAllel Machine. The programming environment is called PARAS, after the mythical stone that turns iron into gold by mere touch; it was meant to give a golden touch to the underlying machine and make the job of the programmer or user considerably easier.
The first PARAMs used INMOS T800/T805 transputers as computing nodes, and these first models were called the PARAM 8000 series. Although the theoretical peak performance of a 256-node PARAM machine was 1 gigaflops (a single T805 node being rated at 4.25 MFlops), its sustained performance in actual applications turned out to be between 100 and 200 MFlops. Early in 1992, it was acknowledged that the basic compute node of PARAM 8000 was underpowered, and C-DAC decided to integrate the Intel i860 into the PARAM architecture.
The objective was to preserve the same application programming environment and to provide a straightforward hardware upgrade path by simply replacing the compute node boards of PARAM 8000. This resulted in the next architecture, with the i860 as the main processor and four transputers acting as communication processors, each with four built-in links. The PARAS programming environment was extended to PARAM 8600 to give the same user view as on PARAM 8000; the new system was created during 1992 and 1993. The sustained performance of a 16-node PARAM 8600 was claimed to be in the range of 100-200 MFlops, depending on the application.
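The gap between peak and sustained performance quoted for the PARAM 8000 can be made concrete with a little arithmetic. The sketch below uses only the figures given in the text (256 nodes at 4.25 MFlops each, 100 to 200 MFlops sustained, and 100 to 200 MFlops on a 16-node PARAM 8600); the helper names are illustrative, and it assumes that the sustained range refers to the full 256-node configuration.

```python
def peak_mflops(nodes, mflops_per_node):
    """Theoretical peak: every node running at its rated speed all the time."""
    return nodes * mflops_per_node


def efficiency(sustained_mflops, peak):
    """Fraction of the theoretical peak actually delivered to applications."""
    return sustained_mflops / peak


# PARAM 8000: 256 T805 nodes rated at 4.25 MFlops each.
param8000_peak = peak_mflops(256, 4.25)            # 1088.0 MFlops, about 1 GFlops
param8000_low = efficiency(100, param8000_peak)    # about 0.09
param8000_high = efficiency(200, param8000_peak)   # about 0.18

# Sustained MFlops per node, for the low and high ends of the quoted range.
param8000_per_node = (100 / 256, 200 / 256)        # about 0.39 to 0.78
param8600_per_node = (100 / 16, 200 / 16)          # 6.25 to 12.5
```

On these figures the PARAM 8000 delivered roughly 9 to 18 percent of its theoretical peak, and a 16-node PARAM 8600 matched the sustained throughput of a machine with sixteen times as many transputer nodes, which is consistent with the judgment that the original compute node was underpowered.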

Second Mission

PARAM 9000

Both C-DAC and the Indian Government considered the First Mission accomplished and embarked on the Second Mission: to deliver a teraflops-range parallel system capable of addressing grand challenge problems. This machine, called PARAM 9000, was announced in 1994.

System Architecture:

The multistage interconnect network of PARAM 9000 uses a packet-switching wormhole router as the basic switching element. Each switch can establish 32 simultaneous non-blocking connections to provide a sustainable bandwidth of 320 MBytes/sec.
The PARAM 9000 architecture emphasizes flexibility. The intention is that, as new processor, memory and communication-link technologies become available, the system can be upgraded in the field. The first system is the PARAM 9000SS, which is based on SuperSPARC processors. A complete node is built around the SuperSPARC II processor with 1 MB of external cache, 16 to 128 MB of memory, one to four communication links and related I/O devices. The current operating speed of the processor is 75 MHz. When new MBus modules with higher frequencies become available, they can be installed as field upgrades.

Applications on PARAM:
Application kernels are said to have been developed on PARAM in the areas of computational fluid dynamics, finite element analysis, oil reservoir modeling, seismic data processing, image processing, remote sensing, medical imaging, signal processing, radio astronomy, molecular modeling, biotechnology, quantum molecular dynamics, quantum chemical calculations, semiconductor physics, composites and special materials, power systems analysis and energy management, and discrete optimization.

BARC’S ANUPAM

[Figure: ANUPAM Pentium supercomputer at BARC]

The Bhabha Atomic Research Center, founded by Dr. Homi Bhabha, is India’s major national center for nuclear science and is at the forefront of India’s Atomic Energy Program. Through 1991 and 1992, members of the BARC computer facility began interacting with C-DAC with a view to obtaining a high-performance computing facility. It was estimated that a machine with 200 MFlops of sustained computing power would be needed to solve their current problems. Because of the importance of their program, BARC decided to build its own parallel computer. Initially, the Computer Division of BARC developed the ANUPAM 860 series of supercomputers, which used processor boards based on Intel i860 microprocessors as compute nodes.
Since 1997, the ANUPAM Alpha and ANUPAM Pentium series of supercomputers have been developed using industry-standard high-speed network switches and either Alpha servers/workstations or Pentium servers/PCs as compute nodes. High-speed inter-node communication is provided by a switch such as ATM, Fast Ethernet or Gigabit Ethernet.

The first ANUPAM-860/4 was developed in December 1991. It used minicomputers based on the Intel i860 microprocessor at 40 MHz as master nodes and four Intel i860-based processor boards with on-board memory as compute nodes, all in one chassis.
Each compute node had a peak speed of 80 MFlops. The overall sustained speed of the system for user jobs was 30 MFlops. The system was upgraded to 8 nodes in August 1992, giving a sustained speed of 52 MFlops. This involved redesigning the processor boards so that 8 slave compute nodes could be accommodated in the same single 860 minicomputer Multibus-II chassis. The system was further upgraded to a 16-node ANUPAM-860 in November 1992, giving a sustained speed of 110 MFlops. This involved coupling two 860 minicomputer chassis.
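The figures quoted so far for the ANUPAM-860 (30, 52 and 110 MFlops sustained on 4, 8 and 16 nodes, against a peak of 80 MFlops per node) give a rough picture of how well the design scaled. The sketch below tabulates them; the function and field names are illustrative assumptions, and the comparison is only approximate because the boards were redesigned between configurations and the workloads behind each figure are not stated.

```python
PEAK_PER_NODE = 80.0                       # MFlops, peak speed of one i860 compute node
SUSTAINED = {4: 30.0, 8: 52.0, 16: 110.0}  # nodes -> sustained MFlops, as quoted


def scaling_table(sustained, peak_per_node, base_nodes=4):
    """Per-node throughput, fraction of peak, and scaling relative to the base system."""
    base = sustained[base_nodes]
    rows = []
    for nodes in sorted(sustained):
        mflops = sustained[nodes]
        rows.append({
            "nodes": nodes,
            "per_node": mflops / nodes,
            "fraction_of_peak": mflops / (nodes * peak_per_node),
            # Speedup over the base system divided by the increase in node count.
            "scaling_efficiency": (mflops / base) / (nodes / base_nodes),
        })
    return rows


rows = scaling_table(SUSTAINED, PEAK_PER_NODE)
for row in rows:
    print(row)
```

Sustained throughput stays between 6.5 and 7.5 MFlops per node, under 10 percent of the per-node peak, while the 16-node system retains about 92 percent of the per-node throughput of the 4-node system. On these numbers, adding nodes cost little, and the larger loss lay between peak and sustained speed on each node.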
Subsequently, ANUPAM 860/32, a 32-node system, was developed in February 1994 by interconnecting four minicomputer Multibus chassis. The system was further upgraded to 64 nodes in November 1995 by adding 32 more slave compute nodes, which were designed using the latest Intel i860 microprocessor at 50 MHz and provided up to 256 MB of on-board memory. The 64-node ANUPAM-860 gave a sustained computational speed of 400 MFlops, which was equivalent to the speed of a CRAY Y-MP vector supercomputer.

The first of the ANUPAM-Alpha series of supercomputers was developed in July 1997, giving a sustained speed of 1000 MFlops on 6 compute nodes.
This system used six Alpha servers, based on the Alpha 21164 microprocessor at 400 MHz, as node processors and an Asynchronous Transfer Mode (ATM) switch, operating at a peak speed of 155 Mbps and a sustained speed of 134 Mbps, as the interconnecting network. The design differed significantly from the earlier ANUPAM-860 design: the compute nodes were complete servers or workstations with memory, disk, other I/O and an operating system, rather than processor boards with only memory as in the ANUPAM-860 series.
The system was upgraded to ANUPAM-Alpha/10 in March 1998 by adding 4 compute nodes, giving a sustained speed of 1.5 GFlops on 10 nodes. The ANUPAM-Alpha series can be easily upgraded to 128 nodes, giving a sustained speed of about 50 GFlops using currently available Alpha 21264 microprocessors at 700 MHz.

The computing speed available on personal computers based on the latest Pentium microprocessors has increased to a level almost matching that of workstations based on RISC microprocessors, and these PCs also support the large RAM required for large compute-bound jobs.
Being commodity items, these personal computers are readily available at much lower prices from multiple vendors. Development work on the ANUPAM-Pentium series of supercomputers, based on Pentium PCs, was initiated in January 1998, with the main focus on minimizing cost. The first system, ANUPAM-Pentium II/4, using four Pentium II PCs operating at 266 MHz as compute nodes and a Fast Ethernet (100 Mbps) switch for interconnection, was ready in July 1998. It gave a sustained speed of 248 MFlops. ANUPAM-Pentium II was subsequently expanded in March 1999 to 16 nodes using Pentium II personal computers at 330 MHz, giving a sustained speed of 1.3 GFlops. In April 2000, the system was further upgraded to Pentium III/16, using 16 Pentium III personal computers at 550 MHz as compute nodes and a Gigabit Ethernet switch for interconnection, giving a sustained speed of 3.5 GFlops. The ANUPAM supercomputer developed by BARC is being continuously upgraded, the latest version being an 84-node system based on the Pentium III, which has demonstrated a sustained speed of 15 GFlops.
It is expected that a sustained speed of 50 GFlops will be reached by the end of the IX Plan. The ANUPAM-Pentium series of supercomputers can be easily upgraded to 128 nodes to meet any desired speed requirement up to 25 GFlops.

A new supercomputer that solves computational problems faster has been developed by the Bhabha Atomic Research Centre (BARC). As a result, problems in a range of fields, including scientific research and the simulation of nuclear explosions, are now amenable to faster solution.
The Computer Division of BARC has developed the ANUPAM-PIV, a 64-node supercomputer with a sustained speed of 43 GFlops (billion floating-point operations per second). It works three times faster than last year’s version and 1000 times faster than BARC’s first 4-node version of 1991. The ANUPAM-PIV is 30 to 40 times faster than the parallel computers developed indigenously by other institutes in the country and more than 10 times faster than the fastest supercomputers imported from abroad for various computing applications.
ANUPAM-PIV is built from Pentium IV personal computers operating at 1.7 GHz with 256 MB of memory each.

APPLICATIONS

All three series of supercomputers (ANUPAM-860, ANUPAM-Alpha and ANUPAM-Pentium) have been used extensively to solve some of BARC’s very large computational problems. ANUPAM systems have also been used by many other R&D organizations in the country.

Applications at BARC:

BARC has used the ANUPAM series of supercomputers for the past ten years to solve large computational problems in various frontier fields of science and engineering.
Some of the major applications implemented on ANUPAM supercomputers are as follows:
• Molecular Dynamics Simulation: This simulation is carried out by setting up a box containing a number of particles. Then, assuming certain starting values for the positions and velocities of the atoms, the equations of motion are solved iteratively with a small time step, each iteration using the results of the previous cycle. The parallelization is applied to the calculation of the net force on each atom at each time step, and the atomic coordinates are passed between processors at every iteration.
• Neutron Transport Calculations: This involves solving neutron transport problems with complicated geometry and large flux gradients using the Monte Carlo method. A large amount of computation is needed to reduce the uncertainties associated with this method.
• Gamma Ray Simulation by Monte Carlo Method: This simulates the development of the electromagnetic cascade initiated by a cosmic gamma ray in the earth’s atmosphere. The simulation enables one to devise and tune the performance of a detector for cosmic gamma rays from extra-terrestrial sources.
• Crystal Structure Analysis: In this problem, computations are required for processing and analyzing huge volumes of experimental data, optimizing thousands of structural parameters, and visualizing the three-dimensional structures of biological macromolecules such as proteins. It has been parallelized using a data domain partitioning technique. The data partitions are totally independent, leading to very little inter-process message passing.
• Laser-Atom Interaction Computation: Computation of intense-field laser-atom interaction is a very complex problem. In recent years, there has been intense activity in direct solutions of the Schrödinger equation (SE) involving time-dependent (TD) interactions, which calls for high-performance computing. This program has been successfully implemented on the ANUPAM parallel system.
• Three-Dimensional Electromagnetic Plasma Simulation: This simulation demands high-performance computers and is used for studying plasmas of various types, such as those occurring in high-power microwave cavities and in the earth’s magnetosphere. Parallelization has considerably reduced the time taken to analyze and display the various time frames of the electromagnetic plasma.
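The parallelization strategy described for the molecular dynamics simulation in the list above can be sketched as follows. This is a toy model under stated assumptions, not BARC's code: the two-dimensional softened inverse-square repulsion, the unit masses, the simple Euler-style update and all function names are illustrative choices. What it does reproduce is the structure in the text: at every time step the atoms are divided among workers, each worker computes the net force on its own atoms using the full set of current coordinates, and the updated coordinates are shared again for the next iteration. Threads stand in for processors to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def forces_for_block(block, positions, softening=0.01):
    """Net force on each atom in `block` from all other atoms (toy repulsive force)."""
    out = []
    for i in block:
        xi, yi = positions[i]
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r2 = dx * dx + dy * dy + softening  # softening avoids division by zero
            inv_r3 = r2 ** -1.5
            fx += dx * inv_r3
            fy += dy * inv_r3
        out.append((fx, fy))
    return out


def compute_forces(positions, n_workers, pool):
    """Split the atoms among workers; every worker receives all coordinates."""
    n = len(positions)
    blocks = [range(w, n, n_workers) for w in range(n_workers)]
    results = pool.map(forces_for_block, blocks, [positions] * n_workers)
    forces = [None] * n
    for block, block_forces in zip(blocks, results):
        for i, force in zip(block, block_forces):
            forces[i] = force
    return forces


def simulate(positions, velocities, dt=0.001, steps=10, n_workers=4):
    """Iterate the equations of motion with a small time step (unit masses assumed)."""
    positions, velocities = list(positions), list(velocities)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(steps):
            forces = compute_forces(positions, n_workers, pool)
            velocities = [(vx + fx * dt, vy + fy * dt)
                          for (vx, vy), (fx, fy) in zip(velocities, forces)]
            positions = [(x + vx * dt, y + vy * dt)
                         for (x, y), (vx, vy) in zip(positions, velocities)]
    return positions, velocities


final_positions, final_velocities = simulate(
    [(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)], steps=5)
```

The force loop dominates the cost because it grows with the square of the number of atoms, which is why it is the part worth distributing. By contrast, the Monte Carlo and crystal structure applications in the same list need almost no exchange of data between workers, so they parallelize even more readily.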

Applications in Outside Organizations

• Weather Forecasting at the National Centre for Medium Range Weather Forecasting (NCMRWF), Delhi: A dual CRAY X-MP supercomputer, procured in 1988, was being used for medium-range weather forecasting at NCMRWF, Delhi. The Department of Science and Technology began searching for a replacement for this supercomputer in 1994. So far, of all the indigenously developed supercomputers, only ANUPAM-Alpha, in a (1+4) configuration of one master and four slave nodes, has been able to meet both conditions: matching the accuracy and the execution time achieved on the CRAY. An ANUPAM-Alpha system was fully commissioned at NCMRWF, Delhi, in December 1999, providing a solution to the long-standing problem of finding an alternative to the obsolete dual CRAY X-MP supercomputer.
• Computational Fluid Dynamics, Aeronautical Development Agency (ADA): Solving the computational fluid dynamics problem of airflow through the air-intake ducts of an aircraft is a very large computational problem that demands a huge amount of computation. It became one of the major challenges for the indigenously developed supercomputers. This challenging problem was solved using the ANUPAM-860 system over a dedicated period of 2-3 months. Recently, ADA had another computational fluid dynamics problem involving 4 million grid points. This problem could not even be loaded on any of the supercomputers available in the country. It was implemented on a 16-node ANUPAM-Pentium II developed last year using 16 Pentium II PCs with 760 MB of memory each.

ANURAG’S PACE

The Advanced Numerical Research and Analysis Group (ANURAG) is located in Hyderabad.
It is a recently created laboratory of the Defense Research and Development Organization (DRDO) focused on R&D in parallel computing, VLSI, and applications of high-performance computing in CFD, medical imaging, and other areas. ANURAG has developed PACE, a loosely coupled, message-passing parallel processing system. PACE is an acronym for Processor for Aerodynamic Computations and Evaluation. ANURAG’s PACE program began in August 1988. The initial prototypes of PACE were based on the Motorola MC 68020 processor.
The first prototype had 4 nodes (16.67 MHz). Later, an 8-node prototype based on the MC 68030 processor (25 MHz) was developed. This 8-node cluster forms the backbone of the PACE architecture. The 128-node prototype is based on the MC 68030 processor (33 MHz). The latest offering is called PACE+ and is based on a HyperSPARC node running at 66 MHz. The memory per node is expandable up to 256 MB. The PACE 128 system, based on the Motorola 68030 processor and MC 68882 co-processor, delivered over 30 MFlops for large problems.
The speed per processor node was 0.33 MFlops. Later, this was enhanced to 0.75 MFlops per node. With the SPARC II processors, the speed is 4.5 MFlops per node. The latest SPARC processors should offer higher performance. The 128-node configuration is supposed to provide a Linpack (1000×1000) rating of 375 MFlops (single precision).

NAL’S FLOSOLVER

The National Aerospace Laboratories (NAL), located in Bangalore, is a major national laboratory of the Council of Scientific and Industrial Research of the Govt. of India.
In 1986, NAL started a project to design, develop and fabricate parallel processing systems suited to solving fluid dynamics and aerodynamics problems. The project was motivated by the need for a powerful computer in the laboratory and was influenced by similar developments internationally. NAL’s parallel computer is called Flosolver, and it was the first Indian parallel computer to become operational (1986). Since then, a series of updated versions has been built:
• Flosolver Mk1 and Mk1A, four-processor systems based on 16-bit Intel 8086/8087 processors;
• Flosolver Mk1B, an eight-processor system in the same series;
• Flosolver Mk2, based on 32-bit Intel 80386/80387 processors;
• Flosolver Mk3, the latest version, based on Intel’s i860 RISC processor.

The main application of NAL’s Flosolver is the weather forecasting code T80, under a project from the Department of Science and Technology of the Govt. of India. Flosolver has also been used by NAL scientists to solve their computational fluid dynamics problems. The current system is used to compute aerodynamic loads on aircraft wing-body combinations and in light combat aircraft design.

BARC’S MTPPS

MTPPS (Multi Transputer Parallel Processing System) is a 16-node, T800 transputer-based system designed and built by the Electronics Division of the Bhabha Atomic Research Center. It is a clear example of the friendly competition in parallel processing in India, since ANUPAM is BARC’s other product. MTPPS is intended to be used extensively in the design of large detector nuclear acquisition systems in nuclear power plants. However, with its modest performance of about 6 MFlops, the applications that can be run on MTPPS will probably be limited.

CONCLUSION

India has made significant strides in developing high-performance parallel computers. The technological developments in parallel computing in India have been considerable. They show that India has reached a stage at which, if appropriate financial resources are available, computer systems can be designed and built around available microprocessor chips, together with systems software, and many application packages can be ported to these machines using a message-passing architecture.
The fact that nearly half a dozen such efforts in varied organizations and cities have borne fruit suggests that there is no shortage of leadership or of technical and organizational skills. Furthermore, these successes have not only enhanced the self-confidence of the groups and organizations concerned but have also helped relieve some bottlenecks in scientific research and technological development. India is now capable, with enough funding and effort, of developing its own teraflops supercomputers, perhaps within the next few years.