As part of the Dutch National Grid Initiative (NGI_NL), SURF offers people affiliated with Dutch universities and institutes access to Grid infrastructure. With the help of the Grid, large-scale computational problems can be solved and large amounts of data can be stored. Grid computing is a form of distributed computing that can be very powerful when applied in the right way. The Grid consists of a large number of clusters distributed all over the Netherlands and even abroad. The clusters are interconnected by fast network connections provided by SURFnet. A Grid user can use all this computing power simultaneously, without having to log in at each of the different sites. Grid middleware takes care of distributing the 'jobs' a user submits. By a 'job' we mean the execution of a program on a machine somewhere on the Grid.
As a user, you connect to the Grid by using a so-called User Interface (UI). Once you have received the right credentials (i.e. an account on a User Interface and a Grid certificate), you are set to go.
First, you divide your problem into smaller units, called jobs. These jobs are the units of computation that can be submitted to the Grid. To do this, you describe each job in the Job Description Language (JDL). This is not a programming language; it consists of attribute-value pairs that describe your job. Here you list which program should be executed and what data it should operate on. Your program and data can be sent along with the job when necessary.
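To make the idea of attribute-value pairs concrete, the following is a minimal sketch in Python that renders a job description as JDL text. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are common JDL attributes, but the script name `wordcount.sh`, the data file `input.txt`, and the helper functions are hypothetical and chosen only for illustration. The JDL accepted at your site may require other attributes.

```python
def jdl_value(value):
    """Render a Python value in JDL syntax (strings and lists of strings only)."""
    if isinstance(value, (list, tuple)):
        # JDL lists are written between curly braces.
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    # Everything else is written as a quoted string; no escaping is attempted.
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Turn a dict of attributes into 'Name = value;' lines."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


# Hypothetical job: run a shell script on one input file.
job = {
    "Executable": "/bin/sh",                         # program to execute
    "Arguments": "wordcount.sh input.txt",           # what it should operate on
    "StdOutput": "stdout.txt",
    "StdError": "stderr.txt",
    "InputSandbox": ["wordcount.sh", "input.txt"],   # files sent with the job
    "OutputSandbox": ["stdout.txt", "stderr.txt"],   # files returned afterwards
}

jdl_text = render_jdl(job)
print(jdl_text)
```

The input sandbox in this sketch corresponds to the program and data that are sent along with the job; larger data sets would normally not travel with the job itself.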
Each job, in the form of a JDL file, is then submitted to the Workload Management System (WMS). This system schedules your jobs and knows which compute clusters in the Grid are ready to accept them. Each of these clusters consists of several machines. The Compute Element (CE) is the server that communicates with the WMS and accepts jobs. It then distributes the jobs to other machines in the cluster, called Worker Nodes (WNs). These WNs are the machines that do the actual work. When they have finished a job, they report back to the CE, which in turn informs the user about the status of the job. In addition, clusters have a storage server, called the Storage Element (SE). These servers can be used to store files on a permanent basis. Data on the SEs can be replicated at other sites, and jobs can be told to land only on Worker Nodes that are close to the data they will operate on.
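The scheduling decision described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions and not how the actual WMS is implemented: the `Cluster` class, the `choose_cluster` function, the site names, and the rule "pick the eligible cluster with the most free Worker Nodes" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: a CE with Worker Nodes and an SE."""
    name: str
    free_worker_nodes: int
    local_files: set = field(default_factory=set)  # files held on the cluster's SE


def choose_cluster(clusters, required_file=None):
    """Pick a cluster that can accept a job, optionally only near its data."""
    # Only clusters with at least one free Worker Node are ready to accept jobs.
    candidates = [c for c in clusters if c.free_worker_nodes > 0]
    if required_file is not None:
        # The job was told to land only where its data is stored.
        candidates = [c for c in candidates if required_file in c.local_files]
    if not candidates:
        return None  # no cluster can take the job right now
    # Assumed policy: prefer the cluster with the most free Worker Nodes.
    return max(candidates, key=lambda c: c.free_worker_nodes)


clusters = [
    Cluster("site-a", 10, {"genome.dat"}),
    Cluster("site-b", 50),
    Cluster("site-c", 0, {"genome.dat"}),
]

near_data = choose_cluster(clusters, required_file="genome.dat")
anywhere = choose_cluster(clusters)
print(near_data.name, anywhere.name)
```

In this model a job that needs `genome.dat` lands on `site-a`, the only cluster that both holds the file and has free Worker Nodes, while a job without data requirements goes to `site-b`.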
On these pages you can find more information about this whole process.