Data processing

Data processing (DP) at SURF consists of two platforms, Spider and the Grid, both of which allow users to run highly parallel jobs on distributed resources. If your project is too large in scale for personal computers, or even for cloud solutions, then DP at SURF may be a good solution for you.

See more about Data Processing at SURF here.

The Spider cluster

Spider is a versatile DP platform aimed at processing large structured data sets. It is a compute cluster built on top of SURF's in-house elastic cloud, which allows many terabytes or even petabytes of data to be processed on hundreds of cores simultaneously, keeping turnaround times short. High network throughput ensures fast connectivity to external data storage systems. Although Spider is a local compute cluster running only at SURF, it is geared towards interoperability with other platforms, allowing a high degree of integration and customization within the user's domain. This is enhanced by specific features that support collaboration, data (re)distribution, custom security, and even private Spider instances.

Spider is used for large-scale, multi-year, data-intensive projects in which users actively process their data, such as large static data sets or continuously growing data sets. Examples include genomics data, astronomical telescope data, physics detector data and satellite Earth observations.
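Day-to-day use of such a cluster typically follows a batch model: a large data set is split into chunks, and each chunk is processed as an independent job so that many chunks run in parallel. The sketch below illustrates this pattern, assuming a Slurm-style scheduler with job arrays (consult the Spider documentation for the actual setup); the chunk naming, directory layout and the line-counting "processing" step are hypothetical placeholders, not part of Spider itself.

    import os
    from pathlib import Path

    # Hypothetical worker for one array task: the scheduler starts many
    # copies of this script, each with a different SLURM_ARRAY_TASK_ID,
    # so the chunks of the data set are processed in parallel.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))

    chunk = Path(f"data/chunk_{task_id:04d}.dat")   # assumed chunk layout
    out_dir = Path("results")
    out_dir.mkdir(exist_ok=True)

    # Placeholder "processing": count the records (lines) in this chunk.
    with chunk.open("rb") as f:
        n_records = sum(1 for _ in f)

    (out_dir / f"chunk_{task_id:04d}.count").write_text(
        f"{chunk.name}: {n_records} records\n"
    )

Submitted as, for example, a job array of 100 tasks with a one-line wrapper that runs this script, 100 chunks would be processed concurrently, subject to scheduling.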

If you have experienced restrictions with the Grid in the past, or you do not have a local compute cluster available, then Spider may be the best solution for you; reach out to our Servicedesk to confirm.

The full Spider documentation can be found here.
