Current

  • Due to a delay in accounting, SRC wallets were not charged for multiple weeks. A recalculation of credits was completed on September 26th, so some users may see a steep decrease in their credits on that date. As of today, the balances are accurate and reflect the correct charges.
  • Displayed storage usage may be inaccurate. Please check back for further notice.

Future 

n.a.

Past

2024

  • 2024-11-27 16:15 until 2024-11-28 9:45
    • Due to network issues during network maintenance, no HPC Cloud workspaces can be started, paused, resumed or deleted. Communication with the workspaces can fail.
  • 2024-11-13
    • Root cause analysis of recent network outages.

Summary
SURF research services hosted at Amsterdam Data Center experienced multiple interruptions in recent months. In particular, the SRC service had limited access to the 'HPC Cloud' provider.

The reliability and availability of our services are paramount. After thorough investigation we have found the root cause of the issues and will implement a fix in the remainder of this year.

Root Cause Analysis
In recent years, SURF has experienced an exponential growth in hardware hosted at Amsterdam Data Center. This growth has led to scalability issues on our network, which caused the recent outages. A solution was agreed to adapt our datacenter network to withstand current and future loads.

What to expect
In the coming weeks, SURF plans to implement network adjustments to avoid future outages. This will require maintenance work. We expect the services, namely SRC, to continue operating as usual during the maintenance. Nevertheless, there is always the possibility that an outage will occur. You will be kept informed through the usual channels, and any disruptive maintenance will be announced in advance.

Should you have any questions, you can contact us via the Service Desk.

  • 2024-11-06 9:00 - 13:30
    • Due to issues in the underlying cloud infrastructure, approximately 50% of started workspaces were failing on SURF HPC Cloud. We have temporarily disabled starting new workspaces on this cloud so we can fix the issue.
  • 2024-10-10 17:00 - 17:30 
    • Emergency maintenance: SURF Cloud workspace actions (start, pause, resume, delete) are disrupted 
  • 2024-10-07 15:30 until 2024-10-09 9:15
    • Due to network issues no workspaces can be started, paused, resumed or deleted on SURF HPC Cloud.
  • 2024-09-13 13:30 - 16:00
    • Network instability on SURF Cloud
    • Impact: some running workspaces weren't reachable
  • 2024-09-10 08:30 - 10:15
    • A technical problem in an SRC component disrupted regular user workflows
    • Impact: starting workspaces not possible; connection with Research Drive broken.
  • 2024-08-29 11:00 - 12:30 
  • 2024-08-27 15:36 until 2024-08-28 13:37
    • Network outage at SURF's data centre in Amsterdam
    • Unavailability of the 'HPC Cloud' provider until the network incident was resolved
    • Impact: machines could not be started/stopped/paused in the 'HPC Cloud'; access to running workspaces was affected
  • 2024-08-23 09:00 - 16:00
    • The accounting / budgeting service component will have its annual maintenance. Impact: wallet creation will be delayed. Requests will be handled after the maintenance.
  • 2024-07-09 21:54
  • 2024-06-25
    • A preliminary evaluation of the network outage that caused the 'HPC Cloud' provider to be down is publicly available. Executive summary:

On Wednesday evening, June 12, the SURF EVPN experienced an outage, caused by an internal broadcast storm. No external cause – or malicious intent – was detected. The network was fully recovered the next day. Eight services were impacted; two services were available the same night, with the last service fully recovered by Monday morning. The root cause is still unknown. An in-depth evaluation is planned, as we are awaiting more information from one of our vendors.

  • 2024-06-15 14:43 until 2024-06-17 9:30
    • HPC Cloud API is down while the cloud recovers from earlier network issues
    • No new workspaces can be created on HPC Cloud
    • No Pause / Resume can be done on HPC Cloud workspaces
    • Existing machines can be accessed
  • 2024-06-14 10:15 until 12:00
    • HPC Cloud API is down due to recovery from earlier network outage.
    • No new workspaces can be created, no workspace states can be changed.
    • Existing machines can be accessed.
  • 2024-06-12 19:15 until 2024-06-14 9:45
    • SURF Research Cloud service has limited access to the 'HPC Cloud' provider
    • The 'HPC Cloud' is down due to network problems.
    • Impact: machines cannot be started/stopped/paused in the 'HPC Cloud'
  • 2023-12-15 10:00 am / 6:00 pm: Update network components. Impact: possible short interruptions of portal functionality. In case of a glitch, please retry after 1-2 minutes. Workspaces will not be affected.
  • 2023-06-20 7:49 am / 11:14 pm: Apply security updates to a batch of GPU & CPU Fat nodes. Impact: less availability of mentioned resources and no running workspaces on the hardware under maintenance.
  • 2023-06-13 08:37 am / 12:40 pm: Apply security updates to a batch of GPU & CPU Fat nodes. Impact: less availability of mentioned resources and no running workspaces on the hardware under maintenance. 
  • 2023-05-31 8:00 am / 03:51 pm: Update network components in our SURF HPC Cloud system. The portal is unavailable, workspaces cannot be created or paused/resumed in the Cloud provider 'HPC Cloud'. Running workspaces remain available.
  • 2023-04-12 9:00 am / 5:00 pm: Maintenance of the accounting service that manages the wallets. New wallets or changes will be processed after this maintenance window. Existing workspaces and wallets will not be affected.
  • 2023-04-04 7:00 am / 9:00 am: Network change 418 with expected network downtime of 2 minutes during this maintenance window. This might affect network traffic to and from VMs on SURF HPC Cloud.
  • 2023-03-28 22:00 CET / 01:00 CET: Internal database outage. Users could not perform any operations on either workspaces or catalog items. Any changes made to either workspaces or catalog items between 19:00 CET and 22:00 CET are lost and cannot be recovered.
  • 2023-03-21 / 2023-03-22: For a short period of time SURF Research Cloud reported wrong usage amounts. This resulted in falsely depleted wallets, and some users were unable to start/resume workspaces. This has been corrected and resolved.
  • 2023-03-20 12:15 / 14:55: Storage issue caused workspace creation to be unavailable
  • 2023-03-14 12:40 am / 12:50 am: Internal database upgrade; workspaces cannot be created or paused/resumed.
  • 2023-03-02 7:00 am / 9:00 am: Network change 418 with expected network downtime of 2 minutes during this maintenance window. This might affect network traffic to and from VMs on SURF HPC Cloud. (rescheduled to: 2023-04-04)
  • 2023-02-16 5:00 am / 7:00 am: SRAM service dependency will be updated; Impact: SRC portal is not accessible; no impact for running workspaces
  • 2023-02-07 11:25 am / 11:50 am: Due to problems with our authentication service, it is currently not possible to log in to the Research Cloud portal. Running workspaces are unaffected.
  • 2023-02-06 2:00 pm / 8:26 pm: Ubuntu workspaces would fail sporadically due to failing to reach the package repository endpoint. (status.canonical.com)
  • 2023-01-25 09:00 am / 11:30 am: Update network components in our SURF HPC Cloud system. The portal will be unavailable; workspaces cannot be created or paused/resumed in the Cloud provider 'HPC Cloud'.
  • 2022-12-20 08:00 / 17:00: Maintenance of all GPU & CPU Fat nodes; Impact: no running workspaces on the hardware under maintenance 
  • 2022-09-28 9:00 am / 7:00 pm: Updating the infrastructure supporting the SRC Portal.
  • 2022-09-15 5:00 am / 7:00 am: SRAM service dependency will be updated; Impact: SRC portal is not accessible; no impact for running workspaces
  • 2022-07-24 8:13 pm - 8:15 pm: Intermittent authentication service issues.
  • 2022-06-03 / 2022-06-22: New workspaces cannot be attached to existing reserved IPs or added to existing private networks.

  • 2022-06-02 4:00 pm / 6:00 pm: No new workspaces can be created.

  • 2022-05-25 1:00 pm / 6:30 pm: New workspaces were likely to fail due to network capacity constraints. Running workspaces could be logged in to and used as usual.
  • 2022-05-10 11:30 am / 4:45 pm: portal.live.surfresearchcloud.nl blocked
  • 2022-05-09 08:00 am / 8:00 pm
    • Plan: update network components in our SURF HPC Cloud system
    • Impact: workspaces cannot be created or paused/resumed in the Cloud provider 'HPC Cloud'
  • 2022-01-28 12:14 am: Communication to all service users about the PwnKit vulnerability and how to patch vulnerable workspaces
  • 2021-10-13 12:30 pm / 2021-10-15 10:56 am: Maintenance was extended due to unforeseen stability issues while deploying service components to new hardware
    • Impact
      • creation of new workspaces in the Cloud provider 'HPC Cloud' is not available
  • 2021-10-13 7:00 am / 12:30 pm: Hardware replacement
    • Impact
      • creation of new workspaces in the Cloud provider 'HPC Cloud' is not available
    • No impact
      • running workspaces will operate as usual
      • SRC Portal is available
  • 2021-09-29: SRC access might be less available
    • From 5:00 to 07:00: SRAM service dependency will be updated; Impact: SRC portal is not accessible; no impact for running workspaces
  • 2021-06-08: SRC access might be less available
    • From 5:00 to 07:00: SRAM service dependency will be updated; Impact: access to SRC is less available; no impact for running workspaces
  • 2021-05-26: Portal and connection to VMs unstable
    • From ca. 10:00 to 12:00: Limited portal and VM usage due to an internal failure in Research Cloud.
  • 2021-04-28: Creation of new workspaces fails
    • From ca. 9:45 to 10:45: Due to a Research Cloud internal failure, users could not start new workspaces.
  • 2021-03-15: GitLab.com down
    • Between 13:00h and 15:00h, gitlab.com was unavailable, which rendered SRC unable to create workspaces.