Success Story

OSCAR and DT Flood in the DISCOVER-US project

Transatlantic Computational Testbeds Enabling Seamless Workflow Executions of Flood Simulations

By integrating the interTwin DTE core module Infrastructure Manager (IM) with Chameleon, a transatlantic testbed composed of OSCAR clusters in Europe and the USA was deployed. Using Common Workflow Language (CWL), we achieved seamless execution of scientific workflows for flood assessment.


The Challenge

The project set out to address the computational and infrastructure challenges of combining two OSCAR clusters hosted on different distributed Cloud infrastructures in different regions: EGI Federated Cloud in Europe and Chameleon in the USA. By uniting these clusters, the project aimed to achieve three key goals:

  1. Optimise Resource Utilisation: Allocate tasks to the cluster with available capacity to maximise efficiency and streamline operations.
  2. Ensure Data Locality Compliance: Assign tasks to the cluster closest to the required data, reducing latency and improving processing speed.
  3. Increase System Resilience: Create a robust infrastructure capable of shifting workloads between clusters during maintenance or unforeseen disruptions, ensuring uninterrupted operations.

These goals reflect the importance of efficiently harnessing distributed computational resources while overcoming geographic and operational barriers.
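The three goals can be read as a simple placement policy. The following is a minimal sketch of such a policy, not the project's actual scheduler: the `Cluster` fields, the cluster names, the region labels and the CPU figures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # All fields are hypothetical; a real deployment would query the clusters.
    name: str
    region: str
    free_cpus: int
    available: bool = True


def choose_cluster(clusters, data_region, cpus_needed):
    """Return the best cluster for a task, or None if no cluster can take it.

    Resilience: clusters that are down are skipped.
    Utilisation: only clusters with enough free CPUs are considered.
    Data locality: a cluster in the data's region wins; ties (and the
    non-local case) are broken by the largest free capacity.
    """
    candidates = [
        c for c in clusters if c.available and c.free_cpus >= cpus_needed
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (c.region == data_region, c.free_cpus))


# Illustrative values only.
clusters = [
    Cluster("oscar-egi", "eu", free_cpus=8),
    Cluster("oscar-chameleon", "us", free_cpus=32),
]
```

With these illustrative values, a small task on European data stays on the European cluster, while a task that exceeds its free capacity, or that arrives while the cluster is under maintenance, moves to the other site.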


Solution

The project utilised Infrastructure Manager (IM) to deploy OSCAR clusters in Europe (EGI Federated Cloud) and the USA (Chameleon), enabling seamless setup across both sites. OSCAR, an open-source Kubernetes-based serverless platform, was used for event-driven computation to efficiently run the services required for the scientific workflow.

The Common Workflow Language (CWL) was applied to develop and define the scientific workflows, with the FloodAdapt Digital Twin (DT-Flood) as the use case. CWL integrated a Python script that leveraged the oscar-python library, enabling seamless connection and interaction with the OSCAR clusters across regions. This streamlined approach ensured clarity and efficiency in executing distributed workflows.

Infrastructure Manager (IM)

An open-source Infrastructure as Code (IaC) tool that deploys complex and customized virtual infrastructures on multiple back-ends.

DT-Flood

DT-Flood is the FloodAdapt Digital Twin, used here to demonstrate that scientific workflows can be executed seamlessly across the two OSCAR clusters in Europe and the USA. The DT-Flood use case focuses on advanced flood hazard and impact modelling to support decision-making in flood-prone areas.

OSCAR

OSCAR is an open-source serverless platform for event-driven computation. It allows users to deploy container-based services seamlessly, leveraging the scalability and flexibility of cloud-native environments.

EGI Federated Cloud

The EGI federated e-infrastructure comprises national and intergovernmental computing and data centres from the EGI Federation. These federated centres make EGI one of the largest distributed computing infrastructures for research.

CWL, Common Workflow Language

Common Workflow Language (CWL) is an open standard for describing how to run command line tools and connect them to create workflows.

Here CWL is employed to define the steps of the scientific workflows.

Python

A Python script, using the oscar-python library, lets users decide, for each step of the CWL-defined workflow, which cluster the computation is offloaded to.
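Step-level offloading can be sketched as follows. This is an illustration under stated assumptions rather than the project's script: the step names, service names and cluster identifiers are hypothetical, and the `invoke` callable stands in for whatever oscar-python client call performs the real service invocation, which is omitted because it depends on cluster endpoints and credentials.

```python
def run_workflow(steps, invoke, payload):
    """Run steps in order, sending each one to the cluster chosen for it.

    steps   -- list of (step_name, service_name, cluster_id) tuples
    invoke  -- callable(cluster_id, service_name, data) returning the output
    payload -- input for the first step; each output feeds the next step
    """
    trace = []
    data = payload
    for step_name, service, cluster_id in steps:
        data = invoke(cluster_id, service, data)
        trace.append((step_name, cluster_id))
    return data, trace


def fake_invoke(cluster_id, service, data):
    # Stand-in for a real OSCAR service call; it only records the routing.
    return f"{data}->{service}@{cluster_id}"


# Hypothetical two-step workflow split across the two sites.
steps = [
    ("prepare", "prepare-input", "oscar-egi"),
    ("simulate", "flood-model", "oscar-chameleon"),
]
result, trace = run_workflow(steps, fake_invoke, "event")
```

Keeping the invocation behind a callable means the routing logic can be exercised without access to either cluster, and the target of any single step can be changed by editing one entry in `steps`.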

Looking Forward

We plan to further enhance the integration of the Infrastructure Manager with Chameleon so that Chameleon users can deploy customized virtualized infrastructures through the easy-to-use IM Dashboard. This will unlock access to the wide catalog of curated recipes for deploying popular applications (e.g. Kubernetes, SLURM-based clusters, MLflow). The advanced interoperability that the IM provides as an orchestrator of Cloud-based infrastructures will facilitate the deployment of additional transatlantic computational testbeds, which are required to aggregate disparate Cloud-based computing resources from large-scale distributed infrastructures.