Core DTE Modules

Big Data Analytics TOSCA templates

Description

A set of TOSCA templates to deploy Big Data Analytics tools.

TOSCA templates make it possible to describe, in a cloud-agnostic way, the virtual infrastructures required by the available Big Data Analytics tools.
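
As a rough illustration of what "cloud-agnostic" means here, the sketch below shows a minimal topology written against the standard TOSCA Simple Profile in YAML. It is not taken from the released templates: the node name analytics_vm and all sizing values are hypothetical, and the actual templates may rely on additional custom node types.

```yaml
# Minimal sketch, NOT one of the released templates.
# Uses only normative TOSCA Simple Profile types; names and sizes are hypothetical.
tosca_definitions_version: tosca_simple_yaml_1_0

description: Single VM described without reference to any specific cloud provider

topology_template:
  node_templates:
    analytics_vm:                 # hypothetical node name
      type: tosca.nodes.Compute   # normative compute node type
      capabilities:
        host:
          properties:
            num_cpus: 2           # illustrative sizing
            mem_size: 4 GB
            disk_size: 20 GB
        os:
          properties:
            type: linux
            distribution: ubuntu
```

The template states only what resources are needed (CPUs, memory, operating system); mapping that request onto a concrete cloud is left to the orchestrator that processes it.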

Release Notes

In this release, the following templates have been created:

  • KubeFlow: Template to deploy the Kubeflow machine learning (ML) workflows platform on top of Kubernetes.
  • Airflow: Template to deploy the Apache Airflow workflow system on top of Kubernetes.
  • CernVMFS: Template to install CernVM-FS on a VM and mount a user-specified list of CernVM-FS repositories.
  • Kafka: Template to deploy the Apache Kafka distributed event streaming platform on top of a Kubernetes cluster.
  • MLFlow: Template to deploy the MLFlow platform for managing the ML lifecycle on a single VM, with the option to store artefacts in an external S3 (or MinIO) storage system.

WP5 and WP6 members tested the templates before the release, and plans have been made for testing them with the DT use cases.

Future Plans

Some of the templates (Kafka and MLFlow) are at an early stage and need to be properly tested by users experienced with these tools, in order to validate the functionality of the deployed infrastructure. Other templates are more mature but may still need some additions. Finally, further templates need to be created (e.g. for the openEO-compatible back-ends).

Target Audience

Any scientist who requires a Big Data Analytics tool.

License

Apache 2.0

Created by