The Trunk Data Platform (TDP) is a 100% open source big data distribution, based on Apache Hadoop and compatible with HDP 3.1. Initiated in 2021 by EDF, the DGFiP and Adaltas, the project is governed by TOSIT, an association under the French law of 1901 whose objective is to promote open source among major companies and institutions.
Version 1.1, whose release is expected during the 4th quarter of 2023, adds features necessary for managing a production cluster (see #308). Support and training offers are already available from some consulting firms, such as Adaltas with Alliage.
TDP is aimed at anyone wishing to:
- Create their data platform (Data Lake, Data Hub, Data Warehouse, Data Science Platform, etc.).
- Migrate their current solution to a 100% open source (and free) solution.
- Develop on big data services (HDFS, Hive, Spark, etc.).
- Explore Hadoop technologies.
Architecture
TDP can be broken down into two main parts:
- A stack, based on Apache Hadoop and compatible with HDP 3.1.
- A cluster manager, based on Ansible, that allows deploying and managing a TDP cluster via a library, a REST API, or a graphical interface (see tdp-lib, tdp-server and tdp-ui).
