dbt
dbt Labs is redefining how data teams work with SQL. Instead of waiting on complex ETL processes, dbt lets data analysts and data engineers build production-ready transformations directly in the warehouse, using code, version control, and CI/CD. This community-driven approach puts power back in the hands of practitioners while maintaining governance and scalability for enterprise use.
With a rapidly growing open-source community and an enterprise-grade cloud platform, dbt is at the heart of the modern data stack. It’s the go-to solution for teams who want faster analytics, higher quality data, and the confidence that comes from transparent, testable transformations.
Learn more
DataHub
DataHub is a versatile open-source metadata platform crafted to enhance data discovery, observability, and governance within various data environments. It empowers organizations to easily find reliable data, providing customized experiences for users while avoiding disruptions through precise lineage tracking at both the cross-platform and column levels. By offering a holistic view of business, operational, and technical contexts, DataHub instills trust in your data repository. The platform features automated data quality assessments along with AI-driven anomaly detection, alerting teams to emerging issues and consolidating incident management. With comprehensive lineage information, documentation, and ownership details, DataHub streamlines the resolution of problems. Furthermore, it automates governance processes by classifying evolving assets, significantly reducing manual effort with GenAI documentation, AI-based classification, and intelligent propagation mechanisms. Additionally, DataHub's flexible architecture accommodates more than 70 native integrations, making it a robust choice for organizations seeking to optimize their data ecosystems. This makes it an invaluable tool for any organization looking to enhance their data management capabilities.
Learn more
MANTA
Manta is a unified data lineage platform that serves as the central hub of all enterprise data flows.
Manta can construct lineage from report definitions, custom SQL code, and ETL workflows. Lineage is analyzed based on actual code, and both direct and indirect flows can be visualized on the map. Data paths between files, report fields, database tables, and individual columns are displayed to users in an intuitive user interface, enabling teams to understand data flows in context.
Learn more
Apache Atlas
Atlas serves as a versatile and scalable suite of essential governance services, empowering organizations to efficiently comply with regulations within the Hadoop ecosystem while facilitating integration across the enterprise's data landscape. Apache Atlas offers comprehensive metadata management and governance tools that assist businesses in creating a detailed catalog of their data assets, effectively classifying and managing these assets, and fostering collaboration among data scientists, analysts, and governance teams. It comes equipped with pre-defined types for a variety of both Hadoop and non-Hadoop metadata, alongside the capability to establish new metadata types tailored to specific needs. These types can incorporate primitive attributes, complex attributes, and object references, and they can also inherit characteristics from other types. Entities, which are instances of these types, encapsulate the specifics of metadata objects and their interconnections. Additionally, REST APIs enable seamless interaction with types and instances, promoting easier integration and enhancing overall functionality. This robust framework not only streamlines governance processes but also supports a culture of data-driven collaboration across the organization.
Learn more