What Integrates with Apache Airflow?

Find out what Apache Airflow integrations exist in 2024. Learn what software and services currently integrate with Apache Airflow, and sort them by reviews, cost, features, and more. Below is a list of products that Apache Airflow currently integrates with:

  • 1
    Stonebranch Reviews
    See Software
    Learn More
    Stonebranch’s Universal Automation Center (UAC) is a Hybrid IT automation platform, offering real-time management of tasks and processes within hybrid IT settings, encompassing both on-premises and cloud environments. As a versatile software platform, UAC streamlines and coordinates your IT and business operations, while ensuring the secure administration of file transfers and centralizing IT job scheduling and automation solutions. Powered by event-driven automation technology, UAC empowers you to achieve instantaneous automation throughout your entire hybrid IT landscape. Enjoy real-time hybrid IT automation for diverse environments, including cloud, mainframe, distributed, and hybrid setups. Experience the convenience of Managed File Transfers (MFT) automation, effortlessly managing and orchestrating file transfers between mainframes and systems, seamlessly connecting with AWS or Azure cloud services.
  • 2
    Netdata Reviews
    Top Pick

    Netdata, Inc.

    Free
    18 Ratings
    Monitor your servers, containers, and applications in high resolution and in real time. Netdata collects metrics per second and presents them in beautiful low-latency dashboards. It is designed to run on all of your physical and virtual servers, cloud deployments, Kubernetes clusters, and edge/IoT devices, to monitor your systems, containers, and applications. It scales nicely from a single server to thousands of servers, even in complex multi/mixed/hybrid cloud environments, and given enough disk space it can keep your metrics for years.
    KEY FEATURES:
    * Collects metrics from 800+ integrations
    * Real-time, low-latency, high-resolution
    * Unsupervised anomaly detection
    * Powerful visualization
    * Out-of-the-box alerts
    * systemd journal logs explorer
    * Low maintenance
    * Open and extensible
    Troubleshoot slowdowns and anomalies in your infrastructure with thousands of per-second metrics, meaningful visualisations, and insightful health alarms, all with zero configuration. Netdata is different: real-time data collection and visualization, infinite scalability baked into its design, flexible and extremely modular, and immediately available for troubleshooting with zero prior knowledge or preparation required.
  • 3
    Microsoft Purview Reviews
    Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. Easily create a comprehensive, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage, so data consumers can find trustworthy, valuable data. Automated data discovery, lineage identification, and data classification across on-premises, multicloud, and SaaS sources. A unified map of all your data assets and their relationships for more effective governance. Semantic search enables data discovery using business or technical terms. Gain insight into the location and movement of sensitive data across your hybrid data landscape. The Purview Data Map helps you establish the foundation for effective data usage and governance. Automate and manage metadata from hybrid sources. Classify data using built-in and custom classifiers, and protect it with Microsoft Information Protection sensitivity labels.
  • 4
    IRI FieldShield Reviews

    IRI FieldShield

    IRI, The CoSort Company

    Varies by component/scope
    IRI FieldShield® is a powerful and affordable data discovery and de-identification package for masking PII, PHI, PAN and other sensitive data in structured and semi-structured sources. Front-ended in a free Eclipse-based design environment, FieldShield jobs classify, profile, scan, and de-identify data at rest (static masking). Use the FieldShield SDK or proxy-based application to secure data in motion (dynamic data masking). The usual method for masking RDB data and other flat files (CSV, Excel, LDIF, COBOL, etc.) is to classify it centrally, search for it globally, and automatically mask it in a consistent way using encryption, pseudonymization, redaction or other functions to preserve realism and referential integrity in production or test environments. Use FieldShield to make test data, nullify breaches, or comply with GDPR, HIPAA, PCI DSS, PDPA, and other laws. Audit through machine- and human-readable search reports, job logs, and re-ID risk scores. Optionally mask data when you map it; FieldShield functions can also run in IRI Voracity ETL and federation, migration, replication, subsetting, and analytic jobs. To mask DB clones, run FieldShield in Windocks, Actifio, or Commvault. Call it from CI/CD pipelines and apps.
  • 5
    Ray Reviews

    Ray

    Anyscale

    Free
    You can develop on your laptop, then scale the same Python code elastically across hundreds of GPUs on any cloud. Ray translates existing Python concepts into the distributed setting, so any serial application can be parallelized with few code changes. With a strong ecosystem of distributed libraries, scale compute-heavy machine learning workloads such as model serving, deep learning, and hyperparameter tuning. Scale existing workloads (e.g., PyTorch) easily using Ray's integrations. Native Ray libraries such as Ray Tune and Ray Serve make it easier to scale the most complex machine learning workloads, like hyperparameter tuning, training deep learning models, and reinforcement learning. You can get started with distributed hyperparameter tuning in just 10 lines of code. Creating distributed apps is hard; Ray specializes in distributed execution.
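The pattern Ray generalizes, turning a serial loop into parallel tasks with minimal call-site changes, can be sketched with nothing but the standard library. (Ray itself would use `@ray.remote` and `ray.get`; the `trial` function below is a hypothetical stand-in for a compute-heavy task.)

```python
from concurrent.futures import ThreadPoolExecutor

def trial(x: int) -> int:
    """Stand-in for one compute-heavy task, e.g. a hyperparameter trial."""
    return x * x

# Serial:   results = [trial(i) for i in range(8)]
# Parallel: the call site barely changes, which is the property Ray
# extends from a single machine to an entire cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(trial, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

With Ray installed, the same shape appears as `futures = [trial.remote(i) for i in range(8)]` followed by `ray.get(futures)`, scheduled across a cluster instead of a local pool.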
  • 6
    Dagster Cloud Reviews

    Dagster Cloud

    Dagster Labs

    $0
    Dagster is the cloud-native open-source orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is the platform of choice for data teams responsible for the development, production, and observation of data assets. With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early.
  • 7
    DQOps Reviews

    DQOps

    DQOps

    $499 per month
    DQOps is a data quality monitoring platform for data teams that helps detect and address quality issues before they impact your business. Track data quality KPIs on data quality dashboards and reach a 100% data quality score. DQOps helps monitor data warehouses and data lakes on the most popular data platforms. DQOps offers a built-in list of predefined data quality checks verifying key data quality dimensions. The extensibility of the platform allows you to modify existing checks or add custom, business-specific checks as needed. The DQOps platform easily integrates with DevOps environments and allows data quality definitions to be stored in a source repository along with the data pipeline code.
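As a rough illustration of what a predefined data quality check computes, here is a minimal completeness check in plain Python. The function name, sample data, and threshold are hypothetical, not DQOps's actual API:

```python
def completeness(rows, column):
    """Percent of rows with a non-null value in `column` (0-100)."""
    values = [r.get(column) for r in rows]
    return 100.0 * sum(v is not None for v in values) / len(values)

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

score = completeness(rows, "email")
passed = score >= 95.0  # the check's pass/fail threshold
print(f"completeness={score:.1f}% passed={passed}")  # completeness=75.0% passed=False
```

A platform like DQOps runs many such checks per table, each mapped to a data quality dimension (completeness, validity, timeliness, and so on), and rolls the pass rates up into KPI dashboards.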
  • 8
    Decube Reviews
    Decube is a comprehensive data management platform designed to help organizations manage their data observability, data catalog, and data governance needs. Our platform is designed to provide accurate, reliable, and timely data, enabling organizations to make better-informed decisions. Our data observability tools provide end-to-end visibility into data, making it easier for organizations to track data origin and flow across different systems and departments. With our real-time monitoring capabilities, organizations can detect data incidents quickly and reduce their impact on business operations. The data catalog component of our platform provides a centralized repository for all data assets, making it easier for organizations to manage and govern data usage and access. With our data classification tools, organizations can identify and manage sensitive data more effectively, ensuring compliance with data privacy regulations and policies. The data governance component of our platform provides robust access controls, enabling organizations to manage data access and usage effectively. Our tools also allow organizations to generate audit reports, track user activity, and demonstrate compliance with regulatory requirements.
  • 9
    intermix.io Reviews

    intermix.io

    Intermix.io

    $295 per month
    Capture metadata from your data warehouse and the tools that connect to it. Track the workloads you care about, and retroactively analyze user engagement, cost, and the performance of data products. You have complete visibility into your data platform, including who is accessing your data and how it's being used. In these interviews, we share the secrets of how data teams create and deliver data products, and discuss tech stacks, best practices, and other lessons learned. Intermix.io provides end-to-end visibility through an easy-to-use SaaS dashboard. Collaborate with your entire team to create custom reports and get all the information you need to understand your data platform, cloud data warehouse, and the tools that connect to it. Intermix.io, a SaaS product, collects metadata from your cloud data warehouse without any coding, and never needs access to the data you have copied into your warehouse.
  • 10
    CrateDB Reviews
    The enterprise database for time series, documents, and vectors. Store any type of data and combine the simplicity of SQL with the scalability of NoSQL. CrateDB is a distributed database that runs queries in milliseconds, whatever their complexity and whatever the volume and velocity of the data.
  • 11
    IRI Voracity Reviews

    IRI Voracity

    IRI, The CoSort Company

    IRI Voracity is an end-to-end software platform for fast, affordable, and ergonomic data lifecycle management. Voracity speeds, consolidates, and often combines the key activities of data discovery, integration, migration, governance, and analytics in a single pane of glass, built on Eclipse™. Through its revolutionary convergence of capability and its wide range of job design and runtime options, Voracity bends the multi-tool cost, difficulty, and risk curves away from megavendor ETL packages, disjointed Apache projects, and specialized software. Voracity uniquely delivers the ability to perform data:
    * profiling and classification
    * searching and risk-scoring
    * integration and federation
    * migration and replication
    * cleansing and enrichment
    * validation and unification
    * masking and encryption
    * reporting and wrangling
    * subsetting and testing
    Voracity runs on-premise or in the cloud, on physical or virtual machines, and its runtimes can also be containerized or called from real-time applications or batch jobs.
  • 12
    Prophecy Reviews

    Prophecy

    Prophecy

    $299 per month
    Prophecy allows you to bring many more users on board, including data analysts and visual ETL developers. To create your pipelines, all you have to do is click and type a few SQL expressions. Using the low-code designer, you produce high-quality, readable code for Spark or Airflow, which is then committed to your Git. Prophecy provides a gem builder that allows you to quickly create and roll out your own frameworks, for example for data quality, encryption, and new sources. Prophecy offers best practices and infrastructure as a managed service, making your life and operations easier. Prophecy makes it easy to create high-performance workflows that scale out in the cloud.
  • 13
    BentoML Reviews

    BentoML

    BentoML

    Free
    Serve your ML model in any cloud in minutes. A unified model packaging format enables both online and offline serving on any platform. Our micro-batching technology delivers up to 100x the throughput of a regular Flask-based model server. Build high-quality prediction services that speak the DevOps language and integrate seamlessly with common infrastructure tools. A unified format for deployment, high-performance model serving, and DevOps best practices baked in. For example, a service can use the TensorFlow framework and a BERT model to predict the sentiment of movie reviews. The DevOps-free BentoML workflow covers deployment automation, a prediction service registry, and endpoint monitoring, all handled automatically for your team. This is a solid foundation for serious ML workloads in production. Keep your team's models, deployments, and changes visible, and control access via SSO, RBAC, client authentication, and audit logs.
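The micro-batching idea mentioned above can be sketched in plain Python. The `MicroBatcher` class and `model_batch` function below are hypothetical illustrations of the technique, not BentoML's actual API:

```python
def model_batch(inputs):
    """Stand-in for a model that is much cheaper to invoke once per batch."""
    return [len(text.split()) for text in inputs]

class MicroBatcher:
    """Queue individual requests and flush them through the model together."""

    def __init__(self, max_batch=4):
        self.max_batch = max_batch
        self.queue = []

    def submit(self, request):
        self.queue.append(request)
        if len(self.queue) >= self.max_batch:
            return self.flush()
        return None  # request is waiting for the batch to fill

    def flush(self):
        batch, self.queue = self.queue, []
        return model_batch(batch)  # one model call serves many requests

b = MicroBatcher(max_batch=3)
assert b.submit("great movie") is None
assert b.submit("terrible plot twist") is None
print(b.submit("loved it"))  # third request triggers the batch: [2, 3, 2]
```

A production serving layer adds a time window as well, flushing a partial batch after a few milliseconds so latency stays bounded under light load.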
  • 14
    Ascend Reviews

    Ascend

    Ascend

    $0.98 per DFC
    Ascend provides data teams with a unified platform that allows them to ingest and transform their data and create and manage their analytics engineering and data engineering workloads. Ascend is supported by DataAware intelligence, which works in the background to ensure data integrity and optimize data workloads, reducing maintenance time by up to 90%. Ascend's multilingual flex-code interface allows you to use SQL, Java, Scala, and Python interchangeably. Quickly view data lineage, data profiles, job logs, system health, and other important workload metrics at a glance. Ascend provides native connections to a growing number of data sources using our flex-code data connectors.
  • 15
    ZenML Reviews

    ZenML

    ZenML

    Free
    Simplify your MLOps pipelines. ZenML allows you to manage, deploy, and scale ML pipelines on any infrastructure. ZenML is open-source and free; two simple commands will show you the magic. ZenML can be set up in minutes, and you can keep using all your existing tools. ZenML interfaces ensure your tools work seamlessly together. Scale up your MLOps stack gradually by swapping components as your training or deployment needs change. Keep up to date with the latest developments in MLOps and integrate them easily. Define simple, clear ML workflows and save time by avoiding boilerplate code and infrastructure tooling. Write portable ML code and switch from experiments to production in seconds. ZenML's plug-and-play integrations let you manage all your favorite MLOps tools in one place. Prevent vendor lock-in by writing extensible, tooling-agnostic, and infrastructure-agnostic code.
  • 16
    Kedro Reviews

    Kedro

    Kedro

    Free
    Kedro provides the foundation for clean, data-driven code. It applies concepts from software engineering to machine-learning projects. Kedro projects provide scaffolding for complex machine-learning and data pipelines, so you spend less time on "plumbing" and instead focus on solving new problems. Kedro standardizes the way data science code is written and ensures that teams can collaborate easily to solve problems. Make a seamless transition from development to production by converting exploratory code into reproducible, maintainable, and modular experiments. A series of lightweight connectors is used to save and load data across a variety of file formats and file systems.
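A minimal sketch of the node-and-catalog idea behind such pipelines, in plain Python rather than Kedro's actual `node`/`Pipeline` API (all names below are illustrative):

```python
def clean(raw):
    """Drop missing values from a raw list of sales figures."""
    return [x for x in raw if x is not None]

def total(cleaned):
    """Aggregate the cleaned figures."""
    return sum(cleaned)

# Each node declares its inputs and output by name; the runner wires
# them together through a catalog, keeping the functions themselves
# pure, modular, and easy to test in isolation.
nodes = [
    (clean, ["raw_sales"], "clean_sales"),
    (total, ["clean_sales"], "sales_total"),
]

catalog = {"raw_sales": [10, None, 25, 5]}
for func, inputs, output in nodes:
    catalog[output] = func(*[catalog[name] for name in inputs])

print(catalog["sales_total"])  # 40
```

In Kedro, the catalog entries would map to datasets on disk or in a warehouse via those lightweight connectors, so the same node code runs unchanged against local CSVs in development and production storage later.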
  • 17
    Secoda Reviews

    Secoda

    Secoda

    $50 per user per month
    Secoda AI generates documentation and queries from your metadata, saving your team hundreds of hours of tedious work and cutting down on redundant data requests. Search across all tables, columns, dashboards, metrics, and queries. AI-powered search lets you ask any question and quickly receive a contextual answer. Our API allows you to integrate data discovery into your workflow without disrupting the flow. Perform bulk updates, tag PII, manage tech debt, and more. Eliminate manual errors and have complete trust in your knowledge base.
  • 18
    Chalk Reviews

    Chalk

    Chalk

    Free
    Powerful data engineering workflows, without the headaches of infrastructure. Define complex streaming, scheduling, and data backfill pipelines in simple, reusable Python. Fetch all your data in real time, no matter how complicated. Combine deep learning and LLMs with structured business data to make decisions. Don't pay vendors for data you won't use; query data right before online predictions instead. Experiment in Jupyter, then deploy to production. Create new data workflows and prevent train-serve skew in milliseconds. Instantly monitor your data workflows and track usage and data quality. See everything you have computed, and replay any data. Integrate with your existing tools and deploy to your own infrastructure. Custom hold times and withdrawal limits can be set.
  • 19
    Coursebox Reviews

    Coursebox

    Coursebox

    $13 per month
    This AI course creator will help you build a course structure within seconds, then quickly generate your course content. All you have to do is edit and add the missing pieces. You can publish your course as public or private, sell it, or export it to your LMS. Research shows that your community prefers mobile apps: people spend more time on them and are more engaged. Create an interactive course with videos, quizzes, and more. Motivate your learners with exciting online learning. Coursebox LMS allows you to quickly create courses within a learning management system with native mobile apps, and you can choose custom hosting and a domain. Coursebox is a great solution for organizations and individuals who want to organize and manage their users, and you can easily segment users into different learning experiences.
  • 20
    Yandex Data Proc Reviews

    Yandex Data Proc

    Yandex

    $0.19 per hour
    Yandex Data Proc creates and configures Spark clusters, Hadoop clusters, and other components based on the size, node capacity, and services you select. Zeppelin notebooks and other web applications can be used to collaborate via a UI proxy. You have full control over your cluster, with root permissions on each VM. Install your own libraries and applications on running clusters without having to restart them. Yandex Data Proc automatically increases or decreases computing resources for compute subclusters according to CPU usage indicators. Data Proc enables you to create managed Hive clusters, which can reduce failures and losses caused by metadata being unavailable. Save time when building ETL pipelines, pipelines for developing and training models, and other iterative processes. Apache Airflow already includes the Data Proc operator.
  • 21
    Mode Reviews

    Mode

    Mode Analytics

    Learn how users interact with your product and identify opportunities to inform product decisions. Mode allows a single analyst to perform the work of a full-time data team with speed, flexibility, and collaboration. Create dashboards for annual revenue, then use chart visualizations to quickly identify anomalies. Share analysis with teams to create polished, investor-ready reports. Connect your entire tech stack with Mode to identify upstream issues and improve performance. With webhooks and APIs, you can speed up team workflows. Use marketing and product data to identify weak points in your funnel, improve landing page performance, and prevent churn.
  • 22
    Apache Druid Reviews
    Apache Druid is an open-source distributed data store. Druid's core design blends ideas from data warehouses and time-series databases to create a high-performance real-time analytics database suited to a wide range of uses, combining key characteristics of each of these systems in its ingestion, storage format, querying, and core architecture. Druid compresses and stores each column separately, so it only needs to read the columns a specific query requires; this enables fast scans, rankings, and groupBys. Druid builds inverted indexes for string values to allow fast search and filtering. Out-of-the-box connectors are available for Apache Kafka, HDFS, AWS S3, stream processors, and many more. Druid intelligently partitions data by time, making time-based queries much faster than in traditional databases. Druid automatically rebalances as you add or remove servers, and its fault-tolerant architecture routes around server failures.
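An inverted index of the kind described above can be sketched in a few lines of plain Python (illustrative only; Druid's real indexes are compressed bitmap structures inside a columnar format):

```python
from collections import defaultdict

rows = [
    {"country": "US", "browser": "firefox"},
    {"country": "DE", "browser": "chrome"},
    {"country": "US", "browser": "chrome"},
]

# One inverted index per string column: value -> set of matching row ids.
index = defaultdict(lambda: defaultdict(set))
for row_id, row in enumerate(rows):
    for column, value in row.items():
        index[column][value].add(row_id)

# A filter like country = 'US' AND browser = 'chrome' becomes a cheap
# set intersection instead of a scan over every row.
matches = index["country"]["US"] & index["browser"]["chrome"]
print(sorted(matches))  # [2]
```

This is why string filters in Druid stay fast as row counts grow: the work is proportional to the index entries touched, not to the size of the table.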
  • 23
    AT&T Alien Labs Open Threat Exchange Reviews
    The largest open threat intelligence network in the world, facilitating collaborative defense with actionable, community-powered threat data. Threat sharing in the security industry remains largely ad hoc and informal, fraught with frustrations, blind spots, and pitfalls. Our vision is for companies and government agencies to gather and share information about cyberattacks, threats, and current breaches as accurately, completely, and quickly as possible, so that major breaches can be avoided and the damage from attacks minimized. The Alien Labs Open Threat Exchange (OTX) realizes this vision as an open, transparent threat intelligence community. OTX provides open access to a global network of security professionals and threat researchers: more than 100,000 participants in 140 countries now contribute over 19,000,000 threat indicators each day. It provides community-generated threat information, facilitates collaborative research, and automates the updating of your security infrastructure.
  • 24
    Beats Reviews

    Beats

    Elastic

    $16 per month
    Beats is a free and open platform for single-purpose data shippers that send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch. Beats are open-source data shippers that you install on your servers to send operational data to Elasticsearch. Elastic offers Beats to capture metrics and event logs. Beats can send data directly to Elasticsearch, or via Logstash, where you can further process and enrich the data before visualizing it in Kibana. Get up and running quickly with infrastructure metrics monitoring or centralized log analytics: try the Metrics and Logs apps in Kibana, and see Analyze metrics or Monitor logs for more information. Filebeat lets you easily forward and centralize logs from any source, including security devices, cloud containers, hosts, and OT.
  • 25
    Datakin Reviews

    Datakin

    Datakin

    $2 per month
    You can instantly see the order in your complex data world and know exactly where to find answers. Datakin automatically tracks data lineage and displays your entire data ecosystem as a rich visual graph. It clearly shows the upstream and downstream relationships of each dataset. The Duration tab summarizes the job's performance and its upstream dependencies in a Gantt-style graph. This makes it easy to identify bottlenecks. The Compare tab allows you to see how your jobs and data have changed over time. Sometimes jobs that run well can produce poor output. The Quality tab shows you the most important data quality metrics and how they change over time. This makes anomalies easily visible. Datakin allows you to quickly identify the root cause of problems and prevent them from happening again.
  • 26
    Meltano Reviews
    Meltano offers the most flexibility in deployment options, and you control your data stack from end to end. A growing number of connectors has been running in production for years. Run workflows in isolated environments, execute end-to-end tests, and version control everything. Open source gives you the power and flexibility to create your ideal data stack. Define your entire project as code and work confidently with your team. The Meltano CLI lets you quickly create your project and makes it easy to replicate data. Meltano was designed to be the most efficient way to run dbt and manage your transformations. Your entire data stack is defined in your project, making it easy to deploy to production.
  • 27
    Google Cloud Composer Reviews

    Google Cloud Composer

    Google

    $0.074 per vCPU hour
    Cloud Composer's managed nature and Apache Airflow compatibility allow you to focus on authoring and scheduling your workflows rather than provisioning resources. End-to-end integration with Google Cloud products including BigQuery, Dataflow, Dataproc, Cloud Storage, Pub/Sub, and AI Platform gives users the ability to fully orchestrate their pipeline. You can author, schedule, and monitor all aspects of your workflows using one orchestration tool, whether your pipeline lives on-premises or across multiple clouds. Ease your transition to the cloud, or maintain a hybrid environment, with workflows that cross between the public cloud and on-premises. Create workflows that connect data, processing, and services across clouds for a unified environment.
  • 28
    Amazon MWAA Reviews

    Amazon MWAA

    Amazon

    $0.49 per hour
    Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to create and manage data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, known as "workflows". With Managed Workflows, you can use Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Managed Workflows automatically scales its workflow execution capacity to meet your requirements, and it is integrated with AWS security services to give you fast and secure access.
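The core idea of an Airflow workflow, tasks arranged as a directed acyclic graph and executed in dependency order, can be sketched in plain Python. Airflow's actual API declares this with `DAG` objects and operators (e.g. `extract >> transform >> load`); the task names below are hypothetical:

```python
# Each task maps to the names of its upstream dependencies.
deps = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["load"],
}

def topo_order(deps):
    """Return a run order that respects every upstream dependency."""
    order, done = [], set()

    def visit(task):
        for upstream in deps[task]:
            if upstream not in done:
                visit(upstream)
        if task not in done:
            done.add(task)
            order.append(task)

    for task in deps:
        visit(task)
    return order

print(topo_order(deps))  # ['extract', 'transform', 'load', 'notify']
```

A scheduler like Airflow layers retries, backfills, per-task state, and parallel execution of independent branches on top of exactly this dependency-ordering logic; MWAA then manages the scheduler and workers for you.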
  • 29
    rudol Reviews
    Unify your data catalog, reduce communication overhead, and enable quality control for any employee of your company, without having to deploy or install anything. Rudol is a data platform that helps companies understand all their data sources, regardless of where they come from; it reduces communication overhead in reporting processes and urgent requests, and enables data quality diagnosis and issue prevention for all company members. Each organization can add data sources from rudol's growing list of providers and BI tools that have a standardized structure, including MySQL, PostgreSQL, Redshift, Snowflake, Kafka, S3*, BigQuery*, MongoDB*, Tableau*, PowerBI*, and Looker* (*in development). No matter where the data comes from, anyone can easily understand where it is stored, read its documentation, and contact data owners via our integrations.
  • 30
    Telmai Reviews
    A low-code, no-code approach to data quality. SaaS flexibility, affordability, ease of integration, and efficient support. High standards for encryption, identity management, role-based access control, data governance, and compliance. Advanced ML models detect row-value data anomalies, and the models adapt to the business and data requirements of users. Add any number of data sources, records, or attributes; the platform is well-equipped for unpredictable volume spikes. Streaming and batch processing are both supported. Data is continuously monitored to provide real-time notifications, with no impact on pipeline performance. Easy onboarding, integration, and investigation. Telmai is a platform that allows data teams to detect and investigate anomalies in real time. No-code onboarding: connect to your data source and select alerting channels. Telmai will automatically learn your data and alert you when there are unexpected drifts.
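A toy version of drift detection against a learned baseline (illustrative only, not Telmai's actual models, which profile many attributes at once) might flag a metric that falls far outside its recent history:

```python
from statistics import mean, stdev

# Daily row counts observed for a table -- the "learned" baseline.
history = [1000, 1040, 980, 1010, 995, 1020, 1005]
today = 1450

mu, sigma = mean(history), stdev(history)
z = (today - mu) / sigma
alert = abs(z) > 3  # flag values well outside the baseline's spread

print(f"z-score={z:.1f} alert={alert}")
```

Real anomaly detection also accounts for seasonality and trend, but the shape is the same: model the expected range per attribute, then notify when new data lands outside it.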
  • 31
    Determined AI Reviews
    Distributed training is possible without changing your model code. Determined takes care of provisioning, networking, data loading, and fault tolerance. Our open-source deep learning platform allows you to train your models in minutes and hours, not days or weeks. You can avoid tedious tasks such as manual hyperparameter tweaking, re-running failed jobs, or worrying about hardware resources. Our distributed training implementation is more efficient than the industry standard, requires no code changes, and is fully integrated into our state-of-the-art platform. With its built-in experiment tracking and visualization, Determined records metrics, makes your ML projects reproducible, and allows your team to work together more easily. Instead of worrying about infrastructure and errors, your researchers can focus on their domain and build upon the progress made by their team.
  • 32
    Foundational Reviews
    Identify code issues and optimize code in real-time. Prevent data incidents before deployment. Manage code changes that impact data from the operational database all the way to the dashboard. Data lineage is automated, allowing for analysis of every dependency, from the operational database to the reporting layer. Foundational automates the enforcement of data contracts by analyzing each repository, from upstream to downstream, directly from the source code. Use Foundational to identify and prevent code and data issues. Create controls and guardrails. Foundational can be configured in minutes without requiring any code changes.
  • 33
    Databand Reviews
    Monitor your data health and your pipeline performance. Get unified visibility into all pipelines that use cloud-native tools such as Apache Spark, Snowflake, and BigQuery. An observability platform built for data engineers. Data engineering is becoming more complex as demands from business stakeholders grow; Databand helps you catch up. More pipelines, more complexity. Data engineers are working with more complex infrastructure and pushing for faster release speeds, which makes it harder to understand why a process failed, why it is running late, and how changes affect the quality of data outputs. Data consumers are frustrated by inconsistent results, poor model performance, delays in data delivery, and other issues. A lack of transparency and trust in data delivery leads to confusion about the exact source of the data. Pipeline logs, errors, and data quality metrics are captured and stored in separate, isolated systems.
  • 34
    Soda Reviews
    Soda helps you manage your data operations by identifying issues and alerting the right people. No data, or people, are ever left behind with automated and self-serve monitoring capabilities. You can quickly get ahead of data issues by providing full observability across all your data workloads. Data teams can discover data issues that automation won't. Self-service capabilities provide the wide coverage data monitoring requires. Alert the right people at just the right time to help business teams diagnose, prioritize, fix, and resolve data problems. Your data will never leave your private cloud with Soda. Soda monitors your data at source and stores only metadata in your cloud.
  • 35
    MaxPatrol Reviews

    MaxPatrol

    Positive Technologies

    MaxPatrol is designed to manage vulnerabilities and compliance in corporate information systems. Its core features are penetration testing, system checks, and compliance monitoring. These mechanisms provide an objective view of the IT security infrastructure and granular insight at the department, host, and application levels, information that is essential to quickly identify vulnerabilities and prevent attacks. MaxPatrol makes it easy to keep a current inventory of IT assets: view information about your network resources (network addresses and OS), identify the hardware and software in use, and track the status of updates. It also monitors changes to your IT infrastructure. MaxPatrol does not blink when new hosts and accounts are created or when hardware and software are upgraded; information about the security of the infrastructure is quietly collected and processed.
  • 36
    lakeFS Reviews
    lakeFS lets you manage your data lake the way you manage your code. Run parallel pipelines for experimentation as well as CI/CD for your data, simplifying the lives of the data scientists, engineers, and analysts who work on data transformation. lakeFS is an open-source platform that brings resilience and manageability to object-storage-based data lakes, enabling repeatable, atomic, and versioned operations on the lake, from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage (GCS), is API-compatible with S3, and integrates seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto. Its Git-like branching and committing model scales to exabytes of data on top of S3, GCS, or Azure Blob storage.
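    The key idea behind Git-like branching over an object store is that creating a branch copies pointers, not objects, so an experiment is isolated from production until it is merged. A minimal sketch of that model, assuming an invented `ToyLake` class (this is not the lakeFS API, which exposes branches via S3-compatible paths and a CLI):

    ```python
    class ToyLake:
        """Git-style branching over a key/value object store: branches are
        shallow copies of pointers, so no data objects are duplicated."""
        def __init__(self):
            self.branches = {"main": {}}

        def put(self, branch, key, value):
            self.branches[branch][key] = value

        def branch(self, name, source):
            # New branch = copy of the pointer table, zero data copied.
            self.branches[name] = dict(self.branches[source])

        def merge(self, source, dest):
            self.branches[dest].update(self.branches[source])

    lake = ToyLake()
    lake.put("main", "events/2024.parquet", "v1")
    lake.branch("experiment", "main")
    lake.put("experiment", "events/2024.parquet", "v2")  # isolated change
    assert lake.branches["main"]["events/2024.parquet"] == "v1"
    lake.merge("experiment", "main")  # promote once validated
    ```

    In the real system a merge is atomic, so downstream consumers never see a half-updated lake.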
  • 37
    Datafold Reviews
    Prevent data outages by identifying and fixing data quality issues before they reach production. Increase test coverage of your data pipelines from 0 to 100% in less than a day. Automatic regression testing across billions of rows lets you determine the impact of every code change. Automate change management, improve data literacy and compliance, and reduce incident response times. Don't be caught off guard by data incidents: automated anomaly detection means you are the first to know about them. Datafold's easily adjustable ML model adapts to seasonality and trend patterns in your data to set dynamic thresholds. Save hours otherwise spent trying to make sense of data. The Data Catalog makes it easy to find relevant datasets and fields and to explore distributions through an intuitive UI, with interactive full-text search, data profiling, and consolidated metadata in one place.
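    Regression testing for data works by diffing the output of the current pipeline code against the output of the proposed change, keyed by primary key. A toy version of that comparison (the `data_diff` helper is invented for illustration; production tools do this in-warehouse over billions of rows):

    ```python
    def data_diff(before, after, key="id"):
        """Compare two row sets by primary key: which keys were added,
        removed, or changed between pipeline versions?"""
        b = {r[key]: r for r in before}
        a = {r[key]: r for r in after}
        return {
            "added": sorted(a.keys() - b.keys()),
            "removed": sorted(b.keys() - a.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }

    before = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
    after = [{"id": 1, "v": 10}, {"id": 2, "v": 25}, {"id": 3, "v": 30}]
    diff = data_diff(before, after)
    print(diff)  # {'added': [3], 'removed': [], 'changed': [2]}
    ```

    Surfacing this diff in code review is what lets a team judge the impact of a change before it ships.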
  • 38
    Great Expectations Reviews

    Great Expectations

    Great Expectations

    Great Expectations is a shared, open standard for data quality. It helps data teams eliminate pipeline debt through data testing, documentation, and profiling. We recommend deploying within a virtual environment; if you are not familiar with pip, virtual environments, notebooks, or git, you may want to read the Supporting section first. Many companies with high expectations are doing amazing things these days; take a look at case studies of companies we have worked with to see how they use Great Expectations in their data stacks. Great Expectations Cloud is a fully managed SaaS offering, and we are looking for private alpha members to join it. Alpha members get first access to new features and can contribute to the roadmap.
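    The core concept is an "expectation": a declarative, named test against a column or table. The real library ships a built-in with this name; the sketch below is a self-contained toy reimplementation of the idea, not the Great Expectations API:

    ```python
    def expect_column_values_to_be_between(rows, column, min_value, max_value):
        """Toy version of a data test in the spirit of Great Expectations:
        return a success flag plus the values that violated the rule."""
        bad = [r[column] for r in rows
               if not (min_value <= r[column] <= max_value)]
        return {"success": not bad, "unexpected_values": bad}

    rows = [{"passenger_count": 1},
            {"passenger_count": 4},
            {"passenger_count": 9}]
    result = expect_column_values_to_be_between(rows, "passenger_count", 1, 6)
    print(result)  # {'success': False, 'unexpected_values': [9]}
    ```

    In the real library, suites of such expectations double as living documentation of what the data is supposed to look like.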
  • 39
    Metaphor Reviews

    Metaphor

    Metaphor Data

    Automatically index warehouses, lakes, and dashboards. Combined with lineage, utilization, and other social popularity signals, Metaphor surfaces your most trusted data to your users. Open, 360-degree views of your data are available to all employees, enabling data conversations and data sharing. Meet your users where they are: share artifacts and documentation natively in Slack, tag conversations there, and associate them with data. Collaboration across silos happens through the organic discovery and use of key terms and patterns. Easily discover data across the entire stack and document technical details in a wiki that non-technical users can navigate. Support your users through Slack, and use the catalog as a data enablement tool to quickly onboard them.
  • 40
    Sifflet Reviews
    Automatically cover thousands of tables with ML-based anomaly detection and 50+ custom metrics, monitoring both metadata and the data itself. Get comprehensive mapping of all dependencies between assets, from ingestion to reporting, improving collaboration and productivity for data consumers and data engineers alike. Sifflet integrates seamlessly with your data sources and preferred tools, and can run on AWS, Google Cloud Platform, and Microsoft Azure. Keep an eye on your data's health and notify the team when quality criteria are not met. Set up basic coverage of all your tables in seconds, then configure frequency, criticality, and custom notifications. Use ML-based rules to catch any anomaly in your data, with no new configuration required: each rule is unique because it learns from historical data as well as user feedback. A library of 50+ templates complements the automated rules.
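    A rule that "learns from historical data" typically derives its alert band from the metric's own history rather than a fixed threshold. A deliberately simplified stand-in for such ML-based rules (mean ± k standard deviations; real systems also model seasonality and trends, and these function names are invented):

    ```python
    import statistics

    def dynamic_threshold(history, k=3.0):
        """Learn an alert band from a metric's history:
        mean +/- k population standard deviations."""
        mu = statistics.fmean(history)
        sigma = statistics.pstdev(history)
        return mu - k * sigma, mu + k * sigma

    def is_anomaly(value, history, k=3.0):
        lo, hi = dynamic_threshold(history, k)
        return not (lo <= value <= hi)

    # Daily row counts for a table; the band adapts to this series.
    history = [100, 98, 103, 101, 99, 102, 100, 97]
    print(is_anomaly(101, history))  # False
    print(is_anomaly(160, history))  # True
    ```

    Because the band is recomputed from history, a metric that naturally drifts upward raises its own thresholds instead of generating false alarms.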
  • 41
    Acryl Data Reviews
    No more data catalog ghost towns. Acryl Cloud accelerates time-to-value through Shift Left practices for data producers and an intuitive user interface for data consumers. Continuously detect data quality incidents in real time, automate anomaly detection to prevent breakdowns, and drive quick resolution when incidents occur. Acryl Cloud supports both pull-based and push-based metadata ingestion to keep information reliable, current, and definitive. Data should be operational: automated metadata tests go beyond simple visibility to uncover new insights and areas for improvement. Reduce confusion and speed up resolution with clear asset ownership, automatic detection, streamlined alerts, and time-based traceability.
  • 42
    Pantomath Reviews
    Organizations constantly strive to become more data-driven, building dashboards, analytics, and data pipelines across the modern data stack. Unfortunately, data reliability issues plague most of them, leading to poor decisions and a lack of trust in data across the organization, which directly impacts the bottom line. Resolving complex issues is a time-consuming, manual process involving multiple teams, all relying on tribal knowledge as they manually reverse-engineer complex data pipelines across various platforms to identify the root cause and understand the impact. Pantomath, a data pipeline traceability and observability platform, automates data operations: it continuously monitors datasets across the enterprise data ecosystem and provides context for complex data pipelines by creating automated, cross-platform technical pipeline lineage.
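    Cross-platform lineage is what turns "this table failed" into "these downstream assets are affected." As a toy sketch of that impact analysis (a graph walk over lineage edges; the `downstream_impact` helper and asset names are invented, not Pantomath's implementation):

    ```python
    from collections import defaultdict

    def downstream_impact(edges, failed):
        """Given lineage edges (upstream -> downstream), return every
        asset reachable from a failed node."""
        graph = defaultdict(list)
        for up, down in edges:
            graph[up].append(down)
        impacted, stack = set(), [failed]
        while stack:
            for child in graph[stack.pop()]:
                if child not in impacted:
                    impacted.add(child)
                    stack.append(child)
        return impacted

    edges = [
        ("raw.orders", "staging.orders"),
        ("staging.orders", "mart.revenue"),
        ("mart.revenue", "dashboard.kpis"),
        ("raw.users", "staging.users"),
    ]
    print(sorted(downstream_impact(edges, "raw.orders")))
    # ['dashboard.kpis', 'mart.revenue', 'staging.orders']
    ```

    The same graph, walked in the opposite direction, answers the root-cause question: which upstream asset broke my dashboard?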