Best Data Governance Software for Apache Spark

Find and compare the best Data Governance software for Apache Spark in 2025

Use the comparison tool below to compare the top Data Governance software for Apache Spark on the market. You can filter results by user reviews, pricing, features, platform, region, support options, integrations, and more.

  • 1
    Protegrity Reviews
    Our platform lets businesses use data, including in advanced analytics, machine learning, and AI, without worrying that customers, employees, or intellectual property are put at risk. The Protegrity Data Protection Platform does more than protect data: it also discovers and classifies it, because it is impossible to protect data you don't know about. The platform first categorizes data, allowing users to classify the type of data that is most commonly in the public domain. Once those classifications are established, the platform uses machine learning algorithms to find data of that type, so that classification and discovery together identify the data that must be protected. The platform protects data behind many of the operational systems that are essential to business operations, and it offers privacy options such as tokenization, encryption, and other privacy methods.
  • 2
    PHEMI Health DataLab Reviews
    Unlike most data management systems, PHEMI Health DataLab is built on Privacy-by-Design principles rather than adding privacy on afterward. Privacy and data governance are built in from the ground up, which provides distinct advantages. It lets analysts work with data without breaching privacy guidelines. It includes a comprehensive, extensible library of de-identification algorithms to hide, mask, truncate, group, and anonymize data. It creates dataset-specific or system-wide pseudonyms, enabling data to be linked and shared without risking data leakage. It collects audit logs covering not only changes made to the PHEMI system but also data access patterns. It automatically generates human-readable and machine-readable de-identification reports to meet your enterprise governance, risk, and compliance guidelines. Rather than requiring a policy per data access point, PHEMI provides one central policy for all access patterns, whether Spark, ODBC, REST, export, or others.
  • 3
    DataHub Reviews
    DataHub is a free and open-source metadata platform that streamlines data discovery, observability, and governance across diverse data ecosystems. It allows organizations to discover trustworthy data with experiences tailored to each user, and it helps prevent breaking changes through detailed cross-platform, column-level lineage. DataHub gives you a complete view of your data, including its business, operational, and technical context. The platform provides automated data quality checks and AI-driven anomaly detection, alerts teams when problems arise, and centralizes incident tracking. DataHub's detailed ownership, documentation, and lineage information allow for quick issue resolution. It automates governance by classifying assets in real time, reducing manual work with GenAI documentation, AI classification, and smart propagation. DataHub's extensible architecture supports more than 70 native integrations.
  • 4
    Astro Reviews
    Astronomer is the driving force behind Apache Airflow, the de facto standard for expressing data flows as code. Airflow is downloaded more than 4 million times each month and is used by hundreds of thousands of teams around the world. For data teams looking to increase the availability of trusted data, Astronomer provides Astro, the modern data orchestration platform, powered by Airflow. Astro enables data engineers, data scientists, and data analysts to build, run, and observe pipelines-as-code. Founded in 2018, Astronomer is a global remote-first company with hubs in Cincinnati, New York, San Francisco, and San Jose. Customers in more than 35 countries trust Astronomer as their partner for data orchestration.
  • 5
    Databricks Data Intelligence Platform Reviews
    The Databricks Data Intelligence Platform enables your entire organization to use data and AI. It is built on a lakehouse that provides an open, unified foundation for all data and governance. Data and AI companies will win in every industry, and Databricks can help you achieve your data and AI goals faster and more easily. Databricks combines the benefits of a lakehouse with generative AI to power a Data Intelligence Engine that understands the unique semantics of your data. The Databricks Platform can then optimize performance and manage infrastructure according to the unique needs of your business. The Data Intelligence Engine speaks your organization's native language, so searching for and discovering new data is like asking a colleague a question.
  • 6
    Alteryx Reviews
    The Alteryx AI Platform will help you enter a new age of analytics. Empower your organization through automated data preparation, AI-powered analytics, and accessible machine learning, all with embedded governance. Welcome to a future of data-driven decision making for every user, team, and step. Empower your team with an intuitive, easy-to-use experience that allows everyone to create analytical solutions that improve productivity and efficiency. Create an analytics culture using an end-to-end cloud analytics platform. Data can be transformed into insights through self-service data preparation, machine learning, and AI-generated insights. Security standards and certifications help reduce risk and ensure that your data is protected. Open API standards allow you to connect with your data and applications.
  • 7
    Privacera Reviews
    Multi-cloud data security with a single pane of glass. The industry's first SaaS access governance solution. The cloud is fragmented and data is scattered across different systems. Sensitive data is difficult to access and control due to limited visibility. Complex data onboarding hinders data scientist productivity. Data governance across services can be manual and fragmented. It can be time-consuming to securely move data to the cloud. Maximize visibility and assess the risk of sensitive data distributed across multiple cloud service providers. Manage data policies for multiple cloud services in a single place. Support RTBF, GDPR, and other compliance requests across multiple cloud service providers. Securely move data to the cloud and enable Apache Ranger compliance policies. Transforming sensitive data across multiple cloud databases and analytical platforms is easier and quicker with one integrated system.
  • 8
    Okera Reviews
    Complexity is the enemy of security. Simplify and scale fine-grained data access control. Dynamically authorize and audit every query to comply with data security and privacy regulations. Okera integrates seamlessly into your infrastructure – in the cloud, on premise, and with cloud-native and legacy tools. With Okera, data users can use data responsibly, while protecting them from inappropriately accessing data that is confidential, personally identifiable, or regulated. Okera’s robust audit capabilities and data usage intelligence deliver the real-time and historical information that data security, compliance, and data delivery teams need to respond quickly to incidents, optimize processes, and analyze the performance of enterprise data initiatives.
  • 9
    Foundational Reviews
    Identify code issues and optimize code in real-time. Prevent data incidents before deployment. Manage code changes that impact data from the operational database all the way to the dashboard. Data lineage is automated, allowing for analysis of every dependency, from the operational database to the reporting layer. Foundational automates the enforcement of data contracts by analyzing each repository, from upstream to downstream, directly from the source code. Use Foundational to identify and prevent code and data issues. Create controls and guardrails. Foundational can be configured in minutes without requiring any code changes.
  • 10
    Kylo Reviews
    Kylo is an enterprise-ready, open-source data lake management platform for self-service data ingestion and data preparation. It integrates metadata management, governance, security, and best practices based on Think Big's 150+ big-data implementation projects. It offers self-service data ingest that includes data validation, data cleansing, and automatic profiling. Visual SQL and interactive transformations through a simple user interface allow you to manage data. Search and explore data and metadata. View lineage and profile statistics. Monitor the health of feeds, services, and data lakes. Track SLAs and troubleshoot performance. To enable user self-service, create batch or streaming pipeline templates in Apache NiFi. While organizations can spend a lot of engineering effort moving data into Hadoop, they often struggle with data governance and data quality. Kylo simplifies data ingest and shifts it to data owners via a simple, guided UI.
  • 11
    Unity Catalog Reviews
    Databricks Unity Catalog is the industry's only open and unified governance solution for data and AI, and it is built into the Databricks Data Intelligence Platform. Unity Catalog allows organizations to seamlessly govern structured and unstructured data in any format, including machine learning models, notebooks, dashboards, and files stored on any cloud or platform. Data scientists, analysts, engineers, and other professionals can securely discover, access, and collaborate on trusted data and AI assets across platforms, which boosts productivity and unlocks the full potential of the lakehouse environment. This open and unified approach to governance accelerates data and AI projects and simplifies regulatory compliance. Discover and classify structured and unstructured data in any format, including machine learning models and notebooks.