Description

Deequ is a library built on top of Apache Spark for defining "unit tests for data" that measure data quality in large datasets. We welcome feedback and contributions from users. The library requires Java 8. Deequ 2.x runs only with Spark 3.1, and Spark 3.1 in turn requires Deequ 2.x. If you depend on an earlier Spark version, use a Deequ 1.x release, which is maintained in the legacy-spark-3.0 branch. Legacy releases are available for Apache Spark 2.2.x through 3.0.x. The Spark 2.2.x and 2.3.x releases are built on Scala 2.11, while the 2.4.x, 3.0.x, and 3.1.x releases require Scala 2.12. The goal of Deequ is to "unit-test" data so that errors are found early, before the data reaches consuming systems or machine learning models. The project's README walks through a simple example of the library's basic functionality.
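To make the idea of "unit tests for data" concrete, here is a minimal sketch in plain Python. It is not Deequ's API (Deequ is a Scala library that runs on Spark); the constraint helpers, the `run_checks` function, and the sample rows are all hypothetical names invented for illustration. The sketch only shows the general pattern: declare constraints on columns, evaluate them against a dataset, and report which ones fail before the data moves downstream.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# A row is a plain dict; None stands in for a missing (null) value.
Row = Dict[str, Optional[object]]


@dataclass
class Constraint:
    """A named predicate evaluated over the whole dataset (hypothetical type)."""
    name: str
    predicate: Callable[[List[Row]], bool]


def is_complete(column: str) -> Constraint:
    # Passes only if no row has a null in this column.
    return Constraint(
        f"is_complete({column})",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Constraint:
    # Passes only if every value in this column occurs exactly once.
    def check(rows: List[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))
    return Constraint(f"is_unique({column})", check)


def is_contained_in(column: str, allowed: Iterable[object]) -> Constraint:
    # Assumption of this sketch: nulls are ignored, only present values are checked.
    allowed_set = set(allowed)
    return Constraint(
        f"is_contained_in({column})",
        lambda rows: all(
            r.get(column) in allowed_set
            for r in rows
            if r.get(column) is not None
        ),
    )


def run_checks(rows: List[Row], constraints: List[Constraint]) -> Dict[str, bool]:
    """Evaluate every constraint and map its name to pass (True) or fail (False)."""
    return {c.name: c.predicate(rows) for c in constraints}


# Made-up sample data: one row is missing its product name.
rows: List[Row] = [
    {"id": 1, "product": "Thingy A", "priority": "high"},
    {"id": 2, "product": "Thingy B", "priority": None},
    {"id": 3, "product": None, "priority": "low"},
    {"id": 4, "product": "Thingy D", "priority": "low"},
]

results = run_checks(
    rows,
    [
        is_complete("id"),
        is_unique("id"),
        is_complete("product"),
        is_contained_in("priority", ["high", "low"]),
    ],
)
failed = [name for name, ok in results.items() if not ok]
print(failed)
```

In a pipeline, a non-empty `failed` list would typically stop the job or raise an alert, so that the incomplete rows never reach a consuming system. Deequ applies the same declare-then-verify pattern, but computes its checks with Spark so they scale to large datasets.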

Description

Data Refinery, available through IBM Watson® Studio and Watson™ Knowledge Catalog, reduces the time spent on data preparation by converting large volumes of raw data into high-quality information ready for analytics. Users can interactively discover, clean, and transform their data using more than 100 pre-built operations, with no coding required. Integrated charts, graphs, and statistical tools show the quality and distribution of the data. The tool automatically detects data types and business classifications. It can access and explore data from diverse sources, whether on-premises or cloud-based. Data governance policies set by data professionals are enforced automatically within the tool. Users can schedule data flow executions for repeatable results, monitor those results, and receive notifications. The solution scales through Apache Spark, so transformation recipes can be applied to complete datasets without the burden of managing Apache Spark clusters.

API Access

Has API

Integrations

Apache Spark
IBM Cloud
IBM Cloud Pak for Watson AIOps
IBM Watson
IBM Watson Discovery
IBM Watson Language Translator
IBM Watson Recruitment
IBM watsonx Assistant

Pricing Details

No price information available.
Free Trial
Free Version

Deployment

Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook

Customer Support

Business Hours
Live Rep (24/7)
Online Support

Types of Training

Training Docs
Webinars
Live Training (Online)
In Person

Vendor Details

Company Name

Deequ

Website

github.com/awslabs/deequ

Vendor Details

Company Name

IBM

Founded

1911

Country

United States

Website

www.ibm.com/products/data-refinery

Product Features

Data Preparation

Collaboration Tools
Data Access
Data Blending
Data Cleansing
Data Governance
Data Mashup
Data Modeling
Data Transformation
Machine Learning
Visual User Interface

Alternatives

Spark Streaming (Apache Software Foundation)
Kylo (Teradata)
MLlib (Apache Software Foundation)
Apache Spark (Apache Software Foundation)
Amazon EMR (Amazon)
Apache Mahout (Apache Software Foundation)