Description

Deequ is a library built on top of Apache Spark for defining "unit tests for data," which measure data quality in large datasets. We welcome feedback and contributions from users. The library requires Java 8.

Deequ 2.x runs only with Spark 3.1, and Spark 3.1 is supported only by Deequ 2.x. If you use an earlier version of Spark, use Deequ 1.x, which is maintained in the legacy-spark-3.0 branch. Legacy releases are also available for Apache Spark 2.2.x through 3.0.x. The Spark 2.2.x and 2.3.x releases are built on Scala 2.11, while the Spark 2.4.x, 3.0.x, and 3.1.x releases require Scala 2.12.

The primary goal of Deequ is to "unit-test" data so that potential issues are found early, before the data reaches consuming systems or machine learning models. The library is designed to make such checks easy to write and effective at maintaining data integrity.
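A minimal sketch of what a Deequ "unit test for data" can look like, assuming Deequ 2.x and Spark 3.1 are on the classpath and Spark runs in local mode. The object name DataUnitTests, the toy dataset, and its columns (id, productName, priority, numViews) are hypothetical. VerificationSuite, Check, CheckLevel, and CheckStatus are part of Deequ's public API, but check the exact signatures against the version you depend on.

```scala
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; not part of Deequ itself.
object DataUnitTests {

  // Declares the "unit tests" for a hypothetical item table and runs them.
  def verify(data: DataFrame): VerificationResult =
    VerificationSuite()
      .onData(data)
      .addCheck(
        Check(CheckLevel.Error, "unit testing my data")
          .hasSize(_ == 3)                                  // exactly three rows
          .isComplete("id")                                 // no nulls in id
          .isUnique("id")                                   // no duplicate ids
          .isComplete("productName")                        // no nulls in productName
          .isContainedIn("priority", Array("high", "low"))  // only allowed values
          .isNonNegative("numViews"))                       // no negative view counts
      .run()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("deequ-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data; every row satisfies the checks above.
    val data = Seq(
      (1L, "Thingy A", "high", 0L),
      (2L, "Thingy B", "low", 12L),
      (3L, "Thingy C", "low", 5L)
    ).toDF("id", "productName", "priority", "numViews")

    val result = verify(data)
    if (result.status == CheckStatus.Success) {
      println("The data passed the test, everything is fine!")
    } else {
      println("We found errors in the data")
    }
    spark.stop()
  }
}
```

Because the check is declared at CheckLevel.Error, any violated constraint (for example a duplicate id or a priority outside "high"/"low") makes the overall status CheckStatus.Error rather than a warning.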

Description

IBM Analytics Engine provides an architecture for Hadoop clusters that separates compute from storage. Instead of a permanent cluster whose nodes serve both purposes, users keep data in an object storage layer, such as IBM Cloud Object Storage, and create compute clusters on demand. This decoupling makes big data analytics platforms more flexible, more scalable, and easier to maintain. The engine is built on an ODPi-compliant stack, includes data science tools, and integrates with the broader Apache Hadoop and Apache Spark ecosystems.

Users define clusters for their specific application needs by selecting the software package, version, and cluster size. A cluster can run for as long as needed and be deleted as soon as its jobs complete. Clusters can also be customized with third-party analytics libraries and packages, and workloads can use IBM Cloud services, such as machine learning.

API Access

Has API

Integrations

Apache Spark
Acquia CDP
Galileo
Hadoop
IBM Cloud Object Storage
MINT
RadiantOne
Switch Automation
ZARUS

Pricing Details

Deequ
No price information available.
Free Trial
Free Version

IBM Analytics Engine
$0.014 per hour
Free Trial
Free Version

Deployment

Web-Based
On-Premises
iPhone App
iPad App
Android App
Windows
Mac
Linux
Chromebook

Customer Support

Business Hours
Live Rep (24/7)
Online Support

Types of Training

Training Docs
Webinars
Live Training (Online)
In Person

Vendor Details

Company Name

Deequ

Website

github.com/awslabs/deequ

Vendor Details

Company Name

IBM

Founded

1911

Country

United States

Website

www.ibm.com/cloud/analytics-engine

Product Features

Data Discovery

Contextual Search
Data Classification
Data Matching
False Positives Reduction
Self Service Data Preparation
Sensitive Data Identification
Visual Analytics

Data Visualization

Analytics
Content Management
Dashboard Creation
Filtered Views
OLAP
Relational Display
Simulation Models
Visual Discovery

Alternatives

Hadoop (Apache Software Foundation)
Spark Streaming (Apache Software Foundation)
E-MapReduce (Alibaba)
MLlib (Apache Software Foundation)
Apache Sentry (Apache Software Foundation)
Apache Spark (Apache Software Foundation)
Apache Mahout (Apache Software Foundation)