What Integrates with Union Cloud?
Find out what Union Cloud integrations exist in 2024. Learn what software and services currently integrate with Union Cloud, and sort them by reviews, cost, features, and more. Below is a list of products that Union Cloud currently integrates with:
1
Google Cloud Platform
Google
Free ($300 in free credits)
55,132 Ratings
Google Cloud is an online service that lets you create everything from simple websites to complex applications for businesses of any size. New customers receive $300 in credits to test, deploy, and run workloads, and more than 25 products can be used free of charge. Use Google's core data analytics and machine learning, secure and fully featured for enterprises of any size. Use big data to build better products and find answers faster, and grow from prototype to production to planet scale without worrying about reliability, capacity, or performance. The platform spans virtual machines with a proven price/performance advantage to a fully managed app development platform, plus high-performance, scalable, resilient object storage and databases. Google's private fiber network offers the latest software-defined networking solutions, along with fully managed data warehousing, data exploration, Hadoop/Spark, and messaging.
2
Google Cloud BigQuery
Google
$0.04 per slot hour
1,686 Ratings
Analyze petabytes of data at lightning speed with ANSI SQL and zero operational overhead. Analytics at scale with 26%-34% lower three-year TCO than cloud data warehouse alternatives. Unleash your insights with a trusted platform that is more secure and scales with you. Multi-cloud analytics solutions let you gain insights from all types of data, and you can query streaming data in real time to get up-to-date information on all your business processes. Built-in machine learning lets you predict business outcomes quickly without moving data, and with just a few clicks you can securely access and share analytical insights across your organization. Create stunning dashboards and reports with popular business intelligence tools out of the box. BigQuery's strong security, governance, and reliability controls ensure high availability and a 99.9% uptime SLA, and your data is encrypted by default or with customer-managed encryption keys.
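As an illustration of querying BigQuery with ANSI SQL from Python, here is a minimal sketch using the google-cloud-bigquery client library against a public dataset; it assumes application-default credentials are already configured.

```python
# Minimal sketch with the google-cloud-bigquery client library.
# Assumes credentials are set up (e.g., GOOGLE_APPLICATION_CREDENTIALS);
# the public dataset below is real, but the query itself is illustrative.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# client.query() submits the job; .result() blocks until it completes.
for row in client.query(query).result():
    print(row.name, row.total)
```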
3
Kubernetes
Kubernetes
Free
1 Rating
Kubernetes (K8s) is an open-source system for automating the deployment, scaling, and management of containerized applications. It groups the containers that make up an application into logical units, which makes them easy to manage and discover. Kubernetes builds on 15 years of Google's experience running production workloads, combined with best-of-breed ideas and practices from the community. Built on the same principles that let Google run billions of containers a week, Kubernetes can scale without growing your operations team. Its flexibility lets you deliver applications consistently and easily, however complex they are, whether you're testing locally or running a global enterprise. Kubernetes is open source, giving you the freedom to use hybrid, on-premises, or public cloud infrastructure and to move workloads to wherever they matter most.
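A minimal sketch of reading cluster state with the official kubernetes Python client; it assumes a local kubeconfig (e.g., written by kubectl or minikube).

```python
# Minimal sketch with the official `kubernetes` Python client,
# assuming a kubeconfig is available at the default location.
from kubernetes import client, config

config.load_kube_config()      # reads ~/.kube/config
v1 = client.CoreV1Api()

# List every pod in the cluster, mirroring `kubectl get pods -A`.
for pod in v1.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)
```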
4
Apache Hive
Apache Software Foundation
1 Rating
Apache Hive™ is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and a JDBC driver are provided to connect users to Hive. Apache Hive is an open-source project of the Apache Software Foundation; previously a subproject of Apache® Hadoop®, it has now graduated to a top-level project of its own. We encourage you to learn about the project and contribute your expertise. Without Hive, executing traditional SQL queries over Hadoop data means programming against the MapReduce Java API; Hive provides the necessary SQL abstraction by integrating SQL-like queries (HiveQL) into the underlying Java, so queries need not be implemented in the low-level Java API.
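A hedged sketch of running HiveQL through HiveServer2 with the third-party PyHive package; the host, table, and column names are assumptions.

```python
# Hedged sketch using the PyHive package to talk to HiveServer2.
# Host, port, table, and columns are placeholders for your cluster.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# HiveQL reads like ordinary SQL; Hive compiles it to cluster jobs.
cursor.execute("SELECT category, COUNT(*) FROM sales GROUP BY category")
for category, n in cursor.fetchall():
    print(category, n)
```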
5
Amazon Web Services (AWS)
Amazon
AWS offers a wide range of services, including database storage, compute power, content delivery, and other functionality, so you can build sophisticated applications with greater flexibility, scalability, and reliability. Amazon Web Services (AWS), the world's largest and most widely used cloud platform, offers over 175 fully featured services from more than 150 data centers worldwide. Millions of customers, including the fastest-growing startups, the largest enterprises, and top government agencies, use AWS to reduce costs, become more agile, and innovate faster. AWS offers more services and features than any other cloud provider, from infrastructure technologies such as storage and databases to emerging technologies such as machine learning, artificial intelligence, data lakes, analytics, and the Internet of Things. This makes it easier, cheaper, and faster to move your existing applications to the cloud.
6
Snowflake
Snowflake
Your cloud data platform: access all the data you need with near-unlimited scalability, and the near-infinite performance and concurrency your organization requires. Seamlessly share and consume shared data across your organization to collaborate and solve your most difficult business problems. Increase productivity and reduce time to value by collaborating with data professionals to quickly deliver integrated data solutions from anywhere in your organization. Whether you are moving data into Snowflake or extracting insight out of it, our technology partners and system integrators can help you deploy Snowflake successfully.
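A minimal sketch of connecting from Python with the snowflake-connector-python package; the account, credentials, warehouse, and database names are placeholders.

```python
# Minimal sketch with snowflake-connector-python; every credential
# and object name below is a placeholder for your own account.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="YOUR_ACCOUNT",
    warehouse="COMPUTE_WH",
    database="DEMO_DB",
)
cur = conn.cursor()
try:
    cur.execute("SELECT CURRENT_VERSION()")
    print(cur.fetchone()[0])
finally:
    cur.close()
    conn.close()
```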
7
Amazon Athena
Amazon
2 Ratings
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run. Athena is easy to use: simply point to your data in Amazon S3, define the schema, and start querying with standard SQL. Most results are delivered within seconds. With Athena, there's no need for complex ETL jobs to prepare your data for analysis, so anyone with SQL skills can quickly analyze large-scale datasets. Athena integrates out of the box with the AWS Glue Data Catalog, letting you create a unified metadata repository across services, crawl data sources to discover schemas, populate your catalog with new and modified table and partition definitions, and maintain schema versioning.
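A hedged sketch of running an Athena query from Python with boto3; the database, table, and S3 output location are placeholders.

```python
# Hedged sketch with boto3's Athena client: submit a query, poll for
# completion, then read the rows. Names and paths are placeholders.
import time
import boto3

athena = boto3.client("athena")

run = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM logs GROUP BY status",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
qid = run["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```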
8
AWS Batch
Amazon
AWS Batch enables developers, scientists, and engineers to run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal type and quantity of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the submitted batch jobs. With AWS Batch, there is no need to install or manage batch computing software, server clusters, or other hardware, so you can focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across AWS compute services and features such as AWS Fargate and Amazon EC2. There is no additional charge for AWS Batch itself; you pay only for the AWS resources (e.g., Fargate jobs or EC2 instances) you create to store and run your batch jobs.
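A hedged sketch of submitting a job from Python with boto3; the job queue and job definition are assumed to have been registered beforehand.

```python
# Hedged sketch with boto3's Batch client; the queue, job definition,
# and command are placeholders created ahead of time in your account.
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="nightly-aggregation",
    jobQueue="my-job-queue",
    jobDefinition="my-job-definition:1",
    containerOverrides={"command": ["python", "aggregate.py"]},
)
print("Submitted job:", response["jobId"])
```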
9
Ray
Anyscale
Free
Develop on your laptop, then scale the same Python code elastically across hundreds of nodes or GPUs on any cloud. Ray translates existing Python concepts into the distributed setting, so any serial application can be parallelized with minimal code changes. With a strong ecosystem of distributed libraries, scale compute-heavy machine learning workloads such as model serving, deep learning, and hyperparameter tuning. Scale existing workloads (e.g., PyTorch) on Ray with straightforward integrations. Native Ray libraries such as Ray Tune and Ray Serve make it easier to scale the most compute-intensive machine learning workloads, including hyperparameter tuning, training deep learning models, and reinforcement learning. Get started with distributed hyperparameter tuning in just 10 lines of code. Creating distributed apps is hard; Ray handles all aspects of distributed execution for you.
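A minimal sketch of Ray's core API, showing how a plain Python function is parallelized with a decorator; the function itself is illustrative.

```python
# Minimal sketch of Ray's core API: the same function runs serially
# or in parallel across however many workers are available.
import ray

ray.init()  # start (or connect to) a Ray runtime

@ray.remote
def square(x):
    return x * x

# .remote() schedules the calls; ray.get() gathers the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```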
10
Amazon SageMaker
Amazon
Amazon SageMaker is a fully managed service that provides data scientists and developers with the ability to quickly build, train, and deploy machine learning (ML) models. SageMaker takes the hard work out of each step of the machine learning process, making it easier to create high-quality models. Traditional ML development is complex, costly, and iterative, made worse by the lack of integrated tools to support the entire machine learning workflow; stitching together tools and workflows is tedious and error-prone. SageMaker solves this by combining all the components needed for machine learning into a single toolset, so models reach production faster and with less effort. Amazon SageMaker Studio is a web-based visual interface where you can perform all ML development tasks, giving you visibility into and complete control over each step.
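A hedged sketch of the train-then-deploy flow with the SageMaker Python SDK's scikit-learn estimator; the entry-point script, role ARN, S3 path, and framework version are all placeholders and may need adjusting to your account and SDK version.

```python
# Hedged sketch with the SageMaker Python SDK. Everything named here
# (script, role, bucket, framework version) is a placeholder.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()

estimator = SKLearn(
    entry_point="train.py",               # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_type="ml.m5.large",
    framework_version="1.2-1",
    sagemaker_session=session,
)

# fit() launches a managed training job; deploy() hosts an endpoint.
estimator.fit({"train": "s3://my-bucket/train/"})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```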
11
Flyte
Union.ai
Free
The workflow automation platform for complex, mission-critical data processing and ML processes at scale. Flyte makes it simple to create machine learning and data processing workflows that are concurrent, scalable, and maintainable. Flyte is used in production at Lyft, Spotify, and Freenome. At Lyft, Flyte powers production model training and data processing, and it has become the de facto platform for the pricing, locations, ETA, mapping, and autonomous teams. Flyte manages more than 10,000 workflows at Lyft, with over 1,000,000 executions per month, 20,000,000 tasks, and 40,000,000 containers. Battle-tested at Lyft, Spotify, and Freenome, Flyte is completely open source under an Apache 2.0 license at the Linux Foundation, with a cross-industry oversight committee. Configuring machine learning and data workflows in YAML can be complicated and error-prone; Flyte lets you define them in code instead.
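A minimal sketch of what a Flyte workflow looks like in flytekit, Flyte's Python SDK; the task and workflow names are illustrative.

```python
# Minimal flytekit sketch: tasks are typed Python functions, and a
# workflow composes them into a DAG that Flyte can schedule at scale.
from typing import List
from flytekit import task, workflow

@task
def clean(raw: List[int]) -> List[int]:
    # Each task runs in its own container with typed inputs/outputs.
    return [x for x in raw if x >= 0]

@task
def mean(values: List[int]) -> float:
    return sum(values) / len(values)

@workflow
def pipeline(raw: List[int]) -> float:
    # Flyte requires keyword arguments when wiring tasks together.
    return mean(values=clean(raw=raw))

if __name__ == "__main__":
    print(pipeline(raw=[3, -1, 4, -1, 5]))
```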
12
Hugging Face
Hugging Face
$9 per month
AutoTrain is a new way to automatically train, evaluate, and deploy state-of-the-art machine learning models. Seamlessly integrated with the Hugging Face ecosystem, it provides an automated path from your data to a deployed model. All data, including your training data, stays private to your account, and all data transfers are encrypted. Available tasks today include text classification, text scoring, and entity recognition; files can be provided in CSV, TSV, or JSON format and hosted anywhere. Once training is complete, we delete all your training data. Hugging Face also offers an AI-generated content detection tool.
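AutoTrain itself is driven from the Hugging Face platform, but the models it produces plug into the same ecosystem as any other; as a taste, here is a minimal sketch with the transformers pipeline API (the input text is illustrative).

```python
# Minimal sketch with Hugging Face's transformers library; the
# pipeline downloads a default pretrained model on first use.
from transformers import pipeline

classifier = pipeline("text-classification")
print(classifier("AutoTrain made shipping this model painless."))
# e.g., [{'label': 'POSITIVE', 'score': 0.99...}]
```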
13
dbt
dbt Labs
$50 per user per month
With dbt, data teams work like software engineering teams: with version control, quality assurance, documentation, and modularity, treating analytics errors as seriously as production product bugs. Analytic workflows are often manual; we believe they should be designed to run with a single command. Data teams use dbt to codify business logic and make it accessible to the entire organization, for use in reporting, ML modeling, and operational workflows. Built-in CI/CD ensures that changes to data models move correctly through development, staging, and production environments. dbt Cloud also offers guaranteed uptime and custom SLAs.
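A hedged sketch of invoking dbt from Python via dbtRunner, the programmatic entry point that dbt-core exposes from version 1.5 onward; the model selector is a placeholder, and the sketch assumes it runs inside a configured dbt project.

```python
# Hedged sketch using dbt-core's programmatic entry point (>= 1.5);
# run from inside a dbt project with a configured profile.
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Equivalent to `dbt run --select my_model` on the command line:
# compiles the SQL model and executes it against the warehouse.
result = runner.invoke(["run", "--select", "my_model"])
print("success:", result.success)
```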
14
SQLAlchemy
SQLAlchemy
SQLAlchemy is the Python SQL toolkit and object-relational mapper that gives application developers the full power and flexibility of SQL. SQL databases behave less like object collections the more size and performance start to matter; object collections behave less like tables and rows the more abstraction starts to matter. SQLAlchemy is designed to accommodate both of these principles. It considers the database to be a relational algebra engine, not just a collection of tables: rows can be selected not only from tables but also from joins and other select statements, and any of these units can be composed into a larger structure. This idea is the basis of SQLAlchemy's expression language. SQLAlchemy's object-relational mapper (ORM) is its most famous component: an optional package that provides the data mapper pattern.
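A minimal sketch of the expression language in 2.0-style usage against an in-memory SQLite database; the table and rows are illustrative.

```python
# Minimal SQLAlchemy sketch: define a table, insert rows, and compose
# a SELECT with the expression language, all against in-memory SQLite.
from sqlalchemy import (
    create_engine, MetaData, Table, Column, Integer, String, select, insert,
)

engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(users), [{"name": "ada"}, {"name": "grace"}])

# The expression language builds SQL from composable Python objects.
with engine.connect() as conn:
    for row in conn.execute(select(users).where(users.c.name == "ada")):
        print(row.id, row.name)
```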
15
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. Apache Spark delivers high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and these libraries can be combined seamlessly in one application. Spark runs on Hadoop, Apache Mesos, and Kubernetes, standalone or in the cloud, and can access a variety of data sources. Run Spark in standalone cluster mode, on EC2, on Hadoop YARN, or on Mesos, and access data in HDFS and Alluxio.
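A minimal PySpark sketch of the DataFrame API; the input path and column names are placeholders.

```python
# Minimal PySpark sketch; the same code runs locally or on a cluster,
# only the master URL changes. Path and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()

df = spark.read.csv("s3a://my-bucket/events.csv", header=True, inferSchema=True)

# DataFrame operators compile down to an optimized physical plan.
(df.groupBy("country")
   .agg(F.count("*").alias("events"))
   .orderBy(F.desc("events"))
   .show(5))

spark.stop()
```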
16
MLflow
MLflow
MLflow is an open-source platform for managing the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components: Tracking, to record and query experiments (data, code, config, and results); Projects, to package data science code in a format reproducible on any platform; Models, to deploy machine learning models in a variety of environments; and a central Model Registry to store, annotate, discover, and manage models. The MLflow Tracking component provides an API and UI for logging parameters, code versions, and metrics, and for visualizing the results later. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs. An MLflow Project is a way to package data science code in a reusable, reproducible manner, based primarily on conventions; the Projects component also includes an API and command-line tools for running projects.
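A minimal sketch of the Tracking API from Python; the parameter and metric values are illustrative, and runs land in a local ./mlruns store by default.

```python
# Minimal sketch of the MLflow Tracking API: log one parameter and a
# metric over several steps within a single run.
import mlflow

with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 0.01)
    for epoch, loss in enumerate([0.9, 0.5, 0.3]):
        mlflow.log_metric("loss", loss, step=epoch)

# Inspect the results with `mlflow ui` in the same directory.
```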
17
Dask
Dask
Dask is free and open source, developed in collaboration with other community projects such as NumPy and pandas. Dask uses existing Python data structures and APIs, making it easy to switch between NumPy, pandas, and scikit-learn and their Dask-powered equivalents. Dask's schedulers scale to thousand-node clusters, and its algorithms have been tested on some of the most powerful supercomputers in the world. But you don't need a large cluster to get started: Dask ships with schedulers designed for personal machines, and many people use Dask to scale computations on their laptop, using multiple cores for computation and the disk for extra storage. Dask also exposes lower-level APIs that let you build custom systems for your own applications, helping open-source leaders parallelize their own packages and business leaders scale custom business logic.
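A minimal sketch of Dask's pandas-like dataframe API; the CSV glob and column names are placeholders.

```python
# Minimal Dask sketch: the pandas-like API, evaluated lazily and in
# parallel across cores. The file glob and columns are placeholders.
import dask.dataframe as dd

df = dd.read_csv("data/2024-*.csv")          # many files, one dataframe

# Nothing computes until .compute(); work is split across partitions.
result = df.groupby("user_id")["amount"].sum().compute()
print(result.head())
```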
18
DuckDB
DuckDB
DuckDB is built for processing and storing tabular datasets, e.g., from CSV or Parquet files, and for transferring large result sets to the client; it is not aimed at large client/server installations for centralized enterprise data warehousing, nor at writing to a single database from multiple concurrent processes. DuckDB is a relational database management system (RDBMS), that is, a system for managing data stored in relations; a relation is essentially the mathematical term for a table. Each table is a named collection of rows, each row of a given table has the same set of named columns, and each column is of a specific data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that can be accessed.
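A minimal sketch of querying a Parquet file in place from Python; the file path and columns are placeholders.

```python
# Minimal DuckDB sketch: query a Parquet file directly, no server or
# load step required. The file path and columns are placeholders.
import duckdb

con = duckdb.connect()   # in-memory database; pass a path to persist

rows = con.execute(
    "SELECT category, AVG(price) FROM 'products.parquet' GROUP BY category"
).fetchall()
print(rows)
```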
19
Azure Databricks
Microsoft
Azure Databricks lets you unlock insights from all your data, build artificial intelligence (AI) solutions, and autoscale Apache Spark™, collaborating on shared projects with others in an interactive workspace. Azure Databricks supports Python, Scala, R, and Java, as well as data science frameworks such as TensorFlow, PyTorch, and scikit-learn. Azure Databricks offers the latest version of Apache Spark and allows seamless integration with open-source libraries. Quickly spin up clusters and build in a fully managed Apache Spark environment available worldwide. Clusters are set up, configured, fine-tuned, and monitored to ensure performance and reliability. Take advantage of autoscaling and auto-termination to reduce total cost of ownership (TCO).
20
Kubeflow
Kubeflow
Kubeflow is a project dedicated to making machine learning (ML) workflows on Kubernetes portable, scalable, and easy to deploy. Our goal is not to create new services, but to make it easy to deploy best-of-breed open-source systems for ML to different infrastructures: anywhere Kubernetes runs, Kubeflow can run. Kubeflow offers a custom TensorFlow training job operator that you can use to train your ML model; in particular, its job operator can handle distributed TensorFlow training jobs. Configure the training controller to use CPUs or GPUs and to adapt to different cluster sizes. Kubeflow also provides services to create and manage interactive Jupyter notebooks, so you can adjust your notebook deployment and compute resources to meet your data science requirements. Experiment with your workflows locally, then move them to the cloud when you are ready.
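A hedged sketch of defining and compiling a pipeline with the Kubeflow Pipelines SDK (kfp v2); the component and pipeline are illustrative, and the compiled YAML would then be uploaded to a Kubeflow deployment.

```python
# Hedged sketch with the Kubeflow Pipelines SDK (kfp v2): a component
# and pipeline compiled to YAML for submission to a KFP deployment.
from kfp import dsl, compiler

@dsl.component
def train(epochs: int) -> str:
    return f"model trained for {epochs} epochs"

@dsl.pipeline(name="demo-pipeline")
def pipeline(epochs: int = 3):
    train(epochs=epochs)

compiler.Compiler().compile(pipeline, "pipeline.yaml")
```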