What Integrates with Yandex Data Proc?
Find out what Yandex Data Proc integrations exist in 2025. Learn what software and services currently integrate with Yandex Data Proc, and sort them by reviews, cost, features, and more. Below is a list of products that Yandex Data Proc currently integrates with:
1
TensorFlow
TensorFlow
Free, 2 Ratings
TensorFlow is an open-source, end-to-end platform for machine learning. It offers a flexible, comprehensive ecosystem of tools, libraries, and community resources that lets researchers push the boundaries of machine learning and lets developers easily build and deploy ML-powered applications. High-level APIs such as Keras make model training and development easy and allow quick model iteration and debugging. Whatever language you use, you can train and deploy models in the cloud, in the browser, on-premises, or on-device. Its simple, flexible architecture lets you take new ideas from concept to code to state-of-the-art models and publication quickly, making it easy to build, deploy, and test.
2
Python
Defining functions is at the heart of extensible programming. Python supports mandatory and optional arguments, keyword arguments, and even arbitrary argument lists. Whether you are new to programming or experienced in other languages, Python is easy to learn, and its beginner guides are a helpful starting point for learning Python programming. The community hosts meetups and conferences to share code and much more. Python's documentation will help you along the way, and the mailing lists will keep you in touch. The Python Package Index (PyPI) hosts thousands of third-party Python modules. Together, Python's standard library and the community-contributed modules allow for endless possibilities.
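The argument kinds listed above can be illustrated with a minimal sketch. The function `make_label` and its parameters are hypothetical, chosen only to show a mandatory argument, an optional argument with a default value, an arbitrary positional argument list, a keyword-only argument, and arbitrary keyword arguments in one signature.

```python
def make_label(name, prefix="Item", *tags, sep=", ", **attrs):
    """Build a text label (hypothetical example function).

    name    -- mandatory positional argument
    prefix  -- optional argument with a default value
    *tags   -- arbitrary number of extra positional arguments
    sep     -- keyword-only argument, because it follows *tags
    **attrs -- arbitrary keyword arguments
    """
    parts = [f"{prefix}: {name}"]
    if tags:
        parts.append("tags=" + "|".join(tags))
    for key in sorted(attrs):  # sort so the output order is deterministic
        parts.append(f"{key}={attrs[key]}")
    return sep.join(parts)


print(make_label("spark"))
print(make_label("hive", "Engine", "sql", "batch", sep="; ", version=4))
print(make_label(name="flume", prefix="Tool"))
```

Because `sep` appears after `*tags`, it can only be passed by keyword; any extra positional values are collected into `tags` instead.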
3
NumPy
NumPy's vectorization and indexing concepts are fast and versatile, and they are the de facto standard of array computing today. NumPy provides comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. It supports a wide range of hardware and computing platforms and works well with distributed, GPU, and sparse array libraries. NumPy's core is well-optimized C code, so you get the flexibility of Python with the speed of compiled code. Its high-level syntax makes it accessible and productive for programmers of any background or experience level. NumPy brings the computational power of languages such as C and Fortran to Python, a language that is much easier to learn and use. With this power comes simplicity: NumPy solutions are often clear and elegant.
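A minimal sketch of the vectorization and indexing ideas described above, assuming NumPy is installed. It uses made-up integer data so every result is exact, and only core array operations.

```python
import numpy as np

# Made-up integer readings, so results are exact.
readings = np.array([3, 7, 1, 9, 4])

# Vectorization: one expression is applied to every element;
# the loop runs in NumPy's compiled core rather than in Python.
scaled = readings * 10 - 1

# Boolean-mask indexing: keep only elements that satisfy a condition.
high = readings[readings > 4]

# Integer ("fancy") indexing: pick elements by position.
picked = readings[[0, 3]]

# For comparison, the same computation as an explicit Python loop.
loop_result = [x * 10 - 1 for x in readings.tolist()]

print(scaled, high, picked)
```

The vectorized `scaled` array holds the same values as `loop_result`, but the per-element work happens inside NumPy instead of in the Python interpreter.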
4
scikit-image
scikit-image
Free, 1 Rating
scikit-image is a collection of algorithms for image processing. It is available free of charge and free of restriction. We pride ourselves on high-quality, peer-reviewed code written by an active community of volunteers. scikit-image is a Python library that provides a variety of image processing routines; it is developed by its community, and contributions are most welcome. scikit-image aims to be the reference library for scientific image analysis in Python, which it achieves by being easy to use and easy to install. We are careful about adding new dependencies, and we sometimes remove existing ones or make them optional. Every function in our API has detailed docstrings that clarify its expected inputs and outputs, and conceptually identical arguments share the same name and position in function signatures. Test coverage is close to 100%, and all code is reviewed by at least two core developers before it is included.
5
Apache Hive
Apache Software Foundation
1 Rating
Apache Hive™ is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage, and a command-line tool and a JDBC driver are provided to connect users to Hive. Apache Hive is an open-source project of the Apache Software Foundation. It was previously a subproject of Apache® Hadoop® but has now become a top-level project of its own. We encourage you to learn about the project and share your knowledge. Without Hive, traditional SQL-style queries over data in Hadoop would have to be implemented in the MapReduce Java API. Hive provides the SQL abstraction needed to integrate SQL-like queries (HiveQL) into the underlying Java, so queries do not have to be implemented in the low-level Java API.
6
pandas
pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. It provides tools for reading and writing data between in-memory data structures and various formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format. It offers intelligent data alignment and integrated handling of missing data, and a powerful group by engine performs split-apply-combine operations on data sets. Time series functionality includes date range generation and frequency conversion, moving window statistics, and date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data.
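A small sketch of three features named above, assuming pandas is installed and using made-up sales figures: split-apply-combine with `groupby`, handling of missing data, and a moving-window statistic on a date-indexed series.

```python
import pandas as pd

# Illustrative daily sales for two stores (made-up values; one is missing).
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=6, freq="D"),
    "store": ["A", "B", "A", "B", "A", "B"],
    "sales": [10, 20, 30, 40, 50, None],
})

# Split-apply-combine: split rows by store, sum each group, and combine
# the results into one Series. sum() skips missing (NaN) values by default.
totals = df.groupby("store")["sales"].sum()

# Missing data: replace the gap explicitly before further calculations.
filled = df["sales"].fillna(0)

# Time series: index by date and compute a 2-day moving-window mean.
# A window containing the missing value has too few observations and yields NaN.
rolling = df.set_index("date")["sales"].rolling(window=2).mean()

print(totals)
print(rolling)
```

Store A totals 90 and store B totals 60, because the missing value is skipped rather than treated as zero; `fillna` is the explicit way to choose a replacement.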
7
Yandex DataSphere
Yandex.Cloud
$0.095437 per GB
Select the configurations and resources required for specific code segments in your project. Saving and applying changes in a training scenario takes only seconds. Choose the right configuration of computing resources and launch model training in a matter of seconds; everything is created automatically, with no infrastructure to manage. Choose a serverless or a dedicated operating mode. Manage project data, save it to datasets, and connect to databases, object storage, or other repositories, all in one interface. Build ML models together with colleagues around the world, share projects, and set budgets across your organization. Launch your ML within minutes without help from developers, and experiment with several models published at the same time.
8
Matplotlib
Matplotlib
Free
Matplotlib is a comprehensive Python library for creating static, animated, and interactive visualizations. Matplotlib makes easy things easy and hard things possible. Many third-party packages extend and build on Matplotlib's functionality, including several higher-level plotting interfaces such as seaborn, HoloViews, and ggplot.
9
Yandex Cloud
Yandex
A fully fledged platform that provides scalable infrastructure, storage, and machine learning tools for building and enhancing digital services and apps. Yandex is one of the world's largest technology companies building intelligent products and services. Yandex's applications are hosted in three geographically dispersed data centers. Yandex data centers are fully in-house, with proprietary software and hardware and an independent power supply.
10
Apache HBase
The Apache Software Foundation
Apache HBase™ is used when you need random, real-time read/write access to your Big Data. The project's goal is hosting very large tables (billions of rows by millions of columns) on top of clusters of commodity hardware.
11
Hadoop
Apache Software Foundation
Apache Hadoop is a software library that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, it is designed to detect and handle failures at the application layer, so it can deliver a highly available service on top of a cluster of computers, each of which may be prone to failure.
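The best-known of Hadoop's simple programming models is MapReduce. The following pure-Python sketch simulates the map, shuffle, and reduce phases of a word count on a single machine. It does not use Hadoop's Java API, and the function names are hypothetical; in a real cluster, each input split would be mapped on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    # Here every line is mapped in one loop; Hadoop would spread this work across nodes.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


result = word_count(["big data big clusters", "data moves to compute"])
print(result)
```

Because mappers only see their own lines and reducers only see one key's values, each phase can be parallelized independently; this is what lets the model scale across many machines.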
12
Apache Spark
Apache Software Foundation
Apache Spark™ is a unified analytics engine for large-scale data processing. It achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R, and SQL shells. Spark powers a stack of libraries, including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming, and you can combine these libraries seamlessly in the same application. Spark runs in its standalone cluster mode, on Hadoop YARN, on Apache Mesos, on Kubernetes, or in the cloud (for example, on EC2), and it can access diverse data sources, including data in HDFS and Alluxio.
13
Apache Zeppelin
Apache
A web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL and Scala. The IPython interpreter provides a user experience comparable to Jupyter Notebook. This release adds note-level dynamic forms, a note revision comparator, and the ability to run paragraphs sequentially instead of the simultaneous paragraph execution of previous releases. The interpreter lifecycle manager automatically terminates an interpreter process after an idle timeout, so resources are released when they are not in use.
14
Apache Flume
Apache Software Foundation
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log and streaming event data. Its architecture is simple and flexible, based on streaming data flows. It is robust and fault tolerant, with many failover and recovery mechanisms, and it uses a simple, extensible data model that allows for online analytic applications. The Apache Flume team has released Flume 1.8.0.
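A Flume agent moves events from a source, through a channel, to a sink. The following standard-library sketch imitates that streaming flow with two threads and a bounded, thread-safe queue standing in for the channel. It is a conceptual illustration under that assumption, not Flume's API or configuration; the function names and the event header are hypothetical, and unlike Flume's durable channels this in-memory queue loses data if the process dies.

```python
import queue
import threading

END = None  # sentinel that tells the sink the stream is finished


def source(channel, lines):
    """Source: wrap each incoming log line as an event and put it on the channel."""
    for line in lines:
        # Hypothetical event shape: a headers dict plus a body.
        channel.put({"headers": {"type": "access_log"}, "body": line})
    channel.put(END)


def sink(channel, destination):
    """Sink: take events off the channel and deliver their bodies to a destination."""
    while True:
        event = channel.get()  # blocks until an event is available
        if event is END:
            break
        destination.append(event["body"])


channel = queue.Queue(maxsize=2)  # small capacity: the source blocks when the sink falls behind
delivered = []  # stands in for a real destination such as HDFS
logs = ["GET /index 200", "POST /login 302", "GET /missing 404"]

workers = [
    threading.Thread(target=source, args=(channel, logs)),
    threading.Thread(target=sink, args=(channel, delivered)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(delivered)
```

The channel decouples the two sides: the source and sink run at their own pace, and the bounded capacity applies backpressure instead of letting memory grow without limit.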
15
Apache Airflow
The Apache Software Foundation
Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers, so it is designed to scale out as workloads grow. Airflow pipelines are defined in Python, which allows for dynamic pipeline generation: you can write code that instantiates pipelines dynamically. You can easily define your own operators and extend libraries to suit your environment. Airflow pipelines are lean and explicit, and parametrization is built into their core using the Jinja templating engine. No more XML or command-line black magic! You use standard Python features to create your workflows, including datetime formats for scheduling and loops to dynamically generate tasks, which keeps you flexible when building workflows.
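Airflow pipelines are directed acyclic graphs (DAGs) of tasks. The sketch below does not use Airflow itself; it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to show how an ordinary loop can generate tasks dynamically and why a scheduler must run them in dependency order. The table and task names are hypothetical. In real Airflow code, the loop would create operator instances and wire them together with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: an extract and a transform task per source table,
# generated in a loop, then one load task that waits for every transform.
tables = ["orders", "users", "events"]

dag = {}  # task name -> set of task names it depends on
for table in tables:
    dag[f"extract_{table}"] = set()
    dag[f"transform_{table}"] = {f"extract_{table}"}
dag["load"] = {f"transform_{table}" for table in tables}

# static_order() returns tasks so that every dependency precedes its dependents;
# it raises graphlib.CycleError if the graph contains a cycle.
run_order = list(TopologicalSorter(dag).static_order())
print(run_order)
```

Adding a table to `tables` adds two tasks and a new dependency for `load` without touching anything else, which is the benefit of defining pipelines in code rather than in static configuration.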