Business Software for Apache Spark

  • 1
    Vertex AI Reviews

    Vertex AI

    Google

    Free ($300 in free credits)
    673 Ratings
    Fully managed ML tools let you build, deploy, and scale machine-learning (ML) models quickly for any use case. Vertex AI Workbench is natively integrated with BigQuery, Dataproc, and Spark. You can create and run machine-learning models in BigQuery using standard SQL queries and spreadsheets, or export datasets directly from BigQuery into Vertex AI Workbench and run your models there. Vertex Data Labeling can be used to create highly accurate labels for your data collection. Vertex AI Agent Builder empowers developers to design and deploy advanced generative AI applications for enterprise use. It supports both no-code and code-driven development, enabling users to create AI agents through natural language prompts or by integrating with frameworks like LangChain and LlamaIndex.
  • 2
    Scalytics Connect Reviews
    Scalytics Connect combines data mesh and in-situ data processing with polystore technology, increasing data scalability and processing speed and multiplying data analytics capabilities without compromising privacy or security. You can take advantage of all your data without wasting time on data copying or movement, and enable innovation through enhanced data analytics, generative AI, and federated learning (FL). Scalytics Connect enables any organization to apply data analytics and train machine learning (ML) or generative AI (LLM) models directly on its installed data architecture.
  • 3
    Kubernetes Reviews
    Kubernetes (K8s) is a powerful open-source platform designed to automate the deployment, scaling, and management of applications that are containerized. By organizing containers into manageable groups, it simplifies the processes of application management and discovery. Drawing from over 15 years of experience in handling production workloads at Google, Kubernetes also incorporates the best practices and innovative ideas from the wider community. Built on the same foundational principles that enable Google to efficiently manage billions of containers weekly, it allows for scaling without necessitating an increase in operational personnel. Whether you are developing locally or operating a large-scale enterprise, Kubernetes adapts to your needs, providing reliable and seamless application delivery regardless of complexity. Moreover, being open-source, Kubernetes offers the flexibility to leverage on-premises, hybrid, or public cloud environments, facilitating easy migration of workloads to the most suitable infrastructure. This adaptability not only enhances operational efficiency but also empowers organizations to respond swiftly to changing demands in their environments.
  • 4
    Sematext Cloud Reviews
    Top Pick
    Sematext Cloud provides all-in-one observability for modern software-based businesses, with key insights into both front-end and back-end performance. It includes infrastructure monitoring, transaction tracking, log management, and real user and synthetic monitoring. Sematext gives businesses full-stack visibility by quickly and easily exposing key performance issues through a single solution, available in the cloud or on-premises.
  • 5
    Jupyter Notebook Reviews
    The Jupyter Notebook is a web-based open-source tool that enables users to create and distribute documents featuring live code, visualizations, equations, and written explanations. Its applications are diverse and encompass tasks such as data cleaning and transformation, statistical modeling, numerical simulations, data visualization, machine learning, among others, showcasing its versatility in various fields. Additionally, it serves as an excellent platform for collaboration and sharing insights within the data science community.
  • 6
    Amazon EC2 Reviews
    Amazon Elastic Compute Cloud (Amazon EC2) is a cloud service that offers flexible and secure computing capabilities. Its primary aim is to simplify large-scale cloud computing for developers. With an easy-to-use web service interface, Amazon EC2 allows users to quickly obtain and configure computing resources with ease. Users gain full control over their computing power while utilizing Amazon’s established computing framework. The service offers an extensive range of compute options, networking capabilities (up to 400 Gbps), and tailored storage solutions that enhance price and performance specifically for machine learning initiatives. Developers can create, test, and deploy macOS workloads on demand. Furthermore, users can scale their capacity dynamically as requirements change, all while benefiting from AWS's pay-as-you-go pricing model. This infrastructure enables rapid access to the necessary resources for high-performance computing (HPC) applications, resulting in enhanced speed and cost efficiency. In essence, Amazon EC2 ensures a secure, dependable, and high-performance computing environment that caters to the diverse demands of modern businesses. Overall, it stands out as a versatile solution for various computing needs across different industries.
  • 7
    Apache Cassandra Reviews

    Apache Cassandra

    Apache Software Foundation

    1 Rating
    When seeking a database that ensures both scalability and high availability without sacrificing performance, Apache Cassandra stands out as an ideal option. Its linear scalability paired with proven fault tolerance on standard hardware or cloud services positions it as an excellent choice for handling mission-critical data effectively. Additionally, Cassandra's superior capability to replicate data across several datacenters not only enhances user experience by reducing latency but also offers reassurance in the event of regional failures. This combination of features makes it a robust solution for organizations that prioritize data resilience and efficiency.
  • 8
    SingleStore Reviews

    SingleStore

    SingleStore

    $0.69 per hour
    1 Rating
    SingleStore, previously known as MemSQL, is a highly scalable and distributed SQL database that can operate in any environment. It is designed to provide exceptional performance for both transactional and analytical tasks while utilizing well-known relational models. This database supports continuous data ingestion, enabling operational analytics critical for frontline business activities. With the capacity to handle millions of events each second, SingleStore ensures ACID transactions and allows for the simultaneous analysis of vast amounts of data across various formats, including relational SQL, JSON, geospatial, and full-text search. It excels in data ingestion performance at scale and incorporates built-in batch loading alongside real-time data pipelines. Leveraging ANSI SQL, SingleStore offers rapid query responses for both current and historical data, facilitating ad hoc analysis through business intelligence tools. Additionally, it empowers users to execute machine learning algorithms for immediate scoring and conduct geoanalytic queries in real-time, thereby enhancing decision-making processes. Furthermore, its versatility makes it a strong choice for organizations looking to derive insights from diverse data types efficiently.
  • 9
    Dataiku Reviews
    Dataiku serves as a sophisticated platform for data science and machine learning, aimed at facilitating teams in the construction, deployment, and management of AI and analytics projects on a large scale. It enables a diverse range of users, including data scientists and business analysts, to work together in developing data pipelines, crafting machine learning models, and preparing data through various visual and coding interfaces. Supporting the complete AI lifecycle, Dataiku provides essential tools for data preparation, model training, deployment, and ongoing monitoring of projects. Additionally, the platform incorporates integrations that enhance its capabilities, such as generative AI, thereby allowing organizations to innovate and implement AI solutions across various sectors. This adaptability positions Dataiku as a valuable asset for teams looking to harness the power of AI effectively.
  • 10
    JupyterLab Reviews
    Project Jupyter is dedicated to the creation of open-source tools, standards, and services that facilitate interactive computing in numerous programming languages. At the heart of this initiative is JupyterLab, a web-based interactive development environment designed for Jupyter notebooks, coding, and data manipulation. JupyterLab offers remarkable flexibility, allowing users to customize and organize the interface to cater to various workflows in fields such as data science, scientific research, and machine learning. Its extensibility and modular nature enable developers to create plugins that introduce new features and seamlessly integrate with existing components. The Jupyter Notebook serves as an open-source web application enabling users to produce and share documents that incorporate live code, mathematical equations, visualizations, and descriptive text. Common applications of Jupyter include data cleaning and transformation, numerical simulations, statistical analysis, data visualization, and machine learning, among others. Supporting over 40 programming languages—including popular ones like Python, R, Julia, and Scala—Jupyter continues to be a valuable resource for researchers and developers alike, fostering collaborative and innovative approaches to computing challenges.
  • 11
    Apache Hive Reviews

    Apache Hive

    Apache Software Foundation

    1 Rating
    Apache Hive is a data warehouse solution that enables the efficient reading, writing, and management of substantial datasets stored across distributed systems using SQL. It allows users to apply structure to pre-existing data in storage. To facilitate user access, it comes equipped with a command line interface and a JDBC driver. As an open-source initiative, Apache Hive is maintained by dedicated volunteers at the Apache Software Foundation. Initially part of the Apache® Hadoop® ecosystem, it has since evolved into an independent top-level project. We invite you to explore the project further and share your knowledge to enhance its development. Without Hive, traditional SQL queries have to be implemented against the MapReduce Java API, which complicates running SQL applications on distributed data. Hive simplifies this by offering a SQL abstraction that integrates SQL-like queries, known as HiveQL, into the underlying Java framework, eliminating the need to work with the low-level Java API directly. This makes working with large datasets more accessible and efficient for developers.
  • 12
    Archon Data Store Reviews
    The Archon Data Store™ is a robust and secure platform built on open-source principles, tailored for archiving and managing extensive data lakes. Its compliance capabilities and small footprint facilitate large-scale data search, processing, and analysis across structured, unstructured, and semi-structured data within an organization. By merging the essential characteristics of both data warehouses and data lakes, Archon Data Store creates a seamless and efficient platform. This integration effectively breaks down data silos, enhancing data engineering, analytics, data science, and machine learning workflows. With its focus on centralized metadata, optimized storage solutions, and distributed computing, the Archon Data Store ensures the preservation of data integrity. Additionally, its cohesive strategies for data management, security, and governance empower organizations to operate more effectively and foster innovation at a quicker pace. By offering a singular platform for both archiving and analyzing all organizational data, Archon Data Store not only delivers significant operational efficiencies but also positions your organization for future growth and agility.
  • 13
    LogIsland Reviews
    The LogIsland platform serves as the core of Hurence's real-time analytics system, enabling the collection of factory events from the IIoT as well as data from websites. Hurence asserts that both factories and companies can be monitored and understood in real time through the myriad of events they experience, where each occurrence, such as a sales order, the production of an item by a robot, or the delivery of a product, qualifies as an event. Essentially, everything constitutes an event, and the LogIsland platform facilitates the capture of these events, organizing them within a message bus capable of handling substantial volumes. This system allows for real-time analysis with a range of plug-and-play analyzers that vary from basic functions like counting and alerting to advanced artificial intelligence models designed for predictive analytics and the identification of anomalies or defects. It stands as your versatile tool for real-time event analysis, equipped with custom analyzers tailored for two specific areas: web analytics and Industry 4.0, thereby enhancing decision-making processes across various domains.
  • 14
    Activeeon ProActive Reviews
    ProActive Parallel Suite, a member of the OW2 Open Source Community, provides acceleration and orchestration that is seamlessly integrated with the management and operation of high-performance clouds (private, or public with bursting capabilities). ProActive Parallel Suite platforms offer high-performance workflows and application parallelization, enterprise scheduling and orchestration, and dynamic management of private heterogeneous grids and clouds. Users can simultaneously manage their enterprise cloud and accelerate and orchestrate all of their enterprise applications with the ProActive platform.
  • 15
    Alluxio Reviews

    Alluxio

    Alluxio

    26¢ Per SW Instance Per Hour
    Alluxio stands out as the pioneering open-source technology for data orchestration tailored for analytics and AI within cloud environments. It effectively connects data-centric applications with various storage systems, allowing seamless data retrieval from the storage layer, thus enhancing accessibility and enabling a unified interface for multiple storage solutions. The innovative memory-first tiered architecture of Alluxio facilitates data access at unprecedented speeds, significantly surpassing traditional methods. Picture yourself as an IT leader with the power to select from a diverse range of services available in both public cloud and on-premises settings. Furthermore, envision having the capability to scale your storage for data lakes while maintaining control over data locality and ensuring robust protection for your organization. To support these aspirations, NetApp and Alluxio are collaborating to empower clients in navigating the evolving landscape of modernizing their data architecture, with an emphasis on minimizing operational complexity for analytics, machine learning, and AI-driven workflows. This partnership aims to unlock new possibilities for businesses striving to harness the full potential of their data assets.
  • 16
    Dagster+ Reviews

    Dagster+

    Dagster Labs

    $0
    Dagster is the cloud-native open-source orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is the platform of choice for data teams responsible for the development, production, and observation of data assets. With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early.
  • 17
    Union Cloud Reviews

    Union Cloud

    Union.ai

    Free (Flyte)
    Union.ai Benefits:
    - Accelerated Data Processing & ML: Union.ai significantly speeds up data processing and machine learning.
    - Built on Trusted Open-Source: Leverages the robust open-source project Flyte™, ensuring a reliable and tested foundation for your ML projects.
    - Kubernetes Efficiency: Harnesses the power and efficiency of Kubernetes along with enhanced observability and enterprise features.
    - Optimized Infrastructure: Facilitates easier collaboration among Data and ML teams on optimized infrastructures, boosting project velocity.
    - Breaks Down Silos: Tackles the challenges of distributed tooling and infrastructure by simplifying work-sharing across teams and environments with reusable tasks, versioned workflows, and an extensible plugin system.
    - Seamless Multi-Cloud Operations: Navigate the complexities of on-prem, hybrid, or multi-cloud setups with ease, ensuring consistent data handling, secure networking, and smooth service integrations.
    - Cost Optimization: Keeps a tight rein on your compute costs, tracks usage, and optimizes resource allocation even across distributed providers and instances, ensuring cost-effectiveness.
  • 18
    Apache Iceberg Reviews

    Apache Iceberg

    Apache Software Foundation

    Free
    Iceberg is an advanced format designed for managing extensive analytical tables efficiently. It combines the dependability and ease of SQL tables with the capabilities required for big data, enabling multiple engines such as Spark, Trino, Flink, Presto, Hive, and Impala to access and manipulate the same tables concurrently without issues. The format allows for versatile SQL operations to incorporate new data, modify existing records, and execute precise deletions. Additionally, Iceberg can either eagerly rewrite data files to optimize read performance or use delete deltas for faster updates. It also streamlines the complex and often error-prone process of generating partition values for table rows, and it automatically skips unnecessary partitions and files, so queries need no extra filters to run fast. The table layout can be adjusted as data and query patterns evolve, ensuring efficiency and adaptability in data management. This adaptability makes Iceberg an essential tool in modern data workflows.
  • 19
    Oxla Reviews

    Oxla

    Oxla

    $50 per CPU core / monthly
    Designed specifically for optimizing compute, memory, and storage, Oxla serves as a self-hosted data warehouse that excels in handling large-scale, low-latency analytics while providing strong support for time-series data. While cloud data warehouses may suit many, they are not universally applicable; as operations expand, the ongoing costs of cloud computing can surpass initial savings on infrastructure, particularly in regulated sectors that demand comprehensive data control beyond mere VPC and BYOC setups. Oxla surpasses both traditional and cloud-based warehouses by maximizing efficiency, allowing for the scalability of expanding datasets with predictable expenses, whether on-premises or in various cloud environments. Deployment, execution, and maintenance of Oxla can be easily managed using Docker and YAML, enabling a range of workloads to thrive within a singular, self-hosted data warehouse. In this way, Oxla provides a tailored solution for organizations seeking both efficiency and control in their data management strategies.
  • 20
    Style Intelligence Reviews
    Style Intelligence from InetSoft is a complete business intelligence platform that lets companies analyze, monitor, report, and collaborate on business and operational data from different sources in real time. Its top features include a data mashup architecture based on Data Blocks and a professional atomic block modeling tool. There is also a database write-back option. Style Intelligence is robust and easy to use. It offers granular security, multitenancy support, and multiple integrations, and it is fully scalable.
  • 21
    Instaclustr Reviews

    Instaclustr

    Instaclustr

    $20 per node per month
    Instaclustr, the Open Source-as-a-Service company, delivers reliability at scale. We provide database, search, messaging, and analytics in an automated, trusted, and proven managed environment. We help companies focus their internal development and operational resources on creating cutting-edge customer-facing applications. Instaclustr works with AWS, Heroku, Azure, IBM Cloud, and Google Cloud Platform. The company is SOC 2 certified and offers 24/7 customer support.
  • 22
    IBM Cloud SQL Query Reviews

    IBM Cloud SQL Query

    IBM

    $5.00/Terabyte-Month
    Experience serverless and interactive data querying with IBM Cloud Object Storage, enabling you to analyze your data directly at its source without the need for ETL processes, databases, or infrastructure management. IBM Cloud SQL Query leverages Apache Spark, a high-performance, open-source data processing engine designed for quick and flexible analysis, allowing SQL queries without requiring ETL or schema definitions. You can easily perform data analysis on your IBM Cloud Object Storage via our intuitive query editor and REST API. With a pay-per-query pricing model, you only incur costs for the data that is scanned, providing a cost-effective solution that allows for unlimited queries. To enhance both savings and performance, consider compressing or partitioning your data. Furthermore, IBM Cloud SQL Query ensures high availability by executing queries across compute resources located in various facilities. Supporting multiple data formats, including CSV, JSON, and Parquet, it also accommodates standard ANSI SQL for your querying needs, making it a versatile tool for data analysis. This capability empowers organizations to make data-driven decisions more efficiently than ever before.
  • 23
    PubSub+ Platform Reviews
    Solace is a specialist in event-driven architecture (EDA), with two decades of experience providing enterprises with highly reliable, robust, and scalable data movement technology based on the publish and subscribe (pub/sub) pattern. Solace technology enables the real-time data flow behind many of the conveniences you take for granted every day, such as immediate loyalty rewards from your credit card, the weather data delivered to your mobile phone, real-time airplane movements on the ground and in the air, and timely inventory updates to some of your favourite department stores and grocery chains. It also powers many of the world's leading stock exchanges and betting houses. Aside from rock-solid technology, stellar customer support is one of the biggest reasons customers select Solace and stick with it.
  • 24
    Coginiti Reviews

    Coginiti

    Coginiti

    $189/user/year
    Coginiti is the AI-enabled enterprise Data Workspace that empowers everyone to get fast, consistent answers to any business question. Coginiti helps you find and search for metrics that are approved for your use case, accelerating the analytic lifecycle from development to certification. Coginiti integrates the functionality needed to build, approve, and curate analytics for reuse across all business domains, while adhering to your data governance policies and standards. Coginiti's collaborative data workspace is trusted by teams in the insurance, healthcare, financial services, and retail/consumer packaged goods industries to deliver value to customers.
  • 25
    Rational BI Reviews

    Rational BI

    Rational BI

    $129 per month
    Allocate less time to data preparation and focus more on data analysis. By doing so, you can create visually appealing and precise reports while consolidating all aspects of data collection, analytics, and data science within a unified platform that is accessible to everyone in the company. Import your data seamlessly, regardless of its source. Whether your objective is to generate scheduled reports from Excel spreadsheets, cross-reference information across different files and databases, or convert your data into SQL-queryable formats, Rational BI offers a comprehensive suite of tools to meet your needs. Uncover the insights concealed within your data, make it readily available, and gain an edge over your competitors. Elevate your organization’s analytical capabilities with business intelligence that simplifies the process of locating the most current data and enables analysis through an interface that appeals to both seasoned data scientists and everyday data users. This approach ensures that all team members can leverage data effectively, fostering a culture of informed decision-making throughout the organization.