Google Compute Engine
Compute Engine (IaaS), a platform from Google that allows organizations to create and manage cloud-based virtual machines, is an infrastructure as a services (IaaS).
Computing infrastructure in predefined sizes or custom machine shapes to accelerate cloud transformation. General purpose machines (E2, N1,N2,N2D) offer a good compromise between price and performance. Compute optimized machines (C2) offer high-end performance vCPUs for compute-intensive workloads. Memory optimized (M2) systems offer the highest amount of memory and are ideal for in-memory database applications. Accelerator optimized machines (A2) are based on A100 GPUs, and are designed for high-demanding applications. Integrate Compute services with other Google Cloud Services, such as AI/ML or data analytics. Reservations can help you ensure that your applications will have the capacity needed as they scale. You can save money by running Compute using the sustained-use discount, and you can even save more when you use the committed-use discount.
Learn more
Dragonfly
Dragonfly serves as a seamless substitute for Redis, offering enhanced performance while reducing costs. It is specifically engineered to harness the capabilities of contemporary cloud infrastructure, catering to the data requirements of today’s applications, thereby liberating developers from the constraints posed by conventional in-memory data solutions. Legacy software cannot fully exploit the advantages of modern cloud technology. With its optimization for cloud environments, Dragonfly achieves an impressive 25 times more throughput and reduces snapshotting latency by 12 times compared to older in-memory data solutions like Redis, making it easier to provide the immediate responses that users demand. The traditional single-threaded architecture of Redis leads to high expenses when scaling workloads. In contrast, Dragonfly is significantly more efficient in both computation and memory usage, potentially reducing infrastructure expenses by up to 80%. Initially, Dragonfly scales vertically, only transitioning to clustering when absolutely necessary at a very high scale, which simplifies the operational framework and enhances system reliability. Consequently, developers can focus more on innovation rather than infrastructure management.
Learn more
AWS Parallel Computing Service
AWS Parallel Computing Service (AWS PCS) is a fully managed service designed to facilitate the execution and scaling of high-performance computing tasks while also aiding in the development of scientific and engineering models using Slurm on AWS. This service allows users to create comprehensive and adaptable environments that seamlessly combine computing, storage, networking, and visualization tools, enabling them to concentrate on their research and innovative projects without the hassle of managing the underlying infrastructure. With features like automated updates and integrated observability, AWS PCS significantly improves the operations and upkeep of computing clusters. Users can easily construct and launch scalable, dependable, and secure HPC clusters via the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDK. The versatility of the service supports a wide range of applications, including tightly coupled workloads such as computer-aided engineering, high-throughput computing for tasks like genomics analysis, GPU-accelerated computing, and specialized silicon solutions like AWS Trainium and AWS Inferentia. Overall, AWS PCS empowers researchers and engineers to harness advanced computing capabilities without needing to worry about the complexities of infrastructure setup and maintenance.
Learn more
Amazon EC2 UltraClusters
Amazon EC2 UltraClusters allow for the scaling of thousands of GPUs or specialized machine learning accelerators like AWS Trainium, granting users immediate access to supercomputing-level performance. This service opens the door to supercomputing for developers involved in machine learning, generative AI, and high-performance computing, all through a straightforward pay-as-you-go pricing structure that eliminates the need for initial setup or ongoing maintenance expenses. Comprising thousands of accelerated EC2 instances placed within a specific AWS Availability Zone, UltraClusters utilize Elastic Fabric Adapter (EFA) networking within a petabit-scale nonblocking network. Such an architecture not only ensures high-performance networking but also facilitates access to Amazon FSx for Lustre, a fully managed shared storage solution based on a high-performance parallel file system that enables swift processing of large datasets with sub-millisecond latency. Furthermore, EC2 UltraClusters enhance scale-out capabilities for distributed machine learning training and tightly integrated HPC tasks, significantly decreasing training durations while maximizing efficiency. This transformative technology is paving the way for groundbreaking advancements in various computational fields.
Learn more