
Infrastructure on AWS vs. Azure: Comparison

Cloud adoption
Cloud consulting
Cloud solutions
November 12, 2021
10 mins

You wouldn’t believe how often business owners ask themselves, ‘Would infrastructure maintenance be cheaper if we moved to Azure?’ This question will never have a simple answer, but we’ve decided to show you how it plays out in practice. We will showcase one infrastructure replicated on both Azure and AWS, with a cost comparison of the two.

NOTE: Since the two clouds offer different tools, the implementation schemes will differ even though the underlying infrastructure is the same.

For the example we chose an application that lets you upload a monochrome photo and receive a colorized version. You send a file, the system processes the photo, and the result is saved to storage.

Let’s see how this one will be implemented on AWS and Azure.

General Overview of Different Infrastructure Approaches

There are several approaches to implementing the desired infrastructure in the cloud. Let’s compare them to weigh their strengths and weaknesses.

Serverless (AWS Lambda, Azure Functions)

Serverless is the most modern approach to running an application. Let’s break it down into pros and cons.

Pros:

  • you don’t need to take care of the underlying layers (virtual machines, physical servers, orchestrators, etc.)
  • you can potentially scale up and down without limits
  • pricing: it’s cheaper for small request volumes because you pay only for what you use. You can reduce operational and maintenance costs, since no backend infrastructure needs to be managed; that is the cloud vendor’s responsibility. This pays off mainly when you have only a few clients per month and don’t need the infrastructure running the rest of the time.

Cons:

  • the application must be written in a form supported by the serverless paradigm
  • the application should be tested thoroughly to make sure it uses only the time it needs, as you pay for every second the function runs
  • each cloud has its own requirements for serverless applications, so you will need extra time to adapt the application to a specific cloud if you want to migrate or go multi-cloud
  • sometimes, after going fully serverless, developers hit issues that push them back to a more traditional paradigm
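
The pricing trade-off above can be sketched with some back-of-envelope arithmetic. The per-GB-second, per-request, and VM prices below are illustrative assumptions, not current list prices:

```python
# Back-of-envelope comparison of serverless vs. always-on VM cost.
# All prices here are illustrative assumptions, not real list prices.
PRICE_PER_GB_SECOND = 0.0000166667   # assumed per-GB-second compute price
PRICE_PER_REQUEST = 0.0000002        # assumed per-invocation price
VM_MONTHLY_COST = 30.0               # assumed small always-on VM, per month

def serverless_monthly_cost(requests: int, duration_s: float, memory_gb: float) -> float:
    """Cost of `requests` invocations of `duration_s` seconds at `memory_gb` RAM."""
    compute = requests * duration_s * memory_gb * PRICE_PER_GB_SECOND
    invocations = requests * PRICE_PER_REQUEST
    return compute + invocations

# A few requests per month: serverless is nearly free.
low = serverless_monthly_cost(requests=1_000, duration_s=2, memory_gb=0.5)
# Sustained traffic: the always-on VM wins.
high = serverless_monthly_cost(requests=10_000_000, duration_s=2, memory_gb=0.5)

print(f"low traffic:  ${low:.4f}/month vs ${VM_MONTHLY_COST}/month VM")
print(f"high traffic: ${high:.2f}/month vs ${VM_MONTHLY_COST}/month VM")
```

With these assumed numbers, low traffic costs under two cents per month on serverless, while sustained traffic costs several times more than the always-on VM, which is exactly the break-even logic described above.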

Virtual Machines on top of the cloud (AWS EC2, Azure VM)

This is the most straightforward approach and, at the same time, the most popular one in the IT world. Why? It all boils down to a single advantage:

Pros:

  • you do not need to change the application to be cloud-native

Cons:

  • each instance carries operating-system and environment overhead that increases TCO (Total Cost of Ownership) by roughly 15%
  • you need to build cloud images for each application module (for example, with Ansible and Packer), which takes time and increases scale-up and scale-down times
  • since VM images are used instead of Docker images, it is harder to keep the local development environment identical, or you end up maintaining Docker images in addition to the VM images

Container services (AWS ECS, Azure App Service)

The container-service approach is somewhat dated but still supported by most cloud providers. These services were the first iteration of container orchestration; more modern options exist now.

Pros:

  • easy deployment and support
  • we can use the same Docker images for the local development

Cons:

  • application needs to be converted to be cloud-native
  • limited control over orchestration and management

Kubernetes (AWS EKS, Azure AKS, Self-hosted Kubernetes service)

Kubernetes is the most popular vendor-agnostic, high-load, and highly elastic container orchestration solution.

Pros:

  • easy deployment, management, and support
  • it supports both Windows- and Linux-based workers to run corresponding containers

Cons:

  • application needs to be converted to be cloud-native

Examples of Kubernetes-based Infrastructure on the AWS and Azure Platforms

High-level Concept Without Cloud Specifics

To make the application highly available and elastic, we decided to extend the original architecture with a message queue service. It can be a managed service available on each cloud platform, such as AWS SQS or Azure Service Bus, or any other implementation such as ActiveMQ, RabbitMQ, or Kafka. The queue helps decouple the application and makes it scalable. We could implement this logic on the API side, but that would only make it more complex and less predictable. With a message queue between the API and the backend, we can scale the number of backends depending on the number of messages waiting in the queue.

We can leave all other components as they are in the original diagram. In our example, the application consists of microservices that run on Linux or Windows, and the Kubernetes cluster consists of two nodes of each type.
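
The decoupling idea can be sketched in a few lines: the API only enqueues jobs, and a scaler sizes the backend pool from the queue length. This is an illustrative in-process sketch, not tied to SQS or Service Bus; the function names and the jobs-per-backend threshold are assumptions:

```python
# Minimal sketch of queue-based decoupling: the API enqueues jobs and a
# scaler derives the backend count from the queue length.
import math
import queue

jobs = queue.Queue()  # stand-in for a managed queue (SQS, Service Bus, ...)

def api_upload(photo_id: str) -> None:
    """API side: accept the upload and enqueue a processing job."""
    jobs.put({"photo_id": photo_id})

def desired_backends(queue_length: int, jobs_per_backend: int = 10,
                     min_backends: int = 1, max_backends: int = 20) -> int:
    """Scale the backend count with the number of waiting messages."""
    wanted = math.ceil(queue_length / jobs_per_backend)
    return max(min_backends, min(max_backends, wanted))

for i in range(35):
    api_upload(f"photo-{i}")

print(desired_backends(jobs.qsize()))  # 35 waiting jobs / 10 per backend -> 4
```

In the real architecture the queue length would come from the queue service’s metrics and the backend count would be applied by the orchestrator, but the decision logic stays this simple.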

To reduce the cost of the architecture in the initial stage we omitted the following from the diagram:

  • some components run in a single copy (like the database); for production use they must be set up as writer and reader (master/replica) across at least two availability zones. All other components are already designed to be resilient and highly available.
  • a bastion/jump box or VPN, so that access to internal resources goes only through the bastion or VPN
  • logging and monitoring (mentioned on the Kubernetes diagram, but running it requires more or bigger nodes, as it is resource-consuming). To monitor Kubernetes and the applications we recommend Prometheus, Grafana, and Alertmanager, and all logs can be aggregated with the ELK stack (Elasticsearch + Logstash + Kibana).
  • a dedicated SMTP (email) server used to deliver the link to the processed file. Maintenance can be simplified with managed solutions such as AWS SNS, SendGrid, or similar services.

To decide which database fits the application architecture better, we compared two different solutions: an RDBMS and a so-called NoSQL database. To avoid vendor lock-in, we decided to compare PostgreSQL and MongoDB. Both database engines are free (open source) and have no vendor lock-in, unlike DynamoDB, which is tied to Amazon.

The main benefits of an RDBMS are transactions and data consistency (formally, because it follows the ACID model). It’s the traditional way to store and process data, and many companies continue to use it today. The cons of RDBMS databases are usually the following:

  • Complicated schema updates. Because you have to specify each column’s type, size, relationships to other tables, and indexes, an update sometimes requires additional planning beforehand. There are many tools designed specifically to help with schema migration, such as migrate for Go.
  • No sharding support out of the box. You have to develop and maintain your own sharding schema if you want to scale your application horizontally.
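
The versioned-migration pattern those tools automate can be sketched in a few lines. This example uses sqlite3 only so it is self-contained; the table and column names are hypothetical:

```python
# Sketch of the versioned schema-migration pattern: each entry in MIGRATIONS
# is applied once, and the applied version is recorded in the database.
import sqlite3

MIGRATIONS = [
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, filename TEXT NOT NULL)",
    "ALTER TABLE photos ADD COLUMN processed_at TEXT",  # a later schema update
]

def migrate(conn: sqlite3.Connection) -> None:
    """Apply any migrations newer than the stored schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
columns = [c[1] for c in conn.execute("PRAGMA table_info(photos)")]
print(columns)  # ['id', 'filename', 'processed_at']
```

Running `migrate` again is a no-op, which is exactly the property that makes schema updates safe to automate in deployment pipelines.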

The main benefit of NoSQL databases is that they have no schema in the classical sense and can be fed any JSON-like data. You also get sharding out of the box, which splits the data across several servers for you. NoSQL databases follow the BASE consistency model. Their cons are usually the following:

  • No support for atomic multi-document transactions
  • Greater data size over time
  • Fewer options to control access to the data

Both database types support replication, which helps with horizontal scalability of read, but not write, requests.

I believe the choice of database for this type of workload will be biased by the experience of the person making the decision, whether with RDBMS or NoSQL databases. I’m biased toward RDBMS, as I have a lot of experience working with them: fine-tuning, schema migration, and query optimization. There are plenty of ways to argue either point of view. For example, because we want to store information about financial operations, we need the ACID consistency that RDBMS databases provide. In the same way, it can be argued that NoSQL should be selected because of the huge number of rows potentially being stored; without shards it will be hard to scale the solution.
— Dmitrii Sirant, CTO at OpsWorks Co.

Regardless of the decision, the following points should be kept in mind:

  • Do not store binary data inside the database
  • Make sure the application can work with the read replica and the write master independently, to make it easier to scale the load horizontally
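
The read/write split from the second point can be sketched as a small router: SELECTs go to the replica, everything else to the master. `FakeConnection` here is a stand-in for real database connections (in practice, two DSNs, one for the primary and one for the replica):

```python
# Sketch of read/write splitting between a write master and a read replica.
class FakeConnection:
    """Records executed statements; stands in for a real DB connection."""
    def __init__(self, name: str):
        self.name = name
        self.statements = []

    def execute(self, sql: str, params=()):
        self.statements.append(sql)
        return self.name

class RoutingCursor:
    """Routes each statement to the replica (reads) or the master (writes)."""
    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def execute(self, sql: str, params=()):
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = self.replica if is_read else self.master
        return target.execute(sql, params)

master, replica = FakeConnection("master"), FakeConnection("replica")
db = RoutingCursor(master, replica)
print(db.execute("SELECT * FROM photos"))           # goes to the replica
print(db.execute("INSERT INTO photos VALUES (1)"))  # goes to the master
```

An application structured this way can later add more replicas (or move hot data to a NoSQL store) without touching business logic, because routing is isolated in one place.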

If the project reaches the point where it’s not possible to scale with RDBMS anymore, use a no-SQL database for specific or even all data.

Azure: Network Diagrams

Azure Network Diagram
Backend Pods Control Schema

High-level description

As depicted in the diagrams, the whole system consists of the following blocks:

  • load-balancers
  • AKS (Azure managed Kubernetes service)
  • Storage
  • Queue storage
  • Azure Database for PostgreSQL service

Based on the internal Kubernetes metrics and external metrics (visits per minute, queue length) we can scale our Kubernetes cluster up and down to make sure that we are providing high-quality service to the clients.

Since all components are built on Azure managed services, it will be easier to maintain the infrastructure and install security and software updates. Once we have an HA setup for each service, all infrastructure changes can be conducted with zero downtime.

Cost calculation

Microsoft Azure Estimate

AWS: Network diagram

AWS Network Diagram

High-level description

As depicted in the diagram, the whole system consists of the following blocks:

  • load-balancers
  • EKS (AWS managed Kubernetes service)
  • S3 Storage
  • SQS Queue
  • RDS PostgreSQL service
  • Lambda function to scale up and down

Based on the internal Kubernetes metrics and external metrics (visits per minute, queue length) we can scale our Kubernetes cluster up and down to make sure that we are providing high-quality service to the clients.

Since all components are built on Amazon managed services, it will be easier to maintain the infrastructure and install security and software updates. Once we have an HA setup for each service, all infrastructure changes can be conducted with zero downtime.

Cost calculation

Amazon Web Services Estimate

Kubernetes internal structure

All components of the application can run inside Kubernetes to fully utilize the benefits it provides: service isolation, resource limits, self-healing, scalability.

Service isolation. Kubernetes has several layers of isolation, such as Pods, Namespaces, Network Policies, and RBAC, and can be configured precisely to achieve a high level of security for the applications running on top of it.

Resource limits. Each container can (and should) be configured with requests and limits for CPU and memory. Based on the request values, the cluster decides which node is best suited to run the container, spreading containers effectively across the available nodes. Limits are hard caps on the resources available to a container; they ensure that high load on one container does not impact other containers on the same node.
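
A requests/limits configuration looks like the `resources` section of a container spec. Here it is shown as the Python structure you could pass to the official Kubernetes client; the container name, image, and the CPU/memory values are illustrative assumptions:

```python
# Illustrative container spec with resource requests and limits, in the dict
# form accepted by the Kubernetes Python client. All values are assumptions.
backend_container = {
    "name": "photo-backend",                  # hypothetical container name
    "image": "example.com/photo-backend:1.0", # hypothetical image
    "resources": {
        # requests: what the scheduler uses to pick a node
        "requests": {"cpu": "250m", "memory": "256Mi"},
        # limits: hard caps enforced at runtime
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
}

print(backend_container["resources"])
```

Setting requests below limits, as here, lets pods burst under load while still giving the scheduler an honest baseline for bin-packing nodes.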

Self-healing. Each container defines liveness and readiness probes, which the Kubernetes cluster checks constantly to make sure the container is operating properly. If a container stops responding to its liveness probe, it is killed and restarted.

Scalability. Kubernetes has an internal tool, the Horizontal Pod Autoscaler, designed to make sure applications can always cope with incoming traffic. If more pods (containers) are required to process the traffic, it runs more pods on the existing nodes. If the cluster has no nodes with available resources, the cluster autoscaler spins up more nodes and connects them to the cluster. When traffic drops, it reduces the number of pods and shuts down the nodes that are no longer in use.
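
The core rule the Horizontal Pod Autoscaler documents is a single formula: desired replicas = ceil(current replicas × current metric / target metric). A minimal sketch:

```python
# The HPA scaling rule: desired = ceil(current * current_metric / target_metric)
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float) -> int:
    """Replica count the autoscaler would request for the observed metric."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods at 90% average CPU with a 60% target -> scale up to 6 pods.
print(hpa_desired_replicas(4, 90, 60))  # 6
# 6 pods at 20% average CPU with a 60% target -> scale down to 2 pods.
print(hpa_desired_replicas(6, 20, 60))  # 2
```

The same formula works for external metrics such as queue length, which is how the backend pool in this architecture can be scaled by the number of waiting messages.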

Summary

Based on the information above, we think the best solution would be to implement the PoC on AWS, as it is slightly cheaper than Azure. We didn’t analyze Google Cloud or other providers, which offer similar managed services and might be cheaper, but the whole architecture was built so that it can be implemented easily on any cloud (AWS, Azure, Google Cloud) or even be multi-cloud if needed.

Further Steps: Process of the PoC/MVP Implementation

  1. Estimate the time needed to build the infrastructure and prepare it for load testing.
  2. Analyze the source code and requirements of the application; make sure it can run inside a container.
  3. Run the MVP inside Kubernetes and conduct manual tests to ensure everything works properly.
  4. Develop an automated test to be used for load testing.
  5. Run load testing and calculate the per-transaction cost.

To get a similar comparison for your specific infrastructure, e-mail us or fill out the contact form on the website.
