High-level Concept Without Cloud Specifics
To make the application highly available and elastic, we decided to extend the original architecture with a message queue service. This can be a managed service offered by the cloud platform, such as AWS SQS or Azure Service Bus, or a self-hosted implementation such as ActiveMQ, RabbitMQ, or Kafka. The queue decouples the application components and makes them scalable. We could implement this logic on the API side, but that would only make the API more complex and less predictable. With a message queue between the API and the backend, we can scale the number of backends according to the number of messages waiting in the queue to be processed. All other components can stay as they are in the original diagram. In our example, the application consists of microservices that run on either the Linux or the Windows platform. The Kubernetes cluster consists of two nodes of each type.
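The queue-driven scaling rule described above can be sketched in a few lines. This is only an illustration: the function name, the target of 100 messages per backend, and the minimum and maximum limits are assumptions and do not come from the original design.

```python
import math


def desired_backends(queue_depth: int,
                     messages_per_backend: int = 100,
                     min_backends: int = 1,
                     max_backends: int = 10) -> int:
    """Return how many backend workers to run for the current queue depth.

    All default values are hypothetical and should be tuned per workload.
    """
    if messages_per_backend <= 0:
        raise ValueError("messages_per_backend must be positive")
    # One worker per `messages_per_backend` waiting messages, rounded up.
    wanted = math.ceil(max(queue_depth, 0) / messages_per_backend)
    # Clamp the result so we never drop below the minimum or exceed capacity.
    return max(min_backends, min(max_backends, wanted))
```

A controller would poll the queue length periodically and set the backend replica count to this value. In Kubernetes the same idea is usually delegated to an autoscaler driven by an external queue-length metric rather than hand-written code.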
To reduce the cost of the architecture at the initial stage, we omitted the following from the diagram:
- redundant copies of some components, such as the database. For production use, the database must be set up as a writer and a reader (master/replica) in at least two availability zones. All other components are already designed to be resilient and highly available.
- a bastion (jump box) or VPN, which would restrict access to internal resources so that they are reachable only through it
- logging and monitoring. They are mentioned on the Kubernetes diagram, but running them requires more nodes or bigger nodes because they are resource-intensive. To monitor Kubernetes and the applications, we recommend Prometheus, Grafana, and Alertmanager; all logs can be aggregated with the ELK stack (Elasticsearch + Logstash + Kibana).
- a dedicated SMTP (email) server, which will be used to deliver the link to the watermarked file. Maintenance can be simplified by using a managed solution such as AWS SNS, SendGrid, or a similar service.
To decide which database fits the application architecture better, we compared two different kinds of solution: an RDBMS and a so-called “NoSQL database”. To avoid vendor lock-in, we decided to compare PostgreSQL and MongoDB. Both database engines are free (open source) and do not tie you to a single vendor the way DynamoDB ties you to Amazon.
The main benefits of an RDBMS are transactions and data consistency (formally, because it follows the ACID consistency model). It is the traditional way to store and process data, and many companies continue to use it today. The usual drawbacks of RDBMS databases are the following:
- Complicated schema updates. Because you have to specify each column's type and size, its relationships to other tables, and the indexes, an update sometimes requires additional planning beforehand. There are many tools designed specifically to help with schema migration, such as migrate for Go.
- No sharding support out of the box. You have to develop and maintain your own sharding scheme if you want to scale your application horizontally.
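To give a sense of what maintaining your own sharding scheme involves, here is a minimal sketch of hash-based shard routing at the application level. The shard names and the `shard_for` helper are hypothetical; a real setup would map each shard to its own database connection.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would be connection strings.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    if not shards:
        raise ValueError("at least one shard is required")
    # hashlib yields the same digest in every process, unlike built-in hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Every query for a given key must go through the same function. Note that with plain modulo hashing, changing the number of shards remaps most keys, so resharding means moving data; that is part of the maintenance burden mentioned above.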
The main benefit of NoSQL databases is that they have no schema in the classical sense and can be fed any JSON-like data. You also get sharding out of the box, which splits the data between several servers for you. NoSQL databases follow the BASE consistency model. The usual drawbacks of NoSQL databases are the following:
- Limited or no support for atomic multi-document transactions (MongoDB, for example, added them only in version 4.0)
- Larger data size over time
- Fewer options for controlling access to the data
Both database types support replication, which helps to scale read requests horizontally, but not write requests.
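To make the transaction trade-off discussed above concrete, the following sketch shows an atomic two-row update in a relational database. It uses Python's built-in SQLite module purely as a stand-in so that the example is self-contained; it is not PostgreSQL, and the `accounts` table and `transfer` function are made up for illustration.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from `src` to `dst` as one atomic unit."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # both rows change together
try:
    transfer(conn, "alice", "bob", 500)  # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                                 # the credit to bob is rolled back as well

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed second transfer, the balances are still those left by the first one: the credit that had already been applied inside the transaction is undone. This all-or-nothing behaviour across several rows is the property that the ACID argument for financial data relies on.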
‘I believe that selecting the proper database for this type of workload will be biased depending on the person who is making the decision, based on their experience with RDBMS or NoSQL databases. I’m biased towards the RDBMS as I have a lot of experience working with them, fine-tuning them, with schema migration, and with query optimization. There are plenty of ways to prove the selected point of view, like saying that because we want to store information about financial operations in the database, we need the ACID consistency that is provided by RDBMS databases. In the same way, it can be proven that NoSQL should be selected because of the huge number of rows potentially being stored, and without shards it will be hard to scale the solution.’,
— Dmitrii Sirant, CTO at OpsWorks Co.
Regardless of the decision, the following points should be observed:
- Do not store binary data inside of the database
- Make sure that the application can work with the read replica and the write master independently, to make it easier to scale the load horizontally
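One way to keep reads and writes independent in the application is to hide the choice of connection behind a small router. The sketch below is an assumption about how this could look, not a description of the actual application; the `ConnectionRouter` class and the DSN strings passed to it are hypothetical.

```python
import itertools


class ConnectionRouter:
    """Send writes to the master and spread reads across the replicas."""

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        # Fall back to the master when no replica is configured.
        self._reads = itertools.cycle(list(replicas) or [master])

    def for_write(self) -> str:
        """Return the target for INSERT, UPDATE, and DELETE statements."""
        return self.master

    def for_read(self) -> str:
        """Return the next read target in round-robin order."""
        return next(self._reads)
```

With this separation in place, adding a read replica only changes the list passed to the router. Keep in mind that with asynchronous replication a replica can lag behind the master, so reads that must see a just-written value should still go to the master.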
If the project reaches the point where it is no longer possible to scale with an RDBMS, use a NoSQL database for specific or even all data.