
HLO – DB MIGRATION AND STREAM PROCESSING ON AWS

Welcome to a new style of blogging called the High-Level Overview (HLO) series. In these posts, I will describe a problem, usually something I came across recently, followed by a high-level overview of how it could be solved. The goal is to get you to dig deeper into the individual components that make the high-level solution possible.

Very recently, I had a call from one of our architects, who was tasked with comparing different clouds and presenting a solution to his customer. While the customer was going forward with one solution, they were interested in finding out how an equivalent solution would be built on AWS. The goal was to have a replication environment for the customer's on-premises SQL Server so it could be failed over to. Moreover, the customer wanted to be able to stream the data out of the SQL environment into an Elastic MapReduce cluster for data analytics purposes. The customer was also concerned with storing large amounts of data in an archive and being able to retrieve it when needed. Needless to say, all the connectivity needed to be secure.

In summary –

Customer Objectives

OBJ.001 – Need to have functional disaster recovery environments for their data.

OBJ.002 – Need to have an effective way to do data processing and modeling while keeping costs low.

OBJ.003 – Need to have an archival methodology to fulfill long-term storage requirements.

OBJ.004 – Ensure data in transit is secure.

Functional Requirements

FR.001 – A busy SQL Server needs replication, backup, and archival to ensure availability, disaster recovery, and long-term storage.

FR.002 – Data from the SQL Server needs to be pushed into Elastic MapReduce (EMR) for data processing and modeling.

FR.003 – Data from EMR needs to be archived for long-term storage purposes.

FR.004 – Secure connectivity for all services over which data would traverse from the customer's on-premises environment to a cloud provider.

Below is a high-level illustration of the current deployment as I understood it.

 

The Options –

Given the customer objectives and functional requirements, AWS provides multiple products that help us define a solution to satisfy the customer’s use case. Let’s look at these products individually.

1. Amazon RDS – Amazon's Relational Database Service (RDS) provides a scalable DBaaS offering with the ability to migrate and replicate SQL databases.

2. Amazon S3 – Amazon's object storage offering, S3 (Simple Storage Service), allows you to store large volumes of data with virtually unlimited capacity.

3. Amazon Glacier – Glacier is Amazon's archival storage offering that allows you to archive petabyte-scale data at very low cost. Glacier can also automatically archive data stored on Amazon S3 using lifecycle policies.

4. AWS Kinesis – Amazon Kinesis allows you to collect, process, and analyze real-time data streams. Data can be analyzed using Kinesis Data Analytics, which allows the use of standard SQL queries. You can also push a Kinesis data stream to a stream processing framework like Amazon Elastic MapReduce (EMR), which supports Apache Hadoop, Spark, and other big-data frameworks.

5. AWS Lambda – AWS Lambda is Amazon's serverless technology that allows you to build purpose-built functions triggered by events or calls.

6. AWS SNS – AWS Simple Notification Service (SNS) allows you to trigger notifications or even AWS Lambda functions based on pre-defined events.

7. AWS VPN Gateway – Allows you to create secure connections between sites.

8. AWS Storage Gateway – Allows you to deploy a virtual machine instance with different storage options on-premises. This virtual machine replicates the data stored on it to an Amazon S3 bucket.

9. AWS Snowball – AWS's solution for migrating large amounts of data using cold migration techniques.

10. AWS Direct Connect – A cost-effective private network solution from on-premises to an AWS datacenter for migrating large data sets. It can also be used to keep network traffic on private links rather than the internet.

Let’s connect the dots, slowly.

Networking

With AWS VPN Gateway, a customer can connect their on-premises environments securely to AWS regions. This is crucial and fulfills the customer's FR.004, which requires all data to be secure in transit. It is important to remember that there is a limit of 5 VPN gateways per region; this limit can be increased by reaching out to AWS Support. Alternatively, the customer could use AWS Direct Connect, which may be a better option in this scenario provided the customer's data center is close to an AWS Direct Connect partner location (APN Technology Partner). Direct Connect offers consistently high bandwidth (up to 10 Gbps) and a private connection into your VPC on AWS, meaning traffic does not traverse the internet. Direct Connect is also ideal for real-time streaming data and can be used to seamlessly extend the customer's network into AWS.

Migration

The customer currently has large sets of data that need to be migrated to AWS. While Direct Connect allows for high-bandwidth transfers, it can get very expensive. Amazon offers AWS Snowball to help transfer cold data into the AWS cloud. The process is simple: once you put in a request for a Snowball, AWS ships a secure device to your data center, which you connect to your environment. You then copy all your data to the Snowball device and ship it back to AWS. All data on the device is encrypted and secure. AWS also offers Snowball Edge, which provides more compute within the device, allowing you to access your data using a local EC2 instance. AWS Snowball has a capacity of 50 TB to 80 TB, while the Edge device has a capacity of 100 TB.
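As a sanity check on when cold migration beats the wire, here is a rough back-of-the-envelope estimate. The link speed and utilization figure below are assumptions for illustration, not AWS numbers:

```python
def transfer_days(data_tb, link_mbps, utilization=0.8):
    """Rough estimate of days needed to move data over a network link.

    data_tb:     data set size in decimal terabytes
    link_mbps:   nominal link speed in megabits per second
    utilization: fraction of the link you realistically get (assumed 80%)
    """
    bits = data_tb * 1e12 * 8                       # TB -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)  # bits / effective bits-per-second
    return seconds / 86400                          # seconds -> days

# Moving 80 TB (one Snowball's worth) over a 100 Mbps link at 80% utilization:
print(round(transfer_days(80, 100), 1))  # 92.6 days
```

At roughly three months for a single Snowball's worth of data on a 100 Mbps link, shipping a device starts to look very attractive.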

For petabyte-scale data migration, AWS offers Snowmobile. This is an 18-wheeler truck carrying a shipping container that acts as a portable data center. The container is transported to your data center and needs to be connected to power and network. Once done, you can copy petabyte-scale data into it before Amazon picks it back up.

For a simpler way to transfer data, AWS offers storage gateways. A storage gateway is a lightweight virtual machine that can be deployed in your environment and configured with your AWS account. The gateway uses a local disk and exposes it as an iSCSI drive accessible by other virtual machines. Any data stored on this drive is then replicated to your Amazon S3 storage account. Storage gateways can be configured for hot, cold, and cached data, so you have a variety of options depending on your use case. The storage gateway appliance is free to download and deploy, so it has a low barrier to entry and is ideal for file transfers.

Storage and Archival

AWS's S3 (Simple Storage Service) is an object data store that offers unlimited storage for files. S3, like any object storage, is accessible over HTTP and HTTPS and stores data securely in AWS's datacenters. The data is replicated locally but can also be replicated across regions (cross-region replication) to increase availability. You can even serve files directly from S3 into your application. An interesting feature of S3 is its support for lifecycle policies on the files stored in "S3 buckets". You can set a lifecycle policy to archive all files after a set amount of time, and S3 will move them over to AWS Glacier, Amazon's low-cost long-term archival solution.
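A lifecycle policy is just a small configuration document attached to a bucket. The sketch below builds one in the shape the S3 API expects; the bucket prefix, rule ID, and retention periods are made-up examples:

```python
# Sketch of an S3 lifecycle rule that moves objects under "logs/" to
# Glacier after 90 days and deletes them after ~7 years. The prefix,
# rule ID, and day counts are hypothetical; the dict shape follows the
# S3 lifecycle configuration document.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 2555},  # roughly 7 years
        }
    ]
}

# With boto3 this would be applied to a bucket via something like:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```

Once the rule is in place, S3 applies the transition automatically; no application code has to move the objects itself.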

Alerting

AWS's Simple Notification Service (SNS) can be configured to alert the customer based on custom or pre-set triggers. You will find SNS being used in almost everything in AWS. For instance, when you create a new AWS account, one of the first things to do is set up a billing alert delivered through SNS to ensure that you don't exceed billing thresholds. SNS can also trigger, or be triggered by, other AWS services such as Lambda functions (serverless).

Serverless

AWS Lambda is Amazon's serverless technology, which allows you to run small, purpose-focused functions based on events or triggers. You can trigger a Lambda function to perform a certain task. For example, I can set up a billing alarm that notifies an SNS topic if billing exceeds $100 per day. If it does, the event can trigger a Lambda function that immediately shuts down my instances to save on billing costs.
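A minimal sketch of that billing-shutdown handler is below. The outer `Records[0].Sns.Message` nesting matches how SNS delivers events to Lambda, but the JSON payload inside the message (a `charges` field) is invented for illustration; a real deployment would parse the actual CloudWatch billing alarm payload and call EC2 to stop instances rather than returning a string:

```python
import json

DAILY_BUDGET = 100.0  # the $100/day threshold from the example above


def handler(event, context=None):
    """Sketch of a Lambda handler reacting to an SNS billing notification.

    Assumes the SNS message body is JSON with a 'charges' field (made up
    for this example). Returns an action string instead of actually
    stopping EC2 instances, so it can be exercised locally.
    """
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message["charges"] > DAILY_BUDGET:
        return "stop-instances"
    return "no-action"


# Local invocation with a fabricated SNS-style event:
event = {"Records": [{"Sns": {"Message": json.dumps({"charges": 120.5})}}]}
print(handler(event))  # stop-instances
```

Because the handler is a plain function taking a dict, it can be unit-tested locally before it is ever wired to a trigger.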

Data analysis

AWS Kinesis is a solution for real-time data stream analysis. Real-time data can be collected and analyzed using Kinesis Data Analytics; this is helpful for this customer because it allows analyzing data with regular SQL queries. The data can also be pushed to a stream processing framework such as Amazon Elastic MapReduce (EMR) for big-data analysis before being archived.
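On the producer side, each record pushed into a stream is just a binary blob plus a partition key that determines which shard receives it. The sketch below builds a record in that shape; the `Data`/`PartitionKey` field names match the Kinesis PutRecord API, while the sensor payload schema and stream name are invented:

```python
import json
import time


def make_kinesis_record(sensor_id, value):
    """Build a record in the shape Kinesis PutRecord expects: a bytes
    'Data' blob plus a 'PartitionKey' that controls shard placement.
    The payload schema (sensor/value/ts) is a made-up example."""
    payload = {"sensor": sensor_id, "value": value, "ts": int(time.time())}
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": sensor_id,  # same sensor -> same shard, preserving order
    }


record = make_kinesis_record("pump-7", 42.1)
# A real producer would now call something like:
# kinesis.put_record(StreamName="my-stream", **record)
```

Using the sensor ID as the partition key keeps each sensor's readings in order within its shard, which matters for downstream SQL windows in Kinesis Data Analytics.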

Databases

AWS RDS (Relational Database Service) offers a managed database environment that can be readily consumed. You simply deploy database instances and pick a database engine; engines such as Oracle and SQL Server are supported. This fulfills the customer's use case of replicating the SQL Server database to a remote instance for disaster recovery purposes. AWS RDS supports SQL replication, with changed data replicated from your primary database. You can even run read-only replica instances and perform disaster recovery tests to fulfill your business continuity plan (BCP).

Putting it all together

 

I encourage you to read more about the different solutions discussed in this blog post. Feel free to comment.

Some important links

AWS Networking

AWS Migration

AWS RDS

Azure vs AWS IaaS/Networking Comparison

This is a good picture of how AWS and Azure IaaS/networking look side by side – the product comparison is really helpful when you are evaluating both products and pulling your hair out.

This again is high level, but it begins to draw a picture in your head and helps connect the dots.

Enjoy!

(Image: azurevsaws – side-by-side comparison chart)

You can read more at MSDN.

Preview New Products in Amazon AWS

Two products are in preview mode in AWS. This is a brief post about them.

DMS – DMS stands for Database Migration Service. This service allows you to migrate a source database to a target database by means of a replication instance. AWS also allows you to migrate to a different database engine by using the AWS Schema Conversion Tool to generate an appropriate schema for the new database. The replication instances are at the heart of the process; all the conversion takes place there. They are available in multiple sizes, depending on your performance requirements.


Some points to remember.

  1. During conversion, the source database remains fully operational and the process is non-disruptive.
  2. Any delta changes at the source are also fully replicated to the target. The replication instance takes care of this and also performs compression.
  3. There are no hidden charges – you only pay for the size of the replication instance you deploy, regardless of your database size.
  4. You can migrate between similar (homogeneous) databases or between dissimilar (heterogeneous) databases.
  5. DMS can also be used as a disaster recovery product because of its ability to continuously replicate changed data from source to target.

Elastic File System – AWS enters the file storage sector with the Elastic File System (EFS), a file storage service exposed over the NFSv4 protocol. We know that S3 provides object storage and EBS provides block storage for AWS workloads. Now EFS provides file storage, allowing for easy consumption by your workloads.

Some points to remember.

  1. EFS has no base charge and is pay-as-you-go at $0.30 per GB per month. You only pay for what you use.
  2. EFS is mounted directly to your EC2 instance, and multiple EC2 instances can share an EFS mount.
  3. EFS is backed by SSD storage and there is no additional cost for performance gains.
  4. EFS is advertised as offering virtually unlimited growth.
  5. Multiple EC2 instances in different Availability Zones in a region can access the same EFS instance.
  6. An EFS instance is replicated across multiple Availability Zones.
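Since the pricing model is pure pay-as-you-go, estimating a monthly bill is a one-line calculation. The sketch below uses the $0.30/GB-month figure mentioned above; check current AWS pricing before relying on it:

```python
def efs_monthly_cost(gb_used, price_per_gb=0.30):
    """Pay-as-you-go estimate: billed only for storage actually used.

    $0.30 per GB-month is the launch price quoted in this post; the
    current price may differ, so treat this as illustrative.
    """
    return gb_used * price_per_gb


print(efs_monthly_cost(500))  # roughly $150/month for 500 GB
```

There is no provisioned capacity to size up front: if usage drops next month, so does the bill.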

Hope this gave you a good high-level overview of what AWS has up and coming.