An Internet advertising company needed a system to analyze and categorize what content was displayed on sites that hosted ads. This analysis would then be used to customize and control displayed ads, and allow more control over filtering out unsafe content, unverifiable sources, or suspicious advertisements.
There would be a large amount of information to analyze and process, which requires substantial computing resources. Without existing infrastructure to accommodate such a project, this would be a very expensive undertaking; building a dedicated computing center was simply too costly.
Another problem was that the computing load changes dynamically, so server capacity would need continuous adjustment; otherwise machines end up either overworked or idle.
So, with today’s technology, the best way to solve this problem is to use:
Cloud service
What exactly does cloud mean?
Cloud computing is the delivery of computing as a service rather than as a product, so that shared resources, software, and information are provided to computers and other devices as a metered service over a network, rather than coming from local, limited, isolated hardware resources.
That means having access to all the resources you need as you need them; how it is implemented at the hardware level is no longer your concern. You can get any number of virtual machines, storage, etc., at any time and immediately.
So we chose Amazon Elastic Compute Cloud (Amazon EC2).
Cloud scheme
Why Amazon EC2?
1. Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch virtual machine instances with a variety of operating systems, load them with your custom application environment, manage your network’s access permissions, and run your image using as many or as few resources as you decide.
2. Flexible rates. You are in control. You pay only for the time each instance is running. You can shut down all instances that you aren’t using, or you can add new instances if you have an extreme load, and it can all be done automatically. This can help save you money: you don’t need to build your own data center or rent a fixed number of expensive servers for your startup.
3. You can organize your own virtual machine image with pre-set OS, applications, libraries, settings, etc. Your IT department won’t spend a lot of time starting new services – they just provide the initial, preconfigured images.
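To make the “flexible rates” point concrete, here is a rough sketch comparing pay-per-use billing with a fixed fleet sized for peak load. The hourly rate and the load profile are hypothetical numbers chosen only for illustration, not real Amazon prices or data from our project.

```python
def on_demand_cost(instances_per_hour, rate_cents):
    """Cost when you pay only for instance-hours actually used."""
    return sum(instances_per_hour) * rate_cents


def fixed_fleet_cost(instances_per_hour, rate_cents):
    """Cost when a fleet sized for the peak hour runs all the time."""
    peak = max(instances_per_hour)
    return peak * len(instances_per_hour) * rate_cents


# Hypothetical day: 2 instances for 20 quiet hours, 10 for 4 busy hours.
day = [2] * 20 + [10] * 4
RATE_CENTS = 10  # hypothetical price per instance-hour, in cents

if __name__ == "__main__":
    print("on demand:  ", on_demand_cost(day, RATE_CENTS), "cents")
    print("fixed fleet:", fixed_fleet_cost(day, RATE_CENTS), "cents")
```

Under these assumed numbers the pay-per-use bill covers 80 instance-hours, while the fixed fleet is billed for 240.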
Other advantages
Elastic: Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds, or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
Completely Controlled: You have complete control of your instances. You have root access to each one, and you can interact with them as you would any other machine. You can stop your instance while retaining the data on your boot partition and then subsequently restart the same instance using web service APIs. Instances can be rebooted remotely using web service APIs. You also have access to the console output of your instances.
Flexible: You have the choice of multiple instance types, operating systems, and software packages. Amazon EC2 allows you to select a configuration of memory, CPU, instance storage, and the boot partition size that is optimal for your choice of operating system and application. For example, your choice of operating systems includes numerous Linux distributions and Microsoft Windows Servers.
Reliable: Amazon EC2 offers a highly reliable environment where replacement instances can be rapidly and predictably commissioned. The service runs within Amazon’s proven network infrastructure and datacenters. The Amazon EC2 Service Level Agreement commitment is 99.95% availability for each Amazon EC2 Region.
Secure: Amazon EC2 provides numerous mechanisms for securing your compute resources.
Amazon EC2 includes web service interfaces to configure firewall settings that control network access to and between groups of instances.
Load balance
In classic network architecture, a user sends a request to a server; that server then processes the data and sends back a response.
To provide load balancing and scalability, the cloud instead uses a feature called the “task queue” (or message queue). The user sends a request, such as downloading an archive, unpacking it, parsing it, and returning the data to the user, and that request is put into a queue. The queue is continuously monitored, and as soon as an instance has free capacity it takes the next task, processes it, and builds a response. However, if all instances are busy, the user’s task will wait until an instance is freed up. The task will not be forgotten, but under high load it may have to wait, perhaps for a considerable time, until earlier tasks have been processed. See scheme.
Load balance scheme
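The task queue idea can be sketched on a single machine with Python’s standard library. This is only an illustration under assumptions: threads stand in for cloud instances, an in-process queue stands in for the message queue service, and `run_workers` and `fake_parse` are hypothetical names, not part of our system.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=2):
    """Process every task with a fixed pool of workers fed from one queue."""
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()      # blocks until a task is available
            if task is None:             # sentinel: no more work for this worker
                task_queue.task_done()
                return
            outcome = handler(task)
            with lock:
                results[task] = outcome
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)             # tasks wait here while workers are busy
    for _ in threads:
        task_queue.put(None)             # one sentinel per worker
    task_queue.join()
    for t in threads:
        t.join()
    return results


def fake_parse(archive_name):
    # Stand-in for "download the archive, unpack it, parse it".
    return archive_name.upper()


if __name__ == "__main__":
    print(run_workers(["a.zip", "b.zip", "c.zip"], fake_parse))
```

With two workers and three tasks, the third task simply waits in the queue until one of the workers is free, which is the behaviour described above.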
How to ensure automatic scalability?
Your developers don’t need to write thousands of lines of code. For example, a procedure written in Python with the boto library that launches a new instance looks like this:
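import boto

# Connect to EC2 using AWS credentials from the environment or boto config.
ec2 = boto.connect_ec2()
# Create a key pair and save its private key file to a local directory.
key_pair = ec2.create_key_pair('ec2-sample-key')
key_pair.save('/Users/patrick/.ssh')
# Launch one instance from the given machine image, using that key pair.
reservation = ec2.run_instances(image_id='ami-bb709dd2', key_name='ec2-sample-key')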
And that’s all!
When your project needs more resources to process data, your developers can use simple code like this to run new instances and increase performance.
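Deciding when to run new instances can be a small function too. The sketch below derives a target instance count from the number of queued tasks; the policy itself (tasks per instance, minimum and maximum fleet size) and the function names are assumptions made for illustration, not the rule we used in production.

```python
import math


def desired_instances(queued_tasks, tasks_per_instance=10,
                      min_instances=1, max_instances=20):
    """Return how many instances should run for the current queue length."""
    if tasks_per_instance <= 0:
        raise ValueError("tasks_per_instance must be positive")
    needed = math.ceil(queued_tasks / tasks_per_instance)
    # Clamp to the allowed fleet size.
    return max(min_instances, min(max_instances, needed))


def scaling_action(running, queued_tasks, **policy):
    """Positive result: launch that many instances; negative: shut down."""
    return desired_instances(queued_tasks, **policy) - running


if __name__ == "__main__":
    print(scaling_action(running=2, queued_tasks=35))  # launch more
    print(scaling_action(running=5, queued_tasks=0))   # shut some down
```

A background job could call `scaling_action` periodically and then launch or terminate that many instances through the EC2 web service API.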
Result
Amazon EC2 helped us to implement a powerful and flexible system that can increase and decrease its computing capacity according to the current project load (elastic scalability). Because we paid only for the time our instances were running, we saved a lot of money: servers were turned off when we did not need them and turned on only when we did.
As our project grew, we didn’t make any architectural changes or optimizations. We just ran new instances, as needed, to provide the required computing capacity.
(For developers) Sample application: architecture & processes
The application creates preview images of PDF documents for display on a Web site.
The PDF is moved to Amazon S3 from the local Web server. Messages are queued up to process the uploaded PDFs; nodes are booted up in response to the queue count, and the nodes bootstrap themselves with the software necessary to process the PDFs. The nodes then write a result message to a queue that is checked by a background task on the Web server that updates the database with the appropriate record.
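The flow above can be traced with a small in-memory simulation. Everything here is a stand-in chosen for illustration: a dict plays the role of Amazon S3, two deques play the work and result queues, another dict plays the database, and the “preview” is just a placeholder byte string rather than a rendered image. The class and method names are hypothetical.

```python
from collections import deque


class PreviewPipeline:
    """In-memory model of the upload -> queue -> node -> result -> DB flow."""

    def __init__(self):
        self.storage = {}            # stand-in for Amazon S3
        self.work_queue = deque()    # messages for processing nodes
        self.result_queue = deque()  # messages back to the Web server
        self.database = {}           # pdf name -> preview key

    def upload_pdf(self, name, data):
        """Web server: store the PDF and queue a message to process it."""
        self.storage[name] = data
        self.work_queue.append(name)

    def node_process_one(self):
        """Processing node: take one message, write preview and result."""
        if not self.work_queue:
            return False
        name = self.work_queue.popleft()
        preview_key = name + ".preview.png"
        # Placeholder for real PDF rendering.
        self.storage[preview_key] = b"preview-of-" + self.storage[name][:8]
        self.result_queue.append({"pdf": name, "preview": preview_key})
        return True

    def web_server_poll_results(self):
        """Background task: drain result messages into the database."""
        updated = 0
        while self.result_queue:
            msg = self.result_queue.popleft()
            self.database[msg["pdf"]] = msg["preview"]
            updated += 1
        return updated


if __name__ == "__main__":
    pipeline = PreviewPipeline()
    pipeline.upload_pdf("report.pdf", b"%PDF-1.4 example")
    pipeline.node_process_one()
    pipeline.web_server_poll_results()
    print(pipeline.database)
```

In the real system each method would run on a different machine, and the length of the work queue would drive how many nodes are booted.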