Case Studies: Etsy and Netflix

Etsy, a Web site where individuals can sell handmade items, vintage goods, and craft supplies, provides a market for creative members to sell their items online. When people join Etsy, they can post their items under applicable categories, enabling buyers to search for and locate items quickly. Etsy members reside in over 150 countries across the globe. In 2009, the company acquired Adtuitive, a startup Internet advertising company. Adtuitive’s ad server was hosted entirely on Amazon Web Services and served targeted retail ads at a rate of over 100 million requests per month. Adtuitive’s configuration included 50 Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (Amazon EBS) volumes, Amazon CloudFront, Amazon Simple Storage Service (Amazon S3), and a data warehouse pipeline built on Amazon Elastic MapReduce. The pipeline runs a custom domain-specific language that uses the Cascading application programming interface.

Today, Etsy uses Amazon Elastic MapReduce for web log analysis and recommendation algorithms. Because AWS easily and economically processes enormous amounts of data, it’s ideal for the type of processing that Etsy performs. Etsy copies its HTTP server logs every hour to Amazon S3, and syncs snapshots of the production database on a nightly basis. The combination of Amazon’s products and Etsy’s syncing/storage operation provides substantial benefits for Etsy. As Dr. Jason Davis, lead scientist at Etsy, explains, “the computing power available with [Amazon Elastic MapReduce] allows us to run these operations over dozens or even hundreds of machines without the need for owning the hardware.”
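The hourly log copies feed jobs of exactly the shape MapReduce was built for. As a rough, hypothetical sketch (not Etsy’s actual code), a page-hit count over HTTP access logs can be written as a map step that emits (url, 1) pairs and a reduce step that sums them; Amazon Elastic MapReduce’s role is to run this kind of computation in parallel across many machines:

```python
# Illustrative sketch of a MapReduce-style log analysis, reduced to pure
# Python. On Elastic MapReduce the two phases would run distributed.
from collections import defaultdict

def map_phase(log_lines):
    """Map: emit (url, 1) for each request in a common-format access log."""
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 7:        # token 6 is the request path
            yield (parts[6], 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per url to get hit totals."""
    totals = defaultdict(int)
    for url, count in pairs:
        totals[url] += count
    return dict(totals)

logs = [
    '1.2.3.4 - - [10/Oct/2012:13:55:36 +0000] "GET /listing/42 HTTP/1.1" 200 512',
    '5.6.7.8 - - [10/Oct/2012:13:55:37 +0000] "GET /listing/42 HTTP/1.1" 200 512',
    '9.9.9.9 - - [10/Oct/2012:13:55:38 +0000] "GET /search HTTP/1.1" 200 1024',
]
hits = reduce_phase(map_phase(logs))
print(hits)  # {'/listing/42': 2, '/search': 1}
```

The same two-function shape scales from three log lines to the hourly S3 dumps Etsy describes, which is what makes the model attractive for web analytics.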

Dr. Davis goes on to say, “Amazon Elastic MapReduce enables [us] to focus on developing [our] Hadoop-based analysis stack without worrying about the underlying infrastructure. As [our] cycles shift between development and research, [our] software and analysis requirements change and expand constantly, and [Amazon Elastic MapReduce] effectively eliminates half of [our] scaling issues, allowing [us] to focus on what is most important.”

Etsy has realized improved results and performance by architecting its application for the cloud, with robustness and fault tolerance in mind, while providing a market for users to buy and sell handmade items online.

How Cloud Computing Helped Netflix Emerge as a Streaming Media Powerhouse

Netflix may be getting a lot of bad press in recent times, owing to its management’s ill-advised decision to raise subscription rates by almost 50%, which resulted in widespread customer dissatisfaction and a groveling apology from CEO Reed Hastings; but not long ago it was considered the epitome of home entertainment. Netflix is another new-age company that owes its success to cloud computing, the same way that Zynga, the creator of Facebook game sensation Farmville, does (See: Zynga, the Latest Cloud Computing Success). Not surprisingly, for both of them the cloud provider of choice is Amazon, perhaps the earliest player in the game.

Although Netflix began life as a DVD-by-mail service in 1997, it was with the introduction of the on-demand streaming service that it saw a huge expansion of its customer base. In fact, when it crossed 10 million subscribers in 2009, it “attributed the recent surge in subscribers to growing consumer recognition of the value and convenience offered by Netflix and increasingly more ways to instantly watch a growing library of movies and TV episodes from Netflix on PCs, Intel-based Macs and TVs.”

Not surprisingly, this model was soon adopted by many other providers like Fox’s Hulu, Amazon and even Google, which created a paid version of YouTube content. Now, running such a service required a level of flexibility, resource optimization and redundancy that traditional data centers were ill-equipped to provide (See: Virtualization: The Virtual Way to Real-World Benefits). That is why Netflix today relies almost exclusively on cloud services for its infrastructure.

This point was reiterated by Netflix Cloud Security Architect Jason Chan in his presentation “Practical Cloud Security” at the United Security Summit in San Francisco. During his presentation, Chan articulated the advantages that being on the cloud provided Netflix, advantages that were not possible with traditional IT infrastructure.

Chan explained that in a traditional data center, applications are long-lived, code is pushed to running systems, and it can be difficult to enforce deployment patterns such as patches. In the cloud, however, new versions are written that replace the old ones entirely with new instances, eliminating the need for patches. Also, while repeatable tasks such as adding a user account, changing firewall configurations or performing forensic analysis previously required multiple steps and interfacing with multiple systems, “these tasks are a simple API call with cloud.” Moreover, with systems being added to groups that control their connectivity, “there’s no one chokepoint” like the traditional firewall.
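The “replace, don’t patch” pattern described here can be sketched in a few lines. Everything below is illustrative, not Netflix’s code: the fleet is a plain Python list standing in for a cloud API, but it captures the key idea that a release swaps in fresh instances rather than modifying running ones:

```python
# Illustrative sketch of immutable deployment: a new version replaces
# running instances wholesale instead of being patched onto them.
def deploy(fleet, new_version, size):
    """Launch a fresh fleet at new_version, then retire the old one."""
    new_fleet = [f"{new_version}-instance-{i}" for i in range(size)]
    old_fleet = fleet[:]      # nothing is ever modified in place
    fleet[:] = new_fleet      # traffic cuts over to the new instances
    return old_fleet          # old instances are terminated, not patched

fleet = ["v1-instance-0", "v1-instance-1"]
retired = deploy(fleet, "v2", 3)
print(fleet)    # ['v2-instance-0', 'v2-instance-1', 'v2-instance-2']
print(retired)  # ['v1-instance-0', 'v1-instance-1']
```

Because no instance is ever mutated, there is no drift between what was tested and what is running, which is why the patching problem disappears.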

“The key lesson learned is you have to leave the old ways behind,” Chan said. However, he noted that moving to the cloud did introduce some specific security concerns that had to be addressed. With Amazon having launched a similar service in direct competition with its own customer Netflix (See: Is Amazon’s Cloud Player a Game Changer in the Music Industry?), this space should see some interesting developments in the near future.


Amazon Web Services: Business Innovation in the Cloud

Amazon Web Services (AWS) is making the case for its cloud platform as a driver of business innovation, saying that as the cost of using its infrastructure falls, so does any risk associated with a new venture.

The firm also argues that AWS is now a mature and robust enough platform for enterprise workloads, citing some customers using its infrastructure to operate even mission-critical applications.

At the AWS Summit in London, chief technology officer Werner Vogels said the cloud platform has had a fundamental impact on how IT has evolved since it launched in 2006. He stressed the firm’s commitment to openness and value as reasons for the success of AWS.

“We do not lock you in to any type of technology. You can choose any operating system and any application; you can run them all on AWS. There is no contract to force you to be our customer for, say, five years, and this means we need to be on our toes – if you’re not satisfied, you can just walk away,” he said.

As Amazon continues to expand, this drives economies of scale and cuts costs, which the firm passes on to customers to keep them happy, with some customers seeing a 40 percent reduction in their bills at the start of 2012. But this also helps to ensure customer success, according to Vogels.

“If we can get the cost of computing down low enough that you don’t need to worry about it, then the type of new applications we can help create will be enormous. Our aim is to make infrastructure so cheap that it will drive innovation,” he said.

Vogels claimed that economics rather than technology is driving cloud uptake, with customers realising that they can gain access to IT resources quickly without any purchase cost, and only pay for what they use.

“You increase innovation when the cost of failure approaches zero, and so you can stop wasting money on IT, and spend it on the things that really matter for your business – building better products,” he said.

Amazon: cloud computing driving business innovation

Meanwhile, Amazon’s chief information security officer Stephen Schmidt detailed some of the lengths the firm goes to in order to safeguard customer data. These include ongoing security vetting of staff, restricting access to the AWS infrastructure to key staff, and logging all access.

The firm is clearly doing something right, as it was able to line up several customers at the AWS Summit which are increasing their use of the Amazon cloud platform.

News International chief information officer Chris Taylor said that AWS now provides about 20 percent of the organisation’s total compute power, and this is likely to expand.

“We’ve virtualised 90 percent of our infrastructure now, and will soon be at 100 percent. After that, the desire is to move it to the cloud,” he said.

AWS gave News International early access to its DynamoDB cloud database service to power the access control system behind the publisher’s paywall, according to Taylor.

DynamoDB is a fast, fully managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. All data items are stored on Solid State Drives (SSDs), and are replicated across 3 Availability Zones for high availability and durability.

With DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use.
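To make the key-value model concrete, here is a small in-memory stand-in (not the real DynamoDB API, which is called over the network): items are stored and fetched by primary key, which is exactly the access pattern an entitlement lookup behind a paywall needs. The table and field names are hypothetical:

```python
# In-memory stand-in for DynamoDB's key-value model. Every item lives
# under a primary key, and reads/writes are simple calls rather than
# SQL queries -- the property that lets a managed service scale them.
class MiniTable:
    def __init__(self, key_name):
        self.key_name = key_name
        self.items = {}

    def put_item(self, item):
        """Store an item under its primary-key attribute."""
        self.items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        """Fetch an item by primary key, or None if absent."""
        return self.items.get(key)

# Sketch of the paywall use case: look up a subscriber's entitlements
# by account id on every request.
table = MiniTable("account_id")
table.put_item({"account_id": "u-1001", "plan": "digital", "active": True})
print(table.get_item("u-1001")["plan"])  # digital
```

Because each request touches a single key, the workload partitions cleanly across machines, which is how a handful of front-end instances can drive tens of millions of such lookups per month.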

“That allows us to deal with over 45 million transactions per month from just two mid-size EC2 instances,” he claimed.

The Dutch central bank has also approved AWS as an outsourcing platform for the financial services industry, according to Amazon. Vogels also offered some predictions for future trends, including real-time analytics and use of encryption to protect data becoming routine.

“Analytics is still in flux. In the next two years, many products will go real-time, with feedback loops based on real-time analysis of data,” he said.

Meanwhile, Hadoop and MapReduce will become invisible, according to Vogels.

“Expect to see many layers built on Hadoop to make it more efficient and easier to use, with richer environments built on top of Hadoop and MapReduce.”

Vogels promised that customers can expect to see new encryption tools from AWS in the near future to help protect data.

“Encryption is going to be the most important tool to use in the coming years. You should be using it not just in the cloud, but also on-premise to protect your customers,” he said.

Innovation Types with Cloud Computing

A commonly held belief is that cloud computing—a utility model for computing capacity, software and business functionality—is a phenomenon whose value resides primarily in reducing IT costs. In fact, the flexibility that the cloud makes possible for infrastructures, services and processes means that it is also capable of driving significant innovation.

This is a key finding of new research from the London School of Economics and Accenture, based on a survey of 1,035 business and IT executives, as well as in-depth interviews with more than 35 service providers and other stakeholders.

The innovation trajectory of the cloud will be cumulative. Beginning first with technology and operational changes, its effects will then be felt at the business process level, changing the way companies operate and serve customers. It will be capable of delivering market innovations that enhance existing products and services, create new ones and enable entry into new markets. Finally, the cloud will support new ways of designing corporations themselves.

Reducing the friction: Operational innovation

One of the key ways that cloud computing supports operational and technological innovation is by moving an organization more briskly through the experimental or prototyping stages—or, as some of our interviewees put it, by “reducing the friction” of development. In a cloud model, companies acquire processing, storage or services when they need them, then can quickly decommission those resources when they are not needed.

Such a model supports “seed and grow” activities and faster prototyping of ideas. With traditional IT models, a decision to prototype a new system generally involves the procurement and installation of expensive hardware, with the associated checks and delays that conventional purchasing requires. Cloud provisioning, on the other hand, can be implemented rapidly and at low cost.

That means the cloud can also reduce the risks of operational innovation. Projects and processes that would have been too risky to attempt if they required a large capital investment become worth attempting if unsuccessful experiments can be decommissioned easily.

Language of the business: Process innovation

A distinctive feature of cloud computing is its ability to hide the technical complexity of solutions. The acquisition and deployment of IT becomes almost secondary. Companies are actually deploying a process or service, and that means business and IT executives need no longer try to communicate across a technology gap. They can speak a common language about what the business seeks to do and how it intends to do it.

Steve Furminger, group chief technology officer of global digital marketing agency RAPP UK, underscores this point by noting that the cloud is providing his company with the ability to produce solutions more rapidly without needing to be concerned at such a detailed level about how they are going to do it from a technical perspective: “Just a few years ago, that was a massive concern. Now we can almost forget the technology and just think in terms of what we want to do from a business perspective.”

To illustrate this point, consider an organization’s desire to innovate within its processes and technologies related to sales support—the ability to track contacts, manage and convert the sales pipeline, and generate revenue. With older models of IT solutions, a company would be restricted to particular packages or platforms—forced, in other words, to change its processes to match the software. With a cloud model, companies can think about processes at a level that is more detailed and personalized to their individual needs, but the solution will not need to be customized in older, prohibitively expensive ways. The company could provision a combination of software as a service for sales, along with an enterprise system or financial management system. Sales personnel could have access to specialized sales support over the cloud.

This ability to envision new combinations of cloud-based solutions and create new ways of performing end-to-end processes presents companies with new opportunities to be innovative in new-product development as well as in service and support.

Alignment of development cycles: Business innovation

Information technologies from their earliest days have represented enormous potential to deliver game-changing innovations. Although IT has become the lifeblood of the modern corporation, the path from point A (IT innovation) to point B (business value) has often been a tortuous one.

In part, this is because IT capability cycles and business demand cycles have rarely coincided. It has often taken up to 10 years for a new, major IT capability (client/server computing, for example) to be fully realized in terms of business value, while most businesses operate on near-term planning horizons. But the short development times made possible by cloud computing mean that business and technology have reached a fortuitous point in both cycles where they intersect.


This alignment enables IT innovation to more effectively drive business innovation. Service providers must maintain constant relevance. As Tim Barker, vice president of strategy for Europe, the Middle East and Africa, notes, subscriptions or renewals are due every quarter or even every month. This supports the alignment of a company’s entire business to the success of that project and the success of the customer.

Although moving to the cloud may be disruptive to the existing IT function, it enables CIOs to have meaningful answers to board-level questions about the value being delivered by the current organizational IT environment, including how much it costs and how quickly new services can be provisioned.

The next phase: Innovation in business design

For especially forward-looking companies, cloud computing may provide a platform for radical innovation in business design—to the point where executives are actually provisioning and decommissioning parts of the business on an as-needed basis.

This is a step beyond software as a service or infrastructure as a service, and amounts to the offering of “business processes as a service”—configured business services and processes provided from the cloud. These would be assessed not just through typical service-level agreements but against key business performance indicators.

Although extremely promising in concept, the idea of adaptive business design heightens the importance of the integrator role. The traditional systems integrator would become, in effect, a business integrator charged with managing complex collaborations across a broader ecosystem of internal resources, partners, vendors and others. As business design or business architecture innovators, integrators would connect and manage business services in configurations that change as business needs and goals change.

Such collaborative, innovative relationships hint at a new agile and adaptive organizational form. Knowing what such a corporation might look like is difficult, but we can see glimpses of it by looking beyond the business community to the organization of particle physicists working at CERN on the Large Hadron Collider (LHC).

To handle the staggering 15 million gigabytes of data produced every year by the LHC’s experiments, there was a need to create a global organization of collaborators. More than 140 computer centers (each part of a university or research facility) work together to pool their processing resources into a grid computing infrastructure. A globally distributed platform—based on cloud technologies and run as a service—was developed. It is managed collectively by a loosely organized confederation of physicists and their data centers.

Organizational governance in this environment evolves to match the challenges and opportunities. The new organization connects the computer centers through loose memoranda of understanding and business processes, particularly around support, data analysis and technology upgrades. The bureaucratic hierarchies are limited in scope and power, and most work is achieved through collaboration among equals. This structure provides a kind of first look at how an agile, innovative global organization can be created when founded upon collaboration and shared cloud-based technology.

Some companies will perform better than others, of course, when it comes to harnessing cloud-based innovation. Organizational readiness will be key—that is, the ability of the corporate culture and leadership to recognize innovation-based opportunity and move quickly—as will implementation abilities. Above all, companies who are able to collaborate across a wider ecosystem of internal and external players will be at an advantage in capitalizing on the responsiveness and agility that the cloud delivers to the business.

Infrastructure as a Service (IaaS) – Startups requiring the power only supercomputers can provide are able to deploy the resources of massive data centers without one dime in capital investment. With funding from family and friends, Animoto was started by some young techies who had worked for MTV, Comedy Central and ABC Entertainment and knew how to make professional-quality video animations. Now their Cinematic Artificial Intelligence technology, which thinks like an actual director and editor, and their high-end motion design bring those capabilities to anyone wanting to turn their photos or videos into MTV-like videos. At one point, aside from some monitors and an espresso machine, Animoto had few actual assets. That’s because everything, including server processing, bandwidth and storage, is handled by cloud computing, a pay-as-you-use model. So when the Animoto application launched on Facebook, causing the number of users to soar from 25,000 to 750,000 in four days and requiring the simultaneous use of 5,000 servers, business carried on as usual. Without the ability to handle a spike like that, their business couldn’t exist. Meanwhile, it’s not just youngsters using IaaS. The New York Times processed four terabytes of data through a public cloud by simply using a credit card to get a new service going. In a matter of minutes it converted scans of more than 15 million news stories into PDFs for online distribution—for $240! Look, Ma, no New York Times IT infrastructure needed. Both Animoto and the New York Times observed new opportunities made possible in the Cloud, and acted decisively.
Platform as a Service (PaaS) – With PaaS, software developers can build or “mashup” Web software without installing servers or software on their computers, and then deploy those software applications without any specialized systems administration skills. PaaS service providers not only incorporate traditional programming languages but also include tools for mashup-based development, meaning that deep IT skills are not needed to build significant software. The implications for business innovation center on rapid development and rapid testing via multiple OODA Loops in the Cloud, making it possible to bring new products and services to market without the traditional 18-month IT development cycle or capital expenditures. Innovations that don’t pan out can be shut down, allowing a company to fail early, fail fast. Remember, innovation must allow for failure, else nothing really new is being done. On the other hand, innovations that prove successful can be scaled up to full Web scale in an instant. In short, PaaS takes traditional IT software development off of the critical path of business innovation.
Software as a Service (SaaS) – With SaaS we are witnessing a huge shift from IT to BT (Business Technology). In the past, IT was about productivity. But now, BT is about collaboration, a shared information base and collective intelligence (the wisdom of crowds, social networking and crowdsourcing). SaaS is the delivery of actual end-user functionality, either as “services” grouped together and orchestrated to perform the required functionality or as a conventional monolithic application (e.g., CRM, ERP or SCM). The real driver for SaaS isn’t the traditional IT application; it’s the “edge of the enterprise” where business users require a flexible model to deploy new technologies to improve front-office performance. As a growing number of business units tap SaaS offerings without going through their central IT department, we have the advent of “Shadow IT.” The key significance is that while IT has a major role in the enterprise back office (transaction processing and systems of record), these new requirements are directly associated with “go-to-market” activities and will be subject to constant change via OODA Loops. These new requirements must be met very quickly for competitive purposes; some are likely to endure for only a few months; and their costs will be directly attributed to the business units consuming the needed “services” and paying as they go.
Now consider operational innovation inside a huge company like GE, blending internal clouds and going beyond the firewall to reach out to suppliers in the Cloud. GE’s supply chain is huge, including 500,000 suppliers in more than 100 countries that cut across cultures and languages, buying up $55 billion a year. GE wanted to modernize its cumbersome home-grown sourcing system, the Global Supplier Library, build a single multi-language repository, and offer self-service capabilities so that suppliers could maintain their own data. So did CIO Gary Reiner and team start programming? The short answer is “no.” GE looked to the Cloud for a solution. GE engaged SaaS vendor Aravo to implement its Supplier Information Management (SIM) SaaS, which would ultimately become the largest SaaS deployment to date. GE is deploying Aravo’s SaaS for 100,000 users and 500,000 suppliers in six languages. When GE goes outside its firewall to innovate, you can bet that other CEOs will be asking their CIOs lots of questions about harnessing the Cloud for operational innovation.
BPM as a Service (BPMaaS) – Business Process Management (BPM) is what sets “enterprise cloud computing” apart from “consumer cloud computing.” Because the average end-to-end business process involves over 20 companies in any given value chain, multi-company BPM is essential to business innovation and maintaining competitive advantage. Bringing BPM capabilities to the Cloud enables multiple companies to share a common BPM environment and fully participate in an overall end-to-end business process. BPMaaS can be implemented as a “horizontal” Business Operations Platform (BOP) that has a Business Process Management System (BPMS) at its heart. This is similar to PaaS, but rather than programming tools being accessed, the BPMS is accessed for full process lifecycle management and specific process services such as process modeling and business activity monitoring. For example, using a Business Operations Platform from Cordys, Lockheed Martin has deployed a Cloud-based Collaborative Engineering system to orchestrate the work of hundreds of subcontractors that have disparate product lifecycle management (PLM) and CAD/CAM systems. This represents one of the world’s most complex enterprise computing environments now being addressed by cloud computing. Meanwhile, Dell, Motorola, Boeing, Avon, Panasonic, IBM and other multinationals use e2Open’s Business Network to provide complete demand and supply chain management in the Cloud.
Nowhere is the OODA Loop more applicable than in supply chain management, especially if you consider the massive disruptions that resulted from the tsunami in Japan or the need to bring new products and services to market with great speed. While BPMaaS can enable companies to manage business processes more efficiently, its real business innovation impact is that it can also empower entirely new business models that dynamically integrate demand-supply chain partners into virtual enterprise networks that offer compelling value to customers. Jasmine Young, a Facilitator at the Haas School of Business Institute for Business Innovation, summarized, “The Cloud is about leverage, the way credit is leverage in the financial industry. Businesses need to think about how they can leverage their suppliers and partners—and customers. And that’s how the case toward innovation in the Cloud can best be driven.” By aggregating more and more offerings for their customers, industry boundaries become blurred as smart competitors enter markets outside their primary industries. ExxonMobil is in the gourmet coffee business. Starbucks is in the Internet business. Wal-Mart is in retail banking. Microsoft is in the telephone business with its acquisition of Skype.

For now, let’s just leave it that all this OODA Loop activity happens in the Cloud, for it’s not Industrial Age assets that must be managed; it’s digital immediacy and the weaving of a digital tapestry among our customers and trading partners that counts in the 21st Century business innovation dogfights.
Jason Stowe, founder and CEO of Cycle Computing, describes his experience as an entrepreneur who built his business on the cloud and offers others the chance to do the same. Cycle delivers high-bandwidth supercomputer capabilities to scientific, engineering and technical firms — many of which are startups. “Any size organization can now tap into supercomputing power, from big companies to start-ups to individual researchers,” he says. He even coined a term for what his firm is offering: “utility supercomputing.” Essentially, thanks to the cloud, Cycle can make supercomputing power available to the masses. And lots of startups and small businesses are taking advantage of this relatively new cloud resource. Stowe gives examples: a chip design firm runs simulations of its digital circuits on his firm’s CycleCloud clusters. Researchers at a bioinformatics start-up use Cycle’s cloud to index and query genomics data to help fight disease. A young, up-and-coming scientific instrument company uses Cycle’s clusters to process the high volume of data that comes off their products. “In these cases, start-ups can focus on their core competency while still accessing a supercomputer that only Fortune 100s could build and operate before,” says Stowe. Many of the startups he works with would not have been able to get off the ground without cloud offerings such as those Cycle provides. “Science-heavy start-ups would require much larger capital investments to get off the ground if they didn’t take advantage of cloud and utility supercomputing offerings,” says Stowe. “For example, 30,000-core cluster for top-five pharma would have cost $5 to $10 million and about six months to build.” With Cycle’s cloud offering, the project took eight hours to implement, at a cost of about $10,000.

Cloud Computing Now Makes It Easier (and Cheaper) to Innovate: Study

Innovate. It’s the new mantra of organizations large and small as they attempt to navigate and get the upper hand in today’s hyper-competitive and unforgiving global economy.

But innovation is not cheap. It can be extremely risky, since a relatively small percentage of innovations actually deliver results in the end. The challenge is figuring out where to invest wisely, and which innovation is the potential winner. The natural reflex in the business world has been to avoid going overboard with innovation, since it means sinking considerable time and resources into ideas that don’t get off the ground.

However, cloud computing technology may be clearing the way to turn formerly hidebound businesses into innovation factories.  That’s because it now offers a low-cost way to try and fail with new ideas. In essence, the price of failure has suddenly dropped through the floor. Failure has become an option.

A recent survey of 1,035 business and IT executives, along with 35 vendors, conducted by the London School of Economics and Accenture, has unearthed this new emerging role for cloud computing — as a platform for business innovation. Many people these days still see cloud within its information technology context, as a cheaper alternative for existing systems. But this may only be the first and most obvious benefit.

The study’s authors, Leslie Willcocks, Dr. Will Venters and Dr. Edgar Whitley — all of the London School of Economics and Political Science — identified three stages cloud computing moves through as it’s adopted by organizations:

1) Technology and operational changes. The one-for-one exchange of traditional applications and resources for those offered as services through the cloud — such as Software as a Service.

2) Business changes. Altering the way companies operate and serve customers, such as enabling faster service, faster time to market.

3) New ways of designing corporations themselves. “For especially forward-looking companies, cloud computing may provide a platform for radical innovation in business design—to the point where executives are actually provisioning and decommissioning parts of the business on an as-needed basis,” the study’s authors observe.

It’s in that third phase where things get really interesting. Cloud computing, the authors point out, enables companies to quickly acquire processing, storage or services as needs dictate. They can just as quickly shed those resources when a project is completed. As a result, companies with more advanced cloud adoption are able to move rapidly through experimental or prototyping stages:

“Such a model supports “seed and grow” activities and faster prototyping of ideas. With traditional IT models, a decision to prototype a new system generally involves the procurement and installation of expensive hardware, with the associated checks and delays that conventional purchasing requires. Cloud provisioning, on the other hand, can be implemented rapidly and at low cost.”

An example, cited in the study, is an effort to innovate within processes and technologies related to sales support—for example, tracking contacts, managing and converting the sales pipeline, and generating revenue. Change would be difficult, if not impossible, for processes locked into traditional on-site IT systems. Consider the possibilities with cloud:

“The company could provision a combination of software as a service for sales, along with an enterprise system or financial management system. Sales personnel could have access to specialized sales support over the cloud. This ability to envision new combinations of cloud-based solutions and create new ways of performing end-to-end processes presents companies with new opportunities to be innovative in new-product development as well as in service and support.”

A couple of years back, Erik Brynjolfsson and Michael Schrage made similar observations about technology’s promise to lower the costs and risk of innovation in an article in MIT Sloan Management Review. It’s all about the power of online real-world simulations and samplings, which reduce the cost of testing new ideas to pennies. For example, with a Website, “companies can test out a new feature with a quick bit of programming and see how users respond. The change can then be replicated on billions of customer screens.” This capability can be extended to supply chain management and customer relationship management systems as well.

Implementation of new ideas is blindingly fast, Brynjolfsson and Schrage stated. “When a company identifies a better process for screening new employees, the company can embed the process in its human-resource-management software and have thousands of locations implementing the new plan the next morning.” Brynjolfsson and Schrage also predicted that thanks to technology, many companies will shift from conducting two or three real-world experiments to 50 to 60 a year.

“Technology is transforming innovation at its core, allowing companies to test new ideas at speeds—and prices—that were unimaginable even a decade ago,” they said.  “Innovation initiatives that used to take months and megabucks to coordinate and launch can often be started in seconds for cents.”

We’ve already seen technology shave tremendous time and cost in areas such as energy exploration and engineering. But now the ability to quickly test and deploy new innovations is available to all types of businesses. Add the ability to provision those workloads to on-demand cloud resources, and a huge weight — in cost and risk — has been lifted off innovation.

Over the past several months, Forbes has published a number of pieces in this space about Oracle’s adventures in the wild, wild world of cloud computing. These pieces have touched on the business value the cloud can create, the profound impact the cloud is having within companies and across entire industries, and the vast potential the cloud offers for liberating IT dollars to fund innovation.

The cloud has the potential to help companies shift their IT spending away from infrastructure and integration and toward innovation, revenue growth, and customer engagement. Unless CIOs and their colleagues can get this IT-spending ratio under control, their companies will be unable to fund innovative new initiatives, and their ability to compete will decline. Cloud computing is the best way to begin attacking that problem.

The Cloud Revolution and Creative Destruction. That’s the central issue: what customers are doing with cloud solutions, how they’re able to free up more funding for growth and innovation, and how they’re able to adapt more rapidly and effectively to life in our customer-driven world. They’ll begin to see the real creative-destruction power of the cloud unleashed when they begin to define the cloud in terms of what business customers want and need, and when they stop diddling around with inside-baseball constructs that mean little or nothing to the businesspeople who are ready to spend many tens of billions of dollars on cloud solutions that focus on and deliver business value.

Google App Engine

Why App Engine

Google App Engine enables you to build web applications on the same scalable systems that power Google applications. App Engine applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow. With App Engine, there are no servers to maintain: You just upload your application, and it’s ready to serve to your users.

App Engine is a complete development stack that uses familiar technologies to build and host web applications. With App Engine you write your application code, test it on your local machine and upload it to Google with a simple click of a button or command line script. Once your application is uploaded to Google we host and scale your application for you. You no longer need to worry about system administration, bringing up new instances of your application, sharding your database or buying machines. We take care of all the maintenance so you can focus on features for your users.
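As a sketch of how little configuration this deployment model involved, a minimal app.yaml for the (since-retired) Python 2.7 runtime looked roughly like this; the application ID and script name are placeholders:

```yaml
# Minimal App Engine configuration (Python 2.7-era runtime).
# "your-app-id" and "main.app" are placeholders.
application: your-app-id
version: 1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*          # route every URL...
  script: main.app  # ...to the WSGI app object in main.py
```

Uploading the directory containing this file was essentially the entire deployment step; Google handled provisioning and serving from there.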

You can create an account and publish an application that people can use right away at no charge from Google, and with no obligation. When you need to use more resources, you can enable billing and allocate your budget according to your needs. Find detailed pricing for usage that has exceeded the free quota on our Billing page.

Automatic Scalability

For the first time, your applications can take advantage of the same scalable technologies that Google applications are built on, such as BigTable and GFS. Automatic scaling is built into App Engine; all you have to do is write your application code, and we’ll do the rest. No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs.

The Reliability, Performance, and Security of Google’s Infrastructure

Google has a reputation for highly reliable, high-performance infrastructure. With App Engine you can take advantage of the ten years of experience Google has in running massively scalable, performance-driven systems. The same security, privacy and data protection policies we have for Google’s applications apply to all App Engine applications. We take security very seriously and have measures in place to protect your code and application data.

Currently, Google App Engine supports Java, Python, PHP, and Go. Additionally, your website templates can include JavaScript along with your HTML which, among other things, allows you to write AJAX-enabled web applications.

Google App Engine made a splash when it launched in the spring of 2008. It was different from most other cloud systems of the day because it was neither IaaS (Infrastructure-as-a-Service, e.g., Amazon EC2) nor SaaS (Software-as-a-Service, e.g., Salesforce). It was something in between, and it ushered in the era of PaaS (Platform-as-a-Service). Instead of a fixed application (SaaS) or raw hardware (IaaS), App Engine managed the infrastructure for its users. Furthermore, it provided a development platform: users get to create their own apps rather than use one provided by the cloud vendor, while leveraging Google’s infrastructure as a hosting platform.

The development-to-release cycle is minimized because high-level services that developers would normally have to build are already available via an App Engine API. A development server is provided to let users test their code (with certain limitations) before running in production. And finally, deployment is simplified, as Google handles all of that. Outside of setting up an account and billing structure, there is no machine setup or administration, as Google takes care of those logistics too. Even as your app runs with fluctuating network traffic, App Engine auto-scales, allocating more instances of your app as needed and similarly releasing resources when they are no longer needed.

A web developer can now use Google’s infrastructure, finely tuned for speed and massive scaling, instead of trying to build it themselves. In the past, developers would create an app, generally need a machine or web hosting service that could host a LAMP stack, administer each of the “L,” “A,” “M,” and “P” components, and somehow make the app globally accessible. Moreover, developers were also generally responsible for the load-balancing, monitoring, and reporting of their systems, and, to reiterate, one of the most difficult and expensive things to build yourself: scaling. All of these are taken care of by App Engine.
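Part of what made this shift possible is that a Python App Engine app was simply a standard WSGI application; the platform supplied the server, routing, and scaling. A minimal sketch of that shape, using only the standard library (no App Engine SDK involved):

```python
from wsgiref.util import setup_testing_defaults

# Minimal WSGI application -- the basic shape of a Python web app on
# App Engine. The platform, not the developer, supplies the server.
def application(environ, start_response):
    body = b"Hello from the cloud!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app directly, capturing status and body (test helper)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Because the app is just a callable, any WSGI-compliant host can run it; App Engine's value was in handling everything around that callable.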

By now, you have a good idea as to why Google developed App Engine: to put it simply, to remove the burden of being a system administrator from the developer. Using a LAMP stack involves choosing a distribution of Linux and a kernel version, then configuring PHP and an Apache web server. There is also the need to run and manage a database server (MySQL or otherwise) and other aspects of a running system (monitoring, load-balancing, reporting). The list continues with managing user authentication, applying software patches and performing upgrades, each of which may break your app, bringing even more headaches for developers/sysadmins.

Other than the actual application, everything else is nearly orthogonal to the solution that developers are trying to create for their customers. App Engine attempts to handle these complexities to let you focus on your app(s). An app running on App Engine should be easy to build, manage, and scale.

PaaS, and What App Engine Isn’t

Some users confuse Google App Engine with Amazon’s EC2 service, but this is an apples-to-oranges comparison. The two operate at different cloud service levels, and each has its strengths and weaknesses. With App Engine, you only need to worry about your application and let Google take care of hosting and running it for you. With EC2, you’re responsible for the app, but also its database server, web server, operating system, monitoring, load-balancing, upgrades, etc. This is why the costs of IaaS services typically run lower than those of PaaS services: with PaaS, you’re “outsourcing” more work and responsibility. Cost estimates are often clouded by failing to account for the administration overhead of managing the infrastructure yourself. A better apples-to-apples comparison would be EC2 to the Google Compute Engine IaaS service.

PaaS systems also differ from that of the SaaS layer above, as SaaS applications are fixed and must be taken directly from the respective cloud vendor. Unless you work with or at the vendor, you cannot change the SaaS application you use. It’s quite the opposite with PaaS systems because you (as a PaaS user) are the developer, building and maintaining the app, so the source code is your responsibility. One interesting perspective is that you can use a PaaS service to build and run SaaS apps!

Language Runtimes

App Engine lets engineers use familiar development tools and environments to build their applications. This includes the Python, Java, and Go languages. Because App Engine supports Java, a host of other languages which run on the Java Virtual Machine (JVM) are also supported… these include (but are not limited to): Scala, Ruby (JRuby), Groovy, PHP (Quercus), JavaScript (Rhino), and Python (Jython). (If you’re curious about the Jython support (running Python on the Java runtime vs. the pure Python runtime), it’s a great way to leverage a legacy Java codebase while developing new code in Python as Jython code can work with both Python and Java objects.)


The Sandbox

Security is critically important. Developers (typically) would not want other applications or users to get any kind of access to their application code or data. To ensure this, all App Engine applications run in a restricted environment known as a sandbox.

Because of the sandbox, applications can’t execute certain actions. These include opening a local file for writing, opening a socket connection, and making operating system calls. (There used to be more restrictions, but over time, the team has tried to raise quotas and remove as many restrictions as it can. These changes don’t make the airwaves as much as bad news does.)


Any network developer might say, “Without being able to support two-way socket connections, you can’t create useful applications!” The same may be true of the other restrictions. However, if you think about it carefully, why would you want to use these lower-level OS features? “To make useful apps with!” You want outbound sockets to talk to other processes, and perhaps inbound sockets to listen for service requests.

The good news is that the App Engine team knows what you want, which is why it has created a set of higher-level APIs and services for developers to use. Want your app to send and receive e-mail or instant messages? That’s what the e-mail and XMPP APIs are for. Want to reach out to other web applications? Use the URLfetch service. Need Memcache? Google has a global Memcache API. Need a database? Google provides both its traditional scalable NoSQL datastore and access to the relational, MySQL-compatible Google Cloud SQL service.
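To illustrate the get/set pattern these caching services follow, here is a toy in-process cache with roughly the shape of a memcache client. This is a stand-in sketch for illustration only, not the real google.appengine.api.memcache library:

```python
import time

# Toy in-process cache illustrating the memcache-style get/set pattern.
# NOT the App Engine Memcache API -- just the same calling shape.
class ToyMemcache:
    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value, ttl=None):
        """Store a value, optionally expiring after ttl seconds."""
        expiry = time.time() + ttl if ttl else None
        self._store[key] = (value, expiry)

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if expiry is not None and time.time() > expiry:
            del self._store[key]  # lazily evict expired entries
            return None
        return value
```

The real service adds what this sketch cannot: the cache is shared across every instance of your app, so a value set by one instance is visible to all the others.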

The list of all the services that are available to users changes quite often as new APIs are created. At the time of this writing, the following services/APIs are available to developers:

  • App Identity
  • Appstats
  • Backends
  • Blobstore
  • Capabilities
  • Channel
  • Cloud SQL
  • Cloud Storage
  • Conversion
  • Cron
  • Datastore
  • Denial-of-Service
  • Download
  • Federated Login (OpenID authentication)
  • Files
  • (Full-Text) Search
  • Images
  • LogService
  • Mail
  • MapReduce
  • Matcher
  • Memcache
  • Namespaces/Multitenancy
  • NDB (new database)
  • OAuth (authorization)
  • Pipeline
  • Prospective Search
  • Task Queue (Pull and Push)
  • URLfetch
  • Users (authentication)
  • WarmUp
  • XMPP

You can read more about most of these APIs in the official API docs pages. (Docs for the others are available, but not on this page.) Also, the Google App Engine team is constantly adding new features, so keep your eyes on the Google App Engine blog for announcements of new and updated services and APIs.


One of the benefits of choosing to host your apps on PaaS systems is being freed from administration. However, this means giving up a few things: you no longer have full access to your logs, nor can you implement custom monitoring of your app (or your system). This is further constrained by the sandboxed runtime environment mentioned above.

To make up for some of this lack of access to application and system information, Google has provided various tools for you to gain a better insight into your app, including its performance, traffic, error rate, etc.

The first tool is an administration console. (App Engine actually provides two “admin consoles.”) A fully-featured version serves your application running in production, while a lightweight version serves the development server.

The team has added so many new features that the current incarnation of the dashboard includes far more than can be illustrated here. However, the general structure and information displayed remain much the same.

Another tool is a general system status page. While it is not an indication of how any one particular app is doing, it does show what is going on with the system as a whole.

The final tool is Appstats. It is in the same class as a profiler, but custom-made to help you find inefficient ways your code may be interacting with App Engine services (rather than traditional profiling of code coverage, memory usage, function call metrics, program behavior, etc.). Its specific use is best described in the App Engine team’s introductory blog post:

“Appstats is a library for App Engine that enables you to profile your App Engine app’s performance, and see exactly what API calls your app is making for a given request, and how long they take. With the ability to get a quick overview of the performance of an entire page request, and to drill down to the details of an individual RPC call, it’s now easy to detect and eliminate redundancies and inefficiencies in the way your App Engine app works.”
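The core idea behind Appstats (counting service calls and timing them) can be sketched in a few lines of plain Python. This is an illustration of the technique, not the Appstats library itself:

```python
import time
from collections import defaultdict

# Sketch of the Appstats idea: record how many times each "RPC"
# (service call) runs and how long it takes, so redundant or slow
# calls stand out. Illustrative only -- not the Appstats library.
class CallRecorder:
    def __init__(self):
        self.stats = defaultdict(lambda: {"calls": 0, "total_s": 0.0})

    def record(self, name):
        """Decorator that times every invocation under the given name."""
        def decorator(fn):
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:
                    entry = self.stats[name]
                    entry["calls"] += 1
                    entry["total_s"] += time.perf_counter() - start
            return wrapper
        return decorator
```

Appstats goes further by attributing calls to individual page requests and rendering a drill-down UI, but the per-call bookkeeping is essentially this.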

Applications (web & non-web)

While many applications running on Google App Engine are web-based apps, they are certainly not limited to those. App Engine is also a popular backend system for mobile apps. When developing such apps, it’s much safer to store data in a distributed manner and not solely on devices, which can be lost, stolen, or destroyed. Putting data in the cloud improves the user experience because recovery is simplified and users have more access to their data.

For example, the cloud is a great place for mobile app user info such as high scores, contacts, and levels/badges. If users lose their phone, they only need to get a new phone and reinstall the application; all their data can be streamed from the cloud after that. Not only is recovery simplified, but it makes possible scenarios like users pulling up their level or high-score info from the home computer upstairs in the evenings while their mobile phones charge downstairs. Again, the cloud can be a tool to provide a better user experience!

When developing a backend for mobile applications, the same decision needs to be made: should a company host it itself or take it to the cloud? Do you spend time and resources building out infrastructure, or is it better to leave that to companies that do this for a living and focus on the application instead?

Mobile phones only need to be able to make outbound HTTP requests to instantly connect to your App Engine backend. You can control the application-level protocol, authentication/authorization, and payload contents, so it’s not any more complex than providing a similar backend for a traditional web app.
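As a sketch of how simple such a backend endpoint can be, here is a hypothetical JSON score-submission handler. The payload fields ("player", "score") and the in-memory store are illustrative assumptions, not an App Engine API:

```python
import json

# Hypothetical mobile-game backend handler: accept a JSON score
# submission over HTTP and keep the best score per player.
# Field names and the dict-based store are illustrative only.
def handle_score_post(raw_body, store):
    """Parse a JSON score submission and return the player's best score."""
    payload = json.loads(raw_body)
    player, score = payload["player"], int(payload["score"])
    if score > store.get(player, 0):
        store[player] = score  # new personal best
    return json.dumps({"player": player, "best": store[player]})
```

In production the store would be the datastore rather than a dict, but the protocol seen by the phone (JSON over an outbound HTTP request) is exactly this simple.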

Migration to/from App Engine

The section title alone cannot convey all the aspects of this category when considering cloud vendors. It includes migration of applications to/from your target platform (in this case App Engine), ETL and migration of data, bulk upload and download, and vendor lock-in.

Porting your applications to App Engine is made simpler by familiar development environments, namely Java, Python, and now Go. Java is the de facto standard in enterprise software development, and developers who have experience building Java servlets will find App Engine quite similar. In fact, Java App Engine’s APIs adhere as closely as possible to existing JSR (Java Specification Request) standards.

In addition to the servlet standard (JSR-154), App Engine supports the JDO and JPA database interfaces (or you can choose to use Objectify or the low-level interface directly). If you’re not comfortable with NoSQL databases yet, you can use Google Cloud SQL, the MySQL-compatible relational cloud database. The App Engine URLfetch service works like the standard java.net classes, the App Engine Mail API should work just like the javax.mail (JSR-919) package, the App Engine Memcache API is nearly identical to using the javax.cache (JSR-107) package, etc. You can even use JSP for your web templates.

On the Python side, while Google ships a lightweight web framework (webapp/webapp2) for you to use, you aren’t limited to it. You can also use Django, web2py, Tipfy, Bottle, or Pyramid, to name some of the better-known frameworks that work with Python. Furthermore, if you have a Django app and use the third-party Django-nonrel package (along with djangoappengine), you can move pure Django apps onto App Engine, or off App Engine to a traditional hosting service supporting Django, with no changes to your application code beyond configuration. For users choosing Cloud SQL instead of App Engine’s traditional NoSQL datastore, you can use Django directly, as there is an adapter specially written for Cloud SQL.

Next are the SDKs. For all supported runtimes, they are open source. This allows users to become familiar with the App Engine client-side libraries and possibly build their own APIs. In addition, users who want to control the backend can use the SDK and the client APIs to build corresponding backend services. Not comfortable letting Google host and run your app(s)? This gives you an alternative. In fact, there are already two well-known App Engine backend projects: AppScale and TyphoonAE. Both claim to be 100% API-compatible, meaning that any App Engine app that Google can run, they should be able to run as well.

Next, you have control over your data. When using the traditional datastore, Google provides a datastore bulkloader. This tool lets you easily upload or download all of your data. You can find out more about the bulkloader in the official docs. Other features in App Engine related to your data include backup/restore, copying, or deleting your data. Find out more about those also in the official docs. Similarly, if using Google Cloud SQL, you can easily import or export your data using Google Cloud Storage as an intermediary. You can read more about that at the Cloud SQL docs on import/export.
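The bulk upload/download idea reduces to serializing your entities to a portable format and reading them back. A minimal sketch of that round trip with CSV and the standard library (the real tool is App Engine's bulkloader; this is only an illustration of the concept):

```python
import csv
import io

# Sketch of the bulk download/upload round trip: entities out to a
# portable CSV text, and back again. The real App Engine bulkloader
# handles datastore types, batching, and resumable transfers.
def export_entities(entities, fieldnames):
    """Serialize a list of dict-like entities to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(entities)
    return buf.getvalue()

def import_entities(csv_text):
    """Restore entities (as dicts of strings) from CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

Keeping your data in a format like this is exactly what makes migration onto or off of a platform tractable.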

Finally, with all the advantages of a PaaS system like Google App Engine, some developers may wonder about “vendor lock-in,” a situation in which it is difficult or impossible for companies to move their apps and/or data to similar or alternative systems. While Google would love you to stay an App Engine customer for a long time, Google recognizes that having choices makes for a healthy and competitive ecosystem.

If Google is a cloud vendor and App Engine is its product, does vendor lock-in apply? Well, yes and no. Think of it this way: you use Google’s infrastructure to avoid having to build it yourself. Arguably this is one of the most difficult and time-consuming things to do. So you can’t get something for nothing. The price you pay is that you need to integrate your app so that it connects to all of App Engine’s components.

However, while Google recommends that you code against Google App Engine’s APIs, there are workarounds. Also, think back to why you want to use App Engine: for its robustness and scalability. Google created its APIs so you could take advantage of Google’s infrastructure, not as a deliberate attempt to force you into using Google’s APIs.

By allowing alternative ways of accessing your data, using familiar development environments, following open standards, and distributing open source SDK libraries, Google fights vendor lock-in on your behalf. It may not be easy to migrate, but Google has implemented features to make it easier to migrate your app or upload/download all of your data. Google tries hard to ensure that you can move your apps or data onto or off of App Engine. But it doesn’t stop there… the team is continually innovating, listening to user input, and improving and simplifying App Engine services to further provide a development environment of choice for building your web (and non-web) apps. Finally, the best practices you’ll learn in creating great App Engine apps can be used for application development in general, regardless of the execution platform.

Other Important Features

The final two remarks here pertain mostly to enterprises. Google App Engine is compliant with global security standards: Google is certified SAS 70 compliant, as well as compliant with its replacements, SSAE 16 and ISAE 3402.

Enterprises that are Google Apps customers will find App Engine an extremely useful tool. When you purchase Google Apps, Google provides a default set of applications that help you run your business. In addition to the apps from Google, there are also many more built by third-party software firms that you may find compelling in the Google Apps Marketplace.

If none of those applications meet your needs, you can use Google App Engine to build your own custom applications, roll them into your Google Apps domain, and manage them from the same control panel as if you had bought them from Google directly or from vendors in the Apps Marketplace.


Appistry solutions leverage cutting-edge cloud-based architectures

Cloud-based architectures, with their inherent distribution of storage and computation, provide an ideal foundation for large-scale analytics, where many gigabytes or terabytes of data must be stored and processed. By designing solutions that leverage the inherent scalability, capacity, performance, simplicity and cost-efficiency of cutting-edge cloud technology, we help companies access computational power unlike any currently available.

Our cloud-based architectures unify large quantities of affordable, commodity systems with directly-attached storage, all working in concert to provide you with a single system view that transcends the performance and capacity of any single machine within the system.

The result is a system that combines three core technologies into a single unified platform. First, our system is a High Performance Computing system. Second, it is a Cloud Computing platform. And third, it is a complex analytics platform. Appistry combines all three, together with unified storage and computation, in a system of unlimited flexibility and power at a truly affordable price.

Appistry’s cloud-based analytics solutions give you:

  • Performance — Distributed processing and data agility can easily deliver 10-100x performance gains over “big iron” deployments
  • Scalability — An administrator is able to add additional computers, including their storage capacity, to a running system without a loss of availability of files or administrative functionality. Because tracking and membership are fully distributed and dynamic, the overall system can grow to tens of thousands of systems, or more.
  • Capacity — The analytics system provides a global view or namespace, aggregating the compute and storage capacity of all attached servers.
  • Reliability — By allowing the user to specify how many copies of each file to maintain in the system, the system is able to offer high levels of reliability at low cost.
  • Geographic Distribution — A single instance of a cloud-based analytics system can be deployed across multiple data centers. The cloud is aware of the network topology and will mirror and distribute files across the network so that the loss of any one data center does not limit access to data.
  • Disaster Recovery — The system is fully distributed; there is no central point of failure. Data ingest and analysis can continue operation even when entire data centers have been removed from the system.
  • Availability — Every computer in the analytics system is capable of performing analytics computations, managing data and responding to administrative requests. As a result, the system is impervious to the loss of individual or even entire racks of machines.
  • Management Simplicity — Administrators are able to update computer configurations, system configurations, or any of the running analytics applications without taking the files offline.
  • Hardware Flexibility — Not all machines in the system need to be constructed from similar hardware. The system can recognize the attributes of each attached computer and utilize their resources accordingly.
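The Reliability and Geographic Distribution points above amount to a placement policy: spread the requested number of copies across distinct data centers before doubling up within any one of them, so losing a data center never removes every copy. A simplified, hypothetical sketch of such a policy:

```python
# Hypothetical rack/data-center-aware replica placement sketch.
# Spreads N copies of a file across distinct data centers first,
# then doubles up within a data center only when it must.
def place_replicas(machines, copies):
    """machines: list of (machine_id, datacenter) pairs.
    Returns the machine_ids chosen to hold the copies."""
    by_dc = {}
    for machine, dc in machines:
        by_dc.setdefault(dc, []).append(machine)
    copies = min(copies, len(machines))  # can't exceed total machines
    chosen = []
    # Take one machine per data center per round, so copies land in
    # different data centers before repeating within one.
    while len(chosen) < copies:
        for available in by_dc.values():
            if available and len(chosen) < copies:
                chosen.append(available.pop(0))
    return chosen
```

A real system would also weigh free capacity, network topology, and current load, but the "different failure domains first" ordering is the essence of surviving the loss of an entire data center.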

High-resolution satellites, multimodal sensors and other input sources are driving an explosion in data available to the Intelligence community. This presents a data processing challenge.

Ayrris™ / DEFENSE overcomes these challenges by providing high-volume data ingest, storage, analysis and delivery. By leveraging Appistry’s revolutionary Computational Storage™ technology, Ayrris turns a cluster of commodity servers into a high-performance, petabyte-scale distributed storage system with no single points of failure or I/O bottlenecks. Rather than move the data to the application, we prefer to move the application to the data. Because of its unique computing platform, Ayrris / DEFENSE offers a new level of scalability, elasticity and reliability for data-intensive applications, and is fully compatible with existing agency data sources and analysis tools. Ayrris / DEFENSE allows enterprises to quickly turn raw data into usable, mission-critical intelligence better, faster and cheaper than ever before.

Storage Trends
The following three technology trends are having a dramatic impact on the way big data challenges will be addressed:

  • Transitioning Storage Systems to Cloud Technologies
  • Commoditization of Storage Equipment
  • Move Towards Data Localization

Industry progress in these areas provides solutions for the construction of large data storage systems.

In contrast with traditional monolithic storage, cloud computing architectures are characterized by their use of large quantities of affordable, commodity systems with directly-attached storage, all working in concert to provide the user with a single system view that transcends the performance and capacity of any single machine within the system. A storage system built in this manner provides the following attributes:

Scalability. A cloud storage administrator is able to add additional computers, including their storage capacity, to a running system without a loss of availability of files or administrative functionality. Because tracking and membership are fully distributed and dynamic, the overall system can grow to tens of thousands of systems, or more.
Capacity. The cloud storage system provides a global view or namespace, aggregating the capacity of all attached storage devices.
Reliability. Cloud storage allows the user to specify how many copies of each file to maintain in the system. The cloud is aware of the loss of any machines in the system. When these errors occur, the cloud can alert the proper administrators and take appropriate action to recover the requested reliability level.
Geographic Distribution. A single instance of a cloud storage system can be deployed across multiple data centers. The cloud is aware of the network topology and will mirror and distribute files across the network so that the loss of any one data center does not limit access to data.
Disaster Recovery. The storage system is fully distributed; there is no central point of failure. Cloud storage can continue operation even when entire data centers have been removed from the system. Cloud storage also manages the merging of multiple data centers after a logical or physical separation occurs. Out-of-date files are located and reconciled without user-intervention whenever possible.
Availability. Every computer in the cloud system is capable of serving access to files or administrative requests. Cloud storage is easily able to service a large number of client requests by distributing the work across many machines. The system is impervious to the loss of individual or even entire racks of machines.
Manageable. Administrators are able to update computer configurations, system configurations, or update the cloud system itself without taking the files offline.
Heterogeneous. Not all machines in the cloud system need to be constructed from similar hardware. The system needs to recognize the attributes of each attached computer and utilize their resources accordingly.
By taking a cloud-oriented approach to storage and compute, we are able to deliver a more powerful system. Moreover, because cloud storage systems are built with commodity components, they are much less expensive than traditional approaches.

Historically, system architects and administrators have depended on increasingly larger and larger machines and devices to satisfy their growing computational and storage needs. These high-end, proprietary systems have come at a steep price in terms of both capital and operational costs, as well as in terms of agility and vendor lock-in. The advent of storage and computational systems based on cloud architectures results in an advantageous economic position for purchasers and users of these solutions.


Move Towards Data Localization
In traditional system architectures, computational elements (i.e. application servers) and storage devices are partitioned into separate islands, or tiers. Applications pull data from storage devices via the local or storage area network, operate on it, and then push the results back to the storage devices. As a result, for most traditionally architected applications, the weak link in the system is the bottleneck between the application and its data.
Data localization is the unification of storage devices with computational elements for the purposes of reducing computational latency and overcoming network bottlenecks.
In a system exhibiting data locality, the work is moved to the data instead of the data being moved to the work. CloudIQ Storage was built from the ground up to enable data localization, which Appistry calls computational storage™. Other examples of data localization in practice include the Apache Hadoop project—an implementation of the MapReduce algorithm initially popularized by Google, Netezza’s data warehouse appliances, and various data caching technologies.
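To make the "move the work to the data" idea concrete, here is a minimal sketch of a locality-aware scheduler. The node names, block map, and fallback policy are illustrative assumptions, not any vendor's actual scheduler; the point is simply that a task is preferentially dispatched to a node that already holds the data block, so only small results cross the network.

```python
# Hypothetical sketch of data localization: dispatch each task to a node
# that already holds the data block, so the block itself never moves.

# Which nodes hold which data blocks (names are illustrative).
block_locations = {
    "logs-part-0": ["node-a", "node-b"],
    "logs-part-1": ["node-b", "node-c"],
}

def schedule(block_id, busy_nodes=()):
    """Prefer a node that holds the block locally; fall back to any node."""
    for node in block_locations[block_id]:
        if node not in busy_nodes:
            return node, "local"      # work moved to the data
    return "node-z", "remote"         # data would have to move instead

node, placement = schedule("logs-part-1", busy_nodes={"node-b"})
```

In a traditional tiered architecture, every scheduling decision would instead imply pulling the block across the network to an application server, which is exactly the bottleneck data localization avoids.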
One way to compare the relative performance of traditional and cloud storage approaches and to quantify the performance benefits of computational storage is to look at the aggregate bandwidth available between application processing and storage.

Taken together, the impact of cloud computing architectures, the commoditization of storage and compute, and the move towards data localization are revolutionizing the delivery of data-intensive applications; solving problems once thought to be unsolvable because of economic or temporal concerns has now become possible.
Appistry CloudIQ Storage: Cloud Technology Applied to Big Data
Appistry CloudIQ Storage applies cloud computing architectural principles to create a scalable, reliable and highly cost-effective file storage system with no single points of failure, using only commodity servers and networking.
A CloudIQ Storage system is composed of multiple computers at one or more data centers.

CloudIQ Storage coordinates the activity of each of the computers and aggregates their attached storage to expose a single, logical data store to the user. The system is fully decentralized: each computer is a complete CloudIQ Storage system unto itself, but is aware of other members of the same storage system and shares storage responsibilities accordingly.
Several of the major functional characteristics of the CloudIQ Storage system are described below. We consider these to be essential architectural characteristics of any cloud-based storage system.
Self-Organizing and Scalable
Appistry believes that fully distributed systems with self-healing and self-organizing properties are the path to solving big data challenges. The foundation of the CloudIQ Storage architecture is a lightweight yet robust membership protocol. This protocol updates the member machines with the addition, removal, or unexpected loss of computers dedicated to the storage system. This shared membership data contains enough information about the members of the cloud that each individual machine is capable of assessing its location and responsibilities. These responsibilities include establishing proper communication connections and responding to system events requiring healing actions. Even though there is no central control structure, the system is capable of self-organizing thousands of machines.
An administrator can easily add or remove machines or update configurations. The members of the cloud, acting independently, will share information quickly and reconfigure appropriately. The storage cloud can elastically scale up to handle multiple petabytes without heavy management overhead.
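The membership idea above can be sketched in a few lines. This is an assumed simplification, not Appistry's actual protocol: every node applies the same join/leave events to its local member list and derives its own responsibilities deterministically from that shared view, so no coordinator is needed.

```python
# Minimal sketch of decentralized membership (illustrative only): each
# node applies the same membership events and computes its own slot
# responsibilities from the shared member list, with no central control.

class Node:
    def __init__(self, name):
        self.name = name
        self.members = {name}

    def observe(self, event, other):
        # Membership events propagate to every node; each node applies
        # them independently, so all views converge.
        if event == "join":
            self.members.add(other)
        elif event == "leave":
            self.members.discard(other)

    def responsibilities(self, num_slots=8):
        # Deterministic assignment: every node computes the same answer
        # from the same membership view.
        ordered = sorted(self.members)
        return [s for s in range(num_slots)
                if ordered[s % len(ordered)] == self.name]

a, b = Node("a"), Node("b")
for n in (a, b):
    n.observe("join", "a")
    n.observe("join", "b")
```

Because responsibilities are a pure function of the membership view, adding or removing a machine automatically redistributes work without any central reassignment step.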

Geographically Aware
One desired feature of a robust storage system is location awareness. Computers within the CloudIQ Storage environment can use location awareness to make informed decisions about reliability configurations and to optimize the handling of user requests.
CloudIQ Storage introduces the notion of a territory: a logical collection of machines classified as a group for the purpose of data distribution. Users typically assign territories in one of several ways:

Computer Rack or Network Switch. This configuration allows an administrator to instruct a storage cloud to distribute files and functionality across systems within a single data center.
Data Center. This configuration allows an administrator to inform the cloud to distribute files and functionality between data centers.
User-Based. For storage clouds that span multiple geographies, it is beneficial to inform the system which computers are dedicated to individual user groups. This configuration is often similar to the data center option.
Hardware-Based. This configuration allows different configurations of hardware to be grouped together. These groups provide the administrator with a method to store data on specialized hardware for different needs. For example, within a data center one might have two territories of low-latency computers set up across racks for availability. A third collection of machines might be constructed of higher-storage-density, higher-latency hardware to keep costs low while maintaining a third copy of the data.
Territory settings can be configured on a machine-by-machine basis. Administrators can choose from any of these use cases or develop hybrid configurations that meet their needs.
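A territory-aware placement policy can be sketched as follows. The territory names, the round-robin ordering, and the machine map are assumptions made for the example, not CloudIQ Storage's actual algorithm; the sketch only shows the principle of spreading replicas across territories before doubling up within one.

```python
# Illustrative sketch of territory-aware replica placement: spread
# copies across territories first, then across machines within each.

machines = {
    "dc1-r1": "dc1", "dc1-r2": "dc1",   # two racks in data center 1
    "dc2-r1": "dc2",                    # one rack in data center 2
    "archive-1": "archive",             # higher-density, higher-latency tier
}

def place_replicas(copies):
    """Pick machines for `copies` replicas, one territory at a time."""
    copies = min(copies, len(machines))
    by_territory = {}
    for machine, territory in machines.items():
        by_territory.setdefault(territory, []).append(machine)
    territories = sorted(by_territory)
    chosen, i = [], 0
    while len(chosen) < copies:
        pool = by_territory[territories[i % len(territories)]]
        if pool:
            chosen.append(pool.pop(0))
        i += 1
    return chosen
```

With three copies requested, each territory receives one replica; a fourth copy falls back to the territory that still has spare machines.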
CloudIQ Storage uses territory settings to implement the behaviors described in the remainder of this section.
Reliable
CloudIQ Storage provides high levels of reliability by distributing files and their associated metadata throughout the storage system. Each copy of a file possesses the audit and configuration information needed to guarantee that reliability requirements are achieved. The metadata of each file contains information on:
Reliability Needs. How many copies of a file need to be maintained?
Territory Distribution. Which territories can/should be used to keep a copy of the files?
Update History. What is the version history of each file?
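The three metadata items above can be pictured as a small per-file record. The field names below are illustrative, not CloudIQ Storage's actual schema:

```python
# A sketch of the per-file metadata described above (field names are
# assumptions for illustration, not an actual schema).
from dataclasses import dataclass, field

@dataclass
class FileMetadata:
    copies_required: int                          # reliability needs
    territories: list                             # allowed copy locations
    versions: list = field(default_factory=list)  # update history

    def record_update(self, version_id):
        self.versions.append(version_id)

    def latest_version(self):
        return self.versions[-1] if self.versions else None

meta = FileMetadata(copies_required=3, territories=["dc1", "dc2"])
meta.record_update("v1")
meta.record_update("v2")
```

Because every copy of a file carries this record, any machine holding a replica has enough information to verify, on its own, whether the file's reliability policy is currently satisfied.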
The reliability requirements of each file in the system are distributed across the machines in the CloudIQ Storage system. Each machine watches over a subset of files in the system. When these monitors detect system changes, the following actions occur to guarantee the reliability needs of the system:
File Reliability Checks. Each monitor examines the files for which it is responsible. If a machine holding a copy of the file has been lost, additional copies of the file are created.
File Integrity Checks. If a dormant or lost machine attempts to introduce an old copy of an existing file, the system compares that version against the metadata of the newer copies and reconciles the difference.
System Monitoring Reconfiguration. As machines are introduced or lost, the responsibilities for watching files are adjusted for the new system configuration.

File Placement Reconfiguration. As new machines become available, the monitors may decide to redistribute files. The reconfiguration distributes the storage needs and service requests more equally across machines in the storage cloud. Files may also need to be repositioned to meet territory placement requirements.
As the storage cloud grows and changes with new hardware, new network connections, and configuration changes, the cloud storage system will constantly maintain the proper file distribution and placement.
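A monitor's reliability check, as described above, amounts to comparing the replica holders recorded in a file's metadata against the machines currently alive and scheduling repairs for the shortfall. The following is an assumed simplification with hypothetical node names:

```python
# Sketch of a monitor's reliability check (illustrative simplification):
# if a machine holding a replica is lost, schedule new copies on
# surviving machines until the file's copy policy is met again.

def reliability_check(holders, live_machines, copies_required):
    """Return (surviving_holders, machines_to_copy_to)."""
    surviving = [m for m in holders if m in live_machines]
    missing = copies_required - len(surviving)
    candidates = [m for m in sorted(live_machines) if m not in surviving]
    return surviving, candidates[:max(missing, 0)]

# "node-b" has been lost; one new copy must be created elsewhere.
surviving, repairs = reliability_check(
    holders=["node-a", "node-b"],
    live_machines={"node-a", "node-c", "node-d"},
    copies_required=2)
```

Since every machine runs this check for its own subset of files, repair work is spread across the cloud rather than funneled through a central controller.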


Highly Available
CloudIQ Storage provides extraordinary levels of availability due to the completely decentralized nature of the architecture. Every computer within the system is capable of serving file ingestion or file retrieval requests. Therefore, the total bandwidth in and out of the system is the aggregate of that of all of the machines participating in the cloud.
In addition, even though multiple copies of the file are present in the cloud storage system, the user gets only a single, logical view of the system. The CloudIQ Storage architecture resolves the multiple territories, copies and even versions of each file to deliver the requested file to the user.
When a file retrieval request arrives at a computer in a cloud storage system, several actions occur to provide the user with their data quickly:
File Location. The computer servicing a file request locates which machines in the cloud hold the requested file using consistent hashing and a distributed hash table. No single machine holds the entire file directory, which would otherwise become a performance bottleneck and a single point of failure. Lookups are a constant-time operation that returns the machines within the system holding a copy of the file.
Machine Selection. Once the target machines holding the file have been identified, the requesting machine can choose which machine is optimal for retrieving the file. This choice can be made based on factors such as network proximity and machine utilization.
File Retrieval. Once the machine is selected, the file can be retrieved by the client.
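The file-location step can be sketched with a toy consistent-hash ring. This stands in for the consistent hashing and distributed hash table mentioned above (the node names and single-point-per-machine ring are simplifying assumptions; production systems typically use many virtual points per machine):

```python
# Toy consistent-hashing lookup: any node can compute which machines
# hold a file without consulting a central directory.
import bisect
import hashlib

def h(key):
    # Stable hash, identical on every machine (unlike Python's hash()).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

machines = ["node-a", "node-b", "node-c", "node-d"]
ring = sorted((h(m), m) for m in machines)

def holders(file_id, copies=2):
    """Walk clockwise from the file's hash to find its replica holders."""
    points = [p for p, _ in ring]
    i = bisect.bisect(points, h(file_id)) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(copies)]
```

Every machine computes the same answer from the same ring, which is what makes constant-time, directory-free lookups possible.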
In addition to optimized read operations, the cloud storage solution provides "always writable" semantics using a concept called eventual consistency. In an eventually consistent system, write operations always succeed as long as the system can access the number of nodes required by policy for a successful write (one, by default). During this write operation, audit information is stored with the file's metadata so that any additional copies or version reconciliation can be performed later. Eventually consistent systems are not right for every storage application, but they are ideal for "write once, read many" style systems.
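A minimal sketch of this write path, under assumed names and a one-node write policy (this illustrates eventual consistency generally, not Appistry's implementation): a write succeeds once any one replica accepts it, and the version recorded with the write lets the remaining replicas converge later.

```python
# Sketch of "always writable" eventual consistency: a write succeeds
# when one node accepts it; replicas reconcile afterwards by version.

replicas = {"node-a": None, "node-b": None, "node-c": None}

def write(value, version, reachable):
    """Succeed as long as at least one replica node is reachable."""
    accepted = [n for n in reachable if n in replicas]
    for n in accepted:
        replicas[n] = (version, value)
    return len(accepted) >= 1      # default policy: one node suffices

def reconcile():
    """Propagate the newest version to every replica."""
    newest = max(v for v in replicas.values() if v is not None)
    for n in replicas:
        replicas[n] = newest

# During a partition only node-c is reachable, yet the write succeeds;
# when the partition heals, reconcile() brings all replicas up to date.
ok = write("draft-2", version=2, reachable={"node-c"})
reconcile()
```

The trade-off is visible in the sketch: between the write and the reconciliation, different replicas can return different versions, which is why this model suits "write once, read many" workloads best.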

The availability, reliability, and location-awareness features of a cloud storage solution give storage administrators the highest level of disaster recovery available. The system can lose a machine, a rack, or even an entire data center and still remain capable of performing all necessary operations for its users.
Manageable
Management and ease-of-use features are essential for the creation of a robust cloud storage system. When dealing with hundreds or thousands of machines, management operations must be simplified. CloudIQ Storage ensures this by providing the following capabilities:
Always Available Operation. Any configuration changes made to the system must not remove the availability of files. In the event that multiple machines need to be taken offline for updates, the system must have a strategy for keeping files available. This may be achieved using territories. If two territories hold copies of the same files, machines in one territory can temporarily be taken offline for updates while the second territory serves the files. Any file updates performed during the downtime will be automatically reconciled using the monitoring and reliability features of the cloud.
Configurable Reliability Settings. Administrators can declare how many copies of a file should be stored in the storage cloud. A cloud-wide setting is established, which may be overridden on a file-by-file basis.
Real-Time Computer Injection. When the storage system needs more capacity, the administrator needs to be able to add machines without affecting the availability of any file.
Real-Time Computer Decommissioning. When it is decided that a computer is no longer required, the administrator needs operations to gracefully remove the computer from processing requests and move its files to other machines within the cloud.
Auditing. Important operations, events, and system messages need to be saved.
System-Wide Configuration Changes. All configuration changes need to propagate across all machines with a single operation.
Because management tasks in the storage cloud are virtualized and automated, a small number of system administrators can easily maintain a large number of computers storing petabytes of data.
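The configurable reliability setting described above (a cloud-wide replica count that individual files may override) can be sketched in a few lines. The setting names, paths, and default values are illustrative assumptions:

```python
# Sketch of policy resolution for replica counts: a per-file override
# wins; otherwise the cloud-wide default applies. Names are illustrative.

cloud_config = {"copies": 3}                        # cloud-wide default
file_overrides = {"/archive/raw.dat": {"copies": 5}}

def copies_for(path):
    """Resolve the effective replica count for a file."""
    return file_overrides.get(path, {}).get("copies",
                                            cloud_config["copies"])
```

Changing `cloud_config` in one place is the single operation that, once propagated to every machine, adjusts the policy cloud-wide.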

Secure
CloudIQ Storage implements a flexible security model designed to allow multi-tenant operation while ensuring the security of user data. Just as the physical storage cloud may be partitioned into territories, the logical storage cloud may be partitioned into "spaces," with each space representing a hierarchical collection of files under common control. Access to each space, and to each file within a space, is governed by an access control list (ACL) that assigns access rights to users, groups, and the file's owner.
To facilitate secure operation of the cloud, administration rights to the system are divided into a series of distinct privileges that may be assigned to individual users or groups.
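A simple way to picture the space/ACL model is as two layers of access-control entries, with the file's ACL refining its enclosing space's ACL. The space names, principals, and merge rule below are assumptions made for the example, not CloudIQ Storage's actual semantics:

```python
# Illustrative sketch of spaces and ACLs: a request is allowed if the
# user, or one of their groups, holds the right in the file's ACL,
# which refines the enclosing space's ACL. Names are hypothetical.

space_acl = {"/tenant-a": {("group", "tenant-a-users"): {"read"}}}
file_acl = {"/tenant-a/report.pdf": {("user", "alice"): {"read", "write"}}}

def allowed(user, groups, path, space, right):
    entries = dict(space_acl.get(space, {}))
    entries.update(file_acl.get(path, {}))   # file ACL refines space ACL
    principals = [("user", user)] + [("group", g) for g in groups]
    return any(right in entries.get(p, set()) for p in principals)
```

Under this model, a tenant's space acts as the coarse multi-tenancy boundary, while per-file entries grant finer-grained rights within it.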

Conclusion
Traditional storage and computational offerings fail to meet the needs of today's big data environments. These approaches have been characterized by isolated pools of expensive storage and the constant movement of data from where it lives to the application servers and users who need it. Attempting to meet demanding capacity and performance requirements in this manner often necessitates the purchase of extremely costly, special-purpose hardware. Yet local and wide-area network bottlenecks remain a challenge.
Petabyte-scale environments dictate the need for distributed, high-performing solutions that bring the work to the data, not the other way around. In this paper we have demonstrated how the cloud computing architectural approach provides the key to meeting the challenges posed by big data.
Appistry CloudIQ Storage is software that applies these principles to deliver robust private cloud storage environments. Storage clouds based on CloudIQ Storage exhibit the essential characteristics Appistry proposes for any cloud storage system: individual resources self-organize without centralized control, yielding extreme scalability; the system spans data centers and optimizes behavior based on topology; high levels of reliability and availability are transparently ensured; and system management is policy-based, automated, and virtualized.

With the Hadoop Edition, Appistry hopes to "upgrade" the performance and availability of Hadoop-based applications by replacing the Hadoop Distributed File System (HDFS) with CloudIQ Storage. While Hadoop is wildly popular right now, one issue is its use of a "NameNode" – a centralized metadata repository that can constrain performance and create a single point of failure. Appistry's approach retains Hadoop's MapReduce engine to assign parallel-processing tasks, but attempts to resolve NameNode problems with CloudIQ Storage's wholly distributed architecture.

Appistry is working with an intelligence-sector customer that has "massive, massive" applications built on HBase, a distributed NoSQL database with Hadoop at its core. Although CloudIQ Storage doesn't formally support HBase, it has helped the customer improve database throughput, and formal support might be on the way. Because of their inherently scalable natures, CloudIQ Storage and NoSQL databases are complementary solutions for handling structured and unstructured data.

The idea behind cloud storage is the same as the idea behind cloud computing: Organizations want to meet their ever-expanding storage needs as they arise, and they want to do so at lower price points than are available from incumbent vendors like EMC and NetApp. For customers in areas like social media, scientific imaging, or film rendering, though, scale and price must be matched with performance. This is where companies like Appistry come in, but Appistry certainly isn't alone in the quest for these dollars. Startups Scale Computing, Pivot3, MaxiScale, and ParaScale have all raised millions for their unique offerings, and HP last summer acquired IBRIX to boost its relevance in the performance-hungry film-rendering market.