Satish Agrawal's Blog

Wednesday, March 7, 2012

Basics of Cloud Security

The evolution of the cloud can be compared to that of the banking system. In the good old days, people kept all their valuable assets (money, precious metals, stones etc.) in their personal possession, sometimes even in underground lockers. They did not trust banks with their hard-earned money. The banking system evolved over time, and it took almost half a century to build that trust. Regulators across the world, India included, played a big role in creating a trusted and secure legal framework for banking and other financial services. Today, people hardly keep any cash with them. Most of us carry plastic money and transact digitally.

Cloud computing is also evolving the same way.

A robust cloud architecture with strong security implemented at every layer of the stack, backed by legal compliance and government protection, is the key to cloud security. Just as banks do business despite frauds, thefts and malpractices, cloud security is going to evolve, but at a much faster rate. The digital world has zero tolerance for waiting! Evolution is natural and is bound to happen.

So what steps should a cloud service provider typically follow in order to secure its cloud?

The cloud is complex, and hence its security measures are not simple either. The cloud needs to be secured at all layers in its stack. These layers are:

· Infrastructure

· Platform

· Application

· Data

At infrastructure level:

A sysadmin of the cloud provider can attack the systems since he/she has all the admin rights. With root privileges on each machine, the sysadmin can install or execute all sorts of software to perform an attack. Furthermore, with physical access to the machine, a sysadmin can perform more sophisticated attacks such as cold boot attacks, and can even tamper with the hardware.

Protection measures:

1. No single person should accumulate all these privileges.

2. The provider should deploy stringent security devices, restricted access control policies, and surveillance mechanisms to protect the physical integrity of the hardware.

Thus, we assume that, by enforcing such security processes, the provider itself can prevent attacks that require physical access to the machines.

3. The only way a sysadmin would be able to gain physical access to a node running a customer’s VM is by diverting this VM to a machine under his/her control, located outside the IaaS’s security perimeter. Therefore, the cloud computing platform must be able to confine the VM execution inside the perimeter, and guarantee that a sysadmin with root privileges who is remotely logged in to a machine hosting a VM cannot access its memory at any point.

4. The Trusted Computing Group (TCG), a consortium of industry leaders that identifies and implements security measures at the infrastructure level, proposes a set of hardware and software technologies to enable the construction of trusted platforms. It suggests the use of “remote attestation” (a mechanism that lets authorized parties detect changes to the user’s computers).
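
The idea behind the remote attestation mentioned in point 4 can be illustrated with a small sketch. It is only an illustration under simplifying assumptions: the function names (extend, measure_stack, attest) and the component labels are hypothetical, and a real deployment would rely on a hardware TPM that signs the measurement together with a fresh nonce, which is left out here. The sketch shows only the hash chain: each component of the boot stack is folded into a running digest, so any change to any component changes the final value that the verifier compares against a known-good one.

```python
import hashlib
import hmac
from typing import Sequence

REGISTER_SIZE = 32  # length of a SHA-256 digest in bytes


def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component into the running measurement (a hash chain)."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()


def measure_stack(components: Sequence[bytes]) -> bytes:
    """Measure a boot stack in order, starting from an all-zero register."""
    register = bytes(REGISTER_SIZE)
    for component in components:
        register = extend(register, component)
    return register


def attest(reported: bytes, expected: bytes) -> bool:
    """Verifier side: accept the node only if its measurement matches."""
    return hmac.compare_digest(reported, expected)


# Hypothetical component images; real measurements hash the actual binaries.
trusted_stack = [b"firmware-v1", b"hypervisor-v7", b"host-os-v3"]
tampered_stack = [b"firmware-v1", b"hypervisor-v7-modified", b"host-os-v3"]

expected = measure_stack(trusted_stack)
print(attest(measure_stack(trusted_stack), expected))   # True
print(attest(measure_stack(tampered_stack), expected))  # False
```

Because the chain is order-sensitive, loading the same components in a different order also produces a different value, which is why a single final digest is enough to represent the whole stack.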

At platform level:

The security model at this level relies more on the provider to maintain data integrity and availability. The platform must take care of the following security aspects:

1. Integrity

2. Confidentiality

3. Authentication

4. Defense against intrusion and DDoS attack

5. SLA

At Application level:

The following key security elements should be carefully considered as an integral part of the SaaS application development and deployment process:

· SaaS deployment model
· Data security
· Network security
· Regulatory compliance
· Data segregation
· Availability
· Backup/Recovery Procedure
· Identity management and sign-on process

Most of the above are provided by PaaS, and hence making optimal use of PaaS when modeling SaaS is very important.

Some of the steps that can be taken to make SaaS secure are:

· Secure Product Engineering
· Secure Deployment
· Governance and Regulatory Compliance Audits
· Third-Party SaaS Security Assessment

At Data level:

Apart from securing data against corruption and loss by implementing data protection mechanisms at the infrastructure level, one also needs to make sure that sensitive data is encrypted both in transit and at rest.
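
As a small illustration of the "in transit" half of this requirement, the sketch below builds a client-side TLS configuration using Python's standard ssl module. The helper name make_client_tls_context and the TLS 1.2 floor are assumptions chosen for the example, not a recommendation taken from any particular provider. Encryption at rest is not covered, since it needs a vetted cryptography library and a key management process.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate checks enabled."""
    # The default context verifies the server certificate and its hostname.
    context = ssl.create_default_context()
    # Assumption: policy forbids protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A context like this can then be passed to whatever client opens the connection, so that every connection shares the same minimum guarantees.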

Apart from all the above measures, making the cloud secure also requires stringent security processes coupled with periodic audits. Governing security laws should be amended as technologies advance, and ethical hacking and vulnerability testing should be performed to make sure the cloud is secure across all layers.

About author

Satish Agrawal is Vice President of Cloud Computing at e-Zest Solutions Ltd. His role includes further strengthening of existing competencies in the domain of cloud computing for enterprises and ISVs to deliver best value solutions from design to deployment for cloud adaptation. Satish Agrawal has over 16 years of experience in IT and software product engineering space. He has built and implemented end-to-end cloud solutions for clients across geographies. He has got insightful knowledge of managing various technology projects. He has worked with many leading software services companies at senior technical positions.

APM for cloud apps: Next big challenge

As more and more companies push their apps to the cloud, the industry is facing a new challenge: monitoring the performance of apps on various clouds and measuring the SLAs for application performance in the cloud. I won’t be astounded if someone comes up with a new term, "ALAs - Application Level Performance", in the near future. As the cloud itself has to fulfill a composite set of expectations like auto scaling, billing, on-demand provisioning etc., it’s becoming difficult to isolate the root causes of application-level performance problems, simply because the cloud has many layers to track and applications are becoming more and more distributed, loosely coupled and difficult to manage. In such a complex ecosystem, tracking application performance is a real challenge for companies working in the sphere of APM.

How important is APM?

APM is not a revenue-generating solution for enterprises. However, it is at the core of IT services, as the thrust of APM is to fulfill SLAs and make sure the end-user experience is of the highest quality. In a highly competitive market, excellence plays the most important role. If you are not providing high-quality solutions to your customers, you are already out of the market. In such a fiercely competitive market, APM plays a key role. Instead of reactive processes, companies are looking for proactive ones.

As one of Gartner's research reports puts it: The factors most responsible for the increased attention now being paid to the APM process and the tools and services supporting it do not come from IT, but from the business side of the enterprise, which has (during the past decade) fundamentally changed its attitude towards IT in general. Line of business and C-level executives now generally recognize that IT is not just infrastructure that supports background workflows, but is also, and more fundamentally, a direct generator of revenue and a key enabler of strategy.

Gartner has defined five dimensions of an APM solution:

1. End-User experience monitoring - The capture of data about how end-to-end application availability, latency, execution correctness and quality appeared to the end-user.

2. Runtime application architecture discovery, modeling and display - The discovery of the software and hardware components involved in application execution and the array of possible paths across which these components could communicate to enable that involvement.

3. User-defined transaction profiling - The tracing of events as they occur among the components or objects as they move across the paths discovered in the second dimension; this is generated in response to a user’s attempt to cause the application to execute what the user regards as a logical unit of work.

4. Component deep-dive monitoring in application context - The fine-grained monitoring of resources consumed by and events occurring within the components discovered in the second dimension.

5. Analytics - The marshaling of techniques, including behavior learning engines, complex-event processing (CEP) platforms, log analysis, and multidimensional database analysis to discover meaningful and actionable patterns in the typically large datasets generated by the first four dimensions of APM.
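
To make the third dimension (user-defined transaction profiling) a little more concrete, here is a minimal sketch of how one logical unit of work could be timed as it crosses components. The class and method names (TransactionProfile, Span, step) and the component labels are hypothetical and exist only for this illustration. Commercial APM tools instrument code far less intrusively, typically without requiring changes to the application.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    component: str
    seconds: float


@dataclass
class TransactionProfile:
    name: str
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def step(self, component: str) -> Iterator[None]:
        """Time one hop of the transaction and record it as a span."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.spans.append(Span(component, time.perf_counter() - start))

    def total_seconds(self) -> float:
        """Sum of the time spent in all recorded hops."""
        return sum(span.seconds for span in self.spans)

    def slowest(self) -> Span:
        """The hop that took the longest."""
        return max(self.spans, key=lambda span: span.seconds)


# Usage: the sleeps stand in for real work in each tier.
profile = TransactionProfile("place-order")
with profile.step("web-tier"):
    time.sleep(0.01)
with profile.step("database"):
    time.sleep(0.03)
print(profile.slowest().component, round(profile.total_seconds(), 3))
```

The per-hop spans are the raw material for the fourth and fifth dimensions: deep-dive monitoring looks inside the slowest hop, and analytics looks for patterns across many such profiles.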

Suggested features for an APM supporting cloud apps

Besides the above dimensions of a typical APM solution, I suggest the following features for an APM solution supporting cloud apps:

1. Support for multiple clouds and ability to provide a CMDB for cloud components.

2. Agentless architecture to reduce performance monitoring side-effects.

3. Integration capabilities with cloud provided monitoring tools.

4. Auto-sensing of application performance degradation and isolation of the cause from cloud.

5. Integration capabilities with ITIL-compliant service desk/ IT operations management tools.

6. Ability to define and translate application performance in terms of business SLAs.
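
Features 4 and 6 in the list above lend themselves to a short sketch. The functions is_degraded and sla_compliance are hypothetical names, and the rule used for degradation (recent mean above the baseline mean plus k standard deviations) is just one simple choice assumed for the example. Real products use more elaborate baselining.

```python
import statistics
from typing import Sequence


def is_degraded(baseline_ms: Sequence[float],
                recent_ms: Sequence[float],
                k: float = 3.0) -> bool:
    """Flag degradation when the recent mean exceeds baseline mean + k * stdev."""
    mean = statistics.mean(baseline_ms)
    spread = statistics.stdev(baseline_ms)  # needs at least two samples
    return statistics.mean(recent_ms) > mean + k * spread


def sla_compliance(latencies_ms: Sequence[float], sla_ms: float) -> float:
    """Share of requests (between 0 and 1) answered within the SLA limit."""
    within = sum(1 for value in latencies_ms if value <= sla_ms)
    return within / len(latencies_ms)


baseline = [200, 210, 190, 205, 195]  # made-up response times in ms
print(is_degraded(baseline, [260, 270, 255]))       # True
print(is_degraded(baseline, [205, 198, 210]))       # False
print(sla_compliance([200, 450, 300, 5200], 5000))  # 0.75
```

The second function is the bridge to the business side: a statement such as "75% of requests met the 5-second SLA" is something a business owner can act on, whereas raw latency samples are not.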

To meet the five dimensions of Application Performance Management, viz. End-User Experience Monitoring; Runtime Application Architecture Discovery, Modeling and Display; User-Defined Transaction Profiling; Component Deep-Dive Monitoring; and Analytics, a typical APM tool has the following component layers:

1. Real Time Dashboard, Alerts and Notification Layer

2. Application Monitoring Layer

3. Network Monitoring and Management Layer

4. Database Layer (Rule DB, Log DB, CMDB)

5. Integration Layer (ESM integration, Infrastructure Integration and LDAP integration)

Following is the high level architecture diagram of a typical APM tool.

APM works on the FCAPS philosophy. FCAPS is an acronym for Fault Detection and Management, Configuration Management, Accounting, Performance Management and Security Management of the underlying infrastructure, which comprises various hardware, software and firmware components. It typically covers the central nervous system of your computing infrastructure (CNS = Compute, Network and Storage). APM becomes complex in a distributed environment, where there are thousands of software and hardware components to manage, millions of events to monitor and thousands of rules to define.

A typical APM tool hooks into the underlying infrastructure through the relevant SDKs. For example, if your infrastructure uses VMware for virtualization, Oracle DB instances, IBM WebSphere, MS Exchange Server etc., a good APM tool will be able to gather events from all these components and process this data based on the set rules to generate real-time dashboards, alerts and notifications. A good APM tool will also provide a facility to define performance indicators, as the definition of performance varies from one context to another.

A good APM tool can also be integrated with existing enterprise systems management (ESM) tools, so that it becomes part of the IT management ecosystem and doesn’t work in a silo.

An APM tool should also be able to generate dashboards periodically to address the needs of various business users such as business owners, technology owners and process owners.

Agent-Based and Agent-Less Architecture:

Most APM tools work on an agent-based architecture, wherein agents with a small footprint are deployed across distributed systems to listen for, capture and transmit event information to a central event DB. The agents are designed to keep the performance overhead as low as possible; otherwise there is a danger of the agents themselves becoming performance and security bottlenecks.

Agent-less architecture is based on the signal interception paradigm, wherein event information is gathered from existing system components by intercepting their events and logs and processing them to generate high-level dashboards.
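
The trade-off described for agent-based architecture, a small footprint versus complete data, can be sketched as follows. MonitoringAgent is a hypothetical class written for this illustration, and the central event DB is stood in for by a plain list. The agent keeps a bounded in-memory buffer, so that if the central DB is slow or unreachable the agent loses old events instead of consuming ever more memory on the monitored host.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class MonitoringAgent:
    """Buffers events locally and ships them in batches to a central sink."""

    def __init__(self, send: Callable[[List[Event]], None],
                 max_buffer: int = 1000) -> None:
        self._send = send
        # A bounded deque evicts the oldest event once it is full.
        self._buffer: Deque[Event] = deque(maxlen=max_buffer)
        self.dropped = 0

    def record(self, event: Event) -> None:
        """Capture one event; cheap enough to call on the hot path."""
        if len(self._buffer) == self._buffer.maxlen:
            self.dropped += 1  # the append below evicts the oldest event
        self._buffer.append(event)

    def flush(self) -> int:
        """Ship everything buffered so far; meant to be called periodically."""
        batch = list(self._buffer)
        self._buffer.clear()
        if batch:
            self._send(batch)
        return len(batch)


central_db: List[Event] = []  # stand-in for the central event DB
agent = MonitoringAgent(send=central_db.extend, max_buffer=2)
for n in range(3):
    agent.record({"source": "app-server-1", "event": f"request-{n}"})
agent.flush()
print(len(central_db), agent.dropped)  # 2 1
```

Exposing the dropped counter matters in practice: a monitoring agent that silently loses data is hard to trust, so the loss itself should be reported as an event.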

What do various components of APM do?

1. Network Monitoring Components:

a. Fault Management Engine: Fault management is a set of functions that enable the detection, isolation, and correction of abnormal operation of the network.

b. Configuration Management Engine: Configuration management provides functions to identify, collect configuration data from, exercise control over, and provide configuration data to network elements.

c. Accounting Management Engine: Accounting management lets you measure the use of network services and determine costs to the service provider and charges to the customer for such use. It also supports the determination of charges for services.

d. Performance Management Engine: Performance management provides functions to evaluate and report on the behavior of telecommunication equipment and the effectiveness of the network or network element. Its role is to gather and analyze statistical data for the purpose of monitoring and correcting the behavior and effectiveness of the network, network elements, or other equipment, and to aid in planning, provisioning, maintenance, and quality measurement.

e. Security Management Engine: Security services provide authentication, access control, data confidentiality, data integrity, and nonrepudiation. The engine also provides security event detection and reporting: it reports activities that may be construed as a security violation (an unauthorized user, physical tampering with equipment) to higher layers of security applications.

2. Information Collection and Transition Agents: These are listeners in various forms across the network which can intercept signals, process them and transmit them to a central repository for further processing. SNMP agents are one example. For those who do not know SNMP, it stands for Simple Network Management Protocol and is based on a simple request/response paradigm. http://www.wtcs.org/snmp4tpc/snmp.htm seems to be a good primer on this topic.

3. Events Log and Data Processing Engine: Different systems generate logs in different formats. A log processor has to understand this variety of log formats by parsing the logs, extracting the required information and storing it in the central DB for further processing.

4. Rules Definition and Enforcement Engine: Dashboards, alerts and notifications need to be generated for various user groups. Each user group in the ecosystem will have its own way of seeing and interpreting data for various business and technical reasons. To make log processing more meaningful, APM provides a way to define and enforce rules. For example, performance counters can be defined for benchmarking the performance of the app against various business needs. A user trading shares online may want a response time of no more than 5 seconds, whereas a data entry operator at a university can afford to wait a few more seconds. The rule definition engine provides a way to create and edit rules dynamically, and hence can drive alert/notification generation, SLA report generation, and dashboard generation based on the set benchmarks and thresholds.

5. Dashboard, Notification and Alert Generation Engine: This is the top-level component in an APM tool, which generates user-friendly graphs, reports, alerts and notifications for the end users. It also has capabilities to benchmark itself in order to generate comparison charts and projections of future performance, and to send warnings proactively.
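
Components 3 and 4 above (log processing and rule enforcement) can be sketched together in a few lines. Everything specific here is an assumption made for the illustration: the log line format, the names parse_line and evaluate, and the 10-second limit for data entry operators. Only the 5-second limit for traders comes from the example in the text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed log format: "<timestamp> <user_group> <transaction> <response_ms>ms"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<group>\S+) (?P<txn>\S+) (?P<ms>\d+)ms$"
)


@dataclass
class Event:
    timestamp: str
    user_group: str
    transaction: str
    response_ms: int


def parse_line(line: str) -> Optional[Event]:
    """Turn one raw log line into an Event, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return Event(match["ts"], match["group"], match["txn"], int(match["ms"]))


def evaluate(events: List[Event], thresholds_ms: Dict[str, int]) -> List[str]:
    """Apply per-group response-time rules and return alert messages."""
    alerts = []
    for event in events:
        limit = thresholds_ms.get(event.user_group)
        if limit is not None and event.response_ms > limit:
            alerts.append(
                f"{event.user_group}/{event.transaction}: "
                f"{event.response_ms} ms exceeds {limit} ms"
            )
    return alerts


# Rules are plain data, so they can be edited without touching the code.
rules = {"trader": 5000, "data-entry": 10000}  # data-entry limit is assumed
lines = [
    "2012-03-07T10:00:00 trader buy-shares 6200ms",
    "2012-03-07T10:00:01 data-entry save-form 7000ms",
    "malformed line",
]
events = [event for event in map(parse_line, lines) if event is not None]
for alert in evaluate(events, rules):
    print(alert)
# trader/buy-shares: 6200 ms exceeds 5000 ms
```

The same 7-second response that would be a violation for a trader raises no alert for the data entry operator, which is the point of keeping thresholds per user group rather than global.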

Why so much focus on APM?

The natural answer is the market size and the complexity of cloud-based applications. APM is also surely an expensive proposition. Business owners need to understand what is more expensive: the cost of the application or the cost of customers moving away from their business due to a lack of QoS. It’s a trade-off business owners need to understand, and surely it’s going to be a headache for most of them. APM solutions are available in a wide range, in terms of both cost and features. There is no single answer to a question like "which is the best APM tool for my need?". Since I see the cloud as the future of IT, I see APM as the most critical area for IT managers to tame, and hence I expect companies to take a huge interest in APM, especially while moving their apps to the cloud. Moreover, IT services companies will have to rely heavily on good APM solutions to make sure they are delivering the right solution to their customers. I can foresee the QA teams in IT and consulting services using and relying heavily on APM tools. In short, I can see a bright future for APM companies and the consultants working in this highly challenging area of specialization.

About Author: Satish Agrawal is Vice President - Cloud Computing at e-Zest Solutions Ltd. He has over 16 years of experience in the IT and software product engineering space and has built and implemented end-to-end cloud solutions for clients across geographies.

Saturday, April 4, 2009

Agile Testing -- Convergence of Development and Testing

Agile Delivery Practice revolves around the principle of delivering tangible results in small planned steps, usually called iterations. Agile Testing is not a specialized branch of Quality Assurance and Quality Control but a testing philosophy aligned to the agile delivery model. It requires everything that traditional testing needs, plus agility :). What's agility? Test engineers are no longer expected to sit waiting until the requirements are completed, documented and delivered to them before they design and develop test assets. Similarly, there is no waiting time for them to prepare the test environment, get the build and start testing. Rather, they run alongside all the other actors in the system on parallel tracks. Automation is the key to success in an Agile environment; without it, test assets would become a huge backlog over time and testing would not be "agile" any more.
Agile Testing requires Agile Project Management, Agile Test Management, Agile Defect Management and Agile Report Management. You must be thinking that this person is joking or talking nonsense. But that's correct. In an agile environment, everything needs to be agile or you will get confused. Rally is a good tool that provides integrated project management, test management, defect management and report management for test managers who work in Agile environments. A good test management office would also have a stack of open source tools integrated smartly to provide similar capabilities under one roof. Test Management Offices (I have coined this concept by analogy with the Project Management Office or PMO, and will call it the TMO) that do not take Agile Testing into consideration would suffer a major setback in delivering quality in Agile projects.
So in short, as developers need to be "agile", testers cannot be left "passive". Eventually testers become pseudo developers who communicate with business analysts, architects, developers and end users. I call it "convergence of development and testing teams to deliver a high quality piece of code in small iterations which makes planning, execution and control phase of project management much easier".

Sunday, March 1, 2009

Tough Times

"A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty." - Winston Churchill