The Patching Nightmare

By Robert Gardos | October 2, 2009

I was amazed that within roughly twelve months the largest open systems database players, namely Oracle and Microsoft, have once again introduced major changes to their patching methodologies. So much for anyone who has attempted to automate this process on their own, as most of that work will now need to be thrown away. It seems the days of leaving critical systems unpatched, the most common behavior of database administrators, are long gone, unless your business doesn’t care about protecting its most valuable asset. How does an IT organization cope?

Let’s briefly look at what the changes are.

The Oracle PSU – Patch Conflicts, Anyone?

The folks at Oracle have once again outdone themselves, as it appears they have reversed direction with the introduction of the patch set update (PSU). If you’re an Oracle DBA, I suspect you are aware of Oracle’s critical patch updates (CPUs), which are also delivered quarterly. The introduction of CPUs meant a new OPatch methodology (‘napply’ vs. ‘apply’) and the wondrous management of molecules, or individual patch ‘components’, that could be applied at the DBA’s discretion. CPUs are security focused and often co-exist with a number of individual hot fixes. PSUs are a new beast altogether. In addition to security fixes, they also focus on database stability. They use the old ‘apply’ methodology and seem to be more in line with a database upgrade than a patch (theoretically you get a new version number too, although that didn’t work with the original PSU).

Well, that doesn’t seem like such a big deal until the concept of patch conflicts comes into play. Oracle tells us that they will identify and roll back patch conflicts within the context of a patch event. As a user you will be responsible for finding (which may involve working with Oracle support on a case-by-case basis) and applying the individual hot fixes that are ‘certified’ with whatever PSU is being applied. If you don’t think this is a complete cluster f***, you’re out of your mind. Think about it. A patch event goes from “apply a patch” to “apply a patch, roll back n hot fixes and reapply those hot fixes with the replacement patches” (which may or may not be readily available).
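To make the extra work concrete, here is a minimal sketch of how one PSU patch event expands into steps. It is plain Python that only builds a plan and does not call OPatch. The function name, the patch identifiers and the rollback-before-apply ordering are all assumptions made for illustration, not Oracle’s documented procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str    # "rollback", "apply", or "escalate"
    patch_id: str


def plan_psu_event(psu_id, installed_hotfixes, conflicts, replacements):
    """Return the ordered steps for one PSU patch event (hypothetical helper).

    psu_id: identifier of the PSU being applied.
    installed_hotfixes: hot fix ids currently installed, in install order.
    conflicts: set of installed hot fix ids that conflict with the PSU.
    replacements: maps a conflicting hot fix id to the replacement that is
        certified with this PSU; a missing key means none is available yet.
    """
    steps = []
    to_restore = [h for h in installed_hotfixes if h in conflicts]

    # 1. Roll back every conflicting hot fix (ordering is an assumption).
    for hotfix in to_restore:
        steps.append(Step("rollback", hotfix))

    # 2. Apply the PSU itself.
    steps.append(Step("apply", psu_id))

    # 3. Reapply each fix through its certified replacement, or flag the
    #    gap so someone can open a case with support.
    for hotfix in to_restore:
        replacement = replacements.get(hotfix)
        if replacement is None:
            steps.append(Step("escalate", hotfix))
        else:
            steps.append(Step("apply", replacement))
    return steps
```

With three hot fixes installed, two of them conflicting and only one replacement available, a single “apply” turns into five steps, one of which is a support escalation.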

SQL Server 2008 - Rolling Updates

While we can applaud Microsoft for being more consistent with methodologies (they do have a broader management ecosystem to deal with that people actually use, unlike Oracle), they have also introduced a fundamental change to patching in SQL Server 2008. In the hope of minimizing downtime, passive nodes in a SQL Server cluster can now be patched while the active node remains operational. While this is useful, especially for shops striving for minimal downtime, the patching workflow has become orders of magnitude more complex. SQL Server DBAs now must patch the passive node(s) of a cluster, initiate a failover to a designated node, patch the former active node and then presumably fail back. Sounds easy, but for shops with hundreds if not thousands of instances this is a complete nightmare.
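The rolling sequence described above can be sketched as an ordered plan for a single cluster. This is only an illustration in plain Python. The function name, the node names and the choice of the first passive node as the failover target are assumptions, and nothing here talks to a real cluster.

```python
def rolling_patch_plan(nodes, active, fail_back=True):
    """Return the ordered (action, node) steps for one clustered instance.

    nodes: every node in the cluster.
    active: the node currently running the instance.
    fail_back: whether to return the instance to the original node.
    """
    if active not in nodes:
        raise ValueError("active node is not part of the cluster")
    passive = [n for n in nodes if n != active]
    if not passive:
        raise ValueError("a rolling update needs at least one passive node")

    # 1. Patch every passive node while the active node keeps serving.
    plan = [("patch", n) for n in passive]

    # 2. Fail over to a designated, already patched node
    #    (here simply the first passive node, as an assumption).
    plan.append(("failover", passive[0]))

    # 3. Patch the former active node, which is now passive.
    plan.append(("patch", active))

    # 4. Optionally fail back to the original node.
    if fail_back:
        plan.append(("failover", active))
    return plan
```

Five steps for a three-node cluster is manageable by hand. Multiply that by hundreds of instances, each with its own active node, and it becomes clear why the ordering has to be generated rather than remembered.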

If you’ve invested in automation (home grown or via some DCA/workflow solution), be prepared to spend significant time reworking your solution. Much of what you have created will be useless, and you should expect lots of errors when applying your new automation. Part of the problem here is that the user experience needs to have enough intelligence to reduce the chances of human error. It’s not just the scripts but the ability for the operational DBA to easily and reliably patch databases independent of version, patch type, etc.

GridApp Clarity is designed to handle these cases along with all the other patch complexities your organization is facing. The entire process is built from the perspective of the operator who must contend with human error, failed prerequisites and ongoing updates to runbook processes. GridApp handles all of these areas to deliver a practical and dynamic solution to the increasingly complex challenge of patching your database.

Topics: Patches | No Comments »

Attacks on Oracle Databases Get Even Simpler

By Eric Gross | July 24, 2009

According to this article, an open source tool, Metasploit, is getting new functionality specifically created to infiltrate an Oracle DBMS environment. True, the database version being attacked in the upcoming demo is antiquated (10g rather than 10gR2 or even 11g) but this goes to show that it is critical for Oracle databases to be patched regularly, either with a CPU or the new PSU.

Even if your databases sit behind a trusted firewall, there is always the risk of an internal threat. Protect your important data by applying patches quickly after the vendor releases them. Of course, each patch brings with it the chance of instability in your environment, so test it in staging environments before rolling it out into production.

Topics: 10gR2, 11g, Patches, Security | No Comments »

Microsoft SQL Server Upgrade Woes: 2005 to 2008

By Eric Gross | July 20, 2009

Is your restore taking a long time?

A common upgrade methodology used in an MSSQL environment is backup/restore. If you choose this method and you are upgrading from MSSQL 2005 to MSSQL 2008, you should consider applying the most recent update (MSSQL Cumulative Update 4 for the base release or MSSQL Cumulative Update 1 for SP1), because if you don’t, the restore operation can take up to 10x as long as you expect. Alternatively, you could use the detach/attach route, which is not affected by this particular bug.

Assuming you’re using automation software, you’ll want to add this to your comprehensive preflight checklist, which exists to confirm that the operation you are about to perform has all of its prerequisites met.
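As a rough sketch of what such a preflight item might look like, the check below compares the target instance’s build against the first build on its branch that contains the fix. The function names and the branch table layout are hypothetical, and no real build numbers are included here; the caller has to supply them from Microsoft’s article.

```python
def parse_build(version):
    """Turn a dotted version string such as '10.0.1234.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def restore_fix_present(target_version, branches):
    """Return True if the target build already contains the restore fix.

    branches: list of dicts with keys
        "floor"    - first build of the branch (base release, SP1, ...)
        "fixed_in" - first build on that branch containing the fix
    Both values must come from the vendor; nothing is hard-coded here.
    """
    build = parse_build(target_version)

    # Pick the newest branch whose floor is at or below the target build.
    applicable = [b for b in branches if parse_build(b["floor"]) <= build]
    if not applicable:
        raise ValueError("target build is older than every known branch")
    branch = max(applicable, key=lambda b: parse_build(b["floor"]))

    return build >= parse_build(branch["fixed_in"])
```

A preflight run would call this before the restore starts and either block the upgrade or steer the operator toward detach/attach when it returns False.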

Details on Microsoft’s website

Topics: Uncategorized | No Comments »

Oracle Database Patching Just Got More Complicated

By Eric Gross | July 17, 2009

It seems like a regular occurrence that a new species gets introduced into the Oracle DBMS patch ecosystem. Most recently the Critical Patch Update (CPU) was created to simplify security patch distribution. These CPUs brought with them a new patch application methodology (NApply) which allows for multiple patches (now known as molecules) to be applied in one fell swoop. In addition to releasing a CPU this quarter Oracle has created a new patch called a Patch Set Update (PSU) which is a superset of the current CPU with additional non-security patches included as well. Currently there is a PSU exclusively for 10.2.0.4 but the next quarter will bring with it a PSU for 11.1.0.7 also, meaning the DBA should be familiar with this new format and anticipate its scheduled release along with the corresponding CPU bundle.

Topics: Database Automation, Patches, Upgrades | 1 Comment »

Flexibility and Automation Don’t Go Hand in Hand

By Robert Gardos | June 18, 2009

Solutions in the datacenter automation industry seem to be squarely focused on being an extensible framework versus a content rich solution. Considering we’ve been selling in this space for nearly 7 years now it is quite clear why this is an attractive option. Before I get into that let me clarify what I mean by framework and content.

Framework – Ranging from a simple vehicle to push out arbitrary scripts to the management of complex workflow, the framework is the master facilitator. It serves as a repository of custom content, manages permissions around this custom content and audits the results of executing this content on target entities. By doing all these things centrally it can drive efficiency and accountability. It is limitless as far as what it can accomplish, although the actual ‘work’ must be created and more importantly maintained by someone.

Content – Automation content around common system tasks, like provisioning and patching, can be inherently provided by the software solution. The user defines all of the properties of the resulting configuration while the actual know-how to get there is the software’s problem. While this is attractive from an end-user development and maintenance perspective it is more limiting than the framework approach.

If you know anything about GridApp Systems, you know that we believe a content-rich solution is the only practical approach to automation, assuming the underlying infrastructure is dynamic in any way. Why do we view the flexible, framework-centric approach with such cynicism? The answer really comes down to the reasons folks embark on an automation project at all. What is the company trying to accomplish? The three most important objectives that we have seen are as follows:

1) Empower lower level people to independently perform senior level tasks. The key word in this statement is independently. Any type of escalation to senior engineering dramatically reduces the value proposition of automation.

2) Central audit trail of all activities dramatically reduces exception incidents. Any organization beholden to true process-driven regulation will tell you it’s not the auditing that’s time consuming, it’s ensuring the handling of exceptions is audited. Most challenging are the ad-hoc responses to failures.

3) Guarantee the end state of an environment is in line with the company standard (driven by operational, security, etc. requirements).

Improving efficiency, saving money, etc. are all great high-level objectives, but the keys to accomplishing them are captured above.

How about “making senior engineers more efficient” as an objective? At least from our perspective, most senior engineers have already embraced the power of the almighty script. The good ones may have even built limited frameworks of their own. The incremental value of an automation solution is greatly diminished in these circumstances. This is ironic, as we have often seen senior engineers make decisions based on this slight incremental value as opposed to the deeper benefits of automation. Good senior engineers are control freaks, so how can you blame them?

This leads me to the initial question of why most automation solutions focus on the framework versus content. This is a simple function of the customer purchasing process. Unfortunately for the customer (and fortunately for the framework solution), it is wildly difficult to assess the practical issues of an automation solution without actually using it in the real world. Even within the confines of a POC, any framework solution can be made to work, as the tasks and supporting environments are highly predictable. For instance, I can easily script out the process of applying a patch to an Oracle database at a given moment in time. According to the folks that run the POC, that automation product is then capable of patching Oracle. In the real world, the required work changes based on variability in such things as the methodology of patching (e.g., NApply patches with molecules), operating systems, and the architecture (clustered/non-clustered). There are system dependencies (all of course pre-handled in the POC) that may be easy or difficult to determine. When the framework-centric automation solution is applied in the real world, exceptions come up, dramatically diminishing the value proposition (especially if you believe in the three main objectives listed above).

Beyond the POC the limitations of the framework-centric solution should be obvious to anyone that manages datacenter infrastructures. Since they often do not handle exceptions well, the operator will be faced with escalating or filing an exception report. Both are worst-case scenarios when it comes to driving empowerment and efficiency. The underlying automation is a moving target that will need to be regularly modified to handle inevitable system and application level changes. This is not only the actual automation logic but the framework and all the system dependencies that inevitably exist. Considering IT mandates often change within a few months it’s optimistic to think a team of consultants will be available in the not-so-long-term future to handle updated requirements. I guess there’s a reason why the average CTO lifespan is still 18 months.

So what are the limitations of the content-based solution? While the loss of flexibility may seem the obvious Achilles heel, from a practical perspective that is not a real issue. IT is tasked with managing complex applications, and by limiting process variability things get simpler with little loss of functionality. The true challenge of content-based solutions is that they must be comprehensive, handling the vast majority of situations/exceptions that come up. If they aren’t able to handle the particulars of an environment they are less useful, because they are not designed to be manipulated. This is a major reason why the few content-rich solutions in the marketplace tend to focus on a particular area (e.g. GridApp Systems is only for databases).

I find it amusing that most of the deals we close at GridApp are with organizations that have already tried a framework-centric solution. At best, the end customer is getting a fraction of the value promised by the original vendor, and at worst they have written off the investment. The good news for GridApp is that the company finally recognizes that by slightly sacrificing flexibility they’ll get a solution that actually enables them to realize the objectives of the automation initiative.

It is important to note that most automation solutions are not strictly framework or content. While HP Opsware automation is a heavily framework centric solution, its OS provisioning/patching is handled by internally created content. Even at GridApp, while we tout our content, the product supports custom applications and scripts as well as logical points in predefined content to infuse custom logic into each process. The key to making a decision regarding automation is to think about the product’s functionality and ensure it is used in a way consistent with that functionality. Listening to vendors yap about what their product does, especially in a competitive situation, and thinking that will give you leverage in the future is a complete waste of time.

One last point. Beware of the senior engineer control freaks. They love all their gadgets and often view a product within the context of their day-to-day rather than the organization’s needs. While they may (and should) be influential it is important to reconcile that perspective with the objectives of an automation initiative.

Topics: Clarity, Consistency, Data Center Automation, Data Management, Database Automation, Efficiency, GridApp, Provisioning, Simplicity | No Comments »

Database Automation Challenges

By Eric Gross | May 29, 2009

Automating common system functionality, like provisioning, patching and upgrading, for the database is a far greater challenge than for operating systems. Most importantly, database vendors typically force a shorter support lifespan on each release compared to operating system vendors. This is especially true for releases preceding the terminal release – early adopters can expect many more upgrade events. In the case of Oracle, the Critical Patch Updates which secure the database from local and remote attacks are available only for a limited set of releases. Oracle publishes the date of the terminal patch for each release giving customers time to upgrade their environments to maintain security.

Since new versions of database software come out regularly, and there are existing installations that aren’t going to be changed until a valid reason arises, there is bound to be a plethora of different versions of database software in use at any time. Maintaining scripts to manage each of these versions is a lot of work. Each version of an OS is relatively similar to the preceding version while in a database there is a larger chance that a change in existing functionality is required.

Another aspect of the environment that makes it difficult to maintain DB automation scripts involves the tools used to manage a database environment. Whereas OS tools have been virtually the same for decades (e.g., tar), database tools undergo fundamental changes whenever it seems appropriate to the vendor. The Oracle patching methodology has changed recently to allow for multiple patches to be incorporated into a single unit such that only the applicable parts are applied to any particular environment. This change, like many others, breaks existing automation code, forcing continuous updates to the code used to manage these tools.

Topics: Data Center Automation, Uncategorized | No Comments »

The Time For Application Automation

By Matthew Zito | May 28, 2009

Since the year 2000, we’ve seen an interesting change take place in the area of IT and infrastructure management. Prior to the rise in IT automation as a key business driver, organizations considered infrastructure management to fall into two key areas:

* Monitoring and performance tuning
* Admin “toolkit” solutions

Software products such as HP OpenView, IBM Tivoli, and Quest Software’s Toad for DBAs were considered “management software”. Organizations had a small number of servers, and weren’t increasing the footprint dramatically, instead tending to stack additional application instances on the same set of large servers.

However, the rise in Linux, Windows, and commodity x86 computing, coupled with the shift to a web-focused infrastructure, changed all of that. Now, organizations of all sizes were buying loads of fast, cheap, rack-mountable servers, and dedicating farms of them to various tasks. While this dramatically reduced the computing cost and enabled incremental scale and deployment, it increased the administration overhead.

Every new deployed server needed an OS, correct set of drivers, and third-party agents to do things like backup and recovery, performance tuning, monitoring, etc. In addition various configuration files needed to be managed periodically (e.g. OSes needed to be patched and new drivers deployed). All of this led to the first wave of IT automation – server deployment and provisioning.

This was led by Opsware and Bladelogic, now part of HP and BMC, respectively. They pioneered the idea of an elegant engine that innately knew how to deploy various operating systems, apply patches, track configuration changes, and report on the compliance of all of these things. On top of that, they offered an extensible scripting framework that could be used to tack on “extra” events and customization. Both products were huge successes, enabling organizations to finally take control of the security and configuration for their servers.

The second wave of IT automation had to do with process, or runbook, automation. As the major server automation vendors tried to integrate into larger and larger environments, more complex workflow was required, with approvals, interactions with multiple systems, and other event management activities. Runbook automation provided this flexibility, allowing the server automation vendors to more elegantly manage these various environments.

More recently, the rise in virtualization has changed this equation. As the quantity of servers skyrocketed, the average utilization of those servers decreased. With virtualization, many servers can be consolidated onto one physical server, maintaining performance while reducing power consumption, datacenter space, etc. A virtual server, however, doesn’t require the complex OS deployment process that a traditional physical server requires. Instead of installing the OS, adding the correct drivers, patching the OS, etc., users can simply create a “gold build” image, store it, and then create new virtual machines with the click of a button. In virtual environments, “OS deployment” doesn’t require the complex, up-front process of defining a standard, managing the different drivers for different platforms and adding patches: build a gold image once, and clone it over and over again.

Between the server automation tools and virtualization, then, it seems as though server deployment and patching have been pretty well covered. The next step is dealing with the applications themselves. The server automation and virtualization vendors have long claimed that they can easily handle application automation as well. VMware recently described deploying Oracle by cloning a VM with Oracle installed, ignoring the issues of datafile placement, the reconfiguration of the listener, any clustering or replication solutions involved, etc. Similarly, companies like HP/Opsware claim that applications can be installed and managed through the scripting framework on top of their server automation tool, ignoring the fact that this turns a server automation user into a development organization, and one forced to keep up with the ever-changing application vendors.

While early server automation adopters have ignored these shortcomings, more and more organizations have realized that the server is becoming increasingly irrelevant. VMs can be built and destroyed on the fly, patches can be pushed en masse to farms of servers, but the applications themselves remain different on every node, and require special care and feeding.

Topics: Data Center Automation | No Comments »

Full Stack Automation - Critical For BSM

By Robert Gardos | May 27, 2009

Business Service Management (BSM) is a nice buzzword being thrown around by datacenter/systems management players as they attempt to bridge the perceived gap between business benefit and technology investment. Clearly business priorities driving IT, as opposed to eccentric engineers, will drive value for the enterprise. Automation is a critical component of BSM as it not only promises efficiency and repeatability, but empowers business owners to rapidly obtain necessary services (especially if you add the self-service and on-demand buzzwords into the mix). Of course the underlying delivery of those services could involve the deployment of server, network, and storage resources. While automation players appear to be filling out their portfolios in these areas, and virtualization is helping simplify this dramatically, no one seems to be paying any attention to the least commoditized component, the application stack.

Topics: Data Center Automation | No Comments »

Enhanced Methods for System Interrogation Yield Efficiency

By Eric Gross | December 16, 2008

One of the most challenging issues for DBAs typically has little to do with the database directly. System dependencies such as kernel parameters, user permissions, storage allocation, etc. required for the successful deployment of a database are often not met. This presents an extremely difficult situation for DBAs, as they frequently have little control over these variables, yet they are tasked with getting the database deployed quickly to meet company SLAs. Determining the root cause of any provisioning issue is a tedious process, especially if several system dependencies have not been met. Errors, especially unusual ones, can take a lot of research to resolve, because the DBA sees only the symptom rather than the root cause of the problem. Root-cause analysis is usually a time-consuming activity, which is why it should be avoided. How?

System configuration flaws must be proactively identified before any provisioning or patching activity is initiated. While database applications have a suite of tools that help with this validation, these are extremely limited and often result in false positives. True validation requires the synthesis of thousands of hours of testing, automated analysis of vendor best practices as well as real-world, diversified experience (the more diverse installations, the better) that continually uncovers undocumented system dependencies. This is what has driven GridApp Clarity’s validation technology.
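The idea of catching unmet dependencies up front can be illustrated with a very small check that reports every shortfall in one pass instead of failing on the first. This is a generic sketch and not Clarity’s implementation. The function name is hypothetical, and the requirement names and thresholds are whatever the caller supplies.

```python
def unmet_dependencies(actual, required):
    """Return a message for every requirement the system does not meet.

    actual: observed values gathered from the target system,
        for example {"kernel.shmmax": 1000}.
    required: minimum acceptable value for each named dependency.
    """
    problems = []
    for name, minimum in sorted(required.items()):
        observed = actual.get(name)
        if observed is None:
            # The value could not be read at all, which is itself a finding.
            problems.append(f"{name}: not found (need >= {minimum})")
        elif observed < minimum:
            problems.append(f"{name}: {observed} < required {minimum}")
    return problems
```

An empty list means the activity may proceed. Anything else goes to the team that owns the setting, before the deployment has had a chance to fail.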

Clarity uses a far more extensive set of checks to detect root-cause issues. This eliminates any possibility of activity failure as well as the time wasted troubleshooting. These checks extend far beyond any validation content from the database vendors (assuming it even works) and are continually evolving based on testing and customer experience.

So unless you are satisfied with the endless email chains between departments blaming one another for a failed database deployment, we recommend a better, more efficient validation process.

Topics: Database Automation | No Comments »

The DBA as a Power Broker

By Eric Gross | December 10, 2008

Managing databases is an intricate endeavor. At a certain point in your career you’ve started writing scripts to accomplish various tasks. Writing the scripts turns out to be the easy part. The difficulty lies in properly documenting them all, which is why most of this intelligence remains confined to its creator, limiting its value. By publishing your productivity enhancements and properly documenting them, you allow others to benefit, which in turn increases your value. Others can follow in your footsteps and even build on your work.

Images of database professionals sequestered in the basement, scattering like cockroaches when the boss comes looking for a report, represent the distant past. Perpetually on the receiving end of demands, DBAs were handed, and forced to accept, change requests to make ‘this table’ or ‘that stored procedure’. Eventually it becomes necessary to perform far more involved tasks such as migrations and upgrades, which empower the DBA to work with a variety of departmental resources such as storage, networking, and development. The increased interdependencies elevate the DBA role in the organization, resulting in the coalescence of organizational hierarchies. The next step is for the DBA to begin to source initiatives rather than exclusively receiving orders. It makes sense that database management priorities should rise to the forefront as the importance of data itself has grown, albeit in a lagging fashion.

In technology, a strategy that ignores the demands of information is a strategy doomed to fail.

Topics: Organization | No Comments »

