Wednesday, March 25, 2009

A Brief History of Data Quality

Believe it or not, the concept of data quality has been touted as important since the beginning of the relational database. The original concept of a relational database came from Dr. Edgar Codd, who worked for IBM in the 1960s and 70s. Dr. Codd’s ideas about relational databases, storing data in cross-referenced tables, were groundbreaking, but largely ignored within IBM. It was only when Larry Ellison grabbed onto the idea and began to have success with a little company named Oracle that IBM finally paid attention. Today, relational databases are everywhere.

Even then, Dr. Codd stressed the importance of data integrity. He wrote about:

  • Entity integrity – every table must have a primary key and the column or columns chosen to be the primary key should be unique and not null.
  • Referential integrity – consistency between coupled tables. A value that points to another table must match a row that actually exists there. A ZIP code stored in one table should always resolve to the same town in the table it references, for example.
  • Domain integrity – defining the possible values of a value stored in a database, including data type and length. So if the domain is a telephone number, the value shouldn’t be an address.
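The three kinds of integrity can be sketched as database constraints. The example below is only an illustration using Python's built-in sqlite3 module; the table names (town, customer), the column names, and the helper functions are assumptions made for this sketch and do not come from Dr. Codd's writing.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical town/customer tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE town (
            -- Entity integrity: the key is unique and never null.
            zip  TEXT NOT NULL PRIMARY KEY
                 -- Domain integrity: exactly five characters, digits only.
                 CHECK (length(zip) = 5 AND zip NOT GLOB '*[^0-9]*'),
            name TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            -- Referential integrity: the ZIP must exist in the town table.
            zip TEXT NOT NULL REFERENCES town(zip)
        )""")
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"
```

With these constraints in place, a second town row for an existing ZIP code, a street address in the zip column, or a customer whose ZIP code has no matching town row is rejected by the database itself rather than by application code.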

He put everything else into something he called 'business rules' to define specific standards for your company. An example of a business rule would be for companies who store part numbers. The part number field would have a certain length and data shape – domain integrity – but also have certain character combinations to designate the category and type of part – business rules.
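To make the distinction concrete, here is a minimal sketch in Python. The part number format (two-letter category, three-letter type, six digits) and the category table are invented for this illustration; a real company would substitute its own rules.

```python
import re

# Domain integrity: the overall length and shape of a part number.
# Hypothetical format: CC-TTT-NNNNNN (category, type, serial digits).
PART_SHAPE = re.compile(r"[A-Z]{2}-[A-Z]{3}-[0-9]{6}")

# Business rule: which type codes are allowed within which category.
# These codes are made up for the example.
ALLOWED_TYPES = {
    "EL": {"RES", "CAP"},   # electrical: resistors, capacitors
    "ME": {"BLT", "NUT"},   # mechanical: bolts, nuts
}


def check_part_number(value):
    """Classify a value as valid, a domain violation, or a business rule violation."""
    if not PART_SHAPE.fullmatch(value):
        return "domain violation"
    category, part_type, _serial = value.split("-")
    if part_type not in ALLOWED_TYPES.get(category, set()):
        return "business rule violation"
    return "valid"
```

A value such as "12 Main Street" fails the shape check, while "EL-BLT-004711" has the right shape but pairs an electrical category with a mechanical type, so it fails only the business rule.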

The point is, information quality is not something new. The database pioneers understood it, at least in theory, as early as the 1970s. In the old days, when the systems were inflexible, you may have been forced to break those integrity rules.

For example, a programmer who worked for you in the past may have used 99/99/9999 in a date field to designate an inactive account. It all works fine when the data is used within the single application. However, these sorts of shortcuts cause huge headaches for the data governance team as they try to consolidate data and move it from a silo into an enterprise-wide system.

To solve these legacy issues, you have to:
  • Profile data to realize that some dates contain all 9s – one of the advantages of using data profiling tools in the beginning of the process.
  • Figure out what the 9s mean by collaborating with members of the business community.
  • Plan what to do to migrate that data over to a data model that makes more sense, like having an active/inactive account table.
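The profiling and migration steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (account_id, last_activity), the date format, and the function names are hypothetical, and the meaning of the sentinel is assumed to have been confirmed with the business community first.

```python
from collections import Counter
from datetime import datetime


def profile_dates(values, fmt="%m/%d/%Y"):
    """Count the values in a date column that are not real calendar dates."""
    invalid = Counter()
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            invalid[value] += 1
    return invalid


def migrate_account(record, sentinel="99/99/9999"):
    """Split the overloaded date field into an explicit status and a real date."""
    is_inactive = record["last_activity"] == sentinel
    return {
        "account_id": record["account_id"],
        "status": "inactive" if is_inactive else "active",
        # The sentinel carries no date information, so store nothing.
        "last_activity": None if is_inactive else record["last_activity"],
    }
```

Running profile_dates over the column surfaces the all-9s value along with any other impossible dates, and migrate_account shows one way to move the hidden meaning into its own field once everyone agrees on what it is.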

If you take that one example and amplify it across thousands of tables in your company, you’ll begin to understand one of the many challenges that data stewards face as they work on migrating legacy data into MDM and data governance programs.

Friday, March 20, 2009

The Down Economy and Data Integration

Vendors, writers and analysts are generating a lot of buzz about the poor economic growth conditions in the world. It’s true that in tough times, large, well-managed companies tend to put off IT purchases until the picture gets a bit rosier. Some speculate that the poor economy will affect data integration vendors and their ability to advance big projects with customers. Yet, I don’t think it will have a deep or lasting impact. Here are just some of the signs that still seem to point to a strong data integration economy.

Stephen Swoyer at TDWI wrote a very interesting article that attempts to prove that data integration and BI projects are going full-steam ahead, despite a lock-down on spending in other areas.

Research from Forrester seems to suggest that IT job cuts in 2009 won’t be as steep as they were in the 2001/2002 dot com bubble burst. Forrester says that the US market for jobs in information technology will not escape the recession, with total jobs in IT occupations down by 1.2% in 2009, but the pain will be relatively mild compared with past recessions. (You have to be a Forrester customer to get this report.)

You can read the article by Doug Henschen from Intelligent Enterprise for further proof on the impact of BI and real time analytics. The article contains success stories from Wal-Mart, Kimberly-Clark and Goodyear, too.

On this topic, SAP BusinessObjects recently asked me if I’d blog about their upcoming webinar entitled: Defy the Times: Business Growth in a Weak Economy. The concept of the webinar is that you can use business intelligence and analytics to cut operating expenses and discretionary spending and improve efficiencies. It might be a helpful webinar if you’re on a data warehouse team and trying to prove your importance to management during this economic down-turn. Use vendors to help you provide third-party confirmation of your value.

So, is the poor economy threatening the data integration economy? I don’t think so. When you look at the problems of growing data volumes and the value of data integration, I don’t see how these positive stories can change any time soon. You can run out of money, but the world will never run out of data.

Sunday, March 15, 2009

Data Governance and the Coke Machine Syndrome


I was in a meeting last week and recognized the Coke Machine Syndrome, an important business parable that I learned from an old boss. All meetings can fall victim to it, not just data governance meetings. Since meeting management is so crucial to the success of a data governance initiative, you should learn to recognize it and nip it in the bud as quickly as possible.

The scene is your company’s conference room. You have just presented your new plan outlining the data governance projects for the entire year. The plan outlines where you’re going to spend this year to improve data quality. Each department argues persuasively for support from the data governance team. With some significant growth goals for the coming year, marketing and sales claim they can’t make it without better data for promotions. Manufacturing obviously can’t reach new goals for efficiency without improving the data within the ERP system. And administration simply must have better data for better metrics in the data warehouse to understand the business.

After limited discussion, the budget is approved and 95% of your team’s expenses have been committed for the current budget. This part of the meeting allocates millions of dollars and takes about 60 minutes.

The Coke Machine
At this point, the meeting leader mentions that the company has been considering the installation of a Coke machine in this section of the building. With a few minutes left in the meeting, he asks what drinks people want in the machine.

For the next 45 minutes, the debate rages with a heightened level of intensity. Should it be placed near the stairway, or in the employee cafeteria, or in the stairwell? Should it contain Pepsi products instead of Coke? Should it contain Red Bull? Should the bottles be recyclable, and how will the recyclable materials be handled?

By the time the meeting adjourns, nearly as much time has been spent on the Coke machine as has been spent on the entire data governance budget for the year. The Coke machine discussion is an incredible waste of management time and effort.

Why Does It Happen?
Coke machine syndromes happen because everyone knows about Coke machines and everyone has a stake in the decision. Knowledge about the issue makes it easier to speak up about the Coke machine than it would be to speak up about a complicated issue like the budget.

Managing it
To manage the Coke machine syndrome, you must recognize it when it occurs. You can identify this syndrome whenever a small, easily understood issue begins to consume more time than it should. There is usually a full range of logical, well-supported, and totally divergent opinions of what must be done, too.

Make sure you call it what it is. In other words, label it with the term: Coke Machine Syndrome and define it for your team. When it happens, you have a short-hand term that you can use to describe what’s happening.

Before each meeting, think about what items on your meeting agenda might turn into a Coke machine syndrome. If you can recognize it, that can be a big help. Many find it helpful to conduct pre-meetings with certain team members to prepare them for simple decisions without having to vet ideas in a meeting.

Finally, if calling it the Coke machine syndrome doesn't work, just use the phrase “let’s take it off-line” and move on.

Monday, March 2, 2009

Top Six Traits of a Data Champion


Data champions play a crucial role in making data governance successful. Data champions are enthusiastic about the power of data, and in just about every company that has successfully implemented data governance, they lead the way.

Let's take a look at what you must do in order to lead your organization to data governance. Here are the top six characteristics:

1. Passion. Champions are passionate about data governance and promote its benefits to everyone they meet. They embody the vision of data governance, developing new efficient processes and working through any issues of non-cooperation that arise. If you find yourself losing your passion for data management, it’s time for regime change.

2. Respect. A data champion is someone who is the glue between executives, business, IT and third-party providers. The data champion role requires someone who has both technology and business knowledge – someone who can communicate with others and build relationships as needed. In a way, a data champion is a translator, translating the technologist's jargon of schemas and metadata into business value, and vice versa. To do that, you really need to understand what makes all sides tick and have the respect of the team.

3. Maven-dom. A ‘maven’ is someone who wants to solve other people's problems, generally by solving his own, according to Malcolm Gladwell, author of The Tipping Point (a good book for data champions to read). A maven’s social skills and ability to communicate are powerful tools in evangelizing data governance. A data champion needs to be socially connected and willing to reach out and share what is known about data governance. It is not easy for some to create and maintain relationships. If you’re the type of person who prefers closing the office door to avoid others, you may not be an effective data champion.

4. Persuasiveness. One of the success traits of a good data champion is that they have vision and they can sell it. Working with others within your organization to develop a vision is important, but the data champion is the primary marketer of the vision. Successful data champions understand the power of the elevator pitch and are willing to use it to promote the data governance vision to all who will listen. The term elevator pitch describes a sales message that can be delivered in the time span of an elevator ride. The pitch should have a clear, consistent message and reflect your goal of making the company more efficient through data governance. The more effective the pitch, the more interested your colleagues will become.

5. Positive Attitude. A data champion must smile and train themselves to think positively. Why? Positive thinking is contagious and your optimism will build positive energy for your project. Data champions smile and speak optimistically to give others the confidence to agree with them. As a champion, you will encounter negative people who will attempt to set up road blocks in front of you. But as long as you’re optimistic and respond positively, you will inspire team members to join your quest and share in your success.

6. Leadership. A data champion is a leader above all, so studying the qualities of successful leaders will serve you well. This is a catch-all category because leadership also has many faces and traits. Before you begin to champion the cause of data governance, read books like The 21 Indispensable Qualities of a Leader: Becoming the Person Others Will Want to Follow, where author John Maxwell identifies areas for you to work on.

Those are my top six qualities of a data champion. You’ll notice that I didn’t specifically include anything about technical expertise, although it is implied in number two. That’s because being a data champion is as much about managing people and resources as it is about technical know-how.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.