Thursday, February 19, 2009

Syncsort and Trillium Software Partnership

When you think of Syncsort, you think of, well… sorting. Syncsort’s flagship product is a high-performance sort utility that has been used for years to cut processing time for large volumes of data. If you have multiple customer databases, for example, you may want to sort the files in different ways and compare them on many different keys. Sorting on multiple keys is a very resource-intensive data processing function, so maximizing sorting speed and efficiency is crucial.
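To make "sorting on multiple keys" concrete, here is a minimal Python sketch. It has nothing to do with Syncsort's proprietary algorithm; it just uses Python's built-in sort, and the customer records and field names are made up for illustration.

```python
from operator import itemgetter

# Hypothetical customer records; the field names are invented for this example.
customers = [
    {"last": "Smith", "first": "Ann", "zip": "02110"},
    {"last": "Jones", "first": "Bob", "zip": "02139"},
    {"last": "Smith", "first": "Al", "zip": "01890"},
]


def sort_by(records, *keys):
    """Return a new list ordered by the given fields, leftmost key first."""
    return sorted(records, key=itemgetter(*keys))


# The same file sorted two different ways, ready to be compared key by key.
by_name = sort_by(customers, "last", "first")
by_zip = sort_by(customers, "zip", "last")

for row in by_name:
    print(row["last"], row["first"], row["zip"])
```

Each extra key adds comparison work for every record, which is why a dedicated sort utility starts to matter once the files grow from three rows to hundreds of millions.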
Syncsort’s sheer performance is made possible by a fast but proprietary sorting algorithm. Because of that performance boost, many Trillium Software customers use Syncsort sorting as part of their batch data quality processes.
On the other hand, when your company is named after what you do, it’s hard to change what you do. Syncsort’s DMExpress has little to do with sorting; instead, it is the company’s low-cost ETL tool. Trillium Software recently announced connectivity between Syncsort and the Trillium Software System. Trillium Software’s fast, scalable data cleansing combined with Syncsort’s fast, scalable ETL makes for a great pairing.
I’m fascinated by some of the metrics that Syncsort has posted on its web site. An independent benchmark claims that DMExpress is the fastest ETL ever: it extracted, transformed, cleansed and loaded 5.4 TB of raw data into the Vertica Analytic Database in 57 minutes 21.51 seconds, using HP BladeSystem c-Class hardware running Red Hat. In other words, low-cost hardware and record performance. It beats the big boys of ETL on many levels.
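For a sense of scale, the benchmark numbers can be turned into a throughput rate with a little arithmetic. This is only a back-of-the-envelope sketch; it assumes a decimal terabyte (1 TB = 1,000 GB), which the benchmark summary above does not specify.

```python
# Figures quoted in the benchmark summary above.
data_tb = 5.4
elapsed_seconds = 57 * 60 + 21.51

# Assumption: decimal units, 1 TB = 1,000 GB.
gb_per_second = data_tb * 1000 / elapsed_seconds
tb_per_hour = data_tb / (elapsed_seconds / 3600)

print(f"{gb_per_second:.2f} GB/s, or about {tb_per_hour:.2f} TB/hour")
```

Under that assumption, the run works out to a bit more than one and a half gigabytes every second, sustained for nearly an hour.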
Many of the case studies I read on Syncsort’s web site are from companies that can finally afford to get rid of those slow, hand-coded ETL processes. When you reduce extraction time by over 80%, as happens in many of these cases, you can provide business intelligence that’s a lot more current, and that’s a big deal. For quick, low-cost ETL, DMExpress makes perfect sense.

Wednesday, February 11, 2009

Using Data Quality Tools to Look for Bad Guys

Most companies do not want to do business with bad guys - those on the FBI’s Most Wanted list or international terrorists. Here in Boston, we’re always on the lookout for James “Whitey” Bulger, a notorious mobster who has been on the FBI’s Most Wanted list for years. But how do you really know if you’re doing business with bad guys if you don’t pay attention to data quality?
If you work for a financial organization, you may be mandated by your country's government to avoid doing business with the bad guys. The mandates are tied to the lists of terrorists published by the European Union, Australia, Canada and the United States. For example, in the U.S., the Treasury Department publishes a list of terrorists and narcotics traffickers. These individuals and companies are called "Specially Designated Nationals" or "SDNs." Their assets are blocked, and the Treasury's Office of Foreign Assets Control (OFAC) discourages companies in the U.S. from dealing with them. In the U.K., the Bank of England maintains a separate list but with similar restrictions.
If your company fails to identify and block a bad guy (like Whitey), there could be real-world consequences, such as an enforcement action against your bank or company and negative publicity. On the other hand, many hits turn out to be "false positives," where the name is similar to a bad guy's name, but the rest of the information provided by the applicant does not match the SDN list. False positives can make for poor customer relationships.
If you have to chase bad guys in your data, you need to make data quality a prerequisite. Data quality tools can help you both correctly identify foreign nationals on the SDN list and lower the number of false positives. If the data coming into your system is standardized and has all of the required information as mandated by your governance program, matching technologies can more easily and more automatically identify SDNs and avoid those false positives.
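Here is a rough Python sketch of why standardization and complete data matter for this kind of screening. It is not how any commercial data quality tool or OFAC screening product works. The matching uses the generic `SequenceMatcher` from Python's standard library, the 0.85 threshold is an arbitrary assumption, and the watch list entry and applicants are entirely fictitious.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def standardize(name):
    """Uppercase, strip accents and punctuation, and sort the name tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[A-Z0-9]+", ascii_name.upper())
    return " ".join(sorted(tokens))


def screen(applicant, watch_list, threshold=0.85):
    """Return (listed name, score, status) for entries whose names resemble the applicant's."""
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(
            None, standardize(applicant["name"]), standardize(entry["name"])
        ).ratio()
        if score >= threshold:
            # A date of birth that is present on both sides and differs
            # suggests a false positive; anything else goes to a person.
            dobs = (applicant.get("dob"), entry.get("dob"))
            conflict = all(dobs) and dobs[0] != dobs[1]
            status = "likely false positive" if conflict else "needs review"
            hits.append((entry["name"], round(score, 2), status))
    return hits


# Entirely fictitious watch list entry and applicants, for illustration only.
watch_list = [{"name": "Doe, John Q.", "dob": "1950-01-01"}]
print(screen({"name": "john q doe", "dob": "1950-01-01"}, watch_list))
print(screen({"name": "John Q. Doe", "dob": "1982-07-15"}, watch_list))
```

The standardization step is what lets "john q doe" line up with "Doe, John Q." at all, and the date of birth is what separates the real hit from the look-alike. If the applicant record arrives without a date of birth, the sketch has no choice but to send every similar name to manual review, which is exactly the false positive problem that complete data avoids.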

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.