Services: Data Mining Project Assessment, Data Preparation For Data Mining, Data Mining Model Development, Data Mining Model Deployment, Data Mining Course: Overview for Project Managers, Data Mining Course: Overview for Practitioners, Customized Data Mining Engagements
Insight 1: Find Correlated Variables Prior to Modeling (Topic: Data Understanding and Data Preparation; Sub-Topic: Feature Selection)
Insight 2: Beware of Outliers in Computing Correlations (Topic: Data Preparation; Sub-Topic: Outliers)
Insight 3: Create Three Sampled Data Sets, not Two (Topic: Modeling; Sub-Topic: Sampling)
Insight 4: Use Priors to Balance Class Counts (Topic: Modeling; Sub-Topic: Decision Trees)
Insight 5: Beware of Automatic Handling of Categorical Variables (Topic: Data Understanding and Data Preparation; Sub-Topic: Feature Selection and Creation)
Insight 6: Gain Insights by Building Models from Several Algorithms (Topic: Modeling; Sub-Topic: Algorithm Selection)
Insight 7: Beware of Being Fooled with Model Performance (Topic: Data Evaluation; Sub-Topic: Model Performance)
Upcoming Data Mining Seminars
A Practical Introduction to Data Mining
Upcoming courses (nationwide)
Data Mining Level II: A drill-down of the data mining process, techniques, and applications
Data Mining Level III: A hands-on day of data mining using real data and real data mining software
Anytime Courses
Overview for Project Managers: Train project managers on the data mining process.
Overview for Practitioners: Train practitioners (data analysts, project managers, managers) on the data mining process.
Mr. Abbott is a seasoned instructor who has taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including at DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses and explains concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught applied data mining courses on major vendors' software, including Clementine (SPSS), Affinium Model (Unica Corporation), and Model 1 (Group1 Software), as well as hands-on courses using S-Plus and Insightful Miner (Insightful Corporation) and CART (Salford Systems).
| 02/18/2010 | 01:08 AM |
| Predictive Analytics World Recap |
|
Predictive Analytics World (PAW) ended today, and here are a few thoughts on the conference. PAW was a bigger conference than October's or last February's, and it definitely felt bigger. It seemed to me that there was a larger international presence as well. Major data mining software vendors included the ones you would expect (in alphabetical order to avoid any appearance of favoritism): Salford Systems, SAS, SPSS (an IBM company), StatSoft, and Tibco. Others who were there included Netezza (a new one for me; they have an innovative approach to data storage and retrieval), SAP, Florio (another new one for ... Read More >> |
| 02/17/2010 | 09:28 PM |
| Principal Components for Modeling |
|
Problem Statement: Analysts constructing predictive models frequently encounter the need to reduce the size of the available data, both in terms of variables and observations. One reason is that data sets are now available which are far too large to be modeled directly in their entirety using contemporary hardware and software. Another reason is that some data elements (variables) have an associated cost. For instance, medical tests bring an economic and sometimes human cost, so it would be ideal to minimize their use if possible. Another problem is overfitting: Many modeling algorithms will eagerly consume however much ... Read More >> |
| 02/12/2010 | 11:58 AM |
| Predictive Analytics World - San Francisco |
|
The next Predictive Analytics World is coming up next week. This is a conference I look forward to very much because of the attendees; I have found that at the first two PAWs, there has been a good mix of folks who are experts and those who are spinning up on Predictive Analytics. I'll be teaching a hands-on workshop Monday (using Enterprise Miner), and presenting a talk on using trees to ... Read More >>
|
| 01/19/2010 | 06:38 PM |
| Is there anything new in Predictive Analytics? |
|
Federal Computer Week's John Zyskowski posted an article on Jan 8, 2010 on Predictive Analytics entitled "Deja vu all over again: Predictive analytics look forward into the past". (Kudos for the great Yogi Berra quote! But beware: as Berra himself stated, "I really didn't say everything I said.") Back to Predictive Analytics... Pieter Mimno is quoted as stating: There's nothing new about this (Predictive Analytics). It's just old ... Read More >> |
| 01/23/2010 | 10:10 AM |
| Counting Observations |
|
Data is fodder for the data mining process. One fundamental aspect of the data we analyze is its size, which is most often characterized by the number of observations and the number of variables in the given set of data, typically measured as counts of "rows and columns", respectively. It is worth taking a closer look at this, though, as questions such as "Do we have enough data?" depend on an apt measure of how much data we have. Outcome Distributions: In many predictive modeling situations, cases are spread fairly evenly among the possible outcomes, but this is not always true. Many fraud detection ... Read More >> |
| 01/06/2010 | 09:32 PM |
| Data Mining and Terrorism... Counterpoint |
|
In a recent posting to this Web log (Data Mining and Privacy...again, Jan-04-2010), Dean Abbott made several points regarding the use of data mining to counter terrorism, and related privacy issues. I'd like to address the question of the usefulness of data mining in this application. Dean quoted Bruce Schneier's argument against data mining's use in anti-terrorism programs. The specific technical argument that Schneier has made (and he is not alone in this) is: Automatic classification systems are unlikely to be effective at identifying ... Read More >> |
| 01/05/2010 | 12:53 AM |
| The Next Predictive Analytics World |
|
Just a reminder that the next Predictive Analytics World is coming in another 6 weeks--Feb 16-17 in San Francisco. I'll be teaching a pre-conference Hands-On Predictive Analytics workshop using SAS Enterprise Miner on the 15th, and presenting a text mining case study on the 16th. For any readers here who may be going, feel free to use this discount code during ... Read More >> |
| 01/05/2010 | 12:42 AM |
| Data Mining and Privacy...again |
|
A Google search tonight on "data mining" referred to the latest DHS Privacy Office 2009 Data Mining Report to Congress. I'm always nervous when I see "data mining" in titles like this, especially when linked to privacy, because of the misconceptions about what data mining is and does. I have long contended that data mining only does what humans would do manually if they had enough time to do it. What most privacy advocates are really complaining about is the data that one has available to make the inferences from, albeit more ... Read More >>
|
| 12/29/2009 | 11:03 AM |
| 2009 Retrospective |
|
I was thinking about top data mining trends in 2009, and searched for what others thought about it. I'll combine a few 2009 "top 3" lists here, including top trends (as described at Enterprise Regulars here), and posts here that generated the most buzz. First, the top data mining news story was IBM's purchase of SPSS. It will be very interesting to see if this continues the trend toward integration of Business Intelligence and Predictive Analytics that one sees with SAS, Tibco and now ... Read More >> |
| 01/05/2010 | 12:43 AM |
| Overlap in the Business Intelligence / Predictive Analytics Space |
|
I've received considerable feedback on the post Business Intelligence vs. Business Analytics, which has also caused me to think more about the BI space and its overlap with data mining (DM) / predictive analytics (PA) / business analytics (BA). One place to look for this, of course, is with Gartner, how they define Business Intelligence, and which vendors overlap between these industries. (I think of this in much the same way as I do DM; I look to data miners to define themselves and what they do ... Read More >>
|
| 12/08/2009 | 09:00 PM |
| Business Analytics vs. Business Intelligence |
|
I used to be one of those who thought the term "data mining" would stay as the description of the kind of analytic work I do. To a large degree it has, but there are always new spins on things, and it seems that quite often in the business world, Predictive Analytics or Business Analytics are the terms of the day. I just came across this post from the Smart Data Collective: OLAP is Dead (Long Live Analytics), which had some fascinating graphs on hits related to the phrases OLAP and Analytics. The first shows the ... Read More >> |
Invoice Fraud Detection: Successful application of data mining by Abbott Analytics
Data Mining Levels II & III: Las Vegas, NV - April 14 - 16, 2010
Hands-on Data Mining Using Statistica: San Diego, CA - May 17 - 18, 2010
Abbott, D.W., Combining Models to Improve Classification Accuracy and Robustness (PDF), The 2nd International Conference On Information Fusion - FUSION'99, San Jose, CA, July 6, 1999.