Data Mining
by Lee LeClair, 08/10/2007
As seen in Inside Tucson Business
Recently, it made the geek news that police in Richmond had used data mining to predict where and when crimes were most likely to occur in their jurisdiction. After taking preventative steps based on this information, they were able to reduce crime by 25% the first year and a further 19% the following year. The interesting thing is that they used their existing data and simply set about organizing it to see whether there were patterns that pointed to an increased risk of crime.
In essence, they made use of the ocean of data that they had collected to do something useful. Data mining is not a new concept, but it is most commonly associated with major corporations or the NSA and lots of expensive hardware. However, the increasing power of hardware and the decreasing cost of business intelligence software have brought data mining down to a level where medium-sized businesses can take advantage of it and get real results if they apply it correctly.
Data mining involves examining your data in ways that were often not intended when the data was originally collected. It requires thinking about what you want to learn and then figuring out how to coax that information out of your system. This is where business intelligence (BI) software comes in. BI programs help you work out the right questions to ask. Powerful hardware is usually necessary because the mining effort involves scouring a very large amount of data for the pieces of the puzzle that you are trying to put together. The more mundane database systems that collected the data were usually designed to write records a few at a time, keep them in large storage, and query only small parts of the data at once.
On the other hand, when mining data, it helps to have a separate, dedicated database server with the data spread across multiple smaller hard disks. This speeds up queries because several disks can be read at the same time. The advent of dual and even quad core processors also helps; for this kind of work, parallel processing is generally faster than serial processing. Finally, it helps to organize your data differently from the way it was collected.
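To make the idea of spreading the work concrete, here is a minimal sketch in Python. It is an illustration only: the function names and the row layout are made up for this example, and a real database server does this kind of partitioning internally across its disks and cores. The sketch uses threads simply to show the pattern of scanning each partition independently and then merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows round-robin into n partitions, as if spread across n disks."""
    return [rows[i::n] for i in range(n)]


def count_partition(rows):
    """Scan one partition and count records per area (first field of each row)."""
    return Counter(area for area, _ in rows)


def parallel_count(rows, workers=4):
    """Scan every partition concurrently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, partition(rows, workers)):
            total += partial
    return total
```

Because each partition is counted on its own, the merged totals are the same no matter how many partitions are used; only the amount of work done at once changes.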
Most data designs are geared for Online Transaction Processing (OLTP). These designs are meant to maximize data integrity during the write process and reduce data duplication (i.e., a given piece of data is stored only once). For data mining, Online Analytical Processing (OLAP) data designs are a better fit. These designs optimize query speed and are less concerned with data integrity during the entry process. They are not at all good for OLTP, but they are great for business intelligence queries. Naturally, it is also a good idea to build the data mining database as a copy of the original source, so that it can be abused with lots of queries that would otherwise bring the source database to its knees and disrupt operations.
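The difference between the two designs can be sketched with Python's built-in sqlite3 module. Everything here is hypothetical: the table names, columns and sample rows are invented for illustration, and a single flattened table is only a simplified stand-in for a real OLAP design. The point is that the transaction tables store each customer once, while the reporting copy repeats customer details on every row so that summary queries need no joins.

```python
import sqlite3


def make_demo_db():
    """Create OLTP-style tables: each customer is stored exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            order_date TEXT, amount REAL);
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Acme", "North"), (2, "Bolt", "South")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     [(1, 1, "2007-07-01", 100.0),
                      (2, 1, "2007-07-15", 50.0),
                      (3, 2, "2007-07-15", 75.0)])
    conn.commit()
    return conn


def build_reporting_copy(conn):
    """Flatten the normalized tables into one wide, query-friendly copy."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_report;
        CREATE TABLE sales_report AS
            SELECT o.order_id, o.order_date, c.name AS customer,
                   c.region, o.amount
            FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
        CREATE INDEX idx_report_region ON sales_report(region);
    """)


def revenue_by_region(conn):
    """Run a summary query against the copy, not the live tables."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_report "
        "GROUP BY region ORDER BY region").fetchall()
    return dict(rows)
```

In practice the reporting copy would live on its own server and be refreshed on a schedule, so heavy queries never touch the system that takes the orders.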
The key to getting what you need out of your data is understanding what you have and what is possible to get from it. If you have crime statistics, crime location information, etc., then you should not expect to be able to determine tomorrow's weather. You should be able to figure out at what times of the month crime spikes and in what areas. Then you can look at external information, like when payday falls for local employers and how far frequent crime locations are from cash machines. If you have a small or medium-sized business, think about the data you have likely amassed and what you might be able to learn from examining it. Make no mistake, it will take careful consideration and planning, but data mining could help you take your business to the next level.
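As a rough illustration of the "times of month and areas" question, the sketch below counts incidents by area and day of the month and flags the buckets that stand well above average. The incident records, the area names and the threshold are all invented for this example; they are not Richmond's data or method.

```python
from collections import Counter
from datetime import date

# Made-up incident records (date, area), for illustration only.
SAMPLE_INCIDENTS = [
    (date(2007, 5, 15), "Downtown"),
    (date(2007, 6, 15), "Downtown"),
    (date(2007, 7, 15), "Downtown"),
    (date(2007, 7, 15), "Downtown"),
    (date(2007, 5, 3), "Downtown"),
    (date(2007, 6, 9), "Eastside"),
    (date(2007, 7, 22), "Eastside"),
]


def count_by_area_and_day(incidents):
    """Count incidents per (area, day of month) bucket."""
    counts = Counter()
    for when, area in incidents:
        counts[(area, when.day)] += 1
    return counts


def find_spikes(counts, factor=2.0):
    """Return buckets whose count is at least `factor` times the mean count."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(key for key, n in counts.items() if n >= factor * mean)
```

Calling find_spikes(count_by_area_and_day(SAMPLE_INCIDENTS)) singles out the one bucket that stands out in the sample. The next step would be to compare the flagged days with outside information, such as local paydays.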
Lee Le Clair is the CTO at Ephibian. His Tech Talk column appears the third week of each month in Inside Tucson Business.