Introduction to Data Mining: An Innovation in Accessing Large Amounts of Data

Q) What is meant by Data Mining?

Ans:-
   Data mining, a branch of computer science,[1] is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Data mining is seen as an increasingly important tool by modern business to transform data into business intelligence giving an informational advantage. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery.

The related terms data dredging, data fishing and data snooping refer to the use of data mining techniques to sample portions of the larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These techniques can, however, be used in the creation of new hypotheses to test against the larger data populations.
---------------------------------------------------------------------------------------------------------
Q) What is the History of Data Mining?
Ans:-
 
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have increased data collection, storage and manipulation. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other discoveries in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1990s). Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.[2] It has been used for many years by businesses, scientists and governments to sift through volumes of data such as airline passenger trip records, census data and supermarket scanner data to produce market research reports. (Note, however, that reporting is not always considered to be data mining.)

A primary reason for using data mining is to assist in the analysis of collections of observations of behaviour. Such data are vulnerable to collinearity because of unknown interrelations. An unavoidable fact of data mining is that the (sub-)set(s) of data being analysed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviours that exist across other parts of the domain. To address this sort of issue, the analysis may be augmented using experiment-based and other approaches, such as Choice Modelling for human-generated data. In these situations, inherent correlations can be either controlled for, or removed altogether, during the construction of the experimental design.

There have been some efforts to define standards for data mining, for example the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). These are evolving standards; later versions of these standards are under development. Independent of these standardization efforts, freely available open-source software systems like the R Project, Weka, KNIME, RapidMiner, jHepWork and others have become an informal standard for defining data-mining processes. Notably, all these systems are able to import and export models in PMML (Predictive Model Markup Language) which provides a standard way to represent data mining models so that these can be shared between different statistical applications.[3] PMML is an XML-based language developed by the Data Mining Group (DMG),[4] an independent group composed of many data mining companies. PMML version 4.0 was released in June 2009.
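
To make the idea of an XML-based model exchange format concrete, here is a minimal sketch in Python. The document below is a hand-written, simplified PMML-style fragment for a one-variable linear model; it has not been validated against the DMG schema, and the field names ("visits", "spend"), the coefficients and the namespace URI used for version 4.0 are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-style document (illustrative only; not
# validated against the DMG schema). Field names and numbers are made up.
PMML_TEXT = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="spend" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="visits" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace assumed for this sketch.
NS = {"p": "http://www.dmg.org/PMML-4_0"}


def load_linear_model(pmml_text):
    """Read the intercept and coefficients of the first regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        node.get("name"): float(node.get("coefficient"))
        for node in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def predict(model, row):
    """Apply the linear model to one input row (a dict of field values)."""
    intercept, coefficients = model
    return intercept + sum(c * row[name] for name, c in coefficients.items())


model = load_linear_model(PMML_TEXT)
prediction = predict(model, {"visits": 3.0})
```

Because the model is stored as plain XML, any tool that understands the format could in principle read the same intercept and coefficients, which is the interoperability benefit described above.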

Research and evolution

In addition to industry-driven demand for standards and interoperability, professional and academic activity has also made considerable contributions to the evolution and rigour of the methods and models; an article published in a 2008 issue of the International Journal of Information Technology and Decision Making summarises the results of a literature survey which traces and analyzes this evolution.

The premier professional body in the field is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). Since 1989 it has hosted an annual international conference and published its proceedings,[8] and since 1999 it has published a biannual academic journal titled "SIGKDD Explorations". Other computer science conferences on data mining include:

DMIN – International Conference on Data Mining

DMKD – Research Issues on Data Mining and Knowledge Discovery

ECDM – European Conference on Data Mining

ECML-PKDD – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

EDM – International Conference on Educational Data Mining

ICDM – IEEE International Conference on Data Mining

MLDM – Machine Learning and Data Mining in Pattern Recognition

PAKDD – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining

PAW – Predictive Analytics World

SDM – SIAM International Conference on Data Mining
--------------------------------------------------------------------------------------------
Q) What are the Different Applications of Data Mining?
 
Ans:-
Many organizations combine data mining with statistics, pattern recognition, and other analytical tools. Data mining can be used to find patterns and connections that would otherwise be difficult to find. This technology is popular with many businesses because it allows them to learn more about their customers and make smart marketing decisions.
Data mining has a number of applications. The first is market segmentation. With market segmentation, you can find behaviors that are common among your customers, for example by looking for patterns among customers who tend to purchase the same products at the same time. Another application is the analysis of customer churn, which allows you to estimate which customers are the most likely to stop purchasing your products or services and go to one of your competitors. In addition, a company can use data mining to find out which purchases are the most likely to be fraudulent.
For example, by using data mining a retail store may be able to determine which products are stolen the most. By finding out which products are stolen the most, steps can be taken to protect those products and detect those who are stealing them. While direct mail marketing is an older technique that has been used for many years, companies who combine it with data mining can experience fantastic results. For example, you can use data mining to find out which customers will respond favorably to a direct mail marketing strategy. You can also use data mining to determine the effectiveness of interactive marketing. Some of your customers will be more likely to purchase your products online than offline, and you must identify them.
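
The idea of finding products that customers tend to buy at the same time can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the baskets are made-up data, and "support" here just means the fraction of baskets that contain both items of a pair.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical purchase baskets, one list per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

support = pair_support(baskets)
most_common_pair = max(support, key=support.get)
```

Pairs with high support are candidates for joint promotions or shelf placement; real systems usually add further measures such as confidence or lift before acting on a pair.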

While many businesses use data mining to help increase their profits, many of them don't realize that it can be used to create new businesses and industries. One industry that can be created by data mining is the automatic prediction of both behaviors and trends. Imagine for a moment that you were the owner of a fashion company, and you were able to precisely predict the next big fashion trend based on the behavior and shopping patterns of your customers? It is easy to see that you could become very wealthy within a short period of time. You would have an advantage over your competitors. Instead of simply guessing what the next big trend will be, you will determine it based on statistics, patterns, and logic.

Another example of automatic prediction is to use data mining to look at your past marketing strategies. Which one worked the best? Why did it work the best? Who were the customers that responded most favorably to it? Data mining will allow you to answer these questions, and once you have the answers, you will be able to avoid making any mistakes that you made in your previous marketing campaign. Data mining can allow you to become better at what you do. It is also a powerful tool for those who deal with finances. A financial institution such as a bank can predict the number of defaults that will occur among their customers within a given period of time, and they can also predict the amount of fraud that will occur as well.

Another potential application of data mining is the automatic recognition of patterns that were not previously known. Imagine if you had a tool that could automatically search your database to look for patterns which are hidden. If you had access to this technology, you would be able to find relationships that could allow you to make strategic decisions.
-----------------------------------------------------------------------------------------------------------
Q) What are the Different Tools or Software Available for Data Mining in the Market?
 
Ans:-
Commercial data-mining software and applications

SAS Enterprise Miner - data mining software provided by the SAS Institute.

SPSS Modeler - data mining software provided by IBM SPSS. According to Rexer's Annual Data Miner Survey in 2010, IBM SPSS Modeler (along with STATISTICA Data Miner and R) received the strongest satisfaction ratings in both 2010 and 2009.

STATISTICA Data Miner - data mining software provided by StatSoft. According to Rexer's Annual Data Miner Survey in 2010, STATISTICA Data Miner (along with IBM SPSS Modeler and R) received the strongest satisfaction ratings in both 2010 and 2009; moreover, it was rated as the primary data mining tool chosen most often (18%).

Free data-mining software and applications

Here are five free data mining tools, which were rated highly by TechSource's Jun Auza in 2010:

1. KNIME – the Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.

2. JHepWork – a Java (multi-platform) data analysis framework developed at ANL.

3. RapidMiner – an environment for machine learning and data mining experiments.

4. Weka – a suite of machine learning software written in Java.

5. Orange – a component-based data mining and machine learning software suite.

Also, according to Rexer's Annual Data Miner Survey in 2010, the open source R language overtook other tools to become the tool used by more data miners (43%) than any other.

ELKI is a university research project with advanced cluster analysis and outlier detection methods written in Java, but with little focus on commercial application.
----------------------------------------------------------------------------------------------------------
Q) What is meant by Text Data Mining?

Ans:-
Text mining, sometimes referred to as text data mining and roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
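
The first two steps described above, structuring the input text and then deriving simple patterns from it, can be sketched as follows. This is a minimal illustration, not a full text mining pipeline: the stop-word list is a tiny made-up sample, and counting term frequencies stands in for the much richer pattern discovery used in practice.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Structure raw text: lowercase it, split into words, drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def term_frequencies(documents):
    """Return one term-count table per document."""
    return [Counter(tokenize(doc)) for doc in documents]


def top_terms(documents, n=3):
    """Derive a simple pattern: the most frequent terms across all documents."""
    total = Counter()
    for counts in term_frequencies(documents):
        total.update(counts)
    return total.most_common(n)


docs = ["The cat sat on the mat.", "A cat and a dog."]
tokens = tokenize(docs[0])
frequent = top_terms(docs, n=1)
```

The per-document count tables are the kind of structured representation that later steps such as categorization or clustering would work on.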
------------------------------------------------------------------------------------------------------
Q) Explain the Different Applications of Text Data Mining.
 
Ans:-
Applications

Recently, text mining has received attention in many areas.

Security applications

Many text mining software packages are marketed towards security applications, particularly analysis of plain text sources such as Internet news. Text mining is also involved in the study of text encryption.

Biomedical applications

A range of text mining applications in the biomedical literature has been described. One example is PubGene that combines biomedical text mining with network visualization as an Internet service. Another text mining example is GoPubMed.org.[4] Semantic similarity has also been used by text-mining systems, namely, GOAnnotator.

Software and applications

Research and development departments of major companies, including IBM and Microsoft, are researching text mining techniques and developing programs to further automate the mining and analysis processes. Text mining software is also being researched by different companies working in the area of search and indexing in general as a way to improve their results.

Online media applications

Text mining is being used by large media companies, such as the Tribune Company, to disambiguate information and to provide readers with greater search experiences, which in turn increases site "stickiness" and revenue. Additionally, on the back end, editors are benefiting by being able to share, associate and package news across properties, significantly increasing opportunities to monetize content.

Marketing applications

Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management. Coussement and Van den Poel (2008) apply it to improve predictive analytics models for customer churn (customer attrition).

Sentiment analysis

Sentiment analysis may involve analysis of movie reviews for estimating how favorable a review is for a movie.[8] Such an analysis may require a labeled data set or labeling of the affectivity of words. A resource for affectivity of words has been made for WordNet.
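
As a rough illustration of how labeled word affectivity can be used, the Python sketch below scores a review by summing per-word scores from a small lexicon. The lexicon and its scores are invented for this example and are not taken from WordNet or any published resource; a lexicon-based scorer like this also ignores negation and context.

```python
import re

# Hypothetical affect lexicon: positive numbers are favorable words,
# negative numbers are unfavorable ones. The values are made up.
LEXICON = {
    "great": 2,
    "good": 1,
    "enjoyable": 1,
    "boring": -1,
    "bad": -1,
    "terrible": -2,
}


def sentiment_score(review):
    """Sum the lexicon scores of the words in a review (unknown words count 0)."""
    words = re.findall(r"[a-z]+", review.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(review):
    """Turn the numeric score into a coarse label."""
    score = sentiment_score(review)
    if score > 0:
        return "favorable"
    if score < 0:
        return "unfavorable"
    return "neutral"


mixed = "A great cast, but a boring plot."
mixed_score = sentiment_score(mixed)
mixed_label = label(mixed)
```

A labeled data set of reviews would allow the word scores to be learned instead of hand-assigned, which is the other approach mentioned above.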

Academic applications

The issue of text mining is of importance to publishers who hold large databases of information requiring indexing for retrieval. This is particularly true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.

Academic institutions have also become involved in the text mining initiative:

The National Centre for Text Mining (NaCTeM) is the first publicly funded text mining centre in the world. NaCTeM is operated by the University of Manchester[10] in close collaboration with the Tsujii Lab, University of Tokyo. NaCTeM provides customised tools and research facilities and offers advice to the academic community. It is funded by the Joint Information Systems Committee (JISC) and two of the UK Research Councils (EPSRC & BBSRC). With an initial focus on text mining in the biological and biomedical sciences, research has since expanded into the social sciences.

In the United States, the School of Information at University of California, Berkeley is developing a program called BioText to assist biology researchers in text mining and analysis.
---------------------------------------------------------------------------------------------------------
Q) What is meant by Multimedia Data Mining?
 
Ans:-
Multimedia information is ubiquitous and essential in many applications, and repositories of multimedia are numerous and extremely large. Consequently, researchers and professionals need new techniques and tools for extracting the hidden, useful knowledge embedded within multimedia collections, thereby helping them discover relationships between the various elements and using this knowledge in decision-making applications.

The book Multimedia Data Mining and Knowledge Discovery, which assembles the work of leading academic and professional/industrial researchers worldwide, provides an overview of the current state of the art in the field of multimedia data mining and knowledge discovery, and discusses a variety of hot topics in multimedia data mining research. Consisting of an introductory section and four topical parts, the book describes the objectives and current tendencies in multimedia data mining research and their applications. Each part contains an overview of its chapters and leads the reader with a structured approach through the diverse subjects in the field.

---------------------------------------------------------------------------------------------------------
Q) Explain the Different Applications of Multimedia Data Mining.
 
Ans:-
The Multimedia Data Mining workshop brings together a diverse group of academics and industry practitioners working on integrated, state-of-the-art analysis of digital media content, multimedia database systems and multimedia data streams. The workshop addresses issues specifically related to mining information from multi-modality, multi-source, multi-format data in an integrated way. It also focuses on semantic understanding of multimedia content, and knowledge discovery in other complex data. Many analysis domains collect data from several sources, including static databases, streaming data, web pages, or conditionally collected data. Data appear in multiple forms, including structured, numeric, free text, video, image, speech, or combinations of several types. Analysis in these domains requires combining techniques and integrating methods.

The aim of the workshop is to contribute to finding suitable answers to the following questions:

- What are the theoretical foundations of multimedia data mining?

- What are the problems and applications where multimedia data mining can have a significant impact?

- What are the advanced architectures of multimedia data mining systems?

- What are the specific issues raised in integrated pattern extraction from multimedia data and its components, including images, sound, video, and other non-structured data?

- What are suitable multimedia representations and formats that can help data mining in multimedia data?

Data mining can also be applied to multimedia itself, i.e. audio/video data mining, for retrieval from a specific server or database.
---------------------------------------------------------------------------------------------------------
Q) What is meant by World Wide Web Data Mining?

Ans:-
Web data mining is a technique used to crawl through various web resources to collect required information, which enables an individual or a company to promote business, understand marketing dynamics, track new promotions floating on the Internet, etc. There is a growing trend among companies, organizations and individuals alike to gather information through web data mining and to use that information in their best interest.

Data mining is done through various types of data mining software. These can be simple tools or highly specialized ones for detailed and extensive tasks that sift through more information to pick out finer details. For example, if a company is looking for information on doctors, including their email addresses, fax and telephone numbers, locations, etc., this information can be mined with one of these data mining programs. Collecting information in this way has allowed companies to increase revenue by making better use of the internet to gain business intelligence that supports vital business decisions.
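
The contact-gathering example above can be sketched with Python's standard library. This is a simplified illustration: the HTML page is a made-up sample using example.org addresses, the regular expression is a pragmatic approximation rather than a full email grammar, and only visible page text is scanned (addresses hidden in attributes such as mailto links are not handled).

```python
import re
from html.parser import HTMLParser

# Pragmatic email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TextExtractor(HTMLParser):
    """Collect the visible text chunks of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_emails(html):
    """Return the sorted, de-duplicated email addresses found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return sorted(set(EMAIL_RE.findall(text)))


# Hypothetical page content standing in for a crawled web resource.
SAMPLE_PAGE = (
    "<html><body>"
    "<p>Dr. A: a.smith@example.org</p>"
    "<p>Dr. B: b.jones@example.org, a.smith@example.org</p>"
    "</body></html>"
)

emails = extract_emails(SAMPLE_PAGE)
```

A real crawler would fetch many pages and apply the same extraction step to each one, then merge the results.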

Before data mining software came into being, businesses collected information from recorded data sources. But the bulk of this information was too large, too daunting and too time-consuming to gather by going through all the records by hand, so computer-based data mining emerged and has gained such popularity that it is now regarded as a necessity for many businesses.

The collected information is used to gain knowledge and, based on the findings and analysis, to predict the best choice and the right approach on a particular issue. Web data mining is not only used to gain business information; various organizational departments also use the business models derived from it to make predictions and decisions about business development, work flow, production processes and more.

A strategic analysis department can mine its client archives with data mining software to determine which offers to send to which clients for maximum conversion rates. For example, suppose a company is thinking about launching cotton shirts as a new product. From its client database, it can determine how many clients have placed orders for cotton shirts over the last year and how much revenue those orders have brought to the company.
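
The cotton shirt question above amounts to a filter and two aggregates over the order history. The sketch below shows one way to express it in Python; the order records, client ids and amounts are hypothetical, and a real system would run an equivalent query against the client database.

```python
from datetime import date, timedelta

# Hypothetical order records: (client_id, product, order_date, amount).
ORDERS = [
    ("c1", "cotton shirt", date(2023, 3, 1), 40.0),
    ("c2", "cotton shirt", date(2023, 6, 15), 25.0),
    ("c1", "cotton shirt", date(2023, 9, 9), 35.0),
    ("c3", "wool scarf", date(2023, 7, 1), 60.0),
    ("c4", "cotton shirt", date(2021, 1, 5), 30.0),  # older than one year
]


def product_summary(orders, product, today, days=365):
    """Return (set of clients, total revenue) for a product in the last `days` days."""
    cutoff = today - timedelta(days=days)
    recent = [
        order for order in orders
        if order[1] == product and cutoff <= order[2] <= today
    ]
    clients = {order[0] for order in recent}
    revenue = sum(order[3] for order in recent)
    return clients, revenue


clients, revenue = product_summary(ORDERS, "cotton shirt", today=date(2023, 12, 31))
```

The resulting set of clients can then be used to split the mailing list into those who have ordered the product before and those who have not.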

With this analysis in hand, the company can decide which offers to send both to the clients who had placed orders for cotton shirts and to those who had not. This helps the organization head in the right direction in its marketing rather than going through a trial-and-error phase and spending money needlessly. The same analysis also sheds light on the percentage of customers who may move from your company to a competitor.

Data mining also enables companies to keep a record of fraudulent payments, which can then be researched and studied. This information can help develop more advanced protective methods to prevent such events from happening. Buying trends shown through web data mining can help you forecast your inventory as well. This is a direct analysis that allows the organization to stock appropriately for each month, based on the predictions drawn from the buying trends.
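
As a small illustration of forecasting inventory from buying trends, the sketch below uses a simple moving average over recent monthly sales. The text above does not name a forecasting method, so the moving average is an assumption chosen for its simplicity, and the sales figures are made up.

```python
def moving_average_forecast(monthly_units, window=3):
    """Forecast next month's demand as the mean of the last `window` months."""
    if window <= 0 or len(monthly_units) < window:
        raise ValueError("need at least `window` months of sales history")
    recent = monthly_units[-window:]
    return sum(recent) / window


# Hypothetical units sold per month, oldest first.
sales_history = [100, 120, 110, 130, 150]

next_month = moving_average_forecast(sales_history, window=3)
```

A moving average reacts slowly to sharp changes and ignores seasonality, so it is best read as a baseline against which more careful forecasts can be compared.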

Data mining technology is evolving rapidly, and new and better techniques for gathering information are made available all the time. Web data mining is not just opening avenues for gathering data; it is also raising many concerns related to data security. A great deal of personal information is available on the internet, and web data mining has helped to keep the need to secure that information at the forefront.
-----------------------------------------------------------------------------------------------------------
Q) What is meant by Web Usage Data Mining?

Ans:-
Web Usage Mining

Web usage mining is the third category in web mining. This type of web mining allows for the collection of Web access information for Web pages. This usage data provides the paths leading to accessed Web pages. This information is often gathered automatically into access logs via the Web server. CGI scripts offer other useful information such as referrer logs, user subscription information and survey logs. This category is important to the overall use of data mining for companies and their internet/ intranet based applications and information access.
Usage mining allows companies to produce useful information about the future of their business. Some of this information can be derived from collective data on lifetime user value, product cross-marketing strategies and promotional campaign effectiveness. The usage data that is gathered helps companies achieve better results and increase sales. Usage data can also be useful for developing marketing approaches that outsell competitors and promote the company's services or products at a higher level.
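
The access logs mentioned above can be turned into per-visitor paths with a short script. The sketch below assumes log lines in the widely used Common Log Format and treats each IP address as one visitor, which is a simplification; the sample lines and addresses are made up, and the lines are assumed to be in chronological order.

```python
import re
from collections import defaultdict

# Pattern for Common Log Format lines (assumed format for this sketch).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def paths_by_visitor(lines):
    """Group successfully served pages by client IP, in log order."""
    paths = defaultdict(list)
    for line in lines:
        match = LOG_RE.match(line)
        # Keep only requests that were served successfully.
        if match and match.group("status") == "200":
            paths[match.group("ip")].append(match.group("path"))
    return dict(paths)


# Hypothetical access log excerpt.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:55:40 +0000] "GET /products HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Oct/2023:13:56:30 +0000] "GET /missing HTTP/1.1" 404 0',
    '192.0.2.1 - - [10/Oct/2023:13:57:10 +0000] "GET /checkout HTTP/1.1" 200 128',
]

visitor_paths = paths_by_visitor(SAMPLE_LOG)
```

Counting how often each path sequence occurs across visitors is a natural next step for finding the most common routes through a site.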

Usage mining is valuable not only to businesses using online marketing, but also to e-businesses whose business is based solely on the traffic provided through search engines. This type of web mining helps to gather important information from customers visiting the site, which enables an in-depth analysis of the logs and of the company's productivity flow. E-businesses depend on this information to direct the company to the most effective Web server for promotion of their product or service.

This web mining also enables Web based businesses to provide the best access routes to services or other advertisements. When a company advertises for services provided by other companies, the usage mining data allows for the most effective access paths to these portals. In addition, there are typically three main uses for mining in this fashion.

The first is usage processing, used to complete pattern discovery. This first use is also the most difficult because only bits of information such as IP addresses, user information, and site clicks are available. With this minimal amount of information, it is harder to track a user through a site, because the log does not follow the user across the pages of the site.

The second use is content processing, consisting of the conversion of Web information like text, images, scripts and others into useful forms. This helps with the clustering and categorization of Web page information based on the titles, specific content and images available.
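
A minimal sketch of content processing is shown below: it extracts the title from a page and assigns the page to a category based on keywords in that title. The category names and keyword sets are invented for illustration, and real systems would also use the page body and images, as described above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def categorize(html, categories):
    """Return the first category whose keywords appear in the title, else 'other'."""
    title_words = set(page_title(html).lower().split())
    for name, keywords in categories.items():
        if title_words & keywords:
            return name
    return "other"


# Hypothetical categories and keywords.
CATEGORIES = {"shop": {"buy", "store", "shop"}, "news": {"news", "press"}}

PAGE = "<html><head><title>Acme Online Store</title></head><body>Welcome</body></html>"
category = categorize(PAGE, CATEGORIES)
```

Keyword matching on titles is only a starting point; clustering pages on their full text would group related pages without a hand-written keyword list.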

Finally, the third use is structure processing. This consists of analysis of the structure of each page contained in a Web site. Structure processing can prove difficult if a new structure analysis has to be performed for each page.

Analysis of this usage data will provide the companies with the information needed to provide an effective presence to their customers. This collection of information may include user registration, access logs and information leading to better Web site structure, proving to be most valuable to company online marketing. These present some of the benefits for external marketing of the company’s products, services and overall management.
Internally, usage mining provides information for improving communication over the company intranet. Developing strategies through this type of mining allows intranet-based company databases to be more effective by providing easier access paths. Projecting these paths, together with logged user registration information, helps bring commonly used paths to the forefront.

Therefore, usage mining is valuable to the marketing of businesses and has a direct impact on the success of their promotional strategies and internet traffic. This information is gathered on a daily basis and continues to be analyzed consistently. Analysis of this information will help companies to develop more effective promotions, better internet accessibility, better inter-company communication and structure, and more productive marketing.
---------------------------------------------------------------------------------------------------------
Q) What is meant by Predictive Analytics and Data Mining?
Ans:-
When used together, predictive analytics and data mining can make marketing more efficient. There are many techniques and methods, including business intelligence data collection.

What is Business Intelligence Data?
Business intelligence is a decision support system where information is gathered for the purpose of predictive analysis and support for business decisions. Prior to the widespread availability of data marts and reporting software, business intelligence data was gathered manually. Collecting information across corporate departments such as finance, sales and production, and correlating it into meaningful presentations created further time delays.

The current availability of business intelligence data in computer-readable form, both within a company and from online sources, makes the incorporation of business intelligence data into business operations more dynamic and brings it closer to real time. Instead of having to wait a week, or even a month, for data, managers are now able to mine data and perform predictive analysis from multiple sources daily.

What Are Predictive Analytics?

Predictive analytics is the use of business intelligence data for forecasting and modeling. It is a way to use historical data to predict future patterns. It is used widely in the insurance, medical and credit industries. Assessment of credit, and the assignment of a credit score, is probably the most widely known use of predictive analytics. Using events of the past, managers are able to estimate the likelihood of future events.

Data mining aids predictive analysis by providing a record of the past that can be analyzed and used to predict which customers are most likely to renew, purchase, or purchase related products and services.
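
A very simple way to use a record of the past to estimate who is likely to renew is to compute the historical renewal rate for each customer segment. The sketch below does this in Python; the segments and outcomes are made-up data, and using the past rate as the estimated likelihood is a deliberately basic assumption compared with a fitted predictive model.

```python
from collections import defaultdict

# Hypothetical history: (customer_segment, renewed).
HISTORY = [
    ("monthly", True),
    ("monthly", False),
    ("monthly", False),
    ("annual", True),
    ("annual", True),
    ("annual", False),
    ("annual", True),
]


def renewal_rates(history):
    """Return {segment: fraction of past customers in that segment who renewed}."""
    totals = defaultdict(int)
    renewed = defaultdict(int)
    for segment, did_renew in history:
        totals[segment] += 1
        renewed[segment] += int(did_renew)
    return {segment: renewed[segment] / totals[segment] for segment in totals}


def rank_segments(history):
    """Order segments from most to least likely to renew, based on past rates."""
    rates = renewal_rates(history)
    return sorted(rates, key=rates.get, reverse=True)


rates = renewal_rates(HISTORY)
ranking = rank_segments(HISTORY)
```

Segments at the top of the ranking are the ones a renewal campaign would target first; with more attributes per customer, the same idea extends to models such as logistic regression.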

Business intelligence data mining is important to your marketing campaigns. Proper data mining algorithms and predictive modeling can narrow your target audience and allow you to tailor your ads to each online customer as he or she navigates your site. Your marketing team will have the opportunity to develop multiple advertisements based on the past clicks of your visitors.

Predictive analytics can aid in choosing marketing methods and in marketing more efficiently. By targeting only customers who are likely to respond positively, and targeting them with a combination of goods and services they are likely to enjoy, marketing methods become more efficient. In the best cases, predictive analytics can reduce the amount of money spent to close a sale.

At its most effective, business intelligence data mining can help marketing professionals anticipate and prepare for customer needs, rather than just reacting to them. And data mining can present data on demographics which may have been previously overlooked. For example, have your loyal customers gotten older or younger? Are they now shopping for maternity clothing, instead of clothing to wear to a club? Are they more hip or environmentally aware? Any combination of those changes in your customer demographics could be useful in determining what newspaper or magazine is the best venue for your print campaign and what type of campaign it should be.

When applied to marketing strategy, predictive analytics and data mining can help managers to bring in more sales, while spending less on campaigns.
------------------------------------------------------------------------------------------------------------