Q. What is meant by Knowledge Discovery?
Ans:- Knowledge discovery is a concept from the field of computer science that describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data [1]. It is often described as deriving knowledge from the input data. This complex topic can be categorized according to 1) what kind of data is searched, and 2) in what form the result of the search is represented. Knowledge discovery developed out of the data mining domain and is closely related to it in both methodology and terminology [2].
The most well-known branch of data mining is knowledge discovery, also known as Knowledge Discovery in Databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may itself become additional data that can be used for further discovery.
Another promising application of knowledge discovery is in the area of software modernization, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. The entity-relationship model is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery of existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts hold enormous business value that is key to the evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as database schemas.
-----------------------------------------------------------------------------------------------------------
Q. Explain the different applications of Knowledge Discovery.
Ans:- The main application areas of knowledge discovery are:
1. Games
2. Business
3. Science and engineering
4. Spatial data mining
5. Surveillance
6. Pattern mining
7. Subject-based data mining
----------------------------------------------------------------------------------------------------------
Q. What is meant by Web Mining?
Ans:-
Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types: web usage mining, web content mining, and web structure mining.
Web usage mining
Web usage mining is the process of extracting useful information from server logs, i.e., users' browsing history. It is the process of finding out what users are looking for on the Internet. Some users might be looking only at textual data, whereas others might be interested in multimedia data.
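As an illustration, here is a minimal Python sketch of usage mining over a server log: it counts requests per URL to show which pages users visit most. The log format, the top_pages helper, and the sample entries are hypothetical, not part of any standard tool.

```python
# A minimal sketch of web usage mining: counting page requests per URL
# from server log lines in an assumed Common-Log-style format.
from collections import Counter

def top_pages(log_lines, n=5):
    """Return the n most requested URLs found in the log lines."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')          # the request is quoted: "GET /page HTTP/1.1"
        if len(parts) < 2:
            continue
        request = parts[1].split()
        if len(request) >= 2:
            hits[request[1]] += 1        # request[1] is the requested URL
    return hits.most_common(n)

sample = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:00:09] "GET /index.html HTTP/1.1" 200 512',
]
print(top_pages(sample))   # [('/index.html', 2), ('/products.html', 1)]
```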
Web content mining
Web content mining is the process of discovering useful information from text, image, audio, or video data on the web. It is sometimes called web text mining, because text content is the most widely researched area. The technologies normally used in web content mining are natural language processing (NLP) and information retrieval (IR). Although data mining is a relatively new term, the technology is not: companies have for years used powerful computers to sift through volumes of supermarket scanner data and to analyze market research reports. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.
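The following minimal sketch illustrates the IR side of content mining: it strips HTML tags and ranks the remaining terms by raw frequency. The term_frequencies helper and the sample page are made up for illustration; a real system would use proper NLP tokenization, stop-word removal, and weighting such as TF-IDF.

```python
# A minimal IR-style sketch of web content mining: stripping HTML tags
# and ranking the remaining terms by frequency.
import re
from collections import Counter

def term_frequencies(html, n=3):
    text = re.sub(r"<[^>]+>", " ", html)          # drop tags, keep text content
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer
    return Counter(tokens).most_common(n)

page = "<html><body><h1>Data mining</h1><p>Web mining applies data mining to the web.</p></body></html>"
print(term_frequencies(page))  # [('mining', 3), ('data', 2), ('web', 2)]
```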
Web structure mining
Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site. According to the type of web structural data, web structure mining can be divided into two kinds:
1. Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location (see the sketch after this list).
2. Mining the document structure: analyzing the tree-like structure of a page to describe HTML or XML tag usage.
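As a sketch of the first kind, the following Python example (standard library only) extracts hyperlinks from a few invented pages and builds a simple link graph; the in-degree counts hint at how link analysis can rank page importance, the idea behind algorithms such as PageRank. The site contents here are assumptions for illustration.

```python
# A minimal sketch of web structure mining: extracting hyperlinks and
# building a link graph whose in-degree suggests page importance.
from html.parser import HTMLParser
from collections import defaultdict

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# hypothetical pages -> their HTML; a real miner would crawl these
site = {
    "/index": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/index">Home</a>',
    "/products": '<a href="/index">Home</a> <a href="/about">About</a>',
}

graph = {}
in_degree = defaultdict(int)
for page, html in site.items():
    parser = LinkExtractor()
    parser.feed(html)
    graph[page] = parser.links             # outgoing links of this page
    for target in parser.links:
        in_degree[target] += 1             # count links pointing at the target

print(dict(in_degree))  # {'/about': 2, '/products': 1, '/index': 2}
```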
Web mining Pros and Cons
Pros
Web mining has many advantages that make the technology attractive to corporations and government agencies. It has enabled e-commerce to do personalized marketing, which eventually results in higher trade volumes. Government agencies use this technology to classify threats and fight terrorism. The predictive capability of mining applications can benefit society by identifying criminal activities. Companies can establish better customer relationships by giving customers exactly what they need; they can understand customer needs better and react to them faster. Companies can find, attract, and retain customers, and they can save on production costs by utilizing the acquired insight into customer requirements. They can increase profitability by target pricing based on the profiles created. They can even identify a customer who might default to a competitor and try to retain that customer with promotional offers, thus reducing the risk of losing the customer.
Cons
Web mining itself does not create issues, but the technology can cause concerns when used on data of a personal nature. The most criticized ethical issue involving web mining is the invasion of privacy. Privacy is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without their knowledge or consent.[1] The obtained data is analyzed and clustered to form profiles; the data is made anonymous before clustering so that no personal profiles emerge.[1] These applications thus de-individualize users by judging them by their mouse clicks. De-individualization can be defined as a tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics and merits.[1]
Another important concern is that companies collecting data for a specific purpose might use the data for a totally different purpose, which essentially violates the user's interests. The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their sites. This trend has increased the amount of data being captured and traded, increasing the likelihood of one's privacy being invaded. The companies that buy the data are obliged to make it anonymous, and these companies are considered authors of any specific release of mining patterns. They are legally responsible for the contents of the release; any inaccuracies in the release will result in serious lawsuits, but there is no law preventing them from trading the data.
Some mining algorithms might use controversial attributes like sex, race, religion, or sexual orientation to categorize individuals. These practices might be against anti-discrimination legislation.[2] The applications make it hard to identify the use of such controversial attributes, and there is no strong rule against using such algorithms with such attributes. This process could result in denial of a service or privilege to an individual based on their race, religion, or sexual orientation; right now, this situation can be avoided only by the high ethical standards maintained by the data mining company. The collected data is made anonymous so that the obtained data and patterns cannot be traced back to an individual. It might look as if this poses no threat to one's privacy, but in fact much additional information can be inferred by combining two separate pieces of seemingly harmless data about the user.
----------------------------------------------------------------------------------------------------------
Q. What is meant by a Telecommunication Network?
Ans:- A telecommunications network is a collection of terminals, links, and nodes which connect together to enable telecommunication between the users of the terminals. Networks may use circuit switching or message switching. Each terminal in the network must have a unique address so messages or connections can be routed to the correct recipients. The collection of addresses in the network is called the address space.
The links connect the nodes together and are themselves built upon an underlying transmission network which physically pushes the message across the link.
Examples of telecommunications networks are:
computer networks
the Internet
the telephone network
the global Telex network
the aeronautical ACARS network
Network structure
In general, every telecommunications network conceptually consists of three parts, or planes (so called because they can be thought of as being, and often are, separate overlay networks):
The control plane carries control information (also known as signalling).
The data plane or user plane or bearer plane carries the network's users' traffic.
The management plane carries the operations and administration traffic required for network management.
Example: the TCP/IP data network
The data network is used extensively throughout the world to connect individuals and organizations. Data networks can be connected together to allow users seamless access to resources that are hosted outside of the particular provider they are connected to. The Internet is the best example of many data networks from different organizations all operating under a single address space.
Terminals attached to TCP/IP networks are addressed using IP addresses. There are different types of IP addresses, but the most common is IP version 4. Each unique address consists of four integers between 0 and 255, usually separated by dots when written down, e.g. 82.131.34.56.
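A minimal sketch of this addressing rule, assuming nothing beyond the description above (four dot-separated integers, each between 0 and 255); the is_valid_ipv4 helper is an illustration, not a production validator:

```python
# A minimal sketch of IPv4 address validation as described above.
def is_valid_ipv4(address):
    parts = address.split(".")
    if len(parts) != 4:                  # must be exactly four fields
        return False
    return all(part.isdigit() and 0 <= int(part) <= 255 for part in parts)

print(is_valid_ipv4("82.131.34.56"))   # True
print(is_valid_ipv4("82.131.34.999"))  # False: 999 exceeds 255
```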
TCP/IP are the fundamental protocols that provide the control and routing of messages across the data network. There are many different network structures over which TCP/IP can be used to efficiently route messages, for example:
wide area networks (WAN)
metropolitan area networks (MAN)
local area networks (LAN)
campus area networks (CAN)
virtual private networks (VPN)
There are three features that differentiate MANs from LANs or WANs:
The network size falls between that of LANs and WANs: a MAN will have a physical area between 5 and 50 km in diameter.[2]
MANs do not generally belong to a single organization. The equipment that interconnects the network, the links, and the MAN itself are often owned by an association or a network provider that provides or leases the service to others.[2]
A MAN is a means for sharing resources at high speeds within the network. It often provides connections to WAN networks for access to resources outside the scope of the MAN.[2]
----------------------------------------------------------------------------------------------------------
Q. What is meant by an Object-Oriented Database?
Ans:-
An object database (also object-oriented database) is a database model in which information is represented in the form of objects, as used in object-oriented programming. Object databases are a niche field within the broader database management system (DBMS) market, which is dominated by relational database management systems. Object databases have been around since the 1980s, but they have made little impact on mainstream commercial data processing, though there is some usage in specialized areas.
Comparison with RDBMSs
An object database stores complex data and relationships between data directly, without mapping to relational rows and columns, and this makes it suitable for applications dealing with very complex data.[8] Objects have many-to-many relationships and are accessed by the use of pointers; pointers are linked to objects to establish relationships. Another benefit of an OODBMS is that it can be programmed with small procedural differences without affecting the entire system.[9] This is most helpful for organizations whose data relationships are not entirely clear, or which need to change these relations to satisfy new business requirements.
Technical features
Advantages:
Objects don't require assembly and disassembly, saving the coding time and execution time needed to assemble or disassemble them.
Reduced paging
Easier navigation
Better concurrency control - A hierarchy of objects may be locked.
Data model is based on the real world.
Works well for distributed architectures.
Less code required when applications are object oriented.
Disadvantages:
Lower efficiency when data is simple and relationships are simple.
Relational tables are simpler.
Late binding may slow access speed.
More user tools exist for RDBMS.
Standards for RDBMS are more stable.
Support for RDBMS is more certain and change is less likely to be required.
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly without a search, by following pointers. (It could, however, be argued that "joining" is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions. Also, object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could fetch a user's account and efficiently provide extensive information such as transactions and account entries. In such cases the big-O access time drops from O(n) to O(1), since the related objects are reached by direct reference rather than by scanning, greatly increasing efficiency.
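A minimal sketch of this access-time argument, using hypothetical Account and Transaction classes rather than a real OODBMS: following direct object references avoids the scan that a key-based row lookup requires.

```python
# A minimal sketch contrasting object-style pointer following (O(1) per
# hop) with relational-style row scanning (O(n)). Classes and data are
# hypothetical illustrations, not a real OODBMS.
class Account:
    def __init__(self, number, owner):
        self.number = number
        self.owner = owner
        self.transactions = []          # direct references to Transaction objects

class Transaction:
    def __init__(self, amount):
        self.amount = amount

acct = Account("12-345", "Alice")
acct.transactions.append(Transaction(100.0))
acct.transactions.append(Transaction(-40.0))

# Object-style access: pointer following, no search needed
print(sum(t.amount for t in acct.transactions))   # 60.0

# Relational-style access: scan a "table" of rows for the matching key
rows = [("12-345", 100.0), ("99-999", 5.0), ("12-345", -40.0)]
print(sum(amount for number, amount in rows if number == "12-345"))  # 60.0
```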
------------------------------------------------------------------------------------------------------------
Q. What is meant by Semi-structured data?
Ans:- Semi-structured data[1] is a form of structured data that does not conform to the formal structure of tables and data models associated with relational databases, but nonetheless contains tags or other markers to separate semantic elements and hierarchies of records and fields within the data.
Semi-structured data has become increasingly common since the advent of the Internet, where full-text documents and databases are no longer the only forms of data. Semi-structured data is especially common in object-oriented databases.
Types of Semi-structured data
XML[2], other markup languages, email, and EDI are all forms of semi-structured data.
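A minimal sketch of how such tags separate semantic elements, using a made-up XML document; note that the second record simply omits a field, a flexibility a rigid relational schema would not allow so easily.

```python
# A minimal sketch of parsing semi-structured (XML) data: tags mark the
# semantic elements, so fields can be recovered without a fixed schema.
import xml.etree.ElementTree as ET

doc = """
<contacts>
  <person><name>Alice</name><email>alice@example.com</email></person>
  <person><name>Bob</name></person>  <!-- no email: the structure is flexible -->
</contacts>
"""

for person in ET.fromstring(doc).iter("person"):
    name = person.findtext("name")
    email = person.findtext("email", default="(none)")
    print(name, email)
# Alice alice@example.com
# Bob (none)
```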
---------------------------------------------------------------------------------------------------------
Basic Clustering Techniques
We distinguish two types of clustering techniques: partitional and hierarchical. Their definitions are as follows [HK01]:
Partitional: Given a database of objects, a partitional clustering algorithm constructs partitions of the data, where each cluster optimizes a clustering criterion, such as the minimization of the sum of squared distances from the mean within each cluster.
One of the issues with such algorithms is their high complexity, as some of them exhaustively enumerate all possible groupings and try to find the global optimum; even for a small number of objects, the number of partitions is huge. That is why common solutions start with an initial, usually random, partition and proceed with its refinement. A better practice is to run the partitional algorithm with different sets of initial points (considered as representatives) and investigate whether all solutions lead to the same final partition.
Partitional clustering algorithms try to locally improve a certain criterion: first they compute the values of the similarity or distance, then they order the results and pick the one that optimizes the criterion. Hence, the majority of them can be considered greedy-like algorithms; a minimal sketch of this approach follows.
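A minimal k-means sketch of the refine-from-a-random-partition idea just described; the point set, the kmeans helper, and its parameters are arbitrary illustrations under the stated criterion (sum of squared distances to each cluster's mean).

```python
# A minimal k-means sketch: start from random representatives and
# iteratively refine the partition to reduce within-cluster squared distance.
import random

def kmeans(points, k, iterations=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)            # random initial representatives
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                          # assign each point to its nearest center
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0])**2 + (p[1] - centers[c][1])**2)
            clusters[i].append(p)
        for i, cluster in enumerate(clusters):    # recompute each center as the cluster mean
            if cluster:
                centers[i] = (sum(p[0] for p in cluster) / len(cluster),
                              sum(p[1] for p in cluster) / len(cluster))
    return centers, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (1, 0.5), (8.5, 9.5)]
centers, clusters = kmeans(points, k=2)
print(centers)    # two cluster means, one near (1.2, 1.2) and one near (8.5, 8.8)
```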
Hierarchical: Hierarchical algorithms create a hierarchical decomposition of the objects. They are either agglomerative (bottom-up) or divisive (top-down):
(a) Agglomerative algorithms start with each object as a separate cluster and successively merge groups according to a distance measure. The clustering may stop when all objects are in a single group, or at any other point the user wants. These methods generally follow a greedy-like bottom-up merging; a minimal sketch appears after this list.
(b) Divisive algorithms follow the opposite strategy: they start with one group containing all objects and successively split groups into smaller ones until each object falls into its own cluster, or as desired. Divisive approaches divide the data objects into disjoint groups at every step and repeat the same pattern until all objects are separated, much like divide-and-conquer algorithms.
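A minimal agglomerative sketch using single linkage (the distance between two clusters is taken as the distance between their closest members); the data and the single_linkage helper are illustrative assumptions, and real implementations cache distances rather than recomputing them.

```python
# A minimal sketch of agglomerative (bottom-up) clustering with single
# linkage: each object starts as its own cluster, and the two closest
# clusters are merged until the desired number of groups remains.
def single_linkage(points, target_clusters):
    clusters = [[p] for p in points]              # every object is its own cluster
    def dist(a, b):                               # single linkage: closest pair of members
        return min((p[0] - q[0])**2 + (p[1] - q[1])**2 for p in a for q in b)
    while len(clusters) > target_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                           for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))       # merge the two closest clusters (j > i)
    return clusters

print(single_linkage([(0, 0), (0, 1), (5, 5), (6, 5)], target_clusters=2))
# [[(0, 0), (0, 1)], [(5, 5), (6, 5)]]
```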