Sunday 25 December 2016

DBMS :

Normalization

What is Normalization?
Normalization is the process of efficiently organizing
data in a database. There are two goals of the
normalization process: eliminating redundant data
(for example, storing the same data in more than one
table) and ensuring data dependencies make sense
(only storing related data in a table). Both of these
are worthy goals as they reduce the amount of space
a database consumes and ensure that data is
logically stored.

The Normal Forms:
The database community has developed a series of
guidelines for ensuring that databases are
normalized. These are referred to as normal forms
and are numbered from one (the lowest form of
normalization, referred to as first normal form or 1NF)
through five (fifth normal form or 5NF). In practical
applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form
is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are
guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet
practical business requirements. However, when variations take place, it's extremely important
to evaluate any possible ramifications they could have on your system and account for possible
inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
For more details, read Putting your Database in First Normal Form.

Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.

Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
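As a concrete illustration of the 1NF/2NF guidelines, here is a minimal sketch using Python's built-in sqlite3 module. The customers/orders schema and every name in it are invented for the example:

```python
import sqlite3

# Instead of repeating customer data on every order row (redundant data),
# the customer attributes get their own table, and orders refer back to it
# through a foreign key -- the 2NF decomposition described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
                   id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customers(id),
                   item TEXT)""")

cur.execute("INSERT INTO customers VALUES (1, 'Asha')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 'pen'), (2, 1, 'book')])

# The name 'Asha' is stored once, yet every order can still reach it.
rows = cur.execute("""SELECT c.name, o.item
                      FROM orders o JOIN customers c ON o.customer_id = c.id
                      ORDER BY o.id""").fetchall()
print(rows)  # [('Asha', 'pen'), ('Asha', 'book')]
```

Without the decomposition, the customer's name would be repeated in every order row, and a name change would risk leaving inconsistent copies behind.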

Data Communication & Networks:

Notes on Data Communication & Networks :Part II

Hello Friends!


Data Communication & Networks is one of the important subjects in the UGC NET exam, on which many questions are asked, so prepare it well. The syllabus is vast, but it is one of the easiest subjects to prepare for.

Please read the post and then post your comments/suggestions:

ETHERNET: Ethernet frames must carry a minimum payload of 46 bytes, which ensures that every valid Ethernet frame is at least 512 bits (64 bytes) long, counting the header and trailer as well. The shortest Ethernet frames, at exactly 64 bytes, typically carry control messages.

Ethernet frames have no way to indicate end-of-frame; instead, an inter-frame gap (the time required to send 96 bits, i.e. 9.6 microseconds at 10 Mbit/s) separates frames.
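The 64-byte/512-bit figures and the 9.6 µs gap follow directly from the arithmetic; here is a quick sanity check in Python, assuming the standard Ethernet II field sizes:

```python
# Header/trailer: destination MAC + source MAC + type/length + FCS, in bytes.
header_trailer = 6 + 6 + 2 + 4
min_payload = 46
min_frame_bytes = header_trailer + min_payload
print(min_frame_bytes, min_frame_bytes * 8)   # 64 512

# Inter-frame gap: 96 bit times; at 10 Mbit/s that is 9.6 microseconds.
gap_seconds = 96 / 10_000_000
print(round(gap_seconds * 1e6, 1))            # 9.6
```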

Horn antennas are very popular at UHF (300 MHz-3 GHz) and higher frequencies. They have a wide impedance bandwidth, implying that the input impedance is slowly varying over a wide frequency range.
S-parameters describe the input-output relationship between ports (or terminals) in an electrical system. A port can be defined as any place where we can deliver voltage and current.
Waveforms are not made up of a discrete number of frequencies, but rather a continuous range of frequencies (sum of frequencies).
Through modulation the waveforms can be relocated to separate frequency bands. Television is broadcast primarily at 54-216 MHz. FM radio operates between 87.5-108 MHz.
Frequency Band Name            | Frequency Range                   | Wavelength (Meters) | Application
Extremely Low Frequency (ELF)  | 3-30 Hz                           | 10,000-100,000 km   | Underwater Communication
Super Low Frequency (SLF)      | 30-300 Hz                         | 1,000-10,000 km     | AC Power (though not a transmitted wave)
Ultra Low Frequency (ULF)      | 300-3000 Hz                       | 100-1,000 km        | Earth Mode Communications*
Very Low Frequency (VLF)       | 3-30 kHz                          | 10-100 km           | Navigational Beacons
Low Frequency (LF)             | 30-300 kHz                        | 1-10 km             | AM Radio
Medium Frequency (MF)          | 300-3000 kHz                      | 100-1,000 m         | Aviation and AM Radio
High Frequency (HF)            | 3-30 MHz                          | 10-100 m            | Shortwave Radio
Very High Frequency (VHF)      | 30-300 MHz                        | 1-10 m              | FM Radio
Ultra High Frequency (UHF)     | 300-3000 MHz                      | 10-100 cm           | Television, Mobile Phones, GPS
Super High Frequency (SHF)     | 3-30 GHz                          | 1-10 cm             | Satellite Links, Wireless Communication
Extremely High Frequency (EHF) | 30-300 GHz                        | 1-10 mm             | Astronomy, Remote Sensing
Visible Spectrum               | 400-790 THz (4x10^14-7.9x10^14 Hz) | 380-750 nm (nanometers) | Human Eye
* Communications through the ground using conduction fields e.g. military communications.

Slot Time: Slot time is twice the time it takes for an electrical pulse to travel the maximum distance between two nodes; the one-way propagation delay is therefore half a slot time. Slot time is used for half-duplex Ethernet network operation. It is 512 bit times for Ethernet networks operating below 1 Gbit/s, and 4096 bit times for Gigabit Ethernet. To reliably detect collisions, the minimum transmission time for a complete frame must be at least one slot time, whereas the round-trip propagation delay must be less than one slot time.

Back-off Algorithm: Once a collision is detected, the simultaneous transmitters follow a backoff algorithm, which requires each transmitter to wait an integral number r of slot times (51.2 µs each at 10 Mbit/s) before attempting a new transmission sequence. The integer is determined by:
0 <= r < 2^k, where k = min(n, 10)
Here n is the number of collisions experienced so far, so k is that count capped at 10, and r can therefore range from 0 to 1023. The actual value of r is chosen by a random process within each Ethernet node. As the number of consecutive collisions increases, the range of possible backoff times grows exponentially. The maximum number of retries is 16.
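The truncated binary exponential backoff rule above can be sketched as follows; backoff_slots is an invented helper name, and the slot-time constant assumes 10 Mbit/s Ethernet:

```python
import random

SLOT_TIME_US = 51.2   # one slot time = 512 bit times at 10 Mbit/s
MAX_RETRIES = 16

def backoff_slots(n_collisions: int) -> int:
    """Pick r uniformly in [0, 2^k) with k = min(n, 10)."""
    k = min(n_collisions, 10)
    return random.randrange(2 ** k)   # 0 <= r < 2^k

# After the 10th collision the window stops growing: r stays in 0..1023.
waits = [backoff_slots(12) for _ in range(1000)]
print(max(waits) <= 1023, min(waits) >= 0)   # True True
```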
There are no collisions on a full-duplex link, where each node is paired with a dedicated switch port rather than sharing a hub.

Collision Domain - a section of a network where data packets can collide with one another when being sent on a shared medium or through repeaters, in particular.

The 5-4-3 rule: A system can have up to five segments in series, with up to four repeaters and no more than three mixing segments (a segment that may be connected to more than two transceivers).

The FCS field in an Ethernet frame provides only bit-level error detection, with no error recovery.

UDP is also known as a laissez-faire protocol.

TCP is used for unicast addresses only, so multicast applications must use the UDP transport protocol.

In asynchronous transmission, the Start bit always has a value of 0 (a Space). The Stop Bit always has a value of 1 (a Mark). This means that there will always be a Mark (1) to Space (0) transition on the line at the start of every word.

The application layer is free to send data of any size; no upper limit is defined by the standards. The lower layers divide the data if needed.
A channel rated at x bps may not actually transmit data at rate x, since protocols, encryption, and other factors add overhead.

The asymptotic bandwidth (formally asymptotic throughput) for a network is the measure of maximum throughput for a greedy source (a traffic generator that generates data at the maximum rate possible and at the earliest opportunity possible).

CIDR: Classless Inter-Domain Routing, known as supernetting, is a solution to limited address space problem in a network. It allocates address space to ISPs and end users on any address bit boundary, instead of on 8-bit segments (which is class based addressing). It appends to the IP address a slash character and the decimal number as routing prefix, e.g., "192.168.2.0/24" for IPv4, and 2001:db8::/32 for IPv6. The value after / indicates how many bits are used for the network prefix, leaving the remaining bits to identify the specific host.

CIDR currently uses prefixes anywhere from 13 to 27 bits. This solution fits an organization's specific needs. It helps in reducing number of entries in global routing tables. It is the concept of subnetting within the internet itself.
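Python's standard ipaddress module can illustrate the slash notation for both address families, using the two example prefixes quoted above:

```python
import ipaddress

# A /24 leaves 32 - 24 = 8 host bits, i.e. 256 addresses in the block.
net = ipaddress.ip_network("192.168.2.0/24")
print(net.num_addresses)   # 256
print(net.netmask)         # 255.255.255.0

# The same notation works for IPv6 prefixes.
net6 = ipaddress.ip_network("2001:db8::/32")
print(net6.prefixlen)      # 32
```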

The industrial, scientific and medical (ISM) radio bands are radio bands (portions of the radio spectrum) reserved internationally for industrial, scientific and medical purposes other than communications. These are for unlicensed operations. Cordless phones, Bluetooth devices, near field communication (NFC) devices, and wireless computer networks all use frequencies allocated to low power communications as well as ISM.
Hartley's law- "The maximum data rate of a physical communication link is proportional to its bandwidth in hertz, which is sometimes called frequency bandwidth, spectral bandwidth, RF bandwidth, signal bandwidth or analog bandwidth."
Baud rate is the number of signal units (symbols) actually sent over the medium per second, not the number of bits moved from one DTE device to the other. The bit rate equals the baud rate multiplied by the number of bits carried per symbol; for example, a 2400 baud line carrying 4 bits per symbol has a bit rate of 9600 bit/s.
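The bit-rate/baud-rate relationship is simple arithmetic; the 2400 baud and 4 bits-per-symbol figures below are illustrative only:

```python
# Baud counts signal changes (symbols) per second; each symbol may carry
# several bits, e.g. a 16-point constellation carries 4 bits per symbol.
baud_rate = 2400
bits_per_symbol = 4
bit_rate = baud_rate * bits_per_symbol
print(bit_rate)   # 9600
```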
The Intelligent Network (IN) is the standard network architecture which allows telecom operators to differentiate themselves by providing value-added services in addition to the standard telecom services. The intelligence is provided by network nodes on the service layer (a conceptual layer within a network service provider architecture. It aims at providing middleware that serves third-party value-added services and applications at a higher application layer.)

The Internet protocol suite (the TCP/IP model) is occasionally known as the DoD model due to the foundational influence of the ARPANET. The TCP/IP model and related protocols are maintained by the Internet Engineering Task Force (IETF).

PORT: Each process that wants to communicate with another process identifies itself to the TCP/IP protocol suite by one or more ports. The application layer talks to the transport layer through ports. A port number lets the transport layer protocols (like TCP) know which application process the data inside the packet belongs to.
A port is a 16-bit number, used by the host-to-host protocol to identify to which higher-level protocol or application program (process) it must deliver incoming messages. There are two types of ports. Well-known port numbers (0-1023) are typically odd, because early systems using the port concept required an odd/even pair of ports for duplex operations.
The well-known ports are controlled and assigned by the Internet Assigned Numbers Authority (IANA) and on most systems can only be used by system processes or by programs executed by privileged users. Ephemeral ports are the opposite of well-known ports. Such port numbers are used by clients and are contained, for example, in the UDP datagrams sent to a server.

Normally, a server will use either TCP or UDP, but there are exceptions. For example, domain name servers use both UDP port 53 and TCP port 53.

SOCKET: A socket is a special type of file handle, which is used by a process to request network services from the operating system. A socket address is the triple: <protocol, local-address, local-process>. For example, in the TCP/IP suite:
<tcp, 193.44.234.3, 12345>
An association is the 5-tuple that completely specifies the two processes that comprise a connection:
<protocol, local-address, local-process, foreign-address, foreign-process>. In the TCP/IP suite, the following could be a valid association:
<tcp, 193.44.234.3, 1500, 193.44.234.5, 21>
Two processes communicate via TCP sockets. The socket model provides a process with a full-duplex byte stream connection to another process.
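A minimal, self-contained sketch of the socket model using Python's stdlib socket module: a throwaway echo server on an ephemeral loopback port, and a client exchanging bytes over the full-duplex stream. All names are invented for the example:

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))     # port 0: let the OS pick an ephemeral port
server.listen(1)
host, port = server.getsockname()

def echo_once():
    conn, _addr = server.accept() # _addr is the peer's <address, port> pair
    data = conn.recv(1024)
    conn.sendall(data)            # echo the bytes back on the same stream
    conn.close()

t = threading.Thread(target=echo_once)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))      # the 5-tuple association is now complete
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()
print(reply)  # b'hello'
```

Once connected, either side may send or receive at any time, which is what "full-duplex byte stream" means in practice.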

UDP: UDP is basically an application interface to IP. It adds no reliability, flow-control, or error recovery to IP. It simply serves as a multiplexer/demultiplexer for sending and receiving datagrams, using ports to direct the datagrams. It requires the application to take responsibility for error recovery and so on.
Slow-start is one of the algorithms that TCP uses to control congestion inside the network. It is also known as the exponential growth phase.
Broadband means "having instantaneous bandwidths greater than 1 MHz and supporting data rates greater than about 1.5 Mbit/s." In telecommunication, broadband has also been defined as a communication bandwidth of at least 256 kbit/s. In cable broadcasting, each channel is 6 MHz wide.
The additive-increase/multiplicative-decrease (AIMD) algorithm is a feedback control algorithm used by  TCP for Congestion Avoidance. The approach taken is to increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
IP address classes:

Class | Leftmost bits | Start address | Finish address
A     | 0xxx          | 0.0.0.0       | 127.255.255.255
B     | 10xx          | 128.0.0.0     | 191.255.255.255
C     | 110x          | 192.0.0.0     | 223.255.255.255
D     | 1110          | 224.0.0.0     | 239.255.255.255
E     | 1111          | 240.0.0.0     | 255.255.255.255
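The class boundaries in the table can be checked with a small classifier keyed on the first octet; ip_class is an invented helper for illustration:

```python
def ip_class(address: str) -> str:
    """Classify an IPv4 address by its leftmost bits (classful addressing)."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110
        return "D"
    return "E"              # leading bits 1111

print(ip_class("10.0.0.1"), ip_class("172.16.5.4"), ip_class("224.0.0.1"))
# A B D
```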

IP address range for Intranets (Private Networks):

Class | Private start address | Private finish address
A     | 10.0.0.0              | 10.255.255.255
B     | 172.16.0.0            | 172.31.255.255
C     | 192.168.0.0           | 192.168.255.255

IP packets with these addresses cannot be transmitted onto the public Internet. If such a private network needs to connect to the Internet, it must use either a network address translator (NAT) gateway or a proxy server. In IPv6, the address block fc00::/7 has been reserved for private networks.
IP officially reserves the entire range from 127.0.0.0 through 127.255.255.255 for loopback purposes.
IPv6 does not use classes. IPv6 supports the following three IP address types: unicast, multicast and anycast. IPv6 does not support broadcast. Multicast addresses in IPv6 start with 'FF' (analogous to IPv4's class D range). Unicast addresses have 3 defined scopes, including link-local, site-local and global; multicast addresses have 14 scopes.
The IPv6 address space (2^128, or about 3.4 x 10^38 addresses) is roughly 10^28 times the size of IPv4's. There is no ARP in IPv6; Neighbor Discovery takes its place. Currently, DHCP, FTP, PPP, RIP, SNMP, VPN, L2TP and Telnet do not support IPv6.
IPv6 does not require NAT; NAT, in turn, doesn't support IPv6, and NAT devices do not forward IPv6 packets.
IPv6 reserves just two special addresses: 0:0:0:0:0:0:0:0 and 0:0:0:0:0:0:0:1. IPv6 uses 0:0:0:0:0:0:0:0 internal to the protocol implementation, so nodes cannot use it for their own communication purposes. IPv6 uses 0:0:0:0:0:0:0:1 as its loopback address, equivalent to 127.0.0.1 in IPv4. The minimum size of an IP datagram is 28 bytes, including 20 bytes of header.
BIND and NSD are the most widely used DNS software on the Internet
Anycast is a network addressing and routing methodology in which datagrams from a single sender are routed to the topologically nearest node in a group of potential receivers, though it may be sent to several nodes, all identified by the same destination address. On the Internet, anycast is usually implemented by using BGP.
In denial-of-service attacks, a rogue network host may advertise itself as an anycast server for a vital network service, to provide false information or simply block service.
"6 to 4" is an Internet transition mechanism for migrating from IPv4 to IPv6, a system that allows IPv6 packets to be transmitted over an IPv4 network. 6to4 does not facilitate interoperation between IPv4-only hosts and IPv6-only hosts, but simply a transparent mechanism used as a transport layer between IPv6 nodes.
The network requests supporting DNS lookups run over TCP and UDP, port 53 by default.

SDE

Software Design Engineering

Coupling/Cohesion

 Software Design

  • Software design is a creative process, just like designing anything else
  • To detect a wrong design, we check it against the requirements in the analysis model
  • To detect a bad design, we assess the design model and analyse its components, asking whether performance can be improved by changing the modules or the interfaces
         In analysing a software design, many factors are considered, such as coupling, cohesion, factoring, system shape, etc.

 Coupling

  • "The degree of interdependence between two modules"
  • We aim to minimise coupling, i.e. to make modules as independent as possible
    Low coupling can be achieved by:
  • eliminating unnecessary relationships
  • reducing the number of necessary relationships
  • easing the 'tightness' of necessary relationships

Types of Coupling

  • Data coupling (most desirable)
  • Stamp coupling
  • Control coupling
  • Hybrid coupling
  • Common coupling
  • Content coupling (least desirable)

Data Coupling 

  • Modules communicate by parameters
     
  • Each parameter is an elementary piece of data
     
  • Each parameter is necessary to the communication
     
  • Nothing extra is needed

Data coupling problems

  • Too many parameters - makes the interface difficult to understand and errors more likely
     
  • Tramp data - data ‘traveling’ across modules before being used

Stamp coupling

  • A composite data is passed between modules
     
  • Internal structure contains data not used
     
  • Bundling - grouping of unrelated data into an artificial structure

Control coupling

  • A module controls the logic of another module through the parameter
     
  • Controlling module needs to know how the other module works - not flexible!

Hybrid coupling

  • A subset of a data value is used as control information
  • Example: account numbers 00001 to 99999 are ordinary data,
  • but if the number is 90000 - 90999, mail is sent to the area code given by its last 3 digits (000 - 999)
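A sketch of that account-number example in Python; route_account and the message strings are invented for illustration:

```python
# Hybrid coupling: part of a data value doubles as a control flag, so every
# caller must know the secret encoding buried in the account number.
def route_account(account_no: int) -> str:
    if 90000 <= account_no <= 90999:
        # the last three digits secretly encode a mail area code
        return f"mail to area {account_no % 1000:03d}"
    return "process normally"

print(route_account(90042))   # mail to area 042
print(route_account(12345))   # process normally
```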

Common coupling

  • Use of global data as communication between modules
     
  • Dangers include:
  • ripple effect
     
  • inflexibility
     
  • difficult to understand the use of data

Content coupling

  • A module refers to the inside of another module
     
  • Branch into another module
     
  • Refers to data within another module
     
  • Changes the internal workings of another module
     
  • Mostly by low-level languages

 

Cohesion

  • “The measure of the strength of functional relatedness of elements within a module”
     
  • Elements: instructions, groups of instructions, data definition, call of another module
                                                                                                   
  • We aim for strongly cohesive modules
  • Everything in module should be related to one another - focus on the task
     
  • Strong cohesion will reduce relations between modules - minimise coupling

      Types of Cohesion

  • Functional cohesion (most desirable)
  • Sequential cohesion
  • Communicational cohesion
  • Procedural cohesion
  • Temporal cohesion
  • Logical cohesion
  • Coincidental cohesion (least desirable)

Functional cohesion

  • All elements contribute to the execution of one and only one problem-related task
     
  • Focussed - strong, single-minded purpose
     
  • No elements doing unrelated activities
  • Examples of functional cohesive modules:
  • Compute cosine of angle
     
  • Read transaction record
     
  • Assign seat to airline passenger

Sequential cohesion

  • Elements are involved in activities such that output data from one activity becomes input data to the next
     
  • Usually has good coupling and is easily maintained
     
  • Not so readily reusable, because it combines activities that will not in general be useful together
     
  • Example of Sequential Cohesion:

      module format and cross-validate record
         use raw record
         format raw record
         cross-validate fields in raw record
         return formatted cross-validated record
      end module

Communicational Cohesion

  • Elements contribute to activities that use the same input or output data
     
  • Not flexible, for example, if we need to focus on some activities and not the others
     
  • Possible links that cause activities to affect each other
     
  • Better to split to functional cohesive ones
     
  • Example of Communicational Cohesion:

      module determine customer details
         use customer account no
         find customer name
         find customer loan balance
         return customer name, loan balance
      end module
 

Procedural cohesion

  • Elements are related only by sequence, otherwise the activities are unrelated
     
  • Similar to sequential cohesion, except for the fact that elements are unrelated
     
  • Commonly found at the top of hierarchy, such as the main program module
     
  • Example of Procedural Cohesion:

      module write read and edit something
         use out record
         write out record
         read in record
         pad numeric fields with zeros
         return in record
      end module

Temporal cohesion

  • Elements are involved in activities that are related in time
     
  • Commonly found in initialisation and termination modules
     
  • Elements are basically unrelated, so the module will be difficult to reuse
     
  • Good practice is to initialise as late as possible and terminate as early as possible
     
  • Example of Temporal Cohesion:

      module initialise
         set counter to 0
         open student file
         clear error message variable
         initialise array
      end module
 

Logical cohesion

  • Elements contribute to activities of the same general category (type)
     
  • For example, a report module, display module or I/O module
     
  • Usually have control coupling, since one of the activities will be selected
     
  • Example of Logical Cohesion:

      module display record
         use record-type, record
         if record-type is student then
            display student record
         else if record-type is staff then
            display staff record
      end module
 

Coincidental cohesion

  • Elements contribute to activities with no meaningful relationship to one another
     
  • Similar to logical cohesion, except the activities may not even be the same type
     
  • Mixture of activities - like ‘rojak’!
     
  • Difficult to understand and maintain, with strong possibilities of causing ‘side effects’ every time the module is modified
  • Example of Coincidental Cohesion:

      module miscellaneous functions
         use customer record
         display customer record
         calculate total sales
         read transaction record
         return transaction record
      end module
 
Determining Module Cohesion

 
 
Other Design Factors to Consider
  • Factoring: reducing module size, clarifying the system, minimising duplication of code, separating work from management, creating useful modules, simplifying
     
  • System Shape (Structure)
     
  • Redundancy
     
  • Fan-in/Fan-out
     
  • Restrictivity/Generality

DBMS

DBMS Notes Series-1

Hello Friends!

Let's discuss questions on DBMS from previous year papers. To begin with, some of the multiple choice type questions are listed below:


Q. Location transparency allows :

 I. Users to treat the data as if it is done at one location.
 II. Programmers to treat the data as if it is at one location.
 III. Managers to treat the data as if it is at one location.
 Which one of the following is correct ?
 (A) I, II and III (B) I and II only (C) II and III only (D) II only

The right answer is (A). 

Reason: Location transparency means the location of data must not matter to the person who accesses/manipulates the data. This is a feature of distributed databases, which applies to every kind of database user. According to a definition on Wikipedia, "The location of a resource doesn't matter to either the software developers or the end-users. This creates the illusion that the entire system is located in a single computer, which greatly simplifies software development."

I and II are database users. For III, the database manager components are responsible for providing seamless data access to users without regard to data location. Hence, location transparency covers all 3 choices.


Q. Which of the following is correct?

I. Two phase locking is an optimistic protocol.
II. Two phase locking is pessimistic protocol
III. Time stamping is an optimistic protocol.
IV. Time stamping is pessimistic protocol.

(A) I and III (B) II and IV (C) I and IV (D) II and III


The right answer is (D).

Reason: Optimistic vs. pessimistic approach: The optimistic concurrency control approach doesn't actually lock anything. It is based on the assumption that conflicts between database operations are rare; that is, while one transaction is executing, other transactions will usually not access the same data items. It lets transactions run to completion and only checks for conflicts when they are about to commit. Thus, a transaction executes without any restrictions until it is committed.

The pessimistic approach believes that some other transaction might try to access the same piece of data. So, in order to prevent any conflict, a transaction will first acquire all the required locks, then perform all the operations. It has two phases:

1. Growing Phase, in which a transaction must first acquire all the required locks.
2. Shrinking Phase, in which a transaction releases its locks one by one (it cannot issue lock requests here).

Q. Data warehousing refers to
(A) storing data offline at a separate site (B) backing up data regularly
(C) is related to data mining (D) uses tape as opposed to disk


The right answer is (C)
Reason: Not (A), because not all data in a data warehouse is stored offline; that depends on the nature and usage of the data.
Not (B), because a data warehouse typically stores a lot of historical data that is often not subject to change, and data that does not change only needs to be backed up once.
Not (D), because data may be stored using a proper mix of disks, tapes, or near-line storage.

Common data warehouse models include a data warehouse that is subject oriented, time variant, non-volatile, and integrated.


Q. The "PROJECT' operator of a relational algebra creates a new table that has always
(A) More columns than columns in original table
(B) More rows than original table
(C) Same number of rows as the original table
(D) Same number of columns as the original table

The right answer is (C), by elimination: PROJECT never adds columns or rows, so (A), (B) and (D) are impossible.
Reason: Strictly, the number of tuples in the result of PROJECT <list> (R) is always less than or equal to the number of tuples in R, since pure relational algebra eliminates duplicate rows; the row count is preserved exactly when the projection list includes a key (or when duplicates are retained, as in SQL without DISTINCT).

Now, a few questions with descriptive answers:


Q. Show that 2-phase locking ensures serializability?


A. In databases and transaction processing two-phase locking, (2PL) is a concurrency control method that guarantees serializability. A transaction is said to follow the two-phase locking protocol if all locking operations (read_lock, write_lock) precede the first unlock operation in the transaction. Such a transaction can be divided into two phases:


Phase 1: Growing Phase

i)  transaction may obtain locks
ii)  transaction may not release locks

Phase 2: Shrinking Phase

i)  transaction may release locks
ii)  transaction may not obtain locks

If lock conversion is allowed, then upgrading of locks (from read-locked to write-locked) must be done during the expanding phase, and downgrading of locks (from write-locked to read-locked) must be done in the shrinking phase. Hence, a read_lock(X) operation that downgrades an already held write lock on X can appear only in the shrinking phase.


The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points  (i.e. the point where a transaction acquired its final lock). Two-phase locking does not ensure freedom from deadlocks.
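The two-phase rule can be made concrete with a toy lock tracker; the class and its API below are invented for illustration and ignore the details of real lock managers (lock modes, blocking, deadlock handling):

```python
class TwoPhaseTransaction:
    """Enforces 2PL: once any lock is released, no new lock may be acquired."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips to True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True    # the lock point has passed
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("X")
t.lock("Y")        # growing phase: acquire everything first
t.unlock("X")      # shrinking phase begins
try:
    t.lock("Z")    # illegal under 2PL
except RuntimeError as e:
    print(e)       # 2PL violated: lock requested in shrinking phase
```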


Q. How to determine candidate key(s) for a relation?

A. Finding candidate keys is straightforward once you apply the following algorithm.

In the first example, there are five attributes:
W H O S E
WH -> S
HOS -> E
Steps:
1. Find the attributes that appear on neither the left nor the right side
> (none)
2. Find attributes that are only on the right side
> E
3. Find attributes that are only on the left side
> WHO
4. Combine the attributes on step 1 and 3
> since step 1 has no attributes, it’s just WHO
5. Test if the closures of attributes on step 4 are all the attributes
> in this case, it is true: with WH we can get S, and with HOS we can get E, so the closure of WHO contains all five attributes.
So we have only one candidate key that is WHO.
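The closure test in step 5 can be automated; here is a small sketch, with functional dependencies given as (left, right) pairs of attribute strings:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under functional
    dependencies given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is covered, the right side is implied.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The example above: attributes W H O S E with WH -> S and HOS -> E.
fds = [("WH", "S"), ("HOS", "E")]
print(sorted(closure("WHO", fds)))  # ['E', 'H', 'O', 'S', 'W']
```

Since the closure of WHO is all five attributes, WHO is a candidate key, as the worked example concluded.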

Q. What are steps of a Database design?


A. Major Steps in Database Design are:
  1. Requirements Analysis: Talk to the potential users! Understand what data is to be stored, and what operations and requirements are desired.
  2. Conceptual Database Design: Develop a high-level description of the data and constraints (we will use the ER data model)
  3. Logical Database Design: Convert the conceptual model to a schema in the chosen data model of the DBMS. For a relational database, this means converting the conceptual to a relational schema (logical schema).
  4. Schema Refinement: Look for potential problems in the original choice of schema and try to redesign.
  5. Physical Database Design: Direct the DBMS into choice of underlying data layout (e.g., indexes and clustering) in hopes of optimizing the performance.
  6. Applications and Security Design: It defines how the underlying database will interact with surrounding applications.
Q. Differentiate between Specialisation and Generalisation in the ER Model.

A. Specialisation:

Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.

Generalisation:

A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.

Grammars

TYPES OF GRAMMAR

A grammar G is a 4-tuple or quadruple G = (V, T, P, S) where
V is the set of variables or non-terminals,
T is the set of terminals,
P is the set of productions, and
S is the start symbol.
Each production is of the form α -> β, where α is a non-empty string of terminals and/or non-terminals (containing at least one non-terminal) and β is a string of terminals and/or non-terminals, possibly the null string. Such a grammar is also called a phrase-structure grammar.

CHOMSKY HIERARCHY

Phrase-structure grammars are classified according to the form of their productions. The table below summarises the classification.

Type | Name | Production rule | Restriction | Language generated | Recognising machine
Type 0 | Unrestricted grammar | α -> β | No restriction on the lengths of α and β (α cannot be epsilon) | Type 0 or recursively enumerable language | Turing machine
Type 1 | Context-sensitive grammar | α -> β | Length of β must be at least the length of α | Type 1 or context-sensitive language | Linear bounded automaton
Type 2 | Context-free grammar | A -> α | Left side is a single non-terminal; epsilon can appear on the right side | Type 2 or context-free language | Pushdown automaton
Type 3 | Regular grammar | A -> wB and/or A -> w | w is a string of terminals; at most one non-terminal, at the right end | Regular language | Finite state automaton
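To make the classification concrete, here is a small Python sketch (my own illustration, not from any exam paper) that reports the most restrictive Chomsky type satisfied by every production of a grammar. The representation is an assumption for this example: uppercase letters are non-terminals, everything else is a terminal, and each production is an (lhs, rhs) pair of strings with "" standing for epsilon.

```python
def is_nonterminal(c):
    # Assumed convention for this sketch: uppercase = non-terminal.
    return c.isupper()

def production_type(lhs, rhs):
    if len(lhs) == 1 and is_nonterminal(lhs):
        # Left side is a single non-terminal, so at least context-free.
        body = rhs
        if body and is_nonterminal(body[-1]):
            body = body[:-1]          # allow one trailing non-terminal (A -> wB)
        if not any(is_nonterminal(c) for c in body):
            return 3                  # Type 3: A -> w or A -> wB, regular
        return 2                      # Type 2: context-free
    if len(rhs) >= len(lhs):
        return 1                      # Type 1: right side no shorter than left
    return 0                          # Type 0: no restriction

def grammar_type(productions):
    # The grammar's type is the weakest (smallest) type among its rules.
    return min(production_type(l, r) for l, r in productions)

print(grammar_type([("S", "aS"), ("S", "a")]))    # 3 (regular)
print(grammar_type([("S", "aSb"), ("S", "")]))    # 2 (context-free)
```

Note this checks only right-linear regular grammars; a left-linear check (A -> Bw) would be symmetric.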
Following are the questions from previous NET exams on the above topic.

DECEMBER 2006, JUNE 2005 and JUNE 2010
32. Which of the following is the most general phrase-structure grammar?
A) Regular
B) Context-sensitive
C) Context-free
D) None of these
Ans:-B
Explanation:- This question has appeared in more than three previous papers. The right answer is context-sensitive grammar, which is the most general of the choices listed (an unrestricted/type 0 grammar is more general still, but it is not among the options).
DECEMBER 2007
3) A context-free grammar is:
A)type 0
B)type 1
C)type 2
Ans:- C

DECEMBER 2009
34. Context-free grammar (CFG) can be recognised by
A)Finite state automaton
B)2-way linear bounded automata
C)Push down automata
D)Both B & C
Ans:- C

JUNE 2013 - PAPER III - QNo. 39
Match the following :
a. Context sensitive language     i. Deterministic finite automaton
b. Regular grammar      ii. Recursively enumerable
c. Context free grammar     iii. Recursive language
d. Unrestricted grammar     iv. Pushdown automaton

Ans:- a-iii, b-i, c-iv, d-ii

Unrestricted grammar (type 0) generates the recursively enumerable languages.
A context-free grammar is recognised by a pushdown automaton.
A regular grammar is recognised by a deterministic finite automaton.
A context-sensitive language (type 1) is a recursive language.
Choose the option that matches this pairing.

KNAP

KNAPSACK PROBLEM

JUNE 2014 - PAPER III Q.No 62

62. Consider the fractional knapsack instance n=4, (p1,p2,p3,p4) = (10,10,12,18)
(w1,w2,w3,w4)=(2,4,6,9) and M=15. The maximum profit is given by,
(Assume p and w denotes profit and weights of objects respectively).
(A)40
(B)38
(C)32
(D)30
Ans:-B

Explanation:-
The knapsack problem can be solved either by the greedy approach or by dynamic programming; I will use the former here. It has been proved that for the fractional knapsack problem the greedy method always gives the optimal solution, provided the objects are considered in decreasing order of pi/wi. So, before solving the problem, let us arrange all the objects in decreasing order of pi/wi.
Capacity of the knapsack M = 15
Number of objects = 4
Profits(p1,p2,p3,p4)=(10,10,12,18)
Weights(w1,w2,w3,w4)=(2,4,6,9)
To get the solution arrange objects in decreasing order of profit/weights as shown below.
p1/w1 = 10/2 = 5
p2/w2 = 10/4 = 2.5
p3/w3 = 12/6 = 2
p4/w4 = 18/9 = 2

Arranging in decreasing order of pi/wi, we get:

Object | Weight | Profit | pi/wi
1      | 2      | 10     | 5
2      | 4      | 10     | 2.5
3      | 6      | 12     | 2
4      | 9      | 18     | 2

The fractions of the objects selected and the profit we get can be computed as shown below:

Remaining capacity | Object selected | Weight | Fraction taken
15                 | 1               | 2      | 1 (full unit)
15-2 = 13          | 2               | 4      | 1 (full unit)
13-4 = 9           | 3               | 6      | 1 (full unit)
9-6 = 3            | 4               | 9      | 3/9 = 1/3


So the solution vector is (1, 1, 1, 1/3).
Profit = 1×10 + 1×10 + 1×12 + (1/3)×18 = 10 + 10 + 12 + 6 = 38.
So the maximum profit is 38, which is option (B). Solve the following problem and check your answer.

I. Obtain the optimal solution for the knapsack problem given the following: M=40, n=3, (p1,p2,p3)=(30,40,35) and (w1,w2,w3)=(20,25,10).

The answer for the above problem is 82.5.
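The greedy procedure worked through above can be sketched in a few lines of Python. This is a minimal illustration of the fractional-knapsack method, not exam material; the function name is my own.

```python
def fractional_knapsack(profits, weights, capacity):
    # Sort objects by decreasing profit/weight ratio, the greedy
    # criterion that is optimal for the fractional version.
    items = sorted(zip(profits, weights),
                   key=lambda pw: pw[0] / pw[1], reverse=True)
    total = 0.0
    for p, w in items:
        if capacity >= w:              # take the whole object
            total += p
            capacity -= w
        else:                          # take only the fraction that fits
            total += p * capacity / w
            break
    return total

# The solved question and the practice problem above:
print(fractional_knapsack([10, 10, 12, 18], [2, 4, 6, 9], 15))   # 38.0
print(fractional_knapsack([30, 40, 35], [20, 25, 10], 40))       # 82.5
```

Note that this greedy approach is optimal only for fractional knapsack; the 0/1 variant requires dynamic programming.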

DPD

Dependency Preservation Decomposition

December 2010 - Question No 17
The dependency preservation decomposition is a property to decompose database schema D, in which each functional dependency X → Y specified in F,

(A) appeared directly in one of the relation schemas Ri in the decomposed D.
(B) could be inferred from dependencies that appear in some Ri.
(C) both (A) and (B)
(D) None of these

Explanation:- The question itself requires a bit of explanation. It is not enough to know the right answer; you must also know why it is right. The explanation will be a bit lengthy. Let us first dissect the question and explain some DBMS terms.
Decomposition - Replacing a relation with a collection of smaller relations.

Relation - A relation is what is informally known as a table.

Relation Schema - This is known as Table definition. Relation Schema for a "student" relation can be shown in the following way:
Student(FirstName,LastName,DOB,Gender,Course,Regno,Address)

Definition of dependency preserving decomposition:-
Each FD specified in F either appears directly in one of the relations in the decomposition, or can be inferred from FDs that appear in some relation.

Let us consider an example for Dependency preservation

Let R be a relation R(A B C D)
Let there be 3 functional dependencies.
FD1: A->B
FD2: B->C
FD3: C->D
Let the relation R be decomposed into two more relations.
R1(A B C)  :  R2(C D)
Let us first consider the relation R1(A B C). Here between A and B the functional dependency FD1 is preserved. Between B and C, FD2 is preserved.
Let us now consider the second relation R2(C D). Between C and D the FD, FD3 is preserved. So in the two relations R1 and R2, all the 3 functional dependencies are preserved.

Let us consider an example for Non-dependency preservation

Let R be a relation R(A B C D) Let there be again 3 functional dependencies.
FD1:A->B
FD2:B->C
FD3:C->D
Let the relation be decomposed into two relations:
R1(A C D) R2(B C)
Consider R1(A C D): FD3 (C->D) holds here (and A->C, A->D can be inferred through B). Now consider R2(B C): FD2 (B->C) holds.
However, FD1 (A->B) neither appears directly in R1 or R2 nor can it be inferred from the dependencies that hold on them, so this decomposition does not preserve the dependencies.
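The "appears directly" part of the check is easy to automate. Here is a small Python sketch (my own illustration) that tests whether an FD X -> Y appears directly in some relation of a decomposition, i.e. whether all attributes of X and Y lie inside one Ri. The full dependency-preservation test also admits FDs inferable via attribute closure; this sketch covers only the direct case, which is all the example above needs.

```python
def directly_preserved(fd, decomposition):
    # fd is a (lhs, rhs) pair of attribute strings, e.g. ("A", "B").
    # decomposition is a list of relations, each a tuple of attributes.
    lhs, rhs = fd
    needed = set(lhs) | set(rhs)
    return any(needed <= set(r) for r in decomposition)

fds = [("A", "B"), ("B", "C"), ("C", "D")]

good = [("A", "B", "C"), ("C", "D")]   # R1(A B C), R2(C D)
bad  = [("A", "C", "D"), ("B", "C")]   # R1(A C D), R2(B C)

print([directly_preserved(fd, good) for fd in fds])  # [True, True, True]
print([directly_preserved(fd, bad) for fd in fds])   # [False, True, True]
```

For the second decomposition, FD1 (A->B) fits in neither relation, matching the non-preserving example above.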
Generally there are three desirable properties of a decomposition.

  1. Lossless
  2. Dependency preservation
  3. Minimal redundancy
The above question was based on dependency preservation decomposition. This example has been taken from the dependency preservation presentation by Jason Allen. The explanation is quite good there.

SUMMARY:-

The dependency preservation property should be considered when decomposing a relation into two or more smaller relations. A functional dependency X->Y specified in F may appear directly in one of the relation schemas Ri in the decomposed D, or it may be inferred from dependencies that appear in some Ri. So the answer to this question is C.

Ans:-C