
Key Differences Between Strong Entity and Weak Entity

  1. The basic difference between a strong entity and a weak entity is that a strong entity has a primary key, whereas a weak entity has only a partial key, which acts as a discriminator among the entities of a weak entity set.
  2. A weak entity always depends on a strong entity for its existence, whereas a strong entity is independent of any other entity's existence.
  3. A strong entity is denoted with a single rectangle; a weak entity is denoted with a double rectangle.
  4. The relationship between two strong entities is denoted with a single diamond, whereas a relationship between a weak and a strong entity is denoted with a double diamond and is called an identifying relationship.
  5. A strong entity may or may not show total participation in its relationships, but a weak entity always shows total participation in the identifying relationship, which is denoted by a double line.
Design the ER diagram and tables accordingly.



The following are the relational schemas of Employee, Project, and Assigned-to: Employee (Emp#, Emp_name, Profession), Project (Proj#, Proj_name, Chief_Architect), Assigned-to (Proj#, Emp#).

Create appropriate samples of each relation according to the question. Write the following queries in SQL.
(i)  Get Emp# of employees working on Project numbered MCS-043.
(ii) Get details of employees working on database projects.
(iii) Finally create an optimal query tree for each query.
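As a sketch, the two queries can be tried against toy samples using Python's built-in sqlite3 module. Table and column names follow the schema above, except that `#` is not a portable SQL identifier character, so `Emp#`/`Proj#` are written `EmpNo`/`ProjNo`; all sample rows are invented for illustration.

```python
import sqlite3

# Build the three relations with small illustrative samples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Employee (EmpNo TEXT PRIMARY KEY, Emp_name TEXT, Profession TEXT);
CREATE TABLE Project  (ProjNo TEXT PRIMARY KEY, Proj_name TEXT, Chief_Architect TEXT);
CREATE TABLE Assigned_to (ProjNo TEXT, EmpNo TEXT, PRIMARY KEY (ProjNo, EmpNo));

INSERT INTO Employee VALUES ('E1','Anil','DBA'), ('E2','Rita','Developer'), ('E3','Sam','Tester');
INSERT INTO Project  VALUES ('MCS-043','Database Project','Anil'), ('MCS-012','Compiler Project','Rita');
INSERT INTO Assigned_to VALUES ('MCS-043','E1'), ('MCS-043','E2'), ('MCS-012','E3');
""")

# (i) Emp# of employees working on project MCS-043.
q1 = cur.execute("SELECT EmpNo FROM Assigned_to WHERE ProjNo = 'MCS-043'").fetchall()

# (ii) Details of employees working on database projects
# (here: any project whose name contains 'Database').
q2 = cur.execute("""
SELECT E.* FROM Employee E
JOIN Assigned_to A ON E.EmpNo = A.EmpNo
JOIN Project P     ON A.ProjNo = P.ProjNo
WHERE P.Proj_name LIKE '%Database%'
""").fetchall()

print(sorted(r[0] for r in q1))
print(sorted(r[0] for r in q2))
```

For the query tree in (iii), the usual optimization is to push the selection (ProjNo = 'MCS-043', or the name predicate) below the joins so that fewer tuples participate in each join.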

Given the following semi-structured data in XML, create the DTD
(Document Type Definition) for it.


What are the different options available for storing XML data?


  • Project name: Medical Store Management
  • This project is mainly for medical stores, to maintain details such as stock and accounts.
  • This medical shop management software is designed to ease the workload of medical shop professionals. The main features include inventory and stock control, accounting, and customer management.

Scope & Objectives 

As this is generic software, it can be used by a wide variety of outlets (retailers) to automate the process of manually maintaining records related to stock and cash flows.
This project essentially upgrades the manual chemist inventory system to an automated inventory system, so that the organization can manage its records in an efficient and organized form.

This software helps you track all the products of a medical shop; moreover, it is also medical shop accounting software. It is flexible and adaptive, suited to medical shops, stores, or pharmacies of any size.

Project characteristics:
  • Customer management
  • Transaction management
  • Sales management
  • Reporting

 The main goal of the application is to maintain records of purchases, sales, and stock, along with cash-transaction maintenance.
 Medical store management software is very useful for a medical store; it helps staff maintain day-to-day transactions on the computer.

Background & Specification

A medical store needs to maintain its inventory of medicines and other products using a 
computerized system. It is planning to create a network of computers which should be 
placed at various sales and cash counters. It also proposes to have a centralized 
workstation for the database and system administrators. Customer orders are accepted
at the sales counters, which in turn produce a medicine collection challan. The challan
includes the order number, name of the medicine, batch number, date of expiry, shelf
number where it is kept, and quantity ordered. One order may contain more than one
medicine. As per the challan, medicines are put in a basket by a person, who passes it to
the billing assistant. The billing assistant checks that the medicines match the challan;
any shortcoming is either corrected or reported to the customer. On receiving confirmation
from the customer, the bill is generated. The cash counter collects the money as per the
bill and dispenses the medicines to the customer.

Project Features 

 There will be two types of users accessing the system:
 Manager: acts as the administrator
 Other staff: access the system as regular users
 The features for the manager are:
       Add, delete, and update any product
       Manage the store (set prices, pay salaries, calculate revenue)

User Classes and Characteristics 

 Users of the project include customers and staff.
 A customer can be a member or a visitor accessing the system.
 Staff act as administrators and control the overall system.
 Users should be IT literate and know how to use a computer.
 The cashier should know data entry and typing.
 The manager should have knowledge of the Internet and browsing.

Operating environment 

 This project will operate in a Windows environment and is also compatible with Internet Explorer.
 The only requirement for using this project is having a machine.

Design and implementation constraints

 This project is developed in Java; on the back end, we use SQL Server for the database. The product includes a login facility for the user.

User documentation

 This project will include a user manual. The user manual includes a complete overview of the product, configuration of the tools used (SQL Server or others), technical details, backup procedures, and contact information including an email address and phone number.

Hardware Requirements 

 Processor : 1.6 GHz or above
 RAM : 1 GB
 Monitor : 15" colour monitor
 Hard disk : 10 GB HDD
 CD drive : 52x CD-ROM
 Keyboard : Mercury 110-key keyboard
 Mouse : Logitech mouse

System Features: Description & Priority

 The proposed database is intended to store, retrieve, update, and manipulate information related to the chemist, including:
 Order taking and processing
 Staff information
 Customer bill details
 Product details
 Calculation of revenue
 Searching for products
 Reminders about product expiry and shortages
 Report generation

Functional Requirements 

 The software must allow input of product data by the administrator, with secured access.
 The project must request a username and password; access to the data is allowed only after authentication.
 The project must require high levels of error correction and input validation.
 The project must allow the director and staff of the CMS to browse, access, and update information about products, customers, and vendors.
 The project must identify products and customers by a unique numeric identifier derived from a function performed on the customer's birth date or the product ID.
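The last requirement leaves the derivation function unspecified. A minimal sketch, assuming a hash-based scheme; the function name, fields, and sample values are all hypothetical, not fixed by the SRS:

```python
from datetime import date
import hashlib

def customer_id(name: str, birth_date: date) -> int:
    """Derive a stable numeric customer identifier from the name and
    birth date (hypothetical scheme -- the SRS does not fix the function)."""
    raw = f"{name}|{birth_date.isoformat()}".encode()
    # Take the first 8 hex digits of a SHA-256 digest as a 32-bit number.
    return int(hashlib.sha256(raw).hexdigest()[:8], 16)

# Same inputs always yield the same identifier.
cid = customer_id("Anil", date(1990, 5, 17))
print(cid)
```

A product ID could be mapped through the same function; any real deployment would also need a policy for the (rare) hash collisions.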

Safety Requirements 

 The database may crash or get damaged due to viruses or operating-system failures; therefore, it is mandatory to keep a backup of the data. A UPS/inverter facility should be available in case of power failure.

Security Requirements

 The system will use a secure database.
 Staff can only view products and mark their attendance; they cannot edit or modify anything except their personal information.
 Proper user authentication will be provided.
 There should be separate accounts for the admin and users, so that no one other than the admin can access the database.

User requirements 

 The users of the system are the staff, managers, and customers of the store.
 Members are assumed to have basic knowledge of computers and Internet browsing, while the system administrator should have more knowledge, so that he/she can resolve small problems.
 The user manual, installation guide, and other related material should be sufficient to teach users how to use and maintain the system.

DFD Level 0
DFD Level 1


Type of Expense                                   Without Software               With Software
Employees                                         6                              3
  Cost (per year)                                 5,76,000                       2,88,000
Software expense (one time)                       0                              2,00,000
Medicine expiry / customer returns,
no stock (per year)                               25,000                         0-1,000
Purchase orders, govt. reports,                   2 hours daily = 36,500 INR     5 min daily = 29.9 hrs/yr = 1,460 INR
stock analysis                                    (@ 50 INR/hr)                  (@ 50 INR/hr)
List of customers (value)                         0                              (-10,000)
Total (first year)                                6,70,500/year                  5,01,460/year

In terms of : Medical store management system

This project is mainly for medical stores, to maintain details such as stock and accounts.

This medical shop management software is so designed as to ease the work load of medical shop professionals.

The main feature includes inventory and stock control, accounting, customer management.

> Less staff
> Easy Payments
> Less Expired Products
> Automatic stock re-order level; no need to check all the shelves for medicine stock.
> One-click report of soon-to-expire medicines and minimum stock.

There are different ways to look at the system costs.

  1. Hardware needed in the shop: provide a computer, printer, and related equipment to the shop (~1.2 lakh)
  2. Software cost: 
    1. Analysis Cost
    2. Designing Cost 
    3. Developing Cost
    4. Testing Cost
    5. Installing and maintenance 

Reason why we choose Waterfall model over Spiral Model

Spiral model
  1. Good for large and critical projects
  2. Working software is produced early during the lifecycle.
  3. Large amount of risk analysis.
Since ours is not a large project, there is no need to use the spiral model.

Reason why we choose Waterfall model over Iterative Model

Iterative Model

  1. Produces working software early during the life-cycle.
  2. More flexible as scope and requirement changes can be implemented at low cost.
  3. Testing and debugging is easier, as the iterations are small.
  4. Low risks factors as the risks can be identified and resolved during each iteration.
Our requirements are already fixed and documented, so the flexibility of the iterative model is not needed.

SDLC Model: Waterfall Model

Reason of selecting Waterfall Model

  • This is a small project and the requirements are well understood.
  • This model is simple and easy to understand and use.

  • The waterfall model is simple to implement, and the amount of resources required for it is minimal.
  • It is easy to manage due to the rigidity of the model: each phase has specific deliverables and a review process.

Data Mining: What is Data Mining?


Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Continuous Innovation

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.


For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.

Data, Information, and Knowledge


Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:
  • operational or transactional data such as, sales, cost, inventory, payroll, and accounting
  • nonoperational data, such as industry sales, forecast data, and macro economic data
  • meta data - data about the data itself, such as logical database design or data dictionary definitions


The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.


Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

Data Warehouses

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detail transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knick's defense and then finds Williams for an open jump shot.

How does data mining work?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:
  • Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
  • Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
  • Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.
  • Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
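The beer-diaper style of associative mining can be sketched by counting itemset co-occurrences over a handful of toy transactions (all data below is invented for illustration):

```python
from itertools import combinations
from collections import Counter

# Toy market-basket data: each set is one customer transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

# Support of every 2-itemset: how many baskets contain both items.
pair_support = Counter()
for b in baskets:
    for pair in combinations(sorted(b), 2):
        pair_support[pair] += 1

# Confidence of the rule diapers -> beer:
# P(beer in basket | diapers in basket).
n_diapers = sum("diapers" in b for b in baskets)
n_both = pair_support[("beer", "diapers")]
confidence = n_both / n_diapers
print(f"support(diapers,beer)={n_both}, confidence(diapers->beer)={confidence:.2f}")
```

Real association-rule miners (e.g. Apriori) do the same counting, but prune the exponential space of itemsets using minimum-support thresholds.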
Data mining consists of five major elements:
  • Extract, transform, and load transaction data onto the data warehouse system.
  • Store and manage the data in a multidimensional database system.
  • Provide data access to business analysts and information technology professionals.
  • Analyze the data by application software.
  • Present the data in a useful format, such as a graph or table.
Different levels of analysis are available:
  • Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
  • Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
  • Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) . CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
  • Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
  • Rule induction: The extraction of useful if-then rules from data based on statistical significance.
  • Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
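A minimal sketch of the nearest-neighbor method on toy 2-D data; the Euclidean distance and the sample points are illustrative choices, not from the article:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote of the k nearest training points.
    `train` is a list of ((features...), label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2)))   # near the "A" cluster
print(knn_classify(train, (8, 7)))   # near the "B" cluster
```

The sort makes this O(n log n) per query; production systems use spatial indexes (k-d trees, ball trees) to avoid scanning the whole historical dataset.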

What technological infrastructure is required?

Today, data mining applications are available on all size systems for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:
  • Size of the database: the more data being processed and maintained, the more powerful the system required.
  • Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.
Problem Detail: 

I think I have a passable understanding of what Big-O and little-o mean. I'm just wondering whether it makes sense notation-wise to state something like the following:

$$O(n^c) = o(n^k) \text{ for } k > c$$

For example, $O(n^2) = o(n^3)$ since $3 > 2$. Basically, any constant times $n^2$ will still grow more slowly than any constant times $n^3$ for large enough $n$.

Does it make sense to have both sides of an equation consist of big-O and/or little-o notations like this, or does one side of the equation have to consist of a bare function only? Thanks.

Asked By : user3280193

Answered By : D.W.

I can suggest two different plausible ways of interpreting the notation.

  • One way to understand notation like $O(n^2)$ is to treat it as denoting a set of functions. With this viewpoint, we might interpret the claim $O(n^2) = o(n^3)$ as claiming that the set $O(n^2)$ contains exactly the same set of functions (no more, no fewer) as the set $o(n^3)$.

  • Another way to understand notation like $n \lg n = O(n^2)$ is to treat this as claiming that the function $n \lg n$ is in the set $O(n^2)$. In other words, when we have $= O(n^2)$ on the right-hand side, we treat that as a sloppy short-hand for $\in O(n^2)$. With this viewpoint, we might interpret the claim $O(n^2) = o(n^3)$ as being equivalent to $O(n^2) \subseteq o(n^3)$, i.e., that the set $O(n^2)$ is a subset of the set $o(n^3)$.

Both interpretations are plausibly defensible: neither one is ridiculous or obviously wrong.

Notice how we start from two different perspectives that are very similar, yet we end up with two completely different interpretations of the claim $O(n^2) = o(n^3)$. This is a sign that writing something like $O(n^2) = o(n^3)$ is a bad idea -- it can easily be misinterpreted. It would be better if the author wrote this differently, in a way that makes it clearer what the author's intent was.
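For the record, the two readings even assign the claim different truth values: under the subset reading it is true, while under the set-equality reading it is false. A short sketch (a standard limit argument, not part of the original answer):

```latex
% Subset reading: O(n^c) \subseteq o(n^k) whenever k > c.
% If f \in O(n^c), then f(n) \le C n^c for some constant C > 0 and all
% sufficiently large n, hence
\[
  \frac{f(n)}{n^k} \;\le\; \frac{C\,n^c}{n^k} \;=\; C\,n^{c-k} \longrightarrow 0
  \quad (n \to \infty,\ \text{since } c - k < 0),
\]
% so f \in o(n^k). The reverse inclusion fails: n^{(c+k)/2} lies in
% o(n^k) but not in O(n^c), so the two sets are not equal.
```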

So, this notation is ambiguous or potentially confusing. If you see this in some written material and can't ask the author, you'll just have to guess what was meant from context.

Best Answer from StackOverflow


Problem Detail: 

I would like to know how they obtained the expression $2n-1$ as said from the excerpt of article (p.3):

The key advantage is that in Chomsky Normal Form, every derivation of a string of n letters has exactly 2n−1 steps.

I could get how $2n$ comes since there are only 2 variables on the R.H.S of each production but couldn't get how the expression $−1$ came in $2n−1$.

Asked By : justin

Answered By : Auberon

Let $n$ be the length of a string. We start with the (non-terminal) symbol $S$ which has length $n=1$.

Using $n - 1$ rules of form $(non-terminal) \rightarrow (non-terminal)(non-terminal)$ we can construct a string containing $n$ non-terminal symbols.

Then on each non-terminal symbol of said string of length $n$ we apply a rule of form $(non-terminal) \rightarrow (terminal)$. i.e. we apply $n$ rules.

In total we will have applied $n - 1 + n = 2n - 1$ rules.


Observe following grammar in Chomsky-normal form.

$ \begin{align} S & \to AB \\ A & \to BC | AC\\ A & \to h|b\\ B & \to a \\ C & \to z \\ \end{align} $

Consider following derivation

$ \begin{align} \text{Current string} & & \text{rule applied} & & \text{#rules applied} & & \text{#length of string} \\ S & & \text{\\} & & 0 & & 1 \\ AB & & S \to AB & & 1 & & 2 \\ BCB & & A \to BC & & 2 & & 3 \\ \vdots & & \vdots & & \vdots & & \vdots \\ A\cdots CB & & \text{[multiple rules]} & & n-1 & & n \end{align} $

This last line represents a string containing only non-terminals. You can see that a string containing $n$ non-terminals is derived using $n-1$ rules. Let's continue. Applying $n$ rules of form $A \to a$ to each non-terminal in the string above gives you a string containing only terminals and thus a string from the language decided by the grammar. The length of the string has not changed (it's still $n$) but we applied an additional $n$ rules so in total we have applied $n-1 + n = 2n - 1$ rules.

While this explanation hopefully gives you an intuitive understanding, I think it would be a useful exercise to construct a formal proof by induction.
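The counting argument can also be checked mechanically. A small sketch that derives strings using (a subset of) the example grammar and counts the rules applied; the depth-bounded derivation strategy is an arbitrary illustrative choice:

```python
# Rules from the example grammar in Chomsky Normal Form
# (one binary alternative per non-terminal, plus terminal rules).
binary = {"S": ("A", "B"), "A": ("B", "C")}
terminal = {"A": "h", "B": "a", "C": "z"}

def derive(symbol, depth=0, max_depth=1):
    """Return (terminal string, number of rules applied) for one derivation.
    Binary rules are applied down to max_depth, then terminal rules."""
    if depth < max_depth and symbol in binary:
        left, right = binary[symbol]
        s1, r1 = derive(left, depth + 1, max_depth)
        s2, r2 = derive(right, depth + 1, max_depth)
        return s1 + s2, 1 + r1 + r2   # one binary rule plus both subderivations
    return terminal[symbol], 1        # one terminal rule

# Every derived string of length n uses exactly 2n - 1 rules.
for d in (1, 2, 3):
    s, rules = derive("S", max_depth=d)
    print(s, len(s), rules)
```

Each deeper derivation trades one terminal rule for one binary rule plus two new symbols, which preserves the invariant rules = 2·length − 1.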


Problem Detail: 

If we have some function $f\colon \mathbb{N} \rightarrow \mathbb{N}$ that is not total, i.e. for some values $x \in \mathbb{N}$, $f(x) = {\perp}$, is $f$ always partial computable? By partial computable I mean there exists a program that $f(a)$ is defined for $a \in \mathrm{dom}(f)$ and $f(a)$ is not defined for $a \notin \mathrm{dom}(f)$.

I expect that this isn't the case but cannot come up with a counter example.

Asked By : Jed McGowan

Answered By : David Richerby

The fact that we can encode Turing machines as strings means that there are only countably many Turing machines, so only countably many computable or partial-computable anythings. Since there are uncountably many partial functions $\mathbb{N}\to\mathbb{N}$ (or even partial functions $\mathbb{N}\to\{0\})$, it follows that, in a very strong sense, "most" partial functions are not partial computable.

For a concrete example, use the fact that, if a set and its complement are both semi-decidable, they are both decidable (exercise: prove this), and the analogous result for partial functions. This means that, if you take any set $S$ that is semi-decidable but not decidable, then $\overline{S}$ is not semi-decidable. So, for example, the function $$f(x) = \begin{cases} 0 &\text{ if }x=\langle M\rangle \text{ for some TM $M$ that loops on every input}\\ {\perp} &\text{ otherwise} \end{cases}$$ is not partial computable.


Problem Detail: 

It is always claimed that Cook and Levin gave the very first NP-complete problem and the proof of its NP-completeness.

But when I look at the actual proof, I think what Cook and Levin did was reduce (or transform) the "P-bound NTM acceptance" problem to SAT.

Formally, we define

A P-bound NTM is a non-deterministic Turing machine with a polynomial expression p(n) associated with it, such that if n is the size of the input string, it accepts the string iff there is an execution sequence in which the NTM halts in a final state within p(n) steps.

The "P-bound NTM acceptance" problem is:

For a given input string s and a P-bound NTM T, does T accept s?

So if I understand right, what Cook presented in his proof is

Given an instance of P-bound NTM acceptance, there is a polynomial reduction/translation that converts this instance into an instance of SAT. (P-bound NTM acceptance reduces to SAT.)

with this and an easier fact that

Every instance of an NP problem can be reduced to an instance of P-bound NTM acceptance. (P-bound NTM acceptance is NP-hard.)

leads us to the conclusion: SAT is both NP-hard and in NP, so it is NP-complete.

But in addition to that we can also see

P-bound NTM acceptance can be resolved with an NTM within polynomial time. (P-bound NTM acceptance is in NP)

This should be straightforward. The only thing to worry about is that we are simulating an NTM, so we need to handle the guessing. Fortunately, we are also defining an NTM, so what we actually need is a non-deterministic universal Turing machine.

Running this machine would not require more than polynomial additional time.

According to this we have

P-bound NTM acceptance is NP-complete.

I do not mean to disregard Cook's and Levin's work; rather, their work is still very important and historic, as they completed the first NP-completeness proof.

However, if I am right, would it open another view of NP-complete problems? If we use another computation model, for example, the "P-bound NTM acceptance" problem will be different, and if some clever computation model were invented, we might be able to use this result to attack the P = NP problem.

Also, I think that if we view P-bound NTM acceptance, instead of SAT, as the first (or zeroth?) NP-complete problem, it would help people understand Cook and Levin's proof.


Given that every existing NP-completeness proof actually goes through it (yes, they go through SAT, and SAT goes through it), is it a natural, intuitive view that "P-bound NTM acceptance" is the ultimate initial NP-complete problem?

Is this only a matter of definition?

A comment suggests that "P-bound NTM acceptance" is simply the definition of NP.

The definition of NP is (Sudkamp, Languages and Machines, 3rd edition, Def 15.2.2):

A language L is said to be accepted in nondeterministic polynomial time if there is a nondeterministic Turing machine $M$ that accepts L within $tc_M \in O(n^r)$, where $r$ is a natural number independent of $n$. The family of languages accepted in nondeterministic polynomial time is denoted $NP$.

So the problem "Is language $L \in NP$"? is defined as

Is there a nondeterministic Turing machine $M$ that accepts L within $tc_M \in O(n^r)$, where $r$ is a natural number independent of $n$?

This is not exactly the same as "P-bound NTM acceptance".

In general, the f(n)-bound TM acceptance problem is not of complexity O(f(n)). Although it is true that you can always simulate the TM for f(n) steps, the simulation introduces additional work: not just decoding the machine and simulating its actions, but also checking that the number of simulation steps actually stays within the bound.

This increases the time complexity, and it is not trivial in general. See the time hierarchy theorems; there is a proof that "there are languages that can be accepted in O(n^2) but not in O(n)" based on this argument.

It is true that "P-bound NTM acceptance" is in NP simply because those additional work can be done in P time.

Note that "P-bound NTM acceptance is in NP" is also implied by Cook's theorem, because SAT is in NP and P-bound NTM acceptance reduces to it. However, a direct proof is also straightforward.

Asked By : Earth Engine

Answered By : Luke Mathieson

The definition of $\mathcal{NP}$ essentially immediately gives an $\mathcal{NP}$-complete problem (P-bounded NTM acceptance), what Cook and Levin gave was a proof that this automata theoretic problem could be turned into a "natural" problem (for a certain definition of natural). So mostly people are being lazy when they refer to $SAT$ as the "first" $\mathcal{NP}$-complete problem. What they mean is that it is the first not-completely-obvious $\mathcal{NP}$-complete problem.

More importantly, along with Karp's work very soon thereafter, it helped show that the class $\mathcal{NP}$ had a useful, important meaning; that it somehow "tightly" characterised a certain level of hardness of a whole raft of well known computational problems.

Some other things worth noting:

  1. Cook doesn't mention $\mathcal{NP}$ at all - it had only just been invented at the time he was writing "The complexity of theorem-proving procedures". So it's only in retrospect that we recognise it as an $\mathcal{NP}$-completeness proof (admittedly, this realization took very little time - it was an exciting period for this foundational work).
  2. Levin, in a strict sense, didn't even give a proof of $\mathcal{NP}$-completeness, he was working with search problems (Russians always have to show off and do it the hard way).
Best Answer from StackOverflow


Problem Detail: 

A process in an operating system occupies one of several states, and a process transitions between these states. A scheduler is responsible for assigning CPU time to a process. However, a process that is performing an input/output operation and has gone into the blocked state must transition to the ready state (the ready queue, to be more precise) to be considered for scheduling by the scheduler. Which component of the operating system handles this transition? And does this transition require any CPU time?

Asked By : Kaustabha Ray

Answered By : Brian Hibbert

When an I/O is started, it will typically have an I/O request structure associated with it that includes items like the ID of the process the I/O belongs to. When the I/O completes, the device driver will typically fork into an OS-level I/O subsystem, which will queue the I/O request packet notification to the process and move the process to a computable queue (this may vary a little from one OS to another). The process then waits for its turn to become computable.

And yes, anything that executes code requires CPU time... though this is considered "interrupt time" rather than user time.
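As a rough sketch of that hand-off (names like `PCB` and `ready_queue` are illustrative, not any particular OS's API), the completion handler delivers the packet, marks the process ready, and enqueues it for the scheduler:

```python
from collections import deque

class PCB:
    """Minimal process control block, for illustration only."""
    def __init__(self, pid):
        self.pid = pid
        self.state = "blocked"
        self.result = None

ready_queue = deque()   # processes eligible for the scheduler

def io_completion_interrupt(pcb, packet):
    """Runs in interrupt context when the device signals completion:
    deliver the I/O request packet and make the process schedulable."""
    pcb.result = packet
    pcb.state = "ready"
    ready_queue.append(pcb)   # the scheduler will pick it up later
```

The CPU time this handler consumes is the "interrupt time" mentioned in the answer.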


Problem Detail: 

When I write a program on a von Neumann architecture, how is the RAM divided between data and instructions? For example, with 1 MB of RAM, would addresses 0 to 499,999 hold data and 500,000 to 999,999 hold instructions? Or could I find instructions mixed with data? Is the entire memory divided, for example, one side for all the data of all programs and the other for all the instructions of all programs? Or when a single program is loaded into memory, is a small space reserved just for that program, and then that space divided? Does the compiler decide, or the OS? Thanks in advance

Asked By : user5507798

Answered By : AProgrammer

Is the compiler to decide or the OS?

The architecture put some constraints, then the OS is built to work within these and adds its own constraints, then the compiler does the same thing.

how the ram will be divided to receive data and instructions?

It depends.

Or I could find instructions mixed with data?

You could very well. It's more or less a defining property of Von Neumann architectures.

Is divided the entire memory for example a side for all data of all programs and the other from all the instructions of all programs

That would be rare.

or when I load the memory with a single program, is reserved a small space (just for that program) and then that space is divided?

That's more the case. But you seem to forget segmentation and paging, which make it so that the addresses a program sees are not the physical ones, and what a program sees as contiguous may be physically interleaved with the memory of other programs.


Problem Detail: 

What is the definition of k-line-connectedness of a graph? I am in doubt whether it differs from the usual k-vertex (edge) connectedness. I encountered it in the paper titled "NP-complete problems on a 3-connected cubic planar graph and their applications", where it comes without a definition. Unfortunately, I couldn't get the definition from the context given in the authors' proofs.

Asked By : KKS

Answered By : Yuval Filmus

Judging from the proof of Lemma 1 in the paper, 3-line-connected is the same as 3-edge-connected; line is probably a translation from Japanese.


Problem Detail: 

It has been shown that any Linear Program (LP) can be solved in a polynomial number of steps. An example of such algorithm is the ellipsoid method.

To solve a problem which has $k$ variables and can be encoded in $L$ input bits, this algorithm uses $O(k^4L)$ pseudo-arithmetic operations on numbers with $O(L)$ digits.

I have a problem that can be formulated as an LP, yet it has been proven to be NP-hard. The proof is rather hard and complicated for my understanding, so I just want to argue why the problem is, at least, not in P.

Consider the following excerpts from here.

Primal LP

I can see why the ellipsoid algorithm (or any LP-solving algorithm) would take an exponential number of steps, because the number of variables $k$ is exponential in $n$ ($k = m^n$).

Dual LP

Now the number of variables is not exponential anymore ($k = nm^2$) and I can not argue why solving the Dual LP would take an exponential number of steps. The only thing I could come up with is that $L$ (number of input bits) could be exponential in size because of the exponential number of constraints ($m^n$). But that is just a wild guess.

Additionally, if the statement about $L$ is true, then it is exponential in the primal LP as well, meaning that both $k$ and $L$ in $O(k^4L)$ are exponential. Does that mean that solving the dual LP is more efficient than solving the primal one, in which only $L$ is exponential but not $k$?

NB: The context is finding optimal correlated equilibria in succinct games.

Asked By : Auberon

Answered By : Yuval Filmus

The running time of algorithms for linear programming depends not only on the number of variables but also, unsurprisingly, on the number of constraints. This is hidden in the parameter $L$, which is the input size; clearly $L$ is at least the number of constraints. If there are exponentially many constraints, then any generic algorithm must take exponential time, since it has to consider all the constraints.

In some cases it is possible to solve linear programs with exponentially many constraints efficiently. This is the case if there is a separation oracle which, given a variable assignment, outputs a violated constraint (if any). If this can be done efficiently, then the ellipsoid algorithm can solve the linear program efficiently (at least approximately). If your problem is NP-hard then there is (probably) no efficient separation oracle for the dual LP. (Another possible reason for your problem to be hard is that rounding a solution could be hard or, perhaps, impossible.)
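To make the separation-oracle idea concrete, here is a toy example (entirely my own construction, not from the question's LP): the family of constraints $\sum_{i \in S} x_i \le 1$, one for every subset $S$ of indices, is exponential in number, yet a violated constraint can be found in linear time, because the worst subset is simply the set of positive coordinates:

```python
def separation_oracle(x):
    """Check the 2^n constraints sum_{i in S} x_i <= 1 at once.

    Returns a violated subset S (as a list of indices), or None
    if x satisfies every one of the exponentially many constraints.
    """
    S = [i for i, xi in enumerate(x) if xi > 0]
    if sum(x[i] for i in S) > 1:
        return S        # the most-violated constraint
    return None
```

With such an oracle the ellipsoid method never needs to enumerate the constraints; when no efficient oracle exists, as the answer suggests for an NP-hard problem, this escape hatch is closed.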


Problem Detail: 

When using monitors for most concurrency problems, you can just put the critical section inside a monitor method and then invoke the method. However, there are some multiplexing problems wherein up to n threads can run their critical sections simultaneously. So we can say that it's useful to know how to use a monitor like the following:

monitor.enter();
runCriticalSection();
monitor.exit();

What can we use inside the monitors so we can go about doing this?

Side question: Are there standard resources tackling this? Most of what I read involve only putting the critical section inside the monitor. For semaphores there is "The Little Book of Semaphores".

Asked By : legen wait for it dary

Answered By : Spencer Wieczorek

Monitors and semaphores accomplish the same task: ensuring that each of n processes/threads enters its critical section atomically. Monitors, however, take an object-oriented approach, which makes the code easier to read. In this sense a monitor is at a higher level than a semaphore, and you can use a semaphore to implement a monitor; in that case the monitor has a counter variable in the same way a semaphore does.

The V and P of a semaphore are very much like the signal and wait of a monitor, where wait and signal are used on the monitor's condition variables. They can be used to implement the same functionality as a semaphore, even though there are some minor differences: for example, a signal with no process in need of being unblocked has no effect on the condition variable. In addition, locks are used within the procedures.

While in most cases you don't want to, you can use a monitor's condition variables and lock outside of it (monitor.lock, monitor.condition_i), doing the normal things we would do within a procedure, but outside of its functions. This assumes you want to have the critical section outside of the monitor; otherwise there is no need to do this:

// monitor enter
monitor.lock.acquire();
while (...)
    monitor.condition_i.wait(monitor.lock);

runCriticalSection();

// monitor exit
monitor.condition_j.signal();
monitor.lock.release();
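For the original multiplexing problem ("up to n threads at once"), the same pattern works with a free-slot counter guarded by a condition variable. A minimal Python sketch (the class and method names are mine, not a standard API):

```python
import threading

class Multiplex:
    """Monitor-style gate letting at most n threads run their
    critical sections at the same time."""
    def __init__(self, n):
        self.slots = n
        self.lock = threading.Lock()
        self.slot_free = threading.Condition(self.lock)

    def enter(self):
        with self.lock:
            while self.slots == 0:       # classic monitor wait loop
                self.slot_free.wait()
            self.slots -= 1

    def exit(self):
        with self.lock:
            self.slots += 1
            self.slot_free.notify()      # wake one waiting thread
```

Usage mirrors the snippet above: `m = Multiplex(3)`, then `m.enter(); runCriticalSection(); m.exit()` in each thread.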

Problem Detail: 

From what I know, if the predicate $P(t,x_1,...,x_n)$ belongs to some PRC class $\zeta$ then so do the predicates

$(\forall t)_{\le y}$  $P(t,x_1,...,x_n)$

$(\exists t)_{\le y}$  $P(t,x_1,...,x_n)$

But what about unbounded quantifiers? What difference does it make if I replace $(\forall t)_{\le y}$ with $(\forall t)$ and $(\exists t)_{\le y}$ with $(\exists t)$?

Davis in his book page 43 says:

Theorem 3.3: A function is primitive recursive if and only if it belongs to every PRC class

I saw a problem related to what I said that I couldn't solve, here it is :

If $P(x)$ and $Q(x)$ are primitive recursive predicates which one of the following may not be primitive recursive:

  1. $P(x) \rightarrow Q(x)$

  2. $Q(z) \wedge P([\sqrt{x}])$

  3. $\forall x(x \le y \rightarrow P(x))$

  4. $\exists x(P(x) \wedge Q(x)) $

Since only one of the above choices is correct, I don't know whether the answer is 3 or 4!

Asked By : Drupalist

Answered By : David Durrleman

If $P(t,x_1,\dots,x_n) \in \zeta$, then clearly $(\forall t)_{\le 0}P(t,x_1,\dots,x_n) = P(0,x_1,\dots,x_n) \in \zeta$.

Moreover, if for some $y$, we have $(\forall t)_{\le y}P(t,x_1,\dots,x_n) \in \zeta$, then it can easily be shown that $(\forall t)_{\le y+1}P(t,x_1,\dots,x_n) = P(y+1,x_1,\dots,x_n) \land (\forall t)_{\le y}P(t,x_1,\dots,x_n) \in \zeta$, noting that taking the logical and is a primitive recursive operation.

By induction, we have that for any $y$, $(\forall t)_{\le y}P(t,x_1,\dots,x_n) \in \zeta$. Things work similarly for $(\exists t)_{\le y}$. However, this reasoning doesn't extend to unbounded quantifiers, and in the general case, $\forall t P(t,x_1,\dots,x_n) \notin \zeta$.

If you think of PRC predicates as things that can be checked by a computer in finite time, the underlying meaning is that

  • if it takes finite time to check whether a given proposition is true for any given value, then it also takes finite time to check whether it's true for any/all values in a finite set (at most the sum of the finite times taken to check for each element in the set)
  • on the other hand, it's not necessarily possible to check that it's true for any/all values in an infinite set. Naively checking for $0$, then $1$, then $2$, etc..., could take forever.
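The finite-check intuition in the first bullet is easy to express in code (a sketch; `P` here stands for any decidable predicate given as a Python function):

```python
def bounded_forall(P, y):
    """(forall t)<=y P(t): a finite conjunction, so always decidable."""
    return all(P(t) for t in range(y + 1))

def bounded_exists(P, y):
    """(exists t)<=y P(t): a finite disjunction."""
    return any(P(t) for t in range(y + 1))
```

No such loop terminates for the unbounded versions, which is exactly why they can take a predicate out of the PRC class.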

The answer to your second question is proposition 4. Indeed, in proposition 3, the universal quantifier over $x$ is bounded by the free variable $y$, which is a finite value. In other words, proposition 3 could be rewritten as $(\forall x)_{\le y}P(x)$.


Problem Detail: 

My teacher told us that when the RAM runs out of space it will push a program into the cache. I argued that the program should go to the swap space on the hard drive; moreover, the memory cache cannot hold instructions, because instructions and data have different access patterns, so it can only hold frequently accessed data. So I am asking the experts: can the memory cache hold instructions?
Edit: by memory cache I mean the L1 cache.

Asked By : Baroudi Safwen

Answered By : tillz

I'm not perfectly sure whether I understood your question correctly. Basically, there are multiple questions:

A) 'when the system runs out of ram [...] push to cache': A cache is dedicated storage that accelerates access to often-used resources. So no, neither the CPU cache nor any other cache is used to replace missing RAM. I really think your professor wanted to explain swap.

B) In fact, there are different architectures for computer systems. One of them is the Harvard architecture, where two separate memories exist: one is reserved for instructions, while the other is used for data. This architecture is used by the popular AVR microcontrollers from Atmel and other MCUs.

The other architecture is the von Neumann architecture (sometimes also called the Princeton architecture). This one has exactly one memory, which is used for both instructions and data. This architecture is used in all the computer systems we know as 'PCs' today. It is, by the way, the reason for most of the really bad security problems (e.g. buffer overflows): in vulnerable software, an attacker can put instructions somewhere in memory (in sections which originally should only hold 'data', like strings), and they will be executed by the processor if they end up in the 'correct' location.

To return to your question: the CPU cache doesn't care which memory gets cached; it's simply memory. The sections which are accessed more often will be cached. In a recursive function (many branches to the same location in the one memory), the CPU cache may well hold instructions.


Problem Detail: 

I have a sequence of $n$ integers in a small range $[0, k)$ and all the integers have the same frequency $f$ (so the size of the sequence is $n = f * k$). What I'm trying to do now is to compress this sequence while providing random access (what is the $i$-th integer). I'm more interested in achieving high compression at the expense of higher random access times.

I haven't tried with Huffman coding since it assigns codes based on frequencies (and all my frequencies are the same). Perhaps I'm missing some simple encoding for this particular case.

Any help or pointers would be appreciated.

Thanks in advance.

Asked By : jplot

Answered By : Yuval Filmus

You can use the so-called "binomial encoding", described in one of my answers on math.stackexchange. You have to consider, though, whether it will be worthwhile. The naive encoding takes $n \log_2 k$ bits, and the best encoding saves about $O(\log n)$ bits (the best encoding uses $\log_2 \frac{n!}{f!^k}$ bits). Plug in your actual numbers to see what kind of saving you can expect.

Edit: I can't find my supposed post on math.stackexchange, so here is a summary of binomial coding (or rather multinomial coding), as it applies in this case. For a vector $x$ of length $k$, define $$ w(x) = \frac{(\sum_i x_i)!}{\prod_i x_i!}. $$ Note that for $x \neq 0$, $$ w(x) = \sum_{i\colon x_i > 0} w(x-e_i), $$ where $e_i$ is the $i$th basis vector. For example, if $k = 2$ and $x_1,x_2>0$ then $$ w(x_1,x_2) = w(x_1-1,x_2) + w(x_1,x_2-1). $$ This is just Pascal's identity $$ \binom{x_1+x_2}{x_1} = \binom{x_1+x_2-1}{x_1-1} + \binom{x_1+x_2-1}{x_1}. $$

For every vector $x$ and every $i$ such that $x_i > 0$, define $$ w(x,i) = \sum_{j < i\colon x_j > 0} w(x-e_j). $$ Given a sequence $\sigma$ of letters in $\{1,\ldots,k\}$, let $H(\sigma)$ be its histogram, a vector of length $k$. For a non-empty word $\sigma \neq \epsilon$, let $\sigma_1$ be the first symbol, and $\sigma_{>1}$ be the rest, so that $\sigma = \sigma_1 \sigma_{>1}$. We are now ready to define the encoding: $$ C(\sigma) = \begin{cases} w(H(\sigma),\sigma_1) + C(\sigma_{>1}) & \sigma \neq \epsilon \\ 0 & \sigma = \epsilon. \end{cases} $$ $C(\sigma)$ is an integer between $0$ and $\frac{n!}{f!^k}-1$.

Here is an example, with $k=f=2$. Let us encode $\sigma = 0101$: $$ \begin{align*} C(0101) &= w((2,2),0) + w((1,2),1) + w((1,1),0) + w((0,1),1) \\ &= 0 + w(0,2) + 0 + 0 = 1. \end{align*} $$ You can check that the inverse images of $0,1,2,3,4,5$ are $$ 0011,0101,0110,1001,1010,1100. $$ These are just all the solutions in increasing lexicographic order.

How do we decode? By reversing the encoding procedure. We know the initial histogram $H_1$. Given an input $c_1$, we find the unique symbol $\sigma_1$ such that $w(H_1,\sigma_1) \leq c_1 < w(H_1,\sigma_1+1)$. We then put $H_2 = H_1 - e_{\sigma_1}$, $c_2 = c_1 - w(H_1,\sigma_1)$, and continue. To test your understanding, try to decode $0101$ back from its encoding $1$.
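A direct transcription of this scheme (the function names are mine; symbols are integers $0,\dots,k-1$ rather than $1,\dots,k$, so the example word $0101$ becomes the list `[0, 1, 0, 1]`):

```python
from math import factorial

def w(x):
    """Multinomial coefficient (sum x)! / prod x_i!."""
    c = factorial(sum(x))
    for xi in x:
        c //= factorial(xi)
    return c

def encode(seq, k):
    """Rank `seq` among all rearrangements of its histogram,
    in increasing lexicographic order."""
    h = [0] * k
    for s in seq:
        h[s] += 1
    code = 0
    for s in seq:
        # count completions that start with a smaller available symbol
        for j in range(s):
            if h[j] > 0:
                h[j] -= 1
                code += w(h)
                h[j] += 1
        h[s] -= 1
    return code

def decode(code, hist):
    """Invert encode() given the (known) initial histogram."""
    h = list(hist)
    out = []
    for _ in range(sum(hist)):
        for s in range(len(h)):
            if h[s] == 0:
                continue
            h[s] -= 1
            block = w(h)       # completions starting with symbol s
            if code < block:
                out.append(s)
                break
            code -= block
            h[s] += 1
    return out
```

`encode([0, 1, 0, 1], 2)` yields `1`, matching the worked example above, and `decode(1, [2, 2])` recovers the word.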


Problem Detail: 
  1. In FIFO: when a process gets CPU time and then goes to I/O, what happens during the time it is doing I/O? Can another process run? After finishing I/O, does it continue or "lose its turn"? Why does FIFO give priority to CPU-bound processes?

  2. In RR: assume the time slice is s and after (s-2), for example, the process starts I/O. What happens to the other 2 seconds? Does it lose them?

  3. In preemptive SJF we look at the next CPU burst. What about non-preemptive SJF? Do we look at all the CPU time the process will need? And in general, do we look only at CPU, not at I/O?

Asked By : user2459338

Answered By : Wandering Logic

Each program in the system is in one of three states: running (currently using the cpu), waiting (for i/o), or ready (not waiting for i/o, but not currently using the cpu). When discussing scheduling disciplines jobs are not programs, rather each burst of cpu usage between i/o requests in a program is a job.

First-in-first-out (FIFO), round-robin (RR) and shortest job first (SJF) are scheduling disciplines for choosing the next job to run from the ready queue of the operating system. So in all three cases the jobs under discussion correspond to the set of processes that are currently in the ready state.

In FIFO, as each job reaches the head of the queue it gets the cpu; when the job requests i/o, that job is done. The process (not the job) enters the waiting state. When the i/o is done, the process is moved from the waiting state to the ready state, and the job corresponding to its next burst of cpu is put at the back of the FIFO queue.

A cpu bound process is one where the average cpu bursts (the jobs) are much longer than the average time spent waiting for each i/o. Since FIFO is non-preemptive when a long cpu-burst gets to the head of the queue and starts running, all of the ready processes in the queue behind that job are stuck.
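This convoy effect is easy to quantify with a toy model (my own simplification: all jobs arrive at time 0, and a job's waiting time is the time until it starts):

```python
def fifo_wait(bursts):
    """Average waiting time when jobs run in the given order."""
    t, waits = 0, []
    for b in bursts:
        waits.append(t)   # this job waited until time t
        t += b
    return sum(waits) / len(waits)

def sjf_wait(bursts):
    """Average waiting time when the shortest job runs first."""
    return fifo_wait(sorted(bursts))
```

With one long cpu burst ahead of two short ones, `fifo_wait([10, 1, 1])` gives 7.0 while `sjf_wait([10, 1, 1])` gives 1.0: under FIFO the short jobs are stuck behind the convoy.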

In round-robin we are also referring to jobs, not processes. When a job requests i/o it is done. When the process goes from waiting to ready the next job (cpu burst) from that process joins the round-robin queue.

Non-preemptive shortest job first is only practical if you really have future knowledge about exactly how long the jobs are. If some jobs are taking longer than you expected and you aren't preempting them, then you're not really running the shortest one first.

In general scheduling disciplines are only looking at the jobs in the ready queue. One of the points of SJF, though, is to make sure that the i/o system is kept as busy as possible. Jobs usually end with an i/o request, so by scheduling the shortest job first you are getting to the next i/o request the soonest.

This insight about shortest job first is what is behind the multi-level feedback queue scheduling used in most modern operating systems: you are using past behavior to predict future job lengths, and then scheduling (expected) shortest job first (but preempting the job if it goes much beyond its expected length.) The classic paper is Corbató, F. J., M. M. Daggett, and R. C. Daley, "An experimental time-sharing system", AFIPS Conference Proceedings, Vol. 21, pp. 335-344, 1962.


Problem Detail: 

Assume I have an incomplete knowledge base, for example:

(rich(dave), poor(dave))          // dave is either poor or rich
(not rich(dave), not poor(dave))  // dave is not poor and rich at the same time

My questions are: 1. If I do resolution on the above clauses, will I get the empty clause? 2. If yes, does that mean my knowledge base is inconsistent?

Asked By : Aida T.

Answered By : Romuald

The theory is not inconsistent and trivially admits two different models:

  • a first one in which dave is rich but not poor;
  • a second one in which dave is poor but not rich.

You may be confusing the "empty clause", which contains no literals and is thus always false, with the "true clause", which contains both a literal and its negation and is thus always true. Applying resolution here leads to the "true clause" resolvent, not the "empty clause".
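The single resolution step can be spelled out in a few lines (the string encoding of literals and the function names are mine):

```python
def neg(lit):
    """Negate a literal written as a plain string."""
    return lit[4:] if lit.startswith("not ") else "not " + lit

def resolve(c1, c2, lit):
    """Resolve clause c1 (containing lit) with c2 (containing its negation):
    drop the complementary pair and union the rest."""
    return (c1 - {lit}) | (c2 - {neg(lit)})

c1 = {"rich(dave)", "poor(dave)"}
c2 = {"not rich(dave)", "not poor(dave)"}
resolvent = resolve(c1, c2, "rich(dave)")
# resolvent is {"poor(dave)", "not poor(dave)"}: a tautology, not the empty clause
```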


Problem Detail: 

I'm trying to figure out a way I could represent a Facebook user as a vector. I decided to go with stacking the different attributes/parameters of the user into one big vector (i.e. age is a vector of size 100, where 100 is the maximum age you can have; if you are, say, 50, the first 50 values of the vector would be 1, just like a thermometer).

Now I want to represent the Facebook interests as a vector too, and I just can't figure out a way. They are a collection of words and the space that represents all the words is huge, I can't go for a model like a bag of words or something similar. How should I proceed? I'm still new to this, any reference would be highly appreciated.

Asked By : mabounassif

Answered By : Emre

The interests are categorical data and may be modeled as binary variables (a user either likes them or does not). You can subsume little-used categories under broader categories. For example, a user who likes a little-known horror movie can simply be marked as liking horror movies. You can even subsume an item under multiple categories if it belongs to several.
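Both encodings, the asker's thermometer code for age and the binary interest variables suggested here, fit in a few lines (helper names and the category list are illustrative):

```python
def thermometer(age, max_age=100):
    """Thermometer encoding: the first `age` entries are 1."""
    return [1] * age + [0] * (max_age - age)

def interest_vector(user_interests, categories):
    """One binary variable per (broad) interest category."""
    return [1 if c in user_interests else 0 for c in categories]
```

Mapping each raw interest to one or more broad categories before calling `interest_vector` is what keeps the dimensionality manageable.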

For what you can do with the data, see "A Review on Data Clustering Algorithms for Mixed Data".


Problem Detail: 

I'm currently trying to prove a language regular (for personal amusement). The language is:

The language containing all numbers in ternary that have even bit-parity when encoded in binary.

Now, I've currently tried a few different approaches that have not led me to any success. I've tried using the pumping lemma (couldn't find anything to pump on), Myhill-Nerode (similar) and even counted the number of strings of each length for which the statement is true (my intuition is that it checks out with a probabilistic argument).

Are there any other approaches that might help here, or are there any intuitions that might be helpful? At this point, my best guess is that the language is not regular, but I don't seem to be able to come up with an explanation.

Asked By : James

Answered By : Jeffrey Shallit

This can be proven, but you need some nontrivial tools to do it.

Start with the set S = {0,3,5,6, ...} of non-negative integers having an even number of 1's in their base-2 expansion.

It is well-known that this set is "2-automatic"; that is, there is a finite automaton accepting exactly the base-2 expansions of elements of S. Furthermore, it is well-known that this set is not ultimately periodic (that is, it is not true that there exists a period P such that after some point C, if x >= C is in S then so is x+P). (If it were, then the associated Thue-Morse word 01101001... would be ultimately periodic, but it is known from Thue's 1912 paper not to contain any block repeated three times consecutively.)
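The set S and its link to the Thue-Morse word are easy to check numerically (a quick sketch; the function name is mine):

```python
def even_parity(n):
    """True iff n has an even number of 1s in its base-2 expansion."""
    return bin(n).count("1") % 2 == 0

# Membership in S = {0, 3, 5, 6, ...} is even_parity(n); writing 0 for
# members and 1 for non-members spells out the Thue-Morse word 01101001...
```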

Next, assume S is actually "3-automatic"; that is, there is an automaton accepting exactly the base-3 expansions of elements of S. Then, by a classic theorem of Cobham (Math. Systems Theory 3 (1969), 186-192), this would imply that S is ultimately periodic. But we have already seen it isn't.

You can find a lot more about these ideas in my book with Allouche, "Automatic Sequences". Warning, though, our proof of Cobham is a bit flawed, and a corrected version by Rigo can be found online here: .
