Problem Detail: 

I'm curious about two things.

  1. When we define the class of "probabilistic polynomial-time algorithms" in computer science, does it include polynomial-time algorithms that use exponential space? For example, when an algorithm is given an input from the domain $\{0,1\}^n$, what if it internally queries an exponential-sized table (e.g. $0^n\to0$, $0^{n-1}1\to1$, and so on) and outputs the result? Is it still a polynomial-time algorithm?

  2. In theoretical cryptography, a one-way function $f:\{0,1\}^*\to\{0,1\}^*$ has a requirement related to the hard-to-invert property, given in the block below. If the answer to the question above is yes, is it possible to construct an algorithm $A'$ that simulates $f$ exactly on every value in $\{0,1\}^n$ using an exponential table as described above? If so, that would imply it is impossible to design a one-way function, which is definitely not true. So what have I missed?

    For every probabilistic polynomial-time algorithm $A'$, every positive polynomial $p(\cdot)$, and all sufficiently large $n$'s,

    $Pr[A'(f(U_n),1^n)\in f^{-1}(f(U_n))]<\frac{1}{p(n)}$

    where $U_n$ is a random variable uniformly distributed over $\{0,1\}^n$.

Asked By : euna

Answered By : Yuval Filmus

Regarding your first question, what you're missing is where your "exponential table" comes from. Your algorithm has a finite description and should work for every $n$. So it cannot explicitly contain the $n$-table for all $n$. It could contain instructions for computing the table, but then it would have to first execute them, and constructing an exponential size table takes exponential time.

On the other hand, your program could use a (supposedly) exponential amount of uninitialized space. Since its running time is polynomial, it only accesses polynomially many cells. You can implement memory access in such a way that if $T$ cells are ever accessed then $\tilde{O}(T)$ memory is used (exercise). The corresponding running time might become much worse, but still polynomial.
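The lazy-memory trick in the previous paragraph can be sketched in a few lines. The dictionary-backed tape below is my own illustration (the names are not from the answer): only the polynomially many cells actually touched ever consume memory, even though the address space is astronomically large.

```python
# Sketch of the "uninitialized exponential space" idea: back a huge
# address space with a dictionary, so memory use is proportional to the
# number of cells actually accessed, not to the size of the address space.

class SparseTape:
    def __init__(self, default=0):
        self.cells = {}            # address -> value, only for touched cells
        self.default = default     # every untouched cell reads as this

    def read(self, addr):
        return self.cells.get(addr, self.default)

    def write(self, addr, value):
        self.cells[addr] = value

tape = SparseTape()
tape.write(2**100, 1)              # an address far beyond physical memory
```

A real implementation inside a TM model would organize the accessed addresses in a search structure, which is where the $\tilde{O}(T)$ bound comes from.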

A third possibility is non-uniform computation models, which are popular in cryptography. Here the idea is that the algorithm is allowed to use a certain hard-coded amount of data which depends only on $n$. However, this data has to be of polynomial size. This restriction comes from the interpretation of the model in terms of circuits. A machine running in polynomial time corresponds to circuits of polynomial size, one for each $n$. If we now relax the constraint that all these circuits come from a single algorithm, then we get non-uniform polynomial time, which is the same as polynomial-time computation with advice of polynomial size depending on $n$ (exercise).

The answer to the first question should obviate the second one. I would just mention that sometimes, instead of probabilistic polynomial-time algorithms, one considers (deterministic) polynomial size circuits.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

This is the problem I have:

Compute the Hamming code with odd parity for the memory word 1101 1001 0001 1011 (2 pts.). In your solution, mark the parity bits as in the following example, where parity bits are: 3, 5, 11, and 13

    0001 0100 1011 1100 1001 1
      P  P      P  P

Note: This is only an example on how you should mark the parity bits. It is not a correct code word by any means.

There are 2 things I do not understand:

  1. A 16-bit memory word needs 5 check bits by the formula $2^r - 1 \ge m + r$, where $m$ is the number of data bits and $r$ the number of check bits. But in the exercise we have just 4 check positions.

  2. The check positions usually occupy the positions that are power of 2, for example $2^0$, $2^1$, $2^2$... So why are they asking to use other positions for the check bits?

Asked By : nbro

Answered By : Ran G.

  1. They clearly say that the example is only given as an example. If you believe you need 5 parity bits, use 5 bits. They use 4 bits in their example for no good reason.

  2. True, but they still wish you to mark them clearly. I guess the reason is (a) to make sure you know that the parity bits come at powers of 2, and (b) to ease their grading. Note that you should not use positions 3, 5, 11, and 13! These are again only their example, and the note below the example clarifies that you need not pay attention to it.

(At least, that's the way I understand the question. There is a small ambiguity there that allows one to understand it the way you did.)
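For concreteness, here is a sketch of how one might compute such a code word with parity bits at the usual power-of-two positions. This follows the textbook Hamming scheme with odd parity; the function name and exact bit conventions are my own, so check them against your course's definitions.

```python
def hamming_odd_parity(data_bits):
    """Hamming code with odd parity bits at positions 1, 2, 4, 8, ...
    (1-based). A textbook-style sketch, not a specific course convention."""
    m = len(data_bits)
    r = 0
    while (1 << r) - 1 < m + r:        # smallest r with 2^r - 1 >= m + r
        r += 1
    n = m + r
    code = [0] * (n + 1)               # index 0 unused; positions are 1-based
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):            # not a power of two: a data position
            code[pos] = next(data)
    for p in range(r):
        mask = 1 << p
        ones = sum(code[pos] for pos in range(1, n + 1) if pos & mask)
        code[mask] = (ones + 1) % 2    # odd parity: each group's weight is odd
    return code[1:]

word = [int(b) for b in "1101100100011011"]
codeword = hamming_odd_parity(word)    # 21 bits: 16 data + 5 parity
```

For a 16-bit word this produces 5 parity bits, matching the formula $2^r - 1 \ge m + r$.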


Problem Detail: 

Define the problem $W$:

Input: A multi-set of numbers $S$, and a number $t$.

Question: What is the smallest subset $s \subseteq S$ so that $\sum_{k \in s} k = t$, if there is one? (If not, return none.)

I am trying to find some polytime equivalent decision problem $D$ and provide a polytime algorithm for the non-decision problem $W$ assuming the existence of a polytime algorithm for $D$.

Here is my attempt at a related decision problem:


Input: A multi-set of numbers $S$, two numbers $t$ and $k$.

Question: Is there a subset $s \subseteq S$ so that $\sum_{x \in s} x = t$ and $|s| \leq k$?

Proof of polytime equivalence:

Assume $W \in \mathsf{P}$.

    solveMIN-W(S, t, k):
    1. S = sort(S)
    2. Q = {}
    3. for i = 1 to k:
    4.     Q.add(S_i)
    5.     res = solveW(Q, t)
    6.     if res != none and res = t: return Yes
    7. return No

I'm not sure about this algorithm though. Can anyone help please?

Asked By : omega

Answered By : Shaull

The problems you mention here are variations of a problem called SUBSET-SUM, FYI if you want to read some literature on it.

What is $S_i$ in your algorithm? If it's the $i$-th element, then you assume that the minimal set will use the smaller elements, which is not necessarily true - there could be a large element that is equal to $t$, in which case there is a subset of size $1$. So the algorithm doesn't seem correct.

However, as with many optimization problems that correspond to NP-complete problems, you can solve the optimization problem given an oracle to the decision problem.

First, observe that by iterating over $k$ and calling the decision oracle, you can find the minimal size $k_0$ of a subset whose sum equals $t$ (but not the actual subset).

After finding the size, you can remove the first element of $S$ and check whether there is still a subset of size $k_0$ summing to $t$. If not, then $s_1$ is surely in a minimal subset, so compute $t-s_1$ and repeat the process with the rest of $S$, now looking for a subset of size $k_0-1$.

If $s_1$ does not affect the size $k_0$, then repeat the process with $S\setminus \{s_1\}$.

It is not hard to verify that this process takes polynomial time.
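The search-to-decision reduction described above can be sketched as follows; `decide` stands in for the decision oracle (implemented by brute force here only so the sketch is self-contained — the point is how `minimal_subset` uses it, with polynomially many oracle calls).

```python
from itertools import combinations

def decide(S, t, k):
    """Stand-in decision oracle: is there a subset of S of size at most k
    summing to t? (Brute force, for illustration only.)"""
    return any(sum(c) == t
               for size in range(k + 1)
               for c in combinations(S, size))

def minimal_subset(S, t):
    # Step 1: find the minimal size k0 by iterating over k.
    k0 = next((k for k in range(len(S) + 1) if decide(S, t, k)), None)
    if k0 is None:
        return None
    # Step 2: peel off elements one at a time, as described in the answer.
    subset, rest, target, k = [], list(S), t, k0
    while k > 0:
        x = rest.pop(0)              # tentatively drop the first element
        if decide(rest, target, k):
            continue                 # still solvable without x: discard it
        subset.append(x)             # x must be in some minimal subset
        target -= x
        k -= 1
    return subset
```

For example, `minimal_subset([3, 1, 4, 1, 5], 9)` returns a smallest subset summing to 9 (of size 2).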


Problem Detail: 

I want to understand the expected running time and the worst-case expected running time.

I got confused when I saw this figure (source),


where $I$ is the input and $S$ is the sequence of random numbers.

What I don't understand from the above equation is why the expected running time is given for one particular input $I$?

I always thought that for a problem $\pi$, $E(\pi) = \sum_{\text{input} \in \text{Inputs}} \Pr(\text{input}) \cdot T(\text{input})$ — isn't this correct?

So, let's assume Pr(x) is the uniform distribution, and we are to find the expected running time of the problem of searching an element in a $n$ element array using linear search.

Isn't the expected running time for linear search,

$$E(\text{LinearSearch}) = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}$$

And what about the worst case expected running time, isn't it the time complexity of having the worst behavior? Like the figure below,


I would highly appreciate if someone can help me understand the two figures above.

Thank you in advance.

Asked By : Curious

Answered By : Yuval Filmus

There are two notions of expected running time here. Given a randomized algorithm, its running time depends on the random coin tosses. The expected running time is the expectation of the running time with respect to the coin tosses. This quantity depends on the input. For example, quicksort with a random pivot has expected running time $\Theta(n\log n)$, a quantity that depends only on the length $n$ of the input.

Given either a randomized or a deterministic algorithm, one can also talk about its expected running time on a random input from some fixed distribution. For example, deterministic quicksort (with a fixed pivot) has expected running time $\Theta(n\log n)$ on a randomly distributed input of length $n$. This is known as average-case analysis. Your example of linear search fits this category.
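As a quick sanity check of the average-case claim for linear search (my own illustration, assuming the target is uniform over the array), the empirical average number of comparisons should approach $(n+1)/2$:

```python
import random

def linear_search_comparisons(a, x):
    # number of comparisons linear search makes before finding x
    for i, v in enumerate(a, start=1):
        if v == x:
            return i
    return len(a)

random.seed(1)
n = 1000
a = list(range(n))
trials = 20000
avg = sum(linear_search_comparisons(a, random.randrange(n))
          for _ in range(trials)) / trials
# theory: (1/n) * sum_{i=1..n} i = (n + 1) / 2 = 500.5 for n = 1000
```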

A related concept is smoothed analysis, in which we are interested in the expected running time of an algorithm in the neighborhood of a particular input. For example, while the simplex algorithm has exponential running time on some inputs, in the "vicinity" of each input it runs in polynomial time in expectation.


Problem Detail: 

Given a post-order traversal of Binary Search tree with $k$ nodes, find an algorithm that constructs the BST.

My Algortihm

Let $n$ represent the next element to be inserted.

Let $P(y)$ represent the parent of node $y$.

  1. We will read the traversal in reverse. The last element of the traversal is the root. Let $l = root$. $l$ will represent the element last inserted in the BST (except for the 3rd case below, where it will change to the parent).

Loop the following till there's no element left to be inserted

  1. if $l<n$, then $n$ is the right child of $l$. For the next insertion, $l$ changes to its right child and $n$ becomes the next element (in reverse order of traversal).

  2. else, if $l>n$ and $P(l)<n$, then $n$ is the left child of $l$. For the next insertion, $l$ changes to its left child and $n$ becomes the next element (in reverse order of traversal).

  3. else, if $l>n$ and $P(l)>n$, then $l$ becomes $P(l)$. ($n$ hasn't been inserted; we loop with $l$ changed.)

[Let $P(root)=- \infty$, so that the $2^{nd}$ case applies]

Complexity Analysis: Every element contributes at most 3 comparisons, one each for: left child, right child, and finally leaving (i.e. its subtree has been constructed). Even if I missed a comparison or two, it should be a constant number of comparisons per element, and the number of operations for node construction is also constant per element. Hence, the time complexity is $O(k)$.

Actual Question

If the algorithm is correct, I need a correctness proof for it. Yes, I thought I had the proof, but then my brain got fried and I am stuck, unable to reason succinctly.
If the algorithm is incorrect, then why? And what is time complexity of the most efficient algorithm for the same question?

Also, is the $O(k)$ complexity correctly calculated - irrespective of the correctness of the algorithm?

Asked By : atulgangwar

Answered By : Terence Hang

You are on the right track, but the algorithm is incomplete. You missed the case of inserting an element in the left subtree and then backtracking.

Here is the modified algorithm: (Changes marked in bold)

Let $n$ represent the next element to be inserted.

Let $P(y)$ represent the parent of node $y$.

Let $g=G(y)$ represent the first node g on the path $y\to root$ such that $P(g)<g$.

  1. We will read the traversal in reverse. The last element of the traversal is the root. Let $l = root$. $l$ will represent the element last inserted in the BST (except for the 3rd case below, where it will change to the parent). Let $g=root$ track $G(l)$, and initialize an empty stack $stkG$ storing previous $g$'s on the current path.

  2. Loop the following till there's no element left to be inserted

    1. if $l<n$, then $n$ is the right child of $l$. For the next insertion, $l$ changes to its right child and $n$ becomes the next element (in reverse order of traversal). Push $g$ on $stkG$, and let $g=l$.

    2. else, if $l>n$ and $\textbf{P(g)<n}$, then $n$ is the left child of $l$. For the next insertion, $l$ changes to its left child and $n$ becomes the next element (in reverse order of traversal).

    3. else, if $l>n$ and $\textbf{P(g)>n}$, then $l$ becomes $\textbf{P(g)}$ and pop $g$ from $stkG$. ($n$ hasn't been inserted; we loop with $l$ changed.)

[Let $P(root)=- \infty$, so that the $2^{nd}$ case applies]

For correctness, you can prove the following loop invariant:

  1. Root is the first inserted element. For each insertion except the root, the parent element has been inserted before.
  2. Each insertion $n$ correctly maintains the order between $n$ and $l$. ($n<l$ if $n$ is left child of $l$, $n>l$ otherwise)
  3. After backtracking step 3, the next insertion always occurs on the left branch.

1 and 3 ensure the post-order traversal, while 2 ensures the BST property.

For complexity:

  1. insertion cost: each element is inserted exactly once.
  2. traversal and comparison: the algorithm actually performs a post-order traversal in reverse order, with $O(1)$ comparison on each step.
  3. $g$,$stkG$ maintain cost: each node, which is right child of parent, is pushed and popped from $stkG$ at most once.

Thus the time complexity is $O(k)$.

Alternative algorithm:[1]

Using a recursive procedure:

    procedure BST_from_postorder(post, len, target)
    /* input:  post[0..len-1] -- (partial) postorder traversal of BST
               len    -- length of post to be processed
               target -- a cutoff value to stop processing
       output: tree_ptr -- pointer to root of the tree/subtree constructed
                           from post[len_rest..len-1]
               len_rest -- remaining length that has not been processed */
    1. if len <= 0 or post[len-1] <= target, then return (null, len).
    2. root <- new Node created from post[len-1].
    3. (root->right, new_len) <- BST_from_postorder(post, len-1, post[len-1])
    4. (root->left, rest_len) <- BST_from_postorder(post, new_len, target)
    5. return (pointer to root, rest_len)

    /* BST_from_postorder(post, length of post, -infinity) returns the BST
       constructed from the given postorder traversal. */
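The reverse-reading idea can also be condensed by passing down lower/upper bounds instead of tracking parents explicitly. This sketch (my own phrasing of the same $O(k)$ approach, not the answer's exact code) may help when checking the algorithm against examples:

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def bst_from_postorder(post):
    """Rebuild a BST from its postorder traversal in O(k) time by reading
    the traversal in reverse and passing down (lo, hi) bounds."""
    idx = len(post) - 1

    def build(lo, hi):
        nonlocal idx
        if idx < 0 or not (lo < post[idx] < hi):
            return None
        node = Node(post[idx])
        idx -= 1
        # In reverse postorder the right subtree comes before the left one.
        node.right = build(node.val, hi)
        node.left = build(lo, node.val)
        return node

    return build(float("-inf"), float("inf"))
```

For instance, the BST over $\{2,3,4,5,7,8,9\}$ rooted at 5 has postorder [2, 4, 3, 7, 9, 8, 5], and rebuilding from that list reproduces the same tree.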

Problem Detail: 

Let $a_{1} , ... , a_{m} $ be real numbers $\geq 1$, where $m$ is at least 1. I am supposed to store them in an augmented AVL structure with the following operations:

- PartialSum(i): Return the $i$-th partial sum $a_1 + \cdots + a_i$.

- Change(i, y): Replace $a_i$ with $y$.

I need this structure to retain the AVL properties and both of these operations to run in $O(\log n)$. The problem is that my approach doesn't let me have that asymptotic bound on Change.

My approach goes like this: let $i$ be the key, and also store the $i$-th partial sum in each node, as well as the value of $a_i$.

Thus each of my nodes has:

  1. Key = the index $i$ of the element in the input sequence.
  2. Sum = the $i$-th partial sum so far.
  3. Val = the value of $a_{i}$ for this node.

By doing that, the operation PartialSum is identical to Search, and thus runs in $O(\log n)$.

Unfortunately, my approach doesn't work so well with Change. By storing the partial sums in every node, if a lower node changes, all of the tree's sums must be recomputed, which takes $O(n)$.

I am trying to take a different approach but I'm confused. Can someone HINT me here?

Asked By : JOX

Answered By : JOX

Instead of storing the whole $i$-th partial sum in each node, just store the partial sum of the left subtree.
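The hint can be realized by any structure that stores subtree sums. As a self-contained illustration (a Fenwick tree rather than an AVL tree — the augmentation idea is the same, but this is not the asker's structure), each internal node holds the sum of a block of elements, so both operations touch $O(\log n)$ nodes:

```python
class PrefixSums:
    """Fenwick (binary indexed) tree: tree[i] holds the sum of a block of
    elements ending at i, so both operations take O(log n)."""
    def __init__(self, a):
        self.n = len(a)
        self.a = list(a)                 # current values, 0-based
        self.tree = [0] * (self.n + 1)   # 1-based internal array
        for i, v in enumerate(a, start=1):
            j = i
            while j <= self.n:
                self.tree[j] += v
                j += j & (-j)

    def partial_sum(self, i):            # PartialSum(i): a_1 + ... + a_i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def change(self, i, y):              # Change(i, y): set a_i = y
        delta = y - self.a[i - 1]
        self.a[i - 1] = y
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)
```

For example, with values $[1,2,3,4,5]$, `partial_sum(3)` gives 6, and after `change(2, 10)` it gives 14.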


Problem Detail: 

I'm currently working on a reduction from $A_{TM}$ to another language, and have been reading through some example proofs. I've come across the situation where, for example, we have $L = \{ \langle M,w \rangle \mid \text{ ...etc} \}$, where obviously $M$ stands for a TM and $w$ for a string. However, later in the proofs, we replace the $w$ (a string) with an "encoding of a Turing machine". Sometimes it's even "an encoding of the TM, $M$".

I'm rather lost on this idea. How do we pass an "encoding of a TM" into a parameter for a string? How do we run that on a TM? Maybe I'm misunderstanding the definition of an "encoding of a TM", which I assume to be the TM itself somehow converted into a string format.

Would anyone mind explaining this to me? I'm sure truly understanding this concept would immensely help me in writing further reductions.

Asked By : user3472798

Answered By : Ran G.

A Turing machine $M$ can be described as a 7-tuple $(Q,F,q_0,\Sigma,\Gamma,\delta, blank)$. This means that if someone gives you this 7-tuple, then the TM is well-defined, and you can precisely define how it behaves, etc.

The encoding of a TM, usually denoted $\langle M \rangle$, is a string that encompasses all the information of the 7-tuple describing $M$. You can think of it as "writing the 7-tuple as a binary string" (but this is a simplification). So the encoding of $M$ is just a string that describes how the TM works.

The last observation is that if you know the encoding - you know everything about the TM; specifically, if $\langle M \rangle$ is given as an input (to a machine $M'$), the TM $M'$ can "run" or "simulate" what $M$ would have done on any given input -- the machine $M'$ knows the states $Q$ of $M$ and the transition $\delta$ of $M$, so it can imitate its actions, step by step.
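To make the idea concrete, here is a toy illustration (the encoding format below is entirely made up; real definitions fix a particular string convention): a machine description is serialized to a plain string, and a separate function parses that string and steps through the machine's behavior, just as $M'$ simulates $M$ from $\langle M \rangle$.

```python
import json

# A toy machine that accepts strings consisting only of 0s.
# delta maps "state,symbol" to [next_state, written_symbol, move].
M = {
    "states": ["q0", "qacc", "qrej"],
    "start": "q0",
    "accept": "qacc",
    "delta": {"q0,0": ["q0", "0", "R"],
              "q0,1": ["qrej", "1", "R"],
              "q0,_": ["qacc", "_", "R"]},
}
encoding = json.dumps(M)          # <M>: the whole machine as one string

def simulate(encoded_machine, w, max_steps=1000):
    """A 'universal' function: given only the string <M> and input w,
    reconstruct the machine and imitate it step by step."""
    m = json.loads(encoded_machine)
    tape = list(w) + ["_"] * (max_steps + 1)
    state, head = m["start"], 0
    for _ in range(max_steps):
        if state == m["accept"]:
            return True
        key = f"{state},{tape[head]}"
        if key not in m["delta"]:
            return False          # halt without accepting
        state, sym, move = m["delta"][key]
        tape[head] = sym
        head += 1 if move == "R" else -1
    return False
```

The key point is that `simulate` never sees the dictionary `M` directly, only the string `encoding`, mirroring how a TM receives $\langle M \rangle$ on its tape.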


Problem Detail: 

Let $f$ be a fixed time-constructable function.

The classical universal simulation result for TMs (Hennie and Stearns, 1966) states that there is a two-tape TM $U$ such that given

  • the description of a TM $\langle M \rangle$, and
  • an input string $x$,

the machine $U$ runs for $g(|x|)$ steps and returns $M$'s answer on $x$, provided $M$ halts on $x$ within $f(|x|)$ steps. Here $g$ can be taken to be any function in $\omega(f(n)\lg f(n))$.

My questions are:

  1. What is the best known simulation result on a single tape TM? Does the result above also still hold?

  2. Is there any improvement on [HS66]? Can we simulate TMs on a two-tape TM for $f(n)$ steps in a faster way? Can we take $g(n)$ to be in $\omega(f(n))$ in place of $\omega(f(n)\lg f(n))$?

Asked By : Kaveh

Answered By : Kaveh

What is the best known simulation result on a single tape TM? Does the result above also still hold?

We can simulate a multiple-tape TM on a single-tape TM with a quadratic increase in time: simulating $f(n)$ steps takes $O(f(n)^2)$ time. The quadratic increase is required, since there are languages (e.g. palindromes) that require time $\Omega(n^2)$ on a single-tape DTM but can be solved in time $O(n)$ on a two-tape DTM.

In short, the result above does not work when the simulator is a single-tape TM.

For simulation of single-tape TMs on a single-tape TM (with arbitrary finite alphabet), the result holds, i.e. the simulation can be done with $\lg$ factor increase in time. See (2) and (3). The reference seems to be (6).

Is there any improvement on [HS66]? Can we simulate TMs on a two-tape TM for $f(n)$ steps in a faster way? Can we take $g(n)$ to be in $\omega(f(n))$ in place of $\omega(f(n)\lg f(n))$?

It seems that there hasn't been any improvement since that would imply a better time hierarchy theorem than currently known.

Correction: The usual hierarchy theorems are based on time complexity classes defined using single-tape TMs. For $n$-tape TMs, a tight result similar to the space hierarchy theorem was proven by Fürer in 1982 (5). The $\lg$ factor is not needed. Also see (4).


  1. Peter van Emde Boas, "Machine Models and Simulation", in Handbook of Theoretical Computer Science, 1990
    (in particular, pp. 18-21)

  2. Michael Sipser, "Introduction to the Theory of Computation", 2006
    (time complexity classes are defined using TMs with single-tape infinite in both directions and arbitrary finite alphabet, see pages 140 and 341)

  3. Piergiorgio Odifreddi, "Classical Recursion Theory", vol. I & II, 1989 & 1999
    (definitions are similar to Sipser, see Def. I.4.1 in vol. I page 48, Def. VII.4.1 in vol. II page 67, and Thm. VII.4.15 in vol. II page 83)

  4. Piergiorgio Odifreddi, "Classical Recursion Theory", vol. II, 1999
    (page 84)

  5. Martin Fürer, "The Tight Deterministic Time Hierarchy", 1982

  6. Juris Hartmanis, "Computational Complexity of One-Tape Turing Machine Computations", 1968

  7. F.C. Hennie and R.E. Stearns, "Two-Tape Simulation of Multitape Turing Machines", 1966

  8. Other related questions:

    1. Lower bounds and class separation,
    2. Justification of $\lg f$ in DTIME hierarchy theorem,
    3. Alphabet of single-tape Turing machine,
    4. For the time hierarchy theorem, how is the input translated efficiently?,
    5. a comment by Luca Trevisan.

Problem Detail: 

I have a distance to get to, and square tiles that have a cost and length, e.g. a 1-unit block that costs 1 unit to purchase.

So if I were trying to get 10 units away, I would require 10 of the 1-unit blocks, for a total cost of 10.

Another example: a distance of 10 units away, except I have two tiles to pick from: [1,1] and [5,4], where [x,y] means x = length, y = cost.

The cheapest cost would be 8 (two of the length-5 tiles give distance 10 at cost $4+4=8$).

So, at first I thought this would be a shortest-path problem, but seeing how I can re-use tiles, that didn't work out well.

I then tried to implement a rod-cutting algorithm, using the length of the rod as distance and the price for the length of cuts. This algorithm though kept trying to maximize the distance, which wasn't what I wanted.

My final method was to divide the cost by the length of each tile to find the most cost-efficient tile. This worked for about 9% of my unit tests; it failed mostly because it couldn't see that it might be more cost-effective to use a more expensive tile.

I started with a greedy algorithm using cost / length for most efficient cost tiles, when I couldn't get that to work. I moved towards a dynamic programming algorithm using the cutting rod example.

This required a lot of hacky work, since I didn't really want the maximized value via cutting. I wanted the minimal cost value of that distance, though it also only had about a 2% success rate.

I then went into recursion, with help from another tip. Using a tiny little Tile class to help me pass variables around, this had lots of success with my self test values, but once again only had a 10% success rate.

So after trying DFS, rod-cutting, recursion, and dynamic programming, I am lost. Did I overlook one of these algorithms?

Asked By : Connor Tumbleson

Answered By : D.W.

You're heading a good direction with dynamic programming, but you haven't set up the set of subproblems quite right just yet. Don't give up on it yet. Instead, you should keep exploring some more candidate sets of subproblems.

The art of dynamic programming lies in picking the right set of subproblems, and then identifying a suitable recurrence relation in terms of those subproblems. Sometimes you'll have to try several possible ways of decomposing into subproblems before you find something that works, so don't give up if your first try is not a success!

Here are some hints to help you:

  • When one of your inputs is an array, you can try using "the set of prefixes of the array" as your set of subproblems. In other words, if the input is an array A[0..n-1], your subproblems can be all of the prefixes A[0..j-1], where j=0,1,2,..,n-1 (this gives n subproblems).

  • When one of your inputs is a non-negative integer, you can try using "the set of non-negative integers no larger than the input" as your set of subproblems. In other words, if the input is a non-negative integer m, your subproblems can be all of the integers i, where i=0,1,2,..,m.

  • When you have two inputs, there are some possibilities to choose a set of subproblems. One way is to hold one of the inputs fixed and consider all subproblems for the other input. Another way is to choose all pairs of subproblems (if $S$ is the set of subproblems for the first input, and $T$ is the set of subproblems for the second input, use $S \times T$ as your set of subproblems).

Also, you set up your dynamic programming implementation wrong. Your goal should not be to maximize the price, or to maximize anything; you are trying to minimize the total cost (so Math.max is out of place here). Also, you will need to pass in something that describes both the cost and the length of each tile; your code doesn't show anything about that.

Finally, don't write code now. Just figure out the ideas. Write down a recurrence relation, and try to check whether it is correct. Then, write pseudocode and show it is correct. Only then should you write code. Diving straight into code is a mistake and is getting you mixed up in implementation details before you've even worked out the conceptual material. Also, on this site, you should not be showing us code: you should be showing us algorithms and pseudocode. This site is about algorithms, not code (StackOverflow is the place for code).

These hints should be enough to enable you to find a solution via dynamic programming (i.e., an efficient algorithm that always finds the least-cost tiling, with reasonable running time).
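Once the right subproblems are found, the recurrence is short. The sketch below (my own, following the second hint: subproblems are the distances $0..m$) computes the least cost to cover each distance exactly; it is offered as a way to check a recurrence you derive yourself, not a substitute for deriving it.

```python
import math

def min_tile_cost(distance, tiles):
    """Least cost to cover `distance` exactly, where tiles is a list of
    (length, cost) pairs and each tile may be reused.
    Recurrence: best[d] = min over tiles (L, c) of best[d - L] + c,
    with best[0] = 0."""
    best = [0] + [math.inf] * distance
    for d in range(1, distance + 1):
        for length, cost in tiles:
            if length <= d:
                best[d] = min(best[d], best[d - length] + cost)
    return best[distance]      # math.inf if the distance can't be covered
```

With the question's example, `min_tile_cost(10, [(1, 1), (5, 4)])` gives 8.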


Problem Detail: 

In long-multiplication, you shift and add, once for each $1$ bit in the lower number.

Let $r = p \otimes q$ be an operation similar to multiplication, but slightly simpler: when expressed via long-multiplication, the addition does not carry. Essentially you bitwise-xor the shifted numbers.

Like so:

$$ \left[\begin{matrix} &&p_n & ... & p_i & ... & p_2 & p_1 \\ &&q_n & ... & q_i & ... & q_2 & q_1 & \otimes\\ \hline\\ &&q_1 \cdot p_n & ... & q_1 \cdot p_i & ... & q_1 \cdot p_2 & q_1 \cdot p_1\\ &q_2 \cdot p_n & ... & q_2 \cdot p_i & ... & q_2 \cdot p_2 & q_2 \cdot p_1\\ &&&&&&&...\\ q_i \cdot p_n & ... & q_i \cdot p_i & ... & q_i \cdot p_2 & q_i \cdot p_1 & \stackrel{i}{\leftarrow} &&{\Huge{\oplus}} \\ \hline \\ \\r_{2n}& ... & r_i & ... &r_4& r_3 & r_2 &r_1 & = \end{matrix} \right] $$

Using the long-multiplication-style formulation, this takes $\mathcal O\left(\max\left(\left|p\right|,\left|q\right|\right)^2\right)=\mathcal O\left(\left|r\right|^2\right)$ time. Can we do better? Perhaps we can reuse some existing multiplication algorithms, or even better.

Followup: Shift-and-or multiplication operation

Asked By : Realz Slaw

Answered By : D.W.

Your operation is multiplication of polynomials over $GF(2)$, i.e., multiplication in the polynomial ring $GF(2)[x]$.

For instance, if $p=101$ and $q=1101$, you can represent them as $p(x)=x^2+1$, $q(x)=x^3+x^2+1$, and their product as polynomials is $p(x) \times q(x) = x^5+x^4+x^3+1$, so $p \otimes q = 111001$.
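The shift-and-xor formulation is direct to transcribe; this sketch (mine, on bit-packed polynomials) reproduces the worked example above:

```python
def clmul(p, q):
    """Carry-less multiplication: long multiplication where the additions
    are XORs, i.e. multiplication in GF(2)[x] of bit-packed polynomials."""
    r = 0
    while q:
        if q & 1:       # this bit of q selects a shifted copy of p
            r ^= p      # "add" without carries
        p <<= 1
        q >>= 1
    return r

# p = 101 (x^2 + 1), q = 1101 (x^3 + x^2 + 1): p (x) q = 111001
```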

If $p,q$ are $r$ bits long, this polynomial multiplication operation can be computed in $O(r \lg r)$ time using FFT techniques, but in practice this may not be a win unless your polynomials are extremely large. There is also a Karatsuba-style algorithm, whose complexity is something like $O(r^{1.6})$, as well as other options. The situation is somewhat analogous to integer multiplication, in that many of the same fast algorithms can be applied, but it is not identical.

See, e.g.,


Problem Detail: 

In practice, I understand that any recursion can be written as a loop (and vice versa?), and if we measure on actual computers we find that loops are faster than recursion for the same problem. But is there any theory for what makes this difference, or is it mainly empirical?

Asked By : Dac Saunders

Answered By : Johan

The reason that loops are faster than recursion is easy.
A loop looks like this in assembly.

    mov loopcounter, i
    dowork:
        ; do work
        dec loopcounter
        jmp_if_not_zero dowork

A single conditional jump and some bookkeeping for the loop counter.

Recursion (when it isn't or cannot be optimized by the compiler) looks like this:

    start_subroutine:
        pop parameter1
        pop parameter2
    dowork:
        ; do work
        test something
        jmp_if_true done
        push parameter1
        push parameter2
        call start_subroutine
    done:
        ret

It's a lot more complex, and you get at least 3 jumps (one test to see if we're done, one call, and one return).
Also in recursion the parameters need to be set up and fetched.
None of this stuff is needed in a loop because all the parameters are set up already.

Theoretically the parameters could stay in place with recursion as well, but no compilers that I know of actually go that far in their optimization.

Differences between a call and a jmp
A call-return pair is not very much more expensive than the jmp. The pair takes 2 cycles and the jmp takes 1; hardly noticeable.
In calling conventions that support register parameters the overhead for parameters is minimal, but even stack parameters are cheap as long as the CPU's buffers do not overflow.
It is the overhead of the call setup dictated by the calling convention and parameter handling in use that slows down recursion.
This is very much implementation dependent.

Example of poor recursion handling
For example, if a parameter is passed that is reference counted (e.g. a non-const managed type parameter), it can add on the order of 100 cycles doing a locked adjustment of the reference count, totally killing performance vs a loop.
In languages that are tuned to recursion this bad behavior does not occur.

CPU optimization
The other reason recursion is slower is that it works against the optimization mechanisms in CPU's.
Returns can only be predicted correctly if there are not too many of them in a row. The CPU has a return stack buffer with a (few) handfuls of entries. Once those run out every additional return will be mispredicted causing huge delays.
On any CPU that uses a stack return buffer call based recursion that exceeds the buffer size is best avoided.

About trivial code examples using recursion
If you use a trivial example of recursion like Fibonacci number generation, then these effects do not occur, because any compiler that 'knows' about recursion will transform it into a loop, just like any programmer worth his salt would.
If you run these trivial examples in an environment that does not optimize properly, the call stack will (needlessly) grow out of bounds.

About tail recursion
Note that sometimes the compiler optimizes away tail recursion by changing it into a loop. It is best to only rely on this behavior in languages that have a known good track record in this regard.
Many languages insert hidden clean up code before the final return preventing the optimization of tail recursion.

Confusion between true and pseudo recursion
If your programming environment turns your recursive source code into a loop, then it is arguably not true recursion that is being executed.
True recursion requires a store of breadcrumbs, so that the recursive routine can trace back its steps after it exits.
It is the handling of this trail that makes recursion slower than using a loop. This effect is magnified by current CPU implementations as outlined above.

Effect of the programming environment
If your language is tuned towards recursion optimization then by all means go ahead and use recursion at every opportunity. In most cases the language will turn your recursion into some sort of loop.
In those cases where it cannot, the programmer would be hard pressed as well. If your programming language is not tuned towards recursion, then it should be avoided unless the domain is suited towards recursion.
Unfortunately many languages do not handle recursion well.

Misuse of recursion
There is no need to calculate the Fibonacci sequence using recursion, in fact it is a pathological example.
Recursion is best used in languages that explicitly support it or in domains where recursion shines, like the handling of data stored in a tree.

I understand any recursion can be written as a loop

Yes, if you are willing to put the cart before the horse.
All instances of recursion can be written as a loop, some of those instances require you to use an explicit stack like storage.
If you need to roll your own stack just to turn recursive code into a loop you might as well use plain recursion.
Unless of course you've got special needs like using enumerators in a tree structure and you don't have proper language support.
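To make the explicit-stack point concrete, here is a minimal sketch in Python (the language and the names are my choice; the answer names none) of the same tree traversal written once with true recursion and once as a loop with a hand-rolled stack:

```python
def sum_tree_rec(node):
    # true recursion: the call stack stores the "breadcrumbs"
    value, children = node
    return value + sum(sum_tree_rec(c) for c in children)

def sum_tree_loop(root):
    # same traversal as a loop with an explicit stack of pending nodes
    total, stack = 0, [root]
    while stack:
        value, children = stack.pop()
        total += value
        stack.extend(children)
    return total

# a small tree: 1 with children 2 and 3, where 3 has child 4
tree = (1, [(2, []), (3, [(4, [])])])
assert sum_tree_rec(tree) == sum_tree_loop(tree) == 10
```

Both walk the same tree; the only difference is who manages the trail of breadcrumbs.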

Best Answer from StackOverflow

Question Source :

Problem Detail: 

If I have this pseudocode:

for i=0 to n/2 do
    for j=0 to n/2 do
        ... do anything ....

The number of iterations is $n^2/4$.

What is the complexity of this program? Is it $O(n^2)$?

What is the formal/informal intuition for that complexity?

Next, if I have this other pseudocode:

for i=0 to n do
    for j=0 to n do
        ... do anything ....

The complexity is again $O(n^2)$ -- is that correct?

Is there any difference between efficiency in practice and theoretical complexity in this case? Is one of these faster than the other?

Asked By : kafka

Answered By : Auberon

Big O Formally

$O(f(n)) = \{g(n) | \exists c > 0, \exists n_0 > 0, \forall n > n_0 : 0 \leq g(n) \leq c*f(n)\}$

Big O Informally

$O(f(n))$ is the set of functions that grow no faster than $f(n)$. This means that whatever function you pick from $O(f(n))$, let's name it $g(n)$, some constant multiple of $f(n)$ will eventually be at least as large as $g(n)$. In other words, $c \cdot f(n)$ will eventually match or surpass any function in $O(f(n))$ as $n$ grows, for a suitable constant $c$. Here $n$ is the input size of your algorithm.

As an example. Let's pick $f(n) = n^2$.

We can say that $f(n) \in O(n^3)$. Note that, over time, we allowed notation abuse so almost everyone writes $f(n) = O(n^3)$ now.

Normally when we evaluate algorithms, we look at their order. This way, we can predict how the running time of the algorithm will increase when the input size ($n$) increases.

In your example, both algorithms are of order $O(n^2)$. So their running time will increase the same way (quadratically) as $n$ increases. Your second algorithm will be four times slower than the first one, but that is something we're usually not interested in *theoretically*. Algorithms with the same order can have different factors ($c$ in the formal definition) or different lower order terms in the function that evaluates the number of steps. But usually that's because of implementation details, and we're not interested in that.

If you have an algorithm that runs in $O(log(n))$ time we can safely say it will be faster than $O(n)$ [1] because $log(n) = O(n)$. No matter how badly the $O(log(n))$ algorithm is implemented, no matter how much overhead you put in the algorithm, it will always [1] be faster than the $O(n)$ algorithm. Even if the number of steps in the algorithms are $9999*log(n) = O(log(n))$ and $0.0001*n = O(n)$, the $O(log(n))$ algorithm will eventually be faster [1].

But maybe, in your application, $n$ is always low and will never be large enough for the $O(log(n))$ algorithm to be faster in practice. So using the $O(n)$ algorithm will result in faster running times, despite $log(n) = O(n)$.

[1] if $n$ is sufficiently large.
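The footnote can be illustrated numerically; a small Python check, where the step-count functions are hypothetical stand-ins for the two algorithms from the answer:

```python
import math

def steps_log(n):
    # hypothetical O(log(n)) algorithm with a huge constant factor
    return 9999 * math.log2(n)

def steps_linear(n):
    # hypothetical O(n) algorithm with a tiny constant factor
    return 0.0001 * n

# for moderate n the O(n) algorithm needs fewer steps ...
assert steps_linear(10**6) < steps_log(10**6)
# ... but for sufficiently large n the O(log(n)) algorithm wins
assert steps_log(10**12) < steps_linear(10**12)
```

The crossover point depends on the constants, but it always exists.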

Best Answer from StackOverflow

Question Source :

Problem Detail: 

In a video discussing the merits of particle filters for localization, it was implied that there is some ambiguity about the complexity cost of particle filter implementations. Is this correct? Could someone explain this?

Asked By : DorkRawk

Answered By : John Percival Hackworth

It appears that the speaker feels that there isn't a definitive complexity analysis yet for the technique. This could be due to a couple of factors.

  1. The analysis is hard, and no one has figured it out yet.
  2. The technique has multiple different appropriate implementations, based on the problem context.

I'd bet on the second option, particularly in light of the speaker's comment that the technique may not be appropriate for higher dimension parameter spaces. Keep in mind that the video did not present an algorithm, it presented a very high-level discussion of a technique.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

Let $S = \{a \mid |a| \text{ is odd}\}$. I know that $S$ is decidable, but does there exist a subset of $S$ that is undecidable?

Asked By : user3277633

Answered By : Yuval Filmus

Hint: For every language $L$, the language $M = \{ 1^{2x+1} : x \in L \}$ is a subset of $S$.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

Inspired by this question in which the asker wants to know if the running time changes when the comparator used in a standard search algorithm is replaced by a fair coin-flip, and also Microsoft's prominent failure to write a uniform permutation generator, my question is thus:

Is there a comparison based sorting algorithm which will, depending on our implementation of the comparator:

  1. return the elements in sorted order when using a true comparator (that is, the comparison does what we expect in a standard sorting algorithm)
  2. return a uniformly random permutation of the elements when the comparator is replaced by a fair coin flip (that is, return x < y = true with probability 1/2, regardless of the value of x and y)

The code for the sorting algorithm must be the same. It is only the code inside the comparison "black box" which is allowed to change.

Asked By : Joe

Answered By : frafl

The following deterministic (without the comparator) algorithm works for an input tuple $(a_1,\dots,a_n)$:

  1. Do the Fisher-Yates shuffle using your comparator with some static pair (say $a_1 < a_2$) as a coin flip (doing acceptance-rejection sampling). If the comparator outputs $1$ the first time, use it inverted to avoid an endless rejection loop in the deterministic case.
  2. (optional speedup: Try a single pair $n$ times, where $n$ is the length of your input. If any two of the outputs differ, return the permutation obtained in (1))
  3. Sort your array using merge sort.

Given a deterministic order relation as comparator, this algorithm sorts an array in time $\mathcal{O}(n \log n)$, since the Fisher-Yates shuffle runs in $\mathcal{O}(n)$ using at most $\mathcal{O}(\log n)$ nonrandom "random bits" (i.e. calls to your comparator) in each step, and merge sort has the same asymptotic complexity. The result of (1) is totally useless in this case, but since it's followed by a real sort, this does no harm.

Given a real coin flip as comparator, (1) permutes the array with equal probability for each permutation. If you really have to do (3) (because you left out (2), or (2) failed to detect the randomness), this does no harm: the distribution of its result only depends on the order of its input, which is uniformly distributed among all permutations because of (1), so the result of the entire algorithm is also uniformly distributed. The number of times each acceptance-rejection sampling has to be repeated is geometrically distributed (reject with probability $< \frac{1}{2}$), and therefore it has an expected value $< 2$. Each repetition uses at most $\log n$ bits, so the runtime analysis is nearly the same as in the deterministic case, but we only get an expected runtime of $\mathcal{O}(n \log n)$, with the possibility of nontermination (it terminates only almost surely).

As Joe pointed out: If you don't like the test for the first bit in (1), do (3) then (1) and use $a_n < a_1$, which is always $0$, since the array is already sorted in the deterministic case. Additionally you have to subtract your random number from the upper bound on the range in the loop, because the upper bound for the random number yields the identical permutation. But be aware that (2) is forbidden then, because you always have to do the shuffle in the random case.

You can even use the same calls to your comparator for (1) and (3), but then proving that the result is uniformly distributed is at least a lot harder, if possible at all.

The following algorithm has no distinct shuffle and sort phases, but is asymptotically slower. It's essentially insertion sort with binary search. I will use $a=(a_1,\dots,a_n)$ to denote the input and $b_k=(b_{k,1},\dots,b_{k,k})$ to denote the result after the $k$-th round:

  1. Set $b_{1,1} = a_1$
  2. If $a_2 < a_1$ then $b_2 = (a_2,a_1)$ and $(c,d):= (2,1)$ else $b_2 = (a_1,a_2)$ and $(c,d):= (1,2)$. In either case $a_d < a_c$ will always be $0$ (i.e. false) for a nonrandom comparator.
  3. To obtain $b_{k}$ for $k \geq 3$ obtain $b_{k-1}$ first.
  4. Let $l=\lceil \log_2 k \rceil$ and $k' = 2^l$, i.e. $k'$ is the least power of $2$ not smaller than $k$.
  5. Let $i_0 = 0$. For every $j \in \{1,\dots,l\}$ let $$i_j = \begin{cases} i_{j-1} + 2^{l-j} & i_{j-1} + 2^{l-j} > k-1 \wedge a_d < a_c\\ i_{j-1} & i_{j-1} + 2^{l-j} > k-1 \wedge \neg (a_d < a_c)\\ i_{j-1} + 2^{l-j} & i_{j-1} + 2^{l-j} \leq k-1 \wedge b_{k-1,i_{j-1} + 2^{l-j}} < a_k \\ i_{j-1} & i_{j-1} + 2^{l-j} \leq k-1 \wedge \neg(b_{k-1,i_{j-1} + 2^{l-j}} < a_k) \\ \end{cases}$$
  6. If $i_l > k$ repeat (5.) else $b_k=(b_{k-1,1},\dots,b_{k-1,i_l -1},a_k,b_{k-1,i_l},\dots,b_{k-1,k-1})$
  7. Output $b_n$

Random case: step 5 plus the if clause of step 6 is essentially acceptance-rejection sampling. The rest of the algorithm is a naive shuffle: shuffle the first $k-1$ elements and insert the $k$-th element at each position with equal probability. If we used normal insertion sort, we would get a binomial distribution instead.

Note that this algorithm is inefficient in both modes compared to the Fisher-Yates shuffle and merge sort as inserting an element to an arbitrary position is expensive if using an array and binary search needs linear time if using a list. But perhaps a modification of heap sort or tree sort in a similar way could lead to a faster algorithm.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I am confused why Binomial heaps do not utilize marking.

Concerning Fibonacci heap children:

Each tree in a Fibonacci heap is allowed to lose at most two children before that tree needs to be "reprocessed" at a later step. The marking step in the Fibonacci heap allows the data structure to count how many children have been lost so far. An unmarked node has lost no children, and a marked node has lost one child. Once a marked node loses another child, it has lost two children and thus needs to be moved back to the root list for reprocessing.

How do Binomial heaps not need marking? It seems like in Fibonacci, this is in place so that when a marked node loses another child, it knows to move it back to the root and reprocess. I do not understand how Binomial heaps can possibly handle this without marking.

Asked By : Carlo

Answered By : Hendrik Jan

In binomial heaps the structure of the trees in the heap is very strict: they all have a number of elements that is a power of two. When a node is removed, it is the root of a tree, and all its children (which are again roots of trees with a power-of-two number of elements) are merged with the other original trees.

There are no nodes that have an "imperfect" number of children, and there is no marking needed to indicate that.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

We are given an undirected graph weighted with positive arc lengths and a distinguished edge $(a,b)$ in the graph. The problem is to replace this edge by two edges $(a,c)$ and $(c,b)$ where $c$ is a new node, such that the length of the path $(a,c,b)$ is equal to the initial length of $(a,b)$, and such that the choice of the length of $(a,c)$ minimizes the maximum distance of node $c$ to any other node of the graph. Is there any graph-theory based algorithm that can solve such a problem rather than brute force?

Actually, the existence of the original edge $(a,b)$ is not essential. Only the end nodes $a$ and $b$, and the length matter.

For example we have a graph:

1 2 10
2 3 10
3 4 1
4 1 5

In every line the first two values indicate the node number, and the 3rd one is the corresponding edge value. Now I want to find a new node $c$ between nodes 1 and 2. The answer gives a node $c$ at distance 2 to node 1 (and distance 8 to node 2). The distance of node $c$ is 2 to node 1, 8 to node 2, 1+5+2=8 to node 3, and 5+2=7 to node 4. The maximum distance is 8, which is minimal for all possible choices of the length of $(a,c)$.
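A brute-force sanity check of this example in Python (scanning every integer length for the new edge; just a verification aid, not an efficient algorithm):

```python
import heapq

# graph from the question with the edge (1, 2) of length d = 10 removed
adj = {1: [(4, 5)], 2: [(3, 10)], 3: [(2, 10), (4, 1)], 4: [(1, 5), (3, 1)]}
d = 10

def dijkstra(src):
    # standard Dijkstra over the remaining graph
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(pq, (dist[v], v))
    return dist

alpha, beta = dijkstra(1), dijkstra(2)   # distances from nodes 1 and 2

def worst(a):
    # farthest node from c when len(1,c) = a and len(c,2) = d - a
    return max(min(alpha[u] + a, beta[u] + d - a) for u in adj)

best_a = min(range(d + 1), key=worst)
assert (best_a, worst(best_a)) == (2, 8)   # matches the example: a = 2, max 8
```

This confirms the stated answer: placing $c$ at distance 2 from node 1 gives maximum distance 8, the minimum over all choices.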

Asked By : user3153970

Answered By : babou


This answer is in two parts.

The first is an analysis of the problem mixed with a sketch of the algorithm to solve it. As it is my first version, it is detailed, but results in an algorithm that is a bit more complex than needed.

It is followed by a pseudo-code version of the algorithm, written sometime later, as the algorithm was clearer. This pseudo-code is more direct, and somewhat simpler, but probably easier to understand after reading the first part.

In retrospect, the more important ideas are:

  • to see that the maximum distance of a node to the new node C, as a function of the length of $(A,C)$, is given by the superposition of roof-shaped (inverted V) curves, each characterized by its top and associated with a node of the graph.

  • to see that irrelevant curves can be eliminated efficiently by a scan in monotonic order of abscissa of tops (with minimum backtrack).

  • then minimal points can be enumerated simply by intersecting the remaining curves two by two, again in monotonic order of abscissa of tops.

Sketch of an algorithm

NOTE: I changed the initial presentation of the question to get a more standard view of graphs, by considering that the edge $(A,B)$ is replaced by a new node $C$ and two edges $(A,C)$ and $(C,B)$ with the same total length as $(A,B)$.

Let $(A,B)$ be the distinguished edge, and let $l$ be its length. Let $e$ and $n$ be respectively the number of edges and the number of nodes in the graph.

First you remove the edge $(A,B)$. Then, for every node $U$ of the graph, you compute the distance $\alpha(U)$ between $A$ and $U$, and the distance $\beta(U)$ between $B$ and $U$. This can be done with Dijkstra's shortest path algorithm, in time $O(e+n\log n)$.

Then you add a new node $C$ with an edge $(A,C)$ and an edge $(C,B)$, of length respectively $a$ and $b$, such that $a+b=l$. The length $a$ is not known, and is to be found so as to minimize the distance of the most remote node to node $C$. The distance of $C$ to $A$ or $B$ may be 0.

The difficulty of the problem is that the node most distant from $C$ via $A$ may be close to $C$ via $B$, and thus does not matter. Also, a node closer to $C$ via $A$ than via $B$ may have that situation reversed as $C$ moves between $A$ and $B$, i.e. as $a$ varies in $[0,l]$.

We consider three sets of nodes in the graph, forming a partition of all nodes:

  • the set $N_A$ of nodes that are always closer to $C$ via $A$ independently of the value of $a$: $N_A=\{U|\alpha(U)+l\leq\beta(U)\}$

  • the set $N_B$ of nodes that are always closer to $C$ via $B$ independently of the value of $a$: $N_B=\{U|\beta(U)+l\leq\alpha(U)\}$

  • and $N_{AB}$ the set of nodes that can be closer to $C$ via $A$ or via $B$, depending on the value of $a$, actually closer via $A$ for smaller values of $a$, and closer via $B$ for larger values of $a$ in $[0,l]$: $N_{AB}=\{U\mid\alpha(U)+l>\beta(U) \wedge \beta(U)+l>\alpha(U)\}$

Let $U_A$ be the node in $N_A$ most remote from $A$, and $U_B$ be the node in $N_B$ most remote from $B$.

No other node in $N_A\cup N_B$ is ever more distant from $C$ than the more distant of these two nodes, whatever the distance $a$ of $(A,C)$. Hence all other nodes in $N_A\cup N_B$ can be discarded for the computation of the optimal value of $a$.

Note that it may happen that $N_A$, or $N_B$, or both, are empty. In which case, the corresponding nodes $U_A$ and/or $U_B$ do not exist.

We will build a distance curve on a plane with coordinates $(x,y)$, which will ultimately represent the longest distance $y$ of a node from $C$ as a function of the distance $x=a$ of $C$ from $A$. Hence $x=a\in[0,l]$.

This distance curve is actually a zig-zag composed of segments angled $\pm 45^\circ$. Once it is built, the point minimizing the longest distance is to be found in the minimal points of the curve. There may be several such points. All these points have to be computed, and then compared.

We describe below how to compute them.

Actually, we consider a collection of simpler distance curves, with only one or two segments in the interval $[0,l]$, such that the longest distance of a node from $C$ is always on or above the curve. The final curve is obtained by taking for every value of $x$ the maximum value of $y$ given by one of the distance curves. This can be done formally by computing the intersection of the segments in an ordered fashion.

First we have (at most) two distance curves composed of only one segment. These are the curves corresponding to the distance of $U_A$ and $U_B$ to $C$. The first is upward at a $45^\circ$ angle, and the second is downward at a $45^\circ$ angle.

Assuming no other node is ever more distant from $C$, the optimal value of $a$ is the abscissa of their intersection. At that abscissa, that value for $a$, the nodes $U_A$ and $U_B$ are at the same distance from $C$. Hence the distance is minimum when $a$ satisfies the equation $\alpha(U_A)+a=\beta(U_B)+b$. Since $a+b=l$, this gives $\alpha(U_A)+a=\beta(U_B)+l-a$, and finally $a=(\beta(U_B)-\alpha(U_A)+l)/2$. This corresponds to a distance $y=(\alpha(U_A)+\beta(U_B)+l)/2$ of both nodes to $C$.

Together, the two curves give a V shaped curve, with a minimum at the point just computed.

But we also have to account for all the nodes in $N_{AB}$. The distance of a node $U\in N_{AB}$ to $C$ first increases with $x$ while the shortest path goes through $A$, then it decreases with $x$ when the shortest path goes through $B$. Hence, the corresponding distance curve is formed of two segments, roof shaped, with a maximum at an abscissa such that the distance to $C$ is the same through $A$ or through $B$. This gives the equation $\alpha(U)+a=\beta(U)+b$. Resolving it as above, we get $x=a=(\beta(U)-\alpha(U)+l)/2$, and $y=(\alpha(U)+\beta(U)+l)/2$. We note $T_U$ (for Top of $U$) the point with these coordinates. The top $T_U$ characterizes the roof-shaped distance curve for node $U$ since the slope of both sides is $\pm 45^\circ$.

Now all the difficulty of the problem is that several nodes $U\in N_{AB}$ may be such that $T_U$ is above our first V-shaped curve, which will lead to the final zig-zag line.

But, we first can eliminate all nodes $U\in N_{AB}$ such that $T_U$ is below the V-shaped distance curve, as they will not ever contribute to the longest distance. This is easily done and is skipped here for clarity.

Then we also want to eliminate all nodes $U\in N_{AB}$ such that their roof-shaped curve is always below the curve of another node $V\in N_{AB}$. This is determined by the position of their tops, and we say that $T_V$ dominates $T_U$, or that $V$ dominates $U$. Actually $T_V$ and $T_U$ are in a domination relation iff the absolute value of the abscissa difference is less than the absolute value of the ordinate difference. The dominating node/top is the one with the highest ordinate.

For this purpose, we first sort all tops of the remaining nodes of $N_{AB}$ in increasing order of abscissa. This is done in time $O(n\log n)$.

Then we remark that if top $T_U$ dominates a non-adjacent top $T_V$, then any intermediate top $T_W$ dominates $T_V$, or is dominated by $T_U$, or both. Hence, to eliminate all nodes/tops that are dominated, one can simply proceed from left to right (by increasing abscissa), comparing only adjacent nodes, and removing any that is dominated. If the node on the right is dominated, it is removed and the node on the left is compared again to the next one on the right. If the node on the left is dominated, it is removed and the node on the right is then compared with the previous one on its left. If neither dominates the other, then the node on the left is kept, and the node on the right is then compared with the next one on the right. This is repeated until the rightmost node/top has been processed, in time $O(n)$.
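This left-to-right scan can be sketched in Python (function and variable names are mine); tops are $(x,y)$ pairs sorted by abscissa, and a top dominates a neighbour when the abscissa gap is at most the ordinate gap, the higher top winning:

```python
def remove_dominated(tops):
    # tops: (x, y) pairs sorted by increasing x; since all roof curves have
    # slopes +/-45 degrees, top u dominates top t iff |u.x-t.x| <= |u.y-t.y|
    # and u.y >= t.y
    kept = []
    for t in tops:
        dominated = False
        while kept and abs(kept[-1][0] - t[0]) <= abs(kept[-1][1] - t[1]):
            if kept[-1][1] >= t[1]:
                dominated = True      # t's roof curve lies below kept[-1]'s
                break
            kept.pop()                # kept[-1]'s roof curve lies below t's
        if not dominated:
            kept.append(t)
    return kept

# here the top at (5, 20) dominates every other top
assert remove_dominated([(0, 10), (2, 9), (5, 20), (6, 3)]) == [(5, 20)]
```

Each top is appended and popped at most once, so the scan is linear after sorting.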

Whatever nodes have been kept do contribute to the final curve, in left to right order of their tops, with the line for $U_A$ at the left and the line for $U_B$ at the right, if they exist. The minima are easy to compute by considering the intersections of the contributing curves from left to right, in time $O(n)$.

Finally, the abscissa for the smallest distance of all nodes to $C$ may be found by a linear scan of all minima for the lowest ones. There may actually be several such abscissas, i.e. values for the length $a$ of edge $(A,C)$.

If the distances have to be integers, the abscissa of the minima can be rounded up or down.

The time complexity of the algorithm is: $O(e+n\log n)$.

With thanks to user3153970 who remarked that my initial solution was not quite correct.

Pseudo-code version

The procedure described in the analysis of the problem is modified or improved in some ways.

  • the length l is changed to d to avoid confusion with the number 1.

  • for simplification, the V curve is replaced by two tops of roof curves placed at both ends of the interval [0,d].

  • the sets $N_A$ and $N_B$ need not be created. Only $d_A$ and $d_B$ are needed, briefly, to create the two extra tops.

  • the set $N_{AB}$ is not created either, only the corresponding set of tops (actually a multiset, or bag, that may be implemented as a list), as two distinct nodes may produce the same top if they are at the same distances from A and B.

(bugs are provided free of charge)

d := distance(A,B)
none := -1     % programming trick - none is for distances to be ignored %

Remove the edge (A,B) from the graph.
Using Dijkstra's shortest path algorithm, for every node U compute its
    shortest distances α(U) and β(U) to respectively A and B.

d_A := none
d_B := none

% We use lists of pairs, representing the [x,y] coordinates of points,    %
% where the abscissa x stands for the choice of length (A,C), and the     %
% ordinate y stands for the distance of some node to C.                   %
% The call first(L) returns a pointer to the first pair of list L.        %
% Given a pointer t to a pair, t.x and t.y give the two components of the %
% pair, and the functions prev and next return a pointer to the previous  %
% or the next pair, or NIL if it does not exist.                          %
% The call add([a,b], L) adds the pair [a,b] to the list L.               %
% The call remove(t) removes the element pointed to by t from its list.   %

T := []     % T is the list of tops of curves for individual nodes %
For every node U do
    if    α(U)+d ≤ β(U) then d_A := max(d_A, α(U))
    elsif β(U)+d ≤ α(U) then d_B := max(d_B, β(U))
    else add( [(β(U)-α(U)+d)/2, (β(U)+α(U)+d)/2], T)

% Instead of V curve, create dummy tops at the ends of the [0,d] interval %
if d_A ≠ none then add( [d, d_A+d], T)
if d_B ≠ none then add( [0, d_B+d], T)

Sort T in increasing order of abscissa

% Remove from T all tops that are dominated by another top %
t1 := first(T)
t2 := next(t1)
repeat
    if abs(t1.x-t2.x) ≤ abs(t1.y-t2.y)
    then  % one top dominates the other %
        if t1.y ≥ t2.y
        then  % t1 dominates %
            remove(t2)
            t2 := next(t1)
            if t2=NIL then exit loop
        else  % t2 dominates %
            remove(t1)
            t1 := prev(t2)
            if t1=NIL then
                t1 := t2
                t2 := next(t1)
                if t2=NIL then exit loop
    else  % neither top dominates the other %
        t1 := t2
        t2 := next(t1)
        if t2=NIL then exit loop

% Compute all local minima %
M := []   % M is the list of minima, where roof shaped curves intersect, %
          % or intersect the boundaries of the interval [0,d].           %
t2 := first(T)
if t2.x ≠ 0 then add( [0, t2.y-t2.x], M)
repeat
    t1 := t2
    t2 := next(t1)
    if t2=NIL then exit loop
    add( [(t1.y-t2.y+t1.x+t2.x)/2, (t1.y+t2.y+t1.x-t2.x)/2], M)
if t1.x ≠ d then add( [d, t1.y-d+t1.x], M)

Select in M the pairs with the smallest ordinates. Their abscissas are all
    the possible answers to the problem.

If the solutions must be in integer values, it is possible to round the coordinates of the points in M to the nearest integer(s), before selecting the pairs with smallest ordinates. However some more analysis is needed to check whether it can involve passing over a top, and whether abscissas and ordinates are always integers simultaneously. This is left to the reader as an exercise.

All bug reports are welcome.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I'm reading multi-level cache and came across a question through which i got confused. I've read that

  • Between processor and Cache Word/ Byte is transfered
  • Between Cache and Main memory Block(Block of words) is transfered.
  • If there is a miss in the lower-level cache and a hit in the higher-level cache, first a block of words is transferred from the higher-level cache to the lower-level cache, and then the particular word is transferred to the processor from the lower-level cache.

In multi-level caches, the cache at the lower level is generally smaller than the cache at the higher level. Hence the block size of the lower-level cache is generally smaller than the block size of the higher-level cache.

So how is a block of words actually transferred between caches?

For instance, suppose there are two caches $L_1$ and $L_2$ with block sizes 4 words and 16 words respectively, connected by a databus with the capacity to transfer 4 words at a time. Now suppose there is a miss in the $L_1$ cache and a hit in the $L_2$ cache. Then do we transfer 16 words or only 4 words from $L_2 \rightarrow L_1$?


Asked By : Atinesh

Answered By : ultrajohn

It depends. What is the level of associativity of L1? If it's a direct mapped cache, then naturally, you will have to transfer 4 words of data from L2 to L1. If it's a 2-way set associative, then 8 words. This makes the interface between L1 and L2 as simple as possible given their configurations.

The idea is to make the hit time of L1 and miss rate of L2 as low as possible.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I am a beginner in the topic of algorithms. I have a query about Greedy Algorithms.

From what I understand, if there is a function and we are supposed to find its maxima/minima, if we find the local maxima/minima, we become 'greedy' and declare that as the global maxima/minima.

I suppose that is wrong?

Asked By : Victor

Answered By : Sagnik

Greedy algorithms are those in which we construct the optimum solution piece by piece.

After identifying a part of the optimum solution (this is intuitive by nature and problem dependent), try to add or subtract elements from it to see whether it still remains optimum.

I guess you should study Matroid Theory and Linear Programming Duality if you are very interested in the mathematics of greedy algorithms.

Try going through these problems on your own and practicing other problems related to these: Maximal Independent Set in a Tree, Disjoint Set (Interval Scheduling), Interval partitioning, 0-1 Knapsack and Fractional Knapsack.
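As one concrete instance from that list, Fractional Knapsack has a clean greedy solution; a Python sketch, where the `(value, weight)` item format is an assumption of mine:

```python
def fractional_knapsack(items, capacity):
    # greedy: take items in decreasing value-per-weight order,
    # splitting the last item that no longer fits entirely
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        take = min(weight, capacity)       # whole item, or the fitting fraction
        total += value * take / weight
        capacity -= take
        if capacity == 0:
            break
    return total

# classic instance: ratios 6, 5, 4; capacity 50 -> 60 + 100 + (20/30)*120
assert fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50) == 240.0
```

Here the greedy choice (best ratio first) provably yields the global optimum, unlike in 0-1 Knapsack, where the same greedy rule can fail.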

What you are asking is about the Hill Climbing problem specifically. The Wikipedia page provides enough discussion about it, but if you are a newcomer in the world of algorithms I suggest you start out with the above problems first.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

In a Whitted ray tracer, each ray-object intersection spawns a transmitted ray (if the object was translucent), a reflected ray and a shadow ray. The shadow ray contributes the direct lighting component.

But what happens if the shadow ray intersects a transparent object? Is the direct lighting component ignored? How will diffuse objects submerged in water be lit if they don't get any direct light contributions from the shadow ray?

Asked By : user11171

Answered By : c c

You should first refer to the Rendering Equation. It is the general equation describing the physical transport of light, as used in computer graphics.

The Whitted model is just an approximation of the surface integral in the Rendering Equation. It only calculates three light rays (shadow ray, reflected ray, and refracted ray). In a more sophisticated ray tracer, you should use Monte-Carlo ray tracing, where at every intersection point on the object you trace thousands of rays sampled according to the BRDF. Such backward algorithms actually do not work well for caustic scenes, which is the case you describe. You can use Photon Mapping plus Monte-Carlo ray tracing to get a better visualization.

If you just want to use the Whitted model, you can multiply the lighting of the shadow ray by a factor defined by the occluding transparent object.
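A tiny Python sketch of that factor (the names are mine; each occluder along the shadow ray contributes a transmittance in [0, 1]):

```python
def shadow_attenuation(occluder_transmittances):
    # Whitted-style approximation: scale direct lighting by the product of
    # the transmittances of the objects the shadow ray passes through;
    # an opaque occluder (transmittance 0) blocks the light entirely
    factor = 1.0
    for transmittance in occluder_transmittances:
        factor *= transmittance
    return factor

assert shadow_attenuation([]) == 1.0            # unoccluded: full direct light
assert shadow_attenuation([0.8, 0.5]) == 0.4    # two transparent occluders
assert shadow_attenuation([0.8, 0.0]) == 0.0    # an opaque occluder blocks all
```

This keeps submerged diffuse objects lit, at the cost of ignoring refraction of the shadow ray, which is why it cannot reproduce caustics.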

Best Answer from StackOverflow

Question Source :

Problem Detail: 

Why is the reduction $\textbf{SAT} \leq_P \textbf{3SAT}$ possible, but $\textbf{SAT} \leq_P \textbf{2SAT}$ not possible, given, that $\textbf{SAT}$ is $\textbf{NP}$-complete, $\textbf{2SAT} \in \textbf{NP}$ and $\textbf{3SAT} \in \textbf{NP}$?

Asked By : Anderson

Answered By : mdxn

I believe the asker is wanting to know specifically why the standard approach to reducing $\text{SAT}$ to $\text{3SAT}$ does not continue to extend to the $\text{2SAT}$ case. Here is a walkthrough as to why.

Conversion of a SAT Instance to a 3-SAT Instance:

For convenience, let us assume everything is in $\text{CNF}$, or conjunctive normal form. The $\text{SAT} \le_p \text{3SAT}$ proof works by introducing dummy variables to spread the actual variables over multiple clauses.

Illustration: Consider a single clause in boolean formula $(x_1 \lor x_2 \lor x_3 \lor x_4)$. The $\text{3SAT}$ reduction works by introducing a dummy variable to represent a disjunction of two literals in hopes that it reduces the clause's length by one. For example, $y = x_3\lor x_4$. The clause can be rewritten as:

$(x_1 \lor x_2 \lor y) \land (\bar y \lor x_3 \lor x_4)$

Notice that these clauses are now in a $\text{CNF}$ for $\text{3SAT}$. If this is repeated multiple times for each clause, the final $\text{CNF}$ formula will be at most polynomial in length of the original. This is why a polynomial time reduction would suffice.
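The clause-splitting step can be sketched in Python (the representation is my choice: literals are signed integers, with negation as minus):

```python
def to_3sat(clause, next_var):
    # clause: one CNF clause as a list of int literals (negative = negated);
    # next_var: first unused variable id, for the dummy variables.
    # Repeatedly replace (l1 v l2 v rest...) by (l1 v l2 v y) ^ (-y v rest...)
    out = []
    while len(clause) > 3:
        y = next_var
        next_var += 1
        out.append([clause[0], clause[1], y])   # (l1 v l2 v y)
        clause = [-y] + clause[2:]              # (-y v l3 v ...)
    out.append(clause)
    return out

# (x1 v x2 v x3 v x4) becomes two clauses of length 3, as in the text
assert to_3sat([1, 2, 3, 4], 5) == [[1, 2, 5], [-5, 3, 4]]
# a length-5 clause needs two dummies
assert to_3sat([1, 2, 3, 4, 6], 7) == [[1, 2, 7], [-7, 3, 8], [-8, 4, 6]]
```

Each clause of length $k > 3$ yields $k-2$ clauses, so the output is linear in the input size, which is why the reduction is polynomial.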

Attempting to Apply the Method to Convert to 2-SAT:

If we tried to use the trick above with a clause of length 3, $(a\lor b\lor c)$, then we would not get the desired improvement. Let our dummy variable here be $y=b \lor c$.

The result: $(a \lor y) \land (\bar y \lor b \lor c)$

As you can see, it still has a clause of size 3. If we used the method again on the second clause, we would end up with the same problem while also adding an additional clause of length 2.

It could be said that the simplest reason for this is because $\land$ and $\lor$ are binary operators. You still need to apply these operators on pairs of variables, then include an additional dummy variable (resulting in a clause of length 3).

There have been attempts at getting around this that are successful, but the resulting sequence of clauses become exponential in length. Handling such a formula is outside the scope of $\text{P}$, which is why such a reduction would not be a polynomial time reduction.

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I'm considering an automaton $A$ over an alphabet $\Sigma$, with a set of states $Q$, such that $\Sigma \subset Q$, which includes special "accept" and "blank" states not in $\Sigma$. It also has an infinite list of cells, which begins with the input string, the rest being blank. It has a transition function $\delta: Q \times Q \times Q \to Q$. The automaton computes by updating the value of each cell $a$ to $\delta(x,a,y)$, where $x$ and $y$ are its left and right neighbors. The automaton accepts if any cell is in the accept state. From what I understand, this automaton is very similar to a 1-dimensional cellular automaton, with the addition of the accept state.

I'm trying to prove that a language is recognizable by a Turing machine if and only if it can be recognized by the automaton described.

I've been trying to do a proof by construction, but I'm really lost. I did some research on cellular automata, but the material I've been finding is way over my level. I know somehow I'm going to have to construct a TM from the CA and vice versa, but I'm not even sure how to get started. I would really appreciate any help.

Asked By : Kevin G

Answered By : Yuval Filmus

Here are some hints. The easy direction is simulating cellular automata with Turing machines. Turing machines can simulate C programs, and you can simulate cellular automata in C. (Replace C with your favorite programming language.)

Given a Turing machine, we want to construct a cellular automaton simulating it. We need to upgrade the alphabet by allowing, besides regular symbols $\sigma$, also symbols $\langle q,\sigma \rangle$ which signify that the head is currently at that point, the state is $q$, and the tape symbol is $\sigma$. The initial tape would contain $\langle q_0,\sigma_0 \rangle$ as the first symbol, where $q_0$ is the initial state and $\sigma_0$ is the first symbol on the input tape of the Turing machine.

It remains to show how to simulate the Turing machine by an appropriate choice of the propagation rule $\delta$. The basic idea is to maintain the invariant that the contents of the cells constitute the tape of the Turing machine, with the exception of the location of the head, which is of the form $\langle q,\sigma \rangle$. I'll let you work out the details.
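As a toy sketch of this encoding (not the full construction), here is the synchronous update in Python together with a rule for a hypothetical machine that always writes 1 and moves right; head positions are encoded as $(q,\sigma)$ pairs:

```python
def step(cells, delta, blank="B"):
    """One synchronous update: every cell a becomes delta(x, a, y),
    where x and y are its left and right neighbours."""
    return [delta(cells[i - 1] if i > 0 else blank,
                  cells[i],
                  cells[i + 1] if i + 1 < len(cells) else blank)
            for i in range(len(cells))]

def delta(x, a, y):
    """Toy rule: in state q, always write '1' and move right."""
    if isinstance(a, tuple):    # the head is here: write the new symbol, vacate
        return "1"
    if isinstance(x, tuple):    # the head arrives from the left neighbour
        return ("q", a)
    return a                    # cells away from the head stay unchanged

tape = [("q", "0"), "0", "0"]
tape = step(tape, delta)    # the head marker moves one cell to the right
```

A left-moving transition is symmetric: the cell whose right neighbour carries the head takes over the $\langle q,\sigma \rangle$ marker.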

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I'm trying to find a polynomial time algorithm for finding the minimum vertex cover for a graph. I've written the algorithm below; I know this problem is $\mathsf{NP}$-hard, which means there are probably some graphs for which this algorithm will not work.

I need some help in finding the flaw in this algorithm and also, an indication for what restrictions should be imposed on graphs such that this algorithm works.

In the algorithm below I have a graph $G=(V,E)$. I also define the $\text{priority}(v)$ function; in rough terms, it is the number of edges that are covered by vertex $v$. The function has the property that

$$\sum_{v \in V} \text{priority}(v) = \text{number of edges}.$$

In other words, an edge is counted as covered by only one of its vertices, not both.

Define degree : V -> NaturalNumbers
degree(v) = number of edges connected to v, for all v in V

Define priority : V -> NaturalNumbers
Initialize priority(v) = 0 for all v in V

For all (u, v) in E:
    If degree(u) >= degree(v):
        priority(u) = priority(u) + 1
    Else:
        priority(v) = priority(v) + 1

Define S as the solution to the vertex cover problem
Initialize S to the empty set

For all v in V:
    If priority(v) != 0:
        Add v to the set S

Output S as the solution
Asked By : Paul Manta

Answered By : Nicholas Mancuso

Trying to take "high priority", i.e., high-degree vertices over their neighbors will not work in all cases. This algorithm chooses $n-1$ vertices for $C_n$ (the cycle graph on $n$ vertices), when it should choose $\lceil n/2 \rceil$: $(n+1)/2$ if $n$ is odd and $n/2$ if $n$ is even. It also fails for perfect graphs.

I'm pretty sure this gives an $O(\log n)$ approximation too. If you order vertices by degree and greedily select vertices until all edges are covered, it is essentially the same greedy algorithm as the $O(\log n)$ approximation for set cover.
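The failure on a cycle is easy to reproduce. Here is a small Python sketch of the priority rule from the question; with each edge listed smaller endpoint first, it picks 4 of the 5 vertices of $C_5$, while $\lceil 5/2 \rceil = 3$ suffice:

```python
def priority_cover(n, edges):
    """The rule from the question: charge each edge to its
    higher-degree endpoint, then take every vertex whose
    priority is nonzero."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    priority = [0] * n
    for u, v in edges:
        if degree[u] >= degree[v]:
            priority[u] += 1
        else:
            priority[v] += 1
    return {v for v in range(n) if priority[v] > 0}

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]   # the cycle C_5
print(priority_cover(5, edges))   # {0, 1, 2, 3}: 4 vertices, but 3 suffice
```

All degrees in a cycle are equal, so the tie-breaking `degree(u) >= degree(v)` always charges the first endpoint, and vertex 4 is the only one never charged.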

Best Answer from StackOverflow

Question Source :

Problem Detail: 

From what I have studied, deciding regularity of context-free languages is undecidable.

However, we can test for regularity using the Myhill–Nerode theorem, which provides a necessary and sufficient condition. So the problem should be decidable.

Where is my mistake?

Asked By : Jiya

Answered By : David Richerby

Myhill–Nerode does indeed provide a characterization of the regular languages, but that isn't enough to show that the problem is decidable. "Decidable" means that there is an algorithm (more formally, a Turing machine) that terminates for every input and, of course, always gives the correct answer. So, in this case, you would need to give an algorithm that, given a language as input, determines whether the Myhill–Nerode relation has a finite number of equivalence classes. It turns out that this cannot be done for context-free languages; details are in your favourite formal languages textbook.

If you want to decide whether a general language is regular, a further subtlety is that you have to be careful about what the input to your algorithm is. The input must be a finite string – otherwise, even just reading the input would be a non-terminating algorithm. In the case of context-free languages, you could use a grammar as a finite representation of an infinite language. For more general languages, you would need... well, something more general. Ultimately, though, if you want to deal with all languages, you're doomed. Over any finite alphabet, there are uncountably many languages but only countably many finite strings. That means you can't possibly describe all languages using finite strings.1 Therefore, trying to write an algorithm to determine whether arbitrary languages given as input are regular actually fails before it begins. It's not just that you can't write the algorithm: you can't even write the input!

Note that this doesn't mean that you, a human being, can't use Myhill–Nerode to determine whether languages are regular. It just means that you can't write down a set of precise instructions to tell me how to do that. At some point, any such set of instructions would have to say something like, "And then play about with it to see what works," or "From there, you have to figure it out on your own."

  1. In particular, this means that some languages must be undecidable, since there are more languages than there are algorithms.
Best Answer from StackOverflow

Question Source :

Problem Detail: 

I have recently been asked in an interview to devise an algorithm that divides a set of points in a coordinate system so that half of the points lie on one side of the line, and the rest on the other side.

The points are unevenly placed and the line must not pass through any of the points.

Can any one give any approach to solve the problem? Analysis of the algorithm is appreciated.

Hints: Count the points, use medians.

The number of points is assumed to be even.

Asked By : Ravi Teja

Answered By : Yuval Filmus

One approach is choosing a "generic" direction (in practice, a random direction), projecting all points along this direction, and then using a median algorithm (your line should correspond to any translation lying between the two middle projected values). If you choose a bad direction, then points might clump together, making it impossible to separate them along that direction. However, if you choose a "generic" direction, this won't happen (assuming the points are distinct). Since there are linear-time median algorithms, this is an $O(n)$ algorithm. Using the randomized quickselect algorithm, you get a practical linear-time algorithm.
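A sketch in Python; for brevity it sorts the projections instead of running a linear-time selection, so this version is $O(n \log n)$ rather than $O(n)$:

```python
import math
import random

def separating_line(points):
    """Return (direction, threshold) so that exactly half of the points
    have projection dot(p, direction) below the threshold."""
    theta = random.uniform(0, math.pi)      # a "generic" (random) direction
    dx, dy = math.cos(theta), math.sin(theta)
    proj = sorted(x * dx + y * dy for x, y in points)
    n = len(points) // 2
    t = (proj[n - 1] + proj[n]) / 2         # between the two middle projections
    return (dx, dy), t

random.seed(1)
pts = [(0, 0), (3, 1), (1, 4), (5, 2)]
(dx, dy), t = separating_line(pts)
left = [p for p in pts if p[0] * dx + p[1] * dy < t]   # two of the four points
```

Replacing the sort with quickselect (or another linear-time median algorithm) recovers the $O(n)$ bound.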

Best Answer from StackOverflow

Question Source :

Problem Detail: 

Does anybody know a good definition of 2 decision / optimization problems being equivalent?

I am asking since, for example, allowing polynomial-time computations, any two problems in NP could be considered equivalent.

Asked By : Reb

Answered By : Yuval Filmus

Suppose $L_1$ and $L_2$ are two decision problems, and $\mathcal{A}$ is a class of algorithms (like $P$). Say that $L_1$ is $\mathcal{A}$-reducible to $L_2$ if there is an $f \in \mathcal{A}$ such that $x \in L_1$ iff $f(x) \in L_2$. Say that $L_1$ and $L_2$ are $\mathcal{A}$-equivalent if each is $\mathcal{A}$-reducible to the other. For example, all NP-complete problems are $P$-equivalent. If you want a finer notion, you can consider more restricted classes, such as log-space or even AC0. Such notions of reducibility actually come up in refinements of Schaefer's dichotomy theorem.

As for optimization problems, let's consider maximization problems. With each maximization problem $L$ (which is a function mapping instances to optimal values), you can associate the decision problem $L'$ consisting of pairs $(x,v)$ such that $L(x) \geq v$. Now you can define reducibility and equivalence as before.
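This association also goes the other way: with access to the decision problem $L'$ as an oracle, binary search recovers the optimal value. A hypothetical sketch in Python, where decide(x, v) stands for membership of $(x, v)$ in $L'$ and hi is an assumed upper bound on $L(x)$:

```python
def maximize_via_decision(decide, x, hi):
    """Recover L(x) from an oracle decide(x, v) <=> L(x) >= v,
    assuming 0 <= L(x) <= hi."""
    lo = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if decide(x, mid):
            lo = mid        # the optimum is at least mid
        else:
            hi = mid - 1    # the optimum is below mid
    return lo

# toy maximization problem: L(x) = number of 1s in the string x
decide = lambda x, v: x.count("1") >= v
print(maximize_via_decision(decide, "10110", 100))   # 3
```

The search makes $O(\log hi)$ oracle calls, so the reduction stays polynomial whenever the optimal value has polynomially many bits.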

Best Answer from StackOverflow

Question Source :

Problem Detail: 

I am trying to create a DFA that recognizes strings over the alphabet $\{a,b,c\}$ in which $a$ and $c$ appear an even number of times and $b$ appears an odd number of times.

I am wondering whether this can only be expressed with more powerful methods, such as a Turing machine or a context-free grammar.

You might find it fun to think of the solution.

Asked By : NeilPeart

Answered By : usul

This is doable with a DFA. Hint: we need to keep track of three things simultaneously:

  1. The parity of a's (have we seen an odd or even number of a's so far?)
  2. The parity of b's
  3. The parity of c's

So there are 8 possibilities for what we've seen so far:

  1. Even number of a's, even number of b's, even number of c's.
  2. Odd number of a's, even number of b's, even number of c's.
  3. Even number of a's, odd number of b's, even number of c's.
  4. ...

Does that help?
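As a quick check of the idea, the 8-state machine can be simulated in Python by tracking the three parities directly (the state encoding here is just illustrative):

```python
def accepts(s):
    """Track the parities of a, b and c; accept iff a and c are even
    and b is odd, i.e. the state is (0, 1, 0)."""
    pa = pb = pc = 0
    for ch in s:
        if ch == "a":
            pa ^= 1
        elif ch == "b":
            pb ^= 1
        elif ch == "c":
            pc ^= 1
    return (pa, pb, pc) == (0, 1, 0)

print(accepts("b"))        # True
print(accepts("abcabc"))   # False: b occurs twice
print(accepts("acbca"))    # True: two a's, one b, two c's
```

Each parity triple is one of the 8 DFA states; reading a symbol flips exactly one bit, which is exactly the transition function of the DFA.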

Best Answer from StackOverflow

Question Source :

Problem Detail: 

This was interview question.

When the input is an infinite sequence of numbers starting from 1, what is the $n$th digit?

e.g. 123456789101112131415161718192021...

Here the 28th digit is 1.

Asked By : Sungguk Lim

Answered By : wvxvw

Just to add a concrete implementation to what was already said:

(defun nth-digit (n)
  (loop :with previous := 0
     :for i :upfrom 0
     :sum (* 9 (expt 10 i) (1+ i)) :into digits
     :if (> digits n) :do
     (multiple-value-bind (whole remainder)
         (floor (- n previous) (1+ i))
       (return (aref (write-to-string (+ (expt 10 i) whole)) remainder)))
     :else :do (setf previous digits)))

The idea of the implementation is to:

  1. Sum the lengths of all 1-digit numbers (of which there are $9\times 10^0$), then of all 2-digit numbers (of which there are $9\times 10^1$), then of all 3-digit numbers (of which there are $9\times 10^2$), and so on. Summing up to the $m$-digit block gives:

$$ N = \sum_{i=0}^m 9\times 10^i \times (i+1) $$

  2. Stop as soon as this running sum first exceeds $n$, and let $N'$ be the sum over the earlier blocks (previous in the code). Then $x = \lfloor\frac{n - N'}{m+1}\rfloor$ counts how many $(m+1)$-digit numbers precede the one containing the desired digit, so that number is $10^m + x$.
  3. Finally, the digit you are looking for is digit number $e$ of $10^m + x$, where $e = (n - N') \bmod (m+1)$.
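The same idea can be sketched in Python, counting $n$ from 1 so that the 28th digit comes out as 1, matching the example:

```python
def nth_digit(n):
    """Return the n-th digit (1-indexed) of 123456789101112..."""
    length = 1      # digits per number in the current block
    count = 9       # how many numbers have this many digits
    start = 1       # first number with this many digits
    while n > length * count:       # skip whole blocks of equal-length numbers
        n -= length * count
        length += 1
        count *= 10
        start *= 10
    number = start + (n - 1) // length      # the number containing the digit
    return int(str(number)[(n - 1) % length])

print(nth_digit(28))   # 1 -- the first digit of 19
```

The loop runs once per digit length, so the running time is logarithmic in $n$.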
Best Answer from StackOverflow

Question Source :