Problem Detail:

Probability-related Info theory question that I can't figure out. Thanks in advance!

I assume you are talking about the binary digits of all nonnegative integers < $2^{16}$.

The first binary digit of any positive integer must be $1$, because otherwise it's a leading zero and that means we're counting numbers twice. However, if we do not want two consecutive digits to be the same, this uniquely determines the rest of the bits, because they must alternate: $1010101...$.

So for $b > 1$, there exists exactly one $b$-bit integer with no two equal consecutive binary digits.

For $b = 1$ there are two strings: $0$ and $1$.

Thus, if we are interested in the number of integers below $2^{16}$ with no two equal consecutive digits, there is one 16-bit number, one 15-bit number, ..., one 2-bit number, and two 1-bit numbers. So the answer is $15 + 2 = 17$.

For completeness' sake, here they are:

    0 = 0
    1 = 1
    2 = 10
    5 = 101
    10 = 1010
    21 = 10101
    42 = 101010
    85 = 1010101
    170 = 10101010
    341 = 101010101
    682 = 1010101010
    1365 = 10101010101
    2730 = 101010101010
    5461 = 1010101010101
    10922 = 10101010101010
    21845 = 101010101010101
    43690 = 1010101010101010
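The count can also be checked by brute force. The following is a minimal sketch, assuming (as the answer does) that the question is about integers below $2^{16}$ written without leading zeros; the helper name `has_no_equal_neighbours` is made up for this illustration.

```python
def has_no_equal_neighbours(n):
    """True if no two adjacent binary digits of n are equal (no leading zeros)."""
    bits = bin(n)[2:]
    return all(a != b for a, b in zip(bits, bits[1:]))

# Enumerate every nonnegative integer below 2**16 and keep the alternating ones.
alternating = [n for n in range(2**16) if has_no_equal_neighbours(n)]
print(len(alternating))  # prints 17
```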

Question Source : http://cs.stackexchange.com/questions/63705

Problem Detail:

Suppose we have a set E of entities. Each entity is described by a set P of binary properties (i.e. each element e of E has a defined true/false value for each element p of P).

|P| >> |E|

We now want to select a subset of fixed size (e.g., 10) of P that will enable us to distinguish the elements in E as accurately as possible.

In extensions of this basic scenario, the properties could assume numeric values.

Has this problem been studied? Under what name?

A practical example: there is a set of 100 bacterial species that are characterized for the presence or absence of 1000 genes. We now want to select a subset of 10 genes that, upon typing of a novel sample, will tell us which bacterial species that sample represents.

If you need the optimal answer, the best solution I know is exhaustive search: try all ${|P| \choose 10}$ different subsets, and see which is best. The running time of this will be $O(|P|^{10})$, though, which is probably too high to be feasible.

Given this, you will probably need to accept solutions that are heuristics or not guaranteed to give an absolutely optimal answer.

One standard approach is to use a greedy algorithm: you iteratively build up a set of properties, one by one. At each step, if your set is currently $S$, you choose the property $p$ that makes $S \cup \{p\}$ as accurate as possible, and then add $p$ to $S$. To turn this into a full algorithm, you need to decide how you want to measure/evaluate each candidate $S \cup \{p\}$.
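As a rough sketch of the greedy step, the code below scores a candidate set by how many distinct property signatures it produces across the entities. That scoring rule is an assumption made for illustration (the answer leaves the measure open), and `distinct_signatures` and `greedy_select` are hypothetical names.

```python
def distinct_signatures(rows, props):
    """Number of different value patterns the entities show on the chosen properties."""
    return len({tuple(row[p] for p in props) for row in rows})

def greedy_select(rows, k):
    """Greedily pick up to k property indices, one at a time."""
    n_props = len(rows[0])
    chosen = []
    for _ in range(min(k, n_props)):
        candidates = [p for p in range(n_props) if p not in chosen]
        # Assumed measure: prefer the property that separates the most entities.
        best = max(candidates, key=lambda p: distinct_signatures(rows, chosen + [p]))
        chosen.append(best)
    return chosen

# Four entities, three binary properties; the last property is constant.
rows = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
selected = greedy_select(rows, 2)
print(selected, distinct_signatures(rows, selected))
```

Each round costs one scoring pass per remaining property, so the total work is about $k \cdot |P|$ evaluations rather than ${|P| \choose k}$.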

For comparison, you can look also at the ID3 algorithm. Rather than trying to pick a set of size 10, it tries to pick a decision tree of depth 10, so it's not solving exactly the same problem: but it is similar. The metric used at each step to evaluate the candidates is the information gain; you could do the same, but for a set rather than a tree.

In the machine learning literature, there is a lot of work on feature selection: given a large number of possible features, the goal is to pick a subset of the features that makes the classifier as accurate as possible. You could explore that literature and see if any of those methods are effective in your domain.

Question Source : http://cs.stackexchange.com/questions/51632

Problem Detail:

One of the inputs to my neural network is a set. I have a set $S = \{s_0, s_1, ..., s_n\}$ in which all values $s_i$ are constant. An example of such a set could be the set of French wines (Beaujolais, Languedoc-Roussillon, Champagne) or the set of players in a sports event (Player A, Player B, ...). The input to the neural network is a subset $T$ of $S$ (e.g., Player A competing against Player B or Beaujolais wine being served at a table, but nothing else).

Due to the restrictions of my neural network design, all input values must be normalized within the interval $[0,1]$. How would I encode the set $T$ to obtain an input to the neural network? How do I normalize the values in my set $S$ in a way to respect this condition?

My current idea is to use one boolean input per $s_i$: there would be $\#(S)=n$ boolean inputs, all set to 0 except for the values in $T$, which would be all 1. However, this presents the obvious flaw that for large $n$ there would be a lot of input neurons. Moreover, if one imposes the additional restriction of having at most $m$ elements in $T$, the resulting model would not efficiently correspond to the model (i.e. what if $m+1$ values were set to true)?

Is there a better way of modeling such a situation? Or better, is there a standard way for handling input sets with multiple possibilities?

Yes, "one boolean per set element" is the standard way of encoding such a set. For a single element this is known as a "one-hot encoding"; when several elements of $T$ are set at once it is often called a multi-hot (or indicator-vector) encoding. Yes, there will be a lot of input neurons, but that's not necessarily a serious problem; current procedures for training neural networks can handle millions of nodes.
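A minimal sketch of this encoding, assuming the universe $S$ is given as a fixed, ordered list; the function name `encode_subset` is hypothetical. Every output value is 0.0 or 1.0, so the $[0,1]$ requirement holds automatically.

```python
def encode_subset(universe, subset):
    """Return one input value per element of the universe: 1.0 if present, else 0.0."""
    index = {item: i for i, item in enumerate(universe)}
    vector = [0.0] * len(universe)
    for item in subset:
        vector[index[item]] = 1.0
    return vector

wines = ["Beaujolais", "Languedoc-Roussillon", "Champagne"]
print(encode_subset(wines, {"Beaujolais"}))
print(encode_subset(wines, {"Beaujolais", "Champagne"}))
```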

If you know that $T$ will contain at most $m$ elements, in principle there are alternative encodings (e.g., use $m \lg n$ wires, where you use $\lg n$ wires for each element of $T$ to encode which element of $S$ it is)... but in practice I do not expect them to perform well.

In that case I think a better approach is to try to build a feature vector that is shorter than $n$ dimensions. Do you have some domain knowledge you can use to identify attributes of the elements? For instance, you could have a feature that says "how many elements of $T$ are red wines?" and "how many elements of $T$ are white wines?" and "how many elements of $T$ are dessert wines?" and so on. You'll have to use your domain knowledge about the task to identify what attributes might be relevant to the classification task. In this way, the number of input wires can be made much smaller than $n$.

Question Source : http://cs.stackexchange.com/questions/67062

Problem Detail:

In Sebesta's Concepts of Programming Languages 10th edition on page 189 he explains the concept of $\text{FIRST}$ sets using the following example:

$A \rightarrow aB \ | \ bAb \ | \ Bb$

$B \rightarrow cB \ | \ d$

The $\text{FIRST}$ sets for the RHSs of the $A$-rules are $\{a\}$, $\{b\}$, and $\{c, d\}$, which are clearly disjoint.

Why aren't the first two sets combined into a single set $\{a, b\}$ instead? If $\{c, d\}$ is constructed by taking the two leftmost terminal symbols from $B$ then why don't we likewise take the two leftmost terminal symbols from $A$ into a single set?

###### Answered By : André Souza Lemos

There are three $A$-rules, each one with its right-hand side: $aB$, $bAb$, and $Bb$. Each produces a set as its value for $FIRST$. $FIRST(aB) = \{a\}$, and $FIRST(bAb) = \{b\}$.

In its turn, $FIRST(Bb) = FIRST(B)$ (because $B$ does not derive the empty string), and $FIRST(B) = FIRST(cB) \cup FIRST(d) = \{c\} \cup \{d\} = \{c,d\}$.

Naturally, $FIRST(A) = \{a, b, c, d\}$.
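The computation can be sketched as a small fixed-point iteration. This sketch assumes, as is true for this grammar, that no nonterminal derives the empty string, so only the first symbol of each right-hand side matters; the names `GRAMMAR`, `first_sets` and `first_of_rhs` are illustrative.

```python
GRAMMAR = {
    "A": [["a", "B"], ["b", "A", "b"], ["B", "b"]],
    "B": [["c", "B"], ["d"]],
}

def first_sets(grammar):
    """FIRST set of every nonterminal, assuming there are no empty productions."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, rules in grammar.items():
            for rhs in rules:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first

def first_of_rhs(rhs, first):
    """FIRST set of a single right-hand side (one set per rule, not per nonterminal)."""
    head = rhs[0]
    return set(first[head]) if head in first else {head}

first = first_sets(GRAMMAR)
per_rule = [first_of_rhs(rhs, first) for rhs in GRAMMAR["A"]]
print(per_rule)    # one set per A-rule
print(first["A"])  # their union
```

The per-rule sets are what a predictive parser compares for disjointness; their union is $FIRST(A)$.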

Question Source : http://cs.stackexchange.com/questions/65604

Problem Detail:

I'm in the process of optimizing my neural network. I'd like to optimize on a small training set (1000 rows) as opposed to my full training set (100K rows) for speed reasons.

Will the optimal hyper-parameters (i.e. my learning rate, dropout prob, regularization parameter, # of hidden units, etc...) for my small training set also be optimal for my large training set? In other words, which parameters can I optimize on my small training set, and which must I try to optimize on my large?

Thanks--

This is a bad idea. For many tasks, you'll likely get poor performance. For many machine learning tasks, having a lot of data is essential to getting good results.

Instead, I recommend you set yourself up with software and hardware that can train your network on the full training set efficiently: buy a fast GPU, use software that can use the GPU for training, use stochastic gradient descent with mini-batches and other standard techniques.

You'll likely need to optimize all of your hyper-parameters on the full training set. I don't think optimizing the hyper-parameters on a small training set is likely to work well. If there's any past research on similar machine learning tasks, you might look at what network architecture and hyper-parameters they used and use that as a starting point for your exploration.

Question Source : http://cs.stackexchange.com/questions/60347

Problem Detail:

I know that a graph G' is a subgraph for G if V(G')⊆V(G) and E(G')⊆E(G).

Today my professor wrote that G' is a subgraph for G if G' is a graph and V(G')⊆V(G), and then he told us that this definition is equivalent to the one we've seen before (= the first line in this post).

However, I don't think that these definitions are equivalent. In fact, if this is G:

      2
     / \
    1   3

and this is G':

    1 - 3 

we have that G' is a subgraph for G according to the second definition (wrong), while it is not according to the first one (right).

Am I missing something or was the professor wrong?

###### Answered By : David Richerby

The alternative definition that you attribute to your professor is wrong, as your counterexample clearly shows.

I hope your professor actually said something else and you misunderstood him/her!
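The counterexample can be checked mechanically against the first definition. This is a small sketch assuming undirected graphs given as a vertex set plus a list of edges; `is_subgraph` is a hypothetical helper name.

```python
def is_subgraph(sub_vertices, sub_edges, vertices, edges):
    """First definition: V(G') is a subset of V(G) and E(G') is a subset of E(G)."""
    def undirected(edge_list):
        return {frozenset(edge) for edge in edge_list}
    return set(sub_vertices) <= set(vertices) and undirected(sub_edges) <= undirected(edges)

g_vertices, g_edges = {1, 2, 3}, [(1, 2), (2, 3)]   # the path 1 - 2 - 3
h_vertices, h_edges = {1, 3}, [(1, 3)]              # the single edge 1 - 3

print(set(h_vertices) <= set(g_vertices))                     # vertex condition alone holds
print(is_subgraph(h_vertices, h_edges, g_vertices, g_edges))  # full definition fails
```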

Question Source : http://cs.stackexchange.com/questions/64448

Problem Detail:

If I have the following statements:

For Idempotent:

Since X * X = X, would that imply that ~X * ~X = ~X

For Dominance:

Since X + 1 = 1 would that imply that ~X + 1 = 1

###### Answered By : Yuval Filmus

The principle of substitution states that if $\varphi$ is a valid (fully parenthesized) formula in variables $x_1,\ldots,x_n$ and $e_1,\ldots,e_n$ are arbitrary expressions, then if we substitute $e_i$ for $x_i$ in $\varphi$ (you have to substitute all occurrences of $x_i$ by $e_i$) we still get a valid formula. You can apply this principle to your examples.

Perhaps the principle will become easier to understand if we take an example from algebra. Consider the identity $(x+y)(x-y) = x^2-y^2$. If we substitute $2a$ for $x$ and $1+a$ for $y$ then we get $((2a)+(1+a))((2a)-(1+a)) = (2a)^2-(1+a)^2$, which is still valid. The same happens in your case.
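For Boolean identities with few variables, the substituted forms can also be confirmed by trying every assignment. A minimal sketch, reading `*` as AND, `+` as OR and `~` as NOT; `is_identity` is a made-up helper.

```python
from itertools import product

def is_identity(lhs, rhs, n_vars):
    """True if lhs and rhs agree on every assignment of the n_vars Boolean variables."""
    return all(lhs(*values) == rhs(*values)
               for values in product([False, True], repeat=n_vars))

# Idempotence with ~X substituted for X:  ~X * ~X = ~X
idempotent_neg = is_identity(lambda x: (not x) and (not x), lambda x: not x, 1)
# Dominance with ~X substituted for X:    ~X + 1 = 1
dominance_neg = is_identity(lambda x: (not x) or True, lambda x: True, 1)

print(idempotent_neg, dominance_neg)
```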

Question Source : http://cs.stackexchange.com/questions/54214

Problem Detail:

Can anyone explain why pictures are not considered 2D, but rather high dimensional? Especially with regards to CV and AI.

From one perspective, a picture is a 2D image, because it has height and width.

But from a machine learning perspective, we can think of a picture as a point in a high-dimensional space. In particular, suppose we have a greyscale picture that is $m\times n$ pixels, i.e., $m$ pixels wide and $n$ pixels high. Then there are a total of $mn$ pixels in the image. Each pixel has a greyscale intensity, which we can think of as a real number in the interval $[0,1]$. Therefore, we can think of the picture as being a collection of $mn$ real numbers. In other words, the picture can be treated as a $mn$-dimensional vector -- as an element of $\mathbb{R}^{mn}$. Thus, any particular picture can be thought of as an element of a high-dimensional space.

The latter perspective arises naturally in some machine learning approaches to computer vision, e.g., where we feed the pixels of the image into the machine learning algorithm and each pixel value is treated as a separate feature.

(A color image can be thought of as an element of $\mathbb{R}^{3mn}$: for each pixel, we have three numbers, corresponding to the intensity in the red, green, and blue channels.)
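To make the dimension count concrete, here is a small sketch that flattens an image stored as nested lists into a single vector; the helper names and the row-major ordering are illustrative choices, not part of the answer.

```python
def flatten_grey(image):
    """Greyscale image (rows of intensities) -> vector with one entry per pixel."""
    return [pixel for row in image for pixel in row]

def flatten_rgb(image):
    """Colour image (rows of (r, g, b) triples) -> vector with three entries per pixel."""
    return [channel for row in image for pixel in row for channel in pixel]

grey = [[0.0, 0.5, 1.0],
        [0.2, 0.4, 0.6]]                       # 3 pixels wide, 2 pixels high
rgb = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
       [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]     # 2 pixels wide, 2 pixels high

print(len(flatten_grey(grey)))  # m*n entries
print(len(flatten_rgb(rgb)))    # 3*m*n entries
```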

Question Source : http://cs.stackexchange.com/questions/55222

Problem Detail:

I'm reading Tanenbaum and Wetherall's book on Computer Networks. I'm trying out some examples of situations that can occur in the one bit sliding window protocol for a datalink layer.

The code for the protocol can be found here on page 214.

The situation I was investigating was the following: Let's say client A successfully sends a frame to client B. Client B then sends his frame containing data along with the acknowledgement (ACK) that he received client A's frame correctly (piggybacking). Now somewhere along the way this frame gets damaged. Therefore, in the code, a chksum_err event occurs. Then the exact same frame client A sent before is sent again. Why? Shouldn't it be the case that client A now sends a frame with the sequence number inverted, new data but the acknowledgement is not inverted? This way, assuming nothing special happens along the way, client B correctly receives a new frame (instead of a duplicate) and client B knows that the frame he sent was damaged so he needs to send it again.

Isn't this the beauty of the protocol? Is the code wrong or what am I missing?

Shouldn't it be the case that client A now sends a frame with the sequence number inverted, new data but the acknowledgement is not inverted?

This could indeed be done when the error occurred in the data, i.e. when you're sure that it wasn't the acknowledgement or sequence number that was damaged. In that case, sender A could indeed send a new frame as described and avoid sending unnecessary duplicates.

However, although an error can be detected, the receiver cannot tell where it occurred (in the data, the acknowledgement, or the sequence number). If the error occurred in the acknowledgement number, then the client may wrongly conclude that its previously sent frame was received correctly when it actually wasn't; if it then sends a new frame with new data, as proposed in the question, the other client will miss a frame!

Question Source : http://cs.stackexchange.com/questions/54602

Problem Detail:

When I compute a gaussian kernel for image blurring should I normalize the 1D vectors? Because when I apply the raw values sometimes the image gets lighter or darker. The function I'm using is $f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-x^2}{2\sigma^2}}$ or the 2D function. So, should I just apply the results using cross correlation/convolution or should I normalize first?

###### Asked By : Júlio César Batista

Your function is used to compute a 2D kernel, which you should normalize: if the sum of the weights is greater than $1$ the image gets lighter, and if it is smaller the image gets darker. Normalizing the 1D vectors is not the point in itself; since you apply a 2D kernel, it is the sum of the 2D weights that has to equal $1$.

The operation that you perform is convolution, not correlation; in its common meaning, correlation measures how similar a part of the image is to a given kernel.
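A minimal sketch of the normalization, assuming the kernel is sampled at integer offsets within a chosen radius; `gaussian_kernel_1d` and `gaussian_kernel_2d` are hypothetical names. The constant factor $\frac{1}{\sqrt{2\pi}\sigma}$ is dropped because dividing by the sum cancels it anyway.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Sample exp(-x^2 / (2 sigma^2)) at x = -radius..radius and scale the sum to 1."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_kernel_2d(sigma, radius):
    """Outer product of the normalized 1D kernel with itself; its weights also sum to 1."""
    k = gaussian_kernel_1d(sigma, radius)
    return [[a * b for b in k] for a in k]

kernel = gaussian_kernel_2d(sigma=1.5, radius=4)
print(sum(sum(row) for row in kernel))  # approximately 1, so brightness is preserved
```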

Question Source : http://cs.stackexchange.com/questions/57991

Problem Detail:

I stumbled on the Hungarian algorithm during my personal research when I was assigned an interesting problem as homework:

Given a list of objects $L$ and a pairing function $\delta : L \times L \rightarrow \left[0, 1\right]$, pair each object $\alpha$ in $L$ with exactly one $\beta$ in $L$ such that:

• $\sum_{i = 1}^n \delta(\alpha_i, \beta_i)$ is maximal
• $(\alpha, \beta) \iff (\beta, \alpha)$
• $(\alpha, \beta) \implies \alpha \neq \beta$

Now, since we just started programming as a class, our TA wants us to use a heuristic approach to solve the problem - e.g. swap assignments until you reach a local maximum etc.

I thought of using the Hungarian algorithm to guarantee an optimal matching in (relatively) feasible time. For this, I constructed a graph with every $\alpha$ having an edge to every $\beta$, initializing the edge between two identical elements to $\infty$.

However, as it turns out, the implementation I wrote disregards the symmetric aspect of this matrix; in other words, my algorithm does not guarantee that $\alpha$ is also the partner of its own partner.

I suspect that reason for this is that this is not a bipartite graph anymore. Even though there is a difference between an $\alpha$ on the left and a $\beta$ on the right, it's not a "real" bipartite graph. Is this correct?

According to the Math stack exchange I should use the more general blossom algorithm in this case, but somehow I have a feeling that I could make better use of the properties of my matrix. Is there a better algorithm for this case?

###### Answered By : Yuval Filmus

Your problem is exactly the same as maximum weight perfect matching, solved by the blossom algorithm. Although $\delta$ is not given as symmetric, you can reduce your problem to the symmetric case by replacing $\delta$ with $$\delta'(\alpha,\beta) = \max(\delta(\alpha,\beta),\delta(\beta,\alpha)).$$ I'll let you figure out how to translate a solution which is optimal with respect to $\delta'$ to one which is optimal with respect to $\delta$.
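The symmetrization step can be sketched as follows. Note that the matching routine here is a plain exhaustive search, not the blossom algorithm, so it is only usable for very small even-sized inputs; it is included just to show the symmetrized weights in use. The names `symmetrize` and `best_pairing` and the sample matrix are made up for this illustration.

```python
def symmetrize(delta):
    """delta'(a, b) = max(delta(a, b), delta(b, a)) for a square weight matrix."""
    n = len(delta)
    return [[max(delta[a][b], delta[b][a]) for b in range(n)] for a in range(n)]

def best_pairing(weights, items=None):
    """Exhaustive maximum-weight perfect matching (exponential time, even sizes only)."""
    if items is None:
        items = tuple(range(len(weights)))
    if not items:
        return 0.0, []
    first, rest = items[0], items[1:]
    best = (float("-inf"), [])
    for j, partner in enumerate(rest):
        score, pairs = best_pairing(weights, rest[:j] + rest[j + 1:])
        score += weights[first][partner]
        if score > best[0]:
            best = (score, [(first, partner)] + pairs)
    return best

delta = [[0.0, 0.9, 0.1, 0.2],
         [0.1, 0.0, 0.3, 0.4],
         [0.8, 0.2, 0.0, 0.7],
         [0.1, 0.6, 0.3, 0.0]]
score, pairs = best_pairing(symmetrize(delta))
print(score, pairs)
```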

Question Source : http://cs.stackexchange.com/questions/65694

Problem Detail:

I'm currently writing a library that solves a specific type of problem that involves mainly constraint satisfaction.

I have come across the Min-Conflicts Algorithm, which proved to be rather efficient in the context of the problem.

However, I have recently chanced upon the Coordinate Descent Algorithm and how strikingly it resembles the Min-Conflicts Algorithm.

Probably the only difference is that in Min-Conflicts, a random variable is selected to be modified at each step whereas Coordinate Descent cycles through the variables.

Am I right to say that apart from this difference, min-conflicts and coordinate descent are essentially equivalent? If so, why are they classified differently?

###### Asked By : Yiyuan Lee

Am I right to say that apart from this difference, min-conflicts and coordinate descent are essentially equivalent?

In the same sense that A* and BFS are essentially equivalent except for the choice of which node to visit next, yes.

If so, why are they classified differently?

There are some problems for which Min-Conflicts performs asymptotically faster (in the expected-time sense) than Coordinate Descent, and vice versa. Ditto, there are some problems where one (generally) converges and the other (generally) doesn't.

Question Source : http://cs.stackexchange.com/questions/60963

Problem Detail:

Are there any techniques that can be used to determine whether a digital logic circuit is optimal, or if it has extraneous operations that don't contribute to the output?

I'm especially curious in the case where the circuit is defined in terms of XOR and AND gates.

I know I could make a karnaugh map and create a circuit from that, but that doesn't really tell me about the circuit I had, it just gives me another circuit.

Thanks for any info!

###### Asked By : Alan Wolfe

Use one of the standard techniques for circuit minimization. As far as I know, verifying that a candidate circuit is minimal is as hard as finding the minimal circuit in the first place (I know of no faster algorithm for verifying minimality than finding a minimal circuit in the first place).

Circuit minimization is NP-hard (in fact it's even harder than that, in some sense), so you shouldn't expect an algorithm that handles all functions efficiently.

Question Source : http://cs.stackexchange.com/questions/49250

Problem Detail:

The Polya conjecture is a disproved conjecture stating that, for every $n > 1$, at least half of the natural numbers up to $n$ have an odd number of prime factors (counted with multiplicity). It first fails at $n = 906,150,257$, which makes it a good example that testing very many cases is not always enough.

I have been giving "disproving the Polya conjecture" as an exercise a few times, as the counter example is well within computational capabilities.

The simple algorithm for it is to have an incrementing $n$, factorize it, count the factors, and increment either the counter for odd numbers, or even numbers. Continue until you reach the counter example.

Factorization is however an operation to try to avoid, as even the most effective algorithms for that have a nasty computational complexity.

The amount of unnecessary information created by factorizing every number got me wondering if there are better ways. Because:

• We do not have to know what the factors of a number are, just how many.
• Even the number of prime factors is not needed, as we only need the parity.
• We do not have to know the parity of the number of prime factors of each number in a range, it is sufficient to know how many are odd, and how many are even.

Factorization here yields a lot more information than actually needed. Is it really necessary?

###### Answered By : Yuval Filmus

You can use a sieve to significantly improve the running time. Decide on a number $N$, say $N = 2^{30}$. Initialize an array of length $N \times 2$ bits: one to keep track of the parity of the number of prime factors, and the second to mark which numbers have been visited. Now run an algorithm similar to the sieve of Eratosthenes. The running time equals the total number of prime factors of all numbers up to $N$, which is roughly $N\log\log N$.

With this information in hand, it is easy to check for violations of the conjecture.
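One way to realize such a sieve is sketched below. It is a variant rather than the exact two-bit layout described above: it stores the smallest prime factor of each number and derives the parity of $n$ from the parity of $n$ divided by that factor. The names are hypothetical, and the small limit is only for demonstration; reaching the real counterexample needs a limit above 906,150,257 and a far more compact array than a Python list.

```python
def omega_parity(limit):
    """parity[n] = (number of prime factors of n, with multiplicity) mod 2."""
    smallest = list(range(limit + 1))  # smallest[n] ends up as the least prime factor of n
    i = 2
    while i * i <= limit:
        if smallest[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if smallest[j] == j:
                    smallest[j] = i
        i += 1
    parity = [0] * (limit + 1)
    for n in range(2, limit + 1):
        parity[n] = parity[n // smallest[n]] ^ 1
    return parity

def first_violation(limit):
    """Smallest n in 2..limit where the even-parity numbers outnumber the odd ones, else None."""
    parity = omega_parity(limit)
    balance = 1  # n = 1 has zero prime factors, which counts as even
    for n in range(2, limit + 1):
        balance += -1 if parity[n] else 1
        if balance > 0:
            return n
    return None

print(first_violation(10_000))  # None: no counterexample this low
```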

Question Source : http://cs.stackexchange.com/questions/56384

Problem Detail:

Is there a way to subtract and add properties of axioms to generate new axioms?

For example:

{L} = {P S K} // natural deduction

{P S K} = {P H K I} // natural deduction

{S K} = {?} // constructive logic

{K I} = {?}

Where:

L = ((A -> B) -> C) -> ((C -> A) -> (D -> A)) // Łukasiewicz's axiom system

P = ((A -> B) -> A) -> A // Peirce's Law

H = (A -> B) -> ((B -> C) -> (A -> C)) // Weak hypothetical syllogism

S = (A -> (B -> C)) -> ((A -> B) -> (A -> C))

K = A -> (B -> A)

I = A -> A

I want to be able to add and subtract axioms such that P + S + K = P + H + K + I implies S encodes the properties of H + I

I'm probably using unjustified assumptions here. For example, I assume you can derive S from H and I, without using P or K. Ideally, there would be a way to automate the process of constructing and destructing axioms (though it'd probably be NP hard).

###### Answered By : Yuval Filmus

The natural way to define your addition operation is as set intersection. The value of a single axiom is the set of all models satisfying the axiom. The sum of several axioms is then the set of all models satisfying all the axioms.

A similar definition is $\mathbf{A} + \mathbf{B} = \mathbf{A} \land \mathbf{B}$, but then you need to interpret equality as logical equivalence. The definitions then become completely equivalent.

However, you can't "cancel" axioms. If you know that $\mathbf{A}+\mathbf{B} = \mathbf{A}+\mathbf{C}$ then all you can conclude is that in the presence of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are equivalent. As an example, let $\mathbf{A}$ be $x=0$, let $\mathbf{B}$ be $y=1$, and let $\mathbf{C}$ be $x+y=1$. In the presence of $\mathbf{A}$, the axioms $\mathbf{B}$ and $\mathbf{C}$ are equivalent, but in general they aren't.
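The example can be checked by brute force over a small range of integer values. This is only a sketch: the finite `domain` is an assumption made so that the check terminates, and `models` stands in for "all models".

```python
from itertools import product

def axiom_a(x, y):
    return x == 0

def axiom_b(x, y):
    return y == 1

def axiom_c(x, y):
    return x + y == 1

domain = range(-3, 4)
models = list(product(domain, repeat=2))

# A + B and A + C have exactly the same models ...
same_sum = all((axiom_a(x, y) and axiom_b(x, y)) == (axiom_a(x, y) and axiom_c(x, y))
               for x, y in models)
# ... yet B and C on their own do not.
equivalent = all(axiom_b(x, y) == axiom_c(x, y) for x, y in models)

print(same_sum, equivalent)
```

The pair $(x, y) = (1, 0)$ is one model that satisfies $\mathbf{C}$ but not $\mathbf{B}$.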

If you are really keen on the cancellation law, you can always take formal sums. It is then true that $\mathbf{A}+\mathbf{B} = \mathbf{A}+\mathbf{C}$ implies $\mathbf{B}=\mathbf{C}$, but at the cost of $\mathbf{A}+\mathbf{B}$ having no other meaning than the formal sum of $\mathbf{A}$ and $\mathbf{B}$.

Question Source : http://cs.stackexchange.com/questions/54943

Problem Detail:

Given a set of data points (shown in red), it is possible to fit a line $y = mx + c$ through the points using linear least squares regression.

I would like to modify this to fit a 1D lattice (grid) along the line such that $d$ is the offset of the grid along the line and $i$ is the interval between neighboring grid points along the line.

An additional constraint is that more or less the same number of data points should be in the neighbourhood for each grid point in order to prevent picking an arbitrarily small $i$ and overfitting the data. I believe that calculating the mean square error on a solution can be simple. The squared distances of all of the data points in the neighbourhood of the grid point would be:

$$mse = \sum_{j = 0}^{g} \sum_{k = 0}^{h} distance^2(\vec{latticePoint_j}, \vec{dataPoint_{j,k}})$$

(Where $g$ is the number of clusters and $h$ is the number of points inside the grid point's neighborhood.)

However, I'm not sure how to go about solving for all four variables $m$, $c$, $d$ and $i$ since I'm left with a discrete function. My hope is that since $i$ is fixed and points are distributed reasonably evenly among the clusters this could be done without invoking an overly complicated clustering algorithm.

The number of clusters are not known ahead of time and may contain outliers. Any help is much appreciated.

###### Asked By : Rehno Lindeque

Define an objective function $\Psi(m,c,d,i)$ that specifies how "bad" a particular choice of values for $m,c,d,i$ is. Then, use any standard black-box optimization algorithm (e.g., gradient descent) to find $m,c,d,i$ that minimize the objective function. You might choose the initial values of $m,c$ based on a least-squares fit, and choose the initial values $d,i$ based on a best guess (maybe by projecting the points to the line $y=mx+c$ and then taking a FFT and looking for peaks, to try to find the periodicity), and then run gradient descent from there.

So, it all reduces to choosing an objective function $\Psi(m,c,d,i)$ that captures what you want. We can't give that for you, since you haven't fully specified the problem. For instance, you say you want "more or less the same number of data points" in each cluster, but you haven't specified how to trade off the errors: of course, we can't achieve this exactly, so we need some way to measure and compare different candidate solutions.

For instance, given a set $P$ of points, you might use an objective function like

$$\Psi(m,c,d,i) = \sum_{p \in P} d_{m,c,d,i}(p)^2$$

where $d_{m,c,d,i}(p)$ is the distance from $p$ to the nearest grid point on the line $y=mx+c$, i.e.,

$$d_{m,c,d,i}(p) = \min\{\sqrt{(p_x-x)^2 +(p_y-(mx+c))^2} : x=d + ij, j \in \mathbb{Z}\}$$

where $p=(p_x,p_y)$ in $(x,y)$-coordinates.
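A minimal sketch of this objective, under the parametrization used in the formula (grid points at $x = d + ij$ on the line). The squared distance is a parabola in $x$, so the nearest grid point is the one closest to its continuous minimizer $x^* = (p_x + m(p_y - c))/(1 + m^2)$; the function names are hypothetical.

```python
def nearest_grid_distance_sq(px, py, m, c, d, i):
    """Squared distance from (px, py) to the closest grid point (x, m*x + c), x = d + i*j."""
    # Unconstrained minimizer of (px - x)^2 + (py - (m*x + c))^2 over real x.
    x_star = (px + m * (py - c)) / (1 + m * m)
    # The parabola is symmetric about x_star, so snap to the nearest grid index.
    j = round((x_star - d) / i)
    x = d + i * j
    return (px - x) ** 2 + (py - (m * x + c)) ** 2

def psi(points, m, c, d, i):
    """Objective: sum of squared distances to the nearest grid point."""
    return sum(nearest_grid_distance_sq(px, py, m, c, d, i) for px, py in points)

on_grid = [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0)]
print(psi(on_grid, m=1.0, c=0.0, d=0.0, i=2.0))       # points sit exactly on the grid
print(psi([(1.0, 1.0)], m=1.0, c=0.0, d=0.0, i=2.0))  # halfway between two grid points
```

A function like `psi` can then be handed to whichever optimizer is used, possibly with a regularization term added.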

This doesn't do anything special about outliers; they incur a very large penalty. That might be undesirable -- it might lead to poor results in the presence of outliers. There are various methods for dealing with outliers. One simple one is to fit a line using a robust least-squares fit, find outliers and remove them as part of a pre-processing step, and then run the procedure on the result.

Also, you mention fear of overfitting the data set. Another way to avoid overfitting is to add a regularization term that penalizes values of $d,i$ that are considered a priori unlikely. For instance, you might add a penalty term that is proportional to $1/i$ or to $-i$. The penalty term gets added to the objective function. Then, when you minimize the objective function, this will exert some pressure to avoid choosing small values of $i$.

Question Source : http://cs.stackexchange.com/questions/63012

Problem Detail:

I am reading the 1989 paper called "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers, and Miller. The algorithm sounds like a good fit for a project I'm doing at work. I have found some implementations in my target language that I could reuse, but I want to make sure that I understand the algorithm (and the code) because it is so crucial to my project.

I am a ways into the paper, and seem to be understanding it, but there's something important in the abstract that still doesn't make sense to me. This is the relevant portion:

Let $A$ and $B$ be two sequences of length $M$ and $N$ respectively, where without loss of generality $N \ge M$, and let $D$ be the length of a shortest edit script between them. A parameter related to $D$ is the number of deletions in such a script, $P = D/2 - (N - M)/2$. We present an algorithm...

This relationship between $P$, $D$, $N$, and $M$ is not proven in the paper. And when I try using it with a simple example (turning "xy" into "x", for instance), I get a nonsensical answer. Can someone please explain the relationship?

This is a straightforward consequence of the fact that the edit script only contains insert and delete operations. It follows from a little bit of basic arithmetic -- it's nothing especially deep.

Consider a script that contains $I$ insert operations and $P$ delete operations. Then the total number of operations in the edit script is $I+P$. If we're given that the total number of operations is $D$, we know $D=I+P$. Also we know $M+I-P=N$ (since each insert operation increases the length of the document, and each delete decreases it). Solving these two simultaneous equations yields the relationship you list.
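The relationship can be checked numerically on small strings. This sketch relies on the standard fact that a shortest insert/delete script keeps a longest common subsequence and deletes or inserts everything else; it uses a simple quadratic dynamic program, not the paper's algorithm, and the function names are made up.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (simple dynamic program)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def edit_script_counts(a, b):
    """(deletions P, insertions I) of a shortest insert/delete script turning a into b."""
    common = lcs_length(a, b)
    return len(a) - common, len(b) - common

for a, b in [("x", "xy"), ("xy", "x"), ("abcabba", "cbabac")]:
    deletions, insertions = edit_script_counts(a, b)
    total = deletions + insertions           # D = I + P
    m, n = len(a), len(b)
    # P = D/2 - (N - M)/2, written without fractions.
    print(a, b, deletions, 2 * deletions == total - (n - m))
```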

Question Source : http://cs.stackexchange.com/questions/56607

Problem Detail:

I am new to complexity theory and I have some maybe stupid questions:

Are "decision problems" with the characteristic functions x(t)=1 or x(t)=0 "decision problems" or not?

If yes: Are the corresponding formal languages members of P or not?

(The calculation time needed is constant.)

If the answer is yes in both cases:

If $P\neq NP$ no P decision problem can be NP-complete. This is easy to prove.

I have read that if $P=NP$ all P decision problems would be NP-complete. This is easy to prove for all other problems however the problems described above cannot be complete in any way.

Are these problems an exception?

###### Answered By : David Richerby

Yes, of course they're in P: you can decide them in constant time, which is certainly bounded by a polynomial.

Anybody who tells you that P$\,=\,$NP implies that every problem in P is NP-complete has forgotten about these two trivial problems. Neither of them can be NP-complete, because NP-completeness is defined in terms of many-one reductions, but there cannot be a many-one reduction from a non-trivial problem to either of the trivial problems. A many-one reduction from $A$ to $B$ must map "yes" instances of $A$ to "yes" instances of $B$, and "no" instances to "no" instances. But one of the trivial problems has no "yes" instances and the other has no "no" instances, so a problem that has both "yes" instances and "no" instances cannot be reduced to either.

Question Source : http://cs.stackexchange.com/questions/63279

Problem Detail:

I am wondering if some symbols such as the ones in propositional logic have precedence over others in drawing parse trees.

For example, the sentence: p ∧ q → r, would ∧ take precedence over → in becoming the root of the parse tree or vice versa?
Or is there no precedence meaning, → or ∧ could be used the root?

Operator precedences have to be specified, either in the grammar or by other means. There is no inherent precedence among rules, but you can fix one arbitrarily.

Question Source : http://cs.stackexchange.com/questions/60755

Problem Detail:

The Curry-Howard correspondence is enormously powerful. I'd like to use it, but I'd like a choice other than Haskell and Scala.

The languages that I can see that support the Curry-Howard correspondence (with type-checking at compile-time) are:

• Agda
• Coq
• Scala
• Shen
• Qi II

My question is: Which languages (apart from the list above) can do Curry-Howard type checking at compile-time?

Check out Idris and ATS. Both support this, and dependent types, and are designed to be practically useful.

Question Source : http://cs.stackexchange.com/questions/56827

Problem Detail:

I am learning basic grammar and parsing concepts and ran into this statement:

Nonterminal symbols represent syntactic entities: statements, declarations, or expressions.

But the three terms (statements, declarations, expressions) were not really defined. I have plenty of programming experience so I have a fairly intuitive grasp of the terms, but I'm wondering, from a language grammar standpoint, what are the generally accepted definitions for each? When I searched online I found several language-specific discussions but nothing that gave a general definition of these terms that can apply across any language.

Are there such definitions out there? Thanks.

###### Answered By : Yuval Filmus

These are informal terms. For example, in the C language:

1. Statements are things like x = 3;.
2. Declarations are things like int x;.
3. Expressions are things like x * y.

So declarations are a particular type of statement, and expressions appear in statements.

A better formulation of the statement you quoted would be:

Nonterminal symbols represent syntactic entities such as statements, declarations, or expressions.

Question Source : http://cs.stackexchange.com/questions/65433

Problem Detail:

I need to create combinations of 3 elements from an array of n given elements; however, each of those 3 elements has to be in a "relationship" with the other two. Here is an example that describes my issue better:

We've got elements 1, 2, 3, 4, 5 & 6. We know following relationships:

• 1 <--> 2
• 1 <--> 3
• 1 <--> 4
• 1 <--> 6
• 2 <--> 3
• 2 <--> 5
• 3 <--> 5
• 3 <--> 6
• 4 <--> 5
• 4 <--> 6

The group {1,2,3} is viable, because 1 is in a relationship with both 2 and 3, and 2 is in a relationship with 3.

In the group {1,2,4}, 1 is in a relationship with 2 and 4, but 2 isn't in one with 4, so this combination can't exist.

I programmed a working solution (which involved generating the combinations mentioned above and looking up the list of relationships of every element) for a small number of elements and relationships, but I need one for approx. 500 000. I'm a high school student and I haven't met a branch of mathematics that would help me understand or solve this problem.

I've been searching the net and I think that relational algebra (binary relations) might be the solution, yet I'm not quite sure where to start.

Can you think of any hint, advice, web source or view that could help me?

Thank you

###### Answered By : David Richerby

The relevant branch of mathematics is graph theory.

Your elements are the vertices of the graph and the relationships between them are the edges. The triples you're looking for are known as triangles: sets of three vertices with all possible edges between them.

I'm not quite sure what you're looking for in your question. If you just want a list of all the triangles in the graph, that can be done naively just by trying all the triples. If you have $n$ vertices in your graph, that takes about $n^3$ tests. More advanced algorithms turn out to be closely related to matrix multiplication, which can be done a little faster – roughly $n^{2.5}$ steps. That might still be rather slow for half a million vertices. If you want to go faster, you'll have to use approximations (if you're happy with finding most but not necessarily all of the triangles). Alternatively, the particular graph you're interested in might have properties that make it easier to find triangles, for example by ruling out a lot of the possibilities.
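As a concrete starting point, here is a small sketch that lists all triangles by intersecting neighbour sets along each edge. This is a common practical variant rather than the naive all-triples test or the matrix-multiplication method mentioned above; the function name `triangles` is made up for this example, and it assumes each relationship is listed once as a pair.

```python
def triangles(edges):
    """Return every triangle (a, b, c) with a < b < c in an undirected graph."""
    # Build a neighbour set for every vertex.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    found = []
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        # A common neighbour w of a and b closes a triangle. Requiring w > b
        # reports each triangle exactly once, from its two smallest vertices.
        for w in adj[a] & adj[b]:
            if w > b:
                found.append((a, b, w))
    return sorted(found)

# The relationships from the question.
example = [(1, 2), (1, 3), (1, 4), (1, 6), (2, 3),
           (2, 5), (3, 5), (3, 6), (4, 5), (4, 6)]
print(triangles(example))  # [(1, 2, 3), (1, 3, 6), (1, 4, 6), (2, 3, 5)]
```

The running time depends on how large the neighbour sets are, so on a sparse graph this can be far below the $n^3$ of the naive test.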

Question Source : http://cs.stackexchange.com/questions/50686

Problem Detail:

I have a bunch of wmv files (~300), named the following way: video1.wmv -> videox.wmv. Some of these video files are exact duplicates of one another (same format, same bitrate, same length; basically, they were copied and renamed, so the only thing that differs between them is their name). Is there any way to find out whether two video files from this group are identical, so that I can filter out the duplicates?

Comparing their sizes is unfortunately out of the question: I manually went through some of them, and because each of them is around the same length (even the non-duplicates), some have identical sizes without being identical files.

If there is a possibility, I would prefer a method which could be implemented as a PHP script, but any other method of comparing them would be welcome.

Thanks!

A general algorithm is to compute the SHA256 hash of each file, then sort the hashes and look for duplicates. After sorting, any duplicates that exist will be consecutive. For all practical purposes, you can assume that two files are identical if and only if their SHA256 hashes are the same.
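A minimal sketch of this idea follows, in Python rather than the PHP the question asks about. It groups files by hash in a dictionary instead of sorting the hashes, which finds the same duplicates; the function names and the `videos` directory in the usage note are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large videos are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(paths):
    """Return groups (lists of file names) whose contents hash identically."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(str(p))
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For the files in the question, a call could look like `find_duplicates(Path("videos").glob("*.wmv"))`, where `videos` is a hypothetical directory name.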

If you're asking how to do this from the command line or looking for a product recommendation, take a look at this:

Question Source : http://cs.stackexchange.com/questions/65469

Problem Detail:

I'm reading up on System Z introduced by Judea Pearl (in System Z: a natural ordering of defaults with tractable applications to nonmonotonic reasoning). A central definition is that of tolerance of a subset $R'$ of a rule set $R$ for a rule $r = \alpha_r\to\beta_r$ (denoted by $T(r\ \vert\ R')$). Tolerance of $R'$ for $r$ is defined to be the set of satisfiable formulas $$(\alpha_r\land\beta_r)\bigcup_{r'\in R'}(\alpha_{r'}\supset\beta_{r'})$$ (see page 2 of the article by Pearl).

I don't understand what $\alpha_{r'}\supset\beta_{r'}$ is supposed to mean. Can anyone explain?

In elementary logic, $A \supset B$ is notation for the formula $\neg A \lor B$ (for the implication "$A$ implies $B$"). In other words, $A \supset B$ is a formula; it is true if $A$ is false or if $B$ is true, and false otherwise. http://math.stackexchange.com/q/1106001/14578

How you could have figured this out on your own: Read the entire paper. For instance, a useful place to start would be the examples in Section 3, from which one can reverse-engineer the probable meaning of the symbol.

Question Source : http://cs.stackexchange.com/questions/48539

Problem Detail:

Suppose a class $\mathcal C$ is contained in $PP$ or $BPP$. Does it mean that its complement class is also contained in $PP$ or $BPP$, respectively? Does it immediately follow from $PP=coPP$?

Yes; just follow the definitions. If $L\in \mathcal{C}\subseteq PP$, then $\overline{L}\in coPP = PP$. It immediately follows that $co\text{-}\mathcal{C}=\{L \mid \overline{L}\in \mathcal{C}\}\subseteq PP$. The same argument works for $BPP$, which is also closed under complement.

Question Source : http://cs.stackexchange.com/questions/55944

Problem Detail:

Assume we have a set of $n$ objects $X=\{x_1,x_2,\ldots,x_n\}$, where each object $x_i$ has a penalty $p_i$. Additionally, we have a set of incompatibility constraints $C=\{(x_i,x_j),\ldots\}$, where a tuple $(x_i,x_j)$ says that object $x_i$ is incompatible with object $x_j$. The problem is to find a subset $Y$ of $k<n$ compatible objects that minimize the sum of penalties, i.e. $\min_{Y} \sum_{x_i \in Y} p_i$. The objects in $Y$ need to be compatible, i.e. $\forall x_i,x_j \in Y: (x_i,x_j) \not\in C$.

Let me make an example. Assume we have 4 objects $X=\{x_1,\ldots,x_4\}$ with penalties $p_1 = 2,\ p_2 = 0.1,\ p_3=3,\ p_4=100$. Furthermore the following incompatibilities are given: $C=\{(x_1,x_2),\ (x_2,x_3),\ (x_3,x_4)\}$. The $k=2$ compatible objects that minimize the function are $Y = \{x_1,x_3\}$ with a total penalty of $5$. Object $x_2$ with the least penalty is not part of the solution, because the only compatible object is $x_4$ with a penalty $p_4=100$.

I have two questions:

• Is this problem already known under some name or a variation of a known problem?
• Is there an efficient (polynomial time) algorithm to solve it?

View the objects as the vertices of a graph and the incompatibility constraints as its edges; a compatible subset is then exactly an independent set. First of all, you have to determine whether independent sets of size $k$ exist, and then select the one with the minimum penalty. There is the maximum weighted independent set problem (in which the size of the independent set is unconstrained), but I am not aware of any named optimization problem that selects an independent set of exactly $k$ vertices and minimizes/maximizes the total weight.

The decision version of the optimization problem that you describe is: does there exist an independent set of size $k$ with penalty less than or equal to $P$?

We can reduce the standard independent set problem to this problem by setting the penalty of each vertex to 1 and $P = k$. Thus the decision version of the described optimization problem is NP-complete.

So there won't be a polynomial-time algorithm for the problem unless $P = NP$.
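For small inputs, exhaustive search is still perfectly usable. The following sketch tries every $k$-subset and keeps the cheapest compatible one; it takes exponential time in general, in line with the hardness result, and the function name and data layout are assumptions made for this example.

```python
from itertools import combinations

def best_compatible_subset(penalties, conflicts, k):
    """Return (total penalty, subset) for the cheapest compatible k-subset, or None.

    penalties maps object -> penalty; conflicts is a list of incompatible pairs.
    """
    conflict_set = {frozenset(c) for c in conflicts}
    best = None
    for subset in combinations(sorted(penalties), k):
        # Skip any subset that contains an incompatible pair.
        if any(frozenset(pair) in conflict_set for pair in combinations(subset, 2)):
            continue
        cost = sum(penalties[x] for x in subset)
        if best is None or cost < best[0]:
            best = (cost, subset)
    return best

# The example from the question, with objects named by their index.
penalties = {1: 2, 2: 0.1, 3: 3, 4: 100}
conflicts = [(1, 2), (2, 3), (3, 4)]
print(best_compatible_subset(penalties, conflicts, 2))  # (5, (1, 3))
```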

Question Source : http://cs.stackexchange.com/questions/56587

Problem Detail:

Consider a directed graph with nodes {1,2,3...n} and include an edge (i,j) whenever i<j.

According to me, the number of edges should be n(n-1)/2, but the book says it's nC2 (combinations).

###### Answered By : Tom van der Zanden

Did you try working out what n choose 2 evaluates to?
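Expanding the binomial coefficient makes the point explicit:

$$\binom{n}{2} = \frac{n!}{2!\,(n-2)!} = \frac{n(n-1)}{2}$$

So the two expressions are the same number: each of the $\binom{n}{2}$ unordered pairs $\{i,j\}$ with $i \neq j$ contributes exactly one edge, directed from the smaller endpoint to the larger.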

Question Source : http://cs.stackexchange.com/questions/47881

Problem Detail:

Let $G=(V,E)$ be a directed graph. The "invertible" part of $G$ is the subgraph $H=(V,E_2)$ such that $(u,v)\in E_2 \iff (u,v)\in E \land (v,u)\in E$.

Find an algorithm that generates $H$ from $G$ in linear time, where $G$ is represented by adjacency lists. Non-linear memory is allowed.

Obviously the brute force approach won't be linear, and I thought to use an adjacency matrix but that would be at least $\Omega (n^2)$.

I also thought that maybe it's possible to create a forest of DFS trees, after which checking whether each edge has an anti-edge would be a lot easier, but I'm not sure whether it would be possible to keep all the information from the original graph...

Another way is to generate the anti graph of $G$, intersect it with $G$, and then subtract the intersection from $G$; we'll be left with all edges that didn't have an anti-edge. It sounds too complicated to work in linear time, though, and I'm not sure how to implement this algorithm.

Am I on the right track? Is there another way?

###### Answered By : Yuval Filmus

Here is one approach, which assumes that the adjacency lists are sorted. First, compute the reversed graph (your anti graph) in linear time. Then, compute the intersection by first merging the adjacency lists from both the original graph and the reversed graph, and then looking for edges which appear twice. This takes linear time since the lists are sorted.
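The approach can be sketched as follows, assuming the vertices are numbered $0,\ldots,n-1$ and every adjacency list is sorted in increasing order; the function name `invertible_part` is made up for this example. Building the reversed graph by scanning the sources in increasing order keeps its lists sorted too, so each pair of lists can be intersected with a two-pointer merge.

```python
def invertible_part(adj):
    """adj[u] is the sorted list of out-neighbours of u; returns H in the same format."""
    n = len(adj)

    # Reverse the graph. Because u increases, every reversed list ends up sorted.
    rev = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            rev[v].append(u)

    # Keep (u, v) exactly when v is in both the out-list and the in-list of u.
    h = [[] for _ in range(n)]
    for u in range(n):
        out_list, in_list = adj[u], rev[u]
        i = j = 0
        while i < len(out_list) and j < len(in_list):
            if out_list[i] == in_list[j]:
                h[u].append(out_list[i])
                i += 1
                j += 1
            elif out_list[i] < in_list[j]:
                i += 1
            else:
                j += 1
    return h

# Edges 0->1, 0->2, 1->0, 2->1: only the pair 0<->1 appears in both directions.
print(invertible_part([[1, 2], [0], [1]]))  # [[1], [0], []]
```

Every edge is touched a constant number of times in each phase, so the total work is $O(|V| + |E|)$.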

Question Source : http://cs.stackexchange.com/questions/54469

Problem Detail:

To prove that a problem $\Pi_2$ is NP-hard one has to:

1. select a known NP-hard problem $\Pi_1$;
2. from an arbitrary instance of $\Pi_1$, create an instance of $\Pi_2$ in polynomial-time; and
3. show that solve $\Pi_1$ with the given instance $\iff$ solve $\Pi_2$ with the created instance.

Now, my question is the following.

Suppose I did 1. and 2. and showed only solve $\Pi_1\Rightarrow$ solve $\Pi_2$, but when I try to show the converse, i.e., solve $\Pi_2\Rightarrow$ solve $\Pi_1$, the instance of $\Pi_1$ changes a little. Is this proof correct?

More precisely, let $I_1=(n, a_1,\ldots,a_n,b_1,\ldots,b_n,\alpha,\beta,k)$ be an arbitrary instance of $\Pi_1$. The created instance (in polynomial-time) of $\Pi_2$ is $I_2=(n, a_1,\ldots,a_n,b_1,\ldots,b_n,\Delta,k)$. Now when I tried to show solve $\Pi_2$ with $I_2\Rightarrow$ solve $\Pi_1$ with $I_1$, I found another instance of $\Pi_1$, $I_1'=(n, a_1,\ldots,a_n,b_1,\ldots,b_n,a,b,k)$ where $a\neq \alpha$ and $b\neq\beta$. Hence I did not solve $\Pi_1$ with $I_1$ but with $I_1'$.

My guess is that the proof is still correct, right?

###### Answered By : Denis Pankratov

No, you may not modify $I_1$. By the time you get to Step 3, everything is determined and you may not change your instances. Say $f$ is the polytime transformation from Step 2. Then you need to show

$I_1$ is a yes-instance of $\Pi_1$ if and only if $f(I_1) = I_2$ is a yes-instance of $\Pi_2$.

If after arriving at Step 3 you notice that your $I_2$ instance does not allow you to prove the statement, then you might want to change $f$ in your Step 2, and redo Step 3.

Question Source : http://cs.stackexchange.com/questions/53291

Problem Detail:

If you have a number such as $3.14626437$ and you need to know what symbols create it, as far as I know, there are two tools:

1- ISC

and the answer is $\sqrt2+\sqrt3$

I am wondering what algorithm these websites use and what its complexity is.

RIES is another method for finding closed-form expressions; its web page describes the algorithm it uses.

If you do a Google search for algorithm for inverse symbolic calculator, you will find Wikipedia's page on algorithms for finding integer relations between multiple real numbers (as the page describes, you look for an integer relation between $x$, the number you care about, and a list of other standard mathematical constants).
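To make the idea of an integer relation concrete, here is a deliberately naive brute-force sketch. It is not the algorithm used by ISC or RIES, and it is far slower than real integer-relation algorithms such as PSLQ or LLL-based methods; it only tries small integer coefficients on a fixed list of square roots. The function name, the list of radicands and the coefficient bound are all assumptions chosen for this example.

```python
from itertools import product
from math import sqrt

def guess_sqrt_combination(x, radicands=(2, 3, 5), max_coeff=3):
    """Find small integers c_i minimising |x - sum(c_i * sqrt(r_i))|.

    Returns (error, coefficients). The cost grows as (2*max_coeff + 1)**len(radicands).
    """
    best = None
    for coeffs in product(range(-max_coeff, max_coeff + 1), repeat=len(radicands)):
        value = sum(c * sqrt(r) for c, r in zip(coeffs, radicands))
        error = abs(value - x)
        if best is None or error < best[0]:
            best = (error, coeffs)
    return best

error, coeffs = guess_sqrt_combination(3.14626437)
# coeffs == (1, 1, 0), i.e. sqrt(2) + sqrt(3), matching the answer quoted in the question.
```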

Question Source : http://cs.stackexchange.com/questions/56804

Problem Detail:

I've been using the Stanford Algorithms (1) Coursera course, and in a description of a problem, the lecturer said that in the problem of allocating n processes to n servers at random, the sample space of allocations is n^n, and each has equal probability.

Intuitively this seems unlikely to me: if you imagine each server having a number and that number being the number of processes assigned, you wouldn't have the case of two servers being assigned n processes; yet the n^n -- as far as I can see -- assumes that such allocations are possible.

Am I missing something?

The number $n^n$ is easily obtained if you think about the problem in the other direction: each process has $n$ choices, and therefore $n$ processes have $n^n$ possible choices in total. Of course it is impossible for two servers to have $n$ processes each. If a process could be assigned to more than one server, the number of possible assignments would be far more than $n^n$.
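A quick enumeration for a small $n$ shows this counting in action. The sketch below lists every outcome as a tuple whose entry $i$ is the server chosen by process $i$; the function names are made up for this example.

```python
from collections import Counter
from itertools import product

def allocations(n):
    """All ways to assign processes 0..n-1 to servers 0..n-1, one server per process."""
    return list(product(range(n), repeat=n))

def server_loads(allocation):
    """Number of processes that ended up on each server for one allocation."""
    counts = Counter(allocation)
    return [counts[s] for s in range(len(allocation))]

outcomes = allocations(3)
print(len(outcomes))            # 27, which is 3**3
print(server_loads((0, 0, 2)))  # [2, 0, 1]
```

Each outcome places every process exactly once, so the loads always sum to $n$ and at most one server can hold all $n$ processes.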

Question Source : http://cs.stackexchange.com/questions/41186

Problem Detail:

It seems to me that there are two different situations which get called "PH collapse":

(1) That $\exists i \geq 1$ s.t $\Sigma_i ^p = \Sigma_{i+1}^p$
(2) That $\exists i \geq 1$ s.t $\Sigma_i^p = \Pi_i^p$

• Are these two different independent situations?

AFAIK the only natural relation is that $\Sigma_i^p \subseteq \Pi_{i+1}^p \subseteq \Sigma_{i+2}^p$ and I don't think this is enough to make the two above scenarios be equivalent.

• Is there any natural relation even between $\Sigma_i^p$ and $\Sigma_{i+1}^p$ or between $\Pi_i^p$ and $\Pi_{i+1}^p$? For some $k > i$, if one shows that $\Sigma_k^p \subseteq \Sigma_i^p$ does that say anything about $\Pi_k^p$ vs $\Sigma_i^p$ ?

• Do both these scenarios lead to the conclusion that $PH = \cup _{j \geq 1} \Sigma_j^p = \Sigma_i ^p$ ? (If yes, then can someone kindly help with the proof?)

Roughly I guess the proof has to go like this :

Inductively assume that for some $k > i$ one has shown that $\Sigma_j^p \subseteq \Sigma_i^p$ $\forall j$ s.t. $i\leq j \leq k$. Now sit at the threshold of the inductive hypothesis: take some $L \in \Sigma_{k+1}^p$ and, by truncating the first quantifier, construct a language $L'$ such that $\langle x,u_1\rangle \in L'$ iff $x$ satisfies the $\Sigma_{k+1}^p$ predicate defining $L$ with its first existential quantifier fixed to $u_1$. This makes $L' \in \Pi_{k}^p$.

Now I don't know... something needs to be claimed about $L' \in \Pi_{k}^p$ to show that $L \in \Sigma_i^p$. But I don't see what goes here.

###### Answered By : Yuval Filmus

Suppose that $\Sigma_i^p = \Sigma_{i+1}^p$ (and so $\Pi_i^p = \Pi_{i+1}^p$). Then every statement of the form $(\exists y_1) \cdots (Q y_{i+1}) f(x,\vec{y})$ for polytime $f$ is equivalent to another statement $(\exists y_1) \cdots (Q y_i) g(x,\vec{y})$ for polytime $g$.

Now suppose you have a predicate in $\Sigma_{i+2}^p$, say $(\exists y) (\forall z_1) \cdots (Q z_{i+1}) f(x,y,\vec{z})$. The inner $i+1$ quantifiers are a $\Pi_{i+1}^p$ predicate, and so they are equivalent to a $\Pi_i^p$ predicate $(\forall z_1) \cdots (Q z_i) g(x,y,\vec{z})$. Adding back the outer quantifier $(\exists y)$, we find that the original predicate is equivalent to a $\Sigma_{i+1}^p$ predicate, and so $\Sigma_{i+2}^p = \Sigma_{i+1}^p = \Sigma_i^p$.

In the same way, we show that $\Sigma_i^p = \Sigma_{i+1}^p$ implies $\Sigma_i^p = \Sigma_{i+j}^p$ for all $j$, so that $\mathsf{PH} = \Sigma_i^p$.

Note that $\Pi_i^p \subseteq \Sigma_{i+1}^p$. Indeed, if $(\forall y_1) \cdots (Q y_i) f(x,\vec{y})$ is a $\Pi_i^p$ predicate, then the equivalent $(\exists w) (\forall y_1) \cdots (Q y_i) f(x,\vec{y})$ is a $\Sigma_{i+1}^p$ predicate.

Since $\Pi_i^p \subseteq \Sigma_{i+1}^p$, we see that $\Sigma_i^p = \Sigma_{i+1}^p$ implies that $\Pi_i^p \subseteq \Sigma_i^p$ and so $\Pi_i^p = \Sigma_i^p$.

Now for the other direction. Suppose that $\Pi_i^p = \Sigma_i^p$, and consider some $\Sigma_{i+1}^p$ predicate $(\exists y_1) (\forall y_2) \cdots (Q y_{i+1}) f(x,\vec{y})$. The inner $i$ quantifiers are a $\Pi_i^p$ predicate, and so they are equivalent to some $\Sigma_i^p$ predicate $(\exists z_2) \cdots (Q z_{i+1}) g(x,y_1,\vec{z})$. Adding back the outer quantifier $(\exists y_1)$ and folding $y_1,z_2$ into a single existential quantifier, we obtain a $\Sigma_i^p$ predicate equivalent to our original $\Sigma_{i+1}^p$ predicate.

Summarizing, if $\Pi_i^p = \Sigma_i^p$ then $\Sigma_i^p = \Sigma_{i+1}^p$, and so (as we have seen above) $\mathsf{PH} = \Sigma_i^p$.

Question Source : http://cs.stackexchange.com/questions/42332

Problem Detail:

This question comes from economics. There is a market $M$ that contains $m$ items. There is a value function $v:2^M\to \mathbb{R}$ that assigns a monetary value to each "bundle" (a subset of $M$). The function is given by $2^m$ values, one per bundle.

A price-vector $p$ is a vector that assigns to each item $x\in M$ a non-negative price $p_x$. Given a price-vector, the net utility of a bundle is its value minus its price, i.e:

$$u(X) = v(X) - \sum_{x\in X}p_x$$

A bundle is called demanded if its net-utility is maximal - no bundle has larger net-utility.

A pair of bundles is called a demand-pair if there exists a price-vector such that both of these bundles are demanded (i.e, they have the same net-utility and it is maximal).

I am looking for an algorithm that receives the valuation $v$ and finds all demand-pairs.

For example, suppose there are $m=2$ items and the valuation function is:

$v(\emptyset)=0$, $v(\{x\})=2$, $v(\{y\})=3$, $v(\{x,y\})=4$.

Then:

• $\{y\}$ and $\{x,y\}$ are a demand-pair, because when $p_x=1,p_y=0$, both these bundles have net-utility 3, which is maximal.
• $\emptyset$ and $\{x,y\}$ are not a demand-pair, because if they have the same net-utility, then this utility must be 0. Then, necessarily $p_x+p_y=4$; but then either $p_x<2$ or $p_y<3$; but then either $\{x\}$ or $\{y\}$ (or both) has a net-utility of more than 0, so $\emptyset$ is not demanded.
###### Asked By : Erel Segal-Halevi

This can be solved in polynomial time, by solving $2^{2m}$ linear programs. (That might look exponential, but is polynomial in the size of the input, as the input size is $2^m$.)

Given a pair of bundles, you can check whether it is a demand-pair using linear programming; see below. So, you can apply this to all candidate pairs.

Suppose we have two bundles, $X,Y$, and we want to know if they are a demand-pair. Introduce a variable $p_x$ for each $x \in M$. It's easy to write a linear equality that expresses that $X,Y$ have the same net-utility: $u(X)=u(Y)$. Also, it's easy to write linear inequalities to express that no other bundle has higher net-utility: $u(X) \ge u(Z)$ for all other bundles $Z$. Each of those is a linear (in)equality on the variables $p_x$. We also have the inequalities $p_x \ge 0$. Now use linear programming to test whether all of those inequalities are simultaneously satisfiable; if they are, there exists a price-vector where $X,Y$ are simultaneously demanded.
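The linear program itself needs an LP solver, but the constraints are easy to sanity-check at a single candidate price vector. The sketch below does only that: it computes the net utility of every bundle and returns the demanded ones. It certifies a demand-pair when a suitable price vector is already known and does not search for one; the function name and the dictionary layout are assumptions made for this example.

```python
from itertools import combinations

def demanded_bundles(items, value, prices):
    """Return the set of bundles with maximal net utility at the given prices.

    value maps frozenset(bundle) -> monetary value; prices maps item -> price.
    """
    bundles = [frozenset(c)
               for r in range(len(items) + 1)
               for c in combinations(items, r)]
    utility = {b: value[b] - sum(prices[x] for x in b) for b in bundles}
    best = max(utility.values())
    return {b for b, u in utility.items() if u == best}

# The two-item example from the question, at prices p_x = 1 and p_y = 0.
value = {frozenset(): 0, frozenset("x"): 2, frozenset("y"): 3, frozenset("xy"): 4}
print(demanded_bundles("xy", value, {"x": 1, "y": 0}))
# The demanded bundles are {y} and {x, y}, both with net utility 3.
```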

As an optimization, note that $X,Y$ are a demand-pair for market $M$ iff they are a demand-pair for market $X \cup Y$: without loss of generality you can set the price of every item in $M \setminus (X \cup Y)$ to $+\infty$. Also, without loss of generality you can set the price of every item in $X \cap Y$ to 0. Consequently, $X,Y$ is a demand-pair for market $M$ iff $X'=X \setminus Y$ and $Y'=Y \setminus X$ are a demand-pair for market $X' \cup Y'$. This will make each linear program smaller and more efficient to solve. It also means that you only need to test disjoint pairs, so it suffices to test at most $3^m$ pairs $X',Y'$.

Question Source : http://cs.stackexchange.com/questions/54379

Problem Detail:

I am reading the paper Measuring the hardness of SAT instances by Ansótegui, Bonet, Levy and Manyà (Proc. 23rd AAAI Conf. on AI, pp. 222–228, 2008) (PDF). I am trying to understand the last part of the proof of Lemma 3 (in bold). To do so, I take an example. Let $\Gamma = (a+b)(a+b')(a'+c)(a'+c')$; then its tree-like refutation is:

Following the last part of the proof of Lemma 3 (in bold), $[b\rightarrow 1]\Gamma=(1)(a)(a'+c)(a'+c')$, and adding the literal $b'$ where $[b\rightarrow 1]$ has removed it, we get $\Gamma' = (1)(a+b')(a'+c)(a'+c')$. According to the paper, the tree-like refutation of $\Gamma'$ is a proof of $\Gamma \vdash b'$. Similarly, for $[b\rightarrow 0]$, $[b\rightarrow 0]\Gamma = (a)(a'+c)(a'+c')$, and adding the literal $b$ where $[b\rightarrow 0]$ has removed it, we get $(a+b)(a'+c)(a'+c')$. My questions are:

1) Is there any difference between these Strahlers? To me there is no difference, so why do the authors use the function $\max$?

2) From the proof,

Adding a cut of $x$ to these two proofs, we get a proof of $\Gamma \vdash \Box$.

Is this a rule of sequent calculus? If so, following Wikipedia, could you help identify what $\Sigma$, $\Pi$, $\Delta$ and $\Sigma$ are?

What does "cut of $x$" mean?

Lemma 3 The space satisfies the following three properties:

1. $s(\Gamma \cup \{\Box\})$ = 0
2. For any unsatisfiable formula $\Gamma$, and any partial truth assignment $\phi$, we have $s(\phi(\Gamma))\leq s(\Gamma)$.
3. For any unsatisfiable formula $\Gamma$, if $\Box\notin\Gamma$, then there exists a variable $x$ and an assignment $\phi\colon\{x\}\to\{0,1\}$, such that $s(\phi(\Gamma))\leq s(\Gamma)-1$.

The space of a formula is the minimum measure on formulas that satisfy (1), (2) and (3). In other words, we could define the space as:

$$s(\Gamma) = \min_{x, \overline{x}\in\Gamma, b\in\{0,1\}} \big\{ \max\{s([x\mapsto b](\Gamma))+1, s([x\mapsto\overline{b}](\Gamma))\}\;\big\}$$ when $\Box\notin\Gamma$, and $s(\Gamma\cup\{\Box\}) = 0$.

You have asked (at least) two questions. The answer to your first question is that sometimes it is much easier to refute a formula given that $x=0$ than given that $x=1$. Hence the $\max$. In other cases, $x=1$ is easier. That's why we need to go over both possible values of $b$. Finally, the reason why the formula in part 3 of the lemma also minimizes over the variable $x$ is that $x$ represents the last variable which is cut (the one at the root of the tree).

The answer to your second question is as follows. The cut rule allows deriving $a \lor b$ given $a \lor x$ and $b \lor \bar{x}$. Here $a,b$ are arbitrary formulas, in this context clauses. In particular, from $x$ and $\bar{x}$ you can derive a contradiction. The cut rule is the only rule in the resolution proof system.
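As a small illustration of the cut rule on clauses, the sketch below resolves two clauses on a variable and then refutes the example formula $\Gamma = (a+b)(a+b')(a'+c)(a'+c')$ from the question. Representing a clause as a frozenset of literal strings, with `~` marking negation, is a convention chosen for this example, and the function name `resolve` is likewise made up.

```python
def resolve(c1, c2, var):
    """Cut on var: from (A or var) and (B or not var), derive (A or B)."""
    pos, neg = var, "~" + var
    if pos in c1 and neg in c2:
        return frozenset((c1 - {pos}) | (c2 - {neg}))
    if neg in c1 and pos in c2:
        return frozenset((c1 - {neg}) | (c2 - {pos}))
    raise ValueError("the clauses do not clash on " + var)

# Gamma = (a+b)(a+b')(a'+c)(a'+c')
clause_a = resolve(frozenset({"a", "b"}), frozenset({"a", "~b"}), "b")        # derives a
clause_not_a = resolve(frozenset({"~a", "c"}), frozenset({"~a", "~c"}), "c")  # derives ~a
empty = resolve(clause_a, clause_not_a, "a")  # the empty clause, i.e. a contradiction
```

The three cuts form a tree with the cut on `a` at the root, which is the role played by the variable $x$ in the formula for the space.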