
# [Solved]: How do you go about designing a vector processor architecture for the sum of matrix products?

Problem Detail:

The following equation is a matrix expression where $B_i$ and $C_i^T$ are $n\times n$ matrices and $k$ is a positive integer:

$$P = \sum_{i=1}^k B_i C_i^T$$

So $P = B_1 C_1^T + B_2 C_2^T + \cdots +B_k C_k^T$
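It helps to stack the operands side by side. Write $\mathbf{B} = [B_1 \; B_2 \; \cdots \; B_k]$ and $\mathbf{C} = [C_1 \; C_2 \; \cdots \; C_k]$ as $n \times kn$ block matrices; this notation is introduced here only for illustration. The whole sum then collapses into one ordinary matrix product, so any matrix-multiplication hardware or algorithm applies directly:

$$P = \sum_{i=1}^k B_i C_i^T = \begin{bmatrix} B_1 & B_2 & \cdots & B_k \end{bmatrix} \begin{bmatrix} C_1^T \\ C_2^T \\ \vdots \\ C_k^T \end{bmatrix} = \mathbf{B}\,\mathbf{C}^T, \qquad P_{rc} = \sum_{i=1}^k \sum_{j=1}^n (B_i)_{rj}\,(C_i)_{cj}$$

Each entry $P_{rc}$ is therefore a single dot product of length $kn$.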

If $B_i$ and $C_i$ are $n\times n$ matrices themselves, we have a total of $2k$ matrices that somehow need to be stored in this vector architecture.

So $P$ will end up being an $n\times n$ matrix once all the computation has completed.
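For sizing the storage and arithmetic units, a quick count helps. It assumes the straightforward algorithm, with no Strassen-style tricks. Each of the $n^2$ entries of $P$ is a sum of $kn$ products, which needs $kn - 1$ additions:

$$\underbrace{2kn^2}_{\text{input elements}} + \underbrace{n^2}_{\text{output elements}}, \qquad \underbrace{kn^3}_{\text{multiplications}}, \qquad \underbrace{n^2(kn-1) = kn^3 - n^2}_{\text{additions}}$$

For example, with $n = 2$ and $k = 2$ this gives 16 input elements, 4 output elements, 16 multiplications and 12 additions.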

What is the simplest possible vector processor architecture that is required to perform the matrix computation above?
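A minimal sketch follows. It assumes a simple vector unit whose only arithmetic instruction is a vector multiply-accumulate, with a hypothetical maximum vector length `VL`. Row $r$ of $P$ is built as $P_{r,:} \mathrel{+}= (B_i)_{rj} \cdot (\text{row } j \text{ of } C_i^T)$, a scalar-times-vector accumulate, and each row is strip-mined into chunks of at most `VL` lanes. The names `vfmacc`, `VL` and `sum_of_products_vector` are made up for illustration. The Python only models what such hardware would do per instruction.

```python
from typing import List

Matrix = List[List[float]]

VL = 4  # hypothetical hardware vector length (lanes per vector register)


def vfmacc(acc: List[float], scalar: float, vec: List[float]) -> None:
    """Model of one (hypothetical) vector instruction: acc[e] += scalar * vec[e] in every lane e."""
    for e in range(len(vec)):
        acc[e] += scalar * vec[e]


def sum_of_products_vector(Bs: List[Matrix], Cs: List[Matrix], vl: int = VL) -> Matrix:
    """Compute P = sum_i B_i C_i^T using only scalar-times-vector accumulates."""
    n = len(Bs[0])
    P = [[0.0] * n for _ in range(n)]
    for r in range(n):
        # Strip-mine row r of P into chunks of at most vl elements.
        for c0 in range(0, n, vl):
            c1 = min(c0 + vl, n)
            acc = [0.0] * (c1 - c0)  # one vector accumulator register, zeroed
            # All k products accumulate into the same register before it is stored.
            for B, C in zip(Bs, Cs):
                for j in range(n):
                    # "Vector load" of row j of C^T, i.e. C[c][j] for c in [c0, c1).
                    # With C stored row-major this is a strided access.
                    vec = [C[c][j] for c in range(c0, c1)]
                    vfmacc(acc, B[r][j], vec)
            P[r][c0:c1] = acc  # one vector store per chunk
    return P


if __name__ == "__main__":
    B1, C1 = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    B2, C2 = [[1, 0], [0, 1]], [[2, 0], [0, 2]]
    print(sum_of_products_vector([B1, B2], [C1, C2], vl=1))  # prints [[19.0, 23.0], [39.0, 55.0]]
```

In this sketch, all $k$ products accumulate into the same vector register before a single store, so $P$ needs no intermediate storage. Storing each $C_i$ transposed (that is, column-major) would turn the strided access into a unit-stride load, which simple vector memory systems handle best.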

Are there any articles or other literature that discuss how this can be done?

I would appreciate any advice.

From talking to the OP, algorithms for parallel matrix multiplication are apparently acceptable answers. Matrix multiplication and its parallel algorithms are heavily studied in CS, partly because of their widespread use in areas such as scientific computing. There are also other ways to parallelize the problem in the question. One obvious example is a "map-reduce" scheme: the map step sends each product $B_i C_i^T$ to a separate processor, and the reduce step performs the additions.
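Below is a minimal sketch of that map-reduce idea in plain Python. The helper names (`product_bt`, `mat_add`, `sum_of_products_mapreduce`) are hypothetical. Each term $B_i C_i^T$ is an independent map task, and the reduce step adds the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List

Matrix = List[List[float]]


def product_bt(B: Matrix, C: Matrix) -> Matrix:
    """Map task: compute one term B C^T, where entry (r, c) = sum_j B[r][j] * C[c][j]."""
    n = len(B)
    return [[sum(B[r][j] * C[c][j] for j in range(n)) for c in range(n)] for r in range(n)]


def mat_add(X: Matrix, Y: Matrix) -> Matrix:
    """Reduce step: element-wise sum of two partial results."""
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def sum_of_products_mapreduce(Bs: List[Matrix], Cs: List[Matrix], workers: int = 4) -> Matrix:
    """P = sum_i B_i C_i^T: map each product to a worker, then reduce by addition."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(product_bt, Bs, Cs))  # k independent products
    return reduce(mat_add, partials)  # k - 1 matrix additions, applied left to right


if __name__ == "__main__":
    B1, C1 = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    B2, C2 = [[1, 0], [0, 1]], [[2, 0], [0, 2]]
    print(sum_of_products_mapreduce([B1, B2], [C1, C2]))  # prints [[19, 23], [39, 55]]
```

A thread pool keeps the sketch short. However, in CPython the GIL keeps pure-Python threads from running this arithmetic in parallel, so real speedup would need `ProcessPoolExecutor` (a drop-in swap here, since both functions are module-level) or separate machines. `functools.reduce` adds the partials one after another. A pairwise tree of additions would instead finish in about $\lceil \log_2 k \rceil$ rounds when enough processors are available.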

Note also the new "scicomp" Stack Exchange site for questions on scientific computing.