Somebody asked how one may count the number of floating point operations in a MATLAB program.
Prior to version 6, one used to be able to do this with the command flops
, but this command is no longer available with the newer versions of MATLAB.
flops
is a relic from the LINPACK days of MATLAB (LINPACK has since been replaced by LAPACK). With the use of LAPACK in MATLAB, it will be more approrpiate to use tic
andtoc
to count elapsed CPU time instead (cf. tic
,toc
).
If you're interested to know why flops
is obsolete, you may wish to read the exchanges in NA digest regarding flops
.
Nevertheless, if you feel that you really do need a command to count floating point operations in MATLAB, what you can do is to install Tom Minka's Lightspeed MATLAB toolbox and use the flops counting operations therein.
@cise.ufl.edu>
@cise.ufl.edu>
To count flops, we need to first know what they are. What is a flop?
LAPACK is not the only place where the question "what is a flop?" is
relevant. Sparse matrix codes are another. Multifrontal and supernodal
factorization algorithms store L and U (and intermediate submatrices, for
the multifrontal method) as a set of dense submatrices. It's more
efficient that way, since the dense BLAS can be used within the dense
submatrices. It is often better explicitly store some of the numerical
zeros, so that one ends up with fewer frontal matrices or supernodes.
So what happens when I compute zero times zero plus zero? Is that a flop
(or two flops)? I computed it, so one could argue that it counts. But it
was useless, so one could argue that it shouldn't count. Computing it
allowed me to use more BLAS-3, so I get a faster algorithm that happens to
do some useless flops. How do I compare the "mflop rate" of two
algorithms that make different decisions on what flops to perform and
which of those to include in the "flop count"?
A somewhat better measure would be to compare the two algorithms based an
external count. For example, the "true" flop counts for sparse LU
factorization can be computed in Matlab from the pattern of L and U as:
[L,U,P] = lu (A) ;
Lnz = full (sum (spones (L))) - 1 ; % off diagonal nz in cols of L
Unz = full (sum (spones (U')))' - 1 ; % off diagonal nz in rows of U
flops = 2*Lnz*Unz + sum (Lnz) ;
The same can be done on the LU factors found by any other factorization
code. This does count a few spurious flops, namely the computation a_ij +
l_ik*u_kj is always counted as two flops, even if a_ij is initially zero.
However, even with this "better" measure, the algorithm that does more
flops can be much faster. You're better off picking the algorithm with
the smallest memory space requirements (which is not always the smallest
nnz (L+U)) and/or fastest run time.
So my vote is to either leave out the the flop count, or at most return a
reasonable agreed-upon estimate (like the "true flop count" for LU, above)
that is somewhat independent of algorithmic details. Matrix multiply, for
example, should report 2*n^3, as Cleve states in his Winter 2000
newsletter, even though "better" methods with fewer flops (Strassen's
method) are available.
Tim Davis
University of Florida
davis@cise.ufl.edu@cise.ufl.edu>