Skip to content

Add R-Bidiagonalization step to BDCSVD

What does this implement/fix?

This adds an R-Bidiagonalization step to BDCSVD. (mentioned in section 8.6.3 of the Golub and Van Loan "Matrix Computations" book.)

  • When the input matrix A is sufficiently tall (or wide), this first computes the QR decomposition of A (or A*). It then just runs the normal bidiagonalization/svd routines on the top rows of R. Finally, it has to apply Q on the left of U (or V) to get the actual SVD of A = Q(R/0) = Q(USV^T/0).
  • Optimization for tall matrices since less work is done in bidiagonalization.

As far as I can tell, LAPACK always does this in dgesdd when there's sufficient workspace. I'm not sure if adding additional workspace here is acceptable, but this mr shouldn't add much more. A lot of the new workspace needed for the QR decomposition should be cancelled out because UpperBidiagonalization uses less.

Looked like an easy win, so I thought I'd try it out. (There was also some recent discussion on Discord where people were using BDCSVD on super tall matrices, so I think this meets a real need.)

Additional information

I think it's working pretty well. Passes the test suite and I'm pleased with the speedup I'm seeing.

These are from some informal benchmarks:

  • Run on an AMD EPYC 7742, with gcc-10.2.0, -O3 -march=core-avx2
  • show results for computing both Thin U and V and just computhing Thin V, for double and complex.
Small-ish
Benchmark                                                    Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------
bdcsvd_double_computeThinUV/38/38                         -0.0150         -0.0146        560730        552343        558549        550367
bdcsvd_double_computeThinUV/38/138                        -0.0311         -0.0317        665697        644985        663199        642201
bdcsvd_double_computeThinUV/38/238                        -0.0851         -0.0852        783044        716412        780009        713560
bdcsvd_double_computeThinUV/38/338                        -0.0897         -0.0905        884546        805231        881206        801484
bdcsvd_double_computeThinUV/38/438                        -0.0891         -0.0896        969086        882702        965227        878719
bdcsvd_double_computeThinUV/38/538                        -0.1708         -0.1712       1118555        927540       1114049        923328
bdcsvd_double_computeThinUV/38/638                        -0.1533         -0.1534       1213303       1027309       1208212       1022824
bdcsvd_double_computeThinUV/38/738                        -0.2031         -0.2033       1334411       1063345       1329151       1058981
bdcsvd_double_computeThinUV/38/838                        -0.1994         -0.1998       1430272       1145130       1424248       1139691
bdcsvd_double_computeThinV/38/38                          +0.0027         +0.0025        495862        497220        494012        495256
bdcsvd_double_computeThinV/38/138                         +0.0156         +0.0153        526925        535143        524829        532842
bdcsvd_double_computeThinV/38/238                         -0.0104         -0.0108        564038        558148        562120        556069
bdcsvd_double_computeThinV/38/338                         -0.0942         -0.0950        654858        593161        652641        590671
bdcsvd_double_computeThinV/38/438                         -0.1722         -0.1726        746768        618174        743960        615546
bdcsvd_double_computeThinV/38/538                         -0.1859         -0.1867        780695        635544        777907        632657
bdcsvd_double_computeThinV/38/638                         -0.2150         -0.2149        842719        661564        839380        658989
bdcsvd_double_computeThinV/38/738                         -0.2572         -0.2572        926010        687850        922064        684915
bdcsvd_double_computeThinV/38/838                         -0.2878         -0.2876        996698        709868        992685        707202
Larger

Benchmark                                                      Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------
bdcsvd_double_computeThinUV/500/500                         -0.0223         -0.0218     120934799     118241344     120206450     117587615
bdcsvd_double_computeThinUV/500/1000                        -0.0781         -0.0768     168039073     154912570     166927595     154101147
bdcsvd_double_computeThinUV/500/2000                        -0.0841         -0.0839     252251583     231030995     250837282     229786319
bdcsvd_double_computeThinUV/500/3000                        -0.2933         -0.2931     354488660     250504511     352466612     249157515
bdcsvd_double_computeThinUV/500/4000                        -0.3849         -0.3845     498082335     306358967     495251025     304808607
bdcsvd_double_computeThinUV/500/5000                        -0.4102         -0.4101     611187632     360497410     607805405     358562461
bdcsvd_double_computeThinUV/500/6000                        -0.5858         -0.5855     993030518     411281976     986559546     408897937
bdcsvd_double_computeThinV/500/500                          +0.0023         +0.0025      96144720      96369521      95591914      95829350
bdcsvd_double_computeThinV/500/1000                         -0.0108         -0.0104     125544195     124185888     124790609     123487358
bdcsvd_double_computeThinV/500/2000                         +0.0516         +0.0515     169231449     177963941     168368397     177041849
bdcsvd_double_computeThinV/500/3000                         -0.3211         -0.3211     220459022     149679402     219407709     148955709
bdcsvd_double_computeThinV/500/4000                         -0.3434         -0.3435     292084731     191771479     290552164     190751061
bdcsvd_double_computeThinV/500/5000                         -0.4782         -0.4776     404396172     211012044     401878571     209961360
bdcsvd_double_computeThinV/500/6000                         -0.6294         -0.6291     598237028     221678815     594283542     220415778
bdcsvd_complex_double_computeThinUV/500/500                 +0.0916         +0.0897     255520963     278933399     254162600     276962257
bdcsvd_complex_double_computeThinUV/500/1000                -0.0046         -0.0063     470148880     467963802     467394496     464450050
bdcsvd_complex_double_computeThinUV/500/2000                +0.0331         +0.0314     917793280     948137549     912521721     941157706
bdcsvd_complex_double_computeThinUV/500/3000                -0.2198         -0.2211    1360012279    1061090288    1351584140    1052804356
bdcsvd_complex_double_computeThinUV/500/4000                -0.3050         -0.3053    1915540819    1331249650    1903755375    1322459094
bdcsvd_complex_double_computeThinUV/500/5000                -0.2804         -0.2804    2421978472    1742944820    2405996094    1731417102
bdcsvd_complex_double_computeThinUV/500/6000                -0.3471         -0.3468    2871235454    1874670684    2851029992    1862315016
bdcsvd_complex_double_computeThinV/500/500                  -0.0028         -0.0027     197499887     196955268     196285795     195757475
bdcsvd_complex_double_computeThinV/500/1000                 -0.0236         -0.0235     318554734     311042543     316618729     309192952
bdcsvd_complex_double_computeThinV/500/2000                 -0.0431         -0.0423     567077441     542655686     563321471     539468600
bdcsvd_complex_double_computeThinV/500/3000                 -0.4819         -0.4812     915616107     474385083     909423431     471792729
bdcsvd_complex_double_computeThinV/500/4000                 -0.4868         -0.4862    1144582829     587365450    1137211810     584304876
bdcsvd_complex_double_computeThinV/500/5000                 -0.5118         -0.5114    1436150418     701125911    1427595270     697484811
bdcsvd_complex_double_computeThinV/500/6000                 -0.5330         -0.5328    1733322684     809429173    1722951672     805023436
OVERALL_GEOMEAN                                             -0.2741         -0.2740             0             0             0             0
Very Tall
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
bdcsvd_double_computeThinUV/32/100000                          -0.3170         -0.3175     226529586     154717160     225472766     153888702
bdcsvd_double_computeThinUV/264/100000                         -0.6096         -0.6095    5149984761    2010317098    5119070612    1998970140
bdcsvd_double_computeThinUV/1024/100000                        -0.6538         -0.6530   59780980101   20698847700   59313139187   20578710233
bdcsvd_double_computeThinV/32/100000                           -0.2606         -0.2609     258735994     191316443     257246119     190123143
bdcsvd_double_computeThinV/264/100000                          -0.6068         -0.6065    5527338202    2173494907    5493199942    2161352169
bdcsvd_double_computeThinV/1024/100000                         -0.6419         -0.6411   60740516522   21750376091   60278909057   21633052616
bdcsvd_complex_double_computeThinUV/32/100000                  -0.2993         -0.2989     636792072     446202903     633206937     443951859
bdcsvd_complex_double_computeThinUV/264/100000                 -0.4889         -0.4878   18326362603    9366890170   18194148881    9318136331
bdcsvd_complex_double_computeThinUV/1024/100000                -0.4089         -0.4083  195544107745  115591737930  194190302999  114900596242
bdcsvd_complex_double_computeThinV/32/100000                   -0.4411         -0.4417     368514577     205950727     366569465     204654500
bdcsvd_complex_double_computeThinV/264/100000                  -0.6587         -0.6583   12379953027    4225803248   12292426264    4200267389
bdcsvd_complex_double_computeThinV/1024/100000                 -0.6676         -0.6669  125078044965   41579651579  124086840510   41338883708
OVERALL_GEOMEAN                                                -0.5259         -0.5255             7             3             7             3

Merge request reports

Loading