Visualization of Chemical Databases Using the Singular Value Decomposition and Truncated-Newton Minimization

We describe a rapid algorithm for visualizing large chemical databases in a low-dimensional space (2D or 3D) as a first step in chemical database analyses and drug design applications. The compounds in the database are described as vectors in the high-dimensional space of chemical descriptors. The algorithm is based on the singular value decomposition (SVD) combined with a minimization procedure implemented with the efficient truncated-Newton program package (TNPACK). Numerical experiments on scaled datasets show that the algorithm achieves an accuracy in 2D of around 30 to 46%, where accuracy is the percentage of pairwise distance segments that lie within 10% of their original distance values. These low percentages can be raised to nearly 100% with projections onto a ten-dimensional space. The 2D and 3D projections, in particular, can be generated efficiently and easily visualized and analyzed with respect to clustering patterns of the compounds.
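
The two-stage idea in the abstract (an SVD projection followed by a distance-preserving minimization) and the 10% accuracy measure can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the error function (a sum of squared differences between squared distances) is assumed rather than taken from the paper, SciPy's "TNC" truncated-Newton method stands in for TNPACK, and the function names and the random test matrix are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def pairwise_sq_dists(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D2, 0.0)  # clip small negative round-off


def svd_project(X, k=2):
    """Project mean-centered rows of X onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


def distance_error(y, target_sq, n, k):
    """Assumed objective: E = 1/4 * sum_{i<j} (d_ij^2 - delta_ij^2)^2.

    Returns the value and its gradient with respect to the flattened
    low-dimensional coordinates y.
    """
    Y = y.reshape(n, k)
    R = pairwise_sq_dists(Y) - target_sq
    np.fill_diagonal(R, 0.0)
    E = (R ** 2).sum() / 8.0  # full matrix counts each pair twice
    G = R.sum(axis=1)[:, None] * Y - R @ Y
    return E, G.ravel()


def refine(X, Y0, maxfun=200):
    """Refine the SVD projection with SciPy's truncated-Newton method.

    SciPy's "TNC" solver is a stand-in here; it is not TNPACK.
    """
    n, k = Y0.shape
    target_sq = pairwise_sq_dists(X)
    res = minimize(distance_error, Y0.ravel(), args=(target_sq, n, k),
                   method="TNC", jac=True, options={"maxfun": maxfun})
    return res.x.reshape(n, k)


def accuracy(X, Y, tol=0.10):
    """Percentage of pairwise distances in Y within tol of those in X."""
    iu = np.triu_indices(len(X), k=1)
    d_orig = np.sqrt(pairwise_sq_dists(X)[iu])
    d_proj = np.sqrt(pairwise_sq_dists(Y)[iu])
    return 100.0 * np.mean(np.abs(d_proj - d_orig) <= tol * d_orig)


# Synthetic stand-in for a scaled descriptor matrix (not real chemical data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 8))
Y_svd = svd_project(X_demo, k=2)
Y_ref = refine(X_demo, Y_svd)
print("accuracy after SVD:", accuracy(X_demo, Y_svd))
print("accuracy after refinement:", accuracy(X_demo, Y_ref))
```

Raising k in svd_project (for example to 10) is one way to explore how the percentage of well-preserved distances grows with the projection dimension, as the abstract reports for its own datasets.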
