We propose a self-consistent approach to analyzing
knowledge-based atom–atom potentials used to calculate
protein–ligand binding energies. Ligands complexed
to actual protein structures were first built using the
SMoG growth procedure (DeWitte & Shakhnovich, 1996)
with a chosen input potential. These model protein–ligand
complexes were used to construct databases from which knowledge-based
protein–ligand potentials were derived. We then tested
several modifications to such potentials and
evaluated each by its ability to reconstruct
the input potential from the statistical information available
in a database of model complexes. Our data indicate
that the most significant improvement resulted from properly
accounting for the following key issues when estimating
the reference state: (1) the presence of significant nonenergetic
effects that influence the contact frequencies and (2)
the presence of correlations in contact patterns due to
chemical structure. The most successful procedure was applied
to derive an atom–atom potential for real protein–ligand
complexes. Despite the simplicity of the model (pairwise
contact potential with a single interaction distance),
the derived binding free energies showed a statistically
significant correlation (coefficient ∼0.65) with experimental binding
scores for a diverse set of complexes.
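
As an illustration of the self-consistency test described above, the following minimal sketch derives a pairwise contact potential from contact counts and compares it with a known input potential. It is not the authors' implementation. The function names, the pseudocount, the units (kT = 1), and in particular the reference state (the average contact probability over all atom-type pairs) are assumptions made here for illustration. The abstract states that the choice of reference state is the key issue, so this simple choice should be read as a baseline only.

```python
import math

def derive_contact_potential(observed, possible, pseudocount=1e-9):
    """Inverse-Boltzmann style contact potential in units of kT.

    observed[pair]: number of contacts seen for an atom-type pair.
    possible[pair]: number of pairs of that type that could have been in contact.
    Reference state (assumption): mean contact probability over all pairs.
    """
    p_ref = sum(observed.values()) / sum(possible.values())
    potential = {}
    for pair, n_possible in possible.items():
        p_pair = observed.get(pair, 0) / n_possible
        # Pairs in contact more often than the reference get negative energy.
        potential[pair] = -math.log((p_pair + pseudocount) / (p_ref + pseudocount))
    return potential

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def reconstruction_score(input_potential, derived_potential):
    """Correlation between input and derived potentials over shared pairs."""
    pairs = sorted(set(input_potential) & set(derived_potential))
    return pearson([input_potential[p] for p in pairs],
                   [derived_potential[p] for p in pairs])

# Toy counts (hypothetical data, not from the paper).
observed = {("C", "C"): 30, ("C", "O"): 10, ("N", "O"): 20}
possible = {("C", "C"): 100, ("C", "O"): 100, ("N", "O"): 100}
derived = derive_contact_potential(observed, possible)
```

In a self-consistent test, `observed` and `possible` would be tallied from model complexes built with a chosen input potential, and `reconstruction_score` would then indicate how well a given derivation procedure recovers that input.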