The problem of predicting the 3-D structure of a protein from its amino acid sequence using computer algorithms has challenged scientists for nearly half a century. The structure of a protein is essential for understanding its function, and hence accurate structure prediction is of vital importance in modern applications such as protein design in biomedicine. A powerful approach to structure prediction is to search for the conformation of the protein that has minimum potential energy. However, owing to the size of the conformational space, efficient exploration remains a bottleneck for energy-guided computational methods, even with the aid of known structures in the Protein Data Bank.
In this talk, I will first introduce this large-scale exploration problem from the perspective of data science. Then, I will present a new method for building segments of protein structures that is inspired by sequential Monte Carlo and enables faster exploration than existing methods. Finally, I will apply the method to examples of real proteins and demonstrate its promise for improving the low-confidence segments of 3-D structure predictions.