Friday, February 16, 2018

A novel method for discovering anti-aging oligopeptides

After a recent presentation at the Rogue Hack Lab by a bright young man named Luke Bryan on support vector machines I found my mind wandering down interesting new corridors while asking myself how these SVMs could be used in a novel fashion.

It occurred to me that a cascading stack of these could be used to bifurcate any data with an inherent vector or bias into subgroups/subsequences in an efficient spectrum fashion.

It popped into my mind that one prominent inherent bias in genomic information is the life-span duration of species. It is a simple matter to arrange a collection of peptide coding genomic data along this spectrum of life-span lengths and train a cascade of SVMs to discern where along this spectrum any block of DNA would fall.

To move this towards actual discovery of novel oligopeptides one would then just need to train an LSTM recurrent neural net on samples of DNA sequences from along this spectrum in order to generate novel oligopeptide which share the probabilistic characters of the original spectrum. The resultant outputs when passed through the above trained SVMs would be scored by where they fell in the spectrum. Any peptides sequences which scored past the longest lived organisms would by definition be from the informational sampling space of an organism with super-spectral lifespan, even if the hypothetical organism doesn't actually exist.

Here's a graphic to help demonstrate the method:
Edit (2/20/2018) interesting supportive work using a related but different method drawn from GAN construction: Generating and designing DNA with deep generative models