As the summary says, the basic approach (evolutionary data -> position coupling -> distance restraints -> structure solver) is actually quite old; the key paper dates back to 2011.
The distance-constraints-to-3D-structure part in particular is very old - I was calculating structures from experimentally determined distances 30 years ago. You need a surprisingly small number of fairly weak restraints ("these atoms are between 3 and 5 Angstroms apart") to determine a 3D structure, provided a decent number of them are long range.
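To make the "restraints -> structure" step concrete, here is a minimal sketch of the idea (not the algorithm any particular package uses): treat each interval restraint as a flat-bottomed penalty and run gradient descent from random starting coordinates. The function name and parameters are made up for illustration. With enough long-range restraints the result converges (up to rotation, translation, and possibly mirror image) to the true structure.

    import numpy as np

    def embed_from_restraints(n_atoms, restraints, n_steps=2000, lr=0.01, seed=0):
        """Toy distance geometry: recover 3D coordinates from sparse interval restraints.

        restraints: list of (i, j, d_min, d_max) tuples, e.g. (4, 57, 3.0, 5.0) meaning
        "atoms 4 and 57 are between 3 and 5 Angstroms apart". The penalty is zero inside
        the interval and quadratic outside it.
        """
        rng = np.random.default_rng(seed)
        x = rng.normal(scale=5.0, size=(n_atoms, 3))   # random starting coordinates
        for _ in range(n_steps):
            grad = np.zeros_like(x)
            for i, j, d_min, d_max in restraints:
                diff = x[i] - x[j]
                d = np.linalg.norm(diff) + 1e-9
                if d < d_min:
                    err = d - d_min      # too close: push apart
                elif d > d_max:
                    err = d - d_max      # too far: pull together
                else:
                    continue             # restraint satisfied
                g = err * diff / d
                grad[i] += g
                grad[j] -= g
            x -= lr * grad
        return x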
What they have done is execute better.
However, the problem they are working on -
sequence -> structure
- though it's been a long-term 'holy grail', is in practice not that useful!
The models typically aren't quite good enough, they don't predict interactions, and experimental methods for determining structures have also come on in leaps and bounds.
As the article briefly mentions, what you really want to do is go the other way.
Designed novel structure -> protein sequence to make it.
One way to do that, if you have a function going the other way (like AlphaFold - let's ignore limitations for now, e.g. does a knowledge-based approach work well for completely novel folds?), is some sort of heuristic search over sequences - however, the search space is huge, and a step time of hours per candidate isn't going to cut it.
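A minimal sketch of such a search loop, assuming a hypothetical predict_structure oracle (an AlphaFold-like forward model) and a similarity_to_target score - both names are placeholders, not real APIs. It also makes obvious why an oracle that takes hours per call is the bottleneck: every mutation costs one full prediction.

    import random

    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

    def design_sequence(target_structure, length, n_iters,
                        predict_structure, similarity_to_target):
        """Naive hill-climbing over sequence space using a forward predictor as an oracle.

        predict_structure(seq) -> predicted structure (hypothetical forward model)
        similarity_to_target(structure, target) -> float, higher is better
        Each iteration mutates one residue and keeps the change if the predicted
        structure gets closer to the designed target. Real design methods are far
        more sophisticated; this only shows the shape of the search.
        """
        seq = [random.choice(AMINO_ACIDS) for _ in range(length)]
        best = similarity_to_target(predict_structure("".join(seq)), target_structure)
        for _ in range(n_iters):
            pos = random.randrange(length)
            old = seq[pos]
            seq[pos] = random.choice(AMINO_ACIDS)
            score = similarity_to_target(predict_structure("".join(seq)), target_structure)
            if score >= best:
                best = score        # keep the mutation
            else:
                seq[pos] = old      # revert
        return "".join(seq), best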
I recall reading about a result a couple of years ago supposedly demonstrating that "synonymous" DNA codons were in fact not synonymous, because the ribosome took systematically different amounts of time to process them, and the difference in construction time resulted in different folding for the protein.
This would imply that the problem "sequence -> structure" is not well defined, at least if the sequence in question is the sequence of peptides making up the protein and not the sequence of codons making up the gene that codes for the protein.
Do you know anything about this? Am I just making it up?
The phenomenon you described does exist[0], and similar effects (though rarely as severe) in cotranslational folding have been known for some time now. AFAIK it's not studied that intensely, as it's hard to study experimentally, and it is not expected to have a significant effect on final protein folding. During the folding process proteins often undergo partial misfolds and unfold again (the protein is basically doing a gradient-descent search for its lowest-energy state), so a small misfold at the beginning of the translation process will rarely make it into the final fold.
Apart from cotranslational effects, there are other mechanisms that interfere with folding, like chaperone proteins, which also make it hard to phrase this as a straight "sequence -> structure" prediction problem, though they only apply to a small fraction of all proteins (in humans).
I think it's much more likely that the different codon usage leads to different rates of synthesis, not differently folded proteins. But that's a complex area. Many proteins do not fold to their native structure spontaneously, and there are other proteins that refold them "correctly".
It's not clear to me that you could truly demonstrate substantially different folding due to codon usage like that, on an experimental level, to make a general statement about all proteins.
Well, I'm not sure what you mean by "sequence -> structure".
Historically, people have used the fact that small globular proteins refold spontaneously and rapidly to their native state to support the idea that there is a single, unique structure encoded by a specific sequence. That's a helpful if ultimately limited approach (as we observe many proteins that don't fold rapidly to a single native structure).
That's a reason that evolution-based methods, which use statistics about families of related proteins to estimate distances between pairs of amino acids (in 3D space), are more effective; many times in biology we can use evolutionary relationships between proteins to infer things that would be hard to determine through experiments or rigorous, thorough simulations.
But it's important to appreciate there are a large number of proteins that don't fold to a single unique structure rapidly, and there are many ways this can be the case and many different biologically relevant behaviors that depend on these properties. The tools from CASP are much less useful for proteins that violate the assumptions of Anfinsen's dogma; although the evolutionary data is helpful there too, it can often be a lot more challenging to deconvolute the signal.
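To make the evolution-based idea above concrete, here is a bare-bones sketch of the underlying statistical signal - mutual information between columns of a multiple sequence alignment used as a crude contact predictor. Real methods (DCA and the coevolution features used in modern predictors) also correct for phylogeny and indirect couplings; this is only the core intuition.

    import numpy as np
    from collections import Counter

    def mutual_information_contacts(msa):
        """Crude coevolution signal from a multiple sequence alignment (MSA).

        msa: list of equal-length strings (aligned sequences from one protein family).
        Returns an (L, L) matrix of mutual information between alignment columns;
        strongly co-varying column pairs tend to be close together in 3D.
        """
        L = len(msa[0])
        n = len(msa)
        cols = [[seq[k] for seq in msa] for k in range(L)]
        mi = np.zeros((L, L))
        for i in range(L):
            for j in range(i + 1, L):
                pi = Counter(cols[i])
                pj = Counter(cols[j])
                pij = Counter(zip(cols[i], cols[j]))
                val = 0.0
                for (a, b), c in pij.items():
                    # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
                    val += (c / n) * np.log(c * n / (pi[a] * pj[b]))
                mi[i, j] = mi[j, i] = val
        return mi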
Ultimately, "what is the right problem"? The one that makes the most money? Produces the most "useful" scientific result? Is accessible with today's technology? For now, there's plenty of value in these sorts of competitions.
Personally I think the "right problem" is: "given a collection of diseases, use experimentally derived data and clever math, to discover biological treatments that reduce the total suffering from those diseases, subject to monetary and ethical constraints". That's what pharma attempts to do, although not particularly well. Others might say simply solving interesting problems like protein folding is inherently valuable as the right problem.
It's true that different codons are translated at different speeds, but I've never heard that this leads to differences in folding. Translation speed is nevertheless used to signal a shortage of amino acids or to implement modifications.
When you're doing protein expression, it's standard procedure to do codon optimization in order to adapt the codons to your host expression system. When you're expressing a human protein in yeast or E. coli, the codons will be quite different but the folding is the same. If translation didn't show this stability, it would be difficult to use foreign expression systems at all.
It's biology - so there are exceptions and oddities everywhere.
Imagine software that evolved by trial and error rather than being designed - it's not 'clean'... there are no absolutes.
It's well known that codons affect translation rate, and translation rate can affect the kinetic pathway of folding.
When a protein folds (even away from the ribosome) it isn't folding in isolation - it's in solution, being bombarded by the Brownian motion of other molecules.
It's amazing you are alive at all really - it also suggests most proteins would need the functional state to be strongly energetically preferred.
So there are potentially lots of things beyond just the protein sequence that could be important variables.
However some proteins will reversibly fold/unfold in simple solution all day long - so it makes sense to start there.
Is the bit-by-bit construction of the protein an issue? I mean, surely the first half would have already folded into some shape by the time the second half was being added on? And then the rest of it would not necessarily fold the same way as if you considered the entire completed protein starting to fold from scratch.
I am not an ML engineer (except when I program in ML, of course ;)). But this sounds a lot like the following:
1. We had a model that worked in principle, but the search space was practically infeasible.
2. We made an observation that a different model might exist that makes the search space irrelevant.
3. We threw ML at it.
4. Now we might have a model that fulfills (2) but we cannot be sure because we used a black-box approach.
5. Somehow the results are exciting. Better results would be really exciting.
6. We hope that more data yields these better results.
Is that correct? Am I the only one to lament these black-box approaches? Should there not be a bunch of people now studying the learned models to figure out if much better results can actually exist?
Think of it as an engineering problem rather than a science problem.
In much of drug discovery/development (and disease research), being able to predict protein structure would be very valuable. Being able to quickly find candidate structures (that can then be searched for in the lab) speeds things up immensely. Reducing false positives (or just coming up with possibilities at all) is a huge win.
But you’re right that this probably doesn’t help the protein theoretician much if at all. We already “know” how it works (it’s just thermodynamics and quantum mechanics) and of course have no idea how it works (“well they wiggle around until they find a low energy state” doesn’t really tell you anything). But that doesn’t keep this from being exciting.
Protein structure prediction, at the current levels of precision (and I include AlphaFold here), is not useful for drug discovery.
It’s the sort of thing researchers say to get grants, but as a distant goal, not a practical reality.
For structure-based drug discovery (which isn’t even the majority of drug discovery), the details are what matter (e.g. “does this water molecule mediate a binding interaction, or do the sidechains shuffle a bit, and kick the water out?”), and these methods don’t even come close to predicting detailed interactions.
Metrics in this space are focused on “general correctness” of protein backbone conformation. Success is to achieve a kind of blurry view of the overall shape of the molecule, and drug design is trying to predict specific atomic interactions. They’re two wildly different problems.
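For concreteness, the kind of "general correctness" metric in question is a superposition-based comparison of backbone (C-alpha) coordinates; CASP's headline number is GDT_TS, but the simpler RMSD shows the flavour. A minimal sketch using the Kabsch algorithm:

    import numpy as np

    def backbone_rmsd(P, Q):
        """RMSD between two (N, 3) arrays of C-alpha coordinates after optimal superposition.

        Kabsch algorithm: centre both sets, find the rotation minimizing the squared
        deviation via SVD, then take the root-mean-square distance. Scores like this
        (or the coarser GDT_TS used in CASP) measure overall backbone shape, not the
        atomic-level interaction detail drug design needs.
        """
        P = P - P.mean(axis=0)
        Q = Q - Q.mean(axis=0)
        H = P.T @ Q
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against a reflection
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        P_rot = P @ R.T
        return np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1)))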
About the best you can say is that if we had a generalizable model of physics that could predict protein structure, it might also be able to do a good job of evaluating how a small molecule binds to a protein target. But even that is a huge leap, and when you start using black-box methods like AlphaFold to specifically solve the problem of structure prediction, it’s not really clear that generalization is even possible.
There are potential practical uses in drug discovery for a method that can design a protein which takes a particular shape, but even that is pretty different from what AlphaFold actually does.
There are indeed many different applications of ML to drug discovery, but AlphaFold is a niche technique, and protein structure prediction is not practically useful, in general.
I see. My point is that all these efforts are really validating deep learning as an approach for solving previously intractable biophysics problems. Science is a series of stepping stones.
There are many areas where ML is being used to advance drug discovery and other kinds of practical, difficult biophysics problems. Casual readers would do well not to get too fixated on this particular area.
Ha... I wasn’t repeating it to make some kind of assertion of dominance, just to explain that I’m aware of the point you’re making, and that it’s related to, but different from, my own.
>>> Am I the only one to lament these black-box approaches?
Far from it. Prediction looks like one tool in the arsenal for better understanding. One still has to correlate the structure with the complex interactions in vivo. Even using AI in classification mode - segmenting a large atlas of tumor cells and identifying a dozen or so classes of cell anomalies - may lead to faster breakthroughs in immunotherapy.
What I am trying to wrap my head around is the synthesis problem. Say AlphaFold generates a promising candidate. One that does not exist naturally. You still need the DNA or mRNA transcription sequence to synthesize the protein, right? Won't some candidates simply be too complex and unstable to reliably produce using existing mammalian or baculovirus platforms?
>Won't some candidates simply be too complex and unstable to reliably produce using existing mammalian or baculovirus platforms?
You can add that to the objective function your model is optimizing, penalizing outputs that are "too complex or unstable to reliably produce".
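A minimal sketch of that idea, with structure_score and manufacturability_penalty as hypothetical placeholders for whatever predicted-fit and producibility scores you have (neither is part of AlphaFold):

    def design_objective(candidate, structure_score, manufacturability_penalty, lam=0.5):
        """Combined objective for a design search or training loop.

        structure_score(candidate)           -> higher is better (fit to the target shape)
        manufacturability_penalty(candidate) -> higher is worse (aggregation propensity,
                                                expression difficulty in the host, etc.)
        lam trades off the two terms; candidates predicted to be hard to produce get
        pushed down even if their predicted structure looks good.
        """
        return structure_score(candidate) - lam * manufacturability_penalty(candidate)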
You can use permutation feature importance to tell you which inputs are most strongly associated with the condition. Way better than the simple per-feature correlation methods most statistical analyses use.
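For readers unfamiliar with it, a small self-contained example using scikit-learn's permutation_importance on synthetic data (the data and model here are made up purely for illustration): shuffle one feature at a time on held-out data and measure how much the model's score drops.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Synthetic data: only features 0 and 2 actually drive the label.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=500) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

    # Permute each feature on the test set; the score drop is its importance.
    result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
    for i in np.argsort(result.importances_mean)[::-1]:
        print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")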
This is far from the only time this has happened, though...Many moons ago I was required to take a class in fluid dynamics in College. It was all observational best-fit statistical estimation that kinda modeled observed behavior.
Either it was a poorly taught class, it was aimed at civil engineering and similar where you want simple models with broad applicability and don't care about the wide error bars, or maybe you remember best the observational stuff.
But fluid mechanics has a very deep theoretical underpinning, and generally has interpretable models. I'd suggest skimming through a copy of either Lamb's or Batchelor's book to see how far you can get with just pen and paper and no statistical input at all.
It WAS for Civil Engineering and it was 30 years ago. FEA was JUST starting to be a thing and it sure didn't reach down into anything I encountered.
Looking back through Google, I may be dimly remembering the calculations dealing with turbulent flow. It was a whole different experience for a college junior who was used to the relatively simple equations from Physics, Statics, and Dynamics.
Sounds like the standard way to teach fluids for civ.eng. All the flows you'll ever encounter will be turbulent, no point in learning about all the beautiful results that are mainly for laminar flows.
The books I mentioned are classics in the field and were first published in 1895 and 1967, respectively. Both are still in print. No computers, just advanced math (vector calculus etc).
Unlike in other cases, with protein structure we really only care about the final structure; most of the time no one cares how the chain folded to get there. ML-based methods seem perfect for that.
Not an expert but I have read a lot on the subject recently.
Methods in this domain were already not truly model-based (compared to things like physical equations). It is mostly observations on existing proteins coupled with a form of gradient descent, so for this particular application things have not degraded. (I am aware that there is more to it; this is just a quick summary.)
But to be honest, it is far from a solved problem and I expect more breakthroughs from deep understanding and modeling than ML.
> I expect more breakthroughs from deep understanding and modeling than ML
Huh, I would've said exactly the opposite. The feature space of the variations of amino acid sequences is so big that I wouldn't bet on breakthroughs derived from understanding. Most recent advances seem to focus on how specific sequence motifs interact with each other, which are generally only applicable to certain kinds of proteins, but not protein folding as a whole.
My prediction would be that the next breakthroughs that push the field over the usability threshold will derive from more general machine learning advances.
I think deep learning can only get us so far with the existing, limited, data. I doubt it will be enough for precise structure reconstruction (but I would love to be wrong).
What it can be is a force multiplier for every advance in our understanding of why those sequences fold into those structures.
But I have to admit that, with years of head start, the research has not gotten there, so I might be wrong.
> Now we might have a model that fulfills (2) but we cannot be sure because we used a black-box approach.
It's not like the alternative is any different. Being sure you have the right answer is not viable, the other methods of getting there are just more transparent heuristics.
This is 90% of science these days: just keep tweaking numbers in the model until it fits what you want to see. The why and the how is not really something we have the capability or tooling to tackle.
I got bored with physics really quickly when I realized this is what everyone does.
What do you mean by “black box” model? How do you know when a model “explains” something or not? How do you know if you are missing a key feature or degree of freedom in a model? Suppose you have two models, A and B, where A gives much poorer predictions than B. Is it possible for A to “explain” things better when it is not capable of producing adequate predictions? Since nature itself is not boiled down to logical elements like math proofs, what about the inherent ambiguity of language or mental models when assessing explainability? Did we really explain anything, or just decompose a problem into units of thought that artificially feel compelling for human minds at this moment of history? What would distinguish a “real explanation” from a convenient fiction, if not purely predictive capability?
Black box model is a perfectly well understood term of art.
It means a model which has a somewhat opaque internal working. A lot of modern ML approaches treat the model as a black box in that it is not particularly clear how the features actually interact to reach the prediction.
I have worked professionally in machine learning for over ten years now, and I deeply dispute what you say; many other practitioners do as well.
> “not particularly clear how the features actually interact to reach the prediction.“
This is quite true of many models, such as linear regression in the wild. You may also have a clear but wrong picture of how features interact, e.g. looking at coefficients of interaction terms in a misspecified linear model.
See for example, “The Mythos of Model Interpretability”
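A tiny illustration of a related failure mode of misspecified linear models, on synthetic data made up for this example: the true signal is a pure interaction the model can't represent, and the fitted coefficients tell a confident but wrong "nothing matters" story.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # True relationship: y = x1 * x2 (pure interaction). Fit a linear model with
    # only main effects; its coefficients come out near zero, "clearly" showing
    # that neither feature matters - an interpretable and misleading explanation.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=1000)

    model = LinearRegression().fit(X, y)
    print("coefficients:", model.coef_)   # both approximately 0
    print("R^2:", model.score(X, y))      # also approximately 0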
Without a coherent definition of what “not black box” or “explainable” means scientifically, the buzzword of “black box” is also meaningless and is more of a political game to be a gatekeeper over what models are allowed to be used than any kind of honest intellectual inquiry.
I’ve worked professionally on systems where misspecified linear models and text models using simple ngram boosting were vastly more inscrutable than comparable neural networks or dimensionality reduction models for the same applications.
Nobody has any scientifically cogent idea of what makes a model “explainable” other than arguing semantics.
It may be because I'm not an expert that I think I understand, but the gist of making a model explainable sounds straightforward from what I've read. You train a model that does who knows what, and then you define a very limited language and use ML to match the first model as well as possible. If the language is simple enough, humans can understand it, and according to what I read, it likely generalizes better than the original model as a bonus. I assume it's harder than it sounds though.
Quote:
"1. There’s a tiny functional language based on a small number of side-effect free combinators
2. For a given task, a program template (which the authors call a sketch), further constrains the set of programs that can be learned for the problem in hand. This also very handily constrains the search space of course, helping to make learning a suitable policy program tractable.
3. To help guide the search within the set of programs conforming to the sketch, a standard reinforcement learning algorithm is used to learn a (black box) policy.
4. The black box policy is used as an oracle (the Neural Policy Oracle), and a neurally directed program search (NDPS) tries to find the sketch-conforming program that behaves as closely to the oracle as possible."
...this seems kind of similar to the recent success in solving mathematical equations using deep learning. The language translation paradigm seems to have a lot more potential than some realized.
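A much simpler cousin of that idea - not the neurally directed program search from the quote, just plain distillation of a black-box model into a small decision tree, which is one common way to approximate the "fit a limited language to the black box" step. Data and models here are toy examples for illustration only.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Train a "black box" model, then fit a small tree to *its predictions*
    # (not the original labels). The tree is the human-readable surrogate;
    # how faithfully it tracks the black box is measured by their agreement.
    X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=0)
    black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))

    fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
    print(f"surrogate agrees with black box on {fidelity:.0%} of inputs")
    print(export_text(surrogate, feature_names=[f"f{i}" for i in range(8)]))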
Yes, you are oversimplifying what is required to make a model explainable. The logical mechanism of the functional form of the model has very little to do with explainability. Explainability involves various kinds of model checking, overfitting analysis, tests for confounding variables, multicollinearity, interaction effects, model misspecification, etc.
Merely decomposing a prediction into say a linear combination of predictors does not, itself, provide any type of explanation unless a variety of more complicated assumptions about statistical model checking turn out to be true.
Even worse, because the naive approach is to just treat those regression coefficients as if they do automatically give explanatory power or feature-wise attribution, people misrepresent things unwittingly and don’t carry out enough robust model checking, leading to wrong explanations that appear to have deceptive degrees of confidence associated with them.
"The logical mechanism of the functional form of the model has very little to do with explainability"
I'm not sure we are talking about the same thing, or which of the three sources in my comment you are reacting to.
My understanding of "explainability" in this context is that it means "humans can understand why the model gives a certain prediction".
It seems like you are using it to mean "humans can understand why the thing modeled does something (presumably by referring to the model)". That isn't what I thought people meant by the term, nor does it seem like a reasonable goal to me.
Do we agree these are distinct ideas? Feel free to elaborate on where I've gone wrong.
The fact that the software in question is not publicly available or runnable certainly makes it a "black box", regardless of whether the underlying models are human-interpretable. EDIT: my apologies, apparently it is available: https://github.com/deepmind/deepmind-research/tree/master/al... I must have mixed this up with a different black-box-advertised-in-scientific-journal story.
Section 4.1 of the paper he linked discusses extensively the limitations of linear model interpretability. Here's an example:
"With respect to algorithmic transparency, [the claim that linear models are more interpretable]
seems uncontroversial, but given high dimensional or heavily engineered features, linear models lose simulatability or decomposability, respectively."
Given how many interesting problems are high-dimensional or solved through heavy feature engineering, you must be lucky to live in the happy space where problems are either low-dimensional enough to not need heavy feature engineering or where the client is already familiar with the high-dimensional features you're going to limit the model to using.
I especially enjoy the segments where he addresses, head-on, "an indictment of academic science" and "an indictment of pharma". He pulls no punches in saying how embarrassing it is for pharma and academia to be literally outclassed by DeepMind.
A great quote:
"If you think I’m being overly dramatic, consider this counterfactual scenario. Take a problem proximal to tech companies’ bottom line, e.g. image recognition or speech, and imagine that no tech company was investing research money into the problem. (IBM alone has been working on speech for decades.) Then imagine that a pharmaceutical company suddenly enters ImageNet and blows the competition out of the water, leaving the academics scratching their heads at what just happened and the tech companies almost unaware it even happened."
Nobody is embarrassed here. Pharma doesn't work on protein folding prediction. Now they can take the published results and code and use them, but protein fold prediction has not been, is not, and probably never will be the rate-limiting step in novel drug discovery and development.
Really good read! They requested the author submit it as a journal letter. This quote stuck out to me:
"Keep in mind that unlike other areas of machine learning, new protein structures are not appearing at an increasing rate, and so waiting things out will not help."
"The resulting algorithm outperformed all entrants at the most recent blind assessment of methods used to predict protein structures, generating the best structure for 25 out of 43 proteins, compared with 3 out of 43 for the next-best method."
This is remarkable. Teams of researchers all over the world have taken part in the CASP competitions for decades. Many attempts using machine learning and ANNs have been made. What is it about DeepMind that allowed them to make such a breakthrough? Do they have expertise in deep learning that does not exist in academia? Incredible amounts of compute that academia cannot afford?
The techniques DM used are popular in academia right now, too. Using evolutionary data to shortcut hard problems has been key to advances in protein research for decades. DM just executed better - a combination of smart people, some good ideas, and lots of experimentation. Never underestimate the ability of a company that exists to win games to win competitions.
And never underestimate the amount of money that a big tech company can throw at a random problem. DeepMind probably blew through the equivalent of multiple R01 grants writing that paper.
If their salaries are anything like what Bay Area companies are shelling out for top AI engineers, each one of those 10 people is probably costing as much as 10 grad students in any of the other labs working on this problem. Big Biotech does not usually have the money to get into a bidding war for engineering talent with companies like Google.
"There are dozens of academic groups, with researchers likely numbering in the (low) hundreds, working on protein structure prediction. We have been working on this problem for decades, with vast expertise built up on both sides of the Atlantic and Pacific, and not insignificant computational resources when measured collectively. For DeepMind’s group of ~10 researchers, with primarily (but certainly not exclusively) ML expertise, to so thoroughly route everyone surely demonstrates the structural inefficiency of academic science."
"What is worse than academic groups getting scooped by DeepMind? The fact that the collective powers of Novartis, Pfizer, etc, with their hundreds of thousands (~million?) of employees, let an industrial lab that is a complete outsider to the field, with virtually no prior molecular sciences experience, come in and thoroughly beat them on a problem that is, quite frankly, of far greater importance to pharmaceuticals than it is to Alphabet. It is an indictment of the laughable “basic research” groups of these companies, which pay lip service to fundamental science but focus myopically on target-driven research that they managed to so badly embarrass themselves in this episode."
I completely disagree with his interpretation. It would be surprising if a group that concentrates some of the top expertise in AI weren't able to make a big impact on a well-defined optimization problem that has been studied for decades.
I think a lot of the commentary is missing two essential points:
1. Protein structure prediction is to a large extent a solved problem for small-ish, soluble targets. AlphaFold is a significant improvement on the current state of the art, but the state of the art was already far enough along that the best computational models in 2007 were good enough to bootstrap experimental structure determination (https://www.ncbi.nlm.nih.gov/pubmed/17934447). In other words, it's not like the entire academic community was stumbling around helplessly in the dark.
2. The value of these predictions to pharmaceutical companies is extremely marginal. Having a high-accuracy model is very helpful but it's rare that the researchers have so little information available that a completely de-novo prediction is necessary. And when they really don't have much information at all, it's usually because the target is sufficiently messy to defy traditional structure determination methods - which means it's almost certainly more than AlphaFold can handle too.
I had the protein folding project running on my computer for a few years. Can these deep learning models be distributed like that, or do they require tightly coupled processors? Seems like the latter, as there was a recent IEEE article on a wafer-scale array of CPUs for deep learning.
Unless I misunderstood (likely, I'm not smart), you still need to calculate the closeness field for a given protein. And that will be thousands of points, each calculated separately. Only once you have that do you give it to ML to find the thermodynamically favourable value.
Folding@home is working on a different problem. Folding@home finds the path a protein takes to arrive at its final structure, not just the final structure. One of the main goals is to understand protein misfolding or where on the path things go astray to result in disease.
After all these years, we are still at it. New methods are regularly evaluated and the simulation software is being refined all the time.
Yes, I was wondering about this too. It's been going on for a very long time – I remember it was shipped with the PS3, I used to leave mine running sometimes to contribute to the effort.