Evariste

Computational assessments of the synthetic accessibility of a given small molecule have seen a significant upsurge in interest in recent years. This has been driven largely by the need to triage the output of large-scale automated de novo designers capable of producing millions of putative small molecules in hours or even minutes. Synthetic accessibility (or tractability) scores range from simple heuristics to machine learning models fitted to the output of slower, more complex route-finding algorithms. A growing body of literature suggests that these scores, and route planning more widely, can support medicinal chemistry efforts and accelerate small molecule discovery.

At Evariste we evaluated a series of both open-source and proprietary solutions for retrosynthesis planning and synthetic accessibility scoring. Although many of them are impressively accurate, we didn’t find that they usefully accelerated our internal projects. We felt that this was because these models were insufficiently dynamic, and were answering the question “How synthetically accessible is compound X?” without any project-specific context.

The more pragmatic question we wanted our model to address was instead “How quickly can I make compound X, relative to compound Y, given the chemistry resource currently available?” This better represents our view that synthetic accessibility is not an inherent property of a compound; rather, it is a property of a compound in the context of an active discovery program.

Medicinal chemists can be extremely good at trading off the need to acquire data rapidly against the utility or information gain associated with each piece of SAR. For machine learning algorithms to compete with teams of chemists, they need to have very specific contextual information available. Generative route designers can be conditioned to start or end with pre-existing intermediates, but we felt we could answer the question above as effectively with a simpler model.

We decided any tool needed to be:

Fast: To be useful, the model needed to be capable of rapidly scoring the output of de novo designers without exorbitant compute resources.
Dynamic: The model must be responsive to changes in building block databases, internal lab notebooks, and project knowledge without the need for computationally intensive retraining.
Cliffy: Synthetic accessibility should not be smooth across chemical space and highly similar structures should have the potential to score radically differently.
Accurate: Although there is no ground truth to benchmark against, the internal chemistry team would need to agree with broad classifications of compounds as being slow/fast to make.

The workflow below was implemented:

Fragment novel designs into building block-like substructures
Search for these substructures in relevant internal and external databases
Generate and report Evariste Synthetic Accessibility (ESA) score based on substructure frequency

Compound Fragmentation

For this we used the BRICS algorithm as implemented in RDKit as it’s fast, somewhat customisable, and has some synthetic logic already built into the underlying fragmentation scheme.

Degen et al. (1) figure demonstrating an implementation of the BRICS algorithm as applied to Sorafenib

Database Selection

We chose to search three different databases: the SureChEMBL database, a curated database of building blocks from various suppliers, and our internal database of synthesised compounds. Respectively, these tell us whether there is any public record of the substructure, whether a simple building block might be available to purchase, and whether we have already made an analogue. To speed up the searching, we pre-computed the BRICS fragments in the databases. This made a negligible difference for the internal databases but made the function an order of magnitude quicker for SureChEMBL, given its size.
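The pre-computation step can be pictured as an inverted index from fragment to occurrence count, built once per database. The sketch below is a minimal illustration under stated assumptions, not the Frobenius implementation: the fragment strings are assumed to have been produced beforehand by a BRICS decomposition (for example with RDKit), the toy data are invented, and the names `build_fragment_index` and `add_compound` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical input: compound id -> set of BRICS fragment SMILES, assumed to
# have been generated once, ahead of time, by a separate fragmentation step.
CompoundFragments = Dict[str, Set[str]]


def build_fragment_index(compounds: CompoundFragments) -> Counter:
    """Count how many compounds in a database contain each fragment."""
    index: Counter = Counter()
    for fragments in compounds.values():
        index.update(fragments)  # a set, so each fragment counts once per compound
    return index


def add_compound(index: Counter, fragments: Iterable[str]) -> None:
    """Register a newly made or newly purchasable compound in place."""
    index.update(set(fragments))


# Toy database with illustrative fragment strings (not real project data).
internal_db = {
    "cmpd-1": {"[16*]c1ccccc1", "[5*]N1CCOCC1"},
    "cmpd-2": {"[16*]c1ccccc1", "[3*]OC"},
}
index = build_fragment_index(internal_db)

# Looking up a new design is now one dictionary access per fragment.
query = ["[16*]c1ccccc1", "[5*]N1CCOC2(CC2)C1"]
counts_before = {frag: index[frag] for frag in query}

# A new building block becomes available; the index is updated directly.
add_compound(index, ["[5*]N1CCOC2(CC2)C1"])
counts_after = {frag: index[frag] for frag in query}
```

Because the index is just a table of counts, adding a compound changes subsequent lookups immediately, which is one way to meet the "dynamic" requirement without any retraining.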

Scoring Function

There are a lot of options in terms of how to treat the frequency of the fragments and no ground truth to allow for optimisation of the weighting of various components. Some general rules we chose to apply:

Substructures with 0 results in any dataframe should receive a significant penalty
Very small fragments will appear too frequently to contribute usefully to the model
It should be possible to weight the results such that matches in local databases contribute more to the score than substructures in global databases

ESA function as implemented in the Frobenius Candidate platform
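The three rules listed above could be combined in many ways. As a hedged illustration only (this is not the ESA function shown in the figure), the sketch below takes a weighted log-frequency per fragment, skips very small fragments, and applies a fixed penalty when a fragment has no hits in any of the searched databases. The weights, the penalty, the size cut-off, the database names, the toy counts, and the function names are all assumptions made for the example.

```python
import math
from typing import Mapping, Tuple

# All constants are illustrative assumptions, not the published ESA values.
MIN_HEAVY_ATOMS = 4      # fragments smaller than this are ignored
MISSING_PENALTY = -5.0   # contribution of a fragment with no hits anywhere
DB_WEIGHTS = {"internal": 3.0, "building_blocks": 2.0, "patents": 1.0}


def fragment_score(counts: Mapping[str, int], weights: Mapping[str, float]) -> float:
    """Weighted log-frequency of one fragment across the databases."""
    if sum(counts.get(db, 0) for db in weights) == 0:
        return MISSING_PENALTY
    return sum(w * math.log10(1 + counts.get(db, 0)) for db, w in weights.items())


def toy_accessibility_score(
    fragments: Mapping[str, Tuple[int, Mapping[str, int]]],
    weights: Mapping[str, float] = DB_WEIGHTS,
) -> float:
    """Mean fragment score over fragments large enough to be informative.

    `fragments` maps fragment SMILES -> (heavy_atom_count, {db_name: hit_count}).
    """
    scores = [
        fragment_score(counts, weights)
        for heavy_atoms, counts in fragments.values()
        if heavy_atoms >= MIN_HEAVY_ATOMS
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Invented hit counts for a design made of common fragments.
common = {
    "[16*]c1ccccc1": (6, {"internal": 40, "building_blocks": 900, "patents": 100000}),
    "[5*]N1CCOCC1": (6, {"internal": 12, "building_blocks": 300, "patents": 50000}),
    "[3*]OC": (2, {"internal": 500, "building_blocks": 5000, "patents": 900000}),
}

# Same design with the morpholine swapped for a fragment seen nowhere.
unusual = {k: v for k, v in common.items() if k != "[5*]N1CCOCC1"}
unusual["[5*]N1CCOC2(CC2)C1"] = (8, {"internal": 0, "building_blocks": 0, "patents": 0})

score_common = toy_accessibility_score(common)
score_unusual = toy_accessibility_score(unusual)
```

Averaging with a large penalty keeps this toy score "cliffy": replacing a single common fragment with an unseen one lowers the score sharply even though the rest of the molecule is unchanged. Taking the minimum over fragments instead of the mean would be an equally defensible choice.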

The model is fast: scoring a random sample of 1,000 Enamine REAL compounds takes around 30 seconds on 16 cores. This is comfortably fast enough to support large-scale design runs across multiple projects simultaneously.

Below we take a toy example from the patent literature and apply some of the design changes typically seen in the generative design process. Compound 1 in the case below is Gefitinib, with four analogues chosen at random.

Compound 2 unsurprisingly scores similarly. In our experience a difference of 0.4 does not represent a significant change in synthetic accessibility. Compounds 3 and 4 receive meaningfully lower scores. The pyrimido-pyridine core of compound 3 has some examples in the patent literature, albeit with different substitution patterns, and could probably be accessed relatively readily with a novel synthetic route. Compound 4 contains a highly unusual 8-membered morpholine with two spirocyclic cyclopropanes and quite reasonably receives an even lower score. Combining these two groups into compound 5 results in the lowest-scoring compound of all. One of the benefits of this system is that, were a provider to add the morpholine analogue to an appropriate database, the score for this compound would improve significantly. This is exemplified by compound 6 below, which contains an analogous morpholine building block that can be purchased but is less well represented in the patent literature and is available from fewer suppliers.

‘Intermediate’ design demonstrating responsiveness of ESA scores to building block databases

The examples above are useful, but the key to this model's performance is the specific context provided by an ongoing project. As that context accumulates, the scores should feed progressively better information into the automated design platform, allowing us to rank compounds by their probability of delivering a molecule with the properties we require (the target product profile, or TPP), weighted by the time required to make them. The results below were generated using one of our internal projects, so we can’t share any structures, but they do illustrate the agreement between the model and the internal chemistry team. Of the 20 lowest-scoring compounds:

For 17, no suitable building block could be purchased and delivered within two weeks, a new route to the core was required, or both
For 3, we had already tried to make a close analogue and it had proved unreasonably difficult to access

All 20 molecules with the lowest ESA scores were rated by the chemists as difficult to synthesise. Conversely, the 20 molecules with the highest ESA scores were rated as easy to access in a short period of time.

We can also see that the distribution of scores is shifted for different designers, which makes sense given the underlying methods of the generative design algorithms. The matched molecular pair (MMP) designer, for example, makes changes based on matched pairs found in the literature, which results in designs that generally score highly by the ESA metric.

Distribution of ESA scores across molecules generated by two different designers

After using ESA scores across multiple projects, we have found that they are a good representation of the speed with which we can access novel small molecules. This means we can make principled decisions about prioritising targets based on both the predicted properties and how quickly we can access them. The combination of these two pieces of information means the Frobenius Candidate platform can scale across multiple projects simultaneously with limited supervision from medicinal chemists.
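To make the idea of combining predicted properties with speed of access concrete, here is a minimal sketch of one possible prioritisation rule. It is an assumption-laden illustration, not the platform's ranking logic: the `Design` class, the `priority` and `rank` functions, the linear rescaling of ESA, the assumed 0 to 10 ESA range, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Design:
    name: str
    p_tpp: float  # predicted probability of meeting the target product profile
    esa: float    # synthetic accessibility score; higher means faster to make


def priority(design: Design, esa_floor: float = 0.0, esa_ceiling: float = 10.0) -> float:
    """Probability of success scaled by a 0-1 accessibility weight.

    The linear rescaling and the default ESA range are assumptions of this sketch.
    """
    access = (design.esa - esa_floor) / (esa_ceiling - esa_floor)
    access = min(1.0, max(0.0, access))  # clamp scores outside the assumed range
    return design.p_tpp * access


def rank(designs: List[Design]) -> List[Design]:
    """Order designs from highest to lowest priority."""
    return sorted(designs, key=priority, reverse=True)


designs = [
    Design("A", p_tpp=0.60, esa=2.0),  # promising but slow to make
    Design("B", p_tpp=0.45, esa=8.5),  # slightly less promising, fast to make
    Design("C", p_tpp=0.10, esa=9.0),  # easy to make, unlikely to meet the TPP
]
ranked_names = [d.name for d in rank(designs)]
```

With these invented numbers the quickly accessible design B is ranked ahead of the more promising but slower design A, while C is ranked last despite being the easiest to make.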