We have a team of passionate researchers, all active in fundamental and core machine learning research.
Zenith science story
Zenith was founded to bring down the barriers around best-in-class machine learning and make it available across different problem domains. Zenith made it easy to ingest structured and unstructured data in various modalities, such as visual, text, and omics; create powerful visualisations to gain insights into the data; train state-of-the-art ML models; and deploy them rapidly. All of this is achieved via a no-code, composable, and reusable toolbox and infrastructure software stack created by Zenith.
Zenith’s approach was validated with partners in industries including insurance, remote sensing, visual inspection and survey, and quality assurance.
One of Zenith’s key areas of focus early on was delivering state-of-the-art machine learning and corresponding tooling for the biosciences. Only recently, with breakthroughs like AlphaFold, has ML applied to biology gained wide attention. The founders of Zenith have been collaborating with leading bio labs around the world since 2017 and feel strongly about the transformative power of machine learning for understanding biological data and enabling the next generation of discoveries in the biosciences.
Although there is a vast amount of structured and unstructured data in biology, especially omics (genomics, proteomics, etc.), several challenges remain in marrying modern ML to this field. Most datasets in this domain are fragmented. Only a small subset of the data is gold standard, i.e. fully validated in the lab. Different sources use different computational methods to annotate the data, with a wide range of accuracy. Hence, the confidence of individual data points can vary widely, and there is no standard scale for annotation quality or trust. Furthermore, there are multiple standards and best practices for naming conventions, organisation, and structure of the data.
In most successful applications of AI/ML, a timely feedback loop from the real world is invaluable. The biggest challenge in biology, therefore, lies in the disconnect between computational modelling and the time and cost of wet-lab verification. Unlike most other companies, Zenith does not depend on big data alone. For this reason, from the very beginning Zenith partnered closely with Neo, which has world-leading genomics experts and high-volume wet-lab capability, to fuel rapid improvements in ML-driven synthetic biology.
The scope of this collaboration is wide-ranging. ML is applied to the design, validation, and manufacturing of whole-genome-scale DNA, and to the design of RNAs and proteins for full cell-level systems.
Zenith’s ML tools were first applied to the design, validation, and manufacturing of very long DNA constructs with desired properties. The benefits of Zenith’s machine learning approach were apparent immediately. On a key task, predicting the fitness of DNA modifications, Zenith’s ML tools achieved 81% accuracy versus 53% for other state-of-the-art methods. On top of that, the deep visualization and interpretation capabilities of Zenith’s ML models provided fundamental insights to biologists and shed light on the mechanisms of long-range regulatory interactions between regions of DNA.
Some key aspects of Zenith’s scientific approach are as follows:
Well-calibrated prediction uncertainty – this is essential for building trust in model predictions and for gathering medium-scale but high-impact data from the wet lab. Closing this loop is key to real-world success.
Inductive bias in the models – exploiting physical and evolutionary symmetries and constraints. This reduces the need to learn every property of the data from scratch using vast training sets.
Dynamic trust assignment to predictions vs ground truth during training – a state-of-the-art method pioneered by Zenith that lets us learn from noisy data.
Multi-modal and multi-resolution models for omics that can combine different types of molecules, such as RNA and proteins.
Long-sequence models – these enable Zenith to capture very long-range dependencies at genome scale.
Proprietary semantics-preserving data augmentation.
Proprietary visualization and model interpretation methods – these are key to earning the trust of domain experts and helping them ask the next generation of questions.
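Calibration, the first aspect above, is commonly measured with the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is a generic illustration of that metric only, assuming equal-width bins (the `n_bins` default is our choice); it does not describe how Zenith measures or achieves calibration.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins (generic sketch).

    confidences: predicted probability of the predicted class, in [0, 1].
    correct: whether each prediction matched its label.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Per-bin sample counts, summed confidence and summed accuracy.
    counts = [0] * n_bins
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        # Bin index in [0, n_bins - 1]; a confidence of exactly 1.0 goes to the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sum[b] += c
        acc_sum[b] += 1.0 if ok else 0.0
    # Weighted average of |accuracy - confidence| over non-empty bins.
    ece = 0.0
    for b in range(n_bins):
        if counts[b]:
            gap = abs(acc_sum[b] / counts[b] - conf_sum[b] / counts[b])
            ece += counts[b] / n * gap
    return ece


# A model that reports 0.9 confidence ten times but is right only five times.
print(expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5))
```

In that example the ECE is about 0.4, a sign of overconfidence; the same confidences with nine correct predictions would give an ECE close to zero.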
After achieving these groundbreaking results, Zenith was acquired by Opentrons Labworks to work alongside Neochromosome on some of the toughest problems in cell engineering, protein engineering, therapeutics, and beyond.
Peer-reviewed research papers
In deep learning, how much should we trust a learner to leverage its own knowledge as training progresses? For learning regularisation, should we penalise a low-entropy state or reward it?
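One way to make the trust question concrete is a self-label-correction style objective. The training target blends the given one-hot label with the model's own predicted distribution, and the blending weight grows with training progress and with prediction confidence (low entropy), so a confident prediction is rewarded with more trust. The sketch below is hypothetical. The logistic schedule, its `sharpness` parameter, the entropy-based confidence term and all function names are our assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def trust_weight(step: int, total_steps: int, p: Sequence[float],
                 sharpness: float = 16.0) -> float:
    """Hypothetical weight eps in [0, 1) on the model's prediction p (len(p) >= 2).

    global_trust: assumed logistic schedule, small early and near 1 late in training.
    local_trust: 1 - H(p) / H(uniform), so low-entropy predictions get more trust.
    """
    progress = step / total_steps
    global_trust = 1.0 / (1.0 + math.exp(-sharpness * (progress - 0.5)))
    # Clamp guards against tiny negative values from floating-point rounding.
    local_trust = max(0.0, 1.0 - entropy(p) / math.log(len(p)))
    return global_trust * local_trust


def corrected_target(label: int, p: Sequence[float], eps: float) -> List[float]:
    """Blend the one-hot label y with the prediction: (1 - eps) * y + eps * p."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps * pk
            for k, pk in enumerate(p)]


def cross_entropy(target: Sequence[float], p: Sequence[float]) -> float:
    """Cross-entropy of p against a soft target (p must be > 0 where target > 0)."""
    return -sum(t * math.log(pk) for t, pk in zip(target, p) if t > 0)


# A 3-class prediction whose argmax (class 0) disagrees with the given label (class 1).
p = [0.7, 0.2, 0.1]
for step in (0, 50, 100):
    eps = trust_weight(step, 100, p)
    target = corrected_target(1, p, eps)
    print(step, round(eps, 3), round(cross_entropy(target, p), 3))
```

Early in training the target stays close to the given label. Later, and only for confident predictions, it moves toward the model's own belief, which limits how much a noisy label can dominate the loss.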
IEEE TPAMI 2021
We unveil two limitations of existing ranking-motivated structured losses and propose a novel ranked list loss to solve both of them. First, given a query, only a fraction of data points is incorporated to build the similarity structure. To address this, we propose to build a set-based similarity structure by exploiting all instances in the gallery. Second, previous methods aim to pull positive pairs as close as possible in the embedding space. As a result, the intraclass data distribution tends to be extremely compressed. In contrast, we propose to learn a hypersphere for each class to preserve useful similarity structure inside it, which functions as regularisation.
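The two ideas in this abstract can be illustrated with a simplified per-query sketch. Every gallery instance is considered, not a sampled fraction. Positives are penalised only when they lie outside a boundary `alpha - margin`, so they are pulled into a hypersphere rather than onto the query. Negatives are penalised when closer than `alpha`, with harder negatives weighted more via `exp(temperature * violation)`. The Euclidean distance, the default values of `alpha`, `margin`, `temperature` and `lam`, and the function name are our assumptions, not the paper's exact objective.

```python
import math
from typing import Sequence


def ranked_list_loss(query: Sequence[float], query_label: int,
                     gallery: Sequence[Sequence[float]], gallery_labels: Sequence[int],
                     alpha: float = 1.2, margin: float = 0.4,
                     temperature: float = 10.0, lam: float = 1.0) -> float:
    """Simplified ranked-list-style loss for one query (gallery excludes the query)."""
    pos_terms = []
    neg_terms = []
    neg_weights = []
    for x, y in zip(gallery, gallery_labels):
        d = math.dist(query, x)  # Euclidean distance (Python 3.8+)
        if y == query_label:
            # Positives inside the hypersphere of radius alpha - margin cost nothing,
            # so intra-class structure is not compressed to a single point.
            violation = d - (alpha - margin)
            if violation > 0:
                pos_terms.append(violation)
        else:
            # Negatives closer than alpha are penalised; harder ones weigh more.
            violation = alpha - d
            if violation > 0:
                neg_terms.append(violation)
                neg_weights.append(math.exp(temperature * violation))
    loss_pos = sum(pos_terms) / len(pos_terms) if pos_terms else 0.0
    if neg_terms:
        z = sum(neg_weights)
        loss_neg = sum(w * v for w, v in zip(neg_weights, neg_terms)) / z
    else:
        loss_neg = 0.0
    return loss_pos + lam * loss_neg


query, query_label = [0.0, 0.0], 0
gallery = [[0.5, 0.0], [0.0, 1.0], [0.3, 0.0]]
labels = [0, 0, 1]
print(ranked_list_loss(query, query_label, gallery, labels))
```

Here the loss is roughly 1.1: about 0.2 from the positive at distance 1.0 and about 0.9 from the negative at distance 0.3. The positive at distance 0.5 already lies inside the boundary and adds nothing, which is the regularising effect the abstract describes.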