Exhibit 99.2


 
AI Day | © 2024 InstaDeep Ltd. & BioNTech SE | October 2024

This Slide Presentation Includes Forward-Looking Statements

This presentation contains forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995, as amended. In some cases, forward-looking statements can be identified by terminology such as “will,” “may,” “should,” “expects,” “intends,” “plans,” “aims,” “anticipates,” “believes,” “estimates,” “predicts,” “potential,” “continue,” or the negative of these terms or other comparable terminology, although not all forward-looking statements contain these words. The forward-looking statements in this presentation are neither promises nor guarantees, and you should not place undue reliance on these forward-looking statements because they involve known and unknown risks, uncertainties, and other factors, many of which are beyond BioNTech’s control and which could cause actual results to differ materially from those expressed or implied by these forward-looking statements. You should review the risks and uncertainties described under the heading “Risk Factors” in BioNTech’s Quarterly Report on Form 6-K for the period ended June 30, 2024 and in subsequent filings made by BioNTech with the SEC, which are available on the SEC’s website at https://www.sec.gov/. Except as required by law, BioNTech disclaims any intention or responsibility for updating or revising any forward-looking statements contained in this presentation in the event of new information, future developments or otherwise. These forward-looking statements are based on BioNTech’s current expectations and speak only as of the date hereof. Furthermore, certain statements contained in this presentation relate to or are based on studies, publications, surveys and other data obtained from third-party sources and BioNTech’s own internal estimates and research.

While BioNTech believes these third-party sources to be reliable as of the date of this presentation, it has not independently verified, and makes no representation as to the adequacy, fairness, accuracy or completeness of, any information obtained from third-party sources. In addition, any market data included in this presentation involves assumptions and limitations, and there can be no guarantee as to the accuracy or reliability of such assumptions. While BioNTech believes its own internal research is reliable, such research has not been verified by any independent source. In addition, BioNTech is the owner of various trademarks, trade names and service marks that may appear in this presentation. Certain other trademarks, trade names and service marks appearing in this presentation are the property of third parties. Solely for convenience, the trademarks and trade names in this presentation may be referred to without the ® and TM symbols, but such references should not be construed as any indicator that their respective owners will not assert, to the fullest extent under applicable law, their rights thereto.


 
Introduction and Welcome
Ryan Richardson, Chief Strategy Officer, BioNTech
Ugur Sahin, Founder & CEO, BioNTech
Karim Beguir, CEO, InstaDeep


 
Agenda
Introduction and Vision
14:00 Welcome & Introductory Remarks
14:05 Our Vision for AI
Part I. Scaling AI Capabilities
14:10 Computing Infrastructure
14:25 Innovation: Bayesian Flow Networks
14:45 DeepChain: One Platform, Multiple Tools
Part II. Deploying AI across the pipeline
15:00 Applying AI end-to-end to the immunotherapy pipeline: examples
15:40 Closing Remarks and Q&A


 
Root Cause of Cancer Treatment Failure: Interindividual Variability & Intratumoral Heterogeneity
Over 5-20 years of cancer evolution, DNA mutations accumulate as healthy cells become pre-cancer cells and then cancer cells, reaching up to 10,000 mutations. The resulting cancer cells are genetically diverse and adaptable, and the mutational landscape differs between individual patients.


 
Towards a Potentially Curative Approach to Cancer Based on Multiple Modalities and Differentiated Novel/Novel Therapeutic Combinations
Three modalities, with synergy between each pair, define the space for potentially curative approaches:
• Immunomodulators (novel checkpoint inhibitors, cytokines, immune agonists): focus on the most relevant and crucial IO pathways; target different, complementary players in the complex cancer immunity cycle, aiming to achieve a thorough and durable anti-tumor effect.
• mRNA cancer vaccines: could eliminate polyclonal residual disease with individualized vaccines for potential long-term impact; polyspecific activity by targeting multiple antigens at once.
• Targeted therapy (ADCs, CAR-T, TCR-T, small molecules): could rapidly reduce tumor burden; designed with the aim of achieving clinical efficacy across the entire disease continuum, including late lines.
ADC = antibody-drug conjugate; CAR = chimeric antigen receptor; TCR-T = T-cell receptor engineered T cell; IO = immune oncology.


 
BioNTech: A pioneer in ML from day one


 
BioNTech and InstaDeep – The Road to Partnership
2011 BioNTech introduces computationally designed individualized mRNA cancer vaccines¹
2014 BioNTech initiates individualized mRNA cancer vaccine first-in-human trial²
2017 BioNTech rolls out an in silico neo-antigen selection process³
2019 BioNTech and InstaDeep begin project work
2020 BioNTech and InstaDeep create joint AI Lab with committed budget and dedicated infrastructure
2022 BioNTech invests in InstaDeep Series B round alongside Google and syndicate of investors
2022 BioNTech and InstaDeep work hand-in-hand to embed AI across BioNTech platforms and functions
2023 BioNTech acquires InstaDeep to operate as a wholly owned AI subsidiary
1. Cancer Res., PMID 22237626. 2. Nature, PMID 25901682; Nature, PMID 28678784. 3. BioNTech’s personalized cancer vaccine candidate, autogene cevumeran, is partnered with Genentech, a member of Roche Group.


 
Two Companies: One Mission
BioNTech: >6,800¹ employees. Developing medicines to fight cancer, infectious diseases and other serious diseases. HQ: Mainz, Germany
InstaDeep: >370¹ employees. Focused on productizing disruptive AI innovation. HQ: London, UK
Our Goal: Building a leading AI-first, personalized immunotherapy platform (and leveraging the breakthroughs obtained in the process)
1. As of 30 June 2024


 
Charting the Course for Tomorrow’s Personalized Medicine
• Deep genomics & immunology expertise to analyse patient data
• Individualized treatment platforms to address inter-individual variability
• In-house manufacturing
• AI & digitally-integrated target & drug discovery and development
Diagram: drug classes (engineered cell therapies, mRNA therapeutics, T cell receptors, antibodies, antibody conjugates, small molecule immunomodulators) plotted against inter-individual variability, spanning off-the-shelf drugs to tailored on-demand immunotherapies; inputs are clinical samples and personalized omics.


 
Our Goal: Deploying AI end-to-end in our immunotherapy pipeline
AI-first Immunotherapy Platform:
1. Immunohistochemistry (AI Vision): AI computer vision to improve the speed and accuracy of tissue labeling
2. DNA/RNA sequencing (DNA LLMs): sequence analysis and personalized genomic annotations
3. Proteomics (Protein LLMs): leveraging AI for target discovery and analysis of the immunological landscape
4. Protein Design (Protein LLMs): developing assets for immunotherapy modalities (antibodies, cytokines, TCRs)
5. Lab Functional Validation (AI Agent): instrument automation and quality control


 
Our Approach:
1. Scaling AI capabilities
2. Deploying across the pipeline


 
Part I. Scaling AI Capabilities


 


 
Computing Infrastructure


 
InstaDeep’s Supercomputing Cluster
Cluster specifications:
• 224 NVIDIA H100 GPUs
• 86,000 CPU cores
• 1.7 petabytes of persistent storage
• 400 Gbps RoCE network
Source: Internal Data


 
Our Supercomputing Cluster Is Nearing Exascale Levels
InstaDeep’s on-premise cluster totals ~0.5 ExaFLOPS:
• Top 100 worldwide [1]
• Top 20 H100 GPU clusters worldwide [2]
[1] “Top 500, The List”, June 2023
[2] “State of AI Report Compute Index”, August 2024
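The ~0.5 ExaFLOPS figure can be sanity-checked from the 224-GPU count. The sketch below multiplies the GPU count by a per-GPU peak; the per-GPU values are an assumption on our part (commonly quoted dense Tensor Core peaks for the H100 SXM), and the slide does not state which precision it uses or whether CPU throughput is included.

```python
# Assumed dense (no sparsity) Tensor Core peaks for one NVIDIA H100 SXM, in TFLOPS.
# These are approximate spec-sheet values, not figures taken from the presentation.
H100_PEAK_TFLOPS = {"bf16": 989.0, "fp8": 1979.0}


def cluster_peak_exaflops(n_gpus: int, precision: str) -> float:
    """Aggregate theoretical peak of n_gpus GPUs in ExaFLOPS (1 EF = 1e6 TFLOPS)."""
    return n_gpus * H100_PEAK_TFLOPS[precision] / 1e6


for precision in ("bf16", "fp8"):
    print(f"{precision}: {cluster_peak_exaflops(224, precision):.3f} ExaFLOPS")
```

Under the FP8 assumption the GPUs alone come to roughly 0.44 ExaFLOPS, close to the quoted ~0.5; at BF16 the figure would be about half that.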


 
Advanced In-House Rack Design
• Easy to expand with modular nodes
• Consistent performance, cost, power and cooling
• Optimized for large-scale AI workloads
• Simplified management with a consistent architecture
• Minimized expenses with a standard design
Rack components: 400 Gbps network stack; CPU nodes (6,144 CPUs); fast storage (122 TB); GPU nodes (16 H100 GPUs)
Source: Internal Data


 
Software Stack Supporting our Cluster
A fully tailored AI stack from hardware to experiments, built on open standards and cutting-edge tooling.
Layers: Cluster Management, Compute, Networking, Storage, Projects, Users, Frameworks & Tools, Cost Management, Security


 
Strategic Benefits from our Supercomputing Cluster
• Availability when most needed
• Flexibility in software/hardware integration
• No vendor lock-in
• Repeatable design
• Predictable costs
• Cost efficient (50% savings versus a cloud equivalent at 60% utilization)
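The cost claim is tied to utilization because an on-premise cluster is paid for whether or not it is busy, while cloud spend scales with the hours actually used. A minimal linear model under that assumption, with hypothetical normalised inputs, shows what "50% savings at 60% utilization" implies about break-even utilization. Real total cost of ownership (staffing, power, depreciation, negotiated cloud discounts) is folded into a single assumed annual cost here.

```python
HOURS_PER_YEAR = 8760.0


def savings_vs_cloud(onprem_annual_cost: float, cloud_rate_per_hour: float,
                     utilization: float) -> float:
    """Fractional savings of a fixed-cost cluster vs paying cloud rates only for hours used."""
    cloud_cost = cloud_rate_per_hour * utilization * HOURS_PER_YEAR
    return 1.0 - onprem_annual_cost / cloud_cost


def breakeven_utilization(onprem_annual_cost: float, cloud_rate_per_hour: float) -> float:
    """Utilization at which on-prem and cloud cost the same (savings = 0)."""
    return onprem_annual_cost / (cloud_rate_per_hour * HOURS_PER_YEAR)


# Hypothetical normalisation: the cloud equivalent costs 1.0 per cluster-hour.
# 50% savings at 60% utilization means on-prem costs 0.5 * 0.6 of a full cloud year.
cloud_rate = 1.0
onprem_cost = 0.5 * 0.6 * HOURS_PER_YEAR * cloud_rate
print(f"savings at 60% utilization: {savings_vs_cloud(onprem_cost, cloud_rate, 0.6):.0%}")
print(f"break-even utilization: {breakeven_utilization(onprem_cost, cloud_rate):.0%}")
```

Under this simplified model, the slide's figures imply the cluster breaks even against the cloud at about 30% utilization, and savings grow as utilization rises above that.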


 
Scaling Intelligence


 
Scaling Intelligence
• Why: Scaling Laws
• How: Engineering Expertise
• What: Accelerating Scientific Discovery


 


 
Scaling Laws
The performance of large language models (LLMs) is a smooth, well-behaved and predictable function of the number of model parameters, the amount of training data, and the compute used.
Source: Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D.D.L., Hendricks, L.A., Welbl, J., Clark, A. and Hennigan, T., 2022. Training compute-optimal large language models.
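The claim can be made concrete with the parametric form fitted by Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, together with the common approximation that training compute is C ≈ 6·N·D. The constants below are the paper's reported fit, included as illustrative values; the derivation of the compute-optimal split is standard calculus rather than anything specific to InstaDeep's models.

```python
from typing import Tuple

# Parametric loss fit L(N, D) = E + A / N**alpha + B / D**beta reported by
# Hoffmann et al. (2022). Treat these constants as illustrative.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> Tuple[float, float]:
    """Minimise loss(N, D) subject to flops = 6 * N * D.

    Setting dL/dN = 0 along the constraint gives
    N_opt = G * (flops / 6) ** (beta / (alpha + beta)),
    with G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = flops / (6.0 * n_opt)
    return n_opt, d_opt


for c in (1e21, 1e22, 1e23):
    n, d = compute_optimal(c)
    print(f"C={c:.0e} FLOPs: N~{n:.2e} params, D~{d:.2e} tokens, loss~{loss(n, d):.3f}")
```

Each tenfold increase in compute lowers the predicted loss smoothly, which is the "predictable function" the slide refers to.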


 
Scaling Laws
We can expect “more intelligence” by scaling existing algorithms.
Source: Achiam, Josh, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida et al. "GPT-4 technical report." (2023)
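In practice, "predictable" means that a curve fitted to small training runs can be extrapolated to a much larger one. The sketch below fits a pure power law L(C) = a·C^b by least squares in log-log space. The "small runs" are synthetic, noise-free values generated from a known law for illustration; the fits in the GPT-4 report also include an irreducible-loss term, which this simplified version omits.

```python
import math


def fit_power_law(compute, losses):
    """Least-squares fit of log L = log a + b * log C; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(value) for value in losses]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b


def predict(a, b, flops):
    """Loss predicted by the fitted power law at a given compute budget."""
    return a * flops ** b


# Synthetic "small runs" drawn from a known law (a = 30, b = -0.05), for illustration only.
small_compute = [1e18, 1e19, 1e20, 1e21]
small_losses = [30.0 * c ** -0.05 for c in small_compute]
a, b = fit_power_law(small_compute, small_losses)
print(f"fitted b = {b:.3f}; extrapolated loss at 1e24 FLOPs = {predict(a, b, 1e24):.3f}")
```

With real runs the points are noisy and the extrapolation carries uncertainty; the noise-free data here only shows the mechanics of fitting and extrapolating.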


 


 
How to Scale Next-Generation Foundation Models
Scaling next-generation AI systems demands advanced engineering solutions, integrated tightly with the hardware, to balance training and deployment constraints:
• Memory: model sharding, rematerialization, quantization / precision
• Network: compute/communication overlap, I/O and data processing, hardware and topology
• Compute: XLA optimization, kernel fusion and caching, data parallelism
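Why model sharding matters is easiest to see from memory arithmetic. The sketch below assumes the per-parameter accounting popularised by the ZeRO paper for mixed-precision training with Adam (2 bytes of bf16 weights, 2 bytes of bf16 gradients, 12 bytes of fp32 master weights and Adam moments) and ignores activations; the 15B model size and 8-device count are illustrative choices, and the function name is hypothetical.

```python
# Mixed-precision Adam accounting (ZeRO-style): bf16 weights (2) + bf16 gradients (2)
# + fp32 master weights, first moment and second moment (4 + 4 + 4) = 16 bytes/param.
BYTES_PER_PARAM = 2 + 2 + 12


def model_state_bytes_per_device(n_params: int, n_devices: int, sharded: bool) -> int:
    """Bytes of weights, gradients and optimizer state per device (activations ignored).

    Replicated (plain data parallelism): every device holds the full state.
    Fully sharded: the state is split evenly across devices.
    """
    total = n_params * BYTES_PER_PARAM
    return total // n_devices if sharded else total


n_params = 15_000_000_000  # an illustrative 15B-parameter model
gib = 1024 ** 3
for sharded in (False, True):
    per_device = model_state_bytes_per_device(n_params, n_devices=8, sharded=sharded)
    print(f"sharded={sharded}: {per_device / gib:.1f} GiB per device")
```

Replicated, each device would need about 224 GiB for model state alone, more than an 80 GB H100 holds; fully sharded over 8 devices it drops to about 28 GiB each, leaving headroom for activations, which rematerialization then trades for recomputation.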


 


 
#1 Accelerating Reinforcement Learning
Reinforcement learning is the science of learning by trial and error. A simulation engine turns computation into data.
Scaling reinforcement learning with the Sebulba architecture on 8 hardware accelerators [1]:
• Multiple threads keep the hardware accelerators active.
• Learner cores process experience, synchronizing updates using JAX primitives.
• The architecture can be replicated across a large number of nodes to form a supercomputing cluster.
• It leverages the high-speed inter-chip interconnects between the nodes of the hardware accelerators.
[1] “InstaDeep’s scalable reinforcement learning on Cloud TPU”, October 19, 2023, Google Cloud blog post
[2] Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C. and Józefowicz, R., 2019. Dota 2 with large scale deep reinforcement learning.
[3] Hessel, M., Kroiss, M., Clark, A., Kemaev, I., Quan, J., Keck, T., Viola, F. and van Hasselt, H., 2021. Podracer architectures for scalable reinforcement learning.
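The actor/learner split described above can be illustrated on a single machine. This is a toy, not Sebulba itself: Python threads stand in for accelerator cores, a bounded `queue.Queue` stands in for moving experience between devices, and a lock-protected dict replaces JAX parameter synchronisation. The environment, the 1-D Gaussian policy trained with REINFORCE, and the names `actor`, `learner` and `train` are all hypothetical.

```python
import queue
import random
import threading


def actor(actor_id, params, lock, experience, n_rollouts, rollout_len):
    """Generate rollouts from a 1-D Gaussian policy N(theta, 1) using the latest theta."""
    rng = random.Random(actor_id)  # separate, reproducible random stream per actor
    for _ in range(n_rollouts):
        with lock:
            theta = params["theta"]  # snapshot; may be stale when the learner uses it
        actions = [rng.gauss(theta, 1.0) for _ in range(rollout_len)]
        rewards = [-(a - 3.0) ** 2 for a in actions]  # toy reward, maximised at a = 3
        experience.put((theta, actions, rewards))  # blocks while the queue is full


def learner(params, lock, experience, n_updates, lr, stats):
    """Consume one rollout per step and apply a REINFORCE update to theta."""
    for _ in range(n_updates):
        theta_b, actions, rewards = experience.get()
        baseline = sum(rewards) / len(rewards)  # mean-reward baseline reduces variance
        # For N(theta, 1), the score function d/dtheta log pi(a) is (a - theta).
        grad = sum((r - baseline) * (a - theta_b)
                   for a, r in zip(actions, rewards)) / len(actions)
        with lock:
            params["theta"] += lr * grad
        stats["updates"] += 1


def train(n_actors=4, n_rollouts=25, rollout_len=256, lr=0.02, queue_size=8):
    params, stats = {"theta": 0.0}, {"updates": 0}
    lock = threading.Lock()
    experience = queue.Queue(maxsize=queue_size)  # bounded: caps how stale data can get
    n_updates = n_actors * n_rollouts  # consume exactly what the actors produce
    threads = [threading.Thread(target=actor,
                                args=(i, params, lock, experience, n_rollouts, rollout_len))
               for i in range(n_actors)]
    threads.append(threading.Thread(target=learner,
                                    args=(params, lock, experience, n_updates, lr, stats)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params["theta"], stats["updates"]


theta, updates = train()
print(f"{updates} updates, theta = {theta:.2f} (optimum is 3.0)")
```

The bounded queue is the main design choice in this sketch: it limits how far the actors' policy snapshot can lag behind the learner, which keeps updates computed from stale data well-behaved.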


 
#1 Accelerating Reinforcement Learning
• Better: 50% improvement in performance as we scale the hardware and simulated data.
• Cheaper: 13x cost reduction due to more efficient use of the hardware.
• Faster: 240x faster to train an RL agent to convergence.


 
#2 Generative AI for Biology
Scaling the Next Generation of our Generative AI Models
• In-house JAX-based software library
• Best-in-class engineering for scaling LLMs, e.g. hybrid parallelism, mixed precision, rematerialization
Results
• Training of multi-billion-parameter models (15B+)
• Scaling laws in action
• Hardware efficiency on par with the latest Meta Llama 3.1 [1], i.e. Model FLOPs Utilization (MFU) of ~50% for our 15B model
[1] Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A. and Goyal, A., 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783.
Source: InstaDeep
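MFU relates measured training throughput to the hardware's theoretical peak. The sketch uses the widely used approximation of 6·N FLOPs per training token for forward and backward passes (the PaLM paper's MFU definition adds an attention term, omitted here) and an assumed BF16 dense peak for the H100; the slide does not say which accounting InstaDeep used, so the numbers are illustrative.

```python
def model_flops_utilization(n_params: float, tokens_per_sec_per_gpu: float,
                            peak_flops_per_gpu: float) -> float:
    """MFU = achieved model FLOP/s divided by hardware peak FLOP/s.

    Uses the 6 * N FLOPs-per-token approximation for one training step
    (forward + backward), ignoring attention FLOPs.
    """
    achieved = 6.0 * n_params * tokens_per_sec_per_gpu
    return achieved / peak_flops_per_gpu


H100_BF16_DENSE_PEAK = 989e12  # assumed spec-sheet value, FLOP/s

# Per-GPU throughput a 15B-parameter model would need to reach 50% MFU under these assumptions.
target_tokens_per_sec = 0.5 * H100_BF16_DENSE_PEAK / (6.0 * 15e9)
print(f"~{target_tokens_per_sec:,.0f} tokens/s per GPU")
print(f"MFU = {model_flops_utilization(15e9, target_tokens_per_sec, H100_BF16_DENSE_PEAK):.2f}")
```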


 
Summary
InstaDeep’s supercomputing cluster and advanced software stack could facilitate new scientific breakthroughs, services and products that were previously out of reach.
Scaling Laws; Engineering Expertise; Accelerating Scientific Discoveries


 
AI Innovation: Bayesian Flow Networks


 
Generative AI
Sora: generative AI model for videos (OpenAI, 2024). Example prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage…”
Source: Sora blog post by OpenAI: https://openai.com/index/sora/


 
Joint Probability Distributions
For a generative model of face images, creating a new face means drawing a sample from the joint probability distribution of all the pixels.
Q: Why is this so hard?
A: Because all the pixels are interrelated.
Image: Generating Diverse High-Fidelity Images with VQ-VAE-2 (Razavi et al. 2019)
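The difficulty shows up even at toy scale. In the sketch below (a hypothetical 2x2 binary "image", not a real model), sampling from the joint distribution always gives a coherent image, whereas sampling each pixel independently from its marginal breaks the dependencies. The explicit joint table also grows as 2 to the number of pixels, which is why generative models learn a compact parameterisation instead of storing it.

```python
import itertools
import random

N_PIXELS = 4  # a toy 2x2 binary "image", flattened

# Joint distribution: all mass on the two fully correlated images (all off / all on).
JOINT = {img: 0.0 for img in itertools.product((0, 1), repeat=N_PIXELS)}
JOINT[(0, 0, 0, 0)] = 0.5
JOINT[(1, 1, 1, 1)] = 0.5


def sample_joint(rng):
    """Draw a whole image at once from the joint table."""
    images, probs = zip(*JOINT.items())
    return rng.choices(images, weights=probs, k=1)[0]


def marginal_on(i):
    """p(pixel i = 1), obtained by summing the joint over all other pixels."""
    return sum(p for img, p in JOINT.items() if img[i] == 1)


def sample_independent(rng):
    """Draw each pixel from its own marginal, ignoring dependencies between pixels."""
    return tuple(int(rng.random() < marginal_on(i)) for i in range(N_PIXELS))


def is_coherent(img):
    """True when all pixels agree, the only images the joint ever produces."""
    return len(set(img)) == 1


rng = random.Random(0)
n = 1000
joint_rate = sum(is_coherent(sample_joint(rng)) for _ in range(n)) / n
indep_rate = sum(is_coherent(sample_independent(rng)) for _ in range(n)) / n
print(f"coherent images, joint sampling:       {joint_rate:.3f}")
print(f"coherent images, independent sampling: {indep_rate:.3f} (about 1/8 expected)")
print(f"joint table entries for a 64x64 binary image: 2**{64 * 64}")
```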


 
Steerable Generation
We want to control what we generate. With a multimodal model (e.g. images and text), we can do this with conditional sampling: fix one modality and generate another.
• Given an image, generate a caption (CLIP, OpenAI 2021)
• Given a prompt, generate an image (DALL·E, OpenAI 2021)


 
One Model, Many Tasks
Learning a joint distribution over many variables, then choosing which to fix and which to generate, gives us one model for many tasks.
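The "one model, many tasks" idea reduces to conditioning: with a joint distribution in hand, any subset of variables can be fixed and the rest sampled. The sketch uses an explicit table over three hypothetical variables; the models discussed in this presentation learn the joint implicitly, so this illustrates only the principle.

```python
import random

# A hypothetical joint distribution over three discrete variables (probabilities sum to 1).
VARIABLES = ("shape", "colour", "caption")
JOINT = {
    ("circle", "red", "a red ball"): 0.30,
    ("circle", "blue", "a blue ball"): 0.20,
    ("square", "red", "a red box"): 0.25,
    ("square", "blue", "a blue box"): 0.25,
}


def conditional(evidence):
    """p(all variables | evidence): keep rows consistent with the evidence, renormalise."""
    idx = {name: i for i, name in enumerate(VARIABLES)}
    kept = {row: p for row, p in JOINT.items()
            if all(row[idx[k]] == v for k, v in evidence.items())}
    z = sum(kept.values())
    if z == 0:
        raise ValueError("evidence has zero probability under the joint")
    return {row: p / z for row, p in kept.items()}


def sample(evidence, rng):
    """Fix the variables in `evidence` and sample the remaining ones."""
    dist = conditional(evidence)
    rows, probs = zip(*dist.items())
    return dict(zip(VARIABLES, rng.choices(rows, weights=probs, k=1)[0]))


rng = random.Random(0)
print(sample({"caption": "a red box"}, rng))  # task: caption -> attributes
print(sample({"shape": "circle"}, rng))       # task: shape -> colour and caption
print(sample({}, rng))                        # task: unconditional generation
```

The same table answers three different "tasks" without retraining; only the choice of evidence changes.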


 
But Which Model?
• Autoregression (GPT), e.g. “Sorry boss the dog ate my ______” → “slides”. Pros: sequence data (especially text). Cons: unordered data, inpainting, slow sampling.
• Masked prediction (BERT), e.g. “Sorry ____ the ___ ate my ______” → “boss”, “cat”, “report”. Pros: discrete data, inpainting, representation learning. Cons: continuous data, slow sampling.
• Diffusion (forward process: add noise; backward process: remove noise). Pros: continuous data (especially images), inpainting, fast gradient-based sampling. Cons: discrete data.


 
Bayesian Flow Networks (Graves, Srivastava, Atkinson, Gomez 2023)
BFNs are a new class of generative model that uses Bayesian inference to update beliefs about data. Unlike diffusion models, they generate discrete data in a continuous way, allowing for gradient-based sampling. This makes BFNs well-suited for controllable generation across diverse data modalities.
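The core mechanism for discrete data can be sketched following the formulation in Graves et al. (2023), as we understand it: the belief about a token is a continuous probability vector θ over K classes, a "sender" emits a noisy observation y ~ N(α(K·e_x − 1), αK·I) of the true class x, and Bayes' rule updates θ'_k ∝ exp(y_k)·θ_k. The sketch shows only this update with a fixed per-step accuracy α; the neural network that predicts data from current beliefs, and the accuracy schedule, are omitted.

```python
import math
import random


def sender_sample(x, num_classes, alpha, rng):
    """Noisy observation of class x: y ~ N(alpha * (K * e_x - 1), alpha * K * I)."""
    std = math.sqrt(alpha * num_classes)
    return [rng.gauss(alpha * ((num_classes if k == x else 0) - 1), std)
            for k in range(num_classes)]


def bayesian_update(theta, y):
    """Posterior over classes: theta'_k proportional to exp(y_k) * theta_k."""
    m = max(y)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(yk - m) * tk for yk, tk in zip(y, theta)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


K, true_class, alpha = 4, 2, 0.5
rng = random.Random(0)
theta = [1.0 / K] * K  # uniform prior: a continuous belief about a discrete variable
for step in range(1, 41):
    theta = bayesian_update(theta, sender_sample(true_class, K, alpha, rng))
    if step in (1, 5, 40):
        print(step, [round(t, 3) for t in theta])
```

Each observation nudges a continuous vector, so the belief about a discrete token evolves smoothly, which is what makes gradient-based guidance possible.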


 
Proteomics: Joint Modelling
Generative modelling as a unifying framework to learn useful functions from data: (1) learn to model the joint distribution of all your data, and (2) conditionally sample for tasks of interest.
Figure: p(sequence AGL…, structure, taxonomy, GO terms, EC number, binders, binding assays, PE score, structural domains, pTMs, ...), a single joint distribution over protein sequence and its annotations.


 
Proteomics: Protein Folding
Figure: the joint distribution conditioned on a protein sequence (AGL…), so that the structure is sampled given the sequence.


 
Proteomics: Function Prediction
Figure: the joint distribution conditioned on a protein sequence (AGL…), so that functional annotations such as GO terms and EC number are sampled given the sequence.


 
Proteomics: Antibody Design
Figure: the joint distribution conditioned on desired properties, so that the sequence (AGL…) is sampled given those properties.


 
Proteomics: Sequence Generation
Existing approaches compared for de novo and conditional protein sequence generation:
• “Discrete” diffusion (e.g. MMPRSSPV...): limited applicability to discrete data
• Autoregression (GPT) (e.g. MPPR____...): generates the sequence left to right
• Masked prediction (BERT) (e.g. MPP___IV...): fills in masked positions


 
Natural, Diverse & Novel Protein Sequences
ProtBFN learns the statistical and biochemical properties of natural proteins with high fidelity. Its generated sequences are more natural¹ and more diverse¹, and highly novel²: 95% have < 95% sequence identity, 89% have < 80%, and 44% have < 50% to the best-matching training sequence.
1. 10,000 generated sequences from each model are matched to clusters from UniRef50. A hit is determined as a match with >50% sequence identity. Coverage score is the ratio of the number of unique clusters hit to the expected number if sequences were drawn i.i.d. from the model’s training distribution. ProtGPT2 (huggingface.co/nferruz/ProtGPT2) and EvoDiff (github.com/microsoft/evodiff) sequences are sampled using publicly available code and model weights provided by the authors.
2. Identity of ProtBFN-generated sequences to the best-matching protein sequence found in the model’s training data. Any identity < 100% is a novel sequence that the model has not seen before.
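Footnote 1's coverage score and footnote 2's novelty fractions can be written down directly. The sketch assumes the sequence-identity search against UniRef50 has already been run (not shown) and that the training distribution is summarised by per-cluster probabilities. Computing the i.i.d. expectation with the closed form Σ_c (1 − (1 − p_c)^n) is our choice; the authors may have estimated it differently, e.g. by sampling. All inputs in the example are hypothetical.

```python
def expected_unique_clusters(cluster_probs, n_samples):
    """Expected number of distinct clusters hit by n_samples i.i.d. draws,
    where a draw falls in cluster c with probability cluster_probs[c]."""
    return sum(1.0 - (1.0 - p) ** n_samples for p in cluster_probs)


def coverage_score(hit_clusters, cluster_probs):
    """Distinct clusters hit by generated sequences, relative to the i.i.d. expectation.

    hit_clusters: for each generated sequence, the matched cluster id
    (>50% identity), or None if it matched no cluster.
    """
    observed = len({c for c in hit_clusters if c is not None})
    return observed / expected_unique_clusters(cluster_probs, len(hit_clusters))


def fraction_below(identities, threshold):
    """Share of generated sequences whose best identity to training data is below threshold."""
    return sum(i < threshold for i in identities) / len(identities)


# Hypothetical toy inputs: 4 equally likely clusters, 2 sequences hitting different clusters.
print(coverage_score(["c1", "c3"], [0.25] * 4))   # 2 observed / 1.75 expected
print(fraction_below([0.42, 0.78, 0.97], 0.80))    # 2 of 3 sequences below 80% identity
```

A score above 1 means the generated set covers more distinct clusters than an equally sized draw from the training distribution would be expected to.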


 
Globular Structural Motifs With Novel Sequences
Predicted structures of generated sequences show natural, globally coherent and functionally diverse folds. Structure largely determines function in nature (sequence AGL… → structure → function).
• Single and multi-domain proteins; globally coherent generations with inter-domain interactions
• Spans the diversity of known structures and the tree of life
• Alpha-helical, beta-sheet, alpha-beta and irregular domains
• Small and large domains
• Transmembrane proteins (porins and transporters) and enzymes
• Domains specific to Archaea, Bacteria and Eukarya (plants, humans)


 
BFN for Protein Sequences
• Outperforms or matches task-specific autoregressive, diffusion and BERT models
• Improved naturalness, diversity and novelty
• Uses zero-shot conditioning of the model
• Released only a few days ago!¹
• Patent application filed
1. Available at https://www.biorxiv.org/content/10.1101/2024.09.24.614734v1


 
Going Beyond Sequence-Only Models
Our goal is to model everything: building foundational models of the joint distribution of heterogeneous scientific data.
• Performance across multiple data types and sources
• Flexibility in the hands of scientists with task-specific inference
Figure: p(sequence AGL…, structure, taxonomy, GO terms, EC number, binders, binding assays, PE score, structural domains, pTMs, ...)


 
Going Beyond Sequence-Only Models: Conventional ML
Figure: in conventional ML, each task (Task #1, Task #2) requires its own task-specific data and its own model before the scientist gets a result.


 
Going Beyond Sequence-Only Models: Our Vision (BFN-X)
Figure: a single model trained on all the data; the scientist determines tasks flexibly at inference through conditional generation (e.g. completing a partially specified sequence, A?L… → AGL…).


 
Introducing AbBFN-X
First look at our multimodal model for antibodies.
• 36 different attributes jointly modelled: sequence, genetic and biophysical
• Empowers scientists with tunable generation, making it highly flexible across many tasks
• Today’s use cases go beyond standard AI-enhanced antibody design workflows
Source: Generated Image


 
AbBFN-X: antibody representation
Diagram: heavy-chain (VH) and light-chain (VL) variable domains, each with three complementarity-determining regions (CDR-H1 to CDR-H3 and CDR-L1 to CDR-L3), encoded by V, D and J gene segments (heavy chain) and V and J gene segments (light chain). Structural labels: VH, VL, CH1, CL, CH2, CH3, Fv, Fab.
VH: EVQLLESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISWNSGSIYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARGWSQVDTAMDLDYGQGTLVTVSS
VL: DIQMTQSPSSVSASVGDRVTITCRASQSVSSNLAWYQQKPGKAPKLLIYGASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQYNNWLTFGQGTRLEIK


 
AbBFN-X attributes
• Amino acid sequence: FWR-H1, CDR-H1, FWR-H2, CDR-H2, FWR-H3, CDR-H3, FWR-H4, FWR-L1, CDR-L1, FWR-L2, CDR-L2, FWR-L3, CDR-L3, FWR-L4
• Length attributes: CDR-H1, CDR-H2, CDR-H3, CDR-L1, CDR-L2 and CDR-L3 lengths; VH length; VL length
• Genetic attributes: HV gene, HD gene, HJ gene, HV seq. identity (%), HD seq. identity (%), HJ seq. identity (%), LV gene, LD gene, LV seq. identity (%), LJ seq. identity (%), LC locus, Species
• Biophysical attributes: negative patches, positive patches, charge imbalance, hydrophobicity


 
AbBFN-X example:
VH: EVQLLESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISWNSGSIYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDLLGSFPYDASGYYDYFDYWGQGTLVTVSS
VL: DIQMTQSPSSVSASVGDRVTITCRASQSVSSNLAWYQQKPGKAPKLLIYGASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQANSFPPTFGQGTRLEIK


 


 
AbBFN-X example:
VH: EVQLLESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISWNSGSIYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDRGGNWAILDYWGQGTLVTVSS
VL: DIQMTQSPSSVSASVGDRVTITCRASQSVSSNLAWYQQKPGKAPKLLIYGASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQANSFPPTFGQGTRLEIK


 
Example Task 1: Anti-HIV Antibody Library Design
Generating a library of rare antibodies against HIV involves a multi-step workflow:
1. Heavy chain synthesis
2. CDR-H3 design and grafting
3. Light chain library synthesis with desired CDR length
4. Paired library generation
5. Mutagenesis
6. Discard poor candidates


 
Example Task 1: Anti-HIV Antibody Library Design
With AbBFN-X, generating a library of rare antibodies against HIV takes two steps: (1) identify target attributes; (2) conditionally sample for rare, desired antibodies.
Figure: the AbBFN-X attribute panel (amino acid sequence regions, genetic, length and biophysical attributes).


 
Example Task 1: Anti-HIV Antibody Library Design
AbBFN-X antibodies are 5600x more likely² to have all desired characteristics.
1. “Baseline Abs” refers to data sets of natural antibodies (Olsen et al., 2021, Prot. Sci.); “BFN” refers to samples generated by AbBFN-X.
2. Compared to the rate of finding antibodies with the correct characteristics in data sets of natural antibodies (Olsen et al., 2021, Prot. Sci.).
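Per footnote 2, the "5600x more likely" figure is a ratio of rates: the share of AbBFN-X samples meeting every target attribute divided by the share of natural antibodies in the baseline data set that do. The sketch computes that ratio for hypothetical criteria on hypothetical records; the slide does not list the actual target attributes, and the field names and values are illustrative only.

```python
def rate_meeting_all(samples, criteria):
    """Fraction of samples that satisfy every criterion."""
    hits = sum(all(check(s) for check in criteria) for s in samples)
    return hits / len(samples)


def enrichment(model_samples, baseline_samples, criteria):
    """How many times more often model samples meet all criteria than baseline samples."""
    base = rate_meeting_all(baseline_samples, criteria)
    if base == 0:
        raise ValueError("no baseline sample meets all criteria; ratio undefined")
    return rate_meeting_all(model_samples, criteria) / base


# Hypothetical criteria and records, for illustration only.
criteria = [
    lambda ab: ab["cdr_h3_length"] >= 20,
    lambda ab: ab["hv_gene"] == "V1",
]
baseline = ([{"cdr_h3_length": 12, "hv_gene": "V3"}] * 999
            + [{"cdr_h3_length": 22, "hv_gene": "V1"}])
generated = ([{"cdr_h3_length": 21, "hv_gene": "V1"}] * 9
             + [{"cdr_h3_length": 15, "hv_gene": "V1"}])
print(enrichment(generated, baseline, criteria))  # 0.9 / 0.001
```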


 
Example Task 1: Anti-HIV Antibody Library Design
Generated library diversity¹:
• 100% unique CDRs
• 100% unique CDR-H3s
• 99% unique CDR-L3s
• 52% unique non-H3/L3 CDRs
Example generated CDR-H3 sequences: ARDEIYFLEWLISY, AKVRLGELPYEAFDI, ARGVRVQSYNWFDP, ASGEYFFDTSSYPN, ARSSFVYPKSGYDFYFDY, ARDIAVDPESTAYFDY, AKGFSYGDGWADY, VRLRVGVLPGAFDI, ARDGGHYSH, ASGSGDSRYAQPLWFTTAFDI, ATSLNYGVIISD, ASGKMAVAYYFDY, AREGMDASMYYFDY, ARDMGYHDGALVFDN…
1. 128 samples were generated; uniqueness was assessed by considering all relevant regions at once, excluding framework regions.
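The uniqueness percentages follow from grouping samples by the regions of interest. The sketch assumes "% unique" means distinct combinations divided by the number of samples, with each region set considered jointly as the footnote describes; the three records and their CDR strings are hypothetical and their region boundaries are not validated.

```python
CDRS = ("cdr_h1", "cdr_h2", "cdr_h3", "cdr_l1", "cdr_l2", "cdr_l3")


def percent_unique(antibodies, regions):
    """Distinct combinations of `regions` (considered together) as a percentage of samples."""
    keys = {tuple(ab[r] for r in regions) for ab in antibodies}
    return 100.0 * len(keys) / len(antibodies)


# Three hypothetical generated antibodies (framework regions excluded).
samples = [
    {"cdr_h1": "GFTFSSYA", "cdr_h2": "ISWNSGSI", "cdr_h3": "ARDEIYFLEWLISY",
     "cdr_l1": "QSVSSN", "cdr_l2": "GAS", "cdr_l3": "QQYNNWLT"},
    {"cdr_h1": "GFTFSSYA", "cdr_h2": "ISWNSGSI", "cdr_h3": "AKVRLGELPYEAFDI",
     "cdr_l1": "QSVSSN", "cdr_l2": "GAS", "cdr_l3": "QQYNNWLT"},
    {"cdr_h1": "GFTFSSYA", "cdr_h2": "ISWNSGSI", "cdr_h3": "ARDGGHYSH",
     "cdr_l1": "QSVSSN", "cdr_l2": "GAS", "cdr_l3": "QQANSFPPT"},
]
print(percent_unique(samples, CDRS))                                   # all CDRs together
print(percent_unique(samples, ("cdr_h3",)))                            # CDR-H3 only
print(percent_unique(samples, ("cdr_l3",)))                            # CDR-L3 only
print(percent_unique(samples, ("cdr_h1", "cdr_h2", "cdr_l1", "cdr_l2")))  # non-H3/L3 CDRs
```

Because the full-CDR key includes CDR-H3, a library can be 100% unique overall even when the other CDRs repeat, which is consistent with the lower figure reported for non-H3/L3 CDRs.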


 
Example Task 2: Heavy-Light Pairing
Generating a set of developable light chains that will pair with a heavy chain: (1) condition on desired properties and the heavy chain sequence; (2) sample for stable, diverse sequences.


 


 


 


 
Result: conditioning on the required heavy chain yields light chains that are diverse in length and structure, while respecting the requested sequence and generating stable pairs.


 
Generative AI for Proteomics
InstaDeep is developing next-generation generative AI models across the whole stack: from fundamental ML research, to modelling scientific data, to enabling new capabilities for scientists.
• Bayesian Flow Networks: unified modelling of multi-modal data; task-specific conditional generation; joint learning of heterogeneous data; all modalities as first-class citizens
• Protein sequence modelling (published demonstrations): leading performance across tasks; diverse and novel de novo generation; zero-shot inpainting at inference
• BFN-X (in-development foundation models): covers sequence, genetic and biophysical attributes; learns rational antibody principles; enables a diverse suite of tasks
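For readers unfamiliar with Bayesian Flow Networks, the sketch below shows the core Bayesian update for discrete data, such as one amino-acid position, following the original BFN paper by Graves et al. A noisy "sender" sample y ~ N(α(K·e_x − 1), αK·I) is drawn around the true class x, and the receiver's categorical belief θ is updated multiplicatively, θ' ∝ exp(y)·θ. The sketch leaves out the neural network and the accuracy schedule entirely. The constant α and the toy loop are assumptions for illustration, not InstaDeep's ProtBFN, AbBFN or BFN-X implementation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)  # number of classes (20 standard amino acids)


def sender_sample(x: int, alpha: float, rng: np.random.Generator) -> np.ndarray:
    """Draw y ~ N(alpha * (K * e_x - 1), alpha * K * I) for true class index x."""
    e_x = np.zeros(K)
    e_x[x] = 1.0
    mean = alpha * (K * e_x - 1.0)
    return rng.normal(loc=mean, scale=np.sqrt(alpha * K))


def bayesian_update(log_theta: np.ndarray, y: np.ndarray) -> np.ndarray:
    """theta' is proportional to exp(y) * theta; computed in log space for stability."""
    z = log_theta + y
    z_max = z.max()
    return z - (z_max + np.log(np.exp(z - z_max).sum()))


rng = np.random.default_rng(0)
true_residue = AMINO_ACIDS.index("W")
log_theta = np.full(K, -np.log(K))  # uniform prior over amino acids

for _ in range(10):
    y = sender_sample(true_residue, alpha=0.5, rng=rng)  # constant accuracy (assumption)
    log_theta = bayesian_update(log_theta, y)

theta = np.exp(log_theta)
print(AMINO_ACIDS[int(theta.argmax())], round(float(theta.max()), 3))
```

After a few noisy messages the belief concentrates on the true residue. In a full BFN, a network would predict the class from the current belief at every step instead of the sender knowing x.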


 
DeepChain: One Platform, Multiple Tools


 
By Combining State-of-the-Art Science with Engineering, Our AI Tools Aim to Accelerate the R&D Pipeline
Pipeline stages: target identification, optimisation, development, lab integration. Capabilities: genome annotation, binding improvement, stability enhancement, functional validation.
• Nucleotide Transformers [2,3]: splicing prediction; in silico design of regulatory sequences
• Bayesian Flow Networks [1]: de novo antibody design; affinity enhancement
• Assistants: form hypotheses and design experiments; call tools to analyse results, collaborating with a human scientist
[1] Alice Sends Amino Acids to Bob: Protein Sequence Modelling with Bayesian Flow Networks, Barrett et al., Under Review (2024).
[2] The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics, Dalla-Torre et al., Under Review (2023).
[3] SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models, Almeida et al., Under Review (2024).


 
Source: Internal Data


 
We Are Releasing Our Flagship Models on DeepChain
ProtBFN & AbBFN [1]: state-of-the-art generative protein models
• Generate natural-like, diverse, structurally coherent and novel protein sequences
• Outperform leading autoregressive and discrete diffusion models
• Enable flexible conditional generation in a zero-shot manner
Nucleotide Transformer [2] & SegmentNT [3]: our foundation models for DNA
• Single-nucleotide resolution
• Up to 50 kb context length without performance drop
• Generalize across species
[1] Alice Sends Amino Acids to Bob: Protein Sequence Modelling with Bayesian Flow Networks, Barrett et al., Under Review (2024).
[2] The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics, Dalla-Torre et al., Under Review (2023).
[3] SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models, Almeida et al., Under Review (2024).


 


 
Our Foundation Models for Genomics Are State-of-the-Art
Nucleotide Transformer models are state-of-the-art in the space [1].
[1] The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics, Dalla-Torre et al., Nature Methods (in press).


 
One of the Most Downloaded Genomics AI Models on Hugging Face [2]
700K+ downloads across model sizes [1]
[1] Cumulative downloads for Nucleotide Transformer models, 50M to 2.5B parameter sizes, September 2024. Hugging Face statistics. Model release date: April 2023.
[2] Count by family of models under the "Genomics" official Hugging Face tag: https://huggingface.co/models?other=genomics&sort=downloads, September 2024.


 
We are releasing capabilities to build and scale with our AI models


 


 
01 Optimized Setup
Get access to our hardware-accelerated workflows to run models with a few lines of code: send a request to the Inference API and receive a fast response containing the model’s output.


 
Running inference with DeepChain is 7X faster and 2X cheaper for in silico design of regulatory sequences*
Panels: Improved Speed; Reduced Cost.
* Reference methodology: Jores, T., Tonnies, J., Wrightsman, T. et al. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat. Plants 7, 842–855 (2021).
* Test implementation: 6 kbp and 2.1 kbp sequences; parameters: --num_indels=8000, --prop_indels=0.5, --random_indels=True, --min_indels_size=2, --max_indel_size=5, --tissue_optimize_idx=1, --opt_metric=increase, --num_rounds=30.
* Baseline set-up: 1 NVIDIA V100 Tensor Core GPU, using the published PyTorch implementation available on Hugging Face.


 
02 Customization
Customize models for your needs: adapt a model to a specific task via supervised fine-tuning with our proprietary parameter-efficient fine-tuning methods.


 
Fine-tuning a model on a specialised dataset increased performance approximately 1.5X for a splicing prediction use case
Chart: relative improvement from fine-tuning on a splicing task.
* AUCPR (area under the precision-recall curve): a metric of the overall performance of a binary classification model, obtained by plotting precision against recall at different threshold settings; it gives a more accurate assessment of performance when classes are imbalanced.
* Dataset used for customer-specific calculation: Shiraishi Y, Kataoka K, Chiba K, Okada A, Kogure Y, Tanaka H, Ogawa S, Miyano S. A comprehensive characterization of cis-acting splicing-associated variants in human cancer. Genome Res. 2018 Aug;28(8):1111-1125. doi: 10.1101/gr.231951.117. Epub 2018 Jul 16. PMID: 30012835; PMCID: PMC6071634.
* Baseline implementation: Jaganathan, Kishore et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell, Volume 176, Issue 3, 535-548.e24.
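AUCPR, as defined in the footnote, can be estimated in several ways, and the slide does not say which estimator was used. One common choice is average precision. It sums the precision at each rank where a new positive is recovered, weighted by the recall increment. The following is a minimal sketch that assumes binary labels and untied scores:

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUCPR estimate: mean of precision@k over the ranks k of the positive examples.

    Assumes no tied scores; each positive adds a recall step of 1 / n_positives.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Imbalanced toy example: 2 positives among 10 candidate sites.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
print(round(average_precision(labels, scores), 4))  # 0.8333
print(sum(labels) / len(labels))  # 0.2: expected AUCPR of a random classifier
```

The random-classifier baseline equals the positive rate (0.2 here), not 0.5. This is why AUCPR is informative when positives are rare.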


 
03 Assistants
Build using natural language with Laila: Laila can form hypotheses, design experiments, and call tools to analyse results collaboratively with a human scientist.


 
Demo screen


 
The Laila Series of AI Agents Is Built with Meta Llama 3.1
Model versions internally fine-tuned by InstaDeep (sizes in billions of parameters):
• 8B: Llama-3.1-8B-Laila-fine-tuned
• 70B: Llama-3.1-70B-Laila-fine-tuned
• 405B: Llama-3.1-405B-Laila-fine-tuned


 
Laila: An AI Agent for Biology
Laila is more than just a chatbot:
• Capable of reasoning and making decisions
• Expert knowledge of biology
• Integrated with powerful tools
• Learning through constant feedback


 
Go to deepchain.bio to gain early access.


 
Part II. Deploying AI Across the Pipeline


 
Our Goal: Deploying AI End-to-End in Our Immunotherapy Pipeline
AI-first immunotherapy platform:
1. Immunohistochemistry (AI vision): histology, using AI computer vision to improve the speed and accuracy of tissue labeling
2. DNA/RNA sequencing (DNA LLMs): DNA sequence analysis and personalized genomic annotations
3. Proteomics (protein LLMs): leveraging AI for target discovery and analysis of the immunological landscape
4. Protein design (protein LLMs): developing assets for immunotherapy modalities (antibodies, cytokines, TCRs)
5. Lab functional validation (AI agent): instrument automation and quality control


 
Step 1: Histology


 
The Challenge
Pathologists face increasing workloads as demand grows for precise tumor and tissue analysis.


 
Source: Own Data


 
Our Approach
Develop tools using state-of-the-art AI to enhance pathologists' workflows by producing faster and higher-quality annotations.


 
AI-Assisted Tissue Annotation Tool
Aiming to enhance pathologists' precision and efficiency through human-AI collaboration.


 


 
5X Speed-Up
By increasing efficiency fivefold compared with manual annotation, our AI tool lets pathologists complete annotations in roughly a fifth of the time, optimizing resource utilization and accelerating research and development efforts.


 
Superior Annotation Quality
The tool enables pathologists to refine annotations at different magnification levels in whole slide images, resulting in higher-quality annotations.


 
Whole Slide Image Segmentation Tool


 
From Classification to Segmentation
• Use a state-of-the-art vision foundation model that we train specifically on pathology images.
• Decompose the image into patches.
• Transform the process from image segmentation into patch classification.
Source: Own Data
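The three steps above can be sketched in a few lines of NumPy. The sketch tiles an image region into fixed-size patches, classifies every patch in one batched call, and upsamples the per-patch labels back into a pixel mask. `classify_patches` is a hypothetical stand-in (a simple intensity threshold) for the pathology foundation model, and the patch size is an arbitrary assumption.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles (edges cropped)."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    cropped = image[: rows * patch, : cols * patch]
    tiles = cropped.reshape(rows, patch, cols, patch, -1).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch, -1), (rows, cols)


def classify_patches(tiles: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the vision model: dark (stained) tiles -> class 1."""
    return (tiles.mean(axis=(1, 2, 3)) < 0.5).astype(np.int64)


def segment(image: np.ndarray, patch: int) -> tuple[np.ndarray, np.ndarray]:
    tiles, (rows, cols) = patchify(image, patch)
    grid = classify_patches(tiles).reshape(rows, cols)  # one label per patch
    mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)  # back to pixels
    return grid, mask


image = np.ones((8, 12, 3))  # bright background
image[0:4, 4:8] = 0.2  # one dark "tissue" region
grid, mask = segment(image, patch=4)
print(grid)
```

Because every patch is classified independently, the batch can be split across devices. This is the property the next slides rely on for speed. The resulting mask is only as fine as the patch size.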


 
Source: Own Data


 
Scalable and Fast
By processing thousands of patches in parallel, we deliver a more than 100X speed-up compared with manual annotation.


 
Step 2: DNA Foundation Models at Nucleotide Resolution


 
Nucleotide Transformer: Self-Supervised Learning on Genomes
Training scheme (figure): a 12,000 bp sequence is randomly sampled from a DNA database of genomes, tokenized into 6-mers (e.g. ATTCGA CTATCC …), and randomly masked; the Nucleotide Transformer (NT) then predicts, for each masked position, a probability for every 6-mer token from AAAAAA to TTTTTT (e.g. 0.14 for CCGTAG).
InstaDeep’s Nucleotide Transformer models
● Architecture: masked language models (BERT-style training).
● Datasets: trained on 5 datasets of different sizes, with inter- and intra-species variability from across the tree of life.
● Nucleotide Transformers (NT)
○ V1: 500M, 1B, 2.5B parameters (2022)
○ V2: 50M, 100M, 250M parameters (2023)
● Hardware
○ Cambridge-1 datacenter (collaboration with Nvidia)
○ TPUv4-1024 pod (collaboration with Google Cloud)
We believe automated analysis and predictions from genome sequences have the potential to transform tomorrow's health care and agriculture.
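The pre-training recipe on this slide can be sketched as follows: 6-mer tokenization followed by random masking. Some details are assumptions borrowed from common BERT-style practice rather than a description of the released NT tokenizer: leftover bases are kept as single-nucleotide tokens, the masking rate is 15%, and `<mask>` is used as the mask-token string.

```python
import random
from itertools import product

KMER = 6
MASK = "<mask>"  # hypothetical mask-token string
# Vocabulary of all 4**6 = 4096 six-mers, e.g. AAAAAA, AAAAAC, ..., TTTTTT.
KMER_VOCAB = ["".join(p) for p in product("ACGT", repeat=KMER)]


def tokenize_6mers(seq: str) -> list[str]:
    """Non-overlapping 6-mers; trailing bases become single-nucleotide tokens."""
    n_full = len(seq) // KMER * KMER
    tokens = [seq[i:i + KMER] for i in range(0, n_full, KMER)]
    tokens.extend(seq[n_full:])  # each leftover base as its own token
    return tokens


def mask_tokens(tokens: list[str], rate: float, rng: random.Random):
    """Replace a random subset of tokens with MASK; return inputs and prediction targets."""
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            inputs.append(MASK)
            targets[i] = tok  # the model must predict this token at position i
        else:
            inputs.append(tok)
    return inputs, targets


tokens = tokenize_6mers("ATTCGACTATCCCGTAG")
inputs, targets = mask_tokens(tokens, rate=0.15, rng=random.Random(0))
print(tokens)
print(inputs, targets)
```

During training, the model outputs a probability distribution over the token vocabulary at every masked position, and the loss compares it with `targets`. The slide's probability table (0.14 for CCGTAG) is one such distribution.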


 
NT Acquires Genomics Knowledge During Pre-Training
Figure: t-SNE projections of sequence embeddings from layers 1, 5 and 21 of the 2.5B multi-species model, labeled by genomic element (intergenic, intron, 5'UTR, 3'UTR).
Even without supervision, information about genomic sequence features is learned in the sequence representation.
Source: Dalla-Torre et al., Nat Methods 2024 (in press)


 
SegmentNT: Inspiration from Computer Vision Segmentation Models
Figure panels: Computer Vision; Genomics
Source: https://techxplore.com/news/2020-05-deep-image-recognition-ability-self-driving.html


 
SegmentNT: Annotating the Genome at Nucleotide Resolution
14 annotations per nucleotide (e.g. 700,000 predictions for a 50 kbp sequence), covering elements such as enhancers, promoters, 5'UTRs, exons, introns, polyA sites, 3'UTRs and insulators.
We fine-tuned the Nucleotide Transformers on 2.5M high-quality annotations of genes and regulatory elements.
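The slide's prediction count is simple arithmetic: 14 annotation types × 50,000 nucleotides = 700,000 per-nucleotide predictions. The sketch below shows what such an output might look like and how one annotation track could be turned into genomic intervals. It assumes each annotation is an independent per-nucleotide probability thresholded at 0.5, and random logits stand in for real model output. None of this is SegmentNT's actual API.

```python
import numpy as np

N_ANNOTATIONS = 14  # annotation types per nucleotide
SEQ_LEN = 50_000    # 50 kbp input


def to_masks(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Sigmoid per (position, annotation); overlapping labels are allowed."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs >= threshold


def to_intervals(track: np.ndarray) -> list[tuple[int, int]]:
    """Contiguous runs of True in a 1-D mask, as half-open [start, end) intervals."""
    padded = np.concatenate(([False], track, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))


rng = np.random.default_rng(0)
logits = rng.normal(size=(SEQ_LEN, N_ANNOTATIONS))  # placeholder for model output
masks = to_masks(logits)
print(masks.shape, masks.size)  # (50000, 14) 700000
print(to_intervals(np.array([False, True, True, False, True])))  # [(1, 3), (4, 5)]
```

Independent per-label probabilities (rather than one softmax per nucleotide) let nested elements, such as an exon inside a gene, be predicted at the same position. This is a design assumption of the sketch.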


 
The resulting SegmentNT model segments sequences of up to 50 kbp with state-of-the-art performance for splicing, gene finding and regulatory element detection.
Source: adapted from de Almeida et al., 2024 (in revision)


 
SegmentNT: 700,000 Accurate Predictions over 50 kbp in Less Than a Second
Figure panels: Model Predictions; Ground Truth
Source: de Almeida et al., 2024 (in revision)


 
SegmentNT Is State-of-the-Art for Canonical Splicing Detection
Splicing is the biological process that removes introns from a precursor messenger RNA (pre-mRNA) transcript and joins the exons together to create a mature mRNA. Dysregulated splicing can be a vulnerability in cancers. SegmentNT outperforms the state-of-the-art model SpliceAI at splicing event detection.
Chart: MCC (Matthews correlation coefficient) for splicing detection over full test chromosomes of the human reference genome.
Source: de Almeida et al., 2024 (in revision)
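MCC, the metric on this chart, uses all four cells of the confusion matrix. That matters for splice-site detection, where almost every nucleotide is a negative. The following is a minimal sketch with a toy per-nucleotide example; the labels are hypothetical, not the benchmark data.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy splice-site track: 2 true sites in 10 nucleotides; one found, one missed, one false call.
y_true = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, mcc(y_true, y_pred))  # 0.8 0.375
```

Accuracy looks high (0.8) because most positions are negatives. MCC (0.375) reflects that half the true sites were missed.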


 
Alternative Splicing Event Detection with SegmentNT
Alternative splicing events can disrupt protein production and cancer pathways, and are associated with cancer development. We fine-tuned SegmentNT to identify tumor antigen candidates from alternative splicing events, which represent potential targets for personalized cancer immunotherapies. After fine-tuning, SegmentNT can accurately predict alternative splicing events in cancer data.
Chart: % detected; test performance for the detection of 2,000 alternative splicing events in cancer data (TCGA LUAD).
Source: https://www.cancer.gov/tcga (data)


 
Step 3: AI-Enhanced Proteomics for Target Discovery


 
Intracellular Proteins Are Processed and Presented on MHC Complexes
Created with BioRender.com


 
MHC-Presented Epitopes Are the Immune System’s Window into the Intracellular Proteome
Created with BioRender.com


 
Mass Spectrometry Is the Current State-of-the-Art for Detecting, Identifying, and Quantifying MHC-Presented Epitopes
Created with BioRender.com


 
BioNTech Has a Massive Database of MS-Validated MHC-Bound Epitope Peptides from Studies Performed Internally and Externally
~200 million spectra; 1.8 million peptides
Created with BioRender.com


 
With the Power of AI, We Can Dig Deeper into Our Data to Identify Novel Therapeutic Targets
~200 million spectra; >>1.8 million peptides
Created with BioRender.com


 
AI Maximizes Our Ability to Discover Novel Cancer Targets
• Up to 200% increase in recovered peptide IDs
• Identification of novel, tumor-specific peptides
Figures: chromatographic retention time, observed vs. predicted; signal intensities of peptide fragments, observed vs. predicted (relative intensity against m/z); number of observations in cancer tissue samples vs. in normal tissue samples, where each point is a gene (with jitter).
Source: Internal
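The fragment-intensity panel compares observed and predicted intensities for the same peptide. One widely used way to score that agreement is the normalized spectral angle, which maps the angle between the two intensity vectors onto [0, 1]. The slide does not state which similarity measure is used, so treat this as an illustrative sketch. The intensity values are made up.

```python
import math
from typing import Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle: 1 = identical relative intensities, 0 = orthogonal."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(p * p for p in predicted))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp floating-point drift
    return 1.0 - 2.0 * math.acos(cos) / math.pi


# Hypothetical intensities for the same fragment ions, listed in a fixed order.
observed = [1.00, 0.45, 0.30, 0.05, 0.60]
predicted = [0.95, 0.50, 0.25, 0.10, 0.55]
print(round(spectral_angle(observed, predicted), 3))
```

Because both vectors are normalized, the score ignores overall signal strength and rewards only matching relative intensities. This makes it usable as an extra feature when re-scoring candidate peptide identifications.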


 
Targets Are Validated with High-Precision Mass Spectrometry
Lung squamous cell carcinoma: example peptides validated using synthetic controls.
• Validated targets are candidates for TCR-based therapies
• Underway: novel in silico approaches to discover and enhance TCRs
Source: Internal


 
The Challenge of HLA Mass Spectrometry
55-75% of data cannot be mapped to a known human peptide.
• Assigned: canonical human protein sequences
• Unassigned: lncRNAs, circular RNAs, non-canonical open reading frames, endogenous retroviruses, post-translational modifications, and other, unknown sources


 
Traditional Mass Spectrometry: Target-Decoy Search
Each MS2 spectrum is searched against a target database of known peptides (WQIPLCTVR, NRRRYTSSC, YVFGGLASA, FTASKTTW, …, ASLMPTYY) and a decoy database of the same sequences reversed (RVTCLPIQW, CSSTYRRRN, ASALGGFVY, WTTKSATF, …, YYTPMLSA). Best-scoring peptide and score per spectrum:
#1 ASLMPTYY, 9.7
#2 ASALGGFVY, 2.1
#3 RVTCLPIQW, 11.6
#4 YVFGGLASA, 15.1
#5 CSSTYRRRN, 7.2
#6 YVFGGLASA, 19
#7 WTTKSATF, 5.3
#8 WQIPLCTVR, 0.8
#9 NRRRYTSSC, 12.8
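The decoys on this slide are the target peptides written in reverse (WQIPLCTVR becomes RVTCLPIQW, and so on). The standard use of such decoys is to estimate the false discovery rate (FDR) of accepted matches: at a given score cut-off, FDR ≈ decoy hits / target hits. The following is a minimal sketch using the nine spectra above. The FDR thresholds are illustrative assumptions, since the slide does not state one.

```python
TARGETS = ["WQIPLCTVR", "NRRRYTSSC", "YVFGGLASA", "FTASKTTW", "ASLMPTYY"]
DECOYS = {peptide[::-1] for peptide in TARGETS}  # reversed-sequence decoys

# (spectrum, best-scoring peptide, score) as listed on the slide.
PSMS = [
    (1, "ASLMPTYY", 9.7), (2, "ASALGGFVY", 2.1), (3, "RVTCLPIQW", 11.6),
    (4, "YVFGGLASA", 15.1), (5, "CSSTYRRRN", 7.2), (6, "YVFGGLASA", 19.0),
    (7, "WTTKSATF", 5.3), (8, "WQIPLCTVR", 0.8), (9, "NRRRYTSSC", 12.8),
]


def accept_at_fdr(psms, decoys, max_fdr):
    """Keep target matches above the lowest score cut-off whose estimated FDR <= max_fdr."""
    ranked = sorted(psms, key=lambda psm: psm[2], reverse=True)
    n_target = n_decoy = 0
    cut = 0  # number of top-ranked matches to keep
    for i, (_, peptide, _) in enumerate(ranked, start=1):
        if peptide in decoys:
            n_decoy += 1
        else:
            n_target += 1
        fdr = n_decoy / n_target if n_target else 1.0
        if fdr <= max_fdr:
            cut = i
    return [psm for psm in ranked[:cut] if psm[1] not in decoys]


for max_fdr in (0.01, 0.25):
    kept = accept_at_fdr(PSMS, DECOYS, max_fdr)
    print(max_fdr, [spectrum for spectrum, _, _ in kept])  # 0.01 [6, 4, 9] / 0.25 [6, 4, 9, 1]
```

The sketch takes the lowest cut-off that still satisfies the threshold, rather than stopping at the first decoy, which mirrors how q-values are usually applied.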


 
De Novo Peptide Sequencing
A sequence-to-sequence AI model translates each MS2 spectrum (1 to 9) directly into a peptide: ASLMPTYY, ASALGGFVY, RVTCLPIQW, YVFGGLASA, CSSTYRRRN, YVFGGLASA, WTTKSATF, WQIPLCTVR, NRRRYTSSC.


 
InstaNovo: De Novo Peptide Sequencing with Deep Learning
The approach: de novo peptide sequencing using deep learning; no database needed.
The dataset: the model was trained on 28 million labeled spectra matched to 742 thousand human peptides from the ProteomeTools project.
The models:
• InstaNovo: an autoregressive encoder-decoder transformer model with a dedicated MS2 spectrum encoder
• InstaNovo+: a multinomial diffusion model that further improves performance through iterative refinement
Source: De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments (https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3)


 
InstaNovo: De Novo Peptide Sequencing with Deep Learning
The results:
• InstaNovo performed well across most datasets
• Increases the peptide-spectrum match (PSM) rate in HeLa proteomes
• Expanded an immunopeptidomics dataset by 42%
• Found peptides from individual-specific mutations, splice variants, and post-translational modifications
• Discovered new HLA peptides in immunopeptidome experiments
Preprint and code available:
• Preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3
• Code on GitHub: https://github.com/instadeepai/instanovo
Source: De novo peptide sequencing with InstaNovo: Accurate, database-free peptide identification for large scale proteomics experiments (https://www.biorxiv.org/content/10.1101/2023.08.30.555055v3)


 
Step 4: Protein Design: RiboMab™ Platform


 
Source: https://investors.biontech.de/news-releases/news-release-details/biontech-and-instadeep-announce-strategic-collaboration-and-form/


 
Enabling Antibody Co-Formulation / Co-Expression
Co-expressed and bispecific antibodies hold significant therapeutic interest. However, they require precise pairing of heavy and light chains.
Figure: in co-expression, only 12.5% of the product is correctly paired antibody A and 12.5% correctly paired antibody B (the functional form of co-expressed antibodies); the remaining 75% is H-L mispaired. H-H mispairing is out of scope.
Our goal: engineer the H-L interface to prevent mispairing.
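The 12.5% / 12.5% / 75% split follows from simple counting under three assumptions: both antibodies are expressed equally, heavy chains form only homodimers (H-H mispairing is out of scope, as the slide notes), and each Fab arm picks either light chain at random. The sketch enumerates those outcomes with exact fractions:

```python
from fractions import Fraction
from itertools import product

CHAINS = ("A", "B")  # antibody A and antibody B chains


def pairing_outcomes() -> dict[str, Fraction]:
    """Probability of each assembled species under random H-L pairing (assumptions above)."""
    outcomes: dict[str, Fraction] = {}
    p_heavy = Fraction(1, len(CHAINS))  # heavy homodimer AA or BB
    p_light = Fraction(1, len(CHAINS))  # each Fab arm picks light chain A or B
    for heavy in CHAINS:
        for arm1, arm2 in product(CHAINS, repeat=2):
            key = f"correct {heavy}" if arm1 == arm2 == heavy else "mispaired"
            outcomes[key] = outcomes.get(key, Fraction(0)) + p_heavy * p_light * p_light
    return outcomes


for species, p in pairing_outcomes().items():
    print(f"{species}: {float(p):.1%}")
```

Each heavy homodimer is correct only when both arms happen to pick its own light chain (1/2 × 1/2 = 1/4). Half of the product is antibody A heavy chain, so correctly paired A makes up 1/8 of the total.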


 
Our Protein Engineering Approach
We set out to introduce mutations that enforce orthogonality between the neoCH1-neoCL pair and the wild-type CH1-wild-type CL pair.
Figure: antibody A carries neoCH1 and neoCL; antibody B carries wild-type CH1 and wild-type CL. Interface mutations improve binding within each intended pair and abrogate binding between mismatched chains.
Source: RCSB.org


 
Finding Optimal Orthogonal Mutations
A combinatorial multi-objective optimisation problem that we could solve thanks to our DeepChain™ platform and an efficient in silico/in vitro feedback loop:
• Structural modelling of each interface mutation
• Binding energy estimations for all correctly paired and mispaired complexes
• Mitigation of thermostability changes for both heavy and light chains
• Key interaction understanding: in-depth knowledge of the physics of the interface
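A multi-objective problem like this one usually has no single best design, only a set of non-dominated trade-offs (a Pareto front). The sketch below filters hypothetical candidate mutation sets on two of the listed objectives. The first is the binding-energy gap between mispaired and correctly paired complexes (higher is better). The second is the loss of thermostability (lower is better). The candidate names, scores and units are invented for illustration, and the slide does not describe how DeepChain ranks designs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    pairing_gap: float     # hypothetical: dG(mispaired) - dG(correct), kcal/mol; higher is better
    stability_loss: float  # hypothetical: drop in melting temperature, deg C; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.pairing_gap >= b.pairing_gap and a.stability_loss <= b.stability_loss
    better = a.pairing_gap > b.pairing_gap or a.stability_loss < b.stability_loss
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates if o is not c)]


candidates = [
    Candidate("set-1", pairing_gap=3.0, stability_loss=1.0),
    Candidate("set-2", pairing_gap=2.0, stability_loss=0.5),
    Candidate("set-3", pairing_gap=1.5, stability_loss=1.2),  # dominated by set-1
    Candidate("set-4", pairing_gap=3.5, stability_loss=2.5),
]
print([c.name for c in pareto_front(candidates)])  # ['set-1', 'set-2', 'set-4']
```

Keeping the whole front, rather than a weighted single score, leaves the final trade-off between pairing specificity and stability to experimental follow-up.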


 
RiboMab: Results Obtained with DeepChain
• >90% correct pairing, matching the best patented designs on the market
• Functional activity of the antibodies confirmed
Chart: correct pairing (%) for wild type, competition, and BioNTech/InstaDeep designs.
Source: Internal Data.


 
Step 5: Lab Automation


 
Lab Automation Could Transform Research & Development
In-lab runs and in silico runs to accelerate scientific discovery.
Challenges:
• Change: R&D is changing constantly
• Complexity: high complexity of science and automation
• Transparency: scientists need transparency and trust


 
Opportunities to Unlock Full Lab Automation with AI
With the assistance of artificial intelligence, we see opportunities to overcome these challenges and unlock the full potential of laboratory automation:
• Supporting change management
• Information discovery and transparency
• Fast protocol development and change
• Cross-team interaction
• Machine error diagnosis


 
Demo


 
Case Study: Liquid Handler Error Diagnosis
Increased efficiency across laboratory research activities.
*Stated times are for an unexpected error.


 
Lab Automation Demo: Future Outlook (technology capability established; next steps)
• AI-based assay optimization
• Hiding the complexity of devices
• Carving out different group requirements
• Connecting to BioNTech's digital R&D backbone
• Scaling up to other devices
• Vendor-agnostic solution
• Interactive and natural-language based
• DNA/RNA template optimization
• AI-led assay information


 
One more thing...


 
AI Day Executive Summary
Ryan Richardson, Chief Strategy Officer, BioNTech
Ugur Sahin, Founder & CEO, BioNTech
Karim Beguir, CEO, InstaDeep


 
Thank You! THE END