Portal Weekly #30: LLM-based helpers, an atomistic materials chemistry foundation model, and more.
Join Portal and stay up to date on research happening at the intersection of tech x bio 🧬
Hi everyone 👋
Welcome to another issue of the Portal newsletter where we provide weekly updates on talks, events, and research in the TechBio space!
Thank you to everyone who took the time to complete the Portal Weekly feedback survey last week. Overall, it seemed like you thought that we were doing a good job with the current format! In the coming weeks, we’re going to try implementing some of the things you indicated you’d be open to - stay tuned! If you didn’t get a chance to fill it out last week but would like to give us input, click this link.
💬 Upcoming talks
LoGG continues next week with a presentation by Haoteng Yin, who will discuss techniques developed to make subgraph-based representation learning scalable, accurate and faster than ever.
Join us live on Zoom on Monday, January 15th at 11 am ET. Find more details here.
We’re still getting things booted up in the new year for the other talk series. Check back on the Events page on Portal for announcements regarding M2D2 and CARE talks.
💻 Latest Blogs
Last week we hosted the first blog post of 2024! Marc Oeller introduces you to CamSol and CamSol-PTM, tools to address the seemingly mundane but very necessary prediction of solubility for proteins and peptides. Check it out here!
If you enjoy this newsletter, we’d appreciate it if you could forward it to some of your friends! 🙏
Let’s jump right in!👇
📚 Community Reads
LLMs for Science
BioLLMBench: A Comprehensive Benchmarking of Large Language Models in Bioinformatics
Large Language Models (LLMs) are a source of a lot of excitement for ML in general, and everyone is trying to adapt them to specific domains. This study tries to systematically evaluate how three widely used LLMs (GPT-4, Bard and LLaMA) can help in bioinformatics research. The authors design a benchmarking framework with purpose-built task metrics for challenges that a bioinformatician may face from day to day, including domain expertise, data visualization, developing ML models and mathematical problem solving. They find that GPT-4 generally performed best on every task except mathematical problem solving, where Bard did better. Notably, none of the models did particularly well at research paper summarization. Still a while before they replace this newsletter!
ML for Small Molecules
ADMET-AI: A machine learning ADMET platform for evaluation of large-scale chemical libraries
Chemical space is combinatorially vast, but we’ve gotten quite good at describing it and making compounds (for example, ZINC - a database of >230 million purchasable compounds for virtual screening). One challenge is filtering through all of these molecules for the ones that have favourable druglike properties, i.e. absorption, distribution, metabolism, excretion and toxicity (ADMET). This preprint describes ADMET-AI, a machine learning platform, available both as a website and as a Python package, that provides fast and accurate ADMET predictions. It has the highest average rank on the TDC ADMET Benchmark Group leaderboard, and can make predictions for 1 million molecules in 3.1 hours on a local computer with 32 CPU cores and a GPU.
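To put that speed in perspective, here is a quick back-of-the-envelope sketch in Python. It only uses the two figures quoted above (1 million molecules in 3.1 hours, and ZINC's >230 million compounds) and assumes, as a simplification that the preprint does not necessarily claim, that runtime scales linearly with library size on the same hardware. The function names are made up for illustration and are not part of the ADMET-AI package.

```python
# Figures quoted in the summary above.
MOLECULES_BENCHMARKED = 1_000_000
HOURS_BENCHMARKED = 3.1
ZINC_LIBRARY_SIZE = 230_000_000  # lower bound: ZINC has ">230 million" compounds


def molecules_per_second(n_molecules: int, hours: float) -> float:
    """Average throughput implied by a benchmark run."""
    return n_molecules / (hours * 3600)


def hours_to_screen(library_size: int,
                    n_molecules: int = MOLECULES_BENCHMARKED,
                    hours: float = HOURS_BENCHMARKED) -> float:
    """Estimated runtime for a library, ASSUMING linear scaling on the same machine."""
    return library_size * hours / n_molecules


rate = molecules_per_second(MOLECULES_BENCHMARKED, HOURS_BENCHMARKED)
zinc_hours = hours_to_screen(ZINC_LIBRARY_SIZE)

print(f"Throughput: ~{rate:.0f} molecules/s")
# Throughput: ~90 molecules/s
print(f"ZINC-scale screen: ~{zinc_hours:.0f} hours (~{zinc_hours / 24:.0f} days)")
# ZINC-scale screen: ~713 hours (~30 days)
```

So under these assumptions, a single machine like the one described would need roughly a month for a ZINC-sized library, which gives a rough feel for what "large-scale" means here.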
ML for Atomistic Simulations
A foundation model for atomistic materials chemistry
Density functional theory (DFT) is a computational quantum mechanical modeling method used to investigate the electronic structure of atoms and molecules. It’s successful but computationally costly; recent ML techniques are faster but usually less accurate, and must be custom-tailored for particular systems of interest. This study introduces a model that uses the MACE architecture (keeping only “essential” elements of equivariant graph neural networks) to potentially run molecular dynamics on a diverse range of molecules and materials. Foundation models like this are seen as a democratizing force in many domains, since they can lower barriers to entry and enable ML applications to new problems.
ML for Proteins
De Novo Design of Diverse Small Molecule Binders and Sensors Using Shape Complementary Pseudocycles
Being able to custom-design proteins to bind to and sense a target small molecule would be hugely useful in both research and industry. It’s challenging because the contact interface between the protein and molecule is fairly small, so you need high complementarity. This study uses designed pseudocyclic scaffolds (a repeating structural unit surrounding a central pore or pocket) that satisfy this requirement. Notably, they are able to bind polar flexible molecules for the first time. This is a Baker Lab paper, so they back up their designs with benchwork validation of their computational predictions - always nice to see!
ML for Omics
An AI Agent for Fully Automated Multi-omic Analyses
LLM-based models can help with various tasks, including bioinformatics. This study presents an agent designed for multi-omic analyses, using multiple LLM backends. The model delivers step-by-step plans for various bioinformatic tasks, and importantly the accuracy of these outputs is evaluated by “expert bioinformaticians” (although the paper is a bit light on the details). This kind of tool can be useful at the very least in making bioinformatics more accessible for newcomers.
Open Source
Cryo2StructData: A Large Labeled Cryo-EM Density Map Dataset for AI-based Modeling of Protein Structures
Advantages of cryo-electron microscopy in structure determination include its ability to visualize large biomolecular complexes and show multiple conformations. Turning cryo-EM density maps into atomic models is hard without template-based models. This preprint describes a dataset of 7,600 preprocessed cryo-EM density maps for training and testing ML methods to build accurate atomic models.
Think we missed something? Join our community to discuss these topics further!
🎬 Latest Recordings
LoGG
You can always catch up on previous recordings on our YouTube channel or the Portal Events page!
Portal is the home of the TechBio community. Join here and stay up to date on the latest research, expand your network, and ask questions. In Portal, you can access everything that the community has to offer in a single location:
M2D2 - a weekly reading group to discuss the latest research in AI for drug discovery
LoGG - a weekly reading group to discuss the latest research in graph learning
CARE - a weekly reading group to discuss the latest research in causality
Blogs - tutorials and blog posts written by and for the community
Discussions - jump into a topic of interest and share new ideas and perspectives
See you at the next issue! 👋