Today, we’re thrilled to announce a bold new step for Ginkgo: the launch of our model API, a powerful tool designed to bring biological AI models directly to machine learning scientists.
Powered by our partnership with Google Cloud, we’ll be making this API publicly available on our website today. Researchers, developers, and enterprise teams will also be able to access the models that power our API on Google’s Model Garden soon.
With this programmer-friendly, ultra-low-cost API, Ginkgo is making our internally developed AI tools available to anyone, and we couldn’t be more excited to begin sharing our work. The interface provides an easy and scalable way to access sophisticated models trained on protein and DNA data, starting with our first release: AA-0, a large-scale machine learning model trained on 2+ billion proprietary Ginkgo protein sequences. Read more about AA-0 here.
We’re excited to see how the community builds on top of these models and our API to enable a wide range of applications in biology.
While our mission is to make biology easier to engineer, we don’t have a monopoly on interesting and creative uses of language models and other AI innovations in biology. That’s why we’re making these models as affordable and accessible as possible, including our first model trained on a proprietary Ginkgo dataset, so that you can build new applications on top of them today. We’re excited to see users build tools such as iterative protein design programs that call our protein generation API, or use our embedding API to compute features for a clustering algorithm.
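As a rough illustration of what an iterative protein design program could look like, here is a minimal Python sketch. It is not Ginkgo's client code: `fill_masks` is a hypothetical stand-in for a call to a masked-generation endpoint, and `stub_fill_masks` and `toy_score` are made-up placeholders so the loop runs locally without any network access.

```python
import random
from typing import Callable, List

MASK = "<mask>"                        # mask token named in this announcement
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids


def mask_random_positions(seq: str, n_masks: int, rng: random.Random) -> List[str]:
    """Return seq as a token list with n_masks random residues replaced by MASK."""
    tokens = list(seq)
    for i in rng.sample(range(len(tokens)), n_masks):
        tokens[i] = MASK
    return tokens


def iterative_design(
    seed: str,
    fill_masks: Callable[[List[str]], str],  # hypothetical: wraps a generation call
    score: Callable[[str], float],           # hypothetical: your own objective
    rounds: int = 20,
    n_masks: int = 2,
    rng_seed: int = 0,
) -> str:
    """Greedy loop: mask a few residues, let the model fill them, keep improvements."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        candidate = fill_masks(mask_random_positions(best, n_masks, rng))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


# Placeholders so the sketch runs offline; a real program would call the API here.
_stub_rng = random.Random(1)


def stub_fill_masks(tokens: List[str]) -> str:
    """Fill each MASK with a random amino acid (stand-in for a model call)."""
    return "".join(_stub_rng.choice(AMINO_ACIDS) if t == MASK else t for t in tokens)


def toy_score(seq: str) -> float:
    """Toy objective: fraction of alanine residues (illustrative only)."""
    return seq.count("A") / len(seq)


seed = "MKTAYIAKQR"
designed = iterative_design(seed, stub_fill_masks, toy_score)
print(seed, "->", designed)
```

Because the loop only accepts candidates that score higher, the returned sequence never scores below the seed. Swapping the placeholders for a real generation call and a real objective is left to the reader.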
This and future protein LLMs empower companies to generate novel insights and accelerate the discovery of new therapeutics. By harnessing the power of AI to analyze and understand complex protein structures and interactions, researchers and enterprises can streamline their research pipelines, optimize lead identification, and ultimately bring life-saving medicines to market faster and more efficiently. Building on models that learn from Ginkgo’s private data, unavailable to the public, can enable companies to unlock hidden patterns and potential therapeutic targets that would otherwise remain elusive.
This is a new chapter for Ginkgo, and we’re just getting started. As we continue to develop and release more models and services, we’re excited to see how you’ll use these tools to drive innovation in biology. Sign up below to join our community and be the first to know about model releases and new features.
We have a multitude of models under development, spanning machine learning methods like language modeling and diffusion for conditional design. To begin, our first protein language model release will support two use cases:
- Generation via masked language modeling: given a sequence of amino acids with one or more <mask> tokens, the model fills in the masked positions to complete the sequence.
- Embedding calculation: the API returns the final hidden layer of the trained model, giving you representations for downstream tasks. To begin, our model returns the mean-pooled representation across the length axis, that is, one fixed-size vector per sequence.
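To make the two use cases concrete, here is a small local Python sketch. It does not call the API. The helper names `mask_sequence` and `mean_pool` are illustrative assumptions, and the toy hidden states are made-up numbers. It shows how a masked input can be built and what mean pooling across the length axis does to per-residue vectors.

```python
from typing import List, Sequence

MASK_TOKEN = "<mask>"  # mask token named in this announcement


def mask_sequence(sequence: str, positions: Sequence[int]) -> str:
    """Replace the residues at the given 0-based positions with MASK_TOKEN."""
    masked = set(positions)
    return "".join(MASK_TOKEN if i in masked else aa for i, aa in enumerate(sequence))


def mean_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average per-residue vectors over the length axis: (L, D) -> (D,)."""
    if not hidden_states:
        raise ValueError("need at least one residue vector")
    length = len(hidden_states)
    # zip(*rows) iterates over columns, i.e. one hidden dimension at a time.
    return [sum(column) / length for column in zip(*hidden_states)]


# A masked input of the kind the generation use case would complete.
masked_input = mask_sequence("MKTAYIAK", [2, 5])
print(masked_input)  # MK<mask>AY<mask>AK

# Toy "final hidden layer" for a 3-residue sequence with hidden size 2.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(pooled)  # [3.0, 4.0]
```

The pooled vector has the same size regardless of sequence length, which is what makes it convenient as a feature vector for downstream tasks.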
Over the next year, we’ll roll out more models and expand the API’s capabilities, building a robust suite of tools that will enable you to solve complex problems in drug discovery, synthetic biology, genomics, and more using the latest machine learning methods. Visit our portal to access our model API and explore our first model.
Flexibility is everything. Alongside our first proprietary model, which leverages unique datasets from Ginkgo, you’ll also have access to publicly available models like ESM2. This means you can explore and experiment with different approaches, all through a single streamlined platform.
We’re committed to making advanced machine learning tools accessible, which is why our API comes with competitive pricing and a free tier. We’ve structured our costs to make it easy for you to jump in, experiment, and get predictions without worrying about high fees. Our initial models will have a free tier, and our introductory pricing will be approximately $0.18 per million tokens. At roughly one token per amino acid, 2,000 proteins of around 500 amino acids each come to about 1 million tokens, so users should be able to get predictions on all 2,000 sequences for roughly 20 cents.
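The pricing arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes one token per amino acid (consistent with the post's figure of 2,000 sequences being about 1M tokens) and ignores the free tier. The function name `estimate_cost_usd` is illustrative only.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.18  # approximate introductory price from this post


def estimate_cost_usd(
    num_sequences: int,
    avg_length: int,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> float:
    """Estimate inference cost, assuming one token per amino acid."""
    total_tokens = num_sequences * avg_length
    return total_tokens / 1_000_000 * price_per_million


cost = estimate_cost_usd(num_sequences=2_000, avg_length=500)
print(f"${cost:.2f}")  # $0.18
```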
Ready to see what’s possible? Visit our developer portal to access everything you need to start using the API’s free tier, including detailed documentation, tutorials, and sample code. Access the portal today and be among the first to explore our new API and first protein LLM. And to get you started, we’ll provide 2,000 sequences (i.e., ~1M tokens) of free inference on our initial language models!