
Deposition Date 2021-07-15
Release Date 2021-07-28
Last Version Date 2024-04-03
Entry Detail
PDB ID:
7RGR
Keywords:
Title:
Lysozyme 056 from Deep neural language modeling
Biological Source:
Source Organism(s):
Expression System(s):
Method Details:
Experimental Method:
X-RAY DIFFRACTION
Resolution:
2.48 Å
R-Value Free:
0.29
R-Value Work:
0.25
R-Value Observed:
0.26
Space Group:
P 21 21 21
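The R-values above measure how well the refined model reproduces the measured diffraction data: R-work is computed over the reflections used in refinement, and R-free over a held-out test set. As an illustration only (the structure-factor amplitudes below are made-up numbers, not data from this entry), the standard R-factor formula R = Σ|Fobs − Fcalc| / ΣFobs can be sketched as:

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum of absolute differences between
    observed and calculated structure-factor amplitudes, normalized by
    the sum of observed amplitudes."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator

# Toy amplitudes for illustration; real data sets contain thousands
# of reflections.
f_obs = [100.0, 80.0, 60.0]
f_calc = [95.0, 85.0, 58.0]
print(round(r_factor(f_obs, f_calc), 3))  # 0.05
```

Lower values indicate better agreement; for this entry, R-work 0.25 and R-free 0.29 fall in the typical range for a 2.48 Å structure.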
Macromolecular Entities
Polymer Type: polypeptide(L)
Molecule: Artificial protein L056
Chain IDs: A (auth: B), B (auth: A)
Chain Length: 168
Number of Molecules: 2
Biological Source: synthetic construct
Primary Citation
Large language models generate functional protein sequences across diverse families.
Nat. Biotechnol. (2023)
PMID: 36702895 DOI: 10.1038/s41587-022-01618-2

Abstract

Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase.
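The generation scheme the abstract describes, producing a protein sequence one residue at a time, conditioned on a control tag that specifies the desired family or property, can be sketched with a toy stand-in for the model. Everything here is hypothetical: `toy_logits` is a deterministic placeholder for a neural network forward pass, and the `<lysozyme>` tag name is illustrative, not ProGen's actual vocabulary.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_logits(tag, prefix):
    """Hypothetical stand-in for a language model: a real model would
    score the next residue given [control tag] + generated prefix.
    Here we just derive repeatable pseudo-random scores."""
    rng = random.Random(tag + "|" + prefix)
    return {aa: rng.random() for aa in AMINO_ACIDS}

def generate(tag, length):
    """Greedy autoregressive generation: at each step, pick the
    highest-scoring residue given the tag and the sequence so far."""
    seq = ""
    for _ in range(length):
        scores = toy_logits(tag, seq)
        seq += max(scores, key=scores.get)
    return seq

seq = generate("<lysozyme>", 20)
print(len(seq))  # 20
```

In the real system, swapping the control tag steers generation toward a different protein family; fine-tuning on curated sequences and tags, as the abstract notes, sharpens this control for families with enough homologous samples.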
