Deposition Date 2023-08-18
Release Date 2024-09-04
Last Version Date 2025-09-24
Entry Detail
PDB ID:
8KGF
Title:
Structure of AmCas12a with crRNA
Biological Source:
Source Organism:
Anaeroglobus (Taxon ID: 156454)
Method Details:
Experimental Method:
ELECTRON MICROSCOPY
Resolution:
2.90 Å
Aggregation State:
PARTICLE
Reconstruction Method:
SINGLE PARTICLE
Macromolecular Entities
Polymer Type:polypeptide(L)
Molecule:CRISPR-associated endonuclease Cas12a
Chain IDs:A
Chain Length:1365
Number of Molecules:1
Biological Source:Anaeroglobus
Polymer Type:polyribonucleotide
Molecule:RNA (44-MER)
Chain IDs:B (auth: G)
Chain Length:44
Number of Molecules:1
Biological Source:Anaeroglobus
Ligand Molecules
Primary Citation
Discovery of CRISPR-Cas12a clades using a large language model.
Nat Commun 16 7877 (2025)
PMID: 40849498 DOI: 10.1038/s41467-025-63160-4

Abstract

CRISPR-Cas systems have revolutionized the life sciences. Metagenomes contain millions of uncharacterized Cas proteins, yet traditional mining relies on protein sequence alignments. In this work, we employ an evolutionary scale language model (ESM) to learn information beyond sequence. Trained on CRISPR-Cas data, ESM accurately identifies Cas proteins without alignment. Limited experimental data restrict feature prediction, but integrating ESM with machine learning enables prediction of the trans-cleavage activity of uncharacterized Cas12a proteins. We discover 7 undocumented Cas12a subtypes with unique CRISPR loci. Structural analyses reveal 8 subtypes of Cas1, Cas2, and Cas4. Cas12a subtypes display distinct 3D folds. Cryo-EM analyses unveil unique RNA interactions with the uncharacterized Cas12a. These proteins show distinct double-strand and single-strand DNA cleavage preferences and broad PAM recognition. Finally, we establish a specific detection strategy for an oncogene SNP that lacks the canonical Cas12a PAM. This study highlights the potential of language models for exploring undocumented Cas protein function via gene cluster classification.
