Study Info
INDA Accessions: INRP000325
INSDC Accessions: PRJEB89352, ERP172379
- Title: Germline Variant Detection in Breast Cancer via Whole Exome Sequencing of Blood Samples
- Data Type: Exome Sequencing
- Descriptive Title: This dataset comprises whole exome sequencing (WES) data derived from blood samples of 16 patients with clinically confirmed breast cancer. Genomic DNA was isolated from peripheral blood and subjected to exome capture and sequencing on the Illumina NextSeq 550 platform. The study aims to identify germline variants associated with breast cancer susceptibility and to provide a reference for paired tumour-normal comparative studies. Bioinformatics processing included read quality control, alignment to the human reference genome, variant calling, and functional annotation.
Organism:
Scientific Name (Taxon Id): Homo sapiens (9606)
Common Name: human
Other Info
- Abstract: Sequencing reads from the 16 breast cancer samples were aligned to the hg38 reference genome using BWA-MEM. Variant calling was then performed on each sample, generating individual VCF files containing single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). These VCF files were filtered to remove low-quality variants based on depth of coverage, allele frequency, and quality score. The filtered VCF files were then converted into gVCF format using tools such as GATK, preserving both variant and non-variant positions for a comprehensive genome representation. After gVCF generation, variants were separated into SNPs and indels for focused analysis. ANNOVAR annotation was then performed on both the SNP and indel sets using the following databases: RefGene for gene annotations, dbNSFP v4.7a for functional prediction of SNPs, InterVar 20180118 for clinical interpretation of variants, Kaviar v5.0 for population frequency data, and AbraOM for annotating structural variations and genomic rearrangements. In addition, MCAP and REVEL were used to assess variant pathogenicity, while dbSNP v150 provided population-based variant frequency information. Clinically significant variants were cross-referenced with ClinVar 20240917 to identify known pathogenic variants. Finally, RegSNPintron was used to annotate intronic variants for regulatory potential.
- Linked publications:
- Center Name: Dr Nanaocha Sharma, Institute of Bioresources and Sustainable Development, Imphal
- Number of Base(Total) Mbp: 0
- Size in bytes(Total): 98,049,788,240
- Number of sample:
- Number of Runs:
- Number of Sequences:
- Number of Assembly:
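
The filtering and SNP/indel separation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the study's pipeline: the thresholds `MIN_QUAL` and `MIN_DEPTH`, the function names, and the example records are all hypothetical, and it reads only the QUAL column and the INFO/DP tag of plain-text VCF lines. The abstract also mentions allele frequency as a filter criterion, which is left out here for brevity.

```python
# Illustrative sketch only: thresholds are assumptions, not the study's settings.
MIN_QUAL = 30.0   # hypothetical minimum QUAL score
MIN_DEPTH = 10    # hypothetical minimum read depth (INFO/DP)


def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=35;AF=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-type entry without a value
    return info


def variant_type(ref, alt):
    """Label one REF/ALT pair as 'SNP', 'INDEL', or 'OTHER'."""
    if alt.startswith("<") or alt == "*":
        return "OTHER"  # symbolic or spanning-deletion allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitution


def passes_filters(qual, info):
    """Keep a record only if QUAL and depth meet the assumed thresholds."""
    if qual == ".":
        return False
    depth = int(info.get("DP", 0))
    return float(qual) >= MIN_QUAL and depth >= MIN_DEPTH


def split_variants(vcf_lines):
    """Filter VCF data lines, then separate them into SNP and indel lists."""
    snps, indels = [], []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, qual, _filter, info_field = fields[:8]
        if not passes_filters(qual, parse_info(info_field)):
            continue
        for alt in alts.split(","):
            record = (chrom, int(pos), ref, alt)
            kind = variant_type(ref, alt)
            if kind == "SNP":
                snps.append(record)
            elif kind == "INDEL":
                indels.append(record)
    return snps, indels


# Made-up records for demonstration; they are not variants from this dataset.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=40",
    "chr1\t2000\t.\tGT\tG\t60\tPASS\tDP=25",
    "chr1\t3000\t.\tC\tT\t10\tPASS\tDP=40",
    "chr1\t4000\t.\tC\tT\t90\tPASS\tDP=3",
]
snps, indels = split_variants(example)
print(len(snps), "SNP(s),", len(indels), "indel(s) kept")
```

In practice these steps are usually run with dedicated tools rather than hand-written parsing; the sketch only makes the logic of the described steps concrete.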