IBIAP_1000000002
|
Indian major basmati paddy seed varieties images dataset |
The dataset contains images of 10 of the 32 Indian basmati seed varieties notified by the Government of India. The varieties included are 1121, 1509, 1637, 1718, 1728, BAS-370, CSR 30, Type-3/Dehraduni Basmati, PB-1 and PB-6. Several images of other seeds and related entities available in the household have also been included. The dataset therefore contains 11 classes: ten classes hold images of the ten basmati paddy varieties, while the 11th class, named “Unknown”, holds images of a mixture of two morphologically similar paddy varieties (1121 and 1509), different pulses, other grains and related food entities. The Unknown class is useful for discriminating paddy seeds from other types of seeds and related food entities. All images were captured manually under standard conditions using an apparatus developed in-house and a tablet with a five-megapixel camera. The camera was used to capture 3210 RGB colour images in JPG format. Data pre-processing was performed to generate ready-to-use images for training and testing machine learning models. AI-based paddy seed variety classification models have been developed using the dataset. The dataset can be used to build AI-based models for adulteration detection and for automated classification (with independent devices) at the time of rice threshing, and its classification potential can be extended by supplementing it with images of additional basmati varieties.
|
PPS_1000000002
|
Indian major basmati paddy seed varieties images dataset |
Seeds from ten major Indian basmati paddy varieties were collected from the Indian Agricultural Research Institute (IARI), New Delhi, India. A total of 46 different types of pulses, grains and other food entities were also collected in-house to capture images other than paddy seeds. In addition, a mixture of two morphologically similar paddy varieties (1121 and 1509) was prepared in a separate vessel; its images, together with those of the pulses, grains and other food entities, constitute a separate class. Thus, the dataset (accessible on Mendeley) comprises 11 classes (10 basmati paddy seed varieties plus one class of other grains and related food entities). An apparatus was designed to capture the images in standard conditions. A Micromax Canvas TAB P802 tablet, attached to the apparatus, was used to capture 3210 images.
|
https://www.sciencedirect.com/science/article/pii/S2352340920313421 |
International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi |
Open Access
|
Aug. 12, 2024 |
IBIAP_1000000003
|
An Opportunistic screening mammography dataset from a screening-naive population |
Mammographic image dataset from an Indian population containing 1869 FFDM images and 1708 SM images, providing breast-level imaging data (BIRADS category and breast density) along with ground truth labels based on histopathology for cancers and follow-up scans for noncancers.
|
MAMOS_1000000004
|
An Opportunistic screening mammography dataset from a screening-naive population |
Mammographic dataset from an Indian population containing 1869 FFDM images and 1708 SM images. It provides breast-level imaging data (BIRADS category and breast density) along with ground truth labels based on histopathology for cancers and follow-up scans for noncancers.
|
N/A |
All India Institute of Medical Sciences (AIIMS), New Delhi |
Open Access
|
Sept. 10, 2024 |
IBIAP_1000000005
|
FruitNet: Indian fruits image dataset with quality for machine learning applications |
Fast and precise fruit classification or recognition by quality parameter is an unmet need of the agriculture business. It remains an open research problem that continues to attract researchers. Machine learning and deep learning techniques have shown very promising results for classification and object detection problems. A neat and clean dataset is the elementary requirement for building accurate and robust machine learning models for real-time environments. With this objective we have created an image dataset, with quality parameter, of Indian fruits that are highly consumed or exported. We considered six fruits, namely apple, banana, guava, lime, orange, and pomegranate. The dataset is divided into three folders, (1) Good quality fruits, (2) Bad quality fruits, and (3) Mixed quality fruits, each consisting of six fruit subfolders. In total, 19,500+ images in processed format are available in the dataset. We strongly believe that the proposed dataset is very helpful for training, testing and validation of fruit classification or recognition machine learning models.
|
PPS_1000000009
|
FruitNet: Indian Fruits Dataset with quality (Good, Bad & Mixed quality) |
The fruit market accounts for a substantial share of profit relative to total agricultural output. In the agro-industry, fast and accurate fruit classification is the foremost need. Fruits can be classified into different classes by external features such as shape, size and color using computer vision and deep learning techniques. High-quality images of fruits are required to solve fruit classification and recognition problems, and a neat and clean dataset is the elementary requirement for building machine learning models. With this objective we have created a dataset of six popular Indian fruits named “FruitNet”. The dataset consists of 19500+ high-quality images of 6 different classes of fruits in processed format. The images are divided into 3 sub-folders: 1) Good quality fruits, 2) Bad quality fruits and 3) Mixed quality fruits. Each sub-folder contains images of the 6 fruits, i.e. apple, banana, guava, lime, orange, and pomegranate. A mobile phone with a high-resolution camera was used to capture the images. The images were taken against different backgrounds and in different lighting conditions. The proposed dataset can be used for training, testing and validation of fruit classification or recognition models.
|
https://www.sciencedirect.com/science/article/pii/S2352340921009616 |
Vishwakarma University, Pune |
Open Access
|
Oct. 23, 2024 |
IBIAP_1000000007
|
Facilitating spice recognition and classification: An image dataset of Indian spices |
This data paper presents a comprehensive visual dataset of 19 distinct types of Indian spices, consisting of high-quality images meticulously curated to facilitate various research and educational applications. The dataset includes extensive imagery of the following spices: Asafoetida, Bay Leaf, Black Cardamom, Black Pepper, Caraway Seeds, Cinnamon Stick, Cloves, Coriander Seeds, Cubeb Pepper, Cumin Seeds, Dry Ginger, Dry Red Chilly, Fennel Seeds, Green Cardamom, Mace, Nutmeg, Poppy Seeds, Star Anise, and Stone Flowers. Each image in the dataset has been captured under controlled conditions to ensure consistency and clarity, making it an invaluable resource for studies in food science, agriculture, and culinary arts. The dataset can also support machine learning and computer vision applications, such as spice recognition and classification. By providing detailed visual documentation, this dataset aims to promote a deeper understanding and appreciation of the rich diversity of Indian spices.
|
PPS_1000000011
|
Facilitating spice recognition and classification: An image dataset of Indian spices |
This data paper presents a comprehensive visual dataset of 19 distinct types of Indian spices, consisting of high-quality images meticulously curated to facilitate various research and educational applications. The dataset includes extensive imagery of the following spices: Asafoetida, Bay Leaf, Black Cardamom, Black Pepper, Caraway Seeds, Cinnamon Stick, Cloves, Coriander Seeds, Cubeb Pepper, Cumin Seeds, Dry Ginger, Dry Red Chilly, Fennel Seeds, Green Cardamom, Mace, Nutmeg, Poppy Seeds, Star Anise, and Stone Flowers. Each image in the dataset has been captured under controlled conditions to ensure consistency and clarity, making it an invaluable resource for studies in food science, agriculture, and culinary arts. The dataset can also support machine learning and computer vision applications, such as spice recognition and classification. By providing detailed visual documentation, this dataset aims to promote a deeper understanding and appreciation of the rich diversity of Indian spices.
|
https://doi.org/10.1016/j.dib.2024.110936 |
Vishwakarma University, Pune |
Open Access
|
Dec. 12, 2024 |
IBIAP_1000000008
|
PSFD-Musa: A dataset of banana plant, stem, fruit, leaf, and disease |
Varieties of banana plants can be found worldwide. They grow in tropical regions and require a hot and humid climate to develop. Each part of a banana plant can be infected by different bacterial, fungal, and viral diseases, many of which are dangerous and reduce production. Deficiency diseases can also cause heavy losses in banana plantations. To help users become familiar with different varieties of banana plants and with some of the common diseases that affect them, we have created the PSFD-Musa dataset for banana plants found indigenously in different parts of Assam. The dataset is divided into 3 subfolders. The first folder contains images of different varieties of banana plants in 7 classes: Malbhog fruit (Musa assamica), Malbhog leaf (Musa assamica), Jahaji fruit (Musa chinensis), Jahaji stem (Musa chinensis), Jahaji leaf (Musa chinensis), Kachkol fruit (Musa paradisiaca L.), and Bhimkol leaf (M. Balbisiana Colla). The second folder covers diseases that affect banana plants, again in 7 classes: Bacterial Soft Rot, Banana Fruit Scarring Beetle, Black Sigatoka, Yellow Sigatoka, Panama disease, Banana Aphids, and PseudoStem Weevil. The last folder covers deficiencies that hamper the plants, with 1 class: Potassium deficiency. The images are provided as both raw and processed data in .jpg format.
|
PPS_1000000012
|
PSFD-Musa: A dataset of banana plant, stem, fruit, leaf, and disease |
In recent times, the classification and identification of different fruits and food crops have become a necessity in agricultural science for sustainable growth. Processes have been developed worldwide to improve the production of food crops, but problem-specific, clean and crisp datasets are still lacking in the sector. This article introduces an image dataset of varieties of banana plants and the diseases related to them. The varieties of banana plants considered in the dataset are Malbhog (Musa assamica), Jahaji (Musa chinensis), Kachkol (Musa paradisiaca L.), and Bhimkol (M. Balbisiana Colla). The diseases and pathogens considered are Bacterial Soft Rot, Banana Fruit Scarring Beetle, Black Sigatoka, Yellow Sigatoka, Panama disease, Banana Aphids, and Pseudo-Stem Weevil. Potassium deficiency is also covered. A total of 8000+ processed images are present in the dataset. The purpose of this article is to give researchers and students access to the dataset to support their research and the development of machine learning models.
|
https://www.sciencedirect.com/science/article/pii/S2352340922006242 |
Gauhati University, Assam |
Open Access
|
Dec. 19, 2024 |
IBIAP_1000000009
|
High-resolution AI image dataset for diagnosing oral submucous fibrosis and squamous cell carcinoma |
Oral cancer is a global health challenge, and accurate histopathological interpretation of oral cancer tissue samples remains difficult. Early diagnosis is very challenging due to a lack of experienced pathologists and inter-observer variability in diagnosis. The application of artificial intelligence (deep learning algorithms) to oral cancer histology images is very promising for rapid diagnosis. However, it requires a quality annotated dataset to build AI models. We present ORCHID (ORal Cancer Histology Image Database), a specialized database generated to advance research in AI-based histology image analytics of oral cancer and precancer. The ORCHID database is an extensive multicenter collection of high-resolution images captured at 1000X effective magnification (100X objective lens), encapsulating various oral cancer and precancer categories, such as oral submucous fibrosis (OSMF) and oral squamous cell carcinoma (OSCC). It also contains grade-level sub-classifications for OSCC: well-differentiated (WD), moderately-differentiated (MD), and poorly-differentiated (PD). The database seeks to aid in developing innovative artificial intelligence-based rapid diagnostics for OSMF and OSCC, along with subtypes.
|
HISTOS_1000000013
|
High-resolution AI image dataset for diagnosing oral submucous fibrosis and squamous cell carcinoma |
Images are provided in five classes (folders): Normal, OSMF, WDOSCC, MDOSCC, and PDOSCC. Each class folder consists of subfolders representing different tissue slides collected from different patients. We have made an initial attempt to provide a comprehensive image database for two of the most prominent oral conditions, OSCC and OSMF. We believe that more such databases will be made publicly available in the near future. These comprehensive image databases will facilitate the development of accurate AI-based diagnostic tools for oral diseases, ultimately improving patient care and outcomes in the field of oral healthcare. In future, integrating databases of molecular markers, transcriptome, metabolome, and other biomarkers with oral histological images through advanced AI-driven imaging techniques holds great promise for improving diagnostic accuracy and precision. This potential has already been observed in the diagnosis of lung and breast cancers. This expansion will aid in developing a more comprehensive AI-driven diagnostic tool.
|
https://doi.org/10.1038/s41597-024-03836-6 |
Jamia Millia Islamia, Delhi |
Open Access
|
Jan. 13, 2025 |
IBIAP_1000000010
|
Histopathological images of thyroid lesions |
The dataset comprises 154,498 images derived from 134 slides, representing 125 thyroid nodules from 118 patients. These images encompass six types of thyroid lesions: NIFTP, HTN, FA, IEFVPTC, IFSPTC, and CPTC.
|
HISTOS_1000000014
|
Utility of Artificial Intelligence in differentiating Non-invasive Follicular Thyroid Neoplasm with Papillary like Nuclear Features from other follicular-patterned thyroid benign and malignant lesions |
The introduction of the term non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) in 2016 marked a pivotal shift in the classification of encapsulated follicular variants of papillary thyroid carcinoma (eFVPTC) lacking invasive features. While this reclassification significantly reduced overtreatment, the histopathological diagnosis of NIFTP remains challenging due to overlapping features with other thyroid lesions and inter-observer variability. This study presents a novel deep learning (DL)-based, three-stage diagnostic pipeline for distinguishing NIFTP from a wide spectrum of thyroid lesions, including benign and malignant mimics. By replicating the diagnostic strategy of histopathologists, the algorithm evaluates architectural patterns and nuclear features with high precision. Our approach has the potential to enhance diagnostic accuracy in a cost-effective and scalable manner, complementing existing diagnostic methods and thereby optimizing clinical decision-making and improving the management of patients with thyroid neoplasms.
|
N/A |
- All India Institute of Medical Sciences (AIIMS), New Delhi
- International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi
|
Managed Access
|
Jan. 27, 2026 |
IBIAP_1000000011
|
Chákṣu: A glaucoma specific fundus image database |
We introduce Chákṣu, a retinal fundus image database for the evaluation of computer-assisted glaucoma prescreening techniques. The database contains 1345 color fundus images acquired using three brands of commercially available fundus cameras. For each image, five expert ophthalmologists provide outlines of the optic disc (OD) and optic cup (OC) as smooth closed contours, together with a decision of normal versus glaucomatous. In addition, segmentation ground-truths of the OD and OC are provided by fusing the expert annotations using the mean, median, majority, and Simultaneous Truth and Performance Level Estimation (STAPLE) algorithms. The performance indices show that ground-truth agreement with the experts is best with the STAPLE algorithm, followed by majority, median, and mean. The vertical, horizontal, and area cup-to-disc ratios are provided based on the expert annotations. Image-wise glaucoma decisions are also provided based on majority voting among the experts. Chákṣu is the largest Indian-ethnicity-specific fundus image database with expert annotations and would aid in the development of artificial intelligence based glaucoma diagnostics.
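The majority fusion and cup-to-disc ratios described above can be illustrated with a minimal sketch. It assumes each expert outline has already been rasterised to a binary mask; the function names (majority_fuse, cup_to_disc_ratios) and the occupied-row/column definition of vertical and horizontal extent are illustrative assumptions, not the database's own tooling, and STAPLE is not implemented here.

```python
import numpy as np

def majority_fuse(masks):
    """Fuse binary expert masks by majority vote.

    A pixel is kept when strictly more than half of the experts marked it,
    so ties (possible with an even number of experts) are dropped.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    return votes * 2 > stack.shape[0]

def _height(mask):
    # Number of image rows that contain at least one marked pixel.
    return int(np.any(mask, axis=1).sum())

def _width(mask):
    # Number of image columns that contain at least one marked pixel.
    return int(np.any(mask, axis=0).sum())

def cup_to_disc_ratios(cup, disc):
    """Vertical, horizontal and area cup-to-disc ratios from binary masks."""
    cup = np.asarray(cup, dtype=bool)
    disc = np.asarray(disc, dtype=bool)
    return {
        "vertical": _height(cup) / _height(disc),
        "horizontal": _width(cup) / _width(disc),
        "area": float(cup.sum()) / float(disc.sum()),
    }

def majority_decision(labels):
    """Image-wise decision: True (glaucomatous) if most experts said so."""
    labels = np.asarray(labels, dtype=bool)
    return bool(labels.sum() * 2 > labels.size)
```

With five experts, as in this database, a pixel or an image-level label needs at least three votes to pass; the strict inequality only matters if the same helpers are reused with an even number of annotators.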
|
OPTHS_1000000015
|
Chákṣu: A glaucoma specific fundus image database |
Glaucoma is a chronic, irreversible, and slowly progressing optic neuropathy that damages the optic nerve. Depending on the extent of damage to the optic nerve, glaucoma can cause moderate to severe vision loss. Glaucoma is asymptomatic in the early stages. It is not curable, and the lost vision cannot be restored. However, by early screening and detection, the progression of the disease could be slowed down. Color fundus imaging is the most viable non-invasive means of examining the retina for glaucoma. The widest application of fundus imaging is in optic nerve head or optic disc examination for glaucoma management. Fundus imaging is widely used due to the relative ease of establishing a digital baseline for assessing the progression of the disease and the effectiveness of the treatment. Fundus imaging technology is developing rapidly and several exciting products with fully automated software applications for retinal disease diagnosis are on the horizon. State-of-the-art tools based on image processing and deep learning algorithms are becoming increasingly useful and relevant. However, before deploying them in a clinical setting, a thorough validation over benchmark datasets is essential. The development of a large database with multiple expert annotations is a laborious and tedious task. A large annotated glaucoma-specific fundus image database is lacking, which is a gap that the Chákṣu database reported in this paper attempts to fill. Several retinal fundus image databases are publicly available to facilitate research and performance comparison of segmentation and classification algorithms.
|
https://doi.org/10.1038/s41597-023-01943-4 |
Indian Institute of Science, Bangalore |
Open Access
|
March 16, 2025 |
IBIAP_1000000012
|
Histopathological imaging database for oral cancer analysis |
The repository is composed of 1224 images divided into two sets with different magnifications. The first set consists of 89 histopathological images of the normal epithelium of the oral cavity and 439 images of Oral Squamous Cell Carcinoma (OSCC) at 100x magnification. The second set consists of 201 images of the normal epithelium of the oral cavity and 495 histopathological images of OSCC at 400x magnification. The images were captured using a Leica ICC50 HD microscope from Hematoxylin and Eosin (H&E) stained tissue slides collected, prepared and catalogued by medical experts from 230 patients. A subset of 269 images from the second set was used to detect OSCC based on textural features. Histopathology plays a very important role in diagnosing disease. It is the investigation of biological tissues in microscopic detail to detect the presence of diseased cells, and it usually involves a biopsy. To date, biopsy is the gold-standard test for diagnosing cancer. Biopsy slides are examined under a microscope against various cytological criteria, so there is a high possibility that outcomes will lack uniformity and reproducibility. Computational diagnostic tools, on the other hand, facilitate objective judgments by making use of quantitative measures. This dataset can be utilized in establishing automated diagnostic tools using Artificial Intelligence approaches.
|
HISTOS_1000000016
|
Histopathological imaging database for oral cancer analysis |
This is the first dataset containing histopathological images of the normal epithelium of the oral cavity and OSCC. The images were captured using a Leica ICC50 HD microscope from Hematoxylin and Eosin (H&E) stained tissue slides collected, prepared and catalogued by medical experts from 230 patients. Invasion of the tumour into the basement membrane is a very important architectural feature for diagnosing OSCC. Researchers can use the 100x magnified images for architectural or tissue level analysis. These can also be used for extracting shape, texture or colour features, for segmenting the epithelial layer, for assessing invasion of tumour into the basement membrane, or for categorizing images as normal or malignant based on the whole architecture of the image. The 400x magnified images can be used for tissue level analysis, such as automated diagnosis of the disease based on textural features. This dataset can be utilized in establishing automated diagnostic tools using Artificial Intelligence approaches. These data can be used as a gold standard for histopathological analysis of OSCC. The dataset can also be used for comparative evaluation of one's experimental findings in future, when more datasets of this kind are available.
|
https://doi.org/10.1016/j.dib.2020.105114 |
Institute of Advanced Study in Science and Technology, Guwahati, Assam |
Open Access
|
March 24, 2025 |