Datasets

DISCLAIMER
The Library does not endorse, own, or receive any promotional benefits from the Open Access Sites (OAS) included in this directory.
Feel free to suggest any relevant sites that you believe would be useful for the students and staff of Nagaland University by writing to <asst.librarian@nagalanduniversity.ac.in>.
Regards…
 
"Open Access Resources are research and reference materials which are made available by the site owners/publishers/aggregators, etc., to the general readers, to use freely for teaching-learning, and research purposes with proper acknowledgements."
 
Name Link Description
U.S. Government’s open data https://catalog.data.gov/dataset This data repository contains around 236,476 datasets in fields such as Agriculture, Climate, Education, Finance, and Health. It has a search box that helps you find the data you are looking for. The datasets are public and can be downloaded in different formats. The data is maintained through a GitHub repository. Data.gov is a dataset aggregator and the home of the U.S. Government’s open data.
Kaggle https://www.kaggle.com/datasets This source contains a large number (approx. 22,325) of real-life datasets of all sizes and in many different formats. Each dataset is associated with “kernels”, most of which are written in Python. These kernels help data scientists analyze the data using different notebooks. In addition, some notebooks include algorithms that help with prediction problems.
Google Dataset Search https://datasetsearch.research.google.com/ Google Dataset Search unifies thousands of different dataset repositories, making their data discoverable through a single search.
Open Government Data (OGD) India https://data.gov.in/ Open Government Data (OGD) Platform India is a single point of access to datasets in open formats published by Ministries and Departments. The source offers real-life datasets of all shapes and sizes, along with their APIs and visualizations. The datasets are available for public use.
Microsoft Research Open Data https://msropendata.com/ Microsoft, together with the external research community, launched a repository known as “Microsoft Research Open Data” in July 2018. It consists of curated datasets that were used in published research studies. The datasets span fields such as Computer Science, Biology, Healthcare, and Mathematics, and can be downloaded in a wide variety of formats.
Socrata Open Data https://opendata.socrata.com/ Socrata OpenData is a portal that contains multiple datasets. Its broad range of information makes it attractive and useful to data scientists and other researchers. You can view the data in tabular form in the browser or use the built-in visualization tools.
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets.php UCI Machine Learning Repository is one of the most famous data repositories. If you are looking for machine learning datasets, the UCI Machine Learning Repository should be the first choice. It currently contains 487 datasets from different fields, labelled by attributes such as domain and the type of problem (for example, Classification or Regression).
Academic Torrents https://academictorrents.com/ Academic Torrents is not mainstream, yet it is a powerful repository for sharing data. It was created to make academic datasets and research papers available via BitTorrent, with the main focus on sharing datasets from research papers.
Zenodo https://zenodo.org/ Zenodo was commissioned by the EC to support its nascent Open Data policy by providing a catch-all repository for EC-funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability, and Zenodo was launched in May 2013.
BuzzFeed News https://github.com/BuzzFeedNews BuzzFeed makes the datasets, analysis, libraries, tools, and guides used in its articles available on GitHub. Check them out to learn from some of the best!
Makeover Monday https://www.makeovermonday.co.uk/data/ Makeover Monday is an initiative started in the first week of 2016 by Andy Kriebel and Andy Cotgreave. This repository is mostly for data visualisations.
Awesome Public Datasets https://github.com/awesomedata/awesome-public-datasets Awesome Public Datasets is a repository on GitHub of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Almost all of these are free, with a few exceptions here and there.
Reddit/Forum https://www.reddit.com/r/datasets/ A place to share, find, and discuss datasets. You can request datasets from other subscribers as well as share and contribute your own.
Data World https://data.world/datasets/open-data Data World is an open data repository containing data contributed by thousands of users and organizations all across the world. You will find data from the healthcare field and other areas as well.
PANGAEA Dataportal https://dataportals.pangaea.de/wds/index.php PANGAEA - Data Publisher for Earth & Environmental Science is a digital data library and a data publisher for earth system science.
OECD Stat https://stats.oecd.org/ OECD.Stat includes data and metadata for OECD countries and selected non-member economies.
UN Data http://data.un.org/ 32 databases - 60 million records
The World Bank Data Catalogue https://datacatalog.worldbank.org/home The Data Catalog is designed to make World Bank's development data easy to find, download, use, and share. It includes data from the World Bank's microdata, finances and energy data platforms, as well as datasets from the open data catalog…
AIDDATA https://www.aiddata.org/datasets Integrate subnational data from anywhere in the world into a single CSV file
WHO Health Data https://www.who.int/data/gho/  WHO’s featured datasets cover health-related themes including child health, mental health, HIV, and environment and pollution.
EUROPA  https://data.europa.eu/en  With more than 1,382,665 datasets available from 36 countries, EUROPA is one of the best open data providers in the EU for insights on energy, education, commerce, agriculture, international issues, and much more.
UK Data Service https://ukdataservice.ac.uk/  The UK Data Service is a search engine for recent datasets on social media trends, politics, finance, international relations, and more in the UK.
Open Data Network https://www.opendatanetwork.com/  This source allows users to look for data using a robust search engine. Apply filters and pull data on topics such as public safety, finance, infrastructure, and housing and development.
UN Comtrade https://comtrade.un.org/  Free access to detailed global trade data. UN Comtrade is a repository of official international trade statistics and relevant analytical tables. All data is accessible through API.
Financial Times https://markets.ft.com/data/  The Financial Times may look like an online newspaper but is actually one of the most robust data sources for global markets, the Americas, Europe and Africa, and Asia-Pacific.
FBI's Crime Data Explorer (CDE) https://crime-data-explorer.app.cloud.gov/pages/home  The FBI's Crime Data Explorer (CDE) aims to provide transparency, create easier access, and expand awareness of criminal and noncriminal data to help shape public policy. Use the CDE to discover available data through visualizations and to download data in .csv format and other large data files.
UNODC https://www.unodc.org/unodc/en/data-and-analysis/  For datasets on drug production and trafficking, global studies on homicide rates, organized crime, corruption, and more, the UNODC has frequently updated publications. 
US Health Data https://healthdata.gov/  This site is dedicated to making high value health data more accessible to entrepreneurs, researchers, and policy makers in the hopes of better health outcomes.
US Seer Program https://seer.cancer.gov/  The NIH's SEER Program can complement data from the Broad Institute. With advanced filters, users can create hyper-targeted search results across a variety of open datasets relating to cancer.
NASA, ESDS Program https://www.earthdata.nasa.gov/  The Earth Science Data Systems (ESDS) Program provides full and open access to NASA’s collection of Earth science data for understanding and protecting our home planet. 
NASA Planetary Data https://pds.nasa.gov/datasearch/data-search/  The Planetary Data System (PDS) is a long-term archive of digital data products returned from NASA's planetary missions, and from other kinds of flight and ground-based data acquisitions, including laboratory experiments.
PEW Research Centre https://www.pewresearch.org/internet/datasets/  Pew is one of the largest open data sources in the U.S. with datasets aggregated through high-quality social science surveys. This includes public opinion polling, demographic research, and content analysis. You’ll have to create a free login to access Pew Research Center.
US Education Statistics https://nces.ed.gov/  National Center for Education Statistics – Open datasets like those from the NCES are widely used in educational institutions to improve student retention rates and degree attainment, understand learning habits, and much more.
IEA Atlas of Energy http://energyatlas.iea.org  IEA Atlas of Energy – When it comes to global energy and electricity consumption rates, the IEA offers open datasets and map visualizations for everyone to access.
US National Center for Environmental Health https://www.cdc.gov/nceh/data.htm  One-Stop Shop for Environmental and Public Health Data. This Web site provides a reference list of nationally funded data systems that have a relationship to environmental public health.
Open Corporates https://opencorporates.com/  Open Corporates – One of the largest open databases of companies in the world, it holds hundreds of millions of datasets covering essentially any country.
Graph API https://developers.facebook.com/docs/graph-api  Graph API – Curated by Facebook, Graph API is the primary way for apps to read and write to the Facebook social graph. The Facebook Graph API is an HTTP-based API that allows developers to extract data and functionality from the Facebook platform.
Google Trends https://trends.google.com  See what the world is searching for using Google Trends datasets on the latest search trends. Marketers can use this data to time their campaigns.
Re3data http://www.re3data.org  Re3data is a global registry of research data repositories that covers research data repositories from different academic disciplines. Publishers and journals like Copernicus Publications, PeerJ, Springer and Nature’s Scientific Data refer to re3data.org as a tool for the easy identification of appropriate data repositories to store research data.
Zenodo https://zenodo.org/  CERN has developed tools for big data management and extended Digital Library capabilities for open data through Zenodo. It is a platform for finding and sharing researchers’ datasets.
OSF Home https://osf.io/  OSF is a free, open platform to support your research and enable collaboration. Discover projects, data, materials, and collaborators that might be helpful to your own research.
Figshare https://figshare.com/  Figshare is a repository where users can make all of their research data available in a citable, shareable and discoverable manner.
DRYAD https://datadryad.org/stash  The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types from all disciplines.
OpenfMRI https://openfmri.org/  OpenfMRI.org is a project dedicated to the free and open sharing of raw magnetic resonance imaging (MRI) datasets.
DASH  https://dash.nichd.nih.gov/explore/dataset  The NICHD Data and Specimen Hub (DASH) is a centralized resource that allows researchers to share and access de-identified data from studies funded by NICHD. DASH also serves as a portal for requesting biospecimens from selected DASH studies.
BioLINCC https://biolincc.nhlbi.nih.gov/home/ The NHLBI Biorepository and the NHLBI Data Repository programs have always had a similar mission to enhance research in cardiovascular, pulmonary and hematologic conditions by providing access to stored biospecimen and data collections. Registration will allow you to submit requests for biospecimens and/or data from Open BioLINCC Studies.
UNESCO Open Access https://en.unesco.org/open-access/search_unesdoc  For UNESCO, adopting an Open Access Policy means making thousands of its publications and data freely available to the public.
UNICEF Data https://data.unicef.org/resources/resource-type/datasets/  UNICEF’s Data & Analytics (D&A) team is the global go-to for data on children. It leads the collection, validation, analysis, use and communication of the most statistically sound, internationally comparable data on the situation of children and women around the world.
Yelp Dataset https://www.yelp.com/dataset  The Yelp dataset can be used for personal, educational, and academic purposes. It is available as JSON files; use it to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps.
DBpedia https://www.dbpedia.org/resources/  DBpedia’s structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web. A knowledge graph is a special kind of database which stores knowledge in a machine-readable form and provides a means for information to be collected, organised, shared, searched and utilised. DBpedia data is served as Linked Data, which is revolutionizing the way applications interact with the Web. One can navigate this Web of facts with standard Web browsers or automated crawlers, or pose complex queries with SQL-like query languages (e.g. SPARQL).
U.S. Government’s open data https://data.gov/  The U.S. Government’s open data allows users to find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations and more.
FiveThirtyEight https://data.fivethirtyeight.com/  FiveThirtyEight shares the data and code behind some of its articles and graphics. These can be used to check its work and to create stories and visualizations.
Registry of Open Data on AWS https://registry.opendata.aws/  This registry exists to help people discover and share datasets that are available via Amazon Web Services (AWS) resources. Datasets available through the registry are not provided and maintained by AWS itself; they are maintained by a variety of third parties. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
Google Public Data Explorer https://www.google.com/publicdata/directory Google Public Data Explorer provides public data and forecasts from a range of international organizations and academic institutions including the World Bank, OECD, Eurostat and the University of Denver.
Protein Data Bank archive (PDB) https://www.wwpdb.org/ Since 1971, the Protein Data Bank archive (PDB) has served as the single repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies. The Worldwide PDB organization manages the PDB archive and ensures that the PDB is freely available to the global community.
UNDP  https://open.undp.org/ Open.undp.org is the central point of access to detailed information about UNDP’s 5,000+ development projects and 8,000 outputs in 170+ countries and territories worldwide. It comprises data and information for active development projects and those that were financially closed after 2011.
EarthChem https://www.earthchem.org/ EarthChem aims to drive innovation and discovery in the Earth, Ocean, and Environmental Sciences. It provides open data services to the geochemical, petrological, mineralogical, and related communities. EarthChem adheres to open data principles like FAIR principles, TRUST principles for repositories and CARE principles for Indigenous Data Governance.
European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/home The European Nucleotide Archive (ENA) provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. You can also query all variants in the European Variation Archive (EVA) by study, gene, chromosomal location or dbSNP identifier using its Variant Browser.
ICSSR Data Service http://www.icssrdataservice.in/  The ICSSR Data Service includes social science and statistical datasets on debt and investment, tourism, enterprise survey, employment, housing, consumer expenditure, health, etc. Datasets available in the repository are free to access. Researchers must register to gain access to the data in the repository, except for the open data. After registration, the researcher is given a log-in ID and password to use the repository and download datasets.
National Data Repository https://www.ndrdgh.gov.in National Data Repository (NDR) is India’s government-sponsored E&P data bank. NDR includes seismic data, well and log data, spatial data, and other G&G data such as drilling, reservoir, production, geological, and gravity and magnetic data. Users must register on the NDR portal to gain access to data. Registration is accepted only with an official email ID.
NATURE https://www.nature.com/sdata/policies/repositories Nature has compiled a list of data repositories where researchers can deposit their data, with examples of repositories from a number of disciplines. Please be aware that some of the repositories listed may charge for accessing and hosting data, especially for larger datasets.
UK Archaeology Data Service (ADS) https://archaeologydataservice.ac.uk/ Founded in 1996, the ADS is the leading accredited digital repository for heritage data generated by UK-based fieldwork and research. All resources archived with the ADS are Open Access and are delivered through its website to facilitate re-use by the wider community.
The Association of Religion Data Archives (ARDA) https://www.thearda.com/Archive/browse.asp The ARDA strives to democratize access to the best surveys, polls and other data on religion. Founded as the American Religion Data Archive in 1997 and going online in 1998, the initial archive was targeted at researchers interested in American religion. 
openICPSR https://www.openicpsr.org/openicpsr/ openICPSR is a self-publishing repository with 21 specialized collections of data for the social, behavioral, and health sciences. Access to all data is available to ICPSR member institutions only. As such, some of the data will not be available for you to download, even if you create an account.
The Qualitative Data Repository (QDR)  https://data.qdr.syr.edu/ The QDR is a dedicated archive for storing and sharing digital data (and accompanying documentation) generated or collected through qualitative and multi-method research in the social sciences and related disciplines.
Cornell University  https://ropercenter.cornell.edu/ The Roper Center is the world’s largest public opinion data archive, with data from more than 100 different countries and U.S. data from 1935 to today. A membership is needed to access these datasets, and individual datasets are also sold. However, a few datasets are available for free under the Featured Collections and Classroom Materials pages of the web site.
OEDI https://openei.org/wiki/Data OEDI is a centralized repository of high-value energy research datasets aggregated from the U.S. Department of Energy’s Programs and National Laboratories. OEDI facilitates access to a broad network of findings, including the data available in technology-specific catalogs like the Geothermal Data Repository and Marine Hydrokinetic Data Repository.
UN Documentation Research Guides https://research.un.org/en/un-resources/az UN libraries compiled this list of repositories from all disciplines.
OA Data Repositories http://oad.simmons.edu/oadwiki/Data_repositories This is a compendium of repositories and databases for datasets.
ASM Journals https://journals.asm.org/list-data-repositories ASM journals list these data repositories for researchers to identify and cite data sets and/or code used in experiments and studies. These include data from microarray, genomic, structural, proteomic, or video imaging analyses. 
Repository Finder https://repositoryfinder.datacite.org/ Repository Finder is a pilot project of the Enabling FAIR Data Project led by the American Geophysical Union (AGU) in partnership with DataCite and the Earth, space and environment sciences community. It can help you find an appropriate repository to deposit your research data.
FAIRsharing project https://fairsharing.org/  The FAIRsharing project compares data repositories for their compliance with the FAIR principles and journal data-sharing policies. A curated, informative and educational resource on data and metadata standards, inter-related to databases and data policies. To identify the repository that meets your particular needs, you may find FAIR Sharing Databases helpful.
IMF Data  https://www.imf.org/en/Data  For insights on the global economic outlook, financial stability, fiscal monitoring, and more, IMF datasets should have you covered.