Chemistry and Chemical & Biological Engineering Datasets
Publicly available chemical structure and related data for bulk download.
FTP site for downloading PubChem Substance, Compound, and BioAssay data. There are over 230 million substance, 90 million compound, and 1 million BioAssay records. Data is available in SDF and XML formats. Download via FTP (Services > Download Facility > Bulk Data Download).
FTP site is available for downloading ChEMBL data focused on small molecules and related bioactivity data. There are over 2 million compound records, 1 million assays, and 11,000 targets. Data is available as SDF, Oracle, and SQL.
ChEBI (Chemical Entities of Biological Interest) is a dictionary of molecular entities focused on small chemical compounds. SDF and ontology files are available for download.
DrugBank database combines chemical drug data with target data. About 10,000 entries are included and information can be downloaded as XML and SDF via their download page.
Largest collection of virtual molecules. Available for download via FTP in SMILES format.
nmrshiftdb2 is a database for organic structures and NMR spectra. Data is available as SDF and CML. There are over 40,000 structures and 50,000 spectra.
A collection of over 700,000 chemical substances and environmental chemical data. Available for download as text and SDF.
ZINC 15 Database
ZINC15 is a free database of commercially available compounds. Over 100 million compounds can be downloaded as XML, CSV, SDF, JSON, and other formats.
Crystallography Open Database
Over 350,000 crystal structure files of organic, inorganic, and metal-organic compounds. The entire collection is available for download as .cif files.
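Many of the datasets listed above are distributed as SDF. As a rough illustration of what such a download contains, the sketch below reads SDF records with only the Python standard library. It assumes the conventional layout in which each record ends with a `$$$$` line, the structure block ends with `M  END`, and data items are introduced by `> <FIELD>` header lines. The function names and the toy sample record are made up for this example and do not come from any of the listed sites. For real work, a cheminformatics toolkit such as RDKit or Open Babel is usually the better choice.

```python
from typing import Dict, Iterable, Iterator, List

# Hand-written toy data for illustration only; not taken from any listed database.
SAMPLE = """\
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
water

> <FORMULA>
H2O

$$$$
neon
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 Ne  0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
neon

$$$$
"""


def _parse_record(block: List[str]) -> Dict[str, object]:
    """Split one record into its title, structure block, and data items."""
    molblock: List[str] = []
    props: Dict[str, str] = {}
    i, n = 0, len(block)
    # The structure block runs from the first line through "M  END".
    while i < n:
        molblock.append(block[i])
        i += 1
        if molblock[-1].startswith("M  END"):
            break
    # Data items: a "> <FIELD>" header, value lines, then a blank line.
    while i < n:
        line = block[i]
        i += 1
        if not line.startswith(">"):
            continue
        start = line.find("<")
        end = line.find(">", start + 1)
        name = line[start + 1:end] if start != -1 and end != -1 else ""
        values: List[str] = []
        while i < n and block[i].strip():
            values.append(block[i])
            i += 1
        if name:
            props[name] = "\n".join(values)
    return {
        "title": block[0].strip() if block else "",
        "molblock": "\n".join(molblock),
        "properties": props,
    }


def iter_sdf_records(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one parsed record per '$$$$'-terminated block."""
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            if block:
                yield _parse_record(block)
            block = []
        else:
            block.append(line)
    # Tolerate a file whose last record lacks the final delimiter.
    if any(l.strip() for l in block):
        yield _parse_record(block)


for record in iter_sdf_records(SAMPLE.splitlines()):
    print(record["title"], record["properties"])
```

Because `iter_sdf_records` accepts any iterable of lines, an open file handle can be passed in directly, so a large bulk download can be streamed record by record rather than loaded into memory at once.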
Chemistry and Chemical & Biological Engineering Repositories
UA Institutional Repository
UA's official document and data repository. Suitable for all data types. We will soon be accepting data; stay tuned.
re3data is the Registry of Research Data Repositories. It is a useful resource for finding appropriate data repositories to discover and share chemical data.
A general purpose repository. Very useful if a specific subject data repository is not available or appropriate for your data.