HKU Data Repository
Files (2):
Dataset (main folder).zip (4.21 MB)
README.txt (4.65 kB)

Supporting data for “Therapeutic Effect of Teng Qi Formula on Triple-negative Breast Cancer: Efficacy, Active principles, and Mechanisms”

dataset
posted on 2022-08-22, 01:55 authored by Feng Zhang

Construction of an in silico workflow for predicting drug toxicity and drug interactions. The active compounds of a herbal medicine are defined here as the natural products documented for that herb in the TCMSP database that pass the screening criteria (OB ≥ 0.3 and DL ≥ 0.18). These data were used to predict toxicity and drug interactions for plant complexes.

The properties of the active compounds were first mined from the PubChem database with a Python 3 (version 3.8.10) script called “compound_properties_mining.py”, which uses the pubchempy and pandas packages. The script iterates over the “Molecule Name” column of the “active_comp_pool_tcmsp.csv” dataset, fetching one “Molecule Name” at a time. The gathered property data were written to a CSV file named “active_comp_proper_pubchem.csv”.

The drug interaction information screened from the “Toxi_infor_sum.csv” file was split into one “interaction” record per row with a script named “drug_interactions_split.py” to allow further manual interpretation. The split data were stored in the file named “drug_interaction_pred_0.6171.csv”.

Compounds similar to the active compounds were mined with a web scraper script called “similar_comp_crawler.py”. This script iterated over the “Active_compound_name” and “isomeric_smiles” columns of the dataset storing the properties of the active compounds. Each isomeric SMILES code was posted as a query to the SwissSimilarity website (updated version issued in Dec. 2021), selecting the “Bioactive” compound class and the “ChEMBL (actives only)” natural product library, and using the combined screening method. All data on similar compounds were stored in the file named “similar_comp_pool_swiss.csv”. Before toxicity and drug interaction information was mined, the properties of the similar compounds were collected with a script called “similar_compound_properties_mining.py”, using the same method as for the active compounds described above, and were stored in the file named “similar_comp_properties_sum.csv”.
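The property-mining and interaction-splitting steps described above can be illustrated with a minimal sketch. It is not the code shipped in the dataset: the function names, the “interactions” column name, the “;” separator, and the choice of returned properties are all assumptions made for illustration. Only the “Molecule Name” column and the use of pubchempy and pandas come from the description.

```python
import pandas as pd


def mine_properties(names, lookup=None):
    """Fetch one property record per molecule name, one name at a time.

    `lookup` is a hypothetical hook: any callable mapping a name to a dict
    of properties (or None). By default it queries PubChem via pubchempy.
    """
    if lookup is None:
        # Imported lazily so the rest of the sketch runs without pubchempy.
        import pubchempy as pcp

        def lookup(name):
            hits = pcp.get_compounds(name, "name")
            if not hits:
                return None
            top = hits[0]  # assumption: take the first PubChem match
            return {"cid": top.cid, "molecular_formula": top.molecular_formula}

    rows = []
    for name in names:
        record = lookup(name) or {}  # names without a match keep empty fields
        rows.append({"Molecule Name": name, **record})
    return pd.DataFrame(rows)


def split_interactions(df, column="interactions", sep=";"):
    """Return a copy of `df` with one interaction per row.

    The column name and separator are assumptions; the layout of
    "Toxi_infor_sum.csv" is not documented in the description.
    """
    out = df.copy()
    out[column] = out[column].fillna("").str.split(sep)
    out = out.explode(column)
    out[column] = out[column].str.strip()
    # Drop rows that carried no interaction text at all.
    return out[out[column] != ""].reset_index(drop=True)
```

Under these assumptions, calling `mine_properties(pd.read_csv("active_comp_pool_tcmsp.csv")["Molecule Name"])` and writing the result with `to_csv` would mirror the first step, with network access to PubChem required. Passing a custom `lookup` allows the loop to be exercised offline.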
