This README.txt file was generated on 20210525 by Conor Cremin

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Supporting data for "Meta-analysis of host responses identifies gene network dysfunction during viral infection"

2. Author Information

   First Author Contact Information
      Name: Conor Cremin
      Faculty: Department of Microbiology
      Email: conor93@hku.hk

   Corresponding Author Contact Information
      Name: Honglin Chen
      Faculty: Department of Microbiology
      Email: hlchen@hku.hk

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files:

A. Filename: ChIP-seq
   Short description: Directory contains two folders. The "MAC2" folder contains the output files generated by the MACS2 callpeak function for peaks determined from PolII, PolII-s2p and PolII-s5p ChIP-seq experiments. Raw data for these experiments are available on GEO (Gene Expression Omnibus database, ID: GSE156060). The "BigWigs" folder contains the BigWig files for these experiments, used for visualization, etc. BigWigs were generated using DeepTools.

B. Filename: Co-expression
   Short description: This folder contains the human co-expression networks assembled for this project. Files are in .Rdata format and can be loaded directly into an R environment. Each network consists of an adjacency matrix of co-expression scores between 21,901 genes of the hg38 human genome. Gene IDs are in Ensembl ID format (e.g. ENSG00XXX). Both the raw network and the ranked network are located in this folder. Gene cluster R objects from the meta-analysis of human (h.clusters) and mouse (m.clusters) influenza datasets and SARS-CoV-2 datasets (c.clusters) are saved in "cross-species.clusters.rdata". The colors indicated were used in heatmap generation.

C. Filename: Data tables
   Short description: Folder contains the supplementary data tables.
   Supplementary table S1 contains the GEO identifiers for studies used in the meta-analysis of human influenza datasets.
   Supplementary table S2 contains the GEO identifiers for studies used in the meta-analysis of mouse influenza datasets.
   Supplementary table S3 contains the GEO identifiers for studies used in the construction of the human co-expression network (deposited in the "Co-expression" folder).
   Supplementary table S4 contains a data table of the recurrent genes derived from the meta-analysis of human influenza datasets.
   Supplementary table S5 contains a data table of the recurrent genes derived from the meta-analysis of mouse influenza datasets.
   Supplementary table S6 contains a data table of the recurrent genes derived from the meta-analysis of SARS-CoV-2 datasets.
   All files are in .csv format.

D. Filename: Multifunctionality
   Short description: Gene rankings used in differential expression predictability are located here. Contains gene lists from human and mouse influenza datasets ranked by log2FC, by p-value, and by a combination of the two. High-ranking genes (larger numerical rank values) correspond to genes with the highest log2FC averages and/or lowest p-value averages across the selected datasets (see the "Data tables" folder for the IDs of the datasets used: supplementary tables S1 and S2 for human and mouse, respectively).

E. Filename: RNA-seq
   Short description: Differential expression output from DESeq2 for the RNA-seq data generated directly for this project is located in this folder (each saved as a "dds" object). The "*.DE.Rdata" files contain the DESeq2 objects for influenza-infected and uninfected samples for each study indicated by *. Each file also includes a design matrix that is compatible with DESeq2 and a result matrix holding the differential expression output between infected and uninfected samples. Raw data are available on GEO under the following GSE IDs: GSE156060, GSE156152 and GSE156005 (the "Covid_isolates.DE.Rdata" file also contains samples from an unpublished dataset).

F. Filename: Single-cell
   Short description: Contents include orthologue annotation files ("allgenes_mouse.rdata" and "mouse_orthos.csv"). The markers file ("markers.csv") holds the output of the FindAllMarkers function from Seurat. A metadata file ("seurat_metadata.csv") is also present as a record of the cell types annotated in Seurat. The Seurat integrated object ("wsn.integrated.rdata") is also provided and can be loaded directly into R, though this is a large file (>8.5GB) and may be better suited to cloud computing.

Additional Notes on File Relationships, Context, or Content (for example, if a user wants to reuse and/or cite your data, what information would you want them to know?): N/A

File Naming Convention: In the ChIP-seq > BigWigs folder, files are named as: "PolII variant" _ "Influenza condition".bw

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Software-specific information:

   Name: R-project
   Version: 4.0.5
   System Requirements: N/A
   Open Source? (Y/N): Y

   Name: RStudio
   Version: 1.2.5001
   System Requirements: N/A
   Open Source? (Y/N): Y

   Name: DeepTools
   Version: 3.5.0
   System Requirements: N/A
   Open Source? (Y/N): Y

   Name: Seurat
   Version: 4.0
   System Requirements: N/A
   Open Source? (Y/N): Y

   Additional Notes (such as, will this software not run on certain operating systems?): Analysis and generation of processed datasets were performed in R. Code is available on GitHub: https://github.com/hlchenlab/Influenza-Meta-Analysis

2. Equipment-specific information: N/A

   Additional Notes: N/A
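
Illustrative note on the BigWig file naming convention: the sketch below shows one way the "PolII variant" _ "Influenza condition".bw pattern described under File Naming Convention could be parsed when batch-processing the ChIP-seq > BigWigs folder. It is not part of the project code. It assumes that the first underscore separates the two fields and that the PolII variant itself contains no underscore (as with PolII, PolII-s2p and PolII-s5p). The function name, the class name and the file names used in the usage note are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class BigWigName(NamedTuple):
    """The two fields encoded in a BigWig file name."""
    variant: str    # PolII variant, e.g. "PolII", "PolII-s2p", "PolII-s5p"
    condition: str  # influenza condition label


def parse_bigwig_name(filename: str) -> BigWigName:
    """Split '<PolII variant>_<Influenza condition>.bw' into its parts.

    Assumption: the first underscore is the field separator, so any
    further underscores are kept as part of the condition label.
    """
    path = Path(filename)
    if path.suffix.lower() != ".bw":
        raise ValueError(f"not a .bw file: {filename!r}")

    # partition() splits on the first underscore only
    variant, sep, condition = path.stem.partition("_")
    if not sep or not variant or not condition:
        raise ValueError(f"name does not match '<variant>_<condition>.bw': {filename!r}")

    return BigWigName(variant, condition)
```

For example, parse_bigwig_name("PolII-s2p_infected.bw") would give the variant "PolII-s2p" and the condition "infected"; "infected" is a made-up label here, and the actual condition labels should be taken from the files in the folder.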