dbCAN-seq download readme.txt (by Jinfang Zheng 09/12/2022)

This current release of dbCAN-seq is based on 9,421 metagenome-assembled genomes (MAGs) from four ecological environments (human gut, human oral, cow rumen, and marine) in the EBI MGnify database. The updated dbCAN-seq contains 498,046 CAZymes and 168,906 CAZyme gene clusters (CGCs). Glycan substrates for 41,447 (24.54%) CGCs are inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting).

This directory contains 4 subdirectories, one for each of the 4 environments, plus this readme. Each subdirectory contains 5 tarball files, e.g.:

1. COW RUMEN
    1.1 dbCAN_overview.tar.gz
        All the overview files output by run_dbcan (one genome per folder in the tarball file).
    1.2 cgc_result.tar.gz
        All the CGC output files from run_dbcan (one genome per folder in the tarball file).
    1.3 cazyme.fa.tar.gz
        The protein sequences (faa) of all CAZymes (one genome per folder in the tarball file).
    1.4 cgc.fa.tar.gz
        The protein sequences (faa) of all CGC genes (one genome per folder in the tarball file).
        Each FASTA sequence ID has the format: ContigID|CGCorder|proteinID|Type.
        Example: MGYG000291367_10|CGC1|MGYG000291367_00645|CAZyme
        A CGC ID is formed from the ContigID and the CGCorder. The CGC ID for the example sequence above is MGYG000291367_10|CGC1.
    1.5 substrates.tar.gz
        The substrate tables, one file per substrate in the tarball. Columns are separated by tabs.
        The column names for each substrate file are:
        geneomeID  CGCID  CGC_content  PULDID  substrate  species_name  method  signature_content
2. HUMAN GUT
3. HUMAN ORAL
4. MARINE
5. readme.txt
    The file you are currently reading.
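
The sequence ID format in cgc.fa.tar.gz and the column layout in substrates.tar.gz described above can be read with a few lines of Python. The following is only a minimal sketch and not part of the dbCAN-seq release: the names CgcFastaId, parse_cgc_fasta_id, and read_substrate_rows are hypothetical, and the code assumes that each substrate file has exactly the eight tab-separated columns listed above. This readme does not state whether the files begin with a header line, so that is left as a parameter.

```python
import csv
from typing import Iterator, NamedTuple, TextIO

# Column names exactly as listed in this readme for the substrate files.
SUBSTRATE_COLUMNS = [
    "geneomeID", "CGCID", "CGC_content", "PULDID",
    "substrate", "species_name", "method", "signature_content",
]


class CgcFastaId(NamedTuple):
    """One sequence ID of the form ContigID|CGCorder|proteinID|Type."""
    contig_id: str
    cgc_order: str
    protein_id: str
    gene_type: str

    @property
    def cgc_id(self) -> str:
        # The CGC ID is the ContigID joined with the CGCorder.
        return f"{self.contig_id}|{self.cgc_order}"


def parse_cgc_fasta_id(header: str) -> CgcFastaId:
    """Split a FASTA header (with or without the leading '>') into its 4 fields."""
    text = header.strip().lstrip(">")
    parts = text.split()
    token = parts[0] if parts else ""
    fields = token.split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got: {header!r}")
    return CgcFastaId(*fields)


def read_substrate_rows(handle: TextIO, has_header: bool = False) -> Iterator[dict]:
    """Yield one dict per row of a tab-separated substrate file.

    has_header is an assumption the caller must check against the actual
    files; when True, the first line is skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if (index == 0 and has_header) or not row:
            continue
        yield dict(zip(SUBSTRATE_COLUMNS, row))


# Usage with the example ID given in this readme.
record = parse_cgc_fasta_id(">MGYG000291367_10|CGC1|MGYG000291367_00645|CAZyme")
print(record.cgc_id)     # MGYG000291367_10|CGC1
print(record.gene_type)  # CAZyme
```

To use read_substrate_rows, extract a file from substrates.tar.gz, open it in text mode, and pass the file object to the function.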