dbCAN-sub

AA1 AA2 AA3 AA4 AA5 AA6 AA7 AA8 AA9 AA10 AA11 AA12 AA13 AA14 AA15 AA16 AA17 AA18

CBM1 CBM2 CBM3 CBM4 CBM5 CBM6 CBM8 CBM9 CBM10 CBM11 CBM12 CBM13 CBM14 CBM15 CBM16 CBM17 CBM18 CBM19 CBM20 CBM21 CBM22 CBM23 CBM24 CBM25 CBM26 CBM27 CBM28 CBM29 CBM30 CBM31 CBM32 CBM34 CBM35 CBM36 CBM37 CBM38 CBM39 CBM40 CBM41 CBM42 CBM43 CBM44 CBM45 CBM46 CBM47 CBM48 CBM49 CBM50 CBM51 CBM52 CBM53 CBM54 CBM55 CBM56 CBM57 CBM58 CBM59 CBM60 CBM61 CBM62 CBM63 CBM64 CBM65 CBM66 CBM67 CBM68 CBM69 CBM70 CBM71 CBM72 CBM73 CBM74 CBM76 CBM77 CBM79 CBM81 CBM82 CBM83 CBM84 CBM85 CBM86 CBM87 CBM88 CBM89 CBM90 CBM91 CBM92 CBM93 CBM94 CBM95 CBM96 CBM97 CBM98 CBM99 CBM101 CBM102 CBM103 CBM104 CBM106

CE1 CE2 CE3 CE4 CE5 CE6 CE7 CE8 CE9 CE11 CE12 CE13 CE14 CE15 CE16 CE17 CE18 CE19 CE20 CE21

GH1 GH2 GH3 GH4 GH5 GH6 GH7 GH8 GH9 GH10 GH11 GH12 GH13 GH14 GH15 GH16 GH17 GH18 GH19 GH20 GH22 GH23 GH24 GH25 GH26 GH27 GH28 GH29 GH30 GH31 GH32 GH33 GH34 GH35 GH36 GH37 GH38 GH39 GH42 GH43 GH44 GH45 GH46 GH47 GH48 GH49 GH50 GH51 GH52 GH53 GH54 GH55 GH56 GH57 GH58 GH59 GH62 GH63 GH64 GH65 GH66 GH67 GH68 GH70 GH71 GH72 GH73 GH74 GH75 GH76 GH77 GH78 GH79 GH80 GH81 GH82 GH83 GH84 GH85 GH86 GH87 GH88 GH89 GH90 GH91 GH92 GH93 GH94 GH95 GH96 GH97 GH98 GH99 GH100 GH101 GH102 GH103 GH104 GH105 GH106 GH107 GH108 GH109 GH110 GH111 GH112 GH113 GH114 GH115 GH116 GH117 GH118 GH119 GH120 GH121 GH122 GH123 GH124 GH125 GH126 GH127 GH128 GH129 GH130 GH131 GH132 GH133 GH134 GH135 GH136 GH137 GH138 GH139 GH140 GH141 GH142 GH143 GH144 GH146 GH147 GH148 GH149 GH150 GH151 GH152 GH153 GH154 GH156 GH157 GH158 GH159 GH160 GH161 GH162 GH163 GH164 GH165 GH166 GH167 GH168 GH169 GH170 GH171 GH172 GH173 GH174 GH175 GH176 GH177 GH178 GH179 GH180 GH181 GH182 GH183 GH184 GH185 GH186 GH187 GH188 GH189 GH190 GH191 GH192 GH193 GH194

GT1 GT2 GT3 GT4 GT5 GT6 GT7 GT8 GT9 GT10 GT11 GT12 GT13 GT14 GT15 GT16 GT17 GT18 GT19 GT20 GT21 GT22 GT23 GT24 GT25 GT26 GT27 GT28 GT29 GT30 GT31 GT32 GT33 GT34 GT35 GT37 GT38 GT39 GT40 GT41 GT42 GT43 GT44 GT45 GT47 GT48 GT49 GT50 GT51 GT52 GT53 GT54 GT55 GT56 GT57 GT58 GT59 GT60 GT61 GT62 GT63 GT64 GT65 GT66 GT67 GT68 GT69 GT70 GT71 GT72 GT73 GT74 GT75 GT76 GT77 GT78 GT79 GT80 GT81 GT82 GT83 GT84 GT85 GT87 GT88 GT89 GT90 GT91 GT92 GT93 GT94 GT95 GT96 GT97 GT98 GT99 GT100 GT101 GT102 GT103 GT104 GT105 GT106 GT107 GT108 GT109 GT110 GT111 GT112 GT113 GT114 GT115 GT116 GT117 GT118 GT119 GT120 GT121 GT122 GT123 GT124 GT125 GT126 GT127 GT128 GT129 GT130 GT131 GT132 GT133 GT134 GT135 GT136 GT137 GT138

PL1 PL2 PL3 PL4 PL5 PL6 PL7 PL8 PL9 PL10 PL11 PL12 PL13 PL14 PL15 PL16 PL17 PL18 PL20 PL21 PL22 PL23 PL24 PL25 PL26 PL27 PL28 PL29 PL30 PL31 PL33 PL34 PL35 PL36 PL37 PL38 PL39 PL40 PL41 PL42 PL43 PL44

News

Latest update (1/6/2026): The current release is based on CAZyDB.07242025.fa and provides subfamilies for 505 CAZyme families (AA: 18; CE: 20; PL: 42; GH: 186; GT: 135; CBM: 104). In total, there are 53,411 clustered subfamilies and 442 unclustered subfamilies. Compared with the previous version, this release adds 70 new families, distributed across the classes as follows: AA (1), CE (2), PL (2), GH (23), GT (24), and CBM (18). The updated subfamily HMM database (dbCAN_sub_2025.hmm) is derived from the 53,411 CAZyme subfamilies classified using eCAMI (enzyme Classification And Motif Identification), an approximately two-fold increase in the number of subfamilies over the initial publication.

Note: New sequences that are similar to an existing subfamily (identity >= 60, coverage >= 80, and cluster members threshold >= 70) are added to that subfamily, while the remaining sequences are clustered into new subfamilies. Accordingly, existing subfamilies keep their old IDs, and new subfamilies receive new IDs.
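The assignment rule in the note can be sketched as a simple threshold check. This is only an illustration, not the actual update pipeline: the Hit record, the field name member_support (standing in for the "cluster members threshold"), the subfamily IDs, and the tie-break by highest identity are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One comparison of a new sequence against an existing subfamily.

    All field names are hypothetical; the source only lists three thresholds.
    """
    subfamily_id: str
    identity: float        # compared against the ">= 60" threshold
    coverage: float        # compared against the ">= 80" threshold
    member_support: float  # compared against the ">= 70" threshold


def assign_subfamily(
    hits: Iterable[Hit],
    min_identity: float = 60.0,
    min_coverage: float = 80.0,
    min_member_support: float = 70.0,
) -> Optional[str]:
    """Return an existing subfamily ID, or None if the sequence should be
    left for clustering into a new subfamily."""
    passing = [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.member_support >= min_member_support
    ]
    if not passing:
        return None
    # Assumed tie-break: prefer the hit with the highest identity, then coverage.
    best = max(passing, key=lambda h: (h.identity, h.coverage))
    return best.subfamily_id


if __name__ == "__main__":
    example = [
        Hit("GH5_e1", identity=75.0, coverage=90.0, member_support=80.0),
        Hit("GH5_e2", identity=95.0, coverage=70.0, member_support=90.0),
    ]
    print(assign_subfamily(example))
```

A sequence with no passing hit returns None, which corresponds to the "remaining sequences" that are clustered into new subfamilies with new IDs.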

dbCAN-sub

Unique features of dbCAN-sub (12/2022): dbCAN-sub was developed as the first comprehensive CAZyme subfamily HMM database (including CBMs) to enable substrate annotation for CAZymes. The subfamily HMMdb (Figure 1) is derived from 25,487 CAZyme subfamilies classified by eCAMI (enzyme Classification And Motif Identification), a k-mer based tool that we published in 2020 for classifying enzyme families into subfamilies using a bipartite network algorithm (1). eCAMI was integrated into our popular dbCAN2 meta server in 2021 to replace Hotpep (2), following a recent evaluation of CAZyme annotation tools by an independent group (3). Like CUPP, eCAMI can assign proteins to subfamilies with EC numbers (colored curves in Figure 1). However, both CUPP and eCAMI place high demands on CPU time and memory. eCAMI can annotate not only the catalytic enzyme domains but also the carbohydrate-binding CBM domains. A very recent paper found that eCAMI tends to produce more granular subfamilies than CUPP (4), and thus a higher percentage of subfamilies with a single EC number, allowing more specific substrate inference.

dbCAN-sub uses HMMs instead of k-mer peptides for subfamily assignment. Using HMMs has three advantages: (i) significantly lower memory use; (ii) parallel computing to reduce CPU time; (iii) statistical significance (E-values) and domain positions reported by the HMMER search. In other words, to address the computing cost, we converted each eCAMI subfamily into an HMM built from the dbCAN domain sequence alignment of that subfamily.
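To illustrate point (iii), the sketch below reads E-values and domain positions from a HMMER per-domain table (the file written with --domtblout), assuming the search was run with hmmscan so that the target columns describe the subfamily HMM and the query columns describe the protein. The filter values, the function name, and the two sample rows are made up for the example; they are not the cutoffs used by dbCAN-sub or run_dbcan.

```python
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    hmm_name: str        # subfamily HMM (target name under hmmscan)
    protein_id: str      # query sequence name
    i_evalue: float      # independent E-value of this domain
    ali_from: int        # domain start on the protein
    ali_to: int          # domain end on the protein
    hmm_coverage: float  # fraction of the HMM covered by the alignment


def parse_domtblout(
    lines: Iterable[str],
    max_evalue: float = 1e-5,   # illustrative cutoff, not a dbCAN default
    min_coverage: float = 0.3,  # illustrative cutoff, not a dbCAN default
) -> List[DomainHit]:
    """Parse hmmscan --domtblout rows and keep hits passing both cutoffs."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # 22 whitespace-separated columns, then a free-text description.
        f = line.split(maxsplit=22)
        if len(f) < 22:
            continue
        hmm_len = int(f[2])                      # tlen
        i_evalue = float(f[12])                  # i-Evalue
        hmm_from, hmm_to = int(f[15]), int(f[16])
        ali_from, ali_to = int(f[17]), int(f[18])
        coverage = (hmm_to - hmm_from + 1) / hmm_len
        if i_evalue <= max_evalue and coverage >= min_coverage:
            hits.append(
                DomainHit(f[0], f[3], i_evalue, ali_from, ali_to, coverage)
            )
    return hits


if __name__ == "__main__":
    # Made-up rows in domtblout layout; IDs and numbers are placeholders.
    sample = [
        "# target name accession tlen query name ...",
        "GH5_e1 - 200 protA - 450 1e-50 170.0 0.1 1 1 1e-52 2e-50 169.5 0.1 11 190 30 215 25 220 0.95 -",
        "GT2_e3 - 100 protA - 450 0.5 10.0 0.0 1 1 0.2 0.6 9.0 0.0 1 20 300 320 298 322 0.70 -",
    ]
    for hit in parse_domtblout(sample):
        print(hit)
```

If the search were run with hmmsearch instead, the target and query columns would swap roles, so the HMM length would come from the qlen column rather than tlen.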

More importantly, dbCAN-sub enables carbohydrate substrate annotation through a manually curated mapping table that links CAZyme subfamilies, characterized CAZymes, EC numbers, and glycan substrates. We constructed this mapping table by curating the CAZy family webpages for experimentally characterized proteins (e.g., GH5). Most of these webpages contain external links from EC numbers to the Enzyme database and from characterized protein IDs to the PubMed pages of biochemical reference papers. In most cases, we could obtain the substrate information for a subfamily by skimming the paper abstracts or EC descriptions, using the EC numbers of the experimentally characterized proteins in that subfamily. For all CBM families and some enzyme families, we were able to extract the substrate information from the CAZy webpages without EC numbers.
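One way to picture the mapping table is as a set of (subfamily, EC number, substrate) rows indexed by subfamily, so that a subfamily hit can be turned into candidate substrates. The sketch below is a hypothetical data structure, not the actual table format: the subfamily IDs and their pairing with these EC numbers are placeholders, and only the EC-to-substrate associations (3.2.1.4 with cellulose, 3.2.1.8 with xylan, 3.2.1.78 with mannan) reflect the standard EC descriptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, str, str]  # (subfamily ID, EC number, substrate)

# Placeholder rows for illustration only; not taken from the curated table.
MAPPING_ROWS: List[Row] = [
    ("GH5_e1", "3.2.1.4", "cellulose"),
    ("GH5_e1", "3.2.1.8", "xylan"),
    ("GH5_e2", "3.2.1.78", "mannan"),
]


def build_index(rows: Iterable[Row]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group (EC number, substrate) pairs by subfamily ID."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for subfamily, ec, substrate in rows:
        index[subfamily].add((ec, substrate))
    return dict(index)


def substrates_for(
    subfamily: str, index: Dict[str, Set[Tuple[str, str]]]
) -> List[str]:
    """Return the sorted substrates recorded for a subfamily (empty if none)."""
    return sorted({substrate for _, substrate in index.get(subfamily, ())})


if __name__ == "__main__":
    idx = build_index(MAPPING_ROWS)
    print(substrates_for("GH5_e1", idx))
```

Keeping the EC number next to each substrate preserves the evidence trail described above, since a substrate is inferred from the EC numbers of the characterized proteins in the subfamily.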

We have built a webpage for each CAZyme subfamily that provides the information users need to understand what data the subfamily HMM was built upon: (i) a summary table with various counts of CAZy proteins, including download links to the FASTA sequences; (ii) a substrate table with EC numbers and curated substrates from CAZy webpages and the literature; (iii) a member protein table with all CAZy protein IDs and their subfamily assignments in the CAZy and CUPP databases (if available). All these tables, the dbCAN-sub HMMs, sequence alignments, and FASTA sequences can be downloaded from the dbCAN-sub website.

Lastly, the dbCAN-sub subfamily HMMdb is integrated into our popular dbCAN2 meta server and the standalone run_dbcan program to enable glycan substrate annotation of user-submitted (meta)genomes.

Future update: We plan to update dbCAN-sub annually as new sequences and families are added to the CAZy database. New subfamilies will be created if the new CAZy sequences are more similar to each other, or to sequences that eCAMI previously left unclassified, than to existing subfamilies. The dbCAN-sub database will be a new addition to our popular dbCAN family tool suite (dbCAN2, dbCAN-seq, dbCAN-PUL, eCAMI), which focuses on CAZyme bioinformatics and carbohydrate metabolism.