Created by Yanbin Yin (yyin@unl.edu) 6/30/2019

There are only 7 published Aca proteins (https://tinyurl.com/anti-CRISPR). To expand this set with more high-confidence Aca proteins, we used a guilt-by-association approach: HTH-domain-containing proteins whose genes are located near genes encoding homologs of known Acr proteins are considered high-confidence Aca proteins.

To this end, we used the 45 published Acr proteins as queries to search 75,599 RefSeq bacterial genomes and 760,453 metagenome-assembled viral contigs (~3% are from isolated phages or prophages) of the IMG/VR database. In total, 975 unique RefSeq proteins (one protein ID can occur in multiple very similar RefSeq bacterial genomes) and 2,022 IMG/VR proteins were found to be Acr homologs.

To qualify as an Acr homolog, a protein has to meet the following criteria: an E-value < 1e-2 to a known Acr protein; a protein length < 200 aa; and, more importantly, its gene must lie in a genomic locus (or operon) in which all genes encode short proteins (< 200 aa) and are on the same strand.

Searching for HTH-domain-containing proteins surrounding the Acr homologs identified 168 Aca proteins in RefSeq (168.refseq.id.txt, 168 unique protein IDs) and 194 Aca proteins in IMG/VR (194.jgi.id.info.txt). In addition, we collected 39 Aca proteins (39.published.id.tax.txt) from the literature; these include HTH proteins surrounding the 45 published Acr proteins plus Aca proteins identified in https://science.sciencemag.org/content/sci/suppl/2018/09/05/science.aau5174.DC1/aau5174-Marino-SM.pdf (Table S4).

3parts.faa contains 401 protein sequences in FASTA format (168 + 194 + 39 from the three sets above).
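
The selection criteria described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build these files: the `Gene` record, its field names, and the functions `acr_homologs` and `candidate_aca` are hypothetical. It also assumes that "surrounding" means "in the same locus" and that HTH-domain detection has already been done and stored as a flag. Only the two thresholds (E-value < 1e-2, length < 200 aa) and the same-strand rule come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_EVALUE = 1e-2     # from the text: E-value < 1e-2 to a known Acr protein
MAX_LENGTH_AA = 200   # from the text: protein length < 200 aa


@dataclass
class Gene:
    """One gene in a locus (hypothetical record; field names are assumptions)."""
    protein_id: str
    length_aa: int
    strand: str                          # "+" or "-"
    acr_evalue: Optional[float] = None   # best E-value vs. known Acr proteins, None if no hit
    has_hth: bool = False                # assumed precomputed HTH-domain flag


def acr_homologs(locus: List[Gene]) -> List[Gene]:
    """Return the genes in a locus that pass the Acr-homolog criteria."""
    if not locus:
        return []
    # Locus-level criteria: every gene is short and all genes share one strand.
    same_strand = len({g.strand for g in locus}) == 1
    all_short = all(g.length_aa < MAX_LENGTH_AA for g in locus)
    if not (same_strand and all_short):
        return []
    # Gene-level criterion: a sufficiently significant hit to a known Acr protein.
    return [g for g in locus
            if g.acr_evalue is not None and g.acr_evalue < MAX_EVALUE]


def candidate_aca(locus: List[Gene]) -> List[Gene]:
    """Guilt by association: HTH proteins in a locus that contains an Acr homolog."""
    hits = acr_homologs(locus)
    if not hits:
        return []
    hit_ids = {g.protein_id for g in hits}
    return [g for g in locus if g.has_hth and g.protein_id not in hit_ids]
```

For example, a two-gene locus on the "+" strand with a 95 aa protein hitting a known Acr at E-value 1e-5 and a 120 aa HTH protein next to it would yield the HTH protein as a candidate Aca. Putting either gene on the opposite strand, or lengthening either protein to 200 aa or more, would yield no candidates.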