Acr (anti-CRISPR) proteins were first discovered in 2013 in Pseudomonas phages and prophages.
Acr-encoding genes often form operons with putative transcription regulator genes that encode Aca (Acr-associated)
proteins (PMID: 31474367). These short Acr proteins
(< 200 aa) are made by phages and other mobile genetic elements
to inhibit the CRISPR-Cas systems of their hosts. Acrs are therefore naturally occurring "off-switches" for CRISPR-Cas,
with great potential to serve as modulators of CRISPR-Cas genome
editing tools for more controllable genome engineering (e.g., PMID: 30377362).
As of May 2020, 65 Acr proteins have been experimentally characterized (see here and here), but most have no sequence homologs beyond the species
level and no conserved Pfam domains (PMID: 30208287). Aca proteins are more conserved, and all contain a helix-turn-helix (HTH)
DNA-binding domain. Searching for the HTH domains of the more conserved Aca proteins and then examining the gene neighborhood to
find new Acrs, known as the guilt-by-association (GBA) approach, has therefore proven very successful (reviewed in PMID: 29062071, PMID: 30208287, and others).
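As a minimal sketch of the GBA idea (not the AcrFinder implementation): given genes already annotated with a strand, a protein length, and a flag marking an HTH hit, candidate Acrs are the short same-strand genes next to an HTH-containing gene. The `Gene` record, the `window` parameter, and the use of a 200 aa cutoff as a filter are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    gene_id: str
    strand: str        # "+" or "-"
    length_aa: int     # protein length in amino acids
    has_hth: bool      # True if an HTH domain was detected (Aca-like)


def gba_candidates(genes: List[Gene], window: int = 1, max_len_aa: int = 200) -> List[str]:
    """Return IDs of short, same-strand genes within `window` genes of an HTH gene.

    `genes` must be ordered by genomic position on a single contig.
    `window` and `max_len_aa` are hypothetical parameters for illustration.
    """
    hits = []
    for i, gene in enumerate(genes):
        if gene.has_hth or gene.length_aa >= max_len_aa:
            continue  # candidate Acrs are short and are not the Aca itself
        lo, hi = max(0, i - window), min(len(genes), i + window + 1)
        neighbors = genes[lo:i] + genes[i + 1:hi]
        if any(n.has_hth and n.strand == gene.strand for n in neighbors):
            hits.append(gene.gene_id)
    return hits


genes = [
    Gene("g1", "+", 450, False),
    Gene("g2", "+", 120, False),
    Gene("g3", "+", 80, True),
    Gene("g4", "-", 95, False),
]
print(gba_candidates(genes))  # ['g2']
```

Only `g2` is reported: `g1` is too long, `g3` is the HTH gene itself, and `g4` lies on the opposite strand.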
Additionally, the self-targeting idea (first proposed by Rauch et al. 2017), in which a bacterial genome carries both a CRISPR spacer and its target (the protospacer),
has also been applied to searching for new Acrs (e.g., PMID: 30190307). We recently published a bioinformatics
data-mining study of putative Acr-Aca loci in 75,000 bacterial genomes that combined sequence homology search, GBA, and self-targeting
approaches (PMID: 31506266). This pipeline recovered all the published/characterized Acr-Aca loci and therefore has a
recall of 100%. Precision cannot be calculated because no true-negative Acr-Aca dataset is available. AcrFinder is a
bioinformatics workflow rather than a predictive algorithm.
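The reasoning behind self-targeting is that a genome carrying both a spacer and its matching protospacer would be cut by its own CRISPR-Cas system, so its survival hints at an active Acr. A minimal sketch of that check follows, assuming exact string matching and a known array interval. In practice spacers come from a CRISPR array predictor and matching usually tolerates mismatches; the function names and the `array_span` argument are hypothetical.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Reverse-complement an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_targets(genome: str, spacer: str, array_span: Tuple[int, int]) -> List[Tuple[int, str]]:
    """Find exact protospacer matches for `spacer` outside the CRISPR array.

    `array_span` is the half-open (start, end) interval of the array itself,
    so the spacer's own copy inside the array is not reported.
    Returns (0-based start, strand) pairs.
    """
    genome = genome.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        pos = genome.find(query)
        while pos != -1:
            if not (array_span[0] <= pos < array_span[1]):
                hits.append((pos, strand))
            pos = genome.find(query, pos + 1)
    return hits


spacer = "GATTACAGG"
# Toy genome: a 17 bp "array" holding the spacer, then a protospacer on the minus strand.
genome = "AAAAGATTACAGGAAAA" + "TTTT" + "CCTGTAATC" + "TTTT"
print(self_targets(genome, spacer, array_span=(0, 17)))  # [(21, '-')]
```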
The study of anti-CRISPRs is a very young and rapidly growing research field (PMID: 30309933). Before 2020, no web server or standalone tool had been published to predict Acrs from a protein or DNA sequence file. Since March 2020, however, four tools have been published in peer-reviewed journals or on bioRxiv. There are also related resources; please see Menu -> Links. For example, anti-CRISPRdb (PMID: 29036676) collects experimentally characterized Acr proteins
and their homologs and presents them on the web.
Genome sequences in fna, gff, and faa formats are taken as input. A single fna file on its own is also acceptable;
in that case, the gff and faa files will be generated by running Prodigal (PMID: 22796954).
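The input rule above can be sketched as follows. The function name and the output file naming are hypothetical. The Prodigal options shown (`-i`, `-a`, `-o`, `-f gff`) are its standard command-line flags, but whether AcrFinder calls Prodigal with exactly these options is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def prodigal_command(fna: str, gff: Optional[str] = None, faa: Optional[str] = None) -> Optional[List[str]]:
    """Return a Prodigal command if gff/faa are missing, else None."""
    if gff and faa:
        return None  # annotation already supplied; no gene prediction needed
    stem = Path(fna).with_suffix("")  # e.g. "genome.fna" -> "genome"
    return [
        "prodigal",
        "-i", fna,             # input nucleotide FASTA
        "-a", f"{stem}.faa",   # predicted protein sequences
        "-o", f"{stem}.gff",   # gene coordinates
        "-f", "gff",           # write coordinates in GFF format
    ]


print(prodigal_command("genome.fna"))
print(prodigal_command("genome.fna", "genome.gff", "genome.faa"))  # None
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Prodigal is installed.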
The AcrFinder standalone program outputs a folder
containing two files and three sub-folders. The two files contain the homology-based and GBA-based Acr-Aca
search results. The three sub-folders include: (i) input files; (ii) CRISPRCasFinder (PMID: 29790974) result files;
(iii) all the intermediate result files. The computational workflow is described in https://github.com/HaidYi/acrfinder#workflow,
which is a modified version of the bioinformatics pipeline reported in our recent paper (PMID: 31506266).
This pipeline does not simply chain others’ tools; it is a workflow that processes the gff and faa files to extract
genomic operons and examine their gene neighborhoods, which involves multiple steps of complex data filtering using sequence
features of known Acr-Aca loci.
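To illustrate what extracting operon-like gene clusters from a gff file can look like, here is a minimal sketch. It is not the AcrFinder code: the same-strand rule and the `max_gap` value of 150 bp are assumptions chosen for illustration, and the real workflow applies further filters described at the link above.

```python
from typing import List, Tuple

Cds = Tuple[str, int, int, str]  # (contig, start, end, strand)


def parse_gff_cds(gff_text: str) -> List[Cds]:
    """Extract CDS features from GFF text, sorted by contig and start."""
    cds = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) >= 9 and cols[2] == "CDS":
            cds.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return sorted(cds)


def group_operons(cds: List[Cds], max_gap: int = 150) -> List[List[Cds]]:
    """Group adjacent CDS on the same contig and strand separated by < max_gap bp."""
    operons: List[List[Cds]] = []
    for gene in cds:
        prev = operons[-1][-1] if operons else None
        same_unit = (
            prev is not None
            and prev[0] == gene[0]           # same contig
            and prev[3] == gene[3]           # same strand
            and gene[1] - prev[2] < max_gap  # short intergenic distance
        )
        if same_unit:
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return operons


gff = "\n".join([
    "##gff-version 3",
    "c1\tProdigal\tCDS\t100\t400\t.\t+\t0\tID=1_1",
    "c1\tProdigal\tCDS\t450\t700\t.\t+\t0\tID=1_2",
    "c1\tProdigal\tCDS\t1500\t1800\t.\t+\t0\tID=1_3",
    "c1\tProdigal\tCDS\t1850\t2100\t.\t-\t0\tID=1_4",
])
operons = group_operons(parse_gff_cds(gff))
print([len(op) for op in operons])  # [2, 1, 1]
```

The first two genes form one cluster; the third is too far downstream, and the fourth is on the opposite strand.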
This website is free and open to all users, and there is no login requirement. The job submission page has an option that lets users try out the sample data: one bacterial genome and one viral
genome. A help page provides detailed instructions on how to use the web server, particularly on interpreting
the data in the result page. A typical bacterial genome submission is expected to finish within 2 minutes. A result web link and
a job ID are provided while the job is running. The result page has data tables showing the member genes of the identified Acr-Aca
loci, along with their genomic positions, strand, sequence, and length, and whether they are adjacent to mobile genetic elements, match known Acr
or Aca proteins, or are adjacent to self-targeting CRISPR spacers. JBrowse is used to graphically display the gene neighborhood. See the Help page for a detailed description of the webpages.