Follow the steps below to learn how to use the AcrFinder web server and understand the results:
1. Choose genome type
Choose the type of genome from the available organism groups (Archaea, Bacteria and Virus).
2. Search for MGE or Prophage
AcrFinder searches the Acr gene neighborhood (the upstream and downstream range can be changed) for other mobilome genes by an RPS-BLAST search against a set of CDD mobilome protein models (491 CDD models in total, all having the keyword "mobilome" in their description), or for prophages (PHASTER database). This matters because most experimentally characterized Acr-Aca loci are located within or near a prophage, genomic island, transferred element, or other type of mobile genetic element, in addition to those found directly in lytic phages/viruses.
3. Upload nucleotide sequence fna, protein sequence faa and gff or predict them via Prodigal
Choose a nucleotide sequence input file in FASTA format (e.g., .fna). Please use a plain text editor (e.g., Notepad++; do not use MS Word) to create the file. Invalid submissions will be rejected.
If you want to use a protein sequence file (.faa), you also have to provide a .gff3 file. In this case, please uncheck “Use Prodigal to generate .faa and .gff file”. Otherwise, our program will use Prodigal to create the .faa and .gff3 files from the .fna file.
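For users who would rather prepare the .faa and .gff files locally before uploading, here is a minimal sketch of how a Prodigal call could be assembled in Python. The file names are placeholders, and this page does not state which options the server itself passes to Prodigal, so the flags below are an assumption based on Prodigal's documented command line (`-i` input, `-a` protein output, `-o` coordinate output, `-f gff` output format, `-p meta` for its anonymous mode).

```python
def prodigal_command(fna_path, faa_path, gff_path, meta=False):
    """Build a Prodigal command line that writes proteins (.faa) and genes (.gff).

    The choice of options is an assumption, not the server's documented setup.
    """
    cmd = [
        "prodigal",
        "-i", fna_path,   # input nucleotide FASTA
        "-a", faa_path,   # output protein translations
        "-o", gff_path,   # output gene coordinates
        "-f", "gff",      # write the coordinates in GFF format
    ]
    if meta:
        # Anonymous (metagenomic) mode, meant for sequences too short to train on.
        cmd += ["-p", "meta"]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    print(" ".join(prodigal_command("genome.fna", "genome.faa", "genome.gff")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Prodigal is installed.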
4. Choose parameters for submission
These parameters control the search for Aca homologs in short-gene operons and the definition of a short-gene operon. The program DIAMOND is run in --more-sensitive mode, but other parameters such as E-value, coverage, and identity can be adjusted by users. In a short-gene operon, all genes are on the same strand, all encode short proteins (< 200 aa), and all intergenic distances are < 150 bp. These thresholds can also be adjusted by users. For example, in order to find AcrID1, experimentally characterized in Sulfolobus islandicus rudivirus 3 (NC_030884.1), one has to change the max intergenic dist to 250 bp because AcrID1/YP_009272954.1 and its upstream Aca/YP_009272953.1 are 224 bp apart. The default value of 150 bp will not find this operon.
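The short-gene operon definition above can be sketched as a simple grouping pass over the genes of one contig. This is an illustration only, not AcrFinder's actual code: the `Gene` class, the way the intergenic gap is measured (next start minus previous end minus 1), and the minimum of two genes per operon are all assumptions, and the coordinates in the usage example are made up so that the two genes sit 224 bp apart.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    protein_id: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    strand: str     # "+" or "-"
    aa_length: int


def short_gene_operons(genes, max_aa=200, max_intergenic=150, min_genes=2):
    """Group genes on one contig into candidate short-gene operons."""
    operons, current = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        if gene.aa_length >= max_aa:
            # A long protein ends the current run and joins no operon.
            if len(current) >= min_genes:
                operons.append(current)
            current = []
            continue
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1
            # A strand switch or a large gap also ends the current run.
            if gene.strand != prev.strand or gap >= max_intergenic:
                if len(current) >= min_genes:
                    operons.append(current)
                current = []
        current.append(gene)
    if len(current) >= min_genes:
        operons.append(current)
    return operons


if __name__ == "__main__":
    # Hypothetical coordinates: two short genes on the same strand, 224 bp apart.
    pair = [
        Gene("aca_example", 1000, 1299, "+", 99),
        Gene("acr_example", 1524, 1823, "+", 99),
    ]
    print(len(short_gene_operons(pair)))                      # default 150 bp
    print(len(short_gene_operons(pair, max_intergenic=250)))  # relaxed to 250 bp
```

With the default threshold the pair is split and no operon is reported; raising `max_intergenic` to 250 keeps both genes in one operon, which mirrors the AcrID1 case described above.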
5. Submit or cancel the uploaded files
Click submit to submit the job, or click Cancel to clear the uploaded files.
If you want to try an example with the parameters set above, just click
The provided example is a bacterial genome, so please be sure to choose type as
Bacteria when you run the example.
1. Choose the type of input nucleotide sequence.
2. Paste a nucleotide sequence (FASTA format) in the text area box.
3. Choose parameters
4. Submit and wait for uploading. When you click , please be sure to choose the type Virus because the example sequence is from a virus.
5. Here the example is from Sulfolobus islandicus rod-shaped virus 2 (GCF_000857285.1). You have to change the intergenic dist to 250bp in order to find the AcrID1-Aca locus (try this job ID: 1583804352).
After a valid submission, you will receive a job ID and AcrFinder will start running. Wait for the program to finish (a couple of minutes for a typical bacterial genome such as E. coli) and you will see the result page. If you want to bookmark the job, save the link to this page, which will show the result once the job is done.
The result page has two sections: (i) Guilt by Association and (ii) Homology Based. Different Acr-Aca operons/loci are displayed with different background colors, and each row is a gene.
1. Job ID
The Job ID is the identifier created by the server for your job.
2. Copy or Export
You can sort the result table, export it in .csv or .xlsx format, and copy it.
3. Columns of data table
3.1. Classification/Confidence levels are explained in AcrFinder workflow.
3.2-3.9. GCF (3.2; shown if a GenBank genome is submitted), Positions (3.3), NC ID (3.4), Start (3.5), End (3.6), Strand (3.7), Protein ID (3.8) and aa Length (3.9) are parsed or calculated from the faa and gff files. In particular, Positions (3.3) is the location index of each gene in the contig.
Although it is not shown, all intergenic distances between two neighboring genes are < 150 bp (the default value, which can be changed on the submit page).
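As a rough illustration of how columns 3.3 to 3.8 could be derived from a gff file, the sketch below reads the CDS rows of GFF3 text and numbers the genes of each contig in coordinate order. It is not AcrFinder's parser: the function name is hypothetical, the position index is assumed to start at 0, and the protein ID is assumed to come from a `protein_id` attribute with `ID` as a fallback.

```python
from collections import defaultdict


def parse_gff_cds(gff_text):
    """Return {contig: [(position, start, end, strand, protein_id), ...]}."""
    by_contig = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # GFF3 rows have nine tab-separated columns
        attrs = dict(
            field.split("=", 1)
            for field in cols[8].strip(";").split(";")
            if "=" in field
        )
        protein_id = attrs.get("protein_id") or attrs.get("ID", "")
        by_contig[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], protein_id))

    table = {}
    for contig, rows in by_contig.items():
        rows.sort()  # order genes by start coordinate
        # Assumed 0-based position index within the contig.
        table[contig] = [(i, *row) for i, row in enumerate(rows)]
    return table


if __name__ == "__main__":
    example = (
        "##gff-version 3\n"
        "NC_1\tsrc\tCDS\t500\t800\t.\t-\t0\tID=cds2;protein_id=P2\n"
        "NC_1\tsrc\tCDS\t100\t400\t.\t+\t0\tID=cds1;protein_id=P1\n"
    )
    for row in parse_gff_cds(example)["NC_1"]:
        print(row)
```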
3.10. The Acr/Aca column shows the Aca homolog and Acr candidates and needs some explanation, e.g.,:
3.11. MGE/Prophage MetaData shows whether a protein is a prophage or mobilome gene (by searching against the prophage PHASTER database). The last number in the string is the E-value. Clicking on the link leads to the NCBI description page of the prophage protein. The other parts of the string are from the PHASTER database.
This matters because most experimentally characterized Acr-Aca loci are located within or near a prophage, genomic island, transferred element, or other type of mobile genetic element, in addition to those found directly in lytic phages/viruses.
3.12. Acr_Hit|pident column shows the best Acr homolog and the alignment identity.
3.13. Sequence shows the protein sequence.
3.14-3.15. These columns are calculated from the sequence using pepstats of EMBOSS. It has been noted that many experimentally characterized Acr proteins are negatively charged (acidic) and have a low isoelectric point (e.g., they mimic DNA).
3.16. The Self Target w/in 5000 BP column shows whether there is a CRISPR-Cas spacer target within a 5000 bp region of the Acr-Aca locus (see Figure 2c of PMID:30208287 [Stanley SY and Maxwell KL, 2018]). If there is, the chance that the locus encodes an Acr is higher. If not empty, the long string in this column has two parts separated by |:
3.16.1. Spacer Accession=NC_018591.1|Spacer_Pos=2660964-2660994|CAS-TypeIB(548849-555732)+CAS-TypeIIA(2663536-2669406): this is parsed from the result of CRISPRCasFinder (level 3 and 4, see their paper for details ["CRISPR arrays having evidence-levels 3 and 4 may be considered as highly likely candidates"]).
In this example, the contig NC_018591.1 has a CRISPR spacer (positions are given), which matches a target (see 3.16.2 below) with a sequence identity of 100% in the same contig (i.e., self-targeting). NC_018591.1 also has two CAS loci with type information predicted by CRISPRCasFinder, and their positions are given.
In this case, the CAS-TypeIIA(2663536-2669406) locus is next to Spacer_Pos=2660964-2660994, so it is likely that the locus near the spacer's target encodes an Acr that inhibits the CAS-TypeIIA enzymes, which keeps the genome from being cleaved by its own CRISPR-Cas system.
Therefore, carefully looking at this string can help infer the CRISPR-Cas type that the identified Acr-Aca locus may inhibit.
3.16.2. Target Accession=NC_018591.1|Target_Pos=710566-710596: this is identified by a BLASTn search with all the CRISPR spacers as query (see 3.16.1 above and Figure 2c of PMID:30208287 [Stanley SY and Maxwell KL, 2018]: the green diamond is the spacer, the green arrow is its target, and there must be a neighboring Acr [red arrow] expressed to avoid self-destruction). In this case, the target's positions are given, and the target is located within a 5000 bp region of the predicted Acr-Aca locus.
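Reading the spacer part of this string by eye is error-prone, so here is a small sketch that parses the 3.16.1 example and reports which CAS locus lies closest to the spacer. The function names are hypothetical, the parser only assumes the layout shown in the 3.16.1 example, and the distance is a plain linear one that ignores genome circularity.

```python
import re


def parse_spacer_part(text):
    """Parse 'Spacer Accession=...|Spacer_Pos=a-b|CAS-X(a-b)+CAS-Y(a-b)'."""
    accession_field, pos_field, cas_field = text.split("|", 2)
    start, end = map(int, pos_field.split("=", 1)[1].split("-"))
    cas_loci = [
        (name, int(a), int(b))
        for name, a, b in re.findall(r"(CAS-[^()+]+)\((\d+)-(\d+)\)", cas_field)
    ]
    return {
        "accession": accession_field.split("=", 1)[1],
        "spacer": (start, end),
        "cas_loci": cas_loci,
    }


def nearest_cas_locus(parsed):
    """Return the CAS locus whose interval lies closest to the spacer."""
    s_start, s_end = parsed["spacer"]

    def distance(locus):
        _, c_start, c_end = locus
        if c_end < s_start:
            return s_start - c_end
        if c_start > s_end:
            return c_start - s_end
        return 0  # intervals overlap

    return min(parsed["cas_loci"], key=distance)


if __name__ == "__main__":
    example = (
        "Spacer Accession=NC_018591.1|Spacer_Pos=2660964-2660994|"
        "CAS-TypeIB(548849-555732)+CAS-TypeIIA(2663536-2669406)"
    )
    parsed = parse_spacer_part(example)
    print(nearest_cas_locus(parsed))
```

For the example string, the closest locus is CAS-TypeIIA, in line with the interpretation given in 3.16.1.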
3.17. The Self Target Outside 5000 BP column shows whether there is a CRISPR-Cas spacer target outside a 5000 bp region of the Acr-Aca locus but within the same genome (possibly on a different contig) (see Figure 2c of PMID:30208287 [Stanley SY and Maxwell KL, 2018]). If there is, the chance that the locus encodes an Acr is medium high. If not empty, the long string in this column has two parts separated by |, which are explained in 3.16.
Note that your query genome may well have an Acr-Aca locus in the output but no self-targeting CRISPR spacer, or even no level 3 & 4 CRISPR-Cas systems (discussed in our paper), i.e., both the 3.16 and 3.17 columns are empty. In these cases, loci in genomes that have level 3 & 4 CRISPR-Cas systems are labeled as "low confidence" loci. Loci in genomes without level 3 & 4 CRISPR-Cas systems will not show up in the Guilt by Association section, but may appear in the Homology Based section.
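The way columns 3.16 and 3.17 feed into the confidence of a locus can be summarized as a small decision function. This is a paraphrase of the text above, not the server's code: only "low confidence" is a label quoted on this page, while "high" and "medium" are shorthand assumed here for "higher" and "medium high"; the authoritative definitions are in the AcrFinder workflow.

```python
def classify_locus(self_target_within, self_target_outside, has_level_3_4_crispr_cas):
    """Map the evidence in columns 3.16 and 3.17 to a rough confidence label.

    The first two arguments may be the column strings (empty when there is
    no self-target) or plain booleans.
    """
    if self_target_within:
        return "high"      # self-target within 5000 bp of the locus
    if self_target_outside:
        return "medium"    # self-target elsewhere in the same genome
    if has_level_3_4_crispr_cas:
        return "low"       # CRISPR-Cas present, but no self-target found
    return None            # would not be listed under Guilt by Association


if __name__ == "__main__":
    print(classify_locus("Spacer Accession=...", "", True))
    print(classify_locus("", "", True))
    print(classify_locus("", "", False))
```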
4. The Genome Context of Acr-Aca Loci
The Genome Context of Acr-Aca Loci is displayed using JBrowse. Each gene in the loci is highlighted in yellow. The genome browser can zoom in and out and move left or right within the locus neighborhood. If loci are identified in more than one NC/contig ID and you want to see the neighborhood in another NC ID, select that NC ID yourself (click the drop-down menu right behind the magnifier) and zoom in to the neighborhood of the identified loci (the loci will also be highlighted in yellow).
5. Homology Based
The second section of the result page is Homology Based, which is presented in the same way as Guilt by Association. The table does not have the Classification, CDD MetaData, Self Target w/in 5000 BP, and Self Target Outside 5000 BP columns, but it has a Genome_Loci|start|end column.
6. When no Acr-Aca loci are identified
There are only a small number of experimentally characterized Acr-Aca loci and they are biased towards certain bacteria (see Yin et al.).
Therefore, it is very possible that no Acr-Aca loci are identified in your query genome, and you will see this page.
7. When user uploads invalid data
If you submit invalid input (empty input or input that is not in valid FASTA format), our website will reject your submission and ask you to try again with valid input.
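To catch such problems before submitting, a nucleotide FASTA file can be checked locally. The sketch below is a minimal, assumed set of checks (non-empty input, a leading `>` header, at least one residue per record, IUPAC nucleotide letters only); the checks the server actually performs are not described on this page and may differ.

```python
def validate_nucleotide_fasta(text):
    """Return (ok, message) for a pasted or uploaded nucleotide FASTA string."""
    # Blank lines are ignored, so line numbers below count non-blank lines.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False, "empty input"
    if not lines[0].startswith(">"):
        return False, "first line must be a '>' header"
    allowed = set("ACGTUNRYKMSWBDHV")  # IUPAC nucleotide codes
    have_sequence = False  # does the current record have residues yet?
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            if number > 1 and not have_sequence:
                return False, f"record before line {number} has no sequence"
            have_sequence = False
            continue
        bad = set(line.upper()) - allowed
        if bad:
            return False, f"unexpected characters {sorted(bad)} on line {number}"
        have_sequence = True
    if not have_sequence:
        return False, "last record has no sequence"
    return True, "ok"


if __name__ == "__main__":
    print(validate_nucleotide_fasta(">contig1\nACGTNNACGT\n"))
    print(validate_nucleotide_fasta(""))
    print(validate_nucleotide_fasta(">protein\nMKLE\n"))
```

Because several amino-acid letters are also IUPAC nucleotide codes, this check cannot reliably tell a protein file from a nucleotide file; it only flags letters that can never occur in a nucleotide sequence.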