Created by Yanbin Yin (yyin@unl.edu) 8/17/2020

This folder contains flat files with all the computationally predicted Acr-Aca loci/operons. The known/published Acr proteins and their homologs can be found on the AcrFinder website (http://bcb.unl.edu/AcrFinder/Download/database/).

Statistics: 15203 bacterial genomes (completely assembled), 961 archaeal genomes (completely or partially assembled), and 2659 viral genomes (completely assembled) from the NCBI RefSeq database were analyzed.

The pipeline is described at https://github.com/HaidYi/acrfinder#workflow. Briefly, each genome was analyzed as follows:
1. Homology-based approach: a DIAMOND search using 64 experimentally characterized Acr proteins as queries.
2. Non-homology-based methods:
   2.1. Guilt-by-association approach: finding Aca homologs first and then examining their genomic neighborhood.
   2.2. Self-targeting spacer approach: using CRISPRCas-Finder to find complete CRISPR-Cas loci first, and then searching the same genome for identical blastn matches to the CRISPR spacers (self-targeting spacers).
3. AcRanker was run on genomes containing predicted Acr-Aca operons.
4. PaCRISPR was run only on proteins in the Acr-Aca operons predicted by AcrFinder.

The http://bcb.unl.edu/AcrFinder/Download/database folder has two files: (i) http://bcb.unl.edu/AcrFinder/Download/database/Known_AcrDB.faa and (ii) http://bcb.unl.edu/AcrFinder/Download/database/AcrFinder_AcaDB.faa. The README.txt file in that folder explains how and where the data were collected. These data were used to identify new Acr-Aca loci/operons with the pipeline at https://github.com/HaidYi/acrfinder#workflow.

There are three folders here, archaea_result/, bacteria_result/, and virus_result/, which organize the predictions by kingdom.

- In each folder, you will find guilt_by_association/ and homologs/, which contain the Acr-Aca loci predicted by guilt by association (GBA) and the Acr homologs predicted by homology search, respectively.
  Inside these two folders, you will find two tarballs (tabular format and FASTA format) and two folders. The tarballs were created from the folders. Within the folders, each genome (NCBI GCF ID) has its own file.
  Note: although all 15203 bacterial genomes (completely assembled), 961 archaeal genomes (completely or partially assembled), and 2659 viral genomes (completely assembled) from the NCBI RefSeq database were analyzed, not all of these genomes have GBA or homolog results. See operon_statistics.csv or http://bcb.unl.edu/AcrDB/statistics.php for data statistics.

- operon_statistics.csv: this CSV file lists all the genomes that were searched for Acr-Aca operons, with the following columns (comma-separated):
  (1) Kingdom/Organism: "A" is Archaea, "B" is Bacteria, and "V" is Viruses
  (2) GCF ID: NCBI RefSeq Assembly ID
  (3) Whether the GCF genome contains Acr-Aca operons identified by AcrFinder: "Yes" or "No"
  (4) NCBI taxid
  (5) GCF genome assembly level: Complete Genome, Contig, Scaffold, or Chromosome
  (6-) Taxonomy lineage: kingdom, phyla, classes, orders, families, genera, species, subspecies

- In archaea_result/ and bacteria_result/, you will also find a CRISPRCas-Finder folder containing the CRISPR-Cas_summary.tar.gz file. This file was created by running CRISPRCas-Finder on all the genomes to identify putative CRISPR-Cas systems.
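The column layout of operon_statistics.csv can be loaded with a short Python script. The following is a minimal sketch, not part of AcrFinder, and rests on several assumptions: the columns appear in the order listed above; a header row, if present, can be skipped because its first field is not one of the kingdom codes A/B/V; and lineage fields contain no embedded commas. The names GenomeRecord, parse_operon_statistics, and count_with_operons are hypothetical helpers introduced only for illustration.

```python
import csv
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List

# Kingdom codes used in column (1) of operon_statistics.csv.
KINGDOMS = {"A": "Archaea", "B": "Bacteria", "V": "Viruses"}


@dataclass
class GenomeRecord:
    """One row of operon_statistics.csv (hypothetical container)."""
    kingdom: str
    gcf_id: str
    has_acr_aca: bool
    taxid: str
    assembly_level: str
    lineage: List[str] = field(default_factory=list)


def parse_operon_statistics(lines: Iterable[str]) -> List[GenomeRecord]:
    """Parse CSV lines (e.g. an open file handle) into GenomeRecord objects.

    Assumption: rows whose first field is not A/B/V (such as a header)
    and rows with fewer than five columns are skipped.
    """
    records = []
    for row in csv.reader(lines):
        if len(row) < 5 or row[0].strip() not in KINGDOMS:
            continue
        code, gcf, flag, taxid, level, *lineage = (c.strip() for c in row)
        records.append(
            GenomeRecord(
                kingdom=KINGDOMS[code],
                gcf_id=gcf,
                # Column (3) is "Yes" or "No"; compare case-insensitively.
                has_acr_aca=(flag.lower() == "yes"),
                taxid=taxid,
                assembly_level=level,
                # Columns (6-) hold the taxonomy lineage, variable length.
                lineage=[name for name in lineage if name],
            )
        )
    return records


def count_with_operons(records: Iterable[GenomeRecord]) -> Counter:
    """Count, per kingdom, the genomes marked as containing Acr-Aca operons."""
    return Counter(r.kingdom for r in records if r.has_acr_aca)
```

For example, passing `open("operon_statistics.csv", newline="")` to parse_operon_statistics and then calling count_with_operons on the result tallies the "Yes" genomes per kingdom, which can be cross-checked against the statistics page.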