For most non-model organisms, biological interpretation of study outcomes is limited to protein-coding genes with functional
annotations such as KEGG pathways, Gene Ontology (GO) terms, or the PANTHER classification system. Seq2Fun databases therefore
focus on functionally annotated genes, such as protein-coding genes annotated with GO terms and KEGG Orthology (KO) identifiers, which largely meets the needs of most scientists studying non-model organisms.
We provide about 30 pre-built databases that can be downloaded here (then click the "Without a Reference Transcriptome" tab).
Note: the Ortholog column counts all genes from that group of organisms, including genes that are not orthologous to any other gene.
The groups of organisms can be downloaded from here.
A core ortholog is defined as an ortholog whose frequency in the group is >= 0.90. This is consistent with BUSCO.
Note: * frequency >= 0.85; ** frequency >= 0.80;
Group | Species | Proteins | Ortholog | Core ortholog | Filename | Date |
---|---|---|---|---|---|---|
algae | 14 | 155495 | 38334 | 1521 | algae.tar.gz | 07-11-2022 |
alveolates | 21 | 207674 | 51205 | 1132 | alveolates.tar.gz | 07-11-2022 |
amoebozoa | 7 | 81844 | 22114 | 1165 | amoebozoa.tar.gz | 07-11-2022 |
amphibians | 3 | 75261 | 17186 | 13925 | amphibians.tar.gz | 07-11-2022 |
animals | 370 | 7150735 | 270089 | 1512 | animals.tar.gz | 07-11-2022 |
apicomplexans | 18 | 93576 | 14632 | 1091 | apicomplexans.tar.gz | 07-11-2022 |
arthropods | 119 | 1727651 | 113673 | 5106 | arthropods.tar.gz | 07-11-2022 |
ascomycetes | 100 | 904642 | 98151 | 2799 | ascomycetes.tar.gz | 07-11-2022 |
basidiomycetes | 33 | 363997 | 56935 | 2453 | basidiomycetes.tar.gz | 07-11-2022 |
birds | 31 | 482205 | 22397 | 11761 | birds.tar.gz | 07-11-2022 |
cnidarians | 9 | 203000 | 24003 | 5547 | cnidarians.tar.gz | 07-11-2022 |
crustaceans | 7 | 154960 | 37407 | 5216 | crustaceans.tar.gz | 07-11-2022 |
dothideomycetes | 10 | 123200 | 28898 | 5567 | dothideomycetes.tar.gz | 07-11-2022 |
eudicots | 93 | 3180221 | 102679 | 8230 | eudicots.tar.gz | 07-11-2022 |
euglenozoa | 9 | 86483 | 12363 | 4638 | euglenozoa.tar.gz | 07-11-2022 |
eurotiomycetes | 20 | 196228 | 25710 | 4723 | eurotiomycetes.tar.gz | 07-11-2022 |
fishes | 64 | 1736572 | 43690 | 13248 | fishes.tar.gz | 07-11-2022 |
flatworms | 4 | 58181 | 17784 | 4237 | flatworms.tar.gz | 07-11-2022 |
fungi | 138 | 1278312 | 148080 | 2138 | fungi.tar.gz | 07-11-2022 |
insects | 101 | 1376824 | 70170 | 5971 | insects.tar.gz | 07-11-2022 |
leotiomycetes | 5 | 67865 | 21669 | 5707 | leotiomycetes.tar.gz | 07-11-2022 |
mammals | 94 | 1910363 | 47144 | 14776 | mammals.tar.gz | 07-11-2022 |
mollusks | 9 | 206905 | 35775 | 6726 | mollusks.tar.gz | 07-11-2022 |
monocots | 17 | 560027 | 43452 | 9611 | monocots.tar.gz | 07-11-2022 |
nematodes | 6 | 134093 | 35865 | 3280 | nematodes.tar.gz | 07-11-2022 |
plants | 127 | 3968027 | 162990 | 3485 | plants.tar.gz | 07-11-2022 |
protists | 52 | 660237 | 134452 | 602 | protists.tar.gz | 07-11-2022 |
reptiles | 20 | 384584 | 21725 | 12715 | reptiles.tar.gz | 07-11-2022 |
saccharomycetes | 36 | 195913 | 14873 | 3079 | saccharomycetes.tar.gz | 07-11-2022 |
stramenopiles | 8 | 119746 | 31582 | 567 | stramenopiles.tar.gz | 07-11-2022 |
vertebrates | 212 | 4588985 | 83704 | 8222 | vertebrates.tar.gz | 07-11-2022 |
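The core-ortholog threshold described above can be illustrated with a minimal sketch. It assumes that "frequency" means the fraction of species in the group that carry the ortholog, and it uses a hypothetical in-memory mapping from ortholog ID to species set; the function name and data layout are illustrative and are not part of Seq2Fun.

```python
from typing import Dict, Set


def core_orthologs(
    presence: Dict[str, Set[str]],
    n_species: int,
    threshold: float = 0.90,
) -> Set[str]:
    """Return the IDs of orthologs whose frequency in the group is >= threshold.

    presence maps an ortholog ID to the set of species carrying it
    (hypothetical layout). Frequency is ASSUMED to be the fraction of
    species in the group that carry the ortholog.
    """
    if n_species <= 0:
        raise ValueError("n_species must be positive")
    return {
        ortholog_id
        for ortholog_id, species in presence.items()
        if len(species) / n_species >= threshold
    }


if __name__ == "__main__":
    # Toy group of 10 species (made-up data, for illustration only).
    all_species = [f"sp{i}" for i in range(10)]
    toy_presence = {
        "orthoA": set(all_species),      # present in 10/10 species
        "orthoB": set(all_species[:9]),  # present in 9/10 species
        "orthoC": set(all_species[:8]),  # present in 8/10 species
    }
    print(sorted(core_orthologs(toy_presence, n_species=10)))
    print(sorted(core_orthologs(toy_presence, n_species=10, threshold=0.80)))
```

Lowering `threshold` to 0.85 or 0.80 corresponds to the relaxed cutoffs marked with `*` and `**` in the note above the table.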
We fully support custom-built databases. See MANUAL 13. Custom built database.
The following 8 databases were used to assess Seq2Fun version 1 with mouse, chicken, zebrafish and roundworm datasets.
The RNA-seq data can be downloaded from here.
Group | Proteins | KOs | Species | Filename |
---|---|---|---|---|
Mammals_no_mouse | 356,672 | 5,622 | 64 | mammals_no_mouse.tar.gz |
Mouse | 8,438 | 5,437 | 1 | mouse.tar.gz |
Birds_no_chicken | 81,576 | 4,176 | 23 | birds_no_chicken.tar.gz |
Chicken | 4,930 | 3,921 | 1 | chicken.tar.gz |
Fishes_no_zebrafish | 267,954 | 4,235 | 38 | fishes_no_zebrafish.tar.gz |
Zebrafish | 6,047 | 3,963 | 1 | zebrafish.tar.gz |
Nematodes_no_roundworm | 13,939 | 2,950 | 5 | nematodes_no_worm.tar.gz |
Roundworm | 3,081 | 2,391 | 1 | worm.tar.gz |
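
Once one of the archives listed above has been downloaded, it can be unpacked with any tar tool. The sketch below uses Python's standard `tarfile` module; the function name, the local paths, and the destination directory are assumptions for illustration, and the internal layout of the archives is not described here, so the function simply returns whatever member names it finds.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_database(archive: Path, dest: Path) -> List[str]:
    """Extract a downloaded .tar.gz database into dest and return member names.

    Both paths are supplied by the caller; nothing here is specific to Seq2Fun.
    """
    dest.mkdir(parents=True, exist_ok=True)
    dest_root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would be written outside the destination directory.
        for member in members:
            target = (dest_root / member.name).resolve()
            if target != dest_root and dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_root)
    return [member.name for member in members]
```

For example, `unpack_database(Path("birds.tar.gz"), Path("database"))` would unpack a previously downloaded `birds.tar.gz` into a local `database` directory (both names are placeholders).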