AnimeTosho Database Exports

Exports of AnimeTosho's database are dumped here daily, and can be downloaded and used for whatever purpose without restriction.

Notes:

these data sets are provided without any support beyond the brief explanation offered below
data structures may change without notice
this document may not get updated to reflect possible changes
the AT script has evolved over time, so older entries may lack data, or hold different data than newer entries
due to bugs etc, some errors are to be expected
if it looks like dumps are broken or aren't being updated, please report the issue here

Download

Files updated: 2024-07-26

Torrents table (65.64 MB)
Files table (325.61 MB)
Attachments table (51.51 MB)
Attachment Files table (57.37 MB)

Available File Links snapshots:

Format

Data is in the form of text files, with fields separated by tabs and each line representing a record/row. Data is encoded in UTF-8, with Unix line endings, and tab, newline, null and backslash characters escaped using C-style escapes (\t, \n, \0 and \\ respectively).
The first row always contains the column headers. You can use these to look up fields by name rather than by position, so that structural changes (such as added or reordered columns) are easier to deal with.
Data is actually exported from a MySQL database, and can be imported into MySQL using a LOAD DATA query. You can also load it into Microsoft Excel, and although I'd like to joke about trying to load sizable data sets into Excel, it actually works...
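
For those not using MySQL, the format described above can be read with a few lines of Python. This is only a sketch: the function names are made up for illustration, and any escape sequence other than the four listed above is left untouched, since nothing here says how other sequences should be treated.

```python
# Escapes named in the format description: \t, \n, \0 and \\
_ESCAPES = {"t": "\t", "n": "\n", "0": "\0", "\\": "\\"}


def unescape_field(field):
    """Undo the C-style escapes in a single field."""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in _ESCAPES:
            out.append(_ESCAPES[field[i + 1]])
            i += 2
        else:
            # Ordinary character, or an escape not covered by the description
            out.append(ch)
            i += 1
    return "".join(out)


def read_rows(lines):
    """Yield one dict per record, keyed by the names in the header row."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    for line in it:
        # Real tabs only occur as separators, so split first, unescape after
        values = [unescape_field(v) for v in line.rstrip("\n").split("\t")]
        yield dict(zip(header, values))
```

When reading from disk, opening the file with open(path, encoding="utf-8", newline="\n") keeps Python from translating line endings, and iterating over the file object feeds read_rows one record at a time, which matters for the larger tables.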

The following describes the tables and fields.

Torrents Table

This table contains all torrent entries. For example, the display on the homepage simply pulls latest items from this table.

Fields:

id: unique identifier
tosho_id: ID from TokyoTosho; 0 if none available
nyaa_id: ID from Nyaa; 0 if none available
anidex_id: ID from AniDex; 0 if none available
name: name/title
link: original HTTP torrent download link
magnet: magnet link of torrent, either obtained from source or generated from torrent file
cat: TokyoTosho category
website: URL for website
totalsize: total size of all files in torrent, in bytes
date_posted: Unix timestamp of when torrent was uploaded
comment:
date_added: Unix timestamp of when torrent was grabbed by AT scripts
date_completed: Unix timestamp of when torrent download was completed
torrentname: Name extracted from torrent file
torrentfiles: Number of files found in the torrent
stored_nzb: Whether AT has an NZB stored with this entry. If available, the NZB can be downloaded from https://storage.animetosho.org/nzbs/xxxxxxxx/file.nzb replacing xxxxxxxx with the 8 character hex encoded ID (see id column, convert the number to hexadecimal representation and left pad with zeroes)
stored_torrent: Whether AT has a .torrent stored with this entry. If available, the torrent can be downloaded from https://storage.animetosho.org/torrent/hex-btih/torrent.torrent replacing hex-btih with the 40 character lower-case hex encoded info hash (see btih column)
tosho_uhash: hex encoded string of the Submitter Hash from TT
tosho_uauth: TT's "Authorized" data: 0=unknown, 1=No, 2=Anonymous, 3=Yes, -1=other
tosho_uname: TT's submitter username
nyaa_info: additional Nyaa info, JSON encoded
nyaa_class: Nyaa's classification: 0=unknown, 1=remake, 2=none, 3=trusted, 4=a+, -1=hidden
nyaa_cat: Nyaa's category
anidex_info: additional AniDex info, JSON encoded
anidex_cat: AniDex's category
anidex_lang: AniDex's language ID
anidex_labels: labels from AniDex as bit flags: 1=batch, 2=raw, 4=hentai, 8=reencode
anidex_uid: AniDex's submitter user ID; username available in anidex_info
btih: hex encoded torrent info hash
btih_sha256: hex encoded BitTorrent v2 info hash, if file is a BTv2 or hybrid torrent file
isdupe: whether this entry is considered a duplicate, based on BTIH
deleted: whether this entry is marked as deleted
date_updated: Unix timestamp of when this row was last updated
aid: related AniDB anime ID
eid: related AniDB episode ID
fid: related AniDB file ID
gids: related AniDB group IDs, comma separated list
resolveapproved: whether exclamation mark shows up next to the anime title in the view page
main_fileid: if the torrent contains one file of significance, will be the AT file ID, otherwise 0. If this is set, links from this file are displayed on the home page
srcurl: source article URL
srcurltype: source article type
srctitle: source article title
status: torrent status: 0=downloading, 1=downloaded, -1=skipped, -2=broken, -3=other error
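
As an illustration of the stored_nzb, stored_torrent and anidex_labels fields above, the following Python sketch builds the download URLs and decodes the label bit flags. The function names are hypothetical, and lower-case hex digits are assumed for the NZB URL (the description only states lower-case for the torrent URL).

```python
STORAGE = "https://storage.animetosho.org"

# Bit values as listed for the anidex_labels field
ANIDEX_LABELS = {1: "batch", 2: "raw", 4: "hentai", 8: "reencode"}


def nzb_url(torrent_id):
    """NZB URL for a torrent row; only meaningful if stored_nzb is set."""
    # Assumption: lower-case hex; the id is left padded with zeroes to 8 digits
    return "%s/nzbs/%08x/file.nzb" % (STORAGE, int(torrent_id))


def torrent_url(btih):
    """.torrent URL for a torrent row; only meaningful if stored_torrent is set."""
    return "%s/torrent/%s/torrent.torrent" % (STORAGE, btih.lower())


def anidex_labels(flags):
    """Turn the anidex_labels integer into a list of label names."""
    flags = int(flags)
    return [name for bit, name in ANIDEX_LABELS.items() if flags & bit]
```

Values read from the dump are strings, so the helpers convert to int themselves; nzb_url(row["id"]) and anidex_labels(row["anidex_labels"]) can be called directly on a parsed row.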

Files Table

This table contains all file entries. Torrents contain one or more files.

Fields:

id: unique identifier
torrent_id: ID of associated torrent entry
is_archive: 1 if this file is a 7z archive created by the Anime Tosho script, 0 otherwise
filename: file's name; includes path if supplied
filesize: file's size in bytes
vidframes: non-empty if video I-frames are being stored for screenshot purposes. It is a comma separated list of integers, each being the timestamp at which a frame occurs in the video (milliseconds elapsed since start of video). Stored I-frames can be downloaded from https://storage.animetosho.org/sframes/xxxxxxxx_time.mkv where xxxxxxxx is the file's ID, hex encoded and left padded with zeroes to 8 characters, and time is the timestamp of the frame. If soft subtitles have been rendered for the frame, they can be downloaded from https://storage.animetosho.org/sframes/xxxxxxxx_track_time.webp where track is the track number of the subtitle (from the original video, usually track 3)
crc32: CRC32 hash of file, hex encoded, if available
md5: MD5 hash of file, hex encoded, if available
sha1: SHA1 hash of file, hex encoded, if available
sha256: SHA256 hash of file, hex encoded, if available
tth: TTH hash of file, hex encoded, if available
ed2k: ED2K hash of file, hex encoded, if available
bt2: BitTorrent v2 (2017-08-31) root hash of file, hex encoded, if available
crc32k: CRC32 hash of first 1KB of file, hex encoded, if available
torpc_sha1_*: hex encoded SHA1 hash of concatenated SHA1 hashes (binary encoded) of the respective block size. For example, the torpc_sha1_16k hash is obtained by breaking the file into 16KB blocks (if the last block is less than 16KB, it is discarded), calculating a 20 byte SHA1 hash for each block, concatenating these hashes, and feeding the result through SHA1 to obtain the final hash. The selected block sizes correspond with the most common piece sizes used for torrents, and hence this hash can be useful in trying to detect duplicate torrents which have different info hash values.
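
The vidframes URLs and the torpc_sha1_* calculation described above can be sketched in Python as follows. The function names are hypothetical, and a few things are assumed rather than stated: that timestamps go into the URL exactly as they appear in the vidframes column, that hex digits are lower-case, and that 16KB means 16384 bytes. How AT handles a file smaller than one block is not described; this sketch simply hashes an empty concatenation in that case.

```python
import hashlib


def frame_urls(file_id, vidframes, sub_track=None):
    """URLs of stored I-frames, or of rendered subtitles if sub_track is given."""
    prefix = "https://storage.animetosho.org/sframes/%08x" % int(file_id)
    urls = []
    for ts in vidframes.split(","):
        if not ts:
            continue  # empty column means no frames are stored
        if sub_track is None:
            urls.append("%s_%s.mkv" % (prefix, ts))
        else:
            urls.append("%s_%d_%s.webp" % (prefix, sub_track, ts))
    return urls


def torpc_sha1(stream, block_size):
    """SHA1 over the concatenated binary SHA1 digests of each full block."""
    outer = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if len(block) < block_size:
            break  # end of file; a short trailing block is discarded
        outer.update(hashlib.sha1(block).digest())
    return outer.hexdigest()
```

For example, torpc_sha1(open(path, "rb"), 16 * 1024) would give a value to compare against torpc_sha1_16k. The stream is expected to be a regular file opened in binary mode, where read() only returns fewer bytes than requested at end of file.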

Mediainfo and related data are currently not included mainly due to size and time it takes to dump the data. The data is also compressed using a custom LZMA2 based scheme, which users would need to implement a decompressor for. I may consider including this data if many are interested in such.

Attachments Table

This table contains all attachment (subtitles, fonts etc) entries. File entries contain 0 or more attachment entries. Note that attachments are de-duplicated, and hence, there's a separate Attachment Files table (below) which describes unique attachment files.

Fields:

id: unique identifier
file_id: ID of associated file entry
filename: attachment file's name
attachfile_id: ID of associated attachment file entry
type: 1=subtitle, 2=xmlmeta, 0=other
info: JSON document containing additional info

Attachment Files Table

This table contains information about attachment files stored on disk. A file on disk can be linked to multiple attachments (due to de-duplication).

Fields:

id: unique identifier; files can be downloaded from https://storage.animetosho.org/attach/xxxxxxxx/file.xz, replacing xxxxxxxx with the ID converted to hexadecimal and left padded with zeroes to 8 characters
sha1: hex encoded SHA1 hash of the file
filesize: size of file
packedsize: size of file, after XZ compression
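
A rough Python sketch of fetching and checking an attachment file based on the fields above. The function names are hypothetical, and it assumes that filesize is in bytes, that sha1 is taken over the unpacked file rather than the .xz, and that hex digits in the URL are lower-case; none of this is spelled out above.

```python
import hashlib
import lzma


def attachment_url(attachfile_id):
    """Download URL for a row of the Attachment Files table."""
    return "https://storage.animetosho.org/attach/%08x/file.xz" % int(attachfile_id)


def unpack_attachment(xz_bytes, row):
    """Decompress a downloaded file.xz and check it against its table row."""
    data = lzma.decompress(xz_bytes)
    # Assumption: filesize and sha1 both describe the unpacked file
    if len(data) != int(row["filesize"]):
        raise ValueError("size mismatch")
    if hashlib.sha1(data).hexdigest() != row["sha1"].lower():
        raise ValueError("SHA1 mismatch")
    return data
```

To get from an Attachments row to the file, look up its attachfile_id in this table and pass that ID to attachment_url.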

File Links Exports

These contain the download links generated for files.
The data is an export of links generated/updated since the last export, performed daily. Only a few days' worth of snapshots are retained, and the full table is not available due to size.

Fields:

id: unique identifier
file_id: ID of associated file entry
site: displayed site name for the link. Sub-links are indicated as Parent|Child
part: 1-based part number; if file wasn't split, will be 1
url: the link URL
date_added: Unix timestamp of when this entry was added
date_updated: Unix timestamp of when this entry was last updated
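
Since each snapshot only holds links generated or updated since the previous export, anyone keeping a local copy has to merge snapshots themselves. The following Python sketch shows one way to do so, keeping the row with the newest date_updated for each id. The function names are hypothetical, and it assumes rows have already been parsed into dicts keyed by column name.

```python
def merge_link_snapshots(snapshots):
    """Merge snapshots (oldest first) into a dict of id -> latest row."""
    latest = {}
    for rows in snapshots:
        for row in rows:
            current = latest.get(row["id"])
            # Keep whichever copy of the row was updated most recently
            if current is None or int(row["date_updated"]) >= int(current["date_updated"]):
                latest[row["id"]] = row
    return latest


def split_site(site):
    """Split a site value into (parent, child); child is None if no sub-link."""
    parent, sep, child = site.partition("|")
    return parent, (child if sep else None)
```

Whether deleted links ever show up in a later snapshot is not covered above, so a merged copy may keep links that no longer exist.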

Other Tables

The other tables used by Anime Tosho probably won't be supplied for the following reasons:

Tracker Scrape Tables: contain seeder/leecher stats scraped from torrent trackers; this information can usually be scraped easily
AniDB Info Tables: contain all data retrieved from AniDB, such as series information; please see AniDB API for obtaining data
AniDB - TVDB Mapping Table: used to map AniDB references to TVDB/IMDB. Original mapping data 'anime-lists' can be found here
Source Scrape Tables: contain all data scraped from upstream sources (TokyoTosho, Nyaa, AniDex). Please see sources for data dumps if desired

Code

All open sourced code can be found on this GitHub page.