Exports of AnimeTosho's database are dumped here daily, and can be downloaded and used for whatever purpose without restriction.
Notes:
- these data sets are provided without any support beyond the brief explanation offered below
- data structures may change without notice
- this document may not get updated to reflect possible changes
- the AT script has evolved over time, so older entries may lack data, or have different data to newer entries
- due to bugs etc, some errors are to be expected
- if it looks like dumps are broken or aren't being updated, please report the issue here
Download
Files updated: 2024-09-07
Available File Links snapshots:
Format
Data is in the form of text files, with fields separated by tabs and each line representing a record/row. Data is encoded in UTF-8, with Unix line endings, and tab, newline, null and backslash characters escaped using C-style escapes (\t, \n, \0 and \\ respectively).
The first row always contains the column headers - you can use these to look up fields by name rather than by position, which helps when dealing with structural changes.
Data is actually exported from a MySQL database, and can be imported into MySQL using a LOAD DATA query. You can also load it into Microsoft Excel, and although I'd like to joke about trying to load sizable data sets into Excel, it actually works...
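For those not using MySQL or Excel, the format described above can be read with a few lines of Python. The following is only a minimal sketch: the names unescape_field and read_dump are made up for illustration, and leaving unrecognised escape sequences untouched is an assumption rather than documented behaviour.

```python
import re

# The four escapes named in the format description above.
_ESCAPES = {"t": "\t", "n": "\n", "0": "\0", "\\": "\\"}
_ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def unescape_field(value):
    """Decode the C-style escapes for tab, newline, NUL and backslash."""
    # A single left-to-right pass is used so that an escaped backslash
    # followed by "n" is not misread as a newline escape.
    # Assumption: any other escape sequence is left as-is.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES.get(m.group(1), m.group(0)), value)


def read_dump(lines):
    """Yield one dict per record, keyed by the column headers in the first row."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    header = next(rows)
    for row in rows:
        yield dict(zip(header, (unescape_field(field) for field in row)))
```

When reading from disk, something like open("torrents.txt", encoding="utf-8", newline="\n") (the file name here is hypothetical) keeps Python from treating stray carriage returns inside a field as line breaks. Because records are keyed by header name, added or reordered columns should not break the reader.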
The following describes the tables and fields.
Torrents Table
This table contains all torrent entries. For example, the display on the homepage simply pulls latest items from this table.
Fields:
- id: unique identifier
- tosho_id: ID from TokyoTosho; 0 if none available
- nyaa_id: ID from Nyaa; 0 if none available
- anidex_id: ID from AniDex; 0 if none available
- name: name/title
- link: original HTTP torrent download link
- magnet: magnet link of torrent, either obtained from source or generated from torrent file
- cat: TokyoTosho category
- website: URL for website
- totalsize: total size of all files in torrent, in bytes
- date_posted: Unix timestamp of when torrent was uploaded
- comment:
- date_added: Unix timestamp of when torrent was grabbed by AT scripts
- date_completed: Unix timestamp of when torrent download was completed
- torrentname: Name extracted from torrent file
- torrentfiles: Number of files found in the torrent
- stored_nzb: Whether AT has an NZB stored with this entry. If available, the NZB can be downloaded from https://storage.animetosho.org/nzbs/xxxxxxxx/file.nzb replacing xxxxxxxx with the 8 character hex encoded ID (see id column, convert the number to hexadecimal representation and left pad with zeroes)
- stored_torrent: Whether AT has a .torrent stored with this entry. If available, the torrent can be downloaded from https://storage.animetosho.org/torrent/hex-btih/torrent.torrent replacing hex-btih with the 40 character lower-case hex encoded info hash (see btih column)
- tosho_uhash: hex encoded string of the Submitter Hash from TT
- tosho_uauth: TT's "Authorized" data: 0=unknown, 1=No, 2=Anonymous, 3=Yes, -1=other
- tosho_uname: TT's submitter username
- nyaa_info: additional Nyaa info, JSON encoded
- nyaa_class: Nyaa's classification: 0=unknown, 1=remake, 2=none, 3=trusted, 4=a+, -1=hidden
- nyaa_cat: Nyaa's category
- anidex_info: additional AniDex info, JSON encoded
- anidex_cat: AniDex's category
- anidex_lang: AniDex's language ID
- anidex_labels: labels from AniDex as bit flags: 1=batch, 2=raw, 4=hentai, 8=reencode
- anidex_uid: AniDex's submitter user ID; username available in anidex_info
- btih: hex encoded torrent info hash
- btih_sha256: hex encoded BitTorrent v2 info hash, if file is a BTv2 or hybrid torrent file
- isdupe: whether this entry is considered a duplicate, based on BTIH
- deleted: whether this entry is marked as deleted
- date_updated: Unix timestamp of when this row was last updated
- aid: related AniDB anime ID
- eid: related AniDB episode ID
- fid: related AniDB file ID
- gids: related AniDB group IDs, comma separated list
- resolveapproved: whether exclamation mark shows up next to the anime title in the view page
- main_fileid: if the torrent contains one file of significance, will be the AT file ID, otherwise 0. If this is set, links from this file are displayed on the home page
- srcurl: source article URL
- srcurltype: source article type
- srctitle: source article title
- status: torrent status: 0=downloading, 1=downloaded, -1=skipped, -2=broken, -3=other error
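As a rough illustration of the stored_nzb, stored_torrent and anidex_labels fields above, the sketch below builds the storage URLs and decodes the label bit flags. The function names are invented for this example, and using lower-case hex digits for the NZB ID is an assumption (only the torrent URL is stated to be lower-case).

```python
# Bit values as listed for the anidex_labels field.
ANIDEX_LABELS = {1: "batch", 2: "raw", 4: "hentai", 8: "reencode"}


def nzb_url(torrent_id):
    """URL of the stored NZB for a torrent entry (only valid if stored_nzb is set)."""
    # ID converted to hex and left padded with zeroes to 8 characters.
    # Assumption: lower-case hex digits.
    return "https://storage.animetosho.org/nzbs/%08x/file.nzb" % int(torrent_id)


def torrent_url(btih):
    """URL of the stored .torrent (only valid if stored_torrent is set)."""
    # btih is the 40 character hex encoded info hash from the btih column.
    return "https://storage.animetosho.org/torrent/%s/torrent.torrent" % btih.lower()


def anidex_label_names(flags):
    """Turn the anidex_labels bit flags into a list of label names."""
    flags = int(flags)
    return [name for bit, name in ANIDEX_LABELS.items() if flags & bit]
```

Values read from the dump are strings, which is why the helpers convert with int() before formatting or testing bits.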
Files Table
This table contains all file entries. Torrents contain one or more files.
Fields:
- id: unique identifier
- torrent_id: ID of associated torrent entry
- is_archive: 1 if this file is a 7z archive created by the Anime Tosho script, 0 otherwise
- filename: file's name; includes path if supplied
- filesize: file's size in bytes
- vidframes: non-empty if video I-frames are being stored for screenshot purposes. Is a comma separated list of integers, which are the timestamps at which each frame occurs in the video (milliseconds elapsed since start of video). Stored I-frames can be downloaded from https://storage.animetosho.org/sframes/xxxxxxxx_time.mkv where xxxxxxxx is the file's ID, hex encoded and left padded with zeroes to 8 characters, and time is the timestamp of the frame. If soft subtitles have been rendered for the frame, they can be downloaded from https://storage.animetosho.org/sframes/xxxxxxxx_track_time.webp where track is the track number of the subtitle (from the original video, usually track 3)
- crc32: CRC32 hash of file, hex encoded, if available
- md5: MD5 hash of file, hex encoded, if available
- sha1: SHA1 hash of file, hex encoded, if available
- sha256: SHA256 hash of file, hex encoded, if available
- tth: TTH hash of file, hex encoded, if available
- ed2k: ED2K hash of file, hex encoded, if available
- bt2: BitTorrent v2 (2017-08-31) root hash of file, hex encoded, if available
- crc32k: CRC32 hash of first 1KB of file, hex encoded, if available
- torpc_sha1_*: hex encoded SHA1 hash of concatenated SHA1 hashes (binary encoded) of the respective block size. For example, the torpc_sha1_16k hash is obtained by breaking the file into 16KB blocks (if the last block is less than 16KB, it is discarded), calculating a 20 byte SHA1 hash for each block, concatenating these hashes, and feeding the result through SHA1 to obtain the final hash. The selected block sizes correspond with the most common piece sizes used for torrents, and hence this hash can be useful in trying to detect duplicate torrents which have different info hash values.
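The torpc_sha1_* description above could be sketched in Python roughly as follows. This is not the AT script's own code: the function name is hypothetical, 16KB is assumed to mean 16384 bytes, and the per-block digests are fed into the outer SHA1 incrementally, which gives the same result as hashing their concatenation.

```python
import hashlib


def torpc_sha1(stream, block_size):
    """SHA1 over the concatenated binary SHA1 digests of each full block."""
    outer = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if len(block) < block_size:
            # End of data, or a trailing partial block, which is discarded.
            break
        # Append this block's 20 byte digest to the outer hash input.
        outer.update(hashlib.sha1(block).digest())
    return outer.hexdigest()
```

For example, torpc_sha1(open(path, "rb"), 16 * 1024) would correspond to torpc_sha1_16k under the assumptions above. The sketch relies on read() returning a full block until end of file, which holds for regular files opened in binary mode and for io.BytesIO, but not necessarily for sockets or pipes.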
Mediainfo and related data are currently not included mainly due to size and time it takes to dump the data. The data is also compressed using a custom LZMA2 based scheme, which users would need to implement a decompressor for. I may consider including this data if many are interested in such.
Attachments Table
This table contains all attachment (subtitles, fonts etc) entries. File entries contain 0 or more attachment entries. Note that attachments are de-duplicated, and hence, there's a separate Attachment Files table (below) which describes unique attachment files.
Fields:
- id: unique identifier
- file_id: ID of associated file entry
- filename: attachment file's name
- attachfile_id: ID of associated attachment file entry
- type: 1=subtitle, 2=xmlmeta, 0=other
- info: JSON document containing additional info
Attachment Files Table
This table contains information about attachment files stored on disk. A file on disk can be linked to multiple attachments (due to de-duplication).
Fields:
- id: unique identifier; files can be downloaded from https://storage.animetosho.org/attach/xxxxxxxx/file.xz, replacing xxxxxxxx with the 8 character hex representation of this ID, left padded with zeroes
- sha1: hex encoded SHA1 hash of the file
- filesize: size of file
- packedsize: size of file, after XZ compression
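A small sketch of how the fields above might be used together: building the download URL from the ID, then unpacking the XZ data with Python's standard lzma module and checking it against the sha1 column. The function names are illustrative only, and it is assumed here that sha1 refers to the uncompressed file and that lower-case hex digits are accepted in the URL.

```python
import hashlib
import lzma


def attachment_url(attachfile_id):
    """Download URL for an attachment file, from its numeric ID."""
    # Assumption: lower-case hex digits, left padded with zeroes to 8 characters.
    return "https://storage.animetosho.org/attach/%08x/file.xz" % int(attachfile_id)


def unpack_attachment(packed, expected_sha1=None):
    """Decompress downloaded XZ bytes, optionally verifying the SHA1 hash."""
    data = lzma.decompress(packed, format=lzma.FORMAT_XZ)
    # Assumption: the sha1 column describes the file before XZ compression.
    if expected_sha1 and hashlib.sha1(data).hexdigest() != expected_sha1.lower():
        raise ValueError("SHA1 mismatch for attachment")
    return data
```

Fetching the URL itself is left out, so that any HTTP client can be used.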
File Links Exports
These contain the download links generated for files.
The data is an export of links generated/updated since the last export, performed daily. Only a few days' worth of snapshots are retained, and the full table is not available due to size.
Fields:
- id: unique identifier
- file_id: ID of associated file entry
- site: displayed site name for the link. Sub-links are indicated as Parent|Child
- part: 1-based part number; if file wasn't split, will be 1
- url: the link URL
- date_added: Unix timestamp of when this entry was added
- date_updated: Unix timestamp of when this entry was last updated
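To illustrate the site and part fields, the sketch below splits a Parent|Child site name and collects the URLs of a split file in part order. It assumes rows are dicts keyed by the column names above (for instance, as produced by a header-aware reader); the function names are hypothetical.

```python
from collections import defaultdict


def split_site(site):
    """Split a site name into (parent, child); child is None for plain names."""
    parent, _, child = site.partition("|")
    return parent, child or None


def group_links(rows):
    """Map (file_id, site) to that file's link URLs, ordered by part number."""
    grouped = defaultdict(list)
    for row in rows:
        # part is 1-based and read as a string, so convert before sorting.
        grouped[(row["file_id"], row["site"])].append((int(row["part"]), row["url"]))
    return {key: [url for _, url in sorted(parts)] for key, parts in grouped.items()}
```

Since each snapshot only covers links generated or updated since the previous export, a grouping built from a single snapshot may be missing parts.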
Other Tables
The other tables used by Anime Tosho probably won't be supplied for the following reasons:
- Tracker Scrape Tables: contains seeder/leecher stats scraped from torrent trackers; this information can usually be scraped easily
- AniDB Info Tables: contains all data retrieved from AniDB, such as series information; please see AniDB API for obtaining data
- AniDB - TVDB Mapping Table: used to map AniDB references to TVDB/IMDB. Original mapping data 'anime-lists' can be found here
- Source Scrape Tables: contains all data scraped from upstream sources (TokyoTosho, Nyaa, AniDex). Please see sources for data dumps if desired
Code
All open sourced code can be found on this GitHub page.