CollecTor fetches data from various nodes and services in the public Tor network and makes it available to the world. If you're doing research on the Tor network, or if you're developing an application that uses Tor network data, this is your place to start.
Descriptors are available in two different file formats: recent descriptors that were published in the last 72 hours are available as plain text, and archived descriptors covering many years of Tor network history are available as compressed tarballs.
Descriptor Type | Type Annotation | Published | Descriptors |
---|---|---|---|
Tor Relay Descriptors | | | |
Relay Server Descriptors | @type server-descriptor 1.0 | 2005-12 to present | recent, archive |
Relay Extra-info Descriptors | @type extra-info 1.0 | 2007-08 to present | recent, archive |
Network Status Consensuses | @type network-status-consensus-3 1.0 | 2010-04 to present | recent, archive |
Network Status Votes | @type network-status-vote-3 1.0 | 2007-10 to present | recent, archive |
Directory Key Certificates | @type dir-key-certificate-3 1.0 | | archive |
Detached Signatures | @type detached-signature-3 1.0 | | |
Microdescriptor Consensuses | @type network-status-microdesc-consensus-3 1.0 | 2014-01 to present | recent, archive |
Microdescriptors | @type microdescriptor 1.0 | 2014-01 to present | recent, archive |
Network Status Entries | @type network-status-entry-3 1.0 | | |
Version 2 Network Statuses | @type network-status-2 1.0 | 2005-12 to 2012-03 | archive |
Version 1 Directories | @type directory 1.0 | 2004-05 to 2007-08 | archive |
Tor Bridge Descriptors | | | |
Bridge Network Statuses | @type bridge-network-status 1.2 | 2008-05 to present | recent, archive |
Bridge Server Descriptors | @type bridge-server-descriptor 1.2 | 2008-05 to present | recent, archive |
Bridge Extra-info Descriptors | @type bridge-extra-info 1.3 | 2008-05 to present | recent, archive |
Tor Hidden Service Descriptors | | | |
Hidden Service Descriptors | @type hidden-service-descriptor 1.0 | | |
Hidden Service Descriptors v3 | @type hidden-service-descriptor-3 1.0 | | |
BridgeDB's Bridge Pool Assignments | | | |
Bridge Pool Assignments | @type bridge-pool-assignment 1.0 | 2010-09 to present | recent, archive |
TorDNSEL's Exit Lists | | | |
Exit Lists | @type tordnsel 1.0 | 2010-02 to present | recent, archive |
Torperf's and OnionPerf's Performance Data | | | |
Torperf Measurement Results | @type torperf 1.0 | 2009-07 to 2020-05 | archive |
OnionPerf Analysis Files | @type torperf 1.1 | 2017-04 to present | recent, archive |
Tor web server logs | | | |
Tor web server logs | | 2015-01 to present | recent, archive |
Bandwidth Files | | | |
Bandwidth Files | @type bandwidth-file 1.0 | 2017-08 to present | recent, archive |
Snowflake Statistics | | | |
Snowflake Statistics | @type snowflake-stats 1.0 | 2019-06 to present | recent, archive |
BridgeDB metrics | | | |
BridgeDB metrics | @type bridgedb-metrics 1.0 | 2019-09 to present | recent, archive |
Each descriptor provided here contains an @type annotation using the format @type $descriptortype $major.$minor. Any tool that processes these descriptors may parse files without metadata or with an unknown descriptor type at its own risk, can safely parse files with a known descriptor type and the same major version number, and should not parse files with a known descriptor type and a higher major version number.
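The version rule above can be sketched as a small check. This is a minimal Python sketch, not part of CollecTor: the function names and the KNOWN_TYPES table are hypothetical, and treating a lower major version as "at-own-risk" is an assumption, since the rule above does not cover that case.

```python
import re
from typing import Optional, Tuple

# Pattern for "@type $descriptortype $major.$minor" as described above.
TYPE_RE = re.compile(r"^@type (\S+) (\d+)\.(\d+)$")

# Hypothetical table: descriptor types this tool understands, mapped to the
# major version it was written for.
KNOWN_TYPES = {"server-descriptor": 1, "extra-info": 1, "bridge-extra-info": 1}


def parse_type_annotation(first_line: str) -> Optional[Tuple[str, int, int]]:
    """Return (descriptor_type, major, minor), or None if there is no annotation."""
    match = TYPE_RE.match(first_line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def parse_decision(first_line: str) -> str:
    """Classify a file as 'safe', 'at-own-risk' or 'do-not-parse'."""
    annotation = parse_type_annotation(first_line)
    if annotation is None:
        return "at-own-risk"  # no metadata
    descriptor_type, major, _minor = annotation
    if descriptor_type not in KNOWN_TYPES:
        return "at-own-risk"  # unknown descriptor type
    known_major = KNOWN_TYPES[descriptor_type]
    if major == known_major:
        return "safe"  # known type, same major version
    if major > known_major:
        return "do-not-parse"  # known type, higher major version
    return "at-own-risk"  # assumption: lower major versions are not covered
```

A tool would call parse_decision on the first line of each file before handing the rest to its parser.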
Relays and directory authorities publish relay descriptors, so that clients can select relays for their paths through the Tor network. All these relay descriptors are specified in the Tor directory protocol, version 3 specification document (or in the earlier protocol version 2 or version 1).
@type server-descriptor 1.0 (recent, archive)
Server descriptors contain information that relays publish about themselves. Tor clients once downloaded this information, but now they use microdescriptors instead. The server descriptors in the descriptor archives contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
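Because recent files concatenate many descriptors, a tool usually needs to split them again before parsing. The following Python sketch shows one way to do that. It assumes that every descriptor in a concatenated file is preceded by its own @type annotation line, which should be verified against real files; the function name is hypothetical.

```python
from typing import Iterator, List


def split_descriptors(text: str) -> Iterator[str]:
    """Yield one descriptor at a time from a concatenated file.

    Assumption: each descriptor starts with its own "@type ..." line.
    """
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        # A new annotation line closes the descriptor collected so far.
        if line.startswith("@type ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)
```

Each yielded string still contains its annotation line, so it can be passed to a type check before parsing.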
@type extra-info 1.0 (recent, archive)
Extra-info descriptors contain relay information that Tor clients do not need in order to function. These are self-published, like server descriptors, but not downloaded by clients by default. The extra-info descriptors in the descriptor archives contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
@type network-status-consensus-3 1.0 (recent, archive)
Though Tor relays are decentralized, the directories that track the overall network are not. These central points are called directory authorities, and every hour they publish a document called a consensus, or network status document. The consensus is made up of router status entries containing flags, heuristics used for relay selection, etc.
@type network-status-vote-3 1.0 (recent, archive)
The directory authorities exchange votes every hour to come up with a common consensus. Vote documents are by far the largest documents provided here.
@type dir-key-certificate-3 1.0 (archive)
The directory authorities sign votes and the consensus with their key that they publish in a key certificate. These key certificates change once every few months, so they are only available in a single descriptor archive tarball.
@type detached-signature-3 1.0
Detached signatures as specified in section 3.10 of the dir-spec. They are downloadable for DistSeconds every consensus freshness period (usually five minutes each hour) via the /tor/status-vote/next/consensus-signatures resource.
@type network-status-microdesc-consensus-3 1.0 (recent, archive)
Tor clients used to download all server descriptors of active relays, but now they only download the smaller microdescriptors which are derived from server descriptors. The microdescriptor consensus lists all active relays and references their currently used microdescriptor. The descriptor archive tarballs contain both microdescriptor consensuses and referenced microdescriptors together.
@type microdescriptor 1.0 (recent, archive)
Microdescriptors are minimalistic documents that include just the information necessary for Tor clients to work. The descriptor archive tarballs contain both microdescriptor consensuses and referenced microdescriptors together. The microdescriptors in descriptor archive tarballs contain one descriptor per file, whereas the recently published files contain all descriptors collected in an hour concatenated into a single file.
@type network-status-entry-3 1.0
Individual router status entry from an unflavored v3 network status document. These are available from Tor's control port GETINFO ns/* commands and NS events.
@type network-status-2 1.0 (archive)
Version 2 network statuses were published by the directory authorities before consensuses were introduced. In contrast to consensuses, each directory authority published its own authoritative view of the network, and clients combined these documents locally. We stopped archiving version 2 network statuses in 2012.
@type directory 1.0 (archive)
The first directory protocol version combined the list of active relays with server descriptors in a single directory document. We stopped archiving version 1 directories in 2007.
Bridges and the bridge authority publish bridge descriptors that are used by censored clients to connect to the Tor network. We cannot, however, make bridge descriptors available as we do with relay descriptors, because that would defeat the purpose of making bridges hard to enumerate for censors. We therefore sanitize bridge descriptors by removing all potentially identifying information and publish sanitized versions here. The sanitizing steps are specified in detail on a separate page.
@type bridge-network-status 1.2 (recent, archive)
Sanitized bridge network statuses are similar to version 2 relay network statuses, but with only a published line and a fingerprint line in the header, and without any lines in the footer. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
- @type bridge-network-status 1.0 was the first version.
- @type bridge-network-status 1.1 introduced sanitized TCP ports.
- @type bridge-network-status 1.2 introduced the fingerprint line, containing the fingerprint of the bridge authority which produced the document, to the header.
@type bridge-server-descriptor 1.2 (recent, archive)
Bridge server descriptors follow the same format as relay server descriptors, except for the sanitizing steps described above. The bridge server descriptor archive tarballs contain one descriptor per file, whereas recently published bridge server descriptor files contain all descriptors collected in an hour concatenated into a single file to reduce the number of files. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
- @type bridge-server-descriptor 1.0 was the first version. A newer version was supposed to indicate added ntor-onion-key lines, but due to a mistake only the version number of sanitized bridge extra-info descriptors was raised. As a result, there may be sanitized bridge server descriptors with version @type bridge-server-descriptor 1.0 with and without those lines.
- @type bridge-server-descriptor 1.1 added master-key-ed25519 lines and router-digest-sha256 to server descriptors published by bridges using an Ed25519 master key.
- @type bridge-server-descriptor 1.2 introduced sanitized TCP ports.
@type bridge-extra-info 1.3 (recent, archive)
Bridge extra-info descriptors follow the same format as relay extra-info descriptors, except for the sanitizing steps described above. The format has changed over time to accommodate changes to the sanitizing process, with earlier versions being:
- @type bridge-extra-info 1.0 was the first version.
- @type bridge-extra-info 1.1 added sanitized transport lines.
- @type bridge-extra-info 1.2 was supposed to indicate added ntor-onion-key lines, but those changes only affect bridge server descriptors, not extra-info descriptors. So, nothing has changed as compared to version 1.1.
- @type bridge-extra-info 1.3 added master-key-ed25519 lines and router-digest-sha256 to extra-info descriptors published by bridges using an Ed25519 master key.
The bridge extra-info descriptor archive tarballs contain one descriptor per file, whereas recently published bridge extra-info descriptor files contain all descriptors collected in an hour concatenated into a single file to reduce the number of files.
Tor hidden services make it possible for users to hide their locations while offering various kinds of services, such as web publishing or an instant messaging server. A hidden service assembles a hidden service descriptor to make its service available in the network. This descriptor gets stored on hidden service directories and can be retrieved by hidden service clients. Hidden service descriptors are not formally archived, but some libraries support parsing these descriptors when obtaining them from a locally running Tor instance.
@type hidden-service-descriptor 1.0
Hidden service descriptors contain all details that are necessary for clients to connect to a hidden service. Despite the version number being 1.0, these descriptors are part of the version 2 hidden service protocol.
@type hidden-service-descriptor-3 1.0
Hidden service descriptors contain the details required to connect to a version 3 hidden service. Despite the version number being 1.0, these descriptors are part of the version 3 hidden service protocol.
The bridge distribution service BridgeDB publishes bridge pool assignments describing which bridges it has assigned to which distribution pool. BridgeDB receives bridge network statuses from the bridge authority, assigns these bridges to persistent distribution rings, and hands them out to bridge users. BridgeDB periodically dumps the list of running bridges with information about the rings, subrings, and file buckets to which they are assigned to a local file. The sanitized versions of these lists containing SHA-1 hashes of bridge fingerprints instead of the original fingerprints are available for statistical analysis.
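To match a bridge you operate against these sanitized lists, you need the hashed form of its fingerprint. The Python sketch below is a guess at that computation, not the documented procedure: it assumes the SHA-1 hash is taken over the 20 binary bytes of the fingerprint rather than over its hexadecimal text, and that the result is written as lowercase hex. The sanitizing specification is the authority here, and the function name is hypothetical.

```python
import hashlib


def hashed_fingerprint(fingerprint_hex: str) -> str:
    """Return the lowercase hex SHA-1 of the binary form of a hex fingerprint.

    Assumption: the digest covers the 20 raw bytes, not the 40 hex characters.
    """
    raw = bytes.fromhex(fingerprint_hex)
    if len(raw) != 20:
        raise ValueError("expected a 40-character hex fingerprint")
    return hashlib.sha1(raw).hexdigest()
```

The result can then be compared with the first column of a pool assignment line.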
@type bridge-pool-assignment 1.0 (recent, archive)
The document below shows a BridgeDB pool assignment file from April 09, 2022. Every such file begins with a line containing the timestamp when rdsys wrote this file. Subsequent lines start with the SHA-1 hash of a bridge fingerprint, followed by the distribution mechanism, transport, IP, and country block list information. There are currently different distributor mechanisms in rdsys:
port=$port
.
https
ring can also be assigned to subrings by
OR port.
Each bridge in the assignment contains a list of key=value parameters with the following keys:
@type bridge-pool-assignment 1.1
bridge-pool-assignment 2022-04-09 00:29:37
005fd4d7decbb250055b861579e6fdc79ad17bee email transport=obfs4 ip=4 blocklist=ru port=443 distributed=true state=functional bandwidth=accepted ratio=1.902
00782946f4c54ce1d028f21e541ef8440ecaa0ee settings ip=4 blocklist=ru distributed=true state=functional bandwidth=untested
00e1ae6cb75e47e363e6aef9f67a49c0e854fde7 moat transport=obfs4 ip=4 distributed=false state=functional bandwidth=rejected ratio=0.569
00e6f1d633d4e29db31f43d1e6e3e928e5c1810d moat transport=obfs4 ip=4 blocklist=ru distributed=false state=functional bandwidth=rejected ratio=0.752
0110a6cf41a07637808fff79c0783ff37462b525 email ip=4 blocklist=ru distributed=true state=functional bandwidth=accepted ratio=1.223
[...]
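A file in this layout can be read with a few lines of Python. This is a minimal sketch under the assumptions visible in the sample document: fields are separated by spaces, the second field names the distribution mechanism, and all remaining fields are key=value pairs. The Assignment class and function name are hypothetical, and values are kept as plain strings.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

# First lines of the sample document shown above.
SAMPLE = """@type bridge-pool-assignment 1.1
bridge-pool-assignment 2022-04-09 00:29:37
005fd4d7decbb250055b861579e6fdc79ad17bee email transport=obfs4 ip=4 blocklist=ru port=443 distributed=true state=functional bandwidth=accepted ratio=1.902
00782946f4c54ce1d028f21e541ef8440ecaa0ee settings ip=4 blocklist=ru distributed=true state=functional bandwidth=untested
"""


@dataclass
class Assignment:
    hashed_fingerprint: str
    distributor: str
    params: Dict[str, str] = field(default_factory=dict)


def parse_pool_assignments(text: str) -> Tuple[datetime, List[Assignment]]:
    """Return the file timestamp and one Assignment per bridge line."""
    written = None
    assignments: List[Assignment] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@type "):
            continue  # skip blank lines and the type annotation
        if line.startswith("bridge-pool-assignment "):
            written = datetime.strptime(line.split(" ", 1)[1], "%Y-%m-%d %H:%M:%S")
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # not a bridge line, e.g. an elision marker
        # Everything after the distributor is treated as key=value pairs.
        params = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
        assignments.append(Assignment(parts[0], parts[1], params))
    if written is None:
        raise ValueError("missing bridge-pool-assignment timestamp line")
    return written, assignments
```

Counting bridges per distributor, for example, then reduces to iterating over the returned list and grouping by the distributor attribute.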
The exit list service TorDNSEL publishes exit lists containing the IP addresses of relays that it found when exiting through them.
@type tordnsel 1.0 (recent, archive)
TorDNSEL makes the list of known exits and corresponding exit IP addresses available in a specific format. The document below shows an entry of the exit list written on December 28, 2010 at 15:21:44 UTC. This entry means that the relay with fingerprint 63BA.. which published a descriptor at 07:35:55 and was contained in a version 2 network status from 08:10:11 uses two different IP addresses for exiting. The first address 91.102.152.236 was found in a test performed at 07:10:30. When looking at the corresponding server descriptor, one finds that this is also the IP address on which the relay accepts connections from inside the Tor network. A second test performed at 10:35:30 reveals that the relay also uses IP address 91.102.152.227 for exiting.
ExitNode 63BA28370F543D175173E414D5450590D73E22DC
Published 2010-12-28 07:35:55
LastStatus 2010-12-28 08:10:11
ExitAddress 91.102.152.236 2010-12-28 07:10:30
ExitAddress 91.102.152.227 2010-12-28 10:35:30
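An entry like the one above can be parsed by grouping lines under the most recent ExitNode keyword. The Python sketch below assumes one keyword per line and keeps timestamps as strings; the ExitEntry class and function name are hypothetical and not part of TorDNSEL.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The entry shown above, one keyword per line.
SAMPLE = """ExitNode 63BA28370F543D175173E414D5450590D73E22DC
Published 2010-12-28 07:35:55
LastStatus 2010-12-28 08:10:11
ExitAddress 91.102.152.236 2010-12-28 07:10:30
ExitAddress 91.102.152.227 2010-12-28 10:35:30
"""


@dataclass
class ExitEntry:
    fingerprint: str
    published: str = ""
    last_status: str = ""
    # (address, time of the test that found it)
    exit_addresses: List[Tuple[str, str]] = field(default_factory=list)


def parse_exit_list(text: str) -> List[ExitEntry]:
    """Return one ExitEntry per ExitNode block."""
    entries: List[ExitEntry] = []
    for line in text.splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "ExitNode":
            entries.append(ExitEntry(fingerprint=rest))
        elif not entries:
            continue  # ignore anything before the first ExitNode line
        elif keyword == "Published":
            entries[-1].published = rest
        elif keyword == "LastStatus":
            entries[-1].last_status = rest
        elif keyword == "ExitAddress":
            address, _, seen = rest.partition(" ")
            entries[-1].exit_addresses.append((address, seen))
    return entries
```

A relay with several ExitAddress lines, as in the sample, ends up with several tuples in exit_addresses.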
The performance measurement services Torperf and OnionPerf publish performance data from making simple HTTP requests over the Tor network. Torperf and OnionPerf use a SOCKS client to download files of various sizes over the Tor network and note how long substeps take.
@type torperf 1.0 (archive)
A Torperf results file contains a single line per Torperf run with key=value pairs. Such a result line is sufficient to learn about 1) the Tor and Torperf configuration, 2) measurement results, and 3) additional information that might help explain the results.
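Reading such a result line takes little code. The Python sketch below assumes that pairs are separated by spaces and that values contain no spaces; it keeps all values as strings and checks for the keys this section marks as required. The function names and the sample values are made up for illustration.

```python
from typing import Dict, List

# Keys marked "required" in this section.
REQUIRED_KEYS = (
    "SOURCE", "FILESIZE", "START", "SOCKET", "CONNECT", "NEGOTIATE",
    "REQUEST", "RESPONSE", "DATAREQUEST", "DATARESPONSE", "DATACOMPLETE",
    "WRITEBYTES", "READBYTES",
)

# Made-up, deliberately incomplete line for illustration only.
SAMPLE_LINE = "SOURCE=example FILESIZE=51200 START=100.00 DIDTIMEOUT=0"


def parse_result_line(line: str) -> Dict[str, str]:
    """Split one result line into a dict of KEY -> value strings."""
    pairs = (token.split("=", 1) for token in line.split() if "=" in token)
    return {key: value for key, value in pairs}


def missing_required_keys(result: Dict[str, str]) -> List[str]:
    """List required keys that the parsed line does not contain."""
    return [key for key in REQUIRED_KEYS if key not in result]
```

A consumer might skip or log any line for which missing_required_keys returns a non-empty list.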
Known keys in @type torperf 1.0 are explained below.
- SOURCE: Configured name of the data source; required.
- FILESIZE: Configured file size in bytes; required.
- START: Time when the connection process starts; required.
- SOCKET: Time when the socket was created; required.
- CONNECT: Time when the socket was connected; required.
- NEGOTIATE: Time when SOCKS 5 authentication methods have been negotiated; required.
- REQUEST: Time when the SOCKS request was sent; required.
- RESPONSE: Time when the SOCKS response was received; required.
- DATAREQUEST: Time when the HTTP request was written; required.
- DATARESPONSE: Time when the first response was received; required.
- DATACOMPLETE: Time when the payload was complete; required.
- WRITEBYTES: Total number of bytes written; required.
- READBYTES: Total number of bytes read; required.
- DIDTIMEOUT: 1 if the request timed out, 0 otherwise; optional.
- DATAPERCx: Time when x% of expected bytes were read for x = { 10, 20, 30, 40, 50, 60, 70, 80, 90 }; optional.
- LAUNCH: Time when the circuit was launched; optional.
- USED_AT: Time when this circuit was used; optional.
- PATH: List of relays in the circuit, separated by commas; optional.
- BUILDTIMES: List of times when circuit hops were built, separated by commas; optional.
- TIMEOUT: Circuit build timeout in milliseconds that the Tor client used when building this circuit; optional.
- QUANTILE: Circuit build time quantile that the Tor client uses to determine its circuit-build timeout; optional.
- CIRC_ID: Circuit identifier of the circuit used for this measurement; optional.
- USED_BY: Stream identifier of the stream used for this measurement; optional.
@type torperf 1.1 (recent, archive)
OnionPerf exports its measurements once per day in an analysis file using the JSON format specified here. That file is not in the @type torperf 1.1 format by default and needs to be converted to it. OnionPerf adds a few more keys in @type torperf 1.1:
- ENDPOINTLOCAL: Hostname, IP address, and port that the TGen client used to connect to the local tor SOCKS port, formatted as hostname:ip:port, which may be "NULL:0.0.0.0:0" if TGen was not able to find this information; optional.
- ENDPOINTPROXY: Hostname, IP address, and port that the TGen client used to connect to the SOCKS proxy server that tor runs, formatted as hostname:ip:port, which may be "NULL:0.0.0.0:0" if TGen was not able to find this information; optional.
- ENDPOINTREMOTE: Hostname, IP address, and port that the TGen client used to connect to the remote server, formatted as hostname:ip:port, which may be "NULL:0.0.0.0:0" if TGen was not able to find this information; optional.
- HOSTNAMELOCAL: Client machine hostname, which may be "(NULL)" if the TGen client was not able to find this information; optional.
- HOSTNAMEREMOTE: Server machine hostname, which may be "(NULL)" if the TGen server was not able to find this information; optional.
- SOURCEADDRESS: Public IP address of the OnionPerf host obtained by connecting to well-known servers and finding the IP address in the result, which may be "unknown" if OnionPerf was not able to find this information; optional.
Tor's web servers, like most web servers, keep request logs for maintenance and informational purposes. However, unlike most other web servers, Tor's web servers use a privacy-aware log format that avoids logging overly sensitive data about their users. Also unlike most other web server logs, Tor's logs are neither archived nor analyzed before performing a number of post-processing steps to further reduce any privacy-sensitive parts.
The data format and sanitizing steps for Tor web server logs are specified in detail on a separate page.
@type bandwidth-file 1.0 (recent, archive)
Bandwidth authority metrics as defined in the bandwidth-file-spec. These are available from a DirPort's /tor/status-vote/next/bandwidth URL and from CollecTor.
@type snowflake-stats 1.0 (recent, archive)
Snowflake statistics containing aggregated information about snowflake proxies and snowflake clients as generated by the snowflake broker.
@type bridgedb-metrics 1.0 (recent, archive)
BridgeDB metrics as defined in the BridgeDB metrics spec. Those metrics contain aggregated information about requests to the BridgeDB service.
There are multiple ways to download descriptors from this site. Of course, the obvious way is to browse the directories and download contained files using your browser. However, this method cannot be automated very well.
wget
A more elaborate way to automatically download descriptors is to use Unix tools like wget, which support recursively downloading files from this site. In the example below, --recursive turns on recursive retrieving, --reject "index.html*" skips directory listings, --no-parent prevents ascending to the parent directory, --no-host-directories prevents host-prefixed directories from being generated, and --directory-prefix descriptors sets the directory prefix:
wget --recursive \
  --reject "index.html*" \
  --no-parent \
  --no-host-directories \
  --directory-prefix descriptors \
  https://collector.torproject.org/recent/relay-descriptors/microdescs/
index.json
Another automated way to download descriptors is to develop a tool that uses the provided index.json file or one of its compressed versions index.json.gz, index.json.bz2, or index.json.xz. These files contain a machine-readable representation of all descriptor files available on this site.
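A tool built on the index mostly needs to walk the nested directory and file objects. The Python sketch below does that for an already decoded index and also computes a Base64 SHA-256 digest for comparison with a file's "sha256" field. The sample index is hand-written with placeholder values, the helper names are hypothetical, and joining path segments with "/" is an assumption.

```python
import base64
import hashlib
import json
from typing import Iterator, Tuple

# Hand-written placeholder index using the field names described in this section.
SAMPLE_INDEX = """
{
  "index_created": "2022-04-09 01:00",
  "path": "https://collector.torproject.org",
  "directories": [
    {"path": "recent",
     "directories": [
       {"path": "exit-lists",
        "files": [
          {"path": "example-exit-list", "size": 123,
           "last_modified": "2022-04-09 00:02",
           "types": ["tordnsel 1.0"]}
        ]}
     ]}
  ]
}
"""


def iter_files(node: dict, prefix: str = "") -> Iterator[Tuple[str, dict]]:
    """Yield (relative_path, file_object) for every file below an index node."""
    # "files" and "directories" are omitted when empty, hence the defaults.
    for file_obj in node.get("files", []):
        yield prefix + file_obj["path"], file_obj
    for directory in node.get("directories", []):
        yield from iter_files(directory, prefix + directory["path"] + "/")


def sha256_base64(data: bytes) -> str:
    """Return the SHA-256 digest of data, encoded in Base64."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
```

After json.loads on the index text, each relative path from iter_files can be appended to the index's "path" value to form a download URL, and the downloaded bytes can be checked with sha256_base64 when the file object carries a "sha256" field.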
Index files use the following custom JSON data format that might still be extended at a later time:
- "index_created": Timestamp when this index was created using pattern "YYYY-MM-DD HH:MM" in the UTC timezone.
- "build_revision": Git revision of the CollecTor instance's software used to create this file, which will be omitted if unknown.
- "path": Base URL of this index file and all included resources.
- "files": List of file objects of files available from the document root, which will be omitted if empty.
- "directories": List of directory objects of directories available from the document root, which will be omitted if empty.
Each directory object contains:
- "path": Relative path of the directory.
- "files": List of file objects of files available from this directory, which will be omitted if empty.
- "directories": List of directory objects of directories available from this directory, which will be omitted if empty.
Each file object contains:
- "path": Relative path of the file.
- "size": Size of the file in bytes.
- "last_modified": Timestamp when the file was last modified using pattern "YYYY-MM-DD HH:MM" in the UTC timezone.
- "types": Descriptor types as found in @type annotations of contained descriptors.
- "first_published": Earliest publication timestamp of contained descriptors using pattern "YYYY-MM-DD HH:MM" in the UTC timezone.
- "last_published": Latest publication timestamp of contained descriptors using pattern "YYYY-MM-DD HH:MM" in the UTC timezone.
- "sha256": SHA-256 digest of this file, encoded in Base64.
© 2009–2023 The Tor Project
This material is supported in part by the National Science Foundation under Grant No. CNS-0959138. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. "Tor" and the "Onion Logo" are registered trademarks of The Tor Project, Inc. Data on this site is freely available under a CC0 no copyright declaration: To the extent possible under law, the Tor Project has waived all copyright and related or neighboring rights in the data. Graphs are licensed under a Creative Commons Attribution 3.0 United States License.