metrics-lib

Welcome to metrics-lib, a Java API that facilitates processing Tor network data from the CollecTor service for statistical analysis and for building services and applications.

The tutorials below explain the basic steps to get you started with metrics-lib.

Prerequisites and preparation

These tutorials are written for readers who know Java and, to a lesser extent, how Tor works. We explain all data used in the tutorials. More detailed and up-to-date information about descriptors can be found in the Tor directory protocol specification and on the CollecTor page.

All tutorials require you to download the latest release of metrics-lib, follow the instructions to verify its signature, extract the tarball locally, and copy the lib/ and the generated/ directories to your working directory for the tutorials.

Tutorial 1: Download descriptors from CollecTor

Let's start this tutorial series by doing something really simple. We'll use metrics-lib to download recent consensuses from CollecTor and write them to a local directory. We're not doing anything with those consensuses yet, though we'll get back to that in a bit.

We'll need to give metrics-lib five pieces of information for this:

the CollecTor base URL without trailing slash ("https://collector.torproject.org"),
which remote directories to collect descriptors from (new String[] { "/recent/relay-descriptors/consensuses/" }),
the minimum last-modified time of files to be collected (0L),
the local directory to write files to (new File("descriptors")), and
whether to delete all local files that do not exist remotely anymore (false).

Create a new file DownloadConsensuses.java with the following content:

import org.torproject.descriptor.*;

import java.io.File;

public class DownloadConsensuses {
  public static void main(String[] args) {

    // Download consensuses published in the last 72 hours, which will take up to five minutes and require several hundred MB on the local disk.
    DescriptorCollector descriptorCollector = DescriptorSourceFactory.createDescriptorCollector();
    descriptorCollector.collectDescriptors(
        // Download from Tor's main CollecTor instance,
        "https://collector.torproject.org",
        // include only network status consensuses
        new String[] { "/recent/relay-descriptors/consensuses/" },
        // regardless of last-modified time,
        0L,
        // write to the local directory called descriptors/,
        new File("descriptors"),
        // and don't delete extraneous files that do not exist remotely anymore.
        false);
  }
}

If you haven't already done so, prepare the working directory for this tutorial as described above.

Compile and run the Java file:

javac -cp lib/\*:generated/dist/signed/\* DownloadConsensuses.java

java -cp .:lib/\*:generated/dist/signed/\* DownloadConsensuses

This will take up to five minutes and require several hundred MB on the local disk.
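
If the full download is more than you need, the minimum last-modified time is the natural knob to turn. The sketch below computes a cutoff for "files modified in the last 24 hours". It assumes that the value is interpreted as milliseconds since the Unix epoch, which is worth confirming in the JavaDocs. The class and method names are made up for illustration and are not part of metrics-lib.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of metrics-lib.
class LastModifiedCutoff {

  // Returns the cutoff timestamp for files modified within the last
  // `hours` hours, never going below 0 (the tutorial's "no lower bound").
  // Assumption: the collector expects milliseconds since the epoch.
  static long hoursAgo(long nowMillis, long hours) {
    return Math.max(0L, nowMillis - TimeUnit.HOURS.toMillis(hours));
  }

  public static void main(String[] args) {
    long cutoff = hoursAgo(System.currentTimeMillis(), 24L);
    System.out.println("Candidate value for the third argument: " + cutoff);
  }
}
```

The result would take the place of 0L in DownloadConsensuses.java.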

If you want to play a bit with this code, you could extend it to also download recent bridge extra-info descriptors from CollecTor, which are stored in /recent/bridge-descriptors/extra-infos/ and which we'll need for tutorial 3 below. (If you're too ~~impatient~~ curious, scroll down to the bottom of this page for the diff.)
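
One way to approach this, sketched here without calling metrics-lib itself: the second argument is a plain String array, so it should be able to hold more than one remote directory if you want both kinds of descriptors in one run. The helper below only assembles that array and normalizes each path to the form used in this tutorial. Its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of metrics-lib.
class RemoteDirectories {

  // Normalizes each path to the "/a/b/c/" form used in this tutorial
  // (leading and trailing slash) and returns the paths as an array.
  static String[] of(String... paths) {
    List<String> normalized = new ArrayList<>();
    for (String path : paths) {
      String p = path.trim();
      if (!p.startsWith("/")) {
        p = "/" + p;
      }
      if (!p.endsWith("/")) {
        p = p + "/";
      }
      normalized.add(p);
    }
    return normalized.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] dirs = of(
        "/recent/relay-descriptors/consensuses/",
        "recent/bridge-descriptors/extra-infos");
    for (String dir : dirs) {
      System.out.println(dir);
    }
  }
}
```

The resulting array would take the place of the single-element array in DownloadConsensuses.java.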

Tutorial 2: Relay capacity by Tor version

If you just followed tutorial 1 above, you now have a bunch of consensuses on your disk. Let's do something with those and look at relay capacity by Tor version. A possible use case could be that the Tor developers debate which of the older versions to turn into long-term supported versions, and you want to contribute more facts to that discussion by telling them how much relay capacity each version provides.

Consider the following snippet from a consensus document showing a single relay to get an idea of the underlying data:

[...]
r PrivacyRepublic0001 XOzFwwrMSz3kYnkjI5Zwh8xT2Uc WLlCQj3gVELkwIBh3EWxG74LZ2E 2017-03-04 08:16:22 178.32.181.96 443 80
s Exit Fast Guard HSDir Running Stable V2Dir Valid
v Tor 0.2.8.9
pr Cons=1-2 Desc=1-2 DirCache=1 HSDir=1 HSIntro=3 HSRend=1 Link=1-4 LinkAuth=1 Microdesc=1-2 Relay=1-2
w Bandwidth=136000
p reject 22,25,109-110,119,143,465,563,587,6881-6889
[...]

We're interested in the Tor version number without patch level (0.2.8) and the consensus weight (136000).
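
metrics-lib does this parsing for you, but to make the two target values concrete, here is a small stand-alone sketch that pulls them out of the raw v and w lines with plain string handling. The class and method names are invented for this illustration, and the parsing is deliberately naive.

```java
// Hypothetical helper for illustration only; metrics-lib exposes these
// values through its own descriptor classes.
class StatusEntryLines {

  // "v Tor 0.2.8.9" -> "0.2.8", or null if the line has another shape.
  static String versionWithoutPatchLevel(String vLine) {
    String prefix = "v Tor ";
    if (!vLine.startsWith(prefix)) {
      return null;
    }
    String[] parts = vLine.substring(prefix.length()).split("\\.");
    if (parts.length < 3) {
      return null;
    }
    return parts[0] + "." + parts[1] + "." + parts[2];
  }

  // "w Bandwidth=136000" -> 136000, or -1 if no Bandwidth key is present.
  static long consensusWeight(String wLine) {
    for (String token : wLine.split(" ")) {
      if (token.startsWith("Bandwidth=")) {
        return Long.parseLong(token.substring("Bandwidth=".length()));
      }
    }
    return -1L;
  }

  public static void main(String[] args) {
    System.out.println(versionWithoutPatchLevel("v Tor 0.2.8.9"));
    System.out.println(consensusWeight("w Bandwidth=136000"));
  }
}
```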

Create a new file ConsensusWeightByVersion.java with the following content:

import org.torproject.descriptor.*;

import java.io.File;
import java.util.*;

public class ConsensusWeightByVersion {
  public static void main(String[] args) {

    // Download consensuses.
    DescriptorCollector descriptorCollector = DescriptorSourceFactory.createDescriptorCollector();
    descriptorCollector.collectDescriptors("https://collector.torproject.org", new String[] { "/recent/relay-descriptors/consensuses/" }, 0L, new File("descriptors"), false);

    // Keep local counters for extracted descriptor data.
    long totalBandwidth = 0L;
    SortedMap<String, Long> bandwidthByVersion = new TreeMap<>();

    // Read descriptors from disk.
    DescriptorReader descriptorReader = DescriptorSourceFactory.createDescriptorReader();
    for (Descriptor descriptor : descriptorReader.readDescriptors(new File("descriptors/recent/relay-descriptors/consensuses"))) {
      if (!(descriptor instanceof RelayNetworkStatusConsensus)) {
        // We're only interested in consensuses.
        continue;
      }
      RelayNetworkStatusConsensus consensus = (RelayNetworkStatusConsensus) descriptor;
      for (NetworkStatusEntry entry : consensus.getStatusEntries().values()) {
        String version = entry.getVersion();
        if (!version.startsWith("Tor ") || version.length() < 9) {
          // We're only interested in a.b.c type versions for this example.
          continue;
        }
        // Remove the 'Tor ' prefix and anything starting at the patch level.
        version = version.substring(4, 9);
        long bandwidth = entry.getBandwidth();
        totalBandwidth += bandwidth;
        if (bandwidthByVersion.containsKey(version)) {
          bandwidthByVersion.put(version, bandwidth + bandwidthByVersion.get(version));
        } else {
          bandwidthByVersion.put(version, bandwidth);
        }
      }
    }

    // Print out fractions of consensus weight by Tor version.
    if (totalBandwidth > 0L) {
      for (Map.Entry<String, Long> e : bandwidthByVersion.entrySet()) {
        System.out.printf("%s -> %4.1f%%%n", e.getKey(), (100.0 * (double) e.getValue() / (double) totalBandwidth));
      }
    }
  }
}

If you haven't already done so, prepare the working directory for this tutorial as described above.

Compile and run the Java file:

javac -cp lib/\*:generated/dist/signed/\* ConsensusWeightByVersion.java

java -cp .:lib/\*:generated/dist/signed/\* ConsensusWeightByVersion

There will be some log statements, and the final output should now contain lines like the following:

0.2.4 ->  3.2%
0.2.5 ->  9.4%
0.2.6 ->  3.2%
0.2.7 ->  7.3%
0.2.8 ->  6.4%
0.2.9 -> 48.2%
0.3.0 -> 20.8%
0.3.1 ->  1.2%
0.3.2 ->  0.3%
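
The alignment in these lines comes from the format string in the printf call: %4.1f pads the percentage to four characters with one decimal place, and %% prints a literal percent sign. The sketch below isolates that step. It additionally passes Locale.US, on the assumption that you want a decimal point regardless of the machine's default locale; the tutorial code does not do this, and the class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of metrics-lib.
class ShareFormat {

  // Formats one output line, e.g. a weight of 64 out of 1000 as " 6.4%".
  static String line(String version, long weight, long totalWeight) {
    if (totalWeight <= 0L) {
      throw new IllegalArgumentException("totalWeight must be positive");
    }
    double percent = 100.0 * (double) weight / (double) totalWeight;
    return String.format(Locale.US, "%s -> %4.1f%%", version, percent);
  }

  public static void main(String[] args) {
    System.out.println(line("0.2.8", 64L, 1000L));
    System.out.println(line("0.2.9", 482L, 1000L));
  }
}
```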

These are the numbers we were looking for. Now you should know what to do to extract interesting data from consensuses. Want to give that another try and filter relays with the Exit flag to learn about exit capacity by Tor version? Hint: You'll want to check for entry.getFlags().contains("Exit"). Of course, you could as well continue with the next tutorial below. (Or you could scroll down to the bottom of this page to see the diff.)
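
As a warm-up for that exercise, here is the same kind of aggregation on made-up data, without any metrics-lib classes. The Relay class is a hypothetical stand-in for a status entry; only the filtering and summing logic carries over. The sketch also uses Map.merge, a more compact alternative to the containsKey/put pattern in the tutorial code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

class ExitWeightByVersion {

  // Hypothetical stand-in for a consensus status entry.
  static class Relay {
    final String version;
    final long bandwidth;
    final Set<String> flags;

    Relay(String version, long bandwidth, String... flags) {
      this.version = version;
      this.bandwidth = bandwidth;
      this.flags = new HashSet<>(Arrays.asList(flags));
    }
  }

  // Sums bandwidth per version, counting only relays with the Exit flag.
  static SortedMap<String, Long> sumExitBandwidth(List<Relay> relays) {
    SortedMap<String, Long> byVersion = new TreeMap<>();
    for (Relay relay : relays) {
      if (!relay.flags.contains("Exit")) {
        continue;
      }
      byVersion.merge(relay.version, relay.bandwidth, Long::sum);
    }
    return byVersion;
  }

  public static void main(String[] args) {
    List<Relay> relays = Arrays.asList(
        new Relay("0.2.8", 136000L, "Exit", "Fast"),
        new Relay("0.2.8", 1000L, "Fast"),
        new Relay("0.2.9", 200L, "Exit"),
        new Relay("0.2.9", 300L, "Exit", "Guard"));
    // Prints {0.2.8=136000, 0.2.9=500}
    System.out.println(sumExitBandwidth(relays));
  }
}
```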

Tutorial 3: Frequency of transports

In the previous tutorial we looked at relay descriptors, so let's now look a bit at bridge descriptors.

Every bridge publishes its transports in the extra-info descriptors that it periodically sends to the bridge authority. Let's count the frequency of transports. A possible use case could be that the Pluggable Transports developers debate which of the transport names is the least pronounceable, and you want to give them numbers to talk about something much more useful instead.

Consider this snippet from a bridge extra-info descriptor:

extra-info LeifEricson 3E0908F131AC417C48DDD835D78FB6887F4CD126
[...]
transport obfs2
transport scramblesuit
transport obfs3
transport obfs4
transport fte

What we need to do is extract the list of transport names (obfs2, scramblesuit, etc.) together with the bridge fingerprint (3E0908F131AC417C48DDD835D78FB6887F4CD126). Keeping track of the fingerprint matters because a bridge publishes extra-info descriptors repeatedly, and we want to avoid double-counting transports provided by the same bridge.
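
Avoiding double-counting can lean on a small property of java.util.Set: add returns false when the element is already present. A stand-alone sketch of that idea, using the fingerprint from the snippet plus one made-up fingerprint; the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of metrics-lib.
class FirstSeen {

  // Counts how many entries are the first occurrence of their fingerprint.
  static int countDistinct(String[] fingerprints) {
    Set<String> observed = new HashSet<>();
    int firstOccurrences = 0;
    for (String fingerprint : fingerprints) {
      // add() returns true only the first time a fingerprint is seen.
      if (observed.add(fingerprint)) {
        firstOccurrences++;
      }
    }
    return firstOccurrences;
  }

  public static void main(String[] args) {
    String[] fingerprints = {
        "3E0908F131AC417C48DDD835D78FB6887F4CD126",
        "0000000000000000000000000000000000000001", // made up
        "3E0908F131AC417C48DDD835D78FB6887F4CD126"  // same bridge again
    };
    // Prints 2
    System.out.println(countDistinct(fingerprints));
  }
}
```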

Create a new file PluggableTransports.java with the following content:

import org.torproject.descriptor.*;

import java.io.File;
import java.util.*;

public class PluggableTransports {
  public static void main(String[] args) {

    // Download bridge extra-info descriptors.
    DescriptorCollector descriptorCollector = DescriptorSourceFactory.createDescriptorCollector();
    descriptorCollector.collectDescriptors("https://collector.torproject.org", new String[] { "/recent/bridge-descriptors/extra-infos/" }, 0L, new File("descriptors"), false);

    // Keep track of bridges we have already seen and of how many bridges offer each transport.
    Set<String> observedFingerprints = new HashSet<>();
    SortedMap<String, Integer> countedTransports = new TreeMap<>();

    // Read descriptors from disk.
    DescriptorReader descriptorReader = DescriptorSourceFactory.createDescriptorReader();
    for (Descriptor descriptor : descriptorReader.readDescriptors(new File("descriptors/recent/bridge-descriptors/extra-infos"))) {
      if (!(descriptor instanceof BridgeExtraInfoDescriptor)) {
        // We're only interested in bridge extra-info descriptors.
        continue;
      }
      BridgeExtraInfoDescriptor extraInfo = (BridgeExtraInfoDescriptor) descriptor;
      String fingerprint = extraInfo.getFingerprint();
      // Count each bridge only once: add() returns false for a fingerprint we've seen before.
      if (observedFingerprints.add(fingerprint)) {
        for (String transport : extraInfo.getTransports()) {
          if (countedTransports.containsKey(transport)) {
            countedTransports.put(transport, 1 + countedTransports.get(transport));
          } else {
            countedTransports.put(transport, 1);
          }
        }
      }
    }

    // Print out the fraction of observed bridges offering each transport.
    if (!observedFingerprints.isEmpty()) {
      double totalObservedFingerprints = observedFingerprints.size();
      for (Map.Entry<String, Integer> e : countedTransports.entrySet()) {
        System.out.printf("%20s -> %4.1f%%%n", e.getKey(), (100.0 * (double) e.getValue() / totalObservedFingerprints));
      }
    }
  }
}

If you haven't already done so, prepare the working directory for this tutorial as described above.

Compile and run the Java file:

javac -cp lib/\*:generated/dist/signed/\* PluggableTransports.java

java -cp .:lib/\*:generated/dist/signed/\* PluggableTransports

The output should contain lines like the following:

                 fte ->  2.3%
                meek ->  0.2%
               obfs2 ->  0.7%
               obfs3 -> 20.8%
     obfs3_websocket ->  0.0%
               obfs4 -> 77.0%
        scramblesuit -> 17.3%
           snowflake ->  0.1%
           websocket ->  0.7%

As above, we'll leave it up to you to further expand this code. For example, how does the result change if you count transport combinations rather than transports? Hint: you won't need anything else from metrics-lib, but you'll need to add some code to order transport names and write them to a string. (And if you'd rather look up the solution, scroll down a bit to see the diff.)
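
The hint boils down to turning each bridge's transport list into one canonical string. One possible sketch, again without metrics-lib and with a hypothetical class name: passing the names through a TreeSet sorts them, so that "obfs4, obfs3" and "obfs3, obfs4" count as the same combination.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of metrics-lib.
class TransportCombinations {

  // Builds an order-independent key such as "[obfs3, obfs4]".
  static String key(List<String> transports) {
    return new TreeSet<>(transports).toString();
  }

  // Counts how often each combination occurs, one list per bridge.
  static SortedMap<String, Integer> count(List<List<String>> perBridge) {
    SortedMap<String, Integer> counts = new TreeMap<>();
    for (List<String> transports : perBridge) {
      counts.merge(key(transports), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<List<String>> perBridge = Arrays.asList(
        Arrays.asList("obfs4", "obfs3"),
        Arrays.asList("obfs3", "obfs4"),
        Arrays.asList("obfs4"));
    // Prints {[obfs3, obfs4]=2, [obfs4]=1}
    System.out.println(count(perBridge));
  }
}
```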

Next steps

Want to write more code that uses metrics-lib? Be sure to read the JavaDocs while developing new services or applications using Tor network data.

Ran into a problem, found a bug, or came up with a cool new feature? Feel free to contact us. Alternatively, take a look at the bug tracker and open a ticket if there's none for your issue yet.

Interested in writing code for metrics-lib? Please take a look at the Network Health team wiki page to find out how to contribute.

Scrolled down just to see where we're hiding the solutions of the three little riddles above? Here are the diffs:

diff -Nur DownloadConsensuses.java DownloadConsensuses.java
--- DownloadConsensuses.java        2017-03-07 17:48:35.000000000 +0100
+++ DownloadConsensuses.java        2017-03-10 23:02:51.000000000 +0100
@@ -11,7 +11,7 @@
         // Download from Tor's main CollecTor instance,
         "https://collector.torproject.org",
         // include only network status consensuses
-        new String[] { "/recent/relay-descriptors/consensuses/" },
+        new String[] { "/recent/bridge-descriptors/extra-infos/" },
         // regardless of last-modified time,
         0L,
         // write to the local directory called descriptors/,

diff -Nur ConsensusWeightByVersion.java ConsensusWeightByVersion.java
--- ConsensusWeightByVersion.java   2017-03-10 23:00:40.000000000 +0100
+++ ConsensusWeightByVersion.java   2017-03-10 23:03:18.000000000 +0100
@@ -25,6 +25,9 @@
       }
       RelayNetworkStatusConsensus consensus = (RelayNetworkStatusConsensus) descriptor;
       for (NetworkStatusEntry entry : consensus.getStatusEntries().values()) {
+        if (!entry.getFlags().contains("Exit")) {
+          continue;
+        }
         String version = entry.getVersion();
         if (!version.startsWith("Tor ") || version.length() < 9) {
           // We're only interested in a.b.c type versions for this example.

diff -Nur PluggableTransports.java PluggableTransports.java
--- PluggableTransports.java        2017-03-10 23:01:43.000000000 +0100
+++ PluggableTransports.java        2017-03-10 23:03:43.000000000 +0100
@@ -20,12 +22,11 @@
       BridgeExtraInfoDescriptor extraInfo = (BridgeExtraInfoDescriptor) descriptor;
       String fingerprint = extraInfo.getFingerprint();
       if (observedFingerprints.add(fingerprint)) {
-        for (String transport : extraInfo.getTransports()) {
-          if (countedTransports.containsKey(transport)) {
-            countedTransports.put(transport, 1 + countedTransports.get(transport));
-          } else {
-            countedTransports.put(transport, 1);
-          }
+        String transports = new TreeSet<>(extraInfo.getTransports()).toString();
+        if (countedTransports.containsKey(transports)) {
+          countedTransports.put(transports, 1 + countedTransports.get(transports));
+        } else {
+          countedTransports.put(transports, 1);
         }
       }
     }
