XKeyScore's 'source code'. Anything interesting?
When German media reported that they had gained access to the source code of XKeyScore, I was highly skeptical of the term 'source code' because what they claimed did not make much sense. I suspected sloppy language and that seems to be the case. It's nevertheless interesting.
Apparently everything that remotely looks like code has to be source code for journalists. In fact the released code appears to be a set of rules for a deep packet inspection tool ... a configuration file if you prefer that term.
Now, before we start: it's pretty much unclear how this was obtained or whether this actually is a set of rules for XKeyScore at all. It looks astoundingly familiar if you are used to configuring deep packet inspection tools. I'm unsure whether this is a fraction of an actual live configuration or just a development version to test certain aspects of the system. I'm leaning towards the latter, mostly because there's an active dummy rule in the set, which would be a stark blunder if this were a live configuration.
The most interesting information we can derive from this is not that TOR is apparently a prime target but how the system works in general.
One of the major allegations was that TOR users are automatically flagged as extremists. While something like this is in the file, it concerns TAILS, not TOR.
The comment actually doesn't label TAILS users as extremists. It says that
[...] extremists on extremist forums advocate the use of TAILS.
The actual filter however is as broad as it can possibly get.
It is basically applying an undisclosed filter related to TAILS to documents, web searches, requested URLs or page titles. In other words, it's trying to single out people who are looking for information on TAILS, Live-CD versions for example.
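To illustrate how broad such a rule is, here is a minimal Python sketch. It is not taken from the released file: the `Session` fields, the function name and the keyword pattern are all my own assumptions, chosen only to mirror the four things the rule looks at.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword pattern; the real pattern is undisclosed.
TAILS_PATTERN = re.compile(r"\btails\b|\bamnesic incognito\b", re.IGNORECASE)


@dataclass
class Session:
    """Hypothetical view of one captured HTTP session."""
    url: str = ""
    search_terms: str = ""
    page_title: str = ""
    document_body: str = ""


def matches_broad_rule(session: Session) -> bool:
    """Return True if the pattern shows up in any of the inspected fields."""
    fields = (
        session.url,
        session.search_terms,
        session.page_title,
        session.document_body,
    )
    return any(TAILS_PATTERN.search(field) for field in fields)


if __name__ == "__main__":
    print(matches_broad_rule(Session(search_terms="tails live cd download")))
    print(matches_broad_rule(Session(page_title="Why dogs chase their tails")))
```

Note that the second session has nothing to do with the operating system and still matches. A rule that only checks for a keyword in any of these fields cannot tell the difference.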
The specific filtering for linuxjournal.com seems rather odd. I have checked the site for information on TAILS and it's nothing you wouldn't expect to find on a truckload of other sites. As a matter of fact, I could come up with a truckload of sites better suited for an honorable mention. It really doesn't make any sense at all. I suspect Linux Journal was just mentioned by one of those extremists on extremist forums. Otherwise I couldn't possibly guess why they are specifically filtered.
In the context of TOR it's interesting to mention that they are not just monitoring TOR directory servers. They are also filtering for hidden TOR nodes in raw traffic, an extremely costly task. But it makes sense in the context of trying to monitor TOR.
There's a major no-no in the TOR section.
The filter in question filters emails from firstname.lastname@example.org. Basically, these emails contain an IP list of TOR bridges. The no-no part is that the system seems to digest these IPs directly into a database without further checking. So if anyone is up for some funny business ... faking emails from email@example.com with the expected content and bogus IP addresses would not be loved by this filter.
But the bogus entries would be easy to clean from the database once the problem is fixed, if the filter is actually active on a live system at all.
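The difference between blindly digesting addresses and doing at least a basic sanity check can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `ip:port` line format, the function names and the sample addresses are mine and are not taken from the rules.

```python
import ipaddress
import re

# Assumed "ip:port" layout; the real email format is not shown here.
BRIDGE_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def naive_ingest(body: str) -> list[tuple[str, int]]:
    """Take every ip:port found in the mail body at face value."""
    return [(ip, int(port)) for ip, port in BRIDGE_RE.findall(body)]


def checked_ingest(body: str) -> list[tuple[str, int]]:
    """Same extraction, but drop entries that cannot be public endpoints."""
    accepted = []
    for ip, port in BRIDGE_RE.findall(body):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a valid address at all, e.g. 999.1.1.1
        if not addr.is_global:
            continue  # private, loopback or otherwise non-routable
        if not 0 < int(port) < 65536:
            continue  # port outside the valid range
        accepted.append((ip, int(port)))
    return accepted


if __name__ == "__main__":
    mail = "93.184.216.34:443\n10.0.0.1:9001\n999.1.1.1:80\n127.0.0.1:443\n"
    print(naive_ingest(mail))
    print(checked_ingest(mail))
```

Keep in mind that a check like this only catches obvious garbage. A forged mail with plausible public addresses would still pass, so the real problem is trusting the sender in the first place.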
Mixminion, an anonymous remailer, is also on the to-monitor list. It's hard to say what they are actually doing here. It's handled externally and there's no comment to guess from. They are, however, monitoring every host that sports mixminion and specifically a server over at MIT that hosts mixminion documentation. So it looks like if you visit moria.csail.mit.edu you will end up in whatever is done here. And that server seems to be hosting more than just the documentation found on mixminion.net.
Something that might be interesting for the 5-eyes folks: while visits to torproject.org from those countries seem to be excluded from special treatment, this doesn't seem to be true for TAILS or whatever is done in the mixminion processor.
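The asymmetry is easier to see side by side. The following Python sketch is purely illustrative: the function names, the way the client country is determined and the keyword check are my assumptions, not the syntax or logic of the released rules.

```python
# Country codes of the 5-eyes members.
FIVE_EYES = {"US", "GB", "CA", "AU", "NZ"}


def flags_torproject_visit(http_host: str, client_country: str) -> bool:
    """Hypothetical rule with an exclusion: 5-eyes clients are not flagged."""
    is_tor_site = http_host.lower().endswith("torproject.org")
    return is_tor_site and client_country.upper() not in FIVE_EYES


def flags_tails_interest(search_terms: str, client_country: str) -> bool:
    """Hypothetical rule without an exclusion: the country is ignored."""
    return "tails" in search_terms.lower()


if __name__ == "__main__":
    print(flags_torproject_visit("www.torproject.org", "US"))
    print(flags_torproject_visit("www.torproject.org", "DE"))
    print(flags_tails_interest("tails live cd", "US"))
```

In this sketch the same 5-eyes client is skipped by the first rule and still caught by the second, which is the situation described above.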
There's a hell of a lot of IFs in this. But if this is actually a set of rules for XKeyScore, it's extremely broad, and one can only guess how easily one could end up in any of the other topics the NSA is most certainly monitoring. It seems a rather impossible task not to get flagged one way or another in one topic or another. Assumptions about exploiting this, however, are flawed, with the exception of the email blunder, and that only if it is active at all and until it's fixed. Which probably happened yesterday.
Flooding the database with bogus crap seems to be a rather simple task. Actually it seems we are already doing that. But since the filters are so broad, I don't think this is a problem for them at all. If they were after something specific they would filter more specifically. They already get truckloads of junk data. This data cannot have any relevance on its own, so it has to be linked with other stuff to gain relevance. Flooding more junk data into the system is rather pointless. The NSA isn't exactly short on storage capacity.
It is a manageable problem if it enables you to exploit seemingly irrelevant data from the past should it ever become relevant in the future. If the costs are longer database queries on the archives and more storage requirements, this is a no-brainer for something like the NSA.