Saturday, 8 June 2013

The US Government's Data Mining Program

Unless you're living in a cave, you've probably heard about the US government's classified data mining program known as PRISM. First discovered when we learned that the US government required Verizon to turn over all of its customers' call data on a daily basis, PRISM is an NSA intelligence program that, well, we'll let the NSA's leaked slides explain for themselves.

Slide 1: PRISM is bigger than you think.

Most used in NSA Reporting? That's obviously huge.

Assuming that the NSA is not lying to the people who viewed this presentation, the PRISM program pulls in more intelligence information for the NSA than any of their other programs. Think about that: the PRISM program, established a few years ago, is already pulling in more information than the 60-year-old NSA has managed to generate by other means. This is huge.

SIGAD, incidentally, stands for "SIGINT (signals intelligence) Activity Designator".

Slide 2: They're collecting this information from US companies

So the NSA is monitoring communications in the US.
When you use email, a telephone, chat, or any other electronic communication medium, there's a very good chance that your communication travels through the US in some way. Once it does, there's a good chance it's being monitored by the NSA.

Slide 3: So what are they collecting?

Look at the data they say they're collecting!

The government has been suggesting that they're just collecting metadata, not the actual data. I work with metadata a lot and I describe it to people as "information about information". When you watch a television show, the show is the data, but the names of the actors, the broadcast schedule, and copyright information are all things that can qualify as "metadata". However, this leaked slide implies that they might have access to your private photos, the files you're sharing, your chats, and so on.

Slide 4: Who's giving the US government this information?

I'm sure you use one or more of these services.
Take a look at that list of providers above. I see that Microsoft cooperated early on, but then, I have long suspected Microsoft made an arrangement with the US government to be cooperative in exchange for lenient treatment of anti-trust issues. This would benefit both Microsoft and the US government (ok, tin foil hat moment over).

The above list includes Microsoft, Yahoo, Google, Facebook, PalTalk, AOL, Skype, YouTube, and Apple.

What does this mean?

Regarding PRISM, you might have read Google's very carefully worded denial of PRISM participation. So many people are saying "what a relief! Of course Google wouldn't cooperate", but that's clearly not true. Even ignoring the January 14, 2009 date of Google cooperation listed in the slide above, what Larry Page and Google's Chief Legal Officer said was:
  1. We don't provide "direct" access to our customers' data
  2. We don't provide a "back door" to the information stored in our data centers
  3. We've never heard of PRISM
Obviously, points one and two are true if you're careful about how you define "direct" and "back door". Since no definition was given, it would be very easy for Google to provide the government with daily electronic reports of their users' activity. This would be neither "direct" access nor a "back door".

Further, on point three, it's no surprise that the government did not choose to share their top-secret code name with a private company.

Other companies and executives have made similar denials, but all of them are carefully worded to imply that they are not participating in PRISM without actually denying it outright.

Interestingly, the Washington Post story on PRISM has this to say in the very first paragraph (emphasis mine):
The National Security Agency and the FBI are tapping directly into the central servers of nine leading U.S. Internet companies, extracting audio and video chats, photographs, e-mails, documents, and connection logs that enable analysts to track foreign targets, according to a top-secret document obtained by The Washington Post.
As I work in tech, I often see news organizations make extremely sloppy claims about technology, so it's possible that the Washington Post simply didn't understand what "tapping directly into central servers" means, but given the magnitude of the story, I doubt that. They knew what they meant to say and they said it.

Interestingly, the Google statement could still be sort of true if Google has an internal strategy of plausible deniability. As the Google statement was made via a blog post, they were not under oath and later could claim that Page and his CLO were both ignorant of the program, meaning that they told the truth as they understood it.

If the denials from Google and the other companies are to be believed, this is a non-story: the government sometimes presents a court order and the companies comply when required to by law. That sounds innocuous enough, right? Well, the US government is now claiming that they are exempt from judicial oversight on this matter because of "military and state secrets". James Clapper, the director of national intelligence, writes:
I have determined that the disclosure of certain information would cause exceptionally grave damage to the national security of the United States.
No, this is not a minor story. "Exceptionally grave damage" isn't saying "embarrassment" or "annoyance." It's saying that allowing the courts to rule on the legality of this surveillance would seriously hurt the US.


So what everyone wants to know is what data is really being collected. The government has continuously insisted that this information is just metadata. Metadata is "information about information". So how does this work?

Jim is married to Sally, but he's having an affair with Bob. So Jim emails Bob and tells him how much he loves him and is looking forward to their "business trip" at the Love Motel in Las Vegas next weekend.

What's the metadata? Sure, it could be the IP address of where the email was sent from, the email addresses involved, the headers, and so on. However, metadata can include summaries. The previous paragraph could easily be included in the information that the NSA and FBI are collecting, but still be described, quite accurately, as metadata!
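To make the conventional part of that distinction concrete, here's a minimal sketch that splits an email into its headers (the metadata) and its body (the data) using Python's standard `email` module. The addresses, IP address, and message text are made up for illustration, and this says nothing about how any agency actually processes mail.

```python
from email import message_from_string

# A made-up message: every address, host name, and IP below is hypothetical.
RAW_MESSAGE = """\
From: jim@example.com
To: bob@example.com
Date: Sat, 08 Jun 2013 10:00:00 -0000
Subject: Business trip
Received: from mail.example.com (192.0.2.10) by mx.example.com; Sat, 08 Jun 2013 10:00:01 -0000

Looking forward to our "business trip" next weekend.
"""


def split_metadata_and_data(raw):
    """Return (headers, body): the 'information about information' and the information itself."""
    msg = message_from_string(raw)
    # Headers are the metadata. A dict keeps only the last value of a repeated
    # header, which is fine for this illustration.
    metadata = dict(msg.items())
    # For a simple, non-multipart message the payload is the body text.
    data = msg.get_payload()
    return metadata, data


metadata, data = split_metadata_and_data(RAW_MESSAGE)

for name, value in metadata.items():
    print(f"{name}: {value}")
```

Notice that the headers alone already say who wrote to whom, when, and from which machine, before anyone looks at a single word of the body.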

Now you might think "hold on a minute, to provide summaries they would have to read the email and that would be illegal!"

No, they would not have to read the email. All you have to do is read about the Summly startup to know that there is software out there which can summarize information without human intervention. It's not hard to imagine that the NSA, an organization which has recruited some of the brightest minds in America, has developed similar technology.

So imagine if someone followed you all the time, noting what you were doing, who you were meeting, what food you ate, where you were going, when you got up, and when you went to sleep, what sort of work you were doing, and jotted summaries of all of your conversations. Would you be OK with that? That, in a nutshell, is what the government appears to be doing with your online life.

Foreign Impact?

So as bad as this scandal is for the US, it's growing in Europe. Americans are sort of ticked, but given that the official reason for this program is to track non-US citizens, Europeans are extremely unhappy with the companies involved, and the US government. Peter Schaar, Germany's commissioner for data protection, has written:
The U.S. administration must now provide clarification. [The] first statements from the U.S. government [suggesting that] the surveillance would not be directed against U.S. citizens, but only against persons who reside outside the United States, [do] not reassure me at all.
Just to clarify for those who are unaware of our data protection laws here in Europe, they are extremely strict and absolutely forbid misuse of the data, and there are multiple safeguards in place to ensure public trust. Interestingly, it is illegal for the data of EU citizens to be hosted on servers outside the EU unless those servers are subject to privacy laws of at least as high a standard as in the EU. Given that US law falls woefully short of EU privacy laws, how can Google, Facebook, Yahoo!, and many other companies legally store data about EU citizens?

Safe Harbor

There is a special agreement between the US and the EU called the US-EU Safe Harbor Program. Under this program, US companies can — hold on to your hats — self-certify that they're respecting EU privacy laws.

This is going to be interesting. Now we have several of the largest Internet names, all US-based, clearly violating EU law, though presumably under coercion from the US government. This is going to have some very interesting fallout, but it gets worse! The Guardian revealed that GCHQ, the British version of the NSA, also had access to PRISM data. Now UK government ministers are under pressure to disclose if they authorized access to PRISM data. So a branch of UK intelligence not only knew about the violation of EU law, but actively participated in it.


No matter your feelings on PRISM, whether you're an anarchist who never trusts the government or a die-hard "USA, right or wrong" jingoist, this story is big and it's not going to die soon. The repercussions will be felt for a long time. Even if the US government did strip your privacy to try and provide you with security (comforting, eh?), if there's a strong EU backlash against US companies, this could have severe economic consequences for the US. The EU has long been concerned about the US dominance of the Internet and the US government's cavalier attitude towards privacy, and PRISM might be the tipping point.

We already know that non-US cloud service providers are using the fact that they're not subject to the PATRIOT Act as a selling point; PRISM is going to continue to put pressure on non-US companies to choose non-US computing partners. Not only was PRISM an ethically questionable surveillance program, it may prove to be an economically questionable one, too.


  1. PRISM is not only immoral (if barely legal), but also stupid. Last thing intelligence agencies need is even *more* data; they keep enlarging the haystack...

  2. Outstanding post. I had all kinds of questions about this story and you've managed to answer them perfectly.