Saturday, August 10, 2013

OSINT vs Data Mining

Periodically, we publish guest articles of particular merit 
and of special interest to our readers.
To date, each of the guest articles has proven to be quite popular, 
and we expect this one, by George Mapp, to be equally well received. 

By:  George Mapp (on Twitter)

The exponential growth in social media has made it an absolute necessity for anyone and everyone competing for any type of business worldwide.

When watching CNN, BBC, Al Jazeera, and other major "news" networks, you are likely to see displayed on the bottom of the television screen: "as reported by Twitter".

Almost daily, local news outlets use Facebook photos of suspects or persons of interest if their profiles are public. Because of the vast number of people using social media, it has caught the attention of advertisers, venture capitalists, recruiters, the IRS, and yes, many other US government agencies, including those in the Intelligence Community.
Twitter had over a half-billion users as of July 2012, according to TechCrunch [Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta ‘Biggest Tweeting’ City].

What I find provocative about Twitter is that you can directly follow journalists posted worldwide who report instantly on what they are seeing and what is happening. They sometimes add photos and video links before any media outlet has reported on the story.

Digital Life of the Today Show reported that Facebook hit one billion users last October.
That is an almost inconceivable amount of data and information to comprehend. Imagine how many photos, wall posts, likes, comments, etc., each user has, and then multiply that by a billion!

Data Mining vs OSINT

Now, due to technological advances, all that information can be gathered, stored, and even used to predict the future.

Some even say it can affect politics, while others go so far as to say that social media and data mining affected the outcome of the 2012 US Presidential elections.

I will provide several examples throughout this post to clarify the above statements.


Data Mining vs OSINT, the title of this post, points to an ‘invisible line’ between social media data mining and OSINT, because the definitions as well as the uses often overlap. Although most of my readers know what both terms mean and the fine distinctions between them, I will define the two for novice readers:

Data Mining: The process of collecting, searching through, and analyzing a large amount of data in a database to discover patterns or relationships; historically used to detect fraud. It is the gathering of information from existing data stored in a database, such as one held by a supermarket about customers’ shopping habits.
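To make the supermarket example concrete, here is a minimal sketch of one classic data mining step: counting which items are bought together. The baskets and the `frequent_pairs` helper are invented for illustration; this is an assumption about how one might start, not a description of any retailer's actual system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"diapers", "beer"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is counted under one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
# Print the most frequent pairs first, ties broken alphabetically.
for pair, n in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, n)
```

Pairs that clear the support threshold are the "patterns or relationships" in the definition above; real systems apply the same idea to far larger databases.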

[Ed Note:  The content of this article was created via Data Mining]

Early OSINT Sales Pitch                          [CIA]
OSINT: Open source intelligence is a form of intelligence collection management that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence.

In the intelligence community (IC), the term “open” refers to overt, publicly available sources (as opposed to covert or classified sources); it is not related to open-source software or public intelligence… [continue reading on Wikipedia. Other definitions from the community via Recorded Future].

Any unclassified information, in any medium, that is generally available to the public, even if its distribution is limited or only available upon payment.

Intelligence is a process by which information is treated to answer questions and provide analysis for use by some body. The fact that information comes from an open source does not, by itself, make it Open Source Intelligence.

[Ed Note: 
Intelligence is derived from raw information reporting, but transitions to Intelligence based on Sources and Methods of information gathering; i.e., a restricted organization-chart of a foreign government bureaucracy may not be regarded as intelligence to that government, however, another government may procure that document by surreptitious means in order to assess personnel listed on that chart as targets for espionage recruitment.  The means of acquisition of that document and the officer or technical system used to acquire that document make the document "intelligence", and it would be classified as such.]

Stricto sensu, OSINT would be that which is open and available to all.

After reading the definition it becomes evident why OSINT and data mining have become so crucial in today’s environment.

Here is an example from Recorded Future of how much intelligence is considered open source: With an estimated 90% of required intelligence available in open source, it is imperative that intelligence analysts become adept at mining open sources.

"Recorded Future can help reduce research time, identify new sources, build timelines, chart networks, perform link analysis and more. Due to the sheer volume of available data and the continued rise of technology in our daily lives, data mining is a necessity for many businesses to compete and to stay in business."

OSINT has risen to a new level because of the internet and the vast amounts of data on the web, coupled with advances in the technology used by hackers and/or hacktivists.

The line between what is Open Source and what is private, or considered proprietary/classified by businesses and governments, has become increasingly blurred. In fact, many members of the US intelligence community see hacking, cyber security, InfoSec and cyber-threats as the number one national security threat in our country.

Data Mining and OSINT in use today: 
Here are a few examples of how NSA uses data mining and OSINT in our world today.

The US surveillance and super-secret code-crackers better known as the National Security Agency [NSA] are building a $2 billion spy center in Utah to keep up with the explosive growth of social media and the tremendous amount of data it produces daily.

Forbes reported last year [NSA’s New Data Center And Supercomputer Aim To Crack World’s Strongest Encryption] on NSA's operations to collect vast amounts of data:

US Army Corps of Engineers Proposed Design   [Unclass - US Army]

"The $2 billion data center being built in Utah would have four 25,000 square-foot halls filled with servers, as well as another 900,000 square feet for administration.

It will use 65 megawatts of electricity, with an annual bill of $40 million, and incorporates a $10 million security system.

 "Since 2001, NSA has intercepted and stored between 15 and 20 trillion messages, according to the estimate of ex-NSA scientist Bill Binney.

It now aims to store yottabytes [one million billion gigabytes] of data.

"According to one storage firm’s estimate in 2009, a yottabyte would cover the entire states of Rhode Island and Delaware with data centers.

When the Department of Energy began a supercomputing project in 2004 that took the title of the world’s fastest known computer from IBM in 2009 with its “Jaguar” system, it simultaneously created a secret track for the same program focused on cracking codes. The project took place in a $41 million, 214,000 square foot building at Oak Ridge National Lab with 318 scientists and other staff. The supercomputer produced there was faster than the so-called “world’s fastest” Jaguar.

"The NSA project now aims to break the “exaflop barrier” [a quintillion (10^18) operations per second] by building a supercomputer a hundred times faster than the fastest existing today, the Japanese 'K Computer'; that code-breaking system is projected to use 200 megawatts of power, about as much as would power 200,000 homes."

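The figures quoted above are easier to check when written out as arithmetic. This is only a sanity-check sketch, assuming decimal (SI) prefixes, where a gigabyte is 10^9 bytes and a yottabyte is 10^24 bytes; the variable names are my own.

```python
# Storage: how many gigabytes fit in one yottabyte (SI decimal prefixes assumed).
GIGABYTE = 10**9
YOTTABYTE = 10**24
gb_per_yb = YOTTABYTE // GIGABYTE          # 10**15 gigabytes
one_million_billion = 10**6 * 10**9        # "one million billion"

# Speed: an exaflop is 10**18 operations per second. If the target is a hundred
# times faster than the current leader, the leader runs at about 10**16.
EXAFLOP = 10**18
implied_current_speed = EXAFLOP // 100     # 10**16, i.e. 10 petaflops

# Power: 200 megawatts spread over 200,000 homes.
power_watts = 200 * 10**6
homes = 200_000
watts_per_home = power_watts / homes       # 1000.0 watts, i.e. 1 kW per home

print(gb_per_yb == one_million_billion, implied_current_speed, watts_per_home)
```

The bracketed "one million billion gigabytes" gloss matches the SI definition, and the power comparison works out to roughly one kilowatt per home.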
Earlier in my post I mentioned that social media and data mining affected the US Presidential elections.

An article written last November [How social media, data mining, and new-fangled technology tipped the 2012 election] validated how important and influential social media is in our daily lives.

[Ed Update: 
Social Media (e.g., Facebook and Twitter) was crucial in sharing information in the 2016 election, particularly with the Wikileaks exposés of Hillary Clinton's emails containing Top Secret Intelligence information]

"While Facebook is still arguably the place that people turn to before and after events, Twitter solidified itself as the go-to for all things real-time. And better yet: It didn’t break.

"[During election night] Twitter averaged about 9,965 Tweets per second (TPS) from 8:11pm to 9:11pm PT, with a one-second peak of 15,107 TPS at 8:20pm PT, and a one-minute peak of 874,560 TPM," Twitter announced via its Engineering Blog. "Seeing a sustained peak over the course of an entire event is a change from the way people have previously turned to Twitter during live events."
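To put those rates in perspective, here is a quick back-of-the-envelope calculation using only the three numbers quoted above. The variable names are my own, and the hourly total is an estimate derived from the reported average, not a figure Twitter published.

```python
# Figures quoted from Twitter's Engineering Blog.
avg_tps = 9_965              # average Tweets per second, 8:11pm to 9:11pm PT
peak_one_second_tps = 15_107
peak_tpm = 874_560           # Tweets in the busiest single minute

# Estimated Tweets sent during the hour, from the reported average rate.
tweets_in_hour = avg_tps * 3600

# Average per-second rate during the busiest minute.
peak_minute_tps = peak_tpm / 60

# How much busier the peak minute was than the hour as a whole.
peak_to_average = peak_minute_tps / avg_tps

print(tweets_in_hour)               # 35874000
print(peak_minute_tps)              # 14576.0
print(round(peak_to_average, 2))    # 1.46
```

The busiest minute ran less than one and a half times the hourly average, which is the "sustained peak" Twitter describes: heavy load held for the whole hour rather than a single short spike.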

Defense contractors are also in on the action, which is significant since the US Department of Defense [DoD] is the world’s largest employer; its contractors are currently developing products that the DoD might purchase from them.

An article on Raytheon’s [New Social Media Data Mining Software] amplifies the importance of both social media and data mining:

"Defense giant Raytheon’s new RIOT software data mines information from Facebook, Twitter, Foursquare, and image EXIF metadata, and applies predictive analytics to determine users’ physical locations."

Raytheon’s Jared Adams told The Guardian:

"RIOT is a big data analytics system design we are working on with industry, national labs and commercial partners to help turn massive amounts of data into useable information to help meet our nation’s rapidly changing security needs."
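For readers wondering how a photo can reveal a location: EXIF metadata can carry GPS tags that store latitude and longitude as degrees, minutes and seconds (each a numerator/denominator pair) plus a hemisphere reference letter. The sketch below converts such values to the decimal degrees a map expects. The `dms_to_decimal` helper and the sample coordinates are hypothetical, and this says nothing about how RIOT itself works.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style degrees/minutes/seconds rationals to signed decimal degrees.

    dms is three (numerator, denominator) pairs; ref is "N", "S", "E" or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical tag values, as they might be read from a photo's GPS tags.
lat = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
lon = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
print(round(lat, 6), round(lon, 6))
```

Reading the tags out of an image file needs an image library, which is left out here to keep the sketch self-contained.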

Danger Room [Pentagon Inks Deal for Smartphone Tool That Scans Your Face, Eyes, Thumbs] illustrates how important social media and data mining are to the US government.

"In a few years, the soldier, marine or special operator out on patrol might be able to record the facial features or iris signature of a suspicious person all from his or her smartphone — and at a distance, too.

"The Defense Department has awarded a $3 million research contract to California-based AOptix to examine its “Smart Mobile Identity” biometrics identification package. At the end of two years of research to validate the concepts of what the company built, AOptix will provide the Defense Department with a hardware peripheral and software suite that turns a commercially available smartphone into a device that scans and transmits data from someone’s eyes, face, thumbs and voice."

Think about the billion-plus Facebook users, and now imagine how many photos each user has, including many photos of people who chose not to be on Facebook for a variety of reasons, and you have a huge database for facial recognition.

Eye See You
The FBI recently announced a $1 billion facial recognition software project.

According to Business Insider [The FBI’s Nationwide Facial Recognition System Ends Anonymity As We Know It (9-2013)]:
"The FBI has begun installing state-of-the-art facial recognition technology across the country as part of an update to the national fingerprint database," Sara Reardon of the New Scientist reports.

"The agency’s $1 billion Next Generation Identification (NGI) program will also include iris scans, DNA analysis and voice identification by 2014."

Now, let’s look at Recorded Future, a company that analyzes open source intelligence and makes predictions about the future. These are excerpts from their website video [How Recorded Future Works.]

1) "Collect Public Web Content: We continually scan tens of thousands of high-quality, online news publications, blogs, public niche sources, trade publications, government web sites, financial databases and more.

2) "Analyze the Text: From these open web sites, we identify references to entities and events. These are organized in time by extracting publication date and any temporal expressions in the text. Each reference is linked to the original source and measured for online momentum and tone of language: positive or negative sentiment.

3) "Visualize Insights: You can explore the past, present and predicted future of almost anything in a matter of seconds. Our powerful interactive tools facilitate analysis of temporal patterns and better understanding of complex relationships and issues."
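The three steps above can be sketched in miniature. The code below is a toy illustration only: the word lists, the ISO-date pattern, the sample documents and the `analyze` and `build_timeline` helpers are all my own assumptions, and Recorded Future's actual methods are certainly far more sophisticated.

```python
import re
from collections import defaultdict
from datetime import date

# Tiny hypothetical sentiment lexicons; real systems use far richer models.
POSITIVE = {"gain", "growth", "success", "win"}
NEGATIVE = {"attack", "loss", "threat", "crisis"}

# Only ISO dates (YYYY-MM-DD) are recognized as temporal expressions here.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def analyze(doc):
    """Extract dates mentioned in the text and a crude tone score."""
    text = doc["text"]
    words = re.findall(r"[a-z]+", text.lower())
    tone = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    dates = [date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(text)]
    # Fall back to the publication date when the text names no date.
    return {"source": doc["source"], "dates": dates or [doc["published"]], "tone": tone}


def build_timeline(docs):
    """Organize (source, tone) references by date, keeping a link to each source."""
    timeline = defaultdict(list)
    for doc in docs:
        result = analyze(doc)
        for d in result["dates"]:
            timeline[d].append((result["source"], result["tone"]))
    return dict(sorted(timeline.items()))


# Step 1 (collection) is stood in for by two invented documents.
docs = [
    {"source": "example-blog", "published": date(2013, 8, 1),
     "text": "Analysts expect growth after the vote on 2013-09-15."},
    {"source": "example-news", "published": date(2013, 8, 2),
     "text": "Officials warn of a cyber threat and possible attack."},
]

timeline = build_timeline(docs)
for day, refs in timeline.items():
    print(day.isoformat(), refs)
```

Note that the first document is filed under the future date it mentions rather than the day it was published; placing references on a timeline by the dates they talk about is what makes a "predicted future" view possible.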

Recorded Future is so cutting-edge that it has attracted the attention of the Central Intelligence Agency's [CIA] venture capital subsidiary, In-Q-Tel, which has invested in Recorded Future. Given their area of expertise, the investment should come as no surprise.

Data Mining and OSINT Aren’t Just for Spooks [credit for that caption goes to Krypt3ia].
His recent blog post "No, You’re Not A Spook Just Because You Track Social Media and Do OSINT" reinforces my point about how critically important data mining and OSINT tools are to any individual, business, government and intelligence agency.

In short, social media data mining and/or OSINT is not just for the intelligence community but for anyone who wishes to use it. Both tools have become an important part of our everyday lives and behavior, and with each passing day they shape more of what we will do tomorrow.

Ed Note:  
OSINT derives from an old CIA program called FBIS [Foreign Broadcast Information Service] which provided an unclassified translation of foreign news services in countries around the world.  
Under the Carter Administration, funding was cut severely so that translations of local newspapers and broadcasts, in, for example, Buenos Aires, were replaced by summaries provided by foreign news services such as AFP [Agence France-Presse] which was limited to the articles the French felt were important.  The end product was of limited to no value for analysts trying to grasp what was going on in foreign countries.  
FBIS was scheduled for termination in 1997 under the Clinton Administration, but was salvaged by the Federation of American Scientists [FAS].
OSINT is now a division of CIA which reads and analyzes newspaper and magazine content, and also retrieves news and comments from social media such as Twitter.