One of the earliest sayings I can remember is: “Don’t put all your eggs in one basket.” In the years that followed I learned the benefits of diversification and the dangers of single-strand solutions to many human problems. Now I am frightened by how mindlessly we are putting all our information into one gigantic electronic basket.
But how long will the information stored by friends and family, by banks and the tax authorities, by Google and Yahoo, by the intelligence services, and by such social networks as Facebook or Twitter remain accessible? Five years? A decade? Most unlikely 25 years. And what will this do to the historical understanding of future generations? For now this seems of negligible concern.
Ever since the mass production of silicon chips and computers, and the accompanying development of the internet, most of those living in the advanced technological nations have focused on the ever-increasing speed of communications, its use in all applicable areas, and the conversion of different forms of calculation, classification, and storage to labor-saving computers. Most of the younger generation never learned penmanship, and many have never written a letter!
The revolutionary consequences of the past thirty years have overwhelmed economics, markets, airline travel, research, information communication, and our daily lives. This has led to far more comprehensive and intensive change than the introduction of printing, or even the much more gradual transformations of the industrial revolution.
From the globalization of corporations to the globalization of credit cards, the ease of networking communications, and the immense social impact on the younger generation, the scope of this revolution is so overwhelming that it has become difficult for us to digest. “Progress” in the form of hi-tech is moving humanity in so many new directions and at such an accelerating speed that we are at the point of losing control.
Our taxes, bank accounts, credit card transactions, police records (including fingerprints), health records (including DNA files), car licenses and driving records, CCTV recordings, and NSA and GCHQ surveillance of our communications are only some of the areas affecting our lives which now depend on the storage of what is being called “Big Data.” Is this new information structure sufficiently resilient to cope with as yet unknown threats, ranging from intentional sabotage to solar flares to the simple inability to deal with the congestion caused by incredibly rapid growth? Indeed, when will the sheer volume of traffic that all these servers process become so overwhelming that it causes a systemic global black-out?
The Guardian (UK), in a front-page story pegged to major computer crashes on the Nasdaq stock market, Google, and insurance companies, warned that “governments, banks and big business are over-reliant on computer networks that have become too complex.”1
Jaron Lanier, the author-creator of the concept of virtual reality, went further in writing that the digital infrastructure was moving beyond our control: “When you try to achieve great scale with automation and the automation exceeds the boundaries of human oversight, there is going to be failure.” (Not only in the breakdown of specific servers but also of entire systems, such as the global stock exchanges and the banking system.)
And there seem to be no boundaries: Cisco, the California-based manufacturer of communications equipment, predicts that in four years it will be possible to transmit over the internet, in a three-minute burst, the data equivalent of all the films ever produced.
At the same time, greed, as evidenced by the extreme demands of high-frequency share trading by hedge funds and banks, is triggering ever more mini-crashes on the stock markets. In May 2010 one such incident, following a false report, caused close to a trillion dollars to be erased in 20 minutes from the value of US shares. (Most of this was swiftly recovered.) Once a stock fell to an automated “sell” level, it was impossible to stop all the other computers from executing their sell commands, and across-the-board selling spread with lightning speed.
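The self-reinforcing cascade described above can be sketched as a toy simulation: each automated seller has a trigger price, and every sale pushes the price down far enough to trip further triggers. All the prices, thresholds, and the fixed price-impact rule here are invented for illustration and are not a model of real market microstructure.

```python
# Toy illustration of a stop-loss cascade: each automated seller has a
# trigger price; every forced sale depresses the price, tripping more
# triggers in turn. All numbers are illustrative assumptions.

def run_cascade(price, triggers, impact_per_sale):
    """Return the price path as automated sell triggers fire in sequence."""
    remaining = sorted(triggers, reverse=True)  # highest triggers fire first
    path = [price]
    fired = True
    while fired:
        fired = False
        for t in list(remaining):
            if price <= t:                  # this computer's sell level is hit
                remaining.remove(t)
                price -= impact_per_sale    # the sale itself pushes prices lower
                path.append(price)
                fired = True
    return path

# A dip to 99 trips the first stop, and the resulting sales drag the
# price through every remaining trigger with no human able to intervene.
path = run_cascade(price=99.0, triggers=[99, 97, 95, 93], impact_per_sale=2.5)
print(path)  # [99.0, 96.5, 94.0, 91.5, 89.0]
```

The point of the sketch is that once selling is automated, the first trigger's side effect (a lower price) is precisely the condition that fires the next trigger, so the loop runs to completion faster than any human oversight can react.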
Amazon, Google, Yahoo, and such social sites as Facebook, Flickr, and Twitter all need to process and store immense quantities of data. Twitter alone has to grapple with 500 million tweets a day! When glitches occur, and this seems inevitable, “dirty” entries (sometimes simply garbled or confused ones, occasionally intentionally corrupting ones from hackers or hostile sources, such as North Korean or Syrian military saboteurs) can corrupt files and communications, and could ultimately bring down the entire system.
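To put Twitter's figure in perspective, a quick back-of-the-envelope calculation shows what 500 million tweets a day means as a sustained rate, before allowing for the far higher peaks around major events:

```python
# Convert the daily tweet volume into an average sustained rate.
tweets_per_day = 500_000_000
seconds_per_day = 24 * 60 * 60          # 86,400 seconds in a day
avg_rate = tweets_per_day / seconds_per_day
print(f"about {avg_rate:.0f} tweets per second on average")
```

That works out to roughly 5,800 tweets every second, around the clock, each of which must be received, stored, and served back out.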
At this level, there is a rising awareness of the problem. Google alone is investing around £4 billion ($6 billion) a year in network data centers. Annual global spending on such centers will rise to close to £100 billion ($150 billion) this year.2
The rise of big data has led many traditional data warehousing companies, such as Teradata, to continuously update their products and technology. The Teradata product is a massively parallel processing system, referred to as a “data warehouse system,” which stores and manages data. These data warehouses use a “shared-nothing architecture,” meaning that each server node has its own memory and processing power. Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them.3
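The shared-nothing idea can be sketched in a few lines: rows are hashed to a node, each node holds only its own private partition, and a query is fanned out to all nodes with the partial results merged. The node count, hashing scheme, and query below are illustrative assumptions, not Teradata's actual implementation.

```python
# Minimal sketch of shared-nothing partitioning: every "node" owns a
# private slice of the data, nothing is shared, and adding nodes adds
# capacity. Purely illustrative; not Teradata's real design.

class Node:
    def __init__(self):
        self.rows = []                     # this node's private storage

    def insert(self, row):
        self.rows.append(row)

    def count_where(self, predicate):
        # Each node answers a query using only its own data.
        return sum(1 for r in self.rows if predicate(r))

class Warehouse:
    def __init__(self, n_nodes):
        self.nodes = [Node() for _ in range(n_nodes)]

    def insert(self, key, row):
        # Hash the key to pick the owning node; no memory is shared.
        self.nodes[hash(key) % len(self.nodes)].insert(row)

    def count_where(self, predicate):
        # Fan the query out to every node and merge the partial counts.
        return sum(node.count_where(predicate) for node in self.nodes)

wh = Warehouse(n_nodes=4)
for i in range(1000):
    wh.insert(key=i, row={"id": i, "even": i % 2 == 0})
print(wh.count_where(lambda r: r["even"]))  # 500
```

Because each node works only on its own partition, doubling the number of nodes roughly doubles both storage capacity and the parallelism available to answer the fanned-out query, which is the property the paragraph describes.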
Speed has also become a factor, as has the surging demand for storage. Consider that almost three quarters of all trades on the American stock exchanges are executed by machines which process transactions in less than a millionth of a second via fiber-optic connections. As a necessary consequence, Teradata systems are also used as back-ups during downtime. The systems work to balance the workload of big data, which has grown exponentially with new media sources such as social media.
Storing increasingly immense quantities of data has also become a challenge, placing a limit on the number of years most of it is (or can be) stored. At the same time, more steps are being taken to prevent disasters from occurring. A prime example is Microsoft’s Exchange ActiveSync (commonly known as EAS), a protocol designed to synchronize email, contacts, tasks, and notes between a messaging server and a smartphone or other mobile device. But such efforts also illustrate how the global system has grown along market-driven choices and corporate decisions, not according to any plan, oversight, or set rules and regulations. This does not seem like a comprehensive way to proceed globally. Alas, the possible consequences of the fantastic expansion of Big Data may not be recognized until a comprehensive melt-down suddenly occurs, with the most dire results for all but the least developed societies, such as those of Papua New Guinea or Bhutan. You have been warned!
1 Juliette Garside, “Warning over data meltdown,” The Guardian (UK), 24 August 2013, p. 1.
2 Charles Arthur, “The Cost,” The Guardian (UK), 24 August 2013, p. 7.
3 See the Teradata entry in Wikipedia.