This is a scenario in which predictive analytics for individual behavior will exceed expectations, becoming different in kind, not just in degree.
With accelerated developments in machine learning, algorithms, and sensors that track human action and enable datasets to feed off one another, the internet of 2020 will have embedded within it profoundly powerful models capable of predicting—and manipulating—a surprising range of human behavior. Rather than infer individual tendencies from trends and groups with similar characteristics, these new models will make truly individualized predictions that are granular, discriminating, and accurate about complex behaviors. The power of data science to predict individual behavior at this very precise level will become the most polarizing debate of the decade: is it an indicator that humanity has handed over its most important powers, freedoms, and mysteries to digital technologies? Or is it an indicator of stunning progress, enabling societies to more effectively solve some of their most recalcitrant problems? While this debate rages on in the abstract, these powerful predictive analytics will generate new security vulnerabilities that outmatch existing concepts and practices of defense, focus increasingly on people rather than infrastructure, and prove capable of causing extreme damage, financial and otherwise.
In this scenario, the availability of vastly greater amounts and varieties of high-quality data, coupled with advanced algorithms and analytics capable of interrogating that data, will enable highly precise and individualized predictions of human behavior. While today it is possible to predict the aggregate behaviors of groups and populations, in 2020 such predictions will be orders of magnitude more accurate and—most importantly—far more personalized, to the point of predicting the behavior of a single person. In this new world, high-tech firms and sophisticated criminals alike will be able to identify (and, in some circumstances, control) the future behavior of particular people at a surprisingly granular level. Many will regard this capability as a signal of the last—or “omega”—algorithm.1 Pessimists will see it as the final step before humanity hands over all power to ubiquitous technologies—or even (according to extremists) as an end to free will. Optimists will believe it possible for dynamic individualized predictions to solve problems that humans had almost given up on.
Far from being an obscure debate among abstract philosophical positions, the battle between these perspectives will likely become the defining political and moral cleavage of the decade. Illicit actors (indifferent on the philosophical point) will simply take advantage of these new technologies and the controversies they create to more precisely target and differentiate their attacks, making security even harder to achieve than it is today.
There will be categorical differences between the predictive algorithms of 2016 and those that arise in this scenario.2 In 2016, algorithms attempt to predict individual behavior by drawing inferences about the behaviors of populations with similar profiles (e.g., white females over 55 prefer to watch 60 Minutes; therefore, Sue, a white female over 55, likely prefers to watch 60 Minutes). These algorithms typically express a view of an individual’s preferences that translates into probabilistic predictions (there is an 85 percent chance that Sue will watch 60 Minutes today).
In 2020, next-generation algorithms will be able to skip the demographic shortcuts and narrow in on the specific preferences of a single individual (Sue herself prefers to watch 60 Minutes). More importantly, probabilistic predictions will become contingent predictions, with accurate statements about the precise conditions under which person X will take action Y. We will know exactly under what conditions (time, place, cost, etc.) Sue actually will watch 60 Minutes.
Relevant assumptions from traditional microeconomics—for example, that preferences are both stable and transitive—were always imperfect, but in this scenario they will no longer be needed. Probabilistic predictions were always a pragmatic compromise—in fact, Sue either will or will not watch 60 Minutes, and the 85 percent prediction just meant we did not have a full understanding of the conditions affecting her choice. In this world, the algorithms do understand.
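The shift from probabilistic to contingent prediction can be sketched in code. The following is only an illustrative toy, not an implementation from the scenario: the cohort table, the `Context` fields, and Sue’s Sunday-evening rule are all invented for this example.

```python
from dataclasses import dataclass

# 2016-style prediction: infer an individual's behavior from a
# demographic cohort she belongs to (hypothetical numbers).
cohort_watch_rate = {("female", "55+"): 0.85}

def probabilistic_prediction(sex: str, age_band: str) -> float:
    """Return P(watches 60 Minutes) based on cohort membership alone."""
    return cohort_watch_rate.get((sex, age_band), 0.5)

# Scenario-style prediction: a contingent rule learned for one person.
@dataclass
class Context:
    day: str
    at_home: bool
    hour: int

def contingent_prediction_for_sue(ctx: Context) -> bool:
    """A rule for Sue specifically: she watches if and only if it is
    Sunday evening and she is at home (hypothetical conditions)."""
    return ctx.day == "Sunday" and ctx.at_home and 19 <= ctx.hour <= 20

print(probabilistic_prediction("female", "55+"))                   # 0.85
print(contingent_prediction_for_sue(Context("Sunday", True, 19)))  # True
```

The point of the contrast: the first function can only hedge with a probability, while the second names the exact conditions under which the behavior occurs, which is what makes it both more useful and more dangerous.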
Commercially driven technological development will be a principal driver of this future landscape—but so will the relentless curiosity of human beings to understand one another and themselves. The financial returns realized when machine-learning techniques are applied to the prediction of individual behavior will accelerate the technology far beyond what was formerly seen as possible. Cheaper storage, faster hardware, more efficient processing, and advances in simulation and cognitive processing—along with business models and financing—will together accelerate progress. The availability of low-cost baseline predictive analytic infrastructure (the most profitable service from Amazon’s cloud in 2018?) will free up researchers to focus their time and effort on developing and testing much more elaborate prediction models. The concept of “big data” will evolve toward rich data, wide data, and then dynamic data. Software will improve to better deal with data types along various spectra, including modality, granularity, and temporality. New methods of coding the validity of predictions will become instrumental in improving feedback and learning time. These positive feedback loops will allow models to improve significantly faster than expected. Even some of the more audacious projections for 2020 might be exceeded by 2018.
Businesses, governments, educational institutions, and others will continue to promise extraordinary benefits to those willing to grant greater access to their personal information. Surprisingly, many individuals won’t need much convincing. Those not swayed by benefit-cost or benefit-risk calculations about sharing data will be so fascinated by the promise of understanding their own behavioral mysteries that they will be almost unable to resist. No one will have to force the next-generation Fitbits and dry EEG devices and their associated algorithms onto users; users will put them on themselves because they want the results.
By 2020, it will no longer be interesting to categorize an individual as a member of a population class or offer probabilistic assessments of what he or she will do. Instead, the new class of predictive analytics will look at the deep foundation of an individual’s decision-making and behavior. As long as data collection is essentially unrestricted and demand for predictability continues to skyrocket, the energy behind this trend will remain extremely strong. Competitive pressure to identify new streams of data will keep building to the point where the marginal returns might start to decline, but who knows where that line is?
In this world, predictive models will play an increasingly significant role in day-to-day life, whether they are used to route global air traffic, choose products for display, or calculate when and where to deploy troops. Weight-loss companies will be able to make precision diet and behavior recommendations based on predictions about when clients will have cravings. Companies will be able to correctly forecast the total sales that would be generated from the European rollout of a new product. In 2020, will CVS Health begin prefilling people’s shopping carts, provoking complaints from competitors as it prefills the carts with store-brand products?
How can companies achieve such gains so quickly? For individuals, the core of this predictive model will involve the development of “personal behavior files” (what will become the successor to the “customer information file,” or corporate file containing demographic and use data about each individual customer). Personal behavior files will contain detailed information about an individual’s past behaviors, including situational information that will help companies understand when and under what circumstances they have acted in the past. The development of such a file may start quite early in life. For instance, parents concerned with tracking the progress of their young children’s development might actively support the use of devices that record play behavior and derive patterns related to stress, competition, and the like.
There is an irony in all this. As the ability to predict individual choices and behaviors improves, the ability to predict group behavior will become both less useful and less accurate. Many existing group modeling efforts will feel clunky and become obsolete. Moreover, aggregating individual predictions into group predictions may prove even harder and less accurate than the old approach of disaggregating downward from groups to individuals. Small mistakes (whether in algorithms or in data) spread across many individuals would scale up into potentially big misses at the group level. Whereas today we are generally better at predicting group behavior than individual behavior, in this scenario the opposite will be true.
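The scaling problem can be made concrete with a toy simulation. This is a sketch with entirely hypothetical numbers: each individual model is biased by a single percentage point, which is imperceptible for any one person but compounds into a miss of roughly a thousand people once 100,000 individual predictions are summed.

```python
import random

random.seed(0)  # deterministic toy run

n = 100_000        # number of individuals
true_rate = 0.50   # each person truly acts with probability 0.50
model_rate = 0.51  # each individual model is off by one point

# What actually happens across the population.
actual = sum(random.random() < true_rate for _ in range(n))

# Summing 100,000 individual predictions inherits every small bias.
predicted = n * model_rate

print(round(predicted - actual))  # a gap of roughly a thousand people
```

A one-point error never changes what we expect of a single person, yet the aggregate forecast misstates the group count by an amount far larger than ordinary sampling noise, which is exactly why stitching individual predictions into group predictions is harder than it looks.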
On their own, constrained and contingent individual prediction models will not necessarily revolutionize our way of life. The real discontinuity will be in the meta-models: identifying what aspects of an individual’s behavior are predictable and knowing how to use those anchors to contextualize and bound predictions about the rest of an individual’s behavior. If these models are not operating effectively in 2020, the possibility will be visible not far over the horizon. With viable models of this kind, individuals could instigate radical adjustments to their behavior through micro-informational interventions and nudges. Put simply, it could become possible to influence a wide variety of individual behaviors by working through a manageable number of key motivational levers. For some people, the key motivation might be status, power, or money; for others, it might be a spiritual goal or generosity.
Once a person’s principal lever is known, the threshold for influencing what he or she does next could be surprisingly low—and this would be just as applicable to illicit and illegal activities as to legal ones. For both attackers and defenders in the cybersecurity world, attention would shift decisively from infrastructure to people. It is common to hear in 2016 that “people are the weakest link in security.” In this scenario, that statement becomes a fundamental truth in new and profound ways.
The revolutionary idea that defines the boundary between modern times and the past is the mastery of risk: the notion that the future is more than a whim of the gods and that men and women are not passive before nature. Until human beings discovered a way across that boundary, the future was the mirror of the past or the murky domain of oracles and soothsayers who held a monopoly over knowledge of anticipated events.
– Peter L. Bernstein,
“Against the Gods: The Remarkable Story of Risk”
Throughout history, the ability to model, quantify, and subsequently put a price on new categories of risk has transformed uncertainty into an actionable equation and repeatedly catalyzed the remaking of economics, politics, and technology. As we approach 2020, the ability to model, quantify, and price the risk attached to granular actions of individuals—to shine light onto what used to be unknowable at useful scale—will become an essential part of the way the world works, and significantly change the cybersecurity landscape as a result.
The shift from statistical representations of group behavior to individualized predictions will become a major driver of change. The privacy calculations that people make in 2016 when it comes to their Fitbits, smartphones, and connected cars will seem anachronistic, because what you get in return for your data in 2020 will be a new set of insights about yourself that are—as Arthur C. Clarke once said of sufficiently advanced technologies— barely distinguishable from magic.3 A subset of the population will continue to use Tor and other dark web tools to preserve their anonymity4 or seek to obfuscate data from search engine queries.5 But this subset will operate on the margins.
Much like credit scores today,6 the “answers” that prediction systems provide will appear to emerge from a black box. Only a select number of technical experts will have the sophistication to dissect the new algorithms, the vast majority of which will be neither public nor well understood. Few people outside of specialist firms will comprehend how these algorithms target individual, not group, behavior, or grasp the full significance of that change. For most people, what will be salient are the tangible benefits these algorithms bring.
Consider the example of predictive policing: if predicting individual criminals significantly reduces crime in dangerous cities, the average member of the public will be unlikely to object, even if there is limited transparency about how this new data shapes policing practices. Would theoretical and philosophical objections to predictive policing put forward by academics and other critics gain any traction with the public? Perhaps in a few European countries with powerful resistance to police intervention, like Germany. Much less so in places like France and Spain that are historically more comfortable with police autonomy. Almost certainly not in the small, rich autocracies of the Gulf and semi-democratic states like Singapore.
In the United States, the baseline response will be ambivalence. US firms will lead many of the technological and commercial developments that enable predictive policing, but occasional media exposés will constrain just how far local governments go. At the same time, surprising success stories will emerge from “broken” cities that seemed resistant to other means of stopping devastating cycles of crime. The NYPD may be an early leader due to its distinctive license to operate given the perceived risk of terrorism. Overall, the trajectory would point toward greater acceptance of such practices. Algorithm-driven policing would also likely be perceived as more fair than traditional practices, which are visibly subject to racial and other biases. A small number of type 1 errors (false positives) will get outsized attention, but that attention will not be enough to change overall sentiment.
In the commercial sector, many companies will find great utility in this new reality, which will lead to a virtuous cycle as they invest in building software and acquiring data to further improve individualized predictions. The temptation and competitive pressure to participate in this new frontier would be almost irresistible. Data science teams might eventually split into data and prediction teams, with the latter adding neuroscientists, cognitive scientists, simulation specialists, game theorists, and even symbolic logicians and philosophers of science to their rosters. Companies that have long been repositories for thus-far unused datasets would see untapped potential in developing analytic capabilities—and the hiring of in-house analysts would explode.
At the same time, this transition will be tumultuous and difficult. Like the development of web technologies in the 1990s, this new shift will involve not just incremental improvement to existing processes but also the institutionalization of new technologies that reshape terms of competition in many markets. Incumbent-firm advantage will be upended as new firms gain a significant competitive lead in developing and applying predictions to individual customers, clients, and citizens.
These developments likely would coincide with a continued slowing in economic growth rates, not only because of ongoing secular economic stagnation and financial crisis recovery, but also because of the new challenges of operating in this highly granular customer- and employee-segmented world. Consider how firms focused on optimizing business models and applications for large populations will have to transition. In some sectors—public transport, for example—insight into the granularity of individual behaviors will yield significant benefits over population-based predictions. Large firms that were focused on group prediction may have a difficult time switching tactics, such that smaller, local providers are able to assert market power. Would most automobile companies be able to navigate this transition quickly enough, or would they become commodity providers in a transportation market now dominated by upstart prediction firms (perhaps next-generation Ubers and Lyfts) that know, with a high degree of certainty, precisely where and when a person wants to travel from point A to point B? In a sector like education, the ability to create truly customized and individualized curricula and learning systems would run up against longstanding business models, industry structures, and huge incumbent institutions. The market will favor the upstarts because they perform so much better, but the friction will be tremendous.
The geopolitics of this scenario will also present challenges, as next-generation predictive analytics will plausibly be seen as the next major source of power in global political economy and security systems. If prediction technologies evolve quickly along positive feedback loops, then this scenario would most likely reinforce the power of those who start in the lead, implying a new phase of American hegemony. This in turn would engender resistance, such as internet “balkanization” and data nationalism, not so much as an ideological trend or as resistance to surveillance but as a core part of national power strategies aimed at countering US dominance.
Organizations public and private will vary in their ability to keep up in the fierce race for predictive scope and accuracy, spawning a new competitive dynamic between “super-smart” predictive processing and “brute-force” data collection. Put differently, organizations that are particularly strong on the algorithmic side will have somewhat less need for data, while organizations that are relatively weak on the algorithmic side will try to compensate by collecting more data in potentially more sensitive ways. If privacy intrusions or failures of data security occur, it would then be the algorithmically weak that are more likely to be the transgressors and victims of attack. Traditional goods-producing companies—such as oil companies and TV manufacturers—will likely be in the latter category.
Markets for Predictive Activity
In a world in which algorithms capably predict individual behavior and organizations race to harness that power, cybersecurity will become a segmented enterprise—largely because different realms of human action will not be equally susceptible to predictive algorithms. By 2020, the landscape will divide into three broad sectors, or areas of activity and decision-making, distinguished by the efficacy of predictive models: the strong prediction sector, the throttled (or regulated) prediction sector, and the predictionless sector. The different vulnerabilities that arise within each sector and at the boundaries between them will give rise to an important new cybersecurity agenda for 2020.
Strong Prediction Sector
In this sector, predictions will be highly accurate (well calibrated and discriminating) and reliably available (covering a broad swath of behaviors). This sector will likely include a range of human activity where data is accessible, accurate predictions are monetizable and/or have high significance for governments, and environmental and in-subject randomness is limited. The most powerful and reliable predictive models will develop in areas where all three variables are present, but strong predictions will also emerge where some combination of these variables holds. The private sector will drive developments in this sector most boldly, using “personal behavior files” to help track individual experiences and make predictions based on those experiences.
Healthcare likely falls in the strong prediction sector, as data will be accessible, monetizable, and non-random. Both demanders (patients) and suppliers will see vast promise in what used to be called personalized or targeted medicine—what will now be called (more accurately) predictive medicine. The financial incentives to do more with what today’s healthcare companies call “real world data”7 will continue to mount as insurers and regulators push providers to practice metrics-driven medicine and improve performance on discrete measures, such as hospital readmittance rates. The consolidation of health insurers (driven in part by the Affordable Care Act8) will help aggregate customer data at an even larger scale and provide significant revenue streams to fund further applications of prediction-based technology. An aging population in developed countries will contribute on the patient side; baby boomers will see a vast gap between how poorly they are served in the healthcare sector and just about every other sector they touch and are touched by. This generation could very well drive this process forward—to the surprise of anyone expecting higher levels of concern about privacy.
When hospitals are able to reliably complete simple tasks like identifying appropriate individualized plans for each patient being discharged—along with administering programs designed to adjust each patient’s behavior through predictive algorithms—the concept of predictive medicine will become real to patients. Importantly, these advances will not be reliant on breakthroughs in genetically personalized medicine; it does not have to be quite so high-science to be effective. Rather, it will be easier to modify at-risk behaviors and develop individually appropriate interventions with well-predicted outcomes that touch on health variables like diet, medication compliance, and social support.
Ultimately, healthcare may become a kind of proof point where the movement toward individualized targeting works visibly to the benefit of sick people, who get better more frequently and more quickly than they have come to expect. The proven benefits would then spread quickly to other markets.
The workplace is another area where all three variables will align for strong prediction. Here, employment contracts, rather than personal trust, grant employers access to data. Companies in 2016 already collect significant data on employees in the name of corporate efficiency; a high-tech office building in Amsterdam will find you the “right” desk and set the room “atmosphere” to your liking.9 In 2020, enterprises will have moved to entirely new realms of data collection and algorithm investment to predict how employees will behave and perform in the workplace. Firms are likely to redesign workflows, both manual and cognitive, to increase the amount of data available to their prediction models, decrease the amount of environmental randomness, and thus build, act on, and benefit from a range of prediction models on employee productivity. The debate in 2020 will be between companies that use these new insights to help employees succeed and those that are seen as using these insights to weed out and punish—proactively in some cases—less productive workers.10
Such changes are likely to accelerate and expand what in 2016 is already a historic debate about labor markets, automation, and inequality, paralleled only by the fights over the rise of labor unions at the turn of the 20th century. Predictions about employee behavior could become the nexus for new problems, leading to calls for stronger social safety nets of a different kind. Some locales may adopt nascent models of prediction-supported employment insurance, while workers’ labor cooperatives may take as their primary objective the “breaking” of such models. Will European labor unions take up corporate data collection as their next big point of advocacy?
Meanwhile, many governments will struggle with the adoption of strong predictive technologies. Democratic governments in particular will be constrained by the tangle of existing privacy laws and practices and would likely fall behind compared to the private sector. This could become another front in the outsource-privatization debate; with regard to public-private service delivery—for instance, roads, tolls, and other traffic management—private-sector providers would soon have an unbeatable advantage. Governments may opt to outsource their data and algorithms to the private sector as the path of least resistance to better performance.
Structural tensions also would likely begin to emerge between democratic and non-democratic governments in the strong prediction sector. If the latter cast aside reservations about the new prediction models and use them as a tool for governance, these governments’ overall performance could improve in surprising ways. Apart from the political-philosophical arguments this would engender (“Is this the coming golden era for algorithmic-authoritarian rule?”), it will also present difficulties for trade negotiations, as those countries most willing to use predictive technologies will have structural competitive advantages. Would “non-predictive” economies in 2020 need special dispensations and restrictions, the way “non-market” economies did in the early 21st-century days of the World Trade Organization?
The security dynamic in the strong prediction sector will depend in part on how people respond— in emotional and political-economic terms—to the accuracy of the models and what follows from their predictive capacity. Users will likely find significant value in having increased certainty about decision-making regarding complex and frustrating everyday choices—the effectiveness of a new diet, a workout regimen, a course of study, or personal safety precautions. At the same time, if the surplus generated from these developments is seen to benefit mainly capital and big institutions, then the very accuracy and success of the strong prediction sector could easily become its Achilles’ heel by making it the preferred target for disruptive attacks.
Consider what it would mean to steal someone’s “personal behavior file”—a very lucrative proposition, particularly if the criminal can mine from that file predictions that are not already known to the “legal” market players, or even the actual person behind the file. The simplest spear-phishing attacks could become predictably successful if attackers knew what types of emails a victim is most likely to click on, at what time of day, even as the race against defensive counter-predictions ratchets up.
Vast, quick-profit possibilities here would create a very attractive and highly compensated market for data scientists in the illicit world. Consider the elegance of an integrity attack that introduces a minuscule “bad” argument into an algorithm so that the user of the algorithm receives predictions that completely fail in practice. This could have catastrophic results for the targets. But it would be scientifically fascinating for data scientists to test— particularly for “insider” attacks that might blur the boundaries between what is criminal and what is simply pushing the envelope of scientific research.
Throttled Prediction Sector
In contrast to the (largely unregulated) strong prediction sector, this sector will include industries where government regulations impose more limits on the use of data and predictive models, in order to both manage public expectations and protect against security intrusions. This kind of regulation is likely to develop first in areas where the legitimacy of regulatory action is already established. It is also likely to develop in areas seen as essential to national security, such as defense and intelligence.
Regulations will evolve in order to respond to concrete demonstrations of what in 2016 is referred to as “algorithmic bias” across a variety of sectors, from housing to insurance to education. Arguments about whether human decision-making is more (or less) biased than prediction models will continue to no firm conclusion, and these arguments will create space for policy and regulatory arbitrage, where actors take advantage of differences in regulatory regimes between markets. Copying what Uber did so successfully in the first half of the decade, some companies will defy regulations and legal precedent as they make use of data and algorithms in “throttled” domains, relying on the political power of constituents who desperately want the benefits of the products their algorithms make possible to hold back courts and regulatory authorities. Others will try more subtle approaches, making small changes to processes and defending against possible legal action only as necessary.
A somewhat peculiar trend in the throttled sector is likely to develop in areas where transparency is already quite high: regulations that seek to limit transparency. Consider public equity markets, where regulation historically has sought to force transparency in order to prevent fraud and other forms of market dysfunction. How would regulations in 2020 maintain equilibrium in the face of massive economic incentives pushing global financial institutions to out-predict competitors’ investment algorithms? One (ironic) way to do it might be to limit what kinds of information firms reveal about themselves.
Regulations could also aim to influence the strength of algorithms directly. But this approach will likely lead to other types of regulatory arbitrage where firms hedge their bets by operating in multiple markets. For instance, if governments were to restrict banks from considering certain variables when providing home loans, banks might use that restricted information to make decisions about whether to fund business loans. These kinds of moves will add fuel to the debate about the appropriate role of government regulators, and even whether it is possible to sustain a throttled prediction segment at all.11
The security dynamic in this sector would revolve around a game of complexity management. The highly variegated regulatory environment would, in practice, present an attack surface filled with pockets of vulnerability that are fine-grained and specific. Large-scale attacks may be somewhat more difficult in this environment, but smaller-scale attacks could be much more interesting to invent and harder to detect. The larger, better-funded, and more scientifically sophisticated states and criminals will have an outsized advantage in this world: the capacity to identify and understand arbitrage possibilities will be hard to achieve yet extremely lucrative.
In places where parastatal attackers dominate (China, Russia, possibly Iran), it will likely be the case that the best capabilities are found in large, semi-state-owned enterprises that further blur the lines between military and commercial cyberattacks. For Western governments that would prefer to sustain clear lines between commerce and intelligence, between strategic and corporate espionage, and between civilian and military operations, this blending and blurring will not be a good thing—but how can it be stopped?
Finally, the predictionless sector will include industries and institutions where data is limited and/or environmental randomness is high, as well as those where the ability to monetize predictive technology is less obvious. It may turn out that human decision-making and behavior in particular realms are predominantly random and simply cannot be predicted. It may just as well turn out that decisions are not yet predictable in 2020 using existing mechanisms, either because the relevant data points have not been identified yet or because they cannot be accurately measured. Some types of behavior will fall out of this sector surprisingly fast, while other realms will remain stubbornly resistant to prediction (for example, the results of a competitive team sports event on “any given Sunday”). But whatever does make up this sector at any given moment will have a unique feel to it: when much of human activity can be predicted, the pursuit of what cannot be predicted becomes a sign of privilege, daring, or both. The most ambitious criminal enterprises—not to mention risk-tolerant investment vehicles—will prefer to operate within the predictionless sector. Some may operate in this area because their actions more easily remain hidden; others will do it to capitalize on asymmetric information advantages in this space.
The cybersecurity attack dynamics in this sector will be distinctive, because they will focus on a next-generation approach to the strategic manipulation of uncertainty and doubt. Attackers might send deceptive signals about breakthroughs in prediction modeling in order to destabilize others’ strategies in (ironically) predictable ways. They might also focus their efforts on small manipulations of data, since the inability to predict makes it unlikely that such small manipulations would be identified. For example, without a reliable model of how people set the temperature in their homes, an attacker could raise the set point on a million connected thermostats by a tenth of a degree without much risk of the data manipulation being caught. Attackers might further aim to introduce noise and randomness in order to foil emerging prediction models that threaten to destabilize their strategies. They might also try to shift the predictive power of targets of interest from the strong prediction sector into the predictionless sector by finding ways to deny access to the data that the models require.
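The intuition behind the thermostat example can be sketched with a toy simulation (every number here is an illustrative assumption, not a figure from this report): a defender with no behavioral model of its users can only flag setpoint changes that exceed some crude threshold, and a 0.1-degree shift sits far below normal day-to-day variation, so the detector's alert rate barely moves when the attack is present.

```python
import random

random.seed(7)

N = 100_000            # hypothetical fleet of connected thermostats
ATTACK_SHIFT = 0.1     # attacker's tiny setpoint manipulation (degrees)
DAILY_NOISE = 0.8      # assumed std. dev. of normal day-to-day changes
ALERT_THRESHOLD = 2.0  # a model-free detector flags |change| above this

# Each device's natural overnight drift in its setpoint.
natural_drift = [random.gauss(0, DAILY_NOISE) for _ in range(N)]

def flag_count(changes):
    """Count devices a crude threshold detector would flag."""
    return sum(1 for c in changes if abs(c) > ALERT_THRESHOLD)

# Alert volume on a normal day vs. a day with the attack injected.
flagged_clean = flag_count(natural_drift)
flagged_attacked = flag_count(c + ATTACK_SHIFT for c in natural_drift)

print(f"flagged without attack: {flagged_clean} of {N}")
print(f"flagged with attack:    {flagged_attacked} of {N}")
```

Under these assumptions the two alert counts are nearly identical, so the fleet-wide manipulation hides inside ordinary noise. A defender with an accurate per-person prediction model could instead test each reading against an expected value, shrinking the detectable shift dramatically—which is exactly why the text treats prediction capability itself as the security battleground.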
Cybersecurity Uncertainties and Challenges
In this world, human behavior will become the key to cybersecurity. While organizations will have much better information about the wants and needs of individual people, the very fine granularity of that knowledge will make it challenging to achieve economies of scale. How, for example, does one build a platform for a political party in mass-movement democratic politics when all the micro-differences among people’s desired policies are plain to see?
Criminal enterprises will face similar challenges as they too look for new sources of effective scale in their attack strategies. One approach would be to seek to identify and gain access to a small number of very important people in a particular setting—the CEO or the president, the prime minister or the five-star general. This (ironically) might mean a decline in very large-scale data theft: why bother with all those “weeds” when you can invest your resources much more efficiently in tending the few “roses” that can get you what you want? It might also lead to a segmentation of cybercriminals, with those who cannot play in the top-notch prediction attack game (in other words, those without the expertise to write or manage complicated algorithms) remaining focused on stealing data. Large, rich, scientifically sophisticated state actors are more likely to land in the prediction-attack camp.
In this world, private corporations will be out ahead of government agencies and regulators (at least in democratic governments) in managing the segmented prediction system. Companies will have stronger incentives and fewer constraints on the use of predictive algorithms, as well as greater freedom to experiment with what can be achieved when the algorithms are throttled or fail. As a result of these incentives—and the value that the illicit economy will place on undermining them—new kinds of security mechanisms will likely be developed that operate across the three sectors. Industry watchdogs—independently funded or in some cases owned and funded by industry consortia—would be used to validate claims of prediction quality, perhaps through a kind of escrow-based access to the underlying algorithms and datasets. Some governments might also create or “charter” third-party validators or industry self-regulatory bodies in order to gain insight and some oversight at the margins. Either way, firms that underperform and cannot predict to standard will be pushed out of markets rather quickly, which will of course increase the stakes for a successful attack that could quickly bring down a competitor.
It is nearly certain that prediction technologies will quickly find their way into direct military applications as national armies push the boundaries of human performance in conflict. They will also be intensively investigated and in some cases used by intelligence agencies. It has long been the stuff of spy fiction to know enough about particular individuals that recruitment, counter-intelligence, disinformation, and manipulation become an extremely precise and targeted science. The most advanced intelligence agencies might not believe it fully possible, but they will boldly experiment nonetheless—if for no other reason than to assess the breakout possibilities open to other, less scrupulous intelligence agencies that might not be willing to play by any set of rules. Might government security agencies even seek to limit the export of algorithms, machines, and people that bolster these capabilities? Surely some governments will try, creating seductive opportunities for “smugglers” and embargo breakers to earn outsized profits in a new kind of cyberpiracy.
The words “surveillance” and “privacy” would come to have quite different operational meanings in this world. When firms and governments can predict what people will do, it will become less necessary to surveil them in a conventional sense. The better the prediction model, the lower the data requirements, and the less the (familiar forms of) intrusion on privacy. Some states—the UAE and Singapore, perhaps—will wholeheartedly invest in what might then be called predictive surveillance, taking advantage of this new equation to reduce visibly intrusive data collection. Might London—probably the world’s most surveilled city—follow?
These are some of the questions that will emerge if privacy continues to mean basically what it does in 2016. But what if the “privacy” agenda is forced up a level of abstraction toward profound issues of human autonomy and freedom from coercion, as might occur across much of Europe? At least part of the cybersecurity agenda would then shift toward system-wide government throttling: imposing constraints on what can be done with prediction models as well as deterring illicit actors who, for monetary or ideological reasons, would seek to break those constraints. In some areas of behavior, the confidentiality of predictions per se—even more so than the underlying prediction models—would need to be protected. But the integrity of data-driven models would be complicated to assess and defend. What obfuscations should or would be considered “attacks” on integrity? Would authorities in some jurisdictions call for anti-circumvention laws that mirror what the US’s Digital Millennium Copyright Act did for copyright protection?12
Ultimately, an operational notion of cybersecurity in this world would need to account for the (possibly monopolistic or at least anticompetitive) power that could be generated by firms with far-reaching prediction models, particularly those subject to positive feedback learning effects. Governments will be less concerned about the dominance of advertising markets than about the de facto ownership of markets for aspects of human life.
The Way Forward
In this scenario, the world shifts away from group-based data predictions toward individualized predictive models. Such a shift, which could go largely unnoticed (or be poorly understood) by the public, would occur as a result of improvements in data collection and interpretation. In some areas, predictions will become a significant driver of public life. In others, limitations in data or models—or regulations that inhibit their use—would restrain their impact. But in all cases, new vulnerabilities would arise from the power of predictive modeling, exploited both by malicious actors who socially engineer more targeted attacks and left open by governments that are ill-equipped to handle them.
In this scenario, members of the cybersecurity research community in 2020 will wish that in 2016 they had been looking at:
- Predictive Modeling: The trajectory of new kinds of security attack vectors resulting from predictive modeling, especially as such vectors displace basic hacking and other security vulnerabilities attracting disproportionate attention today
- Regulation: How predictive modeling can best be regulated, and what schemas of regulation (strict prohibition? licensing?) are likely to be most effective
- Risk Assessment: How human risk assessment operates in this increasingly automated world
- Optimization: How to determine whether this shift in predictive models might be approaching, and/or identify particular algorithms that use such approaches, in order to rein in dysfunctions that result from such models and/or spread the benefits of such models more broadly
Researchers in 2020—particularly in the social sciences, but really anyone using data science or advanced statistics—might also have hoped to foresee the ripple effects they could face when the modeling of human behavior shifts to focus attention on single individuals and their particular actions, rather than populations or groups that share characteristics.