The Modern Data Stack

May 14, 2021

Complex systems are best explained with apt analogies. No matter how complicated or unfamiliar a concept is, it will contain parallels to something more readily understood. Automated systems, our focus at Story Ventures, are no exception.

The modern data stack, meaning the hardware and software involved in data creation, organization, and analysis, is often seen as a series of intimidating technologies. But behind the layers of technical jargon, the data stack functions much like a library: it is merely the current iteration of a process by which people have gathered, organized, and consumed information for thousands of years. The main difference today is that computers have made us exceptionally good at it. The Library of Alexandria, founded more than two thousand years ago, represents an early form of information technology. Manuscripts were written, methodically stored and catalogued, and finally studied by scholars. This flow of information (collection, organization, analysis) has remained constant; we simply now possess the digital tools to carry it out at previously inconceivable levels of scale and efficiency.

The Story Ventures Thesis

History tends to ascribe its most pivotal moments to people: the story of how we arrived in the present moment is told with names and faces. Eisenhower and Churchill won World War II, Neil Armstrong was the first man on the moon, Steve Jobs started the smartphone era, and so on. But what of the technologies that enabled those feats? The mechanical computing that broke the Axis codes, the advancements in rocket propulsion that brought us into space, and the improvements in microchip processing are all just as responsible for these momentous events as the names to which we credit them. At Story, we believe that technological advancement, perhaps more so than any individual or group, is the key facilitator of societal and economic progress, and that new technologies are capable of reshaping and improving both entire economies and individual lives.

Over the past few decades, a number of improvements to the modern data stack have propelled us into an era of enhanced decision-making. Specifically, the quality of sensors has increased, the cost of hardware has decreased, the systems that facilitate data movement have improved, and the algorithms applied to this data have advanced dramatically. Our investment thesis is tied directly to these developments: (i) the creation of data via sensory systems, (ii) the data infrastructure that processes and organizes data, and (iii) the intelligent software that analyzes and derives insights from these datasets.

The application of this thesis is determined by the data maturity of different industries. For sectors like financial services that have long utilized digital data, most of the current opportunities lie in the application and analysis of this data. For sectors that generate disparate and siloed datasets, like automotive, aggregation of this data provides particularly compelling upside. Finally, for industries that have historically not digitized data, like agriculture or energy, much of the potential lies in data creation and capture.

Layer 1: Data Creation

Creation is the layer of the data stack at which information is captured by digital and physical systems. Continuing the library analogy, this is the step at which the books are written. Today, a principal driver of activity at this layer is the proliferation of high-quality, low-cost sensors that translate real-world information into digital data. These sensors include the cameras and light detection systems that enable autonomous vehicles to “see”, the microphones that allow smart speakers to “hear”, and the GPS receivers that enable ride-sharing companies to “locate”. Smart devices constantly monitor our world, and the data they capture has become an integral piece of the broader computational stack.

The decrease in the price of these sensors, estimated at up to 70% since 2004, has enabled engineers to include them in everyday products like cell phones and speakers in order to make them “smart.” Much as a hand detects a hot stovetop and signals the brain to pull it away from the heat, everyday products can now sense their surroundings and transmit that information back to a computer.

The evolution of LiDAR exemplifies the overall improvement in sensory technology. LiDAR stands for Light Detection and Ranging; it is a sensing method that uses lasers to measure distances and create 3-D maps of objects and areas. The early LiDAR scanners of the 1970s were large, expensive, and reserved for government use. The initial applications were to make detailed maps of clouds, the earth’s surface, and even the Moon. Today, LiDAR scanners come standard in the iPhone 12 Pro models and are a linchpin of the self-driving car: they are smaller than the phone’s camera lens and are becoming affordable enough to power the future of autonomous vehicles. LiDAR’s journey from a prohibitively expensive, sci-fi-like technology to a scanner found in the pockets of many iPhone users represents the progress of the sensor market as a whole.
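To make "uses lasers to measure distances" concrete, here is a minimal Python sketch of pulsed time-of-flight ranging, one common way LiDAR units estimate distance. It is an illustration under stated assumptions, not a description of any particular scanner: the function names and the azimuth/elevation convention are hypothetical, and real devices also correct for noise, timing jitter, and atmospheric effects.

```python
import math

# Speed of light in a vacuum, in meters per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in meters for a pulse's round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def point_from_return(round_trip_seconds: float,
                      azimuth_deg: float,
                      elevation_deg: float) -> tuple[float, float, float]:
    """Convert one laser return into an (x, y, z) point in meters.

    Assumed convention (hypothetical): azimuth is measured in the horizontal
    plane from the x-axis, elevation is measured up from that plane.
    """
    r = distance_from_round_trip(round_trip_seconds)
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = r * math.cos(el) * math.cos(az)
    y = r * math.cos(el) * math.sin(az)
    z = r * math.sin(el)
    return (x, y, z)
```

For a sense of scale, a pulse that returns after 2 microseconds corresponds to a target roughly 300 meters away. Sweeping the beam across many angles and collecting one point per return is what builds up the 3-D map.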

The second principal driver of data creation is the Internet, which has developed at an astonishing rate. For most of history we were limited to hand-written records, fraught with error and restricted by the capacity of the author. Today, data creation is democratized, and ubiquitous Internet usage has created a trove of data so large that its size is difficult to comprehend: every day, 306.4 billion emails are sent, Google processes over 3.5 billion searches, 500 million tweets are posted, and 350 million photos are uploaded to Facebook. Every single Internet search, post, and like creates new data, and it is being generated so rapidly that, by one widely cited estimate, 90% of the world’s data was created in the last two years.

Layer 2: Processing & Organization of Data

The rise of affordable sensors and the Internet has left the world awash in data, but most of it lacks a common structure and organizing principle, making it largely unusable. Data today comes from siloed sources, often encoded in different languages and formats. Imagine trying to find a book in a library with no sections or organizational system. Fortunately, libraries use a decimal classification system, and industries such as finance and healthcare are beginning to promulgate standards for similarly consistent data storage (e.g. Open Banking and FHIR). However, most companies do not have the data science or engineering resources to refine this information for themselves. As a result, companies such as Plaid are leading a burgeoning wave of data aggregators and normalizers that provide access to cleansed data through APIs. In the Story portfolio, we have invested in companies such as Motorq and Particle Health that operate at this middle layer. We view these systems, which allow businesses to effortlessly discover, access, and share normalized data, as a pillar of the next generation of software applications.
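As a rough illustration of what "normalizing" siloed data involves, the Python sketch below maps records from two differently formatted sources into one shared schema. Everything in it is hypothetical: the two source formats, the field names, and the `Transaction` schema are invented for the example and do not reflect the API of Plaid or any other company mentioned here.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical common schema shared by every downstream consumer."""
    amount_cents: int
    posted_on: date
    description: str


def from_source_a(raw: dict) -> Transaction:
    """Source A (assumed): integer cents and a UTC timestamp string."""
    return Transaction(
        amount_cents=int(raw["amt_cents"]),
        posted_on=datetime.strptime(raw["ts"], "%Y-%m-%dT%H:%M:%SZ").date(),
        description=raw["desc"].strip(),
    )


def from_source_b(raw: dict) -> Transaction:
    """Source B (assumed): decimal-string dollars and a US-style date."""
    # Decimal avoids the rounding surprises of binary floats for money.
    cents = int((Decimal(raw["amount"]) * 100).to_integral_value())
    return Transaction(
        amount_cents=cents,
        posted_on=datetime.strptime(raw["date"], "%m/%d/%Y").date(),
        description=raw["memo"].strip(),
    )


# The same purchase, as each source would report it.
record_a = {"amt_cents": 1250, "ts": "2021-05-14T10:00:00Z", "desc": "Coffee "}
record_b = {"amount": "12.50", "date": "05/14/2021", "memo": " Coffee"}

normalized = [from_source_a(record_a), from_source_b(record_b)]
```

Once both records pass through their adapters they compare equal, which is the point of the middle layer: applications built on top only ever need to understand one format.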

The story of Internet data exemplifies broader advances in data organization. When the Internet was first developed, data was scattered across a massive, fragmented, virtual environment. People often compare the early Internet to an underground oil deposit: a dormant store of resources worth trillions of dollars, waiting for people to realize its value and build the infrastructure to refine it. As the amount of data grew and its value was recognized, the appropriate infrastructure was developed. Specifically, companies like Google and Amazon created the rails to efficiently search through a giant corpus of data.

Layer 3: Data Analysis

Our hypothetical library is now fully stocked with books that are stored in an organized and retrievable manner, so people can walk in and easily access the information they want. The final step is to learn from these books and apply what we learn to create real-world value. After all, vast stores of digital data are useless without the software to analyze them, just as a book is useless without a scholar able to read it.

Data analysis is the alchemy of the twenty-first century, but instead of turning lead into gold, machine intelligence turns strings of zeros and ones into revenue. Think of Spotify making a playlist recommendation, Gmail suggesting the last words in a sentence, or Apple auto-correcting your texts. The combination of structured data and advancements at the algorithmic level has created ideal conditions for businesses to automate processes, improve customer experiences, and increase the capacity for informed decision-making.

Among the most encouraging developments at this layer are advances in artificial intelligence (“AI”), specifically machine learning (“ML”). Siri, Alexa, and customer service chat-bots are some of the better-known applications of AI/ML, but the technology has far broader commercial purposes. Insurance companies are using AI to analyze coverage plans, lenders are using it to determine credit approval, and manufacturers are using it to reduce downtime and predict failures. On its own, AI/ML does an excellent job of replacing low-level analytical work, and when combined with the expertise of skilled data scientists, it can yield rich and complex findings. Overall, commercial applications of AI allow businesses to operate at levels of efficiency that were previously impossible, and they empower business leaders to make key decisions with increased clarity.

Wrapping Up

Despite the underlying complexities of the modern data stack, its essence remains the familiar three-step information process that people have employed for thousands of years. This time around, however, the process is far more useful because it is no longer bound by human limitations. The hardware and software technologies now augmenting our capabilities are enabling fascinating new economic and societal achievements. The library remains; it has just moved to the cloud.

Both now and going forward, the capture, cleansing, and packaging of data into information that fuels insights will be the most critical components in driving improvements to enterprise and state. No future scenario is certain, but all future scenarios will certainly be shaped by data.

Written by Story Ventures