To Keep or Not to Keep... Data for Self-Driving Vehicles?

Wednesday, May 17, 2023

Data maximization is the idea that the more data is collected, the smarter an autonomous system can get. Autonomy companies have long taken this approach, but that appears to be changing.

Because of wide-ranging cost-cutting and the high price of data servers, companies such as Cruise and Waymo are no longer maximizing data collection; instead they select and hold only the data believed to be specifically useful. In 2015, by contrast, Chris Urmson said, “We could take all the data the cars have seen over time, the hundreds of thousands of pedestrians, cyclists, and vehicles, and take from that a model of how we expect them to move.” At that time autonomous vehicle prototypes were relatively few, and the companies testing them could afford to keep every data point. Now, nearly a decade later, growing fleets, fancier sensors, and tighter budgets are forcing autonomy companies to be pickier about which data to store on their servers. Amazon Web Services, for example, charges about 2 cents per gigabyte per month for its S3 cloud storage service. In 2016 Intel estimated that each autonomous vehicle would generate 4,000 gigabytes of data per day; a year of data at that rate would cost about $350,000 to store for a year at Amazon’s current prices.
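The $350,000 figure can be checked with a short calculation. The sketch below is only a back-of-the-envelope reading of the numbers quoted above: it assumes that one vehicle's full year of data (365 days at 4,000 gigabytes per day) is held for 12 months at the flat S3 rate, and it ignores tiered pricing, request fees, and transfer costs. The function and constant names are illustrative, not taken from any source.

```python
# Figures quoted in the article; the interpretation (one year of data,
# all of it billed for 12 months at a flat rate) is an assumption.
GB_PER_VEHICLE_PER_DAY = 4_000   # Intel's 2016 estimate
S3_PRICE_PER_GB_MONTH = 0.02     # about 2 cents per gigabyte per month
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12


def yearly_volume_gb(gb_per_day=GB_PER_VEHICLE_PER_DAY, days=DAYS_PER_YEAR):
    """Total gigabytes one vehicle generates in a year."""
    return gb_per_day * days


def storage_cost_usd(volume_gb, price_per_gb_month, months):
    """Cost of holding a fixed volume for a number of months."""
    return volume_gb * price_per_gb_month * months


volume = yearly_volume_gb()
cost = storage_cost_usd(volume, S3_PRICE_PER_GB_MONTH, MONTHS_PER_YEAR)
print(f"{volume:,} GB held for a year costs about ${cost:,.0f}")
```

Under these assumptions the total comes to roughly $350,400 per vehicle, which matches the quoted figure. A real bill would differ, since data arrives gradually over the year rather than all at once.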

Waymo’s Chatham, who oversees computing infrastructure, and his team have begun setting strict quotas for data storage. Discarding data might sound ridiculous in a tech industry where companies like Google and Meta collect everything with the aim of creating better-designed services and increasing ad revenue. Initially, self-driving car developers held a similar philosophy of data maximization, assuming it would produce smarter self-driving systems, says Brady Wang, who studies automotive technologies at market researcher Counterpoint. But, Wang points out, the approach didn’t always work because the volume and complexity of the data made it difficult to organize and understand.

Recently the focus has been on usefulness over quantity. For example, Cruise has said that 1 percent of the data it generated from driving in San Francisco contains useful information. Deletion isn't the only solution. Another option is moving data to “cold” storage, which costs as little as one-tenth of a cent per gigabyte per month at AWS. However, this data can only be accessed slowly, which limits its usefulness. AI software that can extract valuable data from compressed files could eventually help companies keep more logs without breaking the data bank, says Weisong Shi, a computer scientist at the University of Delaware.
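To get a feel for the trade-off between deleting data and moving it to cold storage, here is a rough sketch comparing monthly bills. It is an illustration under stated assumptions, not a description of how Cruise or Waymo manage their data: the 1,460,000-gigabyte volume is one year of output at 4,000 gigabytes per day, the 1 percent useful share is borrowed from Cruise's San Francisco figure, prices are the flat per-gigabyte rates mentioned in the text, and retrieval fees and minimum storage durations are ignored. All names are hypothetical.

```python
# Rates mentioned in the text, treated as flat per-gigabyte monthly prices.
HOT_PRICE_PER_GB_MONTH = 0.02    # standard storage, about 2 cents
COLD_PRICE_PER_GB_MONTH = 0.001  # cold storage, one-tenth of a cent


def monthly_cost_usd(volume_gb, useful_fraction, keep_rest_cold):
    """Monthly bill when the useful share stays in standard storage.

    The remainder is either moved to cold storage or deleted.
    """
    useful_gb = volume_gb * useful_fraction
    rest_gb = volume_gb - useful_gb
    cost = useful_gb * HOT_PRICE_PER_GB_MONTH
    if keep_rest_cold:
        cost += rest_gb * COLD_PRICE_PER_GB_MONTH
    return cost


volume_gb = 1_460_000  # assumed: one vehicle-year at 4,000 GB per day

keep_everything = monthly_cost_usd(volume_gb, 1.0, keep_rest_cold=False)
tiered = monthly_cost_usd(volume_gb, 0.01, keep_rest_cold=True)
delete_rest = monthly_cost_usd(volume_gb, 0.01, keep_rest_cold=False)

print(f"keep everything in standard storage: ${keep_everything:,.2f} per month")
print(f"1% standard, 99% cold:               ${tiered:,.2f} per month")
print(f"1% standard, delete the rest:        ${delete_rest:,.2f} per month")
```

With these numbers, keeping everything costs about $29,200 per month, the tiered option about $1,737, and deleting the rest about $292. The gap between the last two is the price of keeping the option to revisit old logs later.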

Personal Comment: 

The change in mindset from storing everything to storing selectively is important and will likely affect any company that relies on lots of data. As more solutions depend on large volumes of data, compressing data and reducing storage become more important. I also think it is important to note that there is another consideration beyond high storage prices, namely the sustainability of data centers. This is relevant and should be discussed.

As far as I can see, “cold” storage does not make much sense. Why keep hard-to-use data in storage instead of just removing it? Perhaps a way forward, rather than having multiple private storage centres with duplicated data, is to adopt public data storage. Or perhaps new AI models, such as foundation models, that focus on small data sets can reduce storage needs.

