
Ask a Data Ethicist: Where Does My AI “Chat” Data Go?

Sometimes I get questions about AI that are not really about AI. Recently, a few different people at various talks asked me the same question about chatbot interactions. These folks wanted to know…

When Chatting with a Chatbot, Where Does My Data Go?

I was surprised by this question, perhaps because I thought the answer was something we already understood. But this AI moment has a way of illuminating things we might not have previously paid attention to, so with that as our setup, let’s dig into the world of cloud computing.

A Little History

A long time ago, computers were really big and expensive. People would share access to a computer. There’s a story about one of the first computers in my neck of the woods, stationed at a military base in Suffield, Alberta. Researchers from the University of Alberta in Edmonton would make the five-hour drive (one way) to use this computer to help crunch data for their agricultural research project. 

By the early 1990s, we saw the rise of the personal computer and desktop software. Organizations everywhere put a computer on every desktop, each loaded with programs that ran locally on that computer. Large enterprise organizations networked or connected their computers and added more processing capacity through servers. The internet consolidated the idea of being networked at a bigger scale. At some point – around the mid-2010s – with the rise of things like browser-based software as a service, streaming media, and all the other internet-based applications we barely notice today, organizations abandoned running their own “on-premises” infrastructure in favor of renting space on “the cloud.” We also (largely) abandoned the idea of running software programs locally, in favor of renting – hence the term “software as a service.” Subscriptions proved to be a very lucrative business model for software vendors (looking at you, Adobe).

I’m teaching a course at the University of Alberta that is held in one of the dedicated computer labs, and the room feels like a vestige of the early to mid-2000s, before everyone got their own devices. Virtually nobody uses the supplied machines because everyone has their own.

Who’s Down with O.P.P.* (Other People’s Processors)?

“O.P.P., how can I explain it? I’ll take you frame by frame it” –Naughty by Nature

One way to think about the cloud is other people’s processors. Using the cloud means you’re storing and processing your data elsewhere. For most organizations (and individuals), this means Amazon Web Services, Google Cloud, or Microsoft Azure. There are other options, but these are the more popular ones. Here’s a short and simple explainer about the cloud. My sense is that for most people who don’t work with data or in IT and who were not involved in moving an organization’s data to a cloud infrastructure, the transition to the cloud just happened. It wasn’t an intentional choice. For many younger people, there may not be a memory of any other way because this is how it’s always been.

The data that you put into a chatbot goes to the same place it goes when you do other things that are enabled by the cloud, like typing in a Google Doc, which I’m doing right now. It goes to the same place that it goes when you use software as a service or social media. It gets stored and processed in the cloud, on other people’s processors, and “resides” across a global infrastructure of data centers. In that sense, it’s removed from your direct physical control. Large language models in particular require this infrastructure and all its processing capacity in order to work.

What About Privacy?

Part of what goes unstated in this question about “where the data goes” is a concern about privacy. What if I’m sharing private (define that however you wish) information with the chatbot? Who has access to that data? Will it be used to train the next iteration of the model? Could it be leaked as output from a model somewhere down the line? Where specifically is that data located? In other words, this question shines a big light on the whole idea of cloud computing infrastructure, which has been with us for several decades now, but which most people may not have thought too much about.

There are different levels of privacy related to cloud computing, which are typically defined in the terms of service and privacy policies for a particular product or service. Unfortunately, few people read those policies. Most people understand that these companies can access our email, documents, and other communications. They can use the information to create profiles of our habits and behaviors, which are then sold to third parties, typically for the purpose of advertising. This is the business model for much of the internet and will probably be part of the business model for chatbots too.

There are ways to change our settings or in some cases – if you are a large enough organization – to negotiate different terms. Generally speaking, if you’re using free versions of AI models (ChatGPT, Gemini), the default is to train on your data, or as they put it, “use your data to improve the service,” aligning with the old adage of you being the product when it comes to free digital apps. There are ways to opt out, but you need to proactively enable these settings.

*could not resist making the old-school hip hop reference. Check out this fantastic jazz remix.

Send Me Your Questions!

I would love to hear about your data dilemmas or AI ethics questions and quandaries. You can send me a note at [email protected] or connect with me on LinkedIn. I will keep all inquiries confidential and remove any potentially sensitive information – so please feel free to keep things high level and anonymous as well. 

This column is not legal advice. The information provided is strictly for educational purposes. AI and data regulation is an evolving area and anyone with specific questions should seek advice from a legal professional.
