
Why Your AI Assistant Should Never Train on Your Family's Data

Morphee Team · 17 min read

In 2023, the Federal Trade Commission ordered Amazon to pay $25 million to settle allegations that its Alexa voice assistant had retained children’s voice recordings indefinitely — even after parents explicitly requested deletion. The recordings were used to train machine learning models. Millions of families had invited a device into their kitchens, living rooms, and bedrooms, trusting that their children’s voices would be handled responsibly. That trust was violated at industrial scale.

This is not an isolated incident. It is the default business model of consumer AI. And if you are using an AI assistant in your home today, your family’s most intimate data is almost certainly being processed in ways you never agreed to and cannot fully understand.

What AI Assistants Actually Collect

The average person generates roughly 1.7 megabytes of data per second of online activity, according to estimates from DOMO’s annual Data Never Sleeps reports. For a household of four running a voice-enabled AI assistant throughout the day, the volume of behavioral data produced is staggering. But volume is only part of the story. The real concern is the nature of what gets captured.

The Visible Layer

You know about your direct interactions: the questions you ask, the reminders you set, the calendar events you create, the messages you dictate. This is the data you consciously generate, and most people assume it is all that gets processed.

The Invisible Layer

Beneath every explicit command sits a far richer dataset. AI assistants routinely capture: voice patterns and biometric voiceprints; ambient audio before and after wake words (so-called “pre-roll” and “post-roll” buffers); interaction timing that maps your household’s daily rhythms; device proximity data revealing which family members are home and when; natural language patterns that reveal emotional states, educational levels, and cognitive development in children; and inferred household composition based on the number and characteristics of distinct speakers.

Amazon’s Alexa, for instance, was found to be creating detailed “voice profiles” of household members, including children, which persisted across devices and were linked to purchasing behavior. Google’s Nest devices have been shown to collect ambient temperature, motion, and audio data continuously, building what amounts to a real-time occupancy model of your home.

This is not metadata. This is an intimate, continuously updated portrait of your family’s private life.

Why Family Data Is Categorically Different

Privacy advocates have long argued that all personal data deserves protection. This is true. But family data — specifically data generated in households with children — carries risks that are qualitatively different from individual adult data. Three factors make this the case.

Children Cannot Consent to Data Collection

The European Union’s General Data Protection Regulation addresses this directly. GDPR Article 8 establishes that the processing of a child’s personal data requires parental consent, with member states setting the age threshold between 13 and 16 years. In Ireland, where many tech companies are headquartered for regulatory purposes, the age is 16. In the United Kingdom (under the UK GDPR post-Brexit), it is 13. In the United States, the Children’s Online Privacy Protection Act (COPPA) sets the threshold at 13 and imposes strict requirements on operators of websites and online services directed at children.

These are not abstract legal principles. They carry real enforcement weight. In 2019, the FTC fined TikTok (then Musical.ly) $5.7 million for collecting personal information from children under 13 without parental consent — the largest COPPA civil penalty at that time. Later that same year, Google and YouTube agreed to pay $170 million to settle FTC and New York Attorney General allegations that YouTube had illegally collected children’s personal data and used it for targeted advertising.

Yet most AI assistants deployed in family homes make no meaningful distinction between adult and child data. When a six-year-old asks Alexa about dinosaurs, that voice recording enters the same data pipeline as an adult’s request for driving directions. The legal frameworks exist. The enforcement actions prove they are not theoretical. But the technology deployed in most homes simply ignores the distinction.

Routines Reveal Security Vulnerabilities

A single data point about your morning schedule is innocuous. Six months of continuous data about when your family wakes up, when children leave for school, when the house is empty, when you return from work, and when you go to bed is a comprehensive security profile. This data, if breached or sold, tells a potential intruder exactly when your home is unoccupied and for how long.

The data broker industry, estimated at approximately $350 billion in annual revenue globally, trades precisely this kind of behavioral intelligence. Location data brokers have been caught selling geofenced data around schools, places of worship, and medical facilities. In 2021, a Catholic priest in the United States was publicly outed using commercially available Grindr location data obtained through a data broker. If a priest’s location data is commercially available, your family’s behavioral patterns captured by a home AI assistant are not safer simply because they sit in a different company’s servers.

Health Mentions Become Unprotected Health Data

When you mention to your AI assistant that your daughter has a peanut allergy, or ask it to remind you about a medication, or tell it your son has been having trouble sleeping, you are generating health-related data. In a clinical setting, this information would be protected under regulations like HIPAA in the United States or explicit GDPR provisions for special category data (Article 9). But when the same information flows through a consumer AI assistant, it typically receives no special protection whatsoever.

This gap is not hypothetical. In 2019, the Wall Street Journal revealed Google’s “Project Nightingale,” a partnership with Ascension health system that gave Google access to detailed medical records of up to 50 million Americans — without patient notification or consent. The project was technically legal under HIPAA’s provisions for business associates, but it demonstrated how health data, once digitized, finds its way into AI training pipelines through paths that consumers never anticipate.

Your family’s casual health mentions to an AI assistant are processed with even fewer protections than Project Nightingale’s clinical data, because consumer AI interactions are not covered by HIPAA at all.

How the Major Players Actually Handle Your Data

Understanding the specific practices of the largest AI companies is not about vilifying individual products. It is about recognizing an industry-wide pattern where family data is treated as training fuel rather than a sacred trust.

Amazon Alexa

Amazon’s Alexa privacy controversies span years. In addition to the $25 million FTC settlement in 2023 for retaining children’s voice recordings, Amazon was found to have employed thousands of workers worldwide to listen to and transcribe Alexa voice recordings for quality improvement. Internal documents revealed that these workers could access users’ home addresses and in some cases shared amusing or disturbing recordings among themselves. Amazon’s response was to add an opt-out option buried in device settings, but the default remained — and remains — that recordings are stored and used for service improvement.

The FTC’s complaint specifically noted that Amazon’s deletion mechanism was defective: even when parents deleted voice recordings through the Alexa app, the associated data (transcripts, inferences, behavioral profiles derived from the recordings) often persisted in Amazon’s systems. You could delete the audio, but the intelligence extracted from it lived on.

Google Assistant and Gemini

Google’s entire business model is built on data-driven advertising. When Google processes your family’s AI interactions, it does so within an ecosystem explicitly designed to convert personal data into advertising signals. Google’s privacy policy uses language like “improve our services” and “develop new ones,” which in practice means that interaction data feeds models that serve the company’s $224 billion annual advertising revenue.

Google has made meaningful improvements — offering auto-delete options and removing the default of human review for audio recordings after a 2019 investigation by Belgian broadcaster VRT revealed that Google contractors were listening to intimate conversations, including bedroom audio that should never have been recorded. But the fundamental architecture remains cloud-first: your data leaves your device, enters Google’s infrastructure, and is processed under policies that Google can unilaterally change.

OpenAI (ChatGPT)

OpenAI’s approach to training data is straightforward but troubling for families. By default, conversations with ChatGPT are used to train future models. OpenAI provides an opt-out mechanism — you can disable “Chat History & Training” in settings, or submit a formal request through their data privacy portal. But the opt-out is not the default, it is not prominently surfaced, and many users are unaware it exists.

For families, this means that every conversation a child has with ChatGPT — homework help, creative writing, personal questions about health or emotions — becomes training data for models that will be deployed to millions of other users. The child’s unique phrasings, concerns, and developmental patterns are absorbed into a system they have no control over, to benefit a company’s product in ways neither the child nor their parents chose.

OpenAI’s privacy policy also notes that data may be shared with “service providers” and “affiliates,” and that aggregated or de-identified data can be used without restriction. The research community has repeatedly demonstrated that “de-identified” data can often be re-identified, particularly when datasets are rich enough — and conversational AI data is among the richest data types in existence.

A Positive Counterpoint: Apple’s On-Device Approach

Apple’s approach to AI processing offers an instructive contrast. Apple has invested heavily in on-device machine learning, processing Siri requests, photo recognition, and health data locally on the user’s device whenever possible. When cloud processing is required, Apple uses a system it calls Private Cloud Compute, which processes data on Apple Silicon servers with cryptographic guarantees that Apple itself cannot access the data.

This is not a complete solution — Apple still collects some data, and its privacy protections have limits. But it demonstrates that on-device, privacy-preserving AI is technically feasible at scale. The choice to send family data to cloud servers for model training is a business decision, not a technical necessity.

Red Flags in Privacy Policies

Most families will never read the full privacy policy of their AI assistant. These documents are deliberately long, vague, and written to maximize the company’s legal flexibility rather than to inform the user. But certain phrases function as reliable warning signals that your data is being used in ways you would not choose if you understood what was happening.

“Improve our services.” This is the most common euphemism for using your data to train AI models. When a company says it uses your data to “improve” its products, it means your conversations, voice patterns, and behavioral data are feeding machine learning pipelines that benefit the company’s entire user base — and its bottom line. Your family’s private moments become product development inputs.

“Aggregate and anonymize.” This phrase sounds protective, but it is often meaningless in practice. Researchers at Imperial College London and elsewhere have demonstrated that anonymized datasets can be re-identified with over 99% accuracy when enough data points are available. Conversational AI data, which contains linguistic fingerprints, topic patterns, and temporal signatures, is particularly vulnerable to re-identification. “Anonymized” family data is rarely truly anonymous.

“Third-party partners.” This language grants the company permission to share your data with an undefined and potentially unlimited set of external entities. “Partners” can include advertising networks, data brokers, analytics companies, and other AI firms. Once your data reaches a third party, you lose all practical ability to track or control it.

“May retain data after account deletion.” Some policies reserve the right to keep derived data — models trained on your conversations, inferences drawn from your behavior, aggregated profiles — even after you delete your account. The raw data disappears, but its ghost persists in the company’s systems indefinitely.

“Transfer data internationally.” For families in the EU or UK, this is a critical flag. GDPR restricts international data transfers to countries without adequate privacy protections (Chapter V). If your AI assistant transfers family data to servers in jurisdictions without GDPR-equivalent protections, the legal safeguards you rely on may not apply.

A Privacy Checklist for Families

Before allowing any AI product into your family’s life, ask these seven questions. If the company cannot answer them clearly and affirmatively, it does not deserve access to your home.

1. Where is my data processed? Demand specificity. “In the cloud” is not an answer. Which cloud? Which region? Which jurisdiction’s laws apply? The gold standard is on-device processing, where data never leaves your hardware. If cloud processing is involved, the company should be able to tell you exactly where your data goes and which legal framework governs it.

2. Is my data used to train AI models? This is a yes-or-no question. If the answer is yes, or “yes, but you can opt out,” that means the default is to use your family’s data for the company’s benefit. Opt-out is not the same as privacy. The default should be that your data is yours alone.

3. How does the product distinguish between adult and child data? If the answer is “it doesn’t,” the product is almost certainly not COPPA-compliant and may violate GDPR Article 8. A product designed for family use must have age-appropriate data handling built into its architecture, not bolted on as an afterthought.

4. What happens when I delete my data? Deletion must mean deletion — not just of the raw data, but of derived data, inferences, trained model weights, and behavioral profiles. Ask specifically: “If I delete my account, does any data or data derivative persist in your systems?” If the answer is anything other than “no,” you do not have real deletion.

5. Can I see exactly what data you have about my family? GDPR Article 15 grants EU residents the right to access their personal data. CCPA provides similar rights in California. But rights on paper and rights in practice are different things. Test the company’s data access process before committing to its product. If getting your own data back is difficult, deleting it will be harder.

6. Where are my credentials and API keys stored? If the AI product connects to other services on your behalf (email, calendar, smart home devices), it needs to store authentication credentials somewhere. Those credentials should be stored in your device’s secure enclave or keychain — never in a remote database. A breach of the AI company’s servers should not give attackers access to your email, your calendar, and your smart locks.

7. What is the company’s business model? If the product is free and the company is not a nonprofit, you are the product. Advertising-funded AI assistants have a structural incentive to collect and monetize your data. Subscription-funded or self-hosted products align the company’s incentives with your privacy, because the company profits from keeping you satisfied, not from selling your data.

How Morphee Approaches This Differently

We built Morphee because we have children and we were unwilling to accept the industry’s default bargain: trade your family’s privacy for AI convenience. Every architectural decision in Morphee starts from a single principle: your family’s data belongs to your family.

Local-First Processing

Morphee runs AI inference directly on your device. Your conversations, your children’s questions, your family’s routines — they are processed by models running on your own hardware. Data does not leave your device by default. There is no cloud server ingesting your family’s private life. For a deeper explanation of why this matters technically, see our comparison of local AI versus cloud AI.

When cloud processing is needed for capabilities that exceed local hardware (such as complex reasoning tasks), Morphee requires explicit, per-feature consent before any data leaves your device. This is not an opt-out buried in settings. It is a clear, informed choice presented at the moment it matters.
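To make that flow concrete, here is a minimal sketch in Swift of a router that answers requests on-device first and only contacts a cloud endpoint after an explicit, per-feature consent prompt. The type names (LocalModel, CloudService, ConsentPrompter) are hypothetical illustrations, not Morphee’s actual API.

```swift
import Foundation

// Hypothetical sketch; names and APIs are illustrative, not Morphee's real implementation.
enum Feature {
    case complexReasoning
}

protocol LocalModel {
    // Returns nil when the task exceeds what the on-device model can handle.
    func respond(to prompt: String) -> String?
}

protocol CloudService {
    func respond(to prompt: String) async throws -> String
}

protocol ConsentPrompter {
    // Presents a clear, per-feature consent dialog and returns the user's choice.
    func requestConsent(for feature: Feature) async -> Bool
}

struct AssistantRouter {
    let local: LocalModel
    let cloud: CloudService
    let consent: ConsentPrompter

    func answer(_ prompt: String) async throws -> String {
        // 1. Always try on-device inference first; the prompt never leaves the device.
        if let localAnswer = local.respond(to: prompt) {
            return localAnswer
        }
        // 2. Cloud fallback is gated by an explicit consent prompt at the moment it matters.
        guard await consent.requestConsent(for: .complexReasoning) else {
            return "This request needs cloud processing, which you have not allowed."
        }
        // 3. Only after consent does any data leave the device.
        return try await cloud.respond(to: prompt)
    }
}
```

The important property is that the cloud path is unreachable without a fresh, affirmative answer from the user; a dismissed dialog means the data stays local.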

No Training on User Data

Morphee does not train models on your family’s data. Full stop. Your conversations improve your personal AI experience through local memory and context — but they never leave your device to feed a shared model. Your daughter’s bedtime questions do not become training data for a product used by strangers.

Credentials in Your Device’s Keychain

When Morphee connects to external services on your behalf, authentication credentials are stored in your device’s secure keychain through our secure credential architecture — the same hardware-backed secure storage that protects your banking apps. Credentials never touch a remote database. If Morphee’s servers were compromised tomorrow (we operate minimal server infrastructure precisely to reduce this attack surface), attackers would find no passwords, no OAuth tokens, no API keys. Your connected services remain secure because the keys to them never left your device.
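As a concrete illustration of what “credentials never touch a remote database” means in practice, here is a minimal sketch using Apple’s Keychain Services API. It is an illustrative example under stated assumptions, not Morphee’s actual code; the account label and function names are hypothetical.

```swift
import Foundation
import Security

// Illustrative sketch only. Stores an OAuth token in the device keychain,
// so the secret sits in hardware-backed storage and is never sent to a server.
func storeToken(_ token: String, account: String) -> Bool {
    guard let data = token.data(using: .utf8) else { return false }
    let base: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrAccount as String: account
    ]
    SecItemDelete(base as CFDictionary)   // replace any stale entry for this account
    var attributes = base
    attributes[kSecValueData as String] = data
    // Readable only while the device is unlocked; never migrates to another device.
    attributes[kSecAttrAccessible as String] = kSecAttrAccessibleWhenUnlockedThisDeviceOnly
    return SecItemAdd(attributes as CFDictionary, nil) == errSecSuccess
}

func readToken(account: String) -> String? {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrAccount as String: account,
        kSecReturnData as String: true,
        kSecMatchLimit as String: kSecMatchLimitOne
    ]
    var item: CFTypeRef?
    guard SecItemCopyMatching(query as CFDictionary, &item) == errSecSuccess,
          let data = item as? Data else { return nil }
    return String(data: data, encoding: .utf8)
}
```

The “ThisDeviceOnly” accessibility class is the detail that matters: the credential cannot be restored onto different hardware or synced through the cloud, so there is no server-side copy for an attacker to steal.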

Consent Enforced in Code

Morphee implements a structured consent service at the application level. Every processing activity that involves personal data — from AI inference to memory extraction to calendar integration — is gated by an explicit consent check. The system does not process data of a given type until the relevant consent has been granted. This is not a privacy policy promise. It is enforced in code, tested in our automated test suite, and auditable. For the technical details of our privacy architecture, including our approach to GDPR compliance, we publish everything openly.
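Conceptually, such a gate can be as small as a wrapper that refuses to execute any data-touching operation until a matching consent record exists. The sketch below is a simplified, hypothetical rendering of that idea; the type and case names are illustrative, not Morphee’s actual service.

```swift
import Foundation

// Hypothetical sketch of an application-level consent gate.
enum ProcessingActivity: String {
    case aiInference
    case memoryExtraction
    case calendarIntegration
}

enum ConsentError: Error {
    case notGranted(ProcessingActivity)
}

protocol ConsentStore {
    func isGranted(_ activity: ProcessingActivity, group: UUID) -> Bool
}

struct ConsentGate {
    let store: ConsentStore

    // Every operation that touches personal data is wrapped in this call.
    // Without a granted consent record for the activity, the work closure never runs.
    func perform<T>(_ activity: ProcessingActivity,
                    group: UUID,
                    _ work: () throws -> T) throws -> T {
        guard store.isGranted(activity, group: group) else {
            throw ConsentError.notGranted(activity)
        }
        return try work()
    }
}
```

Because every call site has to name the activity it performs, the consent check doubles as documentation: the codebase itself records which features touch which categories of personal data.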

Group-Based Isolation

Every piece of data in Morphee is scoped to your family group. There is no shared database where your family’s data mingles with other families’ data. Queries are filtered by group at the database level. Even in a catastrophic software bug, data from one family cannot leak to another because the isolation is structural, not just logical.
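One way to make that isolation structural is to expose no query path that lacks a group identifier, so a cross-family read cannot even be written. The sketch below illustrates the idea with an in-memory store and hypothetical names; a real implementation would apply the same constraint at the database layer.

```swift
import Foundation

// Hypothetical sketch of group-scoped data access.
struct FamilyGroupID: Hashable {
    let uuid: UUID
}

struct MemoryRecord {
    let id: UUID
    let groupID: FamilyGroupID
    let content: String
}

struct MemoryRepository {
    private var records: [MemoryRecord] = []

    // Deliberately no fetchAll(): every read requires the caller's group,
    // so one family's data is never reachable from another family's queries.
    func memories(for group: FamilyGroupID) -> [MemoryRecord] {
        records.filter { $0.groupID == group }
    }

    mutating func insert(_ content: String, for group: FamilyGroupID) {
        records.append(MemoryRecord(id: UUID(), groupID: group, content: content))
    }
}
```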

You can review our full security architecture at morphee.app/security.

The Stakes Are Higher Than You Think

The decisions we make today about AI and family data will shape the digital environment our children grow up in. A child who interacts with AI assistants from age three will have generated an extraordinarily detailed behavioral, cognitive, and emotional profile by adulthood — a profile they never consented to creating, stored in systems they cannot audit, controlled by companies whose business models may change at any time.

The regulatory landscape is catching up. The EU AI Act, which entered into force in 2024 with provisions phasing in through 2026, treats children as a group requiring special protection, prohibiting systems that exploit their vulnerabilities and imposing stringent transparency and data governance requirements. The FTC has signaled through its enforcement actions against Amazon, Google, and TikTok that it will use its existing authority aggressively to protect children’s data. California’s Age-Appropriate Design Code Act, modeled on the UK’s Children’s Code, requires businesses to default to the highest privacy settings for users likely to be children.

But regulation, however important, is reactive. By the time a violation is discovered, investigated, and penalized, years of data have already been collected and processed. The Amazon Alexa case spanned years of data retention before the FTC acted. Your family cannot afford to wait for regulators to catch up with each new AI product’s data practices.

The only reliable protection is architectural: choose products that cannot misuse your data because they are built so that your data never leaves your control.

Moving Forward

The AI industry wants you to believe that privacy and capability are a trade-off — that you must surrender your family’s data to get a useful AI assistant. This is false. Modern hardware is powerful enough to run sophisticated AI models locally. On-device processing is not a limitation; it is a design choice that respects your family’s dignity.

You do not need to become a privacy expert to protect your family. You need to ask the right questions, recognize the red flags, and choose products built by people who believe your family’s data is not theirs to take.


Morphee is built for families who refuse to compromise on privacy. We process AI locally, we never train on your data, and we store credentials in your device’s secure keychain — not on our servers. Join the waitlist to see what privacy-first AI actually looks like.
