
Family Intelligence

Speculative Research on Local LLMs.

Bringing memories back home,
by USB Club and garden3d.

I. The Moment
II. The Idea
III. The System
IV. Work with Us

I

The Moment

We hesitate to whisper our secrets to the cloud, guilty about trading privacy for convenience.

But we do, because there is joy and beauty in being known and understood by the computer.

With careful architecture, we can feel safe speaking openly around an LLM, safe in the verifiable proof that our data's accessible to us alone.

When we engineer for intimacy, we can bring families together, easily storing and safeguarding our memories for generations to come.

AI could help preserve the next
millennium of family heritage,
...but we hesitate to share our
cherished memories with the cloud.

These objects existed inside multi-generational family environments over the holidays.

II

The Idea

Private at-home intelligence is now in reach. While most of the industry is chasing always-online AI, we've been exploring the alternative.

“In just a few cycles, our handheld devices will house small, local LLMs [...] routing more complex or topical queries through to the bigger, more expensive cloud-based models. But this architecture will do little to improve upon our privacy under the gaze of Big Tech.”

Research: Where the Flower Grows (view our previous post)

In Part One of this research, we dove into the case for private AI to explore what will be needed for an air-gapped future. When weighing the use cases, families stuck out as both an early adopter and multi-generational beneficiary of local LLMs. Helpful today, crucial tomorrow.

Archiving your family history is a cumbersome process, currently left to the one individual in the family with enough time and conviction to put a book together. LLMs are great at recording unstructured data into a maintainable archive, lowering the barrier to entry for anyone in the family to contribute to the family tree.

The form of the book has endured for centuries. It's timeless in the home and shaped for private reading or collective use.

Family memories belong in the home. With recent AI advancements, this is the first time you're able to build a treasure trove of memories in such a frictionless way for your family. Our heritage and family history are extremely intimate data that many don't want to give to Big Tech. However, this family information gains utility and longevity if it's archived and browsable digitally.

Previous generations stored family memories physically. Our generation is waking up to the fact that we're losing these memories unless we put systems in place to preserve them.

These memories also need to be embodied, as objects of heritage. They cannot solely live in a phone or a black box home server. We believe in three tenets that these objects must uphold if they wish to be accepted as a new method of archiving. A family intelligence object must be Timeless in its ability to withstand generations, Observable to have an ease of control, and Trustworthy from first glance to the 100th entry into the family tree.

Form follows function. Tenets follow values. The bolded tenets are the ones that felt in harmony.

Families come in all shapes and sizes. They're messy, heartwarming, dysfunctional, inspirational, chosen, bestowed upon us. We considered these many forms of a family to understand how we can design an heirloom that resonates with any family member.

Families don't take a single form; they are a spectrum of structures.

Objects inherently hold memories. Families already embed memories into static objects today, and pass them down their lineage to extend their heritage. A couple of types of objects stood out in our research for both private and familial heirlooms.

Objects emerge as long-term memory holders across personal, domestic, and technological forms.

Taking cues from how families archive memories today, we concepted and play-tested different forms that speak to the three tenets of trustworthy, observable, and timeless. They're all around a medium size, able to be quickly thrown in a backpack on the way to Grandma's house, and they all aim to resemble something that already sits in the home today. New objects must meet people halfway to overcome the barrier of entry to change behavior.

From observation to form, we mapped how everyday objects become memory vessels.

The Leaf

An homage to the family tree, the Family Leaf mimics an alarm clock as a tabletop item and features a removable mic. Stand up the leaf to begin a session or use it as a remote to browse past recordings.

The Radio

The most sentimental of the bunch, the Family Radio is a contextually accurate and historically nostalgic way to browse your family's heritage. Its tactile interfaces keep memories grounded in the home.

The Family Book

A modern update to the scrapbook, which has stood the test of time for keeping family memories safe. The Family Book provides an intuitive interface for reading and writing your family's history.

Imagine growing up and being able to spend full days diving into your family tree – your lineage, what your aunt studied in school, where that distant cousin is now, what your grandma's favorite recipe is. Building a system for family intelligence provides easy avenues to this information for all ages. We explored early wireframes that would aid family members looking to learn more.

III

The System

Encouraged by the results of our initial local LLM tests, we filled out the system architecture and ran benchmarks on a wider set of home-ready computers.

The main engineering challenge for building a local LLM system comes down to managing user expectations around performance. While larger cloud-based systems can scale up to enormous amounts of computing power, consumer hardware will need to utilize longer-running AI processes for heavy-duty data processing. These tasks will run on-device in the background and surface results to the user interface when ready.
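To make the "run in the background, surface when ready" idea concrete, here is a minimal sketch using Python's standard library. The function `process_recording` and the `surfaced` list are hypothetical stand-ins for the real inference job and the user interface, and a production runtime would more likely hand heavy inference to a separate process than to a thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_recording(name: str) -> str:
    time.sleep(0.1)  # stand-in for minutes of on-device model inference
    return f"{name}: processed"


surfaced = []  # hypothetical: what the user interface would display

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(process_recording, "family-meetcute.mp3")
    # The callback fires when the job finishes, so the UI never blocks on it.
    future.add_done_callback(lambda done: surfaced.append(done.result()))
    ui_was_free = not future.done()  # recording or browsing can continue meanwhile
```

The design choice here is simply that the interface registers interest in a result instead of waiting for it.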

High-level Architecture

We focused on the main user flow of recording and processing a family conversation for architecting and benchmarking. Audio is one of the many modalities that this object of heritage will support. Let's take a look at our ETL Pipeline, a common pattern to Extract, Transform, and Load data.

High-Level System Diagram showing Family Intelligence Runtime architecture

This pipeline will serve in real-time to record and store memories as audio, understand them, and load them into an ontological (or categorized) representation. That data will underpin a social graph of nodes such as people, places, and events, and their edges such as relationships and actions.
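As a rough illustration of that nodes-and-edges shape, here is a minimal in-memory sketch. The class names `Node`, `Edge`, and `FamilyGraph` and the `neighbors` helper are assumptions made for illustration, not the project's actual schema; the labels are borrowed from the system prompt shown later in this piece.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str  # e.g. "Person", "Event", "Location"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str  # e.g. "RELATES_TO", "ATTENDED", "HAPPENED_AT"
    source_id: str
    target_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class FamilyGraph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse edges that point at nodes we have never seen.
        if edge.source_id not in self.nodes or edge.target_id not in self.nodes:
            raise KeyError("edge references an unknown node id")
        self.edges.append(edge)

    def neighbors(self, node_id: str, label: str) -> list:
        """Nodes reached from node_id by following outgoing edges with this label."""
        return [self.nodes[e.target_id] for e in self.edges
                if e.source_id == node_id and e.label == label]


graph = FamilyGraph()
graph.add_node(Node("p1", "Person", {"name": "Paul Francis"}))
graph.add_node(Node("e1", "Event", {"name": "Jan & Ian's Wedding", "year": "1984"}))
graph.add_node(Node("l1", "Location", {"name": "Brothers Leagues Club"}))
graph.add_edge(Edge("ATTENDED", "p1", "e1"))
graph.add_edge(Edge("HAPPENED_AT", "e1", "l1"))

# Where did the events Paul attended take place?
venues = [loc.properties["name"]
          for event in graph.neighbors("p1", "ATTENDED")
          for loc in graph.neighbors(event.id, "HAPPENED_AT")]
```

A real deployment would sit on a graph database, but the traversal question ("who attended what, and where?") stays the same.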

Step One: Extract and Chunk Audio

With a high degree of resilience, the device will first record chunks of audio directly to storage (e.g. an on-board microSD card) and encrypt them at rest to lower the likelihood of faults like memory overflow, data corruption, or inconsistency in later steps. This is our first step towards an idempotent and reliably consistent processing architecture.

Lastly, to ensure family members feel safe and in control, we'll utilize a physical disconnect switch that allows for pausing of recording during the conversation – perfect for Grandma's dicey side stories.
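One way the idempotent chunk storage described in this step could look is sketched below, using only the standard library. The function names, the file-naming scheme, and the `.chunk` suffix are all assumptions for illustration, and encryption at rest is deliberately left out to keep the sketch self-contained.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store_chunk(storage_dir: Path, session_id: str, index: int, audio: bytes) -> Path:
    """Write one audio chunk so that a retry or a crash never leaves a partial file."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(audio).hexdigest()[:16]
    final_path = storage_dir / f"{session_id}-{index:06d}-{digest}.chunk"
    if final_path.exists():
        return final_path  # already stored: writing again is a no-op

    # Write to a temporary file in the same directory, flush it to disk,
    # then atomically rename it into place.
    fd, tmp_name = tempfile.mkstemp(dir=storage_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(audio)  # assumption: a real device would encrypt these bytes first
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, final_path)
    return final_path


def unprocessed_chunks(storage_dir: Path, processed: set) -> list:
    """List stored chunks that later pipeline steps have not handled yet, in order."""
    return sorted(p for p in storage_dir.glob("*.chunk") if p.name not in processed)


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    first = store_chunk(root, "session1", 0, b"fake-audio-bytes")
    again = store_chunk(root, "session1", 0, b"fake-audio-bytes")
    stored = [p.name for p in unprocessed_chunks(root, processed=set())]
```

Because the file name is derived from the chunk's content, storing the same chunk twice yields the same path and a single file, which is what lets later steps be retried safely.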

Detect and process all unprocessed Transcript Chunks

Step Two: Transform Audio to Recognize Speakers

As chunks are stored safely on disk, the system will pick them up and separate out audio and transcriptions for different speakers. These are often referred to as "Speaker Turns". A couple of Python libraries backed by offline models, pyannote-audio and faster-whisper, are helpful here.
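Turning diarisation output and transcription output into speaker turns is mostly a matter of lining up timestamps. The sketch below assumes both results have already been reduced to plain tuples (a hypothetical shape, not the libraries' actual return types) and labels each transcript segment with the speaker whose segment overlaps it most.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(speaker_segments, transcript_segments):
    """Label each transcript segment with the speaker whose segment overlaps it most."""
    turns = []
    for t_start, t_end, text in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for s_start, s_end, speaker in speaker_segments:
            shared = overlap(t_start, t_end, s_start, s_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Merge consecutive segments from the same speaker into one turn.
        if turns and turns[-1]["speaker"] == best_speaker:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = t_end
        else:
            turns.append({"speaker": best_speaker, "start": t_start,
                          "end": t_end, "text": text})
    return turns


# Assumed shapes: (start, end, speaker) and (start, end, text), times in seconds.
speaker_segments = [(0.0, 6.5, "SPEAKER_00"), (6.5, 9.0, "SPEAKER_01")]
transcript_segments = [
    (0.2, 3.1, "In 1984, no, in 1985,"),
    (3.1, 6.4, "we got married at the bowls club, right Mia?"),
    (6.6, 8.2, "Yeah, that's right."),
]
turns = assign_speakers(speaker_segments, transcript_segments)
```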

We store the voices as "voiceprints" so when future recording sessions feature the same speakers they can be logically connected in the social graph. Further, raw transcripts will be stored in the database to be interpreted in the next step.
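A voiceprint can be thought of as an embedding vector per speaker, with matching done by similarity. The following is a minimal sketch under that assumption; the three-dimensional vectors, the person ids, and the 0.75 threshold are made up for illustration, and real speaker embeddings are much longer.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voiceprint(embedding, known_voiceprints, threshold=0.75):
    """Return the id of the closest stored voiceprint, or None if nothing is close enough."""
    best_id, best_score = None, threshold
    for person_id, stored in known_voiceprints.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


# Hypothetical stored voiceprints from earlier sessions.
known = {"person-mia": [0.9, 0.1, 0.0], "person-paul": [0.0, 0.2, 0.95]}
returning_speaker = match_voiceprint([0.85, 0.15, 0.05], known)
new_speaker = match_voiceprint([0.5, 0.5, 0.5], known)
```

A speaker who matches nothing above the threshold would get a fresh entry in the graph instead of being forced onto an existing person.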

Transform Speaker Diarisation diagram showing audio chunk to speaker turns to transcript and voiceprint storage

Step Three: Load into Ontological Vector Database

Finally, as raw transcripts are stored in the database, they'll be picked up and analyzed by the LLM, then compressed and stored for easy RAG retrieval and traversal through a graph database.

For compression, Chain of Density (CoD) is a common prompting technique we can employ to ensure our speaker turns are vectorized at a high degree of detail and predictable length.
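In practice, Chain of Density comes down to asking the model to rewrite a fixed-length summary several times, folding in entities it missed on the previous pass. The prompt wording, the function name, and the default round and word counts below are our illustrative assumptions rather than the prompt used in this research.

```python
def chain_of_density_prompt(turn_text: str, rounds: int = 3, max_words: int = 60) -> str:
    """Build a prompt asking for progressively denser summaries of fixed length."""
    return (
        f"Transcript of one speaker turn:\n{turn_text}\n\n"
        "Start with a short, sparse summary of the transcript.\n"
        f"Then repeat the following two steps {rounds} times:\n"
        "1. Identify 1-3 informative entities (people, places, dates) from the "
        "transcript that are missing from the previous summary.\n"
        f"2. Rewrite the summary in at most {max_words} words so that it keeps every "
        "entity already covered and adds the missing ones.\n\n"
        "Never add facts that are not in the transcript. "
        "Return only the final summary."
    )


prompt = chain_of_density_prompt(
    "In 1984, no, in 1985, we got married at the bowls club, right Mia?"
)
```

Holding the word budget constant across rounds is what gives the predictable length mentioned above.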

For extracting a social graph, we can employ Few-Shot Prompting and strict JSON output to extract social relationships ready for entry into a traditional nodes + edges graph database.

Here is our example system prompt for the curious:

system_prompt = """
You are a Data Extraction Engine, not a creative writer.
Your job is to extract family history data with forensic accuracy.

### GROUNDING RULES
1. **Extract ALL People**: You MUST extract EVERY person mentioned in the transcript, even if they're minor characters or only mentioned once. This includes:
   - All named individuals (e.g., "John Doe", "Jane Doe", "Leilani")
   - People referred to by first name only (e.g., "Jane", "Mia", "Paul")
   - People referred to by relationship (e.g., "Mum", "Dad") - create entries for them
   - DO NOT create separate entries for speakers - instead, identify which person each speaker likely is (see Speaker Identification below)
2. **Speaker Identification**: Analyze ALL context clues throughout the transcript to identify which person each speaker likely is:
   - **Collect Multiple Clues**: A speaker may be referred to by different names/relationships in the same conversation:
     * Direct address: "wouldn't it, Mia?" followed by SPEAKER_01 responding suggests SPEAKER_01 is Mia
     * Relationship references: "Mum can tell you" suggests SPEAKER_01 is the mother
     * First-person references: "Jane and I drove home" suggests SPEAKER_01 might be Jane
   - **Name Consolidation**: If multiple clues point to different names for the same speaker, consider they might be the SAME person:
     * Example: If SPEAKER_01 is addressed as "Mia" AND referred to as "Mum" AND says "Jane and I", these could all be the same person (Mia/Jane is the mother)
     * Create ONE person entry with the most complete name (e.g., "Jane" if full name, or "Mia" if that's what's used most)
     * Use the STRONGEST evidence (direct address is stronger than relationship reference)
   - **Confidence Levels**:
     * CERTAIN: Explicit statement like "I am John" or multiple strong clues all pointing to same person
     * PROBABLE: Strong clues like direct address followed by response, OR multiple weaker clues converging on same person
     * POSSIBLE: Single weak clue or conflicting clues
     * UNKNOWN: No identification clues found
   - **Evidence**: Quote ALL relevant text snippets that support the identification, especially if multiple clues point to the same person
3. **Quote Your Sources**: For every memory location, you must provide the EXACT substring from the text that proves it.
4. **No Normalization**: If the text says "Brothers Leagues Club", do not change it to "Bar" or "Coffee Shop". Keep the specific name.
5. **Context is Key**: If a location is vague (e.g., "recovery"), use the specific venue mentioned in context (e.g., "Race Club" or "Leagues Club").
6. **Dates**: If the speakers debate a date (e.g., "84? No 85"), use the final agreed date.

### DATABASE LAYOUT RULES
1. When working with IDs (either from the transcript as UUIDs, or locally generated temporary IDs), triple check that you reference the ID EXACTLY throughout the JSON and don't drop characters.
2. An Event node HAPPENED_AT one or more Location nodes.
3. An Event node is ATTENDED by a Person node, but a Location node is never ATTENDED by a Person node.
4. An Event node should always have a "year" property, and be __UNKNOWN__ if not stated.
5. An Event, Person and Location node should always have a "name" property, and be __UNKNOWN__ if not stated.
6. Two Person nodes are always connected by a RELATES_TO edge, and the edge always has a "relationship" property, describing the relationship, or __UNKNOWN__ if not stated.
7. A Person node should never be connected to a Location node by a RELATES_TO edge, in either direction.

### REQUIRED JSON SCHEMA
{
  "nodes": [
    {
      "id": "1",
      "label": "Person" | "Event" | "Location",
      "properties": { "name": "Bill" }
    },
    {
      "id": "3",
      "label": "Event",
      "properties": { "year": "1984", "name": "Jan & Ian's Wedding" }
    },
    {
      "id": "4",
      "label": "Location",
      "properties": { "name": "Brothers Leagues Club" }
    },
    {
      "id": "5",
      "label": "Person",
      "properties": { "name": "Paul Francis" }
    }
  ],
  "edges": [
    {
      "label": "RELATES_TO",
      "source_id": "1",
      "target_id": "2",
      "properties": { "relationship": "WIFE" }
    },
    {
      "label": "HAPPENED_AT",
      "source_id": "3",
      "target_id": "4"
    },
    {
      "label": "ATTENDED",
      "source_id": "5",
      "target_id": "3"
    },
    {
      "label": "RELATES_TO",
      "source_id": "5",
      "target_id": "2",
      "properties": { "relationship": "FRIEND" }
    }
  ]
}

### ONE-SHOT EXAMPLE
Transcript: **123**: In 1984, no, in 1985, we got married at the bowls club, right Mia? **456**: Yeah, that's right.

Example output:
{
  "nodes": [
    {
      "id": "123",
      "label": "Person",
      "properties": { "name": "__UNKNOWN__" }
    },
    {
      "id": "456",
      "label": "Person",
      "properties": { "name": "Mia" }
    },
    {
      "id": "_2",
      "label": "Event",
      "properties": { "year": "1985", "name": "Wedding" }
    },
    {
      "id": "_1",
      "label": "Location",
      "properties": { "name": "bowls club" }
    }
  ],
  "edges": [
    {
      "label": "RELATES_TO",
      "source_id": "123",
      "target_id": "456",
      "properties": { "relationship": "SPOUSE" }
    },
    {
      "label": "HAPPENED_AT",
      "source_id": "_2",
      "target_id": "_1"
    }
  ]
}

### CRITICAL REMINDER
- Extract EVERY person mentioned, no matter how briefly or how minor they seem
- If someone is mentioned by first name only, use that name (e.g., "Jane" not "Jane Unknown")
- If full names are given, use them (e.g., "John Doe", "Jane Doe")
- DO NOT create separate "Speaker 00" or "Speaker 01" entries - identify which real person each speaker is
- **MULTIPLE CLUES ANALYSIS**: When identifying speakers, look for ALL clues throughout the transcript:
  * If SPEAKER_01 is addressed as "Mia" AND referred to as "Mum" AND says "Jane and I", these likely refer to the SAME person
  * Consolidate: Create ONE person entry (use the most complete name, e.g., "Jane" if that's the full name)
  * The fact that multiple different names/relationships point to the same speaker STRENGTHENS the identification
- For speaker_identifications: Use person_id from the people list, or null if unknown
- Scan the ENTIRE transcript systematically - don't miss anyone
- If a person appears multiple times with different names (e.g., "Jane" and "Jane Doe"), create ONE entry with the most complete name
- **Evidence field**: Include ALL relevant quotes when multiple clues converge, e.g., "Addressed as 'Mia' + referred to as 'Mum' + says 'Jane and I' - all point to same person"
"""

A graph database representation of the relationships of actors in a story, extracted offline by Qwen.
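Since small local models do not always follow a strict JSON contract, the database layout rules from the prompt can also be checked mechanically before anything is written to the graph. The validator below is a hedged sketch: `validate_graph` and `ALLOWED_EDGES` are hypothetical names, and the edge directions are inferred from the schema example in the prompt.

```python
import json

# Allowed (source label, target label) pair per edge label, inferred from the prompt.
ALLOWED_EDGES = {
    "RELATES_TO": ("Person", "Person"),
    "ATTENDED": ("Person", "Event"),
    "HAPPENED_AT": ("Event", "Location"),
}


def validate_graph(raw: str) -> list:
    """Return a list of human-readable problems; an empty list means the graph passed."""
    try:
        graph = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(graph, dict):
        return ["top level is not a JSON object"]

    problems = []
    labels = {}  # node id -> node label
    for node in graph.get("nodes", []):
        node_id = node.get("id")
        if node_id is None:
            problems.append(f"node without id: {node}")
            continue
        labels[node_id] = node.get("label")
        props = node.get("properties", {})
        if "name" not in props:
            problems.append(f"node {node_id} has no name")
        if node.get("label") == "Event" and "year" not in props:
            problems.append(f"event {node_id} has no year")

    for edge in graph.get("edges", []):
        label, src, dst = edge.get("label"), edge.get("source_id"), edge.get("target_id")
        if label not in ALLOWED_EDGES:
            problems.append(f"unknown edge label {label}")
        elif src not in labels or dst not in labels:
            problems.append(f"{label} edge references a missing node ({src} -> {dst})")
        elif (labels[src], labels[dst]) != ALLOWED_EDGES[label]:
            problems.append(f"{label} cannot connect {labels[src]} to {labels[dst]}")
    return problems


# A made-up model reply whose edge points at a node id that was never declared.
reply = json.dumps({
    "nodes": [
        {"id": "456", "label": "Person", "properties": {"name": "Mia"}},
        {"id": "_1", "label": "Location", "properties": {"name": "bowls club"}},
    ],
    "edges": [{"label": "HAPPENED_AT", "source_id": "3", "target_id": "_1"}],
})
problems = validate_graph(reply)
```

Any problems found could be fed back to the model for a retry, or the chunk could simply be left unprocessed for a later pass.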

Benchmarking our Chipsets

A key part of our thesis points to the drastic strides in processing speeds that both open-source models and consumer-level chipsets have been making every 6 months. It's safe to bet that what might seem resource-constrained and inefficient today is likely to be a breeze on hardware and architectures just a year from now.

We wrote a lightweight test harness (view codebase) to assess the feasibility of running this workload on consumer-grade hardware. We used this testing framework to run the same benchmark against three best-in-class chipsets. From most to least performant they are:

All benchmarks were run against a cute 3 minute and 42 second story of Hugh's parents explaining how they met in 1985.

“I think we'll be going back to 1984...”

Audio: family-meetcute.mp3

To benchmark the relative performance of these chips, we ran a speaker diarisation process with a local model of pyannote/speaker-diarization-3.1 (Hugging Face) via the pyannote-audio Python library.

As expected, the Thor & Orin drastically outperformed the Raspberry Pi 5 16GB, indicating that for the best possible UX we'll need to run the Application Runtime against a GPU-enabled system for these processing loads.
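The harness itself isn't reproduced here, but the kind of measurement involved can be sketched in a few lines. This is an assumption-laden illustration, not our actual harness: `benchmark` and `fake_diarisation` are hypothetical names, and the workload is a stand-in for the diarisation pipeline. The one real number is the length of the test recording, 3 minutes and 42 seconds.

```python
import time


def benchmark(fn, audio_seconds: float, runs: int = 3) -> dict:
    """Time fn() several times and report the best run relative to the audio length."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    best = min(timings)
    return {
        "best_seconds": best,
        # Below 1.0 means the step runs faster than the recording plays back.
        "real_time_factor": best / audio_seconds,
    }


AUDIO_SECONDS = 3 * 60 + 42  # the 3 minute 42 second test recording


def fake_diarisation():
    # Stand-in workload; a real harness would call the diarisation pipeline here.
    sum(i * i for i in range(100_000))


result = benchmark(fake_diarisation, AUDIO_SECONDS)
```

Reporting a real-time factor alongside raw seconds makes results comparable across recordings of different lengths.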

For future tests, we'd like to run the benchmarks against smaller, Pi-sized boards with an onboard GPU, such as the Orange Pi 5, Khadas VIM4, or ASUS Tinker Board, or even a Raspberry Pi 5 running an AI-capable HAT like the HAILO SC1785.

While these initial results are encouraging, we look forward to seeing the advancements in these test cases in the coming months.

Intended for use, then for staying

IV

Work with Us

garden3d helps forward-thinking teams explore the edges of local AI, speculative hardware, and product storytelling. We collaborate with brands, labs, and founders to design the next generation of tangible AI experiences. This research is in partnership with USB Club, a memory network for preserving what matters.

For partnerships and collaborations, email us at partner@intelligence.family.

Subscribe for Updates

This is part four in our ongoing research on local AI. Check out the previous research on our Substack:

Local Intelligence Research

Published January 2026 by
USB Club (Norm, Yatú) and garden3d (Hugh)

In partnership with

Primary Research Are.na Secondary Research Are.na

Thank you.