Apollo Cache in a Nutshell

A Series Blog for Deep Dive into Apollo GraphQL from Backend to Frontend

E.Y.
9 min read · May 3, 2021

This is the 8th of the series blogs on a deep dive into Apollo GraphQL from backend to frontend. A lot of the information comes from the Apollo GraphQL docs and the GraphQL docs, as well as their source code on GitHub; all tributes go to them. For my part, I would like to give you my “destructuring” of the original knowledge and my reflections on it, analysis of the code examples/source code, as well as some extra examples.

One aspect of optimising frontend performance is using a frontend cache, i.e., reducing the number of back-and-forth request/response lifecycles. Luckily, Apollo Client already has a native implementation: it stores the results of its GraphQL queries in a normalised, in-memory cache.

The cache itself has a plug-and-play nature with some default behaviours, but it also offers fine-grained control by overriding its default configuration. Essentially, you can:

  • Specify custom primary key fields that determine how data is normalised.
  • Customise the storage and retrieval of individual fields
  • Customise the interpretation of field arguments
  • Define supertype-subtype relationships for fragment matching
  • Define patterns for pagination
  • Manage client-side local state

According to the Doc, to customise cache behaviour, provide an options object to the InMemoryCache constructor.
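For example, a minimal sketch (the Todo type and its slug field are placeholders for your own schema):

import { InMemoryCache } from '@apollo/client';

// Pass the options object straight to the constructor.
const cache = new InMemoryCache({
  typePolicies: {
    Todo: {
      keyFields: ['slug'],
    },
  },
});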

Apollo Cache as an Abstraction Layer

You may wonder, how exactly does the Apollo cache mechanism work with all the details from above? Well, you are more than welcome to go into the details, but from an architectural perspective, all you need to know is that the Apollo Cache serves as an abstraction layer on top of a data store. It handles the incoming actions (mutations, queries, and subscriptions) and spits out a pre-processed response after the server's response data has gone through the caching process.

It may remind you of a tool like Redux, and you are right: just as a Redux store returns data based on actions, the Apollo cache returns data based on, e.g., queries and mutations. And obviously, the cache store is very similar to a Redux store.

Data normalisation and Custom Identifier

To understand how the Apollo cache stores data, we need to tap into data normalisation. According to Wikipedia, normalisation is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. Through the use of relationships (primary keys, foreign keys) and constraints, we can ensure that only unique data gets added to the database.

The InMemoryCache normalises query results before saving them to the cache by:

  1. Creating a globally unique identifier for each object included in the response.
  2. Storing the objects by their unique identifier in a flat lookup table, in a JSON-serializable format.
  3. Merging the fields of an incoming object into the existing object whenever the two share the same unique identifier.
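To make this concrete, here is a rough sketch of the flat lookup table for a hypothetical Todo response (you can inspect the real table at any time with cache.extract()):

// Given a response such as:
//   { todo: { __typename: 'Todo', id: '5', text: 'Buy milk', completed: false } }
// the normalised store looks roughly like this:
{
  ROOT_QUERY: {
    __typename: 'Query',
    // fields store references, not the nested objects themselves
    'todo({"id":"5"})': { __ref: 'Todo:5' },
  },
  'Todo:5': {
    __typename: 'Todo',
    id: '5',
    text: 'Buy milk',
    completed: false,
  },
}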

The most important part is to calculate the unique identifier.

Assigning unique identifiers

By default, Apollo Client combines the object's __typename with its id (or _id) to create the identifier, with the two values separated by a colon (:). If an object doesn't specify a __typename or one of id or _id, InMemoryCache falls back to using the object's path within its associated query (e.g., ROOT_QUERY.allPeople.0 for the first record returned for an allPeople root query).
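You can compute this default identifier yourself with the cache.identify utility (a sketch; the Todo object is hypothetical):

const cache = new InMemoryCache();

// With default normalisation, the key is `${__typename}:${id}`.
cache.identify({ __typename: 'Todo', id: '5' }); // => "Todo:5"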

But of course, you can override the default behaviour.

Custom identifiers

To create a custom identifier, you define a TypePolicy for the type and include a keyFields field in the relevant TypePolicy objects, like so:

const cache = new InMemoryCache({
  typePolicies: {
    AllProducts: {
      keyFields: [],
    },
    Product: {
      keyFields: ["upc"],
    },
    Person: {
      keyFields: ["name", "email"],
    },
    Book: {
      keyFields: ["title", "author", ["name"]],
    },
  },
});

In the example above, the Book type uses a subfield as part of its primary key. The ["name"] item indicates that the name field of the previous field in the array (author) is part of the primary key. The Book's author field must be an object that includes a name field for this to be valid. The resulting identifier string for a Book object then has the following structure:

Book:{"title":"Fahrenheit 451","author":{"name":"Ray Bradbury"}}

Now that we understand the storage mechanism, the next step is learning how to interact with the cache. But before that, let's categorise the default cache behaviours based on the operations involved.

Operations where the cache can or cannot automatically update

Apollo Client is very smart: for a lot of operations it can update the cache automatically, so you only need to update the cache manually for the rest.

In short, the cache can automatically update itself for queries, single mutations that update a single existing entity, and batch mutations that return the entire set of changed entities. But operations that add, remove, or reorder entities cannot update automatically. Note that batch mutations that do not return the entire set of changed entities also cannot update automatically (see the sketch after the list below).

Take a common ToDo app as an example:

  • GetTodoById (automatic)
  • GetAllTodos (automatic)
  • UpdateTodoById (automatic)
  • UpdateTodos (automatic)
  • AddTodo (no)
  • DeleteTodo (no)
  • GetAllTodosByFilter (no, as the todo order differs from GetAllTodos, and the dataset might come back different)
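To see why the automatic cases work, consider a hypothetical UpdateTodoById mutation. Because the result includes id (and __typename, which Apollo Client adds automatically), the cache can normalise it to Todo:<id> and merge the changed fields into the existing entry:

const UPDATE_TODO = gql`
  mutation UpdateTodoById($id: ID!, $completed: Boolean!) {
    updateTodo(id: $id, completed: $completed) {
      # id lets the cache locate the existing Todo entry
      # and merge the new value of "completed" into it
      id
      completed
    }
  }
`;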

Reading and writing data to the cache

Apollo Client supports multiple strategies for interacting with cached data:

  • readQuery / writeQuery: Enables you to use standard GraphQL queries for managing both remote and local data.
  • readFragment / writeFragment: Enables you to access the fields of any cached object without composing an entire query to reach that object.
  • cache.modify: Enables you to manipulate cached data without using GraphQL at all.

In real life, you often see a combination of read + write, like readQuery and writeQuery (or readFragment and writeFragment), to fetch the currently cached data and make selective modifications to it:

const query = gql`
  query MyTodoAppQuery {
    todos {
      id
      text
      completed
    }
  }
`;

const data = client.readQuery({ query });

const myNewTodo = {
  id: '6',
  text: 'Start using Apollo Client.',
  completed: false,
  __typename: 'Todo',
};

client.writeQuery({
  query,
  data: {
    todos: [...data.todos, myNewTodo],
  },
});

readFragment:

const todo = client.readFragment({
  id: 'Todo:5',
  fragment: gql`
    fragment MyTodo on Todo {
      id
      text
      completed
    }
  `,
});
Unlike readQuery, readFragment requires an id option. This option specifies the unique identifier for the object in your cache. If you don't know the identifier, use the utility function cache.identify().
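For example (a sketch, assuming todo is an object previously returned by a query):

// Derive the cache identifier from the object itself...
const id = cache.identify(todo); // e.g. "Todo:5"

// ...and use it to read a fragment of that object.
const cached = client.readFragment({
  id,
  fragment: gql`
    fragment TodoText on Todo {
      text
    }
  `,
});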

writeFragment:

client.writeFragment({
  id: 'Todo:5',
  fragment: gql`
    fragment MyTodo on Todo {
      completed
    }
  `,
  data: {
    completed: true,
  },
});

cache.modify

The modify method of InMemoryCache enables you to directly modify the values of individual cached fields. But unlike writeQuery and writeFragment, modify circumvents any merge functions you've defined, which means that fields are always overwritten.

The modify method takes the following parameters:

  • The ID of a cached object to modify.
  • A map of modifier functions to execute, one for each field.
  • Optional broadcast and optimistic boolean values to customise behaviour.

Example: Adding an item to a list:

const newComment = {
  __typename: 'Comment',
  id: 'abc123',
  text: 'Great blog post!',
};

cache.modify({
  fields: {
    comments(existingCommentRefs = [], { readField }) {
      const newCommentRef = cache.writeFragment({
        data: newComment,
        fragment: gql`
          fragment NewComment on Comment {
            id
            text
          }
        `,
      });

      // Quick safety check - if the new comment is already
      // present in the cache, we don't need to add it again.
      if (existingCommentRefs.some(
        ref => readField('id', ref) === newComment.id
      )) {
        return existingCommentRefs;
      }

      return [...existingCommentRefs, newCommentRef];
    },
  },
});

Garbage collection and cache eviction

According to the documentation, Apollo Client has a Garbage Collection mechanism whereby the default garbage collection strategy of the gc method is suitable for most applications, but you can still use methods like evict to provide more fine-grained control.

cache.gc: The gc method removes all objects from the normalised cache that are not reachable: cache.gc();

cache.retain: You can use the retain method to prevent an object (and its children) from being garbage collected, even if the object isn't reachable: cache.retain('my-object-id');

cache.release: If you later want a retained object to be garbage collected, use the release method: cache.release('my-object-id');

cache.evict: You can remove any normalised object from the cache using the evict method: cache.evict({ id: 'global-identifier' })
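For example, to remove a single object and then sweep whatever it orphaned (a sketch; Todo:5 is a hypothetical identifier):

// Evict one normalised object, then garbage-collect anything
// that is no longer reachable from the root.
cache.evict({ id: 'Todo:5' });
cache.gc();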

Configuring the TypePolicy & FieldPolicy

TypePolicy

To customise how the cache interacts with specific types in your schema, you can provide an object mapping __typename strings to TypePolicy objects when you create a new InMemoryCache object. A TypePolicy object can include the following fields:

type TypePolicy = {
  keyFields?: KeySpecifier | KeyFieldsFunction | false;
  queryType?: true,
  mutationType?: true,
  subscriptionType?: true,
  fields?: {
    [fieldName: string]:
      | FieldPolicy<StoreValue>
      | FieldReadFunction<StoreValue>;
  }
};

type KeySpecifier = (string | KeySpecifier)[];

type KeyFieldsFunction = (
  object: Readonly<StoreObject>,
  context: {
    typename: string;
    selectionSet?: SelectionSetNode;
    fragmentMap?: FragmentMap;
  },
) => string | null | void;
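As a sketch of the KeyFieldsFunction form (the Employee type and its staffNumber field are hypothetical), you can compute the cache ID instead of listing key fields statically:

const cache = new InMemoryCache({
  typePolicies: {
    Employee: {
      // The returned string is used as the object's cache ID.
      keyFields: (employee) => `Employee:${employee.staffNumber}`,
    },
  },
});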

FieldPolicy

Inside the typePolicies, you can supply field-level policies for each individual type (object) whose cache behaviour you'd like to configure.

A field policy can include:

  • A read function that specifies what happens when the field's cached value is read
  • A merge function that specifies what happens when the field's cached value is written
  • An array of key arguments that help the cache avoid storing unnecessary duplicate data.

The most important are the read & merge functions. Going back to the cache-as-a-store idea we mentioned at the beginning of this blog, a read policy defines how data goes out of the store, and merge defines how data gets stored.

When used together, the read and merge functions can serve as field-level middleware that can do whatever you want after the data comes in and before it goes out. This will be illustrated with Apollo Client pagination in a later blog.
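As a preview, here is a minimal offset-based sketch (the feed field is hypothetical): merge controls how each incoming page is written into the store, and read controls what a query gets back out:

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        feed: {
          // On the way in: splice each incoming page into one long list.
          merge(existing = [], incoming, { args }) {
            const merged = existing.slice(0);
            const offset = args?.offset ?? 0;
            incoming.forEach((item, i) => {
              merged[offset + i] = item;
            });
            return merged;
          },
          // On the way out: return only the requested page.
          read(existing, { args }) {
            if (!existing) return undefined;
            const offset = args?.offset ?? 0;
            const limit = args?.limit ?? existing.length;
            return existing.slice(offset, offset + limit);
          },
        },
      },
    },
  },
});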

Note that there's a list of helper functions you can use inside the read and merge functions, as documented in the FieldPolicy API reference below.

read: When your client queries for an object with the FieldPolicy defined, the field is populated with the read function's return value instead of the field's cached value. You can even define a read function for a field that isn't defined in your schema:

const cache = new InMemoryCache({
  typePolicies: {
    Person: {
      fields: {
        userId() {
          return localStorage.getItem("loggedInUserId");
        },
      },
    },
  },
});
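To query such a local-only field, mark it with the @client directive so Apollo Client resolves it from the cache instead of sending it to the server (a sketch; the GetPerson query and person field are hypothetical):

const GET_PERSON = gql`
  query GetPerson {
    person {
      name
      # resolved by the read function above, never sent to the server
      userId @client
    }
  }
`;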

merge: A common use case for a merge function is to define how to write to a field that holds an array:

const cache = new InMemoryCache({
  typePolicies: {
    Agenda: {
      fields: {
        tasks: {
          merge(existing = [], incoming: any[]) {
            return [...existing, ...incoming];
          },
        },
      },
    },
  },
});
Note that existing is undefined the very first time this function is called for a given instance of the field, because the cache does not yet contain any data for the field; that's why we provide the existing = [] default parameter. Also note that you can't push the incoming array directly onto the existing array; the function must instead return a new array.

Key arguments: A keyArgs array indicates which arguments are key arguments, i.e., arguments that are used to calculate the field's value. Specifying this array can help reduce the amount of duplicate data in your cache. Otherwise, queries with the same response data but different variables would be stored as different objects in the cache.

Let's say your schema's Query type includes a monthForNumber field. This field returns the details of a particular month, given a provided number argument (January for 1, and so on). The number argument is a key argument for this field, because it is used when calculating the field's result:

const cache = new InMemoryCache({
  typePolicies: {
    Query: {
      fields: {
        monthForNumber: {
          keyArgs: ["number"],
        },
      },
    },
  },
});
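With this policy, a query that also passes a non-key argument (say, a hypothetical accessToken) reuses the same cache entry, because only number participates in the stored field key. Roughly:

// Inside ROOT_QUERY (a sketch of the stored field keys):
//
// with keyArgs: ["number"]   ->  monthForNumber:{"number":1}
// without keyArgs (default)  ->  monthForNumber({"number":1,"accessToken":"xyz"})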

FieldPolicy API reference

Here is the list of the FieldPolicy type and its related types in TypeScript:

type FieldPolicy<
  TExisting,
  TIncoming = TExisting,
  TReadResult = TExisting,
> = {
  keyArgs?: KeySpecifier | KeyArgsFunction | false;
  read?: FieldReadFunction<TExisting, TReadResult>;
  merge?: FieldMergeFunction<TExisting, TIncoming> | boolean;
};

type KeySpecifier = (string | KeySpecifier)[];

type KeyArgsFunction = (
  args: Record<string, any> | null,
  context: {
    typename: string;
    fieldName: string;
    field: FieldNode | null;
    variables?: Record<string, any>;
  },
) => string | KeySpecifier | null | void;

type FieldReadFunction<TExisting, TReadResult = TExisting> = (
  existing: Readonly<TExisting> | undefined,
  options: FieldFunctionOptions,
) => TReadResult;

type FieldMergeFunction<TExisting, TIncoming = TExisting> = (
  existing: Readonly<TExisting> | undefined,
  incoming: Readonly<TIncoming>,
  options: FieldFunctionOptions,
) => TExisting;

interface FieldFunctionOptions {
  cache: InMemoryCache;
  args: Record<string, any> | null;
  fieldName: string;
  field: FieldNode | null;
  variables?: Record<string, any>;

  isReference(obj: any): obj is Reference;

  toReference(
    objOrIdOrRef: StoreObject | string | Reference,
    mergeIntoStore?: boolean,
  ): Reference | undefined;

  readField<T = StoreValue>(
    nameOrField: string | FieldNode,
    foreignObjOrRef?: StoreObject | Reference,
  ): T;

  canRead(value: StoreValue): boolean;

  storage: Record<string, any>;

  mergeObjects<T extends StoreObject | Reference>(
    existing: T,
    incoming: T,
  ): T | undefined;
}

How Many Ways to Update Cache?

At this point, you might be confused by so many different ways to interact with the Cache object. So how many ways can we use to update the cache after, e.g., an addTodo mutation?

  1. Well, you can surely use readQuery + writeQuery.
  2. You can also use cache.modify.
  3. You can also define a FieldPolicy in the typePolicies (here, a merge function on the addTodo mutation field).

Here is an example of each of options 2 and 3:

Option 2:

const [addTodo] = useMutation(ADD_TODO, {
  update(cache, { data: { addTodo } }) {
    cache.modify({
      fields: {
        todos(existingTodos = []) {
          const newTodoRef = cache.writeFragment({
            data: addTodo,
            fragment: gql`
              fragment AddTodo on Todo {
                id
                type
              }
            `,
          });
          return [...existingTodos, newTodoRef];
        },
      },
    });
  },
});
Option 3:

const cache = new InMemoryCache({
  typePolicies: {
    Mutation: {
      fields: {
        addTodo: {
          merge(_, incoming, { cache }) {
            cache.modify({
              fields: {
                todos(existing = []) {
                  return [...existing, incoming];
                },
              },
            });
            return incoming;
          },
        },
      },
    },
  },
});

So while Options 1 and 2 are ad-hoc actions per individual operation, working from the bottom up, Option 3 works from the top down. There is no right or wrong either way, as long as it suits your needs.

That's about it!

Happy Reading!
