GraphQL is a query language for APIs, which gives clients the power to ask for exactly what data they need and receive exactly that, nothing more or less. This way, a single query can already fetch all the required data to render a component.
(A REST API, in comparison, must trigger several roundtrips to fetch data from multiple resources from different endpoints, which can become very slow, particularly on mobile.)
Even though GraphQL (which means “Graph Query Language”) uses a graph data model to represent data, the GraphQL server does not necessarily need to use a graph as the data structure to resolve the query; it can use any data structure it desires. The graph is only a mental model, not an actual implementation.
The GraphQL project endorses this view on its website, graphql.org (emphasis mine):
Graphs are powerful tools for modeling many real-world phenomena because they resemble our natural mental models and verbal descriptions of the underlying process. With GraphQL, you model your business domain as a graph by defining a schema; within your schema, you define different types of nodes and how they connect/relate to one another. On the client, this creates a pattern similar to Object-Oriented Programming: types that reference other types. On the server, since GraphQL only defines the interface, you have the freedom to use it with any backend (new or legacy!).
This is good news, because dealing with either graphs or trees (which are subsets of graphs) is not trivial, and can lead to an exponential or logarithmic time complexity for resolving the query (i.e. the time required to resolve a query may increase several orders of magnitude for each new input to the query).
In this article, I will describe the architectural design of GraphQL by PoP, a GraphQL server in PHP which uses components as a data structure instead of graphs. This server owes its name to PoP, the library for building components in PHP on which it is based. (I am the author of both projects.)
This article is divided into 5 sections, explaining:
- What is a component
- How PoP works
- How components are defined in PoP
- How components are naturally suitable for GraphQL
- The performance of using components to resolve a GraphQL query
Let’s start.
1. What is a component
The layout of every webpage can be represented using components. A component is simply a set of pieces of code (such as HTML, JavaScript and CSS) put all together to create an autonomous entity, which can wrap other components to create more complex structures, and be itself wrapped by other components too. Every component has a purpose, which can range from something very basic, such as a link or a button, to something very elaborate, such as a carousel or a drag-and-drop image uploader.
Building a site through components is akin to playing with LEGO. For instance, in the webpage from the image below, simple components (links, buttons, avatars) are composed to create more complex structures (widgets, sections, sidebars, menus) all the way up to the top, until we obtain the webpage:
Components can be implemented both for the client-side (such as JS libraries Vue and React, or CSS component libraries Bootstrap and Material-UI) and for the server-side, in any language.
2. How PoP works
PoP describes an architecture based on a server-side component model, and implements it in PHP through the component-model library.
In the sections below, the terms “component” and “module” are used interchangeably.
The component hierarchy
The relationship of all modules wrapping each other, from the top-most module all the way down to the last level, is called the component hierarchy. This relationship can be expressed through an associative array (an array of key => property) on the server-side, in which each module states its name as the key attribute and its inner modules under property "modules".
The data in the PHP array can be directly used in the client-side too, encoded as a JSON object.
The component hierarchy looks like this:
$componentHierarchy = [
    'module-level0' => [
        "modules" => [
            'module-level1' => [
                "modules" => [
                    'module-level11' => [
                        "modules" => [...]
                    ],
                    'module-level12' => [
                        "modules" => [
                            'module-level121' => [
                                "modules" => [...]
                            ]
                        ]
                    ]
                ]
            ],
            'module-level2' => [
                "modules" => [
                    'module-level21' => [
                        "modules" => [...]
                    ]
                ]
            ]
        ]
    ]
];
The relationship among modules is defined in a strictly top-down fashion: a module wraps other modules and knows who they are, but it doesn’t know, and doesn’t care, which modules are wrapping it.
For instance, in the component hierarchy above, module 'module-level1' knows it wraps modules 'module-level11' and 'module-level12', and, transitively, it also knows it wraps 'module-level121'; but module 'module-level11' doesn’t care which module is wrapping it, and consequently is unaware of 'module-level1'.
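The top-down relationship can be illustrated with a short traversal. This is only a sketch: the helper functions findModule and getDescendantModules are hypothetical names introduced for illustration and are not part of PoP, and the elided "modules" entries are assumed to be empty.

```php
<?php
// Hypothetical helper (not part of PoP): locates a module by name anywhere
// in the hierarchy, searching from the top down.
function findModule(array $hierarchy, string $moduleName): ?array
{
    foreach ($hierarchy as $name => $module) {
        if ($name === $moduleName) {
            return $module;
        }
        $found = findModule($module['modules'] ?? [], $moduleName);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

// Hypothetical helper: lists every module wrapped by the given module,
// directly or transitively. It has no way to look upwards.
function getDescendantModules(array $module): array
{
    $descendants = [];
    foreach ($module['modules'] ?? [] as $name => $submodule) {
        $descendants[] = $name;
        $descendants = array_merge($descendants, getDescendantModules($submodule));
    }
    return $descendants;
}

// Same shape as the hierarchy above, with the elided entries assumed empty
$componentHierarchy = [
    'module-level0' => [
        'modules' => [
            'module-level1' => [
                'modules' => [
                    'module-level11' => ['modules' => []],
                    'module-level12' => [
                        'modules' => [
                            'module-level121' => ['modules' => []],
                        ],
                    ],
                ],
            ],
            'module-level2' => [
                'modules' => [
                    'module-level21' => ['modules' => []],
                ],
            ],
        ],
    ],
];

$level1 = findModule($componentHierarchy, 'module-level1');
$wrappedByLevel1 = getDescendantModules($level1);
```

Here $wrappedByLevel1 contains 'module-level11', 'module-level12' and 'module-level121', while nothing in the data reachable from 'module-level11' mentions 'module-level1'.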
Having the component-based structure, we add the actual information required by each module, which is categorized into either settings (such as configuration values and other properties) or data (such as the IDs of the queried database objects and other properties), and placed accordingly under entries modulesettings and moduledata:
$componentHierarchyData = [
    "modulesettings" => [
        'module-level0' => [
            "configuration" => [...],
            ...,
            "modules" => [
                'module-level1' => [
                    "configuration" => [...],
                    ...,
                    "modules" => [
                        'module-level11' => [
                            ...children...
                        ],
                        'module-level12' => [
                            "configuration" => [...],
                            ...,
                            "modules" => [
                                'module-level121' => [
                                    ...children...
                                ]
                            ]
                        ]
                    ]
                ],
                'module-level2' => [
                    "configuration" => [...],
                    ...,
                    "modules" => [
                        'module-level21' => [
                            ...children...
                        ]
                    ]
                ]
            ]
        ]
    ],
    "moduledata" => [
        'module-level0' => [
            "dbobjectids" => [...],
            ...,
            "modules" => [
                'module-level1' => [
                    "dbobjectids" => [...],
                    ...,
                    "modules" => [
                        'module-level11' => [
                            ...children...
                        ],
                        'module-level12' => [
                            "dbobjectids" => [...],
                            ...,
                            "modules" => [
                                'module-level121' => [
                                    ...children...
                                ]
                            ]
                        ]
                    ]
                ],
                'module-level2' => [
                    "dbobjectids" => [...],
                    ...,
                    "modules" => [
                        'module-level21' => [
                            ...children...
                        ]
                    ]
                ]
            ]
        ]
    ]
];
Next, the database object data is added to the component hierarchy. This information is not placed under each module, but under a shared section called databases, to avoid duplicating information when 2 or more different modules fetch the same objects from the database.
In addition, the library represents the database object data in a relational manner, to avoid duplicating information when 2 or more different database objects are related to a common object (such as 2 posts having the same author).
In other words, database object data is normalized. The structure is a dictionary, organized under each object type first and object ID second, from which we can obtain the object properties:
$componentHierarchyData = [
    ...
    "databases" => [
        "dbobject_type" => [
            "dbobject_id" => [
                "property" => ...,
                ...
            ],
            ...
        ],
        ...
    ]
];
For instance, the object below contains a component hierarchy with two modules, "page" => "post-feed", where module "post-feed" fetches blog posts. Please notice the following:
- Each module knows which are its queried objects from property dbobjectids (IDs 4 and 9 for the blog posts)
- Each module knows the object type for its queried objects from property dbkeys (each post’s data is found under "posts", and the post’s author data, corresponding to the author with the ID given under the post’s property "author", is found under "users")
- Because the database object data is relational, property "author" contains the ID to the author object instead of printing the author data directly
$componentHierarchyData = [
    "moduledata" => [
        'page' => [
            "modules" => [
                'post-feed' => [
                    "dbobjectids" => [4, 9]
                ]
            ]
        ]
    ],
    "modulesettings" => [
        'page' => [
            "modules" => [
                'post-feed' => [
                    "dbkeys" => [
                        'id' => "posts",
                        'author' => "users"
                    ]
                ]
            ]
        ]
    ],
    "databases" => [
        'posts' => [
            4 => [
                'title' => "Hello World!",
                'author' => 7
            ],
            9 => [
                'title' => "Everything fine?",
                'author' => 7
            ]
        ],
        'users' => [
            7 => [
                'name' => "Leo"
            ]
        ]
    ]
];
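To see how a consumer could put these three sections back together, here is a minimal sketch that resolves the post feed from the normalized data. The function resolvePostFeed and the output key authorName are hypothetical, introduced only for this illustration; they are not part of PoP.

```php
<?php
// The same data as in the example above, written compactly
$componentHierarchyData = [
    'moduledata' => [
        'page' => ['modules' => ['post-feed' => ['dbobjectids' => [4, 9]]]],
    ],
    'modulesettings' => [
        'page' => ['modules' => ['post-feed' => [
            'dbkeys' => ['id' => 'posts', 'author' => 'users'],
        ]]],
    ],
    'databases' => [
        'posts' => [
            4 => ['title' => 'Hello World!', 'author' => 7],
            9 => ['title' => 'Everything fine?', 'author' => 7],
        ],
        'users' => [
            7 => ['name' => 'Leo'],
        ],
    ],
];

// Hypothetical helper: joins each queried post with its author by
// following the IDs through the shared "databases" section.
function resolvePostFeed(array $data): array
{
    $ids = $data['moduledata']['page']['modules']['post-feed']['dbobjectids'];
    $dbKeys = $data['modulesettings']['page']['modules']['post-feed']['dbkeys'];
    $databases = $data['databases'];

    $items = [];
    foreach ($ids as $id) {
        // "dbkeys" says under which database entry each object type lives
        $post = $databases[$dbKeys['id']][$id];
        $author = $databases[$dbKeys['author']][$post['author']];
        $items[] = [
            'title' => $post['title'],
            'authorName' => $author['name'],
        ];
    }
    return $items;
}

$feed = resolvePostFeed($componentHierarchyData);
```

Both posts resolve to the same author entry, which is stored only once under "users".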
Data-loading
When a module displays a property from a database object, the module may not know, or care, what object it is; all it cares about is defining what properties from the loaded object are required.
For instance, consider the image below: a module loads an object from the database (in this case, a single post), and then its descendant modules will show certain properties from the object, such as "title" and "content":
Hence, along the component hierarchy, the “dataloading” modules will be in charge of loading the queried objects (the module loading the single post, in this case), and its descendant modules will define what properties from the DB object are required ("title" and "content", in this case).
Fetching all the required properties for the DB object can be done by traversing the component hierarchy: starting from the dataloading module, PoP iterates all its descendant modules all the way down until reaching a new dataloading module, or until the end of the tree; at each level it obtains all required properties, and then merges all properties together and queries them from the database, all of them only once.
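The traversal just described can be sketched as follows. The array layout (keys "dataloading", "properties" and "modules"), the module names and the function collectRequiredProperties are all assumptions made for this illustration, not PoP's actual data structures.

```php
<?php
// Hypothetical helper: starting at a dataloading module, gathers the
// properties required by all descendants, stopping at any descendant
// that is itself a dataloading module (it loads its own objects).
function collectRequiredProperties(array $module): array
{
    $properties = $module['properties'] ?? [];
    foreach ($module['modules'] ?? [] as $submodule) {
        if ($submodule['dataloading'] ?? false) {
            continue; // a new dataloading module starts its own traversal
        }
        $properties = array_merge($properties, collectRequiredProperties($submodule));
    }
    // Merge duplicates so that each property is queried only once
    return array_values(array_unique($properties));
}

// Assumed example: a single-post module with two layout descendants and
// one nested dataloading module
$singlePost = [
    'dataloading' => true,
    'properties' => [],
    'modules' => [
        'post-layout' => [
            'properties' => ['title'],
            'modules' => [
                'post-body' => ['properties' => ['content', 'title']],
            ],
        ],
        'related-posts' => [
            'dataloading' => true,
            'properties' => ['excerpt'],
        ],
    ],
];

$required = collectRequiredProperties($singlePost);
```

In this example $required holds "title" and "content": the duplicate "title" is merged, and "excerpt" is left for the traversal that starts at the nested dataloading module.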
Because the database object data is retrieved in a relational manner, we can also apply this strategy among the relationships between database objects themselves.
Consider the image below: Starting from the object type "post", and moving down the component hierarchy, we will need to shift the database object type to "user" and "comment", corresponding to the post’s author and each of the post’s comments respectively, and then, for each comment, it must change the object type once again to "user", corresponding to the comment’s author. Moving from a database object to a relational object is what I call “switching domains”.
After switching to a new domain, from that level at the component hierarchy downwards, all required properties will be subjected to the new domain: Property "name" is fetched from the "user" object representing the post’s author, "content" from the "comment" object representing each of the post’s comments, and then "name" from the "user" object representing the author of each comment:
Traversing the component hierarchy, PoP knows when it is switching domains and, appropriately, fetches the relational object data.
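As a rough illustration of switching domains, the sketch below walks a nested list of required properties and records, for every level, the object type it applies to. The constant RELATIONS, the field names "author" and "comments", and the function propertiesByDomain are assumptions for this example only.

```php
<?php
// Assumed schema: which fields are relational, and the type they lead to
const RELATIONS = [
    'post' => ['author' => 'user', 'comments' => 'comment'],
    'comment' => ['author' => 'user'],
];

// Hypothetical helper: groups the required properties by the domain
// (object type) in effect at each level of the hierarchy.
function propertiesByDomain(string $type, array $fields, string $path = ''): array
{
    $path = $path === '' ? $type : $path;
    $result = [$path => ['type' => $type, 'properties' => []]];
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            // Relational field: switch to the domain of the related object
            $nextType = RELATIONS[$type][$key];
            $result += propertiesByDomain($nextType, $value, "$path.$key");
        } else {
            // Plain property: it belongs to the current domain
            $result[$path]['properties'][] = $value;
        }
    }
    return $result;
}

$fields = [
    'title',
    'author' => ['name'],
    'comments' => [
        'content',
        'author' => ['name'],
    ],
];

$domains = propertiesByDomain('post', $fields);
```

The entry under "post.comments.author" has type "user", so its "name" is read from a user object rather than from the comment or the post.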
3. How components are defined in PoP
Module properties (configuration values, what database data to fetch, etc.) and descendant modules are defined through ModuleProcessor objects, on a module by module basis, and PoP creates the component hierarchy from all ModuleProcessors handling all involved modules.
Similar to a React app (where we must indicate which component is rendered on <div id="root"></div>), the component model in PoP must have an entry module. Starting from it, PoP will traverse all modules in the component hierarchy, fetch the properties for each from the corresponding ModuleProcessor, and create the nested associative array with all properties for all modules.
When a component defines a descendant component, it references it through an array of 2 parts:
- The PHP class
- The component name
This is so because components typically share properties. For instance, components POST_THUMBNAIL_LARGE and POST_THUMBNAIL_SMALL will share most properties, with the exception of the size of the thumbnail. Then, it makes sense to group all similar components under the same PHP class, and use switch statements to identify the requested module and return the corresponding property.
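Before looking at an actual ModuleProcessor, the traversal from the entry module can be sketched in a few lines. Everything here is a simplified stand-in: the classes SketchModuleProcessor, PageProcessor and FeedProcessor, the method getConfiguration and the function buildHierarchy are hypothetical and much smaller than PoP's real classes.

```php
<?php
// Hypothetical, simplified base class (PoP's real one is richer)
abstract class SketchModuleProcessor
{
    public function getSubmodules(array $module): array
    {
        return [];
    }

    public function getConfiguration(array $module): array
    {
        return [];
    }
}

class PageProcessor extends SketchModuleProcessor
{
    const PAGE = 'page';

    public function getSubmodules(array $module): array
    {
        // A descendant is referenced as [PHP class, component name]
        return [[FeedProcessor::class, FeedProcessor::POST_FEED]];
    }
}

class FeedProcessor extends SketchModuleProcessor
{
    const POST_FEED = 'post-feed';

    public function getConfiguration(array $module): array
    {
        return ['description' => 'Latest posts'];
    }
}

// Starting from the entry module, asks each processor for its properties
// and descendants, and nests the results into one associative array.
function buildHierarchy(array $module): array
{
    [$class, $name] = $module;
    $processor = new $class();
    $entry = [
        'configuration' => $processor->getConfiguration($module),
        'modules' => [],
    ];
    foreach ($processor->getSubmodules($module) as $submodule) {
        $entry['modules'] += buildHierarchy($submodule);
    }
    return [$name => $entry];
}

$hierarchy = buildHierarchy([PageProcessor::class, PageProcessor::PAGE]);
```

The result nests "post-feed" under "page", each with its own "configuration" entry, in the same shape as the arrays shown in the previous section.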
A ModuleProcessor for post widget components to be placed on different pages looks like this:
class PostWidgetModuleProcessor extends AbstractModuleProcessor {
    const POST_WIDGET_HOMEPAGE = 'post-widget-homepage';
    const POST_WIDGET_AUTHORPAGE = 'post-widget-authorpage';

    function getSubmodulesToProcess() {
        return [
            self::POST_WIDGET_HOMEPAGE,
            self::POST_WIDGET_AUTHORPAGE,
        ];
    }

    function getSubmodules($module): array
    {
        $ret = [];
        switch ($module[1]) {
            case self::POST_WIDGET_HOMEPAGE:
            case self::POST_WIDGET_AUTHORPAGE:
                $ret[] = [
                    UserLayoutModuleProcessor::class,
                    UserLayoutModuleProcessor::POST_THUMB
                ];
                $ret[] = [
                    UserLayoutModuleProcessor::class,
                    UserLayoutModuleProcessor::POST_TITLE
                ];
                break;
        }
        switch ($module[1]) {
            case self::POST_WIDGET_HOMEPAGE:
                $ret[] = [
                    UserLayoutModuleProcessor::class,
                    UserLayoutModuleProcessor::POST_DATE
                ];
                break;
        }
        return $ret;
    }

    function getImmutableConfiguration($module, &$props)
    {
        $ret = [];
        switch ($module[1]) {
            case self::POST_WIDGET_HOMEPAGE:
                $ret['description'] = __('Latest posts', 'my-domain');
                $ret['showmore'] = $this->getProp($module, $props, 'showmore');
                $ret['class'] = $this->getProp($module, $props, 'class');
                break;
            case self::POST_WIDGET_AUTHORPAGE:
                $ret['description'] = __('Latest posts by the author', 'my-domain');
                $ret['showmore'] = false;
                $ret['class'] = 'text-center';
                break;
        }
        return $ret;
    }

    function initModelProps($module, &$props)
    {
        switch ($module[1]) {
            case self::POST_WIDGET_HOMEPAGE:
                $this->setProp($module, $props, 'showmore', false);
                $this->appendProp($module, $props, 'class', 'text-center');
                break;
        }
        parent::initModelProps($module, $props);
    }
    // ...
}
Creating reusable components is accomplished by crafting abstract ModuleProcessor classes defining placeholder functions that must be implemented by some instantiating class:
abstract class AbstractPostWidgetLayoutModuleProcessor extends AbstractModuleProcessor
{
    function getSubmodules($module): array
    {
        $ret = [
            $this->getContentModule($module),
        ];
        if ($thumbnail_module = $this->getThumbnailModule($module))
        {
            $ret[] = $thumbnail_module;
        }
        if ($aftercontent_modules = $this->getAfterContentModules($module))
        {
            $ret = array_merge(
                $ret,
                $aftercontent_modules
            );
        }
        return $ret;
    }

    // Placeholder: every concrete class must say which module renders the content
    abstract protected function getContentModule($module): array;

    protected function getThumbnailModule($module): ?array
    {
        // Default value (overridable)
        return [self::class, self::THUMBNAIL_LAYOUT];
    }

    protected function getAfterContentModules($module): array
    {
        return [];
    }

    function getImmutableConfiguration($module, &$props): array
    {
        return [
            // getDescription() requires the module as its argument
            'description' => $this->getDescription($module),
        ];
    }

    protected function getDescription($module): string
    {
        return '';
    }
}
Custom ModuleProcessor classes can then extend the abstract class, and define their own properties:
class PostLayoutModuleProcessor extends AbstractPostLayoutModuleProcessor {
    const POST_CONTENT = 'post-content';
    const POST_EXCERPT = 'post-excerpt';
    const POST_THUMBNAIL_LARGE = 'post-thumbnail-large';
    const POST_THUMBNAIL_MEDIUM = 'post-thumbnail-medium';
    const POST_SHARE = 'post-share';

    function getSubmodulesToProcess() {
        return [
            self::POST_CONTENT,
            self::POST_EXCERPT,
            self::POST_THUMBNAIL_LARGE,
            self::POST_THUMBNAIL_MEDIUM,
            self::POST_SHARE,
        ];
    }
}
class PostWidgetLayoutModuleProcessor extends AbstractPostWidgetLayoutModuleProcessor
{
    // The return type matches the abstract declaration (array, not ?array)
    protected function getContentModule($module): array
    {
        switch ($module[1])
        {
            case self::POST_WIDGET_HOMEPAGE_LARGE:
                return [
                    PostLayoutModuleProcessor::class,
                    PostLayoutModuleProcessor::POST_CONTENT
                ];
            case self::POST_WIDGET_HOMEPAGE_MEDIUM:
            case self::POST_WIDGET_HOMEPAGE_SMALL:
                return [
                    PostLayoutModuleProcessor::class,
                    PostLayoutModuleProcessor::POST_EXCERPT
                ];
        }
        // The parent method is abstract, so there is no default to fall back on
        throw new \InvalidArgumentException('Unknown module: ' . $module[1]);
    }

    protected function getThumbnailModule($module): ?array
    {
        switch ($module[1])
        {
            case self::POST_WIDGET_HOMEPAGE_LARGE:
                return [
                    PostLayoutModuleProcessor::class,
                    PostLayoutModuleProcessor::POST_THUMBNAIL_LARGE
                ];
            case self::POST_WIDGET_HOMEPAGE_MEDIUM:
                return [
                    PostLayoutModuleProcessor::class,
                    PostLayoutModuleProcessor::POST_THUMBNAIL_MEDIUM
                ];
        }
        return parent::getThumbnailModule($module);
    }

    protected function getAfterContentModules($module): array
    {
        $ret = [];
        switch ($module[1])
        {
            case self::POST_WIDGET_HOMEPAGE_LARGE:
                $ret[] = [
                    PostLayoutModuleProcessor::class,
                    PostLayoutModuleProcessor::POST_SHARE
                ];
                break;
        }
        return $ret;
    }

    protected function getDescription($module): string
    {
        return __('These are my blog posts', 'my-domain');
    }
}
4. How components are naturally suitable for GraphQL
The component model can naturally map a tree-shaped GraphQL query, making it an ideal architecture to implement a GraphQL server.
GraphQL by PoP has implemented the ModuleProcessor classes needed to transform a GraphQL query to its corresponding component hierarchy, and resolve it using the PoP dataloading engine.
This is why and how this solution works.
Mapping client-side components to GraphQL queries
The GraphQL query can be represented using PoP’s component hierarchy, in which every object type represents a component, and every relationship field from an object type to another object type represents a component wrapping another component.
Let’s see how this is the case by using an example. Let’s say that we want to build the following “Featured director” widget:
Using Vue or React (or any other component-based library), we would first identify the components. In this case, we would have an outer component <FeaturedDirector> (in red), which wraps a component <Film> (in blue), which itself wraps a component <Actor> (in green):
The pseudo-code looks like this:
<!-- Component: <FeaturedDirector> -->
<div>
  Country: {country}
  {foreach films as film}
    <Film film={film} />
  {/foreach}
</div>

<!-- Component: <Film> -->
<div>
  Title: {title}
  Pic: {thumbnail}
  {foreach actors as actor}
    <Actor actor={actor} />
  {/foreach}
</div>

<!-- Component: <Actor> -->
<div>
  Name: {name}
  Photo: {avatar}
</div>
Then we identify what data is needed for each component. For <FeaturedDirector> we need the name, avatar and country. For <Film> we need thumbnail and title. And for <Actor> we need name and avatar:
And we build our GraphQL query to fetch the required data:
query {
  featuredDirector {
    name
    country
    avatar
    films {
      title
      thumbnail
      actors {
        name
        avatar
      }
    }
  }
}
As can be appreciated, there is a direct relationship between the shape of a component hierarchy and a GraphQL query. Indeed, a GraphQL query can even be considered to be the representation of a component hierarchy.
Resolving the GraphQL query using server-side components
Since a GraphQL query has the same shape as a component hierarchy, PoP transforms the query to its equivalent component hierarchy, resolves it using its approach to fetch data for the components, and finally recreates the shape of the query to send the data in the response.
Let’s see how this works.
In order to process the data, PoP converts the GraphQL types into components: <FeaturedDirector> => Director, <Film> => Film, <Actor> => Actor, and using the order in which they appear in the query, PoP creates a virtual component hierarchy with the same elements: root component Director, which wraps component Film, which wraps component Actor.
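One way to picture this conversion is the sketch below, which turns the query (written as a nested PHP array) into a hierarchy of components named after the types. The constant FIELD_TYPES, the pseudo-type "Root", the "fields" key and the function queryToComponents are assumptions for this illustration; GraphQL by PoP does this through its own ModuleProcessor classes.

```php
<?php
// Assumed schema: the type that each relational field resolves to
const FIELD_TYPES = [
    'Root' => ['featuredDirector' => 'Director'],
    'Director' => ['films' => 'Film'],
    'Film' => ['actors' => 'Actor'],
];

// Hypothetical helper: every relational field becomes a wrapped component
// named after its type; every other field is a property to fetch.
function queryToComponents(string $type, array $fields): array
{
    $component = ['fields' => [], 'modules' => []];
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            $childType = FIELD_TYPES[$type][$key];
            $component['modules'][$childType] = queryToComponents($childType, $value);
        } else {
            $component['fields'][] = $value;
        }
    }
    return $component;
}

// The GraphQL query from above, as a nested array
$query = [
    'featuredDirector' => [
        'name',
        'country',
        'avatar',
        'films' => [
            'title',
            'thumbnail',
            'actors' => ['name', 'avatar'],
        ],
    ],
];

$hierarchy = queryToComponents('Root', $query)['modules'];
```

The resulting array has Director at the root, wrapping Film, which wraps Actor, each carrying the list of fields requested for it.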
From now on, talking about GraphQL types or PoP components makes no difference.
To load their data, PoP deals with them in “iterations”, retrieving the object data for each type on its own iteration, like this:
PoP’s dataloading engine implements the following pseudo-algorithm to load the data:
Preparation:
- Have an empty queue store the list of IDs from the objects that must be fetched from the database, organized by type (each entry will be: [type => list of IDs])
- Retrieve the ID of the featured director object, and place it on the queue under type Director
Loop until there are no more entries on the queue:
- Get the first entry from the queue: the type and list of IDs (eg: Director and [2]), and remove this entry off the queue
- Execute a single query against the database to retrieve all objects for that type with those IDs
- If the type has relational fields (eg: type Director has relational field films of type Film), then collect all the IDs from these fields from all the objects retrieved in the current iteration (eg: all IDs in field films from all objects of type Director), and place these IDs on the queue under the corresponding type (eg: IDs [3, 8] under type Film).
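The pseudo-algorithm above can be sketched in PHP as follows. This is a simplified illustration, not PoP's engine: the database is replaced by an in-memory array holding the example data used in this article, and the names $tables, $relations and loadData are assumptions. It requires PHP 7.3 or later for array_key_first.

```php
<?php
// In-memory stand-in for the database (only some fields are included)
$tables = [
    'Director' => [
        2 => ['name' => 'George Lucas', 'films' => [3, 8]],
    ],
    'Film' => [
        3 => ['title' => 'The Phantom Menace', 'actors' => [4, 6]],
        8 => ['title' => 'Attack of the Clones', 'actors' => [6, 7]],
    ],
    'Actor' => [
        4 => ['name' => 'Ewan McGregor'],
        6 => ['name' => 'Natalie Portman'],
        7 => ['name' => 'Hayden Christensen'],
    ],
];

// Assumed schema: relational fields per type, and the type they point to
$relations = [
    'Director' => ['films' => 'Film'],
    'Film' => ['actors' => 'Actor'],
    'Actor' => [],
];

function loadData(array $tables, array $relations, string $rootType, array $rootIds): array
{
    // Preparation: the queue holds entries of the form [type => list of IDs]
    $queue = [$rootType => $rootIds];
    $loaded = [];
    $queryCount = 0;

    while ($queue) {
        // Get the first entry and remove it from the queue
        $type = array_key_first($queue);
        $ids = array_unique($queue[$type]);
        unset($queue[$type]);

        // A single "query" retrieves all objects of this type with those IDs
        $queryCount++;
        $objects = array_intersect_key($tables[$type], array_flip($ids));
        $loaded[$type] = ($loaded[$type] ?? []) + $objects;

        // Collect the IDs from the relational fields of the retrieved
        // objects, and queue them under the corresponding type
        foreach ($relations[$type] as $field => $targetType) {
            foreach ($objects as $object) {
                $queue[$targetType] = array_merge($queue[$targetType] ?? [], $object[$field]);
            }
        }
    }

    return [$loaded, $queryCount];
}

[$loaded, $queryCount] = loadData($tables, $relations, 'Director', [2]);
```

For this data the loop runs three times, once each for Director, Film and Actor, and the actor with ID 6 is fetched only once even though two films reference it.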
By the end of the iterations, we will have loaded all the object data for all types, like this:
Please notice how all IDs for a type are collected, until the type is processed in the queue. If, for instance, we add a relational field preferredActors to type Director, these IDs would be added to the queue under type Actor, and it would be processed together with the IDs from field actors from type Film:
However, if a type has been processed and then we need to load more data from that type, then it’s a new iteration on that type. For instance, adding a relational field preferredDirector to the Actor type will make the type Director be added to the queue once again:
Notice also that a caching mechanism can be used here: on the second iteration for type Director, the object with ID 2 is not retrieved again, since it was already retrieved on the first iteration and can be taken from the cache.
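A minimal sketch of that check, assuming the loaded objects are kept in an array keyed by type and ID. The function idsStillToFetch and the director with ID 5 are hypothetical, used only for illustration.

```php
<?php
// Hypothetical helper: given the objects already loaded, returns only
// the IDs that still need to be queried for the given type.
function idsStillToFetch(array $cache, string $type, array $ids): array
{
    $cachedIds = array_keys($cache[$type] ?? []);
    return array_values(array_diff(array_unique($ids), $cachedIds));
}

// The director with ID 2 was loaded on the first iteration
$cache = [
    'Director' => [
        2 => ['name' => 'George Lucas'],
    ],
];

// Second iteration on type Director: ID 2 is cached, ID 5 is not
$pending = idsStillToFetch($cache, 'Director', [2, 5, 5]);
```

Only ID 5 remains in $pending, so the second iteration on type Director queries the database for that single object.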
Now that we have fetched all the object data, we need to shape it into the expected response, mirroring the GraphQL query. As it is currently, data is organized as in a relational database:
Table for type Director:
ID | name | country | avatar | films |
---|---|---|---|---|
2 | George Lucas | USA | george-lucas.jpg | [3, 8] |
Table for type Film:
ID | title | thumbnail | actors |
---|---|---|---|
3 | The Phantom Menace | episode-1.jpg | [4, 6] |
8 | Attack of the Clones | episode-2.jpg | [6, 7] |
Table for type Actor:
ID | name | avatar |
---|---|---|
4 | Ewan McGregor | mcgregor.jpg |
6 | Natalie Portman | portman.jpg |
7 | Hayden Christensen | christensen.jpg |
At this stage, PoP has all the data organized as tables, and knows how every type relates to each other (i.e. Director references Film through field films, Film references Actor through field actors). Then, by iterating the component hierarchy from the root, navigating the relationships, and retrieving the corresponding objects from the relational tables, PoP will produce the tree shape from the GraphQL query:
Finally, printing the data into the output produces the response with the same shape as the GraphQL query:
{
  "data": {
    "featuredDirector": {
      "name": "George Lucas",
      "country": "USA",
      "avatar": "george-lucas.jpg",
      "films": [
        {
          "title": "The Phantom Menace",
          "thumbnail": "episode-1.jpg",
          "actors": [
            {
              "name": "Ewan McGregor",
              "avatar": "mcgregor.jpg"
            },
            {
              "name": "Natalie Portman",
              "avatar": "portman.jpg"
            }
          ]
        },
        {
          "title": "Attack of the Clones",
          "thumbnail": "episode-2.jpg",
          "actors": [
            {
              "name": "Natalie Portman",
              "avatar": "portman.jpg"
            },
            {
              "name": "Hayden Christensen",
              "avatar": "christensen.jpg"
            }
          ]
        }
      ]
    }
  }
}
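The step that turns the relational tables back into the tree can be sketched as a recursive lookup. As before, this is an illustration under assumptions: $tables holds a subset of the example data, and $relations, $fields and the function shapeObject are hypothetical names. It requires PHP 7.4 or later for the arrow function.

```php
<?php
// Data organized as relational tables (only some fields are included)
$tables = [
    'Director' => [
        2 => ['name' => 'George Lucas', 'country' => 'USA', 'films' => [3, 8]],
    ],
    'Film' => [
        3 => ['title' => 'The Phantom Menace', 'actors' => [4, 6]],
        8 => ['title' => 'Attack of the Clones', 'actors' => [6, 7]],
    ],
    'Actor' => [
        4 => ['name' => 'Ewan McGregor'],
        6 => ['name' => 'Natalie Portman'],
        7 => ['name' => 'Hayden Christensen'],
    ],
];

// Assumed schema: relational fields per type, and the type they point to
$relations = [
    'Director' => ['films' => 'Film'],
    'Film' => ['actors' => 'Actor'],
];

// Hypothetical helper: builds the output for one object, following each
// relational field into the table of the related type.
function shapeObject(array $tables, array $relations, string $type, int $id, array $fields): array
{
    $object = $tables[$type][$id];
    $result = [];
    foreach ($fields as $key => $value) {
        if (is_array($value)) {
            // Relational field: resolve every referenced ID recursively
            $targetType = $relations[$type][$key];
            $result[$key] = array_map(
                fn (int $relatedId) => shapeObject($tables, $relations, $targetType, $relatedId, $value),
                $object[$key]
            );
        } else {
            $result[$value] = $object[$value];
        }
    }
    return $result;
}

// The requested fields, mirroring the shape of the query
$fields = [
    'name',
    'country',
    'films' => [
        'title',
        'actors' => ['name'],
    ],
];

$response = [
    'data' => [
        'featuredDirector' => shapeObject($tables, $relations, 'Director', 2, $fields),
    ],
];
```

Passing $response to json_encode would print it with the same nesting as the query; the actor with ID 6 appears under both films although it is stored only once in the tables.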
5. Analysis of the performance of using components to resolve a GraphQL query
Let’s analyze the big O notation of the dataloading algorithm to understand how the number of queries executed against the database grows as the number of inputs grows, to make sure that this solution is performant.
PoP’s dataloading engine loads data in iterations corresponding to each type. By the time it starts an iteration, it will already have the list of all the IDs for all the objects to fetch, hence it can execute 1 single query to fetch all the data for the corresponding objects. It then follows that the number of queries to the database will grow linearly with the number of types involved in the query. In other words, the time complexity is O(n), where n is the number of types in the query (however, if a type is iterated more than once, then it must be added more than once to n).
This solution is very performant, certainly more than the exponential complexity expected from dealing with graphs, or logarithmic complexity expected from dealing with trees.
Conclusion
A GraphQL server does not need to use graphs to represent data. In this article we explored the architecture described by PoP, and implemented by GraphQL by PoP, which is based on components and loads data in iterations according to type.
Through this approach, the server can resolve GraphQL queries with linear time complexity, which is a better outcome than the exponential or logarithmic time complexity expected from using graphs or trees.