Essential Magento 2 Partial Reindexing

Magento 2 goes through great lengths to provide speed for the webshop. It has over 30 index tables to speed up the shopping experience for the user, as well as a flat product and category table per store view, if flat tables are enabled.

Nederlandse vertaling van dit artikel.

When database entities like products, categories and customers are modified, these indexes need to be updated. Magento has two ways of doing this called “Update on Save” and “Update by Schedule”.

I will use the term “entities” here for products, categories and customers alike.

Update on Save is easiest to explain. Whenever an entity is added, changed, or removed via the admin panel or via a REST request, all necessary indexes are updated to accommodate this change. Quite a number of extra SQL queries are executed in the same request that inserts, updates or deletes the entity. The index is not completely rebuild, it is just modified a little. Specific parts of caches are cleared immediately to ensure that the change is visible on the webshop.

Update by Schedule does not update the indexes in the request. In fact, Magento just writes the queries to modify the entity, and leaves it at that. Yet, the index is updated and specific caches are cleared efficiently in less then a minute.

In the rest of this blog article I will show you how this Update by Schedule magic works.

How does it work?

In short, Magento 2 creates triggers for each “Update by Schedule” index. These triggers fire on any INSERT, UPDATE and DELETE on any of the entity’s tables. On firing, they create a new entry with the entity id in a specific changelog table. When the cron job runs, it compares the version_id of its mview_state table with the version_ids of the changelog table. The index is updated for all entities that are marked as changed. Caches for specific entities are invalidated (cleared) as well.

I will now go into this in more detail.

Full Reindex and Partial Reindexing

Some processes (both core and extension) explicitly invalidate a complete index. The user may see this in the backend:

The validity of all indexes is stored in the table indexer_state:

Magento’s cron job “indexer_reindex_all_invalid” rebuilds the invalid indexes.

If cron jobs are not active, you can use a CLI command to rebuild all indexes, or a specific index:

This command rebuilds the indexes from scratch. For webshops of some size, this is a non-trivial operation. For a webshop with thousands of products and many categories and store views, it easily takes an hour to complete.

For a shop in production this is undesirable. Fortunately, the instances where a full index needs to be rebuild are rare. Magento 2 can handle most entity modifications by incrementally updating the index. This process is called partial reindexing.

Index Management and Materialized Views

The indexes and flat tables are known to Magento 2 internally as “mviews”.

A “view”, in database terminology, is a virtual table that reflects the results of a SELECT query. The view is live: it always presents the actual state of the data.

A “materialized view” is a virtual table that contains the stored result of the SELECT query. Its contents is not live, but must be updated regularly. A materialized view is much faster than a normal view because it is much simpler to evaluate.

MySQL does not contain actual materialized views, but a normal table can be used as such. This is what Magento does. Magento’s “indexes” and “flat tables” are such materialized views.

In the backend of Magento you can select which type of index update action Magento should use. You can specify this per Index.

As an example, I will set only the index “Product Flat Data” (catalog_product_flat) to “Update by Schedule” and the rest to “Update on Save”.

The effect on the database of this admin action is reflected in the table mview_state:

As you can see, only the index “catalog_product_flat” is enabled. Notice the column version_id. We will come back to this shortly.

Triggers

A database trigger is an event handler that responds to changes in a database table.

When an index is set to “Update on Schedule”, Magento 2 creates a series of triggers for it. For catalog_product_flat alone, 30 triggers are created. I will show only three of them:

The 3 triggers you see here all respond to changes in the table “catalog_product_entity” (see Table). The events are INSERT, UPDATE, and DELETE. Since there are 10 tables that affect the contents of the materialized view catalog_product_flat, it needs 3 x 10 = 30 triggers to cover all cases.

The event handlers for the “catalog_product_flat” index all write a single record to the changelog table “catalog_product_flat_cl”:

Each index has its own changelog (cl) table. The table just contains the ids of entities that are “dirty”, and whose index entry needs to be added, updated or removed.

Entity Change

When an entity is changed, via the backend, via a REST call, or even via a direct SQL statement in phpMyAdmin, some of the triggers fire.

The triggers fire abundantly, so a single change on an entity creates multiple entries in the changelog table:

The version_id column is just an auto-increment field.

Partial Reindexing Cron job

In a regular Magento installation, a cron job runs every minute. This cron job performs many tasks. One of these is “indexer_update_all_views”. This task checks for each index if the version_id of mview_state matches with the version_ids of its changelog table.

The entity_ids of changelog entries with version_id larger than that of the version_id in mview_state are collected, and these entities are integrated in the index table.

The cron job also invalidates all necessary caches. This is done in a very precise way. Specific page and category caches are cleared. Not the complete full page cache.

After that, the version_id mview_state is raised to the largest version_id in the changelog table.

Once an hour another cron job, “indexer_clean_all_changelogs” removes all changelog entries with version_id below that of the one in mview_state.

Conclusion

Magento 2 essentially allows you to change products from the outside and the inside. They can not only be modified via the user interface (outside), but via direct database queries as well (inside). Update by Schedule updates the necessary indexes for you.

If you want to modify catalog products in a way that is not possible via Magento’s admin panel, or that can be done faster via a direct SQL query with phpMyAdmin or with a script, this is still possible, even on a production webshop. Of course you need to have a good knowledge of Magento’s MySQL tables. But the partial reindexing techniques from Update by Schedule ensure that all necessary indexes and flat tables are updated, and even specific caches are invalidated.

References

Als developer bij Bigbridge ontwikkel ik robuuste e-commerce oplossingen op basis van Magento 2 en Shopware. Benieuwd wat wij voor je kunnen betekenen? Neem dan contact met ons op.

Reacties:

  1. Patrick at 10:39

    Thanks, Nick!

  2. Nick at 17:31

    Fantastic explanation of Magento 2 indexers!

Geef een antwoord

Het e-mailadres wordt niet gepubliceerd.