Essential Magento 2 Partial Reindexing

Marco de Vries

July 1, 2018

5 min lezen

Magento 2 goes through great lengths to provide speed for the webshop. It has over 30 index tables to speed up the shopping experience for the user, as well as a flat product and category table per store view, if flat tables are enabled. When database entities like products, categories and customers are modified, these indexes need to be updated. Magento has two ways of doing this called “Update on Save” and “Update by Schedule”. I will use the term “entities” here for products, categories and customers alike.

Een Nederlandse vertaling van dit artikel, vind je hier.

Update on Save is easiest to explain. Whenever an entity is added, changed, or removed via the admin panel or via a REST request, all necessary indexes are updated to accommodate this change. Quite a number of extra SQL queries are executed in the same request that inserts, updates or deletes the entity. The index is not completely rebuild, it is just modified a little. Specific parts of caches are cleared immediately to ensure that the change is visible on the webshop.

Update by Schedule does not update the indexes in the request. In fact, Magento just writes the queries to modify the entity, and leaves it at that. Yet, the index is updated and specific caches are cleared efficiently in less then a minute.

In the rest of this blog article I will show you how this Update by Schedule magic works.

How does it work?

In short, Magento 2 creates triggers for each “Update by Schedule” index. These triggers fire on any INSERT, UPDATE and DELETE on any of the entity’s tables. On firing, they create a new entry with the entity id in a specific changelog table. When the cron job runs, it compares the version_id of its mview_state table with the version_ids of the changelog table. The index is updated for all entities that are marked as changed. Caches for specific entities are invalidated (cleared) as well.

I will now go into this in more detail.

Magento 2 goes through great lengths to provide speed for the webshop. It has over 30 index tables to speed up the shopping experience for the user, as well as a flat product and category table per store view, if flat tables are enabled. When database entities like products, categories and customers are modified, these indexes need to be updated. Magento has two ways of doing this called “Update on Save” and “Update by Schedule”. I will use the term “entities” here for products, categories and customers alike.

Een Nederlandse vertaling van dit artikel, vind je hier.

Update on Save is easiest to explain. Whenever an entity is added, changed, or removed via the admin panel or via a REST request, all necessary indexes are updated to accommodate this change. Quite a number of extra SQL queries are executed in the same request that inserts, updates or deletes the entity. The index is not completely rebuild, it is just modified a little. Specific parts of caches are cleared immediately to ensure that the change is visible on the webshop.

Update by Schedule does not update the indexes in the request. In fact, Magento just writes the queries to modify the entity, and leaves it at that. Yet, the index is updated and specific caches are cleared efficiently in less then a minute.

In the rest of this blog article I will show you how this Update by Schedule magic works.

How does it work?

In short, Magento 2 creates triggers for each “Update by Schedule” index. These triggers fire on any INSERT, UPDATE and DELETE on any of the entity’s tables. On firing, they create a new entry with the entity id in a specific changelog table. When the cron job runs, it compares the version_id of its mview_state table with the version_ids of the changelog table. The index is updated for all entities that are marked as changed. Caches for specific entities are invalidated (cleared) as well.

I will now go into this in more detail.

Index Management and Materialized Views

The indexes and flat tables are known to Magento 2 internally as “mviews”.

A “view”, in database terminology, is a virtual table that reflects the results of a SELECT query. The view is live: it always presents the actual state of the data.

A “materialized view” is a virtual table that contains the stored result of the SELECT query. Its contents is not live, but must be updated regularly. A materialized view is much faster than a normal view because it is much simpler to evaluate.

MySQL does not contain actual materialized views, but a normal table can be used as such. This is what Magento does. Magento’s “indexes” and “flat tables” are such materialized views.

In the backend of Magento you can select which type of index update action Magento should use. You can specify this per Index.

As an example, I will set only the index “Product Flat Data” (catalog_product_flat) to “Update by Schedule” and the rest to “Update on Save”.

The effect on the database of this admin action is reflected in the table mview_state:

mysql> select * from mview_state;
+----------+---------------------------+----------+--------+---------------------+------------+
| state_id | view_id                   | mode     | status | updated             | version_id |
+----------+---------------------------+----------+--------+---------------------+------------+
|        1 | catalog_product_category  | disabled | idle   | 2018-01-10 15:04:47 |       NULL |
|        2 | catalog_category_product  | disabled | idle   | 2018-01-10 15:05:16 |       NULL |
|        3 | design_config_dummy       | disabled | idle   | 2018-01-10 15:04:47 |       NULL |
|        4 | customer_dummy            | disabled | idle   | 2018-01-10 15:04:47 |       NULL |
|        5 | catalog_product_flat      | enabled  | idle   | 2018-01-10 15:05:00 |       NULL |
|        6 | catalog_product_price     | disabled | idle   | 2018-01-10 14:41:26 |       NULL |
|        7 | catalog_product_attribute | disabled | idle   | 2018-01-10 14:41:26 |       NULL |
|        8 | cataloginventory_stock    | disabled | idle   | 2018-01-10 14:41:27 |       NULL |
|        9 | catalogrule_rule          | disabled | idle   | 2018-01-10 14:41:27 |       NULL |
|       10 | catalogrule_product       | disabled | idle   | 2018-01-10 14:41:28 |       NULL |
|       11 | catalogsearch_fulltext    | disabled | idle   | 2018-01-10 14:41:29 |       NULL |
+----------+---------------------------+----------+--------+---------------------+------------+

As you can see, only the index “catalog_product_flat” is enabled. Notice the column version_id. We will come back to this shortly.

Triggers

A database trigger is an event handler that responds to changes in a database table.

When an index is set to “Update on Schedule”, Magento 2 creates a series of triggers for it. For catalog_product_flat alone, 30 triggers are created. I will show only three of them:

mysql> show triggers \G
 
*************************** 1. row ***************************
             Trigger: trg_catalog_product_entity_after_insert
               Event: INSERT
               Table: catalog_product_entity
           Statement: BEGIN
INSERT IGNORE INTO `catalog_product_flat_cl` (`entity_id`) VALUES (NEW.`entity_id`);
END
              Timing: AFTER
             Created: 2018-01-10 16:04:59.54
            sql_mode: 
             Definer: some-magento-user@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: latin1_swedish_ci
*************************** 2. row ***************************
             Trigger: trg_catalog_product_entity_after_update
               Event: UPDATE
               Table: catalog_product_entity
           Statement: BEGIN
INSERT IGNORE INTO `catalog_product_flat_cl` (`entity_id`) VALUES (NEW.`entity_id`);
END
              Timing: AFTER
             Created: 2018-01-10 16:04:59.56
            sql_mode: 
             Definer: some-magento-user@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: latin1_swedish_ci
*************************** 3. row ***************************
             Trigger: trg_catalog_product_entity_after_delete
               Event: DELETE
               Table: catalog_product_entity
           Statement: BEGIN
INSERT IGNORE INTO `catalog_product_flat_cl` (`entity_id`) VALUES (OLD.`entity_id`);
END
              Timing: AFTER
             Created: 2018-01-10 16:04:59.57
            sql_mode: 
             Definer: some-magento-user@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: latin1_swedish_ci

The 3 triggers you see here all respond to changes in the table “catalog_product_entity” (see Table). The events are INSERT, UPDATE, and DELETE. Since there are 10 tables that affect the contents of the materialized view catalog_product_flat, it needs 3 x 10 = 30 triggers to cover all cases.

The event handlers for the “catalog_product_flat” index all write a single record to the changelog table “catalog_product_flat_cl”:

CREATE TABLE `catalog_product_flat_cl` (
  `version_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Version ID',
  `entity_id` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'Entity ID',
  PRIMARY KEY (`version_id`)
) ENGINE=InnoDB AUTO_INCREMENT=118 DEFAULT CHARSET=utf8 COMMENT='catalog_product_flat_cl'

Each index has its own changelog (cl) table. The table just contains the ids of entities that are “dirty”, and whose index entry needs to be added, updated or removed.

Entity Change

When an entity is changed, via the backend, via a REST call, or even via a direct SQL statement in phpMyAdmin, some of the triggers fire.

The triggers fire abundantly, so a single change on an entity creates multiple entries in the changelog table:

mysql> select * from catalog_product_flat_cl limit 10;
+------------+-----------+
| version_id | entity_id |
+------------+-----------+
|          1 |     22601 |
|          2 |     22601 |
|          3 |     22601 |
|          4 |     22601 |
|          5 |     22601 |
|          6 |     22601 |
|          7 |     22601 |
|          8 |     22601 |
|          9 |     22601 |
|         10 |     22601 |
+------------+-----------+
10 rows in set (0,00 sec)

The version_id column is just an auto-increment field.

Partial Reindexing Cron job

In a regular Magento installation, a cron job runs every minute. This cron job performs many tasks. One of these is “indexer_update_all_views”. This task checks for each index if the version_id of mview_state matches with the version_ids of its changelog table.

The entity_ids of changelog entries with version_id larger than that of the version_id in mview_state are collected, and these entities are integrated in the index table.

The cron job also invalidates all necessary caches. This is done in a very precise way. Specific page and category caches are cleared. Not the complete full page cache.

After that, the version_id mview_state is raised to the largest version_id in the changelog table.

Once an hour another cron job, “indexer_clean_all_changelogs” removes all changelog entries with version_id below that of the one in mview_state.

Conclusion

Magento 2 essentially allows you to change products from the outside and the inside. They can not only be modified via the user interface (outside), but via direct database queries as well (inside). Update by Schedule updates the necessary indexes for you.

If you want to modify catalog products in a way that is not possible via Magento’s admin panel, or that can be done faster via a direct SQL query with phpMyAdmin or with a script, this is still possible, even on a production webshop. Of course you need to have a good knowledge of Magento’s MySQL tables. But the partial reindexing techniques from Update by Schedule ensure that all necessary indexes and flat tables are updated, and even specific caches are invalidated.

References

- https://magento.stackexchange.com/questions/117030/what-is-mview-in-magento2
- http://devdocs.magento.com/guides/v2.0/extension-dev-guide/indexing.html

Hebben wij jouw interesse gewekt?

Laten we er eens over kletsen, onder het genot van een goede kop koffie!

neem contact op