tom__bo’s Blog

MySQL!! MySQL!! @tom__bo

MySQL 8.0.18のHASH JOINを試した

8.0.18がリリースされたのでHash Joinを試してみました。

dockerには8.0.18 imageはなかったのでcentos7にinstallして実験

先にまとめ

  • HASH JOINは等価条件のJOINでかつjoinするカラムにindexがない場合に採用される(ドキュメント1行目)
  • HASH JOINしたかどうかはEXPLAIN ANALYZEもしくはEXPLAIN FORMAT=TREE:で確認する
  • JOINアルゴリズムの選択をコスト計算で行っているかは不明
    • optimizer_traceみても単純に判断できそうな出力はない

install

yum localinstall -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install mysql-server
systemctl start mysqld

yum install epel-release
yum install sysbench

MySQLの準備

cat /var/log/mysqld.log | grep generated
# passwd

mysql_secure_installation
# configure...

mysql -u root -p
m> create database sysbench;

サンプルのテーブルを作成

sysbenchでテーブル2つ(sbtest1, sbtest2)を作成

sysbench /usr/share/sysbench/oltp_write_only.lua \
 --db-driver=mysql \
 --table-size=10000 \
 --tables=2 \
 --mysql-host=127.0.0.1 \
 --mysql-port=3306 \
 --mysql-user=sysbench \
 --mysql-password=Sysbench8! \
 --mysql-db=sysbench \
 prepare

中身

mysql> show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> select * from sbtest1 limit 10;
+----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k    | c                                                                                                                       | pad                                                         |
+----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  1 | 4993 | 83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330 | 67847967377-48000963322-62604785301-91415491898-96926520291 |
|  2 | 5020 | 38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630 | 23183251411-36241541236-31706421314-92007079971-60663066966 |
|  3 | 5044 | 33973744704-80540844748-72700647445-87330233173-87249600839-07301471459-22846777364-58808996678-64607045326-48799346817 | 38615512647-91458489257-90681424432-95014675832-60408598704 |
|  4 | 5021 | 37002370280-58842166667-00026392672-77506866252-09658311935-56926959306-83464667271-94685475868-28264244556-14550208498 | 63947013338-98809887124-59806726763-79831528812-45582457048 |
|  5 | 4999 | 44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783 | 34551750492-67990399350-81179284955-79299808058-21257255869 |
|  6 | 5006 | 37216201353-39109531021-11197415756-87798784755-02463049870-83329763120-57551308766-61100580113-80090253566-30971527105 | 05161542529-00085727016-35134775864-52531204064-98744439797 |
|  7 | 5014 | 33071042495-29920376648-91343430102-79082003121-73317691963-02846712788-88069761578-14885283975-44409837760-90760298045 | 91798303270-64988107984-08161247972-12116454627-22996445111 |
|  8 | 5000 | 73754818686-04889373966-18668178968-56957589012-31352882173-91882653509-59577900152-88962682169-52981807259-62646890059 | 76460662325-41613089656-42706083314-81833284991-17063140920 |
|  9 | 5273 | 26482547570-00155460224-12388481921-23289186371-78242522654-77998886134-73270876420-50821093220-31442690639-11588920653 | 30508501104-50823269125-88107014550-70202920684-95842308929 |
| 10 | 5035 | 05677017559-47107518969-97509137401-28934334557-14497052050-61906823704-44077628507-24840441785-05187301456-27797851637 | 29489382504-13697582598-09964978366-26554639515-36136545002 |
+----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
10 rows in set (0.00 sec)

やってみよう

mysql> explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10;
+----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref           | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+
|  1 | SIMPLE      | s1    | NULL       | ALL  | k_1           | NULL | NULL    | NULL          | 9680 |   100.00 | NULL  |
|  1 | SIMPLE      | s2    | NULL       | ref  | k_2           | k_2  | 4       | sysbench.s1.k |    5 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)

ああそうかexplainではでないのか。 と思ったけどsbtest2(s2)のrows 5ってなに?select ... limit 10なんだけど。

-- why rows 5??
mysql> select count(*) from sbtest1 where k = 4993;
+----------+
| count(*) |
+----------+
|      104 |
+----------+
1 row in set (0.01 sec)

確認したらkは結構かぶってた。

気を取り直してexplain analyze!!

mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10\G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 10 row(s)  (actual time=0.159..0.523 rows=10 loops=1)
    -> Nested loop inner join  (cost=20455.49 rows=55564) (actual time=0.128..0.349 rows=10 loops=1)
        -> Table scan on s1  (cost=1008.25 rows=9680) (actual time=0.035..0.035 rows=1 loops=1)
        -> Index lookup on s2 using k_2 (k=s1.k)  (cost=1.44 rows=6) (actual time=0.046..0.118 rows=10 loops=1)

あれ??? 確かにkのカーディナリティは低いかもしれないけどoptimizer traceでなんかわかるのかな? optimizer trace見る

SET optimizer_trace_max_mem_size = 1048576;
SET optimizer_trace="enabled=on"
explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10\G
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE
-- (長いのでこの記事の最後に)

とりあえずoptimizer traceにjoinアルゴリズムのコスト計算経過がわかるものはなさそうでした。

一旦落ち着いてドキュメントを読む

https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html

冒頭部分でequi-join(等価条件でのjoin)且つindexがない場合と書いてあった。。。
MySQL employs a hash join for any query for which each join has an equi-join condition and uses no indexes,

まじか。 ならindexのないカラムcでjoin

cでjoin

mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10\G
*************************** 1. row ***************************
EXPLAIN: -> Limit: 10 row(s)  (actual time=3218.399..3218.399 rows=0 loops=1)
    -> Inner hash join (s2.c = s1.c)  (cost=9620178.51 rows=9618048) (actual time=3218.357..3218.357 rows=0 loops=1)
        -> Table scan on s2  (cost=0.13 rows=9936) (actual time=0.037..62.317 rows=10000 loops=24)
        -> Hash
            -> Table scan on s1  (cost=1008.25 rows=9680) (actual time=0.039..65.958 rows=10000 loops=1)

1 row in set (3.22 sec)

おおおー! Inner hash join 出ました。

片方のtable(sbtest1)にindex作成

indexあったらどうなのか。 (ちなみにこのクエリをexplainせずに実行すると結果は0件です)

mysql> alter table sbtest1 add index c_1(c);
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10\G                                  *************************** 1. row ***************************
EXPLAIN: -> Limit: 10 row(s)  (actual time=409.276..409.276 rows=0 loops=1)
    -> Nested loop inner join  (cost=4511.45 rows=9936) (actual time=409.239..409.239 rows=0 loops=1)
        -> Table scan on s2  (cost=1033.85 rows=9936) (actual time=0.070..66.048 rows=10000 loops=1)
        -> Index lookup on s1 using c_1 (c=s2.c)  (cost=0.25 rows=1) (actual time=0.016..0.016 rows=0 loops=10000)

1 row in set (0.41 sec)

Nested loopになりました。 このときもoptimizer traceではhash joinしたような文字列は見つけられませんでした。

なるほど。

まとめ

記事冒頭に書いたとおり

以下付録2つ。optimizer traceからjoin algorithmの判断部分がわかるよ?っていう場合以外は見る必要はなさそう。

(付録1) kでjoinしてnested loop担ったときのoptimizer trace結果

mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE\G
*************************** 1. row ***************************
                            QUERY: explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from (`sbtest1` `s1` join `sbtest2` `s2` on((`s1`.`k` = `s2`.`k`))) limit 10"
          },
          {
            "transformations_to_nested_joins": {
              "transformations": [
                "JOIN_condition_to_WHERE",
                "parenthesis_removal"
              ],
              "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from `sbtest1` `s1` join `sbtest2` `s2` where (`s1`.`k` = `s2`.`k`) limit 10"
            }
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`s1`.`k` = `s2`.`k`)",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`sbtest1` `s1`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              },
              {
                "table": "`sbtest2` `s2`",
                "row_may_be_null": false,
                "map_bit": 1,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
              {
                "table": "`sbtest1` `s1`",
                "field": "k",
                "equals": "`s2`.`k`",
                "null_rejecting": false
              },
              {
                "table": "`sbtest2` `s2`",
                "field": "k",
                "equals": "`s1`.`k`",
                "null_rejecting": false
              }
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`sbtest1` `s1`",
                "table_scan": {
                  "rows": 9680,
                  "cost": 40.25
                }
              },
              {
                "table": "`sbtest2` `s2`",
                "table_scan": {
                  "rows": 9936,
                  "cost": 40.25
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`sbtest1` `s1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "k_1",
                      "usable": false,
                      "chosen": false
                    },
                    {
                      "rows_to_scan": 9680,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 9680,
                      "cost": 1008.2,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 9680,
                "cost_for_plan": 1008.2,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`sbtest1` `s1`"
                    ],
                    "table": "`sbtest2` `s2`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "access_type": "ref",
                          "index": "k_2",
                          "rows": 5.74,
                          "cost": 19447,
                          "chosen": true
                        },
                        {
                          "rows_to_scan": 9936,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 27,
                          "resulting_rows": 9936,
                          "cost": 9.62e6,
                          "chosen": false
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 55564,
                    "cost_for_plan": 20455,
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`sbtest2` `s2`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "ref",
                      "index": "k_2",
                      "usable": false,
                      "chosen": false
                    },
                    {
                      "rows_to_scan": 9936,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 9936,
                      "cost": 1033.8,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 9936,
                "cost_for_plan": 1033.8,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`sbtest2` `s2`"
                    ],
                    "table": "`sbtest1` `s1`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "access_type": "ref",
                          "index": "k_1",
                          "rows": 5.8343,
                          "cost": 20289,
                          "chosen": true
                        },
                        {
                          "rows_to_scan": 9680,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 28,
                          "resulting_rows": 9680,
                          "cost": 9.62e6,
                          "chosen": false
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 57970,
                    "cost_for_plan": 21323,
                    "pruned_by_cost": true
                  }
                ]
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`s2`.`k` = `s1`.`k`)",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`sbtest1` `s1`",
                  "attached": null
                },
                {
                  "table": "`sbtest2` `s2`",
                  "attached": "(`s2`.`k` = `s1`.`k`)"
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`sbtest2` `s2`",
                "original_table_condition": "(`s2`.`k` = `s1`.`k`)",
                "final_table_condition   ": null
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`sbtest1` `s1`"
              },
              {
                "table": "`sbtest2` `s2`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)

(付録2) cでindexなしでjoinしてhash joinした場合のoptimizer_trace結果

mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE
    -> \G
*************************** 1. row ***************************
                            QUERY: explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from (`sbtest1` `s1` join `sbtest2` `s2` on((`s1`.`c` = `s2`.`c`))) limit 10"
          },
          {
            "transformations_to_nested_joins": {
              "transformations": [
                "JOIN_condition_to_WHERE",
                "parenthesis_removal"
              ],
              "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from `sbtest1` `s1` join `sbtest2` `s2` where (`s1`.`c` = `s2`.`c`) limit 10"
            }
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`s1`.`c` = `s2`.`c`)",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`sbtest1` `s1`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              },
              {
                "table": "`sbtest2` `s2`",
                "row_may_be_null": false,
                "map_bit": 1,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`sbtest1` `s1`",
                "table_scan": {
                  "rows": 9680,
                  "cost": 40.25
                }
              },
              {
                "table": "`sbtest2` `s2`",
                "table_scan": {
                  "rows": 9936,
                  "cost": 40.25
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`sbtest1` `s1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 9680,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 9680,
                      "cost": 1008.2,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 9680,
                "cost_for_plan": 1008.2,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`sbtest1` `s1`"
                    ],
                    "table": "`sbtest2` `s2`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 9936,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 27,
                          "resulting_rows": 9936,
                          "cost": 9.62e6,
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 10,
                    "rows_for_plan": 9.62e6,
                    "cost_for_plan": 9.62e6,
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`sbtest2` `s2`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 9936,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 9936,
                      "cost": 1033.8,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 9936,
                "cost_for_plan": 1033.8,
                "pruned_by_heuristic": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`s2`.`c` = `s1`.`c`)",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`sbtest1` `s1`",
                  "attached": null
                },
                {
                  "table": "`sbtest2` `s2`",
                  "attached": "(`s2`.`c` = `s1`.`c`)"
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`sbtest2` `s2`",
                "original_table_condition": "(`s2`.`c` = `s1`.`c`)",
                "final_table_condition   ": "(`s2`.`c` = `s1`.`c`)"
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`sbtest1` `s1`"
              },
              {
                "table": "`sbtest2` `s2`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)