8.0.18がリリースされたのでHash Joinを試してみました。
dockerには8.0.18 imageはなかったのでcentos7にinstallして実験
先にまとめ
- HASH JOINは等価条件のJOINでかつjoinするカラムにindexがない場合に採用される(ドキュメント1行目)
- HASH JOINしたかどうかは
EXPLAIN ANALYZE
もしくはEXPLAIN FORMAT=TREE:
で確認する - JOINアルゴリズムの選択をコスト計算で行っているかは不明
- optimizer_traceみても単純に判断できそうな出力はない
install
yum localinstall -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm yum install mysql-server systemctl start mysqld yum install epel-release yum install sysbench
MySQLの準備
cat /var/log/mysqld.log | grep generated # passwd mysql_secure_installation # configure... mysql -u root -p m> create database sysbench;
サンプルのテーブルを作成
sysbenchでテーブル2つ(sbtest1, sbtest2)を作成
sysbench /usr/share/sysbench/oltp_write_only.lua \ --db-driver=mysql \ --table-size=10000 \ --tables=2 \ --mysql-host=127.0.0.1 \ --mysql-port=3306 \ --mysql-user=sysbench \ --mysql-password=Sysbench8! \ --mysql-db=sysbench \ prepare
中身
mysql> show create table sbtest1\G *************************** 1. row *************************** Table: sbtest1 Create Table: CREATE TABLE `sbtest1` ( `id` int(11) NOT NULL AUTO_INCREMENT, `k` int(11) NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci 1 row in set (0.00 sec) mysql> select * from sbtest1 limit 10; +----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 1 | 4993 | 83868641912-28773972837-60736120486-75162659906-27563526494-20381887404-41576422241-93426793964-56405065102-33518432330 | 67847967377-48000963322-62604785301-91415491898-96926520291 | | 2 | 5020 | 38014276128-25250245652-62722561801-27818678124-24890218270-18312424692-92565570600-36243745486-21199862476-38576014630 | 23183251411-36241541236-31706421314-92007079971-60663066966 | | 3 | 5044 | 33973744704-80540844748-72700647445-87330233173-87249600839-07301471459-22846777364-58808996678-64607045326-48799346817 | 38615512647-91458489257-90681424432-95014675832-60408598704 | | 4 | 5021 | 37002370280-58842166667-00026392672-77506866252-09658311935-56926959306-83464667271-94685475868-28264244556-14550208498 | 63947013338-98809887124-59806726763-79831528812-45582457048 | | 5 | 4999 | 44257470806-17967007152-32809666989-26174672567-29883439075-95767161284-94957565003-35708767253-53935174705-16168070783 | 34551750492-67990399350-81179284955-79299808058-21257255869 | | 6 | 5006 | 37216201353-39109531021-11197415756-87798784755-02463049870-83329763120-57551308766-61100580113-80090253566-30971527105 | 05161542529-00085727016-35134775864-52531204064-98744439797 | | 7 | 5014 | 33071042495-29920376648-91343430102-79082003121-73317691963-02846712788-88069761578-14885283975-44409837760-90760298045 | 91798303270-64988107984-08161247972-12116454627-22996445111 | | 8 | 5000 | 73754818686-04889373966-18668178968-56957589012-31352882173-91882653509-59577900152-88962682169-52981807259-62646890059 | 76460662325-41613089656-42706083314-81833284991-17063140920 | | 9 | 5273 | 26482547570-00155460224-12388481921-23289186371-78242522654-77998886134-73270876420-50821093220-31442690639-11588920653 | 30508501104-50823269125-88107014550-70202920684-95842308929 | | 10 | 5035 | 05677017559-47107518969-97509137401-28934334557-14497052050-61906823704-44077628507-24840441785-05187301456-27797851637 | 29489382504-13697582598-09964978366-26554639515-36136545002 | +----+------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 10 rows in set (0.00 sec)
やってみよう
mysql> explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10; +----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+ | 1 | SIMPLE | s1 | NULL | ALL | k_1 | NULL | NULL | NULL | 9680 | 100.00 | NULL | | 1 | SIMPLE | s2 | NULL | ref | k_2 | k_2 | 4 | sysbench.s1.k | 5 | 100.00 | NULL | +----+-------------+-------+------------+------+---------------+------+---------+---------------+------+----------+-------+ 2 rows in set, 1 warning (0.00 sec)
ああそうかexplainではでないのか。 と思ったけどsbtest2(s2)のrows 5ってなに?select ... limit 10なんだけど。
-- why rows 5?? mysql> select count(*) from sbtest1 where k = 4993; +----------+ | count(*) | +----------+ | 104 | +----------+ 1 row in set (0.01 sec)
確認したらkは結構かぶってた。
気を取り直してexplain analyze
!!
mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10\G *************************** 1. row *************************** EXPLAIN: -> Limit: 10 row(s) (actual time=0.159..0.523 rows=10 loops=1) -> Nested loop inner join (cost=20455.49 rows=55564) (actual time=0.128..0.349 rows=10 loops=1) -> Table scan on s1 (cost=1008.25 rows=9680) (actual time=0.035..0.035 rows=1 loops=1) -> Index lookup on s2 using k_2 (k=s1.k) (cost=1.44 rows=6) (actual time=0.046..0.118 rows=10 loops=1)
あれ??? 確かにkのカーディナリティは低いかもしれないけどoptimizer traceでなんかわかるのかな? optimizer trace見る
SET optimizer_trace_max_mem_size = 1048576; SET optimizer_trace="enabled=on" explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10\G SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE -- (長いのでこの記事の最後に)
とりあえずoptimizer traceにjoinアルゴリズムのコスト計算経過がわかるものはなさそうでした。
一旦落ち着いてドキュメントを読む
https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html
冒頭部分でequi-join(等価条件でのjoin)且つindexがない場合と書いてあった。。。
MySQL employs a hash join for any query for which each join has an equi-join condition and uses no indexes,
まじか。 ならindexのないカラムcでjoin
cでjoin
mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10\G *************************** 1. row *************************** EXPLAIN: -> Limit: 10 row(s) (actual time=3218.399..3218.399 rows=0 loops=1) -> Inner hash join (s2.c = s1.c) (cost=9620178.51 rows=9618048) (actual time=3218.357..3218.357 rows=0 loops=1) -> Table scan on s2 (cost=0.13 rows=9936) (actual time=0.037..62.317 rows=10000 loops=24) -> Hash -> Table scan on s1 (cost=1008.25 rows=9680) (actual time=0.039..65.958 rows=10000 loops=1) 1 row in set (3.22 sec)
おおおー!
Inner hash join
出ました。
片方のtable(sbtest1)にindex作成
indexあったらどうなのか。 (ちなみにこのクエリをexplainせずに実行すると結果は0件です)
mysql> alter table sbtest1 add index c_1(c); Query OK, 0 rows affected (0.09 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain analyze select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10\G *************************** 1. row *************************** EXPLAIN: -> Limit: 10 row(s) (actual time=409.276..409.276 rows=0 loops=1) -> Nested loop inner join (cost=4511.45 rows=9936) (actual time=409.239..409.239 rows=0 loops=1) -> Table scan on s2 (cost=1033.85 rows=9936) (actual time=0.070..66.048 rows=10000 loops=1) -> Index lookup on s1 using c_1 (c=s2.c) (cost=0.25 rows=1) (actual time=0.016..0.016 rows=0 loops=10000) 1 row in set (0.41 sec)
Nested loopになりました。 このときもoptimizer traceではhash joinしたような文字列は見つけられませんでした。
なるほど。
まとめ
記事冒頭に書いたとおり
以下付録2つ。optimizer traceからjoin algorithmの判断部分がわかるよ?っていう場合以外は見る必要はなさそう。
(付録1) kでjoinしてnested loop担ったときのoptimizer trace結果
mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE\G *************************** 1. row *************************** QUERY: explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.k = s2.k limit 10 TRACE: { "steps": [ { "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from (`sbtest1` `s1` join `sbtest2` `s2` on((`s1`.`k` = `s2`.`k`))) limit 10" }, { "transformations_to_nested_joins": { "transformations": [ "JOIN_condition_to_WHERE", "parenthesis_removal" ], "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from `sbtest1` `s1` join `sbtest2` `s2` where (`s1`.`k` = `s2`.`k`) limit 10" } } ] } }, { "join_optimization": { "select#": 1, "steps": [ { "condition_processing": { "condition": "WHERE", "original_condition": "(`s1`.`k` = `s2`.`k`)", "steps": [ { "transformation": "equality_propagation", "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)" }, { "transformation": "constant_propagation", "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)" }, { "transformation": "trivial_condition_removal", "resulting_condition": "multiple equal(`s1`.`k`, `s2`.`k`)" } ] } }, { "substitute_generated_columns": { } }, { "table_dependencies": [ { "table": "`sbtest1` `s1`", "row_may_be_null": false, "map_bit": 0, "depends_on_map_bits": [ ] }, { "table": "`sbtest2` `s2`", "row_may_be_null": false, "map_bit": 1, "depends_on_map_bits": [ ] } ] }, { "ref_optimizer_key_uses": [ { "table": "`sbtest1` `s1`", "field": "k", "equals": "`s2`.`k`", "null_rejecting": false }, { "table": "`sbtest2` `s2`", "field": "k", "equals": "`s1`.`k`", "null_rejecting": false } ] }, { "rows_estimation": [ { "table": "`sbtest1` `s1`", "table_scan": { "rows": 9680, "cost": 40.25 } }, { "table": "`sbtest2` `s2`", "table_scan": { "rows": 9936, "cost": 40.25 } } ] }, { "considered_execution_plans": [ { "plan_prefix": [ ], "table": "`sbtest1` `s1`", "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "k_1", "usable": false, "chosen": false }, { "rows_to_scan": 9680, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "resulting_rows": 9680, "cost": 1008.2, "chosen": true } ] }, "condition_filtering_pct": 100, "rows_for_plan": 9680, "cost_for_plan": 1008.2, "rest_of_plan": [ { "plan_prefix": [ "`sbtest1` `s1`" ], "table": "`sbtest2` `s2`", "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "k_2", "rows": 5.74, "cost": 19447, "chosen": true }, { "rows_to_scan": 9936, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "using_join_cache": true, "buffers_needed": 27, "resulting_rows": 9936, "cost": 9.62e6, "chosen": false } ] }, "condition_filtering_pct": 100, "rows_for_plan": 55564, "cost_for_plan": 20455, "chosen": true } ] }, { "plan_prefix": [ ], "table": "`sbtest2` `s2`", "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "k_2", "usable": false, "chosen": false }, { "rows_to_scan": 9936, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "resulting_rows": 9936, "cost": 1033.8, "chosen": true } ] }, "condition_filtering_pct": 100, "rows_for_plan": 9936, "cost_for_plan": 1033.8, "rest_of_plan": [ { "plan_prefix": [ "`sbtest2` `s2`" ], "table": "`sbtest1` `s1`", "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "k_1", "rows": 5.8343, "cost": 20289, "chosen": true }, { "rows_to_scan": 9680, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "using_join_cache": true, "buffers_needed": 28, "resulting_rows": 9680, "cost": 9.62e6, "chosen": false } ] }, "condition_filtering_pct": 100, "rows_for_plan": 57970, "cost_for_plan": 21323, "pruned_by_cost": true } ] } ] }, { "attaching_conditions_to_tables": { "original_condition": "(`s2`.`k` = `s1`.`k`)", "attached_conditions_computation": [ ], "attached_conditions_summary": [ { "table": "`sbtest1` `s1`", "attached": null }, { "table": "`sbtest2` `s2`", "attached": "(`s2`.`k` = `s1`.`k`)" } ] } }, { "finalizing_table_conditions": [ { "table": "`sbtest2` `s2`", "original_table_condition": "(`s2`.`k` = `s1`.`k`)", "final_table_condition ": null } ] }, { "refine_plan": [ { "table": "`sbtest1` `s1`" }, { "table": "`sbtest2` `s2`" } ] } ] } }, { "join_explain": { "select#": 1, "steps": [ ] } } ] } MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0 INSUFFICIENT_PRIVILEGES: 0 1 row in set (0.00 sec)
(付録2) cでindexなしでjoinしてhash joinした場合のoptimizer_trace結果
mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE -> \G *************************** 1. row *************************** QUERY: explain select * from sbtest1 s1 inner join sbtest2 s2 on s1.c = s2.c limit 10 TRACE: { "steps": [ { "join_preparation": { "select#": 1, "steps": [ { "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from (`sbtest1` `s1` join `sbtest2` `s2` on((`s1`.`c` = `s2`.`c`))) limit 10" }, { "transformations_to_nested_joins": { "transformations": [ "JOIN_condition_to_WHERE", "parenthesis_removal" ], "expanded_query": "/* select#1 */ select `s1`.`id` AS `id`,`s1`.`k` AS `k`,`s1`.`c` AS `c`,`s1`.`pad` AS `pad`,`s2`.`id` AS `id`,`s2`.`k` AS `k`,`s2`.`c` AS `c`,`s2`.`pad` AS `pad` from `sbtest1` `s1` join `sbtest2` `s2` where (`s1`.`c` = `s2`.`c`) limit 10" } } ] } }, { "join_optimization": { "select#": 1, "steps": [ { "condition_processing": { "condition": "WHERE", "original_condition": "(`s1`.`c` = `s2`.`c`)", "steps": [ { "transformation": "equality_propagation", "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)" }, { "transformation": "constant_propagation", "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)" }, { "transformation": "trivial_condition_removal", "resulting_condition": "multiple equal(`s1`.`c`, `s2`.`c`)" } ] } }, { "substitute_generated_columns": { } }, { "table_dependencies": [ { "table": "`sbtest1` `s1`", "row_may_be_null": false, "map_bit": 0, "depends_on_map_bits": [ ] }, { "table": "`sbtest2` `s2`", "row_may_be_null": false, "map_bit": 1, "depends_on_map_bits": [ ] } ] }, { "ref_optimizer_key_uses": [ ] }, { "rows_estimation": [ { "table": "`sbtest1` `s1`", "table_scan": { "rows": 9680, "cost": 40.25 } }, { "table": "`sbtest2` `s2`", "table_scan": { "rows": 9936, "cost": 40.25 } } ] }, { "considered_execution_plans": [ { "plan_prefix": [ ], "table": "`sbtest1` `s1`", "best_access_path": { "considered_access_paths": [ { "rows_to_scan": 9680, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "resulting_rows": 9680, "cost": 1008.2, "chosen": true } ] }, "condition_filtering_pct": 100, "rows_for_plan": 9680, "cost_for_plan": 1008.2, "rest_of_plan": [ { "plan_prefix": [ "`sbtest1` `s1`" ], "table": "`sbtest2` `s2`", "best_access_path": { "considered_access_paths": [ { "rows_to_scan": 9936, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "using_join_cache": true, "buffers_needed": 27, "resulting_rows": 9936, "cost": 9.62e6, "chosen": true } ] }, "condition_filtering_pct": 10, "rows_for_plan": 9.62e6, "cost_for_plan": 9.62e6, "chosen": true } ] }, { "plan_prefix": [ ], "table": "`sbtest2` `s2`", "best_access_path": { "considered_access_paths": [ { "rows_to_scan": 9936, "filtering_effect": [ ], "final_filtering_effect": 1, "access_type": "scan", "resulting_rows": 9936, "cost": 1033.8, "chosen": true } ] }, "condition_filtering_pct": 100, "rows_for_plan": 9936, "cost_for_plan": 1033.8, "pruned_by_heuristic": true } ] }, { "attaching_conditions_to_tables": { "original_condition": "(`s2`.`c` = `s1`.`c`)", "attached_conditions_computation": [ ], "attached_conditions_summary": [ { "table": "`sbtest1` `s1`", "attached": null }, { "table": "`sbtest2` `s2`", "attached": "(`s2`.`c` = `s1`.`c`)" } ] } }, { "finalizing_table_conditions": [ { "table": "`sbtest2` `s2`", "original_table_condition": "(`s2`.`c` = `s1`.`c`)", "final_table_condition ": "(`s2`.`c` = `s1`.`c`)" } ] }, { "refine_plan": [ { "table": "`sbtest1` `s1`" }, { "table": "`sbtest2` `s2`" } ] } ] } }, { "join_explain": { "select#": 1, "steps": [ ] } } ] } MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0 INSUFFICIENT_PRIVILEGES: 0 1 row in set (0.00 sec)