Running TPC-H on MYSQL–loading data

系统版本:Ubuntu-10.04  &&  MySQL-5.1.41
[background]
tpc-h是决策支持数据库的基准测试,它包含了一整套面向商业的ad-hoc查询并发数据修改,强调测试的是数据库、平台和I/O性能,关注查询能力
tpc-h通过检查大量数据,执行复杂的查询操作,以及应答商业问题来展示决策支持系统的性能。tpc-h中的数据库查询要比典型的OLTP查询更加复杂。

tpc-h性能测试包括两部分:
能力(power)测试—能力测试将以连续的次序执行一个数据库查询流。
吞吐量(throughput)—吞吐量测试将执行多条并发的数据库查询流,而每条查询流同样以连续的次序执行查询。

TPC-H所报告的性能计量单位被称为“TPC-H复合式每小时查询性能单位”(TPC-H Composite Query-per-Hour Performance Metric – QphH@Size),反映的是系统处理查询的多方面能力,包括查询执行时选定的数据库大小、单个流提交查询时的查询处理能力,以及多个并发用户提交查询时的查询吞吐量。
TPC-H的价格/性能比计量单位的表达方式为$/QphH@Size。

[load data step by step]
注:实验只需单表,所以做部分删减操作,多表需要修改ddl文件,或者create table的.sql文件,本实验选择lineitem表,该表含有各种数据类型。

1.DBGEN is available here–http://www.tpc.org/tpch/spec/tpch_2_14_0.tgz
together with tpc-h ref-data
–dbgen is a TPC provided software package that must be used to produce the data used to populate the database.

2.Generate data
2.1 tpch文件目录下makefile.suite复制并改名成makefile,并修改makefile文件tracylling@ubuntu:~/tpch/dbgen$ cp makefile.suite makefile
tracylling@ubuntu:~/tpch/dbgen$ gedit makefile
makefile modified as:

###############
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH

Tips: DATABASE选项中没有MySQL,没有太大关系,仍旧可以读取,以|分割的表数据。

2.2 同时修改tpcd.h文件
tracylling@ubuntu:~/tpch/dbgen$ gedit tpcd.h
tpcd.h modified as:

#ifdef       SQLSERVER
#define GEN_QUERY_PLAN  “EXPLAIN;”
#define START_TRAN      “START TRANSACTION;\n”
#define END_TRAN        “COMMIT;\n”
#define SET_OUTPUT      “”
#define SET_ROWCOUNT    “limit %d;\n”
#define SET_DBASE       “use %s;\n”
#endif

2.3 编译之后生存dbgen可执行文件
tracylling@ubuntu:~/tpch/dbgen$ make

2.4 生成需要的数据
tracylling@ubuntu:~/tpch/dbgen$ ./dbgen -v -T L -s 0.01
-v: verbose, -T L: lineitem only, -s 0.01: scale factor

TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 – 2010
Generating data for lineitem table/
Preloading text … 100%
|done.

3.Create schema/database
shell>mysql -e “create database tpch” -u root -p//创建数据库tpch
shell>mysql tpch < tpch_create_table.sql -u root -p //定义表结构(包含主键/索引信息)

tpch_create_table.sql(only one table,you can add others):

# CREATE lineitem
create table lineitem (l_shipdate date not null,
l_orderkey integer not null,
l_partkey  integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimal(15,2) not null,
l_extendedprice decimal(15,2) not null,
l_discount decimal(15,2) not null,
l_tax decimal(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct   char(25) not null,
l_comment varchar(44) not null,
l_shipmode char(10) not null,
primary key (l_orderkey, l_linenumber)
,index li_shp_dt_idx (l_shipdate)
,index li_com_dt_idx (l_commitdate)
,index li_rcpt_dt_idx (l_receiptdate)
) engine=myisam;

4.Load data with load_mysql.sh
shell>sh load_mysql.sh
load_mysql.sh:

PATH_DATA=/home/tracylling/tpch/dbgen
DATABASE=tpch
TABLES=”lineitem”
export LANG=zh_CN.utf8
echo “Load table lineitem”
mysql -uroot -pll -S /var/run/mysqld/mysqld.sock -D $DATABASE -e “TRUNCATE TABLE $TABLES”
echo `date`
mysql -uroot -pll -S /var/run/mysqld/mysqld.sock -D $DATABASE -e “LOAD DATA LOCAL INFILE ‘$PATH_DATA/$TABLES.tbl’ INTO TABLE $TABLES FIELDS TERMINATED BY ‘|'”
mysql -uroot -pll -S /var/run/mysqld/mysqld.sock -D $DATABASE -e “SHOW TABLE STATUS\G”

load before:
*************************** 1. row ***************************
Name: lineitem
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 0
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2011-06-27 14:25:53
Update_time: 2011-06-27 14:25:53
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:

After execute:
Load table lineitem
2011年 06月 27日 星期一 14:26:52 CST
*************************** 1. row ***************************
Name: lineitem
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 45266
Avg_row_length: 87
Data_length: 3942820
Max_data_length: 281474976710655
Index_length: 2031616
Data_free: 0
Auto_increment: NULL
Create_time: 2011-06-27 14:26:52
Update_time: 2011-06-27 14:26:52
Check_time: 2011-06-27 14:26:52
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:

5.Calculate index sizes&total size of the data
计算表索引和数据大小,该信息存储在information_schema表中
mysql> SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024*1024), 2), ‘ GB’) AS ‘Total Index Size’ FROM information_schema.TABLES WHERE table_schema LIKE ‘database’;
mysql> SELECT CONCAT(ROUND(SUM(data_length)/(1024*1024*1024), 2), ‘ GB’) AS ‘Total Data Size’ FROM information_schema.TABLES WHERE table_schema LIKE ‘database’;

Just replace database with the partial name of your database(tpch) you need to analyze.

An overall analysis of entire database on a per table basis:

SELECT CONCAT(table_schema,’.’,table_name) AS ‘Table Name’,CONCAT(ROUND(table_rows/1000000,2),’M’) AS ‘Number of Rows’,CONCAT(ROUND(data_length/(1024*1024*1024),2),’G’) AS ‘Data Size’,CONCAT(ROUND(index_length/(1024*1024*1024),2),’G’) AS ‘Index Size’ ,CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),2),’G’) AS’Total’FROM information_schema.TABLES WHERE table_schema LIKE ‘database’;

Leave a Reply

Your email address will not be published. Required fields are marked *