【MySQL】Best Practice

Posted by 西维蜀黍 on 2021-11-01, Last Modified on 2022-12-10

Naming

Generic

Do

  • Use meaningful names.

  • Use lower_case_underscores.

  • The length of db name should be less than 42

  • The length of table name should be less than 48

  • The length of username should be less than 32

  • Avoid MySQL reserved keywords as a column‘s name

  • e.g., keyword or reserved words , like: name, status, type, data, description, level, host, port……

  • https://dev.mysql.com/doc/refman/8.0/en/keywords.html

  • dangerous words ,like : rm, truncate ,delete……

  • confusion words , like : master, slave ,( which is a MySQL Replication role.)

DBs

  • lower_case_with_underscores with suffix _db.
    • e.g. account_db, company_account_db, gop_txn_db, company_admin_vn_db
  • Prefix is preferred for a database naming to show the business category or project code.
    • e.g. company_account_db, gop_txn_db
  • If the database is specific for a region, add region code before _db.
    • e.g. company_admin_vn_db
  • For sharding db, put shard id as suffix after _db, padding to 8 digits, started from 0.
    • e.g. company_account_db_00000000, log_db_00002018
  • the length of db name should be less than 42

Tables

  • lower_case_with_underscores with suffix _tab. e.g. user_tab
  • For sharding table, put shard id as suffix after _tab, padding to 8 digits, started from 0.
    • e.g. account_tab_00000000, order_tab_00001000, login_log_tab_00201801, audit_log_tab_20180101
  • the length of table name should be less than 48

Fields

  • lower_case_with_underscores.
    • e.g. order_id, create_time
  • If the field is a foreign reference to another table, and it is referencing an integer primary key, it should be suffixed with _id
  • 不要用mysql关键字做字段名,比如 status、desc等

Index

  • lower_case_with_underscores, joined keys with prefix idx_.
    • e.g. idx_order_id, idx_key1_key2

Stored Procedure

  • lower_case_with_underscores with prefix sp_.
    • e.g. sp_execute_topup

Functions

  • lower_case_with_underscores with prefix func_.
    • e.g. func_get_uid

Users

Data Size

  • What’s the estimated data size in 12 ~ 18 months

if the table meets any of the conditions, do table sharding or data archive

  • larger than 10GB
  • more than 10 million records
  • QPS (write&read) > 1k/s on single table

DB

  • 数据库字符编码使用 utf8mb4
  • not allowed in most cases. only allow some internal service or small DB, as we treat MySQL as a backend storage
    • foreign key
    • event
    • function
    • stored procedure
    • trigger
    • partition table(only allowed in some internal service,like zabbix)

Tables

  • primary key

    • preferable:
      • Primary key should always be BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, unless you know what you’re doing.
        • for some tables with only one index: the priamry key, like only item_id bigint as primary key, we can set it to unique key, and add “id bigint auto_increment” as primary key
      • not allowed: UUID, MD5 Hash, composite primary key
  • table collate use utf8mb4_unicode_ci (recommended) or latin1_general_ci (only for special cases)

  • Table collation should be innodb

    • Main advantages of InnoDB over MyISAM:
      • InnoDB supports transactions, MyISAM does not.
      • InnoDB supports row-level locking, MyISAM only supports table-level locking. Finer-grained locking = better concurrency.
      • InnoDB supports foreign keys, MyISAM does not. BUT we do not recommend using foreign keys.
  • use *_cs in field if this field require case sensitive collation

  • If primary key is not sequential in design, and the data will not be deleted frequently, create another primary key

    (`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT)
    
    • This includes composite PKs.
    • Reason is to reduce disk I/O for InnoDB.
    • Note: this is not applicable to TiDB. TiDB is not suggested to use auto increment id as primary key.
  • Don’t use foreign keys

    • MySQL foreign key implementation is quite simple and crude, and its performance is not good
    • It is not recommended to place any computation logic onto MySQL. We take MySQL as backend storage and won’t allow foreign keys or stored routines.
      • If our applications rely on the computation logic running on MySQL server, it will be much more difficult for us to do database/table sharding.

Fields

  • Define ID value as BIGINT unsigned if possible.
  • Numeric value that does not lose precision (such as INT, DECIMAL) is preferred over precision-loss types (such as FLOAT, DOUBLE).
  • Define numerical value as unsigned if possible.
  • Don’t allow NULL unless necessary.
  • Don’t use ENUM type, use integers instead.
  • Use integers instead of reals for currency.
  • Don’t set default value unless necessary.
    • Default value is recommended when adding new fields on existing tables to ensure smooth upgrade/rollback.
  • Use INT UNSIGNED or BIGINT UNSIGNED (UNIX timestamp) instead of DATETIME to save time value.
    • Year 2038 Problem:
      • TIMESTAMP encoding is actually a 4-byte integer, only can store to a limit of year 2038.
      • No Year 2038 Problem with INT UNSIGNED OR BIGINT. INT storage limit is 2147483647, which is year 2038 when converted to UTC. INT UNSIGNED storage limit is 4294967295, which is year 2106 when converted to UTC. Likewise for BIGINT, it will be able to store bigger values.
    • UTC time is recommended for a global company.
      • NOW() 赋值或字段默认值的,存储的时间是 MySQL 服务器所在地区的时间,在不同地区看到的时间是都是 MySQL 服务器所在时区的时间,存在时区问题,我们是全球化部署业务,时间统一为 UTC 时间可以有效降低内部沟通成本
    • MySQL database is better to be a pure storage layer.
      • 通常我们把 MySQL 数据库当做一个存储层来使用, 不建议把计算逻辑放在数据库层面. 因此, 请避免使用内置的系统函数, 例如 NOW()DEFAULT CURRENT_TIMESTAMP / DEFAULT CURRENT_TIMESTAMP ON UPDATE
  • For adding column ALTER TABLE ADD COLUMN, should not indicate the location of it (FIRST / AFTER), and instead, the new column will be only appended as the last
  • comment: Chinese Characters or any other non-English language words are not allowed in table schema or comment content

Index

  • do not create index on every single column
  • create index on columns with high selectivity
  • create index on columns of which the target values in where condition are less part of the total records
    • e.g. order_status, most values are 1, the very little values are 0 ,and the target value in where condition is always 0, then can create index on it.
  • put the most selective column left
    • e.g. rows 10 million, shop_id 1000, region 7 ,create index (shop_id,region) , not (region,shop_id)
  • do not create duplicate indexes
    • e.g. idx_xxx1(shop_id), idx_xxx2(shop_id,order_status), idx_xxx3(shop_id,order_status,create_time) , the idx_xxx3 covers the idx_xxx1 and idx_xxx2
  • create prefix index for long string column if necessary
    • e.g. item_desc varchar(2000), the string value is “{‘uuid’:’asdfsdfkjl’,’desc’:’xxx’,……}” ,and the value of the uuid is almost unique ,then can create prefix index
  • do not put range-search-column left in composite index
    • e.g. where condition: create_time > ‘2020-06-01 12:00:00’ and shop_id=222 ,create index(shop_id,create_time), not (create_time,shop_id)

Queries

  • When connect to MySQL by command-line client in server, should use the -p or --password option with no password value specified. In this case, the client program solicits the password interactively. It is more secure than to specify it on the command line by -pyour_pass or --password=your_pass, because the password is not visible to other users, ps or history.
  • As a general guideline, writing operations (INSERT / UPDATE / DELETE) on production master database should not affect more than 1000 rows per query or per second, as queries affecting more rows will affect other queries and potentially lead to a slave replication delay.
    • To mitigate this, it is recommended to split the query up (e.g. LIMIT 1000) and introduce a small delay (e.g. 1 second) between each query.
  • Could not use use <> NULL, = NULL, in (NULL), not in (NULL), group by NULL, aggregation method(NULL), only could use is NULL and is not NULL due to NULL is not a value and could not do any comparison.
  • It is not allowed to directly update data in production database (e.g. fix data) by executing SQL in MySQL client.

Data Types

  • user_id: bigint unsigned
  • shop_id: bigint unsigned
  • order_id: bigint unsigned
  • item_id: bigint unsigned
  • ctime/mtime/expiry_time/… : int unsigned or bigint unsigned
  • Use the VARCHAR data type instead of CHAR to store variable-length strings or for columns with many NULL values
    • A CHAR(*N*) column always takes N characters to store data, even if the string is shorter or its value is NULL. Smaller tables fit better in the buffer pool and reduce disk I/O.

Performance

  • the Write QPS should be less than 2k/s
  • the Write+Read QPS on master DB should be less than 3k/s
  • The total Read QPS should be less than 6k/s
  • When doing bulk operations, sleep e.g. 1 second per 1000 records updated.

Security Design

Sensitive Data

  • is there any sensitive data fields in the databases? If yes, consider to the data encryption/protections.

What is sensitive data?

Information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.

Password

  • Generate strong passwords.
    • The password minimum length should be 16 characters or longer.
    • The passwords should consist of randomly picked characters, which have mixed uppercase and lowercase and contain digits and symbols.
    • Different accounts should have different passwords.

Archive

  • 归档后的数据程序还需要读吗?
  • 归档后的数据dev还需要手动读取吗?
  • 归档后数据有没有永久存储或者存储一定时长(例如: 5年)的要求?

如果上述任何一个问题的答案是Yes,则建议在同一个MySQL/RDS实例上创建新的database,把数据归档到按天或者按月存储的table。基本考量是单表不宜超过10GB。

data archive is suitable for:

  • log data
  • history data
  • records with less udpate

How

  • 每隔一段时间,通过cronjob 在低峰期时做归档
  • Limit the speed when deleting.
    • Generally, the number of records deleted per second does not exceed 1000 rows, and it is recommended to be less than 500 rows. Usually, you can take the practice of sleep for one second every 500 rows deleted.
  • Try to delete during low peak periods.
    • For example, scheduled to be executed early in the morning
    • It can also be executed during the day, please keep following up about instance QPS, less than 3000.
  • Use primary key or unique key to delete.
    • Try not to delete multiple records in one SQL. For example: use primary key/unique key + logical delete condition double filter in delete SQL,
  • If you have to scan a table, do not delete too many rows with one SQL. you can delete rows by a primary key range + other logical filters. For example, delete from t where pk between xxx and yyyy and expired_time < zzzz;[]
  • 清理掉的数据需要保留吗?要如何保留?如果程序仍然需要读这些记录,可以考虑直接放到另一个新的归档数据库里,按天或者按月分表 (确保单表记录不超过1000万)

Heterogeneous Data Replication(异构数据复制)

What

如果要把A表数据无缝迁移到B表(最终会把A表废弃), 这就是通常讲的"异构数据复制".

以下几个问题异构数据复制需要注意的:

  • Source Table和Destination Table的表结构是否同构? 如果不同构, 要考虑具体的字段转换逻辑.
  • 是否需要去重? 把两个表的数据合并到一个新表里, 如何鉴别及处理重复数据?
  • 一般而言, source tables里的数据会处于一种不断变化的状态之中, 如何差分数据变化并及时反映到 destination table ?

How

在应用程序层面做双写(double write)。 一般的迁移过程分为若干个阶段:

  • 开启双写:程序同时写新旧两种table → 此时应用程序会仍然读取旧表数据
  • 存量数据导入:根据业务逻辑, 把旧表的数据循序渐进的导入新表
  • 校验数据:确认新旧表数据一致
  • 双读:程序同时读新旧表, 并以旧表读取结果为准校验检验新表 → 如果业务逻辑复杂, 建议增设此阶段
  • 读切换:程序改为读取新表
  • 停止双写并清理旧表

Reference