Naming

Generic

Use meaningful names.
Use lower_case_underscores.
The length of db name should be less than 42
The length of table name should be less than 48
The length of username should be less than 32
Avoid MySQL reserved keywords as a column‘s name
e.g., keyword or reserved words , like: name, status, type, data, description, level, host, port……
https://dev.mysql.com/doc/refman/8.0/en/keywords.html
dangerous words ,like : rm, truncate ,delete……
confusion words , like : master, slave ,( which is a MySQL Replication role.)

DBs

lower_case_with_underscores with suffix _db.
- e.g. account_db, company_account_db, gop_txn_db, company_admin_vn_db
Prefix is preferred for a database naming to show the business category or project code.
- e.g. company_account_db, gop_txn_db
If the database is specific for a region, add region code before _db.
- e.g. company_admin_vn_db
For sharding db, put shard id as suffix after _db, padding to 8 digits, started from 0.
- e.g. company_account_db_00000000, log_db_00002018
the length of db name should be less than 42

Tables

lower_case_with_underscores with suffix _tab. e.g. user_tab
For sharding table, put shard id as suffix after _tab, padding to 8 digits, started from 0.
- e.g. account_tab_00000000, order_tab_00001000, login_log_tab_00201801, audit_log_tab_20180101
the length of table name should be less than 48

Fields

lower_case_with_underscores.
- e.g. order_id, create_time
If the field is a foreign reference to another table, and it is referencing an integer primary key, it should be suffixed with _id
不要用mysql关键字做字段名，比如 status、desc等

Index

lower_case_with_underscores, joined keys with prefix idx_.
- e.g. idx_order_id, idx_key1_key2

Stored Procedure

lower_case_with_underscores with prefix sp_.
- e.g. sp_execute_topup

Functions

lower_case_with_underscores with prefix func_.
- e.g. func_get_uid

Users

Data Size

What’s the estimated data size in 12 ~ 18 months

if the table meets any of the conditions, do table sharding or data archive

larger than 10GB
more than 10 million records
QPS (write&read) > 1k/s on single table

DB

数据库字符编码使用 utf8mb4
not allowed in most cases. only allow some internal service or small DB, as we treat MySQL as a backend storage
- foreign key
- event
- function
- stored procedure
- trigger
- partition table(only allowed in some internal service,like zabbix)

Tables

primary key
- preferable:
  - Primary key should always be BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, unless you know what you’re doing.
    - for some tables with only one index: the priamry key, like only item_id bigint as primary key, we can set it to unique key, and add “id bigint auto_increment” as primary key
  - not allowed: UUID, MD5 Hash, composite primary key
table collate use utf8mb4_unicode_ci (recommended) or latin1_general_ci (only for special cases)
Table collation should be innodb
- Main advantages of InnoDB over MyISAM:
  - InnoDB supports transactions, MyISAM does not.
  - InnoDB supports row-level locking, MyISAM only supports table-level locking. Finer-grained locking = better concurrency.
  - InnoDB supports foreign keys, MyISAM does not. BUT we do not recommend using foreign keys.
use *_cs in field if this field require case sensitive collation
If primary key is not sequential in design, and the data will not be deleted frequently, create another primary key
```
(`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT)
```
- This includes composite PKs.
- Reason is to reduce disk I/O for InnoDB.
- Note: this is not applicable to TiDB. TiDB is not suggested to use auto increment id as primary key.
Don’t use foreign keys
- MySQL foreign key implementation is quite simple and crude, and its performance is not good
- It is not recommended to place any computation logic onto MySQL. We take MySQL as backend storage and won’t allow foreign keys or stored routines.
  - If our applications rely on the computation logic running on MySQL server, it will be much more difficult for us to do database/table sharding.

Fields

Define ID value as BIGINT unsigned if possible.
Numeric value that does not lose precision (such as INT, DECIMAL) is preferred over precision-loss types (such as FLOAT, DOUBLE).
Define numerical value as unsigned if possible.
Don’t allow NULL unless necessary.
Don’t use ENUM type, use integers instead.
Use integers instead of reals for currency.
Don’t set default value unless necessary.
- Default value is recommended when adding new fields on existing tables to ensure smooth upgrade/rollback.
Use INT UNSIGNED or BIGINT UNSIGNED (UNIX timestamp) instead of DATETIME to save time value.
- Year 2038 Problem:
  - TIMESTAMP encoding is actually a 4-byte integer, only can store to a limit of year 2038.
  - No Year 2038 Problem with INT UNSIGNED OR BIGINT. INT storage limit is 2147483647, which is year 2038 when converted to UTC. INT UNSIGNED storage limit is 4294967295, which is year 2106 when converted to UTC. Likewise for BIGINT, it will be able to store bigger values.
- UTC time is recommended for a global company.
  - 用 NOW() 赋值或字段默认值的，存储的时间是 MySQL 服务器所在地区的时间，在不同地区看到的时间是都是 MySQL 服务器所在时区的时间，存在时区问题，我们是全球化部署业务，时间统一为 UTC 时间可以有效降低内部沟通成本
- MySQL database is better to be a pure storage layer.
  - 通常我们把 MySQL 数据库当做一个存储层来使用, 不建议把计算逻辑放在数据库层面. 因此, 请避免使用内置的系统函数, 例如 NOW() 和 DEFAULT CURRENT_TIMESTAMP / DEFAULT CURRENT_TIMESTAMP ON UPDATE
For adding column ALTER TABLE ADD COLUMN, should not indicate the location of it (FIRST / AFTER), and instead, the new column will be only appended as the last
comment: Chinese Characters or any other non-English language words are not allowed in table schema or comment content

Index

do not create index on every single column
create index on columns with high selectivity
create index on columns of which the target values in where condition are less part of the total records
- e.g. order_status, most values are 1, the very little values are 0 ,and the target value in where condition is always 0, then can create index on it.
put the most selective column left
- e.g. rows 10 million, shop_id 1000, region 7 ,create index (shop_id,region) , not (region,shop_id)
do not create duplicate indexes
- e.g. idx_xxx1(shop_id), idx_xxx2(shop_id,order_status), idx_xxx3(shop_id,order_status,create_time) , the idx_xxx3 covers the idx_xxx1 and idx_xxx2
create prefix index for long string column if necessary
- e.g. item_desc varchar(2000), the string value is “{‘uuid’:’asdfsdfkjl’,’desc’:’xxx’,……}” ,and the value of the uuid is almost unique ,then can create prefix index
do not put range-search-column left in composite index
- e.g. where condition: create_time > ‘2020-06-01 12:00:00’ and shop_id=222 ,create index(shop_id,create_time), not (create_time,shop_id)

Queries

When connect to MySQL by command-line client in server, should use the -p or --password option with no password value specified. In this case, the client program solicits the password interactively. It is more secure than to specify it on the command line by -pyour_pass or --password=your_pass, because the password is not visible to other users, ps or history.
As a general guideline, writing operations (INSERT / UPDATE / DELETE) on production master database should not affect more than 1000 rows per query or per second, as queries affecting more rows will affect other queries and potentially lead to a slave replication delay.
- To mitigate this, it is recommended to split the query up (e.g. LIMIT 1000) and introduce a small delay (e.g. 1 second) between each query.
Could not use use <> NULL, = NULL, in (NULL), not in (NULL), group by NULL, aggregation method(NULL), only could use is NULL and is not NULL due to NULL is not a value and could not do any comparison.
It is not allowed to directly update data in production database (e.g. fix data) by executing SQL in MySQL client.

Data Types

user_id: bigint unsigned
shop_id: bigint unsigned
order_id: bigint unsigned
item_id: bigint unsigned
ctime/mtime/expiry_time/… : int unsigned or bigint unsigned
Use the VARCHAR data type instead of CHAR to store variable-length strings or for columns with many NULL values
- A CHAR(*N*) column always takes N characters to store data, even if the string is shorter or its value is NULL. Smaller tables fit better in the buffer pool and reduce disk I/O.

Performance

the Write QPS should be less than 2k/s
the Write+Read QPS on master DB should be less than 3k/s
The total Read QPS should be less than 6k/s
When doing bulk operations, sleep e.g. 1 second per 1000 records updated.

Security Design

Sensitive Data

is there any sensitive data fields in the databases? If yes, consider to the data encryption/protections.

What is sensitive data?

Information which can be used to distinguish or trace an individual’s identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.

Password

Generate strong passwords.
- The password minimum length should be 16 characters or longer.
- The passwords should consist of randomly picked characters, which have mixed uppercase and lowercase and contain digits and symbols.
- Different accounts should have different passwords.

Heterogeneous Data Replication（异构数据复制）

What

如果要把A表数据无缝迁移到B表(最终会把A表废弃), 这就是通常讲的"异构数据复制".

以下几个问题异构数据复制需要注意的:

Source Table和Destination Table的表结构是否同构? 如果不同构, 要考虑具体的字段转换逻辑.
是否需要去重? 把两个表的数据合并到一个新表里, 如何鉴别及处理重复数据?
一般而言, source tables里的数据会处于一种不断变化的状态之中, 如何差分数据变化并及时反映到 destination table ?

How

在应用程序层面做双写（double write）。一般的迁移过程分为若干个阶段:

开启双写：程序同时写新旧两种table → 此时应用程序会仍然读取旧表数据
存量数据导入：根据业务逻辑, 把旧表的数据循序渐进的导入新表
校验数据：确认新旧表数据一致
双读：程序同时读新旧表, 并以旧表读取结果为准校验检验新表 → 如果业务逻辑复杂, 建议增设此阶段
读切换：程序改为读取新表
停止双写并清理旧表

【MySQL】Best Practice