Error Return
If a service cannot handle a request because of an internal or upstream error, it should return proper error information to the client.
The service should write a log entry for every error it returns.
The error response structure should be consistent for all APIs in one service.
- The error code should be either a short string or an integer, defined by the service. It helps the client understand the situation and choose how to handle the error.
- The detailed error message is optional. It helps the client debug the issue.
- Error codes need to be categorized into 2 types, in a way that lets the client easily identify the type of error:
  - Occasional errors, such as upstream down or internal error. These errors mean the client can safely retry the request.
  - Logic errors, such as invalid argument. The client can cache the result and should not directly retry the request.
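As an illustration of the structure described above, here is a minimal sketch of a consistent error response. The error code names, the `retryable` field, and the helper names are assumptions made for this example; a real service defines its own codes and its own way of exposing the error type.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical error codes; each service defines its own set.
OCCASIONAL_ERRORS = {"error_internal", "error_upstream_down"}
LOGIC_ERRORS = {"error_invalid_argument", "error_not_found"}


@dataclass
class ErrorResponse:
    error: str                     # short string code defined by the service
    retryable: bool                # True for occasional errors, False for logic errors
    message: Optional[str] = None  # optional detail to help the client debug


def make_error_response(code, message=None):
    """Build the error body used by every API in this (hypothetical) service."""
    if code not in OCCASIONAL_ERRORS and code not in LOGIC_ERRORS:
        raise ValueError("unknown error code: %s" % code)
    return ErrorResponse(error=code, retryable=code in OCCASIONAL_ERRORS, message=message)


def to_json(response):
    """Serialize the error body for the wire."""
    return json.dumps(asdict(response), sort_keys=True)
```

Exposing the category as an explicit field is only one option; a code prefix or a reserved integer range would let the client identify the error type equally well.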
Error Handling
If your service depends on other services, such as a database, Redis, or an API server, it may receive errors from those upstream services.
Each upstream request needs to be wrapped in a function. There are 3 ways to return an error from this function; pick one and use it consistently within a project.
- (Recommended) Return the error code and response data as a tuple.
  ```python
  error, data = request_api_server(request)
  if error is not None:
      log.error('request_api_server_error|error=%s,request=%s', error, request)
      return error
  return data
  ```
- Return response data only, and return a null value if the request failed. This is only applicable to non-critical requests.
  ```python
  data = request_api_server(request)
  if data is None:
      log.error('request_api_server_error|request=%s', request)
      return 'error_unknown'
  return data
  ```
- Return response data, and raise an exception if the request failed.
  ```python
  try:
      data = request_api_server(request)
  except Exception as ex:
      log.error('request_api_server_error|error=%s,request=%s', ex, request)
      return ex
  return data
  ```
If the error code is inside the response body, the wrapper function needs to extract it and return the error in one of the ways above.
Every time you call an upstream request function, you need to check and handle the error properly.
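A wrapper of the recommended tuple-returning kind might look like the sketch below. The transport function `send_to_api_server`, the `error` and `data` field names in the response body, and the `error_upstream` code are all hypothetical; they stand in for whatever client and response layout your upstream actually uses.

```python
import logging

log = logging.getLogger(__name__)


def send_to_api_server(request):
    """Hypothetical transport stub; replace with the real client call."""
    if request.get("uid") is None:
        return {"error": "error_invalid_argument", "data": None}
    return {"error": 0, "data": {"uid": request["uid"], "name": "alice"}}


def request_api_server(request):
    """Return (error, data); error is None on success."""
    try:
        body = send_to_api_server(request)
    except Exception as ex:
        # Transport failures are logged here because the exception detail
        # is not carried in the returned error code.
        log.error('request_api_server_error|error=%s,request=%s', ex, request)
        return 'error_upstream', None
    # Assumption: the body carries an "error" field where 0 means success.
    error = body.get("error")
    if error:
        return str(error), None
    return None, body.get("data")
```

The caller then checks the first element of the tuple on every call, so an error embedded in the body cannot be mistaken for a successful response.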
There are 3 types of errors when calling an upstream service, and they need to be handled in different ways:
- Any error from an idempotent API: the service can retry several times.
- Occasional errors from upstream, such as internal error: the service can retry several times.
- Non-occasional errors from upstream, such as invalid argument or lost connection: the service should not directly retry the request.
Every time you receive an error from upstream, you should log the error information and context.
```
get_user_query_db_error|error=lost_connection,uid=10000,sql=SELECT * FROM user_tab WHERE uid=10000
get_user_query_api_server_error|error=invalid_argument,cmd=get_user,uid=10000
```
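The log lines above follow an `event|key=value,...` layout. A small helper can keep that layout consistent across a project; `format_error_log` is a hypothetical name used only for this sketch, not an existing API.

```python
import logging

log = logging.getLogger(__name__)


def format_error_log(event, error, **context):
    """Build 'event|error=...,key=value,...' with the error field first."""
    fields = [("error", error)] + list(context.items())
    return "%s|%s" % (event, ",".join("%s=%s" % (key, value) for key, value in fields))


def log_upstream_error(event, error, **context):
    """Log one upstream error together with its request context."""
    log.error(format_error_log(event, error, **context))
```

For example, `log_upstream_error('get_user_query_api_server_error', 'invalid_argument', cmd='get_user', uid=10000)` writes a line in the same shape as the second example above.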
If the service needs to retry a request, it should choose an appropriate retry delay for each kind of error and enforce a maximum number of retries.
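The retry rules can be sketched as a small helper around a tuple-returning upstream function. The set of retryable codes, the exponential delay, and the default limits below are assumptions chosen for illustration; a real service would tune the delay per error type.

```python
import time

# Hypothetical codes treated as occasional, and therefore retryable.
RETRYABLE_ERRORS = {"error_internal", "error_upstream_busy"}


def call_with_retry(func, request, idempotent=False, max_retries=3,
                    base_delay=0.1, sleep=time.sleep):
    """Call func(request) -> (error, data), retrying only when it is safe.

    Any error from an idempotent API is retried; otherwise only occasional
    errors are. The delay doubles after every failed attempt.
    """
    attempt = 0
    while True:
        error, data = func(request)
        if error is None:
            return None, data
        can_retry = idempotent or error in RETRYABLE_ERRORS
        if not can_retry or attempt >= max_retries:
            return error, None
        sleep(base_delay * (2 ** attempt))
        attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. With `max_retries=3` the upstream is called at most four times: the first attempt plus three retries.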
If you cannot get a successful response from the upstream service, you need to handle the error properly by either 1) recovering from the error or 2) propagating it.
If the upstream request is not on the critical path of the current request, you can log the error and continue handling the request without failing it.
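The difference between a critical and a non-critical upstream call can be sketched as follows. The handler and the two fetch functions are hypothetical: the user lookup is treated as critical, while the recommendation lookup is treated as optional and falls back to an empty list.

```python
import logging

log = logging.getLogger(__name__)


def handle_get_user(uid, fetch_user, fetch_recommendations):
    """Both fetch functions are assumed to return (error, data) tuples."""
    # Critical path: without the user record the request cannot succeed.
    error, user = fetch_user(uid)
    if error is not None:
        log.error('get_user_fetch_user_error|error=%s,uid=%s', error, uid)
        return error, None

    # Non-critical path: log the error, recover with a default, and continue.
    error, items = fetch_recommendations(uid)
    if error is not None:
        log.error('get_user_fetch_recommendations_error|error=%s,uid=%s', error, uid)
        items = []

    return None, {"user": user, "recommendations": items}
```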
The service should not blindly propagate errors from upstream services to its clients. When translating errors, we suggest the following:
- Hide implementation details and confidential information.
- Adjust the party responsible for the error. For example, a server that receives an INVALID_ARGUMENT error from another service should propagate an INTERNAL error to its own caller.
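The two suggestions above can be sketched as a translation step between the upstream error and the client response. The mapping table, the `UNAVAILABLE` entry, the generic message, and the function name are assumptions for this example; only the INVALID_ARGUMENT to INTERNAL case comes from the guideline itself.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical mapping from upstream error codes to client-facing codes.
UPSTREAM_TO_CLIENT = {
    # Our service built the bad upstream request, so it is our fault, not the client's.
    "INVALID_ARGUMENT": "INTERNAL",
    # Assumption: a temporarily unavailable upstream is passed through as-is.
    "UNAVAILABLE": "UNAVAILABLE",
}


def translate_upstream_error(upstream_error, detail):
    """Log the full upstream detail, but return only a sanitized error."""
    log.error('upstream_error|error=%s,detail=%s', upstream_error, detail)
    client_error = UPSTREAM_TO_CLIENT.get(upstream_error, "INTERNAL")
    # The detail (SQL text, hostnames, stack traces) stays in the log only.
    return {"error": client_error, "message": "request failed"}
```

Defaulting unknown upstream codes to `INTERNAL` means a new upstream error never leaks to clients unreviewed.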
Reference
- Server Error Handling Guide - https://confluence.shopee.io/display/LABS/Server+Error+Handling+Guide