Scope
Title
Bi-Directional Scalable Highly Available ID Mapping
Question Description
The e-commerce platform would like to maintain its own user id space so that it can support users from different businesses and products.
Design a highly scalable, highly available bi-directional user id mapping service to bind an external user id to an internal e-commerce platform user id.
- map(external_uid) ==> internal_uid
- reverse_map(internal_uid) ==> external_uid
Describe key components for your design, including database selection and design, cache design, logical flow etc.
Basic Constraints (Phase 1):
- both external and internal uid are 64-bit numbers
- internal uid generation design is entirely up to the service
- if no internal uid has been created and bound to the external uid yet, the id map service should generate one and bind it
- once bound, there should be a strict 1-1 mapping between external uid and internal uid
- Support a user id space of 10B ids
- Support high scale (> 500K QPS) and low latency (P90 < 20 ms)
Advanced Constraints (Phase 2):
Multi-Datacenter Deployment
- How to ensure data consistency across datacenters while adhering to low latency constraints?
New Region Launch
- How to avoid the thundering herd (stampede) problem when we launch the e-commerce platform in a new country?
Requirement
- Do we have any existing middleware in the company that we could use?
- Can we use public SaaS offerings?
High-level Design
Databases
- Use a NoSQL database such as Cassandra, or a distributed key-value store such as Amazon DynamoDB.
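The get-or-create flow on top of such a store can be sketched as follows. This is a minimal in-memory sketch, not a definitive implementation: the names `InMemoryIdMapStore`, `put_if_absent`, and `map_uid` are hypothetical, and the two-table layout (one keyed by external uid, one keyed by internal uid) is an assumption. The lock stands in for the store's atomic conditional write, which in a real deployment would likely be something like a Cassandra lightweight transaction (`INSERT ... IF NOT EXISTS`) or a DynamoDB conditional put.

```python
import threading
from typing import Callable, Dict, Optional


class InMemoryIdMapStore:
    """Stand-in for two key-value tables (assumed layout):
    forward (external -> internal) and reverse (internal -> external)."""

    def __init__(self) -> None:
        self._forward: Dict[int, int] = {}
        self._reverse: Dict[int, int] = {}
        # Emulates the atomic conditional write a real store would provide.
        self._lock = threading.Lock()

    def get_internal(self, external_uid: int) -> Optional[int]:
        return self._forward.get(external_uid)

    def get_external(self, internal_uid: int) -> Optional[int]:
        return self._reverse.get(internal_uid)

    def put_if_absent(self, external_uid: int, internal_uid: int) -> int:
        """Bind the pair only if external_uid is still unbound.
        Returns the internal uid that ends up bound (the winner)."""
        with self._lock:
            existing = self._forward.get(external_uid)
            if existing is not None:
                return existing  # another writer bound it first
            if internal_uid in self._reverse:
                raise ValueError("internal uid is already bound")
            self._forward[external_uid] = internal_uid
            self._reverse[internal_uid] = external_uid
            return internal_uid


def map_uid(store: InMemoryIdMapStore, external_uid: int,
            new_internal_uid: Callable[[], int]) -> int:
    """map(external_uid): return the bound internal uid, creating it if needed."""
    found = store.get_internal(external_uid)
    if found is not None:
        return found
    return store.put_if_absent(external_uid, new_internal_uid())


def reverse_map_uid(store: InMemoryIdMapStore, internal_uid: int) -> Optional[int]:
    """reverse_map(internal_uid): return the bound external uid, if any."""
    return store.get_external(internal_uid)
```

Because the write is conditional, two concurrent `map` calls for the same external uid converge on one internal uid, which is what keeps the mapping strictly 1-1.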
Cache
- Introduce a caching layer to improve read performance and reduce load on the database.
- Consider a distributed caching system like Redis or Memcached.
- Cache the most frequently accessed mappings, but ensure cache consistency to avoid stale data.
- Use an eviction policy such as LRU or LFU, and analyze the cache hit rate to tune the eviction algorithm.
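The cache-aside read path with LRU eviction and hit-rate tracking can be sketched as below. This is an illustrative single-process sketch; `LruCache` and `cached_lookup` are hypothetical names, and a production setup would more likely rely on the eviction policies built into Redis or Memcached. It also assumes bindings never change once created, in which case a cached entry cannot go stale.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LruCache:
    """Small LRU cache that also counts hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[int, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: int) -> Optional[int]:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        return None

    def put(self, key: int, value: int) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def cached_lookup(cache: LruCache, key: int,
                  load_from_db: Callable[[int], Optional[int]]) -> Optional[int]:
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        if value is not None:
            cache.put(key, value)  # only cache mappings that exist
    return value
```

Tracking `hit_rate()` per cache node gives the signal needed to compare policies or resize the cache.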
Service API
- Expose a RESTful API or a gRPC service to handle mapping requests.
- API endpoints:
  - /map/{external_uid} for mapping external to internal UID.
  - /reverse_map/{internal_uid} for reverse mapping.
- Implement rate limiting to protect the service under high QPS, and authentication to restrict access to trusted callers.
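One common way to implement the rate limiting mentioned above is a token bucket. The sketch below is an assumption about how it could look, not a prescribed design: `TokenBucket`, `rate_per_sec`, and `burst` are hypothetical names, the clock is injectable so the behaviour can be tested, and the class is not thread-safe as written.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at rate_per_sec, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self._clock = clock
        self._tokens = burst       # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        # Refill for the elapsed time, capped at the bucket size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per client or per API key and reject (for example with HTTP 429) when `allow()` returns False.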
Monitoring and Logging
- Integrate monitoring tools to track service health, latency, and usage.
- Log requests and responses for debugging and auditing purposes.
Deep Dive
Multi-Datacenter Deployment
- Master-slave (single-writer) replication
- Master-master (multi-writer) replication
- If eventual consistency is acceptable, resolve conflicts later.
- If a flow that needs user_id generation must achieve P90 < 20 ms, the generation has to be distributed across datacenters.
- If a flow that needs user_id generation does not have to achieve P90 < 20 ms, a single source of generation is acceptable, which results in better consistency.
- Consider an eventual consistency model if strong consistency is not mandatory for your use case.
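Distributed generation without cross-datacenter coordination is often done with a Snowflake-style layout, where each generator embeds its own datacenter and worker id in the uid. The sketch below is one possible layout and not something the notes above prescribe: the bit widths (41-bit millisecond timestamp, 5-bit datacenter, 5-bit worker, 12-bit sequence), the `EPOCH_MS` value, and the class name are all assumptions. The top bit stays 0 so the result fits a signed 64-bit number.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000            # hypothetical custom epoch
DC_BITS, WORKER_BITS, SEQ_BITS = 5, 5, 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeLikeGenerator:
    """Generates 64-bit ids that are unique per (datacenter, worker) pair."""

    def __init__(self, datacenter_id: int, worker_id: int,
                 clock_ms: Callable[[], int] = lambda: int(time.time() * 1000)) -> None:
        if not 0 <= datacenter_id < (1 << DC_BITS):
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._dc = datacenter_id
        self._worker = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (DC_BITS + WORKER_BITS + SEQ_BITS))
                    | (self._dc << (WORKER_BITS + SEQ_BITS))
                    | (self._worker << SEQ_BITS)
                    | self._seq)
```

This only guarantees that generated internal uids never collide across datacenters. If the same external uid is bound concurrently in two datacenters, the two bindings still conflict and have to be reconciled, for example by a deterministic rule such as keeping the smaller internal uid.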
New Region Launch
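- Monitor the system closely during the launch, especially latency and error rates.
- Rate limit the requests sent to the new region.
- Gradually increase the traffic to avoid sudden spikes and potential issues.
- Use circuit breakers to fail fast when a downstream dependency is overloaded.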
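A circuit breaker for calls to the database or cache can be sketched as follows. It is a simplified illustration under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout_sec` are hypothetical names, the breaker counts consecutive failures only, and it is not thread-safe.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_sec = reset_timeout_sec
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_sec:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Re-open on a failed trial call, or open once the threshold is reached.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, requests for the new region fail fast instead of piling up on an overloaded backend, which gives it time to recover.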