【Distributed System】Data Modeling

Posted by 西维蜀黍 on 2023-09-19, Last Modified on 2023-11-24

Data Model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
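The car example can be sketched in Python as a small set of composed types. This is only an illustration: the class names and fields (`Owner`, `Size`, `length_m`, `width_m`) are assumptions chosen for this sketch, not part of any standard model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    # Hypothetical element describing who owns the car.
    name: str


@dataclass(frozen=True)
class Size:
    # Hypothetical element describing the car's dimensions in metres.
    length_m: float
    width_m: float


@dataclass(frozen=True)
class Car:
    # The car element is composed of other elements: color, size, owner.
    color: str
    size: Size
    owner: Owner


car = Car(color="red", size=Size(length_m=4.5, width_m=1.8), owner=Owner(name="Alice"))
print(car)
```

The model fixes the structure (a car always has a color, a size, and an owner) without saying anything about how the data is stored.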

A data model explicitly determines the structure of data; conversely, structured data is data organized according to an explicit data model or data structure. Structured data is in contrast to unstructured data and semi-structured data.

Once the requirements have been collected and analyzed, the next step is to create a conceptual schema for the database, using a high-level conceptual data model. This step is called conceptual design. The conceptual schema is a concise description of the data requirements of the users and includes detailed descriptions of the entity types, relationships, and constraints; these are expressed using the concepts provided by the high-level data model. Because these concepts do not include implementation details, they are usually easier to understand and can be used to communicate with nontechnical users. The high-level conceptual schema can also be used as a reference to ensure that all users’ data requirements are met and that the requirements do not conflict. This approach enables database designers to concentrate on specifying the properties of the data, without being concerned with storage and implementation details, which makes it easier to create a good conceptual database design.

During or after the conceptual schema design, the basic data model operations can be used to specify the high-level user queries and operations identified during functional analysis. This also serves to confirm that the conceptual schema meets all the identified functional requirements. Modifications to the conceptual schema can be introduced if some functional requirements cannot be specified using the initial schema.

The next step in database design is the actual implementation of the database, using a commercial DBMS. Most current commercial DBMSs use an implementation data model—such as the relational (SQL) model—so the conceptual schema is transformed from the high-level data model into the implementation data model. This step is called logical design or data model mapping; its result is a database schema in the implementation data model of the DBMS. Data model mapping is often automated or semiautomated within the database design tools.
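As a rough sketch of data model mapping, the snippet below turns a tiny conceptual schema (an owner who owns cars) into a relational schema. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS. The table and column names are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Entity types become tables; the one-to-many "owns" relationship
# becomes a foreign key on the "many" side (car).
conn.executescript("""
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE car (
    car_id   INTEGER PRIMARY KEY,
    color    TEXT NOT NULL,
    owner_id INTEGER NOT NULL REFERENCES owner(owner_id)
);
""")

conn.execute("INSERT INTO owner VALUES (1, 'Alice')")
conn.execute("INSERT INTO car VALUES (1, 'red', 1)")
conn.execute("INSERT INTO car VALUES (2, 'blue', 1)")

# Following the relationship now means joining on the foreign key.
rows = conn.execute(
    "SELECT owner.name, car.color FROM car "
    "JOIN owner USING (owner_id) ORDER BY car.car_id"
).fetchall()
print(rows)
```

Design tools usually generate this kind of DDL from a diagram; the hand-written version only shows what the mapping produces.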

The last step is the physical design phase, during which the internal storage structures, file organizations, indexes, access paths, and physical design parameters for the database files are specified. In parallel with these activities, application programs are designed and implemented as database transactions corresponding to the high-level transaction specifications.
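One physical design decision, adding an index, can be illustrated with `sqlite3`. In this minimal sketch, the `car` table and the index name `idx_car_owner` are hypothetical. The exact wording of the plan text depends on the SQLite version, so the example prints it rather than assuming a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (car_id INTEGER PRIMARY KEY, color TEXT, owner_id INTEGER)"
)


def query_plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT color FROM car WHERE owner_id = 1"

plan_before = query_plan(lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_car_owner ON car (owner_id)")
plan_after = query_plan(lookup)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The logical schema is identical before and after the index is created; only the access path changes. This is why index choices belong to the physical design phase.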

Three perspectives

The ANSI/SPARC three-level architecture shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful one, particularly when comparing models. According to ANSI (1975), a data model instance may be one of three kinds:

  1. Conceptual data model: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial ’language’ with a scope that is limited by the scope of the model.

  2. Logical data model: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.

  3. Physical data model: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.

Conceptual Data Model

A conceptual data model is a model of the things in the business and the relationships among them, rather than a model of the data about those things. So in a conceptual data model, when you see an entity type called car, then you should think about pieces of metal with engines, not records in databases. As a result, conceptual data models usually have few, if any, attributes. What would often be attributes may well be treated as entity types or relationship types in their own right, and where information is considered, it is considered as an object in its own right, rather than as being necessarily about something else. A conceptual data model may still be sufficiently attributed to be fully instantiable, though usually in a somewhat generic way.

Logical Data Model

Logical data models include detail about the attributes (characteristics in columns) needed to represent a concept, such as key structure (the attributes needed to define a unique instance of an entity), and they define details about the relationships within and between data entities. Relationships between entities can be optional or mandatory, and they differ in cardinality (one-to-one, one-to-many, many-to-many).
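Keys, optionality, and cardinality can be made concrete with one possible relational rendering. The schema below is an assumed example (departments, employees, projects), written with `sqlite3` purely for illustration. A logical model itself would normally be expressed independently of any particular DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the declared relationships
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,                            -- key structure
    name      TEXT NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id), -- mandatory, many-to-one
    mentor_id INTEGER REFERENCES employee(emp_id)              -- optional (nullable)
);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE assignment (                                      -- many-to-many link
    emp_id     INTEGER NOT NULL REFERENCES employee(emp_id),
    project_id INTEGER NOT NULL REFERENCES project(project_id),
    PRIMARY KEY (emp_id, project_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 1, NULL)")  # no mentor is allowed
conn.execute("INSERT INTO project VALUES (10, 'Migration')")
conn.execute("INSERT INTO assignment VALUES (1, 10)")


def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


missing_dept = rejected("INSERT INTO employee VALUES (2, 'Bob', NULL, NULL)")  # mandatory side empty
unknown_dept = rejected("INSERT INTO employee VALUES (3, 'Carol', 99, NULL)")  # dangling reference
duplicate_link = rejected("INSERT INTO assignment VALUES (1, 10)")             # same pair twice
print(missing_dept, unknown_dept, duplicate_link)
```

Each modeling decision shows up as a declarative constraint: a mandatory relationship becomes `NOT NULL`, an optional one a nullable column, and a many-to-many relationship a linking table with a composite key.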

Physical Data Model

Physical data models represent the way that data are physically stored in a database. They describe the physical characteristics of data elements that are required to set up and store actual data about the entities represented.

Conceptual Data Model VS Logical Data Model

The Conceptual Data Model and Logical Data Model are both vital in the process of database design, but they serve different purposes and are used at different stages of the design process. Here’s a comparison:

  1. Purpose and Scope:
    • Conceptual Data Model (CDM): This model is high-level, abstract, and focuses on the big picture. It is used to organize, scope, and define business concepts and rules. The main purpose is to establish the entities, their attributes, and the relationships between them from a business perspective.
    • Logical Data Model (LDM): This model is more detailed than the CDM. It provides a detailed overview of the data to be stored in the database. It defines the structure of the data elements and sets the relationships between them. The LDM is independent of any database management system.
  2. Level of Detail:
    • CDM: It’s less detailed and usually does not include primary or foreign keys. It often uses high-level, business-friendly terminology and is easily understandable by non-technical stakeholders.
    • LDM: It includes more detail, such as the specific data types for each attribute and the primary and foreign keys. It starts to move towards a more technical representation of the data structure.
  3. Audience:
    • CDM: Intended for business stakeholders. It helps in communicating the core concepts and overall design to a non-technical audience.
    • LDM: Aimed more at data professionals such as database administrators and developers. It bridges the gap between the conceptual understanding of the business and the technical implementation.
  4. Usage:
    • CDM: Used in the initial stages of project development to ensure that the stakeholders have a common understanding of the entities and relationships in the system.
    • LDM: Used as a foundation for the physical design of the database. It’s a step closer to the technical implementation of the database.

Logical Data Model

Relational Model


NoSQL (Not Only SQL)

There are several driving forces behind the adoption of NoSQL databases, including:

  1. A need for greater scalability than relational databases can easily achieve, including very large datasets or very high write throughput
  2. A widespread preference for free and open source software over commercial database products
  3. Specialized query operations that are not well supported by the relational model
  4. Frustration with the restrictiveness of relational schemas, and a desire for a more dynamic and expressive data model

Document Model (JSON)

The JSON representation has better locality than the multi-table schema. If you want to fetch a profile in the relational example, you need to either perform multiple queries (query each table by user_id) or perform a messy multi-way join between the users table and its subordinate tables. In the JSON representation, all the relevant information is in one place, and one query is sufficient.
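The locality argument can be sketched as follows. The subordinate tables (`positions`, `education`), the sample data, and the in-memory `documents` dictionary standing in for a document store are all assumptions for this illustration.

```python
import json
import sqlite3

# Relational layout: one users table plus subordinate tables keyed by user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE positions (user_id INTEGER, title TEXT);
CREATE TABLE education (user_id INTEGER, school TEXT);
INSERT INTO users     VALUES (1, 'Alice');
INSERT INTO positions VALUES (1, 'Engineer'), (1, 'Architect');
INSERT INTO education VALUES (1, 'Example University');
""")


def fetch_profile_relational(user_id):
    """Assemble a profile with one query per table."""
    name = conn.execute(
        "SELECT name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    positions = [row[0] for row in conn.execute(
        "SELECT title FROM positions WHERE user_id = ? ORDER BY rowid", (user_id,))]
    education = [row[0] for row in conn.execute(
        "SELECT school FROM education WHERE user_id = ? ORDER BY rowid", (user_id,))]
    return {"user_id": user_id, "name": name,
            "positions": positions, "education": education}


# Document layout: the whole profile is stored as one JSON string per user.
documents = {
    1: json.dumps({
        "user_id": 1,
        "name": "Alice",
        "positions": ["Engineer", "Architect"],
        "education": ["Example University"],
    })
}


def fetch_profile_document(user_id):
    """Fetch a profile with a single lookup."""
    return json.loads(documents[user_id])


print(fetch_profile_relational(1) == fetch_profile_document(1))
```

Both functions return the same profile, but the relational version issues three queries while the document version performs one lookup.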

One-to-Many Relationships - Trees

Many-to-Many Relationships - Graph

A graph consists of two kinds of objects: vertices (also known as nodes or entities) and edges (also known as relationships or arcs). Many kinds of data can be modeled as a graph. Typical examples include:

  • Social graphs: Vertices are people, and edges indicate which people know each other.
  • The web graph: Vertices are web pages, and edges indicate HTML links to other pages.
  • Road or rail networks: Vertices are junctions, and edges represent the roads or railway lines between them.
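A social graph like the first example can be held in memory as an adjacency list. The sketch below uses made-up names and a hypothetical helper, `hops`, that counts the edges on a shortest path using breadth-first search.

```python
from collections import deque

# Each pair is an undirected "knows" edge between two people (vertices).
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("erin", "frank")]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def hops(graph, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


print(hops(graph, "alice", "dave"))   # 3
print(hops(graph, "alice", "frank"))  # None
```

The same structure works for the web graph or a road network; only the meaning of vertices and edges changes, and directed links would be stored in one direction only.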

Reference