Defining a Data Model

Modelling entities, relationships, signals and time series in your data

Exabel allows you to create flexible data models to represent your data, and subsequently query, transform and analyze the data to derive investment insights.

You can create new entities (eg brands that belong to a company) and relationships (eg ownership by a company of brands), and link these to the existing company entities already provided by Exabel. Then, you can import time series data on any entity in your data model, and use our signal DSL to query and transform this data.

📘

Example - why is this important?

Gap Inc. owns several brands, including Banana Republic. Banana Republic has stores across many countries. You have data at a very granular level, showing Gap's sales performance by brand and also by country.

An analyst may want the flexibility to deep-dive into Gap at this granular level (Banana Republic in the USA), but may equally want to roll the lower-level metrics up to understand Gap's overall performance.

A well-defined data model gives the analyst the flexibility to:

  1. Query and analyze data on a granular level
  2. Systematically aggregate data up from a lower entity-level to a higher one
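As a minimal sketch of these two capabilities, the plain-Python snippet below first filters granular observations and then rolls them up to brand and company level. It does not use the Exabel SDK or the signal DSL. The record layout, the `rollup` helper, the second brand and country, and all sales figures are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical granular observations: (company, brand, country, sales).
# All figures are invented for illustration.
observations = [
    ("Gap Inc.", "Banana Republic", "USA", 120.0),
    ("Gap Inc.", "Banana Republic", "Canada", 30.0),
    ("Gap Inc.", "Old Navy", "USA", 200.0),
]


def rollup(rows, key_index):
    """Sum the sales column, grouped by the field at position key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


# 1. Granular query: Banana Republic in the USA.
granular = [
    row for row in observations
    if row[1] == "Banana Republic" and row[2] == "USA"
]

# 2. Aggregation: roll the same rows up to brand level and company level.
sales_by_brand = rollup(observations, 1)
sales_by_company = rollup(observations, 0)
```

The same rows serve both purposes, which is the point of modelling the data at the most granular level you intend to analyze.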

👍

Best practice: data models

  1. Data models do not have to be complex! In many cases, simple company-level data models will suffice.

  2. Aim to ultimately connect entities to the company entity type, in order to enable aggregation up to company-level.

  3. Investment users (analysts / PMs) should define up-front the granularity of analysis desired. This will help your data engineering team design the data model and ensure that the right data is imported.

Before reviewing the example data models below, it is important to understand these key concepts: entities, relationships, signals and time series.

Example #1: company-level data

The simplest data model, applicable if your data exists at company-level, or you choose to pre-aggregate as such:

Notes

  • A single sales signal is defined for company entities, and encapsulates multiple time series - one for each company
  • company is a global entity type that is provided by the Exabel platform
  • Company entities are pre-loaded on the Exabel platform, so you don't have to import these

Steps

  1. Import time series data for each company, specifying sales as the signal. Whether you are using the file uploader or Exabel SDK, you will be able to specify the signal that each time series is tied to.
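To illustrate how one signal encapsulates one time series per company, here is a minimal sketch that flattens company-level sales into long-format rows and writes them as CSV text. The column names (`entity`, `date`, `sales`), the `to_long_rows` helper, and all values are assumptions for illustration; check the file uploader or SDK documentation for the layout Exabel actually expects.

```python
import csv
import io

# One signal ("sales") with one time series per company entity.
# Dates and values are invented for illustration.
sales = {
    "Apple": {"2021-03-31": 10.0, "2021-06-30": 11.0},
    "Microsoft": {"2021-03-31": 8.0, "2021-06-30": 9.5},
}


def to_long_rows(signal_name, series_by_entity):
    """Flatten {entity: {date: value}} into one row per entity and date."""
    rows = []
    for entity, series in sorted(series_by_entity.items()):
        for date, value in sorted(series.items()):
            rows.append({"entity": entity, "date": date, signal_name: value})
    return rows


rows = to_long_rows("sales", sales)

# Write the rows to an in-memory CSV (hypothetical column layout).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["entity", "date", "sales"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Each row ties a value to an entity, a date, and the signal named in the column header, which mirrors the idea of specifying the signal for every time series you import.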

Example #2: companies with child entities

Another common data model has lower-level child entities that belong to each company. This example has segment child entities - these might be business or geographical segments, depending on your data:

Notes

  • You can link each company to multiple segment entities.
  • We create segments as child entities of companies, because each segment can only be owned by one company.
  • We define the sales signal at both segment- and company-level, and import time series at both levels. Alternatively, you could choose to define the signal and import data only at segment-level, and aggregate to company-level dynamically in the Exabel platform.
  • The HAS_SEGMENT relationship type is set to be an ownership relationship. This denotes that the from-entity (company) "owns" the to-entity (segment). In most cases, you will be creating ownership relationships.

Steps

You will need to use the Exabel SDK in order to import entities, relationships, and non-company signals.

  1. Create the segment entity type in your namespace, as this is not a pre-defined global entity type.
    For now, you will need to ask Exabel to create these for you.
  2. Import all segment entities that exist in your data, with the segment entity type.
  3. Import relationships connecting each segment to a company, using the HAS_SEGMENT relationship type.
  4. Import time series data for each segment and each company, specifying sales as the signal.
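The steps above can be sketched in plain Python to show the shape of the data involved. This does not call the Exabel SDK: the `Entity` and `Relationship` classes, the `aggregate_to_owner` helper, the entity names, and all values are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_type: str
    name: str


@dataclass(frozen=True)
class Relationship:
    relationship_type: str
    from_entity: Entity  # the owner (company)
    to_entity: Entity    # the owned entity (segment)


# Company entities already exist on the platform; this one is a placeholder.
company = Entity("company", "example_co")

# Step 2: the segment entities found in the data (hypothetical names).
segments = [
    Entity("segment", "example_co_americas"),
    Entity("segment", "example_co_europe"),
]

# Step 3: one HAS_SEGMENT ownership relationship per segment.
relationships = [Relationship("HAS_SEGMENT", company, s) for s in segments]

# Step 4: sales time series at segment level (period -> value, invented).
segment_sales = {
    segments[0]: {"2021-Q1": 60.0, "2021-Q2": 70.0},
    segments[1]: {"2021-Q1": 40.0, "2021-Q2": 45.0},
}


def aggregate_to_owner(owner, relationships, series_by_entity):
    """Sum the series of every entity that the owner points to."""
    totals = {}
    for rel in relationships:
        if rel.from_entity != owner:
            continue
        for period, value in series_by_entity.get(rel.to_entity, {}).items():
            totals[period] = totals.get(period, 0.0) + value
    return totals


company_sales = aggregate_to_owner(company, relationships, segment_sales)
```

The final call corresponds to the alternative mentioned in the notes: importing sales only at segment-level and deriving the company-level series by following the ownership relationships.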

Example #3: multiple top-level entities

Your data may have additional dimensions that you want to model, beyond the standard company-level entities.

In this example, we have an alternative data set for company job postings, segmented by both company and occupation (eg "sales jobs"):

Notes

  • We model occupation as a top-level entity type, because the data set defines a limited number of standardized occupations that are seen across all companies.
  • We create a company_and_occupation associative entity type that is owned by both a company and an occupation.
  • Example entities in this model might be:
    • occupation: sales, engineering
    • company: Apple, Microsoft
    • company_and_occupation: apple_sales, apple_engineering, microsoft_sales, microsoft_engineering
  • There are 3 signals defined for the company_and_occupation entity: jobs_created, jobs_deleted, and jobs_active.
    • The raw signals can therefore be retrieved for a given company_and_occupation entity.
    • You may also create signals that aggregate these up to company-level or occupation-level, by using the signal DSL.
  • The jobs_active_duration signal is defined for all 3 entity types. This means that you must import time series data for this signal, for entities across all 3 entity types.

Steps

You will need to use the Exabel SDK in order to import entities, relationships, and non-company signals.

  1. Create the occupation and company_and_occupation entity types in your namespace, as these are not pre-defined global entity types. Define company_and_occupation as an associative entity type.
    For now, you will need to ask Exabel to create these for you.
  2. Import all occupation and company_and_occupation entities that exist in your data.
  3. Import relationships connecting each company_and_occupation to the appropriate company and occupation, using the corresponding relationship type.
  4. Import time series data for the jobs_created, jobs_deleted, and jobs_active signals, for the company_and_occupation entities.
  5. Import time series data for the jobs_active_duration signal, for the company_and_occupation, company, and occupation entities.
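A minimal sketch of the associative model, in plain Python rather than the Exabel SDK or signal DSL, may help make this concrete. It builds one company_and_occupation entity per pair, links each to its two owners, and aggregates a raw signal up to either dimension. The relationship type names, the `aggregate` helper, and the job counts are assumptions invented for illustration.

```python
from itertools import product

companies = ["apple", "microsoft"]
occupations = ["sales", "engineering"]

# Step 2: one associative entity per (company, occupation) pair,
# e.g. "apple_sales" -> ("apple", "sales").
associative = {f"{c}_{o}": (c, o) for c, o in product(companies, occupations)}

# Step 3: each associative entity is owned by one company and one occupation.
# The relationship type names below are hypothetical.
relationships = []
for name, (c, o) in associative.items():
    relationships.append(("HAS_COMPANY_AND_OCCUPATION", f"company:{c}", name))
    relationships.append(("HAS_COMPANY_AND_OCCUPATION", f"occupation:{o}", name))

# Step 4: jobs_active values for a single date (invented numbers).
jobs_active = {
    "apple_sales": 10,
    "apple_engineering": 30,
    "microsoft_sales": 20,
    "microsoft_engineering": 40,
}


def aggregate(values, owner_index):
    """Sum values by company (owner_index=0) or occupation (owner_index=1)."""
    totals = {}
    for name, value in values.items():
        owner = associative[name][owner_index]
        totals[owner] = totals.get(owner, 0) + value
    return totals


jobs_active_by_company = aggregate(jobs_active, 0)
jobs_active_by_occupation = aggregate(jobs_active, 1)
```

Because every associative entity has exactly one owner in each dimension, the same raw values can be rolled up either way. A signal such as jobs_active_duration is instead imported directly at each level, as described in step 5.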