DAP News

DATE:
AUTHOR: The Distribution Innovation team

Hope you all had a relaxing, recharging summer and have safely returned to work data-curious and insight-hungry.

Here's a summary of the most important changes that happened on the platform over the summer, and a heads-up about a potentially breaking data model change scheduled for August 17th.

Address columns will be removed from planned delivery table

Firstly, we would like to draw your attention to an important update to the structure of our f_planneddelivery table. Our review has revealed inconsistencies in the following columns: [entrance, floor, house_nr, street, apartment_nr, zip_code]. After careful consideration, we have decided to remove these columns from the f_planneddelivery table. This change will take effect on August 17th. To ensure that accurate address data remains available for deliveries, we recommend using the DAP.address table in conjunction with the f_planneddelivery table; this combined approach gives you access to a more robust and precise dataset. If your scripts will be affected by this change, please make the necessary adjustments.

Example query:
SELECT *
FROM schema.f_planneddelivery pd
LEFT JOIN schema.address a
  ON pd.delivery_point_id = a.delivery_point_id;

New "Parallel Off" parameter in the Query API

By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster; the default option is ON (or TRUE). If PARALLEL is OFF (or FALSE), UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size of a data file is 6.2 GB, so unloading 13.4 GB of data, for example, creates three files.
Disabling parallelism gives users a single pre-signed URL (in most cases), at the cost of a slight downgrade in performance.
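For reference, here is a minimal sketch of the underlying Redshift UNLOAD with PARALLEL OFF (the bucket, IAM role, and inner query are placeholders, not the Query API's actual configuration):

-- Illustrative only: bucket, IAM role, and query are placeholders.
-- With PARALLEL OFF, files are written serially (up to 6.2 GB each),
-- sorted according to the ORDER BY clause if one is given.
UNLOAD ('SELECT * FROM schema.f_parcel ORDER BY snapshot_date')
TO 's3://example-bucket/exports/f_parcel_'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleUnloadRole'
PARALLEL OFF;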

An additional parameter has been added to the SNS message, containing a start and end date that indicate the date range covered by the new data upload. This means users only need to query for the new data, avoiding a full unload every single time.
New SNS message example:

{
  "customEventId": "data-pipeline-completed",
  "eventTime": "2023-07-24T06:32:35.1690180355",
  "customEventPayload": {
    "pipeline-id": "load-redshift-fact-Parcel",
    "table-name": [
      "f_parcel"
    ],
    "job_metadata": {
      "start_date": {
        "delivery_date": "2023-07-24",
        "delivery_hour": 5
      },
      "end_date": {
        "delivery_date": "2023-07-24",
        "delivery_hour": 5
      }
    }
  }
}

Old example:

{
  "customEventId": "data-pipeline-completed",
  "eventTime": "2023-07-24T06:32:35.1690180355",
  "customEventPayload": {
    "pipeline-id": "load-redshift-fact-RouteSummary",
    "table-name": [
      "f_parcel"
    ],
    "job_metadata": {
    }
  }
}

Pseudo code for the unload query:

job_metadata = sns_notification.get('customEventPayload', {}).get('job_metadata', {})
if job_metadata.get('start_date'):
    start = job_metadata['start_date']
    # Query only the rows added since the last upload, based on the SNS start date.
    sql_query = (
        f"SELECT * FROM {workspace_schema}.{table_name} "
        f"WHERE DATEADD(hour, snapshot_hour, snapshot_date) >= "
        f"DATEADD(hour, {start['delivery_hour']}, '{start['delivery_date']}'::date);"
    )

SQL Templates in Redshift query editor

Users should now be able to access SQL Templates under the Redshift tab, covering common use cases.

This new library of templates will replace the Query Templates currently in the web app, and will grow over time.

position_id has been added to f_plannedwork

This enables joins with the absence and absenceperiods tables, as shown in the sketch below.
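
A minimal join sketch, assuming the absence table also carries a position_id column (the schema prefix and the join key on the absence side are assumptions; adjust to your workspace):

SELECT pw.*, ab.*
FROM schema.f_plannedwork pw
LEFT JOIN schema.absence ab
  ON pw.position_id = ab.position_id;  -- assumed join key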

Mtr_address_id and UUID added to the address table

This addition should enable users to observe the IDs assigned internally by the DI system. Note that Mtr_address_id is the fictitious ID used in Norway, while the Universally Unique Identifier (UUID) applies only to Swedish customers (see: https://www.uuidgenerator.net/). These columns serve to identify the customers that make use of fictitious addresses, and to observe differences between those who use such addresses and those who don't.
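
For example, a minimal sketch for pulling out address rows that carry either internal ID (the column casing and the NOT NULL filter are assumptions):

SELECT
  delivery_point_id,
  mtr_address_id,  -- fictitious-address ID used in Norway
  uuid             -- applies to Swedish customers
FROM schema.address
WHERE mtr_address_id IS NOT NULL
   OR uuid IS NOT NULL;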

Bug fixes

Fixed an issue where parcels without deliveryPointId would be filtered out in the ingestion process and not show up in Redshift.

That's all for now

Eirik Lyngvi

Head of Analytics
