Once you've connected your data sources and set up a functional data catalog, it's time to start organizing your data into datasets for use in your projects and workflows. This guide walks you through creating and managing datasets so that your data is structured, clean, and ready to use.
Step 1: Access the Datasets Section and Define the Purpose of Your Dataset
To start, navigate to the Datasets section in the platform’s side panel. From here, you can create a new dataset that will be incorporated into your workflows.
Note: A dataset organizes and names the data used in your workflow without altering the original schema at the source. This keeps your data consistent and accurate, especially when you pull from large or complex data sources.
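To make the "naming without altering the source" idea concrete, here is a minimal Python sketch. It assumes a dataset can be modeled as a mapping from workflow-facing names to source field names; the `DatasetView` class and the sample field names are hypothetical illustrations, not the platform's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DatasetView:
    """Hypothetical model: a dataset renames source fields for the workflow
    but never modifies the underlying source records."""
    name: str
    # Maps workflow-facing field names to source field names.
    field_map: Dict[str, str]

    def read(self, source_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Build new dicts from the mapped fields; the source rows are left untouched.
        return [
            {alias: row[src] for alias, src in self.field_map.items() if src in row}
            for row in source_rows
        ]


source = [{"cust_nm": "Acme", "cust_id": 7, "internal_flag": True}]
customers = DatasetView("Customers", {"Customer Name": "cust_nm", "Customer ID": "cust_id"})

print(customers.read(source))  # [{'Customer Name': 'Acme', 'Customer ID': 7}]
print(source)                  # source rows still have their original field names
```

Fields not listed in the mapping (here `internal_flag`) simply do not appear in the dataset, while the source rows keep their original schema.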
Step 2: Create a New Dataset
Click the Create New Dataset button to begin. First, name your dataset and select the data source it will pull from. The entity you select becomes the primary entity, which acts as the core data source for the dataset. To ensure clarity and consistency, you can assign a label to your dataset. The label helps bridge any gap between the terminology used in the data source and the language used in your workflow.
Step 3: Configure Write Access for Data Fields
When you're satisfied with the name and label, click Next. On this screen, you can configure write access for each data field. When your configuration is complete, click Create, and the new dataset is ready for use in your workflow.
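As a rough mental model of what Steps 2 and 3 capture, the Python sketch below represents a dataset definition as a name, a primary entity, a label, and per-field write access, and rejects updates to read-only fields. The `DatasetConfig` class, its fields, and the entity names are hypothetical; the platform's internal representation is not documented here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DatasetConfig:
    """Hypothetical dataset definition: name, primary entity, label,
    and per-field write access. Not the platform's actual schema."""
    name: str
    primary_entity: str
    label: str
    # Field name -> True if workflows may write to that field.
    write_access: Dict[str, bool] = field(default_factory=dict)

    def check_update(self, changes: Dict[str, Any]) -> None:
        # Unknown fields are treated as read-only (a conservative assumption).
        denied = [f for f in changes if not self.write_access.get(f, False)]
        if denied:
            raise PermissionError(f"Write access denied for: {', '.join(sorted(denied))}")


orders = DatasetConfig(
    name="open_orders",
    primary_entity="sales.orders",
    label="Open Orders",
    write_access={"status": True, "order_id": False, "total": False},
)

orders.check_update({"status": "shipped"})  # allowed, no error
try:
    orders.check_update({"total": 0, "status": "void"})
except PermissionError as err:
    print(err)  # Write access denied for: total
```

Denying writes by default means a field only becomes writable when you explicitly grant it, which mirrors configuring access field by field.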
Step 4: Create a Dataset from the Data Catalog (Alternative Method)
As an alternative to the Datasets section, you can create a dataset directly from the Data Catalog:
Go to the Data Catalog and select an entity from your data source.
This opens the schema view, which displays the data within the entity along with its classifications and data types. On this screen, click New dataset from entity.
You can then customize the dataset by selecting specific fields or modifying their classifications if needed.
(Optional) Step 5: Add Related Entities
When creating a dataset, you may want to include related entities: additional tables or data sources that can be linked to the primary entity. Related entities are available in your workflow alongside the primary entity, giving it access to a broader set of data.
To add a related entity, click Add Related Entity, located directly below your primary entity. A list of available related entities appears; select the ones that match your data needs. The platform automatically uses common data fields to connect each related entity to the primary entity.
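One plausible way to picture "connecting on common data fields" is a left join on the field names both entities share. The Python sketch below assumes that rule; the `link_related` function and the sample entities are hypothetical, and the platform's actual matching logic may differ (for example, it may rely on declared relationships rather than identical field names).

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def link_related(
    primary_rows: List[Row],
    related_rows: List[Row],
    primary_fields: List[str],
    related_fields: List[str],
) -> Tuple[List[str], List[Row]]:
    """Hypothetical sketch: link a related entity to the primary entity
    on the field names both schemas share (a left join on those fields)."""
    keys = sorted(set(primary_fields) & set(related_fields))
    if not keys:
        raise ValueError("No common fields to link on")

    # Index related rows by their values for the shared fields.
    index: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in related_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Keep every primary row and attach its matching related rows (possibly none).
    linked = [
        {**row, "related": index.get(tuple(row[k] for k in keys), [])}
        for row in primary_rows
    ]
    return keys, linked


orders = [{"order_id": 1, "customer_id": 7}, {"order_id": 2, "customer_id": 9}]
customers = [{"customer_id": 7, "name": "Acme"}]

keys, linked = link_related(orders, customers, ["order_id", "customer_id"], ["customer_id", "name"])
print(keys)                   # ['customer_id']
print(linked[0]["related"])   # [{'customer_id': 7, 'name': 'Acme'}]
print(linked[1]["related"])   # []
```

Using a left join keeps every record of the primary entity even when no related record matches, so adding a related entity never drops data from the core dataset.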
Note: You can add as many related entities as you need. When you're ready, click Next & Create to finalize your dataset.
(Optional) Step 6: Edit an Existing Dataset
To modify a dataset after creation, go to the Datasets section, find the dataset you want to change, and click Edit. This opens the dataset for editing.
You can:
- Add or remove related entities
- Change labels or classifications
- Make any other adjustments to the dataset
Once you've made the necessary changes, click Next & Update to save your modifications. A notification confirms that your dataset has been updated.
By following these steps, you can create and manage datasets within your project and keep your data organized and aligned with your workflow needs, whether you create a new dataset from the Data Catalog or edit an existing one.