How Does Streamline Tag Sensitive Data?

One of Streamline’s standout features is its ability to automatically detect and tag sensitive data across your data fabric. These tags help you stay on top of security and compliance by highlighting areas that may require special handling.

What Are Sensitive Data Tags?

Streamline uses tags to identify and label data sources, entities, and datasets that may contain sensitive information. These tags are visible right in the data catalog, making it easy to know where sensitive data lives.

Example view in the data catalog:

How Does Streamline Detect Sensitive Data?

We use a process called data classification. Here’s how it works:

Automatic Detection:
When a data source is connected, our machine learning model runs automatically—usually in just a second or two.

Classification by Data Class:
The model inspects both the contents and the structure of your tables to assign a data class to each field.

View the Results:
You can see the assigned data classes by clicking on an entity in your data catalog.

What Is Data Classification?

Data classification is the process of labeling data based on its type—like identifying that a column contains Social Security Numbers or email addresses.

For example:

Each column in your data source is assigned a single data class (like “Full Name” or “IP Address”). If a field doesn’t match any known type, it’s marked as “Other.”

Note: The classification is based on data content, not the column name. For instance, patient_full_name and doctor_full_name would both be classified as “Full Name.”
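To make the idea of content-based, one-class-per-column labeling concrete, here is a minimal sketch. It is not Streamline's classifier: the regular expressions, the majority-vote rule, and the function names are all assumptions made for illustration, and it uses email columns rather than name columns because emails are easy to recognize with a simple pattern.

```python
import re
from collections import Counter

# Hypothetical content patterns for illustration only; the real
# classifier is a trained model plus rules, not two regexes.
PATTERNS = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "IP Address": re.compile(r"(\d{1,3}\.){3}\d{1,3}"),
}


def classify_value(value: str) -> str:
    """Return the first data class whose pattern matches the whole value."""
    for data_class, pattern in PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return data_class
    return "Other"


def classify_column(sample_values: list[str]) -> str:
    """Assign a single data class to a column by majority vote over samples."""
    if not sample_values:
        return "Other"
    votes = Counter(classify_value(v) for v in sample_values)
    data_class, count = votes.most_common(1)[0]
    # Assumed rule: require a strict majority, otherwise fall back to "Other".
    return data_class if count > len(sample_values) / 2 else "Other"


columns = {
    "patient_email": ["ana@example.com", "bo@example.org", "n/a"],
    "doctor_email": ["dr.lee@example.com", "dr.kim@example.net"],
    "notes": ["follow up in 2 weeks", "none"],
}
results = {name: classify_column(values) for name, values in columns.items()}
print(results)
```

Both email columns receive the same data class even though their names differ, and the free-text column falls back to “Other.”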

Supported Data Classes

Streamline supports a wide and growing list of data classes, grouped into categories like:

ID Number: Social Security Number, Passport Number, etc.

Contact: Email Address, Phone Number

Medical: ICD Codes, MRNs, UDI

Financial: Bank Account Numbers, Payment Card Info

Location, Name, Demographics, and more

Here's a full list:

Data Class	Category
Social Security Number	ID Number
Passport Number	ID Number
Driver's License Number	ID Number
National ID Number	ID Number
Tax ID Number	ID Number
License Plate Number	ID Number
Vehicle Identification Number	ID Number
Patient Account Number	ID Number
Other Certificate or License Number	ID Number
Email Address	Contact
URL	Digital
IP Address	Digital
MAC Address	Digital
Phone Number	Contact
Contact Information**	Contact
Birth Date	Date
Date of Death	Date
Full Address	Location
Street Address	Location
Address Line 2	Location
City	Location
State or Province	Location
Country Name or Code	Location
Zip or Postal Code	Location
Full Name	Name
Name Prefix	Name
First Name	Name
Middle Name	Name
Last Name	Name
Name Suffix	Name
Mother’s Maiden Name	Name
International Diagnostic Code (ICD)	Medical
Other Diagnostic Code*	Medical
Procedure Code	Medical
Healthcare Common Procedure Coding System (HCPCS) Code	Medical
Unique Device Identifier (UDI)	Medical
National Drug Code (NDC)	Medical
Dosage Information	Medical
Medicare Beneficiary Identifier (MBI)	Insurance
National Provider Identifier (NPI)	Insurance
Insurance Member ID	Insurance
Insurance Group Number	Insurance
Insurance Provider Name	Insurance
Insurance Payor Number (EDI)*	Insurance
Medical Record Number	Medical
US Bank Account Number	Financial
US Bank Account Routing Number	Financial
International Bank Account Number (IBAN)	Financial
SWIFT/BIC Code	Financial
Payment Card Number	Financial
Payment Card Verification Code	Financial
Payment Card Expiration Date	Financial
Currency Code**	Financial
Race or Ethnicity	Demographics
Sex, Gender, or Gender Identity	Demographics
Age	Demographics
Nationality	Demographics
Language**	Demographics
Job Title	Demographics
Student ID Number	ID Number
Course Name	Academic
Course Number	Academic
Course Registration Number	Academic
Grade	Academic
GPA	Academic
Enrollment Status	Academic
Graduation Year	Academic
Degree Program	Academic
Major Field of Study	Academic
Academic Advisor	Academic
Employment Status	HR/Employment
Employee ID Number	HR/Employment
Salary	HR/Employment
Department	HR/Employment
Supervisor Name	HR/Employment
Marital Status	Demographics
Place of Birth	Demographics
Personal Relationship	Demographics
Political Party Registration	Demographics
Education Level	Demographics
Religious Affiliation	Demographics
Tribal Affiliation	Demographics
Company Name	Business
DUNS Number	Business
Price or Quote	Business
Revenue	Business
Credit Card Network	Financial
Bank Name	Financial
Other Medical Information	Medical
Other Potential PII*	Other
Other	Other
Filename	Digital
Username	Digital
Plaintext Password	Authentication
Password Hash	Authentication
Middle Initial	Name
GeoCoordinates	Location

You can find the full list of data classes in the definitions.py file of our classifier code.

Some data classes are currently limited to specific systems like Salesforce or EHRs—these are noted in the list.

Note: Custom data classes are not currently available.

How Are Tags Like PII, PHI, and PCI Applied?

Streamline supports 11 sensitive data tags:

Tag	Meaning
PII	Personally Identifiable Information
PHI	Protected Health Information
PCI	Payment Card Information
HIPAA	Health Insurance Portability and Accountability Act (United States)
GDPR	General Data Protection Regulation (European Union)
PIPEDA	Personal Information Protection and Electronic Documents Act (Canada)
LGPD	Lei Geral de Proteção de Dados (Brazil)
APA	Australian Privacy Act
UK_GDPR	UK Version of GDPR
DPDP_ACT	Digital Personal Data Protection Act (India)
FERPA	Family Educational Rights and Privacy Act (United States)

When Are These Tags Applied?

Each tag is triggered based on the presence of certain data classes:

PII: Triggered by things like Social Security Numbers, Names, IPs, etc.

PHI: Includes medical codes, patient IDs, diagnosis data, etc.

PCI: Covers credit card numbers, expiration dates, and similar data.

HIPAA: Appears when both PII and PHI tags are present, or if a Medical Record Number (MRN) or UDI is found.

Here's a full list:

PII	PHI	PCI
Social Security Number	International Diagnostic Code (ICD)	Payment Card Number
Passport Number	Other Diagnostic Code	Payment Card Verification Code
Driver's License Number	Procedure Code	Payment Card Expiration Date
National ID Number	Healthcare Common Procedure Coding System (HCPCS) Code
Tax ID Number	Unique Device Identifier (UDI)
License Plate Number	National Drug Code (NDC)
Vehicle Identification Number	Other Medical Information
Patient Account Number
Other Certificate or License Number
Email Address
URL
IP Address
MAC Address
Phone Number
Birth Date
Date of Death
Full Address
Street Address
Address Line 2
City
Zip or Postal Code
Full Name
Name Prefix
First Name
Middle Name
Last Name
Name Suffix
Mother’s Maiden Name
Unique Device Identifier (UDI)
Medicare Beneficiary Identifier (MBI)
Insurance Member ID
Medical Record Number
US Bank Account Number
International Bank Account Number (IBAN)
Payment Card Number
Age
Contact Information
Other Potential PII
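The tagging rules described above can be sketched as a simple set lookup. This is an illustrative sketch, not Streamline's implementation: the function name `derive_tags` is hypothetical, the class sets are only a subset of the table, and the other regulatory tags (GDPR, FERPA, and so on) are left out because their trigger rules are not described here.

```python
# Subset of the data-class-to-tag table above, for illustration only.
PII_CLASSES = {
    "Social Security Number", "Full Name", "Email Address", "IP Address",
    "Medical Record Number", "Unique Device Identifier (UDI)",
    "Payment Card Number",
}
PHI_CLASSES = {
    "International Diagnostic Code (ICD)", "Procedure Code",
    "National Drug Code (NDC)", "Unique Device Identifier (UDI)",
    "Other Medical Information",
}
PCI_CLASSES = {
    "Payment Card Number", "Payment Card Verification Code",
    "Payment Card Expiration Date",
}
# Data classes that trigger HIPAA on their own, per the rule above.
HIPAA_DIRECT = {"Medical Record Number", "Unique Device Identifier (UDI)"}


def derive_tags(data_classes: set[str]) -> set[str]:
    """Hypothetical helper: map the data classes found in a source to tags."""
    tags: set[str] = set()
    if data_classes & PII_CLASSES:
        tags.add("PII")
    if data_classes & PHI_CLASSES:
        tags.add("PHI")
    if data_classes & PCI_CLASSES:
        tags.add("PCI")
    # HIPAA: both PII and PHI present, or an MRN/UDI found.
    if {"PII", "PHI"} <= tags or data_classes & HIPAA_DIRECT:
        tags.add("HIPAA")
    return tags


print(sorted(derive_tags({"Full Name", "Procedure Code"})))
print(sorted(derive_tags({"Payment Card Expiration Date"})))
```

Note that a single data class can contribute to more than one tag. For example, Payment Card Number appears in both the PII and PCI columns of the table.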

Important: Understanding Compliance Tags

When Streamline tags your data with regulatory framework labels (such as GDPR, HIPAA, etc.), this indicates that the data likely contains information subject to these frameworks based on its classification, not that your organization is in compliance with them.

Regulatory compliance depends on how data is collected, stored, processed, and ultimately used, which is beyond what automated classification can determine. These tags are intended to help you identify data that may require careful handling under applicable regulations, so you can take appropriate action based on your organization's specific compliance obligations.

Consult with your legal or compliance team to determine your actual compliance status and requirements.

How Does Streamline Classify Different Data Sources?

Table-like Data Sources (e.g., SQL Databases)

We classify data at the column level based on:

Sample values from the data

Column names and table structure

Why classify columns instead of rows?

It’s more accurate

It fits how data fabrics typically structure data

Standard Schema Sources (e.g., Salesforce, EHRs)

These systems have well-documented structures. Here’s how we handle them:

Salesforce: We used an LLM to analyze documentation and assign default classifications to standard fields. Custom fields fall back to the ML model.

EHR Systems (FHIR-based): Fields are classified manually during setup.
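The Salesforce behavior described above (defaults for standard fields, model fallback for custom fields) might look roughly like the following sketch. The lookup table contents, the function names, and the stand-in model are assumptions for illustration and do not reflect Streamline's actual default mappings.

```python
from typing import Callable

# Hypothetical defaults for a few standard Salesforce fields, keyed by
# (object name, field name). The real default mapping is not published here.
STANDARD_FIELD_CLASSES = {
    ("Contact", "Email"): "Email Address",
    ("Contact", "Phone"): "Phone Number",
    ("Account", "Name"): "Company Name",
}


def classify_field(
    object_name: str,
    field_name: str,
    samples: list[str],
    ml_classify: Callable[[list[str]], str],
) -> str:
    """Use the documented default if one exists, else fall back to the model."""
    default = STANDARD_FIELD_CLASSES.get((object_name, field_name))
    if default is not None:
        return default
    # Custom fields (Salesforce names them with a "__c" suffix) have no
    # documented default, so the ML model decides.
    return ml_classify(samples)


def stub_model(samples: list[str]) -> str:
    """Stand-in for the ML model; always answers "Other" in this sketch."""
    return "Other"


print(classify_field("Contact", "Email", [], stub_model))
print(classify_field("Contact", "Loyalty_Id__c", ["A-1001"], stub_model))
```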

How the Classifier Works

Our classification engine combines:

An expert system for rule-based identification

A custom-trained language model, trained on public and synthetic data

Together, these components assign a best-fit data class—or “Other” if no match is found.
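One way to picture how a rule-based component and a model can be combined is the sketch below. The order of evaluation (rules first, then the model), the confidence threshold, and every function name are assumptions. The article does not specify how the two components are actually weighed, and the model here is a trivial stand-in rather than a trained language model.

```python
import re
from typing import Callable, Optional

Rule = Callable[[list[str]], Optional[str]]
Model = Callable[[list[str]], tuple[str, float]]

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def ssn_rule(samples: list[str]) -> Optional[str]:
    """Example expert rule: every sample looks like NNN-NN-NNNN."""
    if samples and all(SSN_PATTERN.fullmatch(s) for s in samples):
        return "Social Security Number"
    return None


def stub_model(samples: list[str]) -> tuple[str, float]:
    """Stand-in for the trained model: returns (data_class, confidence)."""
    looks_like_names = bool(samples) and all(
        len(s.split()) == 2 and s.istitle() for s in samples
    )
    return ("Full Name", 0.9) if looks_like_names else ("Job Title", 0.2)


def classify(
    samples: list[str],
    rules: list[Rule],
    model: Model,
    min_confidence: float = 0.5,  # assumed threshold, not a documented value
) -> str:
    """Try each rule in order; otherwise accept the model if confident enough."""
    for rule in rules:
        result = rule(samples)
        if result is not None:
            return result
    data_class, confidence = model(samples)
    return data_class if confidence >= min_confidence else "Other"


RULES = [ssn_rule]
print(classify(["123-45-6789", "987-65-4321"], RULES, stub_model))
print(classify(["Ada Lovelace", "Grace Hopper"], RULES, stub_model))
print(classify(["xyz", "42"], RULES, stub_model))
```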

Geoclassification

After all of the fields have been classified, the data catalog checks whether a data source contains known forms of location data. The following data classes are presently supported:

City
Country
Full Address
Street Address
Postal Code
State
Place Of Birth
Phone Number
Geo Coordinates
Work Location
Nationality
Latitude
Longitude
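As a rough sketch of the check described above, the snippet below takes the data classes already assigned to a source's fields and reports which ones fall in the supported location list. The function names and the input shape (a dict of field name to data class) are assumptions for illustration.

```python
# Location-related data classes, copied from the list above.
GEO_CLASSES = frozenset({
    "City", "Country", "Full Address", "Street Address", "Postal Code",
    "State", "Place Of Birth", "Phone Number", "Geo Coordinates",
    "Work Location", "Nationality", "Latitude", "Longitude",
})


def geo_fields(field_classes: dict[str, str]) -> dict[str, str]:
    """Return only the fields whose assigned data class is location-related."""
    return {
        field: data_class
        for field, data_class in field_classes.items()
        if data_class in GEO_CLASSES
    }


def has_location_data(field_classes: dict[str, str]) -> bool:
    """True if at least one classified field carries a location data class."""
    return bool(geo_fields(field_classes))


# Hypothetical classification results for one data source.
source = {
    "home_city": "City",
    "mobile": "Phone Number",
    "diagnosis": "Other Medical Information",
}
print(geo_fields(source))
print(has_location_data(source))
```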