One of Streamline’s standout features is its ability to automatically detect and tag sensitive data across your data fabric. These tags help you stay on top of security and compliance by highlighting areas that may require special handling.
What Are Sensitive Data Tags?
Streamline uses tags to identify and label data sources, entities, and datasets that may contain sensitive information. These tags are visible right in the data catalog, making it easy to know where sensitive data lives.
Example view in the data catalog:

How Does Streamline Detect Sensitive Data?
We use a process called data classification. Here’s how it works:
Automatic Detection:
When a data source is connected, our machine learning model runs automatically—usually in just a second or two.
Classification by Data Class:
The model inspects both the contents and the structure of your tables to assign a data class to each field.
View the Results:
You can see the assigned data classes by clicking on an entity in your data catalog.

What Is Data Classification?
Data classification is the process of labeling data based on its type—like identifying that a column contains Social Security Numbers or email addresses.
Each column in your data source is assigned a single data class (like “Full Name” or “IP Address”). If a field doesn’t match any known type, it’s marked as “Other.”
Note: The classification is based on data content, not the column name. For instance, patient_full_name and doctor_full_name would both be classified as “Full Name.”
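To make the "one data class per column" idea concrete, here is a minimal sketch. It is not Streamline's classifier: the three regular expressions, the function names, and the majority-vote rule are all assumptions chosen for illustration, and this toy version looks only at sample values.

```python
import re
from collections import Counter

# Illustrative patterns only (assumed for this sketch, not Streamline's rules).
PATTERNS = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "IP Address": re.compile(r"(\d{1,3}\.){3}\d{1,3}"),
    "Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_value(value: str) -> str:
    """Return the first data class whose pattern matches the whole value."""
    for data_class, pattern in PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return data_class
    return "Other"


def classify_column(sample_values: list[str]) -> str:
    """Assign exactly one data class to a column by majority vote over samples."""
    if not sample_values:
        return "Other"
    votes = Counter(classify_value(v) for v in sample_values)
    data_class, count = votes.most_common(1)[0]
    # Require a strict majority; otherwise fall back to "Other".
    return data_class if count > len(sample_values) / 2 else "Other"


# Two differently named columns with the same kind of content get the same class.
columns = {
    "patient_contact": ["ada@example.com", "alan@example.org"],
    "doctor_contact": ["grace@example.net", "n/a", "edsger@example.com"],
}
results = {name: classify_column(values) for name, values in columns.items()}
```

In this sketch both columns end up as "Email Address", regardless of what they are called, and a column with no recognizable content falls back to "Other."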
Supported Data Classes
Streamline supports a wide and growing list of data classes, grouped into categories like:
ID Number: Social Security Number, Passport Number, etc.
Contact: Email Address, Phone Number
Medical: ICD Codes, MRNs, UDI
Financial: Bank Account Numbers, Payment Card Info
Location, Name, Demographics, and more
Here's a full list:
Data Class | Category |
|---|---|
Social Security Number | ID Number |
Passport Number | ID Number |
Driver's License Number | ID Number |
National ID Number | ID Number |
Tax ID Number | ID Number |
License Plate Number | ID Number |
Vehicle Identification Number | ID Number |
Patient Account Number | ID Number |
Other Certificate or License Number | ID Number |
Email Address | Contact |
URL | Digital |
IP Address | Digital |
MAC Address | Digital |
Phone Number | Contact |
Contact Information** | Contact |
Birth Date | Date |
Date of Death | Date |
Full Address | Location |
Street Address | Location |
Address Line 2 | Location |
City | Location |
State or Province | Location |
Country Name or Code | Location |
Zip or Postal Code | Location |
Full Name | Name |
Name Prefix | Name |
First Name | Name |
Middle Name | Name |
Last Name | Name |
Name Suffix | Name |
Mother’s Maiden Name | Name |
International Diagnostic Code (ICD) | Medical |
Other Diagnostic Code* | Medical |
Procedure Code | Medical |
Healthcare Common Procedure Coding System (HCPCS) Code | Medical |
Unique Device Identifier (UDI) | Medical |
National Drug Code (NDC) | Medical |
Dosage Information | Medical |
Medicare Beneficiary Identifier (MBI) | Insurance |
National Provider Identifier (NPI) | Insurance |
Insurance Member ID | Insurance |
Insurance Group Number | Insurance |
Insurance Provider Name | Insurance |
Insurance Payor Number (EDI)* | Insurance |
Medical Record Number | Medical |
US Bank Account Number | Financial |
US Bank Account Routing Number | Financial |
International Bank Account Number (IBAN) | Financial |
SWIFT/BIC Code | Financial |
Payment Card Number | Financial |
Payment Card Verification Code | Financial |
Payment Card Expiration Date | Financial |
Currency Code** | Financial |
Race or Ethnicity | Demographics |
Sex, Gender, or Gender Identity | Demographics |
Age | Demographics |
Nationality | Demographics |
Language** | Demographics |
Job Title | Demographics |
Student ID Number | ID Number |
Course Name | Academic |
Course Number | Academic |
Course Registration Number | Academic |
Grade | Academic |
GPA | Academic |
Enrollment Status | Academic |
Graduation Year | Academic |
Degree Program | Academic |
Major Field of Study | Academic |
Academic Advisor | Academic |
Employment Status | HR/Employment |
Employee ID Number | HR/Employment |
Salary | HR/Employment |
Department | HR/Employment |
Supervisor Name | HR/Employment |
Marital Status | Demographics |
Place of Birth | Demographics |
Personal Relationship | Demographics |
Political Party Registration | Demographics |
Education Level | Demographics |
Religious Affiliation | Demographics |
Tribal Affiliation | Demographics |
Company Name | Business |
DUNS Number | Business |
Price or Quote | Business |
Revenue | Business |
Credit Card Network | Financial |
Bank Name | Financial |
Other Medical Information | Medical |
Other Potential PII* | Other |
Other | Other |
Filename | Digital |
Username | Digital |
Plaintext Password | Authentication |
Password Hash | Authentication |
Middle Initial | Name |
GeoCoordinates | Location |
You can find the full list of data classes in the definitions.py file of our classifier code.
Some data classes are currently limited to specific systems like Salesforce or EHRs—these are noted in the list.
Note: Custom data classes are not currently available.
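If you export classification results and want to group them by category, a simple lookup table is enough. The sketch below copies a handful of rows from the table above. It is a hypothetical excerpt, not the actual contents of definitions.py, and the choice to map unknown names to "Other" is an assumption.

```python
from collections import defaultdict

# Hypothetical excerpt: a few rows copied from the data class table,
# not the real definitions.py.
DATA_CLASS_CATEGORIES = {
    "Social Security Number": "ID Number",
    "Email Address": "Contact",
    "Phone Number": "Contact",
    "IP Address": "Digital",
    "Medical Record Number": "Medical",
    "Payment Card Number": "Financial",
    "Full Name": "Name",
    "Other": "Other",
}


def category_of(data_class: str) -> str:
    """Look up a data class's category; unknown names map to "Other" (assumed)."""
    return DATA_CLASS_CATEGORIES.get(data_class, "Other")


def group_by_category(data_classes):
    """Group data class names by category, preserving input order."""
    grouped = defaultdict(list)
    for data_class in data_classes:
        grouped[category_of(data_class)].append(data_class)
    return dict(grouped)
```

For example, `group_by_category(["Email Address", "Phone Number", "Full Name"])` puts the first two under "Contact" and the third under "Name."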
How Are Tags Like PII, PHI, and PCI Applied?
Streamline supports 11 sensitive data tags:
Tag | Meaning |
|---|---|
PII | Personally Identifiable Information |
PHI | Protected Health Information |
PCI | Payment Card Information |
HIPAA | Health Insurance Portability and Accountability Act (United States) |
GDPR | General Data Protection Regulation (European Union) |
PIPEDA | Personal Information Protection and Electronic Documents Act (Canada) |
LGPD | Lei Geral de Proteção de Dados (Brazil) |
APA | Australian Privacy Act |
UK_GDPR | UK Version of GDPR |
DPDP_ACT | Digital Personal Data Protection Act (India) |
FERPA | Family Educational Rights and Privacy Act (United States) |
When Are These Tags Applied?
Each tag is triggered based on the presence of certain data classes:
PII: Triggered by things like Social Security Numbers, Names, IPs, etc.
PHI: Includes medical codes, patient IDs, diagnosis data, etc.
PCI: Covers credit card numbers, expiration dates, and similar data.
HIPAA: Appears when both PII and PHI tags are present, or if a Medical Record Number (MRN) or UDI is found.
Here's a full list:
PII | PHI | PCI |
|---|---|---|
Social Security Number | International Diagnostic Code (ICD) | Payment Card Number |
Passport Number | Other Diagnostic Code | Payment Card Verification Code |
Driver's License Number | Procedure Code | Payment Card Expiration Date |
National ID Number | Healthcare Common Procedure Coding System (HCPCS) Code | |
Tax ID Number | Unique Device Identifier (UDI) | |
License Plate Number | National Drug Code (NDC) | |
Vehicle Identification Number | Other Medical Information | |
Patient Account Number | | |
Other Certificate or License Number | | |
Email Address | | |
URL | | |
IP Address | | |
MAC Address | | |
Phone Number | | |
Birth Date | | |
Date of Death | | |
Full Address | | |
Street Address | | |
Address Line 2 | | |
City | | |
Zip or Postal Code | | |
Full Name | | |
Name Prefix | | |
First Name | | |
Middle Name | | |
Last Name | | |
Name Suffix | | |
Mother’s Maiden Name | | |
Unique Device Identifier (UDI) | | |
Medicare Beneficiary Identifier (MBI) | | |
Insurance Member ID | | |
Medical Record Number | | |
US Bank Account Number | | |
International Bank Account Number (IBAN) | | |
Payment Card Number | | |
Age | | |
Contact Information | | |
Other Potential PII | | |
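The trigger rules for PII, PHI, PCI, and HIPAA can be sketched as set lookups. This is only an illustration: the sets below hold a subset of the data classes listed above, the function name is made up, and applying the rules to one list of data classes at a time is an assumption about scope. Rules for the other regulatory tags are not described here, so they are left out.

```python
# Subsets of the trigger lists above, shortened for illustration.
PII_CLASSES = {
    "Social Security Number", "Full Name", "Email Address", "IP Address",
    "Medical Record Number", "Unique Device Identifier (UDI)",
    "Payment Card Number",
}
PHI_CLASSES = {
    "International Diagnostic Code (ICD)", "Procedure Code",
    "National Drug Code (NDC)", "Unique Device Identifier (UDI)",
    "Other Medical Information",
}
PCI_CLASSES = {
    "Payment Card Number", "Payment Card Verification Code",
    "Payment Card Expiration Date",
}
# Classes that trigger HIPAA on their own, per the rule described above.
HIPAA_DIRECT = {"Medical Record Number", "Unique Device Identifier (UDI)"}


def tags_for(data_classes):
    """Return the sensitive data tags implied by a collection of data classes."""
    found = set(data_classes)
    tags = set()
    if found & PII_CLASSES:
        tags.add("PII")
    if found & PHI_CLASSES:
        tags.add("PHI")
    if found & PCI_CLASSES:
        tags.add("PCI")
    # HIPAA: both PII and PHI are present, or an MRN or UDI was found.
    if ({"PII", "PHI"} <= tags) or (found & HIPAA_DIRECT):
        tags.add("HIPAA")
    return tags
```

With these sets, a full name next to a procedure code yields PII, PHI, and HIPAA, while a lone card expiration date yields only PCI.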
Important: Understanding Compliance Tags
When Streamline tags your data with regulatory framework labels (such as GDPR, HIPAA, etc.), this indicates that the data likely contains information subject to these frameworks based on its classification, not that your organization is in compliance with them.
Regulatory compliance depends on how data is collected, stored, processed, and ultimately used, which is beyond what automated classification can determine. These tags are intended to help you identify data that may require careful handling under applicable regulations, so you can take appropriate action based on your organization's specific compliance obligations.
Consult with your legal or compliance team to determine your actual compliance status and requirements.
How Does Streamline Classify Different Data Sources?
Table-like Data Sources (e.g., SQL Databases)
We classify data at the column level based on:
- Sample values from the data
- Column names and table structure
Why classify columns instead of rows?
- It’s more accurate
- It fits how data fabrics typically structure data
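Here is a rough sketch of what column-level inputs could look like for a SQL source: a few sample values per column, keyed by column name. It uses Python's built-in sqlite3 module with an in-memory database. The table, the sample size, and the `sample_columns` helper are all assumptions for illustration, not Streamline internals.

```python
import sqlite3


def sample_columns(conn, table, limit=5):
    """Return {column_name: [non-null sample values]} for one table.

    The table name is interpolated into the SQL text, so it must come
    from a trusted source such as the database's own schema listing.
    """
    cursor = conn.execute(
        f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ?', (limit,)
    )
    names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    return {
        name: [row[i] for row in rows if row[i] is not None]
        for i, name in enumerate(names)
    }


# Tiny in-memory table to demonstrate the shape of the output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_full_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [("Ada Lovelace", "ada@example.com"), ("Alan Turing", None)],
)
samples = sample_columns(conn, "patients")
```

Each entry in `samples` pairs a column name with its sample values, which are the two kinds of signal listed above.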
Standard Schema Sources (e.g., Salesforce, EHRs)
These systems have well-documented structures. Here’s how we handle them:
Salesforce: We used an LLM to analyze documentation and assign default classifications to standard fields. Custom fields fall back to the ML model.
EHR Systems (FHIR-based): Fields are classified manually during setup.
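The "defaults first, model as fallback" pattern for standard-schema sources might look something like the sketch below. The mapping entries are hypothetical examples built on standard Salesforce Contact fields, and `fallback` stands in for the ML model, so treat the names and signature as assumptions.

```python
from typing import Callable, Sequence

# Hypothetical default classifications for a few standard Salesforce fields.
STANDARD_FIELD_CLASSES = {
    ("Contact", "Email"): "Email Address",
    ("Contact", "Phone"): "Phone Number",
    ("Contact", "Birthdate"): "Birth Date",
}


def classify_field(
    object_name: str,
    field_name: str,
    samples: Sequence[str],
    fallback: Callable[[Sequence[str]], str],
) -> str:
    """Use the documented default if one exists, else defer to the fallback."""
    default = STANDARD_FIELD_CLASSES.get((object_name, field_name))
    if default is not None:
        return default
    # Custom fields (Salesforce API names ending in "__c") have no
    # documented default, so they are classified from their sample values.
    return fallback(samples)


def stub_model(samples: Sequence[str]) -> str:
    """Stand-in for the ML model; always answers "Other" in this sketch."""
    return "Other"
```

Calling `classify_field("Contact", "Email", [], stub_model)` returns the default without consulting the model, while a custom field such as `Loyalty_Code__c` goes to the fallback.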
How the Classifier Works
Our classification engine combines:
An expert system for rule-based identification
A custom-trained language model, trained on public and synthetic data
Together, these components assign a best-fit data class—or “Other” if no match is found.
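One plausible way to combine a rule-based stage with a model is to try the rules first and only consult the model when no rule fires. This article doesn't spell out how the two components are actually combined, so the ordering, the confidence threshold, and every name in this sketch are assumptions, and the "model" is a trivial stand-in.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def rule_stage(samples: Sequence[str]) -> Optional[str]:
    """Rule-based stage: answer only when every sample matches a strict pattern."""
    if samples and all(SSN_PATTERN.fullmatch(s) for s in samples):
        return "Social Security Number"
    return None


def stub_model(samples: Sequence[str]) -> Tuple[str, float]:
    """Stand-in for a trained model: returns (label, confidence)."""
    if samples and all(len(s.split()) == 2 for s in samples):
        return ("Full Name", 0.9)
    return ("Other", 0.0)


def classify(
    samples: Sequence[str],
    model: Callable[[Sequence[str]], Tuple[str, float]],
    threshold: float = 0.5,
) -> str:
    """Rules first, then the model; low-confidence answers become "Other"."""
    ruled = rule_stage(samples)
    if ruled is not None:
        return ruled
    label, score = model(samples)
    return label if score >= threshold else "Other"
```

The threshold is what turns a weak guess into "Other" instead of a possibly misleading label.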
Geoclassification
After all of the fields have been classified, the data catalog checks whether a data source contains known forms of location data. The following data classes are presently supported:
City
Country
Full Address
Street Address
Postal Code
State
Place Of Birth
Phone Number
Geo Coordinates
Work Location
Nationality
Latitude
Longitude
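The geoclassification check can be thought of as a membership test against the list above. Here is a small sketch, assuming classification results arrive as a mapping from field name to data class; the function names and the field names in the example are made up for illustration.

```python
# Location-related data classes, copied from the list above.
GEO_CLASSES = frozenset({
    "City", "Country", "Full Address", "Street Address", "Postal Code",
    "State", "Place Of Birth", "Phone Number", "Geo Coordinates",
    "Work Location", "Nationality", "Latitude", "Longitude",
})


def location_fields(field_classes: dict) -> dict:
    """Return only the fields whose data class is a known location class."""
    return {
        field: data_class
        for field, data_class in field_classes.items()
        if data_class in GEO_CLASSES
    }


def has_location_data(field_classes: dict) -> bool:
    """True if at least one field carries a known location class."""
    return bool(location_fields(field_classes))


# Hypothetical classification results for one data source.
classified = {"home_city": "City", "email": "Email Address", "lat": "Latitude"}
geo = location_fields(classified)
```

Here `geo` keeps `home_city` and `lat` and drops `email`, so the source would count as containing location data.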