Anonymizer definitions are used by HL7Viewer and HL7Script to quickly anonymize or de-identify one or more messages that contain PHI, making them suitable for replay in a test environment. When anonymizing a series of messages, the changed data is persisted to keep the messages consistent. For example, if PID.3 (the patient ID) is changed from "12345" to "TEST001" in the first message, "12345" is changed to "TEST001" in all PID.3 fields.

A sample definition, Generic.anon.ini, is included with the release. It is a good starting point, but should not be considered an authoritative guide to anonymization. It covers the segments regularly encountered in the author's experience; test it against your own messages and adjust it until all instances of actual PHI are anonymized, including any in custom Z-segments.

A definition includes three (sometimes four) sections:

The [Global] section appears at the top and is used to set global options.

The [Values] section defines how to generate replacement values for various string, numeric, and date/time types.

The [Fields] section lists all the fields that require anonymization, and which value generator should be used for each. A field may also be replaced with another previously anonymized field.

The [Increments] section is maintained by the program if the SaveIncrements global option is enabled.

Below is a snippet of an example definition file:

; This is a comment

[Global]
Alphabet=BCDFGHJKLMNPQRSTVWXYZ
Persist=1
SaveIncrements=1
DataStore=D:\HL7\Example.anon.data
NamedFields=D:\HL7\HL7NamedFields.2.7.txt

[Values]
anyST=ST
anyNM=NM
anyDT=DT SameAge=1
MRN=NM Min=100000001 Increment=1 Prefix=M
Name=ST Min=3 Max=12
Street=ST Constant="123 ANON ST"
Zip=ST Constant=12345
Phone=ST Constant=(800)555-1212
Email=ST Constant=anon@example.com
SSN=NM Min=999101000 Increment=1 Mask=999-99-9999 Ignore=000-00-0000|999-99-9999

[Fields]
PID.3=MRN ;End-of-line comment
PID.5.1.1=Name
PID.5.1.2=Name
PID.5.2=Name
PID.5.3=anyST
PID.7=anyDT
PID.11.1=Street
PID.11.2=Blank
PID.11.3=Name
PID.11.5=Zip
PID.13.1=Phone
PID.13.4=Email
PID.14.1=Phone
PID.14.4=Email
PID.18.1=PID.3
PID.19=SSN

Blank lines are ignored. Comments start with a semicolon (;) and may be whole-line or end-of-line comments.

The default extension for a definition file is .anon.ini. Definitions are plain text files, so any extension will work, but .anon.ini is what HL7Viewer and HL7Script look for first.

Global Settings

The following options may be specified in the [Global] section:

Alphabet
Provide the default Alphabet used by string value generators that do not specify their own Alphabet or Constant.
Persist
Change the default Persist setting of 1 (true) on field definitions. Individual fields may override the global setting.
NamedFields
Specify a Named Fields file to use named fields in the anonymizer definition instead of (or in addition to) numeric field keys.
SaveIncrements
Set SaveIncrements to 1 to have the definition remember the last used increment values on numeric value generators. This helps ensure that as new IDs are generated, they don't conflict with older identifiers already in the system. The values are maintained in the [Increments] section in the definition file. To reset the values, simply delete the section.
DataStore
If there is a need to persist anonymization data across multiple sessions, supply a filename for the DataStore. At the end of anonymization, the persisted data from the field definitions will be written to the file. This data will be loaded the next time this definition is used. To clear the data and start over, simply delete the DataStore file. This file contains the original PHI that has been anonymized, so protect it like any other sensitive data file.

Database Storage

A database may be used instead of the definition (ini) file and the DataStore file for storing increment values and persisted data, respectively. The following global settings apply only when a database is used, and override the file-based settings when provided.

Database
The name of a pre-configured Database Connection.
IncrementSQL
The SQL for getting a new increment value. This single SQL statement is responsible for initializing, advancing, and retrieving the next increment value. The SQL expects three parameters: :ValueName, :ValueInc, and :ValueMin, the properties of a numeric Value Generator. It should return a single integer (bigint) field as the result.
DataReadSQL
The SQL for retrieving the current persisted data for a given anonymized Field. It should return nothing (zero records) if there is no current value for the given input. It expects two parameters: :FieldKey and :OrigData, a field definition's HL7 key and the value being anonymized.
DataSaveSQL
The SQL for storing a new anonymized value for a persisted Field value. It expects three parameters: :FieldKey, :OrigData, and :AnonData, the HL7 key from a field definition, the key's original value, and the new anonymous value.
NamedFieldsSQL
A Named Fields definition can be retrieved from the database instead of a file. The SQL expects no parameters and should return a single string field containing the definition.

The Anonymizer Database Schema section contains an example of how to create tables and procedures for working with anonymization data.

Value Generators

There are three types of value generators: strings (ST), numbers (NM), and dates (DT). Each value definition starts with a unique name, an equal sign, and one of the types.

; A bare minimum value definition
anyST=ST

Only the name and type are required, but there are numerous options to help generate an interesting value. Options are separated by spaces and are given in Option=Value format. If a value contains spaces or semicolons, enclose it in double quotes (").

; Quote option values that contain spaces or semicolons
Street=ST Constant="123 ANON ST"

There are two built-in value generators that are always available: Blank and Null. Blank sets the value to blank, and Null sets it to an HL7 null ("").
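
For instance, the built-in generators can be assigned in the [Fields] section without being defined under [Values]. The field choices here are only illustrative:

```ini
[Fields]
; Built-in generators need no entry in [Values]
PID.11.2=Blank ;Clears the second address line
PID.19=Null ;Replaces the SSN with an HL7 null ("")
```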

The available options vary based on the value type. If an option has a default value other than blank, it is shown in parentheses. Boolean values use 0 for False and 1 for True.

ST - Strings

Alphabet
String. The Alphabet is a list of characters that are selected at random to generate the output. If left blank, it will default to the global Alphabet setting, or upper-case letters if no global default was provided. Excluding vowels can help avoid randomly generating rude words.
Constant
String. If provided, this value becomes the output for every input instead of generating a random value from the Alphabet.
Min (0)
Numeric. The minimum number of characters to randomly generate. If Min and Max are both zero (the default), the length of the output will match that of the input. Otherwise, the length is a random number between Min..Max (inclusive).
Max (0)
Numeric. The maximum number of characters to generate. If Min is zero and Max is negative, the output length is the length of the input + Max.
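
As a rough illustration of how these options combine, the definitions below follow the rules described above. The generator names and the custom alphabet are made up for the example:

```ini
[Values]
; Output length matches the input length (Min and Max both default to 0)
SameLength=ST
; Random length between 5 and 10 characters, inclusive
LastName=ST Min=5 Max=10
; Exactly one character drawn from a custom alphabet
Initial=ST Alphabet=BCDFGHJKLMNPRSTW Min=1 Max=1
; Two characters shorter than the input (Min=0 with a negative Max)
Shorter=ST Max=-2
; Always the same output; no random characters are generated
City=ST Constant=ANYTOWN
```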

NM - Numbers

IsDigits (1)
Boolean. Determines if Min and Max are a number of digits for the output length (1), or actual values for generating a random number (0).
Min (0)
Numeric. The minimum value or number of digits in the output. If IsDigits=1 and Min and Max are both zero, the output length matches that of the input. Setting Min to a value > 99 or negative automatically sets IsDigits=0.
Max (0)
Numeric. The maximum value or number of digits in the output. If IsDigits=1 and Min is zero and Max is negative, the output length is the length of the input + Max. Setting Max to a value > 99 automatically sets IsDigits=0.
Decimals (0)
Numeric. The number of decimal places in the output. If using digits and Min, Max, and Decimals are all zero, the output matches the format of the input.
Increment (0)
Numeric. The output is incremented by this amount each time a value is generated, starting at Min. If Decimals > 0, the desired number of random decimal digits will also be added after the incremented integer. Setting Increment to any non-zero value will automatically set IsDigits=0.
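
The following sketch shows a few ways the numeric options might be combined, based on the descriptions above. The generator names and ranges are hypothetical:

```ini
[Values]
; Digit counts: between 6 and 8 random digits (IsDigits defaults to 1)
Account=NM Min=6 Max=8
; Actual values: a random number between 40 and 95 with one decimal place.
; IsDigits=0 is needed here because neither Min nor Max exceeds 99.
Weight=NM IsDigits=0 Min=40 Max=95 Decimals=1
; Sequential IDs starting at 5000001; a non-zero Increment sets IsDigits=0
VisitNum=NM Min=5000001 Increment=1
; A Max above 99 also switches Min and Max to actual values
Quantity=NM Max=500
```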

DT - Dates

Date values generate a random date based on the options. If the input contains a time, the time remains unchanged.

Min (19000101)
Numeric. The minimum date to generate in yyyymmdd format.
Max (today's date)
Numeric. The maximum date to generate in yyyymmdd format.
SameAge (1)
Boolean. If set to 1, the random output date is limited to a value that has the same age in years as the input (as of today), bounded by Min and Max. Under the HIPAA Safe Harbor method, ages over 89 are aggregated into a single category of 90 or older, so SameAge caps the age at 90. When SameAge=0, the output is a random date between Min..Max (inclusive).
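
A few date generators, sketched from the options above, might look like this. The names and date ranges are only examples:

```ini
[Values]
; Keeps the patient's age in years (SameAge defaults to 1)
BirthDate=DT
; Any date from 2020 through 2023, unrelated to the input date
VisitDate=DT SameAge=0 Min=20200101 Max=20231231
; Same age as the input, but never earlier than 1950
BoundedDOB=DT SameAge=1 Min=19500101
```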

General Value Options

The following options apply to all types, even when the value is a Constant. Ignore is always checked first to determine if the value should remain unchanged. After generating the value using the type-specific options, the general options are applied in the order they are provided in the definition. Each option may be specified only once.

Ignore
String. Provide a list of pipe-delimited values that should not be anonymized. For example, an SSN might set Ignore="999-99-9999|000-00-0000" as those values are already anonymous and may have special meaning to the application. Blanks and Nulls ("") are never anonymized.
PadChar ("0")
String. The character used to pad output with the PadL and PadR options (see below). Defaults to a zero.
PadL, PadR
Numeric. Pads the output with leading/trailing PadChar characters up to the specified length. If the output is equal in length or longer than the pad value, it is not changed. The maximum value is 99. If set to zero, the original input length is used.
Left
Numeric. Keeps the leftmost count of characters from the output. If the output is shorter than this value, the entire value is kept. If Left is negative, it is added to the output length. Example: If the output is "foobar" and Left=-1, the result would be "fooba". If Left is set to zero, the original input length is used.
Right
Numeric. Keeps the rightmost count of characters from the output. If the output is shorter than this value, the entire value is kept. If Right is negative, it is added to the output length. Example: If the output is "foobar" and Right=-1, the result would be "oobar". If Right is set to zero, the original input length is used.
Prefix
String. This value is prepended to the output.
Suffix
String. This value is appended to the output.
MaskEscape ("\")
String. Changes the default Mask escape character (see below).
Mask
String. Formats the output using the mask value. It uses LogicLib's llStrings.FormatDigits function. Here is the documentation for that function:
  Right-justifies/overlays a string of characters (usually digits) into a format
  string. Especially handy for phone number/SSN formatting, but it could
  conceivably be used on any type of input.

  Ex: FormatDigits('6025551212', '(099)999-9999')  ->  '(602)555-1212'
      FormatDigits('5551212',    '(099)999-9999')  ->  '555-1212'
      FormatDigits('6025551212', '999.999.9999')   ->  '602.555.1212'
      FormatDigits('foo',        'bar')            ->  'foobar'

  All digit characters are always output even if the format string is shorter
  or blank. Output stops when you run out of digit characters, even though there
  may be more format string remaining.

  Format string rules:
  9 = Replace this character with a character from the digit string.
  0 = Same as 9 but always includes the next format character to the left, even
      if you have run out of digit characters.
  * = Any other character is copied to the output as a literal.

  To output a literal 0 or 9, precede it with the escape character. The default
  escape character is a backslash, but it can be changed if you need backslashes
  in your output.

  Example: FormatDigits('123456', '999\0999') -> '1230456'
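
Because the general options are applied in the order they appear, the same options can give different results depending on their order. The sketch below assumes the first generated value is 1; the generator names are hypothetical, and the phone mask is the one from the FormatDigits examples above:

```ini
[Values]
; Pad first, then prefix: 1 would become 00000001, then M00000001
MRNPadded=NM Min=1 Increment=1 PadL=8 Prefix=M
; Prefix first, then pad: 1 would become M1, then 000000M1
MRNPrefixed=NM Min=1 Increment=1 Prefix=M PadL=8
; Ten random digits formatted like (602)555-1212.
; The 0 in the mask keeps the opening parenthesis after the digits run out.
Phone10=NM Min=10 Max=10 Mask=(099)999-9999
; Keep only the last four characters of a 12-character random string
Last4=ST Min=12 Max=12 Right=4
```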

Fields

The Fields section contains a list of all fields, components, and subcomponents that require anonymization. Each line consists of a field key, an equal sign, and the name of a value generator or a previously anonymized field key to copy.

If a Named Fields file has been specified, named fields may be used in field definitions. Numeric keys are always valid, even when a Named Fields file has been loaded.

The following example applies the value generator called "MRN" to PID.3:

PID.3=MRN

This example copies the value generated for PID.3 into PID.18. Note that PID.3 must be defined in the Fields list before PID.18 to do this.

PID.18=PID.3

All repetitions in all like segments will be anonymized unless the key provides a specific segment sequence and/or repetition index, e.g. NK1#1.5 or PID.3~1. A specific repetition index is useful when, for example, a sender always uses the third repetition of PID.13 for the email address. List the general PID.13 anonymization first, then the specific repetition.

PID.13.1=Phone
PID.13~3.1=Email ;Vendor always puts email here
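
Segment sequence indexes can presumably be layered the same way. This sketch reuses the Name and Phone generators from the sample definition; the NK1 field numbers are illustrative only:

```ini
[Fields]
; General rules for every NK1 segment come first
NK1.2.1=Name
NK1.5.1=Phone
; Then the rule for one specific segment
NK1#1.5.1=Blank ;Applies only to the first NK1 segment
```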

If copying a previously anonymized field and the value should be copied from the same segment sequence and/or repetition that is currently being anonymized, wildcards can be used. The wildcard character is a question mark (?) and can follow either a segment sequence (#) or repetition (~) marker. The question marks will be replaced with the appropriate indexes for the current field. Without a wildcard, the first such segment (#1) and repetition (~1) are assumed.

PID.18=PID#?.3~?.1 ;Copies PID.3.1 from the same segment and repetition of this PID.18

A field definition may also include one or more of the following options:

Persist
Boolean. Overrides the global Persist setting if provided.
When copying another field or using a constant value, Persist will automatically be set to 0 since persisting the data would be of no use.
Ignore
String, default=blank. Provide a list of pipe-delimited values that should not be anonymized. Blanks and Nulls ("") are never anonymized. Quote values that include spaces or semicolons.
This Ignore works at the specific field level rather than the more generic value generator level. If both are present, the field's Ignore will be checked first. If it does not match, then the value generator's Ignore is checked.
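
As a sketch, and assuming field options use the same space-separated Option=Value form as value generator options, a field definition with options might look like this. The Ignore values are invented for the example:

```ini
[Fields]
; Do not persist: a repeated input is not guaranteed the same output
OBX.5=anyST Persist=0
; Leave these placeholder IDs unchanged, whatever the MRN generator's own Ignore says
PID.3=MRN Ignore="UNKNOWN|TEST PATIENT"
```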

Advanced Conditionals

If more flexibility is required in choosing a replacement for a field, conditional logic in IF-THEN-ELSE format can be used to select the correct value generator or field to copy:

PID.13.1=IF PID#?.13~?.2 == "NET" THEN Email ELSE Phone
; If the SSN starts with "X" don't change it:
PID.19.1=IF PID.19.1 ~= "X" THEN IGNORE ELSE SSN

"IF " must immediately follow the equal sign. The IF portion of the expression uses the same syntax as HL7Script IF statements. The THEN and ELSE parts are both required, and must provide either a value generator name, a field key to copy, or the word IGNORE to leave the field unchanged. Segment and repetition wildcards work as they do in non-conditional assignments.

The THEN and ELSE parts can also nest additional conditional logic expressions within parentheses:

PID.13.1=IF PID#?.13~?.2=="NET" THEN Email ELSE (IF PID#?.13~?.3=="CP" THEN CellPhone ELSE Phone)

Nesting is effectively unlimited, but the entire expression must be contained on a single line.

The Persist and Ignore options are still available when using conditional logic, and must be the last options on the line when present.
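
For example, options might be appended after the complete expression as shown below. This is a sketch that assumes the options keep the Option=Value form used elsewhere in the definition:

```ini
[Fields]
; Options follow the entire IF-THEN-ELSE expression
PID.13.1=IF PID#?.13~?.2 == "NET" THEN Email ELSE Phone Persist=0
PID.19.1=IF PID.19.1 ~= "X" THEN IGNORE ELSE SSN Ignore="000-00-0000"
```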

Anonymizer Database Schema

See also: HL7Tools Database Connections

Here is an example of a possible database schema for persisting anonymization data, including Global Options tailored to work with it.

If multiple threads or processes could be anonymizing data simultaneously, use a thread-safe design based on sequences, generators, or identity columns instead. Those constructs guarantee that no two connections can retrieve the same increment value.

Schema Example (SQL Server)

CREATE TABLE AnonStore (
    fieldkey NVARCHAR(50) NOT NULL,
    origdata NVARCHAR(250) NOT NULL,
    anondata NVARCHAR(250) NOT NULL,
    CONSTRAINT pk_AnonStore PRIMARY KEY (fieldkey, origdata)
)
GO

CREATE TABLE AnonInc (
    valuename NVARCHAR(50) NOT NULL PRIMARY KEY,
    lastincrement BIGINT NOT NULL
)
GO

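CREATE PROCEDURE AnonIncrement(@valuename NVARCHAR(50), @inc BIGINT, @min BIGINT)
AS
BEGIN
	-- Look up the last value issued for this value generator, if any
	DECLARE @last BIGINT
	SELECT @last = lastincrement FROM AnonInc WHERE valuename = @valuename;
	IF @last IS NULL
		-- First use: start the sequence at the generator's Min
		INSERT INTO AnonInc (valuename, lastincrement) VALUES (@valuename, @min);
	ELSE BEGIN
		-- Otherwise advance the stored value by the generator's Increment
		SET @last = @last + @inc;
		UPDATE AnonInc SET lastincrement = @last WHERE valuename = @valuename;
	END
	-- Return the current value as a single BIGINT column
	SELECT lastincrement FROM AnonInc WHERE valuename = @valuename;
END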
GO

Global Options

Database=(your connection name here)
IncrementSQL=EXEC AnonIncrement :ValueName, :ValueInc, :ValueMin;
DataReadSQL=SELECT anondata FROM AnonStore WHERE fieldkey = :FieldKey AND origdata = :OrigData;
DataSaveSQL=INSERT INTO AnonStore (fieldkey, origdata, anondata) VALUES (:FieldKey, :OrigData, :AnonData);
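
The options above do not include NamedFieldsSQL. As a rough sketch, a Named Fields definition stored in the database might be retrieved as follows. The table AnonNamedFields and its columns are hypothetical and are not part of the schema shown above; any single statement that takes no parameters and returns one string field should serve the same purpose.

```ini
; Hypothetical table: AnonNamedFields(name, definition)
NamedFieldsSQL=SELECT definition FROM AnonNamedFields WHERE name = 'HL7-2.7';
```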

Useful Links

Methods for De-identification
https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html
What is Considered Protected Health Information Under HIPAA?
https://www.hipaajournal.com/what-is-considered-protected-health-information-under-hipaa/
