Anonymizer definitions are used by HL7Viewer and HL7Script to quickly anonymize one or more messages that contain PHI, making them suitable for replay in a test environment. When anonymizing a series of messages, the changed data is persisted to keep the messages consistent. For example, if PID.3 (the patient ID) is changed from "12345" to "TEST001" in the first message, "12345" is changed to "TEST001" in all PID.3 fields.

A sample definition, Generic.anon.ini, is included with the release. This definition should not be considered an authoritative guide to anonymization. I have tried to cover all the segments I typically encounter from the HL7 2.3.1 specification, but you need to test with your own messages to make sure all instances of actual PHI have been anonymized, including any in custom Z-segments.

A definition includes three (sometimes four) sections:

The [Global] section appears at the top and is used to set global options.

The [Values] section defines how to generate replacement values for various string, numeric, and date/time types.

The [Fields] section lists all the fields you want to anonymize, and which value generator you want to use for each. You can also replace a field with another previously anonymized field.

The [Increments] section is maintained by the program if you are using the SaveIncrements global option.

Below is a snippet of an example definition file:

; This is a comment

[Global]
Alphabet=BCDFGHJKLMNPQRSTVWXYZ
Persist=1
SaveIncrements=1
DataStore=D:\HL7\Example.anon.data
NamedFields=D:\HL7\HL7NamedFields.txt

[Values]
anyST=ST
anyNM=NM
anyDT=DT SameAge=1
MRN=NM Min=100000001 Increment=1 Prefix=M
Name=ST Min=3 Max=12
Street=ST Constant="123 ANON ST"
Street2=ST Constant="APT 00"
Zip=ST Constant=12345
Phone=ST Constant=(800)555-1212
Email=ST Constant=anon@example.com
SSN=NM Min=999101000 Increment=1 Mask=999-99-9999 Ignore=000-00-0000|999-99-9999

[Fields]
PID.3=MRN ;End-of-line comment
PID.5.1.1=Name
PID.5.1.2=Name
PID.5.2=Name
PID.5.3=anyST
PID.7=anyDT
PID.11.1=Street
PID.11.2=Street2
PID.11.3=Name
PID.11.5=Zip
PID.13.1=Phone
PID.13.4=Email
PID.14.1=Phone
PID.14.4=Email
PID.18.1=PID.3
PID.19=SSN

Blank lines are ignored. Comments start with a semicolon (;) and may be whole-line or end-of-line comments.

The default extension for a definition file is .anon.ini. The extension can be anything, since definitions are just plain text files, but .anon.ini is what HL7Viewer looks for first.


Global Settings

The following options can be specified in the [Global] section:

Alphabet
Provide the default Alphabet used by string value generators that do not specify their own Alphabet or Constant.
Persist
Overrides the default Persist setting of 1 (true) on field definitions. Individual fields can still specify their own options.
NamedFields
Specify a Named Fields file if you want to use named fields in the anonymizer definition instead of (or in addition to) numeric field keys.
SaveIncrements
Set SaveIncrements to 1 if you want the definition to remember the last used increment values on numeric value generators. This helps ensure that as new IDs are generated, they don't conflict with older identifiers already in the system. The values are maintained in the [Increments] section in the definition file. To reset the values, just delete that whole section.
DataStore
If you need to persist anonymization data across multiple sessions, supply a filename for the DataStore. At the end of anonymization, the persisted data from the field definitions will be written to the file. This data will be loaded the next time this definition is used. To clear the data and start over, simply delete the DataStore file. This file contains the original PHI that you have anonymized, so protect it like you would any other sensitive data file.


Value Generators

There are three types of value generators: strings (ST), numbers (NM), and dates (DT). Each value definition starts with a unique name, an equal sign, and one of the types.

; A bare minimum value definition
anyST=ST

Only the name and type are required, but there are numerous options to help generate an interesting value. Options are separated by spaces and are given in Option=Value format. If a value contains spaces or semicolons, enclose it in double quotes (").

; Quote option values that contain spaces or semicolons
Street=ST Constant="123 MAIN ST"

The options vary based on the value type:

ST - Strings

Alphabet
String, default=blank. The Alphabet is a list of characters that are selected at random to generate the output. If left blank, it will default to the global Alphabet setting, or upper-case letters if no default was specified. (I like to exclude vowels to avoid randomly generating rude words!)
Constant
String, default=blank. Used instead of Alphabet, this value becomes the output for every input.
Min
Numeric, default=0. The minimum number of characters to randomly generate. If Min and Max are both zero (the default), the length of the output will match that of the input. Otherwise, the length is a random number between Min..Max (inclusive).
Max
Numeric, default=0. The maximum number of characters to generate. If Min is zero and Max is negative, the output length is the length of the input + Max.
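
A few string generators, sketched from the options above. The generator names (Initial, Shorter, HexCode) are made up for illustration and are not part of the sample definition.

```ini
[Values]
; Always exactly one random character
Initial=ST Min=1 Max=1
; Min is zero and Max is negative: output is two characters shorter than the input
Shorter=ST Max=-2
; Eight characters drawn from a generator-specific Alphabet instead of the global one
HexCode=ST Alphabet=0123456789ABCDEF Min=8 Max=8
```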

NM - Numbers

IsDigits
Boolean, default=1 (true). Determines if Min and Max are a number of digits for the output length (1), or actual values for generating a random number (0).
Min
Numeric, default=0. The minimum value or number of digits in the output. If IsDigits=1 and Min and Max are both zero, the output length matches that of the input. Setting Min to a value > 99 or negative automatically sets IsDigits=0.
Max
Numeric, default=0. The maximum value or number of digits in the output. If IsDigits=1 and Min is zero and Max is negative, the output length is the length of the input + Max. Setting Max to a value > 99 automatically sets IsDigits=0.
Decimals
Numeric, default=0. The number of decimal places in the output. If using digits and Min, Max, and Decimals are all zero, the output matches the format of the input.
Increment
Numeric, default=0. The output is incremented by this amount each time a value is generated, starting at Min. If Decimals > 0, the desired number of random decimal digits will also be added after the incremented integer. Setting Increment to any non-zero value will automatically set IsDigits=0.
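
The interaction between IsDigits, Min, and Max is easier to see side by side. The following is a sketch only; the generator names are hypothetical.

```ini
[Values]
; IsDigits defaults to 1, so Min and Max are digit counts: 6 to 8 random digits
AcctNo=NM Min=6 Max=8
; Min and Max are both under 100, so IsDigits=0 must be set explicitly
; to treat them as a value range; one decimal place is added
Weight=NM IsDigits=0 Min=40 Max=95 Decimals=1
; A non-zero Increment sets IsDigits=0 automatically; values count up from Min
Visit=NM Min=5000001 Increment=1
```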

DT - Dates

Date generators produce a random date based on the options. If the input contains a time, the time remains unchanged.

Min
Numeric, default=19000101. The minimum date to generate in yyyymmdd format.
Max
Numeric, default=today's date. The maximum date to generate in yyyymmdd format.
SameAge
Boolean, default=1 (true). If set to 1, the random output date is limited to a value that has the same age in years as the input (as of today), bounded by Min and Max. Under the HIPAA Safe Harbor rule, ages over 89 must be aggregated into a single "90 or older" category, so SameAge caps the age at 90. When SameAge=0, the output is a random date between Min..Max (inclusive).
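
Two date generators as a sketch of the options above. The names DOB and AdmitDate are hypothetical.

```ini
[Values]
; Birth dates: keep the patient's age in years (the default behavior, shown explicitly)
DOB=DT SameAge=1
; Event dates: any date from 20200101 through 20201231, unrelated to the input
AdmitDate=DT SameAge=0 Min=20200101 Max=20201231
```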

General Value Options

The following options apply to all types, even when the value is a Constant. Ignore is always checked first. After generating the value using the type-specific options, the general options are applied in the order they are provided in the definition. Each option may be specified only once.

Ignore
String. Provide a list of pipe-delimited values that you do not want anonymized. For example, on an SSN you might set Ignore="999-99-9999|000-00-0000" as those values are already anonymous and may have special meaning to your application. Blanks and Nulls ("") are never anonymized.
PadChar
String, default="0" (zero). The character used to pad output with the PadL and PadR options (see below).
PadL, PadR
Numeric. Pads the output with leading/trailing PadChar characters up to the specified length. If the output is already as long as or longer than the pad length, it is not changed. The maximum value is 99. If set to zero, the original input length is used.
Left
Numeric. Keeps the leftmost count of characters from the output. If the output is shorter than this value, the entire value is kept. If Left is negative, it is added to the output length. Example: If the output is "foobar" and Left=-1, the result would be "fooba". If Left is set to zero, the original input length is used.
Right
Numeric. Keeps the rightmost count of characters from the output. If the output is shorter than this value, the entire value is kept. If Right is negative, it is added to the output length. Example: If the output is "foobar" and Right=-1, the result would be "oobar". If Right is set to zero, the original input length is used.
Prefix
String. This value is prepended to the output.
Suffix
String. This value is appended to the output.
MaskEscape
String, default="\". Changes the default Mask escape character (see below).
Mask
String. Formats the output using the mask value. It uses LogicLib's llStrings.FormatDigits function. Here is the documentation for that function:
  Right-justifies/overlays a string of characters (usually digits) into a format
  string. Especially handy for phone number/SSN formatting, but it could
  conceivably be used on any type of input.

  Ex: FormatDigits('6025551212', '(099)999-9999')  ->  '(602)555-1212'
      FormatDigits('5551212',    '(099)999-9999')  ->  '555-1212'
      FormatDigits('6025551212', '999.999.9999')   ->  '602.555.1212'
      FormatDigits('foo',        'bar')            ->  'foobar'

  All digit characters are always output even if the format string is shorter
  or blank. Output stops when you run out of digit characters, even though there
  may be more format string remaining.

  Format string rules:
  9 = Replace this character with a character from the digit string.
  0 = Same as 9 but always includes the next format character to the left, even
      if you have run out of digit characters.
  * = Any other character is copied to the output as a literal.

  To output a literal 0 or 9, precede it with the escape character. The default
  escape character is a backslash, but it can be changed if you need backslashes
  in your output.

  Example: FormatDigits('123456', '999\0999') -> '1230456'
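
Because the general options are applied in the order they are written, the same options in a different order can give different results. The sketch below uses hypothetical generator names. Reading the rules above, a generated value of 1 from ChartNo would be padded to 00000001 and then prefixed to give C00000001.

```ini
[Values]
; Pad first, then prefix: PadChar defaults to "0"
ChartNo=NM Min=1 Increment=1 PadL=8 Prefix=C
; Generate nine random digits, then keep only the rightmost four
Last4=NM Min=9 Max=9 Right=4
; Ten random digits formatted as a phone number by the Mask
RandPhone=NM Min=10 Max=10 Mask=(999)999-9999
```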


Fields

The Fields section contains a list of all fields that require anonymization. Each line consists of a field key, an equal sign, and the name of a value generator or a previously anonymized field key to copy.

If you have specified a Named Fields file, named fields can be used in field definitions. Numeric keys are always valid, even when a Named Fields file has been loaded.

The following example applies the value generator called "MRN" to PID.3:

PID.3=MRN

This example copies the value generated for PID.3 into PID.18. Note that PID.3 must be defined in the Fields list before PID.18 to do this.

PID.18=PID.3

All repetitions in all like segments will be anonymized unless the key provides specific segment sequence and/or repetition indexes, e.g. NK1#1.5, PID.3~1. One example of a reason to include a specific repetition index would be if a sender always uses the third repetition of PID.13 for the email address. You would list the regular PID.13 anonymization first, then the specific repetition.

PID.13.1=Phone
PID.13~3.1=Email ;Vendor always puts email here

If copying a previously anonymized field and you want to copy the value from the same segment sequence and/or repetition that is currently being anonymized, you can use wildcards. The wildcard character is a question mark (?) and can follow either a segment sequence (#) or repetition (~) marker. The question marks will be replaced with the appropriate indexes for the current field.

PID.18=PID#?.3~?.1 ;Copies PID.3.1 from the same segment and repetition of this PID.18
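
The segment wildcard can also be used without the repetition wildcard, which is handy for segments that occur several times in a message. This is a sketch that assumes a Name generator is defined in [Values] and that NK1.2 is listed before the field that copies it.

```ini
[Fields]
NK1.2=Name
; Copy the anonymized NK1.2 from the same NK1 segment, not from the first one
NK1.30=NK1#?.2
```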

A field definition may also include one or more of the following options:

Persist
Boolean. Overrides the global Persist setting if provided.
When copying another field or using a constant value, Persist will automatically be set to 0 since persisting the data would be of no use.
Ignore
String, default=blank. Provide a list of pipe-delimited values that you do not want anonymized. Blanks and Nulls ("") are never anonymized. If any of the values include spaces or semicolons, be sure to use quotes.
This Ignore works at the specific field level rather than the more generic value generator level. If both are present, the field's Ignore will be checked first. If it does not match, then the value generator's Ignore is checked.
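
A sketch of field-level options. It assumes they follow the generator name in the same space-separated Option=Value form used for value definitions, and that generators named MRN and anyST exist in [Values].

```ini
[Fields]
; Field-level Ignore: checked before any Ignore on the MRN generator
PID.3=MRN Ignore="TEST|UNKNOWN"
; Free-text comments do not need to stay consistent across messages
NTE.3=anyST Persist=0
```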

Advanced Conditionals

If you need more flexibility in how to assign a new value to a field, you can use conditional logic in IF-THEN-ELSE format to choose the right value generator or field to copy:

PID.13.1=IF PID#?.13~?.2=="NET" THEN Email ELSE Phone

"IF " must immediately follow the equal sign. The IF portion of the expression uses the same syntax as HL7Script IF statements. The THEN and ELSE parts are both required, and must provide either a value generator name or a field key to copy. You may also specify "IGNORE" if you wish to leave the field unchanged. Field keys in conditional logic support the same segment and repetition wildcards as copying fields.

The THEN and ELSE parts can also nest additional conditional logic expressions within parentheses:

PID.13.1=IF PID#?.13~?.2=="NET" THEN Email ELSE (IF PID#?.13~?.3=="CP" THEN CellPhone ELSE Phone)

Nesting is effectively unlimited, but anything beyond 2 or 3 levels deep will become very hard to read when you have to make it all fit on a single line!

The Persist and Ignore options are still available when using conditional logic, and must be the last things on the line when present.
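
Two more conditional sketches, assuming generators named Email and Phone are defined in [Values]. The first leaves the field unchanged when the condition is false; the second shows an option placed at the end of the line.

```ini
[Fields]
; Anonymize the address only for email entries; otherwise leave it alone
PID.13.4=IF PID#?.13~?.2=="NET" THEN Email ELSE IGNORE
; Options come last, after the complete IF-THEN-ELSE expression
PID.13.1=IF PID#?.13~?.2=="NET" THEN Email ELSE Phone Ignore="(000)000-0000"
```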
