Engaging Conversations on Healthcare and Technology

HL7 Separator Characters

In HL7 messaging, the separator characters are also known as the message delimiters or special encoding characters. The following are the HL7 recommended values:

(0x0D) Segment separator, aka carriage return
| Field separator, aka pipe
^ Component separator, aka hat
& Sub-component separator
~ Field repeat separator
\ Escape character

 

The segment separator is not negotiable. It is always a carriage return (ASCII 13 or hex 0D). The others are suggested values only, but they are usually used as indicated above. The HL7 standard lets you choose your own as long as you declare them in the MSH segment.

The MSH is the first segment of all HL7 messages (except HL7 batch messages). The field separator is the 4th character in the message, and it also represents the first field of the MSH segment (MSH-1). Since MSH-1 is typically only a pipe, '|', counting MSH fields becomes tricky: the text that follows the first pipe is MSH-2, not MSH-1. Field 2 of the MSH (MSH-2) contains the other separator characters in this order: component, field repeat, escape, and sub-component.

Thus, the following is an example of the beginning of an HL7 message:
MSH|^~\&|…
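The positions described above can be read straight off the start of the message. The following is a minimal Python sketch, not a full parser. The function names `read_delimiters` and `msh_field` and the sample message are made up for illustration, and the sketch assumes MSH-2 holds exactly the four encoding characters listed in this post.

```python
def read_delimiters(message: str) -> dict:
    """Read the separators declared at the start of the MSH segment."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message must start with an MSH segment")
    field_sep = message[3]    # MSH-1: the 4th character of the message
    encoding = message[4:8]   # MSH-2: assumed to be exactly four characters
    component, repetition, escape, subcomponent = encoding
    return {
        "field": field_sep,
        "component": component,
        "repetition": repetition,
        "escape": escape,
        "subcomponent": subcomponent,
    }


def msh_field(message: str, n: int) -> str:
    """Return the raw text of MSH-n (1-based), or "" if it is absent."""
    delimiters = read_delimiters(message)
    if n == 1:
        return delimiters["field"]  # MSH-1 is the separator itself
    msh_segment = message.split("\r", 1)[0]
    parts = msh_segment.split(delimiters["field"])
    # parts[0] is "MSH" and parts[1] is MSH-2, so MSH-n sits at parts[n - 1].
    return parts[n - 1] if n - 1 < len(parts) else ""


# Hypothetical sample message, for illustration only.
SAMPLE = "MSH|^~\\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|200701011200||ADT^A01|MSG0001|P|2.3\rPID|1||12345\r"
```

Note the off-by-one in `msh_field`: because MSH-1 is the separator, MSH-n lands at index n - 1 after splitting, whereas in every other segment field n lands at index n.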

The delimiter values declared in the MSH segment are supposed to be the delimiter values used throughout the entire message. Encoding HL7 messages in this manner allows an application parser to simply use the special characters in the MSH to parse the message. However, beware that many application parsers just use hard-coded values and ignore MSH-1 (Field Separator) and MSH-2 (Encoding Characters).
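As a rough illustration of a parser that honors MSH-1 and MSH-2 rather than hard-coding `|^~\&`, here is a small Python sketch. The function name `parse_hl7` and the nested-list layout are assumptions made for this example, not part of the HL7 standard or any particular library, and the sketch does not expand escape sequences.

```python
def parse_hl7(message: str):
    """Split a message using the delimiters declared in its own MSH segment.

    Returns a list of (segment_name, fields) tuples. Each field is a list of
    repetitions, each repetition a list of components, and each component a
    list of sub-components. fields[i] holds field i + 1 of the segment.
    """
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("expected a message starting with an MSH segment")
    field_sep = message[3]
    comp_sep, rep_sep, _escape, sub_sep = message[4:8]

    def split_field(value):
        return [
            [component.split(sub_sep) for component in repetition.split(comp_sep)]
            for repetition in value.split(rep_sep)
        ]

    segments = []
    # The segment separator is always a carriage return.
    for raw in message.split("\r"):
        if not raw:
            continue
        parts = raw.split(field_sep)
        name = parts[0]
        if name == "MSH":
            # Keep MSH-1 and MSH-2 as literal strings; they define the
            # delimiters and must not be split by them.
            fields = [field_sep, parts[1]] + [split_field(p) for p in parts[2:]]
        else:
            fields = [split_field(p) for p in parts[1:]]
        segments.append((name, fields))
    return segments
```

Because nothing is hard-coded, a hypothetical message that declares `#`, `*`, `!`, `\` and `@` as its delimiters parses the same way as one that uses the recommended values.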

  • N. Murali Mohan

    Could someone tell me if I can use the sub-component separator like FirstName&&&&^LastName&&&^MiddleName&&&&?
    The trend that I observed is that the &’s at the end of each sub-component are trimmed and the output is FirstName^LastName^MiddleName.

    If I have First&&Name, the &’s are ‘not’ truncated.

    Thanks in advance,
    Murali

  • G Wang

    I am new to HL7. EDU-2 “academic degree” is IS with max len 10 and the column RP/# is empty in 15.4.3. However, I have seen messages like

    EDU|1|BA^BACHELOR OF ARTS^HL70360|…

    Is this legal? If not, what is the right way to encode the same information?

    Thanks.

  • http://www.prosolv.com J Lloyd

    Please forgive me for not contributing, but I find no other way to post a query… I need to make certain of something seemingly obvious: How are the characters in “MSH”, which identifies the message header segment, themselves encoded? The HL7 standard states that the MSH-18 field “contains the character set for the entire message.” If “MSH” is part of the message, and conforms with MSH-18, then MSH-1 cannot always be located unambiguously, and so then neither can MSH-18. My hope is that “MSH” is always 7-bit ascii, but is it?

  • http://www.corepointhealth.com Jon Mertz

    To try to answer your questions, we need additional information, but we asked some of our HL7 experts to respond to two of your questions.

    1. How are the characters in “MSH”, which identifies the message header segment, themselves encoded?

    The encoding characters in MSH-1 and MSH-2 are used for parsing the rest of the message. They are encoded in that they become special parsing characters for an application to properly determine what separates the data. Typically, MSH-1 is a |, which delimits different fields. MSH-2 is ^~\&, which translates to ^ delimiting components, ~ delimiting repeating fields, \ as an escape character, and & delimiting subcomponents (rare). These can be changed. HL7 specifies that the 5 characters after MSH, in that order, are what determine the separators. If you choose different ASCII characters, that is OK, but not standard.

    2. Is “MSH” always 7-bit ascii?

    I have dealt with interfaces that use standard ASCII, and that is what I have seen essentially 100% of the time. This, too, is a loose standard. In HL7 standard 2.3, Chapter 1, section 1.6, it says “All data is represented as displayable characters from a selected character set. The ASCII displayable character set (hexadecimal values between 20 and 7E, inclusive) is the default character set unless modified in the MSH header segment.” However, it also adds that this is a suggestion and not an unbreakable rule. I have not yet seen an application that has a need to break this standard. So the answer is no; it’s not required to be standard ASCII, but I have never seen HL7 that is not standard ASCII.

    Let us know if this helps or if you have further questions.

  • Alan Krueger

    I’d like to echo J Lloyd’s question. MSH-18 describes the encoding of the message, but to get to it you need to be able to find that field. To find that field you must be able to interpret the characters of the message, as described by the standard. However, since the translation from bytes to characters requires knowledge of the encoding of the message, this seems to be a chicken-and-egg scenario.

    This would seem to imply that “MSH” and the delimiter characters themselves are encoded in US-ASCII, but I’ve seen production data that violates this. (Specifically, delimiter selected from characters in ISO-8859-1 but not in US-ASCII.)

  • http://www.hl7standards.com Erica Olenski

    J Lloyd & Alan, thanks for your question.

    The field MSH-18-character set is an optional, repeating field of data type ID.

    We have typically seen this field intentionally left blank. Per the HL7 specs, if the field is not valued, the default single-byte character set (ASCII, “ISO IR6”) should be assumed, and thus no other character sets are allowed in the message.

    I hope this helps!

  • Alan Krueger

    On the subject of the delimiter characters, I just checked: chapter 2 of the standard says “NOTE: The field separator character must still be chosen from the printable 7-bit ASCII character set.”

  • http://twitter.com/medbob Bob Kellum

    Murali,
      If the character is listed in MSH[2] in the set of delimiters, then it should always be parsed as a delimiter.  If it is merely delimiting “passive nulls” as in your example, then there is no reason to output that delimiter.  Parsing for that particular element should return null if its address is beyond the last delimiter.
    If the delimiter character is actually part of the data (e.g. Ova & Parasites), then it must be replaced by an HL7 Escape Sequence.  Our magnanimous host has provided a table at:
    http://www.hl7standards.com/blog/2006/11/02/hl7-escape-sequences/ 

    When you create an HL7 message, you must parse the data looking for your delimiters.  If you find any, you must escape them.  When the reading process parses an inbound HL7 message, it must look for these escape sequences and expand them into the data that it intends to use.
    That is the official mechanism for handling such characters.  Now, your assignment for extra points is to find a vendor that ACTUALLY does this correctly…..
    Yeah.  I thought so.

    My response is to steer my users and build teams to stay away from characters such as #,@, %, &, *, and instead write out “and” or “percent”.  While it may not “seem right”, you will save yourself a world of trouble.
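The escape-on-write, expand-on-read rule described in the comment above can be sketched in Python as follows. This is a minimal illustration that covers only the five delimiter escape sequences (\F\, \S\, \T\, \R\ and \E\); the function names `escape_hl7` and `unescape_hl7` are made up for this example, and the defaults assume the recommended delimiter values.

```python
def escape_hl7(value, field="|", comp="^", rep="~", esc="\\", sub="&"):
    """Replace delimiter characters found in data with HL7 escape sequences."""
    codes = {field: "F", comp: "S", sub: "T", rep: "R", esc: "E"}
    # Work one character at a time so the escape characters that this
    # function adds are never escaped a second time.
    return "".join(
        f"{esc}{codes[ch]}{esc}" if ch in codes else ch for ch in value
    )


def unescape_hl7(value, field="|", comp="^", rep="~", esc="\\", sub="&"):
    """Expand the five delimiter escape sequences back into literal characters."""
    literals = {"F": field, "S": comp, "T": sub, "R": rep, "E": esc}
    out = []
    i = 0
    while i < len(value):
        is_sequence = (
            value[i] == esc
            and i + 2 < len(value)
            and value[i + 2] == esc
            and value[i + 1] in literals
        )
        if is_sequence:
            out.append(literals[value[i + 1]])
            i += 3
        else:
            # Anything else, including unrecognized sequences, is kept as-is.
            out.append(value[i])
            i += 1
    return "".join(out)
```

With the default delimiters, `escape_hl7("Ova & Parasites")` produces `Ova \T\ Parasites`, and `unescape_hl7` reverses it. A sender would escape each data value before joining values with delimiters; a receiver would split on the delimiters first and unescape each value afterwards.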
