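• A Java char occupies 16 bits (one UTF-16 code unit); a String is a sequence of these units
  • A single char cannot represent supplementary Unicode characters (code points above U+FFFF); these are stored as a surrogate pair of two chars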
  • Splitting a char or byte array at an arbitrary index risks cutting a multibyte character or surrogate pair in half
  • Attackers can bypass input validation checks when a character is split between two data structures and each fragment is validated on its own
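The points above can be illustrated with a minimal sketch. The class name SurrogateDemo and its helper methods are hypothetical, chosen only for this illustration; the calls on Character and String are standard library methods.

```java
final class SurrogateDemo {
    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
    // stores it as a surrogate pair of two char values.
    static final String GRINNING_FACE = new String(Character.toChars(0x1F600));

    // Number of 16-bit code units in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if cutting the string at splitIndex would separate a high
    // surrogate from the low surrogate that follows it.
    static boolean splitsSurrogatePair(String s, int splitIndex) {
        return splitIndex > 0
                && splitIndex < s.length()
                && Character.isHighSurrogate(s.charAt(splitIndex - 1))
                && Character.isLowSurrogate(s.charAt(splitIndex));
    }

    public static void main(String[] args) {
        System.out.println(charCount(GRINNING_FACE));       // code units
        System.out.println(codePointCount(GRINNING_FACE));  // code points
        System.out.println(splitsSurrogatePair(GRINNING_FACE, 1));
    }
}
```

A check such as splitsSurrogatePair could be run before cutting a char array, so that the cut point is moved by one position when it would land inside a pair.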

Vulnerable Code

  • The code fails to account for multibyte characters that straddle the boundary between loop iterations, so a character split across two reads is decoded as two invalid fragments
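The original vulnerable code is not reproduced in these notes. The following is a sketch of the kind of loop being described, assuming UTF-8 input and a caller-chosen chunk size; the class name ChunkedDecodeVulnerable and both method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeVulnerable {
    // FLAW: every chunk is decoded on its own, so a multibyte character
    // that straddles two chunks is decoded as two malformed fragments.
    // chunkSize is assumed to be greater than zero.
    static String readPerChunk(InputStream in, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        StringBuilder result = new StringBuilder();
        int bytesRead;
        while ((bytesRead = in.read(chunk, 0, chunk.length)) != -1) {
            result.append(new String(chunk, 0, bytesRead, StandardCharsets.UTF_8));
        }
        return result.toString();
    }

    // Convenience wrapper for in-memory data (illustration only).
    static String decode(byte[] data, int chunkSize) {
        try {
            return readPerChunk(new ByteArrayInputStream(data), chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, the string "a\u00e9" encodes to three UTF-8 bytes. With a chunk size of 2, the two bytes of the accented letter fall into different chunks, and the decoded result contains U+FFFD replacement characters instead of the original letter.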

Secure Code

  • The code avoids splitting multibyte encoded characters across buffers
  • It also defers construction of the result string until all the data has been read
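A minimal sketch of this approach follows, again assuming UTF-8 input. The class name ChunkedDecodeSafe, the method names, and the MAX_SIZE cap of 1024 bytes are assumptions made for the illustration, not details taken from the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedDecodeSafe {
    // Assumed upper bound on accepted input, to avoid unbounded buffering.
    static final int MAX_SIZE = 1024;

    // Accumulates raw bytes across all reads and decodes exactly once,
    // so no character is ever decoded from a partial byte sequence.
    // chunkSize is assumed to be greater than zero.
    static String readFully(InputStream in, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = in.read(chunk, 0, chunk.length)) != -1) {
            if (buffer.size() + bytesRead > MAX_SIZE) {
                throw new IOException("Too much input");
            }
            buffer.write(chunk, 0, bytesRead);
        }
        // The string is built only after the stream is exhausted.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Convenience wrapper for in-memory data (illustration only).
    static String decode(byte[] data, int chunkSize) {
        try {
            return readFully(new ByteArrayInputStream(data), chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because decoding happens once over the complete byte sequence, the chunk size no longer affects the result. Input validation should then be applied to the fully decoded string rather than to individual chunks.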