💖Input Validation Errors: Improper Validation of Strings

Web applications that accept input strings from untrusted sources filter and validate them based on the strings’ character data.
Java follows the Unicode Standard for character data, which matters to validation in two ways:
– Checking whether two strings are equivalent to each other
– Transforming a string into a particular Unicode normalization form, based on either canonical or compatibility equivalence
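
The equivalence check above can be sketched as follows. This is a minimal illustration rather than a prescribed approach: the class name UnicodeEquivalenceDemo and the helper canonicallyEquivalent are hypothetical, and only java.text.Normalizer comes from the JDK.

```java
import java.text.Normalizer;

final class UnicodeEquivalenceDemo {

    // Hypothetical helper: two strings are treated as canonically equivalent
    // when their NFC-normalized forms are equal.
    static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // e with acute accent as a single code point
        String decomposed = "e\u0301"; // plain e followed by COMBINING ACUTE ACCENT

        // A raw comparison sees two different code point sequences.
        System.out.println(precomposed.equals(decomposed));                 // false
        // After normalization both strings have the same representation.
        System.out.println(canonicallyEquivalent(precomposed, decomposed)); // true
    }
}
```

Both strings render identically on screen, which is why a plain equals() comparison is not enough when the input may arrive in either representation.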

Normalization Forms

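Form: Description
– Normalization Form D (NFD): Canonical Decomposition. Characters are decomposed by canonical equivalence.
– Normalization Form C (NFC): Canonical Decomposition, followed by Canonical Composition. Characters are decomposed by canonical equivalence, then recomposed by canonical equivalence. For singletons, the recomposed result may differ from the text before decomposition.
– Normalization Form KD (NFKD): Compatibility Decomposition. Characters are decomposed by compatibility equivalence.
– Normalization Form KC (NFKC): Compatibility Decomposition, followed by Canonical Composition. Characters are decomposed by compatibility equivalence, then recomposed by canonical equivalence.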
ref link: https://zh.wikipedia.org/wiki/Unicode%E7%AD%89%E5%83%B9%E6%80%A7
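
To make the four forms concrete, the sketch below applies each of them to two sample characters. The samples (ANGSTROM SIGN as a singleton and the fi ligature as a compatibility character) and the names NormalizationFormsDemo and hex are chosen here for illustration and are not part of the original material.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class NormalizationFormsDemo {

    // Renders each UTF-16 code unit as U+XXXX so that differences which are
    // invisible on screen show up. The samples here are all in the BMP.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String angstrom = "\u212B"; // ANGSTROM SIGN, a singleton
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI

        // Form.values() lists NFD, NFC, NFKD and NFKC.
        for (Form form : Form.values()) {
            System.out.println(form
                    + "  angstrom -> " + hex(Normalizer.normalize(angstrom, form))
                    + "  ligature -> " + hex(Normalizer.normalize(ligature, form)));
        }
    }
}
```

The angstrom sign shows the singleton case: NFC produces U+00C5 rather than the original U+212B. The ligature is left alone by NFD and NFC and is split into the letters f and i only by the compatibility forms NFKD and NFKC.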

Applying normalization forms KC and KD to arbitrary input strings can remove formatting distinctions that are important to the text's semantics, so they should not be applied blindly.
For input validation, however, NFKC is generally the most suitable form, because it converts the input into an equivalent canonical form that can be safely compared with the required input form.
In Java, the Normalizer.normalize() method converts Unicode text into an equivalent composed or decomposed form, which makes sorting and searching of text easier.
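
As a small illustration of both points, the sketch below normalizes a string to NFKC and then searches it. The sample text and the names NormalizeForSearchDemo and toNfkc are assumptions made for this example; Normalizer.normalize and Normalizer.isNormalized are the JDK methods.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

final class NormalizeForSearchDemo {

    // Hypothetical helper: returns the text in NFKC, skipping the conversion
    // when the text is already normalized.
    static String toNfkc(String text) {
        return Normalizer.isNormalized(text, Form.NFKC)
                ? text
                : Normalizer.normalize(text, Form.NFKC);
    }

    public static void main(String[] args) {
        // "file x2" written with an fi ligature (U+FB01) and a superscript two (U+00B2)
        String raw = "\uFB01le x\u00B2";
        String normalized = toNfkc(raw);

        System.out.println(raw.contains("fi"));        // false: the ligature is one code point
        System.out.println(normalized.contains("fi")); // true
        System.out.println(normalized);                // file x2
    }
}
```

Searching works after normalization, but the superscript has also become an ordinary digit. That is the kind of formatting distinction NFKC and NFKD remove, and it may or may not be acceptable depending on what the text means.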

If a string is validated before it is normalized, the validation can fail to detect malicious input.
For example, a check for angle brackets does not detect their alternate Unicode representations, which normalization then converts into real angle brackets after the check has already passed.
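
The original code listing is not reproduced in these notes, so the following is a reconstruction of the flawed ordering under stated assumptions: the class name ValidateBeforeNormalize, the method flawedSanitize and the sample input are hypothetical, and the angle-bracket check is a simple regular expression.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class ValidateBeforeNormalize {

    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // FLAWED ordering, shown only to illustrate the problem:
    // the string is validated first and normalized afterwards.
    static String flawedSanitize(String input) {
        if (ANGLE_BRACKETS.matcher(input).find()) {
            throw new IllegalStateException("angle bracket found");
        }
        // Too late: normalization can introduce the characters just checked for.
        return Normalizer.normalize(input, Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FE64 and U+FE65 are SMALL LESS-THAN SIGN and SMALL GREATER-THAN SIGN,
        // which NFKC maps to the ASCII angle brackets.
        String input = "\uFE64script\uFE65";
        System.out.println(flawedSanitize(input)); // <script>
    }
}
```

The check passes because the input contains no ASCII angle brackets at validation time, yet the returned string does.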

The fix is to normalize the string first, so that alternate representations become canonical angle brackets, and only then validate it.
The input validation mechanism can then throw an IllegalStateException when it detects malicious input.
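
A minimal sketch of the corrected ordering is shown below. As before, the class name NormalizeBeforeValidate, the method sanitize and the sample input are hypothetical stand-ins for the original listing, which is not included in these notes.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

final class NormalizeBeforeValidate {

    private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

    // Normalizes to NFKC first, then validates the normalized string.
    static String sanitize(String input) {
        String normalized = Normalizer.normalize(input, Form.NFKC);
        if (ANGLE_BRACKETS.matcher(normalized).find()) {
            throw new IllegalStateException("angle bracket found");
        }
        return normalized;
    }

    public static void main(String[] args) {
        // U+FE64 and U+FE65 become ASCII angle brackets under NFKC.
        String input = "\uFE64script\uFE65";
        try {
            sanitize(input);
            System.out.println("accepted");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: angle bracket found
        }
    }
}
```

Returning the normalized string, rather than the raw input, means later code works with the same text that was actually validated.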