initial analysis of modify string
authored
Apr 28, 2021
by
Samuel Melm
Analyse-"Modify-String-Values".md
View page @
130ee512
@@ -5,38 +5,85 @@ Modify string includes all scenarios that are applied to a string column and tra
Trifacta lists the following sub-scenarios:
-
Basic String Operations
-
Prepend/Append to string ("foo" + "bar" -> "foobar")
-
Substring (e.g. "Hello World"[1,4] -> "ello")
-
Replace substring with other string
-
Others that don't qualify as "modify string" (length, isSubstring, indexOf...)
-
Clean Strings
-
Define and enforce formats (e.g. url/zip code) and replace wrong values with defaults
-
Remove from string:
-
leading/trailing whitespace (including tabs), aka trim
-
first/last word(s)
-
all whitespace in string
-
double spaces (replace by single spaces)
-
symbols (everything but letters and numbers), which notably includes punctuation
-
special letters (e.g. transform ä to a)
-
a specific sub-string (all occurrences or only the first)
-
quotes around the string
-
Standardize String Values
-
Standardize case: toLower, toUpper
-
Break out CamelCase ("fooBar" -> "foo bar")
-
Standardize String Lengths
-
Pad string values (e.g. leading/trailing spaces)
-
Fix the length of strings (e.g. remove all characters after the 8th)
Excluded from this analysis:
-
Convert columns to one string column (aka printf), because it does not modify strings.
## Scenario and user-visible elements
-
Fill out
-
The user has a table with one or more string columns loaded into the editor.
-
The user selects a column and applies one or more of the following operations to it.
-
Let us assume the column is called "text" with rows ["foo", "bar"].
### Basic String Operations
#### Prepend
-
The user enters a string ("baz") which is added in front of every row of the column.
-
Input: ["foo", "bar"]
-
Result: ["bazfoo", "bazbar"]
#### Append
-
The user enters a string ("baz") which is added behind every row of the column.
-
Input: ["foo", "bar"]
-
Result: ["foobaz", "barbaz"]
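A minimal sketch of Prepend and Append, assuming a column is modelled as a plain list of strings. The function names `prepend` and `append` are illustrative and not taken from any existing implementation.

```python
def prepend(column, prefix):
    """Return a new column with `prefix` added in front of every row."""
    return [prefix + value for value in column]


def append(column, suffix):
    """Return a new column with `suffix` added behind every row."""
    return [value + suffix for value in column]


text = ["foo", "bar"]
print(prepend(text, "baz"))  # ['bazfoo', 'bazbar']
print(append(text, "baz"))   # ['foobaz', 'barbaz']
```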
#### Substring
-
The user enters the zero-based indices of the first and last characters of the substring to be selected, both inclusive (e.g. 1, 2).
-
Input: ["foo", "bar"]
-
Result: ["oo", "ar"]
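The example result only works out if both indices are zero-based and inclusive. A short sketch under that assumption, with the hypothetical helper `substring` operating on a list of strings:

```python
def substring(column, first, last):
    """Keep the characters from index `first` to index `last`.

    Both indices are zero-based and inclusive, so Python's
    half-open slice needs `last + 1` as its end.
    """
    return [value[first:last + 1] for value in column]


print(substring(["foo", "bar"], 1, 2))    # ['oo', 'ar']
print(substring(["Hello World"], 1, 4))   # ['ello']
```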
#### Replace substring with other string
-
The user enters a substring ("oo") which is replaced by another entered substring ("aa")
-
Input: ["foo", "bar"]
-
Result: ["faa", "bar"]
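A sketch of the replace operation, assuming plain (non-regex) substring matching and replacement of every occurrence. The name `replace_substring` is illustrative.

```python
def replace_substring(column, old, new):
    """Replace every occurrence of `old` with `new` in each row."""
    return [value.replace(old, new) for value in column]


print(replace_substring(["foo", "bar"], "oo", "aa"))  # ['faa', 'bar']
```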
### Clean Strings
#### Define and Enforce Formats
-
The user enters a pattern which every entry of the column should follow
-
The user can then specify what should happen to the entries that do not follow the pattern (e.g. they are replaced with a default string).
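One way to sketch this, assuming the pattern is a regular expression that must match the whole entry and that non-matching entries are replaced by a default. The helper `enforce_format` and the five-digit zip code pattern are assumptions made for illustration.

```python
import re


def enforce_format(column, pattern, default):
    """Keep entries that fully match `pattern`; replace the rest with `default`."""
    regex = re.compile(pattern)
    return [value if regex.fullmatch(value) else default for value in column]


zip_codes = ["12345", "1234a", "987"]
print(enforce_format(zip_codes, r"\d{5}", "00000"))  # ['12345', '00000', '00000']
```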
#### Remove from String
-
The user selects what should be removed from the string. This could be:
*
a specific sub-string, where either only the first or all occurrences are removed
*
leading/trailing whitespace
*
first/last word(s)
*
all whitespace in the string
*
double spaces which are replaced by a single space
*
symbols (everything but letters and numbers)
*
non-ASCII letters (e.g. ä is replaced by a)
*
quotes surrounding the string
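The removal options above could be sketched per value as follows. All function names are hypothetical. The sketch assumes that "symbols" means anything that is not a letter, digit, or whitespace, and that "non-ASCII letters" are handled by stripping combining accents after Unicode decomposition (which does not cover letters such as ß).

```python
import re
import unicodedata


def remove_substring(value, sub, all_occurrences=True):
    """Remove every occurrence of `sub`, or only the first one."""
    return value.replace(sub, "") if all_occurrences else value.replace(sub, "", 1)


def trim(value):
    """Remove leading and trailing whitespace, including tabs."""
    return value.strip()


def remove_first_word(value):
    parts = value.split(None, 1)
    return parts[1] if len(parts) > 1 else ""


def remove_last_word(value):
    parts = value.rsplit(None, 1)
    return parts[0] if len(parts) > 1 else ""


def remove_all_whitespace(value):
    return re.sub(r"\s+", "", value)


def collapse_double_spaces(value):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", value)


def remove_symbols(value):
    # \w also matches the underscore, so it is removed explicitly.
    return re.sub(r"[^\w\s]|_", "", value)


def strip_accents(value):
    # Decompose characters (ä -> a + combining diaeresis), then drop the marks.
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_surrounding_quotes(value):
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value
```

Most of these reduce to a regex replace with an empty or single-space replacement, which matches the observation that a regex engine covers much of the scenario.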
### Standardize case
#### To Lower
-
Every uppercase letter is replaced by its lower case counterpart
-
Example: "fOoBar$" -> "foobar$"
#### To Upper
-
Every lowercase letter is replaced by its uppercase counterpart
-
Example: "fOoBar$" -> "FOOBAR$"
#### Break out Special Case
-
The user selects a casing convention (e.g. camelCase). Every entry is then split into its constituent words according to that convention.
-
Example: "fooBarBaz" -> "foo bar baz"
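A sketch of the three case operations. Breaking out camelCase is assumed here to mean: insert a space wherever an ASCII lowercase letter or digit is directly followed by an ASCII uppercase letter, then lowercase the result. The name `break_out_camel_case` is illustrative, and other conventions (snake_case, kebab-case) would need their own rule.

```python
import re


def to_lower(value):
    return value.lower()


def to_upper(value):
    return value.upper()


def break_out_camel_case(value):
    # Zero-width match between a lowercase letter/digit and an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", value)
    return spaced.lower()


print(to_lower("fOoBar$"))                # foobar$
print(to_upper("fOoBar$"))                # FOOBAR$
print(break_out_camel_case("fooBarBaz"))  # foo bar baz
```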
### Standardize String Lengths
#### Pad string values
-
The user specifies a target length and a character. The character is added to the front or back of every entry until the entry reaches the specified length.
-
Pad("foo", ' ', 5) -> "  foo" (two leading spaces)
#### Fix the Length of Strings
-
The user specifies a length. Characters are removed from the end of every entry until no entry is longer than the specified length.
-
FixLength("foofoofoo", 7) -> "foofoof"
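Both length operations can be sketched with built-in string methods. The `side` parameter is an assumption added to cover the "front/back" choice. Entries that are already long enough are left unchanged by `pad`, and entries that are already short enough are left unchanged by `fix_length`.

```python
def pad(value, char, length, side="front"):
    """Add `char` to the front or back until `value` has `length` characters."""
    if side == "front":
        return value.rjust(length, char)
    return value.ljust(length, char)


def fix_length(value, length):
    """Drop characters from the end so that at most `length` remain."""
    return value[:length]


print(repr(pad("foo", " ", 5)))      # '  foo'
print(fix_length("foofoofoo", 7))    # foofoof
```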
## Atomic operations
-
Nearly all of these scenarios can be covered by a regex engine.
-
Replace regex
-
Prepend
-
Append
-
Substring
-
Matches Pattern
-
Replace If Matches/Not Matches
-
ToLower
-
ToUpper
-
Pad
-
Note: text in bold tries to suggest "atomic operation"
-
Fill out
\ No newline at end of file