Changes
Page history
initial analysis of modify string
authored
Apr 28, 2021
by
Samuel Melm
Analyse-"Modify-String-Values".md
View page @
130ee512
...
...
@@ -5,38 +5,85 @@ Modify string includes all scenarios that are applied to a string column and tra
Trifacta lists the following sub-scenarios:
-
Basic String Operations
-
Prepend/Append to string ("foo" + "bar" -> "foobar")
-
Substring (e.g. "Hello World"[1,4] -> "ello")
-
Replace substring with other string
-
Others that don't qualify as "modify string" (length, isSubstring, indexOf...)
-
Clean Strings
-
Define and enforce formats (e.g. url/zip code) and replace wrong values with defaults
-
Remove from string:
-
leading/trailing whitespace, including tabs (aka trim)
-
first/last word(s)
-
all whitespace in string
-
double spaces (replace by single spaces)
-
symbols (everything but letters and numbers), in particular punctuation
-
special letters (e.g. transform ä to a)
-
a specific substring (all occurrences or only the first)
-
quotes around the string
-
Standardize String Values
-
Standardize case: toLower, toUpper
-
Break out CamelCase ("fooBar" -> "foo bar")
-
Standardize String Lengths
-
Pad string values (e.g. leading/trailing spaces)
-
Fix the length of strings (e.g. remove all characters after the 8th)
Excluded from this analysis:
-
Converting columns to one string column (aka printf), because it does not modify strings.
## Scenario and user-visible elements
-
Fill out
-
The user has a table with one or more string columns loaded into the editor.
-
The user selects a column and applies one or more of the following operations to it.
-
The examples below assume the column is called "text" and contains the rows ["foo", "bar"].
### Basic String Operations
#### Prepend
-
The user enters a string ("baz") which is added in front of every row of the column.
-
Input: ["foo", "bar"]
-
Result: ["bazfoo", "bazbar"]
#### Append
-
The user enters a string ("baz") which is appended to the end of every row of the column.
-
Input: ["foo", "bar"]
-
Result: ["foobaz", "barbaz"]
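A minimal sketch of Prepend and Append, assuming a column is modelled as a plain list of strings. The function names `prepend` and `append` are placeholders for illustration and not part of any existing API.

```python
def prepend(column: list[str], prefix: str) -> list[str]:
    """Add `prefix` in front of every entry of the column."""
    return [prefix + value for value in column]


def append(column: list[str], suffix: str) -> list[str]:
    """Add `suffix` behind every entry of the column."""
    return [value + suffix for value in column]


text = ["foo", "bar"]
print(prepend(text, "baz"))  # ['bazfoo', 'bazbar']
print(append(text, "baz"))   # ['foobaz', 'barbaz']
```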
#### Substring
-
The user enters the zero-based indices of the first and last characters of the substring to select, both inclusive (e.g. 1 and 2).
-
Input: ["foo", "bar"]
-
Result: ["oo", "ar"]
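The example above only works out if both indices are zero-based and inclusive. A small sketch under that assumption (the name `substring` is hypothetical):

```python
def substring(column: list[str], first: int, last: int) -> list[str]:
    """Keep characters first..last of every entry (zero-based, both inclusive)."""
    # Python slices exclude the end index, so add 1 to include `last`.
    return [value[first:last + 1] for value in column]


print(substring(["foo", "bar"], 1, 2))     # ['oo', 'ar']
print(substring(["Hello World"], 1, 4))    # ['ello']
```

Entries shorter than the requested range are simply cut off at their end; whether that should instead raise an error is an open design question.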
#### Replace substring with other string
-
The user enters a substring ("oo") and a replacement string ("aa"); every occurrence of the substring is replaced by the replacement.
-
Input: ["foo", "bar"]
-
Result: ["faa", "bar"]
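As a rough sketch, literal replacement maps directly onto Python's built-in `str.replace`. The wrapper name `replace_substring` is an assumption made for this example.

```python
def replace_substring(column: list[str], old: str, new: str) -> list[str]:
    """Replace every occurrence of `old` with `new` in every entry."""
    return [value.replace(old, new) for value in column]


print(replace_substring(["foo", "bar"], "oo", "aa"))  # ['faa', 'bar']
```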
### Clean Strings
#### Define and Enforce Formats
-
The user enters a pattern that every entry of the column should follow.
-
The user can then specify what should happen to entries that do not follow the pattern (e.g. they are replaced with a default string).
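One possible reading of this scenario, sketched with Python's `re` module: the pattern is a regular expression that must match the whole entry, and non-matching entries are replaced by a default. The function name `enforce_format`, the five-digit zip code pattern, and the default value are illustrative assumptions.

```python
import re


def enforce_format(column: list[str], pattern: str, default: str) -> list[str]:
    """Keep entries that fully match `pattern`; replace all others with `default`."""
    compiled = re.compile(pattern)
    return [value if compiled.fullmatch(value) else default for value in column]


zip_codes = ["14482", "1448", "abcde"]
print(enforce_format(zip_codes, r"\d{5}", "00000"))  # ['14482', '00000', '00000']
```

Other reactions to a mismatch (dropping the row, flagging it) would replace the `else default` branch.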
#### Remove from String
-
The user selects what should be removed from the string. Options include:
*
a specific substring, where either only the first or all occurrences are removed
*
leading/trailing whitespace
*
first/last word(s)
*
all whitespace in the string
*
double spaces, which are replaced by a single space
*
symbols (everything but letters and numbers)
*
non-ASCII letters (e.g. ä is replaced by a)
*
quotes surrounding the string
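The removal options listed above could be sketched per entry as follows. All function names are hypothetical, and several details are assumptions: "double spaces" is read as any run of two or more spaces, "symbols" keeps whitespace, and accent removal only covers letters that decompose into a base letter plus combining marks (so ä becomes a, but ß stays unchanged).

```python
import re
import unicodedata


def remove_substring(value: str, sub: str, first_only: bool = False) -> str:
    """Remove the first or all occurrences of `sub`."""
    return value.replace(sub, "", 1 if first_only else -1)


def trim(value: str) -> str:
    """Remove leading and trailing whitespace, including tabs."""
    return value.strip()


def remove_first_word(value: str) -> str:
    parts = value.split(maxsplit=1)
    return parts[1] if len(parts) > 1 else ""


def remove_last_word(value: str) -> str:
    parts = value.rsplit(maxsplit=1)
    return parts[0] if len(parts) > 1 else ""


def remove_all_whitespace(value: str) -> str:
    return re.sub(r"\s+", "", value)


def collapse_spaces(value: str) -> str:
    """Replace runs of two or more spaces by a single space."""
    return re.sub(r" {2,}", " ", value)


def remove_symbols(value: str) -> str:
    """Drop everything except letters, digits and whitespace."""
    return re.sub(r"[^\w\s]|_", "", value)


def strip_accents(value: str) -> str:
    """Decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_quotes(value: str) -> str:
    """Remove one pair of matching quotes surrounding the string."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


print(trim("  foo\t"))                   # foo
print(collapse_spaces("foo  bar   baz"))  # foo bar baz
print(strip_accents("Käse"))             # Kase
```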
### Standardize case
#### To Lower
-
Every uppercase letter is replaced by its lowercase counterpart
-
Example: "fOoBar$" -> "foobar$"
#### To Upper
-
Every lowercase letter is replaced by its uppercase counterpart
-
Example: "fOoBar$" -> "FOOBAR$"
#### Break out Special Case
-
The user selects a naming convention (e.g. camelCase). Every entry is then split into its separate words according to that convention.
-
Example: "fooBarBaz" -> "foo bar baz"
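The three case operations can be sketched as below. `to_lower` and `to_upper` are thin wrappers around Python's string methods; `break_out_camel_case` is a hypothetical helper that assumes a word boundary wherever a lowercase letter or digit is directly followed by an uppercase letter, and lowercases the result as in the example above.

```python
import re


def to_lower(value: str) -> str:
    return value.lower()


def to_upper(value: str) -> str:
    return value.upper()


def break_out_camel_case(value: str) -> str:
    """Insert a space at every lower-to-upper boundary, then lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", value)
    return spaced.lower()


print(to_lower("fOoBar$"))                # foobar$
print(to_upper("fOoBar$"))                # FOOBAR$
print(break_out_camel_case("fooBarBaz"))  # foo bar baz
```

Acronyms such as "parseHTTPResponse" are not split inside the uppercase run by this rule; handling them would need a more elaborate pattern.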
### Standardize String Lengths
#### Pad string values
-
The user specifies a target length and a padding character. The character is added to the front or back of every entry until the entry reaches the target length.
-
Pad("foo", ' ', 5) -> "  foo"
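A sketch of padding built on Python's `str.rjust` and `str.ljust`. The signature mirrors the `Pad` call above; the `side` parameter is an added assumption to cover both front and back padding. Note that padding "foo" to length 5 yields two fill characters.

```python
def pad(value: str, fill: str, length: int, side: str = "front") -> str:
    """Add `fill` to the front or back of `value` until it has `length` characters."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    if side == "front":
        return value.rjust(length, fill)
    return value.ljust(length, fill)


print(repr(pad("foo", " ", 5)))          # '  foo'
print(repr(pad("foo", "0", 5, "back")))  # 'foo00'
```

Entries that are already longer than the target length are returned unchanged.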
#### Fix the Length of Strings
-
The user specifies a maximum length. Characters are removed from the back of every entry until it is no longer than the specified length.
-
FixLength("foofoofoo", 7) -> "foofoof"
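In Python this truncation is a single slice. The sketch below assumes that entries shorter than the limit stay untouched; the name `fix_length` is a placeholder.

```python
def fix_length(value: str, length: int) -> str:
    """Cut `value` after `length` characters; shorter values stay as they are."""
    if length < 0:
        raise ValueError("length must not be negative")
    return value[:length]


print(fix_length("foofoofoo", 7))  # foofoof
print(fix_length("foo", 7))        # foo
```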
## Atomic operations
-
Nearly all of these scenarios can be covered by a regex engine
-
Replace regex
-
Prepend
-
Append
-
Substring
-
Matches Pattern
-
Replace If Matches/Not Matches
-
ToLower
-
ToUpper
-
Pad
-
Note: text in bold marks a suggested "atomic operation"
-
Fill out
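To illustrate the claim that a regex engine covers most scenarios, the sketch below expresses several of them through one hypothetical "Replace regex" operation based on Python's `re.sub`. The function name and the chosen patterns are assumptions for this example, not a proposed design.

```python
import re


def replace_regex(column: list[str], pattern: str, replacement: str) -> list[str]:
    """Replace every match of `pattern` in every entry.

    `replacement` follows re.sub template syntax, so backslashes in it
    are interpreted (e.g. \\1 refers to the first group).
    """
    compiled = re.compile(pattern)
    return [compiled.sub(replacement, value) for value in column]


text = ["foo", "bar"]

# Prepend and Append as replacements of the empty match at start / end.
print(replace_regex(text, r"^", "baz"))    # ['bazfoo', 'bazbar']
print(replace_regex(text, r"\Z", "baz"))   # ['foobaz', 'barbaz']

# Trim, collapse spaces and remove symbols as plain pattern removals.
print(replace_regex(["  foo\t"], r"^\s+|\s+$", ""))     # ['foo']
print(replace_regex(["foo  bar"], r" {2,}", " "))       # ['foo bar']
print(replace_regex(["foo, bar!"], r"[^\w\s]|_", ""))   # ['foo bar']
```

Operations such as ToLower, ToUpper and Pad do not reduce to a fixed replacement string, which is consistent with listing them as separate atomic operations.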
\ No newline at end of file