Class Text

java.lang.Object
com.randomnoun.common.Text

public class Text extends Object
Text utility functions
Author:
knoxg
  • Field Details

  • Constructor Details

    • Text

      public Text()
  • Method Details

    • isBlank

      public static boolean isBlank(String text)
      Returns true if the supplied string is null or the empty string, false otherwise
      Parameters:
      text - The string to test
      Returns:
      true if the supplied string is null or the empty string, false otherwise
    • isNumeric

      public static boolean isNumeric(String text)
      Returns true if the supplied string is non-null and only contains numeric characters
      Parameters:
      text - The string to test
      Returns:
      true if the supplied string is non-null and only contains numeric characters
    • isNumericDecimal

      public static boolean isNumericDecimal(String text)
      Returns true if the supplied string is non-null and only contains numeric characters or a single decimal point. The value can have a leading negative ('-') symbol.
      Parameters:
      text - The string to test
      Returns:
      true if the supplied string is non-null and only contains numeric characters, optionally with a leading '-' and a single '.' character.
    • isNumericDecimalExp

      public static boolean isNumericDecimalExp(String text)
      Returns true if the supplied string is non-null and only contains numeric characters or a single decimal point. The value can have a leading negative ('-') symbol. This version also allows an exponent ("E+nn" or "E-nn") at the end of the value.
      Parameters:
      text - The string to test
      Returns:
      true if the supplied string is non-null and only contains numeric characters, optionally with a leading '-', a single '.' character and a trailing exponent.
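      The rules described for isNumericDecimal and isNumericDecimalExp can be approximated with two regular expressions. This is only a sketch: the class name NumericCheckSketch is hypothetical, and it assumes that at least one digit is required and that the exponent sign is mandatory, neither of which is confirmed by the text above.

```java
import java.util.regex.Pattern;

class NumericCheckSketch {
    // Optional leading '-', digits with at most one '.', at least one digit (assumption).
    private static final Pattern DECIMAL =
        Pattern.compile("-?(\\d+\\.?\\d*|\\.\\d+)");

    // Same as DECIMAL, optionally followed by "E+nn" or "E-nn" (sign assumed mandatory).
    private static final Pattern DECIMAL_EXP =
        Pattern.compile("-?(\\d+\\.?\\d*|\\.\\d+)(E[+-]\\d+)?");

    static boolean isNumericDecimal(String text) {
        return text != null && DECIMAL.matcher(text).matches();
    }

    static boolean isNumericDecimalExp(String text) {
        return text != null && DECIMAL_EXP.matcher(text).matches();
    }
}
```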
    • reduceNewlines

      public static String reduceNewlines(String input)
      Ensures that a string returned from a browser (on any platform) conforms to unix line-EOF conventions. Any instances of consecutive CRs (0xD) and LFs (0xA) in a string will be reduced to a series of CRs (the number of CRs will be the maximum number of CRs or LFs found in a row).
      Parameters:
      input - the input string
      Returns:
      the canonicalised string, as described above
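      The run-reduction rule above (a run of CRs and LFs becomes max(number of CRs, number of LFs) line breaks) could be sketched as follows. The class name ReduceNewlinesSketch is hypothetical. The sketch emits LF (0xA) characters on the assumption that this is what "unix conventions" implies; the description above says CRs, so change the appended character if that is what the library really produces.

```java
class ReduceNewlinesSketch {
    static String reduceNewlines(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        int crCount = 0;
        int lfCount = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                crCount++;
            } else if (c == '\n') {
                lfCount++;
            } else {
                // End of a CR/LF run: flush it as max(CRs, LFs) line breaks.
                appendBreaks(sb, Math.max(crCount, lfCount));
                crCount = 0;
                lfCount = 0;
                sb.append(c);
            }
        }
        // Flush a run that ends the input.
        appendBreaks(sb, Math.max(crCount, lfCount));
        return sb.toString();
    }

    private static void appendBreaks(StringBuilder sb, int count) {
        for (int j = 0; j < count; j++) {
            sb.append('\n'); // assumption: LF, see the note above
        }
    }
}
```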
    • escapeHtml

      public static String escapeHtml(String string)
      Returns the HTML-escaped form of a string. The &, <, >, and " characters are converted to &amp;, &lt;, &gt;, and &quot; respectively.

      Characters in the unicode control code blocks ( apart from \t, \n and \r ) are converted to &xfffd;

      Characters outside of the ASCII printable range are converted into &xnnnn; form

      Parameters:
      string - the string to convert
      Returns:
      the HTML-escaped form of the string
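      A minimal sketch of the four named-entity replacements described above. The class name HtmlEscapeSketch is hypothetical, and the handling of control characters and non-ASCII characters is deliberately left out, so this is not a substitute for the library method.

```java
class HtmlEscapeSketch {
    static String escapeHtml(String string) {
        StringBuilder sb = new StringBuilder(string.length() + 16);
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c); // everything else passes through in this sketch
            }
        }
        return sb.toString();
    }
}
```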
    • escapeRegex

      public static String escapeRegex(String string)
      Returns a regex-escaped form of a string. That is, the pattern returned by this method, if compiled into a regex, will match the supplied string exactly.
      Parameters:
      string - the string to convert
      Returns:
      the regex-escaped form of the string
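      The property described above (the escaped pattern matches the original string exactly, and nothing else) can be demonstrated with the JDK's own Pattern.quote, used here as a stand-in. The class RegexEscapeSketch and its helper are hypothetical, and the output format of escapeRegex itself may well differ from what Pattern.quote produces.

```java
import java.util.regex.Pattern;

class RegexEscapeSketch {
    // Returns true only if candidate is exactly the literal text,
    // even when the literal contains regex metacharacters.
    static boolean matchesLiterally(String literal, String candidate) {
        Pattern p = Pattern.compile(Pattern.quote(literal));
        return p.matcher(candidate).matches();
    }
}
```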
    • escapeCsv

      public static String escapeCsv(String string)
      Returns the csv-escaped form of a string. A csv-escaped string is used when writing to a CSV (comma-separated-value) file. It ensures that commas included within a string are quoted. We use the Microsoft Excel quoting rules, so that our CSV files can be imported into Excel. These rules (derived from experimentation) are:
      • Strings without commas (,), inverted commas ("), or newlines (\n) are returned as-is.
      • Otherwise, the string is surrounded by inverted commas, and any inverted commas within the string are doubled-up (i.e. '"' becomes '""').
      • A value that starts with any of "=", "@", "+" or "-" has a leading single apostrophe added to prevent the value being evaluated in Excel. The leading quote is visible to the user when the csv is opened, which may mean that it will have to be removed when roundtripping data. This may complicate things if the user actually wants a leading single quote in their CSV value.

      Embedded newlines are inserted as-is, as per Excel. This will require some care whilst parsing if we want to be able to read these files.

      Parameters:
      string - the string to convert
      Returns:
      the csv-escaped form of the string
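      The three quoting rules above might be sketched like this. The class name CsvEscapeSketch is hypothetical, and the order in which the rules are applied (apostrophe guard first, then quoting) is an assumption that the description does not settle.

```java
class CsvEscapeSketch {
    static String escapeCsv(String string) {
        // Rule 3 (assumed to run first): guard against formula evaluation in Excel.
        if (!string.isEmpty() && "=@+-".indexOf(string.charAt(0)) >= 0) {
            string = "'" + string;
        }
        // Rule 1: nothing special in the value, return it unchanged.
        if (string.indexOf(',') < 0 && string.indexOf('"') < 0 && string.indexOf('\n') < 0) {
            return string;
        }
        // Rule 2: wrap in inverted commas and double up any embedded ones.
        return "\"" + string.replace("\"", "\"\"") + "\"";
    }
}
```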
    • parseCsv

      public static List<String> parseCsv(String text, boolean whitespaceSensitive) throws ParseException
      Given a csv-encoded string (as produced by the rules in escapeCsv(String)), produces a List of Strings which represent the individual values in the string. Note that this method is *not* equivalent to calling Arrays.asList(astring.split(",")).

      Setting the whitespaceSensitive parameter to false allows leading and trailing whitespace in *non-quoted* values to be removed, e.g. if the input string text is:

       abc,def,  ghi, j k ,"lmn"," op "," q,r","""hello""", "another"
       
      then parseCsv(text, false) will return the strings:
       abc
       def
       ghi
       j k
       lmn
        op        (this String has one leading space, and a trailing space after 'p')
        q,r       (this String has one leading space)
       "hello"
       another
       
      and parseCsv(text, true) would throw a ParseException (since the final element is a quoted value, but begins with a space). If the , "another" text is removed, however, then parseCsv(text, true) will return the strings:
       abc
       def
         ghi      (this String has two leading spaces)
        j k       (this String has one leading space and a trailing space after the 'k' character)
       lmn
        op        (this String has one leading space, and a trailing space after 'p')
        q,r       (this String has one leading space)
       "hello"
       

      Most applications would want to use the 'whitespaceSensitive=false' form of this function, since (a) there is less chance of a ParseException, and (b) it's what an end-user would normally expect. This can be performed by calling the parseCsv(String) method.

      Whitespace is determined by using the Character.isSpaceChar() method, which is Unicode-aware.

      Parameters:
      text - The CSV-encoded string to parse
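      whitespaceSensitive - If set to false, leading and trailing whitespace in *non-quoted* values is trimmed. If set to true, whitespace is preserved, and whitespace around a quoted value raises a ParseException.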
      Returns:
      a List of Strings. The returned List is guaranteed to always contain at least one element.
      Throws:
      NullPointerException - if the text passed to this method is null
      ParseException - if a quoted value contains leading whitespace before the opening quote, or after the trailing quote.
      ParseException - if a quoted value has a start quote, but no end quote, or if a value has additional text after a quoted value (before the next comma or EOL).
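      The behaviour described above can be reproduced with a small single-pass parser. This is a sketch written from the description only, not the library's implementation; the class name CsvParseSketch and its helper are hypothetical, and it treats a quote that appears in the middle of an unquoted value as a literal character, which is an assumption.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

class CsvParseSketch {
    static List<String> parseCsv(String text, boolean whitespaceSensitive) throws ParseException {
        List<String> result = new ArrayList<>();
        int len = text.length();
        int pos = 0;
        while (true) {
            int start = pos;
            int p = pos;
            // Look past leading whitespace to see whether the value is quoted.
            while (p < len && Character.isSpaceChar(text.charAt(p))) { p++; }
            if (p < len && text.charAt(p) == '"') {
                if (whitespaceSensitive && p > start) {
                    throw new ParseException("Whitespace before opening quote", start);
                }
                StringBuilder sb = new StringBuilder();
                boolean closed = false;
                p++; // skip the opening quote
                while (p < len) {
                    char c = text.charAt(p++);
                    if (c != '"') {
                        sb.append(c);
                    } else if (p < len && text.charAt(p) == '"') {
                        sb.append('"'); // doubled quote is a literal quote
                        p++;
                    } else {
                        closed = true;
                        break;
                    }
                }
                if (!closed) { throw new ParseException("Missing end quote", start); }
                int afterQuote = p;
                while (p < len && Character.isSpaceChar(text.charAt(p))) { p++; }
                if (whitespaceSensitive && p > afterQuote) {
                    throw new ParseException("Whitespace after closing quote", afterQuote);
                }
                if (p < len && text.charAt(p) != ',') {
                    throw new ParseException("Unexpected text after quoted value", p);
                }
                result.add(sb.toString());
                pos = p;
            } else {
                // Unquoted value: everything up to the next comma (or end of text).
                int comma = text.indexOf(',', start);
                int end = (comma < 0) ? len : comma;
                String value = text.substring(start, end);
                result.add(whitespaceSensitive ? value : trimSpaceChars(value));
                pos = end;
            }
            if (pos >= len) { return result; }
            pos++; // skip the comma
        }
    }

    // Trims using Character.isSpaceChar, as the description specifies.
    private static String trimSpaceChars(String s) {
        int b = 0;
        int e = s.length();
        while (b < e && Character.isSpaceChar(s.charAt(b))) { b++; }
        while (e > b && Character.isSpaceChar(s.charAt(e - 1))) { e--; }
        return s.substring(b, e);
    }
}
```

      Because the loop always adds a value before checking for the end of the text, the returned list contains at least one element, even for an empty input.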
    • parseCsv

      public static Text.CsvLineReader parseCsv(Reader r, boolean whitespaceSensitive)
    • parseCsv

      public static List<String> parseCsv(String text) throws ParseException
      Equivalent to parseCsv(text, false); (i.e. whitespace-insensitive parsing). Refer to the documentation for that method for more details.
      Parameters:
      text - The CSV-encoded string to parse
      Returns:
      a List of Strings. The returned List is guaranteed to always contain at least one element.
      Throws:
      NullPointerException - if the text passed to this method is null.
      ParseException - see parseCsv(String, boolean) for details.
    • escapeJava

      public static String escapeJava(String string)
      Returns a java-escaped string. Replaces '"' with '\"'.

      Since this is predominantly used in the query builder, I am not worrying about unicode sequences (SWIFT is ASCII) or newlines (although this may be necessary later) for multiline textboxes

      Returns:
      The java-escaped version of the string
    • escapeJavascript

      public static String escapeJavascript(String string)
      Returns a javascript-escaped string. The characters ', " and \ are converted into their Unicode equivalents.

      Non-printable characters are converted into unicode equivalents

      Newlines are now replaced with "\n"

      Returns:
      The javascript-escaped version of the string
    • escapeJavascript2

      public static String escapeJavascript2(String string)
      Deprecated.
      Returns a javascript-escaped string. The characters ', " and \ are converted into their Unicode equivalents.

      Non-printable characters are converted into unicode equivalents

      Returns:
      The javascript-escaped version of the string
    • unescapeJava

      public static String unescapeJava(String string)
      Unescapes a java-escaped string. Replaces '\"' with '"', '\\u0022' with '"', '\\u0027' with ''', '\\u005C' with '\'.

      Since this is predominantly used in the query builder, I am not worrying about unicode sequences (SWIFT is ASCII) or newlines (although this may be necessary later) for multiline textboxes

      Returns:
      The unescaped version of the string
    • escapePython

      public static String escapePython(String string)
      Returns a python string, escaped so that it can be enclosed in a single-quoted string.

      The characters ', " and \ are converted into their Unicode equivalents.

      Non-printable characters are converted into unicode equivalents

      Returns:
      The python-escaped version of the string
    • escapePathComponent

      public static String escapePathComponent(String string)
      Escape a filename or path component. Characters that typically have special meanings in paths (":", "/", "\") are escaped with a preceding "\" character. Does not escape glob characters ( "*" or "?" ). Do not use this method to escape a full file path; when escaping a file path, escape each path component separately and then join the components with "/" characters ( see createEscapedPath(String[]) ).
      Parameters:
      string - the filename or path component to escape
      Returns:
      the escaped form of the filename (or path component)
    • unescapePathComponent

      public static String unescapePathComponent(String pathComponent)
      Unescape a filename or path component. The escape sequences "\\" , "\:" and "\/" are converted to "\", ":" and "/" respectively. All other escape sequences will raise an IllegalArgumentException

      See splitEscapedPath(String) to split an escaped path into components.

      Parameters:
      pathComponent - the filename or path component to unescape
      Returns:
      the unescaped form of the filename or path component
      Throws:
      IllegalArgumentException - if an unexpected escape is encountered, or the escape is unclosed
    • splitEscapedPath

      public static String[] splitEscapedPath(String escapedPath)
      Split a path, but allow forward slashes in path components if they're escaped by a preceding '\' character. Individual path components returned by this method will be unescaped.
       splitEscapedPath(null) = NPE
       splitEscapedPath("") = [ "" ]
       splitEscapedPath("abc") = [ "abc" ]
       splitEscapedPath("abc/def/ghi") = [ "abc", "def", "ghi" ]
       splitEscapedPath("abc\\/def/ghi") = [ "abc/def", "ghi" ]
       

      Opposite of createEscapedPath(String[])

    • createEscapedPath

      public static String createEscapedPath(String[] pathComponents)
      Escapes the components of a path String, returning an escaped full path String. Each path component is escaped with escapePathComponent(String) and then joined using '/' characters.

      Opposite of splitEscapedPath(String).

      Parameters:
      pathComponents - the filename components
      Returns:
      an escaped path
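      The relationship between escapePathComponent, createEscapedPath and splitEscapedPath can be illustrated with a small round-trip sketch. The class EscapedPathSketch is hypothetical and written from the descriptions above; it is not the library code, and its error messages are made up.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedPathSketch {
    // Prefix ':', '/' and '\' with a backslash.
    static String escapePathComponent(String string) {
        StringBuilder sb = new StringBuilder();
        for (char c : string.toCharArray()) {
            if (c == ':' || c == '/' || c == '\\') { sb.append('\\'); }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escape each component, then join with '/'.
    static String createEscapedPath(String[] pathComponents) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pathComponents.length; i++) {
            if (i > 0) { sb.append('/'); }
            sb.append(escapePathComponent(pathComponents[i]));
        }
        return sb.toString();
    }

    // Split on unescaped '/' and unescape each component as we go.
    static String[] splitEscapedPath(String escapedPath) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < escapedPath.length(); i++) {
            char c = escapedPath.charAt(i);
            if (c == '\\') {
                if (i + 1 >= escapedPath.length()) {
                    throw new IllegalArgumentException("Unclosed escape");
                }
                char next = escapedPath.charAt(++i);
                if (next != ':' && next != '/' && next != '\\') {
                    throw new IllegalArgumentException("Unexpected escape '\\" + next + "'");
                }
                current.append(next);
            } else if (c == '/') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts.toArray(new String[0]);
    }
}
```

      With this sketch, splitting the result of createEscapedPath returns the original components, which is the "opposite of" relationship the two method descriptions refer to.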
    • escapeCss

      public static String escapeCss(String input)
      Returns the CSS-escaped form of a string.

      Characters outside of the printable ASCII range are converted to \nnnn form

      Parameters:
      input - the string to convert
      Returns:
      the CSS-escaped form of the string
    • getDisplayString

      public static String getDisplayString(String key, String string)
      Returns the given string; but will truncate it to MAX_STRING_OUTPUT_CHARS. If it exceeds this length, a message is appended expressing how many characters were truncated. Strings with the key of 'exception' are not truncated (in order to display full stack traces when these occur). Any keys that contain the text 'password', 'Password', 'credential' or 'Credential' will be returned as eight asterisks.

      This method is used in the debug JSP when dumping properties to the user, in order to prevent inordinately verbose output.

      Parameters:
      key - The key of the string we wish to display
      string - The string value
      Returns:
      A (possibly truncated) version of this string
    • getDisplayString

      public static String getDisplayString(String key, String string, int maxChars)
      Returns the given string; but will truncate it to maxChars characters. If it exceeds this length, a message is appended expressing how many characters were truncated. Strings with the key of 'exception' are not truncated (in order to display full stack traces when these occur). Any keys that contain the text 'password', 'Password', 'credential' or 'Credential' will be returned as eight asterisks.

      This method is used in the debug JSP when dumping properties to the user, in order to prevent inordinately verbose output.

      Parameters:
      key - The key of the string we wish to display
      string - The string value
      maxChars - The maximum number of characters to display
      Returns:
      A (possibly truncated) version of this string
    • strDefault

      public static String strDefault(String strText, String strDefaultText)
      Utility function to return a default if the supplied string is null. Shorthand for (strText==null) ? strDefaultText : strText;
      Returns:
      strText if strText is not null, otherwise strDefaultText
    • join

      public static String join(String[] elements, String delimiter)
      Return a string composed of a series of strings, separated with the specified delimiter
      Parameters:
      elements - The array of elements to join
      delimiter - The delimiter to join each string with
      Returns:
      a string composed of the elements, separated with the delimiter
      Throws:
      NullPointerException - if elements or delimiter is null
    • join

      public static String join(Iterable<?> elements, String delimiter)
      Return a string composed of a series of strings, separated with the specified delimiter
      Parameters:
      elements - A Collection or Iterable of the elements to join
      delimiter - The delimiter to join each string with
      Returns:
      a string composed of the elements, separated with the delimiter
      Throws:
      NullPointerException - if elements or delimiter is null
    • joinWithLast

      public static String joinWithLast(String[] elements, boolean isQuoted, String delimiter, String lastDelimiter)
      Return a string composed of a series of strings, separated with the specified delimiter. If isQuoted is true, each element is enclosed in single quotes. The final delimiter can be set to a different value, to produce text in the form "'a', 'b' or 'c'" or "'a', 'b' and 'c'".

      There is no special handling of values containing quotes; see escapeCsv(String)

      Parameters:
      elements - The array of elements to join
      isQuoted - If true, each element is surrounded by single quotes
      delimiter - The delimiter to join each string with
      lastDelimiter - The delimiter to join the second-last and last elements
      Throws:
      NullPointerException - if elements or delimiter is null
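      A minimal sketch of the joining rule described above, assuming that a single-element input is returned without any delimiter. The class name JoinSketch is hypothetical.

```java
class JoinSketch {
    static String joinWithLast(String[] elements, boolean isQuoted,
                               String delimiter, String lastDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) {
                // Use lastDelimiter only between the second-last and last elements.
                sb.append(i == elements.length - 1 ? lastDelimiter : delimiter);
            }
            if (isQuoted) {
                sb.append('\'').append(elements[i]).append('\'');
            } else {
                sb.append(elements[i]);
            }
        }
        return sb.toString();
    }
}
```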
    • joinWithLast

      public static String joinWithLast(Iterable<?> elements, boolean isQuoted, String delimiter, String lastDelimiter)
      Return a string composed of a series of strings, separated with the specified delimiter

      There is no special handling of values containing quotes; see escapeCsv(String)

      Parameters:
      elements - A Collection or Iterable containing the elements to join
      isQuoted - If true, each element is surrounded by single quotes
      delimiter - The delimiter to join each string with
      lastDelimiter - The delimiter to join the second-last and last elements
      Throws:
      NullPointerException - if elements or delimiter is null
    • replaceString

      public static String replaceString(String originalString, String searchString, String replaceString)
      An efficient search & replace routine. Replaces all instances of searchString within originalString with replaceString.
      Parameters:
      originalString - The string to search
      searchString - The string to search for
      replaceString - The string to replace it with
    • getFileContents

      public static String getFileContents(String filename) throws IOException
      Reads a file, and returns its contents in a String
      Parameters:
      filename - The file to read
      Returns:
      The contents of the file
      Throws:
      IOException - if a problem occurred whilst attempting to read the file
    • getFileContents

      public static String getFileContents(File file) throws IOException
      Reads a file, and returns its contents in a String. Identical to calling getFileContents(file.getCanonicalPath()).
      Parameters:
      file - The file to read
      Returns:
      The contents of the file
      Throws:
      IOException - if a problem occurred whilst attempting to read the file
    • indent

      public static String indent(String indentString, String originalString)
      Prefixes every line of the supplied string with a given indent, e.g. indent("\t", "abcd\nefgh") would return "\tabcd\n\tefgh". If the string ends in a newline, then the return value also ends with a newline.
      Parameters:
      indentString - The characters to indent with. Usually spaces or tabs, but could be something like a timestamp.
      originalString - The string to indent.
      Returns:
      The originalString, with every line (as separated by the newline character) prefixed with indentString.
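      The indenting rule can be sketched with a simple scan for newline characters. The class name IndentSketch is hypothetical, and returning an empty string for empty input is an assumption.

```java
class IndentSketch {
    static String indent(String indentString, String originalString) {
        StringBuilder sb = new StringBuilder();
        int start = 0;
        while (start < originalString.length()) {
            int newline = originalString.indexOf('\n', start);
            // Each line runs up to and including its newline, if it has one.
            int end = (newline < 0) ? originalString.length() : newline + 1;
            sb.append(indentString).append(originalString, start, end);
            start = end;
        }
        return sb.toString();
    }
}
```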
    • pad

      public static String pad(String inputString, int length, int justification)
      Ensure that a string is padded with spaces so that it meets the required length. If the input string exceeds this length, then it is returned unchanged.
      Parameters:
      inputString - the string to pad
      length - the desired length
      justification - a JUSTIFICATION_* constant defining whether left or right justification is required.
      Returns:
      a padded string.
    • getLastComponent

      public static String getLastComponent(String string)
      Given a period-separated list of components (e.g. variable references ("a.b.c") or classnames), returns the last component. For example, getLastComponent("com.randomnoun.common.util.Text") will return "Text".

      If string is null, this function returns null.

      If string contains no periods, this function returns the original string.

      Parameters:
      string - The string to retrieve the last component from
    • escapeQueryString

      public static String escapeQueryString(String unescapedQueryString)
      Escapes the supplied string so it can represent a 'name' or 'value' component on an HTTP queryString. This generally involves escaping special characters into %xx form. Note that this only works for US-ASCII data.
    • encodeBase64

      public static String encodeBase64(String s)
      Encodes a string into Base64 format. No blanks or line breaks are inserted.
      Parameters:
      s - a String to be encoded.
      Returns:
      A String with the Base64 encoded data.
    • encodeBase64

      public static char[] encodeBase64(byte[] in)
      Encodes a byte array into Base64 format. No blanks or line breaks are inserted.
      Parameters:
      in - an array containing the data bytes to be encoded.
      Returns:
      A character array with the Base64 encoded data.
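      For comparison, the JDK's java.util.Base64 basic encoder also produces output without line breaks, so the two encodeBase64 overloads could be approximated as below. The class name Base64Sketch is hypothetical, and the use of UTF-8 to convert the String to bytes is an assumption; the library may use a different charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Sketch {
    static String encodeBase64(String s) {
        // Assumption: the String is converted to bytes as UTF-8.
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    static char[] encodeBase64(byte[] in) {
        return Base64.getEncoder().encodeToString(in).toCharArray();
    }
}
```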
    • getNaturalComparator

      Returns a comparator that compares contained numbers based on their numeric values and compares other parts using the current locale's order rules.

      For example in German locale this will be a comparator that handles umlauts correctly and ignores upper/lower case differences.

      Returns:

      A string comparator that uses the current locale's order rules and handles embedded numbers correctly.

    • compareNatural

      public static int compareNatural(Collator collator, String s, String t)

      Compares two strings using the current locale's rules and comparing contained numbers based on their numeric values.

      This is probably the best default comparison to use.

      If you know that the texts to be compared are in a certain language that differs from the default locale's language, then get a collator for the desired locale (Collator.getInstance(java.util.Locale)) and pass it to compareNatural(java.text.Collator, String, String).

      Parameters:
      s - first string
      t - second string
      Returns:
      zero iff s and t are equal, a value less than zero iff s lexicographically precedes t and a value larger than zero iff s lexicographically follows t
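      One way to picture a "natural" comparison is to split both strings into runs of digits and runs of other characters, comparing digit runs numerically and everything else with the collator. The sketch below does that; the class NaturalCompareSketch is hypothetical, and it simplifies in ways the real method may not (for example, it treats "a01" and "a1" as equal, and it only recognises the ASCII digits 0 to 9).

```java
import java.math.BigInteger;
import java.text.Collator;

class NaturalCompareSketch {
    static int compareNatural(Collator collator, String s, String t) {
        int i = 0;
        int j = 0;
        while (i < s.length() && j < t.length()) {
            int iEnd = chunkEnd(s, i);
            int jEnd = chunkEnd(t, j);
            String a = s.substring(i, iEnd);
            String b = t.substring(j, jEnd);
            int cmp;
            if (isAsciiDigit(a.charAt(0)) && isAsciiDigit(b.charAt(0))) {
                // Both chunks are numbers: compare by numeric value.
                cmp = new BigInteger(a).compareTo(new BigInteger(b));
            } else {
                cmp = collator.compare(a, b);
            }
            if (cmp != 0) { return cmp; }
            i = iEnd;
            j = jEnd;
        }
        // Whichever string has text left over sorts last.
        return (s.length() - i) - (t.length() - j);
    }

    // End index of the run of digits (or non-digits) starting at 'start'.
    private static int chunkEnd(String s, int start) {
        boolean digits = isAsciiDigit(s.charAt(start));
        int end = start + 1;
        while (end < s.length() && isAsciiDigit(s.charAt(end)) == digits) { end++; }
        return end;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```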
    • unescapeQueryString

      public static String unescapeQueryString(String s)
      Unescapes an HTTP-escaped string.
      Parameters:
      s - The string to be unescaped
      Returns:
      the unescaped string.
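      The escape/unescape round trip can be illustrated with the JDK's URLEncoder and URLDecoder, used here purely as stand-ins. The class QueryStringSketch is hypothetical. Note that URLEncoder follows the application/x-www-form-urlencoded rules (a space becomes '+'), which may differ from what escapeQueryString and unescapeQueryString actually produce and accept.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryStringSketch {
    static String escape(String unescaped) {
        try {
            return URLEncoder.encode(unescaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    static String unescape(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```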
    • getCommonPrefix

      public static String getCommonPrefix(String string1, String string2)
      Returns the largest common prefix between two other strings; e.g. getCommonPrefix("abcsomething", "abcsometharg") would be "abcsometh".
      Parameters:
      string1 - String number one
      string2 - String number two
      Returns:
      the largest common prefix between the two strings
      Throws:
      NullPointerException - if string1 or string2 is null
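      The common-prefix calculation amounts to walking both strings until the first mismatch. A minimal sketch, with the hypothetical class name CommonPrefixSketch:

```java
class CommonPrefixSketch {
    static String getCommonPrefix(String string1, String string2) {
        int max = Math.min(string1.length(), string2.length());
        int i = 0;
        // Advance while both strings agree.
        while (i < max && string1.charAt(i) == string2.charAt(i)) {
            i++;
        }
        return string1.substring(0, i);
    }
}
```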
    • toFirstUpper

      public static String toFirstUpper(String text)
      Uppercases the first character of a string.
      Parameters:
      text - text to modify
      Returns:
      the supplied text, with the first character converted to uppercase.
    • toFirstLower

      public static String toFirstLower(String text)
      Lowercases the first character of a string.
      Parameters:
      text - text to modify
      Returns:
      the supplied text, with the first character converted to lowercase.
    • getLevenshteinDistance

      public static int getLevenshteinDistance(String s, String t)
      Number of character edits between two strings; taken from http://www.merriampark.com/ldjava.htm. There's a version in commons-lang, apparently, but according to the comments on that page, it uses O(n^2) memory, which can't be good.
      Parameters:
      s - string 1
      t - string 2
      Returns:
      the smallest number of edits required to convert s into t
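      The memory concern mentioned above is usually addressed by keeping only two rows of the dynamic-programming table instead of the whole matrix. The sketch below shows that well-known variant; the class name LevenshteinSketch is hypothetical, and this is not necessarily the exact code the method uses.

```java
class LevenshteinSketch {
    static int getLevenshteinDistance(String s, String t) {
        int n = s.length();
        int m = t.length();
        if (n == 0) { return m; }
        if (m == 0) { return n; }
        int[] prev = new int[m + 1]; // distances for the previous character of s
        int[] curr = new int[m + 1]; // distances for the current character of s
        for (int j = 0; j <= m; j++) { prev[j] = j; }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap the rows so that only O(m) memory is needed.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```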
    • getMD5

      public static String getMD5(String text)
      Return the md5 hash of a string
      Parameters:
      text - text to hash
      Returns:
      a hex-encoded version of the MD5 hash
      Throws:
      IllegalStateException - if the java installation in use doesn't know about MD5
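      A hex-encoded MD5 hash can be produced with the JDK's MessageDigest class, roughly as follows. The class name Md5Sketch is hypothetical; the use of UTF-8 and of lowercase hex digits are assumptions, since the description above specifies neither.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Sketch {
    static String getMD5(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Mirrors the documented behaviour when MD5 is unavailable.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```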
    • repeat

      public static String repeat(String text, int count)
      Returns a string composed of the supplied text, repeated 0 or more times
      Parameters:
      text - text to repeat
      count - number of repetitions
      Returns:
      the repeated text
    • substitutePlaceholders

      public static String substitutePlaceholders(Map<?,?> variables, String text)
      Perform ${xxxx}-style substitution of placeholders in text. Placeholders without values will be left as-is.

      For example, given the set of variables:

      • abc = def

      then the result of substitutePlaceholders(variables, "xxxx${abc}yyyy${def}zzzz") will be "xxxxdefyyyy${def}zzzz"

      $ followed by any other character will be left as-is.

      Parameters:
      variables - a set of variable names and values, used in the substitution
      text - the text to be substituted.
      Returns:
      text, with placeholders replaced with values in the variables parameter
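      The substitution rule can be sketched with java.util.regex. The class name PlaceholderSketch is hypothetical, and the sketch assumes that placeholder names cannot contain a '}' character and that values are converted with toString().

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {
    // Matches ${name}; group 1 is the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]*)\\}");

    static String substitutePlaceholders(Map<?, ?> variables, String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Object value = variables.get(m.group(1));
            // Placeholders without a value are left exactly as they were.
            String replacement = (value == null) ? m.group() : value.toString();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

      Matcher.quoteReplacement is used so that '$' and '\' characters in a value (or in an unresolved placeholder) are inserted literally.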