=======
Strings
=======

| The most important thing to know about strings in Common Lisp is
  probably that
| they are arrays and thus also sequences. This implies that all
  concepts that are
| applicable to arrays and sequences also apply to strings. If you can't
  find a
| particular string function, make sure you've also searched for the
  more general
| array or sequence functions. We'll only cover a fraction of what can
  be done
| with and to strings here.

Accessing Substrings
====================

| As a string is a sequence, you can access substrings with the SUBSEQ
| function. The index into the string is, as always, zero-based. The
  third,
| optional, argument is the index of the first character which is not a
  part of
| the substring; it is not the length of the substring.

.. code:: lisp

    * (defparameter *my-string* (string "Groucho Marx"))
    *MY-STRING*
    * (subseq *my-string* 8)
    "Marx"
    * (subseq *my-string* 0 7)
    "Groucho"
    * (subseq *my-string* 1 5)
    "rouc"

You can also manipulate the substring if you use SUBSEQ together with
SETF.

.. code:: lisp

    * (defparameter *my-string* (string "Harpo Marx"))
    *MY-STRING*
    * (subseq *my-string* 0 5)
    "Harpo"
    * (setf (subseq *my-string* 0 5) "Chico")
    "Chico"
    * *my-string*
    "Chico Marx"

| But note that the string isn't "stretchable". To cite from the
  HyperSpec: "If
| the subsequence and the new sequence are not of equal length, the
  shorter length
| determines the number of elements that are replaced." For example:

.. code:: lisp

    * (defparameter *my-string* (string "Karl Marx"))
    *MY-STRING*
    * (subseq *my-string* 0 4)
    "Karl"
    * (setf (subseq *my-string* 0 4) "Harpo")
    "Harpo"
    * *my-string*
    "Harp Marx"
    * (subseq *my-string* 4)
    " Marx"
    * (setf (subseq *my-string* 4) "o Marx")
    "o Marx"
    * *my-string*
    "Harpo Mar"

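When the replacement has a different length, one option is to build a
fresh string rather than modify the old one in place. The following is
only a minimal sketch (the variable ``*new-string*`` is a name made up
for illustration) that combines SUBSEQ with CONCATENATE:

.. code:: lisp

    ;; COPY-SEQ gives us a fresh string we are free to work with.
    (defparameter *my-string* (copy-seq "Karl Marx"))

    ;; Glue the new prefix onto the tail of the old string.
    (defparameter *new-string*
      (concatenate 'string "Harpo" (subseq *my-string* 4)))

    *new-string*   ; => "Harpo Marx"
    *my-string*    ; => "Karl Marx" (unchanged)
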
Accessing Individual Characters
===============================

| You can use the function CHAR to access individual characters of a
  string. CHAR
| can also be used in conjunction with SETF.

.. code:: lisp

    * (defparameter *my-string* (string "Groucho Marx"))
    *MY-STRING*
    * (char *my-string* 11)
    #\x
    * (char *my-string* 7)
    #\Space
    * (char *my-string* 6)
    #\o
    * (setf (char *my-string* 6) #\y)
    #\y
    * *my-string*
    "Grouchy Marx"

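| Note that there's also SCHAR, which works only on simple strings (roughly,
| strings that are not adjustable, not displaced, and have no fill pointer).
| If efficiency is important, SCHAR can be a bit faster where appropriate.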

| Because strings are arrays and thus sequences, you can also use the more
| generic functions AREF and ELT. They work on any array or sequence, while
| CHAR might be implemented more efficiently.

.. code:: lisp

    * (defparameter *my-string* (string "Groucho Marx"))
    *MY-STRING*
    * (aref *my-string* 3)
    #\u
    * (elt *my-string* 8)
    #\M

| Each character in a string has an integer code. The range of recognized codes
| and Lisp's ability to print them are directly related to your implementation's
| character set support, e.g. ISO-8859-1 or Unicode. Here are some examples in
| SBCL with UTF-8, which encodes each character as one to four 8-bit bytes. The
| first example shows a character outside the first 128 codes (the ASCII range).
| The second example shows a character whose code lies beyond 255. Notice that
| the Lisp reader can round-trip characters by name.

.. code:: lisp

    * (stream-external-format *standard-output*)
    :UTF-8
    * (code-char 200)
    #\LATIN_CAPITAL_LETTER_E_WITH_GRAVE
    * (char-code #\LATIN_CAPITAL_LETTER_E_WITH_GRAVE)
    200
    * (code-char 1488)
    #\HEBREW_LETTER_ALEF
    * (char-code #\HEBREW_LETTER_ALEF)
    1488

| Check out the UTF-8 Wikipedia article for the range of supported
  characters and
| their encodings.
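
Since strings are sequences, CHAR-CODE and CODE-CHAR combine naturally
with MAP to convert between strings and lists of codes. This is just a
small sketch; the codes shown assume an ASCII-compatible character set,
which common implementations use but the standard does not require.

.. code:: lisp

    ;; String -> list of character codes.
    (map 'list #'char-code "Marx")              ; => (77 97 114 120)

    ;; List of character codes -> string.
    (map 'string #'code-char '(77 97 114 120))  ; => "Marx"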

Manipulating Parts of a String
==============================

| There's a slew of (sequence) functions that can be used to manipulate
  a string
| and we'll only provide some examples here. See the sequences
  dictionary in the
| HyperSpec for more.

.. code:: lisp

    * (remove #\o "Harpo Marx")
    "Harp Marx"
    * (remove #\a "Harpo Marx")
    "Hrpo Mrx"
    * (remove #\a "Harpo Marx" :start 2)
    "Harpo Mrx"
    * (remove-if #'upper-case-p "Harpo Marx")
    "arpo arx"
    * (substitute #\u #\o "Groucho Marx")
    "Gruuchu Marx"
    * (substitute-if #\_ #'upper-case-p "Groucho Marx")
    "_roucho _arx"
    * (defparameter *my-string* (string "Zeppo Marx"))
    *MY-STRING*
    * (replace *my-string* "Harpo" :end1 5)
    "Harpo Marx"
    * *my-string*
    "Harpo Marx"

| Another frequently useful function (not part of the ANSI standard) is
| replace-all. It provides simple search/replace functionality by returning a
| new string in which all occurrences of 'part' in 'string' are replaced with
| 'replacement'.

.. code:: lisp

    * (replace-all "Groucho Marx Groucho" "Groucho" "ReplacementForGroucho")
    "ReplacementForGroucho Marx ReplacementForGroucho"

One possible implementation of replace-all is as follows:

.. code:: lisp

    (defun replace-all (string part replacement &key (test #'char=))
      "Returns a new string in which all the occurrences of the part
    are replaced with replacement."
      (with-output-to-string (out)
        (loop with part-length = (length part)
              for old-pos = 0 then (+ pos part-length)
              for pos = (search part string
                                :start2 old-pos
                                :test test)
              ;; copy the text between the previous match and this one
              do (write-string string out
                               :start old-pos
                               :end (or pos (length string)))
              when pos do (write-string replacement out)
              while pos)))

| However, bear in mind that the above code is not optimized for long strings.
| If you intend to perform such an operation on very long strings, files, etc.,
| please consider using cl-ppcre, a heavily optimized regular expression and
| string processing library.
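
As a rough sketch of what that might look like, the wrapper below (the
name ``replace-all-literal`` is made up for this example) passes the
work to cl-ppcre's ``regex-replace-all``. It assumes Quicklisp is
installed to load the library, and that the forms are evaluated one at
a time (at the REPL or via LOAD) so the package exists before it is
referenced. ``quote-meta-chars`` escapes regex metacharacters so that
'part' is matched literally; note that cl-ppcre also interprets
backslash sequences in the replacement string.

.. code:: lisp

    ;; Assumption: Quicklisp is available in this image.
    (ql:quickload "cl-ppcre")

    (defun replace-all-literal (string part replacement)
      "Hypothetical wrapper: replace every literal occurrence of PART
    in STRING with REPLACEMENT, using cl-ppcre."
      (cl-ppcre:regex-replace-all (cl-ppcre:quote-meta-chars part)
                                  string
                                  replacement))

    (replace-all-literal "Groucho Marx Groucho" "Groucho" "Harpo")
    ;; => "Harpo Marx Harpo"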

Concatenating Strings
=====================

| The name says it all: CONCATENATE is your friend. Note that this is a generic
| sequence function and you have to provide the result type as the first
| argument.

.. code:: lisp

    * (concatenate 'string "Karl" " " "Marx")
    "Karl Marx"
    * (concatenate 'list "Karl" " " "Marx")
    (#\K #\a #\r #\l #\Space #\M #\a #\r #\x)

| If you have to construct a string out of many parts, all of these
  calls to
| CONCATENATE seem wasteful, though. There are at least three other good
  ways to
| construct a string piecemeal, depending on what exactly your data is.
  If you
| build your string one character at a time, make it an adjustable
  VECTOR (a
| one-dimensional ARRAY) of type character with a fill-pointer of zero,
  then use
| VECTOR-PUSH-EXTEND on it. That way, you can also give hints to the
  system if you
| can estimate how long the string will be. (See the optional third
  argument to
| VECTOR-PUSH-EXTEND.)

.. code:: lisp

    * (defparameter *my-string* (make-array 0
                                            :element-type 'character
                                            :fill-pointer 0
                                            :adjustable t))
    *MY-STRING*
    * *my-string*
    ""
    * (dolist (char '(#\Z #\a #\p #\p #\a))
        (vector-push-extend char *my-string*))
    NIL
    * *my-string*
    "Zappa"

| If the string will be constructed out of (the printed representations of)
| arbitrary objects (symbols, numbers, characters, strings, ...), you can use
| FORMAT with a destination argument of NIL. This directs FORMAT to return the
| indicated output as a string.

.. code:: lisp

    * (format nil "This is a string with a list ~A in it"
              '(1 2 3))
    "This is a string with a list (1 2 3) in it"

| We can use the looping constructs of the FORMAT mini language to
  emulate
| CONCATENATE.

.. code:: lisp

    * (format nil "The Marx brothers are:~{ ~A~}."
              '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))
    "The Marx brothers are: Groucho Harpo Chico Zeppo Karl."

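| FORMAT can do a lot more processing, but it has a relatively arcane syntax.
| This last example uses the ``~^`` directive, which ends the iteration when no
| elements remain, so no comma follows the final name. You can find the details
| in the CLHS section about formatted output.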

.. code:: lisp

    * (format nil "The Marx brothers are:~{ ~A~^,~}."
              '("Groucho" "Harpo" "Chico" "Zeppo" "Karl"))
    "The Marx brothers are: Groucho, Harpo, Chico, Zeppo, Karl."

| Another way to create a string out of the printed representations of various
| objects is using WITH-OUTPUT-TO-STRING. The value of this handy macro is a
| string containing everything that was output to the string stream within the
| body of the macro. This means you also have the full power of FORMAT at your
| disposal, should you need it.

.. code:: lisp

    * (with-output-to-string (stream)
        (dolist (char '(#\Z #\a #\p #\p #\a #\, #\Space))
          (princ char stream))
        (format stream "~S - ~S" 1940 1993))
    "Zappa, 1940 - 1993"

Processing a String One Character at a Time
===========================================

Use the MAP function to process a string one character at a time.

.. code:: lisp

    * (defparameter *my-string* (string "Groucho Marx"))
    *MY-STRING*
    * (map 'string #'(lambda (c) (print c)) *my-string*)
    #\G
    #\r
    #\o
    #\u
    #\c
    #\h
    #\o
    #\Space
    #\M
    #\a
    #\r
    #\x
    "Groucho Marx"

Or do it with LOOP.

.. code:: lisp

    * (loop for char across "Zeppo"
            collect char)
    (#\Z #\e #\p #\p #\o)
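
A few more variations, offered as a sketch rather than a complete list:
MAP with a result type of NIL is handy when you only want side effects,
MAP with a result type of STRING builds a new string from the function's
return values, and LOOP can accumulate a count instead of a list.

.. code:: lisp

    ;; Side effects only: prints GROUCHO and returns NIL.
    (map nil (lambda (c) (write-char (char-upcase c))) "Groucho")

    ;; Build a new string from the transformed characters.
    (map 'string #'char-upcase "Groucho")   ; => "GROUCHO"

    ;; Count the characters that satisfy a test.
    (loop for c across "Groucho Marx"
          count (char= c #\o))              ; => 2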

Reversing a String by Word or Character
=======================================

| Reversing a string by character is easy using the built-in REVERSE
  function (or
| its destructive counterpart NREVERSE).

.. code:: lisp

    * (defparameter *my-string* (string "DSL"))
    *MY-STRING*
    * (reverse *my-string*)
    "LSD"

| There's no one-liner in CL to reverse a string by word (like you would do it
| in Perl with split and join). You either have to use a function from an
| external library like SPLIT-SEQUENCE or roll your own solution. Here's an
| attempt:

.. code:: lisp

    * (defun split-by-one-space (string)
        "Returns a list of substrings of string
    divided by ONE space each.
    Note: Two consecutive spaces will be seen as
    if there were an empty string between them."
        (loop for i = 0 then (1+ j)
              as j = (position #\Space string :start i)
              collect (subseq string i j)
              while j))
    SPLIT-BY-ONE-SPACE
    * (split-by-one-space "Singing in the rain")
    ("Singing" "in" "the" "rain")
    * (split-by-one-space "Singing in the  rain")
    ("Singing" "in" "the" "" "rain")
    * (split-by-one-space "Cool")
    ("Cool")
    * (split-by-one-space " Cool ")
    ("" "Cool" "")
    * (defun join-string-list (string-list)
        "Concatenates a list of strings
    and puts spaces between the elements."
        (format nil "~{~A~^ ~}" string-list))
    JOIN-STRING-LIST
    * (join-string-list '("We" "want" "better" "examples"))
    "We want better examples"
    * (join-string-list '("Really"))
    "Really"
    * (join-string-list '())
    ""
    * (join-string-list
       (nreverse
        (split-by-one-space
         "Reverse this sentence by word")))
    "word by sentence this Reverse"

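If empty strings between consecutive spaces are not wanted, a variant
along the following lines could be used instead. This is only a sketch;
the name ``split-by-spaces`` is made up for this example, and it treats
only the space character as a separator.

.. code:: lisp

    (defun split-by-spaces (string)
      "Hypothetical helper: returns a list of the substrings of STRING
    separated by runs of one or more spaces, without empty strings."
      (flet ((next-word-start (from)
               ;; index of the first non-space character at or after FROM
               (position-if (lambda (c) (char/= c #\Space))
                            string :start from)))
        (loop for start = (next-word-start 0)
                then (and end (next-word-start end))
              for end = (and start (position #\Space string :start start))
              while start
              collect (subseq string start end))))

    (split-by-spaces "Singing in the  rain")  ; => ("Singing" "in" "the" "rain")
    (split-by-spaces " Cool ")                ; => ("Cool")
    (split-by-spaces "")                      ; => NIL
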
Controlling Case
================

Common Lisp has a couple of functions to control the case of a string.

.. code:: lisp

    * (string-upcase "cool")
    "COOL"
    * (string-upcase "Cool")
    "COOL"
    * (string-downcase "COOL")
    "cool"
    * (string-downcase "Cool")
    "cool"
    * (string-capitalize "cool")
    "Cool"
    * (string-capitalize "cool example")
    "Cool Example"

| These functions take :START and :END keyword arguments so you can optionally
| only manipulate a part of the string. They also have destructive counterparts
| whose names start with "N".

.. code:: lisp

    * (string-capitalize "cool example" :start 5)
    "cool Example"
    * (string-capitalize "cool example" :end 5)
    "Cool example"
    * (defparameter *my-string* (string "BIG"))
    *MY-STRING*
    * (defparameter *my-downcase-string* (nstring-downcase *my-string*))
    *MY-DOWNCASE-STRING*
    * *my-downcase-string*
    "big"
    * *my-string*
    "big"

| Note this potential caveat: According to the HyperSpec, "for
  STRING-UPCASE,
| STRING-DOWNCASE, and STRING-CAPITALIZE, string is not modified.
  However, if no
| characters in string require conversion, the result may be either
  string or a
| copy of it, at the implementation's discretion." This implies the last
  result in
| the following example is implementation-dependent - it may either be
  "BIG" or
| "BUG". If you want to be sure, use COPY-SEQ.

.. code:: lisp

    * (defparameter *my-string* (string "BIG"))
    *MY-STRING*
    * (defparameter *my-upcase-string* (string-upcase *my-string*))
    *MY-UPCASE-STRING*
    * (setf (char *my-string* 1) #\U)
    #\U
    * *my-string*
    "BUG"
    * *my-upcase-string*
    "BIG"

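One way to apply the COPY-SEQ advice, shown here as a minimal sketch, is
to copy the result of STRING-UPCASE so that it can never share storage
with the original string:

.. code:: lisp

    (defparameter *my-string* (copy-seq "BIG"))

    ;; Copy the result, whether or not STRING-UPCASE already made a copy.
    (defparameter *my-upcase-string*
      (copy-seq (string-upcase *my-string*)))

    (setf (char *my-string* 1) #\U)

    *my-string*          ; => "BUG"
    *my-upcase-string*   ; => "BIG" on every implementation
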
Trimming Blanks from the Ends of a String
=========================================

| Not only can you trim blanks, but you can get rid of arbitrary characters. The
| functions STRING-TRIM, STRING-LEFT-TRIM and STRING-RIGHT-TRIM return a
| substring of their second argument where all characters that are in the first
| argument are removed off the beginning and/or the end. The first argument can
| be any sequence of characters.

.. code:: lisp

    * (string-trim " " " trim me ")
    "trim me"
    * (string-trim " et" " trim me ")
    "rim m"
    * (string-left-trim " et" " trim me ")
    "rim me "
    * (string-right-trim " et" " trim me ")
    " trim m"
    * (string-right-trim '(#\Space #\e #\t) " trim me ")
    " trim m"
    * (string-right-trim '(#\Space #\e #\t #\m) " trim me ")
    " tri"

| Note: The caveat mentioned in the section about Controlling Case also
  applies
| here.
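
A common use is stripping all kinds of whitespace, not just spaces. The
sketch below keeps the characters to trim in a variable (the name
``*whitespace*`` is made up for this example). Note that the character
names Tab and Return are only semi-standard, although widely supported.

.. code:: lisp

    (defparameter *whitespace* '(#\Space #\Tab #\Newline #\Return))

    ;; The ~% directive puts a newline at the end of the test string.
    (string-trim *whitespace* (format nil "  trim me ~%"))  ; => "trim me"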

Converting between Symbols and Strings
======================================

| The function INTERN will "convert" a string to a symbol. Actually, it
  will check
| whether the symbol denoted by the string (its first argument) is
  already
| accessible in the package (its second, optional, argument which
  defaults to the
| current package) and enter it, if necessary, into this package. It is
  beyond the
| scope of this chapter to explain all the concepts involved and to
  address the
| second return value of this function. See the CLHS chapter about
  packages for
| details.

Note that the case of the string is relevant.

.. code:: lisp

    * (in-package "COMMON-LISP-USER")
    #<The COMMON-LISP-USER package, 35/44 internal, 0/9 external>
    * (intern "MY-SYMBOL")
    MY-SYMBOL
    NIL
    * (intern "MY-SYMBOL")
    MY-SYMBOL
    :INTERNAL
    * (export 'MY-SYMBOL)
    T
    * (intern "MY-SYMBOL")
    MY-SYMBOL
    :EXTERNAL
    * (intern "My-Symbol")
    |My-Symbol|
    NIL
    * (intern "MY-SYMBOL" "KEYWORD")
    :MY-SYMBOL
    NIL
    * (intern "MY-SYMBOL" "KEYWORD")
    :MY-SYMBOL
    :EXTERNAL

| To do the opposite, convert from a symbol to a string, use SYMBOL-NAME
  or
| STRING.

.. code:: lisp

    * (symbol-name 'MY-SYMBOL)
    "MY-SYMBOL"
    * (symbol-name 'my-symbol)
    "MY-SYMBOL"
    * (symbol-name '|my-symbol|)
    "my-symbol"
    * (string 'howdy)
    "HOWDY"

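Because the case of the string matters, it is common to upcase a string
before interning it. The helper below is a sketch under the assumption
that the default readtable case (upcasing) is in effect; its name,
``string-to-symbol``, is made up for this example.

.. code:: lisp

    (defun string-to-symbol (name &optional (package *package*))
      "Hypothetical helper: upcase NAME and intern it in PACKAGE,
    returning only the symbol."
      (values (intern (string-upcase name) package)))

    (string-to-symbol "my-symbol")            ; => MY-SYMBOL
    (eq (string-to-symbol "howdy") 'howdy)    ; => T
    (string-to-symbol "my-symbol" "KEYWORD")  ; => :MY-SYMBOL
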
Converting between Characters and Strings
=========================================

| You can use COERCE to convert a string of length 1 to a character. You can
| also use COERCE to convert any sequence of characters into a string. You
| cannot use COERCE to convert a character to a string, though - you'll have to
| use STRING instead.

.. code:: lisp

    * (coerce "a" 'character)
    #\a
    * (coerce (subseq "cool" 2 3) 'character)
    #\o
    * (coerce "cool" 'list)
    (#\c #\o #\o #\l)
    * (coerce '(#\h #\e #\y) 'string)
    "hey"
    * (coerce (nth 2 '(#\h #\e #\y)) 'character)
    #\y
    * (defparameter *my-array* (make-array 5 :initial-element #\x))
    *MY-ARRAY*
    * *my-array*
    #(#\x #\x #\x #\x #\x)
    * (coerce *my-array* 'string)
    "xxxxx"
    * (string 'howdy)
    "HOWDY"
    * (string #\y)
    "y"
    * (coerce #\y 'string)
    #\y can't be converted to type STRING.
       [Condition of type SIMPLE-TYPE-ERROR]

Finding an Element of a String
==============================

Use FIND, POSITION, and their -IF counterparts to find characters in a
string.

.. code:: lisp

    * (find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
    #\t
    * (find #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
    #\T
    * (find #\z "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
    NIL
    * (find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
    #\1
    * (find-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
    #\0
    * (position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
    17
    * (position #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
    0
    * (position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
    37
    * (position-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :from-end t)
    43

Or use COUNT and friends to count characters in a string.

.. code:: lisp

    * (count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equal)
    2
    * (count #\t "The Hyperspec contains approximately 110,000 hyperlinks." :test #'equalp)
    3
    * (count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks.")
    6
    * (count-if #'digit-char-p "The Hyperspec contains approximately 110,000 hyperlinks." :start 38)
    5

Finding a Substring of a String
===============================

The function SEARCH can find substrings of a string.

.. code:: lisp

    * (search "we" "If we can't be free we can at least be cheap")
    3
    * (search "we" "If we can't be free we can at least be cheap" :from-end t)
    20
    * (search "we" "If we can't be free we can at least be cheap" :start2 4)
    20
    * (search "we" "If we can't be free we can at least be cheap" :end2 5 :from-end t)
    3
    * (search "FREE" "If we can't be free we can at least be cheap")
    NIL
    * (search "FREE" "If we can't be free we can at least be cheap" :test #'char-equal)
    15
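
SEARCH returns only one position per call. To collect every occurrence,
you can call it repeatedly with an increasing :START2, as in this
sketch. The name ``find-all`` is made up for this example, and the
helper assumes that the substring being searched for is not empty.

.. code:: lisp

    (defun find-all (needle haystack)
      "Hypothetical helper: returns a list of all positions at which
    the non-empty string NEEDLE occurs in HAYSTACK."
      (loop for start = 0 then (1+ pos)
            for pos = (search needle haystack :start2 start)
            while pos
            collect pos))

    (find-all "we" "If we can't be free we can at least be cheap")  ; => (3 20)
    (find-all "zz" "If we can't be free we can at least be cheap")  ; => NIL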

Converting a String to a Number
===============================

| CL provides PARSE-INTEGER to convert a string representation of an integer
| to the corresponding numeric value. The second return value is the index into
| the string where the parsing stopped.

.. code:: lisp

    * (parse-integer "42")
    42
    2
    * (parse-integer "42" :start 1)
    2
    2
    * (parse-integer "42" :end 1)
    4
    1
    * (parse-integer "42" :radix 8)
    34
    2
    * (parse-integer " 42 ")
    42
    3
    * (parse-integer " 42 is forty-two" :junk-allowed t)
    42
    3
    * (parse-integer " 42 is forty-two")

    Error in function PARSE-INTEGER:
       There's junk in this string: " 42 is forty-two".

| PARSE-INTEGER doesn't understand radix specifiers like #X, nor is there a
| built-in function to parse other numeric types. You could use READ-FROM-STRING
| in this case, but be aware that the full reader is in effect if you're using
| this function. As the last example shows, the #. reader macro can evaluate
| arbitrary code unless ``*read-eval*`` is bound to NIL.

.. code:: lisp

    * (read-from-string "#X23")
    35
    4
    * (read-from-string "4.5")
    4.5
    3
    * (read-from-string "6/8")
    3/4
    3
    * (read-from-string "#C(6/8 1)")
    #C(3/4 1)
    9
    * (read-from-string "1.2e2")
    120.00001
    5
    * (read-from-string "symbol")
    SYMBOL
    6
    * (defparameter *foo* 42)
    *FOO*
    * (read-from-string "#.(setq *foo* \"gotcha\")")
    "gotcha"
    23
    * *foo*
    "gotcha"

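If the string comes from an untrusted source, one precaution is to bind
``*read-eval*`` to NIL so that #. signals an error instead of evaluating
code. The helper below is a minimal sketch, not a complete solution: its
name is made up for this example, it still interns any symbols it reads,
and it looks only at the first object in the string.

.. code:: lisp

    (defun parse-number-or-nil (string)
      "Hypothetical helper: read one object from STRING with read-time
    evaluation disabled; return it if it is a number, otherwise NIL."
      (let* ((*read-eval* nil)
             (value (handler-case (read-from-string string)
                      ;; covers reader errors and end of input
                      (error () nil))))
        (and (numberp value) value)))

    (parse-number-or-nil "#X23")                     ; => 35
    (parse-number-or-nil "6/8")                      ; => 3/4
    (parse-number-or-nil "symbol")                   ; => NIL
    (parse-number-or-nil "#.(setq *foo* \"gotcha\")") ; => NIL, nothing is evaluated
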
Converting a Number to a String
===============================

| The general function WRITE-TO-STRING or one of its simpler variants
| PRIN1-TO-STRING or PRINC-TO-STRING may be used to convert a number to a
| string. With WRITE-TO-STRING, the :base keyword argument may be used to change
| the output base for a single call. To change the output base globally, set
| ``*print-base*``, which defaults to 10. Remember that in Lisp, rational
| numbers are represented as quotients of two integers even when converted to
| strings.

.. code:: lisp

    * (write-to-string 250)
    "250"
    * (write-to-string 250.02)
    "250.02"
    * (write-to-string 250 :base 5)
    "2000"
    * (write-to-string (/ 1 3))
    "1/3"
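
FORMAT with a destination of NIL is another option when you need more
control over the result, such as padding or a fixed number of decimals.
The following sketch shows a few directives, and also binds
``*print-base*`` with LET so that the change stays local instead of
global.

.. code:: lisp

    (format nil "~D" 250)        ; => "250"
    (format nil "~X" 250)        ; => "FA"     (hexadecimal)
    (format nil "~B" 5)          ; => "101"    (binary)
    (format nil "~5,'0D" 42)     ; => "00042"  (width 5, padded with zeros)
    (format nil "~,2F" 3.14159)  ; => "3.14"   (two digits after the point)

    ;; A local change of the output base.
    (let ((*print-base* 16))
      (write-to-string 250))     ; => "FA"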

Comparing Strings
=================

| The general functions EQUAL and EQUALP can be used to test whether two
  strings
| are equal. The strings are compared element-by-element, either in a
| case-sensitive manner (EQUAL) or not (EQUALP). There's also a bunch of
| string-specific comparison functions. You'll want to use these if
  you're
| deploying implementation-defined attributes of characters. Check your
  vendor's
| documentation in this case.

Here are a few examples. Note that all functions that test for
inequality return the position of the first mismatch as a generalized
boolean. You can also use the generic sequence function MISMATCH if you
need more versatility.

.. code:: lisp

    * (string= "Marx" "Marx")
    T
    * (string= "Marx" "marx")
    NIL
    * (string-equal "Marx" "marx")
    T
    * (string< "Groucho" "Zeppo")
    0
    * (string< "groucho" "Zeppo")
    NIL
    * (string-lessp "groucho" "Zeppo")
    0
    * (mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
    3
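
The string comparison functions also accept :START1, :END1, :START2 and
:END2 keyword arguments, so you can compare parts of strings without
creating substrings. As a sketch, here is a prefix test built on
STRING= (the name ``starts-with-p`` is made up for this example):

.. code:: lisp

    (defun starts-with-p (prefix string)
      "Hypothetical helper: true if STRING begins with PREFIX
    (case-sensitive)."
      (let ((n (length prefix)))
        (and (<= n (length string))
             ;; compare PREFIX against the first N characters only
             (string= prefix string :end2 n))))

    (starts-with-p "Grou" "Groucho Marx")     ; => T
    (starts-with-p "grou" "Groucho Marx")     ; => NIL
    (starts-with-p "Groucho Marx" "Groucho")  ; => NIL

    ;; Without :FROM-END, MISMATCH returns the first differing position.
    (mismatch "Groucho" "Grouchy")            ; => 6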