String

A String in Elixir is a UTF-8 encoded binary.

String and binary operations

The functions in this module act according to the Unicode Standard, version 6.3.0. For example, capitalize/1, downcase/1 and strip/1 are provided by this module.

In addition to this module, Elixir provides more low-level operations that work directly with binaries. Some of those can be found in the Kernel module, such as byte_size/1, which is used in the examples below.

Finally, the :binary module provides a few other functions that work on the byte level.

Codepoints and graphemes

As per the Unicode Standard, a codepoint is a Unicode character, which may be represented by one or more bytes. For example, the character "é" is represented with two bytes:

iex> byte_size("é")
2

However, String.length/1 counts characters rather than bytes and returns the proper length:

iex> String.length("é")
1

Furthermore, this module also presents the concept of graphemes. A grapheme may consist of multiple characters that are "perceived as a single character" by readers. For example, the same "é" character written above could also be represented by the letter "e" followed by the combining acute accent (U+0301):

iex> string = "\x{0065}\x{0301}"
...> byte_size(string)
3
iex> String.length(string)
1

Although the string above is made of two codepoints, it is perceived by users as one character.
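The three units discussed above (bytes, codepoints and graphemes) can be compared side by side. The following is a minimal sketch, assuming a hypothetical module named UnitSketch that is not part of this module. The string is built with the utf8 binary modifier so that the decomposed form is unambiguous.

```elixir
defmodule UnitSketch do
  # Hypothetical helper: returns {bytes, codepoints, graphemes} for a string.
  def measure(string) when is_binary(string) do
    {byte_size(string), length(String.codepoints(string)), String.length(string)}
  end
end

# "e" (U+0065) followed by the combining acute accent (U+0301)
decomposed = <<0x0065::utf8, 0x0301::utf8>>

IO.inspect(UnitSketch.measure(decomposed))
# prints {3, 2, 1}
```

For plain ASCII input the three counts coincide, which is why the difference is easy to overlook.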

Graphemes can also be two characters that are interpreted as one by some languages. For example, some languages may consider "ch" as a grapheme. However, since this information depends on the locale, it is not taken into account by this module.

In general, the functions in this module rely on the Unicode Standard, but do not contain any of the locale-specific behaviour.

More information about graphemes can be found in the Unicode Standard Annex #29. The current Elixir version implements the Extended Grapheme Cluster algorithm.

Integer codepoints

Although codepoints could be represented as integers, this module represents all codepoints as strings. For example:

iex> String.codepoints("josé")
["j", "o", "s", "é"]

There are a couple of ways to retrieve a character's integer codepoint. One may use the ? construct:

iex> ?j
106
iex> ?é
233

Or use pattern matching:

iex> << eacute :: utf8 >> = "é"
...> eacute
233
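The same pattern can be applied repeatedly to collect every integer codepoint of a string. This is a minimal sketch, assuming a hypothetical module named CodepointSketch; it is not how this module is implemented.

```elixir
defmodule CodepointSketch do
  # Hypothetical helper: peels one UTF-8 codepoint off the front of the
  # binary on each call and collects the integer values.
  def to_integers(<<codepoint::utf8, rest::binary>>) do
    [codepoint | to_integers(rest)]
  end

  # The empty binary ends the recursion.
  def to_integers(<<>>), do: []
end

# "jos" followed by the precomposed "é" (codepoint 233)
string = "jos" <> <<233::utf8>>

# The match succeeds because the function returns exactly these integers.
[106, 111, 115, 233] = CodepointSketch.to_integers(string)
```

Note that this sketch has no clause for malformed data, so a binary that is not valid UTF-8 raises a FunctionClauseError.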

As we have seen above, codepoints can be inserted into a string by their hexadecimal code:

"jos\x{0065}\x{0301}" #=> "josé"

Self-synchronization

The UTF-8 encoding is self-synchronizing. This means that if malformed data (i.e., data that is not possible according to the definition of the encoding) is encountered, only one codepoint needs to be rejected.

This module relies on this behaviour to ignore such invalid characters. For example, length/1 is going to return a correct result even if an invalid codepoint is fed into it.

In other words, this module expects invalid data to be detected when retrieving data from an external source. For example, a driver that reads strings from a database is the one responsible for checking the validity of the encoding.
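A check at the boundary could look like the following minimal sketch. The module name BoundarySketch and the error atom are assumptions made for illustration; only String.valid?/1 comes from this module.

```elixir
defmodule BoundarySketch do
  # Hypothetical boundary check: accept external data only if it is valid UTF-8.
  def accept(data) when is_binary(data) do
    if String.valid?(data) do
      {:ok, data}
    else
      {:error, :invalid_encoding}
    end
  end
end

IO.inspect(BoundarySketch.accept("abc"))
# prints {:ok, "abc"}

# The byte 0xFF never appears in well-formed UTF-8.
IO.inspect(BoundarySketch.accept(<<0xFF>>))
# prints {:error, :invalid_encoding}
```

Once data has passed such a check, the remaining functions can be used without validating the string again.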

Summary

at(string, position)

Returns the grapheme at the given position of a UTF-8 string. If position is greater than the string length, it returns nil

capitalize(string)

Converts the first character in the given string to uppercase and the remainder to lowercase

codepoints(string)

Returns all codepoints in the string

contains?(string, contents)

Checks if string contains any of the given contents

downcase(binary)

Converts all characters in the given string to lowercase

duplicate(subject, n)

Returns a binary subject duplicated n times

ends_with?(string, suffixes)

Returns true if string ends with any of the suffixes given, otherwise false. suffixes can be either a single suffix or a list of suffixes

first(string)

Returns the first grapheme from a UTF-8 string, or nil if the string is empty

from_char_list!(list)

Converts a list of integer codepoints to a string

from_char_list(list)

Converts a list of integer codepoints to a string

graphemes(string)

Returns the Unicode graphemes in the string as per the Extended Grapheme Cluster algorithm outlined in the Unicode Standard Annex #29, Unicode Text Segmentation

last(string)

Returns the last grapheme from a UTF-8 string, or nil if the string is empty

length(string)

Returns the number of Unicode graphemes in a UTF-8 string

ljust(subject, len)

Returns a new string of length len with subject left justified and padded with padding. If padding is not present, it defaults to whitespace. When len is less than the length of subject, subject is returned

ljust(subject, len, padding)

lstrip(binary)

Returns a string where leading Unicode whitespace has been removed

lstrip(other, char)

Returns a string where leading occurrences of char have been removed

match?(string, regex)

Checks if string matches the given regular expression

next_codepoint(string)

Returns the next codepoint in a String

next_grapheme(string)

Returns the next grapheme in a String

printable?(b)

Checks if a string is printable considering it is encoded as UTF-8. Returns true if so, false otherwise

replace(subject, pattern, replacement, options \\ [])

Returns a new binary based on subject by replacing the parts matching pattern with replacement. By default, it replaces all entries, unless the global option is set to false

reverse(string)

Reverses the given string. Works on graphemes

rjust(subject, len)

Returns a new string of length len with subject right justified and padded with padding. If padding is not present, it defaults to whitespace. When len is less than the length of subject, subject is returned

rjust(subject, len, padding)

rstrip(binary)

Returns a string where trailing Unicode whitespace has been removed

rstrip(string, char)

Returns a string where trailing occurrences of char have been removed

slice(string, range)

Returns a substring from the offset given by the start of the range to the offset given by the end of the range

slice(string, start, len)

Returns a substring starting at the offset given by start, with the length given by len. If the offset is greater than the string length, it returns nil

split(binary)

Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored

split(binary, pattern, options \\ [])

Divides a string into substrings based on a pattern, returning a list of these substrings. The pattern can be a string, a list of strings or a regular expression

starts_with?(string, prefixes)

Returns true if string starts with any of the prefixes given, otherwise false. prefixes can be either a single prefix or a list of prefixes

strip(string)

Returns a string where leading/trailing Unicode whitespace has been removed

strip(string, char)

Returns a string where leading and trailing occurrences of char have been removed

to_char_list!(string)

Converts a string into a char list converting each codepoint to its respective integer value

to_char_list(string)

Converts a string into a char list converting each codepoint to its respective integer value

upcase(binary)

Converts all characters in the given string to uppercase

valid?(arg1)

Checks whether the given string contains only valid characters

valid_character?(codepoint)

Checks whether the given codepoint is a valid character

Types

t :: binary

grapheme :: t

Functions

at(string, position)

Specs:

Returns the grapheme at the given position of a UTF-8 string. If position is greater than the string length, it returns nil.

Examples

iex> String.at("elixir", 0)
"e"
iex> String.at("elixir", 1)
"l"
iex> String.at("elixir", 10)
nil
iex> String.at("elixir", -1)
"r"
iex> String.at("elixir", -10)
nil

capitalize(string)

Specs:

  • capitalize(t) :: t

Converts the first character in the given string to uppercase and the remainder to lowercase.

This relies on the titlecase information provided by the Unicode Standard. Note this function makes no attempt to capitalize all words in the string (usually known as titlecase).

Examples

iex> String.capitalize("abcd")
"Abcd"
iex> String.capitalize("fin")
"Fin"
iex> String.capitalize("josé")
"José"
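Since capitalize/1 only touches the first character, capitalizing every word has to be built on top of it. The following is a minimal sketch, assuming a hypothetical module named TitleSketch and assuming that words are separated by whitespace. It is a simplification and does not implement proper Unicode titlecasing rules.

```elixir
defmodule TitleSketch do
  # Hypothetical helper: capitalizes each whitespace-separated word.
  # Runs of whitespace collapse into single spaces because split/1 drops them.
  def titlecase(string) do
    string
    |> String.split()
    |> Enum.map(&String.capitalize/1)
    |> Enum.join(" ")
  end
end

IO.inspect(TitleSketch.titlecase("elixir of lIFE"))
# prints "Elixir Of Life"
```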

codepoints(string)

Specs:

Returns all codepoints in the string.

Examples

iex> String.codepoints("josé")
["j", "o", "s", "é"]
iex> String.codepoints("оптими зации")
["о","п","т","и","м","и"," ","з","а","ц","и","и"]
iex> String.codepoints("ἅἪῼ")
["ἅ","Ἢ","ῼ"]

contains?(string, contents)

Specs:

  • contains?(t, t | [t]) :: boolean

Checks if string contains any of the given contents.

contents can be either a single string or a list of strings.

Examples

iex> String.contains? "elixir of life", "of"
true
iex> String.contains? "elixir of life", ["life", "death"]
true
iex> String.contains? "elixir of life", ["death", "mercury"]
false

downcase(binary)

Specs:

  • downcase(t) :: t

Converts all characters in the given string to lowercase.

Examples

iex> String.downcase("ABCD")
"abcd"
iex> String.downcase("AB 123 XPTO")
"ab 123 xpto"
iex> String.downcase("JOSÉ")
"josé"

duplicate(subject, n)

Specs:

  • duplicate(t, pos_integer) :: t

Returns a binary subject duplicated n times.

Examples

iex> String.duplicate("abc", 0)
""
iex> String.duplicate("abc", 1)
"abc"
iex> String.duplicate("abc", 2)
"abcabc"

ends_with?(string, suffixes)

Specs:

  • ends_with?(t, t | [t]) :: boolean

Returns true if string ends with any of the suffixes given, otherwise false. suffixes can be either a single suffix or a list of suffixes.

Examples

iex> String.ends_with? "language", "age"
true
iex> String.ends_with? "language", ["youth", "age"]
true
iex> String.ends_with? "language", ["youth", "elixir"]
false
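Passing a list of suffixes is convenient when several endings map to the same result. The following is a minimal sketch, assuming a hypothetical module named FileSketch and made-up file names; only ends_with?/2 comes from this module.

```elixir
defmodule FileSketch do
  # Hypothetical classifier based on the file name suffix.
  def kind(name) do
    cond do
      # A list matches if any one of the suffixes matches.
      String.ends_with?(name, [".ex", ".exs"]) -> :elixir
      # A single suffix can be given directly.
      String.ends_with?(name, ".erl") -> :erlang
      true -> :unknown
    end
  end
end

IO.inspect(FileSketch.kind("string.ex"))
# prints :elixir
IO.inspect(FileSketch.kind("binary.erl"))
# prints :erlang
IO.inspect(FileSketch.kind("README"))
# prints :unknown
```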

first(string)

Specs:

Returns the first grapheme from a UTF-8 string, or nil if the string is empty.

Examples

iex> String.first("elixir")
"e"
iex> String.first("եոգլի")
"ե"

from_char_list(list)

Specs:

  • from_char_list(char_list) :: {:ok, String.t} | {:error, binary, binary} | {:incomplete, binary, binary}

Converts a list of integer codepoints to a string.

Examples

iex> String.from_char_list([0x00E6, 0x00DF])
{ :ok, "æß" }
iex> String.from_char_list([0x0061, 0x0062, 0x0063])
{ :ok, "abc" }

from_char_list!(list)

Specs:

  • from_char_list!(char_list) :: String.t | no_return

Converts a list of integer codepoints to a string.

In case the conversion fails, it raises a String.UnicodeConversionError.

Examples

iex> String.from_char_list!([0x00E6, 0x00DF])
"æß"
iex> String.from_char_list!([0x0061, 0x0062, 0x0063])
"abc"

graphemes(string)

Specs:

Returns the Unicode graphemes in the string as per the Extended Grapheme Cluster algorithm outlined in the Unicode Standard Annex #29, Unicode Text Segmentation.

Examples

iex> String.graphemes("Ā̀stute")
["Ā̀","s","t","u","t","e"]

last(string)

Specs:

Returns the last grapheme from a UTF-8 string, or nil if the string is empty.

Examples

iex> String.last("elixir")
"r"
iex> String.last("եոգլի")
"ի"

length(string)

Specs:

  • length(t) :: non_neg_integer

Returns the number of Unicode graphemes in a UTF-8 string.

Examples

iex> String.length("elixir")
6
iex> String.length("եոգլի")
5

ljust(subject, len)

Specs:

  • ljust(t, pos_integer) :: t

Returns a new string of length len with subject left justified and padded with padding. If padding is not present, it defaults to whitespace. When len is less than the length of subject, subject is returned.

Examples

iex> String.ljust("abc", 5)
"abc  "
iex> String.ljust("abc", 5, ?-)
"abc--"

ljust(subject, len, padding)

Specs:

  • ljust(t, pos_integer, char) :: t

lstrip(binary)

Returns a string where leading Unicode whitespace has been removed.

Examples

iex> String.lstrip("   abc  ")
"abc  "

lstrip(other, char)

Specs:

  • lstrip(t, char) :: t

Returns a string where leading occurrences of char have been removed.

Examples

iex> String.lstrip("_  abc  _", ?_)
"  abc  _"

match?(string, regex)

Specs:

Checks if string matches the given regular expression.

Examples

iex> String.match?("foo", ~r/foo/)
true
iex> String.match?("bar", ~r/foo/)
false

next_codepoint(string)

Specs:

Returns the next codepoint in a String.

The result is a tuple with the codepoint and the remainder of the string, or nil if the string has reached its end.

As with other functions in the String module, this function does not check the validity of the codepoint. This means that if an invalid codepoint is found, it is returned by this function.

Examples

iex> String.next_codepoint("josé")
{ "j", "osé" }
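Because the function returns nil at the end of the string, it can drive a simple recursive traversal. The following is a minimal sketch, assuming a hypothetical module named WalkSketch that counts codepoints; it is meant to illustrate the return values, not to replace existing functions.

```elixir
defmodule WalkSketch do
  # Hypothetical helper: counts codepoints by consuming the string one
  # codepoint at a time until next_codepoint/1 returns nil.
  def count(string, acc \\ 0) do
    case String.next_codepoint(string) do
      {_codepoint, rest} -> count(rest, acc + 1)
      nil -> acc
    end
  end
end

IO.inspect(WalkSketch.count("elixir"))
# prints 6
```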

next_grapheme(string)

Specs:

Returns the next grapheme in a String.

The result is a tuple with the grapheme and the remainder of the string, or nil if the string has reached its end.

Examples

iex> String.next_grapheme("josé")
{ "j", "osé" }

printable?(b)

Specs:

  • printable?(t) :: boolean

Checks if a string is printable considering it is encoded as UTF-8. Returns true if so, false otherwise.

Examples

iex> String.printable?("abc")
true

replace(subject, pattern, replacement, options \\ [])

Specs:

Returns a new binary based on subject by replacing the parts matching pattern with replacement. By default, it replaces all entries, unless the global option is set to false.

A pattern may be a string or a regex.

Examples

iex> String.replace("a,b,c", ",", "-")
"a-b-c"
iex> String.replace("a,b,c", ",", "-", global: false)
"a-b,c"

When the pattern is a regex, \N can be given in the replacement string to access a specific capture in the regex:

iex> String.replace("a,b,c", ~r/,(.)/, ",\\1\\1")
"a,bb,cc"

Notice that the escape character \ itself had to be escaped. By giving &, one can inject the whole matched pattern in the replacement string.

When a string is used as the pattern, the replaced part can also be used inside the replacement via the :insert_replaced option. The option takes a position, or a list of positions, within the replacement at which the replaced part is inserted:

iex> String.replace("a,b,c", "b", "[]", insert_replaced: 1)
"a,[b],c"
iex> String.replace("a,b,c", ",", "[]", insert_replaced: 2)
"a[],b[],c"
iex> String.replace("a,b,c", ",", "[]", insert_replaced: [1, 1])
"a[,,]b[,,]c"

reverse(string)

Specs:

  • reverse(t) :: t

Reverses the given string. Works on graphemes.

Examples

iex> String.reverse("abcd")
"dcba"
iex> String.reverse("hello world")
"dlrow olleh"
iex> String.reverse("hello ∂og")
"go∂ olleh"
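Working on graphemes matters as soon as a string contains combining characters. The following is a minimal sketch that contrasts reverse/1 with a naive reversal over codepoints; the module ReverseSketch is hypothetical and exists only for the comparison.

```elixir
defmodule ReverseSketch do
  # Hypothetical naive reversal that works on codepoints instead of graphemes.
  def by_codepoints(string) do
    string
    |> String.codepoints()
    |> Enum.reverse()
    |> Enum.join()
  end
end

# "e" followed by the combining acute accent (U+0301), then "x"
accented = <<0x0065::utf8, 0x0301::utf8>>
string = accented <> "x"

# Grapheme-based reversal keeps the accent attached to the "e".
true = String.reverse(string) == "x" <> accented

# Codepoint-based reversal moves the accent in front of the "e",
# so it now follows (and combines with) the "x".
true = ReverseSketch.by_codepoints(string) == <<?x, 0x0301::utf8, ?e>>
```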

rjust(subject, len)

Specs:

  • rjust(t, pos_integer) :: t

Returns a new string of length len with subject right justified and padded with padding. If padding is not present, it defaults to whitespace. When len is less than the length of subject, subject is returned.

Examples

iex> String.rjust("abc", 5)
"  abc"
iex> String.rjust("abc", 5, ?-)
"--abc"

rjust(subject, len, padding)

Specs:

  • rjust(t, pos_integer, char) :: t

rstrip(binary)

Specs:

  • rstrip(t) :: t

Returns a string where trailing Unicode whitespace has been removed.

Examples

iex> String.rstrip("   abc  ")
"   abc"

rstrip(string, char)

Specs:

  • rstrip(t, char) :: t

Returns a string where trailing occurrences of char have been removed.

Examples

iex> String.rstrip("   abc _", ?_)
"   abc "

slice(string, range)

Specs:

Returns a substring from the offset given by the start of the range to the offset given by the end of the range.

If the start of the range is not a valid offset for the given string or if the range is in reverse order, returns nil.

Examples

iex> String.slice("elixir", 1..3)
"lix"
iex> String.slice("elixir", 1..10)
"lixir"
iex> String.slice("elixir", 10..3)
nil

iex> String.slice("elixir", -4..-1)
"ixir"
iex> String.slice("elixir", 2..-1)
"ixir"
iex> String.slice("elixir", -4..6)
"ixir"
iex> String.slice("elixir", -1..-4)
nil
iex> String.slice("elixir", -10..-7)
nil

iex> String.slice("a", 0..1500)
"a"
iex> String.slice("a", 1..1500)
""
iex> String.slice("a", 2..1500)
nil

slice(string, start, len)

Specs:

  • slice(t, integer, integer) :: grapheme | nil

Returns a substring starting at the offset given by start, with the length given by len. If the offset is greater than the string length, it returns nil.

Examples

iex> String.slice("elixir", 1, 3)
"lix"
iex> String.slice("elixir", 1, 10)
"lixir"
iex> String.slice("elixir", 10, 3)
nil
iex> String.slice("elixir", -4, 4)
"ixir"
iex> String.slice("elixir", -10, 3)
nil
iex> String.slice("a", 0, 1500)
"a"
iex> String.slice("a", 1, 1500)
""
iex> String.slice("a", 2, 1500)
nil

split(binary)

Specs:

  • split(t) :: [t]

Divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored.

Examples

iex> String.split("foo bar")
["foo", "bar"]
iex> String.split("foo" <> <<194, 133>> <> "bar")
["foo", "bar"]
iex> String.split(" foo bar ")
["foo", "bar"]

split(binary, pattern, options \\ [])

Specs:

Divides a string into substrings based on a pattern, returning a list of these substrings. The pattern can be a string, a list of strings or a regular expression.

The string is split into as many parts as possible by default, unless the global option is set to false.

Empty strings are only removed from the result if the trim option is set to true.

Examples

Splitting with a string pattern:

iex> String.split("a,b,c", ",")
["a", "b", "c"]
iex> String.split("a,b,c", ",", global: false)
["a", "b,c"]
iex> String.split(" a b c ", " ", trim: true)
["a", "b", "c"]

A list of patterns:

iex> String.split("1,2 3,4", [" ", ","])
["1", "2", "3", "4"]

A regular expression:

iex> String.split("a,b,c", ~r{,})
["a", "b", "c"]
iex> String.split("a,b,c", ~r{,}, global: false)
["a", "b,c"]
iex> String.split(" a b c ", ~r{\s}, trim: true)
["a", "b", "c"]

Splitting on empty patterns returns codepoints:

iex> String.split("abc", ~r{})
["a", "b", "c", ""]
iex> String.split("abc", "")
["a", "b", "c", ""]
iex> String.split("abc", "", trim: true)
["a", "b", "c"]
iex> String.split("abc", "", global: false)
["a", "bc"]

starts_with?(string, prefixes)

Specs:

  • starts_with?(t, t | [t]) :: boolean

Returns true if string starts with any of the prefixes given, otherwise false. prefixes can be either a single prefix or a list of prefixes.

Examples

iex> String.starts_with? "elixir", "eli"
true
iex> String.starts_with? "elixir", ["erlang", "elixir"]
true
iex> String.starts_with? "elixir", ["erlang", "ruby"]
false

strip(string)

Specs:

  • strip(t) :: t

Returns a string where leading/trailing Unicode whitespace has been removed.

Examples

iex> String.strip("   abc  ")
"abc"

strip(string, char)

Specs:

  • strip(t, char) :: t

Returns a string where leading and trailing occurrences of char have been removed.

Examples

iex> String.strip("a  abc  a", ?a)
"  abc  "

to_char_list(string)

Specs:

  • to_char_list(String.t) :: {:ok, char_list} | {:error, [], binary} | {:incomplete, [], binary}

Converts a string into a char list converting each codepoint to its respective integer value.

Examples

iex> String.to_char_list("æß")
{ :ok, 'æß' }
iex> String.to_char_list("abc")
{ :ok, 'abc' }

to_char_list!(string)

Specs:

  • to_char_list!(String.t) :: char_list | no_return

Converts a string into a char list converting each codepoint to its respective integer value.

In case the conversion fails or is incomplete, it raises a String.UnicodeConversionError.

Examples

iex> String.to_char_list!("æß")
'æß'
iex> String.to_char_list!("abc")
'abc'

upcase(binary)

Specs:

  • upcase(t) :: t

Converts all characters in the given string to uppercase.

Examples

iex> String.upcase("abcd")
"ABCD"
iex> String.upcase("ab 123 xpto")
"AB 123 XPTO"
iex> String.upcase("josé")
"JOSÉ"

valid?(arg1)

Specs:

  • valid?(t) :: boolean

Checks whether the given string contains only valid characters.

Examples

iex> String.valid?("a")
true
iex> String.valid?("ø")
true
iex> String.valid?(<<0xffff :: 16>>)
false
iex> String.valid?("asd" <> <<0xffff :: 16>>)
false

valid_character?(codepoint)

Specs:

  • valid_character?(t) :: boolean

Checks whether the given codepoint is a valid character.

All characters are codepoints, but some codepoints are not valid characters. They may be reserved, private, or other.

More info at: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Noncharacters

Examples

iex> String.valid_character?("a")
true
iex> String.valid_character?("ø")
true
iex> String.valid_character?("\x{ffff}")
false