Strings
Strings are an interesting case because we often think of them as a primitive data type like numbers or booleans, but they also have characteristics of a data structure.
We can iterate through a string where the time complexity is based on its length n
.
Iterating is O(n) time and accessing a character by index is O(1) time.
We think of strings as primitive types because individual characters are actually represented as integers because they are mapped to a value using an encoding standard such as ASCII or Unicode. If you go to the console in your browser, you can quickly prove this by getting the UTF-16 (Unicode) value for a string:
Strings are often learned directly alongside arrays, and you'll understand why as you work through this lesson. However, in this course we learn strings separately because they have unique characteristics, and one of the most important is that they convey meaning to us as humans. String interview questions often make use of the fact that we combine characters to form words and can rearrange them or extract substrings.
There are a few important reasons strings should be learned separately:
- Strings provide meaning to us, so they are tested differently than arrays in interviews
- Strings may be mutable or immutable in a given language
- Strings implementations vary by language
- String methods may be entirely different than array methods
The most common string interview questions include:
- Palindromes, permutations, and anagrams
- Counting or transforming strings based on repeated characters, unique characters, or sequences
- Finding or grouping substrings of a string
Strings and Arrays
In memory, strings are actually stored as an array of integers where the integer maps to the value of an encoding standard.
From our example above, the string Dev
would actually be stored in memory as:
| 68 | 101 | 118 |
Your programming language will know exactly how to map the integer representation to a string on your screen based on the encoding.
Even at a higher level though, strings are just a list of characters. You can transform a string to an array, typically with a single function.
This means we can treat any string problem as an array question if needed. We just pay an O(n) cost to transform it in either direction.
Strings are often taught alongside arrays because strings themselves are just arrays at a fundamental level. Underneath the hood, strings are stored as an array of characters. More specifically, since characters are actually mapped to encoding tables like ASCII or Unicode, strings are represented as arrays of integers where the integer maps to their encoded character.
The Big O complexity of strings is nearly identical to arrays.
The primary difference is that if strings are immutable,
when you modify a string (push
, pop
, set
), it becomes an O(n) operation even though it is O(1) for an array because it must copy all the characters to a new string.
Big O Complexity | |
---|---|
traverse | O(n) |
lookup | O(1) |
copy | O(n) |
space | O(n) |
Impact of Immutability
Immutability means that a variable can't be changed after it is declared. So if we want to update something that is immutable, what actually happens is a new variable is created that contains the changes we wanted.
In immutable string languages, trying to mutate a string requires O(n) time for any change because it creates a new copy of the string for every update.
In JavaScript, Golang, C#, and Python, strings are immutable.
In Java/Kotlin, strings are immutable unless you use StringBuilder
.
In C++ strings are mutable.
For immutable strings, anytime we append values, an O(n) operation occurs because we copy all the characters and create a new string with the additional character(s) at the end.
If you need to do a lot of appending to a string (many O(n) operations), you are actually better off to split it into an array and push the new characters to the end. Then you can join it back into a string once you are finished.
The split and join will be O(n), but all the array push operations will be constant time O(1).