Update: This post ignored the fact that this works for utf-8 characters only. Characters which are stored on more than 1 byte will cause trouble. Look at this Effective Go Example.
for pos, char := range "日本\x80語" { // \x80 is an illegal UTF-8 encoding
fmt.Printf("character %#U starts at byte position %d\n", char, pos)
}
Prints:
character U+65E5 '日' starts at byte position 0
character U+672C '本' starts at byte position 3
character U+FFFD '�' starts at byte position 6
character U+8A9E '語' starts at byte position 7
Keep this in mind when working with strings.
Another quick reminder… Always go with []byte if possible. I said it before, and I’m going to say it over and over again. It’s crucial.
Here is a little code from exercism.io. First, with strings:
package igpay
import (
"strings"
)
// PigLatin translates reguler old English into awesome pig-latin.
func PigLatin(in string) (ret string) {
for _, v := range strings.Fields(in) {
ret += pigLatin(v) + " "
}
return strings.Trim(ret, " ")
}
func pigLatin(in string) (ret string) {
if strings.IndexAny(in, "aeiou") == 0 {
ret += in + "ay"
return
}
for i := 0; i < len(in); i++ {
vowelPos := strings.IndexAny(in, "aeiou")
if (in[0] == 'y' || in[0] == 'x') && vowelPos > 1 {
vowelPos = 0
ret = in
}
if vowelPos != 0 {
adjustPosition := vowelPos
if in[adjustPosition] == 'u' && in[adjustPosition - 1] == 'q' {
adjustPosition++
}
ret = in[adjustPosition:] + in[:adjustPosition]
}
}
ret += "ay"
return
}
Than with []byte:
package igpay
import (
// "fmt"
"bytes"
)
// PigLatin translates reguler old English into awesome pig-latin.
func PigLatin(in string) (ret string) {
inBytes := []byte(in)
var retBytes [][]byte
for _, v := range bytes.Fields(inBytes) {
v2 := make([]byte, len(v))
copy(v2, v)
retBytes = append(retBytes, pigLatin(v2))
}
ret = string(bytes.Join(retBytes, []byte(" ")))
return
}
func pigLatin(in []byte) (ret []byte) {
if bytes.IndexAny(in, "aeiou") == 0 {
ret = append(in, []byte("ay")...)
return
}
for i := 0; i < len(in); i++ {
vowelPos := bytes.IndexAny(in, "aeiou")
if (in[0] == 'y' || in[0] == 'x') && vowelPos > 1 {
vowelPos = 0
ret = in
}
if vowelPos != 0 {
adjustPosition := vowelPos
if in[adjustPosition] == 'u' && in[adjustPosition - 1] == 'q' {
adjustPosition++
}
in = append(in[adjustPosition:], in[:adjustPosition]...)
ret = in
// fmt.Printf("%s\n", ret)
}
}
ret = append(ret, []byte("ay")...)
return
}
And than,the benchmarks of course:
BenchmarkPigLatin-8 200000 10688 ns/op
BenchmarkPigLatinStrings-8 100000 15211 ns/op
PASS
The improvement is not massive in this case, but it’s more than enough to matter. And in a bigger, more complicated program, string concatenation will take a LOT of time away.
In Go, the bytes
package has a 1-1 map compared to the strings
packages, so chances are, if you are doing strings concatenations you will be able to port that piece of code easily to []byte.
That’s all folks.
Happy coding, Gergely.