Sloc Cloc and Code
Can a crusty Go program outperform a well written Rust Project?
Wow... so when they said conference I was thinking many breakout rooms and 30 people in each. This is something else.
Can a crusty Go program outperform a well written Rust Project?
Ben Boyter @boyter
Code Monkey at Kablamo.
func Produce(c Coffee, b Beer) (Code, Cloud, []error)
Hello all! I am Ben. My official title is technical lead. I'm a code monkey. I write code.
This talk is about a command line tool I made called Sloc Cloc and Code. The name is inspired by two similar tools called sloccount and cloc, while trying to make it sound like a Guy Ritchie film.
As I mentioned I work for Kablamo. Kablamo builds a lot of custom software on AWS. Our backend language of choice is Go, which was a problem for me because I didn't know it. I had used it a little bit for some command line and HTTP tools but nothing major.
As such I was working on projects in other languages such as C# and Java.
One that came past was to upgrade an application written in C# with a JavaScript frontend. The goal was to upgrade the frontend and fix some backend issues. It was meant to take 6 weeks. It turned into a year-long death-march project.
My fault. I totally underestimated how complex it was.
So what do we do when we make mistakes?
We overcorrect for past failures.
Code Iceberg
Image by © Ralph A. Clevenger/CORBIS
See, the project was a code iceberg. The visible part, the HTTP endpoints and the like, is easy to see. But the real meat and bones of the application was hidden, much like an iceberg.
How to spot code icebergs?
SLOC counters
cloc counts blank lines, comment lines, and physical lines of source code in many programming languages
VERY full featured.
So the question is how do we spot code icebergs.
One solution is to use a code counter.
Enter cloc. A perl command line tool. cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
It's very full featured, but probably not known for being fast. I actually tried it on my death-march project and it took longer than I was willing to wait.
How to spot code icebergs? Continued...
Cyclomatic Complexity
The other way is to get cyclomatic complexity
However in the .NET world you can use Visual Studio to count code as well. But it also gives you a count of the complexity of code. This tells you where the problematic files are likely to be. The complexity estimate is a measure of the number of branch conditions in the code.
So that's where we are. One tool that's a bit slow and another that's limited to certain languages.
I am thinking...
We totally need another code counter!
Has anyone else considered this?
tokei, loc, polyglot, loccount and gocloc.
SPIN! Calculate some "value" for code complexity.
Are you thinking what I am? We need another code counter! One that's fast!
However I am not very original, so I had a look around to see if anyone else had the same idea.
Two Rust projects already existed, tokei and loc, and BOTH claimed excellent performance.
Polyglot was also around. Written in ATS by Vanessa McHale, it is probably the most interesting code counter.
Eric S. Raymond also had a code counter, loccount. Lastly another one existed called gocloc.
I thought I was being original in the choice of language at least, but loccount and gocloc are both written in Go.
However the spin is I was going to add complexity estimates. So I decided to go ahead anyway.
Goals
Learn Go.
Be as fast as possible.
Push CPU limits OR my limits.
Be as accurate as possible.
Estimate complexity.
Goals.
Learn Go.
Want the counter to be as fast as possible.
Push CPU limits (which is unlikely) OR my limits (FAR more likely).
Be as accurate as possible. I don't want to trade accuracy for speed.
Estimate complexity. To help spot those code icebergs.
Design
4 stage pipeline
Having briefly worked on a command line application before, I knew roughly what I needed to do, after I read up about channels.
Have a pipeline of processes, which Go supports well with channels. For some parts of the process scc spawns as many goroutines as there are CPU cores.
The use of buffered channels in scc is mostly to ensure backpressure on the previous parts of the pipeline, not for "performance".
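To make the shape concrete, here is a minimal sketch of that pipeline in Go. The stage bodies are placeholders of my own, not scc's actual code; the point is the channel plumbing.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// Buffered channels apply backpressure between stages.
	paths := make(chan string, 128)
	counts := make(chan int, 128)

	// Stage 1: file walking (placeholder paths).
	go func() {
		defer close(paths)
		for _, p := range []string{"a.go", "b.go", "c.go"} {
			paths <- p
		}
	}()

	// Stages 2 and 3: read and process, one goroutine per CPU core.
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				counts <- len(p) // stand-in for read + count
			}
		}()
	}
	go func() {
		wg.Wait()
		close(counts)
	}()

	// Stage 4: summarise.
	total := 0
	for c := range counts {
		total += c
	}
	fmt.Println(total)
}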
1. File Walking
Go's built in file walk is slow! (comparatively)
File walk benchmark
Case 0: Create a directory that's quite deep and put 10,000 files at the end.
Case 1: Create a directory that's quite deep and put 100 files in each folder.
Case 2: Create a directory that has a single level and put 10,000 files in it.
Case 3: Create a directory that has two levels, with 10,000 directories in the second and a single file in each.
Case 4: Create a directory with 10 subdirectories and 1,000 files in each.
Case 5: Create a directory with 20 subdirectories and 500 files in each.
Case 6: Create a directory with 5 subdirectories and 2,000 files in each.
Case 7: Create a directory with 100 subdirectories and 100 files in each.
So the first part of the pipeline. Walking the file system.
As it turns out the native Go file walk is slow, comparatively.
I tried out a few other solutions and benchmarked them, with one called Godirwalk being the fastest.
The reason is that it avoids costly os.Stat calls. The other libraries use goroutines and are still beaten out.
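For a flavour of what using godirwalk looks like, here is a sketch. The API shape is from the library's documentation as I remember it, so check the current docs before copying.

package main

import (
	"fmt"

	"github.com/karrick/godirwalk"
)

func main() {
	count := 0
	err := godirwalk.Walk(".", &godirwalk.Options{
		Unsorted: true, // no sorted output needed, so skip the sort
		Callback: func(path string, de *godirwalk.Dirent) error {
			// Dirent exposes the file type without an os.Stat call.
			if !de.IsDir() {
				count++
			}
			return nil
		},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(count, "files")
}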
Still not fast enough
godirwalk an improvement, but not enough
Make parallel!
New problem .gitignore / .ignore files
Still not good enough.
So my cunning plan was to add goroutines to godirwalk. This came with another issue: because of how godirwalk is written, you cannot deal with .gitignore and .ignore files.
Turns out a lot of projects use more than one .gitignore file. I found one recently with over 25,000 of them.
The ignore files are important because they produce better results so ignoring them for performance is not an option.
.gitignore / .ignore
Channels are great for uni-directional work
However ignore files mean we need to alter rules on the fly
https://github.com/dbaggerman/cuba
Still need to resolve the .gitignore issue. Otherwise the results will not be correct.
Channels are great for uni-directional work.
What we needed was a cyclical process that happens to be in parallel.
In the end David Baggerman, a Melbourne developer and contributor to scc, wrote a custom library, cuba, which solves the gitignore problem. I'd suggest you check it out and give him a GitHub star.
2. File Reading
Know your use case! 18,554 bytes.
Memory maps.
Just use ioutil.ReadFile for small files.
$ time scc linux
DEBUG 2018-03-27T21:34:26Z: milliseconds to walk directory: 7593
--SNIP--
scc linux 11.02s user 19.92s system 669% cpu 7.623 total
Now the second part of the pipeline.
Reading the files from our lovely disks into memory.
The first thing is to know your use case, so I worked out the average number of bytes in a code file which came out at about 19kb.
If you look into reading files quickly a lot of suggestions will say use memory maps. They allow you to outsource bookkeeping activity of locations in files to the kernel.
Don't do this for small files. It's slower.
I also made a huge rookie error here, the biggest mistake you can make with performance: I saw something was slow, guessed what it was, and was wrong.
See, I thought the reading of files into memory was slow. What was actually slow was the walking of the files on disk. Hence doing so much work in the previous step.
I only noticed when I was adding timings for each step and saw that my CPU was not being fully used, and yet the file walk took as long as the whole program run time.
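The upshot for the reading stage is boring: at the ~19KB average, a plain read into memory wins. A minimal sketch, assuming nothing beyond the standard library:

package main

import (
	"fmt"
	"io/ioutil"
)

// readSource slurps a whole file into memory. For small files this is
// faster than a memory map, which only pays for itself once the file
// is large enough to amortise the setup and page-fault overhead.
func readSource(path string) ([]byte, error) {
	return ioutil.ReadFile(path)
}

func main() {
	content, err := readSource("main.go")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(content), "bytes")
}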
3. File Processor
3 main ways tools count code.
Use regular expressions.
Use abstract syntax tree (AST).
State machine.
On to the third part of the pipeline. The processor.
There are 3 main ways that all of the code counting tools work.
The first is to use regular expressions. Cloc does this.
The problem with this is not so much speed but accuracy. If you put a comment into a string cloc will flip to comment mode.
The second is to build an AST. This is how Visual Studio works.
The problem is that you need to build an AST parser for every language you want to support, which is hard.
The third option is to iterate the bytes of the file and use a small state machine to track things.
I chose the state machine because in theory it should be the fastest method. So I implemented one and got it to some level of accuracy.
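To illustrate, here is a toy state machine for a C-like language. It only understands // and /* */ comments and ignores strings entirely, so treat it as a sketch of the technique rather than scc's real implementation.

package main

import "fmt"

type state int

const (
	sCode    state = iota
	sComment       // inside /* ... */
)

// countLines classifies each line as code, comment, or blank by
// walking the bytes once and tracking a tiny amount of state.
func countLines(src []byte) (code, comment, blank int) {
	st := sCode
	lineHasCode, lineHasComment := false, false
	for i := 0; i < len(src); i++ {
		b := src[i]
		switch {
		case b == '\n':
			switch {
			case lineHasCode:
				code++
			case lineHasComment:
				comment++
			default:
				blank++
			}
			lineHasCode, lineHasComment = false, false
		case st == sComment:
			lineHasComment = true
			if b == '*' && i+1 < len(src) && src[i+1] == '/' {
				st = sCode
				i++
			}
		case b == '/' && i+1 < len(src) && src[i+1] == '*':
			st = sComment
			lineHasComment = true
			i++
		case b == '/' && i+1 < len(src) && src[i+1] == '/':
			lineHasComment = true
			for i < len(src) && src[i] != '\n' {
				i++ // skip to end of line
			}
			i-- // let the outer loop handle the '\n'
		case b != ' ' && b != '\t' && b != '\r':
			lineHasCode = true
		}
	}
	return
}

func main() {
	src := []byte("/* hi */\n\nint x; // set\n")
	fmt.Println(countLines(src)) // 1 code, 1 comment, 1 blank
}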
Results
───────────────────────────────────────────────────────────────────────────────
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
Go 22 6506 1075 273 5158 1116
───────────────────────────────────────────────────────────────────────────────
processor/workers_test.go 1492 278 32 1182 271
processor/workers.go 734 106 78 550 181
processor/formatters.go 699 104 12 583 144
processor/detector_test.go 378 84 1 293 99
processor/formatters_test.go 863 96 2 765 71
processor/file.go 231 39 9 183 54
processor/file_test.go 327 68 7 252 53
processor/detector.go 210 40 20 150 52
processor/processor.go 416 89 58 269 45
~ocessor/workers_tokei_test.go 247 36 1 210 40
~or/workers_regression_test.go 150 30 4 116 32
processor/helpers_test.go 60 13 0 47 20
scripts/include.go 79 16 8 55 15
processor/structs.go 183 20 17 146 14
processor/processor_test.go 80 18 0 62 11
processor/cocomo_test.go 35 7 3 25 6
processor/structs_test.go 30 7 0 23 4
processor/helpers.go 30 5 3 22 2
main.go 212 7 6 199 2
processor/cocomo.go 26 5 6 15 0
processor/constants.go 5 1 0 4 0
examples/language/go.go 19 6 6 7 0
I wanted to try out the complexity estimate. This is some output from scc running against itself. It's showing the results for each file, sorted by complexity.
I know the most complex file in the codebase is probably workers.go, where most of the code processing logic lives, and thankfully scc is able to identify it.
Problem
$ scc
───────────────────────────────────────────────────────────────────────────────
Language Files Lines Blanks Comments Code Complexity
───────────────────────────────────────────────────────────────────────────────
C 258 153081 17005 26121 109955 27671
C Header 200 28794 3252 5877 19665 1557
TCL 101 17802 1879 981 14942 1439
Shell 42 1467 197 314 956 176
Lua 20 525 68 70 387 65
Autoconf 18 10821 1026 1326 8469 951
Makefile 10 1082 220 103 759 51
Ruby 10 778 78 71 629 115
gitignore 10 150 16 0 134 0
Markdown 9 1935 527 0 1408 0
HTML 5 9658 2928 12 6718 0
C++ 4 286 48 14 224 31
License 4 100 20 0 80 0
YAML 4 266 20 3 243 0
CSS 2 107 16 0 91 0
Python 2 219 12 6 201 34
BASH 1 102 13 5 84 26
Batch 1 28 2 0 26 3
C++ Header 1 9 1 3 5 0
Extensible Styleshe… 1 10 0 0 10 0
Plain Text 1 23 7 0 16 0
Smarty Template 1 44 1 0 43 5
m4 1 562 116 53 393 0
───────────────────────────────────────────────────────────────────────────────
Total 706 227849 27452 34959 165438 32124
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $5,769,821
Estimated Schedule Effort 29.862934 months
Estimated People Required 22.886772
───────────────────────────────────────────────────────────────────────────────
So I tweaked it till it was accurate and this is the result. You can see it counting the languages and what's in each for the Redis project.
Sadly by making scc accurate I also took out all of the performance it had.
For every second the other tools took to run scc took two.
At this point my vision darkened, I saw the author of each of the other tools (in my mind), and said "You are mine".
Mechanical Sympathy
"You don't have to be an engineer to be be a racing driver, but you do have to have Mechanical Sympathy."
Jackie Stewart looking cool
This is Jackie Stewart. He was a very successful F1 driver. That's his quote up there. What he means by this is that you don't need to understand how to design a CPU or write assembly to get the most out of the computer. But it helps to know how it works.
So knowing that the CPU has caches, has pipelines, and has SIMD instructions is usually enough. Thankfully the compiler writers know these things are hard to understand (they are indeed beyond me) so you don't need to worry too much.
How to Go fast 2019
"The key to making programs fast is to make them do practically nothing"
Do as little as possible.
On many cores.
Make it easy to do the next thing.
So how to go fast in 2019.
The quote is from one of the maintainers of grep.
In short.
Do as little as possible; the less you do, the faster your program runs based on the wall clock.
Do as little as possible on many cores, use those extra CPU's we all have.
Do as little as possible on many cores, making it easy to do the next thing.
This means having your caches primed, keeping loops in cache, and avoiding branch prediction fails. If you don't know what that means don't worry, I have hit one branch prediction issue ever. I felt like a coding genius for about 5 minutes when I fixed it though.
All of the performance improvements fall into one of these categories.
Go Fast - Measure
Your bottleneck is often not what you expect.
pprof
(pprof) top10
Showing nodes accounting for 49.46s, 89.12% of 55.50s total
Showing top 10 nodes out of 83
flat flat% sum% cum cum%
20.67s 37.24% 37.24% 20.70s 37.30% runtime.cgocall
17.41s 31.37% 68.61% 25.54s 46.02% github.com/boyter/scc/processor.countStats
Flame Graphs
Next is to measure your performance.
Your bottleneck is often not what you think it is.
Go pprof and flame graphs are really good at helping with this.
A classic example of this is a C# application I helped maintain once. A page was taking over 20 seconds to load. I was working with a mate and it had some three-deep nested loop. He insisted that was the issue, and I agreed, but said we should profile first just to be sure.
Turns out the actual issue was casting strings to integers. We swapped to a faster method, added a cache, and the page started loading almost instantly.
Go Fast - Benchmark
In god we trust. Everyone else bring data.
Go benchmark tools are pretty good.
Code speaks volumes. Prove it.
The next thing to do is benchmark code.
A great quote from some American: "In god we trust. Everyone else bring data."
The go benchmark tools are pretty good.
If you think something is slow, prove it. Code speaks volumes.
Byte Comparison
Which is fastest?
equal := reflect.DeepEqual(one, two)
equal := bytes.Equal(one, two)
equal := true
for j := 0; j < len(one); j++ {
if one[j] != two[j] {
equal = false
break
}
}
Here we have 3 ways of checking byte slices in Go for equality.
The first uses reflection in the standard library.
The second uses bytes equal in the standard library.
The third is a loop I wrote which checks each byte for equality.
So which do you think would be the fastest?
Byte Comparison Continued
BenchmarkCheckByteEqualityReflect-8 5000000 344.00 ns/op
BenchmarkCheckByteEqualityBytes-8 300000000 5.52 ns/op
BenchmarkCheckByteEqualityLoop-8 500000000 3.76 ns/op
Why?
So reflection as you probably guessed is right out. Reflection is almost always slow.
What was surprising to me was that my loop was the fastest.
Why?
The really nice thing about Go is you can poke at the code inside the libraries, and it's usually not some hand-optimised assembly. bytes.Equal actually looks the same as the code I wrote, except it checks that the lengths of the two slices are equal before it compares.
So you can do better than the standard library from time to time if you have constrained requirements.
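For reference, a benchmark harness like the one that produced the numbers above looks roughly like this. The variable names are mine; save it in a _test.go file and run with go test -bench=.

package main

import (
	"bytes"
	"reflect"
	"testing"
)

var one = []byte("some bytes to compare for equality")
var two = []byte("some bytes to compare for equality")

func BenchmarkCheckByteEqualityReflect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = reflect.DeepEqual(one, two)
	}
}

func BenchmarkCheckByteEqualityBytes(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = bytes.Equal(one, two)
	}
}

func BenchmarkCheckByteEqualityLoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		equal := true
		for j := 0; j < len(one); j++ {
			if one[j] != two[j] {
				equal = false
				break
			}
		}
		_ = equal
	}
}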
Loop & Check VS Change & Loop
This is one of the bigger performance improvements I found.
These both show 2 different ways that the core loop in scc works.
The left loops over the bytes, switches on the state, checks if we should leave that state, and if so updates it, then returns to the loop.
The right loops over the bytes, switches on the state, and then loops again looking for a state change or a newline, then updates and returns to the outer loop.
Loop & Check VS Change & Loop
Benchmark #1: ./scc1 linux
Time (mean ± σ): 2.343 s ± 0.097 s [User: 27.740 s, System: 0.868 s]
Range (min … max): 2.187 s … 2.509 s
Benchmark #1: ./scc2 linux
Time (mean ± σ): 1.392 s ± 0.019 s [User: 19.415 s, System: 0.825 s]
Range (min … max): 1.367 s … 1.430 s
Why?
As it turns out the method with the additional loop is actually faster. Why?
Well, it turns out most code files don't change state that often. So despite the code smell of the additional loop, this actually does less work, and the tight inner loop is very CPU friendly.
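In code, the difference between the two shapes looks roughly like this. Both toy functions below count comment lines and give the same answer; only the loop structure differs. This is my illustration of the pattern, not scc's actual core loop.

package main

import "fmt"

const (
	stateCode = iota
	stateComment
)

// countCheck returns to the outer loop after every byte ("loop & check").
func countCheck(src []byte) (commentLines int) {
	state := stateCode
	for i := 0; i < len(src); i++ {
		switch state {
		case stateComment:
			if src[i] == '\n' {
				commentLines++
			}
			if src[i] == '*' && i+1 < len(src) && src[i+1] == '/' {
				state = stateCode
				i++
			}
		case stateCode:
			if src[i] == '/' && i+1 < len(src) && src[i+1] == '*' {
				state = stateComment
				i++
			}
		}
	}
	return
}

// countSpin burns through bytes in a tight inner loop while inside a
// comment ("change & loop"); same answer, fewer trips through the switch.
func countSpin(src []byte) (commentLines int) {
	state := stateCode
	for i := 0; i < len(src); i++ {
		switch state {
		case stateComment:
			for i < len(src) {
				if src[i] == '\n' {
					commentLines++
				} else if src[i] == '*' && i+1 < len(src) && src[i+1] == '/' {
					state = stateCode
					i++
					break
				}
				i++
			}
		case stateCode:
			if src[i] == '/' && i+1 < len(src) && src[i+1] == '*' {
				state = stateComment
				i++
			}
		}
	}
	return
}

func main() {
	src := []byte("code /* a\ncomment\n*/ code\n")
	fmt.Println(countCheck(src), countSpin(src)) // same result
}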
if statement Ordering
Serious micro-optimisation.
$ hyperfine -m 50 'scc1 cpython'
Benchmark #1: scc1 cpython
Time (mean ± σ): 522.9 ms ± 9.3 ms [User: 1.890 s, System: 1.740 s]
Range (min … max): 510.1 ms … 577.7 ms
$ hyperfine -m 50 'scc2 cpython'
Benchmark #1: scc2 cpython
Time (mean ± σ): 491.0 ms ± 10.2 ms [User: 1.628 s, System: 1.763 s]
Range (min … max): 476.3 ms … 539.5 ms
Why?
So this is a serious micro-optimisation, but assuming you have a very tight loop you can sometimes eke out additional performance by juggling if statements. This is a combination of doing less work and, in some cases, helping the CPU branch predictor.
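A sketch of the idea; the conditions here are illustrative, and the right order is whatever your profiler says is most common.

package main

import "fmt"

// countBytes checks the overwhelmingly common case (an ordinary byte)
// first, so the hot path takes a predictable branch on nearly every
// iteration. Reversing the order costs a little on every byte.
func countBytes(src []byte) (normal, newlines int) {
	for _, b := range src {
		if b != '\n' { // most common case checked first
			normal++
		} else {
			newlines++
		}
	}
	return
}

func main() {
	fmt.Println(countBytes([]byte("two\nlines\n")))
}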
Algorithms
How to check for conditions?
/* /** <%-- --> # //
for if each while switch && || != ==
Loop. Bit-Masks. Trie
A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn".
Mailinator blog about Trie .
One of the main changes was in how scc loops looking for matches.
The first solution implemented used a loop over the terms for each byte. This is rather wasteful.
It was cut down using bitmasks of the terms we are looking for; only if we get a bitmask hit do we perform the deeper loop to see if we have an actual match.
The last improvement was to use trie structures. A sample trie is shown containing some terms. For those following along I have linked a nice blog post about this. Warning: the post is about matching spam emails, so language warning.
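A minimal byte trie in Go, a sketch of the idea rather than scc's implementation. A lookup walks at most one node per byte instead of looping over every term.

package main

import "fmt"

type trie struct {
	match    bool
	children [256]*trie // one slot per possible byte value
}

// insert adds a key to the trie one byte at a time.
func (t *trie) insert(key string) {
	node := t
	for i := 0; i < len(key); i++ {
		b := key[i]
		if node.children[b] == nil {
			node.children[b] = &trie{}
		}
		node = node.children[b]
	}
	node.match = true
}

// lookup reports whether some inserted key is a prefix of src.
func (t *trie) lookup(src []byte) bool {
	node := t
	for _, b := range src {
		node = node.children[b]
		if node == nil {
			return false
		}
		if node.match {
			return true
		}
	}
	return false
}

func main() {
	t := &trie{}
	for _, k := range []string{"/*", "//", "for", "if"} {
		t.insert(k)
	}
	fmt.Println(t.lookup([]byte("// comment"))) // true
	fmt.Println(t.lookup([]byte("x := 1")))     // false
}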
Garbage Collector
Not tune-able. On/Off.
Turn off till some threshold.
So the GC is something I like and dislike in Go. I dislike that all you really get to tune it with is an on/off switch.
I also dislike that it's optimised for latency. Which is great for HTTP and UI things but not for number crunching.
In scc, because it's a short-lived program, it actually turns off the GC until a file threshold has been passed.
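In Go the on/off switch is runtime/debug.SetGCPercent. A sketch of the trick, with a made-up threshold and a placeholder loop:

package main

import (
	"runtime/debug"
)

const gcFileThreshold = 10000 // illustrative, not scc's exact number

func main() {
	// Disable the GC entirely; a short-lived run can often afford to
	// never collect at all.
	debug.SetGCPercent(-1)

	for filesProcessed := 0; filesProcessed < 25000; filesProcessed++ {
		// ... read and count a file here ...
		if filesProcessed == gcFileThreshold {
			// Enough files that memory might matter: turn the GC back
			// on at its default setting for the rest of the run.
			debug.SetGCPercent(100)
		}
	}
}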
Lazy Loading
Support 200+ languages.
Also caching of filenames -> language
Most noticeable with smaller repositories
Benchmark #1: scc-2.0.0 redis
Time (mean ± σ): 124.4 ms ± 2.4 ms [User: 168.6 ms, System: 289.1 ms]
Range (min … max): 120.0 ms … 128.4 ms
Benchmark #1: scc-2.1.0 redis
Time (mean ± σ): 81.6 ms ± 5.0 ms [User: 173.8 ms, System: 265.4 ms]
Range (min … max): 75.5 ms … 97.1 ms
Another small but important change is lazy loading of language features. Because they are stored in the previously mentioned trie, the trie needs to be built for each language. scc used to process all the languages in a single pass up front, but was changed to lazily load them as required, which gave a nice performance boost.
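A sketch of the lazy-loading pattern; the languageTrie type and the build step are stand-ins of mine, not scc's actual code.

package main

import (
	"fmt"
	"sync"
)

// languageTrie stands in for the per-language matching structure.
type languageTrie struct{ name string }

var (
	mu    sync.Mutex
	tries = map[string]*languageTrie{}
)

// getTrie builds the trie for a language the first time it is needed,
// instead of building all 200+ languages up front.
func getTrie(lang string) *languageTrie {
	mu.Lock()
	defer mu.Unlock()
	if t, ok := tries[lang]; ok {
		return t
	}
	t := &languageTrie{name: lang} // real code would build from language definitions
	tries[lang] = t
	return t
}

func main() {
	fmt.Println(getTrie("Go").name)
	fmt.Println(getTrie("Go") == getTrie("Go")) // built once, reused
}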
Annoyances (edge cases)
Verbatim strings
Nested Multi-line Comments
D-Lang
Python DocString's
Byte Order Marks (BOM)
Ah edge cases. The bottom of any code iceberg.
In no particular order, some of the more annoying ones I had to deal with.
Verbatim strings. These are especially annoying because they are a special edge case of string where you need to ignore escape characters.
Nested multi-line comments. I didn't even know this was a thing. It's a compiler error in Go, for example, but it's valid in Rust. Which means you need to keep a stack and push and pop to ensure you match correctly.
D. Any D programmers? D is problematic because it supports nested multi-line comments, has two ways of declaring them, and you can nest them inside each other. This is an edge case I cannot be bothered to fix and I keep it as an open bug.
Docstrings in Python are annoying because people quite often don't want them counted as code. So you have to check if the previous bytes were whitespace back to a colon.
Byte Order Marks. They are not actually required in UTF-8 and the spec suggests not using them. For whatever reason they are common though. So you need to check for them and, if found, skip them, otherwise you can count the first line of a file as code when it might actually be a comment.
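The BOM check itself is tiny; a sketch using only the standard library:

package main

import (
	"bytes"
	"fmt"
)

// The UTF-8 byte order mark: three bytes that carry no information in
// UTF-8 but appear at the start of many real-world files anyway.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM drops a leading BOM so the first real bytes of the file are
// classified correctly (e.g. a leading comment is not counted as code).
func stripBOM(content []byte) []byte {
	return bytes.TrimPrefix(content, utf8BOM)
}

func main() {
	src := append(append([]byte{}, utf8BOM...), []byte("// comment")...)
	fmt.Printf("%q\n", stripBOM(src))
}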
4. Summarise
Limited output (thankfully).
String concatenation benchmark
BenchmarkConcat-8 1000000 64850.00 ns/op
BenchmarkBuffer-8 200000000 6.76 ns/op
BenchmarkCopy-8 1000000000 3.06 ns/op
BenchmarkStringBuilder-8 200000000 7.74 ns/op
Use strings.Builder to ensure >= Go 1.10
Step four. Bring it all back to you.
Know your problem domain. The output for scc is limited.
It's also a lot of string concatenation.
I found a nice benchmark which shows which methods are the fastest, and then I didn't use those.
See, it's such a small amount of the application's runtime that I opted to use a new Go feature, to stop people compiling with an old version.
I did compare this specific implementation to a nice column processor I found, and because I was able to be very specific about it, it runs in about half the time.
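For what it's worth, the strings.Builder pattern (Go 1.10+) looks like this:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// strings.Builder appends into a growing buffer, avoiding the
	// allocation-per-concatenation that += on strings causes.
	var sb strings.Builder
	for i := 0; i < 3; i++ {
		fmt.Fprintf(&sb, "row %d\n", i) // Builder implements io.Writer
	}
	fmt.Print(sb.String())
}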
Redis Benchmark
So, benchmarks. Here is a benchmark over the Redis code base between all the code counters I have talked about. I mostly included this one to illustrate the performance gains all the newer tools have over cloc. cloc is on the bottom.
When I tried charting this using the Linux kernel all the new tools were a thin smear on the left side.
I will be dropping cloc from further comparisons.
Fair Benchmarks
No such thing.
Languages.
Ignore Files.
String Support.
scc estimates complexity.
Tried to create one to be as fair as possible.
So there is no such thing as a fair benchmark. I wrote one of the tools so I am biased.
Also each program supports different languages. Scc includes XML, JSON and TEXT files for example.
Ignore files are only supported by some of the tools, which can increase or decrease the work to process.
String support, which is an additional level of bookkeeping, is only supported by scc and tokei.
Also scc supports complexity estimates.
So I tried to create as fair a benchmark as I could. It consists of thousands of files in wide-spanning directories for which all counters produce the same output. This means it should be a test of how quickly they can identify the files, read them into memory and count them.
10 copies of Linux
One last benchmark. This time in a slide because it takes too long to run.
10 copies of the linux kernel copied into a directory. A nice stress test.
SO! Can a crusty Go program outperform a well written Rust Project?
YES!
But all of this would work in Rust/ATS/C just as well
But feel free to boast when people mention Rust anyway
Whats the cost?
So can a Go program outperform a well written Rust one?
YES!
However the techniques used could be moved into tokei and it would probably be just as fast.
I still think a direct port to Rust would be faster, or converting one of those tools to use the same techniques.
Well I had to re-implement almost everything myself. Although it could be argued the nice thing about reinventing the wheel is you get a round one.
I also had to sacrifice some level of readability.
Also a lot of time.
I also don't think I have reached an optimum level of performance. I'd love to see one of you brilliant people submit a PR that improves performance again or create another project which crushes scc.
THANK YOU
Massive thank you to all contributors to scc