Is there a performance difference between a macro and a global constant?

Disclaimer on 2018-11-30: These tests were performed under Julia v0.6. Due to the incompatible syntax changes brought by Julia v0.7 and v1.0 and improved compiler optimization, they may not run or the emitted low-level code may not be the same.

The use of global variables is discouraged in Julia for performance reasons (see the Julia documentation). But what about global constants? I'm quite new to the Julia language, but I bet that it would not be difficult for the compiler to recognize a const and optimize accordingly.

A C programmer may be accustomed to defining constants as macros, as in

#define PI 3.14159265

In numerical computing, the global constants needed are usually of numeric types, so macros suffice for most cases. In fact, a macro may sometimes be a better choice because the compiler knows its value ahead of time and can do some optimization (e.g., this).

(Caveat: With the -O3 flag in GCC, a static const is treated no differently from a macro. See this.)

Does this analogy apply to Julia? Below I test it with implementations of a simplistic function that multiplies its argument by a constant.

using BenchmarkTools, Compat

const g = 9.80665

macro const_g()
    return :(9.80665)
end

function f_global(x)
    return g * x
end

function f_macro(x)
    return (@const_g) * x
end

The microbenchmark test shows

julia> x_test = rand(10^5);

julia> @benchmark f_global($x_test)

BenchmarkTools.Trial:
  memory estimate:  781.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     58.201 μs (0.00% GC)
  median time:      73.425 μs (0.00% GC)
  mean time:        251.767 μs (11.68% GC)
  maximum time:     2.083 ms (67.26% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark f_macro($x_test)

BenchmarkTools.Trial:
  memory estimate:  781.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     59.888 μs (0.00% GC)
  median time:      73.803 μs (0.00% GC)
  mean time:        253.644 μs (11.67% GC)
  maximum time:     2.867 ms (72.65% GC)
  --------------
  samples:          10000
  evals/sample:     1

The two versions show essentially the same execution time and exactly the same memory allocation.

And how does Julia treat the two functions?

julia> @code_llvm f_global(42.)

define double @julia_f_global_62616(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, 9.806650e+00
  ret double %1
}

julia> @code_llvm f_macro(42.)

define double @julia_f_macro_62620(double) #0 !dbg !5 {
top:
  %1 = fmul double %0, 9.806650e+00
  ret double %1
}

The generated LLVM code for a scalar input is identical: in both cases the compiler replaces the global constant g and the macro @const_g with the value they represent and emits a single fmul instruction. Thus, defining a constant as a macro does not grant you any performance gain. (LLVM IR is not yet machine code, but identical IR is lowered to identical native code, which @code_native can confirm.) In terms of style, a global const is preferable because it is clear and concise.
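To check the point about machine code, the native code can be inspected as well. The following is a minimal sketch that assumes Julia 1.x rather than the v0.6 used for the tests above, so the exact listings will differ from machine to machine. It redefines g, @const_g, f_global and f_macro so that it can be run on its own.

```julia
using InteractiveUtils  # provides @code_native outside the REPL (Julia 1.x)

const g = 9.80665

macro const_g()
    return :(9.80665)
end

f_global(x) = g * x
f_macro(x) = (@const_g) * x

# The macro is expanded before the function is compiled, so the body of
# f_macro contains the Float64 literal itself.
expanded = @macroexpand @const_g
println(expanded, " :: ", typeof(expanded))

# Print the native code for a Float64 argument. The two listings are expected
# to differ only in symbol names and debug comments.
@code_native f_global(42.0)
@code_native f_macro(42.0)
```

Comparing the two listings by eye is usually enough here, since each function body is a single floating-point multiplication.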

How about using a global variable?

gvar = 9.80665  # <- variable in the REPL has implicit `global`

function f_globalvar(x)
    return gvar * x
end

The microbenchmark test result is

julia> @benchmark f_globalvar($x_test)

BenchmarkTools.Trial:
  memory estimate:  781.33 KiB
  allocs estimate:  2
  --------------
  minimum time:     63.804 μs (0.00% GC)
  median time:      77.266 μs (0.00% GC)
  mean time:        264.945 μs (12.06% GC)
  maximum time:     2.175 ms (71.80% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @code_llvm f_globalvar(42.)

define i8** @julia_f_globalvar_62652(double) #0 !dbg !5 {
top:
  %1 = call i8**** @jl_get_ptls_states() #3
  %2 = alloca [5 x i8**], align 8
  %.sub = getelementptr inbounds [5 x i8**], [5 x i8**]* %2, i64 0, i64 0
  %3 = getelementptr [5 x i8**], [5 x i8**]* %2, i64 0, i64 2
  %4 = bitcast i8*** %3 to i8*
  call void @llvm.memset.p0i8.i32(i8* %4, i8 0, i32 24, i32 8, i1 false)
  %5 = bitcast [5 x i8**]* %2 to i64*
  store i64 6, i64* %5, align 8
  %6 = bitcast i8**** %1 to i64*
  %7 = load i64, i64* %6, align 8
  %8 = getelementptr [5 x i8**], [5 x i8**]* %2, i64 0, i64 1
  %9 = bitcast i8*** %8 to i64*
  store i64 %7, i64* %9, align 8
  store i8*** %.sub, i8**** %1, align 8
  %10 = getelementptr [5 x i8**], [5 x i8**]* %2, i64 0, i64 4
  %11 = getelementptr [5 x i8**], [5 x i8**]* %2, i64 0, i64 3
  %12 = load i64, i64* inttoptr (i64 4492535192 to i64*), align 8
  %13 = bitcast i8*** %11 to i64*
  store i64 %12, i64* %13, align 8
  store i8** inttoptr (i64 4431924168 to i8**), i8*** %3, align 8
  %14 = bitcast i8**** %1 to i8*
  %15 = call i8** @jl_gc_pool_alloc(i8* %14, i32 1384, i32 16)
  %16 = getelementptr i8*, i8** %15, i64 -1
  %17 = bitcast i8** %16 to i8***
  store i8** inttoptr (i64 4431301936 to i8**), i8*** %17, align 8
  %18 = bitcast i8** %15 to double*
  store double %0, double* %18, align 8
  store i8** %15, i8*** %10, align 8
  %19 = call i8** @jl_apply_generic(i8*** %3, i32 3)
  %20 = load i64, i64* %9, align 8
  store i64 %20, i64* %6, align 8
  ret i8** %19
}

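The function that uses a global variable is only slightly slower in this microbenchmark, but the generated LLVM code is far more complicated: because the type of gvar can change at any time, the compiler cannot infer it, so it boxes the argument and falls back to dynamic dispatch (the jl_apply_generic call). The benchmark hides most of this cost because each call pays for the dispatch once and then spends nearly all of its time multiplying a 10^5-element array. When the global is read inside a tight loop over scalars, the slowdown can be nontrivial.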
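One way to expose the cost is to read the global inside a loop over scalars, so that the lookup and dispatch happen on every iteration. The sketch below assumes Julia 1.x; sum_const and sum_globalvar are hypothetical helper names introduced only for this illustration, and no timings are quoted because they depend on the machine and the Julia version.

```julia
using BenchmarkTools

const g = 9.80665   # constant binding: value and type are known to the compiler
gvar = 9.80665      # non-constant global: its type may change at any time

# Hypothetical helpers: accumulate c * x element by element, so the global
# is read once per iteration rather than once per call.
function sum_const(xs)
    s = 0.0
    for x in xs
        s += g * x
    end
    return s
end

function sum_globalvar(xs)
    s = 0.0
    for x in xs
        s += gvar * x
    end
    return s
end

xs = rand(10^5)

# Both versions compute the same value; only the generated code differs.
@assert sum_const(xs) ≈ sum_globalvar(xs)

# Inferred return types: the const version is expected to report Float64,
# the global-variable version a non-concrete type such as Any.
println(Base.return_types(sum_const, (Vector{Float64},)))
println(Base.return_types(sum_globalvar, (Vector{Float64},)))

@btime sum_const($xs)
@btime sum_globalvar($xs)
```

The memory and allocation counts reported by @btime are worth comparing as well as the times, since boxing intermediate values in the global-variable version shows up as extra allocations.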

Summary: Avoid global variables, but it is fine to use global constants. A macro gives no performance gain over a global constant here, because Julia's compiler inlines the value of a const just as it does the literal produced by a macro. For style and code maintenance, a global constant is preferable to a macro that represents a constant.