Skip to content

Profiling

Quick profiling in your terminal

Note

This is only meant to be used for quick profiling or programmatically accessing the profiling results. For more detailed and GUI friendly profiling proceed to the next section.

Simply replace the use of Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. We will automatically compile the function if it is not already a Reactant compiled function (with sync=true).

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.@time linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782337961.097347    4221 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1782337961.097398    4221 profiler_session.cc:134] Profiler session started.
I0000 00:00:1782337961.097678    4221 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1782337961.098270    4221 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41/runnervm7b5n9.xplane.pb
I0000 00:00:1782337961.098412    4221 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41

I0000 00:00:1782337961.098520    4221 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41/runnervm7b5n9.trace.json.gz
I0000 00:00:1782337961.098535    4221 profiler_session.cc:152] Profiler session tear down.
Debug: Starting XProf gRPC server...
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:598
Debug: Initializing XProf stubs for worker service at 0.0.0.0:38711
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:397
I0000 00:00:1782337961.113407    4221 stub_factory.cc:163] Created gRPC channel for address: 0.0.0.0:38711
Debug: Starting XProf gRPC server on port 38711
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:413
I0000 00:00:1782337961.113803    4221 grpc_server.cc:94] Server listening on 0.0.0.0:38711 with max_concurrent_requests 1
I0000 00:00:1782337961.122735    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
I0000 00:00:1782337961.122748    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1782337961.122751    4221 memory_profile_processor.cc:47] Processing memory profile for host: runnervm7b5n9
I0000 00:00:1782337961.123030    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 286.327us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
I0000 00:00:1782337961.136196    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
I0000 00:00:1782337961.136212    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1782337961.136215    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337961.136261    4221 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1782337961.136264    4221 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1782337961.136268    4221 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1782337961.136389    4221 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1782337961.136397    4221 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337961.136631    4221 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1782337961.136939    4221 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1782337961.137279    4221 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1782337961.137292    4221 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337961.137426    4221 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1782337961.137671    4221 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1782337961.140438    4221 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1782337961.140453    4221 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337961.140594    4221 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1782337961.140866    4221 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1782337961.140875    4221 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1782337961.141083    4221 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 221 bytes (0.000210762 MiB).
I0000 00:00:1782337961.141185    4221 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1782337961.141203    4221 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 4.941117ms
I0000 00:00:1782337961.141210    4221 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1782337961.141213    4221 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.252us
I0000 00:00:1782337961.141215    4221 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1782337961.141219    4221 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 3.045us
I0000 00:00:1782337961.141221    4221 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 4.957648ms
I0000 00:00:1782337961.141224    4221 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1782337961.141301    4221 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1782337961.141304    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 5.089766ms
I0000 00:00:1782337961.141407    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 5.199231ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1782337961.403095    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
I0000 00:00:1782337961.403115    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1782337961.403118    4221 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1782337961.403120    4221 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1782337961.403122    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337961.403202    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782337961.403252    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782337961.403255    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 133.35us
I0000 00:00:1782337961.403259    4221 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1782337961.403261    4221 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1782337961.403273    4221 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1782337961.403278    4221 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1782337961.403428    4221 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1782337961.403434    4221 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1782337961.403530    4221 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1782337961.403539    4221 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1782337961.403541    4221 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1782337961.403543    4221 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 281.979us
I0000 00:00:1782337961.403545    4221 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1782337961.403552    4221 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1782337961.403934    4221 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1782337961.403941    4221 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1782337961.404200    4221 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1782337961.404212    4221 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 1.094875ms
I0000 00:00:1782337961.404227    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 1.116426ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_aloGwZ/plugins/profile/2026_06_24_21_52_41
  runtime: 0.00022928s
  compile time: 3.93722964s
julia
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00003061s, 
    compile_time = 0.11644153s, )

Note that the information returned depends on the backend. Specifically CUDA and TPU backends provide more detailed information regarding memory usage and allocation (something like the following will be displayed on GPUs):

julia
AggregateProfilingResult(
    runtime = 0.00003829s, 
    compile_time = 2.18053260s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514931,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s,  # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Additionally for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information regarding each of the kernels executed.

julia
Reactant.@profile linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
I0000 00:00:1782337962.085271    4221 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1782337962.085401    4221 profiler_session.cc:134] Profiler session started.
I0000 00:00:1782337962.085503    4221 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1782337962.085940    4221 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42/runnervm7b5n9.xplane.pb
I0000 00:00:1782337962.086086    4221 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42

I0000 00:00:1782337962.086239    4221 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42/runnervm7b5n9.trace.json.gz
I0000 00:00:1782337962.086257    4221 profiler_session.cc:152] Profiler session tear down.
I0000 00:00:1782337962.086313    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.086319    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1782337962.086321    4221 memory_profile_processor.cc:47] Processing memory profile for host: runnervm7b5n9
I0000 00:00:1782337962.086453    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 139.201us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.086474    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.086478    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1782337962.086480    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337962.086507    4221 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1782337962.086509    4221 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1782337962.086512    4221 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1782337962.086586    4221 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1782337962.086594    4221 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337962.086784    4221 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1782337962.087120    4221 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1782337962.087586    4221 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1782337962.087609    4221 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337962.087755    4221 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1782337962.087982    4221 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1782337962.088481    4221 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1782337962.088506    4221 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782337962.088642    4221 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1782337962.088910    4221 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1782337962.088917    4221 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1782337962.089084    4221 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 265 bytes (0.000252724 MiB).
I0000 00:00:1782337962.089201    4221 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1782337962.089243    4221 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 2.736295ms
I0000 00:00:1782337962.089248    4221 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1782337962.089251    4221 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.553us
I0000 00:00:1782337962.089253    4221 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1782337962.089258    4221 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 3.497us
I0000 00:00:1782337962.089260    4221 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 2.751012ms
I0000 00:00:1782337962.089263    4221 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1782337962.089318    4221 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1782337962.089322    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 2.841943ms
I0000 00:00:1782337962.089344    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 2.867822ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1782337962.089649    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.089656    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1782337962.089658    4221 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1782337962.089660    4221 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1782337962.089661    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337962.089694    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782337962.089728    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782337962.089730    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 68.97us
I0000 00:00:1782337962.089732    4221 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1782337962.089735    4221 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1782337962.089738    4221 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1782337962.089742    4221 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1782337962.089770    4221 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1782337962.089774    4221 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1782337962.089779    4221 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1782337962.089784    4221 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1782337962.089786    4221 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1782337962.089788    4221 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 53.821us
I0000 00:00:1782337962.089790    4221 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1782337962.089794    4221 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1782337962.090057    4221 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1782337962.090064    4221 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1782337962.090291    4221 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1782337962.090300    4221 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 643.017us
I0000 00:00:1782337962.090311    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 658.145us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.179589    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: kernel_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.179616    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: kernel_stats
I0000 00:00:1782337962.179620    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337962.179680    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782337962.179736    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782337962.179738    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 118.723us
I0000 00:00:1782337962.179801    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool kernel_stats: 189.886us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.336757    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: framework_op_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42
I0000 00:00:1782337962.336801    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: framework_op_stats
I0000 00:00:1782337962.336805    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782337962.336863    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782337962.336910    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782337962.336913    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 108.033us
I0000 00:00:1782337962.337046    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool framework_op_stats: 248.797us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_xHwTjJ/plugins/profile/2026_06_24_21_52_42

╔================================================================================╗
║ SUMMARY                                                                        ║
╚================================================================================╝

AggregateProfilingResult(
    runtime = 0.00007937s,
    compile_time = 0.11128608s,  # time spent compiling by Reactant
)

On GPUs this would look something like the following:

julia
================================================================================
║ KERNEL STATISTICS                                                              ║
================================================================================

┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│       Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy %
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │           10.00000250s │  0.00000250s │  0.00000250s │  0.00000250s │    2.000 KiB │    64,1,11,1,1 │          ✗ │      100.0%
│   loop_add_fusion │           10.00000131s │  0.00000131s │  0.00000131s │  0.00000131s │      0 bytes │    20,1,11,1,1 │          ✗ │       31.2%
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘

================================================================================
║ FRAMEWORK OP STATISTICS                                                        ║
================================================================================

┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│         Operation │    Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │    FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │      Device │           10.00000250s │   0.00000250s │   65.55%1.82 GB/s │  1.6 GFLOP/s │      HBM │
+/add │     add │      Device │           10.00000131s │   0.00000131s │   34.45%0.14 GB/s │ 0.05 GFLOP/s │      HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘

================================================================================
║ SUMMARY                                                                        ║
================================================================================

AggregateProfilingResult(
    runtime = 0.00005622s, 
    compile_time = 2.32802137s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514564,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s,  # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Capturing traces

When running Reactant, it is possible to capture traces using the XLA profiler. These traces can provide information about where the XLA specific parts of program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even though the final execution is aimed to run on another device such as GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.

Let's setup a simple function which we can then profile

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)

The profiler can be accessed using the Reactant.with_profiler function.

julia
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
  12.8371    14.8446
   8.64564    1.72928
   4.23963    5.06735
   8.4381   -15.3863
   5.65091    1.78899
  21.5165     2.05188
 -13.6244    -8.03165
   3.94115   11.3669
   2.1578   -24.5417
   7.70937    1.5869

Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.

Note

For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.

Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.

julia
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end

Note

It is recommended to use the Chrome browser to open the perfetto URL.

XProf

XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:

bash
pip install xprof # or xprof-nightly

Launching xprof is then as simple as:

bash
xprof --logdir=./

which will then make the xprof interface available on port :8791 by default.

Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.

First install tensorboard and its profiler plugin:

bash
pip install tensorboard tensorboard-plugin-profile

And then run the following in the folder where the plugins folder was generated:

bash
tensorboard --logdir ./

Adding Custom Annotations

By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.

julia
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end

The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not activated, then the custom annotations have no effect and can therefore always be activated.