Skip to content

Profiling

Quick profiling in your terminal

Note

This is only meant to be used for quick profiling or programmatically accessing the profiling results. For more detailed and GUI friendly profiling proceed to the next section.

Simply replace the use of Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. We will automatically compile the function if it is not already a Reactant compiled function (with sync=true).

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.@time linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:516
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1773433311.507267    4143 profiler_session.cc:117] Profiler session initializing.
I0000 00:00:1773433311.507306    4143 profiler_session.cc:132] Profiler session started.
I0000 00:00:1773433311.507613    4143 profiler_session.cc:81] Profiler session collecting data.
I0000 00:00:1773433311.508175    4143 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51/runnervm46oaq.xplane.pb
I0000 00:00:1773433311.508319    4143 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51

I0000 00:00:1773433311.508440    4143 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51/runnervm46oaq.trace.json.gz
I0000 00:00:1773433311.508457    4143 profiler_session.cc:150] Profiler session tear down.
Debug: Starting XProf gRPC server...
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:490
Debug: Initializing XProf stubs for worker service at 0.0.0.0:38129
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:289
I0000 00:00:1773433311.523180    4143 stub_factory.cc:163] Created gRPC channel for address: 0.0.0.0:38129
Debug: Starting XProf gRPC server on port 38129
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:305
I0000 00:00:1773433311.523634    4143 grpc_server.cc:94] Server listening on 0.0.0.0:38129 with max_concurrent_requests 1
I0000 00:00:1773433311.532373    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
I0000 00:00:1773433311.532389    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1773433311.532392    4143 memory_profile_processor.cc:47] Processing memory profile for host: runnervm46oaq
I0000 00:00:1773433311.532675    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 291.603us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
I0000 00:00:1773433311.545539    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
I0000 00:00:1773433311.545554    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1773433311.545557    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433311.545606    4143 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1773433311.545609    4143 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1773433311.545614    4143 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1773433311.545724    4143 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1773433311.545733    4143 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433311.545988    4143 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1773433311.546318    4143 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1773433311.546623    4143 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1773433311.546637    4143 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433311.546842    4143 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1773433311.547112    4143 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1773433311.549642    4143 xplane_to_op_stats.cc:402] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1773433311.549657    4143 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433311.549841    4143 xplane_to_op_stats.cc:458] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1773433311.550127    4143 xplane_to_op_stats.cc:414] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1773433311.550137    4143 xplane_to_op_stats.cc:419] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1773433311.550330    4143 xplane_to_op_stats.cc:684] ConvertXSpaceToOpStats: Final OpStats size: 220 bytes (0.000209808 MiB).
I0000 00:00:1773433311.550397    4143 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1773433311.550410    4143 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 4.802416ms
I0000 00:00:1773433311.550416    4143 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1773433311.550420    4143 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.473us
I0000 00:00:1773433311.550422    4143 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1773433311.550426    4143 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 2.885us
I0000 00:00:1773433311.550429    4143 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 4.819898ms
I0000 00:00:1773433311.550432    4143 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1773433311.550510    4143 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1773433311.550515    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 4.958036ms
I0000 00:00:1773433311.550619    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 5.069623ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:706
I0000 00:00:1773433311.815017    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
I0000 00:00:1773433311.815038    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1773433311.815041    4143 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1773433311.815044    4143 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1773433311.815046    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433311.815107    4143 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773433311.815158    4143 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773433311.815161    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 114.683us
I0000 00:00:1773433311.815165    4143 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1773433311.815168    4143 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1773433311.815174    4143 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1773433311.815179    4143 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1773433311.815336    4143 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1773433311.815344    4143 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1773433311.815441    4143 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1773433311.815452    4143 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1773433311.815454    4143 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1773433311.815456    4143 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 289.179us
I0000 00:00:1773433311.815459    4143 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1773433311.815467    4143 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1773433311.815905    4143 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1773433311.815915    4143 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1773433311.816125    4143 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1773433311.816133    4143 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 1.092125ms
I0000 00:00:1773433311.816145    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 1.112192ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_INVa4P/plugins/profile/2026_03_13_20_21_51
  runtime: 0.00024733s
  compile time: 3.56487194s
julia
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00002677s, 
    compile_time = 0.12766734s, )

Note that the information returned depends on the backend. Specifically CUDA and TPU backends provide more detailed information regarding memory usage and allocation (something like the following will be displayed on GPUs):

julia
AggregateProfilingResult(
    runtime = 0.00003829s, 
    compile_time = 2.18053260s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514931,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s,  # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Additionally for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information regarding each of the kernels executed.

julia
Reactant.@profile linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:516
I0000 00:00:1773433312.546424    4143 profiler_session.cc:117] Profiler session initializing.
I0000 00:00:1773433312.546501    4143 profiler_session.cc:132] Profiler session started.
I0000 00:00:1773433312.546592    4143 profiler_session.cc:81] Profiler session collecting data.
I0000 00:00:1773433312.547013    4143 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52/runnervm46oaq.xplane.pb
I0000 00:00:1773433312.547160    4143 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52

I0000 00:00:1773433312.547274    4143 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52/runnervm46oaq.trace.json.gz
I0000 00:00:1773433312.547290    4143 profiler_session.cc:150] Profiler session tear down.
I0000 00:00:1773433312.547350    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.547356    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1773433312.547358    4143 memory_profile_processor.cc:47] Processing memory profile for host: runnervm46oaq
I0000 00:00:1773433312.547490    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 137.927us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.547511    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.547515    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1773433312.547517    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433312.547547    4143 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1773433312.547550    4143 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1773433312.547553    4143 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1773433312.547621    4143 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1773433312.547628    4143 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433312.547851    4143 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1773433312.548139    4143 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1773433312.548467    4143 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1773433312.548484    4143 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433312.548632    4143 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1773433312.548909    4143 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1773433312.549269    4143 xplane_to_op_stats.cc:402] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1773433312.549287    4143 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773433312.549440    4143 xplane_to_op_stats.cc:458] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1773433312.549698    4143 xplane_to_op_stats.cc:414] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1773433312.549708    4143 xplane_to_op_stats.cc:419] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1773433312.549870    4143 xplane_to_op_stats.cc:684] ConvertXSpaceToOpStats: Final OpStats size: 262 bytes (0.000249863 MiB).
I0000 00:00:1773433312.549940    4143 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1773433312.549965    4143 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 2.417491ms
I0000 00:00:1773433312.549970    4143 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1773433312.549973    4143 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.623us
I0000 00:00:1773433312.549976    4143 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1773433312.549981    4143 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 3.417us
I0000 00:00:1773433312.549983    4143 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 2.433712ms
I0000 00:00:1773433312.549986    4143 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1773433312.550045    4143 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1773433312.550050    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 2.532847ms
I0000 00:00:1773433312.550071    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 2.558955ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:706
I0000 00:00:1773433312.550373    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.550380    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1773433312.550383    4143 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1773433312.550385    4143 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1773433312.550387    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433312.550421    4143 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773433312.550457    4143 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773433312.550459    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 72.766us
I0000 00:00:1773433312.550462    4143 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1773433312.550464    4143 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1773433312.550469    4143 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1773433312.550472    4143 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1773433312.550501    4143 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1773433312.550506    4143 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1773433312.550511    4143 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1773433312.550516    4143 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1773433312.550519    4143 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1773433312.550520    4143 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 56.445us
I0000 00:00:1773433312.550523    4143 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1773433312.550527    4143 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1773433312.550764    4143 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1773433312.550772    4143 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1773433312.550981    4143 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1773433312.550991    4143 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 608.233us
I0000 00:00:1773433312.551001    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 623.591us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.644960    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: kernel_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.644993    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: kernel_stats
I0000 00:00:1773433312.644997    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433312.645059    4143 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773433312.645116    4143 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773433312.645119    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 122.509us
I0000 00:00:1773433312.645177    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool kernel_stats: 189.823us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.777128    4143 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: framework_op_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52
I0000 00:00:1773433312.777158    4143 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: framework_op_stats
I0000 00:00:1773433312.777162    4143 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773433312.777222    4143 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773433312.777272    4143 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773433312.777276    4143 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 113.862us
I0000 00:00:1773433312.777408    4143 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool framework_op_stats: 253.442us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_tulxb1/plugins/profile/2026_03_13_20_21_52

╔================================================================================╗
║ SUMMARY                                                                        ║
╚================================================================================╝

AggregateProfilingResult(
    runtime = 0.00006938s,
    compile_time = 0.12139655s,  # time spent compiling by Reactant
)

On GPUs this would look something like the following:

julia
================================================================================
║ KERNEL STATISTICS                                                              ║
================================================================================

┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│       Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy %
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │           10.00000250s │  0.00000250s │  0.00000250s │  0.00000250s │    2.000 KiB │    64,1,11,1,1 │          ✗ │      100.0%
│   loop_add_fusion │           10.00000131s │  0.00000131s │  0.00000131s │  0.00000131s │      0 bytes │    20,1,11,1,1 │          ✗ │       31.2%
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘

================================================================================
║ FRAMEWORK OP STATISTICS                                                        ║
================================================================================

┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│         Operation │    Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │    FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │      Device │           10.00000250s │   0.00000250s │   65.55%1.82 GB/s │  1.6 GFLOP/s │      HBM │
+/add │     add │      Device │           10.00000131s │   0.00000131s │   34.45%0.14 GB/s │ 0.05 GFLOP/s │      HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘

================================================================================
║ SUMMARY                                                                        ║
================================================================================

AggregateProfilingResult(
    runtime = 0.00005622s, 
    compile_time = 2.32802137s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514564,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s,  # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Capturing traces

When running Reactant, it is possible to capture traces using the XLA profiler. These traces can provide information about where the XLA specific parts of program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even though the final execution is aimed to run on another device such as GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.

Let's setup a simple function which we can then profile

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)

The profiler can be accessed using the Reactant.with_profiler function.

julia
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
   9.2039   -22.417
  20.4223   -12.6139
 -13.6102    -4.97859
   8.70096   12.7834
  -3.66423  -14.2185
   4.59465   10.6098
  -2.26824   12.5408
 -12.041     12.6127
   4.95346    3.13896
  20.1539    -3.73828

Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.

Note

For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.

Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.

julia
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end

Note

It is recommended to use the Chrome browser to open the perfetto URL.

Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.

First install tensorboard and its profiler plugin:

bash
pip install tensorboard tensorboard-plugin-profile

And then run the following in the folder where the plugins folder was generated:

bash
tensorboard --logdir ./

Adding Custom Annotations

By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.

julia
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end

The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not activated, then the custom annotations have no effect and can therefore always be activated.