Skip to content

Profiling

Quick profiling in your terminal

Note

This is only meant to be used for quick profiling or programmatically accessing the profiling results. For more detailed and GUI friendly profiling proceed to the next section.

Simply replace the use of Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. We will automatically compile the function if it is not already a Reactant compiled function (with sync=true).

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.@time linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1778690219.584913    4335 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1778690219.584955    4335 profiler_session.cc:134] Profiler session started.
I0000 00:00:1778690219.585294    4335 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1778690219.585904    4335 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59/runnervmeorf1.xplane.pb
I0000 00:00:1778690219.586047    4335 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59

I0000 00:00:1778690219.586163    4335 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59/runnervmeorf1.trace.json.gz
I0000 00:00:1778690219.586179    4335 profiler_session.cc:152] Profiler session tear down.
Debug: Starting XProf gRPC server...
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:598
Debug: Initializing XProf stubs for worker service at 0.0.0.0:42911
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:397
I0000 00:00:1778690219.601277    4335 stub_factory.cc:163] Created gRPC channel for address: 0.0.0.0:42911
Debug: Starting XProf gRPC server on port 42911
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:413
I0000 00:00:1778690219.601707    4335 grpc_server.cc:94] Server listening on 0.0.0.0:42911 with max_concurrent_requests 1
I0000 00:00:1778690219.610608    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
I0000 00:00:1778690219.610622    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1778690219.610625    4335 memory_profile_processor.cc:47] Processing memory profile for host: runnervmeorf1
I0000 00:00:1778690219.610923    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 307.713us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
I0000 00:00:1778690219.624998    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
I0000 00:00:1778690219.625018    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1778690219.625022    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690219.625069    4335 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1778690219.625072    4335 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1778690219.625076    4335 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1778690219.625206    4335 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1778690219.625215    4335 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690219.625471    4335 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1778690219.625730    4335 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1778690219.626079    4335 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1778690219.626093    4335 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690219.626273    4335 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1778690219.626539    4335 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1778690219.629253    4335 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1778690219.629269    4335 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690219.629432    4335 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1778690219.629769    4335 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1778690219.629778    4335 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1778690219.629983    4335 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 222 bytes (0.000211716 MiB).
I0000 00:00:1778690219.630052    4335 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1778690219.630065    4335 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 4.995269ms
I0000 00:00:1778690219.630072    4335 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1778690219.630075    4335 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.693us
I0000 00:00:1778690219.630077    4335 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1778690219.630081    4335 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 2.845us
I0000 00:00:1778690219.630083    4335 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 5.01188ms
I0000 00:00:1778690219.630086    4335 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1778690219.630170    4335 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1778690219.630174    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 5.152712ms
I0000 00:00:1778690219.630280    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 5.266194ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1778690219.901841    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
I0000 00:00:1778690219.901862    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1778690219.901865    4335 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1778690219.901868    4335 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1778690219.901870    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690219.901932    4335 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1778690219.901980    4335 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1778690219.901983    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 113.281us
I0000 00:00:1778690219.901987    4335 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1778690219.901989    4335 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1778690219.901995    4335 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1778690219.902006    4335 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1778690219.902160    4335 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1778690219.902167    4335 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1778690219.902269    4335 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1778690219.902278    4335 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1778690219.902280    4335 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1778690219.902282    4335 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 293.446us
I0000 00:00:1778690219.902285    4335 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1778690219.902293    4335 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1778690219.902722    4335 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1778690219.902729    4335 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1778690219.902952    4335 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1778690219.902960    4335 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 1.09605ms
I0000 00:00:1778690219.902974    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 1.116168ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_swmgRd/plugins/profile/2026_05_13_16_36_59
  runtime: 0.00029140s
  compile time: 3.83738834s
julia
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00003341s, 
    compile_time = 0.12188423s, )

Note that the information returned depends on the backend. Specifically CUDA and TPU backends provide more detailed information regarding memory usage and allocation (something like the following will be displayed on GPUs):

julia
AggregateProfilingResult(
    runtime = 0.00003829s, 
    compile_time = 2.18053260s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514931,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s,  # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Additionally for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information regarding each of the kernels executed.

julia
Reactant.@profile linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
I0000 00:00:1778690220.600801    4335 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1778690220.600919    4335 profiler_session.cc:134] Profiler session started.
I0000 00:00:1778690220.601007    4335 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1778690220.601492    4335 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00/runnervmeorf1.xplane.pb
I0000 00:00:1778690220.601645    4335 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00

I0000 00:00:1778690220.601759    4335 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00/runnervmeorf1.trace.json.gz
I0000 00:00:1778690220.601775    4335 profiler_session.cc:152] Profiler session tear down.
I0000 00:00:1778690220.601832    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.601838    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1778690220.601840    4335 memory_profile_processor.cc:47] Processing memory profile for host: runnervmeorf1
I0000 00:00:1778690220.601975    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 141.664us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.601999    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.602003    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1778690220.602005    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690220.602033    4335 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1778690220.602035    4335 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1778690220.602038    4335 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1778690220.602110    4335 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1778690220.602118    4335 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690220.602311    4335 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1778690220.602649    4335 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1778690220.603053    4335 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1778690220.603078    4335 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690220.603215    4335 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1778690220.603449    4335 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1778690220.603893    4335 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1778690220.603916    4335 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1778690220.604047    4335 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1778690220.604316    4335 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1778690220.604325    4335 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1778690220.604522    4335 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 265 bytes (0.000252724 MiB).
I0000 00:00:1778690220.604600    4335 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1778690220.604638    4335 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 2.605711ms
I0000 00:00:1778690220.604643    4335 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1778690220.604646    4335 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.513us
I0000 00:00:1778690220.604648    4335 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1778690220.604653    4335 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 3.566us
I0000 00:00:1778690220.604655    4335 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 2.620288ms
I0000 00:00:1778690220.604657    4335 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1778690220.604708    4335 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1778690220.604712    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 2.706999ms
I0000 00:00:1778690220.604734    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 2.733779ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1778690220.605046    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.605053    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1778690220.605055    4335 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1778690220.605057    4335 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1778690220.605058    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690220.605091    4335 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1778690220.605125    4335 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1778690220.605127    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 69.189us
I0000 00:00:1778690220.605129    4335 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1778690220.605132    4335 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1778690220.605135    4335 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1778690220.605139    4335 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1778690220.605168    4335 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1778690220.605172    4335 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1778690220.605177    4335 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1778690220.605182    4335 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1778690220.605185    4335 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1778690220.605186    4335 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 55.062us
I0000 00:00:1778690220.605188    4335 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1778690220.605192    4335 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1778690220.605467    4335 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1778690220.605474    4335 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1778690220.605670    4335 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1778690220.605679    4335 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 624.954us
I0000 00:00:1778690220.605690    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 640.482us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.699308    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: kernel_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.699338    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: kernel_stats
I0000 00:00:1778690220.699342    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690220.699425    4335 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1778690220.699486    4335 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1778690220.699489    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 147.635us
I0000 00:00:1778690220.699555    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool kernel_stats: 222.313us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.829825    4335 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: framework_op_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00
I0000 00:00:1778690220.829857    4335 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: framework_op_stats
I0000 00:00:1778690220.829861    4335 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1778690220.829921    4335 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1778690220.829981    4335 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1778690220.829983    4335 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 123.099us
I0000 00:00:1778690220.830129    4335 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool framework_op_stats: 275.893us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_gYiQ8l/plugins/profile/2026_05_13_16_37_00

╔================================================================================╗
║ SUMMARY                                                                        ║
╚================================================================================╝

AggregateProfilingResult(
    runtime = 0.00006594s,
    compile_time = 0.11342665s,  # time spent compiling by Reactant
)

On GPUs this would look something like the following:

julia
================================================================================
║ KERNEL STATISTICS                                                              ║
================================================================================

┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│       Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy %
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │           10.00000250s │  0.00000250s │  0.00000250s │  0.00000250s │    2.000 KiB │    64,1,11,1,1 │          ✗ │      100.0%
│   loop_add_fusion │           10.00000131s │  0.00000131s │  0.00000131s │  0.00000131s │      0 bytes │    20,1,11,1,1 │          ✗ │       31.2%
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘

================================================================================
║ FRAMEWORK OP STATISTICS                                                        ║
================================================================================

┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│         Operation │    Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │    FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │      Device │           10.00000250s │   0.00000250s │   65.55%1.82 GB/s │  1.6 GFLOP/s │      HBM │
+/add │     add │      Device │           10.00000131s │   0.00000131s │   34.45%0.14 GB/s │ 0.05 GFLOP/s │      HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘

================================================================================
║ SUMMARY                                                                        ║
================================================================================

AggregateProfilingResult(
    runtime = 0.00005622s, 
    compile_time = 2.32802137s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514564,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s,  # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Capturing traces

When running Reactant, it is possible to capture traces using the XLA profiler. These traces can provide information about where the XLA specific parts of program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even though the final execution is aimed to run on another device such as GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.

Let's setup a simple function which we can then profile

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)

The profiler can be accessed using the Reactant.with_profiler function.

julia
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
  6.98495    1.55487
  5.81096   -3.9968
 17.0579    -3.4817
 -1.61731   19.3494
 -4.37275  -10.4042
 -5.36059  -14.3759
 -3.0712     2.68685
  3.3971    -7.40288
 10.5107    11.2151
 11.305      2.70275

Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.

Note

For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.

Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.

julia
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end

Note

It is recommended to use the Chrome browser to open the perfetto URL.

XProf

XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:

bash
pip install xprof # or xprof-nightly

Launching xprof is then as simple as:

bash
xprof --logdir=./

which will then make the xprof interface available on port :8791 by default.

Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.

First install tensorboard and its profiler plugin:

bash
pip install tensorboard tensorboard-plugin-profile

And then run the following in the folder where the plugins folder was generated:

bash
tensorboard --logdir ./

Adding Custom Annotations

By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.

julia
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end

The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not activated, then the custom annotations have no effect and can therefore always be activated.