
Profiling

Quick profiling in your terminal

Note

This is meant only for quick profiling or for accessing the profiling results programmatically. For more detailed, GUI-friendly profiling, proceed to the next section.

Replace Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. If the function is not already a Reactant-compiled function, it is compiled automatically (with sync=true).

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.@time linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:516
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1773514758.298973    4155 profiler_session.cc:117] Profiler session initializing.
I0000 00:00:1773514758.299013    4155 profiler_session.cc:132] Profiler session started.
I0000 00:00:1773514758.299328    4155 profiler_session.cc:81] Profiler session collecting data.
I0000 00:00:1773514758.300041    4155 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18/runnervm46oaq.xplane.pb
I0000 00:00:1773514758.300189    4155 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18

I0000 00:00:1773514758.300314    4155 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18/runnervm46oaq.trace.json.gz
I0000 00:00:1773514758.300331    4155 profiler_session.cc:150] Profiler session tear down.
Debug: Starting XProf gRPC server...
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:490
Debug: Initializing XProf stubs for worker service at 0.0.0.0:46853
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:289
I0000 00:00:1773514758.315209    4155 stub_factory.cc:163] Created gRPC channel for address: 0.0.0.0:46853
Debug: Starting XProf gRPC server on port 46853
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:305
I0000 00:00:1773514758.315613    4155 grpc_server.cc:94] Server listening on 0.0.0.0:46853 with max_concurrent_requests 1
I0000 00:00:1773514758.324608    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
I0000 00:00:1773514758.324625    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1773514758.324628    4155 memory_profile_processor.cc:47] Processing memory profile for host: runnervm46oaq
I0000 00:00:1773514758.324957    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 337.775us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
I0000 00:00:1773514758.338057    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
I0000 00:00:1773514758.338077    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1773514758.338081    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514758.338135    4155 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1773514758.338138    4155 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1773514758.338144    4155 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1773514758.338273    4155 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1773514758.338282    4155 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514758.338523    4155 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1773514758.338874    4155 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1773514758.339463    4155 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1773514758.339484    4155 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514758.339667    4155 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1773514758.339942    4155 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1773514758.342670    4155 xplane_to_op_stats.cc:402] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1773514758.342696    4155 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514758.342855    4155 xplane_to_op_stats.cc:458] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1773514758.343123    4155 xplane_to_op_stats.cc:414] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1773514758.343133    4155 xplane_to_op_stats.cc:419] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1773514758.343381    4155 xplane_to_op_stats.cc:684] ConvertXSpaceToOpStats: Final OpStats size: 325 bytes (0.000309944 MiB).
I0000 00:00:1773514758.343486    4155 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1773514758.343521    4155 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 5.385355ms
I0000 00:00:1773514758.343529    4155 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1773514758.343532    4155 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.533us
I0000 00:00:1773514758.343535    4155 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1773514758.343540    4155 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 4.208us
I0000 00:00:1773514758.343543    4155 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 5.404551ms
I0000 00:00:1773514758.343546    4155 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1773514758.343631    4155 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1773514758.343662    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 5.582494ms
I0000 00:00:1773514758.343772    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 5.699725ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:706
I0000 00:00:1773514758.612258    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
I0000 00:00:1773514758.612281    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1773514758.612284    4155 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1773514758.612287    4155 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1773514758.612290    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514758.612350    4155 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773514758.612405    4155 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773514758.612409    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 119.494us
I0000 00:00:1773514758.612413    4155 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1773514758.612416    4155 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1773514758.612422    4155 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1773514758.612427    4155 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1773514758.612582    4155 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1773514758.612590    4155 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1773514758.612704    4155 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1773514758.612715    4155 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1773514758.612718    4155 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1773514758.612720    4155 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 304.491us
I0000 00:00:1773514758.612723    4155 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1773514758.612731    4155 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1773514758.613767    4155 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1773514758.613775    4155 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1773514758.613984    4155 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1773514758.613992    4155 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 1.708349ms
I0000 00:00:1773514758.614004    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 1.727574ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_wb50Bn/plugins/profile/2026_03_14_18_59_18
  runtime: 0.00025917s
  compile time: 3.54523885s
julia
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00003582s, 
    compile_time = 0.12952426s, )
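
Since the note above mentions programmatic access, here is a minimal sketch of inspecting the value returned by Reactant.@timed. It deliberately avoids assuming any field names: it uses Base's `propertynames` and `getproperty` to list whatever the result exposes on the current backend. The variable name `result` is only illustrative.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

# Keep the profiling result instead of only printing it.
result = Reactant.@timed nrepeat=100 linear(x, W, b)

# Discover the available entries at runtime rather than hard-coding names;
# which entries exist depends on the backend.
for name in propertynames(result)
    println(name, " = ", getproperty(result, name))
end
```

Listing the properties first is a cautious choice: the set of entries differs between CPU, GPU, and TPU backends, so code that reads a specific entry should check that it is present.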

Note that the information returned depends on the backend. In particular, the CUDA and TPU backends provide more detailed information about memory usage and allocation. On GPUs, the output looks something like the following:

julia
AggregateProfilingResult(
    runtime = 0.00003829s, 
    compile_time = 2.18053260s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514931,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s,  # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Additionally, on GPUs and TPUs, the Reactant.@profile macro can be used to profile the function and get information about each of the kernels executed.

julia
Reactant.@profile linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:516
I0000 00:00:1773514759.389104    4155 profiler_session.cc:117] Profiler session initializing.
I0000 00:00:1773514759.389171    4155 profiler_session.cc:132] Profiler session started.
I0000 00:00:1773514759.389275    4155 profiler_session.cc:81] Profiler session collecting data.
I0000 00:00:1773514759.389776    4155 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19/runnervm46oaq.xplane.pb
I0000 00:00:1773514759.389932    4155 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19

I0000 00:00:1773514759.390046    4155 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19/runnervm46oaq.trace.json.gz
I0000 00:00:1773514759.390063    4155 profiler_session.cc:150] Profiler session tear down.
I0000 00:00:1773514759.390122    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.390128    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1773514759.390131    4155 memory_profile_processor.cc:47] Processing memory profile for host: runnervm46oaq
I0000 00:00:1773514759.390269    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 144.952us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.390291    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.390295    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1773514759.390298    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514759.390328    4155 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1773514759.390330    4155 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1773514759.390334    4155 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1773514759.390405    4155 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1773514759.390413    4155 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514759.390618    4155 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1773514759.390936    4155 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1773514759.391384    4155 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1773514759.391406    4155 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514759.391568    4155 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1773514759.391847    4155 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1773514759.392335    4155 xplane_to_op_stats.cc:402] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1773514759.392357    4155 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1773514759.392522    4155 xplane_to_op_stats.cc:458] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1773514759.392802    4155 xplane_to_op_stats.cc:414] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1773514759.392812    4155 xplane_to_op_stats.cc:419] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1773514759.392966    4155 xplane_to_op_stats.cc:684] ConvertXSpaceToOpStats: Final OpStats size: 325 bytes (0.000309944 MiB).
I0000 00:00:1773514759.393046    4155 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1773514759.393082    4155 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 2.75378ms
I0000 00:00:1773514759.393087    4155 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1773514759.393091    4155 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 1.783us
I0000 00:00:1773514759.393093    4155 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1773514759.393099    4155 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 4.428us
I0000 00:00:1773514759.393101    4155 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 2.771163ms
I0000 00:00:1773514759.393105    4155 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1773514759.393161    4155 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1773514759.393166    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 2.868666ms
I0000 00:00:1773514759.393186    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 2.893773ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:706
I0000 00:00:1773514759.393509    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.393516    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1773514759.393519    4155 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1773514759.393521    4155 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1773514759.393523    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514759.393559    4155 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773514759.393597    4155 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773514759.393599    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 76.984us
I0000 00:00:1773514759.393602    4155 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1773514759.393605    4155 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1773514759.393609    4155 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1773514759.393613    4155 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1773514759.393670    4155 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1773514759.393678    4155 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1773514759.393683    4155 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1773514759.393689    4155 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1773514759.393691    4155 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1773514759.393693    4155 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 89.227us
I0000 00:00:1773514759.393696    4155 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1773514759.393701    4155 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1773514759.393952    4155 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1773514759.393960    4155 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1773514759.394147    4155 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1773514759.394155    4155 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 637.007us
I0000 00:00:1773514759.394166    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 652.656us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.488736    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: kernel_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.488769    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: kernel_stats
I0000 00:00:1773514759.488773    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514759.488836    4155 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773514759.488896    4155 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773514759.488899    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 126.597us
I0000 00:00:1773514759.488957    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool kernel_stats: 192.882us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.623024    4155 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: framework_op_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19
I0000 00:00:1773514759.623056    4155 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: framework_op_stats
I0000 00:00:1773514759.623060    4155 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1773514759.623123    4155 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1773514759.623176    4155 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1773514759.623179    4155 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 119.905us
I0000 00:00:1773514759.623310    4155 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool framework_op_stats: 257.774us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_mCW8wx/plugins/profile/2026_03_14_18_59_19

╔================================================================================╗
║ SUMMARY                                                                        ║
╚================================================================================╝

AggregateProfilingResult(
    runtime = 0.00007969s,
    compile_time = 0.12356744s,  # time spent compiling by Reactant
)

On GPUs this would look something like the following:

julia
================================================================================
║ KERNEL STATISTICS                                                              ║
================================================================================

┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│       Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy % │
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │           1 │    0.00000250s │  0.00000250s │  0.00000250s │  0.00000250s │    2.000 KiB │    64,1,1 │    1,1,1 │          ✗ │      100.0% │
│   loop_add_fusion │           1 │    0.00000131s │  0.00000131s │  0.00000131s │  0.00000131s │      0 bytes │    20,1,1 │    1,1,1 │          ✗ │       31.2% │
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘

================================================================================
║ FRAMEWORK OP STATISTICS                                                        ║
================================================================================

┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│         Operation │    Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │    FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │      Device │           1 │     0.00000250s │   0.00000250s │   65.55% │ 1.82 GB/s │  1.6 GFLOP/s │      HBM │
│             +/add │     add │      Device │           1 │     0.00000131s │   0.00000131s │   34.45% │ 0.14 GB/s │ 0.05 GFLOP/s │      HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘

================================================================================
║ SUMMARY                                                                        ║
================================================================================

AggregateProfilingResult(
    runtime = 0.00005622s, 
    compile_time = 2.32802137s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514564,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s,  # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)

Capturing traces

When running Reactant, it is possible to capture traces using the XLA profiler. These traces show where the XLA-specific parts of the program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even when the final execution targets another device such as a GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.

Let's set up a simple function that we can then profile:

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)

The profiler can be accessed using the Reactant.with_profiler function.

julia
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
 -8.49854      9.93885
 -0.0141741   -2.95104
 11.7781     -14.7689
  4.64157     -0.528241
 -5.0893      -9.0915
 -3.20722     -6.17705
 21.2741       1.52629
 -1.0745      -3.74397
  1.11373      1.5709
  1.46015      2.7103

Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.
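
To check what was written, the following sketch walks the plugins folder and prints every file it finds. It uses only Base Julia (`walkdir`, `joinpath`, `isdir`); the helper name `list_trace_files` is hypothetical and not part of Reactant, and the only assumption about the layout is the plugins folder described above.

```julia
# Hypothetical helper (not a Reactant API): collect every file below
# `<trace_dir>/plugins`, where the profiler is described as writing its traces.
function list_trace_files(trace_dir::AbstractString)
    plugins_dir = joinpath(trace_dir, "plugins")
    files = String[]
    # Return an empty list if no trace has been captured in this folder yet.
    isdir(plugins_dir) || return files
    for (root, _, names) in walkdir(plugins_dir)
        for name in names
            push!(files, joinpath(root, name))
        end
    end
    return sort!(files)
end

# Same folder that was passed to Reactant.with_profiler in the example.
foreach(println, list_trace_files("./"))
```

If nothing is printed, no plugins folder exists under the given directory, which usually means the profiled block has not been run there.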

Note

For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.

Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.

julia
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end

Note

It is recommended to use the Chrome browser to open the perfetto URL.

Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.

First install tensorboard and its profiler plugin:

bash
pip install tensorboard tensorboard-plugin-profile

Then run the following in the folder that contains the generated plugins folder:

bash
tensorboard --logdir ./

Adding Custom Annotations

By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.

julia
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end

The added annotations are captured in the traces and appear in the different viewers alongside the default XLA annotations. When the profiler is not active, custom annotations have no effect, so they can safely be left in the code at all times.
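
As a sketch of how the pieces might fit together, the example below wraps the compile step and the run step in separate annotations inside Reactant.with_profiler. The annotation names `compile_linear` and `run_linear`, the variable names, and the use of a temporary directory are illustrative choices. The code does not rely on the return value of Reactant.Profiler.annotate; results are passed out through local variables instead.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

# Illustrative choice: write traces to a fresh temporary directory.
trace_dir = mktempdir()

result = Reactant.with_profiler(trace_dir) do
    local compiled, y
    # Label the tracing/compilation phase, which runs on the CPU.
    Reactant.Profiler.annotate("compile_linear") do
        compiled = Reactant.@compile linear(x, W, b)
    end
    # Label the execution of the compiled function.
    Reactant.Profiler.annotate("run_linear") do
        y = compiled(x, W, b)
    end
    y
end

println("Traces written under: ", trace_dir)
```

Splitting compilation and execution into two annotations makes it easier to tell the two phases apart in the timeline view.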