Profiling

Quick profiling in your terminal

Note

This is only meant for quick profiling or for accessing the profiling results programmatically. For more detailed, GUI-friendly profiling, proceed to the next section.

Replace Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed (written as Reactant.@time and Reactant.@timed in the examples below). If the function is not already a Reactant-compiled function, it is compiled automatically (with sync=true).

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.@time linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782399063.298583    4221 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1782399063.298638    4221 profiler_session.cc:134] Profiler session started.
I0000 00:00:1782399063.298968    4221 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1782399063.338268    4221 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03/runnervmmklqx.xplane.pb
I0000 00:00:1782399063.338890    4221 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03

I0000 00:00:1782399063.339008    4221 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03/runnervmmklqx.trace.json.gz
I0000 00:00:1782399063.339031    4221 profiler_session.cc:152] Profiler session tear down.
Debug: Starting XProf gRPC server...
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:598
Debug: Initializing XProf stubs for worker service at 0.0.0.0:34451
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:397
I0000 00:00:1782399063.354755    4221 stub_factory.cc:163] Created gRPC channel for address: 0.0.0.0:34451
Debug: Starting XProf gRPC server on port 34451
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:413
I0000 00:00:1782399063.355151    4221 grpc_server.cc:94] Server listening on 0.0.0.0:34451 with max_concurrent_requests 1
I0000 00:00:1782399063.364520    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
I0000 00:00:1782399063.364538    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1782399063.364541    4221 memory_profile_processor.cc:47] Processing memory profile for host: runnervmmklqx
I0000 00:00:1782399063.365052    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 519.844us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
I0000 00:00:1782399063.378680    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
I0000 00:00:1782399063.378699    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1782399063.378703    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399063.378756    4221 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1782399063.378759    4221 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1782399063.378764    4221 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1782399063.379090    4221 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1782399063.379100    4221 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399063.379360    4221 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1782399063.379670    4221 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1782399063.454191    4221 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1782399063.460983    4221 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399063.461211    4221 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1782399063.462044    4221 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1782399063.569276    4221 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1782399063.574991    4221 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399063.575254    4221 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1782399063.575946    4221 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1782399063.575959    4221 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1782399063.591574    4221 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 242 bytes (0.000230789 MiB).
I0000 00:00:1782399063.614391    4221 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1782399063.614426    4221 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 235.669721ms
I0000 00:00:1782399063.614437    4221 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1782399063.614441    4221 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 2.435us
I0000 00:00:1782399063.614444    4221 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1782399063.614450    4221 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 5.03us
I0000 00:00:1782399063.614452    4221 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 235.693646ms
I0000 00:00:1782399063.614455    4221 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1782399063.614571    4221 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1782399063.614574    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 235.87244ms
I0000 00:00:1782399063.614729    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 236.034925ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1782399063.881100    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
I0000 00:00:1782399063.881122    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1782399063.881125    4221 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1782399063.881127    4221 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1782399063.881129    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399063.881195    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782399063.881268    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782399063.881272    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 143.088us
I0000 00:00:1782399063.881276    4221 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1782399063.881279    4221 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1782399063.881293    4221 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1782399063.881298    4221 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1782399063.881466    4221 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1782399063.881474    4221 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1782399063.881585    4221 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1782399063.881597    4221 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1782399063.881599    4221 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1782399063.881601    4221 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 322.805us
I0000 00:00:1782399063.881603    4221 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1782399063.881611    4221 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1782399063.882994    4221 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1782399063.883000    4221 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1782399063.883286    4221 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1782399063.883296    4221 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 2.171858ms
I0000 00:00:1782399063.883309    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 2.191976ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_T4KX9d/plugins/profile/2026_06_25_14_51_03
  runtime: 0.00027730s
  compile time: 3.93456624s
julia
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00004246s, 
    compile_time = 0.11800471s, )
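
For programmatic access, the result can be bound to a variable and inspected generically. The following is a minimal sketch; it assumes that Reactant.@timed returns the AggregateProfilingResult object displayed above (as Base.@timed returns a value) rather than only printing it. Because the field names of that object are not documented in this section, the sketch lists them with Base's propertynames instead of guessing.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

# Assumption: the macro returns the result object shown above.
result = Reactant.@timed nrepeat=100 linear(x, W, b)

# Inspect the result without assuming any particular field names.
println(typeof(result))
println(propertynames(result))
```

The printed property names show which values are available on the current backend before any of them is used in a script.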

Note that the information returned depends on the backend. Specifically, the CUDA and TPU backends provide more detailed information about memory usage and allocation. On GPUs, the output looks something like the following:

julia
AggregateProfilingResult(
    runtime = 0.00003829s, 
    compile_time = 2.18053260s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514931,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s,  # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)
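
The rate fields in the FlopsSummary above appear to be related by RawFlopsRate = RawFlops / RawTime. This is inferred from the sample numbers rather than from the Reactant source, so treat it as an assumption. A short check with the values copied from the output above:

```julia
# Values copied from the sample GPU output above.
raw_flops = 4060.0          # RawFlops: total FLOPs performed
raw_time = 0.00040298422    # RawTime, in seconds

# Dividing total FLOPs by elapsed time gives FLOPs per second.
raw_flops_rate = raw_flops / raw_time

println("RawFlopsRate ≈ ", raw_flops_rate, " FLOP/s")
println("             ≈ ", round(raw_flops_rate / 1e6; digits=2), " MFLOP/s")
```

The result is roughly 1.007e7 FLOP/s, which agrees with the reported RawFlopsRate up to the rounding of the displayed RawTime.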

Additionally, on GPUs and TPUs, the Reactant.@profile macro can be used to profile the function and get information about each of the kernels executed.

julia
Reactant.@profile linear(x, W, b)
Debug: Profiling directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:626
I0000 00:00:1782399064.877558    4221 profiler_session.cc:119] Profiler session initializing.
I0000 00:00:1782399064.877664    4221 profiler_session.cc:134] Profiler session started.
I0000 00:00:1782399064.877758    4221 profiler_session.cc:82] Profiler session collecting data.
I0000 00:00:1782399064.912414    4221 save_profile.cc:150] Collecting XSpace to repository: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04/runnervmmklqx.xplane.pb
I0000 00:00:1782399064.913072    4221 save_profile.cc:123] Creating directory: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04

I0000 00:00:1782399064.913190    4221 save_profile.cc:129] Dumped gzipped tool data for trace.json.gz to /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04/runnervmmklqx.trace.json.gz
I0000 00:00:1782399064.913215    4221 profiler_session.cc:152] Profiler session tear down.
I0000 00:00:1782399064.913309    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: memory_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399064.913315    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: memory_profile
I0000 00:00:1782399064.913318    4221 memory_profile_processor.cc:47] Processing memory profile for host: runnervmmklqx
I0000 00:00:1782399064.913624    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool memory_profile: 314.239us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399064.913652    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: op_profile with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399064.913656    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: op_profile
I0000 00:00:1782399064.913658    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399064.913686    4221 multi_xplanes_to_op_stats.cc:134] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache miss, calling ConvertMultiXSpacesToCombinedOpStats
I0000 00:00:1782399064.913688    4221 multi_xplanes_to_op_stats.cc:45] ConvertMultiXSpacesToCombinedOpStats: Started. Number of XSpaces: 1
I0000 00:00:1782399064.913692    4221 multi_xplanes_to_op_stats.cc:55] ConvertMultiXSpacesToCombinedOpStats: Starting to process XSpace 0/1
I0000 00:00:1782399064.913926    4221 derived_timeline.cc:693] GenerateDerivedTimeLines: creating derived_timeline_trace_events XprofThreadPoolExecutor
I0000 00:00:1782399064.913934    4221 xprof_thread_pool_executor.cc:22] Creating derived_timeline_trace_events XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399064.914149    4221 derived_timeline.cc:705] GenerateDerivedTimeLines: waiting for derived_timeline_trace_events threads to join
I0000 00:00:1782399064.914539    4221 derived_timeline.cc:709] GenerateDerivedTimeLines: derived_timeline_trace_events threads joined successfully
I0000 00:00:1782399064.978894    4221 derived_timeline.cc:758] GenerateDerivedTimeLines: creating ProcessTensorCorePlanes XprofThreadPoolExecutor
I0000 00:00:1782399064.984419    4221 xprof_thread_pool_executor.cc:22] Creating ProcessTensorCorePlanes XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399064.984644    4221 derived_timeline.cc:769] GenerateDerivedTimeLines: waiting for ProcessTensorCorePlanes threads to join
I0000 00:00:1782399064.984943    4221 derived_timeline.cc:772] GenerateDerivedTimeLines: ProcessTensorCorePlanes threads joined successfully
I0000 00:00:1782399065.089945    4221 xplane_to_op_stats.cc:405] ConvertXSpaceToOpStats: creating op_stats_threads XprofThreadPoolExecutor
I0000 00:00:1782399065.095968    4221 xprof_thread_pool_executor.cc:22] Creating op_stats_threads XprofThreadPoolExecutor with 4 threads.
I0000 00:00:1782399065.096207    4221 xplane_to_op_stats.cc:461] ConvertXSpaceToOpStats: Scheduled 0 OpMetricsDb generation tasks.
I0000 00:00:1782399065.096540    4221 xplane_to_op_stats.cc:417] ConvertXSpaceToOpStats: Combining 0 op_metrics_dbs.
I0000 00:00:1782399065.096549    4221 xplane_to_op_stats.cc:422] ConvertXSpaceToOpStats: Finished combining op_metrics_dbs.
I0000 00:00:1782399065.112150    4221 xplane_to_op_stats.cc:687] ConvertXSpaceToOpStats: Final OpStats size: 263 bytes (0.000250816 MiB).
I0000 00:00:1782399065.134941    4221 multi_xplanes_to_op_stats.cc:67] ConvertMultiXSpacesToCombinedOpStats: Finished processing XSpace 0.
I0000 00:00:1782399065.135599    4221 multi_xplanes_to_op_stats.cc:72] ConvertMultiXSpacesToCombinedOpStats: Finished extracting all 1 OpStats. Time: 221.914017ms
I0000 00:00:1782399065.135608    4221 multi_xplanes_to_op_stats.cc:85] ConvertMultiXSpacesToCombinedOpStats: Starting ComputeStepIntersectionToMergeOpStats.
I0000 00:00:1782399065.135612    4221 multi_xplanes_to_op_stats.cc:94] ConvertMultiXSpacesToCombinedOpStats: Finished ComputeStepIntersectionToMergeOpStats in 2.434us
I0000 00:00:1782399065.135614    4221 multi_xplanes_to_op_stats.cc:99] ConvertMultiXSpacesToCombinedOpStats: Starting CombineAllOpStats.
I0000 00:00:1782399065.135620    4221 multi_xplanes_to_op_stats.cc:106] ConvertMultiXSpacesToCombinedOpStats: Finished CombineAllOpStats in 5.119us
I0000 00:00:1782399065.135622    4221 multi_xplanes_to_op_stats.cc:109] ConvertMultiXSpacesToCombinedOpStats: Overall Finished in 221.934325ms
I0000 00:00:1782399065.135626    4221 multi_xplanes_to_op_stats.cc:138] ConvertMultiXSpaceToCombinedOpStatsWithCache: Starting to write cache file.
I0000 00:00:1782399065.135745    4221 multi_xplanes_to_op_stats.cc:145] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished writing cache file.
I0000 00:00:1782399065.135748    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 222.090567ms
I0000 00:00:1782399065.135783    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool op_profile: 222.130313ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
Debug: `op_profile` data missing keys for metrics
  data_available_keys =
   KeySet for a JSON.Object{String, Any} with 4 entries. Keys:
     "byProgram"
     "deviceType"
     "byProgramExcludeIdle"
     "aggDvfsTimeScaleMultiplier"
  by_program_available_keys =
   KeySet for a JSON.Object{String, Any} with 3 entries. Keys:
     "name"
     "children"
     "numChildren"
@ Reactant.Profiler ~/work/Reactant.jl/Reactant.jl/src/Profiler.jl:816
I0000 00:00:1782399065.136218    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: overview_page with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399065.136248    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: overview_page
I0000 00:00:1782399065.136251    4221 overview_page_processor.cc:84] OverviewPageProcessor::ProcessSession: Started
I0000 00:00:1782399065.136253    4221 overview_page_processor.cc:86] OverviewPageProcessor::ProcessSession: Starting ConvertMultiXSpaceToCombinedOpStatsWithCache
I0000 00:00:1782399065.136254    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399065.136292    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782399065.136333    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782399065.136335    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 81.302us
I0000 00:00:1782399065.136338    4221 overview_page_processor.cc:90] OverviewPageProcessor::ProcessSession: Starting ConvertOpStatsToOverviewPage
I0000 00:00:1782399065.136340    4221 op_stats_to_overview_page.cc:388] ConvertOpStatsToOverviewPage: Starting ComputeRunEnvironment
I0000 00:00:1782399065.136345    4221 op_stats_to_overview_page.cc:393] ConvertOpStatsToOverviewPage: Starting ComputeAnalysisResult
I0000 00:00:1782399065.136349    4221 op_stats_to_overview_page.cc:396] ConvertOpStatsToOverviewPage: Starting ConvertOpStatsToInputPipelineAnalysis
I0000 00:00:1782399065.136387    4221 op_stats_to_overview_page.cc:401] ConvertOpStatsToOverviewPage: Starting ComputeBottleneckAnalysis
I0000 00:00:1782399065.136391    4221 op_stats_to_overview_page.cc:407] ConvertOpStatsToOverviewPage: Starting ComputeGenericRecommendation
I0000 00:00:1782399065.136396    4221 op_stats_to_overview_page.cc:412] ConvertOpStatsToOverviewPage: Starting SetCommonRecommendation
I0000 00:00:1782399065.136401    4221 op_stats_to_overview_page.cc:425] ConvertOpStatsToOverviewPage: Starting PopulateOverviewDiagnostics
I0000 00:00:1782399065.136403    4221 op_stats_to_overview_page.cc:429] ConvertOpStatsToOverviewPage: Starting setting utilizations
I0000 00:00:1782399065.136405    4221 op_stats_to_overview_page.cc:435] ConvertOpStatsToOverviewPage: Overall Finished in 65.172us
I0000 00:00:1782399065.136407    4221 overview_page_processor.cc:94] OverviewPageProcessor::ProcessSession: Not a training run, Starting to convert inference stats.
I0000 00:00:1782399065.136414    4221 xprof_thread_pool_executor.cc:22] Creating ConvertMultiXSpaceToInferenceStats XprofThreadPoolExecutor with 1 threads.
I0000 00:00:1782399065.137131    4221 overview_page_processor.cc:99] OverviewPageProcessor::ProcessSession: Starting to compute InferenceLatency
I0000 00:00:1782399065.137138    4221 overview_page_processor.cc:104] OverviewPageProcessor::ProcessSession: Starting to serialize OverviewPage toJson
I0000 00:00:1782399065.137386    4221 overview_page_processor.cc:107] OverviewPageProcessor::ProcessSession: Starting to set Output
I0000 00:00:1782399065.137394    4221 overview_page_processor.cc:109] OverviewPageProcessor::ProcessSession: Overall Finished in 1.143842ms
I0000 00:00:1782399065.137405    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool overview_page: 1.161245ms session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399065.227118    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: kernel_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399065.227145    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: kernel_stats
I0000 00:00:1782399065.227149    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399065.227211    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782399065.227289    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782399065.227293    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 144.471us
I0000 00:00:1782399065.227354    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool kernel_stats: 213.49us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399065.359270    4221 xplane_to_tools_data_with_profile_processor.cc:142] serving tool: framework_op_stats with options: {} using ProfileProcessor session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04
I0000 00:00:1782399065.359298    4221 xplane_to_tools_data_with_profile_processor.cc:165] Using local processing for tool: framework_op_stats
I0000 00:00:1782399065.359302    4221 multi_xplanes_to_op_stats.cc:118] ConvertMultiXSpaceToCombinedOpStatsWithCache: Started
I0000 00:00:1782399065.359361    4221 multi_xplanes_to_op_stats.cc:126] ConvertMultiXSpaceToCombinedOpStatsWithCache: Cache hit, reading binary proto
I0000 00:00:1782399065.359409    4221 multi_xplanes_to_op_stats.cc:131] ConvertMultiXSpaceToCombinedOpStatsWithCache: Finished reading cache file.
I0000 00:00:1782399065.359411    4221 multi_xplanes_to_op_stats.cc:149] ConvertMultiXSpaceToCombinedOpStatsWithCache: Overall Finished in 110.176us
I0000 00:00:1782399065.359558    4221 xplane_to_tools_data_with_profile_processor.cc:170] Total time for tool framework_op_stats: 263.504us session_id: /home/runner/.julia/scratchspaces/3c362404-f566-11ee-1572-e11a4b42c853/reactant_profiling/jl_moHGCP/plugins/profile/2026_06_25_14_51_04

╔================================================================================╗
║ SUMMARY                                                                        ║
╚================================================================================╝

AggregateProfilingResult(
    runtime = 0.00006901s,
    compile_time = 0.11404040s,  # time spent compiling by Reactant
)

On GPUs this would look something like the following:

julia
================================================================================
║ KERNEL STATISTICS                                                              ║
================================================================================

┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│       Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy % │
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │           1 │    0.00000250s │  0.00000250s │  0.00000250s │  0.00000250s │    2.000 KiB │    64,1,1 │    1,1,1 │          ✗ │      100.0% │
│   loop_add_fusion │           1 │    0.00000131s │  0.00000131s │  0.00000131s │  0.00000131s │      0 bytes │    20,1,1 │    1,1,1 │          ✗ │       31.2% │
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘

================================================================================
║ FRAMEWORK OP STATISTICS                                                        ║
================================================================================

┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│         Operation │    Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │    FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │      Device │           1 │     0.00000250s │   0.00000250s │   65.55% │ 1.82 GB/s │  1.6 GFLOP/s │      HBM │
│             +/add │     add │      Device │           1 │     0.00000131s │   0.00000131s │   34.45% │ 0.14 GB/s │ 0.05 GFLOP/s │      HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘

================================================================================
║ SUMMARY                                                                        ║
================================================================================

AggregateProfilingResult(
    runtime = 0.00005622s, 
    compile_time = 2.32802137s,  # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB,  # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes,  # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB,  # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB,  # free memory available for allocation or reservation
            fragmentation = 0.514564,  # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s, 
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8,  # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8, 
        RawFlops = 4060.0,  # Total FLOPs performed
        BF16Flops = 4060.0,  # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s,  # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7,  # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7,  # BF16 FLOPs rate in FLOPs/seconds
    )
)
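
As a sanity check on the summary above, the reported RawFlopsRate appears to be RawFlops divided by RawTime. The sketch below redoes that division with the printed values. Because the printed time is rounded, it only checks agreement to within a small relative tolerance. The variable names are chosen for this example and are not part of Reactant.

```julia
# Values copied from the FlopsSummary printed above.
raw_flops = 4060.0                    # RawFlops
raw_time = 0.00005622                 # RawTime in seconds, as printed (rounded)
reported_rate = 7.220987105380169e7   # RawFlopsRate

# Assumption: the rate is total FLOPs divided by elapsed time.
computed_rate = raw_flops / raw_time

# The printed time is rounded, so compare with a relative tolerance
# instead of expecting an exact match.
rates_agree = isapprox(computed_rate, reported_rate; rtol=1e-3)
println("computed = ", computed_rate, " FLOP/s, agrees with reported: ", rates_agree)
```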

Capturing traces

When running Reactant, it is possible to capture traces using the XLA profiler. These traces can provide information about where the XLA-specific parts of a program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even when the final execution targets another device such as a GPU or TPU. Therefore, a trace that includes tracing and compilation will contain annotations on the CPU.

Let's set up a simple function that we can then profile:

julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)

The profiler can be accessed using the Reactant.with_profiler function.

julia
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
   3.66858    5.83411
  -4.24557    0.437079
 -25.7257   -18.1759
  10.8542     5.91559
 -12.9439     5.45929
  -5.81558    6.03157
   7.89149  -12.8815
  24.3073    15.4735
  -6.70293   -6.62602
 -19.9167     3.88494

Running this code should create a folder called plugins inside the directory passed to Reactant.with_profiler. That folder contains the trace files, which can then be visualized in different ways.
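
To confirm that a run actually produced trace files, you can search the output directory from Julia. The sketch below uses only the standard library. The helper name find_trace_files is hypothetical (it is not part of Reactant), and the file suffixes are an assumption based on the file names that appear in the profiler log output earlier on this page.

```julia
# Hypothetical helper (not part of Reactant): collect profiler output files
# found anywhere below `logdir`.
function find_trace_files(logdir::AbstractString)
    matches = String[]
    for (root, _, files) in walkdir(logdir)
        for file in files
            # Assumed suffixes, taken from the log output shown earlier.
            if endswith(file, ".xplane.pb") || endswith(file, ".trace.json.gz")
                push!(matches, joinpath(root, file))
            end
        end
    end
    return sort(matches)
end

# Print every trace file below the directory passed to with_profiler.
foreach(println, find_trace_files("./"))
```

An empty result suggests that the profiled block did not run or that a different directory was passed to Reactant.with_profiler.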

Note

For more insight into the current state of Reactant, you can fetch device allocation information using the Reactant.XLA.allocatorstats function.
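
A minimal sketch of such a query follows. It assumes that Reactant.XLA.allocatorstats can be called without arguments to query the default device, and that the active backend supports allocator statistics. Neither assumption is confirmed on this page, so check the function's docstring before relying on it.

```julia
using Reactant

# Assumption: with no arguments, allocatorstats queries the default device.
# Some backends may not support this query.
stats = Reactant.XLA.allocatorstats()

# Show whatever statistics the allocator reports.
display(stats)
```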

Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.

julia
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end

Note

It is recommended to use the Chrome browser to open the perfetto URL.

XProf

XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:

bash
pip install xprof # or xprof-nightly

Launching xprof is then as simple as:

bash
xprof --logdir=./

This makes the xprof interface available on port 8791 by default.

TensorBoard

Another option to visualize the generated trace files is the TensorBoard profiler plugin. The TensorBoard viewer can offer more detail than the timeline view, such as visualizations of compute graphs.

First, install tensorboard and its profiler plugin:

bash
pip install tensorboard tensorboard-plugin-profile

And then run the following in the folder where the plugins folder was generated:

bash
tensorboard --logdir ./

Adding Custom Annotations

By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.

julia
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end

The added annotations are captured in the traces and appear in the different viewers alongside the default XLA annotations. When the profiler is not active, custom annotations have no effect, so they can be left in the code permanently.
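
Putting the pieces together, the sketch below wraps the execution of the compiled linear function from earlier in a custom annotation while the profiler is running. It uses only the functions introduced on this page. The label "run_linear" is an arbitrary name chosen for illustration.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

linear(x, W, b) = (W * x) .+ b

Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)

    # "run_linear" is an arbitrary label; it should appear in the trace
    # viewers next to the default XLA annotations.
    Reactant.Profiler.annotate("run_linear") do
        mylinear(x, W, b)
    end
end
```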