Profiling
Quick profiling in your terminal
Note
This is only meant for quick profiling or for programmatically accessing the profiling results. For more detailed and GUI-friendly profiling, proceed to the next section.
Simply replace the use of Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. The function is compiled automatically (with sync=true) if it is not already a Reactant-compiled function.
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
Reactant.@time linear(x, W, b)
runtime: 0.00023508s
compile time: 3.04170463s
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
    runtime = 0.00001558s,
    compile_time = 0.10624061s,
)
Note that the information returned depends on the backend. Specifically, the CUDA and TPU backends provide more detailed information about memory usage and allocation. On GPUs, output similar to the following is displayed:
AggregateProfilingResult(
    runtime = 0.00003829s,
    compile_time = 2.18053260s, # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
            heap_allocated_bytes = 30.750 KiB, # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
            fragmentation = 0.514931, # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.04975365s,
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.8369974648038653e-9, # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.8369974648038653e-9,
        RawFlops = 4060.0, # Total FLOPs performed
        BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00040298422s, # Raw time in seconds
        RawFlopsRate = 1.0074836180930361e7, # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 1.0074836180930361e7, # BF16 FLOPs rate in FLOPs/seconds
    )
)
Additionally, for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information about each of the kernels executed.
Reactant.@profile linear(x, W, b)
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
    runtime = 0.00005385s,
    compile_time = 0.09901967s, # time spent compiling by Reactant
)
On GPUs this would look something like the following:
╔================================================================================╗
║ KERNEL STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│ Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy % │
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │ 1 │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 2.000 KiB │ 64,1,1 │ 1,1,1 │ ✗ │ 100.0% │
│ loop_add_fusion │ 1 │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0 bytes │ 20,1,1 │ 1,1,1 │ ✗ │ 31.2% │
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘
╔================================================================================╗
║ FRAMEWORK OP STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│ Operation │ Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │ FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │ Device │ 1 │ 0.00000250s │ 0.00000250s │ 65.55% │ 1.82 GB/s │ 1.6 GFLOP/s │ HBM │
│ +/add │ add │ Device │ 1 │ 0.00000131s │ 0.00000131s │ 34.45% │ 0.14 GB/s │ 0.05 GFLOP/s │ HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
    runtime = 0.00005622s,
    compile_time = 2.32802137s, # time spent compiling by Reactant
    GPU_0_bfc = MemoryProfileSummary(
        peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
        peak_stats = MemoryAggregationStats(
            stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
            heap_allocated_bytes = 81.750 KiB, # memory usage by heap allocation
            free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
            fragmentation = 0.514564, # fragmentation of memory within [0, 1]
            peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
        )
        peak_stats_time = 0.00608052s,
        memory_capacity = 23.518 GiB # memory capacity of the allocator
    )
    flops = FlopsSummary(
        Flops = 2.033375207640664e-8, # [flops / (peak flops * program time)], capped at 1.0
        UncappedFlops = 2.033375207640664e-8,
        RawFlops = 4060.0, # Total FLOPs performed
        BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
        RawTime = 0.00005622s, # Raw time in seconds
        RawFlopsRate = 7.220987105380169e7, # Raw FLOPs rate in FLOPs/seconds
        BF16FlopsRate = 7.220987105380169e7, # BF16 FLOPs rate in FLOPs/seconds
    )
)
Capturing traces
When running Reactant, it is possible to capture traces using the XLA profiler. These traces show where the XLA-specific parts of a program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even when the final execution targets another device such as a GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.
Let's set up a simple function that we can then profile:
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)
The profiler can be accessed using the Reactant.with_profiler function.
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
17.6138 -4.20773
2.6673 -8.46881
7.40452 0.333003
6.27172 11.6807
-0.0247079 -2.13574
-4.71964 -9.43176
12.9682 10.4451
-11.0855 -2.55176
6.60932 -10.4896
-13.2506 8.14313
Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler, which will contain the trace files. The traces can then be visualized in different ways.
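To check what was actually written, the plugins folder can be walked with plain Julia. The following is a minimal sketch, not part of Reactant: the helper name find_trace_files is hypothetical, and it assumes the trace artifacts use the .xplane.pb and .trace.json.gz suffixes, which may differ between versions.

```julia
# Hypothetical helper: collect trace artifacts written under `<dir>/plugins`.
# The file suffixes are an assumption about the profiler's output layout.
function find_trace_files(dir::AbstractString)
    plugins = joinpath(dir, "plugins")
    found = String[]
    # Nothing has been captured yet if the folder does not exist.
    isdir(plugins) || return found
    for (root, _, files) in walkdir(plugins)
        for file in files
            if endswith(file, ".xplane.pb") || endswith(file, ".trace.json.gz")
                push!(found, joinpath(root, file))
            end
        end
    end
    return sort!(found)
end

# Print every trace file found below the folder passed to with_profiler.
foreach(println, find_trace_files("./"))
```

An empty result usually means the profiled block never ran or that a different output folder was passed to Reactant.with_profiler.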
Note
For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.
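Because tracing and compilation run on the CPU and are annotated in the trace whenever they happen inside the profiled block, one way to focus a trace on execution is to compile beforehand. A minimal sketch, assuming that work done before Reactant.with_profiler is entered is not recorded in the trace:

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b

# Compile outside the profiled block so that tracing and compilation
# are (by assumption) left out of the captured trace.
mylinear = Reactant.@compile linear(x, W, b)

# Only the execution of the already compiled function is profiled here.
result = Reactant.with_profiler("./") do
    mylinear(x, W, b)
end
```

Keeping the compilation inside the block, as in the earlier example, is the better choice when compile time itself is what needs investigating.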
Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
Note
It is recommended to use the Chrome browser to open the perfetto URL.
XProf
XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:
pip install xprof # or xprof-nightly
Launching xprof is then as simple as:
xprof --logdir=./
This makes the xprof interface available on port 8791 by default.
Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view, such as visualizations of compute graphs.
First install tensorboard and its profiler plugin:
pip install tensorboard tensorboard-plugin-profile
And then run the following in the folder where the plugins folder was generated:
tensorboard --logdir ./
Adding Custom Annotations
By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end
The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not active, the custom annotations have no effect, so they can safely be left in the code.
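Annotations become useful once they are combined with a capture. The sketch below nests two annotated regions inside Reactant.with_profiler, using only the calls shown on this page. The region names compile_linear and run_linear are illustrative, and how the regions appear in each viewer is something to verify on your own setup.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b

# Refs carry values out of the `do` blocks, so the sketch does not rely on
# whatever `annotate` itself returns.
compiled = Ref{Any}()
output = Ref{Any}()

Reactant.with_profiler("./") do
    # Region covering tracing and compilation.
    Reactant.Profiler.annotate("compile_linear") do
        compiled[] = Reactant.@compile linear(x, W, b)
    end
    # Region covering one execution of the compiled function.
    Reactant.Profiler.annotate("run_linear") do
        output[] = compiled[](x, W, b)
    end
end
```

Separate names for the compile and run phases make it easier to tell the two apart on the timeline.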