Profiling
Quick profiling in your terminal
Note
This is only meant to be used for quick profiling or for programmatically accessing the profiling results. For more detailed, GUI-friendly profiling, proceed to the next section.
Replace Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. If the function is not already a Reactant-compiled function, it is compiled automatically (with sync=true).
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
Reactant.@time linear(x, W, b)
runtime: 0.00025263s
compile time: 3.54069495s
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
runtime = 0.00002879s,
compile_time = 0.12268345s,
)
Note that the information returned depends on the backend. Specifically CUDA and TPU backends provide more detailed information regarding memory usage and allocation (something like the following will be displayed on GPUs):
AggregateProfilingResult(
runtime = 0.00003829s,
compile_time = 2.18053260s, # time spent compiling by Reactant
GPU_0_bfc = MemoryProfileSummary(
peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
peak_stats = MemoryAggregationStats(
stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
heap_allocated_bytes = 30.750 KiB, # memory usage by heap allocation
free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
fragmentation = 0.514931, # fragmentation of memory within [0, 1]
peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
)
peak_stats_time = 0.04975365s,
memory_capacity = 23.518 GiB # memory capacity of the allocator
)
flops = FlopsSummary(
Flops = 2.8369974648038653e-9, # [flops / (peak flops * program time)], capped at 1.0
UncappedFlops = 2.8369974648038653e-9,
RawFlops = 4060.0, # Total FLOPs performed
BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
RawTime = 0.00040298422s, # Raw time in seconds
RawFlopsRate = 1.0074836180930361e7, # Raw FLOPs rate in FLOPs/seconds
BF16FlopsRate = 1.0074836180930361e7, # BF16 FLOPs rate in FLOPs/seconds
)
)
Additionally, for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information regarding each of the kernels executed.
Reactant.@profile linear(x, W, b)
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
runtime = 0.00006873s,
compile_time = 0.15312218s, # time spent compiling by Reactant
)
On GPUs this would look something like the following:
╔================================================================================╗
║ KERNEL STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│ Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy % │
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │ 1 │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 2.000 KiB │ 64,1,1 │ 1,1,1 │ ✗ │ 100.0% │
│ loop_add_fusion │ 1 │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0 bytes │ 20,1,1 │ 1,1,1 │ ✗ │ 31.2% │
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘
╔================================================================================╗
║ FRAMEWORK OP STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│ Operation │ Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │ FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │ Device │ 1 │ 0.00000250s │ 0.00000250s │ 65.55% │ 1.82 GB/s │ 1.6 GFLOP/s │ HBM │
│ +/add │ add │ Device │ 1 │ 0.00000131s │ 0.00000131s │ 34.45% │ 0.14 GB/s │ 0.05 GFLOP/s │ HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
runtime = 0.00005622s,
compile_time = 2.32802137s, # time spent compiling by Reactant
GPU_0_bfc = MemoryProfileSummary(
peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
peak_stats = MemoryAggregationStats(
stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
heap_allocated_bytes = 81.750 KiB, # memory usage by heap allocation
free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
fragmentation = 0.514564, # fragmentation of memory within [0, 1]
peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
)
peak_stats_time = 0.00608052s,
memory_capacity = 23.518 GiB # memory capacity of the allocator
)
flops = FlopsSummary(
Flops = 2.033375207640664e-8, # [flops / (peak flops * program time)], capped at 1.0
UncappedFlops = 2.033375207640664e-8,
RawFlops = 4060.0, # Total FLOPs performed
BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
RawTime = 0.00005622s, # Raw time in seconds
RawFlopsRate = 7.220987105380169e7, # Raw FLOPs rate in FLOPs/seconds
BF16FlopsRate = 7.220987105380169e7, # BF16 FLOPs rate in FLOPs/seconds
)
)
Capturing traces
When running Reactant, it is possible to capture traces using the XLA profiler. These traces can provide information about where the XLA-specific parts of the program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even though the final execution is aimed to run on another device such as a GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.
Let's set up a simple function that we can then profile:
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)
The profiler can be accessed using the Reactant.with_profiler function.
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
-2.51201 3.39944
2.37971 10.1317
-7.9995 -3.24137
-8.71623 -4.35187
-8.18914 -7.76062
9.08949 -3.78197
14.6572 20.8409
20.5292 -16.204
-19.7293 -13.9338
-9.18288 -2.98373
Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.
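To check what the profiler wrote, the directory tree can be walked from Julia. The following is a minimal sketch, not part of Reactant: `find_trace_files` is a hypothetical helper, and the `.xplane.pb` and `.trace.json.gz` suffixes are an assumption about how the XLA profiler names its output files. Adjust them if your version differs.

```julia
# Hypothetical helper (not a Reactant API): collect profiler output files
# found anywhere below `dir`.
# Assumption: trace files end in ".xplane.pb" or ".trace.json.gz".
function find_trace_files(dir::AbstractString)
    found = String[]
    for (root, _, files) in walkdir(dir)
        for file in files
            if endswith(file, ".xplane.pb") || endswith(file, ".trace.json.gz")
                push!(found, joinpath(root, file))
            end
        end
    end
    return sort(found)
end

# Usage, assuming the trace was written to "./" as in the example above:
# foreach(println, find_trace_files(joinpath(".", "plugins")))
```

The helper only reads the file system, so it can be used on any directory that was passed to Reactant.with_profiler.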
Note
For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.
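As a rough sketch of how this could be used, the snippet below queries the allocator statistics and prints whatever the returned object exposes. It assumes that `Reactant.XLA.allocatorstats` can be called without arguments to query the default device, and that backends without allocator statistics raise an ordinary Julia exception. Neither assumption is confirmed by this page, so check the function's docstring before relying on it.

```julia
using Reactant

# Assumption: calling `allocatorstats` with no arguments queries the default
# device. Some backends may not report allocator statistics, so the call is
# wrapped in try/catch.
stats = try
    Reactant.XLA.allocatorstats()
catch err
    @warn "Allocator statistics are not available on this backend" exception = err
    nothing
end

if stats !== nothing
    # Print every property the result exposes instead of assuming field names.
    for name in propertynames(stats)
        println(name, " = ", getproperty(stats, name))
    end
end
```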
Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
Note
It is recommended to use the Chrome browser to open the perfetto URL.
XProf
XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:
pip install xprof # or xprof-nightly
Launching xprof is then as simple as:
xprof --logdir=./
This makes the XProf interface available on port 8791 by default.
Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.
First install tensorboard and its profiler plugin:
pip install tensorboard tensorboard-plugin-profile
And then run the following in the folder where the plugins folder was generated:
tensorboard --logdir ./
Adding Custom Annotations
By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end
The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not active, the custom annotations have no effect, so they can safely be left in the code.
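The pieces above can be combined into one run. The following is a minimal sketch that annotates part of the linear function and captures a trace in a temporary directory. `annotated_linear`, the annotation name "matmul", and `trace_dir` are illustrative names, and the sketch assumes that `annotate` returns the value of its do-block.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))

# The annotation wraps Julia code that is evaluated while Reactant traces
# the function. Assumption: `annotate` returns the value of the do-block.
function annotated_linear(x, W, b)
    y = Reactant.Profiler.annotate("matmul") do
        W * x
    end
    return y .+ b
end

# Write the trace to a temporary directory instead of the working directory.
trace_dir = mktempdir()
result = Reactant.with_profiler(trace_dir) do
    compiled = Reactant.@compile annotated_linear(x, W, b)
    compiled(x, W, b)
end
```

Because compilation happens inside the with_profiler block, the "matmul" annotation should appear among the CPU-side tracing events when the trace is opened in one of the viewers.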