Profiling
Quick profiling in your terminal
Note
This is only meant for quick profiling or for programmatically accessing the profiling results. For more detailed, GUI-friendly profiling, proceed to the next section.
Simply replace Base.@time or Base.@timed with Reactant.Profiler.@time or Reactant.Profiler.@timed. If the function is not already a Reactant compiled function, it is compiled automatically (with sync=true).
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
Reactant.@time linear(x, W, b)
runtime: 0.00027300s
compile time: 3.51905349s
Reactant.@timed nrepeat=100 linear(x, W, b)
AggregateProfilingResult(
runtime = 0.00003631s,
compile_time = 0.12302047s,
)
Note that the information returned depends on the backend. Specifically, the CUDA and TPU backends provide more detailed information about memory usage and allocation (on GPUs, something like the following is displayed):
AggregateProfilingResult(
runtime = 0.00003829s,
compile_time = 2.18053260s, # time spent compiling by Reactant
GPU_0_bfc = MemoryProfileSummary(
peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
peak_stats = MemoryAggregationStats(
stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
heap_allocated_bytes = 30.750 KiB, # memory usage by heap allocation
free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
fragmentation = 0.514931, # fragmentation of memory within [0, 1]
peak_bytes_in_use = 30.750 KiB # The peak memory usage over the entire program
)
peak_stats_time = 0.04975365s,
memory_capacity = 23.518 GiB # memory capacity of the allocator
)
flops = FlopsSummary(
Flops = 2.8369974648038653e-9, # [flops / (peak flops * program time)], capped at 1.0
UncappedFlops = 2.8369974648038653e-9,
RawFlops = 4060.0, # Total FLOPs performed
BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
RawTime = 0.00040298422s, # Raw time in seconds
RawFlopsRate = 1.0074836180930361e7, # Raw FLOPs rate in FLOPs/seconds
BF16FlopsRate = 1.0074836180930361e7, # BF16 FLOPs rate in FLOPs/seconds
)
)
Additionally for GPUs and TPUs, we can use the Reactant.@profile macro to profile the function and get information regarding each of the kernels executed.
Reactant.@profile linear(x, W, b)
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
runtime = 0.00008707s,
compile_time = 0.12200000s, # time spent compiling by Reactant
)
On GPUs this would look something like the following:
╔================================================================================╗
║ KERNEL STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────────┬────────────────┬──────────────┬──────────────┬──────────────┬──────────────┬───────────┬──────────┬────────────┬─────────────┐
│ Kernel Name │ Occurrences │ Total Duration │ Avg Duration │ Min Duration │ Max Duration │ Static Shmem │ Block Dim │ Grid Dim │ TensorCore │ Occupancy % │
├───────────────────┼─────────────┼────────────────┼──────────────┼──────────────┼──────────────┼──────────────┼───────────┼──────────┼────────────┼─────────────┤
│ gemm_fusion_dot_1 │ 1 │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 0.00000250s │ 2.000 KiB │ 64,1,1 │ 1,1,1 │ ✗ │ 100.0% │
│ loop_add_fusion │ 1 │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0.00000131s │ 0 bytes │ 20,1,1 │ 1,1,1 │ ✗ │ 31.2% │
└───────────────────┴─────────────┴────────────────┴──────────────┴──────────────┴──────────────┴──────────────┴───────────┴──────────┴────────────┴─────────────┘
╔================================================================================╗
║ FRAMEWORK OP STATISTICS ║
╚================================================================================╝
┌───────────────────┬─────────┬─────────────┬─────────────┬─────────────────┬───────────────┬──────────┬───────────┬──────────────┬──────────┐
│ Operation │ Type │ Host/Device │ Occurrences │ Total Self-Time │ Avg Self-Time │ Device % │ Memory BW │ FLOP Rate │ Bound By │
├───────────────────┼─────────┼─────────────┼─────────────┼─────────────────┼───────────────┼──────────┼───────────┼──────────────┼──────────┤
│ gemm_fusion_dot.1 │ Unknown │ Device │ 1 │ 0.00000250s │ 0.00000250s │ 65.55% │ 1.82 GB/s │ 1.6 GFLOP/s │ HBM │
│ +/add │ add │ Device │ 1 │ 0.00000131s │ 0.00000131s │ 34.45% │ 0.14 GB/s │ 0.05 GFLOP/s │ HBM │
└───────────────────┴─────────┴─────────────┴─────────────┴─────────────────┴───────────────┴──────────┴───────────┴──────────────┴──────────┘
╔================================================================================╗
║ SUMMARY ║
╚================================================================================╝
AggregateProfilingResult(
runtime = 0.00005622s,
compile_time = 2.32802137s, # time spent compiling by Reactant
GPU_0_bfc = MemoryProfileSummary(
peak_bytes_usage_lifetime = 64.010 MiB, # peak memory usage over the entire program (lifetime of memory allocator)
peak_stats = MemoryAggregationStats(
stack_reserved_bytes = 0 bytes, # memory usage by stack reservation
heap_allocated_bytes = 81.750 KiB, # memory usage by heap allocation
free_memory_bytes = 23.518 GiB, # free memory available for allocation or reservation
fragmentation = 0.514564, # fragmentation of memory within [0, 1]
peak_bytes_in_use = 81.750 KiB # The peak memory usage over the entire program
)
peak_stats_time = 0.00608052s,
memory_capacity = 23.518 GiB # memory capacity of the allocator
)
flops = FlopsSummary(
Flops = 2.033375207640664e-8, # [flops / (peak flops * program time)], capped at 1.0
UncappedFlops = 2.033375207640664e-8,
RawFlops = 4060.0, # Total FLOPs performed
BF16Flops = 4060.0, # Total FLOPs Normalized to the bf16 (default) devices peak bandwidth
RawTime = 0.00005622s, # Raw time in seconds
RawFlopsRate = 7.220987105380169e7, # Raw FLOPs rate in FLOPs/seconds
BF16FlopsRate = 7.220987105380169e7, # BF16 FLOPs rate in FLOPs/seconds
)
)
Capturing traces
When running Reactant, it is possible to capture traces using the XLA profiler. These traces show where the XLA-specific parts of the program spend time during compilation or execution. Note that tracing and compilation happen on the CPU even when the final execution is intended to run on another device such as a GPU or TPU. Therefore, including tracing and compilation in a trace will create annotations on the CPU.
Let's set up a simple function that we can then profile:
using Reactant
x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b
linear (generic function with 1 method)
The profiler can be accessed using the Reactant.with_profiler function.
Reactant.with_profiler("./") do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
10×2 ConcretePJRTArray{Float32,2}:
2.64335 -1.43561
6.60194 -8.00313
-13.5046 -4.58924
16.2172 9.21281
4.91321 -4.73837
-20.576 6.43772
-13.2729 -1.11146
-7.56351 0.572568
-2.61496 -15.718
7.4606 -20.2634
Running this function should create a folder called plugins in the folder provided to Reactant.with_profiler which will contain the trace files. The traces can then be visualized in different ways.
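To check what a profiling run actually wrote, the output folder can be scanned from Julia. The following is a minimal sketch, not part of Reactant: `find_trace_files` is a hypothetical helper, and it assumes the traces are stored under `plugins/profile/<timestamp>/` with the suffixes `.xplane.pb` and `.trace.json.gz`. Adjust the suffixes if your files are named differently.

```julia
# Hypothetical helper (not a Reactant API): list profiler output files.
# Assumption: traces live under `<logdir>/plugins/profile/` and end in
# `.xplane.pb` or `.trace.json.gz`.
function find_trace_files(logdir::AbstractString;
                          suffixes=(".xplane.pb", ".trace.json.gz"))
    root = joinpath(logdir, "plugins", "profile")
    found = String[]
    isdir(root) || return found  # nothing has been captured yet
    for (dir, _, files) in walkdir(root)
        for file in files
            # Keep only files whose name ends with one of the known suffixes.
            any(s -> endswith(file, s), suffixes) && push!(found, joinpath(dir, file))
        end
    end
    return sort!(found)
end

# Print every trace file found below the current directory.
foreach(println, find_trace_files("./"))
```

An empty result usually means that the profiler wrote to a different directory than the one being scanned.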
Note
For more insights about the current state of Reactant, it is possible to fetch device information about allocations using the Reactant.XLA.allocatorstats function.
Perfetto UI

The first and easiest way to visualize a captured trace is to use the online perfetto.dev tool. Reactant.with_profiler has a keyword parameter called create_perfetto_link which will create a usable perfetto URL for the generated trace. The function will block execution until the URL has been clicked and the trace is visualized. The URL only works once.
Reactant.with_profiler("./"; create_perfetto_link=true) do
    mylinear = Reactant.@compile linear(x, W, b)
    mylinear(x, W, b)
end
Note
It is recommended to use the Chrome browser to open the perfetto URL.
XProf
XProf is a complete web UI to analyze the log files captured by Reactant. It can be installed in the following manner:
pip install xprof # or xprof-nightly
Launching xprof is then as simple as:
xprof --logdir=./
This makes the xprof interface available on port 8791 by default.
Tensorboard

Another option to visualize the generated trace files is to use the tensorboard profiler plugin. The tensorboard viewer can offer more details than the timeline view such as visualization for compute graphs.
First install tensorboard and its profiler plugin:
pip install tensorboard tensorboard-plugin-profile
And then run the following in the folder where the plugins folder was generated:
tensorboard --logdir ./
Adding Custom Annotations
By default, the traces contain only information captured from within XLA. The Reactant.Profiler.annotate function can be used to annotate traces for Julia code evaluated during tracing.
Reactant.Profiler.annotate("my_annotation") do
    # Do things...
end
The added annotations will be captured in the traces and can be seen in the different viewers along with the default XLA annotations. When the profiler is not active, the custom annotations have no effect, so they can safely be left in the code at all times.
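As a rough sketch of how the pieces on this page fit together, an annotation can be placed inside a `Reactant.with_profiler` block so that it ends up in the captured trace. The annotation name `compile_and_run_linear` and the use of a temporary directory as the log folder are illustrative choices, not requirements of the API.

```julia
using Reactant

x = Reactant.to_rarray(randn(Float32, 100, 2))
W = Reactant.to_rarray(randn(Float32, 10, 100))
b = Reactant.to_rarray(randn(Float32, 10))
linear(x, W, b) = (W * x) .+ b

# Illustrative choice: write the trace into a fresh temporary directory.
logdir = mktempdir()

Reactant.with_profiler(logdir) do
    # Everything inside this block is grouped under one custom annotation.
    Reactant.Profiler.annotate("compile_and_run_linear") do
        mylinear = Reactant.@compile linear(x, W, b)
        mylinear(x, W, b)
    end
end
```

Opening the resulting trace in one of the viewers should then show the custom annotation next to the default XLA annotations.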