Onboard Analysis - Official 10x Genomics Support

This page describes the raw outputs (decoded transcript counts and morphology images) and the other standard output files derived from them, all of which are included in the Xenium output directory for each selected region. These data are a reduction of the low-level internal image sensor data that preserves the details needed to assess decoded transcript quality (learn more at Overview of Xenium Algorithms).


Important

See what's new in the Xenium Onboard Analysis software pipeline in the release notes.

Output directory size

Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more. For a complete table of output files, see At a Glance: Xenium Output Files.

The file formats were deliberately designed and chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors like tissue shape, number of cells, number of decoded transcripts, and percent of high quality transcripts.

To help budget for data storage requirements, here are some examples based on estimations and 10x Genomics public datasets.

The table below shows estimated output directory sizes as a function of tissue area, assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:

0.72 cm² tissue area
11 Z-slices
162k cells
62.4M transcripts
0.25 cells per 100 µm²
107 transcripts > Q20 per 100 µm²
80% of transcripts > Q20

Tissue	Tissue area (cm²)	Estimated output directory size (GB)
Core needle biopsy	0.01	0.2
Hemisphere of coronal mouse brain	0.5	10
Full coronal mouse brain	1	20
Tissue section covering entire sample area	3	60
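For samples that resemble the model section, the estimates in the table above scale linearly at roughly 20 GB per cm² of tissue (for example, 0.5 cm² gives 10 GB and 3 cm² gives 60 GB). The Python sketch below turns that ratio into a quick budgeting helper. `GB_PER_CM2` is read off the table and is not an official 10x Genomics conversion factor. Samples that differ from the model section can exceed it, so treat the result as a rough estimate only.

```python
# Back-of-the-envelope storage estimate. GB_PER_CM2 is read off the estimate
# table (1 cm² -> 20 GB) and assumes a sample similar to the model mouse brain
# coronal section; it is not an official conversion factor.
GB_PER_CM2 = 20.0


def estimate_output_gb(tissue_area_cm2: float, gb_per_cm2: float = GB_PER_CM2) -> float:
    """Approximate output directory size (GB) for one selected region."""
    if tissue_area_cm2 < 0:
        raise ValueError("tissue area must be non-negative")
    return tissue_area_cm2 * gb_per_cm2


def estimate_run_gb(region_areas_cm2: list[float], gb_per_cm2: float = GB_PER_CM2) -> float:
    """Each region gets its own output directory, so per-region estimates add up."""
    return sum(estimate_output_gb(area, gb_per_cm2) for area in region_areas_cm2)


if __name__ == "__main__":
    # Reproduce the rows of the estimate table.
    for area in (0.01, 0.5, 1.0, 3.0):
        print(f"{area} cm2 -> ~{estimate_output_gb(area):.1f} GB")
```

For example, `estimate_run_gb([0.5, 1.0])` gives about 30 GB for a run with one hemisphere and one full coronal section.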

The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:

Dataset	Tissue area (cm²)	Output directory size (GB)
Mouse brain tiny subset	~0.17	3.5
Mouse brain full coronal section	0.66	13.0
FFPE human breast, Tissue 1	0.90	24.4
FFPE human breast using the entire sample area, Replicate 1	2.28	51.9
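Dividing each public dataset's output size by its tissue area shows how much the storage footprint per cm² varies between samples. This variation is why the area-based table is only a guide. The Python sketch below uses the values from the table above; the tiny-subset area is approximate.

```python
# Tissue area (cm²) and output size (GB) copied from the public-dataset table.
# The "tiny subset" area is approximate (~0.17 cm²).
DATASETS = {
    "Mouse brain tiny subset": (0.17, 3.5),
    "Mouse brain full coronal section": (0.66, 13.0),
    "FFPE human breast, Tissue 1": (0.90, 24.4),
    "FFPE human breast, entire sample area, Replicate 1": (2.28, 51.9),
}


def gb_per_cm2(area_cm2: float, size_gb: float) -> float:
    """Storage footprint per unit tissue area."""
    return size_gb / area_cm2


if __name__ == "__main__":
    for name, (area, size) in DATASETS.items():
        print(f"{name}: {gb_per_cm2(area, size):.1f} GB/cm2")
```

These values work out to roughly 20 to 27 GB per cm². Both FFPE breast samples come in above the 20 GB per cm² implied by the model mouse brain estimates.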

Overview of the output structure

All run data will be stored in the output/ directory on the Xenium Analysis Computer and will be accessible on the Desktop. Refer to the Xenium Instrument User Guide (CG000584) for instructions to export run data off the instrument.

Within the output/ directory, the data from individual runs are stored as subfolders and include the user-defined run name in the folder name. Within the top-level run folder, there are subfolders for each of the user-defined regions on the Xenium slides. The overall organization of subfolders is shown below:


output
└── <yyyymmdd>__<runName>
   └── output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>

The runName and regionName strings are user-defined; the other components of the directory names are auto-generated. The separators between the strings in the directory name are two underscores. Spaces in runName and regionName will be replaced by an underscore (_) in the output directory name.
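Because the components are joined with double underscores, the directory names can be split back into their parts. The Python sketch below makes two assumptions. First, the instrument serial number and slide ID never contain `__`. Second, everything between those two fields and the trailing date/time is the region name, so single underscores that came from replaced spaces are preserved. The `RegionDir` type and the function names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime

SEP = "__"  # components are separated by two underscores


@dataclass
class RegionDir:
    instrument_sn: str
    slide_id: str
    region_name: str
    timestamp: datetime


def parse_run_dir(name: str) -> tuple[datetime, str]:
    """Split '<yyyymmdd>__<runName>' into (date, run name)."""
    date_str, run_name = name.split(SEP, 1)
    return datetime.strptime(date_str, "%Y%m%d"), run_name


def parse_region_dir(name: str) -> RegionDir:
    """Parse 'output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>'."""
    prefix = "output-"
    if not name.startswith(prefix):
        raise ValueError(f"unexpected directory name: {name!r}")
    parts = name[len(prefix):].split(SEP)
    if len(parts) < 5:
        raise ValueError(f"too few components in {name!r}")
    # First two and last two fields are auto-generated; the rest is the
    # user-defined region name (assumed not to break the split).
    region_name = SEP.join(parts[2:-2])
    timestamp = datetime.strptime(parts[-2] + parts[-1], "%Y%m%d%H%M%S")
    return RegionDir(parts[0], parts[1], region_name, timestamp)
```

Take a made-up name such as `output-SN0001__SLIDE01__Region_1__20240315__120000`. For it, `parse_region_dir` returns the region name `Region_1` and a timestamp of 2024-03-15 12:00:00.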

Xenium output file descriptions

Within each of the subfolders, users can expect the following files:

Web summary

The Xenium onboard analysis pipeline outputs an interactive HTML file named analysis_summary.html. Open it in a web browser or Xenium Explorer. It contains summary metrics and automated secondary analysis results. Any alerts issued by the pipeline are displayed at the top of the page.

There are four clickable tabs that capture different information:

The Summary tab contains summary metrics, images, and experiment information for a quick overview of the data.
The Decoding tab contains more specific transcript decoding metrics.
The Cell Segmentation tab shows the metrics for cell segmentation and partitioning transcripts into single cells.
The Analysis tab captures the results from the pipeline's secondary analysis run on single cell data.

Click the ? at the top of each dashboard for more information about each metric.

Gene expression metrics

The Xenium onboard analysis pipeline outputs key metrics in text format as metrics_summary.csv. This file contains metrics that are useful for assessing decoding and cell segmentation quality.

Cell-feature matrix

The Xenium onboard analysis pipeline outputs a cell-feature matrix (cell_feature_matrix) in three file formats: the Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. The matrices include only transcripts that pass the default quality value (Q-Score) threshold of Q20.

The cell_feature_matrix/ folder stores the matrix in the sparse MEX format. The folder also contains gzipped TSV files listing the features and barcodes that correspond to the row and column indices, respectively. The cell_feature_matrix/features.tsv.gz file contains a list of pre-designed panel genes (and any custom add-on genes), negative controls, and unassigned codewords (learn more about controls on the Algorithms page).

Column Number	Description
1	Ensembl ID for panel and add-on genes
2	Gene name for panel and add-on genes
3	Feature type (`Gene Expression`, `Negative Control Codeword`, `Negative Control Probe`, `Unassigned Codeword`).
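The following Python sketch reads the MEX files with only the standard library and totals counts per feature type. It assumes the usual 10x Genomics MEX file names (`matrix.mtx.gz` and `barcodes.tsv.gz` alongside `features.tsv.gz`). It also assumes integer counts, with features as rows and barcodes as columns. If SciPy is installed, `scipy.io.mmread` can parse the matrix file instead.

```python
import gzip
from collections import Counter
from pathlib import Path


def read_features(path: Path) -> list[list[str]]:
    """Rows of features.tsv.gz: [feature ID, name, feature type]."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n").split("\t") for line in fh if line.strip()]


def read_barcodes(path: Path) -> list[str]:
    """One barcode per line; matrix column j corresponds to barcodes[j]."""
    with gzip.open(path, "rt") as fh:
        return [line.strip() for line in fh if line.strip()]


def read_mtx(path: Path) -> dict[tuple[int, int], int]:
    """Read a coordinate-format Matrix Market file as {(row, col): count}, 0-based."""
    counts: dict[tuple[int, int], int] = {}
    size_line_seen = False
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("%") or not line.strip():
                continue  # banner, comments, blank lines
            if not size_line_seen:
                size_line_seen = True  # "n_rows n_cols n_nonzero"
                continue
            row, col, value = line.split()
            counts[(int(row) - 1, int(col) - 1)] = int(value)  # file is 1-based
    return counts


def counts_by_feature_type(matrix_dir: Path) -> Counter:
    """Total counts per feature type (gene expression, controls, unassigned)."""
    features = read_features(matrix_dir / "features.tsv.gz")
    totals: Counter = Counter()
    for (row, _col), value in read_mtx(matrix_dir / "matrix.mtx.gz").items():
        totals[features[row][2]] += value
    return totals
```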

There are two classes of negative controls:

Negative control codewords are codewords in the codebook that do not have any probes matching them. They can be used to assess the specificity of the decoding algorithm.
Negative control probes are probes that exist in the panels but target non-biological sequences. They can be used to assess the specificity of the assay.

Unassigned codewords are unused codewords. There is no probe in this particular gene panel that will generate the codeword.

The cell_feature_matrix.h5 file is in HDF5 format. This binary format compresses and accesses data more efficiently than text formats such as MEX and is useful for large datasets. HDF5 files can be read in both R and Python.
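10x Genomics HDF5 feature-barcode matrices typically store counts in compressed sparse column (CSC) form. A `matrix` group holds `data`, `indices`, `indptr`, and `shape`, plus `barcodes` and a `features` subgroup. The Python sketch below assumes that layout; confirm it against your file, for example with `h5ls -r`. `csc_column` is plain Python and shows how one cell's counts are recovered from the three arrays. `load_h5_arrays` needs the third-party `h5py` package, which is imported lazily so the helper works without it.

```python
def csc_column(data, indices, indptr, col):
    """Return {row_index: value} for one column (cell) of a CSC sparse matrix.

    The nonzeros of column j live in data[indptr[j]:indptr[j + 1]], and
    indices holds the matching row (feature) index for each of them.
    """
    start, end = indptr[col], indptr[col + 1]
    return {int(indices[k]): int(data[k]) for k in range(start, end)}


def load_h5_arrays(path):
    """Read the CSC arrays and labels, assuming the 10x layout (requires h5py)."""
    import h5py  # third-party

    with h5py.File(path, "r") as f:
        m = f["matrix"]
        return {
            "data": m["data"][:],
            "indices": m["indices"][:],
            "indptr": m["indptr"][:],
            "shape": tuple(m["shape"][:]),
            "barcodes": [b.decode() for b in m["barcodes"][:]],
            "features": [b.decode() for b in m["features/name"][:]],
        }
```

To look up a cell, find its column with `arrays["barcodes"].index(cell_id)`. Then pass that column to `csc_column` and map the returned row indices through `arrays["features"]`.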

The cell_feature_matrix.zarr.zip is a zipped Zarr file, which is a format for storage of chunked, compressed, N-dimensional arrays. This file can be read by Xenium Explorer.

Transcript data

The transcripts file (transcripts.csv.gz) in gzipped CSV format contains data to evaluate transcript quality and localization. The file contains one row for each decoded transcript, with the following columns:

Column Name	Description
transcript_id	Unique ID of the transcript
cell_id	Unique ID of the cell
overlaps_nucleus	Binary value to indicate if the transcript falls within the segmented nucleus of the cell or not
feature_name	Gene or control name
x_location	X location in µm
y_location	Y location in µm
z_location	Z location in µm
qv	Phred-scaled quality value (Q-Score) estimating the probability of incorrect call
fov_name	Name of the field of view (FOV)
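Because `qv` is Phred-scaled, a value q corresponds to an estimated error probability of 10^(-q/10). Q20 therefore means about a 1% chance that the call is wrong. The Python sketch below applies the same Q20 cutoff used for the cell-feature matrix and counts passing transcripts per feature with the standard library. It assumes `overlaps_nucleus` is written as 0/1 (or true/false). For whole datasets, reading `transcripts.parquet` with a dataframe library is usually faster.

```python
import csv
import gzip
from collections import Counter


def qv_to_error_prob(qv: float) -> float:
    """Convert a Phred-scaled quality value to an estimated error probability."""
    return 10 ** (-qv / 10)


def summarize_transcripts(path, min_qv: float = 20.0):
    """Count transcripts with qv >= min_qv per feature, plus the nuclear fraction."""
    per_feature: Counter = Counter()
    passing = in_nucleus = 0
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            if float(row["qv"]) < min_qv:
                continue
            passing += 1
            per_feature[row["feature_name"]] += 1
            # Assumed encoding of the binary overlaps_nucleus column.
            in_nucleus += row["overlaps_nucleus"].strip().lower() in ("1", "true")
    nuclear_fraction = in_nucleus / passing if passing else 0.0
    return per_feature, nuclear_fraction
```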

The transcripts.parquet is an additional transcripts file in Parquet format. It contains the same information as the transcripts.csv.gz file but enables faster loading and reading of data.

The transcripts.zarr.zip is a zipped Zarr format file with the same information as the transcripts.csv.gz file. This file can be read by Xenium Explorer.

The aux_outputs/fov_locations.json file contains the field of view (FOV) name, height, width, and XY positions. The position information is useful for determining where FOV boundaries are to assess transcript deduplication and any FOV edge effects.
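The exact key names in `fov_locations.json` are not listed here, so the Python sketch below works on a hypothetical `Fov` record that you fill in from the file. It makes two assumptions you should check before relying on it. First, the XY position is the FOV's top-left corner. Second, that position shares the coordinate frame and units of the transcript `x_location`/`y_location` columns. Looking up the FOV by each transcript's `fov_name` avoids ambiguity where neighboring FOVs may overlap. Transcripts with small distances sit near an FOV edge.

```python
from dataclasses import dataclass


@dataclass
class Fov:
    """Hypothetical FOV record; populate it from fov_locations.json."""
    name: str
    x: float  # assumed: left edge
    y: float  # assumed: top edge
    width: float
    height: float

    def edge_distance(self, px: float, py: float) -> float:
        """Distance to the nearest FOV edge (negative if the point lies outside)."""
        return min(px - self.x, self.x + self.width - px,
                   py - self.y, self.y + self.height - py)


def transcript_edge_distances(fovs: dict[str, Fov], transcripts):
    """Yield (transcript_id, edge distance) using each transcript's fov_name.

    `transcripts` is any iterable of dicts with transcript_id, fov_name,
    x_location and y_location keys, such as rows from csv.DictReader.
    """
    for t in transcripts:
        fov = fovs[t["fov_name"]]
        yield t["transcript_id"], fov.edge_distance(float(t["x_location"]), float(t["y_location"]))
```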

Cell summary file

The cell summary file (cells.csv.gz) in gzipped CSV format contains data to help QC the transcript counts for each identified cell. The file contains one row for each cell, with the following columns:

Column Name	Description
cell_id	Unique ID of the cell
x_centroid	X location of the cell centroid in µm
y_centroid	Y location of the cell centroid in µm
transcript_counts	Molecule count of gene features
control_probe_counts	Molecule count of negative control probes
control_codeword_counts	Count of negative control codewords
unassigned_codeword_counts	Count of unassigned codewords
total_counts	Sum total of transcript_counts, control_probe_counts, control_codeword_counts, and unassigned_codeword_counts
cell_area	The two-dimensional area covered by the cell in µm²
nucleus_area	The two-dimensional area covered by the nucleus in µm²
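The Python sketch below runs a per-cell QC pass over `cells.csv.gz` using the standard library. It checks that `total_counts` equals the sum of the four count columns, as described above. It also derives two illustrative values. The first, `non_gene_fraction` (a name chosen here), is the share of counts from negative controls and unassigned codewords. The second is the nucleus-to-cell area ratio. The sketch assumes `nucleus_area` may be empty for some cells and returns `None` in that case.

```python
import csv
import gzip

COUNT_COLUMNS = (
    "transcript_counts",
    "control_probe_counts",
    "control_codeword_counts",
    "unassigned_codeword_counts",
)


def cell_qc(path):
    """Yield illustrative QC values for each row of cells.csv.gz."""
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            parts = [float(row[c]) for c in COUNT_COLUMNS]
            total = float(row["total_counts"])
            # total_counts is documented as the sum of the four count columns.
            if abs(total - sum(parts)) > 1e-6:
                raise ValueError(f"total_counts mismatch for cell {row['cell_id']}")
            cell_area = float(row["cell_area"])
            nucleus_area = float(row["nucleus_area"]) if row["nucleus_area"] else None
            yield {
                "cell_id": row["cell_id"],
                "non_gene_fraction": (total - parts[0]) / total if total else 0.0,
                "nucleus_to_cell_area": (
                    nucleus_area / cell_area if nucleus_area is not None and cell_area else None
                ),
            }
```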

The cells.parquet is an additional cell summary file in Parquet format. It contains the same information as cells.csv.gz but enables faster loading and reading of data.

Panel file

The gene_panel.json file is a JSON-format copy of the input gene panel file used in the experiment. For more information, refer to the Pre-designed Xenium Gene Expression Panels page.

Secondary analysis results

The Xenium onboard analysis pipeline outputs an analysis/ directory with subdirectories containing several CSV files, which store the automated secondary analysis results. A subset of these results is used to render the Analysis tab in the Web summary file. The subdirectories correspond to:

Clustering (clustering/) with graph-based and K-means results. Graph-based clustering (under graphclust) is run once because it does not require a pre-specified number of clusters. K-means (under kmeans) is run for K=2..N, where K is the number of clusters and N=10 by default (nine runs). Each value of K has its own results directory.
Differential Expression (diffexp/) with graph-based and K-means results. Under each of the subdirectories are the differential_expression.csv files, which contain the list of cluster-specific features that are differentially expressed in each cluster relative to all the other clusters.
Principal Component Analysis (pca/) contains a total of five files listing the features used in the dimensionality reduction, that is, in reducing the feature space. These results are used to perform clustering.
UMAP (umap/) contains the Uniform Manifold Approximation and Projection results.

The secondary analysis results are also saved as a zipped Zarr file (analysis.zarr.zip), which can be read by Xenium Explorer for data visualization.

Morphology images

The pipeline outputs a series of nuclei-stained (DAPI) tissue morphology images in OME-TIFF format. Each file includes a pyramid of resolutions stored as tiled chunks of image data, which allows for efficient interactive visualization. The images use JPEG-2000 compression and 16-bit grayscale, with full resolution plus downsampled resolutions down to 256 x 256 pixels. All three image files can be read by Xenium Explorer.

The morphology.ome.tif is a 3D Z-stack image that can be useful to resegment cells, assess segmentation quality, and view data. DAPI image processing is described here.
The morphology_mip.ome.tif is a 2D maximum intensity projection (MIP) image of the tissue morphology image.
The morphology_focus.ome.tif is a 2D autofocus projection image of the tissue morphology image.
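A maximum intensity projection keeps, for each XY pixel, the brightest value found across all Z-slices. The Python sketch below shows the operation on a tiny Z-stack stored as nested lists; with NumPy, the same projection is `stack.max(axis=0)`. It illustrates the general technique and is not the pipeline's implementation.

```python
def max_intensity_projection(stack: list[list[list[int]]]) -> list[list[int]]:
    """Collapse a Z-stack (Z x Y x X nested lists) into one 2D image.

    Each output pixel is the maximum of that pixel across all Z-slices.
    """
    if not stack:
        raise ValueError("empty Z-stack")
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]
```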

Cell and nucleus segmentation files

Nucleus boundaries are determined by a nucleus segmentation algorithm that runs on the nuclei-stained (DAPI) morphology image. Cell boundaries are determined by expanding each nucleus boundary outward up to a maximum distance, or until the expanded boundary hits another cell.
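The idea of expanding each nucleus until it reaches a maximum distance or another cell can be sketched on a labeled pixel grid. The toy Python example below uses a multi-source breadth-first search with 4-connected steps, so distance is counted in pixel steps rather than Euclidean µm. It is an illustration of the concept, not the Xenium implementation. For real label images, scikit-image provides `skimage.segmentation.expand_labels(label_image, distance=...)`.

```python
from collections import deque


def expand_nuclei(labels: list[list[int]], max_steps: int) -> list[list[int]]:
    """Grow nucleus labels (0 = background) outward by up to max_steps pixels.

    All nuclei grow at the same rate (multi-source breadth-first search), so a
    background pixel goes to whichever nucleus reaches it first and growing
    cells stop where they meet. Ties are broken by processing order.
    """
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    # Steps from the nearest nucleus; None marks pixels not yet claimed.
    dist = [[0 if v else None for v in row] for row in labels]
    queue = deque((y, x) for y in range(height) for x in range(width) if labels[y][x])
    while queue:
        y, x = queue.popleft()
        if dist[y][x] == max_steps:
            continue  # reached the maximum expansion distance
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] is None:
                out[ny][nx] = out[y][x]
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return out
```

For example, two single-pixel nuclei four pixels apart, `[[1, 0, 0, 0, 2]]`, expand to `[[1, 1, 0, 2, 2]]` with `max_steps=1`. With `max_steps=2`, they meet in the middle.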

The cells.zarr.zip file in zipped Zarr format contains segmentation masks and boundaries for nuclei and cells. These segmentation masks are used for assigning transcripts to cells. The boundaries are approximations of the segmentation masks, and are provided for efficient visualization of cell segmentation in Xenium Explorer and other analysis software.

The nucleus_boundaries.csv.gz and cell_boundaries.csv.gz are the CSV representation of the nucleus and cell boundaries, respectively. Each row represents a vertex in the boundary polygon of one cell. The boundary points for each cell appear in clockwise order, and the first and the last points are duplicates to indicate a closed polygon. Both files contain the following columns:

Column Name	Description
cell_id	Unique ID of the cell
vertex_x	X-coordinate of the boundary point in µm
vertex_y	Y-coordinate of the boundary point in µm
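Because each boundary is a closed polygon in µm, its area can be computed with the shoelace formula. The Python sketch below groups the vertices by `cell_id` and returns one area per cell. The boundaries approximate the segmentation masks, so expect these areas to be close to, but not necessarily identical to, `cell_area` and `nucleus_area` in `cells.csv.gz`.

```python
import csv
import gzip
from collections import defaultdict


def polygon_area(vertices: list[tuple[float, float]]) -> float:
    """Shoelace formula for a simple polygon given as (x, y) vertices.

    Works whether or not the last vertex repeats the first; the absolute value
    makes the result independent of clockwise/counterclockwise ordering.
    """
    if vertices and vertices[0] != vertices[-1]:
        vertices = vertices + [vertices[0]]  # close the ring
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))
    return abs(twice_area) / 2.0


def boundary_areas(path) -> dict[str, float]:
    """Return {cell_id: polygon area in µm²} for a *_boundaries.csv.gz file."""
    rings = defaultdict(list)
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            rings[row["cell_id"]].append((float(row["vertex_x"]), float(row["vertex_y"])))
    return {cell_id: polygon_area(ring) for cell_id, ring in rings.items()}
```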

The nucleus_boundaries.parquet and cell_boundaries.parquet are the nucleus and cell boundaries in Parquet format. They contain the same information as the CSV files above but enable faster loading and reading of data.

Xenium experiment file

The experiment.xenium is an experiment manifest file in JSON format that includes experiment metadata and relative file paths to other data files in the output folder needed by Xenium Explorer to visualize results.

Field	Description
major_version	Indicates major version of analysis output file formats read by Xenium Explorer
minor_version	Indicates minor version of analysis output file formats read by Xenium Explorer
run_name	User-specified Run Name entered on instrument
run_start_time	Instrument run start time
region_name	User-specified name for region selected on instrument
preservation_method	User-specified sample preservation method
num_cells	Cells detected by Xenium Onboard Analysis pipeline
transcripts_per_cell	Median transcripts per cell calculated by Xenium Onboard Analysis pipeline
transcripts_per_100um	Transcripts per 100 µm² calculated by Xenium Onboard Analysis pipeline
cassette_name	User-specified Xenium Cassette Name entered on instrument
slide_id	User-specified Xenium Slide ID entered on instrument
panel_design_id	Panel design ID specified by panel selection on instrument
panel_name	Panel name specified by panel selection on instrument
panel_organism	Sample organism specified by selected gene panel
panel_tissue_type	User-specified tissue type selected on instrument
panel_num_targets_predesigned	Number of gene targets from the pre-designed gene panel
panel_num_targets_custom	Number of gene targets from custom add-on panel if included in panel design
pixel_size	Pixel size in the `morphology.ome.tif` image file (in µm)
instrument_sn	Xenium Analyzer instrument serial number
instrument_sw_version	Version of the Xenium Analyzer firmware used during analysis run
analysis_sw_version	Version of Xenium Onboard Analysis pipeline used to analyze data
experiment_uuid	Instrument metadata
cassette_uuid	Instrument metadata
roi_uuid	Instrument metadata
z_step_size	Z-step size (in µm) used for subsampling the `morphology.ome.tif` image Z-stacks
well_uuid	Instrument metadata
calibration_uuid	Instrument metadata
images	Specifies the file paths to the morphology image files; used by Xenium Explorer to find input files
xenium_explorer_files	Specifies the file paths to transcript, cell, and secondary analysis files; used by Xenium Explorer to find input files
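Because `experiment.xenium` is plain JSON, any JSON parser can load it. The Python sketch below reads `pixel_size` from the manifest and converts µm coordinates (for example, transcript locations or cell centroids) to full-resolution pixel coordinates in `morphology.ome.tif`. It assumes the image and the µm coordinates share the same origin; check this against your data before overlaying points on images.

```python
import json
from pathlib import Path


def load_experiment(path) -> dict:
    """Parse the experiment.xenium manifest (JSON) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def um_to_pixels(x_um: float, y_um: float, pixel_size_um: float) -> tuple[float, float]:
    """Convert µm coordinates to full-resolution pixel coordinates.

    Assumes pixel (0, 0) corresponds to (0 µm, 0 µm).
    """
    return x_um / pixel_size_um, y_um / pixel_size_um
```

For example, `um_to_pixels(x, y, load_experiment("experiment.xenium")["pixel_size"])` gives the pixel position of a point given in µm. Other fields from the table, such as `num_cells` or `panel_name`, can be read from the same dict.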

Understanding Xenium Outputs