
Understanding Xenium Outputs

This page describes the raw output (decoded transcript counts and morphology images) and the other standard output files derived from it, which are included in the Xenium output directory for each selected region. These data are a reduction of the low-level internal image sensor data that preserves the details needed to assess decoded transcript quality (learn more at Overview of Xenium Algorithms).


Important: See what's new in the Xenium Onboard Analysis software pipeline in the release notes.

Output directory size

Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more. For a complete table of output files, see At a Glance: Xenium Output Files.

The file formats were deliberately designed and chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors such as tissue shape, number of cells, number of decoded transcripts, and percentage of high-quality transcripts.

To help budget for data storage requirements, here are some examples based on estimates and on 10x Genomics public datasets.

The table below shows estimated output directory sizes as a function of tissue area, assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:

  • 0.72 cm² tissue area
  • 11 Z-slices
  • 162k cells
  • 62.4M transcripts
  • 0.25 cells per 100 µm²
  • 107 transcripts > Q20 per 100 µm²
  • 80% of transcripts > Q20

| Tissue | Tissue area (cm²) | Estimated output directory size (GB) |
|---|---|---|
| Core needle biopsy | 0.01 | 0.2 |
| Hemisphere of coronal mouse brain | 0.5 | 10 |
| Full coronal mouse brain | 1 | 20 |
| Tissue section covering entire sample area | 3 | 60 |
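Although there is no general formula, the estimates in the table above follow a constant rate of about 20 GB per cm² (for example, 0.5 cm² gives 10 GB and 3 cm² gives 60 GB). The sketch below applies that rate for rough budgeting. It assumes tissue similar to the model mouse brain section; the rate is read off this table only, and the function names are hypothetical.

```python
# Rough storage budgeting sketch. The 20 GB/cm² rate is read off the
# estimated-size table above and assumes tissue similar to the model
# mouse brain section; real output sizes vary with the sample.
GB_PER_CM2 = 20.0


def estimate_output_gb(tissue_area_cm2: float, gb_per_cm2: float = GB_PER_CM2) -> float:
    """Return an approximate output directory size (GB) for one region."""
    if tissue_area_cm2 < 0:
        raise ValueError("tissue area must be non-negative")
    return tissue_area_cm2 * gb_per_cm2


def estimate_run_gb(region_areas_cm2: list[float]) -> float:
    """Sum the estimates over all regions of a run (one output directory each)."""
    return sum(estimate_output_gb(area) for area in region_areas_cm2)


if __name__ == "__main__":
    # Two hypothetical regions: a brain hemisphere and a full coronal section.
    print(f"{estimate_run_gb([0.5, 1.0]):.1f} GB")  # prints "30.0 GB"
```

The public datasets listed below work out to roughly 20 to 27 GB per cm², so it is prudent to leave headroom beyond this estimate.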

The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:

| Dataset | Tissue area (cm²) | Output directory size (GB) |
|---|---|---|
| Mouse brain tiny subset | ~0.17 | 3.5 |
| Mouse brain full coronal section | 0.66 | 13.0 |
| FFPE human breast, Tissue 1 | 0.90 | 24.4 |
| FFPE human breast using the entire sample area, Replicate 1 | 2.28 | 51.9 |

Overview of the output structure

All run data will be stored in the output/ directory on the Xenium Analysis Computer and will be accessible on the Desktop. Refer to the Xenium Instrument User Guide (CG000584) for instructions to export run data off the instrument.

Within the output/ directory, the data from individual runs are stored as subfolders and include the user-defined run name in the folder name. Within the top-level run folder, there are subfolders for each of the user-defined regions on the Xenium slides. The overall organization of subfolders is shown below:

output
└── <yyyymmdd>__<runName>
    └── output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>

The runName and regionName strings are user-defined; the other components of the directory names are auto-generated. The separators between the strings in the directory name are two underscores. Spaces in runName and regionName will be replaced by an underscore (_) in the output directory name.
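Because the fields are joined by double underscores, a region directory name can be split back into its components. The sketch below is a minimal illustration, assuming the name follows the pattern above exactly; the function and dataclass names are hypothetical, and the example directory name uses made-up identifiers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RegionDir:
    instrument_sn: str
    slide_id: str
    region_name: str
    timestamp: datetime


def parse_region_dir(name: str) -> RegionDir:
    """Split output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>."""
    prefix = "output-"
    if not name.startswith(prefix):
        raise ValueError(f"not a region output directory: {name!r}")
    parts = name[len(prefix):].split("__")
    if len(parts) < 5:
        raise ValueError(f"unexpected directory name: {name!r}")
    # Read the fixed fields from both ends so that a region name containing
    # "__" (e.g. from two consecutive spaces) is kept intact.
    instrument_sn, slide_id = parts[0], parts[1]
    date_str, time_str = parts[-2], parts[-1]
    region_name = "__".join(parts[2:-2])
    timestamp = datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")
    return RegionDir(instrument_sn, slide_id, region_name, timestamp)


# Hypothetical example name:
# parse_region_dir("output-XETG00000__0000000__Region_1__20230313__153000")
```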

Xenium output file descriptions

Within each of the subfolders, users can expect the following files:

Web summary

The Xenium onboard analysis pipeline outputs an interactive HTML file named analysis_summary.html. Open it in a web browser or Xenium Explorer. It contains summary metrics and automated secondary analysis results. Any alerts issued by the pipeline are displayed at the top of the page.

There are four clickable tabs that capture different information:

  • The Summary tab contains summary metrics, images, and experiment information for a quick overview of the data.
  • The Decoding tab contains more specific transcript decoding metrics.
  • The Cell Segmentation tab shows the metrics for cell segmentation and partitioning transcripts into single cells.
  • The Analysis tab captures the results from the pipeline's secondary analysis run on single cell data.

Click the ? at the top of each dashboard for more information about each metric.

Gene expression metrics

The Xenium onboard analysis pipeline outputs key metrics in text format as metrics_summary.csv. This file contains metrics that are useful for assessing decoding and cell segmentation quality.

Cell-feature matrix

The Xenium onboard analysis pipeline outputs a cell-feature matrix (cell_feature_matrix) in three file formats: the Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. The matrices only include transcripts that pass the default quality value (Q-Score) threshold of Q20.

The matrix in the cell_feature_matrix/ folder is stored in the MEX format for sparse matrices. The folder also contains gzipped TSV files listing the features and cell barcodes that correspond to the row and column indices, respectively. The cell_feature_matrix/features.tsv.gz file contains a list of pre-designed panel genes (and any custom add-on genes), negative controls, and unassigned codewords (learn more about controls on the Algorithms page).

| Column number | Description |
|---|---|
| 1 | Ensembl ID for panel and add-on genes |
| 2 | Gene name for panel and add-on genes |
| 3 | Feature type (Gene Expression, Negative Control Codeword, Negative Control Probe, or Unassigned Codeword) |
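The MEX files can be read with the Python standard library alone. The sketch below assumes the conventional 10x MEX file names (matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz), which this page does not list, and an integer-valued matrix; the function names are hypothetical. Rows of the matrix index features and columns index cells, as described above.

```python
import gzip
import os


def _read_gz_lines(path):
    """Return the non-empty lines of a gzipped text file."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]


def read_mex(matrix_dir):
    """Load a MEX cell-feature matrix as (features, barcodes, counts).

    ASSUMPTION: conventional file names matrix.mtx.gz, features.tsv.gz and
    barcodes.tsv.gz, and integer counts. counts maps
    (feature_index, cell_index) -> count using 0-based indices.
    """
    features = [line.split("\t")
                for line in _read_gz_lines(os.path.join(matrix_dir, "features.tsv.gz"))]
    barcodes = _read_gz_lines(os.path.join(matrix_dir, "barcodes.tsv.gz"))

    counts = {}
    shape = None
    for line in _read_gz_lines(os.path.join(matrix_dir, "matrix.mtx.gz")):
        if line.startswith("%"):
            continue  # Matrix Market banner and comment lines
        fields = line.split()
        if shape is None:
            # First non-comment line: n_rows (features), n_cols (cells), n_nonzero
            shape = (int(fields[0]), int(fields[1]))
            continue
        row, col, value = int(fields[0]), int(fields[1]), int(fields[2])
        counts[(row - 1, col - 1)] = value  # Matrix Market indices are 1-based
    if shape != (len(features), len(barcodes)):
        raise ValueError(f"matrix shape {shape} does not match features/barcodes")
    return features, barcodes, counts
```

A plain dictionary keeps the sketch dependency-free; if SciPy is available, scipy.io.mmread is a faster alternative for parsing the Matrix Market file itself.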

There are two classes of negative controls:

  • Negative control codewords are codewords in the codebook that do not have any probes matching that code. They can be used to assess the specificity of the decoding algorithm.
  • Negative control probes are probes that exist in the panels but target non-biological sequences. They can be used to assess the specificity of the assay.

Unassigned codewords are codewords that this particular gene panel does not use: no probe in the panel generates them.
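To analyze genes separately from the controls, the features can be grouped by the feature type in column 3 of features.tsv.gz. A minimal sketch, assuming tab-separated columns as described in the table above; the function name and the file path in the usage comment are illustrative.

```python
import gzip
from collections import defaultdict


def features_by_type(features_path):
    """Group feature names in features.tsv.gz by feature type (column 3)."""
    groups = defaultdict(list)
    with gzip.open(features_path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate a trailing blank line
            _feature_id, name, feature_type = line.rstrip("\n").split("\t")[:3]
            groups[feature_type].append(name)
    return dict(groups)


# Usage sketch (illustrative path):
# groups = features_by_type("cell_feature_matrix/features.tsv.gz")
# genes = groups.get("Gene Expression", [])
# neg_codewords = groups.get("Negative Control Codeword", [])
```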

The cell_feature_matrix.h5 file is an HDF5 file, a binary format that compresses and accesses data more efficiently than text formats such as MEX and is useful for large datasets. H5 files can be read in both R and Python.

The cell_feature_matrix.zarr.zip is a zipped Zarr file, which is a format for storage of chunked, compressed, N-dimensional arrays. This file can be read by Xenium Explorer.

Transcript data

The transcripts file (transcripts.csv.gz) in gzipped CSV format contains data to evaluate transcript quality and localization. The file contains one row for each decoded transcript, with the following columns:

| Column name | Description |
|---|---|
| transcript_id | Unique ID of the transcript |
| cell_id | Unique ID of the cell |
| overlaps_nucleus | Binary value indicating whether the transcript falls within the segmented nucleus of the cell |
| feature_name | Gene or control name |
| x_location | X location in µm |
| y_location | Y location in µm |
| z_location | Z location in µm |
| qv | Phred-scaled quality value (Q-Score) estimating the probability of an incorrect call |
| fov_name | Name of the field of view (FOV) |
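On the Phred scale, Q = -10 · log10(p_error), so Q20 corresponds to an estimated 1% probability of an incorrect call. The sketch below converts qv to an error probability and counts transcripts per feature after a Q20 filter, mirroring the default threshold used for the cell-feature matrix. Treating the threshold as qv >= 20 is an assumption, the function names are hypothetical, and the sketch ignores cell assignment, so it will not reproduce the matrix itself.

```python
import csv
import gzip
from collections import Counter


def phred_to_error_prob(qv):
    """Phred scale: Q = -10 * log10(p_error), so p_error = 10 ** (-Q / 10)."""
    return 10 ** (-qv / 10)


def high_quality_transcripts(path, min_qv=20.0):
    """Yield rows of transcripts.csv.gz (as dicts) whose qv is at least min_qv."""
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            if float(row["qv"]) >= min_qv:
                yield row


def counts_per_feature(path, min_qv=20.0):
    """Count transcripts per feature_name after the quality filter."""
    return Counter(row["feature_name"] for row in high_quality_transcripts(path, min_qv))
```

For large runs, the Parquet version of this file described next loads faster than parsing the gzipped CSV row by row.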

The transcripts.parquet is an additional transcripts file in Parquet format. It contains the same information as the transcripts.csv.gz file but enables faster loading and reading of data.

The transcripts.zarr.zip is a zipped Zarr format file with the same information as the transcripts.csv.gz file. This file can be read by Xenium Explorer.

The aux_outputs/fov_locations.json file contains the field of view (FOV) name, height, width, and XY positions. The position information is useful for determining where FOV boundaries are to assess transcript deduplication and any FOV edge effects.
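To look for FOV edge effects, one option is to measure how far each transcript lies from the nearest FOV boundary. This page does not document the JSON layout, so the sketch below ASSUMES a top-level "fov_locations" object mapping each FOV name to x, y, width, and height, with x and y the FOV's minimum corner in the same micron coordinates as the transcripts. Check your own file before using it; the function names are hypothetical.

```python
import json
from typing import Optional


def load_fov_locations(path):
    """Load FOV rectangles from aux_outputs/fov_locations.json.

    ASSUMPTION: {"fov_locations": {name: {"x", "y", "width", "height"}}},
    with x/y the minimum corner in micron coordinates.
    """
    with open(path) as fh:
        return json.load(fh)["fov_locations"]


def distance_to_fov_edge(x, y, fovs) -> Optional[float]:
    """Distance from (x, y) to the nearest edge of any FOV that contains it.

    Returns None when the point lies outside every FOV.
    """
    best = None
    for fov in fovs.values():
        x0, y0 = fov["x"], fov["y"]
        x1, y1 = x0 + fov["width"], y0 + fov["height"]
        if x0 <= x <= x1 and y0 <= y <= y1:
            d = min(x - x0, x1 - x, y - y0, y1 - y)
            best = d if best is None else min(best, d)
    return best
```

Transcripts within a small distance of an edge (the cutoff is up to you) can then be compared with interior transcripts, for example by their qv distribution.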

Cell summary file

The cell summary file (cells.csv.gz) in gzipped CSV format contains data to help QC the transcript counts for each identified cell. The file contains one row for each cell, with the following columns:

| Column name | Description |
|---|---|
| cell_id | Unique ID of the cell |
| x_centroid | X location of the cell centroid in µm |
| y_centroid | Y location of the cell centroid in µm |
| transcript_counts | Molecule count of gene features |
| control_probe_counts | Molecule count of negative control probes |
| control_codeword_counts | Count of negative control codewords |
| unassigned_codeword_counts | Count of unassigned codewords |
| total_counts | Sum of transcript_counts, control_probe_counts, control_codeword_counts, and unassigned_codeword_counts |
| cell_area | The two-dimensional area covered by the cell in µm² |
| nucleus_area | The two-dimensional area covered by the nucleus in µm² |
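The columns above support simple per-cell QC. The sketch below checks that total_counts equals the sum of the four count columns, then reports transcript density (gene transcripts per µm² of cell area) and the fraction of a cell's counts that come from negative controls and unassigned codewords. The function name and the choice of these two metrics are illustrative, not pipeline definitions.

```python
import csv
import gzip

COUNT_COLUMNS = ("transcript_counts", "control_probe_counts",
                 "control_codeword_counts", "unassigned_codeword_counts")


def cell_qc(path):
    """Yield (cell_id, transcripts_per_um2, non_gene_fraction) per row of cells.csv.gz."""
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            counts = {col: float(row[col]) for col in COUNT_COLUMNS}
            total = float(row["total_counts"])
            # total_counts should equal the sum of the four count columns.
            if abs(total - sum(counts.values())) > 1e-6:
                raise ValueError(f"inconsistent total_counts for cell {row['cell_id']}")
            area = float(row["cell_area"])
            density = counts["transcript_counts"] / area if area > 0 else 0.0
            # Share of counts from negative controls and unassigned codewords.
            non_gene = total - counts["transcript_counts"]
            non_gene_fraction = non_gene / total if total > 0 else 0.0
            yield row["cell_id"], density, non_gene_fraction
```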

The cells.parquet is an additional cell summary file in Parquet format. It contains the same information as cells.csv.gz but enables faster loading and reading of data.

Panel file

The gene_panel.json file is a copy of the input gene panel file used in the experiment saved in JSON format. For more information, refer to the Pre-designed Xenium Gene Expression Panels page.

Secondary analysis results

The Xenium onboard analysis pipeline outputs an analysis/ directory with subdirectories containing several CSV files, which store the automated secondary analysis results. A subset of these results is used to render the Analysis tab in the Web summary file. The subdirectories correspond to:

  • Clustering (clustering/) with graph-based and K-means results. Graph-based clustering (under graphclust) is run once, as it does not require a pre-specified number of clusters. K-means (under kmeans) is run for K = 2 to N, where K is the number of clusters and N = 10 by default. Each value of K has its own results directory.
  • Differential Expression (diffexp/) with graph-based and K-means results. Each of these subdirectories contains a differential_expression.csv file listing the cluster-specific features that are differentially expressed in each cluster relative to all other clusters.
  • Principal Component Analysis (pca/), which contains five files listing the features used in the dimensionality reduction (that is, in reducing the feature space). These results are used to perform clustering.
  • UMAP (umap/) contains the Uniform Manifold Approximation and Projection results.
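A common first look at clustering results is the number of cells per cluster. The file names inside clustering/ are not listed on this page, so the sketch below ASSUMES a CSV with a header row and two columns, a cell identifier followed by an integer cluster label. It also lists the K values that K-means runs by default. Both function names are hypothetical.

```python
import csv
from collections import Counter


def cluster_sizes(clusters_csv):
    """Count cells per cluster in a clustering result CSV.

    ASSUMPTION: header row, then (cell identifier, integer cluster label) rows.
    """
    with open(clusters_csv, newline="") as fh:
        reader = csv.reader(fh)
        next(reader)  # skip the header row
        return Counter(int(row[1]) for row in reader if row)


def kmeans_k_values(n=10):
    """K values run by K-means: K = 2 to N, with N = 10 by default."""
    return list(range(2, n + 1))
```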

The secondary analysis results are also saved as a zipped Zarr file (analysis.zarr.zip), which can be read by Xenium Explorer for data visualization.

Morphology images

The pipeline outputs a series of tissue morphology images, which are nuclei-stained (DAPI) images in OME-TIFF format. These files include a pyramid of resolutions and tiled chunks of image data, which allow efficient interactive image visualization (JPEG-2000 compression, 16-bit grayscale, full and downsampled resolutions down to 256 x 256 pixels). All three image files can be read by Xenium Explorer.

  • The morphology.ome.tif is a 3D Z-stack image that can be useful to resegment cells, assess segmentation quality, and view data. DAPI image processing is described in the Xenium algorithms documentation.
  • The morphology_mip.ome.tif is a 2D maximum intensity projection (MIP) image of the tissue morphology image.
  • The morphology_focus.ome.tif is a 2D autofocus projection image of the tissue morphology image.
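A maximum intensity projection collapses a Z-stack by keeping, for each (y, x) pixel, the brightest value across all Z-slices. The sketch below shows the operation on a small nested-list stack so it runs without extra packages; it illustrates the technique, not the pipeline's code. With NumPy, the equivalent for a (z, y, x) array is stack.max(axis=0).

```python
def max_intensity_projection(stack):
    """Collapse a Z-stack (list of Z planes, each a list of rows) to one 2D plane.

    Each output pixel is the maximum intensity of that pixel across all planes.
    """
    if not stack:
        raise ValueError("empty Z-stack")
    # zip(*stack) pairs up row y from every plane; zip(*rows) then pairs up
    # pixel x across planes, and max() picks the brightest value.
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]
```

The autofocus projection in morphology_focus.ome.tif selects in-focus content differently and is not reproduced here.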

Cell and nucleus segmentation files

Nucleus boundaries are determined by a nucleus segmentation algorithm that runs on the nuclei-stained (DAPI) morphology image. Cell boundaries are determined by expanding each nucleus boundary outward, up to a maximum distance or until the expanded boundary meets another cell.
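The expansion idea can be illustrated on a small label grid: every nucleus grows outward at the same pace, and a background pixel joins whichever nucleus reaches it first, so growing cells stop where they meet. The sketch below is an illustration of that general technique, not the pipeline's implementation; it measures distance in 4-connected pixel steps rather than microns, and max_steps is a hypothetical parameter. scikit-image offers a Euclidean-distance version as skimage.segmentation.expand_labels.

```python
from collections import deque


def expand_labels(labels, max_steps):
    """Grow labeled nuclei into background pixels by at most max_steps pixels.

    labels: 2D list where 0 is background and positive ints are nucleus IDs.
    Multi-source breadth-first search: a background pixel is claimed by the
    first nucleus to reach it, so regions stop where they meet another cell.
    Ties go to whichever labeled pixel was queued first.
    """
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    dist = [[0 if v else -1 for v in row] for row in labels]  # -1 = unclaimed
    queue = deque((y, x) for y in range(height) for x in range(width) if labels[y][x])
    while queue:
        y, x = queue.popleft()
        if dist[y][x] == max_steps:
            continue  # reached the maximum expansion distance
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] == -1:
                dist[ny][nx] = dist[y][x] + 1
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out
```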

The cells.zarr.zip file in zipped Zarr format contains segmentation masks and boundaries for nuclei and cells. These segmentation masks are used for assigning transcripts to cells. The boundaries are approximations of the segmentation masks, and are provided for efficient visualization of cell segmentation in Xenium Explorer and other analysis software.

The nucleus_boundaries.csv.gz and cell_boundaries.csv.gz are the CSV representation of the nucleus and cell boundaries, respectively. Each row represents a vertex in the boundary polygon of one cell. The boundary points for each cell appear in clockwise order, and the first and the last points are duplicates to indicate a closed polygon. Both files contain the following columns:

| Column name | Description |
|---|---|
| cell_id | Unique ID of the cell |
| vertex_x | X-coordinate of the boundary point in µm |
| vertex_y | Y-coordinate of the boundary point in µm |
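Because each polygon is closed (the last vertex repeats the first), the area enclosed by a boundary can be computed with the shoelace formula by summing over consecutive vertex pairs. A minimal sketch for either boundary file; the function name is hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def polygon_areas(path):
    """Area (µm²) of each polygon in a *_boundaries.csv.gz file (shoelace formula)."""
    vertices = defaultdict(list)
    with gzip.open(path, "rt", newline="") as fh:
        for row in csv.DictReader(fh):
            vertices[row["cell_id"]].append((float(row["vertex_x"]), float(row["vertex_y"])))
    areas = {}
    for cell_id, pts in vertices.items():
        # The last vertex repeats the first, so consecutive pairs cover every edge.
        twice_area = sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
        # The sign depends on vertex order and axis direction; keep the magnitude.
        areas[cell_id] = abs(twice_area) / 2
    return areas
```

Since the boundaries approximate the segmentation masks, these areas may differ slightly from the cell_area and nucleus_area columns in cells.csv.gz.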

The nucleus_boundaries.parquet and cell_boundaries.parquet are the nucleus and cell boundaries in Parquet format. They contain the same information as the CSV files above but enable faster loading and reading of data.

Xenium experiment file

The experiment.xenium is an experiment manifest file in JSON format that includes experiment metadata and relative file paths to other data files in the output folder needed by Xenium Explorer to visualize results.

| Field | Description |
|---|---|
| major_version | Indicates major version of analysis output file formats read by Xenium Explorer |
| minor_version | Indicates minor version of analysis output file formats read by Xenium Explorer |
| run_name | User-specified Run Name entered on instrument |
| run_start_time | Instrument run start time |
| region_name | User-specified name for region selected on instrument |
| preservation_method | User-specified sample preservation method |
| num_cells | Cells detected by Xenium Onboard Analysis pipeline |
| transcripts_per_cell | Median transcripts per cell calculated by Xenium Onboard Analysis pipeline |
| transcripts_per_100um | Transcripts per 100 µm² calculated by Xenium Onboard Analysis pipeline |
| cassette_name | User-specified Xenium Cassette Name entered on instrument |
| slide_id | User-specified Xenium Slide ID entered on instrument |
| panel_design_id | Panel design ID specified by panel selection on instrument |
| panel_name | Panel name specified by panel selection on instrument |
| panel_organism | Sample organism specified by selected gene panel |
| panel_tissue_type | User-specified tissue type selected on instrument |
| panel_num_targets_predesigned | Number of gene targets from the pre-designed gene panel |
| panel_num_targets_custom | Number of gene targets from custom add-on panel if included in panel design |
| pixel_size | Pixel size in the morphology.ome.tif image file (in µm) |
| instrument_sn | Xenium Analyzer instrument serial number |
| instrument_sw_version | Version of the Xenium Analyzer firmware used during analysis run |
| analysis_sw_version | Version of Xenium Onboard Analysis pipeline used to analyze data |
| experiment_uuid | Instrument metadata |
| cassette_uuid | Instrument metadata |
| roi_uuid | Instrument metadata |
| z_step_size | Z-step size (in µm) used for subsampling the morphology.ome.tif image Z-stacks |
| well_uuid | Instrument metadata |
| calibration_uuid | Instrument metadata |
| images | Specifies the file paths to the morphology image files; used by Xenium Explorer to find input files |
| xenium_explorer_files | Specifies the file paths to transcript, cell, and secondary analysis files; used by Xenium Explorer to find input files |
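Since experiment.xenium is plain JSON, its fields can be read directly, and pixel_size (µm per pixel) can be used to relate micron coordinates in the transcript and cell files to morphology image pixels. The conversion below ASSUMES that pixel (0, 0) corresponds to 0 µm on both axes, so it is a pure scaling; verify this against your own images. Function names are hypothetical.

```python
import json


def load_experiment(path):
    """Read the experiment.xenium manifest (a JSON file) into a dict."""
    with open(path) as fh:
        return json.load(fh)


def microns_to_pixels(x_um, y_um, experiment):
    """Convert micron coordinates to morphology image pixel coordinates.

    ASSUMPTION: the image origin coincides with 0 µm, so this divides by
    pixel_size (µm per pixel) without any offset.
    """
    pixel_size = float(experiment["pixel_size"])
    return x_um / pixel_size, y_um / pixel_size


# Usage sketch (illustrative path):
# exp = load_experiment("experiment.xenium")
# print(exp["run_name"], exp["num_cells"])
# print(microns_to_pixels(100.0, 250.0, exp))
```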
Document Type: Software

Last Modified: March 13, 2023