This page describes raw output (decoded transcript counts and morphology images) and other standard output files derived from them, which are included in the Xenium output directory for each selected region. These data are a reduced form of the low-level internal image sensor data that preserves the details needed to assess decoded transcript quality (learn more at Overview of Xenium Algorithms).
Table of contents
- Output directory size
- Overview of the output structure
- Xenium output file descriptions
- Web summary
- Gene expression metrics
- Cell-feature matrix
- Transcript data
- Cell summary file
- Panel file
- Secondary analysis results
- Morphology images
- Cell and nucleus segmentation files
- Xenium experiment file
Output directory size
Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more. For a complete table of output files, see At a Glance: Xenium Output Files.
The file formats were deliberately designed and chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors like tissue shape, number of cells, number of decoded transcripts, and percent of high quality transcripts.
To help budget for data storage requirements, here are some examples based on estimations and 10x Genomics public datasets.
The table below shows estimated output directory sizes as a function of tissue area, assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:
- 0.72 cm² tissue area
- 11 Z-slices
- 162k cells
- 62.4M transcripts
- 0.25 cells per 100 µm²
- 107 transcripts > Q20 per 100 µm²
- 80% of transcripts > Q20
Tissue | Tissue area (cm²) | Estimated output directory size (GB) |
---|---|---|
Core needle biopsy | 0.01 | 0.2 |
Hemisphere of coronal mouse brain | 0.5 | 10 |
Full coronal mouse brain | 1 | 20 |
Tissue section covering entire sample area | 3 | 60 |
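The estimates in the table above scale roughly linearly, at about 20 GB per cm² of tissue. As a rough budgeting aid, that relationship could be sketched as follows. The function name and the constant are assumptions read off the table for a sample resembling the model mouse brain section; actual sizes depend on the sample-specific factors described earlier.

```python
# Rough storage estimate, assuming the ~20 GB per cm² ratio implied by the
# table above holds (samples similar to the model mouse brain section).
GB_PER_CM2 = 20.0  # assumption read off the table, not an official constant


def estimate_output_size_gb(tissue_area_cm2: float) -> float:
    """Return an approximate output directory size in GB for one region."""
    if tissue_area_cm2 < 0:
        raise ValueError("tissue area must be non-negative")
    return tissue_area_cm2 * GB_PER_CM2


# Reproduce the rows of the table above.
for area in (0.01, 0.5, 1.0, 3.0):
    print(f"{area} cm2 -> ~{estimate_output_size_gb(area):.1f} GB")
```

Treat the result as a planning figure only; tissue with a higher cell or transcript density than the model section can be expected to produce larger directories.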
The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:
Dataset | Tissue area (cm²) | Output directory size (GB) |
---|---|---|
Mouse brain tiny subset | ~0.17 | 3.5 |
Mouse brain full coronal section | 0.66 | 13.0 |
FFPE human breast, Tissue 1 | 0.90 | 24.4 |
FFPE human breast using the entire sample area, Replicate 1 | 2.28 | 51.9 |
Overview of the output structure
All run data will be stored in the output/ directory on the Xenium Analysis Computer and will be accessible on the Desktop. Refer to the Xenium Instrument User Guide (CG000584) for instructions to export run data off the instrument.
Within the output/ directory, the data from individual runs are stored as subfolders and include the user-defined run name in the folder name. Within the top-level run folder, there are subfolders for each of the user-defined regions on the Xenium slides. The overall organization of subfolders is shown below:
output
└── <yyyymmdd>__<runName>
    └── output-<instrumentSN>__<slideID>__<regionName>__<yyyymmdd>__<hhmmss>
The runName and regionName strings are user-defined; the other components of the directory names are auto-generated. The separators between the strings in the directory name are two underscores. Spaces in runName and regionName will be replaced by an underscore (_) in the output directory name.
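Because the components are joined by two underscores, a region directory name can be split back into its parts. The sketch below is one way to do that; the class and function names and the example directory name are hypothetical, and it assumes the instrument serial number and slide ID never contain a double underscore.

```python
from typing import NamedTuple


class RegionDirName(NamedTuple):
    instrument_sn: str
    slide_id: str
    region_name: str
    date: str  # yyyymmdd
    time: str  # hhmmss


def parse_region_dir(name: str) -> RegionDirName:
    """Split a region directory name on the double-underscore separators."""
    prefix = "output-"
    if not name.startswith(prefix):
        raise ValueError(f"expected name to start with {prefix!r}: {name!r}")
    body = name[len(prefix):]
    # Take the two auto-generated trailing fields from the right and the two
    # leading fields from the left, so the user-defined region name in the
    # middle stays whole even if it happens to contain "__".
    try:
        head, date, time = body.rsplit("__", 2)
        instrument_sn, slide_id, region_name = head.split("__", 2)
    except ValueError:
        raise ValueError(f"unexpected directory name layout: {name!r}") from None
    return RegionDirName(instrument_sn, slide_id, region_name, date, time)


# Hypothetical example values, for illustration only.
example = "output-XETG00000__0001234__Region_1__20230101__120000"
print(parse_region_dir(example))
```

Note that a space in the original region name cannot be distinguished from a literal underscore once the directory has been written, so the parsed region name may differ from what was typed on the instrument.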
Xenium output file descriptions
Within each of the subfolders, users can expect the following files:
Web summary
The Xenium onboard analysis pipeline outputs an interactive HTML file named analysis_summary.html. Open it in a web browser or Xenium Explorer. It contains summary metrics and automated secondary analysis results. Any alerts issued by the pipeline are displayed at the top of the page.
There are four clickable tabs that capture different information:
- The Summary tab contains summary metrics, images, and experiment information for a quick overview of the data.
- The Decoding tab contains more specific transcript decoding metrics.
- The Cell Segmentation tab shows the metrics for cell segmentation and partitioning transcripts into single cells.
- The Analysis tab captures the results from the pipeline's secondary analysis run on single cell data.
Click the ? at the top of each dashboard for more information about each metric.
Gene expression metrics
The Xenium onboard analysis pipeline outputs key metrics in text format as metrics_summary.csv. This file contains metrics that are useful for assessing decoding and cell segmentation quality.
Cell-feature matrix
The Xenium onboard analysis pipeline outputs a cell-feature matrix (cell_feature_matrix) in three file formats: the Market Exchange Format (MEX), the Hierarchical Data Format (HDF5), and the Zarr format. The matrices only include transcripts that pass the default quality value (Q-Score) threshold of Q20.
Each matrix in the cell_feature_matrix/ folder is stored in the MEX format for sparse matrices. The folder also contains gzipped TSV files with feature and barcode sequences corresponding to row and column indices, respectively. The cell_feature_matrix/features.tsv.gz file contains a list of pre-designed panel genes (and any custom add-on genes), negative controls, and unassigned codewords (learn more about controls on the Algorithms page).
Column Number | Description |
---|---|
1 | Ensembl ID for panel and add-on genes |
2 | Gene name for panel and add-on genes |
3 | Feature type (Gene Expression, Negative Control Codeword, Negative Control Probe, Unassigned Codeword) |
There are two classes of negative controls:
- Negative control codewords are codewords in the codebook that do not have any probes matching that code. They can be used to assess the specificity of the decoding algorithm.
- Negative control probes are probes that exist in the panels but target non-biological sequences. They can be used to assess the specificity of the assay.
Unassigned codewords are unused codewords. There is no probe in this particular gene panel that will generate the codeword.
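To see how many genes, negative controls, and unassigned codewords a panel contains, the feature types in features.tsv.gz can be tallied. The sketch below uses only the Python standard library; the function name is hypothetical, and it assumes the file has no header row and the three tab-separated columns described above.

```python
import csv
import gzip
from collections import Counter


def count_feature_types(features_path: str) -> Counter:
    """Count rows per feature type (third column) in a features.tsv.gz file."""
    counts = Counter()
    with gzip.open(features_path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            feature_type = row[2]
            counts[feature_type] += 1
    return counts


# Example call (path relative to a region output directory):
# print(count_feature_types("cell_feature_matrix/features.tsv.gz"))
```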
The cell_feature_matrix.h5 file is in HDF5 format, a binary format that compresses and accesses data more efficiently than text formats such as MEX and is useful when dealing with large datasets. H5 files are supported in both R and Python.
The cell_feature_matrix.zarr.zip is a zipped Zarr file, which is a format for storage of chunked, compressed, N-dimensional arrays. This file can be read by Xenium Explorer.
Transcript data
The transcripts file (transcripts.csv.gz) in gzipped CSV format contains data to evaluate transcript quality and localization. The file contains one row for each decoded transcript, with the following columns:
Column Name | Description |
---|---|
transcript_id | Unique ID of the transcript |
cell_id | Unique ID of the cell |
overlaps_nucleus | Binary value to indicate if the transcript falls within the segmented nucleus of the cell or not |
feature_name | Gene or control name |
x_location | X location in µm |
y_location | Y location in µm |
z_location | Z location in µm |
qv | Phred-scaled quality value (Q-Score) estimating the probability of an incorrect call |
fov_name | Name of the field of view (FOV) |
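A Phred-scaled value Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly a 1% chance of an incorrect call. The sketch below streams the gzipped CSV and counts transcripts per feature at or above a threshold. The function names are hypothetical, and it assumes the file has a header row with the column names listed above; whether the threshold should be inclusive is also an assumption.

```python
import csv
import gzip
from collections import Counter

Q_THRESHOLD = 20.0  # default Q-Score threshold named on this page


def phred_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def count_high_quality_transcripts(transcripts_path: str,
                                   min_qv: float = Q_THRESHOLD) -> Counter:
    """Count transcripts per feature_name with qv >= min_qv."""
    counts = Counter()
    with gzip.open(transcripts_path, "rt", newline="") as handle:
        # Rows are read one at a time, so the whole file is never in memory.
        for row in csv.DictReader(handle):
            if float(row["qv"]) >= min_qv:
                counts[row["feature_name"]] += 1
    return counts


# Example call (path relative to a region output directory):
# print(count_high_quality_transcripts("transcripts.csv.gz").most_common(5))
```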
The transcripts.parquet file is an additional transcripts file in Parquet format. It contains the same information as the transcripts.csv.gz file but enables faster loading and reading of data.
The transcripts.zarr.zip is a zipped Zarr format file with the same information as the transcripts.csv.gz file. This file can be read by Xenium Explorer.
The aux_outputs/fov_locations.json file contains the field of view (FOV) name, height, width, and XY positions. The position information is useful for locating FOV boundaries when assessing transcript deduplication and any FOV edge effects.
Cell summary file
The cell summary file (cells.csv.gz) in gzipped CSV format contains data to help QC the transcript counts for each identified cell. The file contains one row for each cell, with the following columns:
Column Name | Description |
---|---|
cell_id | Unique ID of the cell |
x_centroid | X location of the cell centroid in µm |
y_centroid | Y location of the cell centroid in µm |
transcript_counts | Molecule count of gene features |
control_probe_counts | Molecule count of negative control probes |
control_codeword_counts | Count of negative control codewords |
unassigned_codeword_counts | Count of unassigned codewords |
total_counts | Sum total of transcript_counts, control_probe_counts, control_codeword_counts, and unassigned_codeword_counts |
cell_area | The two-dimensional area covered by the cell in µm² |
nucleus_area | The two-dimensional area covered by the nucleus in µm² |
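The relationship between total_counts and its four component columns can be checked per cell, and transcript_counts can be normalized by cell_area for a simple density measure. The sketch below is one possible QC pass; the function name and output keys are hypothetical, and it assumes a header row with the column names listed above.

```python
import csv
import gzip

COUNT_COLUMNS = (
    "transcript_counts",
    "control_probe_counts",
    "control_codeword_counts",
    "unassigned_codeword_counts",
)


def summarize_cells(cells_path: str) -> list:
    """Return per-cell total checks and transcript density from cells.csv.gz."""
    summaries = []
    with gzip.open(cells_path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            # float() first so values written as "12.0" are also accepted.
            component_sum = sum(int(float(row[col])) for col in COUNT_COLUMNS)
            cell_area = float(row["cell_area"])
            gene_counts = int(float(row["transcript_counts"]))
            summaries.append({
                "cell_id": row["cell_id"],
                "total_matches": component_sum == int(float(row["total_counts"])),
                # Gene transcripts per µm² of segmented cell area.
                "transcripts_per_um2": (
                    gene_counts / cell_area if cell_area > 0 else float("nan")
                ),
            })
    return summaries


# Example call (path relative to a region output directory):
# mismatches = [s for s in summarize_cells("cells.csv.gz") if not s["total_matches"]]
```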
The cells.parquet file is an additional cell summary file in Parquet format. It contains the same information as cells.csv.gz but enables faster loading and reading of data.
Panel file
The gene_panel.json file is a copy of the input gene panel file used in the experiment, saved in JSON format. For more information, refer to the Pre-designed Xenium Gene Expression Panels page.
Secondary analysis results
The Xenium onboard analysis pipeline outputs an analysis/ directory with subdirectories containing several CSV files, which store the automated secondary analysis results. A subset of these results is used to render the Analysis tab in the Web summary file. The subdirectories correspond to:
- Clustering (clustering/) with graph-based and K-means results. Graph-based clustering (under graphclust) is run once as it does not require a pre-specified number of clusters. K-means (under kmeans) is run for K=2..N, where K corresponds to the number of clusters and N=10 by default. Each value of K has its own results directory.
- Differential Expression (diffexp/) with graph-based and K-means results. Under each of the subdirectories are the differential_expression.csv files, which contain the list of cluster-specific features that are differentially expressed in each cluster relative to all the other clusters.
- Principal Component Analysis (pca/), which contains a total of five files listing the features used in the dimension reduction, i.e., to reduce the feature space. These results are used to perform clustering.
- UMAP (umap/), which contains the Uniform Manifold Approximation and Projection results.
The secondary analysis results are also saved as a zipped Zarr file (analysis.zarr.zip), which can be read by Xenium Explorer for data visualization.
Morphology images
The pipeline outputs a series of tissue morphology images, which are nuclei-stained (DAPI) images in OME-TIFF format. These files include a pyramid of resolutions and tiled chunks of image data, which allow for efficient interactive image visualization (JPEG-2000 compression, 16-bit grayscale, full and downsampled resolutions down to 256 x 256 pixels). All three image files can be read by Xenium Explorer.
- The morphology.ome.tif is a 3D Z-stack image that can be useful to resegment cells, assess segmentation quality, and view data. DAPI image processing is described here.
- The morphology_mip.ome.tif is a 2D maximum intensity projection (MIP) image of the tissue morphology image.
- The morphology_focus.ome.tif is a 2D autofocus projection image of the tissue morphology image.
Cell and nucleus segmentation files
Nucleus boundaries are determined by a nucleus segmentation algorithm that runs on the nuclei-stained (DAPI) morphology image. Cell boundaries are determined by expanding the nucleus boundaries by a set distance or until the expanded boundary hits another cell.
The cells.zarr.zip file in zipped Zarr format contains segmentation masks and boundaries for nuclei and cells. These segmentation masks are used for assigning transcripts to cells. The boundaries are approximations of the segmentation masks, and are provided for efficient visualization of cell segmentation in Xenium Explorer and other analysis software.
The nucleus_boundaries.csv.gz and cell_boundaries.csv.gz files are the CSV representations of the nucleus and cell boundaries, respectively. Each row represents a vertex in the boundary polygon of one cell. The boundary points for each cell appear in clockwise order, and the first and the last points are duplicates to indicate a closed polygon. Both files contain the following columns:
Column Name | Description |
---|---|
cell_id | Unique ID of the cell |
vertex_x | X-coordinate of the boundary point in µm |
vertex_y | Y-coordinate of the boundary point in µm |
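Because each polygon is closed (first and last vertex identical) and its vertices are listed in order, its area can be computed with the shoelace formula. The sketch below groups vertices by cell_id and returns an area per cell in µm². The function names are hypothetical, and it assumes a header row with the column names above and that each cell's rows appear in boundary order.

```python
import csv
import gzip
from collections import defaultdict


def polygon_area(vertices):
    """Shoelace formula; returns the unsigned area of a closed polygon.

    `vertices` is a list of (x, y) pairs whose last point repeats the first.
    """
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        twice_area += x0 * y1 - x1 * y0
    # abs() makes the result independent of clockwise/counterclockwise order.
    return abs(twice_area) / 2.0


def boundary_areas(boundaries_path):
    """Map cell_id -> polygon area (µm²) from a *_boundaries.csv.gz file."""
    polygons = defaultdict(list)
    with gzip.open(boundaries_path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            polygons[row["cell_id"]].append(
                (float(row["vertex_x"]), float(row["vertex_y"]))
            )
    return {cell_id: polygon_area(pts) for cell_id, pts in polygons.items()}


# Example call (path relative to a region output directory):
# areas = boundary_areas("cell_boundaries.csv.gz")
```

Since the boundaries only approximate the segmentation masks, areas computed this way may not match the cell_area and nucleus_area values in the cell summary file exactly.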
The nucleus_boundaries.parquet and cell_boundaries.parquet files are the nucleus and cell boundaries in Parquet format. They contain the same information as the CSV files above but enable faster loading and reading of data.
Xenium experiment file
The experiment.xenium file is an experiment manifest in JSON format that includes experiment metadata and relative file paths to other data files in the output folder needed by Xenium Explorer to visualize results.
Field | Description |
---|---|
major_version | Indicates major version of analysis output file formats read by Xenium Explorer |
minor_version | Indicates minor version of analysis output file formats read by Xenium Explorer |
run_name | User-specified Run Name entered on instrument |
run_start_time | Instrument run start time |
region_name | User-specified name for region selected on instrument |
preservation_method | User-specified sample preservation method |
num_cells | Cells detected by Xenium Onboard Analysis pipeline |
transcripts_per_cell | Median transcripts per cell calculated by Xenium Onboard Analysis pipeline |
transcripts_per_100um | Transcripts per 100 µm² calculated by Xenium Onboard Analysis pipeline |
cassette_name | User-specified Xenium Cassette Name entered on instrument |
slide_id | User-specified Xenium Slide ID entered on instrument |
panel_design_id | Panel design ID specified by panel selection on instrument |
panel_name | Panel name specified by panel selection on instrument |
panel_organism | Sample organism specified by selected gene panel |
panel_tissue_type | User-specified tissue type selected on instrument |
panel_num_targets_predesigned | Number of gene targets from the pre-designed gene panel |
panel_num_targets_custom | Number of gene targets from custom add-on panel if included in panel design |
pixel_size | Pixel size in the morphology.ome.tif image file (in µm) |
instrument_sn | Xenium Analyzer instrument serial number |
instrument_sw_version | Version of the Xenium Analyzer firmware used during analysis run |
analysis_sw_version | Version of Xenium Onboard Analysis pipeline used to analyze data |
experiment_uuid | Instrument metadata |
cassette_uuid | Instrument metadata |
roi_uuid | Instrument metadata |
z_step_size | Z-step size (in µm) used for subsampling the morphology.ome.tif image Z-stacks |
well_uuid | Instrument metadata |
calibration_uuid | Instrument metadata |
images | Specifies the file paths to the morphology image files; used by Xenium Explorer to find input files |
xenium_explorer_files | Specifies the file paths to transcript, cell, and secondary analysis files; used by Xenium Explorer to find input files |
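Since the manifest is plain JSON, it can be read with any JSON parser. The sketch below loads it and uses pixel_size to convert a length in µm to full-resolution pixels. The function names are hypothetical; it assumes the fields in the table above are top-level keys, and that micron coordinates and image pixels share the same origin, which this page does not state.

```python
import json


def load_experiment(path):
    """Load the experiment.xenium manifest (JSON) into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def microns_to_pixels(value_um, pixel_size_um):
    """Convert a length in µm to pixels of the full-resolution image."""
    if pixel_size_um <= 0:
        raise ValueError("pixel size must be positive")
    return value_um / pixel_size_um


# Example usage (path relative to a region output directory):
# manifest = load_experiment("experiment.xenium")
# print(manifest["run_name"], manifest["region_name"], manifest["num_cells"])
# print(microns_to_pixels(100.0, manifest["pixel_size"]))
```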