matrixOperations package#

Submodules#

matrixOperations.HIMmatrixOperations module#

Created on Tue Jun 2 21:21:02 2020

@author: marcnol

contains functions and classes needed for the analysis and plotting of HiM matrices

class matrixOperations.HIMmatrixOperations.AnalysisHiMMatrix(run_parameters, root_folder='.')#

Bases: object

This class loads data processed by processHiMmatrix.py. Its main use is to produce paper-quality figures of HiM matrices, 3-way interaction matrices and HiM matrix ratios.

load_data()#

loads dataset

Returns

  • self.foldes2Load contains the parameters used for the processing of HiM matrices.

  • self.data_files dictionary containing the extensions needed to load data files

  • self.data dictionary containing the datasets loaded

n_cells_loaded()#
plot_1d_profile1dataset(ifigure, anchor, i_fig_label, yticks, xticks)#
plot_2d_matrix_simple(ifigure, matrix, unique_barcodes, yticks, xticks, cmtitle='probability', c_min=0, c_max=1, c_m='coolwarm', fontsize=12, colorbar=False, axis_ticks=False, n_cells=0, n_datasets=0, show_title=False, fig_title='')#
retrieve_sc_matrix()#

retrieves single cells that have the label requested

Return type

self.sc_matrix_selected

matrixOperations.HIMmatrixOperations.attributes_labels2cells(snd_table, results_table, label='doc')#
matrixOperations.HIMmatrixOperations.calculate_3_way_contact_matrix(i_sc_matrix_collated, i_unique_barcodes, pixel_size, anchor, s_out, threshold=0.25, norm='nonNANs')#
matrixOperations.HIMmatrixOperations.calculate_contact_probability_matrix(i_sc_matrix_collated, i_unique_barcodes, pixel_size, threshold=0.25, norm='n_cells', min_number_contacts=0)#
matrixOperations.HIMmatrixOperations.calculate_ensemble_pwd_matrix(sc_matrix, pixel_size, cells_to_plot, mode='median')#

Uses either a KDE or the median to estimate the maximum of the PWD distribution.

Parameters
  • sc_matrix (TYPE) – DESCRIPTION.

  • pixel_size (TYPE) – DESCRIPTION.

Return type

matrix = 2D npy array.

matrixOperations.HIMmatrixOperations.comp_func(mat_a, n_w)#
matrixOperations.HIMmatrixOperations.coord_2_distances(coordinates)#

Derive distance matrix from given set of coordinates
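As an illustration of what deriving a distance matrix from coordinates involves, here is a minimal NumPy sketch. The function name pairwise_distances and the (N, 3) input layout are assumptions made for this example; they are not taken from the module.

```python
import numpy as np


def pairwise_distances(coordinates):
    """Return the N x N Euclidean distance matrix for N points (one per row)."""
    coordinates = np.asarray(coordinates, dtype=float)
    # Broadcasting gives the difference between every pair of points,
    # with shape (N, N, n_dims).
    diff = coordinates[:, None, :] - coordinates[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


xyz = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [3.0, 4.0, 12.0]])
distances = pairwise_distances(xyz)
```

The result is symmetric with zeros on the diagonal, which is the shape a single-cell PWD matrix has for one trace.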

matrixOperations.HIMmatrixOperations.decodes_trace(single_trace)#

from a trace entry, provides Numpy array with coordinates, barcode and trace names

Parameters

single_trace (astropy table) – astropy table for a single trace.

Returns

  • list of barcodes

  • x, y and z coordinates as numpy arrays,

  • trace name as string

matrixOperations.HIMmatrixOperations.distances_2_coordinates(distances)#

Infer coordinates from distances
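One common way to infer coordinates from a complete distance matrix is classical multidimensional scaling (MDS). The sketch below uses that technique as an assumption; the documentation does not state which method the module uses, and classical_mds is a hypothetical name.

```python
import numpy as np


def classical_mds(distances, n_dims=3):
    """Recover coordinates (up to rotation, reflection and translation)."""
    d2 = np.asarray(distances, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering turns squared distances into a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues; clip small negative noise to zero.
    order = np.argsort(eigenvalues)[::-1][:n_dims]
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)


points = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
recovered = classical_mds(dist)
dist_recovered = np.linalg.norm(recovered[:, None] - recovered[None, :], axis=-1)
```

The recovered points are not the original ones, but they reproduce the input distances. This sketch does not handle NaN entries.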

matrixOperations.HIMmatrixOperations.distribution_maximum_kernel_density_estimation(sc_matrix_collated, bin1, bin2, pixel_size, optimize_kernel_width=False, kernel_width=0.25, max_distance=4.0)#

calculates the kernel distribution and its maximum from a set of PWD distances

Parameters
  • sc_matrix_collated (np array 3 dims) – SC PWD matrix.

  • bin1 (int) – first bin.

  • bin2 (int) – second bin.

  • pixel_size (float) – pixel size in um

  • optimize_kernel_width (Boolean, optional) – does kernel need optimization?. The default is False.

Returns

  • float – maximum of kernel.

  • np array – list of PWD distances used.

  • np array – kernel distribution.

  • x_d (np array) – x grid.
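The idea of locating the maximum of a kernel density estimate can be sketched with NumPy alone. The hand-written Gaussian kernel, the grid size n_grid and the name kde_maximum are assumptions of this example; the module may rely on a library estimator instead, and the conversion by pixel_size is left out.

```python
import numpy as np


def kde_maximum(distances, kernel_width=0.25, max_distance=4.0, n_grid=401):
    """Return (position of KDE maximum, distances used, density, x grid)."""
    values = np.asarray(distances, dtype=float)
    values = values[~np.isnan(values)]  # drop missing PWDs
    x_d = np.linspace(0.0, max_distance, n_grid)
    # Sum of Gaussian kernels centred on each observed distance.
    z = (x_d[:, None] - values[None, :]) / kernel_width
    norm = values.size * kernel_width * np.sqrt(2.0 * np.pi)
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / norm
    return x_d[np.argmax(density)], values, density, x_d


pwd_samples = np.array([0.48, 0.50, 0.52, np.nan, 1.9])
peak, used, density, x_d = kde_maximum(pwd_samples)
```

Here the three clustered distances dominate the density, so the maximum sits close to 0.5 despite the outlier at 1.9.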

matrixOperations.HIMmatrixOperations.find_optimal_kernel_width(distance_distribution)#
matrixOperations.HIMmatrixOperations.fuses_sc_matrix_collated_from_datasets(sc_matrix_collated, unique_barcodes, p, run_name, i_list_data)#
matrixOperations.HIMmatrixOperations.get_barcodes_per_cell(sc_matrix_collated)#

Returns the number of barcodes that were detected in each cell of sc_matrix_collated.

matrixOperations.HIMmatrixOperations.get_coordinates_from_pwd_matrix(matrix)#
matrixOperations.HIMmatrixOperations.get_detection_eff_barcodes(sc_matrix_collated)#

Return the detection efficiency of all barcodes. Assumes a barcode is detected as soon as one PWD with this barcode is detected.
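The detection rule stated above (a barcode counts as detected as soon as one PWD involving it is present) can be sketched as follows. The array layout n_barcodes x n_barcodes x n_cells is taken from the load_sc_data description; ignoring the diagonal and the name barcode_detection are assumptions of this example.

```python
import numpy as np


def barcode_detection(sc_matrix):
    """Return (barcodes detected per cell, detection efficiency per barcode)."""
    sc_matrix = np.asarray(sc_matrix, dtype=float)
    n_barcodes = sc_matrix.shape[0]
    present = ~np.isnan(sc_matrix)
    # Ignore self-distances on the diagonal (assumption of this sketch).
    idx = np.arange(n_barcodes)
    present[idx, idx, :] = False
    # A barcode is detected in a cell if any of its PWDs is present.
    detected = present.any(axis=1)  # shape (n_barcodes, n_cells)
    return detected.sum(axis=0), detected.mean(axis=1)


sc = np.full((3, 3, 2), np.nan)
sc[0, 1, 0] = sc[1, 0, 0] = 0.3  # cell 0 has barcodes 0 and 1; cell 1 is empty
barcodes_per_cell, efficiency = barcode_detection(sc)
```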

matrixOperations.HIMmatrixOperations.get_multi_contact(mat, anchor, bait1, bait2, threshold)#

Input:

  • mat – pwd matrix, including only the bins of used rts

  • anchor – anchor bin

  • bait1 – bait bin #1

  • bait2 – bait bin #2

  • threshold – contact threshold

Output:

  • n_contacts – number of contacts between bins anchor, bait1, and bait2

  • n_non_nan – number of cells where the distances anchor-bait1 and anchor-bait2 are present
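A minimal sketch of this counting, assuming mat is stacked as n_bins x n_bins x n_cells and that a contact means a distance strictly below the threshold. Both points, and the name count_three_way_contacts, are assumptions of this example.

```python
import numpy as np


def count_three_way_contacts(mat, anchor, bait1, bait2, threshold):
    """Return (n_contacts, n_non_nan) for one anchor and two baits."""
    d1 = mat[anchor, bait1, :]
    d2 = mat[anchor, bait2, :]
    # Cells where both anchor-bait distances were measured.
    both_present = ~np.isnan(d1) & ~np.isnan(d2)
    valid1, valid2 = d1[both_present], d2[both_present]
    # A three-way contact needs both distances below the threshold.
    n_contacts = int(np.sum((valid1 < threshold) & (valid2 < threshold)))
    return n_contacts, int(both_present.sum())


mat = np.full((3, 3, 4), np.nan)
mat[0, 1, :] = [0.1, 0.1, np.nan, 0.5]  # anchor-bait1 distance per cell
mat[0, 2, :] = [0.2, 0.4, 0.1, 0.1]     # anchor-bait2 distance per cell
n_contacts, n_non_nan = count_three_way_contacts(mat, 0, 1, 2, threshold=0.25)
```

Dividing n_contacts by n_non_nan would give a frequency normalized by the cells in which the contact could have been observed.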

matrixOperations.HIMmatrixOperations.get_rg_from_pwd(pwd_matrix_0, min_number_pwd=4, threshold=6)#

Calculates the Rg from a 2D pairwise distance matrix while taking into account that some of the PWD might be NaN

  • PWDmatrix – numpy array, NxN

  • minFracNotNaN – require a minimal fraction of PWDs to be not NaN, return NaN otherwise

for the math, see https://en.wikipedia.org/wiki/Radius_of_gyration#Molecular_applications
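For N points of equal weight, the relation in the linked article reduces to Rg² = (1/N²) Σ_{i<j} d_ij². The sketch below applies it to a PWD matrix with missing entries by substituting the mean of the observed squared distances for the missing pairs. That substitution, the name radius_of_gyration, and treating min_number_pwd as a minimum count of observed pairs are assumptions of this example, not a statement of what the module does.

```python
import numpy as np


def radius_of_gyration(pwd_matrix, min_number_pwd=4):
    """Estimate Rg from an N x N PWD matrix that may contain NaNs."""
    pwd = np.asarray(pwd_matrix, dtype=float)
    n = pwd.shape[0]
    # Take each pair once: upper triangle without the diagonal.
    pairs = pwd[np.triu_indices(n, k=1)]
    pairs = pairs[~np.isnan(pairs)]
    if pairs.size < min_number_pwd:
        return np.nan
    # Rg^2 = (1 / N^2) * sum_{i<j} d_ij^2, with missing pairs replaced by
    # the mean of the observed squared distances.
    n_pairs = n * (n - 1) / 2
    return float(np.sqrt(np.mean(pairs ** 2) * n_pairs / n ** 2))


square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
pwd = np.linalg.norm(square[:, None] - square[None, :], axis=-1)
rg_full = radius_of_gyration(pwd)

pwd_sparse = np.full((4, 4), np.nan)
pwd_sparse[0, 1] = pwd_sparse[1, 0] = 1.0
rg_sparse = radius_of_gyration(pwd_sparse)  # too few PWDs
```

For the unit square every corner lies at distance √0.5 from the centroid, so rg_full matches that value, while the sparse matrix yields NaN.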

matrixOperations.HIMmatrixOperations.is_notebook()#

This function detects if you are running on an ipython console or in the shell. It is used to either kill plots or leave them open.

Returns

True if running in Jupyter or in IPython consoles within Spyder, False otherwise (terminal).

Return type

Boolean

matrixOperations.HIMmatrixOperations.kde_fit(x, x_d, bandwidth=0.2, kernel='gaussian')#
matrixOperations.HIMmatrixOperations.list_sc_to_keep(p, mask)#
matrixOperations.HIMmatrixOperations.load_list(file_name)#
matrixOperations.HIMmatrixOperations.load_sc_data(list_data, dataset_name, p)#

loads SC datasets from a dict of folders (list_data)

Parameters
  • list_data (dict) – dict of folders with data that can be loaded.

  • dataset2Load (int, optional) – The item in the dictionary that will be loaded. The default is 3.

Returns

  • sc_matrix_collated (list of np arrays n_barcodes x n_barcodes x n_cells) – Cumulative SC PWD matrix.

  • unique_barcodes (list of np arrays) – containing the barcode identities for each matrix.

  • build_pwd_matrix_collated (list of Tables) – Tables with all the data for cells and barcodes used to produce sc_matrix_collated.

matrixOperations.HIMmatrixOperations.load_sc_data_matlab(list_data, dataset_name, p)#
matrixOperations.HIMmatrixOperations.normalize_matrix(sc_matrix_wt)#
matrixOperations.HIMmatrixOperations.normalize_profile(profile1, profile2, run_parameters)#
matrixOperations.HIMmatrixOperations.plot_1d_profile2datasets(ifigure, him_data_1, him_data_2, run_parameters, anchor, i_fig_label, yticks, xticks, legend=False)#
matrixOperations.HIMmatrixOperations.plot_distance_histograms(sc_matrix_collated, pixel_size, output_filename='test', log_name_md='log.md', mode='hist', limit_n_plots=10, kernel_width=0.25, optimize_kernel_width=False, max_distance=4.0)#
matrixOperations.HIMmatrixOperations.plot_ensemble_3_way_contact_matrix(sc_matrix_collated, unique_barcodes, anchors, s_out, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
matrixOperations.HIMmatrixOperations.plot_ensemble_contact_probability_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
matrixOperations.HIMmatrixOperations.plot_inverse_pwd_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename, dataset_name='')#
matrixOperations.HIMmatrixOperations.plot_matrix(sc_matrix_collated, unique_barcodes, pixel_size, number_rois=1, output_filename='test', log_name_md='log.md', clim=1.4, c_m='seismic', figtitle='PWD matrix', cmtitle='distance, um', n_cells=0, mode='median', inverse_matrix=False, c_min=0, cells_to_plot=None, filename_ending='_HiMmatrix.png', font_size=22)#
matrixOperations.HIMmatrixOperations.plot_single_contact_probability_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
matrixOperations.HIMmatrixOperations.plot_single_pwd_matrice(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
matrixOperations.HIMmatrixOperations.retrieve_kernel_density_estimator(distance_distribution_0, x_d, optimize_kernel_width=False, kernel_width=0.25)#

Gets the kernel density function and maximum from a distribution of PWD distances

Parameters
  • distance_distribution_0 (nd array) – List of PWD distances.

  • x_d (nd array) – x grid.

  • optimize_kernel_width (Boolean, optional) – whether to optimize bandwidth. The default is False.

Returns

  • np array – kde distribution.

  • np array – Original distribution without NaNs

matrixOperations.HIMmatrixOperations.shuffle_matrix(matrix, index)#
matrixOperations.HIMmatrixOperations.sort_cells_by_number_pwd(him_data)#
matrixOperations.HIMmatrixOperations.write_xyz_2_pdb(file_name, single_trace, barcode_type={})#

matrixOperations.build_matrix module#

Created on Fri Feb 11 08:40:30 2022

@author: marcnol

This script:
  • iterates over chromatin traces
    • calculates the pair-wise distances for each single-cell mask

    • outputs are:
      • Table with #cell #PWD #coordinates (e.g. buildsPWDmatrix_3D_order:0_ROI:1.ecsv)

      • NPY array with single-cell PWD matrices (e.g. buildsPWDmatrix_3D_HiMscMatrix.npy)

      • NPY array with barcode identities (e.g. buildsPWDmatrix_3D_uniqueBarcodes.ecsv)

      • the files with no “3D” tag contain data analyzed using 2D localizations.

  • Single-cell results are combined together to calculate:
    • Distribution of pairwise distance for each barcode combination

    • Ensemble mean pairwise distance matrix using mean of distribution

    • Ensemble mean pairwise distance matrix using Kernel density estimation

    • Ensemble Hi-M matrix using a predefined threshold

    • For each of these files, an image is saved in PNG format. Images containing “3D” in their names are for 3D data; the others are for 2D data.
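The thresholding step that turns single-cell distances into an ensemble contact matrix can be sketched as below. The two normalizations mirror the norm='n_cells' and norm='nonNANs' values that appear in the function signatures on this page, but their exact behaviour in the package is assumed here, as is the name contact_probability.

```python
import numpy as np


def contact_probability(sc_matrix, pixel_size, threshold=0.25, norm="n_cells"):
    """Fraction of cells in which each barcode pair is closer than threshold."""
    # Convert pixel units to physical units before thresholding.
    distances = np.asarray(sc_matrix, dtype=float) * pixel_size
    present = ~np.isnan(distances)
    # Missing distances are mapped to +inf so they never count as contacts.
    contacts = (np.nan_to_num(distances, nan=np.inf) < threshold).sum(axis=2)
    if norm == "n_cells":
        return contacts / distances.shape[2]
    # Otherwise normalize by the number of cells where the pair was measured.
    n_observed = present.sum(axis=2)
    probability = np.full(contacts.shape, np.nan)
    np.divide(contacts, n_observed, out=probability, where=n_observed > 0)
    return probability


sc = np.full((2, 2, 4), np.nan)
sc[0, 1, :] = sc[1, 0, :] = [1.0, 2.0, 5.0, np.nan]  # distances in pixels
p_cells = contact_probability(sc, pixel_size=0.1, norm="n_cells")
p_observed = contact_probability(sc, pixel_size=0.1, norm="nonNANs")
```

With a 0.1 um pixel, two of the four cells fall below 0.25 um, giving 0.5 when dividing by all cells and 2/3 when dividing by the three cells where the pair was measured.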

class matrixOperations.build_matrix.BuildMatrix(param, acq_params: core.parameters.AcquisitionParams, colormaps={})#

Bases: object

build_distance_matrix(mode='min', distance_threshold=inf)#

Builds pairwise distance matrix from a coordinates table

Parameters

mode (string, optional) – How distances are combined when several combinations are possible. “mean”: calculates the mean distance. “min”: calculates the minimum distance. “last”: keeps the last distance calculated. The default in the signature is “min”.

Returns

  • self.sc_matrix the single-cell PWD matrix

  • self.unique_barcodes list of unique barcodes
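The three modes matter only when a barcode appears more than once in a trace, so that a barcode pair has several candidate distances. The following sketch shows one way to resolve them. The function name combine_distances and the plain-list inputs are hypothetical and stand in for the coordinates table.

```python
import itertools

import numpy as np


def combine_distances(barcodes, coordinates, mode="min"):
    """PWD matrix over unique barcodes; repeated barcodes resolved by mode."""
    unique = sorted(set(barcodes))
    index = {barcode: i for i, barcode in enumerate(unique)}
    candidates = {}
    spots = zip(barcodes, coordinates)
    for (b1, r1), (b2, r2) in itertools.combinations(spots, 2):
        if b1 == b2:
            continue  # no distance between copies of the same barcode
        key = tuple(sorted((index[b1], index[b2])))
        distance = float(np.linalg.norm(np.subtract(r1, r2)))
        candidates.setdefault(key, []).append(distance)
    reducers = {"mean": np.mean, "min": np.min, "last": lambda d: d[-1]}
    matrix = np.full((len(unique), len(unique)), np.nan)
    for (i, j), distances in candidates.items():
        matrix[i, j] = matrix[j, i] = reducers[mode](distances)
    return matrix, unique


barcodes = [1, 2, 2]  # barcode 2 was detected twice
coordinates = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
m_min, unique = combine_distances(barcodes, coordinates, mode="min")
m_mean, _ = combine_distances(barcodes, coordinates, mode="mean")
m_last, _ = combine_distances(barcodes, coordinates, mode="last")
```

The pair (1, 2) has candidate distances 1 and 3, so the three modes return 1, 2 and 3 respectively.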

calculate_n_matrix()#
calculate_pwd_single_mask(x, y, z)#
Calculates PWD between barcodes detected in a given mask. For this:
  • converts xyz pixel coordinates into nm using self.pixel_size dictionary

  • calculates pair-wise distance matrix in nm

  • converts it into pixel units using self.pixel_size[‘x’] as an isotropic pixelsize.

Parameters
  • r1 (list of floats with xyz coordinates for spot 1 in microns) –

  • r2 (list of floats with xyz coordinates for spot 2 in microns) –

Return type

Returns pairwise distance matrix between barcodes in microns
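The three steps listed above (scale each axis by its pixel size, compute distances, rescale by the x pixel size) can be sketched as follows. The pixel_size values and the name pwd_single_mask are illustrative assumptions; the sketch is agnostic about whether the physical unit is nm or um.

```python
import numpy as np


def pwd_single_mask(x, y, z, pixel_size):
    """PWD matrix in xy-pixel units from pixel coordinates with anisotropic voxels."""
    # Step 1: convert each axis to physical units with its own pixel size.
    coords = np.column_stack(
        [
            np.asarray(x, dtype=float) * pixel_size["x"],
            np.asarray(y, dtype=float) * pixel_size["y"],
            np.asarray(z, dtype=float) * pixel_size["z"],
        ]
    )
    # Step 2: pairwise Euclidean distances in physical units.
    diff = coords[:, None, :] - coords[None, :, :]
    pwd_physical = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 3: express distances in pixels, using the x pixel size as isotropic.
    return pwd_physical / pixel_size["x"]


pixel_size = {"x": 0.1, "y": 0.1, "z": 0.25}  # hypothetical values
pwd = pwd_single_mask(x=[0, 3, 0], y=[0, 4, 0], z=[0, 0, 4], pixel_size=pixel_size)
```

Scaling before computing distances is what keeps a 4-pixel step in z (1.0 physical units here) from being treated like a 4-pixel step in x or y.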

initialize_parameters(acq_params: core.parameters.AcquisitionParams)#
launch_analysis(file, distance_threshold=inf)#

run analysis for a chromatin trace table.

Return type

None.

plots_all_matrices(file)#

Plots all matrices after analysis

Parameters

file (str) – trace file name used to derive output filenames.

Return type

None.

run(data_path, matrix_params)#
save_matrices(file)#
class matrixOperations.build_matrix.BuildMatrixTempo(params: core.parameters.MatrixParams)#

Bases: imageProcessing.makeProjections.Feature

matrixOperations.build_traces module#

This script will build chromatin traces using a segmentObjects_barcode table.

The methods that will be implemented are:

  • 1 = assignment by mask (either DAPI mask or other)

  • 2 = spatial clustering using a KDTree. This method is mask-free.

Method 1 iterates over rois:

  • assigns barcode localizations to masks

  • applies local drift correction, if available

  • removes localizations using flux and driftTolerance

  • calculates the pair-wise distances for each single-cell mask

Outputs are:

  • Table with #cell #PWD #coordinates (e.g. buildsPWDmatrix_3D_order:0_ROI:1.ecsv)

  • NPY array with single-cell PWD matrices (e.g. buildsPWDmatrix_3D_HiMscMatrix.npy)

  • NPY array with barcode identities (e.g. buildsPWDmatrix_3D_uniqueBarcodes.ecsv)

  • the files with no “3D” tag contain data analyzed using 2D localizations.

Single-cell results are combined together to calculate:

  • Distribution of pairwise distance for each barcode combination

  • Ensemble mean pairwise distance matrix using mean of distribution

  • Ensemble mean pairwise distance matrix using Kernel density estimation

  • Ensemble Hi-M matrix using a predefined threshold

  • For each of these files, an image is saved in PNG format. Images containing “3D” in their names are for 3D data; the others are for 2D data.

class matrixOperations.build_traces.BuildTraces(param, acq_params: core.parameters.AcquisitionParams)#

Bases: object

align_by_masking(matrix_params: core.parameters.MatrixParams)#

Assigns barcodes to masks, creating <n_barcodes_in_mask> and filling in the “Cell #” key of barcode_map_roi. This routine only selects which barcodes go to each cell mask.

Returns

  • self.barcodes_in_mask # dictionary with the identities of barcodes contained in each mask. – Keys: ‘maskID_1’, ‘maskID_2’, and so on

  • self.n_barcodes_in_mask # vector containing the number of barcodes for each mask

  • self.n_cells_assigned # number of cells assigned

  • self.n_cells_unassigned # number of cells unassigned

assign_masks(output_filename, barcode_map, data_path, seg_params, acq_params: core.parameters.AcquisitionParams, matrix_params: core.parameters.MatrixParams)#
Main function that:

  • loads and processes barcode localization files, local alignment file, and masks

  • initializes <cell_roi> class and assigns barcode localizations to masks

  • then constructs the single cell PWD matrix and outputs it together with the contact map and the N-map.

Parameters
  • output_filename (string) –

  • self.current_param (Parameters Class) –

  • self.current_folder (string) –

  • self.pixel_size (dict, optional) –

    pixel_size = {‘x’: pixelSizeXY, ‘y’: pixelSizeXY, ‘z’: pixel_size_z}

    The default is 0.1 for x and y, 0.0 for z. Pixel size in um.

  • self.log_name_md (str, optional) – Filename of Markdown output. The default is “log.md”.

  • self.ndims (int, optional) – indicates whether barcodes were localized in 2 or 3D. The default is 2.

  • self.mask_identifier

Return type

None.

build_trace_by_clustering(barcode_map, data_path, matrix_params: core.parameters.MatrixParams)#
build_trace_by_masking(barcode_map, data_path, seg_params, matrix_params, acq_params: core.parameters.AcquisitionParams)#
build_vector(x, y, z)#

Builds vector from coordinates

Parameters
  • x (float) – x coordinates

  • y (float) – y coordinates

  • z (float) – z coordinates

Returns

coords – vector with coordinates in nanometers.

Return type

np array

builds_sc_distance_table()#

iterates over all masks, calculates PWD for each mask, assigns them to sc_distance_table

Return type

sc_distance_table

group_localizations_by_coordinate(matrix_params: core.parameters.MatrixParams)#

Uses a KDTree to group detections by their coordinates, given a certain distance threshold. Returns a list of lists. Each list contains the lines of the input data (segmentedObjects_3D_barcode.dat) where the detections are less than a pixel away from each other.

Parameters
  • coordinates (numpy array, float) – Matrix containing the xyz coordinates of barcodes.

  • distance_threshold (float, default 1.0) – Distance threshold in pixels used to detect neighboring barcodes.

Returns

group_list – list of lists containing the coordinates of barcodes associated together.

Return type

list
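A minimal sketch of grouping detections with a KD-tree, using scipy.spatial.KDTree. The function name group_by_coordinate is hypothetical, and returning row indices (rather than the coordinates themselves) is an assumption chosen to match the "lines of the input data" wording above. Note that query_ball_point includes points at exactly the threshold distance.

```python
import numpy as np
from scipy.spatial import KDTree


def group_by_coordinate(coordinates, distance_threshold=1.0):
    """For each detection, list the row indices of detections within the threshold."""
    coordinates = np.asarray(coordinates, dtype=float)
    tree = KDTree(coordinates)
    # One list of neighbour indices per input point (the point itself included).
    neighbours = tree.query_ball_point(coordinates, r=distance_threshold)
    return [sorted(group) for group in neighbours]


xyz = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [10.0, 10.0, 10.0]]
group_list = group_by_coordinate(xyz, distance_threshold=1.0)
```

The first two detections are half a pixel apart and end up in each other's group, while the third stays alone.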

initialize_lists()#
initialize_parameters(acq_params: core.parameters.AcquisitionParams)#
initializes_masks(masks)#
launch_analysis(file, data_path, seg_params, matrix_params: core.parameters.MatrixParams, acq_params: core.parameters.AcquisitionParams)#
load_mask(files_in_folder, data_path, seg_params, acq_params: core.parameters.AcquisitionParams, matrix_params: core.parameters.MatrixParams)#

searches and loads mask files for building chromatin trace

Parameters

files_in_folder (list of str) – list of TIF files to be explored.

Returns

True: mask found and loaded False: failed to find mask file

Return type

bool

run(data_path, seg_params, matrix_params, acq_params: core.parameters.AcquisitionParams)#

Function that assigns barcode localizations to masks and constructs single cell cumulative PWD matrix.

Parameters
  • current_param (class) – Parameters

  • current_log (class) – logging class.

Return type

None.

class matrixOperations.build_traces.BuildTracesTempo(params: core.parameters.MatrixParams)#

Bases: imageProcessing.makeProjections.Feature

matrixOperations.build_traces.binarize_coordinate(x)#
matrixOperations.build_traces.debug_mask_filename(files_in_folder, full_filename_masks, mask_identifier, n_roi, label='')#

matrixOperations.chromatin_trace_table module#

Created on Thu Feb 10 12:33:57 2022

@author: marcnol

trace table management class

class matrixOperations.chromatin_trace_table.ChromatinTraceTable(xyz_unit='micron', genome_assembly='mm10')#

Bases: object

append(table)#

appends <table> to self.data

Parameters

table (astropy table) – table to append to existing self.data table.

Return type

None.

barcode_statistics(trace_table)#

calculates the number of times a barcode is repeated in a trace for all traces in a trace table

Parameters

trace_table (ASTROPY table) – trace table.

Returns

collective_barcode_stats – dict with barcode identities as keys and a list of the number of times it was present in each trace treated.

Return type

dict
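The returned structure can be illustrated with plain Python. In this sketch the trace table is reduced to (trace_id, barcode) pairs and the function name count_barcodes_per_trace is hypothetical; the real method works on an astropy table.

```python
from collections import Counter, defaultdict


def count_barcodes_per_trace(rows):
    """Map each barcode to the list of its counts in the traces where it occurs."""
    per_trace = defaultdict(Counter)
    for trace_id, barcode in rows:
        per_trace[trace_id][barcode] += 1
    stats = defaultdict(list)
    for counts in per_trace.values():
        for barcode, n_times in counts.items():
            stats[barcode].append(n_times)
    return dict(stats)


rows = [("t1", 1), ("t1", 1), ("t1", 2), ("t2", 1)]
collective_barcode_stats = count_barcodes_per_trace(rows)
```

Barcode 1 appears twice in trace t1 and once in t2, so its entry is [2, 1]; values above 1 flag repeated barcodes.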

filter_repeated_barcodes(trace_file='mock')#

This function will remove the barcodes that are present more than once in a trace. All other barcodes are kept.

Return type

updated trace table is kept in self.data
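A sketch of this filter on plain tuples, assuming that every copy of a repeated barcode is dropped from the trace where it repeats (the description does not say whether one copy is kept). The row layout (trace_id, barcode, x) and the name drop_repeated_barcodes are hypothetical.

```python
from collections import Counter


def drop_repeated_barcodes(rows):
    """Keep only rows whose barcode occurs exactly once within its trace."""
    counts = Counter((trace_id, barcode) for trace_id, barcode, *_ in rows)
    return [row for row in rows if counts[(row[0], row[1])] == 1]


rows = [
    ("t1", 1, 0.1),
    ("t1", 1, 0.2),  # barcode 1 is repeated within trace t1
    ("t1", 2, 0.3),
    ("t2", 1, 0.4),  # barcode 1 occurs once in trace t2, so it is kept
]
filtered = drop_repeated_barcodes(rows)
```

Counting per (trace, barcode) pair rather than per barcode ensures that a barcode repeated in one trace is still kept in the others.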

filter_traces_by_coordinate(coor='z', coor_min=0.0, coor_max=inf)#

This function will remove the spots that are outside coordinate limits

Parameters
  • coor (string, optional) – which coordinate to process (‘x’,’y’ or ‘z’). The default is ‘z’.

  • coor_min (float, optional) – minimum value. The default is 0.0.

  • coor_max (float, optional) – maximum value. The default is np.inf.

Return type

updated trace table is kept in self.data

filter_traces_by_n(minimum_number_barcodes=2)#

Removes rows in trace table with fewer than minimum_number_barcodes barcodes

Parameters
  • trace_table (ASTROPY Table) – input trace table.

  • minimum_number_barcodes (TYPE, optional) – minimum number of barcodes in trace. The default is 2.

Returns

trace_table – output trace table.

Return type

ASTROPY Table

initialize()#
load(file)#

Loads chromatin trace table

Parameters

filename_barcode_coordinates (string) – filename with chromatin trace table

Returns

  • chromatin trace table (Table())

  • unique_barcodes (list) – list of unique barcodes read from chromatin trace table

plots_barcode_statistics(collective_barcode_stats, file_name='barcode_stats', kind='violin', norm=True)#

Plots the collective barcode statistics (see barcode_statistics).

Parameters
  • collective_barcode_stats (dict) – dict with barcode identities as keys and a list of the number of times it was present in each trace treated.

  • file_name (str, optional) – output filename for saving figure. The default is ‘barcode_stats.png’.

  • kind (str, optional) – Options for plotting styles: ‘violin’ or ‘matrix’. The default is ‘violin’.

Return type

None.

plots_traces(filename_list, masks=array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]]), pixel_size=None)#

This function plots 3 subplots (xy, xz, yz) with the localizations. One figure is produced per ROI.

Parameters

filename_list (list) – filename

remove_barcode(remove_barcode=None)#

Removes a specific barcode from a trace table

Returns

trace_table – output trace table.

Return type

ASTROPY Table

remove_duplicates()#

removes duplicated (identical) barcodes

Returns

trace_table – output trace table.

Return type

ASTROPY Table

save(file_name, table, comments='')#

Saves output table

Parameters
  • file_name (string) – filename of table.

  • table (astropy Table) – Table to be written to file.

  • comments (list of strings, optional) – Will output as comments to the header. The default is [].

Return type

None.

trace_keep_label(label='')#

This function will remove traces that do not contain the word ‘label’ in the ‘label’ column

Parameters

label (TYPE, string) – the label to keep. The default is “”.

Return type

None.

trace_remove_label(label='')#

This function will remove traces that contain the word ‘label’ in the ‘label’ column

Parameters

label (TYPE, string) – the label to remove. The default is “”.

Return type

None.

matrixOperations.filter_localizations module#

Created on Mon Feb 7 16:45:44 2022

@author: marcnol

class matrixOperations.filter_localizations.FilterLocalizations(param)#

Bases: object

filter_barcode_table(barcode_map)#

iterates over rows of a barcode localization table and filters unwanted rows

Parameters

barcode_map_roi (TYPE) – DESCRIPTION.

Return type

None.

filter_folder(data_path, seg_params, matrix_params: core.parameters.MatrixParams)#

Function that filters barcodes using a number of user-provided parameters

Return type

None.

filter_localizations_quality(barcode_map, i)#

Filters barcode localizations either by brightness or 3D localization accuracy.

Parameters

i (int) – index in barcode_map Table

Returns

keep – True if the test is passed.

Return type

Boolean

setup_filter_values(matrix_params: core.parameters.MatrixParams)#
Returns

  • self.block_size (int) – size of blocks used for blockAlignment.

  • self.flux_min (float) – Minimum flux to keep barcode localization

class matrixOperations.filter_localizations.FilterLocalizationsTempo(params: core.parameters.MatrixParams)#

Bases: imageProcessing.makeProjections.Feature

matrixOperations.filter_localizations.get_file_table_new_name(file)#

matrixOperations.register_localizations module#

Created on Tue Feb 8 15:00:35 2022

@author: marcnol

This class will handle correction of barcode positions from a table of local alignments

Remember that global alignments have already been corrected.

class matrixOperations.register_localizations.RegisterLocalizations(param, matrix_params: core.parameters.MatrixParams)#

Bases: object

build_local_alignment_dict()#

Builds dictionary of local corrections for each ROI, barcode cycle, and block combination

Parameters
  • self.alignment_results_table (astropy Table) – alignment_results_table table

  • self.alignment_results_table_read (Boolean) – True when alignment_results_table table was read from disk

Returns

  • exit_code (Boolean)

  • self.dict_error_block_masks (dict)

load_local_alignment(local_shifts_path, reg_params: core.parameters.RegistrationParams)#
register(data_path, local_shifts_path, seg_params, reg_params: core.parameters.RegistrationParams)#

Function that registers barcodes using a local drift correction table produced by register_local

Return type

None.

register_barcode_map_file(file, reg_params: core.parameters.RegistrationParams)#
register_barcodes(barcode_map, reg_params: core.parameters.RegistrationParams)#

This function will take a barcode_map and a Table of 3D alignments to register barcode coordinates

Return type

None.

search_local_shift(roi, barcode, zxy_uncorrected)#
search_local_shift_block_3d(roi, barcode, zxy_uncorrected)#

Searches for local drift for a specific barcode in a given roi. If it exists, it is added to the uncorrected coordinates.

Parameters
  • roi (int) – roi used

  • x_uncorrected (float) – x coordinate.

  • y_uncorrected (float) – y coordinate.

Returns

  • x_corrected (float) – corrected x coordinate.

  • y_corrected (float) – corrected y coordinate.

class matrixOperations.register_localizations.RegisterLocalizationsTempo(params: core.parameters.MatrixParams)#

Bases: imageProcessing.makeProjections.Feature

Module contents#