matrixOperations package#
Submodules#
matrixOperations.HIMmatrixOperations module#
Created on Tue Jun 2 21:21:02 2020
@author: marcnol
contains functions and classes needed for the analysis and plotting of HiM matrices
- class matrixOperations.HIMmatrixOperations.AnalysisHiMMatrix(run_parameters, root_folder='.')#
Bases:
object
This class is used for loading data processed by processHiMmatrix.py. Its main use is to produce paper-quality figures of HiM matrices, 3-way interaction matrices and HiM matrix ratios.
- load_data()#
loads dataset
- Returns
self.foldes2Load contains the parameters used for the processing of HiM matrices.
self.data_files dictionary containing the extensions needed to load data files
self.data dictionary containing the datasets loaded
- n_cells_loaded()#
- plot_1d_profile1dataset(ifigure, anchor, i_fig_label, yticks, xticks)#
- plot_2d_matrix_simple(ifigure, matrix, unique_barcodes, yticks, xticks, cmtitle='probability', c_min=0, c_max=1, c_m='coolwarm', fontsize=12, colorbar=False, axis_ticks=False, n_cells=0, n_datasets=0, show_title=False, fig_title='')#
- retrieve_sc_matrix()#
retrieves single cells that have the label requested
- Return type
self.sc_matrix_selected
- matrixOperations.HIMmatrixOperations.attributes_labels2cells(snd_table, results_table, label='doc')#
- matrixOperations.HIMmatrixOperations.calculate_3_way_contact_matrix(i_sc_matrix_collated, i_unique_barcodes, pixel_size, anchor, s_out, threshold=0.25, norm='nonNANs')#
- matrixOperations.HIMmatrixOperations.calculate_contact_probability_matrix(i_sc_matrix_collated, i_unique_barcodes, pixel_size, threshold=0.25, norm='n_cells', min_number_contacts=0)#
- matrixOperations.HIMmatrixOperations.calculate_ensemble_pwd_matrix(sc_matrix, pixel_size, cells_to_plot, mode='median')#
performs a KDE or median to calculate the max of the PWD distribution
- Parameters
sc_matrix (TYPE) – DESCRIPTION.
pixel_size (TYPE) – DESCRIPTION.
- Return type
matrix = 2D npy array.
- matrixOperations.HIMmatrixOperations.comp_func(mat_a, n_w)#
- matrixOperations.HIMmatrixOperations.coord_2_distances(coordinates)#
Derive distance matrix from given set of coordinates
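As an illustration of the idea (not the library's own code), a pairwise distance matrix can be derived from an (N, 3) coordinate array with NumPy broadcasting. The function name pairwise_distances and the example coordinates are hypothetical.

```python
import numpy as np


def pairwise_distances(coordinates):
    """Return the N x N Euclidean distance matrix for an (N, 3) array."""
    coords = np.asarray(coordinates, dtype=float)
    # Broadcast to an (N, N, 3) array of coordinate differences
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical coordinates for three spots
xyz = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
pwd = pairwise_distances(xyz)
```

The result is symmetric with a zero diagonal, which is the layout the PWD matrices on this page use.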
- matrixOperations.HIMmatrixOperations.decodes_trace(single_trace)#
from a trace entry, provides Numpy array with coordinates, barcode and trace names
- Parameters
single_trace (astropy table) – astropy table for a single trace.
- Returns
list of barcodes
x, y and z coordinates as numpy arrays,
trace name as string
- matrixOperations.HIMmatrixOperations.distances_2_coordinates(distances)#
Infer coordinates from distances
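The method used by this function is not documented here. One standard way to infer coordinates from a complete distance matrix is classical multidimensional scaling (MDS), sketched below as an assumption rather than as the library's implementation. The name classical_mds and the test points are hypothetical, and the recovered coordinates are only defined up to rotation, reflection and translation.

```python
import numpy as np


def classical_mds(distances, n_dims=3):
    """Recover coordinates (up to a rigid transform) from a distance matrix."""
    d2 = np.asarray(distances, dtype=float) ** 2
    n = d2.shape[0]
    # Double centering turns squared distances into a Gram matrix
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues; clip tiny negative values
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical points used to check that distances are preserved
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
recovered = classical_mds(distances)
recovered_distances = np.linalg.norm(
    recovered[:, None, :] - recovered[None, :, :], axis=-1
)
```

This sketch assumes a matrix without NaN entries; matrices with missing PWDs would need imputation or a different solver.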
- matrixOperations.HIMmatrixOperations.distribution_maximum_kernel_density_estimation(sc_matrix_collated, bin1, bin2, pixel_size, optimize_kernel_width=False, kernel_width=0.25, max_distance=4.0)#
calculates the kernel distribution and its maximum from a set of PWD distances
- Parameters
sc_matrix_collated (np array 3 dims) – SC PWD matrix.
bin1 (int) – first bin.
bin2 (int) – second bin.
pixel_size (float) – pixel size in um
optimize_kernel_width (Boolean, optional) – whether the kernel width needs to be optimized. The default is False.
- Returns
float – maximum of kernel.
np array – list of PWD distances used.
np array – kernel distribution.
x_d (np array) – x grid.
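A minimal sketch of the idea, assuming a Gaussian kernel with a fixed width and distances already expressed in um. It uses plain NumPy rather than whichever KDE backend the library relies on, and the name kde_maximum and the sample values are hypothetical.

```python
import numpy as np


def kde_maximum(distances, x_grid, bandwidth=0.25):
    """Return (mode, distances without NaNs, density on x_grid)."""
    sample = np.asarray(distances, dtype=float)
    sample = sample[~np.isnan(sample)]  # NaN marks a missing PWD
    # Sum one Gaussian per observation, evaluated on the grid
    z = (x_grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / norm
    return x_grid[np.argmax(density)], sample, density


# Hypothetical PWDs (um) for one pair of bins across six cells
x_d = np.linspace(0.0, 4.0, 401)
pwd_um = np.array([0.30, 0.32, 0.28, np.nan, 0.31, 1.50])
mode, kept, density = kde_maximum(pwd_um, x_d)
```

Taking the maximum of the density rather than the mean makes the estimate less sensitive to outliers such as the 1.50 um value in this example.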
- matrixOperations.HIMmatrixOperations.find_optimal_kernel_width(distance_distribution)#
- matrixOperations.HIMmatrixOperations.fuses_sc_matrix_collated_from_datasets(sc_matrix_collated, unique_barcodes, p, run_name, i_list_data)#
- matrixOperations.HIMmatrixOperations.get_barcodes_per_cell(sc_matrix_collated)#
Returns the number of barcodes that were detected in each cell of sc_matrix_collated.
- matrixOperations.HIMmatrixOperations.get_coordinates_from_pwd_matrix(matrix)#
- matrixOperations.HIMmatrixOperations.get_detection_eff_barcodes(sc_matrix_collated)#
Return the detection efficiency of all barcodes. Assumes a barcode is detected as soon as one PWD with this barcode is detected.
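The rule stated above can be sketched as follows, assuming a matrix of shape (n_barcodes, n_barcodes, n_cells) in which NaN marks a missing PWD and the diagonal carries no information. The function name detection_efficiency and the toy data are hypothetical.

```python
import numpy as np


def detection_efficiency(sc_matrix):
    """Fraction of cells in which each barcode has at least one PWD."""
    sc = np.asarray(sc_matrix, dtype=float)
    n_barcodes = sc.shape[0]
    valid = ~np.isnan(sc)
    # Ignore the diagonal: a barcode's distance to itself is not a detection
    off_diagonal = ~np.eye(n_barcodes, dtype=bool)
    valid &= off_diagonal[:, :, None]
    detected = valid.any(axis=1)  # shape (n_barcodes, n_cells)
    return detected.mean(axis=1)


# Two hypothetical cells with three barcodes
sc = np.full((3, 3, 2), np.nan)
sc[0, 1, 0] = sc[1, 0, 0] = 0.2  # cell 0: only barcodes 0 and 1 seen
sc[:, :, 1] = [[0.0, 0.3, 0.5], [0.3, 0.0, 0.4], [0.5, 0.4, 0.0]]
efficiency = detection_efficiency(sc)
```

Summing the same `detected` mask over the barcode axis instead would give the number of barcodes detected per cell.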
- matrixOperations.HIMmatrixOperations.get_multi_contact(mat, anchor, bait1, bait2, threshold)#
Input:
mat : pwd matrix, including only the bins of used rts
anchor : anchor bin
bait1 : bait bin #1
bait2 : bait bin #2
threshold : contact threshold
Output:
n_contacts : number of contacts between bins anchor, bait1, and bait2
n_non_nan : number of cells where the distances anchor-bait1 and anchor-bait2 are present
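A sketch of the counting described above, assuming mat has shape (n_bins, n_bins, n_cells) and that a three-way contact means both anchor-bait distances are strictly below the threshold in the same cell. The name count_three_way_contacts and the toy data are hypothetical.

```python
import numpy as np


def count_three_way_contacts(mat, anchor, bait1, bait2, threshold):
    """Return (n_contacts, n_non_nan) for one anchor and two baits."""
    d1 = mat[anchor, bait1, :]
    d2 = mat[anchor, bait2, :]
    # Only cells where both distances were measured are informative
    both_present = ~np.isnan(d1) & ~np.isnan(d2)
    with np.errstate(invalid="ignore"):
        in_contact = both_present & (d1 < threshold) & (d2 < threshold)
    return int(in_contact.sum()), int(both_present.sum())


# Hypothetical data: three bins, four cells
mat = np.full((3, 3, 4), np.nan)
mat[0, 1, :] = [0.1, 0.2, np.nan, 0.5]
mat[0, 2, :] = [0.2, 0.4, 0.1, 0.1]
n_contacts, n_non_nan = count_three_way_contacts(mat, 0, 1, 2, threshold=0.25)
```

The ratio n_contacts / n_non_nan is one way to turn the two counts into a frequency, which appears to be what the norm='nonNANs' option of calculate_3_way_contact_matrix refers to.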
- matrixOperations.HIMmatrixOperations.get_rg_from_pwd(pwd_matrix_0, min_number_pwd=4, threshold=6)#
Calculates the Rg from a 2D pairwise distance matrix while taking into account that some of the PWD might be NaN
PWDmatrix: numpy array, NxN
minFracNotNaN: require a minimal fraction of PWDs to be not NaN, return NaN otherwise
for the math, see https://en.wikipedia.org/wiki/Radius_of_gyration#Molecular_applications
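The linked formula gives Rg^2 = 1/(2 N^2) * sum_ij d_ij^2. A minimal sketch that tolerates missing distances is shown below. It assumes that threshold is an upper bound above which distances are discarded and that the normalization counts only the entries that remain; both are assumptions, not confirmed behavior. The name radius_of_gyration is hypothetical.

```python
import numpy as np


def radius_of_gyration(pwd_matrix, min_number_pwd=4, threshold=6.0):
    """Rg from an N x N PWD matrix; NaN if too few PWDs are available."""
    pwd = np.array(pwd_matrix, dtype=float)  # work on a copy
    n = pwd.shape[0]
    np.fill_diagonal(pwd, np.nan)
    with np.errstate(invalid="ignore"):
        pwd[pwd > threshold] = np.nan  # assumed meaning of threshold
    n_valid = np.count_nonzero(~np.isnan(pwd))  # counts (i, j) and (j, i)
    if n_valid / 2 < min_number_pwd:
        return np.nan
    sum_sq = np.nansum(pwd ** 2)
    # n_valid + n replaces N^2: valid off-diagonal entries plus the diagonal
    return float(np.sqrt(sum_sq / (2.0 * (n_valid + n))))


# Four points on the corners of a square of side 2
corners = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
pwd = np.linalg.norm(corners[:, None, :] - corners[None, :, :], axis=-1)
rg = radius_of_gyration(pwd)
rg_empty = radius_of_gyration(np.full((4, 4), np.nan))
```

For the square, every corner lies sqrt(2) from the centroid, so the sketch reduces to the textbook value when no distance is missing.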
- matrixOperations.HIMmatrixOperations.is_notebook()#
This function detects if you are running on an ipython console or in the shell. It is used to either kill plots or leave them open.
- Returns
true if running in Jupyter or Ipython consoles within spyder. false otherwise (terminal)
- Return type
Boolean
- matrixOperations.HIMmatrixOperations.kde_fit(x, x_d, bandwidth=0.2, kernel='gaussian')#
- matrixOperations.HIMmatrixOperations.list_sc_to_keep(p, mask)#
- matrixOperations.HIMmatrixOperations.load_list(file_name)#
- matrixOperations.HIMmatrixOperations.load_sc_data(list_data, dataset_name, p)#
loads SC datasets from a dict of folders (list_data)
- Parameters
list_data (dict) – dict of folders with data that can be loaded.
dataset2Load (int, optional) – The item in the dictionary that will be loaded. The default is 3.
- Returns
sc_matrix_collated (list of np arrays n_barcodes x n_barcodes x n_cells) – Cumulative SC PWD matrix.
unique_barcodes (list of np arrays) – containing the barcode identities for each matrix.
build_pwd_matrix_collated (list of Tables) – Tables with all the data for cells and barcodes used to produce sc_matrix_collated.
- matrixOperations.HIMmatrixOperations.load_sc_data_matlab(list_data, dataset_name, p)#
- matrixOperations.HIMmatrixOperations.normalize_matrix(sc_matrix_wt)#
- matrixOperations.HIMmatrixOperations.normalize_profile(profile1, profile2, run_parameters)#
- matrixOperations.HIMmatrixOperations.plot_1d_profile2datasets(ifigure, him_data_1, him_data_2, run_parameters, anchor, i_fig_label, yticks, xticks, legend=False)#
- matrixOperations.HIMmatrixOperations.plot_distance_histograms(sc_matrix_collated, pixel_size, output_filename='test', log_name_md='log.md', mode='hist', limit_n_plots=10, kernel_width=0.25, optimize_kernel_width=False, max_distance=4.0)#
- matrixOperations.HIMmatrixOperations.plot_ensemble_3_way_contact_matrix(sc_matrix_collated, unique_barcodes, anchors, s_out, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
- matrixOperations.HIMmatrixOperations.plot_ensemble_contact_probability_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
- matrixOperations.HIMmatrixOperations.plot_inverse_pwd_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename, dataset_name='')#
- matrixOperations.HIMmatrixOperations.plot_matrix(sc_matrix_collated, unique_barcodes, pixel_size, number_rois=1, output_filename='test', log_name_md='log.md', clim=1.4, c_m='seismic', figtitle='PWD matrix', cmtitle='distance, um', n_cells=0, mode='median', inverse_matrix=False, c_min=0, cells_to_plot=None, filename_ending='_HiMmatrix.png', font_size=22)#
- matrixOperations.HIMmatrixOperations.plot_single_contact_probability_matrix(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
- matrixOperations.HIMmatrixOperations.plot_single_pwd_matrice(sc_matrix_collated, unique_barcodes, run_name, i_list_data, p, markdown_filename='tmp.md', dataset_name='')#
- matrixOperations.HIMmatrixOperations.retrieve_kernel_density_estimator(distance_distribution_0, x_d, optimize_kernel_width=False, kernel_width=0.25)#
Gets the kernel density function and maximum from a distribution of PWD distances
- Parameters
distance_distribution_0 (nd array) – List of PWD distances.
x_d (nd array) – x grid.
optimize_kernel_width (Boolean, optional) – whether to optimize bandwidth. The default is False.
- Returns
np array – kde distribution.
np array – Original distribution without NaNs
- matrixOperations.HIMmatrixOperations.shuffle_matrix(matrix, index)#
- matrixOperations.HIMmatrixOperations.sort_cells_by_number_pwd(him_data)#
- matrixOperations.HIMmatrixOperations.write_xyz_2_pdb(file_name, single_trace, barcode_type={})#
matrixOperations.build_matrix module#
Created on Fri Feb 11 08:40:30 2022
@author: marcnol
- This script:
- iterates over chromatin traces
calculates the pair-wise distances for each single-cell mask
- outputs are:
Table with #cell #PWD #coordinates (e.g. buildsPWDmatrix_3D_order:0_ROI:1.ecsv)
NPY array with single-cell PWD matrices (e.g. buildsPWDmatrix_3D_HiMscMatrix.npy)
NPY array with barcode identities (e.g. buildsPWDmatrix_3D_uniqueBarcodes.ecsv)
the files with no “3D” tag contain data analyzed using 2D localizations.
- Single-cell results are combined together to calculate:
Distribution of pairwise distance for each barcode combination
Ensemble mean pairwise distance matrix using mean of distribution
Ensemble mean pairwise distance matrix using Kernel density estimation
Ensemble Hi-M matrix using a predefined threshold
For each of these files, an image in PNG format is also saved. Images whose names contain “3D” are for 3D data; the others are for 2D.
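How single-cell matrices are combined into ensemble matrices can be sketched as follows. This assumes a stack of shape (n_barcodes, n_barcodes, n_cells) with distances in um and NaN for missing values, and normalizes contacts by the number of cells in which the pair was measured. The function name ensemble_matrices and the toy data are hypothetical.

```python
import numpy as np


def ensemble_matrices(sc_matrix, threshold=0.25):
    """Return (contact probability matrix, median PWD matrix)."""
    sc = np.asarray(sc_matrix, dtype=float)
    valid = ~np.isnan(sc)
    with np.errstate(invalid="ignore", divide="ignore"):
        # A contact is a measured distance below the threshold
        contacts = valid & (sc < threshold)
        contact_probability = contacts.sum(axis=2) / valid.sum(axis=2)
    # Median over cells, ignoring missing distances
    median_pwd = np.nanmedian(sc, axis=2)
    return contact_probability, median_pwd


# Two barcodes observed in four hypothetical cells
sc = np.zeros((2, 2, 4))
sc[0, 1, :] = sc[1, 0, :] = [0.1, 0.2, np.nan, 0.5]
prob, median = ensemble_matrices(sc)
```

Here two of the three measured distances fall below 0.25 um, so the off-diagonal contact probability is 2/3.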
- class matrixOperations.build_matrix.BuildMatrix(param, acq_params: core.parameters.AcquisitionParams, colormaps={})#
Bases:
object
- build_distance_matrix(mode='min', distance_threshold=inf)#
Builds pairwise distance matrix from a coordinates table
- Parameters
mode (string, optional) – how to resolve cases where several distance combinations are possible. “mean”: calculates the mean distance. “min”: calculates the minimum distance. “last”: keeps the last distance calculated. The signature default is “min”.
- Returns
self.sc_matrix the single-cell PWD matrix
self.unique_barcodes list of unique barcodes
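The three modes matter when a barcode appears more than once in a trace, because a pair of barcodes then has several candidate distances. The sketch below illustrates the reduction for a single trace; the function name trace_distance_matrix, its arguments and the example trace are hypothetical and do not reproduce the class's internals.

```python
import numpy as np


def trace_distance_matrix(barcodes, coords, unique_barcodes, mode="min"):
    """PWD matrix for one trace, reducing duplicate pairs according to mode."""
    coords = np.asarray(coords, dtype=float)
    index = {b: i for i, b in enumerate(unique_barcodes)}
    candidates = {}
    for a in range(len(barcodes)):
        for b in range(a + 1, len(barcodes)):
            i, j = index[barcodes[a]], index[barcodes[b]]
            if i == j:
                continue  # same barcode detected twice: no pair distance
            d = float(np.linalg.norm(coords[a] - coords[b]))
            candidates.setdefault((min(i, j), max(i, j)), []).append(d)
    reducers = {"mean": np.mean, "min": np.min, "last": lambda v: v[-1]}
    n = len(unique_barcodes)
    matrix = np.full((n, n), np.nan)
    for (i, j), values in candidates.items():
        matrix[i, j] = matrix[j, i] = reducers[mode](values)
    return matrix


# Barcode 2 was detected twice in this hypothetical trace
barcodes = [1, 2, 2]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
by_mode = {
    m: trace_distance_matrix(barcodes, coords, [1, 2], mode=m)[0, 1]
    for m in ("min", "mean", "last")
}
```

Stacking one such matrix per trace along a third axis gives the single-cell PWD matrix layout used elsewhere on this page.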
- calculate_n_matrix()#
- calculate_pwd_single_mask(x, y, z)#
- Calculates PWD between barcodes detected in a given mask. For this:
converts xyz pixel coordinates into nm using self.pixel_size dictionary
calculates pair-wise distance matrix in nm
converts it into pixel units using self.pixel_size[‘x’] as an isotropic pixelsize.
- Parameters
r1 (list of floats with xyz coordinates for spot 1 in microns) –
r2 (list of floats with xyz coordinates for spot 2 in microns) –
- Return type
Returns pairwise distance matrix between barcodes in microns
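The three steps listed above can be sketched as follows, assuming that pixel_size holds um per pixel for each axis. The listed steps end in pixel units while the return description mentions microns; this sketch follows the listed steps. The function name pwd_single_mask and the example values are hypothetical.

```python
import numpy as np


def pwd_single_mask(x, y, z, pixel_size):
    """PWD matrix in units of the x pixel size, from pixel coordinates."""
    # Step 1: convert pixel coordinates to nm with a per-axis pixel size
    scale_nm = 1000.0 * np.array(
        [pixel_size["x"], pixel_size["y"], pixel_size["z"]]
    )
    coords_nm = np.column_stack([x, y, z]) * scale_nm
    # Step 2: pairwise distances in nm
    diff = coords_nm[:, None, :] - coords_nm[None, :, :]
    pwd_nm = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 3: back to pixel units, treating the x pixel size as isotropic
    return pwd_nm / (1000.0 * pixel_size["x"])


# Hypothetical anisotropic pixel size (um) and three localizations (pixels)
pixel_size = {"x": 0.1, "y": 0.1, "z": 0.25}
pwd = pwd_single_mask([0, 3, 0], [0, 4, 0], [0, 0, 2], pixel_size)
```

Converting through nm first is what lets the anisotropic z step contribute correctly: two z planes here span the same physical distance as five xy pixels.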
- initialize_parameters(acq_params: core.parameters.AcquisitionParams)#
- launch_analysis(file, distance_threshold=inf)#
run analysis for a chromatin trace table.
- Return type
None.
- plots_all_matrices(file)#
Plots all matrices after analysis
- Parameters
file (str) – trace file name used to derive output filenames.
- Return type
None.
- run(data_path, matrix_params)#
- save_matrices(file)#
- class matrixOperations.build_matrix.BuildMatrixTempo(params: core.parameters.MatrixParams)#
matrixOperations.build_traces module#
This script will build chromatin traces using a segmentObjects_barcode table.
The methods that will be implemented are:
1 = assignment by mask (either DAPI mask or other)
2 = spatial clustering using KDtree. This method is mask-free.
Method 1 iterates over rois:
assigns barcode localizations to masks
applies local drift correction, if available
removes localizations using flux and driftTolerance
calculates the pair-wise distances for each single-cell mask
Outputs are:
Table with #cell #PWD #coordinates (e.g. buildsPWDmatrix_3D_order:0_ROI:1.ecsv)
NPY array with single-cell PWD matrices (e.g. buildsPWDmatrix_3D_HiMscMatrix.npy)
NPY array with barcode identities (e.g. buildsPWDmatrix_3D_uniqueBarcodes.ecsv)
the files with no “3D” tag contain data analyzed using 2D localizations.
Single-cell results are combined together to calculate:
Distribution of pairwise distance for each barcode combination
Ensemble mean pairwise distance matrix using mean of distribution
Ensemble mean pairwise distance matrix using Kernel density estimation
Ensemble Hi-M matrix using a predefined threshold
For each of these files, an image in PNG format is also saved. Images whose names contain “3D” are for 3D data; the others are for 2D.
- class matrixOperations.build_traces.BuildTraces(param, acq_params: core.parameters.AcquisitionParams)#
Bases:
object
- align_by_masking(matrix_params: core.parameters.MatrixParams)#
Assigns barcodes to masks, creates <n_barcodes_in_mask>, and fills in the “Cell #” key of barcode_map_roi. This routine only selects which barcodes go to each cell mask.
- Returns
self.barcodes_in_mask # dictionary with the identities of barcodes contained in each mask. – Keys: ‘maskID_1’, ‘maskID_2’, and so on
self.n_barcodes_in_mask # vector containing the number of barcodes for each mask
self.n_cells_assigned # number of cells assigned
self.n_cells_unassigned # number of cells unassigned
- assign_masks(output_filename, barcode_map, data_path, seg_params, acq_params: core.parameters.AcquisitionParams, matrix_params: core.parameters.MatrixParams)#
- Main function that:
loads and processes barcode localization files, the local alignment file, and masks; initializes the <cell_roi> class and assigns barcode localizations to masks; then constructs the single cell PWD matrix and outputs it together with the contact map and the N-map.
- Parameters
output_filename (string) –
self.current_param (Parameters Class) –
self.current_folder (string) –
self.pixel_size (dict, optional) –
- pixel_size = {‘x’: pixelSizeXY, ‘y’: pixelSizeXY, ‘z’: pixel_size_z}
The default is 0.1 for x and y, 0.0 for z. Pixel size in um.
self.log_name_md (str, optional) – Filename of Markdown output. The default is “log.md”.
self.ndims (int, optional) – indicates whether barcodes were localized in 2 or 3D. The default is 2.
self.mask_identifier –
- Return type
None.
- build_trace_by_clustering(barcode_map, data_path, matrix_params: core.parameters.MatrixParams)#
- build_trace_by_masking(barcode_map, data_path, seg_params, matrix_params, acq_params: core.parameters.AcquisitionParams)#
- build_vector(x, y, z)#
Builds vector from coordinates
- Parameters
x (float) – x coordinates
y (float) – y coordinates
z (float) – z coordinates
- Returns
coords – vector with coordinates in nanometers.
- Return type
np array
- builds_sc_distance_table()#
iterates over all masks, calculates PWD for each mask, assigns them to sc_distance_table
- Return type
sc_distance_table
- group_localizations_by_coordinate(matrix_params: core.parameters.MatrixParams)#
Uses a KDTree to group detections by their coordinates, given a certain distance threshold. Returns a list of lists. Each list contains the lines of the input data (segmentedObjects_3D_barcode.dat) where the detections are less than a pixel away from each other.
- Parameters
coordinates (numpy array, float) – Matrix containing the xyz coordinates of barcodes.
distance_threshold (float, default 1.0) – Distance threshold in pixels used to detect neighboring barcodes.
- Returns
group_list – list of lists containing the coordinates of barcodes associated together.
- Return type
list
- initialize_lists()#
- initialize_parameters(acq_params: core.parameters.AcquisitionParams)#
- initializes_masks(masks)#
- launch_analysis(file, data_path, seg_params, matrix_params: core.parameters.MatrixParams, acq_params: core.parameters.AcquisitionParams)#
- load_mask(files_in_folder, data_path, seg_params, acq_params: core.parameters.AcquisitionParams, matrix_params: core.parameters.MatrixParams)#
searches and loads mask files for building chromatin trace
- Parameters
files_in_folder (list of str) – list of TIF files to be explored.
- Returns
True: mask found and loaded False: failed to find mask file
- Return type
bool
- run(data_path, seg_params, matrix_params, acq_params: core.parameters.AcquisitionParams)#
Function that assigns barcode localizations to masks and constructs the single-cell cumulative PWD matrix.
- Parameters
current_param (class) – Parameters
current_log (class) – logging class.
- Return type
None.
- class matrixOperations.build_traces.BuildTracesTempo(params: core.parameters.MatrixParams)#
- matrixOperations.build_traces.binarize_coordinate(x)#
- matrixOperations.build_traces.debug_mask_filename(files_in_folder, full_filename_masks, mask_identifier, n_roi, label='')#
matrixOperations.chromatin_trace_table module#
Created on Thu Feb 10 12:33:57 2022
@author: marcnol
trace table management class
- class matrixOperations.chromatin_trace_table.ChromatinTraceTable(xyz_unit='micron', genome_assembly='mm10')#
Bases:
object
- append(table)#
appends <table> to self.data
- Parameters
table (astropy table) – table to append to existing self.data table.
- Return type
None.
- barcode_statistics(trace_table)#
calculates the number of times a barcode is repeated in a trace for all traces in a trace table
- Parameters
trace_table (ASTROPY table) – trace table.
- Returns
collective_barcode_stats – dict with barcode identities as keys and a list of the number of times it was present in each trace treated.
- Return type
dict
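A sketch of how such a dictionary could be assembled, assuming the table is reduced to (trace identifier, barcode) pairs and that a trace contributes a count only for barcodes it contains. The function name, the row layout and the example rows are hypothetical.

```python
from collections import Counter, defaultdict


def barcode_statistics(rows):
    """rows: iterable of (trace_id, barcode) pairs, one per localization."""
    per_trace = defaultdict(Counter)
    for trace_id, barcode in rows:
        per_trace[trace_id][barcode] += 1
    # One list entry per trace in which the barcode was seen
    stats = defaultdict(list)
    for counts in per_trace.values():
        for barcode, n in counts.items():
            stats[barcode].append(n)
    return dict(stats)


# Hypothetical rows: barcode 2 is repeated in trace "t1"
rows = [("t1", 1), ("t1", 2), ("t1", 2), ("t2", 1), ("t2", 3)]
collective_barcode_stats = barcode_statistics(rows)
```

Counts above 1 flag barcodes that were detected several times within a trace.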
- filter_repeated_barcodes(trace_file='mock')#
This function will remove the barcodes that are present more than once in a trace. All other barcodes are kept.
- Return type
updated trace table is kept in self.data
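The filtering rule can be illustrated on a plain list of rows. This sketch assumes each row carries a trace identifier and a barcode under the hypothetical keys "trace_id" and "barcode", and drops every copy of a barcode that occurs more than once within the same trace.

```python
from collections import Counter


def filter_repeated_barcodes(rows):
    """Keep only rows whose barcode is unique within its trace."""
    counts = Counter((r["trace_id"], r["barcode"]) for r in rows)
    return [r for r in rows if counts[(r["trace_id"], r["barcode"])] == 1]


# Hypothetical rows: barcode 2 occurs twice in trace "t1"
rows = [
    {"trace_id": "t1", "barcode": 1},
    {"trace_id": "t1", "barcode": 2},
    {"trace_id": "t1", "barcode": 2},
    {"trace_id": "t2", "barcode": 2},
]
kept = filter_repeated_barcodes(rows)
```

Barcode 2 survives in trace "t2" because repetition is judged per trace, not across the whole table.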
- filter_traces_by_coordinate(coor='z', coor_min=0.0, coor_max=inf)#
This function will remove the spots that are outside coordinate limits
- Parameters
coor (string, optional) – which coordinate to process (‘x’,’y’ or ‘z’). The default is ‘z’.
coor_min (float, optional) – minimum value. The default is 0..
coor_max (float, optional) – maximum value. The default is np.inf.
- Return type
updated trace table is kept in self.data
- filter_traces_by_n(minimum_number_barcodes=2)#
Removes rows in trace table with fewer than minimum_number_barcodes barcodes
- Parameters
trace_table (ASTROPY Table) – input trace table.
minimum_number_barcodes (int, optional) – minimum number of barcodes in trace. The default is 2.
- Returns
trace_table – output trace table.
- Return type
ASTROPY Table
- initialize()#
- load(file)#
Loads chromatin trace table
- Parameters
filename_barcode_coordinates (string) – filename with chromatin trace table
- Returns
chromatin trace table (Table())
unique_barcodes (list) – list of unique barcodes read from chromatin trace table
- plots_barcode_statistics(collective_barcode_stats, file_name='barcode_stats', kind='violin', norm=True)#
plots the collective_barcode_stats (see previous function)
- Parameters
collective_barcode_stats (dict) – dict with barcode identities as keys and a list of the number of times it was present in each trace treated.
file_name (str, optional) – output filename for saving figure. The default is ‘barcode_stats.png’.
kind (str, optional) – Options for plotting styles: ‘violin’ or ‘matrix’. The default is ‘violin’.
- Return type
None.
- plots_traces(filename_list, masks=array([[0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], ..., [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.], [0., 0., 0., ..., 0., 0., 0.]]), pixel_size=None)#
This function plots 3 subplots (xy, xz, yz) with the localizations. One figure is produced per ROI.
- Parameters
filename_list (list) – filename
- remove_barcode(remove_barcode=None)#
Removes a specific barcode from a trace table
- Returns
trace_table – output trace table.
- Return type
ASTROPY Table
- remove_duplicates()#
removes duplicated (identical) barcodes
- Returns
trace_table – output trace table.
- Return type
ASTROPY Table
- save(file_name, table, comments='')#
Saves output table
- Parameters
file_name (string) – filename of table.
table (astropy Table) – Table to be written to file.
comments (list of strings, optional) – Will output as comments to the header. The default is [].
- Return type
None.
- trace_keep_label(label='')#
This function will remove traces that do not contain the word ‘label’ in the ‘label’ column
- Parameters
label (string) – the label to keep. The default is “”.
- Return type
None.
- trace_remove_label(label='')#
This function will remove traces that contain the word ‘label’ in the ‘label’ column
- Parameters
label (string) – the label to remove. The default is “”.
- Return type
None.
matrixOperations.filter_localizations module#
Created on Mon Feb 7 16:45:44 2022
@author: marcnol
- class matrixOperations.filter_localizations.FilterLocalizations(param)#
Bases:
object
- filter_barcode_table(barcode_map)#
iterates over rows of a barcode localization table and filters unwanted rows
- Parameters
barcode_map_roi (TYPE) – DESCRIPTION.
- Return type
None.
- filter_folder(data_path, seg_params, matrix_params: core.parameters.MatrixParams)#
Function that filters barcodes using a number of user-provided parameters
- Return type
None.
- filter_localizations_quality(barcode_map, i)#
Filters barcode localizations either by brightness or 3D localization accuracy
- Parameters
i (int) – index in barcode_map Table
- Returns
keep – True if the test is passed.
- Return type
Boolean
- setup_filter_values(matrix_params: core.parameters.MatrixParams)#
- Returns
self.block_size (int) – size of blocks used for blockAlignment.
self.flux_min (float) – Minimum flux to keep barcode localization
- class matrixOperations.filter_localizations.FilterLocalizationsTempo(params: core.parameters.MatrixParams)#
- matrixOperations.filter_localizations.get_file_table_new_name(file)#
matrixOperations.register_localizations module#
Created on Tue Feb 8 15:00:35 2022
@author: marcnol
This class will handle correction of barcode positions from a table of local alignments
Remember that global alignments have already been corrected.
- class matrixOperations.register_localizations.RegisterLocalizations(param, matrix_params: core.parameters.MatrixParams)#
Bases:
object
- build_local_alignment_dict()#
Builds dictionary of local corrections for each ROI, barcode cycle, and block combination
- Parameters
self.alignment_results_table (astropy Table) – alignment_results_table table
self.alignment_results_table_read (Boolean) – True when alignment_results_table table was read from disk
- Returns
exit_code (Boolean)
self.dict_error_block_masks (dict)
- load_local_alignment(local_shifts_path, reg_params: core.parameters.RegistrationParams)#
- register(data_path, local_shifts_path, seg_params, reg_params: core.parameters.RegistrationParams)#
Function that registers barcodes using a local drift correction table produced by register_local
- Return type
None.
- register_barcode_map_file(file, reg_params: core.parameters.RegistrationParams)#
- register_barcodes(barcode_map, reg_params: core.parameters.RegistrationParams)#
This function will take a barcode_map and a Table of 3D alignments to register barcode coordinates
- Return type
None.
- search_local_shift(roi, barcode, zxy_uncorrected)#
- search_local_shift_block_3d(roi, barcode, zxy_uncorrected)#
Searches for local drift for a specific barcode in a given roi. If it exists, it is added to the uncorrected coordinates.
- Parameters
roi (int) – roi used
x_uncorrected (float) – x coordinate.
y_uncorrected (float) – y coordinate.
- Returns
x_corrected (float) – corrected x coordinate.
y_corrected (float) – corrected y coordinate.
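The lookup-and-add step can be sketched as below. The dictionary layout (keys of the form (roi, barcode, block)), the way the block index is derived from x and y, the block size and the function name are all assumptions made for illustration; the real structure is built by build_local_alignment_dict and is not documented here.

```python
def apply_local_shift(shifts, roi, barcode, zxy_uncorrected, block_size=256):
    """Return (zxy, found); the shift is added when one exists."""
    _, x, y = zxy_uncorrected
    # Assumed block index: integer division of xy by the block size
    block = (int(x // block_size), int(y // block_size))
    shift = shifts.get((roi, barcode, block))
    if shift is None:
        return tuple(zxy_uncorrected), False  # no correction available
    corrected = tuple(c + s for c, s in zip(zxy_uncorrected, shift))
    return corrected, True


# Hypothetical correction table with a single entry
shifts = {(1, "RT27", (0, 1)): (0.5, -1.0, 2.0)}
corrected, found = apply_local_shift(shifts, 1, "RT27", (10.0, 100.0, 300.0))
unchanged, missing = apply_local_shift(shifts, 1, "RT31", (10.0, 100.0, 300.0))
```

Returning a flag alongside the coordinates lets the caller tell apart localizations that were corrected from those left untouched.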
- class matrixOperations.register_localizations.RegisterLocalizationsTempo(params: core.parameters.MatrixParams)#