postProcessing package#

Submodules#

postProcessing.analyze_localizations module#

Created on Mon Dec 19 2022

@author: marcnol

This script will load a localizations table and analyze a number of properties such as:

number of detections per barcode

$ analyze_localizations.py

Planned features:

export: localization_table_stats.csv

provide the possibility to merge several input localization files and produce joint statistics

postProcessing.analyze_localizations.analyze_table(table, localization_file, barcode_map, unique_barcodes)#

Launcher function that will perform different kinds of localization analyses

Parameters

table – Localization table.
localization_file (string) – file name of the localization table.

Return type

None.

postProcessing.analyze_localizations.get_barcode_statistics(barcode_map, output_filename='localization_analysis.png')#

Function that calculates the histogram of the number of localizations per barcode.

Parameters

trace (TYPE) – Trace table in ASTROPY Table format.
output_filename (TYPE, optional) – Output figure in PNG. The default is ‘localization_analysis.png’.

Return type

None.

postProcessing.analyze_localizations.get_number_localization_per_barcode(barcode_map, output_filename='localization_analysis.png')#

Function that calculates the number of localizations per barcode.

Parameters

trace (TYPE) – Trace table in ASTROPY Table format.
output_filename (TYPE, optional) – Output figure in PNG. The default is ‘localization_analysis.png’.

Return type

None.
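As a rough illustration of the statistic described above, the number of localizations per barcode can be tallied with a counter. This is a minimal sketch and not the pyHiM implementation: the function name `count_localizations_per_barcode` is hypothetical, and a plain list of barcode IDs stands in for the barcode column of a localization table.

```python
from collections import Counter


def count_localizations_per_barcode(barcode_ids):
    """Return {barcode: number of localizations}, sorted by barcode.

    `barcode_ids` is a hypothetical stand-in for the barcode column
    of a localization table (one entry per detection).
    """
    counts = Counter(barcode_ids)
    return dict(sorted(counts.items()))


# Toy data: five detections spread over three barcodes.
example_counts = count_localizations_per_barcode([1, 3, 1, 2, 1])
print(example_counts)
```

The resulting dictionary could then be passed to any plotting routine to draw the histogram.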

postProcessing.analyze_localizations.main()#

postProcessing.analyze_localizations.parseArguments()#

postProcessing.analyze_localizations.process_localizations(folder, localization_files=[])#

Processes list of localization files and sends each to get analyzed individually

Parameters

folder (TYPE) – DESCRIPTION.
localization_files (TYPE, optional) – DESCRIPTION. The default is [].

Return type

None.

postProcessing.compare_PWD_matrices module#

postProcessing.mask_cellpose module#

postProcessing.mask_manual module#

Created on Wed Oct 11 17:47:13 2023

@author: marcnol

postProcessing.mask_manual.creates_user_mask(file_name, label)#

postProcessing.mask_manual.find_label_in_path(mask_3d_path)#

postProcessing.mask_manual.find_roi_name_in_path(mask_3d_path)#

postProcessing.mask_manual.get_dict_shifts()#

postProcessing.mask_manual.load_json(file_name)#

Load a JSON file as a Python dict

Parameters: file_name (str) – JSON file name
Returns: Python dict
Return type: dict
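A helper with this behaviour can be written with the standard `json` module. The sketch below is an assumption about how such a loader might look, not the code of `load_json` itself; the name `load_json_sketch` and the file content are made up for the example.

```python
import json
import os
import tempfile


def load_json_sketch(file_name):
    """Read a JSON file and return its content as a Python dict."""
    with open(file_name, encoding="utf-8") as json_file:
        return json.load(json_file)


# Round trip through a temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "parameters.json")
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump({"pixel_size": 0.1, "label": "labeled"}, json_file)
    loaded = load_json_sketch(path)

print(loaded)
```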

postProcessing.mask_manual.load_params()#

postProcessing.mask_manual.main()#

postProcessing.mask_manual.shift_3d_mask(mask_3d_path)#

postProcessing.mask_manual.show_image(data_2d, normalization='simple', size=(10, 10))#

postProcessing.npy_to_tiff module#

Created on Mon Mar 21 14:49:22 2022

@author: marcnol

converts NPY to TIFF

postProcessing.npy_to_tiff.main()#

postProcessing.processHiMmatrix module#

Created on Wed May 6 12:36:20 2020

@author: marcnol

This script takes a JSON file with folders where datasets are stored and processes multiple PWD matrices together.

$ processHiMmatrix.py -F root_folder

outputs

sc_matrix_collated: 3D npy matrix. PWD matrix for single cells. Axes 0-1: barcodes; axis 2: cellID.
unique_barcodes: npy array. List of unique barcodes.
SClabeledCollated: npy array. Binary label indicating if cell is in pattern or not. Axis 0: cellID.

postProcessing.processHiMmatrix.joinsListArrays(ListArrays, axis=0)#

postProcessing.processHiMmatrix.main()#

postProcessing.processHiMmatrix.parse_arguments()#

postProcessing.pwd_matrix_2_pdb module#

Created on Mon Jun 12 09:27:01 2023

@author: marcnol

From a set of coordinates, it calculates the PWD matrix and then recovers the coordinates from that matrix.
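The first half of that round trip, going from coordinates to a pairwise distance (PWD) matrix, can be sketched with the standard library alone. The function name `pairwise_distance_matrix` and the toy coordinates are assumptions for illustration; recovering coordinates from the matrix is not shown here.

```python
import math


def pairwise_distance_matrix(xyz):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(xyz)
    return [[math.dist(xyz[i], xyz[j]) for j in range(n)] for i in range(n)]


# Three hypothetical barcode positions (arbitrary units).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
pwd = pairwise_distance_matrix(coords)
for row in pwd:
    print(row)
```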

postProcessing.pwd_matrix_2_pdb.main()#: Main function

postProcessing.pwd_matrix_2_pdb.matrix_2_pdb(sc_matrix, folder_path, barcode_type={}, output_file='ensemble_pwd_matrix')#

postProcessing.pwd_matrix_2_pdb.parse_arguments()#

postProcessing.pwd_matrix_2_pdb.remove_nans(ensemble_matrix, min_number_nans=3)#

postProcessing.pwd_matrix_2_pdb.runtime(matrix_files=[], folder_path='./ensemble_structure', barcode_type={})#

postProcessing.pwd_matrix_2_pdb.xyz_2_pdb(file_name, xyz, barcode_type={})#

postProcessing.trace_analyzer module#

Created on Fri Jun 17 16:07:05 2022

@author: marcnol

This script will load a trace file and analyze a number of properties such as:

number of barcodes detected per trace
number of duplicated barcodes
trace Rg

$ trace_analyzer.py

output:

trace_stats.csv

trace_ID, number of barcodes, number of duplications, Rg,
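For reference, the radius of gyration (Rg) of a trace is commonly defined as the root-mean-square distance of its localizations from their centroid. The sketch below assumes that definition; whether trace_analyzer uses exactly this formula is not stated here, and the function name is hypothetical.

```python
import math


def radius_of_gyration(xyz):
    """Root-mean-square distance of the points from their centroid."""
    n = len(xyz)
    centroid = [sum(point[k] for point in xyz) / n for k in range(3)]
    mean_squared = sum(math.dist(point, centroid) ** 2 for point in xyz) / n
    return math.sqrt(mean_squared)


# Two points one unit on either side of the origin.
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(rg)
```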

postProcessing.trace_analyzer.analyze_trace(trace, trace_file)#

Launcher function that will perform different kinds of trace analyses

Parameters

trace (ChromatinTraceTable Class) – Trace table, instance of the ChromatinTraceTable Class.
trace_file (string) – file name of trace table in ecsv format.

Return type

None.

postProcessing.trace_analyzer.get_barcode_statistics(trace, output_filename='test_barcodes.png')#

Function that calculates the

number of barcodes per trace
number of unique barcodes per trace
number of repeated barcodes per trace

Parameters

trace (TYPE) – Trace table in ASTROPY Table format.
output_filename (TYPE, optional) – Output figure in PNG. The default is ‘test_barcodes.png’.

Return type

None.
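The three per-trace quantities listed above could be computed along these lines. This is a sketch under stated assumptions: rows are given as hypothetical `(trace_id, barcode)` pairs rather than an astropy table, and "repeated" is taken to mean the number of distinct barcodes detected more than once in a trace.

```python
from collections import Counter, defaultdict


def barcode_stats_per_trace(rows):
    """rows: iterable of (trace_id, barcode) pairs, one per detection."""
    per_trace = defaultdict(list)
    for trace_id, barcode in rows:
        per_trace[trace_id].append(barcode)

    stats = {}
    for trace_id, barcodes in per_trace.items():
        counts = Counter(barcodes)
        stats[trace_id] = {
            "n_barcodes": len(barcodes),  # all detections in the trace
            "n_unique": len(counts),      # distinct barcodes
            "n_repeated": sum(1 for c in counts.values() if c > 1),
        }
    return stats


rows = [("a", 1), ("a", 2), ("a", 2), ("b", 5)]
stats = barcode_stats_per_trace(rows)
print(stats)
```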

postProcessing.trace_analyzer.get_xyz_statistics(trace, output_filename='test_coor.png')#

Function that calculates the distribution of localizations in x, y, z.

Parameters

trace (TYPE) – Trace table in ASTROPY Table format.
output_filename (TYPE, optional) – Output figure in PNG. The default is ‘test_coor.png’.

Return type

None.

postProcessing.trace_analyzer.main()#

postProcessing.trace_analyzer.parseArguments()#

postProcessing.trace_analyzer.process_traces(trace_files=[])#

Processes list of trace files and sends each to get analyzed individually

Parameters

trace_files (TYPE, optional) – DESCRIPTION. The default is [].

Return type

None.

postProcessing.trace_assign_mask module#

Created on Sat Feb 19 10:47:29 2022

@author: marcnol

This script will load a trace file and a number of numpy masks and assign them labels

$ trace_selector.py

outputs

ChromatinTraceTable() object and output .ecsv trace table file.

postProcessing.trace_assign_mask.assign_masks(trace, mask_file, label='labeled', pixel_size=0.1)#

postProcessing.trace_assign_mask.main()#

postProcessing.trace_assign_mask.parse_arguments()#

postProcessing.trace_assign_mask.process_traces(trace_files=[], mask_file='', label='labeled', pixel_size=0.1)#

postProcessing.trace_combinator module#

Created on Sat Feb 19 08:43:49 2022

@author: marcnol

This script takes a JSON file with folders where datasets are stored.
It searches for Trace files with the expected methods, loads them, and combines them into a single table that is output to the buildPWDmatrix folder.

$ trace_combinator.py

outputs

ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.

postProcessing.trace_combinator.appends_traces(traces, trace_files, label, action)#

postProcessing.trace_combinator.filter_trace(trace, label, action)#

postProcessing.trace_combinator.load_traces(folders=[], ndims=3, method='mask', label='none', action='all', trace_files=[])#

postProcessing.trace_combinator.main()#

postProcessing.trace_combinator.parse_arguments()#

postProcessing.trace_combinator.run(p)#

postProcessing.trace_filter module#

Created on Sun Apr 3 09:57:36 2022

@author: marcnol

This script will load a trace file and filter traces based on a series of user arguments

--> Usage

$ trace_filter.py --input Trace.ecsv --z_min 4 --z_max 5 --y_max 175 --output 'zy_filtered' --N_barcodes 3 --clean_spots

will analyze ‘Trace.ecsv’ and remove spots with z < 4, z > 5 or y > 175, as well as traces with fewer than 3 barcodes

--clean_spots will remove barcode spots that are repeated within a trace

--remove_barcode will remove the barcode name provided. This needs to be an integer

--> outputs

.ecsv trace table file.
.png files with stats of number of spots for the same barcode per trace [only if --clean_spots was used]
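The filtering logic described in the usage text can be sketched as follows. This is only an illustration: spots are represented as hypothetical dictionaries, the keys of `coor_limits` mirror the command-line option names, and applying the coordinate limits before the barcode-count test is an assumption about the order of operations.

```python
from collections import defaultdict


def filter_spots(spots, coor_limits, n_barcodes=2):
    """Drop spots outside the limits, then traces with too few barcodes."""

    def inside(spot):
        for axis in ("x", "y", "z"):
            low = coor_limits.get(f"{axis}_min", float("-inf"))
            high = coor_limits.get(f"{axis}_max", float("inf"))
            if not low <= spot[axis] <= high:
                return False
        return True

    kept = [spot for spot in spots if inside(spot)]

    # Count the distinct barcodes left in each trace.
    barcodes = defaultdict(set)
    for spot in kept:
        barcodes[spot["trace_id"]].add(spot["barcode"])

    return [s for s in kept if len(barcodes[s["trace_id"]]) >= n_barcodes]


spots = [
    {"trace_id": "a", "barcode": 1, "x": 10, "y": 100, "z": 4.2},
    {"trace_id": "a", "barcode": 2, "x": 11, "y": 101, "z": 4.5},
    {"trace_id": "a", "barcode": 3, "x": 12, "y": 102, "z": 4.8},
    {"trace_id": "b", "barcode": 1, "x": 20, "y": 100, "z": 4.1},
    {"trace_id": "b", "barcode": 2, "x": 21, "y": 100, "z": 6.0},
    {"trace_id": "b", "barcode": 3, "x": 22, "y": 100, "z": 4.9},
]
limits = {"z_min": 4, "z_max": 5, "y_max": 175}
filtered = filter_spots(spots, limits, n_barcodes=3)
print([(s["trace_id"], s["barcode"]) for s in filtered])
```

In this toy data set, trace "b" loses one spot to the z limit and is then discarded because only two barcodes remain.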

postProcessing.trace_filter.main()#

postProcessing.trace_filter.parse_arguments()#

postProcessing.trace_filter.runtime(trace_files=[], N_barcodes=2, coor_limits={}, tag='filtered', remove_duplicate_spots=False, remove_barcode=None, dist_max=inf, label='', keep=True)#

postProcessing.trace_filter_advanced module#

Usage:

$ trace_filter.py --input Trace.ecsv --N_barcodes 3 --fraction_missing_barcodes 0.5 --overlapping_threshold 0.03

will analyze ‘Trace.ecsv’ and remove traces with:

less than 3 barcodes
fraction of missing barcodes < 0.5
barcodes closer than 0.03 um will be merged.

outputs:

.ecsv trace table file with the ‘_filtered’ tag appended.

class postProcessing.trace_filter_advanced.FilterTraces(data_folder, data_file, dest_folder, threshold=0, verbose=False)#

Bases: object

calculate_pwd_threshold(trace_id, verbose=False, save=False)#

For all the traces, calculate the pairwise distance between all the detections. From the distribution, calculate the 95% and 99% quantiles.

Parameters

trace_id (list) – list of the IDs of all the traces without duplicated barcodes
verbose (bool) – indicate whether the distribution should be plotted
save (bool) – indicate whether the plot should be saved instead of being displayed in a popup window

Returns

p95 and p99 (float) – values of the 95% and 99% quantiles
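The idea of pooling pairwise distances and reading off two quantiles can be sketched with the standard library. This is not the method's actual code: the name `pwd_quantiles`, the input layout (a list of traces, each a list of xyz tuples) and the choice of the "inclusive" interpolation method are assumptions.

```python
import math
from itertools import combinations
from statistics import quantiles


def pwd_quantiles(traces):
    """Return the 95% and 99% quantiles of all within-trace distances."""
    distances = [
        math.dist(p, q)
        for trace in traces
        for p, q in combinations(trace, 2)
    ]
    # quantiles(n=100) returns the 99 percentile cut points.
    cuts = quantiles(distances, n=100, method="inclusive")
    return cuts[94], cuts[98]


# One hypothetical trace with four detections on a line.
traces = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
p95, p99 = pwd_quantiles(traces)
print(p95, p99)
```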

static clustering(trace_data, radius_min, radius_max, verbose=False)#

For each single trace, a KDTree is first calculated based on the 3d localizations. Using the lower-bound threshold, a “query-radius” is launched and the neighbors associated to each localization are found. An iterative process is launched in order to reconstruct the different clusters aggregated in the initial trace.

Parameters

trace_data (pandas dataframe) – data associated to a single trace defined by a unique trace_ID
radius_min (float) – lower-bound threshold for the pwd between two barcodes (seeding of the cluster)
radius_max (float) – higher-bound threshold for the pwd between two barcodes (maximum distance allowed)
verbose (bool) – indicate whether the plot should be displayed

Returns

kept_spot_id (list) – contains lists of spot_ID. Each list is a trace reconstructed by the clustering algorithm
out_spot_id (list) – contains all the spot_ID of the isolated detections
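To give a feel for radius-based clustering, the sketch below groups detections that are linked by chains of neighbors within a single radius. It deliberately swaps the KDTree for a brute-force neighbor search, ignores the upper-bound threshold, and uses a hypothetical `{spot_id: (x, y, z)}` input, so it is a simplified stand-in rather than the algorithm of `clustering`.

```python
import math


def cluster_by_radius(points, radius):
    """Group spots connected by neighbor links no longer than `radius`.

    Returns (clusters, isolated): clusters is a list of sorted spot-ID
    lists, isolated lists the spots that have no neighbor.
    """
    ids = list(points)
    # Brute-force neighbor search (a KDTree would do this faster).
    neighbors = {
        a: [b for b in ids
            if b != a and math.dist(points[a], points[b]) <= radius]
        for a in ids
    }

    clusters, isolated, seen = [], [], set()
    for start in ids:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], []
        while stack:  # iterative growth of the cluster from its seed
            current = stack.pop()
            cluster.append(current)
            for neighbor in neighbors[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(cluster) > 1:
            clusters.append(sorted(cluster))
        else:
            isolated.extend(cluster)
    return clusters, isolated


points = {
    "s1": (0.0, 0.0, 0.0),
    "s2": (0.5, 0.0, 0.0),
    "s3": (1.0, 0.0, 0.0),
    "s4": (10.0, 0.0, 0.0),
}
clusters, isolated = cluster_by_radius(points, radius=0.6)
print(clusters, isolated)
```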

detect_overlapping_barcodes(trace_data, verbose=False, save=True)#

Detect barcodes that are duplicated within the same trace. If the distance between two barcodes is lower than a specific threshold d_min (overlapping_threshold), they are replaced by their average localization.

Parameters

trace_data (pandas dataframe) – input data with all the traces & detections
verbose (bool) – indicate whether the plot should be displayed

Returns

new_trace_data (pandas dataframe) – data after removing the duplicated barcodes
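The merging rule can be illustrated on a single trace. In this sketch, detections are hypothetical `(barcode, (x, y, z))` pairs, and a barcode is merged only when all of its detections lie within `d_min` of each other; how the actual method treats groups of more than two detections is an assumption.

```python
import math
from collections import defaultdict
from itertools import combinations


def merge_overlapping_barcodes(spots, d_min):
    """Replace duplicated barcodes closer than d_min by their average."""
    by_barcode = defaultdict(list)
    for barcode, xyz in spots:
        by_barcode[barcode].append(xyz)

    merged = []
    for barcode, pts in by_barcode.items():
        close = all(math.dist(p, q) < d_min for p, q in combinations(pts, 2))
        if len(pts) > 1 and close:
            # Average localization of the overlapping detections.
            mean = tuple(sum(axis) / len(pts) for axis in zip(*pts))
            merged.append((barcode, mean))
        else:
            merged.extend((barcode, p) for p in pts)
    return merged


spots = [
    (1, (0.0, 0.0, 0.0)),
    (1, (0.02, 0.0, 0.0)),  # 0.02 um from the first detection of barcode 1
    (2, (1.0, 1.0, 1.0)),
    (3, (0.0, 0.0, 0.0)),
    (3, (5.0, 0.0, 0.0)),   # too far apart to be merged
]
merged = merge_overlapping_barcodes(spots, d_min=0.03)
print(merged)
```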

filter_traces(verbose=False)#

All the traces are analyzed based on their ID. Using a clustering algorithm and the threshold calculated based on the pwd distribution, each trace is redefined as a list of spot_ID and kept in “updated_spot_id”. That way, traces composed of multiple duplicated barcodes can now be separated into multiple sub-traces, each represented as a single list of spot_ID. All the isolated detections (not associated to a trace) are discarded. Same for the traces presenting less than 20% of the available barcodes.

Parameters

verbose (bool) – indicate whether the plot should be displayed

Returns

filtered_data (pandas dataframe) – a new dataframe with the updated traces associated to a new unique ID

hard_filtering()#

Filter the originally loaded traces by removing all the traces with at least one duplicated barcode.

Returns

(pandas dataframe) filtered traces

static him_map_2d_to_1d(map_2d)#

Flatten a him 2d-map (either contact or distance) into a single vector. Since the map is symmetric along the first diagonal, only the first half is kept.

Parameters

map_2d (numpy array) – 2d him map

Returns

distance_flatten (numpy array) – the map as a 1-dimension vector
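Keeping one half of a symmetric map can be sketched as below. The sketch works on nested lists and keeps the upper triangle without the diagonal; whether `him_map_2d_to_1d` includes the diagonal is not stated, so that choice is an assumption, as is the function name.

```python
def flatten_symmetric_map(map_2d):
    """Return the entries above the diagonal, row by row, as a flat list."""
    n = len(map_2d)
    return [map_2d[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy 3x3 symmetric distance map.
toy_map = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
flat = flatten_symmetric_map(toy_map)
print(flat)
```

For an N x N map this yields N * (N - 1) / 2 values.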

open_him_traces()#: Open HiM trace file and convert it to a pandas dataframe.

static reformat_dataframe(dataframe, in_spot_id, out_spot_id)#

Based on the list of spot_ID selected for the trace, reformat the dataframe by reassigning to all the new traces a unique ID. All the detections not associated to a trace are removed from the dataframe.

Parameters

dataframe (pandas dataframe) – input data with all the traces & detections
in_spot_id (list) – contains lists of spot_ID, each defining a single unique trace
out_spot_id (list) – contains the spot_ID of all the discarded detections

Returns

new_dataframe (pandas dataframe) – the new traces and their unique ID

static remove_duplicates(trace_data)#

For each individual trace, the duplicated barcodes are removed. If the remaining trace contains enough barcodes (above the minimal fraction p) the trace is saved, else it is discarded.

Parameters

trace_data (pandas dataframe) – input data with all the traces & detections
p (float) – minimum fraction of barcodes required to keep a trace

Returns

(pandas dataframe) a dataframe with the updated traces

save_individual_labels(trace, tag=None)#

Helper function used to sort individual traces based on label value and save them in individual ecsv files.

Parameters

trace (pandas dataframe) – input trace data
tag (str) – tag to add to all individual files

save_to_astropy(trace_data, tag=None)#

Save a pandas dataframe into an astropy table

Parameters

trace_data (pandas dataframe) – contains all the traces
tag (str) – indicate the tag to add to the filename

static select_traces_wo_duplicates(data, N_barcodes=2)#

Analyze the trace dataframe and select only the traces with no duplicates and containing at least N_barcodes barcodes (2 by default).

Parameters

data (dataframe) – input trace on which the analysis is performed

Returns

(list) list of all the trace_ID selected

trace_statistics(save=True, tag='')#

Plot the statistics for the selected traces. Two plots are displayed:

1- for each barcode, the number of detected spots as well as the proportion of duplicated barcodes
2- the detection efficiency, that is the number of traces with a given proportion of detected barcodes. Again, the proportion of traces with duplicated barcodes is indicated

Parameters

save (bool) – indicate if the figure should be saved instead of displayed
tag (str) – string to add to the image name

postProcessing.trace_filter_advanced.parse_arguments()#

postProcessing.trace_filter_advanced.plot_repeated_barcodes(trace_data)#

Plot a 3d graph with all the localizations. For the repeated barcodes, the localizations are plotted with a specific legend.

Parameters

trace_data (pandas dataframe) – input data with all the traces & detections

postProcessing.trace_merge module#

Created on Thu June 15 2023

@author: marcnol

Simpler version of trace_combinator.

This just takes a list of trace files and merges them together

$ ls Trace*.ecsv | trace_merge.py

outputs

ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.

postProcessing.trace_merge.appends_traces(traces, trace_files)#

postProcessing.trace_merge.load_traces(trace_files=[])#

postProcessing.trace_merge.main()#

postProcessing.trace_merge.parse_arguments()#

postProcessing.trace_merge.run(p)#

postProcessing.trace_plot module#

Created on Wed Jun 7 13:39:06 2023

@author: marcnol

trace_plot

script to plot one or multiple traces in 3D

Takes a trace file and either:

ranks traces and plots a selection
plots a user-selected trace in .ecsv (barcode, xyz) and PDF formats. The output files contain the trace name.
saves output coordinates for selected traces in pdb format so they can be loaded by other means including https://www.rcsb.org/3d-view, pymol, or nglviewer.

future:

output PDBs for all the traces in a trace file

ls Trace_3D_barcode_KDtree_ROI:1.ecsv | trace_plot.py --pipe --selected_trace 5b1e6f89-0362-4312-a7ed-fc55ae98a0a5

>> this pipes the file ‘Trace_3D_barcode_KDtree_ROI:1.ecsv’ into trace_plot and then selects a trace for conversion.

trace_plot.py --input Trace_3D_barcode_KDtree_ROI:1.ecsv --all

>> this plots all traces in the trace file.

keys provide barcode names in the trace file, these should be attributed to 3 character codes

set grid_mode,1
color green, (name C*)
color red, (name P*)

postProcessing.trace_plot.main()#

postProcessing.trace_plot.parse_arguments()#

postProcessing.trace_plot.runtime(folder, N_barcodes=2, trace_files=[], selected_trace='fa9f0eb5-abcc-4730-bcc7-ba1da682d776', barcode_type={}, folder_path='./PDBs', select_traces='one')#

postProcessing.trace_to_matrix module#

Created on Thu Jun 15 08:42:12 2023

@author: marcnol

uses the core routines of pyHiM to convert a trace file to a matrix in a standalone script

postProcessing.trace_to_matrix.main()#

postProcessing.trace_to_matrix.parse_arguments()#

postProcessing.trace_to_matrix.runtime(trace_files=[], colormaps={}, distance_threshold=inf)#

Module contents#