postProcessing package#

Submodules#

postProcessing.analyze_localizations module#

Created on Mon Dec 19 2022

@author: marcnol

This script will load a localizations table and analyze a number of properties such as:
  • number of detections per barcode

$ analyze_localizations.py

Planned features:

export: localization_table_stats.csv

provide the possibility to merge several input localization files and produce joint statistics

postProcessing.analyze_localizations.analyze_table(table, localization_file, barcode_map, unique_barcodes)#

Launcher function that performs the different kinds of localization analyses

Parameters
  • table (TYPE) – Localization table to analyze.

  • localization_file (string) – file name of the localization table.

  • barcode_map (TYPE) – DESCRIPTION.

  • unique_barcodes (TYPE) – DESCRIPTION.

Return type

None.

postProcessing.analyze_localizations.get_barcode_statistics(barcode_map, output_filename='localization_analysis.png')#
Function that calculates the
  • histogram of number of localizations per barcode

Parameters
  • trace (TYPE) – Trace table in ASTROPY Table format.

  • output_filename (TYPE, optional) – Output figure in PNG. The default is 'localization_analysis.png'.

Return type

None.

postProcessing.analyze_localizations.get_number_localization_per_barcode(barcode_map, output_filename='localization_analysis.png')#
Function that calculates the
  • number of localizations per barcode

Parameters
  • trace (TYPE) – Trace table in ASTROPY Table format.

  • output_filename (TYPE, optional) – Output figure in PNG. The default is 'localization_analysis.png'.

Return type

None.
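The per-barcode count described above can be illustrated with a minimal sketch. It assumes the localization table has already been reduced to a plain list of barcode IDs; the function name `count_localizations_per_barcode` is hypothetical and is not part of the package.

```python
from collections import Counter


def count_localizations_per_barcode(barcode_ids):
    """Return {barcode: number of localizations}, sorted by barcode ID."""
    counts = Counter(barcode_ids)
    return dict(sorted(counts.items()))


# Hypothetical barcode column extracted from a localization table
barcodes = [1, 3, 1, 2, 3, 3]
counts_per_barcode = count_localizations_per_barcode(barcodes)
print(counts_per_barcode)
```

The resulting dictionary is the kind of data a histogram of localizations per barcode would be drawn from.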

postProcessing.analyze_localizations.main()#
postProcessing.analyze_localizations.parseArguments()#
postProcessing.analyze_localizations.process_localizations(folder, localization_files=[])#

Processes list of localization files and sends each to get analyzed individually

Parameters
  • folder (TYPE) – DESCRIPTION.

  • localization_files (TYPE, optional) – DESCRIPTION. The default is [].

Return type

None.

postProcessing.compare_PWD_matrices module#

postProcessing.mask_cellpose module#

postProcessing.mask_manual module#

Created on Wed Oct 11 17:47:13 2023

@author: marcnol

postProcessing.mask_manual.creates_user_mask(file_name, label)#
postProcessing.mask_manual.find_label_in_path(mask_3d_path)#
postProcessing.mask_manual.find_roi_name_in_path(mask_3d_path)#
postProcessing.mask_manual.get_dict_shifts()#
postProcessing.mask_manual.load_json(file_name)#

Load a JSON file like a python dict

Parameters

file_name (str) – JSON file name

Returns

Python dict

Return type

dict
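A minimal sketch of the behaviour documented for `load_json`, using only the standard library. The actual implementation in `mask_manual` may differ; the file name and the `pixel_size` key in the demo are made up for illustration.

```python
import json
import os
import tempfile


def load_json(file_name):
    """Read a JSON file and return its content as a Python dict."""
    with open(file_name, encoding="utf-8") as json_file:
        return json.load(json_file)


# Round-trip demo with a throwaway file (file name and key are hypothetical)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"pixel_size": 0.1}, handle)
    params = load_json(path)
```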

postProcessing.mask_manual.load_params()#
postProcessing.mask_manual.main()#
postProcessing.mask_manual.shift_3d_mask(mask_3d_path)#
postProcessing.mask_manual.show_image(data_2d, normalization='simple', size=(10, 10))#

postProcessing.npy_to_tiff module#

Created on Mon Mar 21 14:49:22 2022

@author: marcnol

converts NPY to TIFF

postProcessing.npy_to_tiff.main()#

postProcessing.processHiMmatrix module#

Created on Wed May 6 12:36:20 2020

@author: marcnol

This script takes a JSON file listing the folders where datasets are stored and processes multiple PWD matrices together.

$ processHiMmatrix.py -F root_folder

outputs

  • sc_matrix_collated: 3D npy matrix. PWD matrix for single cells. Axes 0-1: barcodes; axis 2: cellID.

  • unique_barcodes: npy array. List of unique barcodes.

  • SClabeledCollated: npy array. Binary label indicating if cell is in pattern or not. Axis 0: cellID.
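To make the axis layout of these outputs concrete, here is a small NumPy sketch. The arrays are random stand-ins with the documented shapes, not real data, and reducing over cells with a NaN-aware median is only one possible way of using the collated matrix.

```python
import numpy as np

# Toy stand-in for sc_matrix_collated: 3 barcodes x 3 barcodes x 4 cells
rng = np.random.default_rng(0)
sc_matrix_collated = rng.uniform(0.1, 1.0, size=(3, 3, 4))
sc_matrix_collated[0, 1, 2] = np.nan  # a missing detection in one cell

# Collapse the cell axis (axis 2) while ignoring missing values
ensemble_pwd = np.nanmedian(sc_matrix_collated, axis=2)

# Toy stand-in for SClabeledCollated: one binary label per cell
SClabeledCollated = np.array([1, 0, 1, 1])

# Keep only the cells flagged as "in pattern"
in_pattern = sc_matrix_collated[:, :, SClabeledCollated == 1]
```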

postProcessing.processHiMmatrix.joinsListArrays(ListArrays, axis=0)#
postProcessing.processHiMmatrix.main()#
postProcessing.processHiMmatrix.parse_arguments()#

postProcessing.pwd_matrix_2_pdb module#

Created on Mon Jun 12 09:27:01 2023

@author: marcnol

From a set of coordinates, it calculates the PWD matrix, and from that matrix it recovers the coordinates.
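The round trip from coordinates to a PWD matrix and back can be sketched with classical multidimensional scaling (MDS). This page does not say which reconstruction method the module uses, so classical MDS is an assumption chosen for illustration. The function names are hypothetical, and the sketch does not handle NaN entries. Recovered coordinates match the originals only up to rotation, reflection and translation.

```python
import numpy as np


def pairwise_distances(xyz):
    """Euclidean pairwise distance matrix for an (N, 3) coordinate array."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(pwd, n_dims=3):
    """Recover coordinates from a complete distance matrix (classical MDS)."""
    n = pwd.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram matrix built from squared distances
    gram = -0.5 * centering @ (pwd ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the n_dims largest eigenvalues; clip tiny negative round-off
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


xyz = np.array(
    [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float
)
pwd = pairwise_distances(xyz)
xyz_back = classical_mds(pwd)
```

Because the recovered structure is defined only up to a rigid transform, the meaningful check is that `pairwise_distances(xyz_back)` reproduces `pwd`, not that `xyz_back` equals `xyz`.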

postProcessing.pwd_matrix_2_pdb.main()#

Main function

postProcessing.pwd_matrix_2_pdb.matrix_2_pdb(sc_matrix, folder_path, barcode_type={}, output_file='ensemble_pwd_matrix')#
postProcessing.pwd_matrix_2_pdb.parse_arguments()#
postProcessing.pwd_matrix_2_pdb.remove_nans(ensemble_matrix, min_number_nans=3)#
postProcessing.pwd_matrix_2_pdb.runtime(matrix_files=[], folder_path='./ensemble_structure', barcode_type={})#
postProcessing.pwd_matrix_2_pdb.xyz_2_pdb(file_name, xyz, barcode_type={})#

postProcessing.trace_analyzer module#

Created on Fri Jun 17 16:07:05 2022

@author: marcnol

This script will load a trace file and analyze a number of properties such as:
  • number of barcodes detected per trace

  • number of duplicated barcodes

  • trace Rg

$ trace_analyzer.py

output:

trace_stats.csv

trace_ID, number of barcodes, number of duplications, Rg,
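For reference, the radius of gyration (Rg) of a trace can be computed as the root-mean-square distance of its localizations from their centroid. The sketch below uses this common unweighted definition; whether trace_analyzer uses exactly this formula is an assumption, and `radius_of_gyration` is a hypothetical helper name.

```python
import math


def radius_of_gyration(coords):
    """Unweighted Rg: RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(point[i] for point in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((point[i] - centroid[i]) ** 2 for i in range(3)) for point in coords
    ) / n
    return math.sqrt(mean_sq)


# Two localizations 2 units apart: each lies 1 unit from the centroid
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```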

postProcessing.trace_analyzer.analyze_trace(trace, trace_file)#

Launcher function that will perform different kinds of trace analyses

Parameters
  • trace (ChromatinTraceTable Class) – Trace table, instance of the ChromatinTraceTable Class.

  • trace_file (string) – file name of trace table in ecsv format.

Return type

None.

postProcessing.trace_analyzer.get_barcode_statistics(trace, output_filename='test_barcodes.png')#
Function that calculates the
  • number of barcodes per trace

  • number of unique barcodes per trace

  • number of repeated barcodes per trace

Parameters
  • trace (TYPE) – Trace table in ASTROPY Table format.

  • output_filename (TYPE, optional) – Output figure in PNG. The default is 'test_barcodes.png'.

Return type

None.

postProcessing.trace_analyzer.get_xyz_statistics(trace, output_filename='test_coor.png')#
Function that calculates the
  • distribution of localizations in x y z

Parameters
  • trace (TYPE) – Trace table in ASTROPY Table format.

  • output_filename (TYPE, optional) – Output figure in PNG. The default is 'test_coor.png'.

Return type

None.

postProcessing.trace_analyzer.main()#
postProcessing.trace_analyzer.parseArguments()#
postProcessing.trace_analyzer.process_traces(trace_files=[])#

Processes list of trace files and sends each to get analyzed individually

Parameters
  • trace_files (TYPE, optional) – DESCRIPTION. The default is [].

Return type

None.

postProcessing.trace_assign_mask module#

Created on Sat Feb 19 10:47:29 2022

@author: marcnol

This script will load a trace file and a number of numpy masks and assign them labels

$ trace_selector.py

outputs

ChromatinTraceTable() object and output .ecsv trace table file.

postProcessing.trace_assign_mask.assign_masks(trace, mask_file, label='labeled', pixel_size=0.1)#
postProcessing.trace_assign_mask.main()#
postProcessing.trace_assign_mask.parse_arguments()#
postProcessing.trace_assign_mask.process_traces(trace_files=[], mask_file='', label='labeled', pixel_size=0.1)#
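The idea behind assigning mask labels to localizations can be sketched as a pixel lookup. This is not the package's implementation. It assumes coordinates are given in micrometres, that `pixel_size` is in micrometres per pixel, and that x maps to mask columns and y to mask rows; `label_spots_in_mask` is a hypothetical name.

```python
import numpy as np


def label_spots_in_mask(xy_um, mask, pixel_size=0.1):
    """Return a boolean array: True where a spot falls on a non-zero mask pixel."""
    pixels = np.floor(np.asarray(xy_um) / pixel_size).astype(int)
    cols, rows = pixels[:, 0], pixels[:, 1]  # assumption: x -> column, y -> row
    inside = (
        (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    )
    labeled = np.zeros(len(pixels), dtype=bool)
    # Spots outside the image stay unlabeled
    labeled[inside] = mask[rows[inside], cols[inside]] > 0
    return labeled


mask = np.zeros((10, 10), dtype=int)
mask[2:5, 2:5] = 1  # a small square region of interest
spots_xy = [[0.25, 0.35], [0.85, 0.85], [5.0, 5.0]]
labeled = label_spots_in_mask(spots_xy, mask, pixel_size=0.1)
```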

postProcessing.trace_combinator module#

Created on Sat Feb 19 08:43:49 2022

@author: marcnol

This script takes a JSON file listing the folders where datasets are stored. It searches for trace files with the expected methods, loads them, and combines them into a single table that is written to the buildPWDmatrix folder.

$ trace_combinator.py

outputs

ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.

postProcessing.trace_combinator.appends_traces(traces, trace_files, label, action)#
postProcessing.trace_combinator.filter_trace(trace, label, action)#
postProcessing.trace_combinator.load_traces(folders=[], ndims=3, method='mask', label='none', action='all', trace_files=[])#
postProcessing.trace_combinator.main()#
postProcessing.trace_combinator.parse_arguments()#
postProcessing.trace_combinator.run(p)#

postProcessing.trace_filter module#

Created on Sun Apr 3 09:57:36 2022

@author: marcnol

This script will load a trace file and filter traces based on a series of user arguments.

--> Usage

$ trace_filter.py --input Trace.ecsv --z_min 4 --z_max 5 --y_max 175 --output 'zy_filtered' --N_barcodes 3 --clean_spots

will analyze 'Trace.ecsv' and remove spots with z<4 or z>5, spots with y>175, and traces with fewer than 3 barcodes.

--clean_spots will remove barcode spots that are repeated within a trace.

--remove_barcode will remove the barcode with the name provided. This needs to be an integer.

--> outputs

  • .ecsv trace table file.

  • .png files with stats of number of spots for the same barcode per trace [only if --clean_spots was used].
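The coordinate and barcode-count filters from the usage example can be sketched in plain Python. This is an illustration under stated assumptions, not the script's code: spots are represented as dicts, the coordinate filter is applied before the barcode count, and the count is taken over distinct barcodes per trace. `filter_spots` is a hypothetical name.

```python
from collections import defaultdict


def filter_spots(spots, coor_limits, n_barcodes):
    """Drop spots outside coor_limits, then traces with too few barcodes."""

    def inside(spot):
        return all(
            low <= spot[axis] <= high for axis, (low, high) in coor_limits.items()
        )

    kept = [spot for spot in spots if inside(spot)]

    # Count distinct barcodes remaining in each trace
    barcodes_per_trace = defaultdict(set)
    for spot in kept:
        barcodes_per_trace[spot["trace_id"]].add(spot["barcode"])

    return [
        spot
        for spot in kept
        if len(barcodes_per_trace[spot["trace_id"]]) >= n_barcodes
    ]


spots = [
    {"trace_id": "a", "barcode": 1, "y": 10.0, "z": 4.2},
    {"trace_id": "a", "barcode": 2, "y": 20.0, "z": 4.5},
    {"trace_id": "a", "barcode": 3, "y": 30.0, "z": 4.9},
    {"trace_id": "a", "barcode": 4, "y": 30.0, "z": 6.0},   # z above z_max
    {"trace_id": "b", "barcode": 1, "y": 200.0, "z": 4.5},  # y above y_max
    {"trace_id": "b", "barcode": 2, "y": 50.0, "z": 4.5},
]
limits = {"z": (4.0, 5.0), "y": (float("-inf"), 175.0)}
filtered = filter_spots(spots, limits, n_barcodes=3)
```

With these toy values, trace "a" keeps three in-range barcodes and survives, while trace "b" is left with a single barcode and is removed.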

postProcessing.trace_filter.main()#
postProcessing.trace_filter.parse_arguments()#
postProcessing.trace_filter.runtime(trace_files=[], N_barcodes=2, coor_limits={}, tag='filtered', remove_duplicate_spots=False, remove_barcode=None, dist_max=inf, label='', keep=True)#

postProcessing.trace_filter_advanced module#

Usage:

$ trace_filter.py --input Trace.ecsv --N_barcodes 3 --fraction_missing_barcodes -0.5 --overlapping_threshold 0.03

will analyze ‘Trace.ecsv’ and remove traces with:

  • fewer than 3 barcodes

  • fraction of missing barcodes < 0.5

In addition, barcodes closer than 0.03 µm will be merged.

outputs:

.ecsv trace table file with the ‘_filtered’ tag appended.

class postProcessing.trace_filter_advanced.FilterTraces(data_folder, data_file, dest_folder, threshold=0, verbose=False)#

Bases: object

calculate_pwd_threshold(trace_id, verbose=False, save=False)#

For all the traces, calculate the pairwise distance between all the detections. From the distribution, calculate the 95% and 99% quantiles.

@param trace_id: (list) list of the IDs of all the traces without duplicated barcodes
@param verbose: (bool) indicate whether the distribution should be plotted
@param save: (bool) indicate whether the plot should be saved instead of being displayed in a popup window
@return: p95 and p99 (float), the values of the 95% and 99% quantiles
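The quantile calculation can be sketched with NumPy for a single set of 3D localizations. How the method pools distances across traces is not described here, so the sketch treats one trace only; `pwd_quantiles` is a hypothetical name.

```python
import numpy as np


def pwd_quantiles(xyz):
    """Return the 95% and 99% quantiles of the pairwise distances in xyz."""
    xyz = np.asarray(xyz, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Each pair appears twice in the symmetric matrix; keep the upper triangle
    upper = dist[np.triu_indices(len(xyz), k=1)]
    p95, p99 = np.quantile(upper, [0.95, 0.99])
    return p95, p99


# Three collinear points: pairwise distances are 1, 2 and 3
p95, p99 = pwd_quantiles([[0, 0, 0], [1, 0, 0], [3, 0, 0]])
```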

static clustering(trace_data, radius_min, radius_max, verbose=False)#

For each single trace, a KDTree is first calculated based on the 3d localizations. Using the lower-bound threshold, a “query-radius” is launched and the neighbors associated to each localization are found. An iterative process is launched in order to reconstruct the different clusters aggregated in the initial trace.

@param trace_data: (pandas dataframe) data associated to a single trace defined by a unique trace_ID
@param radius_min: (float) lower-bound threshold for the pwd between two barcodes (seeding of the cluster)
@param radius_max: (float) higher-bound threshold for the pwd between two barcodes (maximum distance allowed)
@param verbose: (bool) indicate whether the plot should be displayed
@return: kept_spot_id (list) contains lists of spot_ID. Each list is a trace reconstructed by the clustering algorithm
@return: out_spot_id (list) contains all the spot_ID of the isolated detections

detect_overlapping_barcodes(trace_data, verbose=False, save=True)#

Detect barcodes that are duplicated within the same trace. If the distance between two barcodes is lower than a specific threshold d_min (overlapping_threshold), they are replaced by their average localization.

@param trace_data: (pandas dataframe) input data with all the traces & detections
@param verbose: (bool) indicate whether the plot should be displayed
@return: new_trace_data (pandas dataframe) after removing the duplicated barcodes
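The merging rule can be illustrated on a single trace in plain Python. This is a simplified sketch, not the class's implementation: spots are `(barcode, (x, y, z))` tuples, pairs are merged greedily, and repeated merges of more than two spots do not yield a true centroid. `merge_overlapping` is a hypothetical name.

```python
import math
from itertools import combinations


def merge_overlapping(spots, d_min):
    """Merge same-barcode spots closer than d_min into their average position."""
    spots = list(spots)
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(spots)), 2):
            (bc_i, pos_i), (bc_j, pos_j) = spots[i], spots[j]
            if bc_i == bc_j and math.dist(pos_i, pos_j) < d_min:
                mean = tuple((a + b) / 2 for a, b in zip(pos_i, pos_j))
                # Replace the pair by its average and rescan from the start
                spots = [s for k, s in enumerate(spots) if k not in (i, j)]
                spots.append((bc_i, mean))
                merged = True
                break
    return spots


trace = [
    (1, (0.0, 0.0, 0.0)),
    (1, (0.02, 0.0, 0.0)),  # 0.02 away from the first barcode-1 spot
    (1, (1.0, 0.0, 0.0)),   # same barcode but far away: kept as is
    (2, (0.5, 0.5, 0.5)),
]
merged_trace = merge_overlapping(trace, d_min=0.03)
```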

filter_traces(verbose=False)#

All the traces are analyzed based on their ID. Using a clustering algorithm and the threshold calculated from the pwd distribution, each trace is redefined as a list of spot_ID and kept in “updated_spot_id”. That way, traces composed of multiple duplicated barcodes can be separated into multiple sub-traces, each represented as a single list of spot_ID. All the isolated detections (not associated to a trace) are discarded, as are the traces with fewer than 20% of the available barcodes.

@param verbose: (bool) indicate whether the plot should be displayed
@return: filtered_data (pandas dataframe) output a new dataframe with the updated traces associated to a new unique ID

hard_filtering()#

Filter the originally loaded traces by removing all the traces with at least one duplicated barcode.

@return: (pandas dataframe) filtered traces

static him_map_2d_to_1d(map_2d)#

Flatten a him 2d-map (either contact or distance) into a single vector. Since the map is symmetric along the first diagonal, only the first half is kept.

@param map_2d: (numpy array) 2d him map
@return: (numpy array) distance_flatten as a 1-dimension vector
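Flattening a symmetric map to one half can be sketched with `numpy.triu_indices`. The text does not say whether the diagonal is kept, so the sketch excludes it by default and exposes that choice as a parameter; `flatten_symmetric_map` is a hypothetical name.

```python
import numpy as np


def flatten_symmetric_map(map_2d, include_diagonal=False):
    """Return the upper triangle of a symmetric 2D map as a 1D vector."""
    k = 0 if include_diagonal else 1
    rows, cols = np.triu_indices(map_2d.shape[0], k=k)
    return map_2d[rows, cols]


distance_map = np.array(
    [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
)
distance_flatten = flatten_symmetric_map(distance_map)
```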

open_him_traces()#

Open a HiM trace file and convert it to a pandas dataframe.

static reformat_dataframe(dataframe, in_spot_id, out_spot_id)#

Based on the list of spot_ID selected for the trace, reformat the dataframe by reassigning to all the new traces a unique ID. All the detections not associated to a trace are removed from the dataframe.

@param dataframe: (pandas dataframe) input data with all the traces & detections
@param in_spot_id: (list) contains lists of spot_ID, each defining a single unique trace
@param out_spot_id: (list) contains the spot_ID of all the discarded detections
@return: new_dataframe (pandas dataframe) with the new traces and their unique ID

static remove_duplicates(trace_data)#

For each individual trace, the duplicated barcodes are removed. If the remaining trace contains enough barcodes (above the minimal fraction p) the trace is saved, else it is discarded.

@param trace_data: (pandas dataframe) input data with all the traces & detections
@param p: (float) minimum fraction of barcodes required to keep a trace
@return: (pandas dataframe) output a dataframe with the updated traces

save_individual_labels(trace, tag=None)#

Helper function that sorts individual traces by label value and saves each group in its own ecsv file.

@param trace: (pd dataframe) input trace data
@param tag: (str) tag to add to all individual files

save_to_astropy(trace_data, tag=None)#

Save a pandas dataframe as an astropy table.

@param trace_data: (pd dataframe) contains all the traces
@param tag: (str) indicate the tag to add to the filename

static select_traces_wo_duplicates(data, N_barcodes=2)#

Analyze the trace dataframe and select only the traces with no duplicates and containing at least N_barcodes barcodes (2 by default).

@param data: (dataframe) input trace on which the analysis is performed
@return: (list) list of all the trace_ID selected
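The selection rule can be sketched without pandas. The input is assumed to be a sequence of `(trace_ID, barcode)` pairs rather than a dataframe, and `traces_without_duplicates` is a hypothetical name, not the static method itself.

```python
from collections import Counter, defaultdict


def traces_without_duplicates(rows, n_barcodes=2):
    """Return trace IDs with no repeated barcode and at least n_barcodes barcodes."""
    per_trace = defaultdict(Counter)
    for trace_id, barcode in rows:
        per_trace[trace_id][barcode] += 1
    return [
        trace_id
        for trace_id, counts in per_trace.items()
        if len(counts) >= n_barcodes and max(counts.values()) == 1
    ]


rows = [
    ("a", 1), ("a", 2),            # two distinct barcodes: selected
    ("b", 1), ("b", 1), ("b", 2),  # barcode 1 duplicated: rejected
    ("c", 3),                      # a single barcode: rejected
]
selected = traces_without_duplicates(rows)
```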

trace_statistics(save=True, tag='')#

Plot the statistics for the selected traces. Two plots are displayed:

  • for each barcode, the number of detected spots as well as the proportion of duplicated barcodes

  • the detection efficiency, that is the number of traces with a given proportion of detected barcodes. Again, the proportion of traces with duplicated barcodes is indicated

@param save: (bool) indicate if the figure should be saved instead of displayed
@param tag: (str) string to add to the image name

postProcessing.trace_filter_advanced.parse_arguments()#
postProcessing.trace_filter_advanced.plot_repeated_barcodes(trace_data)#

Plot a 3d graph with all the localizations. For the repeated barcodes, the localizations are plotted with a specific legend.

@param trace_data: (pandas dataframe) input data with all the traces & detections

postProcessing.trace_merge module#

Created on Thu June 15 2023

@author: marcnol

Simpler version of trace_combinator.

This just takes a list of trace files and merges them together

$ ls Trace*.ecsv | trace_merge.py

outputs

ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.

postProcessing.trace_merge.appends_traces(traces, trace_files)#
postProcessing.trace_merge.load_traces(trace_files=[])#
postProcessing.trace_merge.main()#
postProcessing.trace_merge.parse_arguments()#
postProcessing.trace_merge.run(p)#

postProcessing.trace_plot module#

Created on Wed Jun 7 13:39:06 2023

@author: marcnol

trace_plot

script to plot one or multiple traces in 3D

Takes a trace file and either:
  • ranks traces and plots a selection

  • plots a user-selected trace in .ecsv (barcode, xyz) and PDF formats. The output files contain the trace name.

  • saves output coordinates for selected traces in pdb format so they can be loaded by other means including https://www.rcsb.org/3d-view, pymol, or nglviewer.

future:
  • output PDBs for all the traces in a trace file

ls Trace_3D_barcode_KDtree_ROI:1.ecsv | trace_plot.py --pipe --selected_trace 5b1e6f89-0362-4312-a7ed-fc55ae98a0a5

>> this pipes the file ‘Trace_3D_barcode_KDtree_ROI:1.ecsv’ into trace_plot and then selects a trace for conversion.

trace_plot.py --input Trace_3D_barcode_KDtree_ROI:1.ecsv --all

>> this plots all traces in the trace file.

Keys provide barcode names in the trace file. These should be attributed to 3-character codes.

set grid_mode,1
color green, (name C*)
color red, (name P*)

postProcessing.trace_plot.main()#
postProcessing.trace_plot.parse_arguments()#
postProcessing.trace_plot.runtime(folder, N_barcodes=2, trace_files=[], selected_trace='fa9f0eb5-abcc-4730-bcc7-ba1da682d776', barcode_type={}, folder_path='./PDBs', select_traces='one')#

postProcessing.trace_to_matrix module#

Created on Thu Jun 15 08:42:12 2023

@author: marcnol

uses the core routines of pyHiM to convert a trace file to a matrix in a standalone script

postProcessing.trace_to_matrix.main()#
postProcessing.trace_to_matrix.parse_arguments()#
postProcessing.trace_to_matrix.runtime(trace_files=[], colormaps={}, distance_threshold=inf)#

Module contents#