postProcessing package
Submodules
postProcessing.analyze_localizations module
Created on Mon Dec 19 2022
@author: marcnol
- This script will load a localization table and analyze a number of properties, such as:
number of detections per barcode
$ analyze_localizations.py
- Planned features:
export: localization_table_stats.csv
provide the possibility to merge several input localization files and produce joint statistics
- postProcessing.analyze_localizations.analyze_table(table, localization_file, barcode_map, unique_barcodes)
Launcher function that will perform different kinds of localization analyses
- Parameters
table – Localization table.
localization_file (string) – file name of the localization table in ecsv format.
- Return type
None.
- postProcessing.analyze_localizations.get_barcode_statistics(barcode_map, output_filename='localization_analysis.png')
- Function that calculates the
histogram of the number of localizations per barcode
- Parameters
barcode_map – barcode map of the localization table.
output_filename (str, optional) – Output figure in PNG. The default is ‘localization_analysis.png’.
- Return type
None.
- postProcessing.analyze_localizations.get_number_localization_per_barcode(barcode_map, output_filename='localization_analysis.png')
- Function that calculates the
number of localizations per barcode
- Parameters
barcode_map – barcode map of the localization table.
output_filename (str, optional) – Output figure in PNG. The default is ‘localization_analysis.png’.
- Return type
None.
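A minimal sketch of counting localizations per barcode, assuming the localization table has already been reduced to a plain sequence of barcode IDs (one entry per localization). The helper `count_localizations_per_barcode` is hypothetical and is not the implementation used by `get_number_localization_per_barcode`; its output is the kind of per-barcode tally that a histogram of localizations per barcode summarizes.

```python
from collections import Counter


def count_localizations_per_barcode(barcodes):
    """Return {barcode: number of localizations}, sorted by barcode.

    `barcodes` is assumed to hold one barcode ID per localization
    (hypothetical input format, e.g. the barcode column of the table).
    """
    counts = Counter(barcodes)
    return dict(sorted(counts.items()))


# Six localizations spread over three barcodes.
print(count_localizations_per_barcode([3, 1, 3, 2, 3, 1]))  # {1: 2, 2: 1, 3: 3}
```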
- postProcessing.analyze_localizations.main()
- postProcessing.analyze_localizations.parseArguments()
- postProcessing.analyze_localizations.process_localizations(folder, localization_files=[])
Processes a list of localization files and sends each to be analyzed individually
- Parameters
folder (TYPE) – folder to process.
localization_files (list, optional) – list of localization files. The default is an empty list ([]).
- Return type
None.
postProcessing.compare_PWD_matrices module
postProcessing.mask_cellpose module
postProcessing.mask_manual module
Created on Wed Oct 11 17:47:13 2023
@author: marcnol
- postProcessing.mask_manual.creates_user_mask(file_name, label)
- postProcessing.mask_manual.find_label_in_path(mask_3d_path)
- postProcessing.mask_manual.find_roi_name_in_path(mask_3d_path)
- postProcessing.mask_manual.get_dict_shifts()
- postProcessing.mask_manual.load_json(file_name)
Load a JSON file as a Python dict.
- Parameters
file_name (str) – JSON file name
- Returns
Python dict
- Return type
dict
- postProcessing.mask_manual.load_params()
- postProcessing.mask_manual.main()
- postProcessing.mask_manual.shift_3d_mask(mask_3d_path)
- postProcessing.mask_manual.show_image(data_2d, normalization='simple', size=(10, 10))
postProcessing.npy_to_tiff module
Created on Mon Mar 21 14:49:22 2022
@author: marcnol
Converts NPY files to TIFF.
- postProcessing.npy_to_tiff.main()
postProcessing.processHiMmatrix module
Created on Wed May 6 12:36:20 2020
@author: marcnol
This script takes a JSON file with the folders where datasets are stored and processes multiple PWD matrices together.
$ processHiMmatrix.py -F root_folder
outputs
sc_matrix_collated: 3D npy matrix. PWD matrix for single cells. Axes 0-1: barcodes; axis 2: cell ID.
unique_barcodes: npy array. List of unique barcodes.
SClabeledCollated: npy array. Binary label indicating whether a cell is in the pattern or not. Axis 0: cell ID.
- postProcessing.processHiMmatrix.joinsListArrays(ListArrays, axis=0)
- postProcessing.processHiMmatrix.main()
- postProcessing.processHiMmatrix.parse_arguments()
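The `sc_matrix_collated` output has barcodes on axes 0-1 and cells on axis 2, so collating datasets amounts to joining their single-cell matrices along axis 2. The sketch below does this with `numpy.concatenate`; the helper name `collate_sc_matrices` is hypothetical, it assumes every dataset uses the same barcodes in the same order, and it is not the code of `joinsListArrays`.

```python
import numpy as np


def collate_sc_matrices(matrices):
    """Stack single-cell PWD matrices from several datasets along the cell axis.

    Each element is assumed to have shape (n_barcodes, n_barcodes, n_cells),
    with identical barcodes in identical order across datasets.
    """
    return np.concatenate(matrices, axis=2)


dataset_a = np.zeros((4, 4, 10))  # 4 barcodes, 10 cells
dataset_b = np.ones((4, 4, 5))    # 4 barcodes, 5 cells
collated = collate_sc_matrices([dataset_a, dataset_b])
print(collated.shape)  # (4, 4, 15)
```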
postProcessing.pwd_matrix_2_pdb module
Created on Mon Jun 12 09:27:01 2023
@author: marcnol
From a set of coordinates, it calculates the PWD matrix, and from that matrix it recovers the coordinates.
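One standard way to go from a complete PWD matrix back to coordinates is classical multidimensional scaling (MDS), which recovers the points up to a rotation, reflection and translation. The sketch below assumes a NaN-free matrix (real single-cell matrices would first need their NaNs handled, as the presence of `remove_nans` suggests). Whether `pwd_matrix_2_pdb` uses classical MDS or another method is not stated here, and both helper names are hypothetical.

```python
import numpy as np


def coordinates_from_pwd(pwd, n_dims=3):
    """Recover coordinates from a complete Euclidean distance matrix (classical MDS)."""
    n = pwd.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J (Gram matrix)
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (pwd ** 2) @ j
    # Keep the n_dims largest eigenpairs of the symmetric matrix B
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


def pwd_matrix(xyz):
    """Pairwise Euclidean distances between the rows of xyz."""
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
xyz = rng.normal(size=(8, 3))
recovered = coordinates_from_pwd(pwd_matrix(xyz))
# Distances are preserved even though the coordinates themselves differ.
print(np.allclose(pwd_matrix(recovered), pwd_matrix(xyz)))  # True
```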
- postProcessing.pwd_matrix_2_pdb.main()
Main function
- postProcessing.pwd_matrix_2_pdb.matrix_2_pdb(sc_matrix, folder_path, barcode_type={}, output_file='ensemble_pwd_matrix')
- postProcessing.pwd_matrix_2_pdb.parse_arguments()
- postProcessing.pwd_matrix_2_pdb.remove_nans(ensemble_matrix, min_number_nans=3)
- postProcessing.pwd_matrix_2_pdb.runtime(matrix_files=[], folder_path='./ensemble_structure', barcode_type={})
- postProcessing.pwd_matrix_2_pdb.xyz_2_pdb(file_name, xyz, barcode_type={})
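`xyz_2_pdb` writes coordinates in PDB format. As a sketch of what that involves, the hypothetical helper below formats one fixed-column `ATOM` record per barcode. It assumes `barcode_type` maps barcode names to 3-character residue codes; the atom name `CA`, chain `A`, element `C` and the `UNK` fallback are arbitrary placeholder choices, not necessarily those made by pyHiM.

```python
def xyz_to_pdb_lines(barcodes, xyz, barcode_type=None):
    """Format one PDB ATOM record per barcode, followed by END.

    barcode_type maps barcode names to 3-character residue codes
    (hypothetical mapping); unmapped barcodes are written as "UNK".
    """
    barcode_type = barcode_type or {}
    lines = []
    for serial, (barcode, (x, y, z)) in enumerate(zip(barcodes, xyz), start=1):
        res_name = str(barcode_type.get(barcode, "UNK"))[:3]
        # PDB ATOM columns: serial 7-11, name 13-16, resName 18-20, chain 22,
        # resSeq 23-26, x/y/z 31-54, occupancy 55-60, tempFactor 61-66,
        # element 77-78.
        lines.append(
            f"ATOM  {serial:5d} {'CA':^4s} {res_name:>3s} A{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {'C':>2s}"
        )
    lines.append("END")
    return lines


pdb_lines = xyz_to_pdb_lines([1, 2], [(0.0, 0.0, 0.0), (1.5, 2.25, -3.0)], {1: "CTR"})
print("\n".join(pdb_lines))
```

Writing the records with fixed-width format specifiers keeps every field in its mandated columns, which is what viewers such as PyMOL rely on when parsing the file.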
postProcessing.trace_analyzer module
Created on Fri Jun 17 16:07:05 2022
@author: marcnol
- This script will load a trace file and analyze a number of properties such as:
number of barcodes detected per trace
number of duplicated barcodes
trace Rg
$ trace_analyzer.py
output:
trace_stats.csv
trace_ID, number of barcodes, number of duplications, Rg
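The radius of gyration (Rg) is, in its usual definition, the root-mean-square distance of a trace's localizations from their centroid. A minimal sketch, assuming one (x, y, z) row per localization in consistent units; how `trace_analyzer` treats duplicated barcodes when computing Rg is not specified here, and `radius_of_gyration` is a hypothetical name.

```python
import numpy as np


def radius_of_gyration(xyz):
    """Rg = sqrt(mean squared distance of the localizations to their centroid)."""
    xyz = np.asarray(xyz, dtype=float)
    centroid = xyz.mean(axis=0)
    return float(np.sqrt(((xyz - centroid) ** 2).sum(axis=1).mean()))


# Two spots 2 units apart: each is 1 unit from the centroid, so Rg = 1.
print(radius_of_gyration([[0, 0, 0], [2, 0, 0]]))  # 1.0
```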
- postProcessing.trace_analyzer.analyze_trace(trace, trace_file)
Launcher function that will perform different kinds of trace analyses
- Parameters
trace (ChromatinTraceTable Class) – Trace table, instance of the ChromatinTraceTable Class.
trace_file (string) – file name of trace table in ecsv format.
- Return type
None.
- postProcessing.trace_analyzer.get_barcode_statistics(trace, output_filename='test_barcodes.png')
- Function that calculates the
number of barcodes per trace
number of unique barcodes per trace
number of repeated barcodes per trace
- Parameters
trace (Table) – Trace table in astropy Table format.
output_filename (str, optional) – Output figure in PNG. The default is ‘test_barcodes.png’.
- Return type
None.
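A sketch of the three per-trace counts listed above, computed from (trace ID, barcode) pairs with the standard library. It assumes "repeated barcodes" means distinct barcodes detected more than once in a trace; `get_barcode_statistics` may define it differently (for example as the number of extra detections), and the helper `barcode_statistics` is hypothetical.

```python
from collections import Counter, defaultdict


def barcode_statistics(rows):
    """Per-trace barcode counts from (trace_id, barcode) pairs.

    Returns {trace_id: (n_barcodes, n_unique, n_repeated)}: every detection,
    distinct barcodes, and barcodes detected more than once.
    """
    per_trace = defaultdict(Counter)
    for trace_id, barcode in rows:
        per_trace[trace_id][barcode] += 1
    stats = {}
    for trace_id, counts in per_trace.items():
        n_barcodes = sum(counts.values())
        n_repeated = sum(1 for c in counts.values() if c > 1)
        stats[trace_id] = (n_barcodes, len(counts), n_repeated)
    return stats


rows = [("t1", 1), ("t1", 2), ("t1", 2), ("t2", 5), ("t2", 6)]
print(barcode_statistics(rows))  # {'t1': (3, 2, 1), 't2': (2, 2, 0)}
```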
- postProcessing.trace_analyzer.get_xyz_statistics(trace, output_filename='test_coor.png')
- Function that calculates the
distribution of localizations in x, y and z
- Parameters
trace (Table) – Trace table in astropy Table format.
output_filename (str, optional) – Output figure in PNG. The default is ‘test_coor.png’.
- Return type
None.
- postProcessing.trace_analyzer.main()
- postProcessing.trace_analyzer.parseArguments()
- postProcessing.trace_analyzer.process_traces(trace_files=[])
Processes a list of trace files and sends each to be analyzed individually
- Parameters
trace_files (list, optional) – list of trace files. The default is an empty list ([]).
- Return type
None.
postProcessing.trace_assign_mask module
Created on Sat Feb 19 10:47:29 2022
@author: marcnol
This script will load a trace file and a number of numpy masks and assign them labels
$ trace_selector.py
outputs
ChromatinTraceTable() object and output .ecsv trace table file.
- postProcessing.trace_assign_mask.assign_masks(trace, mask_file, label='labeled', pixel_size=0.1)
- postProcessing.trace_assign_mask.main()
- postProcessing.trace_assign_mask.parse_arguments()
- postProcessing.trace_assign_mask.process_traces(trace_files=[], mask_file='', label='labeled', pixel_size=0.1)
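Assigning mask labels to traces boils down to converting each localization's coordinates into pixel indices and reading the mask at that pixel. The sketch below assumes x and y are in µm, `pixel_size` is in µm per pixel (matching the 0.1 default), and the mask is a 2D array indexed as [y, x]; the real `assign_masks` may work with 3D masks or other conventions, and `labels_from_mask` is a hypothetical name.

```python
import numpy as np


def labels_from_mask(x, y, mask, pixel_size=0.1):
    """Return, for each localization, the mask value under it (0 if outside).

    Assumes x, y share units with pixel_size and mask is indexed [row=y, col=x].
    """
    cols = np.floor(np.asarray(x) / pixel_size).astype(int)
    rows = np.floor(np.asarray(y) / pixel_size).astype(int)
    inside = (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    values = np.zeros(len(cols), dtype=mask.dtype)
    values[inside] = mask[rows[inside], cols[inside]]
    return values


mask = np.zeros((10, 10), dtype=int)
mask[2:5, 2:5] = 1  # one labeled region
x = np.array([0.25, 0.05, 2.0])  # µm; the last point falls outside the image
y = np.array([0.35, 0.05, 0.1])
print(labels_from_mask(x, y, mask))  # [1 0 0]
```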
postProcessing.trace_combinator module
Created on Sat Feb 19 08:43:49 2022
@author: marcnol
This script takes a JSON file with the folders where datasets are stored. It searches for trace files with the expected methods, loads them, and combines them into a single table that is output to the buildPWDmatrix folder.
$ trace_combinator.py
outputs
ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.
- postProcessing.trace_combinator.appends_traces(traces, trace_files, label, action)
- postProcessing.trace_combinator.filter_trace(trace, label, action)
- postProcessing.trace_combinator.load_traces(folders=[], ndims=3, method='mask', label='none', action='all', trace_files=[])
- postProcessing.trace_combinator.main()
- postProcessing.trace_combinator.parse_arguments()
- postProcessing.trace_combinator.run(p)
postProcessing.trace_filter module
Created on Sun Apr 3 09:57:36 2022
@author: marcnol
This script will load a trace file and filter traces based on a series of user arguments.
--> Usage
$ trace_filter.py --input Trace.ecsv --z_min 4 --z_max 5 --y_max 175 --output 'zy_filtered' --N_barcodes 3 --clean_spots
will analyze ‘Trace.ecsv’ and remove spots with z<4, z>5 or y>175, as well as traces with fewer than 3 barcodes.
--clean_spots will remove barcode spots that are repeated within a trace.
--remove_barcode will remove the barcode name provided. This needs to be an integer.
--> outputs
.ecsv trace table file. .png files with stats of number of spots for the same barcode per trace [only if --clean_spots was used]
- postProcessing.trace_filter.main()
- postProcessing.trace_filter.parse_arguments()
- postProcessing.trace_filter.runtime(trace_files=[], N_barcodes=2, coor_limits={}, tag='filtered', remove_duplicate_spots=False, remove_barcode=None, dist_max=inf, label='', keep=True)
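A sketch of the two basic filters described in the trace_filter usage: dropping spots outside coordinate limits, then dropping traces left with fewer than `N_barcodes` distinct barcodes. The spot layout (dicts with `trace_id`, `barcode`, `x`, `y`, `z` keys), the `coor_limits` format of (min, max) pairs per axis, and the name `filter_spots` are assumptions made for illustration; they are not the documented input of `runtime`.

```python
from collections import defaultdict


def filter_spots(spots, coor_limits, n_barcodes=2):
    """Keep spots inside coor_limits, then keep traces with >= n_barcodes barcodes.

    spots: list of dicts with "trace_id", "barcode", "x", "y", "z" keys
    (hypothetical layout). coor_limits: {axis: (min, max)}, inclusive bounds.
    """
    inside = [
        s for s in spots
        if all(lo <= s[axis] <= hi for axis, (lo, hi) in coor_limits.items())
    ]
    # Count distinct barcodes left in each trace after the coordinate cut.
    barcodes_per_trace = defaultdict(set)
    for s in inside:
        barcodes_per_trace[s["trace_id"]].add(s["barcode"])
    return [s for s in inside if len(barcodes_per_trace[s["trace_id"]]) >= n_barcodes]


# Roughly mirrors --z_min 4 --z_max 5 --y_max 175 --N_barcodes 2
limits = {"z": (4, 5), "y": (float("-inf"), 175)}
spots = [
    {"trace_id": "a", "barcode": 1, "x": 0, "y": 10, "z": 4.5},
    {"trace_id": "a", "barcode": 2, "x": 0, "y": 20, "z": 4.2},
    {"trace_id": "a", "barcode": 3, "x": 0, "y": 10, "z": 6.0},   # z > 5
    {"trace_id": "b", "barcode": 1, "x": 0, "y": 200, "z": 4.5},  # y > 175
    {"trace_id": "b", "barcode": 2, "x": 0, "y": 10, "z": 4.6},
]
kept = filter_spots(spots, limits, n_barcodes=2)
print([(s["trace_id"], s["barcode"]) for s in kept])  # [('a', 1), ('a', 2)]
```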
postProcessing.trace_filter_advanced module
Usage:
$ trace_filter.py --input Trace.ecsv --N_barcodes 3 --fraction_missing_barcodes -0.5 --overlapping_threshold 0.03
will analyze ‘Trace.ecsv’ and remove traces with:
fewer than 3 barcodes
fraction of missing barcodes < 0.5
barcodes closer than 0.03 µm will be merged.
outputs:
.ecsv trace table file with the ‘_filtered’ tag appended.
- class postProcessing.trace_filter_advanced.FilterTraces(data_folder, data_file, dest_folder, threshold=0, verbose=False)
Bases: object
- calculate_pwd_threshold(trace_id, verbose=False, save=False)
For all the traces, calculate the pairwise distances between all the detections. From the distribution, calculate the 95% and 99% quantiles.
@param trace_id: (list) list of the IDs of all the traces without duplicated barcodes
@param verbose: (bool) indicates whether the distribution should be plotted
@param save: (bool) indicates whether the plot should be saved instead of being displayed in a popup window
@return: p95 and p99 (float), the values of the 95% and 99% quantiles
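A sketch of the threshold computation described for `calculate_pwd_threshold`: pool the pairwise distances within each trace and take the 95% and 99% quantiles with `numpy.quantile`. Each trace is assumed to be an (n_spots, 3) coordinate array, and the helper name `pwd_quantiles` is hypothetical.

```python
import numpy as np


def pwd_quantiles(traces):
    """Return the (95%, 99%) quantiles of all within-trace pairwise distances."""
    distances = []
    for xyz in traces:
        xyz = np.asarray(xyz, dtype=float)
        i, j = np.triu_indices(len(xyz), k=1)  # each pair of spots counted once
        distances.append(np.linalg.norm(xyz[i] - xyz[j], axis=1))
    pooled = np.concatenate(distances)
    p95, p99 = np.quantile(pooled, [0.95, 0.99])
    return float(p95), float(p99)


rng = np.random.default_rng(42)
traces = [rng.normal(scale=0.3, size=(12, 3)) for _ in range(50)]
p95, p99 = pwd_quantiles(traces)
print(f"p95 = {p95:.3f}, p99 = {p99:.3f}")
```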
- static clustering(trace_data, radius_min, radius_max, verbose=False)
For each single trace, a KDTree is first calculated based on the 3D localizations. Using the lower-bound threshold, a “query-radius” is launched and the neighbors associated with each localization are found. An iterative process is then launched to reconstruct the different clusters aggregated in the initial trace.
@param trace_data: (pandas dataframe) data associated with a single trace defined by a unique trace_ID
@param radius_min: (float) lower-bound threshold for the pwd between two barcodes (seeding of the cluster)
@param radius_max: (float) upper-bound threshold for the pwd between two barcodes (maximum distance allowed)
@param verbose: (bool) indicates whether the plot should be displayed
@return: kept_spot_id (list), which contains lists of spot_ID, each list being a trace reconstructed by the clustering algorithm; out_spot_id (list), which contains all the spot_ID of the isolated detections
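The iterative cluster reconstruction can be sketched as growing connected components from seed spots, linking any two spots closer than `radius_min` (single linkage). For brevity this sketch swaps the KDTree query for a brute-force distance matrix, which is adequate for the few tens of spots in a trace, and it ignores `radius_max`; the helper `cluster_spots` is hypothetical and is not the method's actual code.

```python
import numpy as np


def cluster_spots(xyz, spot_ids, radius_min):
    """Split one trace into clusters of spots linked by distances <= radius_min.

    Returns (clusters, isolated): clusters is a list of sorted spot-ID lists,
    isolated lists the spots with no neighbour within radius_min.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Brute-force distance matrix (stand-in for a KDTree radius query).
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    neighbours = dist <= radius_min
    unvisited = set(range(len(xyz)))
    clusters, isolated = [], []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:  # grow the cluster until no new neighbour is found
            current = frontier.pop()
            for nb in np.flatnonzero(neighbours[current]).tolist():
                if nb in unvisited:
                    unvisited.remove(nb)
                    component.add(nb)
                    frontier.append(nb)
        if len(component) > 1:
            clusters.append(sorted(spot_ids[k] for k in component))
        else:
            isolated.append(spot_ids[seed])
    return clusters, isolated


xyz = [(0, 0, 0), (0.05, 0, 0), (0.1, 0, 0), (2, 2, 2), (2.05, 2, 2), (5, 5, 5)]
clusters, isolated = cluster_spots(xyz, ["a", "b", "c", "d", "e", "f"], radius_min=0.08)
print(sorted(clusters), isolated)  # [['a', 'b', 'c'], ['d', 'e']] ['f']
```

Spots `a` and `c` are farther apart than `radius_min` but still end up in the same cluster through `b`, which is the expected behaviour of single linkage.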
- detect_overlapping_barcodes(trace_data, verbose=False, save=True)
Detect barcodes that are duplicated within the same trace. If the distance between two barcodes is lower than a specific threshold d_min (overlapping_threshold), they are replaced by their average localization.
@param trace_data: (pandas dataframe) input data with all the traces & detections
@param verbose: (bool) indicates whether the plot should be displayed
@return: new_trace_data (pandas dataframe) after removing the duplicated barcodes
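A sketch of the overlap rule described for `detect_overlapping_barcodes`, applied to a single trace: when a barcode is detected exactly twice and the two detections are closer than `d_min`, they are replaced by their mean position. Handling of barcodes detected three or more times is not described, so the hypothetical helper `merge_overlapping` leaves them untouched.

```python
import numpy as np


def merge_overlapping(barcodes, xyz, d_min):
    """Merge a barcode's two detections into their mean if closer than d_min."""
    xyz = np.asarray(xyz, dtype=float)
    out_barcodes, out_xyz = [], []
    for barcode in dict.fromkeys(barcodes):  # unique barcodes, first-seen order
        idx = [k for k, b in enumerate(barcodes) if b == barcode]
        if len(idx) == 2 and np.linalg.norm(xyz[idx[0]] - xyz[idx[1]]) < d_min:
            out_barcodes.append(barcode)
            out_xyz.append(xyz[idx].mean(axis=0))  # average localization
        else:
            for k in idx:  # keep single or distant detections as they are
                out_barcodes.append(barcode)
                out_xyz.append(xyz[k])
    return out_barcodes, np.array(out_xyz)


barcodes = [1, 2, 2, 3, 3]
xyz = [(0, 0, 0), (1.0, 0, 0), (1.02, 0, 0), (3, 0, 0), (3.5, 0, 0)]
merged_barcodes, merged_xyz = merge_overlapping(barcodes, xyz, d_min=0.03)
print(merged_barcodes)  # [1, 2, 3, 3]
print(merged_xyz)
```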
- filter_traces(verbose=False)
All the traces are analyzed based on their ID. Using a clustering algorithm and the threshold calculated from the pwd distribution, each trace is redefined as a list of spot_ID and kept in “updated_spot_id”. That way, traces composed of multiple duplicated barcodes can be separated into multiple sub-traces, each represented as a single list of spot_ID. All the isolated detections (not associated with a trace) are discarded, as are the traces with fewer than 20% of the available barcodes.
@param verbose: (bool) indicates whether the plot should be displayed
@return: filtered_data (pandas dataframe), a new dataframe with the updated traces associated with a new unique ID
- hard_filtering()
Filters the originally loaded traces by removing all the traces with at least one duplicated barcode.
@return: (pandas dataframe) filtered traces
- static him_map_2d_to_1d(map_2d)
Flattens a HiM 2D map (either contact or distance) into a single vector. Since the map is symmetric about the main diagonal, only one half (one triangle) is kept.
@param map_2d: (numpy array) 2D HiM map
@return: (numpy array) distance_flatten as a 1-dimensional vector
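For an n × n symmetric map, keeping one triangle yields n(n - 1)/2 values when the diagonal is excluded. The sketch below uses `numpy.triu_indices` with `k=1`; excluding the diagonal (which is zero for a distance map) is an assumption, and `map_2d_to_1d` is a hypothetical name.

```python
import numpy as np


def map_2d_to_1d(map_2d):
    """Return the entries above the main diagonal of a square symmetric map."""
    map_2d = np.asarray(map_2d)
    rows, cols = np.triu_indices(map_2d.shape[0], k=1)
    return map_2d[rows, cols]


distance_map = np.array([[0.0, 1.0, 2.0],
                         [1.0, 0.0, 3.0],
                         [2.0, 3.0, 0.0]])
print(map_2d_to_1d(distance_map))  # [1. 2. 3.]
```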
- open_him_traces()
Opens a HiM trace file and converts it to a pandas dataframe.
- static reformat_dataframe(dataframe, in_spot_id, out_spot_id)
Based on the list of spot_ID selected for each trace, reformats the dataframe by assigning a unique ID to every new trace. All the detections not associated with a trace are removed from the dataframe.
@param dataframe: (pandas dataframe) input data with all the traces & detections
@param in_spot_id: (list) contains lists of spot_ID, each defining a single unique trace
@param out_spot_id: (list) contains the spot_ID of all the discarded detections
@return: new_dataframe (pandas dataframe) with the new traces and their unique ID
- static remove_duplicates(trace_data)
For each individual trace, the duplicated barcodes are removed. If the remaining trace contains enough barcodes (above the minimal fraction p), the trace is saved; otherwise it is discarded.
@param trace_data: (pandas dataframe) input data with all the traces & detections
@param p: (float) minimum fraction of barcodes required to keep a trace
@return: (pandas dataframe) output a dataframe with the updated traces
- save_individual_labels(trace, tag=None)
Helper function used to sort individual traces based on their label value and save them in individual ecsv files.
@param trace: (pd dataframe) input trace data
@param tag: (str) tag to add to all individual files
- save_to_astropy(trace_data, tag=None)
Saves a pandas dataframe into an astropy table.
@param trace_data: (pd dataframe) contains all the traces
@param tag: (str) indicates the tag to add to the filename
- static select_traces_wo_duplicates(data, N_barcodes=2)
Analyze the trace dataframe and select only the traces with no duplicates and containing at least N_barcodes barcodes (2 by default).
@param data: (dataframe) input trace on which the analysis is performed
@return: (list) list of all the trace_ID selected
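A sketch of the selection performed by `select_traces_wo_duplicates`, using (trace ID, barcode) pairs instead of a dataframe; the input format and the helper name `select_traces_without_duplicates` are hypothetical.

```python
from collections import defaultdict


def select_traces_without_duplicates(rows, n_barcodes=2):
    """Return IDs of traces whose barcodes are all distinct and at least n_barcodes."""
    barcodes = defaultdict(list)
    for trace_id, barcode in rows:
        barcodes[trace_id].append(barcode)
    return [
        trace_id
        for trace_id, found in barcodes.items()
        if len(found) == len(set(found)) and len(found) >= n_barcodes
    ]


rows = [("t1", 1), ("t1", 2), ("t1", 3), ("t2", 1), ("t2", 1), ("t2", 2), ("t3", 4)]
print(select_traces_without_duplicates(rows))  # ['t1']
```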
- trace_statistics(save=True, tag='')
Plots the statistics for the selected traces. Two plots are displayed: (1) for each barcode, the number of detected spots as well as the proportion of duplicated barcodes; (2) the detection efficiency, that is, the number of traces with a given proportion of detected barcodes. Again, the proportion of traces with duplicated barcodes is indicated.
@param save: (bool) indicates whether the figure should be saved instead of displayed
@param tag: (str) string to add to the image name
- postProcessing.trace_filter_advanced.parse_arguments()
- postProcessing.trace_filter_advanced.plot_repeated_barcodes(trace_data)
Plots a 3D graph with all the localizations. For the repeated barcodes, the localizations are plotted with a specific legend.
@param trace_data: (pandas dataframe) input data with all the traces & detections
postProcessing.trace_merge module
Created on Thu June 15 2023
@author: marcnol
Simpler version of trace_combinator.
This just takes a list of trace files and merges them together
$ ls Trace*.ecsv | trace_merge.py
outputs
ChromatinTraceTable() object and output .ecsv formatted file with assembled trace tables.
- postProcessing.trace_merge.appends_traces(traces, trace_files)
- postProcessing.trace_merge.load_traces(trace_files=[])
- postProcessing.trace_merge.main()
- postProcessing.trace_merge.parse_arguments()
- postProcessing.trace_merge.run(p)
postProcessing.trace_plot module
Created on Wed Jun 7 13:39:06 2023
@author: marcnol
trace_plot
script to plot one or multiple traces in 3D
- Takes a trace file and either:
ranks traces and plots a selection
plots a user-selected trace in .ecsv (barcode, xyz) and PDF formats. The output files contain the trace name.
saves output coordinates for selected traces in pdb format so they can be loaded by other means including https://www.rcsb.org/3d-view, pymol, or nglviewer.
- future:
output PDBs for all the traces in a trace file
$ ls Trace_3D_barcode_KDtree_ROI:1.ecsv | trace_plot.py --pipe --selected_trace 5b1e6f89-0362-4312-a7ed-fc55ae98a0a5
>> this pipes the file ‘Trace_3D_barcode_KDtree_ROI:1.ecsv’ into trace_plot and then selects a trace for conversion.
$ trace_plot.py --input Trace_3D_barcode_KDtree_ROI:1.ecsv --all
>> this plots all traces in the trace file.
Keys provide the barcode names in the trace file; these should be mapped to 3-character codes.
set grid_mode,1
color green, (name C*)
color red, (name P*)
- postProcessing.trace_plot.main()
- postProcessing.trace_plot.parse_arguments()
- postProcessing.trace_plot.runtime(folder, N_barcodes=2, trace_files=[], selected_trace='fa9f0eb5-abcc-4730-bcc7-ba1da682d776', barcode_type={}, folder_path='./PDBs', select_traces='one')
postProcessing.trace_to_matrix module
Created on Thu Jun 15 08:42:12 2023
@author: marcnol
Uses the core routines of pyHiM to convert a trace file into a matrix in a standalone script.
- postProcessing.trace_to_matrix.main()
- postProcessing.trace_to_matrix.parse_arguments()
- postProcessing.trace_to_matrix.runtime(trace_files=[], colormaps={}, distance_threshold=inf)
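Converting a trace file into a matrix can be sketched as filling, for each trace, the pairwise distances between its barcodes into a barcodes × barcodes × traces array, following the axis convention given for `sc_matrix_collated` in processHiMmatrix. The sketch assumes each trace is a dict mapping barcode to (x, y, z), that missing barcodes stay NaN, and that `distance_threshold` discards larger distances; none of this is confirmed as the behaviour of `runtime`, and `traces_to_pwd_matrix` is a hypothetical name.

```python
import numpy as np


def traces_to_pwd_matrix(traces, barcodes, distance_threshold=np.inf):
    """Build a (n_barcodes, n_barcodes, n_traces) single-trace PWD matrix.

    traces: list of dicts {barcode: (x, y, z)}, one per trace (hypothetical
    layout). Missing barcodes, and distances above distance_threshold, are NaN.
    """
    index = {b: k for k, b in enumerate(barcodes)}
    n = len(barcodes)
    matrix = np.full((n, n, len(traces)), np.nan)
    for t, trace in enumerate(traces):
        present = [b for b in trace if b in index]
        for a in present:
            for b in present:
                d = float(np.linalg.norm(np.subtract(trace[a], trace[b])))
                if d <= distance_threshold:
                    matrix[index[a], index[b], t] = d
    return matrix


traces = [{1: (0, 0, 0), 2: (3, 4, 0)}, {2: (0, 0, 0), 3: (0, 0, 1)}]
pwd = traces_to_pwd_matrix(traces, barcodes=[1, 2, 3])
print(pwd[:, :, 0])  # barcode 3 is missing from the first trace, so its row/column is NaN
```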