[Castor-users] Multi-thread Run Problem

tmerlin Thibaut.Merlin at univ-brest.fr
Fri Jun 19 20:55:24 CEST 2020


Hi Xinjie,

Thanks for your data. I could reproduce your error, and it seems to be 
linked to the Joseph projector. In particular, a hard-coded tolerance 
factor is used to compute which planes are crossed by the lines. 
Decreasing that factor seems to remove the crash in my tests.

So a quick fix may be to reduce that threshold: you can replace the 
src/projector/iProjectorJoseph.cc file with the attached one, then 
recompile CASToR.
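
The file swap could be scripted along these lines. This is a minimal 
sketch, not part of CASToR: the function name swap_source and its two 
arguments (the CASToR source directory and the downloaded attachment) 
are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch only: swap in a patched source file, keeping a backup of the original.
# Usage: swap_source <castor_dir> <patched_file>   (both are placeholders)
swap_source() {
    castor_dir=$1
    patched_file=$2
    target="$castor_dir/src/projector/iProjectorJoseph.cc"

    # Refuse to continue if either file is missing.
    [ -f "$target" ] || { echo "missing: $target" >&2; return 1; }
    [ -f "$patched_file" ] || { echo "missing: $patched_file" >&2; return 1; }

    # Keep the original next to it so the change can be reverted.
    cp "$target" "$target.orig" || return 1
    cp "$patched_file" "$target"
}
```

After the swap, rebuild from the source directory. Assuming the 
makefile build and that it provides a clean target, "make clean" 
followed by "make" avoids mixing in object files from the previous 
compilation.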

I am not sure yet whether the crash is limited to that factor. Let me 
know if it fixes the error for you.
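
To check the fix and keep the full output in case it still crashes, a 
small wrapper like the one below may help. It is only a sketch: 
run_logged is a hypothetical helper name, and the log file name is 
arbitrary.

```shell
#!/bin/sh
# Sketch only: run a command, save stdout and stderr to a log file,
# and report the command's exit status.
# Usage: run_logged <log_file> <command> [args...]
run_logged() {
    log_file=$1
    shift
    "$@" > "$log_file" 2>&1
    status=$?
    echo "exit status: $status (log in $log_file)"
    return "$status"
}
```

For example, with the options of your second try plus -vb 3:
run_logged recon_200.log castor-recon -df Brain_Double_df.Cdh -opti MLEM 
-it 2:10 -proj joseph -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 
-vox 1.25,1.25,10. -oit -1 -dout Brain_Double_2 -vb 3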

Best,
Thibaut

On 14/06/2020 02:01, Xinjie Cao wrote:
> Hi tmerlin,
>
> Here is my dataset, geometry and entire error log. Hope you could find 
> something!
> Brain_Double_df.Cdf 
> <https://drive.google.com/a/stonybrook.edu/file/d/1wC4X99y4ATwDWGdrh8OoHfPmeA8l_m2G/view?usp=drive_web>
> Brain_Double_df.Cdh 
> <https://drive.google.com/a/stonybrook.edu/file/d/1CFoCBMTOVrLgbP-chCu4bXVf379jWEQR/view?usp=drive_web>
>
> Best,
>
> On Tue, Jun 9, 2020 at 10:57 AM tmerlin <Thibaut.Merlin at univ-brest.fr 
> <mailto:Thibaut.Merlin at univ-brest.fr>> wrote:
>
>     Hi Xinjie,
>
>     Does the crash occur directly after you launch the
>     reconstruction? Could you run it with -vb 3 to get a bit more
>     feedback?
>
>     There is no reason why you couldn't change reconstruction
>     parameters between two consecutive reconstructions.
>
>     Maybe you could send us or upload your dataset and geometry file
>     so we can have a closer look at the problem?
>
>     Best regards,
>
>
>     On 08/06/2020 17:20, Xinjie Cao wrote:
>>     Hi Thibaut and Simon,
>>
>>     I have compiled two different CASToR versions, one for
>>     multi-thread running and one for single core, in separate
>>     directories. The multi-thread version is running well now, but
>>     the single-core test had a problem when I moved from a lower
>>     resolution configuration to a higher one, as shown below:
>>
>>     *First recon try*:
>>     ```
>>     castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 10:16 -proj
>>     joseph -conv gaussian,4.,4.5,3.5::psf -dim 100,100,16 -vox
>>     2.5,2.5,10. -oit -1  -dout Brain_Double
>>     .........
>>     iIterativeAlgorithm::StepAfterSubsetLoop() -> Save image at
>>     iteration 10
>>     vAlgorithm::IterateCPU() -> Total time spent | User: 3699 sec |
>>     CPU: 3.6990100e+03 sec
>>     sChronoManager::Display() -> Results from the profiling
>>     .........
>>       --> Custom update step 1: 00 hours 00 mins 00 secs 000 ms
>>     ```
>>     *Second recon try*:
>>     ```
>>     castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 2:10 -proj
>>     joseph -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 -vox
>>     1.25,1.25,10. -oit -1  -dout Brain_Double_2
>>
>>     *** Break *** segmentation violation
>>     ===========================================================
>>     There was a crash.
>>     This is the entire stack trace of all threads:
>>     ===========================================================
>>     #0  0x00007fc5feb0641c in waitpid () from /lib64/libc.so.6
>>     #1  0x00007fc5fea83f12 in do_system () from /lib64/libc.so.6
>>     #2  0x00007fc60384e0c4 in TUnixSystem::StackTrace() () from
>>     /home/goldan/GATE/root/lib/libCore.so.6.18
>>     #3  0x00007fc6038507fc in TUnixSystem::DispatchSignals(ESignals)
>>     () from /home/goldan/GATE/root/lib/libCore.so.6.18
>>     #4  <signal handler called>
>>     #5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
>>     #6  0x000000000044e9ba in
>>     oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
>>     #7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
>>     #8  0x000000000042b7bc in main ()
>>     ===========================================================
>>
>>
>>     The lines below might hint at the cause of the crash.
>>     You may get help by asking at the ROOT forum
>>     http://root.cern.ch/forum
>>     Only if you are really convinced it is a bug in ROOT then please
>>     submit a
>>     report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
>>     from above as an attachment in addition to anything else
>>     that might help us fixing this issue.
>>     ===========================================================
>>     #5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
>>     #6  0x000000000044e9ba in
>>     oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
>>     #7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
>>     #8  0x000000000042b7bc in main ()
>>     ===========================================================
>>     ```
>>     Does that mean I cannot modify the iteration setting or any
>>     setting between two consecutive reconstructions? Or was there
>>     something wrong other than configurations? Thank you!
>>
>>     Best,
>>
>>     On Fri, May 29, 2020 at 3:07 PM tmerlin
>>     <Thibaut.Merlin at univ-brest.fr
>>     <mailto:Thibaut.Merlin at univ-brest.fr>> wrote:
>>
>>         Hi Xinjie,
>>
>>         It is hard to say what is going wrong without having the
>>         data, because memory errors can come from a wide range of
>>         causes, but we have not experienced this kind of issue when
>>         reconstructing large images.
>>
>>         As Simon suggested, if you compiled the code multiple times
>>         with the CASToR makefile, such errors could occur when the
>>         files generated by the previous compilation are not
>>         cleaned.
>>
>>         Best,
>>         Thibaut
>>
>>
>>         On 28/05/2020 16:35, Xinjie Cao wrote:
>>>         Hi Thibaut,
>>>
>>>         Thanks for your response! I am trying to reconstruct
>>>         list-mode data in ROOT format. I got this problem when I
>>>         changed my reconstruction parameters to a higher
>>>         resolution, e.g. from a 6 mm voxel size to 1.5 mm.
>>>
>>>         Best,
>>>
>>>         On Thu, May 28, 2020 at 7:04 AM Thibaut Merlin
>>>         <Thibaut.Merlin at univ-brest.fr
>>>         <mailto:Thibaut.Merlin at univ-brest.fr>> wrote:
>>>
>>>             Hi Xinjie,
>>>
>>>             On which kind of dataset did you get this problem? Did
>>>             it occur with every dataset you tried to reconstruct,
>>>             or just some of them?
>>>
>>>             Best,
>>>             Thibaut
>>>
>>>             Xinjie Cao <xinjie.cao at stonybrook.edu
>>>             <mailto:xinjie.cao at stonybrook.edu>> wrote:
>>>
>>>>             Dear all,
>>>>             I am testing CASToR performance with multi-threaded
>>>>             runs, but the multi-thread function does not seem
>>>>             very stable.
>>>>             Before enabling multi-threading for reconstruction,
>>>>             every job ran fine. Since I started using it, jobs
>>>>             always abort with an error I do not recognize:
>>>>             ```
>>>>             *** Error in `castor-recon': munmap_chunk(): invalid
>>>>             pointer: 0x0000000002259380 ***
>>>>             ```
>>>>             Did anyone ever see this problem before?
>>>>             Any answer will be highly appreciated! Thank you!
>>>>             Best,
>>>
>>>
>>>
>>>
>>>         -- 
>>>         *....................................................*
>>>         *Xinjie Cao*
>>>         *M.E. / Ph.D. student*
>>>         *Research Project Assistant*
>>>         *Department of Electrical and Computer Engineering & Radiology *
>>>         *Novel Medical Imaging Technologies Lab*
>>>         *Health Science Center Level 8*
>>>         *Stony Brook, NY 11794-8460 *
>>>         *Tel: +1 (631)202-9445*
>>>         you.stonybrook.edu/goldan/people/
>>>         <https://you.stonybrook.edu/goldan/people/>
>>>         *email: xinjie.cao at stonybrook.edu*
>>>         <mailto:xinjie.cao at stonybrook.edu>
>>>
>>>         *....................................................*
>>>         It is prohibited to distribute or publish the files attached
>>>         to any other people unless you get permission from the
>>>         writer himself. All rights reserved.
>>
>>         -- 
>>         Thibaut MERLIN -- PhD
>>
>>         Docteur en Imagerie Médicale au Laboratoire de Traitement de l'Information Medicale (LaTIM - INSERM UMR 1101)
>>         Institut Brestois de recherche en Bio-Santé (IBRBS)
>>         12 Avenue Foch, 29200 Brest, FRANCE
>>         Tel: 06.75.12.24.90
>>
>>
>>
>
>
>
>

-- 
Thibaut MERLIN -- PhD

Docteur en Imagerie Médicale au Laboratoire de Traitement de l'Information Medicale (LaTIM - INSERM UMR 1101)
Institut Brestois de recherche en Bio-Santé (IBRBS)
12 Avenue Foch, 29200 Brest, FRANCE
Tel: 06.75.12.24.90

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iProjectorJoseph.cc
Type: text/x-c++src
Size: 64283 bytes
Desc: not available
URL: <http://lists.castor-project.org/pipermail/castor-users/attachments/20200619/938e9887/attachment-0001.cc>