[Castor-users] Multi-thread Run Problem

tmerlin Thibaut.Merlin at univ-brest.fr
Tue Jun 9 16:56:34 CEST 2020


Hi Xinjie,

Does the crash occur directly after you launch the reconstruction? 
Could you run it with -vb 3 to get a bit more feedback?
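
For instance, a minimal sketch that reuses the options from your second attempt and only adds -vb 3 (the variable name and the log file name are placeholders, not anything CASToR requires):

```shell
# Options copied from the failing second reconstruction; only "-vb 3" is new.
CASTOR_ARGS="-df Brain_Double_df.Cdh -opti MLEM -it 2:10 -proj joseph \
  -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 -vox 1.25,1.25,10. \
  -oit -1 -dout Brain_Double_2 -vb 3"

if command -v castor-recon >/dev/null 2>&1; then
  # Intentionally unquoted so the shell splits the string into arguments.
  # The verbose output is also copied to a log file (placeholder name).
  castor-recon $CASTOR_ARGS 2>&1 | tee recon_vb3.log
else
  echo "castor-recon not found in PATH" >&2
fi
```

The log file could then be attached to your reply to the list.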

There is no reason why you couldn't change reconstruction parameters 
between two consecutive reconstructions.
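
As a quick sanity check on the two sets of options you posted, the field of view (number of voxels times voxel size) is the same in both runs, and only the voxel count changes. A small sketch of that arithmetic with awk (the helper and variable names are made up for illustration):

```shell
# FOV along one axis = number of voxels * voxel size (mm).
fov() { awk -v n="$1" -v s="$2" 'BEGIN { printf "%g\n", n * s }'; }

FOV_XY_FIRST=$(fov 100 2.5)    # first try:  -dim 100,100,16 -vox 2.5,2.5,10.
FOV_XY_SECOND=$(fov 200 1.25)  # second try: -dim 200,200,16 -vox 1.25,1.25,10.
FOV_Z=$(fov 16 10)             # axial direction, identical in both tries

# Ratio of voxel counts between the second and the first image.
VOXEL_RATIO=$(( (200 * 200 * 16) / (100 * 100 * 16) ))

echo "transaxial FOV: $FOV_XY_FIRST mm vs $FOV_XY_SECOND mm"
echo "axial FOV: $FOV_Z mm, voxel count ratio: $VOXEL_RATIO"
```

So the second image covers the same volume with four times as many voxels.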

Maybe you could send us or upload your dataset and geometry file so we can 
have a closer look at the problem?

Best regards,


On 08/06/2020 17:20, Xinjie Cao wrote:
> Hi Thibaut and Simon,
>
> I have compiled two separate CASToR builds, one multi-threaded and one 
> single-core, in separate directories. The multi-threaded build is 
> running well now, but the single-core test had a problem when I moved 
> from a lower-resolution configuration to a higher one, as shown 
> below.
>
> *First recon try*:
> ```
> castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 10:16 -proj joseph -conv gaussian,4.,4.5,3.5::psf -dim 100,100,16 -vox 2.5,2.5,10. -oit -1 -dout Brain_Double
> .........
> iIterativeAlgorithm::StepAfterSubsetLoop() -> Save image at iteration 10
> vAlgorithm::IterateCPU() -> Total time spent | User: 3699 sec | CPU: 3.6990100e+03 sec
> sChronoManager::Display() -> Results from the profiling
> .........
>   --> Custom update step 1: 00 hours 00 mins 00 secs 000 ms
> ```
> *Second recon try*:
> ```
> castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 2:10 -proj joseph -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 -vox 1.25,1.25,10. -oit -1 -dout Brain_Double_2
>
> *** Break *** segmentation violation
> ===========================================================
> There was a crash.
> This is the entire stack trace of all threads:
> ===========================================================
> #0  0x00007fc5feb0641c in waitpid () from /lib64/libc.so.6
> #1  0x00007fc5fea83f12 in do_system () from /lib64/libc.so.6
> #2  0x00007fc60384e0c4 in TUnixSystem::StackTrace() () from /home/goldan/GATE/root/lib/libCore.so.6.18
> #3  0x00007fc6038507fc in TUnixSystem::DispatchSignals(ESignals) () from /home/goldan/GATE/root/lib/libCore.so.6.18
> #4  <signal handler called>
> #5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
> #6  0x000000000044e9ba in oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
> #7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
> #8  0x000000000042b7bc in main ()
> ===========================================================
>
>
> The lines below might hint at the cause of the crash.
> You may get help by asking at the ROOT forum http://root.cern.ch/forum
> Only if you are really convinced it is a bug in ROOT then please submit a
> report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
> from above as an attachment in addition to anything else
> that might help us fixing this issue.
> ===========================================================
> #5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
> #6  0x000000000044e9ba in oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
> #7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
> #8  0x000000000042b7bc in main ()
> ===========================================================
> ```
> Does that mean I cannot modify the iteration setting or any setting 
> between two consecutive reconstructions? Or was there something wrong 
> other than configurations? Thank you!
>
> Best,
>
> On Fri, May 29, 2020 at 3:07 PM tmerlin <Thibaut.Merlin at univ-brest.fr 
> <mailto:Thibaut.Merlin at univ-brest.fr>> wrote:
>
>     Hi Xinjie,
>
>     It is hard to say what is going wrong without having the data,
>     because memory errors can come from a wide range of causes, but we
>     have not experienced this kind of issue when reconstructing large
>     images.
>
>     As Simon suggested, if you compiled the code multiple times with
>     the CASToR makefile, such errors could occur if the files
>     generated by the previous compilation were not cleaned.
>
>     Best,
>     Thibaut
>
>
>     On 28/05/2020 16:35, Xinjie Cao wrote:
>>     Hi Thibaut,
>>
>>     Thanks for your response! I am trying to reconstruct list-mode
>>     data in ROOT. I got this problem when I changed my reconstruction
>>     parameters to a higher resolution, for example from a 6 mm voxel
>>     size to a 1.5 mm voxel size.
>>
>>     Best,
>>
>>     On Thu, May 28, 2020 at 7:04 AM Thibaut Merlin
>>     <Thibaut.Merlin at univ-brest.fr
>>     <mailto:Thibaut.Merlin at univ-brest.fr>> wrote:
>>
>>         Hi Xinjie,
>>
>>         On which kind of dataset did you get this problem? Did it
>>         occur on every dataset you tried to reconstruct or just some
>>         of them?
>>
>>         Best,
>>         Thibaut
>>
>>         Xinjie Cao <xinjie.cao at stonybrook.edu
>>         <mailto:xinjie.cao at stonybrook.edu>> wrote:
>>
>>>         Dear all,
>>>         I am testing CASToR performance with multi-threading, but
>>>         the multi-thread function does not seem very stable.
>>>         Before I used multi-threading for reconstruction, every job
>>>         ran fine. Since I enabled it, reconstruction jobs always
>>>         crash with an error I do not recognize, shown below:
>>>         ```
>>>         *** Error in `castor-recon': munmap_chunk(): invalid
>>>         pointer: 0x0000000002259380 ***
>>>         ```
>>>         Did anyone ever see this problem before?
>>>         Any answer will be highly appreciated! Thank you!
>>>         Best,
>>
>>
>>
>>
>>
>>     -- 
>>     *....................................................*
>>     *Xinjie Cao*
>>     *M.E. / Ph.D. student*
>>     *Research Project Assistant*
>>     *Department of Electrical and Computer Engineering & Radiology *
>>     *Novel Medical Imaging Technologies Lab*
>>     *Health Science Center Level 8*
>>     *Stony Brook, NY 11794-8460 *
>>     *Tel: +1 (631)202-9445*
>>     you.stonybrook.edu/goldan/people/
>>     <https://you.stonybrook.edu/goldan/people/>
>>     *email: xinjie.cao at stonybrook.edu*
>>     <mailto:xinjie.cao at stonybrook.edu>
>>
>>     *....................................................*
>>     It is prohibited to distribute or publish the files attached to
>>     any other people unless you get permission from the writer
>>     himself. All rights reserved.
>
>     -- 
>     Thibaut MERLIN -- PhD
>
>     PhD in Medical Imaging, Laboratoire de Traitement de l'Information Médicale (LaTIM - INSERM UMR 1101)
>     Institut Brestois de recherche en Bio-Santé (IBRBS)
>     12 Avenue Foch, 29200 Brest, FRANCE
>     Tel: 06.75.12.24.90
>
>
>

-- 
Thibaut MERLIN -- PhD

PhD in Medical Imaging, Laboratoire de Traitement de l'Information Médicale (LaTIM - INSERM UMR 1101)
Institut Brestois de recherche en Bio-Santé (IBRBS)
12 Avenue Foch, 29200 Brest, FRANCE
Tel: 06.75.12.24.90
