[Castor-users] Multi-thread Run Problem

Xinjie Cao xinjie.cao at stonybrook.edu
Mon Jun 8 17:20:00 CEST 2020


Hi Thibaut and Simon,

I have compiled two different CASToR builds in separate directories, one for
multi-threaded runs and one for single-core runs. The multi-threaded build now
runs well, but the single-core test crashed when I moved from a
lower-resolution configuration to a higher one, as shown below:

*First recon try*:
```
castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 10:16 -proj joseph \
  -conv gaussian,4.,4.5,3.5::psf -dim 100,100,16 -vox 2.5,2.5,10. -oit -1 \
  -dout Brain_Double
.........
iIterativeAlgorithm::StepAfterSubsetLoop() -> Save image at iteration 10
vAlgorithm::IterateCPU() -> Total time spent | User: 3699 sec | CPU: 3.6990100e+03 sec
sChronoManager::Display() -> Results from the profiling
.........
  --> Custom update step 1: 00 hours 00 mins 00 secs 000 ms
```
*Second recon try*:
```
castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 2:10 -proj joseph \
  -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 -vox 1.25,1.25,10. -oit -1 \
  -dout Brain_Double_2

*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007fc5feb0641c in waitpid () from /lib64/libc.so.6
#1  0x00007fc5fea83f12 in do_system () from /lib64/libc.so.6
#2  0x00007fc60384e0c4 in TUnixSystem::StackTrace() () from /home/goldan/GATE/root/lib/libCore.so.6.18
#3  0x00007fc6038507fc in TUnixSystem::DispatchSignals(ESignals) () from /home/goldan/GATE/root/lib/libCore.so.6.18
#4  <signal handler called>
#5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
#6  0x000000000044e9ba in oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
#7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
#8  0x000000000042b7bc in main ()
===========================================================


The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum http://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#5  0x00007fc5feac6eec in free () from /lib64/libc.so.6
#6  0x000000000044e9ba in oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
#7  0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
#8  0x000000000042b7bc in main ()
===========================================================
```
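For scale, the two `-dim`/`-vox` settings above can be compared with a short Python sketch. It assumes, purely for illustration, one 32-bit float per voxel per image; CASToR allocates other buffers too (for example during the sensitivity computation where the crash occurs), so this is not a measure of the program's total memory use.

```python
# Rough size comparison of the two castor-recon image grids from the
# commands above. Assumption (illustrative only): each image is stored as
# one 32-bit float per voxel. Other CASToR buffers are not counted here.

def grid_stats(dim, vox_mm):
    """Return (voxel count, field of view in mm per axis, MB per float32 image)."""
    nx, ny, nz = dim
    n_voxels = nx * ny * nz
    fov_mm = tuple(n * v for n, v in zip(dim, vox_mm))
    mb_per_image = n_voxels * 4 / 1e6  # 4 bytes per float32 voxel
    return n_voxels, fov_mm, mb_per_image

first = grid_stats((100, 100, 16), (2.5, 2.5, 10.0))     # -dim 100,100,16 -vox 2.5,2.5,10.
second = grid_stats((200, 200, 16), (1.25, 1.25, 10.0))  # -dim 200,200,16 -vox 1.25,1.25,10.

print("first :", first)
print("second:", second)
print("voxel ratio:", second[0] / first[0])
```

Under these assumptions both grids cover the same 250 × 250 × 160 mm field of view, and the second has four times as many voxels (640,000 vs 160,000), roughly 2.56 MB vs 0.64 MB per image.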
Does that mean I cannot change the iteration setting, or any other setting,
between two consecutive reconstructions? Or is something other than the
configuration going wrong? Thank you!

Best,

On Fri, May 29, 2020 at 3:07 PM tmerlin <Thibaut.Merlin at univ-brest.fr> wrote:

> Hi Xinjie,
>
> It is hard to say what is going wrong without the data, because memory
> errors can have many causes, but we have not experienced this kind of
> issue when reconstructing large images.
>
> As Simon suggested, if you compiled the code several times with the
> CASToR makefile, such errors can occur when the files generated by the
> previous compilation have not been cleaned.
>
> Best,
> Thibaut
>
>
> On 28/05/2020 16:35, Xinjie Cao wrote:
>
> Hi Thibaut,
>
> Thanks for your response! I am trying to reconstruct list-mode data stored
> in ROOT format. I got this problem when I changed my reconstruction
> parameters to a higher resolution, e.g. from a 6 mm to a 1.5 mm voxel size.
>
> Best,
>
> On Thu, May 28, 2020 at 7:04 AM Thibaut Merlin <Thibaut.Merlin at univ-brest.fr> wrote:
>
>> Hi Xinjie,
>>
>> On which kind of dataset did you get this problem? Did it occur on every
>> dataset you tried to reconstruct, or just on some of them?
>>
>> Best,
>> Thibaut
>>
>> Xinjie Cao <xinjie.cao at stonybrook.edu> wrote:
>>
>> Dear all,
>>
>> I am testing CASToR performance with multi-threading, but the
>> multi-threaded mode does not seem very stable. Before I enabled
>> multi-threading, every reconstruction job ran fine, but since I started
>> using it, jobs always crash with an unrecognized error like this one:
>> ```
>> *** Error in `castor-recon': munmap_chunk(): invalid pointer:
>> 0x0000000002259380 ***
>> ```
>> Did anyone ever see this problem before?
>> Any answer will be highly appreciated! Thank you!
>>
>> Best,
>>
>>
>>
>>
>> _______________________________________________
>> Castor-users mailing list
>> Castor-users at lists.castor-project.org
>> http://lists.castor-project.org/listinfo/castor-users
>>
>
>
> --
> *....................................................*
> *Xinjie Cao*
> *M.E. / Ph.D. student*
> *Research Project Assistant*
> *Department of Electrical and Computer Engineering & Radiology*
> *Novel Medical Imaging Technologies Lab*
> *Health Science Center Level 8*
> *Stony Brook, NY 11794-8460*
> *Tel: +1 (631)202-9445*
> you.stonybrook.edu/goldan/people/
> *email: xinjie.cao at stonybrook.edu*
>
>
> *....................................................*
> It is prohibited to distribute or publish the files attached to any other
> people unless you get permission from the writer himself. All rights
> reserved.
>
> --
> Thibaut MERLIN -- PhD
>
> Doctor of Medical Imaging at the Laboratoire de Traitement de l'Information Medicale (LaTIM - INSERM UMR 1101)
> Institut Brestois de recherche en Bio-Santé (IBRBS)
> 12 Avenue Foch, 29200 Brest, FRANCE
> Tel: 06.75.12.24.90
>
>
