[Castor-users] Multi-thread Run Problem
Xinjie Cao
xinjie.cao at stonybrook.edu
Wed Jun 24 23:39:59 CEST 2020
Hi Tmerlin,
It looks like there is no error now. I will let you know if the error
appears again. Thanks!
Best,
On Fri, Jun 19, 2020 at 2:56 PM tmerlin <Thibaut.Merlin at univ-brest.fr>
wrote:
> Hi Xinjie,
>
> Thanks for your data. I could reproduce your error and it seems it is
> linked to the Joseph projector. In particular, there is a hard-coded
> tolerance factor used to compute the planes which are crossed by the lines.
> Decreasing that factor seems to remove the crash in my tests.
>
> So a quick fix may be to reduce that threshold (you can replace the
> *src/projector/iProjectorJoseph.cc* file with the attached one, then
> recompile CASToR).
>
> I am not sure yet whether the crash is limited to that factor. Let me know
> if it fixes the error for you.
>
> Best,
> Thibaut
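
For readers following along, here is a minimal Python sketch of how a tolerance of this kind can push a voxel index out of range. It is not the code from iProjectorJoseph.cc, which is not shown in this thread. The function names, the two tolerance values, and the assumption that the image is centred on 0 are all hypothetical; only the 200-voxel, 1.25 mm geometry is taken from the second command quoted further down.

```python
import math

def last_voxel_index(x, origin, voxel_size, tolerance):
    """Index of the voxel containing coordinate x along one axis, after
    nudging the position outward by `tolerance` (expressed in voxel units)."""
    return math.floor((x - origin) / voxel_size + tolerance)

def is_in_range(index, n_voxels):
    """True if `index` is a valid voxel index for an axis of n_voxels."""
    return 0 <= index < n_voxels

def clamp_index(index, n_voxels):
    """Defensive alternative: force the index back into [0, n_voxels - 1]."""
    return min(max(index, 0), n_voxels - 1)

# Geometry from the second command in the thread: 200 voxels of 1.25 mm.
N_VOXELS, VOXEL_SIZE = 200, 1.25
ORIGIN = -N_VOXELS * VOXEL_SIZE / 2.0  # assumption: image centred on 0

# A line end point lying just inside the far edge of the field of view.
x_end = ORIGIN + N_VOXELS * VOXEL_SIZE - 1e-5

loose = last_voxel_index(x_end, ORIGIN, VOXEL_SIZE, tolerance=1e-4)  # hypothetical value
tight = last_voxel_index(x_end, ORIGIN, VOXEL_SIZE, tolerance=1e-7)  # hypothetical value

print(loose, is_in_range(loose, N_VOXELS))  # 200 False
print(tight, is_in_range(tight, N_VOXELS))  # 199 True
print(clamp_index(loose, N_VOXELS))         # 199
```

In Python an out-of-range index raises an exception, but in C++ a write through such an index is undefined behaviour and may corrupt the heap. That is one possible, unconfirmed way to end up with a crash reported inside `free()`, and it would be consistent with a smaller tolerance making the crash disappear.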
> On 14/06/2020 02:01, Xinjie Cao wrote:
>
> Hi tmerlin,
>
> Here are my dataset, geometry file and the entire error log. I hope you can
> find something!
> Brain_Double_df.Cdf
> <https://drive.google.com/a/stonybrook.edu/file/d/1wC4X99y4ATwDWGdrh8OoHfPmeA8l_m2G/view?usp=drive_web>
> Brain_Double_df.Cdh
> <https://drive.google.com/a/stonybrook.edu/file/d/1CFoCBMTOVrLgbP-chCu4bXVf379jWEQR/view?usp=drive_web>
>
> Best,
>
> On Tue, Jun 9, 2020 at 10:57 AM tmerlin <Thibaut.Merlin at univ-brest.fr>
> wrote:
>
>> Hi Xinjie,
>>
>> Does the crash occur directly after you launch the reconstruction?
>> Could you run it with -vb 3 to get a bit more feedback?
>>
>> There is no reason why you couldn't change reconstruction parameters
>> between two consecutive reconstructions.
>>
>> Maybe you could send us or upload your dataset and geometry file so we can
>> have a closer look at the problem?
>>
>> Best regards,
>>
>>
>> On 08/06/2020 17:20, Xinjie Cao wrote:
>>
>> Hi Thibaut and Simon,
>>
>> I have compiled two separate CASToR builds, one multi-threaded and one
>> single-core, in separate directories. The multi-threaded build is
>> running well now, but the single-core build crashed when I moved from a
>> lower-resolution configuration to a higher one, as shown below:
>>
>> *First recon try*:
>> ```
>> castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 10:16 -proj joseph
>> -conv gaussian,4.,4.5,3.5::psf -dim 100,100,16 -vox 2.5,2.5,10. -oit -1
>> -dout Brain_Double
>> .........
>> iIterativeAlgorithm::StepAfterSubsetLoop() -> Save image at iteration 10
>> vAlgorithm::IterateCPU() -> Total time spent | User: 3699 sec | CPU:
>> 3.6990100e+03 sec
>> sChronoManager::Display() -> Results from the profiling
>> .........
>> --> Custom update step 1: 00 hours 00 mins 00 secs 000 ms
>> ```
>> *Second recon try*:
>> ```
>> castor-recon -df Brain_Double_df.Cdh -opti MLEM -it 2:10 -proj joseph
>> -conv gaussian,4.,4.5,3.5::psf -dim 200,200,16 -vox 1.25,1.25,10. -oit -1
>> -dout Brain_Double_2
>>
>> *** Break *** segmentation violation
>> ===========================================================
>> There was a crash.
>> This is the entire stack trace of all threads:
>> ===========================================================
>> #0 0x00007fc5feb0641c in waitpid () from /lib64/libc.so.6
>> #1 0x00007fc5fea83f12 in do_system () from /lib64/libc.so.6
>> #2 0x00007fc60384e0c4 in TUnixSystem::StackTrace() () from
>> /home/goldan/GATE/root/lib/libCore.so.6.18
>> #3 0x00007fc6038507fc in TUnixSystem::DispatchSignals(ESignals) () from
>> /home/goldan/GATE/root/lib/libCore.so.6.18
>> #4 <signal handler called>
>> #5 0x00007fc5feac6eec in free () from /lib64/libc.so.6
>> #6 0x000000000044e9ba in
>> oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
>> #7 0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
>> #8 0x000000000042b7bc in main ()
>> ===========================================================
>>
>>
>> The lines below might hint at the cause of the crash.
>> You may get help by asking at the ROOT forum http://root.cern.ch/forum
>> Only if you are really convinced it is a bug in ROOT then please submit a
>> report at http://root.cern.ch/bugs Please post the ENTIRE stack trace
>> from above as an attachment in addition to anything else
>> that might help us fixing this issue.
>> ===========================================================
>> #5 0x00007fc5feac6eec in free () from /lib64/libc.so.6
>> #6 0x000000000044e9ba in
>> oSensitivityGenerator::ComputeSensitivityFromScanner(int) ()
>> #7 0x000000000044f425 in oSensitivityGenerator::LaunchCPU() ()
>> #8 0x000000000042b7bc in main ()
>> ===========================================================
>> ```
>> Does that mean I cannot modify the iteration setting or any setting
>> between two consecutive reconstructions? Or was there something wrong other
>> than configurations? Thank you!
>>
>> Best,
>>
>> On Fri, May 29, 2020 at 3:07 PM tmerlin <Thibaut.Merlin at univ-brest.fr>
>> wrote:
>>
>>> Hi Xinjie,
>>>
>>> It is hard to say what is going wrong without having the data, because
>>> memory errors can come from many things, but we have not experienced
>>> this kind of issue when reconstructing large images.
>>>
>>> As Simon suggested, if you compiled the code multiple times with
>>> the castor makefile, such errors could occur if the files generated by
>>> the previous compilation were not cleaned.
>>>
>>> Best,
>>> Thibaut
>>>
>>>
>>> On 28/05/2020 16:35, Xinjie Cao wrote:
>>>
>>> Hi Thibaut,
>>>
>>> Thanks for your response! I am trying to reconstruct list-mode data in
>>> ROOT. I got this problem when I changed my reconstruction parameters to a
>>> higher resolution, for example from a 6 mm voxel size to a 1.5 mm voxel size.
>>>
>>> Best,
>>>
>>> On Thu, May 28, 2020 at 7:04 AM Thibaut Merlin <
>>> Thibaut.Merlin at univ-brest.fr> wrote:
>>>
>>>> Hi Xinjie,
>>>>
>>>> With which kind of dataset did you get this problem? Did it occur with
>>>> every dataset you tried to reconstruct or just some of them?
>>>>
>>>> Best,
>>>> Thibaut
>>>>
>>>> Xinjie Cao <xinjie.cao at stonybrook.edu> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I am testing CASToR performance with multi-threading, but it looks
>>>> like the multi-thread function is not very stable.
>>>> Before I used multi-threading for reconstruction, every job ran fine.
>>>> Since I enabled it, reconstruction jobs always abort with an error I do
>>>> not recognize, shown below:
>>>> ```
>>>> *** Error in `castor-recon': munmap_chunk(): invalid pointer:
>>>> 0x0000000002259380 ***
>>>> ```
>>>> Did anyone ever see this problem before?
>>>> Any answer will be highly appreciated! Thank you!
>>>>
>>>> Best,
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Castor-users mailing list
>>>> Castor-users at lists.castor-project.org
>>>> http://lists.castor-project.org/listinfo/castor-users
>>>>
>>>
>>>
>>> --
>>> *....................................................*
>>> *Xinjie Cao*
>>> *M.E. / Ph.D. student*
>>> *Research Project Assistant*
>>> *Department of Electrical and Computer Engineering & Radiology *
>>> *Novel Medical Imaging Technologies Lab*
>>> *Health Science Center Level 8*
>>> *Stony Brook, NY 11794-8460 *
>>> *Tel: +1 (631)202-9445*
>>> you.stonybrook.edu/goldan/people/
>>> *email: xinjie.cao at stonybrook.edu*
>>>
>>>
>>> *....................................................*
>>> It is prohibited to distribute or publish the files attached to any
>>> other people unless you get permission from the writer himself. All rights
>>> reserved.
>>>
>>> --
>>> Thibaut MERLIN -- PhD
>>>
>>> PhD in Medical Imaging at the Laboratoire de Traitement de l'Information Médicale (LaTIM - INSERM UMR 1101)
>>> Institut Brestois de recherche en Bio-Santé (IBRBS)
>>> 12 Avenue Foch, 29200 Brest, FRANCE
>>> Tel: 06.75.12.24.90
>>>
>>>
>>
>