
Mirror MITK-Data to Github.com
Open, High (Public)

Description

The DKFZ proxy sometimes gives us a hard time when cloning large repositories like MITK-Data. As we would prefer to clone this repository in the MITK superbuild instead of creating and uploading tarballs, we should consider mirroring MITK-Data to Github.com and using that clone URI instead.

Currently, only @nolden has the rights to create a MITK-Data repository on our MITK Github page.

Event Timeline

kislinsk created this task. Apr 6 2018, 12:08 PM
kislinsk triaged this task as Wishlist priority.
kislinsk edited projects, added MITK (2018-04); removed MITK. Apr 6 2018, 1:29 PM
kirchnth raised the priority of this task from Wishlist to High. May 28 2018, 2:32 PM
kirchnth added subscribers: adler, kirchnth.

This is actually quite an issue. I have not been able to finish a superbuild since Friday. I tried at least 20 times (because I know it sometimes hangs).

error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502 Proxy Error
fatal: The remote end hung up unexpectedly
-- Had to git clone more than once:
          3 times.
CMake Error at ep/tmp/MITK-Data-gitclone.cmake:66 (message):
  Failed to clone repository:
  'https://phabricator.mitk.org/source/mitkdata.git'

The same was reproduced by @adler.
I am not saying we need to mirror MITK-Data to Github, but we do need some solution.

floca added a subscriber: floca. May 28 2018, 4:03 PM

Same for me. I am stuck on a task because I cannot run the superbuild.

Workaround until the ITCF fixes their proxy: use the SSH clone URI instead.
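
One way to apply this workaround without touching the superbuild's CMake files is Git's `url.<base>.insteadOf` setting, which rewrites matching URLs before Git connects. The sketch below is only an illustration: `SSH_BASE` is a placeholder, because the actual SSH clone URI is not stated in this task, and the function name is made up for the example.

```shell
# HTTPS prefix used by the superbuild (taken from the error message above).
HTTPS_BASE="https://phabricator.mitk.org/source/"
# Placeholder: replace with the SSH clone URI prefix shown on the repository page.
SSH_BASE="ssh://git@phabricator.example.org/source/"

# Tell Git (for the current user) to use SSH_BASE wherever a URL starts with HTTPS_BASE.
use_ssh_for_phabricator() {
  git config --global "url.${SSH_BASE}.insteadOf" "$HTTPS_BASE"
}
```

After running `use_ssh_for_phabricator` once, a clone of `${HTTPS_BASE}mitkdata.git` should go through SSH. The rule can be removed again with `git config --global --unset "url.${SSH_BASE}.insteadOf"`.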

kislinsk added a comment. Edited Aug 6 2018, 4:32 PM

We can't push to Github because of the 100 MB file size limit:

git push --verbose --mirror github
Pushing to https://github.com/MITK/MITK-Data.git
Counting objects: 2755, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (1618/1618), done.
POST git-receive-pack (chunked))
Writing objects: 100% (2755/2755), 294.40 MiB | 24.75 MiB/s, done.
Total 2755 (delta 1098), reused 2755 (delta 1098)
remote: Resolving deltas: 100% (1098/1098), done.
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 0370172563c36c904501fb5f9aa232b7
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File UltrasoundImages/4D_TEE_Data_MV.dcm is 101.67 MB; this exceeds GitHub's file size limit of 100.00 MB
remote: error: File UltrasoundImages/4D_TEE_Data_MitralValve.dcm is 295.75 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/MITK/MITK-Data.git
 ! [remote rejected] master -> master (pre-receive hook declined)
 ! [remote rejected] origin/HEAD -> origin/HEAD (pre-receive hook declined)
 ! [remote rejected] origin/T24542-remove-3D+t-heart-data -> origin/T24542-remove-3D+t-heart-data (pre-receive hook declined)
 ! [remote rejected] origin/bug-16787-roi-measurement-tests -> origin/bug-16787-roi-measurement-tests (pre-receive hook declined)
 ! [remote rejected] origin/bug-16946-removeRedundantTestData -> origin/bug-16946-removeRedundantTestData (pre-receive hook declined)
 ! [remote rejected] origin/bug-17928-migrate-test-data -> origin/bug-17928-migrate-test-data (pre-receive hook declined)
 ! [remote rejected] origin/bug-18348-FiberDirExtract -> origin/bug-18348-FiberDirExtract (pre-receive hook declined)
 ! [remote rejected] origin/bug-18870-testdata -> origin/bug-18870-testdata (pre-receive hook declined)
 ! [remote rejected] origin/master -> origin/master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/MITK/MITK-Data.git'

nolden added a comment. Edited Aug 6 2018, 5:50 PM

DICOM compression could be an option: I used gdcmconv -d, and it reduced 4D_TEE_Data_MV.dcm from 100 MB to 30 MB.

http://gdcm.sourceforge.net/html/gdcmconv.html
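
For reference, a minimal sketch of that conversion with a before/after size check. The function name and the `GDCMCONV` override variable are made up for this example; only `gdcmconv -d <file-in> <file-out>` comes from the tool itself.

```shell
# Deflate one DICOM file and print "<bytes before> <bytes after>".
# GDCMCONV may point at a specific binary; it defaults to gdcmconv on PATH.
deflate_dicom() {
  in="$1"
  out="$2"
  "${GDCMCONV:-gdcmconv}" -d "$in" "$out" || return 1
  echo "$(wc -c < "$in" | tr -d ' ') $(wc -c < "$out" | tr -d ' ')"
}
```

Example call: `deflate_dicom UltrasoundImages/4D_TEE_Data_MV.dcm /tmp/4D_TEE_Data_MV.dcm`. The second number shows whether the result stays below the file size limit.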

nolden added a comment. Aug 6 2018, 5:51 PM

To use it, one can simply turn on the option to build the applications in the GDCM build within an MITK superbuild.

@nolden Nice, but the large files will still be in the history. One of the two files mentioned above doesn't exist in HEAD anymore for example. Do you have an idea how we could solve this anyway?
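
To see which files in the history are affected, the blobs reachable from any ref can be listed with their sizes. This is only a diagnostic sketch: the function names are made up, and the default limit of 104857600 bytes assumes that the limit in the push log means 100 MiB.

```shell
# Read "<type> <size> <path>" lines on stdin and print "<size> <path>"
# for every blob whose size is at least the given limit in bytes.
filter_large_blobs() {
  awk -v limit="$1" '
    $1 == "blob" && $2 + 0 >= limit + 0 {
      path = $0
      sub(/^[^ ]+ [^ ]+ /, "", path)   # strip the type and size fields
      print $2, path
    }'
}

# List all blobs in the whole history (all refs) that reach the limit.
# Assumption: 104857600 bytes (100 MiB) as the default limit.
list_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    filter_large_blobs "${1:-104857600}"
}
```

Running `list_large_blobs` inside a clone of MITK-Data should also report files that no longer exist in HEAD, since it walks all reachable objects rather than the working tree.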

nolden added a comment. Nov 7 2018, 9:46 AM

I can set up a simple git / gitweb on the new DMZ server and mirror it to there.

On this level the "cleanest" solution would be to push the ITCF to fix their proxy for large HTTPS requests. This way we wouldn't have any additional systems or maintenance duties and could continue with our Phabricator-hosted version of this repo.

kislinsk added a comment. Edited Nov 7 2018, 12:19 PM

For a ticket or call to the ITCF, this line is all that's necessary to reproduce the connection aborts of the proxy:

GIT_CURL_VERBOSE=1 git clone https://phabricator.mitk.org/source/mitkdata.git

Note that it sometimes works, though. CMake sometimes needs to call git three times until the clone is complete (it can resume), but it doesn't try a fourth time and will abort with an error.
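
When testing by hand, more than three attempts can be made with a small retry wrapper. This is a sketch and not what CMake does internally: the function name, the attempt count, and the delay are arbitrary choices, and each attempt is simply a fresh invocation of the command.

```shell
# Run a command until it succeeds, at most max_attempts times,
# sleeping `delay` seconds between failed attempts.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Example call: `retry 10 30 git clone https://phabricator.mitk.org/source/mitkdata.git MITK-Data`.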

This is the CMake output from a failed attempt:

2>------ Build started: Project: MITK-Data, Configuration: Debug x64 ------
2>Creating directories for 'MITK-Data'
2>Building Custom Rule F:/MITK/CMakeLists.txt
2>CMake does not need to re-run because F:/MITK-sb/CMakeFiles/generate.stamp is up-to-date.
2>Performing download step (git clone) for 'MITK-Data'
2>Cloning into 'MITK-Data'...
2>fatal: The remote end hung up unexpectedly
2>error : RPC failed; HTTP 502 curl 22 The requested URL returned error : 502 Proxy Error
2>Cloning into 'MITK-Data'...
2>fatal: The remote end hung up unexpectedly
2>error : RPC failed; HTTP 502 curl 22 The requested URL returned error : 502 Proxy Error
2>Cloning into 'MITK-Data'...
2>fatal: The remote end hung up unexpectedly
2>error : RPC failed; HTTP 502 curl 22 The requested URL returned error : 502 Proxy Error
2>-- Had to git clone more than once:
2>CMake Error at ep/tmp/MITK-Data-gitclone.cmake:66 (message):
2>          3 times.
2>  Failed to clone repository:
2>  'https://phabricator.mitk.org/source/mitkdata.git'

kislinsk edited projects, added MITK; removed MITK (2018-04). Nov 9 2018, 10:54 AM

Ticket with ITCF is filed, #2018110910000205

Ok, I discussed the ticket with ITCF:

  1. To debug further, we would need exact timestamps.
  2. We would probably need the log file from our Phabricator Apache as well.
  3. It is possible that the problem is partly on our side (Phabricator Apache): if our web server doesn't respond in time to the first request, the proxy will filter the repeated requests by CMake, since it wants to give the (possibly) overloaded web server time to recover.

@kislinsk: could you have a look at the Apache logs on our server and try to reproduce the problem? If it's really a load problem, we could maybe trigger it by issuing the clone command several times in parallel:

for i in $(seq 1 5) ; do { git clone https://phabricator.mitk.org/source/mitkdata.git /tmp/data-$i & } ; done
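
A variant of this loop that also records UTC timestamps and exit codes for each clone, which is the kind of information ITCF asked for, could look like the following sketch. The function names and the log line format are made up for this example.

```shell
# Run a command and log UTC start and end timestamps plus its exit status.
timed_run() {
  label="$1"
  shift
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) start $label"
  "$@"
  status=$?
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) end $label exit=$status"
  return "$status"
}

# Start n clones in parallel (default 5) and wait for all of them to finish.
reproduce_proxy_error() {
  n="${1:-5}"
  for i in $(seq 1 "$n"); do
    timed_run "clone-$i" \
      git clone https://phabricator.mitk.org/source/mitkdata.git "/tmp/data-$i" &
  done
  wait
}
```

Redirecting the output to a file, for example `reproduce_proxy_error 5 > clone-times.log 2>&1`, gives a log that can be compared with the Apache and proxy logs.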

That worked for me, in the sense that it reproduces the error.

Ok, just as an additional symptom: submitting my last comment took a very long time, and by then the git clone commands had already failed. So maybe the server was still busy preparing the clones and therefore also took a long time to process my comment submission.

@nolden FYI, you can get a clue about what is going on currently from here: https://phabricator.mitk.org/daemon/
All the heavy workload is done by the Phabricator Daemons.