path: root/test/integration/test-pdiff-usage
Age  Commit message  (Author; files changed, lines -/+)
2018-01-03  refactor message generation for methods  (David Kalnischkies; 1 file, -1/+1)
The format isn't too hard to get right, but it gets funny with multiline fields (which we don't really have yet), and it's just easier to deal with it once and for all in a way that can be reused for more messages later.
2017-07-26  fail early in http if server answer is too small as well  (David Kalnischkies; 1 file, -3/+1)
Failing on too much data is good, but we can do better by checking for exact filesizes: thanks to hashsums we know how large a file should be, so if we get a file with an unexpected size we can drop it directly, regardless of whether it is larger or smaller than expected. This catches, a lot sooner, most of the cases which would otherwise end up as hashsum errors later.
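A minimal sketch of the idea, independent of apt's actual method code (the function and parameter names here are illustrative, not apt's):

    #include <cstdint>

    // With a known expected size from the repository metadata we can reject
    // a response of any other size before downloading it, whether the
    // announced body is too small or too large.
    bool ResponseSizePlausible(uint64_t expectedSize, uint64_t announcedSize)
    {
        if (expectedSize == 0)   // no size known: nothing to check against
            return true;
        return announcedSize == expectedSize;
    }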
2017-07-26  don't move failed pdiff indexes out of partial  (David Kalnischkies; 1 file, -0/+30)
The comment says this is intended, but looking at the history reveals that the comment comes from a different era. Nowadays we don't really need it anymore (and even back then it was disputable), as we haven't used that file for our update in the end and nothing really needs this file after the update. This is triggered by 188f297a2af4c15cb1d502360d1e478644b5b810, which moves various error conditions forward, including this code expecting the file to exist – but it need not exist, as the download could have failed. We could fix that by simply checking if the file exists and only staging it if it does, but instead we don't stage it at all and even rename it out of the way with our conventional FAILED name (if it exists). That restores support for partial mirrors (= in this case mirrors which don't ship pdiff files). Note that apt heals itself even if only such a mirror is used, as the update is successful even if that error is shown. Closes: 869425
2017-06-26  show .diff/Index properly as ignored if we fallback  (David Kalnischkies; 1 file, -2/+2)
Moving the code responsible for parsing the Index file from ::Done into the slightly earlier ::VerifyDone allows us to still "fail" the download if we can't make use of the Index for whatever reason, so that the progress log correctly displays "Ign" instead of "Get" for the file. This also turns quite a few debug messages into proper error messages (though those are still hidden by default for Ign lines).
2017-01-19  fix various typos reported by spellintian  (David Kalnischkies; 1 file, -1/+1)
Most of them are in (old) code comments. For the two instances of user-visible string changes, the po files of the manpages are fixed up as well. Gbp-Dch: Ignore Reported-By: spellintian
2016-08-17  support compression and by-hash for .diff/Index files  (David Kalnischkies; 1 file, -0/+29)
In af81ab9030229b4ce6cbe28f0f0831d4896fda01 by-hash got implemented as a special compression type for our usual index files like Packages. Missing in this scheme was the special .diff/Index index file containing the info about individual patches for this index file. By deriving from the index file class directly we inherit the compression handling infrastructure, and in this way get by-hash nearly for free. Closes: #824926
2016-08-10  implement generic config fallback for methods  (David Kalnischkies; 1 file, -1/+1)
The https method has implemented for a long while now a hardcoded fallback to the same options in http, which, while it works, is rather inflexible if we want to allow methods to use another name to change their behaviour slightly, like apt-transport-tor does for https – most of its diff being s#https#tor#g – which then fails to do the full-circle fallthrough tor -> https -> http for https sources. With this config infrastructure this can now be implemented.
2016-07-27  rred: truncate result file before writing to it  (David Kalnischkies; 1 file, -5/+15)
If another file in the transaction fails and hence dooms the transaction, we can end up in a situation in which a -patched file (= rred writes the result of the patching to it) remains in the partial/ directory. The next apt call will perform the rred patching again and write its result again to the -patched file – but instead of starting with an empty file as intended, it will overwrite the content previously in the file. That has the same result if the new content happens to be longer than the old content, but if it isn't, parts of the old content remain in the file. The file will still pass verification, as the new content written to it matches the hashes, and if the entire transaction passes, the file will be moved to the lists/ directory where it might or might not trigger errors, depending on whether the old content which remained forms a valid file together with the new content. This has no real security implications as no untrusted data is involved: the old content consists of a base file which passed verification and a bunch of patches which all passed multiple verifications as well, so the old content isn't controllable by an attacker, and the new one isn't either (as the new content alone passes verification). So the best an attacker can do is letting the user run into the same issue as in the report. Closes: #831762
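The essence of the fix, sketched with plain POSIX calls rather than apt's FileFd wrapper (the helper name is illustrative): opening the output with O_TRUNC guarantees no stale bytes from an earlier doomed transaction survive a shorter rewrite.

    #include <fcntl.h>

    // Open the rred output file for writing. O_TRUNC discards any leftover
    // content from a previous, aborted patch run, so a shorter new result
    // can never leave a tail of old (verified, but wrong) data behind.
    int OpenPatchResult(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    }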
2016-04-25  don't ask server if we have entire file in partial/  (David Kalnischkies; 1 file, -2/+13)
We get into this situation in cases where parts of the transaction are refused (e.g. a hashsum mismatch) and we rerun the update (e.g. in the hope that we get a mirror which is synced this time). Previously we would ask the server with an If-Range and in the best case receive a 416 in response (a less featureful server might end up giving us the entire file again, or we get the wrong file this time, giving us a hashsum mismatch…), which is a waste of time if we already know by checking the hashsums that we got the complete and correct file.
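A sketch of the short-circuit with illustrative names (apt's real check lives in the acquire system and also verifies the hashes, which is elided here):

    #include <sys/stat.h>
    #include <string>

    // If the file in partial/ already has the expected size (and its hashes
    // verify, checked elsewhere), there is nothing to ask the server for:
    // skip the If-Range request entirely instead of hoping for a 416.
    bool AlreadyComplete(std::string const &partialFile, off_t expectedSize)
    {
        struct stat st;
        if (stat(partialFile.c_str(), &st) != 0)
            return false;
        return st.st_size == expectedSize;
    }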
2016-04-25  make random acquire queues work less random  (David Kalnischkies; 1 file, -2/+4)
Queues feeding workers like rred are created in a random pattern to get a few of them to run in parallel – but if we already have an idling queue we don't need to assign the item to a (potentially new) random queue. That saves us the (arguably small) overhead of starting up a new queue, avoids adding jobs to an already busy queue while others idle, and as a bonus reduces the size of debug logs a bit. We also keep starting new queues now until we reach our limit before we assign work at random to them, which should give us a more effective utilisation overall compared to potentially adding work to busy queues while we haven't reached our queue limit yet.
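The selection policy, sketched in C++ with illustrative types (not apt's actual queue classes): idle queues first, then new queues up to the limit, randomness only as the last resort.

    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    struct Queue { bool Idle; };

    // Pick a queue for a new item: reuse an idle queue if one exists, start
    // a new queue while under the limit, and only then fall back to random
    // assignment among the busy queues.
    size_t PickQueue(std::vector<Queue> &queues, size_t maxQueues)
    {
        for (size_t i = 0; i < queues.size(); ++i)
            if (queues[i].Idle)
                return i;
        if (queues.size() < maxQueues)
        {
            queues.push_back(Queue{true});
            return queues.size() - 1;
        }
        return static_cast<size_t>(rand()) % queues.size();
    }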
2016-04-07  stop handling items in doomed transactions  (David Kalnischkies; 1 file, -1/+31)
With the previous commit we track the state of transactions, so we can now use our knowledge to avoid processing data for a transaction which was already closed (via an abort in this case). This is needed as multiple independent processes are interacting here, so there isn't a simple immediate full-engine stop, and it would also be bad to teach each and every item how to check whether its manager has failed and what to do in that case. In the pdiff case, which deals (potentially) with many items during its lifetime, e.g. a hashsum mismatch in another file can abort the transaction the file we try to patch via pdiff belongs to. This causes some of the items (which are already done) to be aborted with it, but items still in the process of acquisition continue processing and will later try to use all the items together, failing in strange ways as cleanup already happened. The chosen solution is to dry up the communication channels instead: ignore new requests for data acquisition, cancel requests which are not assigned to a queue, and no longer call Done/Failed on items. This means that e.g. already started or pending (e.g. pipelined) downloads aren't stopped and continue as normal for now, but they remain in partial/ and aren't processed further, so the next update command will pick them up and put them to good use while the current process fails updating (for this transaction group) in an orderly fashion. Closes: 817240 Thanks: Barr Detwix & Vincent Lefevre for log files
2016-03-14  don't use Desc.URI to calculate .diff/Index filenames  (David Kalnischkies; 1 file, -1/+14)
The URI describing an item can change via mirrors/redirectors, which causes the .diff/Index files to get the wrong names in storage. Git-Dch: Ignore
2016-03-14  require $(HASH)-Download field in .diff/Index files  (David Kalnischkies; 1 file, -36/+41)
Now that we ignore SHA1-only files it makes sense to also require the provision of hashes for the compressed patches, as this was introduced in the same patchset as support for non-SHA1 hashes in the file itself in dak, and adding support in other archive creators (if they support pdiffs at all) will likely come in the same batch. The reason for the change itself is simple: if you are 'scared' enough about the security of SHA1, you shouldn't uncompress a file you haven't verified at all – after all, it could exploit a bug or be a zip bomb.
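For illustration, the rough shape of a .diff/Index carrying the field (hash values and sizes abbreviated, the patch name is an example of dak's timestamp-style naming): each Download entry describes a compressed patch, mirroring the uncompressed entries under -Patches.

    SHA256-Current: <hash-of-current-index> <size>
    SHA256-History:
     <hash> <size> 2016-03-14-0844.39
    SHA256-Patches:
     <hash> <size> 2016-03-14-0844.39
    SHA256-Download:
     <hash> <size> 2016-03-14-0844.39.gz

With this commit, an Index lacking the -Download section for a supported hash is treated like one we can't use, triggering the fallback to the complete file.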
2016-03-14  test: remove SHA1 support testing as unsupported  (David Kalnischkies; 1 file, -71/+19)
Given that we refuse to use SHA1-only .diff/Indexes, there is no point in shipping and running code which pretends to check support for them – especially as all these tests are run 3 times, which eats a noticeable amount of time. Git-Dch: Ignore
2016-03-13  Test that SHA1-only .diff/Index files are not used  (Julian Andres Klode; 1 file, -2/+32)
Ensure that .diff/Index files that only contain SHA1 values and no SHA2 values are not used.
2016-03-06  do not move not-failed pdiff-patches into CWD on failure  (David Kalnischkies; 1 file, -0/+13)
If a single pdiff fails, we have to fail the entire patching endeavour and fall back to getting the complete file instead. That is easy with serverside merged pdiffs as we get them one by one. For clientside merging we get them all at once though, which means that a failure in one has to stop the entire pipeline – which works as expected (as proven by the bugreporters, as they don't even notice it happening). The problem is just that the first failing pdiff does the cleanup, so another pdiff which happens to be successfully acquired after we processed the failure doesn't find the file it is supposed to use as a basename anymore. The patch is therefore renamed to what should be the unique extension and moved into the current working directory; processing is then stopped, as the patch realizes that it isn't the last one which completed downloading. On the plus side this means this is neither us using a bad temporary location nor a security problem. It "just" unconditionally overrides files in your current working directory (if you happen to have them named like a pdiff patch – a bit unlikely perhaps) and drops files there which are never used again. I guess this was introduced for real in 4e3c5633b1e74b4f58b95f339cfbbf4cbf21ab3e, as I made the need for the existence of the base file rather explicit there, but the potential has lingered in the code for far longer. Closes: #816837
2016-01-08  remove uncompressed leftover partial file before pdiff bootstrap  (David Kalnischkies; 1 file, -0/+6)
The code already deals with compressed leftovers, but forgot the uncompressed files. The opportunity is taken to reorder this code, add debug messages about the actions taken, and produce such a leftover file in the associated testcase.
2016-01-08  use filesize of compressed pdiffs for the limit if possible  (David Kalnischkies; 1 file, -2/+6)
With the addition of the $HASH-Download field in the .diff/Index we got the size of the compressed patches for 'free', so if that information is available we can use it for a more fitting calculation of the size requirements of the patches vs. the complete file. Note that this predicts too small a size in the transition case, in which the information isn't available for all patches, but figuring this out would be a lot of code for practically nothing, as only one update can ever be in such a transition phase.
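A sketch of the comparison with illustrative names: prefer the compressed download size when the .diff/Index provided it, else fall back to the uncompressed patch size.

    #include <cstdint>
    #include <vector>

    struct PatchInfo
    {
        uint64_t UncompressedSize;
        uint64_t DownloadSize;   // 0 if the Index had no $HASH-Download entry
    };

    // Sum up what we would actually have to transfer; patching only pays
    // off if that stays below the size of downloading the complete file.
    bool PatchingIsCheaper(std::vector<PatchInfo> const &patches,
                           uint64_t completeFileSize)
    {
        uint64_t total = 0;
        for (auto const &p : patches)
            total += (p.DownloadSize != 0) ? p.DownloadSize : p.UncompressedSize;
        return total < completeFileSize;
    }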
2016-01-08  keep compressed indexes in a low-cost format  (David Kalnischkies; 1 file, -1/+2)
Downloading and storing are two different operations where different compression types can be preferred. For downloading we provide the choice via Acquire::CompressionTypes::Order, as there is a choice to be made between download size and speed – limited by what's available in the repository. Storage, on the other hand, has all compressions currently supported by apt available, and to reduce the runtime of tools accessing these files the compression type should be a low-cost format in terms of decompression. apt traditionally stores its indexes uncompressed on disk, but has options to keep them compressed. Now that apt downloads additional files, we also deal with files which simply can't be stored uncompressed as they are just too big (like Contents for apt-file). Traditionally they are downloaded in a low-cost format (gz) as repositories do not provide other formats, but there might be even lower-cost formats, and for download we could introduce higher-cost formats in the repositories. Downloading an entire index potentially requires recompression to another format, so an update potentially takes longer – but big files are usually updated via pdiffs, which has to de- and re-compress on the fly anyhow, so there is no extra time needed, and in general it seems beneficial to invest the time during update to save time later on file access.
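The user-facing knobs involved, in apt.conf syntax; Acquire::GzipIndexes and Acquire::CompressionTypes::Order are real apt options, while the values shown are only example choices, not defaults this commit sets:

    // Keep downloaded indexes compressed on disk instead of unpacking them.
    Acquire::GzipIndexes "true";
    // Preference order for the download side of the trade-off.
    Acquire::CompressionTypes::Order { "xz"; "gz"; };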
2016-01-08  allow pdiff bootstrap from all supported compressors  (David Kalnischkies; 1 file, -10/+14)
There is no reason to enforce that the file we start the bootstrap with is compressed with a compressor which is available online. This allows us to change the on-disk format, and also deals with repositories adding/removing support for a specific compressor.
2015-12-19  tests: support spaces in path and TMPDIR  (David Kalnischkies; 1 file, -57/+57)
This doesn't allow all tests to run cleanly, but it at least allows writing tests which could run successfully in such environments. Git-Dch: Ignore
2015-09-14  tests: try to support spaces in TMPDIR  (David Kalnischkies; 1 file, -1/+1)
Not all tests work yet, most notably the cdrom tests, but those require changes in libapt itself to have a proper fix, and what we have fixed so far is good enough progress for now. Git-Dch: Ignore
2015-09-14  tests: use SHA1 checksum only by default in tests  (David Kalnischkies; 1 file, -0/+1)
This is mostly a small speedup for the testcases, but it is also handy to document which tests actually deal with a specific hash compared to those which 'just' need some hash, which can be important while adding new hashes. Git-Dch: Ignore
2015-09-14  various changes to increase test-coverage  (David Kalnischkies; 1 file, -8/+0)
And of course, testing obscure things ends up showing obscure 'bugs', or better: shortcomings/inconsistencies, so let's fix them along with the tests. Git-Dch: Ignore
2015-08-28  implement PDiff patching for compressed files  (David Kalnischkies; 1 file, -16/+21)
Some additional files like 'Contents' are very big and should therefore be kept compressed on disk, which apt-file did in the past. It also implemented pdiff patching of these files by un- and recompressing them on the fly; with this commit we can do the same – but in both pdiff patching styles (client and server merging) and secured by hashes. Hashes are slightly complicated insofar as we can't compare the hashes of the compressed files, as we might compress them differently than the server would (different compressor versions, options, …), so we must compare the hashes of the uncompressed content. While this commit has changes in public headers, the classes it changes are marked as hidden, so nobody can use them directly, which means the ABI break is internal only.
2015-06-15  show item ID in Hit, Ign and Err lines as well  (David Kalnischkies; 1 file, -1/+1)
Again, consistency is the main selling point here, but this way it is now also easier to explain that some files move through different stages and hence lines are printed for them multiple times: that is a bit hard to believe if the number is changing all the time, but now it stays consistent.
2015-06-09  support hashes for compressed pdiff files  (David Kalnischkies; 1 file, -2/+44)
At the moment we only have hashes for the uncompressed pdiff files, but via the new '$HASH-Download' field in the .diff/Index hashes can be provided for the .gz compressed pdiff files, which apt will now pick up and use to verify the download. Now we "just" need buy-in from the creators of repositories…
2015-06-09  add more parsing error checking for rred  (David Kalnischkies; 1 file, -1/+2)
The rred parser is very accepting regarding 'invalid' files. Given that we can't trust the input, it might be a bit too relaxed. In any case, checking for more errors can't hurt, given that we support only a very specific subset of ed commands.
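The ed subset in question is tiny: line-addressed a/c/d commands (a and c are followed by content lines terminated by a lone "."). A hedged sketch of a strict command-line check, not rred's actual code:

    #include <cctype>
    #include <string>

    // Accept only "Na", "Nc", "Nd" or "N,Ma", "N,Mc", "N,Md" where N and M
    // are line numbers; everything else is a parse error we should reject.
    bool ValidEdCommand(std::string const &line)
    {
        size_t i = 0;
        while (i < line.size() && isdigit(static_cast<unsigned char>(line[i])))
            ++i;
        if (i == 0)                              // must start with a line number
            return false;
        if (i < line.size() && line[i] == ',')   // optional ",M" range part
        {
            size_t const j = ++i;
            while (i < line.size() && isdigit(static_cast<unsigned char>(line[i])))
                ++i;
            if (i == j)
                return false;
        }
        if (i + 1 != line.size())                // exactly one command char left
            return false;
        char const cmd = line[i];
        return cmd == 'a' || cmd == 'c' || cmd == 'd';
    }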
2015-06-09  check patch hashes in rred worker instead of in the handler  (David Kalnischkies; 1 file, -0/+2)
rred is responsible for unpacking and reading the patch files in one go, but we currently only have hashes for the uncompressed patch files, so the handler read the entire patch file before dispatching it to the worker, which would read it again – both with an implicit uncompress. Worse, while the workers operate in parallel, the handler is the central orchestration unit, so having it busy with work means the workers do (potentially) nothing. This means rred is now working with 'untrusted' data, which is bad. Yet, having the unpack in the handler meant that the untrusted uncompress was done as root, which isn't better either. Now we have it at least contained in a binary which we can harden a bit better. In the long run we want hashes for the compressed patch files, though, to be safe.
2015-05-11  improve partial/ cleanup in abort and failure cases  (David Kalnischkies; 1 file, -1/+1)
Especially pdiff-enhanced downloads have a tendency to fail for various reasons from which we can recover, and even a successful download used to leave the old unpatched index in partial/. By adding a new method responsible for making the transaction of an individual file happen, we can add specialisations, especially for abort cases, to deal with the cleanup. This also helps in keeping the compressed indexes around if another index failed, instead of keeping the decompressed files, which we wouldn't pick up in the next call.
2015-04-19  a hit on Release files means the indexes will be hits too  (David Kalnischkies; 1 file, -11/+4)
If we get an IMSHit for the Transaction-Manager (= the InRelease file, or as its still-supported fallback the Release + Release.gpg combo) we can assume that every file we would queue based on this manager, but already have locally, is current and hence would get an IMSHit too. We therefore save ourselves and the server the trouble and skip the queuing in this case. Besides speeding up repetitive executions of 'apt-get update' this way, we also avoid hitting hashsum errors if the indexes are in fact already updated but the Release file isn't yet, as is the case on well-behaving mirrors, since the Release file is updated last. The implementation is a bit harder than the theory makes it sound, as we still have to keep reverifying the Release files (e.g. to detect now-expired ones, to avoid an attacker being able to silently stale us) and have to handle cases in which the Release file hits but some indexes aren't present (e.g. the user added a new foreign architecture).
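The gist as a sketch with illustrative names (not apt's classes); note the reverification requirement from the entry above is kept explicit:

    struct ReleaseState { bool IMSHit; bool Verified; };

    // On an IMS hit for the transaction manager, indexes we already have
    // locally can be assumed current and need not be queued; the Release
    // file itself must still be (re)verified on every run.
    bool SkipIndexQueueing(ReleaseState const &release, bool haveIndexLocally)
    {
        return release.IMSHit && release.Verified && haveIndexLocally;
    }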
2015-03-16  test exitcode as well as string equality  (David Kalnischkies; 1 file, -6/+6)
We use test{success,failure} now all over the place in the framework, so it's only consequential to do this in the situations in which we test for a specific output as well. Git-Dch: Ignore
2014-11-08  reenable patchsize limit option for pdiffs  (David Kalnischkies; 1 file, -0/+34)
One word: "doh!" Commit f6d4ab9ad8a2cfe52737ab620dd252cf8ceec43d disabled the check to prevent apt from downloading bigger patches than the index it tries to patch. Happens rarely of course, but still. Detected by scan-build complaining about a dead assignment. To make up for the mistake, a test is included as well.
2014-10-07  Merge remote-tracking branch 'upstream/debian/experimental' into feature/acq-trans  (Michael Vogt; 1 file, -7/+43)
Conflicts: apt-pkg/acquire-item.cc
2014-09-30  support parsing of all hashes for pdiff  (David Kalnischkies; 1 file, -6/+42)
The fileformat of a pdiff index currently stores only SHA1 hashes. With this change, we look for all other hashes we support as well and take what we get, so that after the release of jessie we can work on getting rid of SHA1 if we want to. Note that the completely patched file is and was checked against the hashes collected from the Release file, so this transition isn't mission critical.
2014-09-23  make pdiff transactional (but at the cost of a CopyFile())  (Michael Vogt; 1 file, -0/+1)
2014-05-09  use HashStringList in the acquire system  (David Kalnischkies; 1 file, -6/+0)
It is not very extensible to have the supported hashes hardcoded everywhere, especially if they are part of virtual method names. It is also possible that a method does not support the 'best' hash (yet), so we might end up not being able to verify a file even though we have a common subset of supported hashes. And those are just two of the cases in which it is handy to have a more dynamic selection. The downside is that this is a MAJOR API break, but HashStringList has a string constructor for compatibility, so with a bit of luck the few frontends playing with the acquire system directly are okay.
2014-02-10  always cleanup patchfiles at the end of rred call  (David Kalnischkies; 1 file, -7/+21)
With APT::Get::List-Cleanup disabled, the ed-style patch files otherwise linger in the lists/ directory. That was kinda okay in the old non-client-merge mode, as the filename was always the same and so the file was constantly overridden, but now with different names for client-merge quite a few could pile up on the system and be used by the next call, as it picks them up based on the filename.
2014-01-15  integrate Anthony's rred with POC for client-side merge  (David Kalnischkies; 1 file, -7/+16)
Providing the benefits of both without the downsides :) (no ABI breaks or external dependencies) For this, Anthony's rred is equipped with:
- magic filename pickup of patches rather than explicit messages
- use of FileFd instead of FILE* to get on-the-fly uncompress of the gzip compressed pdiff patches
The acquire code in turn stops checking for apt-file's helper, as our own rred is now clever enough for our needs.
2013-12-13  implement POC client-side merging of pdiffs via apt-file  (David Kalnischkies; 1 file, -20/+127)
The idea of pdiffs is to avoid downloading the whole file by patching the existing index. This works very well, but becomes slow if a lot of patches need to be applied to reconstruct an up-to-date index, and in recent years more and more dinstall (or similar) runs are executed, creating more and more pdiffs in the same amount of time, so pdiffs became less useful. The solution is simple: reduce the number of patches (which are very small) which need to be applied on top of the index we have available (which is usually pretty big). This can be done in two ways: either the patches are merged on the server side so that the client has to download only one patch, or the patches are all downloaded and merged on the client side. The first needs a client which applies one patch at a time and can also skip patches if needed (APT has supported this for a long time now). The latter is implemented by this commit, but depends on the server NOT merging the patches and the patches being in a strict order in which no patch is skipped. This is traditionally the case for dak, but other repository creators support merging – e.g. reprepro (which helpfully adds a flag indicating that the patches are merged). To support both, or even mixes, a client needs more information which isn't available for now. This POC uses the external diffindex-rred included in apt-file to do the heavy lifting of merging & applying all patches in one pass; hence, to test this feature apt-file needs to be installed.
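How a client decides which patches it needs, sketched with illustrative structures (HistoryEntry and NeededPatches are not apt's names): find the entry in the .diff/Index history whose hash matches the index we have on disk; every patch from there on, applied in order, takes us to the current version.

    #include <string>
    #include <vector>

    struct HistoryEntry { std::string Hash; std::string PatchName; };

    // Return the names of the patches still needed, in application order.
    // An empty result means our local version is unknown (or nothing is
    // needed); callers fall back to downloading the complete file then.
    std::vector<std::string> NeededPatches(
        std::vector<HistoryEntry> const &history, std::string const &localHash)
    {
        std::vector<std::string> chain;
        size_t start = history.size();
        for (size_t i = 0; i < history.size(); ++i)
            if (history[i].Hash == localHash) { start = i; break; }
        for (size_t i = start; i < history.size(); ++i)
            chain.push_back(history[i].PatchName);
        return chain;
    }

Server-side merging hands the client a single patch covering the whole chain; client-side merging downloads every element of the chain and applies them in one pass, as the diffindex-rred helper does here.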
2013-08-28  configurable compression for testcases  (David Kalnischkies; 1 file, -3/+1)
Compressing files in 4 different styles eats test-time for no practical gain if we don't test them explicitly, so default to just building 'gz' compressed files, as it is the simplest compression algorithm supported. Git-Dch: Ignore
2013-08-12  add chronic-like testsuccess/testfailure helpers  (David Kalnischkies; 1 file, -2/+2)
For many commands the output isn't stable (like when dpkg is called) but the exitcode is, so this helper enhances the common && msgpass || msgfail by automatically generating a msgtest and showing the output of the command in case of failure instead of discarding it unconditionally, the latter being chronic-like behaviour. Git-Dch: Ignore
2013-06-24  simple fork and pidfile aptwebserver  (David Kalnischkies; 1 file, -1/+1)
Forking only after being ready to accept clients avoids races with the tests, which sometimes failed on the first 'apt-get update' (or similar) with the previous start-in-the-background-and-hope-for-the-best approach… The commit also fixes some overlooked output-order changes in regards to Description-md5 and (I-M-S) race conditions in various tests. Git-Dch: Ignore
2012-04-11  fix the remaining lzma calls with xz --format=lzma in the testcases  (David Kalnischkies; 1 file, -1/+1)
2011-01-24  do not add Index file by hand now that ftparchive does it by itself  (David Kalnischkies; 1 file, -9/+2)
2011-01-15  * methods/rred.cc:  (David Kalnischkies; 1 file, -0/+1)
* methods/rred.cc:
  - operate optionally on gzip compressed pdiffs
* apt-pkg/acquire-item.cc:
  - don't uncompress downloaded pdiff files before feeding them to rred
2010-10-13  tests/integration/test-*: remove a bunch of "local" that are used outside functions (bash complains)  (Michael Vogt; 1 file, -1/+1)
2010-08-21  * apt-pkg/acquire-item.cc:  (David Kalnischkies; 1 file, -0/+51)
* apt-pkg/acquire-item.cc:
  - don't use ReadOnlyGzip mode for PDiffs, as this mode doesn't work in combination with the AddFd methods of our hash classes
Also add 2 testcases: one to test pdiffs in general and one to test the handling of compressed indexes.