~toddy/bzr/bzr.i18n

Viewing changes to doc/developers/lca-merge.txt

Committer: Tobias Toedter
Date: 2008-02-10 08:01:48 UTC
mfrom: (2438.1.783 +trunk)
Revision ID: t.toedter@gmx.net-20080210080148-bg5rh61oq2zk2xw3

Merge trunk

files added:
bzrlib/tests/repository_implementations/test_has_revisions.py

bzrlib/tests/test_http_implementations.py

contrib/bzr_access

doc/developers/inventory.txt

doc/developers/lca-merge.txt

doc/en/tutorials/using_bazaar_with_launchpad.txt

doc/en/user-guide/revnos.txt

files removed:
bzrlib/plugins/multiparent.py

files renamed:
bzrlib/tests/HttpServer.py => bzrlib/tests/http_server.py

bzrlib/tests/HTTPTestUtil.py => bzrlib/tests/http_utils.py

files modified:
NEWS

README

bzrlib/__init__.py

bzrlib/annotate.py

bzrlib/branch.py

bzrlib/bugtracker.py

bzrlib/builtins.py

bzrlib/bzrdir.py

bzrlib/commands.py

bzrlib/conflicts.py

bzrlib/debug.py

bzrlib/delta.py

bzrlib/diff.py

bzrlib/doc/api/__init__.py

bzrlib/errors.py

bzrlib/fetch.py

bzrlib/graph.py

bzrlib/help_topics/__init__.py

bzrlib/help_topics/en/conflicts.txt

bzrlib/inventory.py

bzrlib/knit.py

bzrlib/merge.py

bzrlib/option.py

bzrlib/osutils.py

bzrlib/plugin.py

bzrlib/plugins/launchpad/__init__.py

bzrlib/plugins/launchpad/lp_indirect.py

bzrlib/plugins/launchpad/lp_registration.py

bzrlib/plugins/launchpad/test_lp_indirect.py

bzrlib/plugins/launchpad/test_register.py

bzrlib/progress.py

bzrlib/reconfigure.py

bzrlib/remote.py

bzrlib/repofmt/knitrepo.py

bzrlib/repofmt/pack_repo.py

bzrlib/repofmt/weaverepo.py

bzrlib/repository.py

bzrlib/revisiontree.py

bzrlib/smart/client.py

bzrlib/smart/medium.py

bzrlib/smart/protocol.py

bzrlib/smart/repository.py

bzrlib/smart/request.py

bzrlib/smart/vfs.py

bzrlib/status.py

bzrlib/symbol_versioning.py

bzrlib/tests/TestUtil.py

bzrlib/tests/__init__.py

bzrlib/tests/blackbox/test_annotate.py

bzrlib/tests/blackbox/test_diff.py

bzrlib/tests/blackbox/test_ignore.py

bzrlib/tests/blackbox/test_log.py

bzrlib/tests/blackbox/test_merge.py

bzrlib/tests/blackbox/test_outside_wt.py

bzrlib/tests/blackbox/test_pull.py

bzrlib/tests/blackbox/test_selftest.py

bzrlib/tests/blackbox/test_too_much.py

bzrlib/tests/blackbox/test_version_info.py

bzrlib/tests/branch_implementations/test_branch.py

bzrlib/tests/branch_implementations/test_http.py

bzrlib/tests/branch_implementations/test_parent.py

bzrlib/tests/interrepository_implementations/test_interrepository.py

bzrlib/tests/inventory_implementations/__init__.py

bzrlib/tests/repository_implementations/__init__.py

bzrlib/tests/repository_implementations/test_repository.py

bzrlib/tests/test_annotate.py

bzrlib/tests/test_bundle.py

bzrlib/tests/test_bzrdir.py

bzrlib/tests/test_conflicts.py

bzrlib/tests/test_diff.py

bzrlib/tests/test_errors.py

bzrlib/tests/test_fetch.py

bzrlib/tests/test_graph.py

bzrlib/tests/test_http.py

bzrlib/tests/test_http_response.py

bzrlib/tests/test_log.py

bzrlib/tests/test_merge.py

bzrlib/tests/test_merge_core.py

bzrlib/tests/test_nonascii.py

bzrlib/tests/test_osutils.py

bzrlib/tests/test_plugins.py

bzrlib/tests/test_progress.py

bzrlib/tests/test_reconfigure.py

bzrlib/tests/test_remote.py

bzrlib/tests/test_repository.py

bzrlib/tests/test_revert.py

bzrlib/tests/test_revisionnamespaces.py

bzrlib/tests/test_selftest.py

bzrlib/tests/test_sftp_transport.py

bzrlib/tests/test_smart.py

bzrlib/tests/test_smart_transport.py

bzrlib/tests/test_trace.py

bzrlib/tests/test_transform.py

bzrlib/tests/test_transport.py

bzrlib/tests/test_transport_implementations.py

bzrlib/tests/test_tsort.py

bzrlib/tests/test_urlutils.py

bzrlib/tests/test_version_info.py

bzrlib/tests/test_versionedfile.py

bzrlib/tests/test_win32utils.py

bzrlib/tests/workingtree_implementations/test_rename_one.py

bzrlib/trace.py

bzrlib/transform.py

bzrlib/transport/__init__.py

bzrlib/transport/http/__init__.py

bzrlib/transport/http/_pycurl.py

bzrlib/transport/http/_urllib.py

bzrlib/transport/http/_urllib2_wrappers.py

bzrlib/transport/http/response.py

bzrlib/transport/remote.py

bzrlib/tree.py

bzrlib/tsort.py

bzrlib/urlutils.py

bzrlib/version_info_formats/format_custom.py

bzrlib/versionedfile.py

bzrlib/workingtree_4.py

bzrlib/xml_serializer.py

doc/developers/HACKING.txt

doc/developers/index.txt

doc/en/user-guide/bug_trackers.txt

doc/en/user-guide/controlling_registration.txt

doc/en/user-guide/core_concepts.txt

doc/en/user-guide/index.txt

doc/index.txt

tools/rst2html.py

doc/developers/lca-merge.txt

LCA Merge
=========

by Aaron Bentley

Essential characteristics
-------------------------

In the general case (no criss-cross), it is a three-way merge. When
there is a criss-cross at the tree level, but not for the particular
file, it is still a three-way merge. When there's a file-level
criss-cross, it's superior to a three-way merge.

Algorithm
---------

First, we compare the files we are trying to merge, and find the lines
that differ. Next, we try to determine why they differ; this is
essential to the merge operation, because it affects how we resolve the
differences. In this merger, there are three possible outcomes:

1. The line was added in this version: "new-this"
2. The line was deleted in the other version: "killed-other"
3. The line was preserved as part of merge resolution in this version,
   but deleted in the other version: "conflicted-this"

Option 3 is new, but I believe it is essential. When each side has made
a conflicting merge resolution, we should let the user decide how to
combine the two resolutions, i.e. we should emit a conflict. We cannot
silently drop the line, or silently keep the line, which can happen if
we choose options 1 or 2. If we choose options 1 or 2, there's also a
possibility that a conflict will be produced, but no guarantee. We need
a guarantee, which is why we need a new possible outcome.

To decide whether a line is "new-this", "killed-other" or
"conflicted-this", we compare this version against the versions from
each "least common ancestor" (LCA), in graph terminology. For each LCA
version, if the line is not present in the LCA version, we add it to the
"new" set. If the line is present in the LCA version, we add it to the
"killed" set.

When we are done going through each LCA version, each unique line will
be in at least one of the sets. If it is only in the "new" set, it's
handled as "new-this". If it is only in the "killed" set, it's handled
as "killed-other". If it is in both sets, it's handled as
"conflicted-this".
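
The set-based classification described above can be sketched in a few
lines of Python. This is only an illustration, not the bzrlib
implementation: the function name ``classify_unique_lines`` and its
arguments are hypothetical, and it tests presence by plain text
membership, whereas a real merger would match lines positionally with a
sequence matcher. It also assumes there is at least one LCA.

```python
def classify_unique_lines(unique_lines, lca_texts):
    """Classify lines that are present in THIS but absent from OTHER.

    unique_lines: lines found only in THIS (hypothetical input).
    lca_texts: one list of lines per least common ancestor.
    """
    new = set()
    killed = set()
    for lca_lines in lca_texts:
        present = set(lca_lines)
        for line in unique_lines:
            if line in present:
                # The LCA had the line, so OTHER must have deleted it.
                killed.add(line)
            else:
                # The LCA lacked the line, so THIS must have added it.
                new.add(line)

    outcome = {}
    for line in unique_lines:
        if line in new and line in killed:
            # The LCAs disagree: conflicting merge resolutions.
            outcome[line] = "conflicted-this"
        elif line in killed:
            outcome[line] = "killed-other"
        else:
            outcome[line] = "new-this"
    return outcome
```

With two LCAs whose texts are ``["a", "b"]`` and ``["a"]``, the line
``"b"`` lands in both sets and is reported as "conflicted-this", while a
line missing from both LCAs is "new-this".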

The logic here is a bit tricky: first, we know that the line is present
in some, but not all, LCAs. We can assume that all LCAs were produced
by merges of the same sets of revisions. That means that in those LCAs,
there were different merge resolutions. Since THIS and OTHER disagree
about whether the line is present, those differences have propagated
into THIS and OTHER. Therefore, we should declare that the lines are in
conflict, and let the user handle the issue.

LCA merge and Three-way merge
-----------------------------

Now, in the common case, there's a single LCA, and LCA merge behaves as
a three-way merge. Since there's only one LCA, we cannot get the
"conflicted-this" outcome, only "new-this" or "killed-other". Let's look
at the typical description of three-way merges:

+-----+------+-------+------------+
|THIS | BASE | OTHER | Result     |
+-----+------+-------+------------+
|A    | A    | A     | A          |
+-----+------+-------+------------+
|A    | B    | A     | A          |
+-----+------+-------+------------+
|A    | B    | B     | A          |
+-----+------+-------+------------+
|A    | A    | B     | B          |
+-----+------+-------+------------+
|A    | B    | C     |\*conflict\*|
+-----+------+-------+------------+

Now, let's assume that BASE is a common ancestor, as is typically the
case. In fact, for best-case merges, BASE is the sole LCA.

We always pick the version that represents a change from BASE, if there
is one. For the AAAA line, there is no change, so the output is
rightfully BASE/THIS/OTHER. For ABAA, THIS and OTHER are both changes
from BASE, and they are the same change so they both win. (This case is
sometimes called convergence.) For ABBA, THIS is a change from BASE, so
THIS wins. For AABB, OTHER is a change from BASE, so OTHER wins. For
ABC*, THIS and OTHER are both changes to BASE, but they are different
changes, so they can't both win cleanly. Instead, we have a conflict.
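
The rule "pick whichever side changed from BASE" can be written down
directly. The sketch below is an illustration only: ``three_way`` and the
``CONFLICT`` sentinel are hypothetical names, and the function works on a
single value rather than on regions of text. Arguments follow the
THIS, BASE, OTHER order used for the case names above.

```python
# Hypothetical sentinel standing in for an emitted conflict.
CONFLICT = object()


def three_way(this, base, other):
    """Resolve one value using the classic three-way merge rule."""
    if this == other:
        # AAAA (nobody changed) or ABAA (convergence): both sides agree.
        return this
    if other == base:
        # ABBA: only THIS changed from BASE, so THIS wins.
        return this
    if this == base:
        # AABB: only OTHER changed from BASE, so OTHER wins.
        return other
    # ABC*: both sides changed, and differently.
    return CONFLICT
```

For example, ``three_way("A", "B", "B")`` returns ``"A"`` (the ABBA
case), and ``three_way("A", "B", "C")`` returns the ``CONFLICT``
sentinel.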

Now in three-way merging, we typically talk about regions of text. In
weave/knit/newness/lca merge, we also have regions. Each contiguous
group of "unchanged" lines is a region, and the areas between them are
also regions.

Let's assign a to THIS and b to OTHER. "unchanged" regions represent
the AAAA or ABAA cases; it doesn't matter which, because the outcome is
the same regardless. Regions which consist of only "new-a" or
"killed-a" represent the ABBA case. Regions which consist of only
"new-b" or "killed-b" represent the AABB case. Regions which have
(new-a or killed-a) AND (new-b or killed-b) are the ABC* case: both
sides have made changes, and they are different changes, so a conflict
must be emitted.
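
As a rough sketch of this region logic, assume the merge plan is a list
of ``(tag, text)`` pairs using the tags named above. The helper names
``split_regions`` and ``classify_region``, and the result labels
"take-this", "take-other" and "conflict", are made up for this
illustration. Only the single-LCA tags are handled; the "conflicted"
tags from the criss-cross case are deliberately rejected.

```python
from itertools import groupby


def split_regions(tagged_lines):
    """Split (tag, text) pairs into alternating regions.

    Each run of "unchanged" lines is one region, and each run of
    anything else between two such runs is another region.
    """
    return [
        list(group)
        for _, group in groupby(
            tagged_lines, key=lambda item: item[0] == "unchanged"
        )
    ]


def classify_region(region):
    """Map one region onto the three-way merge cases."""
    tags = {tag for tag, _ in region}
    if tags == {"unchanged"}:
        return "unchanged"  # AAAA or ABAA
    a_changed = bool(tags & {"new-a", "killed-a"})
    b_changed = bool(tags & {"new-b", "killed-b"})
    if a_changed and b_changed:
        return "conflict"  # ABC*
    if a_changed:
        return "take-this"  # ABBA
    if b_changed:
        return "take-other"  # AABB
    raise ValueError("unexpected tags in region: %r" % sorted(tags))
```

A plan tagged ``unchanged, new-a, unchanged, killed-a, new-b`` splits
into four regions, classified as "unchanged", "take-this", "unchanged"
and "conflict" respectively.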

This is what I mean when I say that it is a three-way merge in the
common case; if there is only one LCA, then it is merely an alternative
implementation of three-way. (One that happens to automatically do
``--reprocess``, ftw).

Why a new name
--------------

1. It was time. Although knit / annotate merge and newness merge have
   tried to emulate the behavior of the original weave merge algorithm,
   ``--merge-type=weave`` hasn't been based on weaves for a long time.
2. Behavior differences. This algorithm should behave like a three-way
   merge in the common case, while its predecessors did not. It also has
   explicit support for handling conflicting merge resolutions, so it
   should behave better in criss-cross merge scenarios.

Performance
-----------

Unlike the current "weave" merge implementation, lca merge does not
perform any whole-history operations. LCA selection should scale with
the number of uncommon revisions. Text comparison time should scale
mO(n\ :sup:`2`\ ), where m is the number of LCAs, and n is the number of lines
in the file. The current weave merge compares each uncommon ancestor,
potentially several times, so it is >= kO(n\ :sup:`2`\ ), where k is the
number of uncommon ancestors. So "lca" should beat "weave" both in history
analysis time and in text comparison time.

Possible flaws
==============

1. Inaccurate LCA selection. Our current LCA algorithm uses
   ``Graph.heads()``, which is known to be flawed. It may occasionally give
   bad results. This risk is mitigated by the fact that the per-file graphs
   tend to be simpler than the revision graph. And since we're already using
   this LCA algorithm, this is not an additional risk. I hope that John Meinel
   will soon have a fixed version of ``Graph.heads`` for us.
2. False matches. Weaves have a concept of line identity, but knits and
   later formats do not. So a line may appear to be common to two files, when
   in fact it was introduced separately into each for entirely different
   reasons. This risk is the same for three-way merging. It is mitigated by
   using Patience sequence matching, which is a longest-common-subsequence match.

Acknowledgements
================

I think this could be a great merge algorithm, and a candidate to make
our default, but this work would not have been possible without the work
of others, especially:

- Martin Pool's weave merge and knit/annotate merge algorithms.
- Bram Cohen's discussions of merge algorithms
- Andrew Tridgell's dissection of BitKeeper merge
- Nathaniel Smith's analysis of why criss-cross histories necessarily
  produce poor three-way merges.
