Fossil

Check-in [cd689b3813]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Assorted grammar and spelling fixes in www/rebaseharm.md. Also added named anchors to all of the sections.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: cd689b3813464a41d1bfb3f0879960b9a1c18f1c0a268094ffddf0a03c31af68
User & Date: wyoung 2019-09-13 09:23:13.028
Context
2019-09-13
09:25
Added a few paras to section 3.0 in rebaseharm.md, giving consequences of siloed development in Socratic fashion. ... (check-in: 924bf44d39 user: wyoung tags: trunk)
09:23
Assorted grammar and spelling fixes in www/rebaseharm.md. Also added named anchors to all of the sections. ... (check-in: cd689b3813 user: wyoung tags: trunk)
09:00
Added another link from www/fossil-v-git.wiki to rebaseharm.md. ... (check-in: 29997f803b user: wyoung tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to www/rebaseharm.md.
1
2
3
4
5
6
7


8
9
10
11
12
13
14
15
16


17
18

19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
# Rebase Considered Harmful

Fossil deliberately omits a "rebase" command, because the original
designer of Fossil (and author of this article) considers rebase to be 
an anti-pattern to be avoided. This article attempts to
explain that point of view.



## 1.0 Rebasing is dangerous

Most people, even strident advocates rebase, agree that rebase can
cause problems when misused.  Rebase documentation talks about the
[golden rule of rebase][golden]: that it should never be used on a public
branch.  Horror stories of misused rebase abound, and the rebase 
documentation devotes considerable space toward explaining how to
recover from rebase errors and/or misuse.



Sometimes sharp and dangerous tools are justified,
because they accomplish things that cannot be

(easily) done otherwise.  But rebase does not fall into that category,
because rebase provides no new capabilities.  To wit:

## 2.0 A rebase is just a merge with historical references omitted

A rebase is really nothing more than a merge (or a series of merges)
that deliberately forgets one of the parents of each merge step.
To help illustrate this fact,
consider the first rebase example from the 
[Git documentation][gitrebase].  The merge looks like this:

![merge case](./rebase01.svg)

And the rebase looks like this:

![rebase case](./rebase02.svg)

As the [Git documentation][gitrebase] points out, check-ins C4\' and C5
are identical.  The only difference between C4\' and C5 is that C5
records the fact that C4 is its merge parent but C4\' does not.

Thus, a rebase is just a merge that forgets where it came from.

The Git documentation acknowledges this fact (in so many words) and
justifies it by saying "rebas[e] makes for a cleaner history".  I read
that sentence as a tacit admission that the Git history display 
capabilities are weak and need active assistance from the user to 
keep things manageable.
Surely a better approach is to record
the complete ancestry of every check-in but then fix the tool to show
a "clean" history in those instances where a simplified display is
desirable and edifying, but retain the option to show the real,
complete, messy history for cases where detail and accuracy are more
important.

So, another way of thinking about rebase is that it is a kind of
merge that intentionally forgets some details in order to
not overwhelm the weak history display mechanisms available in Git.

### 2.1 Rebase does not actually provide better feature-branch diffs

Another argument, often cited, is that rebasing a feature branch
allows one to see just the changes in the feature branch without
the concurrent changes in the main line of development. 
Consider a hypothetical case:

![unmerged feature branch](./rebase03.svg)



|



>
>


|
|





>
>


>
|
|

|




















|














|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Rebase Considered Harmful

Fossil deliberately omits a "rebase" command, because the original
designer of Fossil (and [original author][vhist] of this article) considers rebase to be 
an anti-pattern to be avoided. This article attempts to
explain that point of view.

[vhist]: https://fossil-scm.org/fossil/finfo?name=www/rebaseharm.md

## 1.0 Rebasing is dangerous

Most people, even strident advocates of rebase, agree that rebase can
cause problems when misused. The Git rebase documentation talks about the
[golden rule of rebase][golden]: that it should never be used on a public
branch.  Horror stories of misused rebase abound, and the rebase 
documentation devotes considerable space toward explaining how to
recover from rebase errors and/or misuse.

## <a name="cap-loss"></a>2.0 Rebase provides no new capabilities

Sometimes sharp and dangerous tools are justified,
because they accomplish things that cannot be
done otherwise, or at least cannot be done easily.
Rebase does not fall into that category,
because it provides no new capabilities.

### <a name="orphaning"></a>2.1 A rebase is just a merge with historical references omitted

A rebase is really nothing more than a merge (or a series of merges)
that deliberately forgets one of the parents of each merge step.
To help illustrate this fact,
consider the first rebase example from the 
[Git documentation][gitrebase].  The merge looks like this:

![merge case](./rebase01.svg)

And the rebase looks like this:

![rebase case](./rebase02.svg)

As the [Git documentation][gitrebase] points out, check-ins C4\' and C5
are identical.  The only difference between C4\' and C5 is that C5
records the fact that C4 is its merge parent but C4\' does not.

Thus, a rebase is just a merge that forgets where it came from.

The Git documentation acknowledges this fact (in so many words) and
justifies it by saying "rebasing makes for a cleaner history."  I read
that sentence as a tacit admission that the Git history display 
capabilities are weak and need active assistance from the user to 
keep things manageable.
Surely a better approach is to record
the complete ancestry of every check-in but then fix the tool to show
a "clean" history in those instances where a simplified display is
desirable and edifying, but retain the option to show the real,
complete, messy history for cases where detail and accuracy are more
important.

So, another way of thinking about rebase is that it is a kind of
merge that intentionally forgets some details in order to
not overwhelm the weak history display mechanisms available in Git.

### <a name="clean-diffs"></a>2.2 Rebase does not actually provide better feature-branch diffs

Another argument, often cited, is that rebasing a feature branch
allows one to see just the changes in the feature branch without
the concurrent changes in the main line of development. 
Consider a hypothetical case:

![unmerged feature branch](./rebase03.svg)
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
diff is not determined by whether you select C7 or C5\' as the target
of the diff, but rather by your choice of the diff source, C2 or C6.

So, to help with the problem of viewing changes associated with a feature
branch, perhaps what is needed is not rebase but rather better tools to 
help users identify an appropriate baseline for their diffs.

## 3.0 Rebase encourages siloed development

The [golden rule of rebase][golden] is that you should never do it
on public branches, so if you are using rebase as intended, that means
you are keeping private branches.  Or, to put it another way, you are
doing siloed development.  You are not sharing your intermediate work
with collaborators.  This is not good for product quality.

[Nagappan, et. al][nagappan] studied bugs in Windows Vista and found
that best predictor of bugs is the distance on the org-chart between
the stake-holders.  Or, bugs are reduced when the engineers talk to
one another.  Similar findings arise in other disciplines.  Keeping
private branches does not prove that developers are communicating
insufficiently, but it is a key symptom that problem.

[Weinberg][weinberg] argues programming should be "egoless".  That
is to say, programmers should avoid linking their code with their sense of
self, as that makes it more difficult for them to find and respond
to bugs, and hence makes them less productive.  Many developers are
drawn to private branches out of sense of ego.  "I want to get the
code right before I publish it".  I sympathize with this sentiment,
and am frequently guilty of it myself.  It is humbling to display
your stupid mistake to the whole world on an internet that
never forgets.  And yet, humble programmers generate better code.

## 4.0 Rebase commits untested check-ins to the blockchain

Rebase adds new check-ins to the blockchain without giving the operator
an opportunity to test and verify those check-ins.  Just because the
underlying three-way merge had no conflict does not mean that the resulting
code actually works.  Thus, rebase runs the very real risk of adding
non-functional check-ins to the permanent record.

Of course, a user can also commit untested or broken check-ins without
the help of rebase.  But at least with an ordinary commit or merge
(in Fossil at least), the operator
has the *opportunity* to test and verify the merge before it is committed,
and a chance to back out or fix the change if it is broken, without leaving
busted check-ins on the blockchain to complicate future bisects.

With rebase, pre-commit testing is not an option.

## 5.0 Rebase causes timestamp confusion

Consider the earlier example of rebasing a feature branch:

![rebased feature branch, again](./rebase04.svg)

What timestamps go on the C3\' and C5\' check-ins?  If you choose
the same timestamps as the original C3 and C5, then you have the
odd situation C3' is older than its parent C6.  We call that a
"timewarp" in Fossil.  Timewarps can also happen due to misconfigured
system clocks, so they are not unique to rebase.  But they are very
confusing and best avoided.  The other option is to provide new
unique timestamps for C3' and C5'.  But then you lose the information
about when those check-ins were originally created, which can make
historical analysis of changes more difficult, and might also
complicate prior art claims.

## 6.0 Rebasing is the same as lying

By discarding parentage information, rebase attempts to deceive the
reader about how the code actually came together.

The [Git rebase documentation][gitrebase] admits as much.  They acknowledge
that when you view a repository as record of what actually happened,
doing a rebase is "blasphemous" and "you're _lying_ about what
actually happened", but then goes on to justify rebase as follows:

>
_"The opposing point of view is that the commit history is the **story of 
how your project was made.** You wouldn\'t publish the first draft of a 
book, and the manual for how to maintain your software deserves careful
editing. This is the camp that uses tools like rebase and filter-branch 
to tell the story in the way that’s best for future readers."_

I reject this argument utterly.
Unless you project is a work of fiction, it is not a "story" but a "history".
Honorable writers adjust their narrative to fit
history.  Rebase adjusts history to fit the narrative.

Truthful texts can be redrafted for clarity and accuracy.
Fossil supports this by providing mechanisms to fix
typos in check-in comments, attach supplemental notes,
and make other editorial changes.







|














|




|




|
















|
















|

















|







106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
diff is not determined by whether you select C7 or C5\' as the target
of the diff, but rather by your choice of the diff source, C2 or C6.

So, to help with the problem of viewing changes associated with a feature
branch, perhaps what is needed is not rebase but rather better tools to 
help users identify an appropriate baseline for their diffs.

## <a name="siloing"></a>3.0 Rebase encourages siloed development

The [golden rule of rebase][golden] is that you should never do it
on public branches, so if you are using rebase as intended, that means
you are keeping private branches.  Or, to put it another way, you are
doing siloed development.  You are not sharing your intermediate work
with collaborators.  This is not good for product quality.

[Nagappan, et. al][nagappan] studied bugs in Windows Vista and found
that best predictor of bugs is the distance on the org-chart between
the stake-holders.  Or, bugs are reduced when the engineers talk to
one another.  Similar findings arise in other disciplines.  Keeping
private branches does not prove that developers are communicating
insufficiently, but it is a key symptom that problem.

[Weinberg][weinberg] argues programming should be "egoless."  That
is to say, programmers should avoid linking their code with their sense of
self, as that makes it more difficult for them to find and respond
to bugs, and hence makes them less productive.  Many developers are
drawn to private branches out of sense of ego.  "I want to get the
code right before I publish it."  I sympathize with this sentiment,
and am frequently guilty of it myself.  It is humbling to display
your stupid mistake to the whole world on an internet that
never forgets.  And yet, humble programmers generate better code.

## <a name="testing"></a>4.0 Rebase commits untested check-ins to the blockchain

Rebase adds new check-ins to the blockchain without giving the operator
an opportunity to test and verify those check-ins.  Just because the
underlying three-way merge had no conflict does not mean that the resulting
code actually works.  Thus, rebase runs the very real risk of adding
non-functional check-ins to the permanent record.

Of course, a user can also commit untested or broken check-ins without
the help of rebase.  But at least with an ordinary commit or merge
(in Fossil at least), the operator
has the *opportunity* to test and verify the merge before it is committed,
and a chance to back out or fix the change if it is broken, without leaving
busted check-ins on the blockchain to complicate future bisects.

With rebase, pre-commit testing is not an option.

## <a name="timestamps"></a>5.0 Rebase causes timestamp confusion

Consider the earlier example of rebasing a feature branch:

![rebased feature branch, again](./rebase04.svg)

What timestamps go on the C3\' and C5\' check-ins?  If you choose
the same timestamps as the original C3 and C5, then you have the
odd situation C3' is older than its parent C6.  We call that a
"timewarp" in Fossil.  Timewarps can also happen due to misconfigured
system clocks, so they are not unique to rebase.  But they are very
confusing and best avoided.  The other option is to provide new
unique timestamps for C3' and C5'.  But then you lose the information
about when those check-ins were originally created, which can make
historical analysis of changes more difficult, and might also
complicate prior art claims.

## <a name="lying"></a>6.0 Rebasing is the same as lying

By discarding parentage information, rebase attempts to deceive the
reader about how the code actually came together.

The [Git rebase documentation][gitrebase] admits as much.  They acknowledge
that when you view a repository as record of what actually happened,
doing a rebase is "blasphemous" and "you're _lying_ about what
actually happened", but then goes on to justify rebase as follows:

>
_"The opposing point of view is that the commit history is the **story of 
how your project was made.** You wouldn\'t publish the first draft of a 
book, and the manual for how to maintain your software deserves careful
editing. This is the camp that uses tools like rebase and filter-branch 
to tell the story in the way that’s best for future readers."_

I reject this argument utterly.
Unless your project is a work of fiction, it is not a "story" but a "history."
Honorable writers adjust their narrative to fit
history.  Rebase adjusts history to fit the narrative.

Truthful texts can be redrafted for clarity and accuracy.
Fossil supports this by providing mechanisms to fix
typos in check-in comments, attach supplemental notes,
and make other editorial changes.
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240

Unfortunately, Git does not provide the ability to add corrections
or clarifications to historical check-ins in its blockchain.  Hence,
once again, rebase can be seen as an attempt to work around limitations
of Git.  Wouldn't it be better to fix the tool rather than to lie about
the project history?

## 7.0 Cherry-pick merges work better then rebase

Perhaps there are some cases where a rebase-like transformation
is actually helpful.  But those cases are rare.  And when they do
come up, running a series of cherry-pick merges achieve the same
topology, but with advantages:

  1.  Cherry-pick merges preserve an honest record of history.
      (They do in Fossil at least.  Git's file format does not have
      a slot to record cherry-pick merge history, unfortunately.)

  2.  Cherry-picks provide an opportunity to test each new check-in
      before it is committed to the blockchain

  3.  Cherry-pick merges are "safe" in the sense that they do not
      cause problems for collaborators if you do them on public branches.

  4.  Cherry-picks keep both the original and the revised check-ins,
      so both timestamps are preserved.

## 8.0 Summary and conclusion

Rebasing is an anti-pattern.  It is dishonest.  It deliberately
omits historical information.  It causes problems for collaboration.
And it has no offsetting benefits.

For these reasons, rebase is intentionally and deliberately omitted
from the design of Fossil.


[golden]: https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing
[gitrebase]: https://git-scm.com/book/en/v2/Git-Branching-Rebasing
[nagappan]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf
[weinberg]: https://books.google.com/books?id=76dIAAAAMAAJ







|



















|













205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245

Unfortunately, Git does not provide the ability to add corrections
or clarifications to historical check-ins in its blockchain.  Hence,
once again, rebase can be seen as an attempt to work around limitations
of Git.  Wouldn't it be better to fix the tool rather than to lie about
the project history?

## <a name="better-plan"></a>7.0 Cherry-pick merges work better than rebase

Perhaps there are some cases where a rebase-like transformation
is actually helpful.  But those cases are rare.  And when they do
come up, running a series of cherry-pick merges achieve the same
topology, but with advantages:

  1.  Cherry-pick merges preserve an honest record of history.
      (They do in Fossil at least.  Git's file format does not have
      a slot to record cherry-pick merge history, unfortunately.)

  2.  Cherry-picks provide an opportunity to test each new check-in
      before it is committed to the blockchain

  3.  Cherry-pick merges are "safe" in the sense that they do not
      cause problems for collaborators if you do them on public branches.

  4.  Cherry-picks keep both the original and the revised check-ins,
      so both timestamps are preserved.

## <a name="conclusion"></a>8.0 Summary and conclusion

Rebasing is an anti-pattern.  It is dishonest.  It deliberately
omits historical information.  It causes problems for collaboration.
And it has no offsetting benefits.

For these reasons, rebase is intentionally and deliberately omitted
from the design of Fossil.


[golden]: https://www.atlassian.com/git/tutorials/merging-vs-rebasing#the-golden-rule-of-rebasing
[gitrebase]: https://git-scm.com/book/en/v2/Git-Branching-Rebasing
[nagappan]: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-11.pdf
[weinberg]: https://books.google.com/books?id=76dIAAAAMAAJ