This repository was archived by the owner on Oct 17, 2024. It is now read-only.

Commit a46ce21

Update README.md
1 parent af8ed1d commit a46ce21

File tree

1 file changed

+7
-172
lines changed

README.md

@@ -8,185 +8,20 @@ This problem involves writing some Ruby code to implement a filesystem base imag
88

99
45 minutes.
1010

11-
### Competencies
12-
13-
- Basic Ruby coding skills
14-
- Ability to synthesize requirements and turn those into code
15-
- Understanding of files & IO
16-
- Error handling
17-
- Deterministic hashing and / or uuid generation
18-
- Cache eviction strategies
19-
2011
### Environment
2112

2213
- IDE and Ruby 3 environment. VSCode, IntelliJ or RubyMine free community versions work fine.
2314
- [VSCode](https://code.visualstudio.com/)
2415
- [IntelliJ](https://www.jetbrains.com/idea/)
2516
- [RubyMine](https://www.jetbrains.com/ruby/)
2617

27-
### Resources for candidate
28-
29-
Some starting code is provided.
30-
3118
### Background knowledge
3219

33-
> Kaleido/Canva is a very visual product with lots of images involved. Many of our backend services
34-
> need to download these images and do something with them. For example, when downloading a design,
35-
> one of our backend services will download the image and store it on the filesystem while
36-
> processing the download. We’d like to avoid downloading the same image again and again; one
37-
> approach to this is to cache them.
38-
>
39-
> Through the course of the interview, we'd like you to write some Ruby code that caches images
40-
> on the filesystem. The images are uniquely identified by urls.
41-
42-
### Likely clarifying questions:
43-
44-
Q: How much disk space is there?
45-
46-
> A:
47-
> The instances have very large disks. For the purposes of this interview, you can consider the
48-
> instances to have infinite disk capacity. If there is time at the end, we can chat about how we
49-
> might implement a cache that can take disk capacity into consideration.
50-
51-
Q: Does every url have a unique image? Are there duplicate images?
52-
53-
> A:
54-
> No, we can consider a unique url to represent a unique image. We are not concerned with the binary
55-
> content of individual images. We are also not concerned if two different urls reference identical
56-
> image content, in this case we will still save two images on the filesystem. If there is time at
57-
> the end, we can chat about how we might implement a cache that can take duplicate images into
58-
> consideration.
59-
60-
Q: Is there any authentication on the urls?
61-
62-
> A:
63-
> For the purposes of the interview, we can assume that we are able to access all image urls.
64-
65-
Q: What sort of cache-eviction strategy should be used?
66-
67-
> A:
68-
> We will take a look at the code in a minute where the requirements are described in the comments.
69-
> We will initially use a simple lease / release mechanism, but if there is time at the end, we can
70-
> discuss different cache-eviction strategies.
71-
72-
## Session outline
73-
74-
Step 1: Brief walk through the code
75-
Step 2: Implement lease
76-
Step 3: Implement release
77-
78-
All candidates are expected to complete all steps. Better candidates might finish quickly and then
79-
there may be time for additional questions.
80-
81-
### Step 1: Brief walk through the code
82-
83-
Let the candidate open up their IDE and get oriented with the code.
84-
85-
> Open up your IDE and take a look at caches.py. There is an ImageCache class with a lease and
86-
> release method. This is the class that we would like you to implement during the interview.
87-
>
88-
> Take a few minutes to read through the comments. Let me know if you have any questions or if
89-
> something isn't clear.
90-
91-
Answer any clarifying questions the candidate might have, but don't let them get too stuck on the
92-
details just yet. Opening up the tests can often help clarify the requirements.
93-
94-
> Let's take a look at the tests inside tests.py. There are 3 tests. Let's run them quickly. They
95-
> should all fail. If you can get all 3 tests passing, it is a good sign that you are on the right
96-
> track.
97-
>
98-
> Let’s just take a look at the first test. You can see that we are leasing the same image twice.
99-
> And we are asserting that we only need to download the image once, because it is being cached.
100-
>
101-
> Take a look at the other 2 tests to confirm your understanding of the problem.
102-
103-
Again, answer any clarifying questions the candidate may have before showing them files.py.
104-
105-
> Take a look at files.py. This is only here for your convenience. There are a few helpful
106-
> utilities for working with files. You don't need to use them.
107-
108-
#### Levelling
109-
110-
- B1: there is no expectation for B1 to ask any clarifying question.
111-
- B2: disk space and some understanding of constraints on the exercise, how do we store the cache?
112-
- B3: question around understanding constraints, some knowledge about how a cache works
113-
114-
### Step 2: Implement lease
115-
116-
> OK, let's get started on the implementation! Feel free to use Google and ask more clarifying
117-
> questions as you progress.
118-
119-
The candidate should now begin their implementation.
120-
121-
Some things to note:
122-
123-
- If the candidate asks about ImageClient (or image_client) explain that they don't need to
124-
implement any HTTP requests to fetch images. The ImageClient is provided for explanatory purposes
125-
and is mocked out in the tests. Hopefully, they are familiar with IoC or dependency injection.
126-
- Using a hard-coded location such as "/tmp" to store images is acceptable and makes it easy to
127-
debug issues. Instead of a hardcoded location, the base path for downloaded images can also be
128-
injected into the constructor.
129-
- The candidate should realise that they need a unique name for each file that is downloaded. They
130-
should hopefully also realise that the image’s URL is a good candidate for a unique name if it
131-
weren’t for the special characters used. Ideally, a candidate would encode or hash the URL so it is
132-
safe to use as a filename. Other solutions might include mapping URL to some unique id.
133-
- A good candidate should recognise that exceptions can occur in the implementation. If a candidate
134-
provides a solution without error handling, ask them where they think exceptions can happen and to
135-
add some error handling. ImageCacheException is provided as an exception for the candidate to use.
136-
- If the candidate runs the tests after implementing lease, the first two tests should probably be
137-
passing.
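
The filename point above can be sketched as follows. This is a minimal illustration, not the reference solution: the helper name `cache_path_for` and the default `"/tmp"` base directory are assumptions made for the example. It hashes the URL with SHA-256 so the result is deterministic and contains only hex characters, which are safe in a filename.

```ruby
require "digest"

# Hypothetical helper (not part of the provided starter code):
# maps a URL to a filesystem-safe path under base_dir.
def cache_path_for(url, base_dir = "/tmp")
  # The hex digest is deterministic, so the same URL always maps to the same file.
  File.join(base_dir, Digest::SHA256.hexdigest(url))
end
```

A candidate could equally Base64- or percent-encode the URL; hashing has the advantage of a fixed-length name regardless of URL length.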
138-
139-
Once lease is implemented and you are satisfied, move on to the implementation of release.
140-
141-
#### Levelling
142-
143-
- B1: no additional expectations apart from passing the interview.
144-
- B2: unique id generation and url/mapping, interface usage -> being familiar enough with dependency injection, error handling (not implementing exceptions in particular, but having a discussion around errors)
145-
- B3: edge cases, performance considerations
146-
- C1: same expectations as B3
147-
- C2: same expectations as B3
148-
149-
### Step 3: Implement release
150-
151-
> Now let's implement release!
152-
153-
Implementing release is normally straightforward after implementing lease, even if the candidate
154-
didn't realise they need to keep a counter of some sort for the leases to know when to delete the
155-
images.
156-
157-
Some things to note:
158-
159-
- If the candidate recognises that it might be a poor implementation to delete files synchronously
160-
during the release (it might be better in reality to delete them asynchronously), agree, but explain
161-
that we want the naive implementation for the purposes of the interview.
162-
- The candidate should realise that well-behaved clients should not call release before lease and
163-
that it indicates an error condition. If not, raise this topic and ask them to add some error
164-
handling. A good candidate might also add a test for this.
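
For orientation, the lease counter and the release-before-lease check might look like the sketch below. It is one possible shape under stated assumptions, not the expected answer: the `fetch(url)` method on the injected client, the constructor signature, and the local definition of `ImageCacheException` are all hypothetical stand-ins for whatever the starter code provides.

```ruby
require "digest"
require "fileutils"

# Stand-in for the exception the starter code provides (assumed definition).
class ImageCacheException < StandardError; end

class ImageCache
  # image_client is injected; we assume it responds to fetch(url) and returns bytes.
  def initialize(image_client, base_dir = "/tmp")
    @image_client = image_client
    @base_dir = base_dir
    @leases = Hash.new(0) # url => number of outstanding leases
  end

  # Downloads the image on the first lease only and returns its path.
  def lease(url)
    path = path_for(url)
    if @leases[url].zero?
      begin
        File.binwrite(path, @image_client.fetch(url))
      rescue StandardError => e
        raise ImageCacheException, "could not cache #{url}: #{e.message}"
      end
    end
    @leases[url] += 1
    path
  end

  # Deletes the file synchronously once the last lease is released.
  def release(url)
    count = @leases[url]
    raise ImageCacheException, "release without lease: #{url}" if count.zero?

    if count == 1
      @leases.delete(url)
      FileUtils.rm_f(path_for(url))
    else
      @leases[url] = count - 1
    end
  end

  private

  def path_for(url)
    File.join(@base_dir, Digest::SHA256.hexdigest(url))
  end
end
```

This version is not thread-safe; a candidate who raises concurrency could guard the counter with a `Mutex`.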
165-
166-
#### Levelling
167-
168-
- B1: no additional expectations apart from passing the interview.
169-
- B2: after changing release they should change lease if necessary without additional prompting
170-
- B3: tests are running, notice that you need some sort of counter (multiple clients), add more tests
171-
- C1: same expectations as B3
172-
- C2: same expectations as B3
173-
174-
### Bonus discussion questions
175-
176-
These are some deeper discussion questions to get further signals:
20+
Kaleido/Canva is a very visual product with lots of images involved. Many of our backend services
21+
need to download these images and do something with them. For example, when downloading a design,
22+
one of our backend services will download the image and store it on the filesystem while
23+
processing the download. We’d like to avoid downloading the same image again and again; one
24+
approach to this is to cache them.
17725

178-
- The current implementation of the cache is a bit naive and some images are leased again shortly
179-
after they have been released. This causes extra downloads to occur. Is there a way we can keep an
180-
image in the cache, even if no one holds a lease? Note: Now is a good time to discuss [cache
181-
eviction strategies](https://en.wikipedia.org/wiki/Cache_replacement_policies).
182-
- If it didn't come up during the implementation, point out the image_client in the constructor of
183-
ImageCache and ask whether the candidate recognises the pattern (dependency injection). Discuss why
184-
it is useful.
185-
- What can we do to ensure that the cache is warm after a restart occurs? The simple solution is
186-
to traverse the directory where cached files are stored and load the cache up with those files.
187-
How might one know which file maps to which url?
188-
- How might we design the cache if the file system has limited space? What implications does this
189-
have? This should raise questions like "What should we do when there is no more space left to
190-
download?". Throwing an exception is fine here.
191-
- How might we design the system if we want to optimise for filesystem space by de-duplicating
192-
images? Possibly content-address the cache by hashing the file contents. What are the pros vs cons?
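
As a talking point for the de-duplication question, content addressing could be sketched roughly like this. The class name `ContentAddressedStore` and its methods are hypothetical and exist only for illustration; the sketch also assumes the image bytes are already in memory.

```ruby
require "digest"

# Hypothetical store: each file is named after the SHA-256 of its bytes,
# so two URLs with identical content share a single file on disk.
class ContentAddressedStore
  def initialize(base_dir)
    @base_dir = base_dir
    @path_by_url = {} # url => path of the content it resolved to
  end

  # Writes the bytes only if that content is not already stored.
  def put(url, bytes)
    path = File.join(@base_dir, Digest::SHA256.hexdigest(bytes))
    File.binwrite(path, bytes) unless File.exist?(path)
    @path_by_url[url] = path
  end

  def path_for(url)
    @path_by_url[url]
  end
end
```

One trade-off worth raising: the image must be downloaded before its hash is known, so this saves disk space but not bandwidth, and deletion needs a reference count per content hash rather than per URL.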
26+
Through the course of the interview, we'd like you to write some Ruby code that caches images
27+
on the filesystem. The images are uniquely identified by urls.
