Solution Journal
1. Define the development environment
Category | Details |
---|---|
Programming Language | Python 3.12+ |
Python IDE | PyCharm Community Edition |
Code style | PEP-8, Google Doc Strings |
Linting | PyCharm built-in linter |
Testing | Pytest |
Version Control | Git |
Git Hosting | GitHub |
CI/CD | GitHub Actions |
Documentation | GitHub Pages, MkDocs |
2. Validate the data from the problem statement
Let's first check if the given numbers are correct. As QA engineer, the data given is never trusted ;)
- Is the length of random_nums and probabilities equal?
- Is the sum of probabilities equal to 1?
- Are all probabilities positive?
A manual check is enough for this task.
- Yes, the length of random_nums and probabilities is equal.
- Yes, the sum of probabilities is equal to 1.
- Yes, all probabilities are positive.
We can proceed with the implementation.
3. Questions, questions and more questions
We don't have any information on how the data is generated, for example, how many samples were taken to get the given distribution. Typically, this may lead to skewed results due to under-sampling.
The given distribution doesn't seem to be a binomial distribution...
... or a Poisson distribution.
Visually, our distribution looks like a skewed custom distribution.
We will assume that the given distribution is correct and proceed with the implementation. We will test our implementation with a large number of samples for fairness using the Chi-Squared test.
4. Proof of concept
import random
class RandomGen(object):
def __init__(self, random_nums, probabilities):
self._random_nums = random_nums
self._probabilities = probabilities
def next_num(self):
return random.choices(self._random_nums, self._probabilities)[0]
Questions:
- Do we have constraints regarding the compatibility with older versions of Python? -> use contaiers
- Are we allowed to use external libraries for statistical tests and visualization? -> yes, the Chi-Square CDF is very complex for this task
5. Brainstorm the system design
We will implement two classes, one using random.choices
and the other using
random.random
. We will provide an abstract class to be used as an
interface for both classes. It will also decouple implementation from client
code.
A sample of the API design is shown below:
# Create a random number generator
rg = (
RandomGenV1()
.set_random_nums([-1, 0, 1, 2, 3])
.set_probabilities([0.01, 0.3, 0.58, 0.1, 0.01])
.validate_input()
)
# Get a random number
num = rg.next_num()
print("Random number is: ", num)
# Create a hypothesis test
random_numbers = random.
hypothesis = (
ChiSquareTest()
.set_random_numbers()
.set_probabilities([0.01, 0.3, 0.58, 0.1, 0.01])
.calc_chi()
.calc_p()
.test()
)
# Hypothesis is True if the distribution is fair
print("Hypothesis is: ", hypothesis)
# Create a histogram object
histogram = (
Histogram()
.set_random_numbers(random_numbers)
.build()
.plot()
)
# Print the histogram
print(hist)
Typically, at this stage the client shall approve the design, and we can proceed with the prototype implementation.
6. Implement the core prototype
We will implement the following classes:
RandomGenAbc
: An abstract class to be used as an interface.RandomGenV1
: A class usingrandom.choices
.RandomGenV2
: A class usingrandom.random
.Histogram
: A helper class that creates a simple histogram object.ChiSquaredTest
: A helper class to perform the Chi-Squared test.
7. Implement the REST API with Flask
We will implement a simple REST API using Flask to access the solution. The API will have the following endpoints:
/api/v1/randomgen?number
: Returns a number of random numbers based on the random.random method./api/v2/randomgen?number
: Returns a number of random numbers based on the random.choice method./api/v1/config
: An optional endpoint to configure the random numbers and probabilities.
8. Manual Integration tests
The following problems arised during the manual's integration tests:
-
Rounding errors: Using older versions of Python yield rounding errors. Round the probabilities to 3 decimal places.
-
Fairness: The Chi-Squared test is not fair in case the number of samples is low. The distribution is fair when the number of random values is above 50.
-
Interface We need to simplify the ChiSquaredTest class to make it more user-friendly. Define method calculate that will the chi-squared, p-value and the degrees of freedom.
-
Configuration: We will allow the user to configure its own distribution using the parameters
numbers
andprobabilities
as defined in the problem statement.
9. Refactor the backend after the manual tests
Randomgen classes API:
from randomgen.core import RandomGenV1
# Create a random number generator
rg = (
RandomGenV1()
.set_numbers([-1, 0, 1, 2, 3])
.set_expected_probabilities([0.01, 0.3, 0.58, 0.1, 0.01])
.validate()
)
# Get a random number
num = rg.next_num()
print("Random number is: ", num)
ChiSquaredTest API:
from randomgen.hypothesis import ChiSquareTest
# Create a hypothesis test
random_numbers = random.
hypothesis = (
ChiSquareTest()
.set_numbers()
.set_probabilities([0.01, 0.3, 0.58, 0.1, 0.01])
.calculate()
)
# Hypothesis is True if it is accepted
print("Hypothesis is: ", hypothesis.test())
Histogram API:
from randomgen.helpers import Histogram
# Create a histogram object
histogram = (
Histogram()
.set_numbers(random_numbers)
.build()
.plot()
)
# Print the histogram
print(hist)
10. Implement unit tests for the backend
Functional tests
We will implement unit tests for the Backend using Pytest. We will cover each class and method with unit tests to guarantee that the solution is working as expected.
The testing discovered some bugs in the implementation that are related to the validation of the input parameters.
Performance tests
We will implement performance tests to check the performance of the solution. We will use the maximum number of random numbers to check the performance of the solution. Manual testing showed that RandomGenV2 is around 3 times slower than RandomGenV1.
The task definition doesn't mention any performance requirements. We will assume that the solution should be fast enough to generate random numbers in a reasonable time. A reasonable time for the backend is around 50 msec, bearing in mind that the REST API is going to stack on it and add some latency.
11. Implement automated integration tests
The integration tests showed that we need to wrap the exceptions by the backend and return a well-defined HTTP status code.
12. Refactor the webserver after the integration tests
First, we will refactor the webserver to return a well-defined HTTP status code. We will also add a configuration endpoint to allow the user to configure the bins and probabilities of a custom distribution histogram.
The response format will be also improved for better readability.
{
"numbers": [1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
"quality": {
"chi_square_test": {
"chi_square": 44.4333333333333,
"df": 1,
"is_fair": 0,
"p_value": 2.63168375980172e-11
},
"expected_histogram": {
"-1": 0.01,
"0": 0.3,
"1": 0.58,
"2": 0.1,
"3": 0.01
},
"observed_histogram": {
"0": 0.2,
"1": 0.8
}
},
"version": 1
}
13. Feedback from a beta tester
The beta tester found the solution to be working as expected. The tester was happy we provided an interface to configure the app using the REST API.
Some issues were found regarding the interface of the ChiSquaredTest class. The
tester recommended simplifying the interface to make it more user-friendly.
The naming of the methods must also be improved. The tester recommended
renaming the methods (e.g. calc()
instead of calculate()
or test()
).
The tester had some difficulties understanding the method chaining This opens a new conceptional question about the design.
As a thum of rule, we will use the method chaining for methods that initialize the internal state and then just access the internal state using properties. Producing or consuming methods shall not be chained.
14. Refactor the backend after the feedback
- We will refactor the ChiSquaredTest class to make it more user-friendly.
- We will rename the test() method to calc() and remove some methods
- We need to replace
is fair
withis null
- Encapsulate the internal state of the class as per requirements
After the changes, the unittests showed that the solution is working as expected. The server was also tested manually and the results were as expected.
15. Docstrings
Now it is time to add docstrings to the classes and methods. We will use the Google Docstring format. The docstrings will be used to generate the API manual.
16. Containerize the solution
We will distribute the solution as a container. The application will consist of a flask server that will implement a simple API to access the random generator. The container will guarantee that the solution will run on any machine that has Docker installed.
Considerations:
- No dockerfile optimizations (e.g. multi-stage builds)
- No memory or cpu limits (with docker compose)
- No persistent storage required
- Choose a base image that is small and secure (e.g. alpine)
- Use a
.dockerignore
file to exclude unnecessary files - Use digests to guarantee the integrity of the image
- The application will run on port 8080
- The application is not exposed to the internet (no need for HTTPS)
17. Final code review
The final code review showed that the solution is working as expected. The solution is well-documented, and the code is clean and readable. The solution is ready for deployment.
18. Documentation
We will use MkDocs to build the documentation. The documentation will be deployed to GitHub Pages. The documentation will contain at least the following sections:
- Problem
- Solution
- Installation
- Rest API
- CONTRIBUTING.md
- README.md
19. Create the CI/CD pipeline
We will create a GitHub Actions workflow to run the tests on every push to the main branch. We will also create a GitHub Actions workflow to build and push the Docker image to Docker Hub on every release.
What we want:
- Run the tests on every push to the main branch.
- Build and push the Docker image to Docker Hub on each push.
- Build the documentation and deploy it to GitHub Pages on every release.
20. Tag the first increment
Till now, we were in the pre-development phase. After the tag, changes will be tracked using concrete issues in the commit messages.