Entry 4: Get the Data
In Entry 3 I defined my problem as: Holding all other factors constant, what mass is needed to retain an atmosphere on Mars? I need data to solve it.
The Problem
This sounds like a dataset I’m going to have to create myself.
Hands-On Machine Learning with Scikit-Learn & TensorFlow recommends automating as much of the data acquisition process as possible. If I were to automate any part of building this dataset, it would be scraping table data from an HTML page. However, this is a one-off dataset and the known parameters of planets don’t change very often, so I’m not going to worry about it in this entry.
If I were working on a project where I would connect to the data source over and over, like a Twitter natural language processing (NLP) project, I would most certainly want to automate pulling data. Sounds like a project for another entry on a different mini-project.
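For reference, scraping a table from an HTML page can be sketched with nothing but Python's standard library. This is only an illustration of the idea, not something I ran for this entry: `TableParser`, `scrape_table`, and `SAMPLE_HTML` are hypothetical names, and the inline HTML is a stand-in for a downloaded page, holding a few values from the table later in this entry.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None outside a <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for a downloaded page (assumption: one simple table).
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>mass_1024kg</th><th>diameter_km</th></tr>
  <tr><td>Earth</td><td>5.97</td><td>12756</td></tr>
  <tr><td>Mars</td><td>0.642</td><td>6792</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

Every value comes back as a string, so a real pipeline would still need a type-conversion step. A real page would also likely have more than one table and messier markup than this sketch handles.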
The Options
The types of entities from which to draw the necessary data are rather limited. There are the planets, moons, and dwarf planets of this solar system and possibly exoplanets of other systems.
The Proposed Solution
Fortunately, I didn’t have to comb through information on each and every planetary body individually. The planetary fact sheet has many of the features I need, and it covers 8 planets, 1 moon, and 1 dwarf planet. Starting with this as a base and adding Titan, I gathered 27 features on 11 planetary bodies.
I considered including more moons (Jupiter has 79, Saturn 82, Uranus 27, and Neptune 14), but couldn’t find sufficient information on the necessary features. Most importantly, there was a lack of information on atmospheric mass. The same was true for exoplanets. They’re just too far away to have good measurements.
The planetary bodies and features placed into a pandas DataFrame look like this:
| | name | type | mass_1024kg | diameter_km | density_kg_m3 | gravity_m_s2 | escape_vel_km_s | rotation_period_hr | day_len_hr | distance_from_sun_106_km | ... | mean_temp_c | surface_pressure_bars | nbr_moons | rings | magnetic_field | equatorial_radius_km | mean_radius_km | V(1,0) (mag) | geometric_albedo | atmospheric_mass_kg |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mercury | planet | 0.3300 | 4879.0 | 5427 | 3.7 | 4.3 | 1407.6 | 4222.6 | 57.9 | ... | 167 | 1.000000e-14 | 0 | No | Yes | 2440.5300 | 2439.4000 | -0.60 | 0.106 | 1.000000e+03 |
1 | Venus | planet | 4.8700 | 12104.0 | 5243 | 8.9 | 10.4 | -5832.5 | 2802.0 | 108.2 | ... | 464 | 9.200000e+01 | 0 | No | No | 6051.8000 | 6051.8000 | -4.47 | 0.650 | 4.800000e+20 |
2 | Earth | planet | 5.9700 | 12756.0 | 5514 | 9.8 | 11.2 | 23.9 | 24.0 | 149.6 | ... | 15 | 1.014000e+00 | 1 | No | Yes | 6378.1366 | 6371.0084 | -3.86 | 0.367 | 1.400000e+21 |
3 | Moon | moon | 0.0730 | 3475.0 | 3340 | 1.6 | 2.4 | 655.7 | 708.7 | 149.6 | ... | -20 | 3.000000e-15 | 0 | No | No | 1737.5000 | 1737.4000 | -0.08 | 0.120 | 1.000000e+05 |
4 | Mars | planet | 0.6420 | 6792.0 | 3933 | 3.7 | 5.0 | 24.6 | 24.7 | 227.9 | ... | -65 | 1.000000e-02 | 2 | No | No | 3396.1900 | 3389.5000 | -1.52 | 0.150 | 2.500000e+16 |
5 | Jupiter | planet | 1898.0000 | 142984.0 | 1326 | 23.1 | 59.5 | 9.9 | 9.9 | 778.6 | ... | -110 | 2.000000e+00 | 79 | Yes | Yes | 71492.0000 | 69911.0000 | -9.40 | 0.520 | 1.900000e+27 |
6 | Saturn | planet | 568.0000 | 120536.0 | 687 | 9.0 | 35.5 | 10.7 | 10.7 | 1433.5 | ... | -140 | 1.000000e+03 | 82 | Yes | Yes | 60268.0000 | 58232.0000 | -8.88 | 0.470 | 5.400000e+26 |
7 | Titan | moon | 0.1260 | 5149.4 | 1882 | 1.4 | 2.6 | 382.0 | 382.0 | 1433.5 | ... | -179 | 1.600000e+00 | 0 | No | No | 2574.7000 | 2574.7000 | -8.10 | 0.210 | 9.100000e+18 |
8 | Uranus | planet | 86.8000 | 51118.0 | 1271 | 8.7 | 21.3 | -17.2 | 17.2 | 2872.5 | ... | -195 | 1.000000e+03 | 27 | Yes | Yes | 25559.0000 | 25362.0000 | -7.19 | 0.510 | 8.600000e+25 |
9 | Neptune | planet | 102.0000 | 49528.0 | 1638 | 11.0 | 23.5 | 16.1 | 16.1 | 4495.1 | ... | -200 | 1.000000e+03 | 14 | Yes | Yes | 24764.0000 | 24622.0000 | -6.87 | 0.410 | 1.000000e+26 |
10 | Pluto | dwarf | 0.0146 | 2370.0 | 2095 | 0.7 | 1.3 | -153.3 | 153.3 | 5906.4 | ... | -225 | 1.000000e-05 | 5 | No | No | 1188.3000 | 1188.3000 | -1.00 | 0.300 | 1.300000e+14 |
11 rows × 27 columns
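As a rough sketch of how a hand-collected table like this ends up in pandas, the snippet below builds a DataFrame from three of the rows and four of the columns shown above. The derived `atmosphere_fraction` column is hypothetical and only there to show the data is usable. It also assumes `mass_1024kg` is the body mass in units of 10^24 kg.

```python
import pandas as pd

# Three rows and four columns copied from the table above;
# the full dataset has 11 rows and 27 columns.
rows = [
    {"name": "Earth", "type": "planet", "mass_1024kg": 5.97, "atmospheric_mass_kg": 1.4e21},
    {"name": "Mars", "type": "planet", "mass_1024kg": 0.642, "atmospheric_mass_kg": 2.5e16},
    {"name": "Titan", "type": "moon", "mass_1024kg": 0.126, "atmospheric_mass_kg": 9.1e18},
]
df = pd.DataFrame(rows)

# Assumption: mass_1024kg is in units of 10^24 kg, so multiply to get kilograms.
# atmosphere_fraction is a hypothetical derived column, not one of the 27 features.
df["atmosphere_fraction"] = df["atmospheric_mass_kg"] / (df["mass_1024kg"] * 1e24)

print(df.dtypes)
print(df.sort_values("atmosphere_fraction", ascending=False))
```

Building from a list of dicts keeps each body's values together on one line, which makes hand-entered numbers easier to proofread against the source.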
The Failure
This step was virtually error-free. Other than spending hours looking at a bunch of different sources and trying to find more information on moons and exoplanets, it went pretty smoothly.
The limited number of examples (only 11) may be problematic in future steps. For example, measuring how well the model performs could prove extremely challenging.
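One common option for evaluating a model on so few rows is leave-one-out cross-validation, where each body is held out once and the model is trained on the other ten. The sketch below only generates the splits. It is an illustration of the idea, not a decision about how this project will measure performance, and `leave_one_out` is a hypothetical helper.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_index) pairs, holding out each sample once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


# The 11 bodies in the order they appear in the table above.
bodies = ["Mercury", "Venus", "Earth", "Moon", "Mars", "Jupiter",
          "Saturn", "Titan", "Uranus", "Neptune", "Pluto"]

splits = list(leave_one_out(len(bodies)))
for train, test in splits[:2]:
    print("test on", bodies[test], "| train on", len(train), "bodies")
```

With 11 bodies this gives 11 train/test splits of 10 and 1, which is about as much evaluation as a dataset this small can support.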
Up Next
Resources:
The planetary body data was retrieved from a variety of sources. There were also a couple of one-off searches for some of the more obscure information, like the V(1,0)(mag) for Titan. These were the major contributors: